GPU Error

Message boards : Number crunching : GPU Error

Rantanplan

Joined: 17 Sep 18
Posts: 18
Credit: 1,908,082
RAC: 270
Message 2264 - Posted: 25 Mar 2019, 17:35:44 UTC - in response to Message 2262.  

OK, I see no quick way to install CUDA 10.1; version 9.1 is the latest on Ubuntu 18.10.

OK, I'll try installing CUDA 8 later this evening, or maybe tomorrow, and see.


Driver 418 is available if you add the nvidia repository.


I almost had it installed before, but it has not worked yet.
ID: 2264 · Rating: 0
Rantanplan

Joined: 17 Sep 18
Posts: 18
Credit: 1,908,082
RAC: 270
Message 2273 - Posted: 25 Mar 2019, 20:34:22 UTC

OK, I tried everything other than changing the GPU; that's next.
I hope the GTX isn't trash already.
ID: 2273 · Rating: 0
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,388,362
RAC: 128,271
Message 2276 - Posted: 26 Mar 2019, 0:10:05 UTC - in response to Message 2258.  

OK, the system log shows: API 7.5 will be used for the app.

Does that mean that CUDA 7.5 will be used?
The app was built with CUDA version 10.1, I believe.
API 7.5 sounds like a BOINC version number. That has no effect whatsoever on the CUDA version needed or used.
ID: 2276 · Rating: 0
Rantanplan

Joined: 17 Sep 18
Posts: 18
Credit: 1,908,082
RAC: 270
Message 2285 - Posted: 26 Mar 2019, 6:21:04 UTC - in response to Message 2276.  
Last modified: 26 Mar 2019, 7:18:39 UTC

No, it's version 7.12.0.

I think it's a hardware issue. No change after swapping the GPU.
Next I will try another old Intel mainboard + NVIDIA GPU.

I can't believe how much I have to check and do
just to get so many available BOINC points.

-----

Edit: Runs great on SETI@home (CUDA 6.0), but no change on NF.
I'll give up.
ID: 2285 · Rating: 0
Rantanplan

Joined: 17 Sep 18
Posts: 18
Credit: 1,908,082
RAC: 270
Message 2287 - Posted: 26 Mar 2019, 10:26:37 UTC

A complete mistake.

I tried to install the cuda*.run package. Any guesses?

Even though I selected the CUDA toolkit for installation,
the setup doesn't install the "lib64" folder, or add it to the path, for
the needed libraries and config files.

The manual setup doesn't work for me;
keyboard and mouse were uninstalled
for an unknown reason. I've had it.
ID: 2287 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,745,178
RAC: 288,337
Message 2292 - Posted: 26 Mar 2019, 15:06:27 UTC - in response to Message 2287.  

A complete mistake.

I tried to install the cuda*.run package. Any guesses?

Even though I selected the CUDA toolkit for installation,
the setup doesn't install the "lib64" folder, or add it to the path, for
the needed libraries and config files.

The manual setup doesn't work for me;
keyboard and mouse were uninstalled
for an unknown reason. I've had it.


Sorry for all your troubles... sounds frustrating.
ID: 2292 · Rating: 0
Jonathan Brier

Joined: 13 Sep 11
Posts: 1
Credit: 24,790,823
RAC: 134,706
Message 2324 - Posted: 29 Mar 2019, 14:51:51 UTC

I'm also seeing a few tasks with memory access issues.

Now starting the targeted Martinet search:
Error code 700: an illegal memory access was encountered file polDiscTest_gpuCuda.cu line 415.
polDisc Test had an error. Aborting.


Task with error: https://numberfields.asu.edu/NumberFields/result.php?resultid=40204463

This is Nvidia 418.54 with 980m GPU and application version: Get Decic Fields v3.01 (cuda30)
x86_64-pc-linux-gnu

There are successful tasks too on the same system: https://numberfields.asu.edu/NumberFields/result.php?resultid=40234951
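
For anyone who wants to tally these failures across several task outputs, here is a minimal Python sketch that pulls the error code, message, source file and line number out of lines shaped like the one quoted above. The function name `parse_gpu_error` and the regular expression are illustrative assumptions based only on that single sample line; other application versions may format their errors differently. For reference, code 700 is the CUDA runtime's `cudaErrorIllegalAddress`.

```python
import re

# Matches lines shaped like the sample in this post, for example:
# "Error code 700: an illegal memory access was encountered file polDiscTest_gpuCuda.cu line 415."
# The pattern is an assumption derived from that one sample, not from the app's source.
ERROR_RE = re.compile(
    r"Error code (?P<code>\d+): (?P<message>.+?) file (?P<file>\S+) line (?P<line>\d+)\."
)


def parse_gpu_error(text):
    """Return one dict per error line found in `text` (empty list if none)."""
    return [
        {
            "code": int(m["code"]),
            "message": m["message"],
            "file": m["file"],
            "line": int(m["line"]),
        }
        for m in ERROR_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "Now starting the targeted Martinet search:\n"
        "Error code 700: an illegal memory access was encountered "
        "file polDiscTest_gpuCuda.cu line 415.\n"
        "polDisc Test had an error. Aborting.\n"
    )
    for err in parse_gpu_error(sample):
        print(err)
```

Running it over the stderr text of both failed and successful tasks from the same host would show whether the failures always point at the same source line.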
ID: 2324 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,745,178
RAC: 288,337
Message 2325 - Posted: 29 Mar 2019, 15:38:15 UTC - in response to Message 2324.  

I'm also seeing a few tasks with memory access issues.

Task with error: https://numberfields.asu.edu/NumberFields/result.php?resultid=40204463

This is Nvidia 418.54 with 980m GPU and application version: Get Decic Fields v3.01 (cuda30)
x86_64-pc-linux-gnu

There are successful tasks too on the same system: https://numberfields.asu.edu/NumberFields/result.php?resultid=40234951


Thanks for reporting. It looks like you have a couple tasks with that same error.

I have seen some of these before on other hosts. When I rerun them with cuda-memcheck they finish successfully with no errors. But of course I have a different GPU, so maybe it's something with that particular compute capability. I'm still trying to figure it out. At least it doesn't happen too often.
ID: 2325 · Rating: 0
Stiwi

Joined: 13 Mar 19
Posts: 10
Credit: 32,374,387
RAC: 32,574
Message 2326 - Posted: 29 Mar 2019, 21:09:35 UTC - in response to Message 2325.  

I'm also seeing a few tasks with memory access issues.

Task with error: https://numberfields.asu.edu/NumberFields/result.php?resultid=40204463

This is Nvidia 418.54 with 980m GPU and application version: Get Decic Fields v3.01 (cuda30)
x86_64-pc-linux-gnu

There are successful tasks too on the same system: https://numberfields.asu.edu/NumberFields/result.php?resultid=40234951


Thanks for reporting. It looks like you have a couple tasks with that same error.

I have seen some of these before on other hosts. When I rerun them with cuda-memcheck they finish successfully with no errors. But of course I have a different GPU, so maybe it's something with that particular compute capability. I'm still trying to figure it out. At least it doesn't happen too often.


I also have 2 tasks with this error.
The tasks got this error when I restarted my computer while the WU was running.
Maybe it appears when the tasks are interrupted at an unfortunate time.
ID: 2326 · Rating: 0
Rantanplan

Joined: 17 Sep 18
Posts: 18
Credit: 1,908,082
RAC: 270
Message 2327 - Posted: 29 Mar 2019, 21:34:01 UTC
Last modified: 29 Mar 2019, 21:43:32 UTC

I found a sneaky little solution that works for me.

I simply deleted these files as root:

/etc/OpenCL/mesa.icd and intel.icd

Then reboot. No more errors.

To the administration:
Could you send me some GPU WUs, please?
It always "says" job cache full.
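
The workaround above deletes the ICD files outright. A more reversible variant, sketched here in Python, moves them into a backup directory so they can be put back later. The function name `disable_icd_files`, its arguments, and the backup location are illustrative assumptions. The post names /etc/OpenCL, while many Linux installs keep these files under /etc/OpenCL/vendors, so check where they actually live on your system first.

```python
from pathlib import Path
import shutil


def disable_icd_files(icd_dir, names, backup_dir):
    """Move the named .icd files from icd_dir into backup_dir.

    Returns the list of file names that were actually moved; names that do
    not exist in icd_dir are skipped. All paths are supplied by the caller,
    nothing is hard-coded here.
    """
    icd_dir, backup_dir = Path(icd_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = icd_dir / name
        if src.is_file():
            shutil.move(str(src), str(backup_dir / name))
            moved.append(name)
    return moved
```

It would need to run as root against the real directory, for example `disable_icd_files("/etc/OpenCL", ["mesa.icd", "intel.icd"], "/root/icd-backup")`, where the backup path is again just a placeholder. Moving the files back and rebooting undoes the change.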
ID: 2327 · Rating: 0
mmonnin

Joined: 1 Feb 17
Posts: 23
Credit: 52,585,281
RAC: 26,424
Message 2328 - Posted: 29 Mar 2019, 22:02:18 UTC - in response to Message 2327.  

I found a sneaky little solution that works for me.

I simply deleted these files as root:

/etc/OpenCL/mesa.icd and intel.icd

Then reboot. No more errors.

To the administration:
Could you send me some GPU WUs, please?
It always "says" job cache full.


That's the BOINC client saying job cache is full so it won't contact any project server for more work.
ID: 2328 · Rating: 0


Copyright © 2024 Arizona State University