No support for NVidia 5000 series?

Orange Kid
Joined: 12 Apr 17
Posts: 4
Credit: 54,437,485
RAC: 4,505
Message 3873 - Posted: 6 Jun 2025, 16:21:12 UTC

Tasks are all failing.


<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
GPU Summary String = [CUDA|NVIDIAGeForceRTX5070Ti|1|15817MB|57099|300].
Loading GPU lookup table from file.
GPU was not found in the lookup table. Using default values:

numBlocks = 1024.
threadsPerBlock = 32.
polyBufferSize = 32768.
Setting GPU device number 0.
Cuda initialization was successful.
CHECKPOINT_FILE = wu_sf6_DS-15x11_Grp7020780of13107200_checkpoint.
Checkpoint Flag = 0.
Reading file ../../projects/numberfields.asu.edu_NumberFields/sf6_DS-15x11_Grp7020780of13107200.dat
K = x^2 - 10
S = [2, 5]
Disc Bound = 320000000000000
Skip = (P^3)*(Q^9)
Num Congruences = 32
SCALE = 1.000000
|dK| = 40
Signature = [2,0]
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_sf6_DS-15x11_Grp7020780of13107200_0_r315192737_0
Now starting the targeted Martinet search:
Num Cvecs = 32.
Doing Cvec 1.
Error code 700: an illegal memory access was encountered file polDiscTest_gpuCuda.cu line 2213.
polDisc Test had an error. Aborting.


</stderr_txt>
]]>

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1406
Credit: 732,666,592
RAC: 817,352
Message 3874 - Posted: 6 Jun 2025, 18:04:22 UTC - in response to Message 3873.  

I checked the database and there are successful results for 5000 series cards, but they are all with the Windows OpenCL version of the app.

It looks like you are the only one running the linux CUDA version. CUDA is supposed to be backwards compatible, unless they recently changed that with the 5000 series cards. Any chance your Nvidia driver is outdated?

Eric Driver
Message 3875 - Posted: 6 Jun 2025, 18:18:03 UTC - in response to Message 3874.  

It looks like you are the only one running the linux CUDA version. CUDA is supposed to be backwards compatible, unless they recently changed that with the 5000 series cards. Any chance your Nvidia driver is outdated?

I take that back. There is now a 2nd user with a 5070Ti with the same error as you. So it's not just you...

Orange Kid
Message 3876 - Posted: 6 Jun 2025, 20:32:33 UTC - in response to Message 3875.  

Using the 570.99 driver on Mint 22.1.
Two other computers have the same setup, except one has a 4070 Ti Super and the other a 3070 Ti. Those two are fine.

Eric Driver
Message 3877 - Posted: 6 Jun 2025, 21:34:12 UTC - in response to Message 3876.  

I found this in the Nvidia docs:
"The NVIDIA Blackwell GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures such as NVIDIA Ampere and Turing architectures"

So the app should be compatible with the newer 5000 series GPUs. It's possible that the code is ignoring one of the stricter rules, and the newer CUDA implementation is now enforcing it, causing the error. When I find some time in the next couple weeks I will try to rebuild the code with the latest CUDA library.

SerVal
Joined: 1 Jan 20
Posts: 15
Credit: 4,735,400
RAC: 29,460
Message 3911 - Posted: 8 Aug 2025, 19:50:33 UTC

No support for NVidia 5000 series?
- looks like only partially. :)

The tenth attempt to install
Video card NVIDIA GeForce RTX 5060 Ti MSI OC 16Gb was not successful. :(
Operating system Windows Server 2016 or 2019. Unable to install display driver.
So, the purchased card has been idle for 3 days already.
(GeForce GTX 460 and GTX 980 are OK).

Grant (SSSF)
Joined: 4 Jan 25
Posts: 28
Credit: 120,425,194
RAC: 554,396
Message 3912 - Posted: 9 Aug 2025, 2:18:21 UTC - in response to Message 3911.  

Video card NVIDIA GeForce RTX 5060 Ti MSI OC 16Gb was not successful. :(
Operating system Windows Server 2016 or 2019. Unable to install display driver.
That is an OS/driver issue, not a project issue.

Win Server 2019/2022 driver is v539.41
Win Server 2016 is not supported.
Grant
Darwin NT, Australia.

Eric Driver
Message 3913 - Posted: 9 Aug 2025, 5:25:57 UTC

I just poked around and found a bunch of successful results with 5000 series cards, such as this host:
https://numberfields.asu.edu/NumberFields/results.php?hostid=2899985

So I'm guessing that hosts with computation errors could have a driver problem.

SerVal
Message 3914 - Posted: 9 Aug 2025, 14:08:36 UTC - in response to Message 3912.  

That is an OS/driver issue, not a project issue.
Yes, but the result is the same. The server is idle. :(
====
Another attempt to install display drivers on
GeForce RTX 5060. Windows Server 2019. No success.
Grant (SSSF), Thank you!
====

Grant (SSSF)
Message 3915 - Posted: 9 Aug 2025, 22:48:24 UTC - in response to Message 3913.  

I just poked around and found a bunch of successful results with 5000 series cards, such as this host:
https://numberfields.asu.edu/NumberFields/results.php?hostid=2899985

So I'm guessing that hosts with computation errors could have a driver problem.
Very likely.

From the Task Stderr_output it looks like the driver version is 575.64 on his system: GNOME 48 (Flatpak runtime) [6.15.9-gentoo-dist|libc 2.40]
So at least you know what works with the RTX 5000 series and CUDA.

Get the details of a few of the others that work & see what's common between them, do the same for those that aren't working.
Might give a quick (or much quicker) idea of where the issue lies.

FYI: not sure if you are aware of an issue with the latest BOINC version, 8.2.4. On this project and a couple of others, the Remaining (estimated) time no longer updates as work is returned and new Tasks are issued. It remains at whatever value it had prior to upgrading.
Going back to the previous version, it works again. Apparently on at least one project it doesn't freeze and continues to work as it should, even with the new version.
Grant
Darwin NT, Australia.

Eric Driver
Message 3916 - Posted: 10 Aug 2025, 0:46:20 UTC - in response to Message 3915.  

So there are only 2 linux hosts with a 5000 series card (database only keeps last 5 days of results). The one mentioned earlier with successful results and this one with failures:
https://numberfields.asu.edu/NumberFields/results.php?hostid=2900384

So there's not much information to go on. I will point out that there are many Windows hosts with 5000 series cards that are working very well. So if there's a problem, it's with the Linux CUDA version.

Grant (SSSF)
Message 3917 - Posted: 10 Aug 2025, 2:51:04 UTC - in response to Message 3916.  
Last modified: 10 Aug 2025, 3:18:36 UTC

So there are only 2 linux hosts with a 5000 series card (database only keeps last 5 days of results). The one mentioned earlier with successful results and this one with failures:
https://numberfields.asu.edu/NumberFields/results.php?hostid=2900384


Driver 575.64, GNOME 48 (Flatpak runtime) [6.8.0-71-generic|libc 2.40]

Same Video driver, same libcurl version. And while it is also GNOME 48, the working one is based on 6.15.9-gentoo-dist, the failing one 6.8.0-71-generic

process exited with code 1 (0x1, -255)

Setting GPU device number 0.
Cuda Error: failed to set the device.
Error: Failed to initialize Cuda.


So the issue appears to be with how the new hardware, with the same driver, interacts with the OS/Kernel. It works on one OS/Kernel version, but not on another.
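
As a rough illustration of that comparison, here is a minimal Python sketch that diffs the reported environments of the two hosts. The field names (driver, runtime, kernel, libc) are labels chosen for this example, and the values are copied from the two host summaries quoted in this thread. It only shows which reported fields differ; it says nothing about which difference actually causes the failure.

```python
def differing_fields(working, failing):
    """Return {field: (working_value, failing_value)} for every field that differs."""
    keys = sorted(set(working) | set(failing))
    return {
        key: (working.get(key), failing.get(key))
        for key in keys
        if working.get(key) != failing.get(key)
    }


# Values as reported in the thread for the working and the failing Linux host.
working_host = {
    "driver": "575.64",
    "runtime": "GNOME 48 (Flatpak runtime)",
    "kernel": "6.15.9-gentoo-dist",
    "libc": "2.40",
}
failing_host = {
    "driver": "575.64",
    "runtime": "GNOME 48 (Flatpak runtime)",
    "kernel": "6.8.0-71-generic",
    "libc": "2.40",
}

print(differing_fields(working_host, failing_host))
# prints {'kernel': ('6.15.9-gentoo-dist', '6.8.0-71-generic')}
```

With more hosts in each group, the same function could be applied pairwise to see which differences keep showing up.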
Grant
Darwin NT, Australia.

Eric Driver
Message 3918 - Posted: 10 Aug 2025, 4:11:53 UTC - in response to Message 3917.  

So there are only 2 linux hosts with a 5000 series card (database only keeps last 5 days of results). The one mentioned earlier with successful results and this one with failures:
https://numberfields.asu.edu/NumberFields/results.php?hostid=2900384


Driver 575.64, GNOME 48 (Flatpak runtime) [6.8.0-71-generic|libc 2.40]

Same Video driver, same libcurl version. And while it is also GNOME 48, the working one is based on 6.15.9-gentoo-dist, the failing one 6.8.0-71-generic

process exited with code 1 (0x1, -255)

Setting GPU device number 0.
Cuda Error: failed to set the device.
Error: Failed to initialize Cuda.


So the issue appears to be with how the new hardware, with the same driver, interacts with the OS/Kernel. It works on one OS/Kernel version, but not on another.

More importantly, the error is that it can't "set the device", so it's probably some kind of environment problem, maybe a permission or SElinux thing.
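
One rough way to test the permission theory on an affected Linux host is to check whether the account that runs the BOINC client can open the NVIDIA device nodes. The sketch below is only an assumption about where to look: it presumes the driver exposes its nodes under /dev/nvidia*, and the helper name check_device_nodes is made up for this example. It does not call CUDA itself, so a clean result does not rule out other environment problems.

```python
import glob
import os


def check_device_nodes(paths):
    """Map each path to True if the current user can both read and write it."""
    return {path: os.access(path, os.R_OK | os.W_OK) for path in paths}


if __name__ == "__main__":
    # Assumed location of the NVIDIA device nodes on a Linux host.
    nodes = sorted(glob.glob("/dev/nvidia*"))
    if not nodes:
        print("no /dev/nvidia* nodes found")
    for path, ok in check_device_nodes(nodes).items():
        print(f"{path}: {'read/write ok' if ok else 'NOT accessible'}")
    # A restrictive value here can also hide a GPU from CUDA applications.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "(unset)"))
```

For the result to mean anything, it should be run as the same user that the BOINC client runs as, not as your own login account.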

Grant (SSSF)
Message 3919 - Posted: 10 Aug 2025, 6:43:57 UTC - in response to Message 3918.  

More importantly, the error is that it can't "set the device", so it's probably some kind of environment problem, maybe a permission or SElinux thing.
It's trying to set Device number 0, but the OS considers it Device number 1?
Something, somewhere is confused.
Grant
Darwin NT, Australia.

DKlimax
Joined: 8 Jun 23
Posts: 18
Credit: 34,813,334
RAC: 119,827
Message 3920 - Posted: 12 Aug 2025, 14:41:27 UTC - in response to Message 3919.  

More importantly, the error is that it can't "set the device", so it's probably some kind of environment problem, maybe a permission or SElinux thing.
It's trying to set Device number 0, but the OS considers it Device number 1?
Something, somewhere is confused.

Maybe BOINC detection of GPUs. I'd check BOINC logs on startup.

DKlimax
Message 3921 - Posted: 12 Aug 2025, 14:43:08 UTC - in response to Message 3914.  

That is an OS/driver issue, not a project issue.
Yes, but the result is the same. The server is idle. :(
====
Another attempt to install display drivers on
GeForce RTX 5060. Windows Server 2019. No success.
Grant (SSSF), Thank you!
====

IIRC Nvidia doesn't like it when one uses consumer GPUs on a server, and the setup will block it. Did you try manual driver installation?

SerVal
Message 3922 - Posted: 12 Aug 2025, 21:16:09 UTC - in response to Message 3921.  

Guys, thanks for the responses. Here is the current situation:
installed Windows Server 2019 from scratch,
installed Boinc Manager,
installed the display driver from the nVidia website.
All done manually. Everything works now, except for the Internet connection.
That is, everything works locally, but there is no outside access :(

I am going through network card drivers, but no luck so far.
I could not install the chipset driver (maybe that is the whole problem).
In short, Huang had better stay out of my sight.

Greetings to all, and good spirits.
FurMark and Cinebench are looking good too. :)

mmonnin
Joined: 1 Feb 17
Posts: 24
Credit: 63,160,377
RAC: 36,035
Message 3923 - Posted: 13 Aug 2025, 21:16:10 UTC

My 5080 does not work either
https://numberfields.asu.edu/NumberFields/result.php?resultid=263000644


Eric Driver
Message 3925 - Posted: 14 Aug 2025, 4:33:08 UTC - in response to Message 3923.  

My 5080 does not work either
https://numberfields.asu.edu/NumberFields/result.php?resultid=263000644

That looks like the original error that started this thread. I notice that both use the same driver 570.99.

Next time you update the driver, can you try it again to see if the problem persists?

Keith Myers
Joined: 14 May 23
Posts: 16
Credit: 266,870,812
RAC: 498,860
Message 3926 - Posted: 15 Aug 2025, 19:23:25 UTC - in response to Message 3925.  

Just to be clear for everyone: what's reported in the post is not the actual driver version number. That is just what the BOINC client reports, since the client is unable to report Nvidia driver version numbers past two decimal digits. All drivers with the xxx.xxx format will be reported as xxx.99.

To find the full, correct version number, use the Nvidia software, dmesg, or the journal logs.
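
A minimal sketch of reading the full version from a script, assuming nvidia-smi is installed and on the PATH. The helper names are made up for this example, and the .99 check only flags a value that might have been shortened by the client in the way described above; it cannot recover the real number.

```python
import re
import subprocess


def parse_driver_versions(smi_output):
    """Return one driver version string per non-empty line of nvidia-smi output."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def may_be_truncated(reported):
    """True if a client-reported version looks like xxx.99 and so may not be the real one."""
    return re.fullmatch(r"\d+\.99", reported) is not None


def query_driver_versions():
    """Ask nvidia-smi for the driver version of each GPU (one line per GPU)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_driver_versions(result.stdout)


if __name__ == "__main__":
    try:
        print("driver version(s):", query_driver_versions())
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not query nvidia-smi: {exc}")
```

For example, may_be_truncated("570.99") is True, so the 570.99 seen in the failing tasks should be confirmed on the host itself before comparing it with other drivers, while may_be_truncated("575.64") is False.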
Copyright © 2025 Arizona State University