No support for NVidia 5000 series?
Author | Message |
---|---|
Joined: 12 Apr 17 Posts: 4 Credit: 54,437,485 RAC: 4,505 |
Tasks are all failing. <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt> GPU Summary String = [CUDA|NVIDIAGeForceRTX5070Ti|1|15817MB|57099|300]. Loading GPU lookup table from file. GPU was not found in the lookup table. Using default values: numBlocks = 1024. threadsPerBlock = 32. polyBufferSize = 32768. Setting GPU device number 0. Cuda initialization was successful. CHECKPOINT_FILE = wu_sf6_DS-15x11_Grp7020780of13107200_checkpoint. Checkpoint Flag = 0. Reading file ../../projects/numberfields.asu.edu_NumberFields/sf6_DS-15x11_Grp7020780of13107200.dat K = x^2 - 10 S = [2, 5] Disc Bound = 320000000000000 Skip = (P^3)*(Q^9) Num Congruences = 32 SCALE = 1.000000 |dK| = 40 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_sf6_DS-15x11_Grp7020780of13107200_0_r315192737_0 Now starting the targeted Martinet search: Num Cvecs = 32. Doing Cvec 1. Error code 700: an illegal memory access was encountered file polDiscTest_gpuCuda.cu line 2213. polDisc Test had an error. Aborting. </stderr_txt> ]]> |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
I checked the database and there are successful results for 5000 series cards, but they are all with the Windows OpenCL version of the app. It looks like you are the only one running the Linux CUDA version. CUDA is supposed to be backwards compatible, unless they recently changed that with the 5000 series cards. Any chance your Nvidia driver is outdated? |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
"It looks like you are the only one running the Linux CUDA version. CUDA is supposed to be backwards compatible, unless they recently changed that with the 5000 series cards. Any chance your Nvidia driver is outdated?" I take that back. There is now a 2nd user with a 5070Ti with the same error as you. So it's not just you... |
Joined: 12 Apr 17 Posts: 4 Credit: 54,437,485 RAC: 4,505 |
Using the 570.99 driver on Mint 22.1. Two other computers are the same except for a 4070 Ti Super in one and a 3070 Ti in the other. Those two are fine. |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
I found this in the Nvidia docs: "The NVIDIA Blackwell GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures such as NVIDIA Ampere and Turing architectures." So the app should be compatible with the newer 5000 series GPUs. It's possible that the code is ignoring one of the stricter rules, and the newer CUDA implementation is now enforcing it, causing the error. When I find some time in the next couple of weeks I will try to rebuild the code with the latest CUDA library. |
Joined: 1 Jan 20 Posts: 15 Credit: 4,735,400 RAC: 29,460 |
"No support for NVidia 5000 series?" Looks like only partially. :) The tenth attempt to install Video card NVIDIA GeForce RTX 5060 Ti MSI OC 16Gb was not successful. :( The operating system is Windows Server 2016 or 2019. Unable to install the display driver. So the purchased card has been idle for 3 days already. (GeForce GTX 460 and GTX 980 are OK.) |
Joined: 4 Jan 25 Posts: 28 Credit: 120,425,194 RAC: 554,396 |
"Video card NVIDIA GeForce RTX 5060 Ti MSI OC 16Gb was not successful. :(" That is an OS/driver issue, not a project issue. The Win Server 2019/2022 driver is v539.41. Win Server 2016 is not supported. Grant Darwin NT, Australia. |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
I just poked around and found a bunch of successful results with 5000 series cards, such as this host: https://numberfields.asu.edu/NumberFields/results.php?hostid=2899985 So I'm guessing that hosts with computation errors could have a driver problem. |
Joined: 1 Jan 20 Posts: 15 Credit: 4,735,400 RAC: 29,460 |
"That is an OS/driver issue, not a project issue." Yes, but the result is the same. The server is idle. :( ==== Another attempt to install the display drivers for the GeForce RTX 5060 on Windows Server 2019. No success. Grant (SSSF), thank you! ==== |
Joined: 4 Jan 25 Posts: 28 Credit: 120,425,194 RAC: 554,396 |
"I just poked around and found a bunch of successful results with 5000 series cards, such as this host:" Very likely. From the Task Stderr_output it looks like the driver version is 575.64 on his system: GNOME 48 (Flatpak runtime) [6.15.9-gentoo-dist|libc 2.40]. So at least you know what works with the RTX 5000 series and CUDA. Get the details of a few of the others that work and see what's common between them, then do the same for those that aren't working. That might give a quick (or much quicker) idea of where the issue lies. FYI, not sure if you are aware of an issue with the latest BOINC version, 8.2.4: on this project and a couple of others, the Remaining (estimated) time no longer updates as work is returned and new Tasks are issued. It remains at whatever value it had prior to upgrading. Going back to the previous version, it works again. Apparently on at least one project it doesn't freeze and continues to work as it should even with the new version. Grant Darwin NT, Australia. |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
So there are only 2 Linux hosts with a 5000 series card (the database only keeps the last 5 days of results). The one mentioned earlier with successful results and this one with failures: https://numberfields.asu.edu/NumberFields/results.php?hostid=2900384 So there's not much information to go on. I will point out that there are many Windows hosts with 5000 series cards that are working very well. So if there's a problem, it's with the Linux CUDA version. |
Joined: 4 Jan 25 Posts: 28 Credit: 120,425,194 RAC: 554,396 |
"So there are only 2 Linux hosts with a 5000 series card (database only keeps last 5 days of results). The one mentioned earlier with successful results and this one with failures:" Driver 575.64, GNOME 48 (Flatpak runtime) [6.8.0-71-generic|libc 2.40]. Same video driver, same libcurl version. And while it is also GNOME 48, the working one is based on 6.15.9-gentoo-dist and the failing one on 6.8.0-71-generic. The failing task reports: process exited with code 1 (0x1, -255). Setting GPU device number 0. Cuda Error: failed to set the device. Error: Failed to initialize Cuda. So the issue appears to be with how the new hardware, with the same driver, interacts with the OS/kernel. It works on one OS/kernel version, but not on another. Grant Darwin NT, Australia. |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
"So there are only 2 Linux hosts with a 5000 series card (database only keeps last 5 days of results). The one mentioned earlier with successful results and this one with failures:" More importantly, the error is that it can't "set the device", so it's probably some kind of environment problem, maybe a permission or SELinux thing. |
Joined: 4 Jan 25 Posts: 28 Credit: 120,425,194 RAC: 554,396 |
"More importantly, the error is that it can't 'set the device', so it's probably some kind of environment problem, maybe a permission or SELinux thing." It's trying to set Device number 0, but the OS considers it Device number 1? Something, somewhere is confused. Grant Darwin NT, Australia. |
Joined: 8 Jun 23 Posts: 18 Credit: 34,813,334 RAC: 119,827 |
"More importantly, the error is that it can't 'set the device', so it's probably some kind of environment problem, maybe a permission or SELinux thing." "It's trying to set Device number 0, but the OS considers it Device number 1?" Maybe it's BOINC's detection of GPUs. I'd check the BOINC logs on startup. |
Joined: 8 Jun 23 Posts: 18 Credit: 34,813,334 RAC: 119,827 |
"That is an OS/driver issue, not a project issue." IIRC Nvidia doesn't like it when one uses consumer GPUs on a server, and the setup will block it. Did you try manual driver installation? |
Joined: 1 Jan 20 Posts: 15 Credit: 4,735,400 RAC: 29,460 |
Guys, thanks for the responses. Here is the current situation: I installed Windows Server 2019 from scratch, installed Boinc Manager, and installed the display driver from the nVidia site. All of it manually. Everything works except the Internet connection. That is, everything works internally, but there is no access to the outside :( I am going through network card drivers, but no luck so far. I was not able to install the chipset driver (maybe that is the whole problem). In short, Huang had better stay out of my sight. Greetings to everyone, and good spirits to all. FurMark and Cinebench are pleasing too. :) |
Joined: 1 Feb 17 Posts: 24 Credit: 63,160,377 RAC: 36,035 |
My 5080 does not work either https://numberfields.asu.edu/NumberFields/result.php?resultid=263000644 |
Joined: 8 Jul 11 Posts: 1406 Credit: 732,666,592 RAC: 817,352 |
"My 5080 does not work either" That looks like the original error that started this thread. I notice that both use the same driver, 570.99. Next time you update the driver, can you try it again to see if the problem persists? |
Joined: 14 May 23 Posts: 16 Credit: 266,870,812 RAC: 498,860 |
Just to make this clear to everyone: what's reported in the post is not the actual driver version number. That is just what the BOINC client reports, since the client is unable to report Nvidia driver version numbers past two decimal digits. All drivers with the xxx.xxx format will report as xxx.99. To see the real version number, use the Nvidia software, dmesg, or the journal logs to find the correct and full version number. |
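
For anyone comparing hosts, here is a rough Python sketch of how the driver number could be pulled out of a task's stderr and flagged when it may be capped. It assumes the fifth field of the "GPU Summary String" line (57099 in the first post) is the driver version stored as major * 100 + minor; that layout is a guess from the log in this thread, not something taken from the app's source. The name `parse_gpu_summary` is made up for this example.

```python
import re

# Assumed layout, inferred from the stderr line quoted in the first post:
# [API|GPU name|count|memory|driver field|another number]
SUMMARY_RE = re.compile(r"GPU Summary String = \[([^\]]*)\]")


def parse_gpu_summary(stderr_text):
    """Return the reported driver version from a task's stderr, or None."""
    match = SUMMARY_RE.search(stderr_text)
    if match is None:
        return None
    fields = match.group(1).split("|")
    if len(fields) < 5 or not fields[4].isdigit():
        return None
    # Assumption: the fifth field is major * 100 + minor (57099 -> 570.99).
    major, minor = divmod(int(fields[4]), 100)
    return {
        "gpu": fields[1],
        "driver_major": major,
        "driver_minor": minor,
        # A minor of 99 may be a capped value rather than the real one.
        "minor_may_be_capped": minor == 99,
    }


if __name__ == "__main__":
    line = "GPU Summary String = [CUDA|NVIDIAGeForceRTX5070Ti|1|15817MB|57099|300]."
    print(parse_gpu_summary(line))
```

If the flag comes back true, the reported minor version should not be trusted for comparing hosts; the full version would still have to come from the Nvidia software or the system logs on that machine.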