Older GPUs not working

Message boards : Number crunching : Older GPUs not working
JStateson

Joined: 13 Feb 18
Posts: 3
Credit: 3,890,647
RAC: 0
Message 3472 - Posted: 1 Apr 2023, 10:36:12 UTC
Last modified: 1 Apr 2023, 10:47:22 UTC

I have seven s9000 and s9050 type GPUs. These are OpenCL 1.2 devices; they run extremely well in Milkyway and fairly slowly, though usably, in Einstein. Their GPU chip is the same as the HD-7950 or 7970. Unaccountably, they consistently fail here in NumberFields. GPU-Z shows 0% utilization the entire time the app is running. After about 4 hours they generate the following error:

"C:\Users\JSTATE~1\AppData\Local\Temp\OCL27052T3.cl", line 1941: warning: 
          shift count is too large
      if( (pow2 & 0xFFFFFFFF)==0 ) { d  = 32; pow2 >>= 32; }
                                                       ^


My s9100 and s9150 are OpenCL 2.1 and run fine on this project. Their GPU chip is different.

Any possibility of getting those older boards to work? I enabled "DEBUG" and "DBG_THREAD" settings in the .h and .cl files but nothing showed up. Code probably did not get that far to print anything but I am just guessing.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1323
Credit: 410,765,618
RAC: 245,570
Message 3473 - Posted: 1 Apr 2023, 18:25:48 UTC - in response to Message 3472.  

That is very strange. The code that's referenced should only be reached under CUDA (the multi-precision words are 64bits in CUDA and only 32bits in OpenCL, hence the error about the shift being too large).

The OpenCL and CUDA apps use much of the same code, and the differences are handled via preprocessor directives. Here is the best theory I have for what is happening: You have a system environment variable named "CUDA" so when your openCL compiler reaches a point in the code with the "#ifdef CUDA" directive, it ends up setting the multi-precision word larger than it should, and will also set other parameters incorrectly. If this is what's happening, I would first check your system variables for "CUDA", otherwise it could be the openCL implementation on your system that is setting the CUDA directive. On windows, I believe the openCL implementation is part of a system dll file, so not sure how to handle that. Do you know how to get the exact command line that your openCL compiler is using?
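
To make the preprocessor idea concrete, here is a minimal C sketch of the kind of switch described above. It is only an illustration: `mp_word`, `mp_word_bits` and `env_var_is_set` are made-up names, not the app's real code. In an ordinary C toolchain a macro such as `CUDA` is defined by a `#define` or a `-D` compiler option, so whether an environment variable can end up having the same effect depends on the OpenCL implementation; that part is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the app's multi-precision word type:
   64 bits when the CUDA macro is defined, 32 bits otherwise. */
#ifdef CUDA
typedef uint64_t mp_word;
#else
typedef uint32_t mp_word;
#endif

/* Width in bits of the word type this translation unit was built with. */
int mp_word_bits(void) {
    return (int)(sizeof(mp_word) * 8);
}

/* Returns 1 if an environment variable with this name exists, else 0. */
int env_var_is_set(const char *name) {
    return getenv(name) != NULL;
}
```

Calling `env_var_is_set("CUDA")` on the affected host would be a quick way to test the first half of the theory, and `mp_word_bits()` shows which branch a given build actually took.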
JStateson

Joined: 13 Feb 18
Posts: 3
Credit: 3,890,647
RAC: 0
Message 3474 - Posted: 3 Apr 2023, 1:09:43 UTC - in response to Message 3473.  
Last modified: 3 Apr 2023, 1:21:02 UTC


I spent some time looking at this as solving puzzles is one of my bad habits plus I am retired.

Looking at the .h and .cl files, I see they are all functionally the same, e.g.:

gpuMultiPrecAMD_v402.h
pdtKernelAMD_v402.cl
pdtKernel_v402.cl
gpuMultiPrec_v402.h

The app copies them down to a BOINC slot and drops the "v402". The one named "amd" is no different from the nvidia one except for the name (and the include name).

The following are the cpu, intel, nvidia and amd executables. There seems to be no difference between the nvidia and amd ones, so the OpenCL library must differentiate.

02/12/2023  09:56 AM        22,536,448 GetDecics_4.00_windows_x86_64
03/31/2023  02:32 PM        22,719,607 GetDecics_4.02_windows_x86_64__opencl_intel
03/31/2023  02:32 PM        22,575,516 GetDecics_4.02_windows_x86_64__opencl_nvidia
02/12/2023  09:55 AM        22,575,516 GetDecics_4.02_windows_x86_64__opencl_amd


I did find (I assume) the command line options in the sched_reply xml, but they were all empty:
<file_ref>
    <file_name>sf7_DS-16x11_Grp4834097of8000000.dat</file_name>
    <open_name>in</open_name>
</file_ref>
<command_line>

</command_line>
</workunit>


I tried running in standalone mode by removing the soft links and executing the AMD app in a folder with all the files it needed (that I could guess at):
04/02/2023  07:08 PM    <DIR>          .
04/02/2023  07:08 PM    <DIR>          ..
04/02/2023  06:13 PM               349 gpuLookupTable.txt
04/02/2023  06:13 PM            34,028 gpuMultiPrecAMD.h
04/02/2023  06:13 PM             1,061 in
04/02/2023  06:31 PM             8,642 init_data.xml
04/02/2023  06:13 PM               728 mp_int.h
04/02/2023  06:36 PM                 0 out
04/02/2023  06:31 PM               124 out.1
04/02/2023  06:13 PM            76,649 pdtKernelAMD.cl
02/12/2023  09:55 AM        22,536,448 test.exe
02/12/2023  09:55 AM        22,575,516 test2.exe
              10 File(s)     45,233,545 bytes
               2 Dir(s)  1,820,312,457,216 bytes free

C:\Users\jstateson\Downloads\numbers_h110\test>test2


When I ran "test.exe", the cpu app ran and seemed to work, as the iteration started.
Unfortunately "test2.exe", the amd app, did nothing whatsoever. GPU-Z showed no load on the GPU.
The contents of stderr are:

C:\Users\jstateson\Downloads\numbers_h110\test>type stderr.txt
19:08:37 (40056): Can't set up shared mem: -1. Will run in standalone mode.
GPU Summary String = [CAL|AMDFireProS9050|3|12288MB||102].
Loading GPU lookup table from file.
GPU was not found in the lookup table.  Using default values:
  numBlocks = 1024.
  threadsPerBlock = 32.
  polyBufferSize = 32768.


Anyway, this is as far as I got. I never figured out how the xml file got created. If I run the app without the xml file I get an error and the app closes:
C:\Users\jstateson\Downloads\numbers_h110\test>test2

C:\Users\jstateson\Downloads\numbers_h110\test>type stderr.txt
20:04:42 (66488): Can't open init data file - running in standalone mode
GPU Summary String = .
Loading GPU lookup table from file.
GPU was not found in the lookup table.  Using default values:
  numBlocks = 1024.
  threadsPerBlock = 32.
  polyBufferSize = 32768.
20:04:42 (66488): Can't open init data file - running in standalone mode
Error: Failed to obtain OpenCL device id.
Error: Failed to initialize OpenCL.


I was never able to duplicate that 31 bit error.

If you have any ideas, let me know.

[edit] The AMD app runs fine on my s9100, s9150, VII and MI25 AMD boards but fails on the s9000 and s9050.

The s90x0 are OpenCL platform 2.1 and device 1.2,
whereas the s91x0 are 2.1 and 2.0.

My s9000 and s9050 work fine under OpenCL for Einstein and Milkyway under Windows, and I have used them under Linux before.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1323
Credit: 410,765,618
RAC: 245,570
Message 3475 - Posted: 3 Apr 2023, 5:10:12 UTC - in response to Message 3474.  

I was unfamiliar with the AMD FirePro 9050, so I did a quick search. I think it is powerful enough to run the app, but I could be wrong.

When GPU-Z shows no load, that is when openCL is compiling the code. We have seen this issue before with some of the AMD drivers, where the openCL compiler hangs indefinitely (this was discussed somewhere in previous threads, but I am too lazy to dig through it right now). I recall the app working for some users, and after updating the driver they had the same problem you are currently seeing. I should mention, I also had this problem with the stock openCL on Linux. I fixed it by installing the AMD ROCm driver manually.

I imagine AMD is no longer updating the driver for your older card, so you may be stuck. When I get a chance tomorrow, I will look through old results in the database to see if any other users have a working FirePro GPU, then I can ask them what driver version they have.

You said you have used these cards under linux. Does that include NumberFields? Because if it works there, then we know it is most likely the driver (to be more precise, the openCL part of the driver).
JStateson

Joined: 13 Feb 18
Posts: 3
Credit: 3,890,647
RAC: 0
Message 3476 - Posted: 3 Apr 2023, 14:12:55 UTC - in response to Message 3475.  

I was running only SETI, Einstein and Milkyway in Ubuntu 18.04. I never tried NumberFields, and I no longer use these cards in Ubuntu due to the difficulty of rolling back to the kernel blessed by AMD.

The s9000 and s9050 have the Tahiti chipset, as do the HD-7970, HD-7950 and R9. All variations of Tahiti differ only in having more or less memory and more or fewer cores.
The Tahiti problem was discussed before here

note the only difference is the memory and number of cores:
https://www.techpowerup.com/gpu-specs/radeon-hd-7970.c296
https://www.techpowerup.com/gpu-specs/firepro-s9000.c1879

The s9100, s9150 and MI25 are recognized as w9100, w9150 and wx9100 respectively in Windows 10 and work fine on all projects. These boards all go for under $100 USD used but need a good DIY fan.

The xml file in the slot assigned to NumberFields only recognizes the first card. That does not seem to be a problem, even though there are 4 really different cards in the system with the 2.0 OpenCL devices. How is that file generated? Is there a debugging feature that can be enabled by supplying an option in the command line?


The following code should never have executed since DIGIT_BIT was defined to be 31 in an earlier include statement.

  if (DIGIT_BIT>32) {
    if( (pow2 & 0xFFFFFFFF)==0 ) { d  = 32; pow2 >>= 32; }
    }
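
One thing worth noting about the snippet above: `if (DIGIT_BIT>32)` is an ordinary run-time `if`, so many compilers still parse and check the body even when the condition is a constant false. That alone may be enough to produce a "shift count is too large" warning on a 32-bit word without the line ever executing. Below is a minimal C sketch of the same test moved into the preprocessor, so the 32-bit build never sees the shift. `mp_word` and `low_zero_bits` are hypothetical names, the trailing loop is my own filler so the function does something testable, and `DIGIT_BIT` is set to 31 as stated above.

```c
#include <stdint.h>

#define DIGIT_BIT 31   /* value taken from the post; normally set in a header */

/* Hypothetical word type: only wider than 32 bits when DIGIT_BIT needs it. */
#if DIGIT_BIT > 32
typedef uint64_t mp_word;
#else
typedef uint32_t mp_word;
#endif

/* Count the zero bits below the lowest set bit of pow2.
   Returns -1 for pow2 == 0, which has no set bit. */
int low_zero_bits(mp_word pow2) {
    int d = 0;
    if (pow2 == 0) return -1;
#if DIGIT_BIT > 32
    /* Compiled only for wide words, so a 32-bit build never sees this shift. */
    if ((pow2 & 0xFFFFFFFF) == 0) { d = 32; pow2 >>= 32; }
#endif
    while ((pow2 & 1u) == 0) { pow2 >>= 1; d++; }
    return d;
}
```

With the guard written as `#if`, a 32-bit build has nothing left to warn about, and the wide-word path is unchanged when `DIGIT_BIT` is larger than 32.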
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1323
Credit: 410,765,618
RAC: 245,570
Message 3477 - Posted: 3 Apr 2023, 18:25:24 UTC - in response to Message 3476.  

I queried the database for anyone with a gpu string containing "fire". Only 99 such hosts. Of those, there were only 2 with recent successful results. The first was your s9150 host and the 2nd had the following info string:
[BOINC|7.16.11][CAL|AMD FirePro W5100|1|4096MB||200][vbox|6.0.14|1|1]
I'm not sure how that host compares to yours from a performance perspective. But either way, there are no other users with a successful s9000 or s9050.
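
For comparing info strings like the one above with the `[CAL|AMDFireProS9050|3|12288MB||102]` string from the earlier stderr output, here is a small C sketch that pulls out the last `|`-separated number of a single bracketed group. Reading that number as the device's OpenCL version (102 for 1.2, 200 for 2.0) is my assumption about the format, not something confirmed in this thread, and `summary_last_field` is a made-up helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Return the integer in the last '|'-separated field of ONE bracketed
   group, e.g. "[CAL|AMDFireProS9050|3|12288MB||102]" gives 102.
   Returns -1 if there is no '|' or the last field has no digits.
   Pass a single group, not a full string with several [...] groups. */
int summary_last_field(const char *summary) {
    const char *bar = strrchr(summary, '|');
    char *end;
    long value;

    if (bar == NULL) return -1;
    value = strtol(bar + 1, &end, 10);
    if (end == bar + 1) return -1;   /* nothing numeric after the last '|' */
    return (int)value;
}
```

On the two strings quoted in this thread it gives 102 for the S9050 and 200 for the W5100.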

I'm not sure exactly who generates that xml file. I would guess the client generates it from data it receives from the project.

You mentioned other cards in the system. Are these other cards Nvidia by any chance? That could explain how the openCL compiler is getting confused: maybe the presence of the Nvidia card has resulted in the creation of the "CUDA" environment variable. That is the only way the 64-bit code can be reached (in hindsight, I should have used a less common name).



Copyright © 2024 Arizona State University