New GPU OpenCL versions available
Author | Message |
---|---|
Joined: 8 Jul 11 Posts: 1341 Credit: 495,411,746 RAC: 564,093 |
When you get a chance, could you look at the stderr.txt for a WU that's stuck and paste the output here (before you unstick it)? That should give me an idea of where in the processing it is before it hangs. Don't worry about getting the stderr output. I gathered the information from the WUs that failed. It is building the OpenCL code without error and doesn't hang until it gets to the GPU (which I suspected was the case). It appears to hang because it is waiting on the GPU to finish. I don't know if the GPU is actually stuck, or if something is blocking the kernels from running. |
Joined: 20 Jun 12 Posts: 3 Credit: 51,772,719 RAC: 0 |
On Linux with an AMD Vega64 it got stuck at 100% utilization and 80 W power usage (not doing anything) until I disabled the screen saver. After that it worked for 2 days without any problem. For the Vega64 the app is very inefficient: 7 minutes per workunit at 160-170 W, and the maximum stable setting is numBlocks 1024 with threadsPerBlock 64. Running 2 workunits at once, it hangs with the same 100% utilization and 80 W power usage. AMD driver, opencl=rocr. A Linux laptop with an AMD RX580 finishes in 8:00 to 8:10 minutes using 40 W with the 1024-64 settings, and in 9:30 minutes using 38-40 W with 1024-32. Many workunits error out with "SIGSEGV: segmentation violation" using the AMD driver with opencl=legacy,rocr. With the ROCm 4.0.1 driver, both graphics cards have a memory problem: memory goes from 0 to 3200-3500 MB and back every 2-3 seconds, with lag on the desktop and run times measured in hours. |
Joined: 8 Jul 11 Posts: 1341 Credit: 495,411,746 RAC: 564,093 |
On Linux with an AMD Vega64 it got stuck at 100% utilization and 80 W power usage (not doing anything) until I disabled the screen saver. After that it worked for 2 days without any problem. Thanks for the feedback! Without it, it's hard to know anything is wrong, since everything runs smoothly on my machines. The only time I've had issues is when I stream video while the app is running, and I've learned to suspend GPU processing in the client before doing such things. I thought that was typical behavior with GPU processing, but maybe I am wrong? The inefficiency on the Vega64 might be the driver, since all apps use the exact same OpenCL code. The biggest source of problems is the OpenCL compiler, which comes bundled with the driver. Regarding the SIGSEGV errors, I looked at the stderr and it appears these errors all occur before the GPU is acquired. In fact, they occur just before the polynomial buffers are allocated on the CPU. I don't see any memory allocation errors, but maybe the SIGSEGV is thrown before the malloc call returns. Whatever the cause, this is not a GPU or OpenCL problem. It could be a resource problem on the CPU side, since the OpenCL apps require bigger data buffers on the CPU. If you continue to see these errors, try reducing numBlocks to 256 or 512 (keep threadsPerBlock at 64 for best performance). Not sure what to make of the memory usage jumping between 0 and 3.2 GB. Memory usage on the GPU should never get that high with the NumberFields app, unless numBlocks was set way too high. |
Joined: 20 Jun 12 Posts: 3 Credit: 51,772,719 RAC: 0 |
The latest driver version from the AMD site changed the OpenCL implementation from "pal" to "rocr" for cards built after Vega, and this might be the problem. The AMD RX580 is using the same opencl=legacy version used for cards built before Vega64. RadeonOpenCompute (ROCm 4.0.1) is an open source driver that comes with all the new libraries and compilers, and this can create problems: one BOINC project produces errors with this driver but works with the AMD driver. The app settings were standard, and it looked like it tried to fill the GPU memory, going from 0 to 700 MB to 1200 MB, all the way up to 3200-3500 MB, and back to zero every 2-3 seconds. |
Joined: 24 May 19 Posts: 38 Credit: 1,359,440 RAC: 719 |
Yes, I meant the Microsoft Edge browser, which has Settings > System > "Use hardware acceleration when available" enabled. I tried disabling hardware acceleration in Edge and, to be safe, also restarted the system. As you might imagine, when WU processing (and consequently the GPU) freezes, my trick no longer works: starting the browser does not unlock my GPU. I then reset the Edge browser, which apparently re-enables GPU hardware acceleration by default, restarted the system again to be safe, and the trick works again when WU processing and the GPU freeze. There was one WU that froze my system, which then recovered; the WU did not fail but completed processing, and if I remember correctly it was the one that took the longest to process among yesterday's. In case it is useful to you, I noticed from the GPU metrics that when the WU progress percentage freezes and GPU usage jumps to 99%, video memory rises from about 800-900 MB to about 1900-2000 MB. If you then pause WU processing or close the BOINC client, after about a minute the GPU unlocks and video memory returns to its initial value of about 800-900 MB. I also noticed in Task Manager that the app GetDecics_4.02_windows_x86_64__opencl_amd does not appear until the WU progress percentage freezes, that is, about one minute after the start, and then only 4-5% usage appears in the CPU column, while the GPU column shows 0% usage even though the GPU metrics indicate 99% usage. One curiosity: in the file "gpuLookupTable_v402.txt", for AMD GPUs, the "GPU Name" column says RX570, the "numBlocks" column says 2048, and the "threadsPerBlock" column says 64. Why does the app also use these parameters for my GPU, which is an AMD Radeon RX 5700 XT 50th Anniversary and not an RX570? |
Joined: 8 Jul 11 Posts: 1341 Credit: 495,411,746 RAC: 564,093 |
A curiosity, in the file "gpuLookupTable_v402.txt", for AMD GPUs, in the column "GPU Name" it says RX570, in the column "numBlocks" it says 2048 and in the column "threadsPerBlock" it says 64, why does the app take these parameters also for my GPU which is an AMD Radeon RX 5700 XT 50th Anniversary and not an RX570? That would be a flaw in the lookup table code. It checks whether the string in the lookup table is a substring of the BOINC-provided GPU string. In this case "RX570" is a substring of "RX5700". This method should be made more robust so that there is a one-to-one mapping between entries in the lookup table and actual GPUs. The RX5700 should be able to handle the 2048x64 setting, so it's not a big deal. |
Joined: 24 May 19 Posts: 38 Credit: 1,359,440 RAC: 719 |
I'm sorry for my ignorance, but where do I find the GPU string provided by BOINC? However, I think the RX570 is not a substring of the RX5700, because they are 2 GPUs with different architectures. See the following links for more information: AMD RADEON RX570 https://www.amd.com/en/products/graphics/radeon-rx-570 AMD RADEON RX5700 https://www.amd.com/en/products/graphics/amd-radeon-rx-5700-xt-50th-anniversary |
Joined: 8 Jul 11 Posts: 1341 Credit: 495,411,746 RAC: 564,093 |
I'm sorry for my ignorance, but where do I find the GPU string provided by BOINC? This is a "string" from a software perspective, meaning just a series of raw characters. The characters "RX570" are part of "RX5700", which is what we mean when we say "RX570" is a substring of "RX5700". This has nothing to do with the actual graphics cards. The string given by the BOINC API is printed at the top of the stderr. The stderr is given with each completed task. For example, here is one of your tasks: https://numberfields.asu.edu/NumberFields/result.php?resultid=107841147 And your GPU string is: GPU Summary String = [CAL|AMDRadeonRX5700XT50thAnniversary|1|8176MB||200] |
Joined: 24 May 19 Posts: 38 Credit: 1,359,440 RAC: 719 |
I'm sorry for my ignorance, but where do I find the GPU string provided by BOINC? Thank you for your explanation. I noticed that string in the app's WU stderr, but because you told me about BOINC, I was thinking about the client and that the string was inside some of its files. |
Joined: 24 May 19 Posts: 38 Credit: 1,359,440 RAC: 719 |
Update. AMD has released a new optional driver version, 21.2.2 (Win 10 64-bit, 20H2), and IT DOES NOT WORK. OpenCL: AMD/ATI 0 GPU: AMD Radeon RX 5700 XT 50th Anniversary (driver version 3188.4 (PAL, LC), OpenCL 2.0 AMD-APP device version (3188.4), 8176MB, 8176MB available, peak GFLOPS 9370). |
Joined: 24 May 19 Posts: 38 Credit: 1,359,440 RAC: 719 |
Update. AMD has released a new optional driver version, 21.2.3 (Win 10 64-bit, 20H2), and IT DOES NOT WORK. OpenCL: AMD/ATI 0 GPU: AMD Radeon RX 5700 XT 50th Anniversary (driver version 3188.4 (PAL, LC), OpenCL 2.0 AMD-APP device version (3188.4), 8176MB, 8176MB available, peak GFLOPS 9370). |
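
For anyone following the lookup table discussion earlier in this thread, here is a minimal Python sketch of why "RX570" matches an RX 5700 XT and one possible way to tighten the check. It is not the app's actual code, which this thread does not show. The function names, the one-row table, and the "RadeonRX570Series" test name are hypothetical. Treating the second `|`-separated field of the GPU Summary String as the GPU name is an assumption based on the single example posted above. The digit lookahead is just one option and does not by itself give the one-to-one mapping mentioned above.

```python
import re

# Hypothetical one-row excerpt of gpuLookupTable_v402.txt:
# (GPU name, numBlocks, threadsPerBlock). Only this row is quoted in the thread.
LOOKUP_TABLE = [("RX570", 2048, 64)]


def gpu_name_from_summary(summary):
    """Return the second '|'-separated field of a GPU Summary String.

    Assumption: that field is the GPU name, as in the example
    [CAL|AMDRadeonRX5700XT50thAnniversary|1|8176MB||200].
    """
    fields = summary.strip().strip("[]").split("|")
    return fields[1]


def naive_lookup(gpu_name, table):
    """Plain substring test, as described in the thread."""
    for name, num_blocks, threads_per_block in table:
        if name in gpu_name:
            return (num_blocks, threads_per_block)
    return None


def strict_lookup(gpu_name, table):
    """Substring test that rejects a match running on into more digits."""
    for name, num_blocks, threads_per_block in table:
        # (?!\d) fails when the next character is a digit,
        # so "RX570" no longer matches inside "RX5700".
        if re.search(re.escape(name) + r"(?!\d)", gpu_name):
            return (num_blocks, threads_per_block)
    return None


summary = "[CAL|AMDRadeonRX5700XT50thAnniversary|1|8176MB||200]"
name = gpu_name_from_summary(summary)
print(name)
print("naive: ", naive_lookup(name, LOOKUP_TABLE))
print("strict:", strict_lookup(name, LOOKUP_TABLE))
# Hypothetical RX570 name, used only to show the strict check still matches it.
print("strict:", strict_lookup("RadeonRX570Series", LOOKUP_TABLE))
```

With the strict check the RX 5700 XT finds no table entry, so a real implementation would also need a sensible default for GPUs that are not listed.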