GPU status update

hsdecalc

Joined: 22 Feb 19
Posts: 3
Credit: 5,366,654
RAC: 9
Message 2466 - Posted: 18 Jun 2019, 9:19:36 UTC

Is this OK or is something wrong?

On my PC, the Nvidia GPU tasks each need a full CPU core in addition (Get Decic Fields v3.03 (opencl_nvidia) windows_x86_64).
The GTX 1080 Ti card shows a power usage of only 25% and a GPU utilization of 50%.

So I run two tasks. The GPU is underutilized and the CPU load is too high.
https://numberfields.asu.edu/NumberFields/result.php?resultid=50063132
ID: 2466 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,167,154
RAC: 241,889
Message 2467 - Posted: 18 Jun 2019, 17:10:17 UTC - in response to Message 2466.  

Ok or wrong:

On my pc the Nvidia GPU tasks additionally need a full CPU core (Get Decic Fields v3.03 (opencl_nvidia) windows_x86_64).
The GTX 1080 Ti card has a power usage of only 25% and the gpu utilization is 50%.

So I run two tasks. The gpu is underutilized and the cpu load is too high.
https://numberfields.asu.edu/NumberFields/result.php?resultid=50063132


GPU utilization will be fixed with optimization, which depends on the individual card. My utilization is 97%, but that's because I optimized for my card, and those settings are now the default. When I find the time, I will add an app_info.xml interface so that volunteers can easily tweak settings and report back the optimized values for their cards.

I'm not sure why the CPU load is high. I've noticed that on the CUDA version too. As a test, I removed the call to the GPU and the app finished within seconds, which means the vast majority of the time should be spent on the GPU. I'm guessing the CPU is "waiting" on the GPU to finish. The CPU buffers polynomials and then throws them at the GPU, and I currently do this in blocks. I could increase the buffer to include every polynomial so there is only one call to the GPU. Then the CPU would not have to constantly wait on the GPU, but this would require close to 100 GB of memory (worst case), which is not practical. Maybe there is a better way; I just haven't found it yet.
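
For readers who want to picture the block scheme, here is a minimal sketch in Python. It is not the project's code: the names `blocks`, `run_in_blocks`, and `fake_gpu_call` are hypothetical, and the "GPU" is a plain function standing in for the real kernel launch. It only illustrates that a smaller buffer means more calls, and that the host sits in each call until it returns.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Polynomial = Tuple[int, ...]  # hypothetical representation: a tuple of coefficients


def blocks(polys: Iterable[Polynomial], block_size: int) -> Iterator[List[Polynomial]]:
    """Yield successive lists of at most block_size polynomials."""
    it = iter(polys)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def run_in_blocks(
    polys: Iterable[Polynomial],
    block_size: int,
    gpu_call: Callable[[List[Polynomial]], List[int]],
) -> Tuple[List[int], int]:
    """Send the polynomials to gpu_call one block at a time.

    Returns the collected results and the number of calls made.
    """
    results: List[int] = []
    calls = 0
    for block in blocks(polys, block_size):
        # The host waits here until the (stand-in) GPU call returns.
        results.extend(gpu_call(block))
        calls += 1
    return results, calls


def fake_gpu_call(block: List[Polynomial]) -> List[int]:
    """Stand-in for the real kernel: just sums each polynomial's coefficients."""
    return [sum(p) for p in block]


if __name__ == "__main__":
    polys = ((i, i + 1, i + 2) for i in range(10))
    results, calls = run_in_blocks(polys, block_size=4, gpu_call=fake_gpu_call)
    print(calls, results)
```

With 10 polynomials and a block size of 4 this makes three calls; a block size of 10 or more would make one, at the cost of holding everything in memory at once.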
ID: 2467 · Rating: 0
sterling

Joined: 14 Jun 19
Posts: 1
Credit: 1,234,896
RAC: 0
Message 2473 - Posted: 20 Jun 2019, 4:41:08 UTC

Error: Failed to obtain OpenCL device id.
Error: Failed to initialize OpenCL.


Any chance I could look at your initialization code?
ID: 2473 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,167,154
RAC: 241,889
Message 2474 - Posted: 20 Jun 2019, 15:26:55 UTC - in response to Message 2473.  

Error: Failed to obtain OpenCL device id.
Error: Failed to initialize OpenCL.

Any chance I could look at your code that initializes?


Please do! I will send it to you.

Just so you know, when it can't obtain a device ID, that's usually a sign of a configuration problem (not a code problem). I've seen it on my own system when I updated the graphics driver and the client then lost its connection to the GPU. To fix it I had to reboot.

Feel free to also look at the OpenCL code. Some AMD cards have a problem with it and I still don't know why (my best guess is buggy drivers, but it could be something in the code).
ID: 2474 · Rating: 0
Aurel

Joined: 25 Feb 13
Posts: 216
Credit: 9,899,302
RAC: 0
Message 2475 - Posted: 20 Jun 2019, 20:46:17 UTC - in response to Message 2467.  
Last modified: 20 Jun 2019, 20:47:16 UTC

Could you increase the buffer to the following:

Read the user- or computer-specific maximum memory usage from their computing preferences, divide it by the maximum number of running tasks (number of threads plus graphics devices, if they're supported and enabled), and set this value as the largest allowed buffer?
These are just my two cents.
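
The arithmetic behind this suggestion can be sketched as follows. This is only an illustration: `max_buffer_bytes` and its parameters are hypothetical names, and reading the actual values from the BOINC computing preferences is not shown.

```python
def max_buffer_bytes(max_memory_bytes: int, cpu_threads: int, gpu_devices: int = 0) -> int:
    """Split the allowed memory evenly across all tasks that may run at once.

    max_memory_bytes: memory the user allows BOINC to use (assumed already known).
    cpu_threads:      number of CPU threads that may run tasks.
    gpu_devices:      number of enabled GPUs (0 if none are used).
    """
    running_tasks = cpu_threads + gpu_devices
    if running_tasks <= 0:
        raise ValueError("at least one running task is required")
    return max_memory_bytes // running_tasks


if __name__ == "__main__":
    # Example values only: 16 GiB allowed, 8 threads, 1 GPU.
    print(max_buffer_bytes(16 * 2**30, cpu_threads=8, gpu_devices=1))
```

An even split is the simplest choice; it ignores that CPU tasks may need far less memory than a GPU task with a large buffer.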
ID: 2475 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,167,154
RAC: 241,889
Message 2481 - Posted: 30 Jun 2019, 4:12:05 UTC - in response to Message 2478.  

Thanks for the info. I will use this when I optimize by card.

Just added an app_config.xml to two of my machines so I can run 2 concurrent WUs on my GPUs, which increases their load from 79% to 97%.
1st machine is an i5 (4 cores) with a GTX 1060 GPU.
<app_config>
  <app>
    <name>GetDecics</name>
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

2nd machine is an i7 (8 multithreaded cores) with a GTX 1070 GPU.
<app_config>
  <app>
    <name>GetDecics</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

So 2 cores are allocated to the GPUs and the other cores are running CPU WUs.
This is giving me a slight increase in throughput.
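
The core split described above can be checked with a small calculation. This is a simplified model, not BOINC's actual scheduler: `split_cores` is a hypothetical helper that assumes one GPU per machine by default and ignores the `max_concurrent` cap, which both configs above happen to satisfy exactly.

```python
from typing import Tuple


def split_cores(total_cores: int, gpu_usage: float, cpu_usage: float, gpus: int = 1) -> Tuple[int, float, int]:
    """Return (gpu_tasks, cores_reserved_for_gpu_tasks, cores_left_for_cpu_tasks).

    gpu_usage is the fraction of a GPU each task claims, so 0.5 means
    two tasks share one GPU. cpu_usage is the CPU share each GPU task reserves.
    """
    # Small epsilon guards against floating-point results such as 2.9999999.
    tasks_per_gpu = int(1 / gpu_usage + 1e-9)
    gpu_tasks = tasks_per_gpu * gpus
    reserved = gpu_tasks * cpu_usage
    cpu_left = max(0, int(total_cores - reserved))
    return gpu_tasks, reserved, cpu_left


if __name__ == "__main__":
    print(split_cores(4, gpu_usage=0.5, cpu_usage=1))  # first machine
    print(split_cores(8, gpu_usage=0.5, cpu_usage=1))  # second machine
```

Under these assumptions the first machine runs 2 GPU tasks with 2 cores left for CPU tasks, and the second runs 2 GPU tasks with 6 left, which matches the description above.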
ID: 2481 · Rating: 0

Copyright © 2024 Arizona State University