GPU status update

Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2424 - Posted: 15 May 2019, 22:05:56 UTC

It's been over a month since our last update, but I now have some good news. I have made some improvements to the GPU code and am ready to start deploying the new GPU apps.

I will start with the AMD OpenCL version for Linux. This will be a beta version. I have had a hell of a time with the AMD implementation of OpenCL. The app still doesn't work on my Fedora system, and I strongly believe that is due to the graphics driver. But I have had the help of a volunteer named Wiktor, and it runs fine for him (I believe he runs Ubuntu). Please keep in mind that AMD officially supports only RHEL and Ubuntu, so I will be interested to hear whether this app works for anyone with an "unsupported" Linux distro like mine.

I also have OpenCL Windows apps that were cross-compiled using MinGW. I have no means of testing these, so I am not ready to deploy them just yet. But if anyone would like to take them for a spin offline, please let me know and I can send them to you.
Diffident

Joined: 30 Apr 18
Posts: 2
Credit: 1,376,706
RAC: 0
Message 2425 - Posted: 16 May 2019, 0:09:36 UTC - in response to Message 2424.  

How about writing the AMD GPU app so it works with the ROCm OpenCL driver? The ROCm driver works great for Einstein@Home.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2426 - Posted: 16 May 2019, 2:58:52 UTC - in response to Message 2425.  

How about writing the AMD GPU app so it works with the ROCm OpenCL driver? The ROCm driver works great for Einstein@Home.


As far as I know, it has nothing to do with the app. The OpenCL code works perfectly on Nvidia and with AMD on Ubuntu. I think what you are suggesting is that I try the ROCm driver on my Fedora system. I did try that early on with no success, but perhaps I should try again now that I have more experience with video drivers.
Profile Aurel
Joined: 25 Feb 13
Posts: 211
Credit: 8,420,526
RAC: 8,322
Message 2429 - Posted: 16 May 2019, 8:16:30 UTC - in response to Message 2424.  

Feel free to send me the OpenCL Windows app via mail. I would like to give it a try. :)
Diffident

Joined: 30 Apr 18
Posts: 2
Credit: 1,376,706
RAC: 0
Message 2430 - Posted: 16 May 2019, 9:21:02 UTC - in response to Message 2426.  
Last modified: 16 May 2019, 9:21:24 UTC

How about writing the AMD GPU app so it works with the ROCm OpenCL driver? The ROCm driver works great for Einstein@Home.


As far as I know, it has nothing to do with the app. The OpenCL code works perfectly on Nvidia and with AMD on Ubuntu. I think what you are suggesting is that I try the ROCm driver on my Fedora system. I did try that early on with no success, but perhaps I should try again now that I have more experience with video drivers.


There must be something different. When using the ROCm driver I can run Einstein@Home, but MilkyWay@home stops instantly with a computation error. I think the ROCm driver should be preferred, since AMD is moving everything to open source instead of using the OpenCL bits from the closed Pro driver.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2434 - Posted: 16 May 2019, 16:38:26 UTC - in response to Message 2430.  

There must be something different. When using the ROCm driver I can run Einstein@Home, but MilkyWay@home stops instantly with a computation error. I think the ROCm driver should be preferred, since AMD is moving everything to open source instead of using the OpenCL bits from the closed Pro driver.


I agree. From what I've read, ROCm is the way to go. When I get a chance I will look into that again.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2435 - Posted: 16 May 2019, 18:40:40 UTC

Thanks to the successful testing by Speedy51, I will be able to deploy the Windows Nvidia OpenCL app. I should get to that in the next couple of hours.

In the meantime, has anyone with an AMD card on Linux tried to test that version? I deployed it ~12 hours ago and no tasks have been sent out yet. My own system can't seem to download tasks for it either, so I think something might be wrong with how I set up the plan class.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2436 - Posted: 16 May 2019, 20:45:09 UTC

I just deployed the Windows Nvidia version as a beta app. Please test and report any suspicious behavior.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2437 - Posted: 16 May 2019, 20:54:12 UTC - in response to Message 2429.  

Feel free to send me the OpenCL Windows app via mail. I would like to give it a try. :)


I just sent you the AMD version, since I now have confidence in the Nvidia version. Thanks!
Profile Aurel
Joined: 25 Feb 13
Posts: 211
Credit: 8,420,526
RAC: 8,322
Message 2439 - Posted: 17 May 2019, 8:32:02 UTC - in response to Message 2436.  

Runtime for an sf5 task: 1 hour and 3 minutes, which is ~3 times faster than a CPU task.
Runtime for an sf6 DS7x10 task: ~7 minutes, which I can't compare to a CPU task at this time.

No errors while running the task.
It seems to work fine, at least for me. :)
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2440 - Posted: 17 May 2019, 14:23:08 UTC

The Nvidia Windows version seems to be doing pretty well. Many successful results from multiple users. Only 2 compute errors.

I am going on a road trip (vacation) for a week. I will have cell phone coverage, but will be unable to do any major project maintenance. I will look into the above compute errors when I return.
Profile Aurel
Joined: 25 Feb 13
Posts: 211
Credit: 8,420,526
RAC: 8,322
Message 2443 - Posted: 17 May 2019, 16:31:37 UTC - in response to Message 2440.  

Have a nice vacation. I will try to complete as many GPU tasks as I can. ;)
Chooka

Joined: 3 May 18
Posts: 4
Credit: 5,313,872
RAC: 0
Message 2446 - Posted: 19 May 2019, 8:18:57 UTC - in response to Message 2435.  

Thanks to the successful testing by Speedy51, I will be able to deploy the Windows Nvidia OpenCL app. I should get to that in the next couple of hours.

In the meantime, has anyone with an AMD card on Linux tried to test that version? I deployed it ~12 hours ago and no tasks have been sent out yet. My own system can't seem to download tasks for it either, so I think something might be wrong with how I set up the plan class.


I have AMD cards but use Windows, so I can't help, sorry.

Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2451 - Posted: 27 May 2019, 1:14:05 UTC - in response to Message 2440.  

The Nvidia Windows version seems to be doing pretty well. Many successful results from multiple users. Only 2 compute errors.

I am going on a road trip (vacation) for a week. I will have cell phone coverage, but will be unable to do any major project maintenance. I will look into the above compute errors when I return.


I found the bug that was causing the compute errors. It's minor and affects less than 1% of the WUs. I will get a fix out there later this evening.
Profile Henk Haneveld

Joined: 12 Oct 17
Posts: 2
Credit: 188,354
RAC: 10
Message 2452 - Posted: 27 May 2019, 15:12:34 UTC

The estimated runtime for results with the new version is very, very wrong.

They show a runtime of 22 seconds on my host but take about an hour to finish.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2453 - Posted: 27 May 2019, 15:38:47 UTC - in response to Message 2452.  

The estimated runtime for results with the new version is very, very wrong.

They show a runtime of 22 seconds on my host but take about an hour to finish.


That's because CreditNew restarts the stats calculations with each new app version. I'm not sure exactly how to change its initial value. I too saw this last night, but by this morning it was estimating 25 minutes per task, which is accurate for my GPU.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2454 - Posted: 27 May 2019, 17:20:54 UTC - in response to Message 2453.  

The estimated runtime for results with the new version is very, very wrong.

They show a runtime of 22 seconds on my host but take about an hour to finish.


That's because CreditNew restarts the stats calculations with each new app version. I'm not sure exactly how to change its initial value. I too saw this last night, but by this morning it was estimating 25 minutes per task, which is accurate for my GPU.


So I believe CreditNew uses rsc_fpops_est as its starting point. This was at least 10x too low. I have now fixed this, so going forward the initial flops estimates should be better.
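
As a rough illustration of why a too-small rsc_fpops_est gives a too-short estimate: in a simplified model, the estimated runtime is the workunit's estimated floating-point operation count divided by the speed assumed for the app version. The sketch below assumes only that simplified model. The function names and every number in it are hypothetical, and the real BOINC scheduler applies further corrections as results come in.

```python
def estimated_runtime_seconds(rsc_fpops_est: float, projected_flops: float) -> float:
    """Simplified model (an assumption, not BOINC's full logic):
    estimated runtime = estimated operations / assumed speed."""
    if projected_flops <= 0:
        raise ValueError("projected_flops must be positive")
    return rsc_fpops_est / projected_flops


def underestimate_factor(actual_seconds: float, estimated_seconds: float) -> float:
    """How many times longer the task really ran than the estimate said."""
    if estimated_seconds <= 0:
        raise ValueError("estimated_seconds must be positive")
    return actual_seconds / estimated_seconds


# Hypothetical values, chosen only to show the arithmetic.
assumed_speed = 1e12                      # flops assumed for the GPU app version
low_est = estimated_runtime_seconds(2.2e13, assumed_speed)
better_est = estimated_runtime_seconds(3.6e15, assumed_speed)

# Ratio between "about an hour" of real runtime and a 22 second estimate.
factor = underestimate_factor(3600, 22)
```

Under this model the estimate scales linearly with rsc_fpops_est, so raising that value by some factor raises the initial runtime estimate by the same factor until measured statistics take over.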
Matt Kowal
Joined: 12 Oct 13
Posts: 14
Credit: 33,331,442
RAC: 25,626
Message 2458 - Posted: 6 Jun 2019, 21:08:25 UTC

The beta apps are functioning as intended, yes?

Will we see a Windows app for opencl_amd?
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2459 - Posted: 7 Jun 2019, 1:55:05 UTC - in response to Message 2458.  

The beta apps are functioning as intended, yes?

Will we see a Windows app for opencl_amd?


Yes, thanks for reminding me. The Nvidia apps have been working well, so I just promoted them to normal app status. There is still some room for optimization, but the apps are stable, so I think this is a good idea.

A couple of people are helping with the AMD OpenCL versions. It is exactly the same OpenCL code that works perfectly on Nvidia, but AMD cards are very finicky. I believe it comes down to inconsistent drivers. The AMD OpenCL app on Linux has had about half a dozen successful results, which is a good sign.
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 913
Credit: 95,286,549
RAC: 45,994
Message 2463 - Posted: 14 Jun 2019, 16:20:59 UTC - in response to Message 2459.  

I removed the beta designation on the AMD OpenCL Linux version. Although there were relatively few results, those who tested the beta version did so successfully. I will keep an eye on results; if problems arise I will revert to beta status.


Copyright © 2019 Arizona State University