GPU app status update

Message boards : News : GPU app status update
Jesse Viviano

Send message
Joined: 20 Dec 14
Posts: 17
Credit: 12,153,123
RAC: 0
Message 2381 - Posted: 7 Apr 2019, 19:16:10 UTC

You might be more productive by developing a multithreaded CPU app. One advantage of a multithreaded app is that if it takes every CPU hardware thread, it can monopolize the CPU's cache instead of competing against other BOINC projects, giving the program a higher cache hit rate. A multicore CPU can work on far fewer threads at once than a GPU, but the CPU threads within a program can operate mostly independently if programmed right, when that is needed. GPUs are great at workloads where large numbers of threads perform the same operation in lockstep, but they are bad at situations where a significant number of threads need to do their own thing independently of the others.
ID: 2381
Richard Haselgrove

Send message
Joined: 28 Oct 11
Posts: 179
Credit: 220,515,322
RAC: 129,066
Message 2382 - Posted: 7 Apr 2019, 19:17:55 UTC - in response to Message 2380.  
Last modified: 7 Apr 2019, 20:03:34 UTC

Hmmm. The documentation says that credit from runtime shouldn't be used for GPUs, because they aren't benchmarked. So, what speed is the validator assuming in calculating those obscene credits? My guess would be that it's using the raw advertising claim ('GFLOPS peak'), which - coupled with the extended runtimes caused by the current optimisation state of the Beta app - would be a double whammy.

I'm in regular contact with the BOINC developers, but I'd like to check the figures behind that assumption, before raising the subject with them. Eric, could you let me know - by PM might be best - one or two example HostIDs exhibiting this problem, so I can run the numbers?

Edit - never mind, I found some. They do stick out rather, don't they?

Yes, that checks out. Device peak FLOPS (overstated by marketing) x raw elapsed time (extended by the inefficient app) x the same fiddle factor my CPUs were getting = awarded credit.
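
For anyone who wants to reproduce that arithmetic, here's a minimal sketch with made-up figures - the only real constant is BOINC's nominal Cobblestone definition of roughly 200 credits per day for a sustained 1 GFLOPS; the 6000 GFLOPS card and the two-hour runtime are purely illustrative, not taken from any actual host:

```python
# Back-of-envelope check of the inflated GPU credit (illustrative numbers only).
SECONDS_PER_DAY = 86400.0
CREDITS_PER_GFLOPS_DAY = 200.0  # BOINC's nominal Cobblestone definition

def naive_credit(peak_gflops, elapsed_seconds, fiddle_factor=1.0):
    """Credit awarded if the validator simply trusts 'peak FLOPS x elapsed time'."""
    days = elapsed_seconds / SECONDS_PER_DAY
    return peak_gflops * CREDITS_PER_GFLOPS_DAY * days * fiddle_factor

# A card advertised at 6000 GFLOPS peak, running a task stretched to 2 hours
# by the unoptimised beta app:
print(naive_credit(6000, 2 * 3600))  # => 100000.0 credits for a single task
```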

But in the last hour and a half, the credit awarded to my own CPUs has gone up nearly tenfold. That can't be sustainable, either.
ID: 2382
jozef J

Send message
Joined: 16 Apr 14
Posts: 7
Credit: 6,694,951
RAC: 0
Message 2383 - Posted: 7 Apr 2019, 20:40:57 UTC

Credit "backwards" steps what you do now...wow... are most ****** thing i see in history of boinc projects ... you really dont need volunters and you acting like that..
I realy dont need another collatz" here..xx millions per day... but 100-200 per task from this project now is - laugh to volunter eyes..
ID: 2383
Richard Haselgrove

Send message
Joined: 28 Oct 11
Posts: 179
Credit: 220,515,322
RAC: 129,066
Message 2384 - Posted: 7 Apr 2019, 20:52:15 UTC - in response to Message 2383.  

Credit "backwards" steps what you do now...wow... are most ****** thing i see in history of boinc projects ... you really dont need volunters and you acting like that..
I realy dont need another collatz" here..xx millions per day... but 100-200 per task from this project now is - laugh to volunter eyes..
You are running the Windows application, on CPUs only (hosts 1465177 and 1488499). The credits are normal by the BOINC definition of work done - even a little high.

This thread is about the GPU experiment, and the problems it's causing for everyone in establishing a fair reward for both types of application. Please have patience - we're trying to sort something out.
ID: 2384
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2385 - Posted: 7 Apr 2019, 21:38:22 UTC

So CreditNew has curbed the GPU abuse, but CPU credits are higher than expected. It looks like the credits on my hosts are starting to come down, so I hope this means the statistics calculations are starting to settle.

Richard - are there any more configuration items I need to modify for CreditNew to work better? For example, how sensitive is it to rsc_flops_est in the WU template?
ID: 2385
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2386 - Posted: 7 Apr 2019, 21:46:13 UTC - in response to Message 2379.  

It might be worth reporting back your findings to the PARI people, and suggesting that they compare notes with FFTW to see if similar techniques could be employed.


Somebody else suggested this too. I do plan on contacting them. The issue might be that they picked an algorithm that worked well in general for any degree; mine is specialized to degree 10.
ID: 2386
Richard Haselgrove

Send message
Joined: 28 Oct 11
Posts: 179
Credit: 220,515,322
RAC: 129,066
Message 2387 - Posted: 7 Apr 2019, 22:15:29 UTC - in response to Message 2385.  

Richard - are there any more configuration items I need to modify for CreditNew to work better? For example, how sensitive is it to rsc_flops_est in the WU template?
To be honest, I don't think so, but I don't know for certain. My personal opinion is that <rsc_flops_est> must be buried in there somewhere, but the people I've spoken to who claim to understand the control-feedback algorithm involved (apparently it's a well-known tool in control engineering) don't agree with me - they use terms like 'Kalman filter' or 'PI controller', which I don't understand. Setting <rsc_flops_est> per batch would be a nightmare for you to implement here - don't do anything until I can speak to others tomorrow.
ID: 2387
Profile Vitaly

Send message
Joined: 5 Jan 13
Posts: 43
Credit: 37,997,595
RAC: 38,151
Message 2402 - Posted: 15 Apr 2019, 12:32:51 UTC - in response to Message 2387.  

Yes, 2 million tasks in ten days. This is cool.

God has blessed this project :)
ID: 2402
Nick

Send message
Joined: 23 Oct 18
Posts: 5
Credit: 7,397,389
RAC: 0
Message 2403 - Posted: 16 Apr 2019, 20:25:42 UTC - in response to Message 2386.  

It might be worth reporting back your findings to the PARI people, and suggesting that they compare notes with FFTW to see if similar techniques could be employed.


Somebody else suggested this too. I do plan on contacting them. The issue might be that they picked an algorithm that worked well in general for any degree; mine is specialized to degree 10.


How did you specialize it?

Do you think the technique could be generalized so that more efficient specialized algorithms could be produced for any fixed degree?
ID: 2403
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2404 - Posted: 16 Apr 2019, 21:00:11 UTC - in response to Message 2403.  

How did you specialize it?

Do you think the technique could be generalized so that more efficient specialized algorithms could be produced for any fixed degree?


The first step of the algorithm is to remove the "content" from the polynomial and its derivative, i.e. the gcd of all the coefficients. Our polys are monic, so the content is 1 and no gcd calculation is necessary. For the derivative, the leading coefficient is 10 (since the derivative of x^10 is 10*x^9), so the content can only be one of {1, 2, 5, 10}, and we just need to determine whether 2 and 5 divide every coefficient. This turns out to be much simpler than computing a general gcd of all the coefficients.
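
As a rough illustration of that shortcut (a hypothetical Python sketch, not the project's actual code), the general content calculation versus the degree-10 special case looks something like this:

```python
# Hypothetical sketch of the content shortcut described above; not the project code.
from functools import reduce
from math import gcd

def general_content(coeffs):
    """General case: the content is the gcd of all coefficients."""
    return reduce(gcd, coeffs)

def derivative_content_degree10(deriv_coeffs):
    """Degree-10 special case: for a monic degree-10 polynomial, the derivative's
    leading coefficient is 10, so its content divides 10 and must be 1, 2, 5 or 10.
    Two divisibility checks replace the full gcd chain."""
    content = 1
    if all(c % 2 == 0 for c in deriv_coeffs):
        content *= 2
    if all(c % 5 == 0 for c in deriv_coeffs):
        content *= 5
    return content
```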

Since we are only interested in the absolute discriminant, I don't need to keep track of the sign, which saves on operations. Also, the last part of the test divides out all factors of 2 and 5 and then checks that what remains is a square. This means we don't care about square factors, which allows us to further reduce the number of operations. The final result is not the true discriminant, so this kind of improvement can't be used in a general discriminant algorithm.
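
Again purely as an illustration (not the project code), that last test could be sketched as:

```python
# Illustrative sketch of the final test described above: ignore the sign,
# strip all factors of 2 and 5, then check that what remains is a perfect square.
from math import isqrt

def passes_final_test(value):
    value = abs(value)
    if value == 0:
        return False  # degenerate case, not expected for the polys of interest
    for p in (2, 5):
        while value % p == 0:
            value //= p
    root = isqrt(value)
    return root * root == value
```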

To answer your question: yes, specialized algorithms could be used for each degree. You could use a switch statement to call the appropriate algorithm. But the general algorithm does have its advantages; for example, the code is much more compact and easier to maintain. And if you are only computing a small number of discriminants, you wouldn't notice the speed difference. It only became an issue here because we are ultimately computing many trillions of them.
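
A rough sketch of that dispatch idea, as hypothetical Python stubs rather than real code, just to show the shape:

```python
# Hypothetical dispatch by degree; a real implementation could be a C switch statement.
def disc_test_degree10(coeffs):
    ...  # specialised routine along the lines described above

def disc_test_general(coeffs):
    ...  # general-purpose algorithm (e.g. what PARI provides)

SPECIALISED = {10: disc_test_degree10}

def disc_test(coeffs):
    """Use a specialised routine when one exists, otherwise fall back to the general one."""
    degree = len(coeffs) - 1
    return SPECIALISED.get(degree, disc_test_general)(coeffs)
```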

By the way, I did talk with the folks at PARI. They confirmed their algorithm is more general and designed to be more efficient for higher degrees and coefficients that are much larger.
ID: 2404
Anne

Send message
Joined: 12 Jan 17
Posts: 1
Credit: 3,787,210
RAC: 0
Message 2407 - Posted: 22 Apr 2019, 5:38:46 UTC

Hi
Do you have any plans to update to version 3.00 for computers running Mac?
ID: 2407
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2408 - Posted: 22 Apr 2019, 6:39:39 UTC - in response to Message 2407.  

Hi
Do you have any plans to update to version 3.00 for computers running Mac?


I would like to eventually. I used to have a "hackintosh" VM, but it stopped working, so I first need to get access to a Mac (possibly another VM) before I can port the code again.
ID: 2408
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2409 - Posted: 23 Apr 2019, 4:52:38 UTC - in response to Message 2367.  

I just deployed the new CPU apps, version 3.00. Feel free to abort any WUs associated with the older versions (2.xx).

My 32-bit Linux machine is still using version 2.12.
Are there plans to release version 3.00 apps for this platform (and 32-bit Windows)?


I felt sorry for those poor old 32-bit machines slowly plodding along... so I went ahead and rebuilt the code for them. Version 3.00 is now available for linux-i686 and win32.

These are cross-compiles, so I cannot test them extensively; please keep an eye on them.
ID: 2409