GPU app status update

Message boards : News : GPU app status update
Jesse Viviano

Send message
Joined: 20 Dec 14
Posts: 17
Credit: 12,153,123
RAC: 0
Message 2381 - Posted: 7 Apr 2019, 19:16:10 UTC

You might be more productive by developing a multithreaded CPU app. One advantage of a multithreaded app is that if it takes every CPU hardware thread, it can monopolize the CPU's cache instead of competing against other BOINC projects, giving the program a higher cache hit rate. A multicore CPU can work on far fewer threads at once than a GPU, but the CPU threads within a program can operate mostly independently if programmed right, when that is needed. GPUs are great at workloads where large numbers of threads perform the same operation in lockstep, but they are bad at situations where a significant number of threads need to do their own thing independently of the others.
ID: 2381
Richard Haselgrove

Send message
Joined: 28 Oct 11
Posts: 179
Credit: 220,515,322
RAC: 129,066
Message 2382 - Posted: 7 Apr 2019, 19:17:55 UTC - in response to Message 2380.  
Last modified: 7 Apr 2019, 20:03:34 UTC

Hmmm. The documentation says that credit from runtime shouldn't be used for GPUs, because they aren't benchmarked. So, what speed is the validator assuming in calculating those obscene credits? My guess would be that it's using the raw advertising claim ('GFLOPS peak'), which - coupled with the extended runtimes caused by the current optimisation state of the Beta app - would be a double whammy.

I'm in regular contact with the BOINC developers, but I'd like to check the figures behind that assumption, before raising the subject with them. Eric, could you let me know - by PM might be best - one or two example HostIDs exhibiting this problem, so I can run the numbers?

Edit - never mind, I found some. They do stick out rather, don't they?

Yes, that checks out. Device peak FLOPS (overstated by marketing) x raw elapsed time (extended by the inefficient app) x the same fiddle factor my CPUs were getting = awarded credit.
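
For anyone who wants to reproduce that arithmetic, here's a minimal sketch with made-up figures - the only real constant is BOINC's nominal Cobblestone definition of roughly 200 credits per day for a sustained 1 GFLOPS; the 6000 GFLOPS card and the two-hour runtime are purely illustrative, not taken from any actual host:

```python
# Back-of-envelope check of the inflated GPU credit (illustrative numbers only).
SECONDS_PER_DAY = 86400.0
CREDITS_PER_GFLOPS_DAY = 200.0  # BOINC's nominal Cobblestone definition

def naive_credit(peak_gflops, elapsed_seconds, fiddle_factor=1.0):
    """Credit awarded if the validator simply trusts 'peak FLOPS x elapsed time'."""
    days = elapsed_seconds / SECONDS_PER_DAY
    return peak_gflops * CREDITS_PER_GFLOPS_DAY * days * fiddle_factor

# A card advertised at 6000 GFLOPS peak, running a task stretched to 2 hours
# by the unoptimised beta app:
print(naive_credit(6000, 2 * 3600))  # => 100000.0 credits for a single task
```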

But in the last hour and a half, the credit awarded to my own CPUs has gone up nearly tenfold. That can't be sustainable, either.
ID: 2382
jozef J

Send message
Joined: 16 Apr 14
Posts: 7
Credit: 6,694,951
RAC: 0
Message 2383 - Posted: 7 Apr 2019, 20:40:57 UTC

Credit "backwards" steps what you do now...wow... are most ****** thing i see in history of boinc projects ... you really dont need volunters and you acting like that..
I realy dont need another collatz" here..xx millions per day... but 100-200 per task from this project now is - laugh to volunter eyes..
ID: 2383
Richard Haselgrove

Send message
Joined: 28 Oct 11
Posts: 179
Credit: 220,515,322
RAC: 129,066
Message 2384 - Posted: 7 Apr 2019, 20:52:15 UTC - in response to Message 2383.  

Credit "backwards" steps what you do now...wow... are most ****** thing i see in history of boinc projects ... you really dont need volunters and you acting like that..
I realy dont need another collatz" here..xx millions per day... but 100-200 per task from this project now is - laugh to volunter eyes..
You are running the Windows application, on CPUs only (hosts 1465177 and 1488499). The credits are normal by the BOINC definition of work done - even a little high.

This thread is about the GPU experiment, and the problems it's causing for everyone in establishing a fair reward for both types of application. Please have patience - we're trying to sort something out.
ID: 2384
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2385 - Posted: 7 Apr 2019, 21:38:22 UTC

So CreditNew has curbed the GPU abuse, but CPU credits are higher than expected. It looks like the credits on my hosts are starting to come down, so I hope this means the statistics calculations are starting to settle.

Richard - are there any more configuration items I need to modify for CreditNew to work better? For example, how sensitive is it to rsc_flops_est in the WU template?
ID: 2385
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2386 - Posted: 7 Apr 2019, 21:46:13 UTC - in response to Message 2379.  

It might be worth reporting back your findings to the PARI people, and suggesting that they compare notes with FFTW to see if similar techniques could be employed.


Somebody else suggested this too. I do plan on contacting them. The issue might be that they picked an algorithm that worked well in general for any degree; mine is specialized to degree 10.
ID: 2386
Richard Haselgrove

Send message
Joined: 28 Oct 11
Posts: 179
Credit: 220,515,322
RAC: 129,066
Message 2387 - Posted: 7 Apr 2019, 22:15:29 UTC - in response to Message 2385.  

Richard - are there any more configuration items I need to modify for CreditNew to work better? For example, how sensitive is it to rsc_flops_est in the WU template?
To be honest, I don't think so, but I don't know for certain. My personal opinion is that <rsc_flops_est> must be buried in there somewhere, but the people I've spoken to who claim to understand the control-feedback algorithm involved (apparently it's a well-known tool in control engineering) don't agree with me - they use terms like 'Kalman filter' or 'PI controller', which I don't understand. Setting <rsc_flops_est> per batch would be a nightmare for you to implement here - don't do anything until I can speak to others tomorrow.
ID: 2387
Profile Vitaly

Send message
Joined: 5 Jan 13
Posts: 43
Credit: 37,997,595
RAC: 38,151
Message 2402 - Posted: 15 Apr 2019, 12:32:51 UTC - in response to Message 2387.  

Yes, 2 million tasks in ten days. This is cool.

God has blessed this project :)
ID: 2402
Nick

Send message
Joined: 23 Oct 18
Posts: 5
Credit: 7,397,389
RAC: 0
Message 2403 - Posted: 16 Apr 2019, 20:25:42 UTC - in response to Message 2386.  

It might be worth reporting back your findings to the PARI people, and suggesting that they compare notes with FFTW to see if similar techniques could be employed.


Somebody else suggested this too. I do plan on contacting them. The issue might be that they picked an algorithm that worked well in general for any degree; mine is specialized to degree 10.


How did you specialize it?

Do you think the technique could be generalized so that more efficient specialized algorithms could be produced for any fixed degree?
ID: 2403
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2404 - Posted: 16 Apr 2019, 21:00:11 UTC - in response to Message 2403.  

How did you specialize it?

Do you think the technique could be generalized so that more efficient specialized algorithms could be produced for any fixed degree?


The first step of the algorithm is to remove the "content" from the polynomial and its derivative, i.e. the gcd of all the coefficients. Our polys are monic, so the content is 1 and no gcd calculation is necessary. For the derivative, the leading coefficient is 10 (since the derivative of x^10 is 10*x^9), so the content can only be one of {1, 2, 5, 10}, and we just need to determine whether 2 and 5 divide every coefficient. This turns out to be much simpler than computing a general gcd of all the coefficients.
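
As a rough illustration of that shortcut (a hypothetical Python sketch, not the project's actual code), the general content calculation versus the degree-10 special case looks something like this:

```python
# Hypothetical sketch of the content shortcut described above; not the project code.
from functools import reduce
from math import gcd

def general_content(coeffs):
    """General case: the content is the gcd of all coefficients."""
    return reduce(gcd, coeffs)

def derivative_content_degree10(deriv_coeffs):
    """Degree-10 special case: for a monic degree-10 polynomial, the derivative's
    leading coefficient is 10, so its content divides 10 and must be 1, 2, 5 or 10.
    Two divisibility checks replace the full gcd chain."""
    content = 1
    if all(c % 2 == 0 for c in deriv_coeffs):
        content *= 2
    if all(c % 5 == 0 for c in deriv_coeffs):
        content *= 5
    return content
```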

Since we are only interested in the absolute discriminant, I don't need to keep track of the sign, which saves on operations. Also, the last part of the test divides out all factors of 2 and 5 and then checks that what remains is a square. This means we don't care about square factors, which allows us to further reduce the number of operations. The final result is not the true discriminant, so this kind of improvement can't be used in a general discriminant algorithm.
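
Again purely as an illustration (not the project code), that last test could be sketched as:

```python
# Illustrative sketch of the final test described above: ignore the sign,
# strip all factors of 2 and 5, then check that what remains is a perfect square.
from math import isqrt

def passes_final_test(value):
    value = abs(value)
    if value == 0:
        return False  # degenerate case, not expected for the polys of interest
    for p in (2, 5):
        while value % p == 0:
            value //= p
    root = isqrt(value)
    return root * root == value
```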

To answer your question: yes, specialized algorithms could be used for each degree. You could use a switch statement to call the appropriate algorithm. But the general algorithm does have its advantages; for example, the code is much more compact and easier to maintain. And if you are only computing a small number of discriminants, you wouldn't notice the speed difference. It only became an issue here because we are ultimately computing many trillions of them.
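
A rough sketch of that dispatch idea, as hypothetical Python stubs rather than real code, just to show the shape:

```python
# Hypothetical dispatch by degree; a real implementation could be a C switch statement.
def disc_test_degree10(coeffs):
    ...  # specialised routine along the lines described above

def disc_test_general(coeffs):
    ...  # general-purpose algorithm (e.g. what PARI provides)

SPECIALISED = {10: disc_test_degree10}

def disc_test(coeffs):
    """Use a specialised routine when one exists, otherwise fall back to the general one."""
    degree = len(coeffs) - 1
    return SPECIALISED.get(degree, disc_test_general)(coeffs)
```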

By the way, I did talk with the folks at PARI. They confirmed their algorithm is more general and designed to be more efficient for higher degrees and coefficients that are much larger.
ID: 2404
Anne

Send message
Joined: 12 Jan 17
Posts: 1
Credit: 3,787,210
RAC: 0
Message 2407 - Posted: 22 Apr 2019, 5:38:46 UTC

Hi
Do you have any plans to update to version 3.00 for computers running Mac?
ID: 2407
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2408 - Posted: 22 Apr 2019, 6:39:39 UTC - in response to Message 2407.  

Hi
Do you have any plans to update to version 3.00 for computers running Mac?


I would like to eventually. I used to have a "hackintosh" VM, but it stopped working, so I first need to get access to a Mac (possibly another VM) before I can port the code again.
ID: 2408
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 8 Jul 11
Posts: 1318
Credit: 404,028,078
RAC: 289,866
Message 2409 - Posted: 23 Apr 2019, 4:52:38 UTC - in response to Message 2367.  

I just deployed the new CPU apps, version 3.00. Feel free to abort any WUs associated with the older versions (2.xx).

My 32-bit Linux machine is still using version 2.12.
Are there plans to release version 3.00 apps for this platform (and 32-bit Windows)?


I felt sorry for those poor old 32-bit machines slowly plodding along... so I went ahead and rebuilt the code for them. Version 3.00 is now available for linux-i686 and win32.

These are cross-compiles, so I cannot test them extensively; please keep an eye on them.
ID: 2409