Message boards : News : GPU app status update
Joined: 20 Dec 14 · Posts: 17 · Credit: 12,153,123 · RAC: 0
Maybe you would be more productive developing a multithreaded CPU app. One advantage of a multithreaded app is that if it takes every CPU hardware thread, it isn't competing with other BOINC projects for the CPU's cache, so the program gets a higher cache hit rate. A multicore CPU can run far fewer threads at once than a GPU, but CPU threads can operate mostly independently if the program is written that way. GPUs excel when large numbers of threads need to perform the same operation in lockstep; they do badly when a significant fraction of the threads need to go their own way, independent of the others.
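A minimal sketch of the CPU model described above, assuming the work splits into independent items; process_range() and the work count are hypothetical stand-ins, not the project's actual code:

```cpp
#include <thread>
#include <vector>

// Hypothetical per-thread worker: each thread walks its own slice of the
// search space and can branch freely, which is cheap on a CPU but would
// cause divergence (serialised execution) inside a GPU warp.
void process_range(long begin, long end) {
    for (long i = begin; i < end; ++i) {
        // ... per-item work, free to take its own code path ...
    }
}

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;                      // fallback if the count is unknown
    const long total = 1000000;             // illustrative number of work items
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t)
        workers.emplace_back(process_range, t * total / n, (t + 1) * total / n);
    for (auto& w : workers) w.join();       // occupy every hardware thread
}
```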
Joined: 28 Oct 11 · Posts: 181 · Credit: 287,060,663 · RAC: 197,127
Hmmm. The documentation says that credit from runtime shouldn't be used for GPUs, because they aren't benchmarked. So, what speed is the validator assuming in calculating those obscene credits?

My guess would be that it's using the raw advertising claim ('GFLOPS peak'), which - coupled with the extended runtimes caused by the current optimisation state of the Beta app - would be a double whammy. I'm in regular contact with the BOINC developers, but I'd like to check the figures behind that assumption before raising the subject with them. Eric, could you let me know - by PM might be best - one or two example HostIDs exhibiting this problem, so I can run the numbers?

Edit - never mind, I found some. They do stick out rather, don't they? Yes, that checks out:

Device peak FLOPS (overstated by marketing) x raw elapsed time (extended by inefficient app) x the same fiddle factor as my CPUs were getting = awarded credit

But in the last hour and a half, the credit awarded to my own CPUs has gone up nearly tenfold. That can't be sustainable, either.
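A rough illustration of the arithmetic described in that post. The 200/86400e9 conversion reflects BOINC's published cobblestone definition (200 credits per day of 1 GFLOPS); everything else here is made up for illustration and is not the validator's actual code:

```cpp
#include <cstdio>

int main() {
    // Hypothetical inputs: an inflated marketing peak and a runtime
    // stretched by an unoptimised app both feed straight into the claim.
    double peak_gflops   = 4600.0;   // advertised "GFLOPS peak" of the GPU
    double elapsed_secs  = 3600.0;   // raw elapsed time of the task
    double fiddle_factor = 1.0;      // project/app normalisation factor

    double flops  = peak_gflops * 1e9 * elapsed_secs;
    double credit = flops * (200.0 / 86400e9) * fiddle_factor;  // cobblestone scale
    printf("claimed credit: %.0f\n", credit);                   // ~38,333 for these inputs
}
```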
Joined: 16 Apr 14 · Posts: 7 · Credit: 6,694,951 · RAC: 0
Credit "backwards" steps what you do now...wow... are most ****** thing i see in history of boinc projects ... you really dont need volunters and you acting like that.. I realy dont need another collatz" here..xx millions per day... but 100-200 per task from this project now is - laugh to volunter eyes.. |
Joined: 28 Oct 11 · Posts: 181 · Credit: 287,060,663 · RAC: 197,127
Credit "backwards" steps what you do now...wow... are most ****** thing i see in history of boinc projects ... you really dont need volunters and you acting like that..You are running the Windows application, on CPUs only (hosts 1465177 and 1488499). The credits are normal by the BOINC definition of work done - even a little high. This thread is about the GPU experiment, and the problems it's causing for everyone in establishing a fair reward for both types of application. Please have patience - we're trying to sort something out. |
Joined: 8 Jul 11 · Posts: 1373 · Credit: 646,553,992 · RAC: 858,522
So CreditNew has curbed the GPU abuse, but CPU credits are higher than expected. It looks like my host credits are starting to come down, so I hope this means the stat calculations are starting to settle.

Richard - are there any more configuration items I need to modify for CreditNew to work better? For example, how sensitive is it to rsc_flops_est in the WU template?
Joined: 8 Jul 11 · Posts: 1373 · Credit: 646,553,992 · RAC: 858,522
> It might be worth reporting back your findings to the PARI people, and suggesting that they compare notes with FFTW to see if similar techniques could be employed.

Somebody else suggested this too. I do plan on contacting them. The issue might be that they picked an algorithm that worked well in general for any degree; mine is specialized to degree 10.
Joined: 28 Oct 11 · Posts: 181 · Credit: 287,060,663 · RAC: 197,127
> Richard - are there any more configuration items I need to modify for CreditNew to work better? For example, how sensitive is it to rsc_flops_est in the WU template?

To be honest, I don't think so, but I don't know for certain. My personal opinion is that <rsc_flops_est> must be buried in there somewhere, but the people I've spoken to who claim to understand the control-feedback algorithm involved (apparently it's a well-known tool in control engineering) don't agree with me - they use words like Kalman filter, or PI-controller, which I don't understand. Setting <rsc_flops_est> per batch would be a nightmare for you to implement here - don't do anything until I can speak to others tomorrow.
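For readers who, like Richard, haven't met those terms: a PI (proportional-integral) controller is a generic feedback loop that nudges an estimate toward observed values. The sketch below only illustrates that general mechanism; it is not BOINC's CreditNew code, and the names and gains are made up. Whether CreditNew actually uses something of this shape is exactly what Richard says he wants to confirm.

```cpp
// Generic PI controller: repeatedly corrects an estimate based on the
// error between what was predicted and what was actually observed.
struct PIController {
    double kp;              // proportional gain (reacts to the current error)
    double ki;              // integral gain (absorbs persistent bias over time)
    double integral = 0.0;  // running sum of past errors

    double correction(double predicted, double observed) {
        double error = observed - predicted;
        integral += error;
        return kp * error + ki * integral;
    }
};
```

The point of such a scheme is that it responds to sustained bias rather than to any single estimate, which is why a one-off estimate error tends to be absorbed over time instead of being multiplied straight into every result.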
Joined: 5 Jan 13 · Posts: 44 · Credit: 50,835,303 · RAC: 44,735
Yes, 2 million tasks for ten days. This is cool. God bless this project :)
Joined: 23 Oct 18 · Posts: 5 · Credit: 7,397,389 · RAC: 0
> It might be worth reporting back your findings to the PARI people, and suggesting that they compare notes with FFTW to see if similar techniques could be employed.

How did you specialize it? Do you think the technique could be generalized so that more efficient specialized algorithms could be produced for any fixed degree?
Joined: 8 Jul 11 · Posts: 1373 · Credit: 646,553,992 · RAC: 858,522
> How did you specialize it?

The first step of the algorithm is to remove the "content" from the polynomial and its derivative - the content being the gcd of all the coefficients. Our polys are monic, so the content is 1 and no gcd calculation is necessary. For the derivative, the leading coeff is 10 (i.e. the derivative of x^10 is 10*x^9), so the content will be one of {1,2,5,10}; we just need to determine whether 2 and 5 divide each coeff. This turns out to be much simpler than computing a general gcd of all the coefficients.

Since we are only interested in the absolute discriminant, I don't need to keep track of the sign, so that saves on operations. Also, the last part of the test divides out all factors of 2 and 5 and then checks that what remains is a square. This means we don't care about square factors, which allows us to further reduce the number of operations. The final result is not the true discriminant, so this kind of improvement can't be used in a general discriminant algorithm.

To answer your question: yes, specialized algorithms could be used for each degree. You could use a switch statement to call the appropriate algorithm. But the general algorithm does have its advantages - for example, the code is much more compact and easier to maintain. And if you are only computing a small number of discriminants, you wouldn't notice the speed difference. It only became an issue here because we are ultimately computing many trillions of them.

By the way, I did talk with the folks at PARI. They confirmed their algorithm is more general and designed to be more efficient for higher degrees and coefficients that are much larger.
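A minimal sketch of the content shortcut described above, assuming degree-10 monic polynomials with integer coefficients; the function and array names are illustrative, not the project's actual code:

```cpp
#include <cstdint>

// For a monic degree-10 polynomial, the derivative has leading
// coefficient 10, so the gcd of its coefficients (the "content") can
// only be 1, 2, 5 or 10. Divisibility tests by 2 and 5 replace a
// general gcd computation.
int64_t derivative_content(const int64_t dcoef[10]) {  // dcoef[9] == 10
    bool div2 = true, div5 = true;
    for (int i = 0; i < 10; ++i) {
        if (dcoef[i] % 2 != 0) div2 = false;
        if (dcoef[i] % 5 != 0) div5 = false;
    }
    return (div2 ? 2 : 1) * (div5 ? 5 : 1);
}
```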
Joined: 12 Jan 17 · Posts: 1 · Credit: 3,787,210 · RAC: 0
Hi - do you have any plans to update to version 3.00 for computers running Mac?
Joined: 8 Jul 11 · Posts: 1373 · Credit: 646,553,992 · RAC: 858,522
Hi, I would like to eventually. I used to have a "hackintosh" VM, but it stopped working, so I first need to get access to a Mac (possibly another VM) before I can port the code again.
Joined: 8 Jul 11 · Posts: 1373 · Credit: 646,553,992 · RAC: 858,522
I just deployed the new CPU apps, version 3.00. Feel free to abort any WUs associated with the older versions (2.xx).

I felt sorry for those poor old 32-bit machines slowly plodding along... so I went ahead and rebuilt the code for them. Version 3.00 is now available for linux-i686 and win32. These are cross-compiles, so I cannot test them extensively - please keep an eye on them.