Message boards : News : GPU app - beta version for linux nvidia
Joined: 8 Jul 11 Posts: 1341 Credit: 497,572,986 RAC: 573,188
Excellent stuff for the GPU app. The daily quota is high. I too have noticed the scheduler is a little fickle at times. I get the uninformative message "got 0 new tasks" after the client clearly requested new tasks. I turned on the most verbose debug setting in the scheduler and looked at the log after clicking update in the client. Still nothing to say why it won't give work. And then 30 minutes later, for some reason, it downloads a bunch of work. I've been over all the possible server config options and see nothing that could explain this. I've also looked at the feeder and it's definitely interleaving work for both batches, so I'm pretty sure the scheduler is the problem. There is one other thing I am going to try...
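For reference, the scheduler-side debug logging described above is normally switched on in the project's config.xml. A minimal sketch follows; the flag names are taken from the stock BOINC server options, and which flags were actually enabled here is my assumption, not something stated in the thread.

```xml
<!-- Fragment of the project's config.xml (server side); sketch only.
     Flag names follow the standard BOINC scheduler logging options. -->
<config>
  <debug_send/>             <!-- log why each job was or was not sent -->
  <debug_quota/>            <!-- log daily-quota decisions -->
  <debug_version_select/>   <!-- log app version / plan class selection -->
</config>
```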
Joined: 8 Jul 11 Posts: 1341 Credit: 497,572,986 RAC: 573,188
Any estimate on when we'll have an app that will run under Windows? Hard to say. I need to get the OpenCL version working first before I can answer that (my hope is that porting OpenCL to Windows will be easier than CUDA).
Joined: 20 May 18 Posts: 6 Credit: 165,471,630 RAC: 0
If anyone cares, I have calculated average task times for some of my cards (sample size 50):
GTX 1660 Ti: 239 secs
GTX 1080 Ti: 306 secs
GTX 1070 Ti: 337 secs
GTX 980: 333 secs
These are with two tasks running in tandem on each card. Hooray for Turing!
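Since those averages are wall-clock times with two tasks sharing each card, effective throughput is roughly double what a single-task reading would suggest; for the GTX 1080 Ti, for example:

$$\frac{2\ \text{tasks}}{306\ \text{s}} \times 86{,}400\ \text{s/day} \approx 565\ \text{tasks/day}.$$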
Joined: 8 Jul 11 Posts: 1341 Credit: 497,572,986 RAC: 573,188
If anyone cares, I have calculated average task times for some of my cards (sample size 50): Interesting! Thanks for sharing.
Joined: 23 Jun 17 Posts: 5 Credit: 42,264,426 RAC: 0
Excellent stuff for the GPU app. Great to hear that you have one more thing to try. Much appreciated. Tasks were coming, but the machine would then sit for hours with no tasks. My work buffer was set to 0.10 days; I have now adjusted it to 0.50 to see if that makes a difference. Will let you know if that changes anything.
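For anyone else tuning the same setting: the 0.10 to 0.50 change corresponds to the "store at least X days of work" preference. A minimal sketch of the equivalent client-side global_prefs_override.xml is below; the additional-days value is just an illustrative choice.

```xml
<!-- Sketch of global_prefs_override.xml in the BOINC data directory;
     equivalent to "Store at least 0.5 days of work" in the Manager. -->
<global_preferences>
  <work_buf_min_days>0.5</work_buf_min_days>
  <work_buf_additional_days>0.1</work_buf_additional_days>   <!-- illustrative value -->
</global_preferences>
```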
Joined: 8 Jul 11 Posts: 1341 Credit: 497,572,986 RAC: 573,188
I created a new thread for this issue. In a nutshell, I think it's fixed now.
Joined: 16 Apr 14 Posts: 7 Credit: 6,694,951 RAC: 0
And the Windows app? Linux doesn't work for me if I have multiple HDDs, SSDs, SATA cards in PCI slots, and other hardware connected to the PC. Yes, a single-SSD, one-PC configuration works, as I've had before, but Windows 10 now handles heavily loaded computers perfectly. I tried three different Linux distros yesterday, but Linux was not able to detect the hardware on my PC; not one command worked. It feels like a waste of resources to make an app for Linux first; there are hundreds of people with 20xx NVIDIA cards. I'll wait for a normal app.
Joined: 20 May 18 Posts: 6 Credit: 165,471,630 RAC: 0
I see credit was recently doubled to 8,000 (and the average runtime also seems to have increased with it, though not quite doubled) and now decreased to 800. I'm not complaining, just curious about the reasoning. "It feels like a waste of resources to make an app for Linux first; there are hundreds of people with 20xx NVIDIA cards." RTX cards work just fine, though?
Joined: 8 Jul 11 Posts: 1341 Credit: 497,572,986 RAC: 573,188
I see credit was recently doubled to 8,000 (and the average runtime also seems to have increased with it, though not quite doubled) and now decreased to 800. I'm not complaining, just curious about the reasoning. The difference in average times on the CPU went from 1.5 hours for the DS15x11 WUs to 3 hours for the DS14x12 WUs (these are averages over 100 WUs, all on the same machine). This is the reason for the initial doubling. So it is approximate, and of course there is no guarantee the GPU times will scale the same, but it should be close. Then, as discussed in another thread, the credits were way too high for CPUs. To bring them back in line, I had to scale them back down by a factor of 10. The CPU WUs are still paying almost 2x the original (from several weeks ago). Here's the problem: the original crediting system could be majorly abused by GPUs, so we had to change it. CreditNew paid next to nothing, so now we are paying a fixed amount per WU. This is not ideal either, as GPUs and CPUs should be treated differently. I think the best solution will be to use CreditNew, but hack the validator code to scale up the payout for GPUs. That will take work though, and I'd rather spend my energy finishing the GPU apps. How does the current credit/hour for the GPU compare to other projects?
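To make the "scale up the payout for GPUs" idea concrete, here is a rough standalone sketch of the kind of hook described above. It is not the actual BOINC validator interface, the function name is hypothetical, and the scale factor is purely illustrative.

```cpp
// Sketch only: not the real BOINC validator code or API.
// Assumes a CreditNew-style base estimate plus the result's plan class
// string (e.g. "cuda30"), and bumps the payout for GPU plan classes.
#include <string>

double scaled_credit(double base_credit, const std::string& plan_class) {
    const double GPU_SCALE = 10.0;  // illustrative policy knob, not a project value
    bool is_gpu = plan_class.find("cuda")   != std::string::npos ||
                  plan_class.find("opencl") != std::string::npos;
    return is_gpu ? base_credit * GPU_SCALE : base_credit;
}
```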
Joined: 28 Oct 11 Posts: 180 Credit: 242,744,949 RAC: 149,353
How does the current credit/hour for the GPU compare to other projects? If they are the same WUs, they should get the same credit - whatever hardware they are run on. The concept of "credit/hour for the GPU" is a movable feast. I run the same GPUs for both SETI and GPUGrid: the credit awarded is roughly in the ratio 1:20. SETI pays too low, GPUGrid pays much too high. I usually reckon that the staff at Einstein make the most reasonable attempt to follow the definition of the cobblestone (see the section "Claimed and granted credit"). The 'cobblestone' is the Sunday-best formal name for the credit, named after Jeff Cobb of SETI. A 1 GFLOPS device will earn 200 cobblestones per day. More coming in the other thread.
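As a back-of-envelope restatement of that definition (my arithmetic, not official project numbers):

$$\text{credit/day} = 200 \times \text{sustained GFLOPS}, \qquad 1\ \text{cobblestone} \approx \frac{86{,}400\ \text{s} \times 10^{9}\ \text{FLOPS}}{200} = 4.32\times10^{11}\ \text{FLOP}.$$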
Joined: 1 Feb 17 Posts: 23 Credit: 61,877,222 RAC: 6,021
I see credit was recently doubled to 8,000 (and the average runtime also seems to have increased with it, though not quite doubled) and now decreased to 800. I'm not complaining, just curious about the reasoning. One of the lowest credit projects now for GPUs. 261k for a 1080 Ti yesterday, but not sure if that was all at 800 or included some older 4k/8k tasks. SETI uses CreditNew and will be better than 1 credit/sec, so like 100k or something. Asteroids is probably lower, but A@H is more of a CPU project IMO. Collatz, MW, E@H, GPUGrid, Enigma, Amicable, PrimeGrid, and Moo produce more. If the GPU app can do 20x more work per second, then it should receive 20x the credit compared to your own CPU app.
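For the SETI figure, the arithmetic behind "better than 1 credit/sec so like 100k or something" is simply:

$$1\ \text{credit/s} \times 86{,}400\ \text{s/day} \approx 86{,}000\ \text{credits/day}.$$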
Joined: 31 Oct 12 Posts: 14 Credit: 30,615,327 RAC: 0
In my opinion, the change to 800 credits per task on a GPU is far too low. Here's a simple number, Eric. In all the years of running NumberFields on a CPU, I found 7 decics. Since the GPU app became available, I am now up to 18 in about a week of running on only 2 cards. I'm sure this will be replicated by everyone else running GPUs here. I believe you are quoting that a GPU does 20x the work of a CPU. You need to encourage GPUs, not make the project not worth crunching compared to other GPU projects. Apparently you had to drop the GPU credit because it was making the CPU credit too high. Why can't the award of points be based on whether the task was run on a CPU or a GPU? Other projects manage that. The increased number of decics found by just myself should indicate how much more work is being achieved by people running GPUs, and should encourage you to award more and attract more crunchers to that platform. On most projects, the introduction of GPU apps makes the CPU app almost obsolete.
Joined: 28 Oct 11 Posts: 180 Credit: 242,744,949 RAC: 149,353
I disagree. Payment should be for the work done in searching - the same number of credits for the search task, whatever the device used. GPUs will win out vastly in the number of credits awarded per unit of time - per second, per hour, per day, however you choose to measure it. Your own statement that you've found more candidate values since the GPU app was released confirms that you're conducting more searches. Good on you - that's your reward. You don't need to be compensated twice - once for doing more searches, and again for doing searches on a different device.
Joined: 31 Oct 12 Posts: 14 Credit: 30,615,327 RAC: 0
I disagree. Payment should be for the work done in searching - the same number of credits for the search task, whatever the device used. It's not compensation for finding candidates. Points are an incentive to conduct the searches in the first place, at least for me. Incentivise GPU crunching and you get more people finding more candidates far quicker than using CPUs. Set the points too low and I use my GPUs elsewhere. Mercenary as that may be, the points awarded are my incentive to provide my equipment and burn my electricity for projects. Others may have different motives, which I can respect. This is simply my outlook.
Joined: 19 Aug 11 Posts: 1 Credit: 89,351,986 RAC: 558
I disagree. Payment should be for the work done in searching - the same number of credits for the search task, whatever the device used. I disagree with you, Richard. Indeed, I am not sure what you are saying in your first sentence. Are you saying that we should get the same credit whether we use a CPU or a GPU? Your statement "whatever the device used" seems to indicate that. For a bit of background, I joined SETI before any other project was running. Currently, I have dedicated 16 high-powered computers purely to scientific research, each with one or two GPUs. I stopped running SETI when research into the subject of CreditNew indicated that it is an unfair system to follow, as is proven by the number of projects that refused to use it. Personally, I paid for my equipment and have dedicated it to scientific research. I became a credit whore after the creation of CreditNew and will crunch any legitimate, scientifically oriented project that gives the best credit. Are you serious, and do you wish to move the goal posts again during the game by changing the number of credits for each task depending on whether you are using a CPU or a GPU? I don't understand.
Joined: 28 Oct 11 Posts: 180 Credit: 242,744,949 RAC: 149,353
Yes, I mean the same credit whether done on CPU or GPU. But let's be clear: I mean the same credit PER TASK. If you use a GPU, you will complete many more tasks, so your overall figures - RAC, total credit - will be much larger: there's your reward. But you'll be doing the same work, and that's what the credit system is designed to reflect. I say 'designed', and I agree the design is flawed: I am particularly concerned that no attempt has been made internally to assess and correct its behaviour since it was launched in 2010. But I would like to go back to a situation where it was expected that all projects paid as near as dammit the same level of credit, so that decisions on what to crunch were made on other factors - scientific interest, publication rate, or whatever else tickles your fancy.
Joined: 14 May 23 Posts: 8 Credit: 166,041,855 RAC: 569,752
After all these years, we finally have our first GPU app. It's only a beta version for 64-bit Linux with Nvidia GPUs. Support for other platforms and GPUs will be coming soon. Eric, can you explain the part about only building part of the app as statically compiled? How much of application execution depends on the CPU? I am trying to understand why execution is so much slower on my Epyc hosts with RTX 3080 cards versus my Ryzen Zen 4 hosts with the same RTX 3080 cards. All hosts run the same Ubuntu 24.04 distro with the same exact kernels. The Zen 4 hosts all have 32GB of memory and the Epyc hosts all have 128GB of memory. The only difference in environments between these two cohorts of hosts is the roughly 2x greater CPU clock speed of the Ryzen 7950X hosts, which run their cores at ~5.0 GHz, versus the Epyc hosts, which only run at ~2.1-2.4 GHz. So my thinking is that the execution of each CUDA30 task must be going out to the CPU for more than the initial retrieval of the input file. The storage systems of all hosts use identical M.2 Gen 4 SSDs and the storage speeds are pretty much identical, so I don't think that has a major influence on task speed. Can you comment on how much the type and speed of the host CPU affects the speed of execution of the CUDA30 app? Thanks.
Joined: 8 Jul 11 Posts: 1341 Credit: 497,572,986 RAC: 573,188
After all these years, we finally have our first GPU app. It's only a beta version for 64-bit Linux with Nvidia GPUs. Support for other platforms and GPUs will be coming soon. You quoted the original post, which is over 5 years old, and much has changed since then. But to answer your question... The CPU works in parallel with the GPU. There are 2 buffers of polynomials for testing - as the GPU processes one buffer, the CPU does backend processing on the other buffer and then queues it for the next pass. I have seen my CPUs use between 20% and 50% of a core, depending on the relative speeds of the CPU and GPU. You want your core usage to be below 100%; otherwise the GPU is waiting on the CPU. One way to get around this problem is to run multiple instances on the GPU by configuring your app_config file. Also be careful with hyper-threading, since that can use up all your available cores. On Linux, just run the "top" command, see what the CPU usage is while it's running, and adjust your BOINC Manager settings accordingly. Another thing to keep in mind when comparing two systems is the speed of the RAM and the motherboard (i.e. the memory bus). Remember there is a lot of data being transferred back and forth between the CPU/GPU and RAM.
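For anyone who wants to try the multiple-instances suggestion, a minimal app_config.xml sketch follows. It goes in the project's folder under the BOINC data directory; the app name shown is an assumption (check client_state.xml for the exact one), and cpu_usage should be adjusted to taste.

```xml
<!-- Sketch of app_config.xml for running two tasks per GPU. -->
<app_config>
  <app>
    <name>GetDecics</name>           <!-- assumed app name; verify locally -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>     <!-- 0.5 GPU per task, so two tasks share one GPU -->
      <cpu_usage>0.5</cpu_usage>     <!-- CPU budgeted per task; adjust as needed -->
    </gpu_versions>
  </app>
</app_config>
```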
Joined: 14 May 23 Posts: 8 Credit: 166,041,855 RAC: 569,752
Thanks for the reply, Eric. The information about the two buffers being processed, one on the GPU and one on the CPU, answers my question and explains the difference. It's simply a matter of the faster clocks and processing on the 7950X hosts compared to the pedestrian Epycs, along with the faster memory subsystems on the Ryzens compared to the DDR4-3200 ECC memory on the Epycs. I keep CPU usage down around 90% on all hosts, so no overcommitment of the CPU is going on. I run 2x tasks on all my GPUs to keep them loaded close to 100% at all times.