Gerasim is not working
| Author | Message |
|---|---|
|
Joined: 18 Nov 25 Posts: 24 Credit: 271,620 RAC: 277 |
Now it works. |
Conan Joined: 3 Sep 11 Posts: 40 Credit: 18,018,270 RAC: 25,652 |
All my Nvidia work units fail with this message (BOINC Manager 8.2.8): <message> exceeded elapsed time limit 1137.91 (100000.00G/87.88G)</message> <stderr_txt> GPU Summary String = [CUDA|GeForceGT730|1|2048MB|46192|102]. Loading GPU lookup table from file. GPU was not found in the lookup table. Using default values: numBlocks = 1024. threadsPerBlock = 32. polyBufferSize = 32768. Successfully Built Program. Successfully Created Stage 1 Kernel: pdtKernelSubResultantInit. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB8. Successfully Created Stage 1 Kernel: pdtKernelSubResultantMpInit. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB7DegA9. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB7DegA8. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB6DegA9. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB6DegA8. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB6DegA7. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB5. Successfully Created Stage 1 Kernel: pdtKernelSubResultantDegB4. Successfully Created Stage 2 Kernel: pdtKernelDiv2. Successfully Created Stage 2 Kernel: pdtKernelDiv5. Successfully Created Stage 2 Kernel: pdtKernelDivP. Successfully Created Stage 3 Kernel. Successfully Created Polynomial Memory Buffer. Successfully Created Output Flag Memory Buffer. Successfully Created Discriminant Data Buffer. Successfully Created PolyA Data Buffer. Successfully Created PolyB Data Buffer. Successfully Created DegA Data Buffer. Successfully Created DegB Data Buffer. Successfully Created G Data Buffer. Successfully Created H Data Buffer. Successfully Created mpA Data Buffer. Successfully Created mpB Data Buffer. OpenCL initialization was successful. CHECKPOINT_FILE = sf7_DS-16x12-11_Grp644604of3125000_245838365_0.wu_checkpoint. Checkpoint Flag = 0.
Reading file in K = x^2 + 10 S = [2, 5] Disc Bound = 3200000000000000 Skip = (P^5)*(Q^5) Num Congruences = 16 SCALE = 1.000000 |dK| = 40 Signature = [0,1] Opening output file out Now starting the targeted Martinet search: Num Cvecs = 16. Doing Cvec 1. Doing Cvec 2. Doing Cvec 3. </stderr_txt> All of them hit the same run time (to the second) and time out; my AMD runs fine. Conan |
Eric Driver Joined: 8 Jul 11 Posts: 1443 Credit: 931,111,089 RAC: 2,834,955 |
"All my Nvidia work units fail with this message" I think it's hitting the "rsc_fpops_bound" limit and then getting killed. I thought this bound was set conservatively, but I could revisit it. Either way, I don't think the parameter can be changed in the current set of WUs, but I can increase it in the next batch. Is it possible your benchmarks think your GPU is much faster than it really is? The benchmarks are used to convert the FLOPS bound into a time bound, and 1138 sec sounds too fast for your card. Maybe you could rerun benchmarks, or find a way to spoof them to make your GPU look slower. |
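The numbers in the error message are consistent with the conversion described in the post above. A minimal sketch, assuming the time limit is simply the FLOPS bound divided by the speed estimate for the device (the function name is hypothetical, and the real client logic may include more than this):

```python
GIGA = 1e9


def elapsed_time_limit(fpops_bound: float, flops_estimate: float) -> float:
    """Seconds a task may run before being aborted.

    Assumed rule (inferred from the error message, not from BOINC source):
    limit = FLOPS bound / estimated device speed.
    """
    if flops_estimate <= 0:
        raise ValueError("flops_estimate must be positive")
    return fpops_bound / flops_estimate


# Values copied from the message: (100000.00G/87.88G)
limit = elapsed_time_limit(100000.00 * GIGA, 87.88 * GIGA)
print(f"limit: {limit:.2f} s ({limit / 60:.1f} min)")

# A speed estimate ten times higher shrinks the limit by the same factor.
inflated = elapsed_time_limit(100000.00 * GIGA, 10 * 87.88 * GIGA)
print(f"with a 10x higher estimate: {inflated:.2f} s")
```

Under that assumption the quotient lands within a hundredth of a second of the reported 1137.91. An inflated speed estimate would shorten the limit without making the card any faster.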
|
Joined: 18 Nov 25 Posts: 24 Credit: 271,620 RAC: 277 |
"I thought this bound was set conservatively, but I could revisit it. Either way, I don't think the parameter can be changed in the current set of WUs, but I can increase it in the next batch." "GeForceGT730" This card was released 12 years ago. It has only 384 CUDA cores, which is likely insufficient to finish the task in time. The task is currently cut off after just under 19 minutes. It's understandable that people use such cards for volunteer computing. We see that it's interrupted at stage 3, although the longest stage seems to be stage 13. (I have a draft script that can make this change to already-published tasks on a Linux server, but not on Windows Server. And writing something like this for Windows would certainly be a disaster...) I think it's better to increase the limits for subsequent batches, especially since we haven't yet officially announced that we've fully restored the server. (Last week, I ran various tests on the RTX 4060, including changes to gpuLookupTable.txt. I can post the logs, but they're very, very long, so I don't think they're suitable for a forum thread like this. To put it briefly: the default settings run for about 4 minutes on average. The best settings showed a runtime of 08:53:16-08:53:28, or 12 seconds. If I haven't messed something up.) |
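As a quick check of the timing figures quoted in the post above, the sketch below computes the tuned runtime from the two timestamps and compares it with the roughly 4-minute default. It assumes both timestamps fall on the same day; the helper name is hypothetical.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


tuned_s = seconds_between("08:53:16", "08:53:28")  # best settings
default_s = 4 * 60                                 # "about 4 minutes on average"
speedup = default_s / tuned_s

print(f"tuned run: {tuned_s:.0f} s, speedup over default: {speedup:.0f}x")
```

If the figures hold, the tuned lookup-table entry is about twenty times faster than the defaults on that card.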
Conan Joined: 3 Sep 11 Posts: 40 Credit: 18,018,270 RAC: 25,652 |
Thanks Eric and Demis. Don't worry too much about it; I was curious more than anything. The card is still put in computers that are set up as a cheaper entry point, and it is mainly used for doing BOINC. It is my brother's computer, so I will see if I can source a cheap 4060 or 5060 and get it to him; it will be better at the games he sometimes plays anyway. Thanks again, Conan |
Eric Driver Joined: 8 Jul 11 Posts: 1443 Credit: 931,111,089 RAC: 2,834,955 |
"Thanks Eric and Demis" I bumped up the rsc_fpops_bound in the template file just for good measure, so newer batches are allowed to run longer. But one thing that's worrisome is that NumberFields uses the exact same template file and hasn't had this problem in many years. So I wonder if there's something else different between the Gerasim and NumberFields server settings that is causing this, maybe a difference in version or a config parameter. @Conan: Have you used this card on a NumberFields WU, and does it exhibit the same behavior? Can you compare the "Measured floating point speed" under computer details between NumberFields and Gerasim to see if there is a big difference? |
|
Joined: 4 Jan 25 Posts: 47 Credit: 237,363,099 RAC: 841,873 |
"Is it possible your benchmarks think your GPU is much faster than it really is? The benchmarks are used to convert the FLOPS bound into a time bound, and 1138 sec sounds too fast for your card. Maybe you could rerun benchmarks, or find a way to spoof them to make your GPU look slower." Only the CPU is benchmarked to determine its actual performance. For GPUs the speed is hard-coded from the GFLOPS figure that the driver reports for the video card. Grant Darwin NT, Australia. |
|
Joined: 4 Jan 25 Posts: 47 Credit: 237,363,099 RAC: 841,873 |
"I thought this bound was set conservatively, but I could revisit it. Either way, I don't think the parameter can be changed in the current set of WUs, but I can increase it in the next batch." "GeForceGT730" It is also an abomination. Some cards used GDDR5, others DDR3! It's OK as a video card, but it's useless for compute work. Grant Darwin NT, Australia. |
|
Joined: 1 Feb 17 Posts: 29 Credit: 100,381,296 RAC: 14,081 |
"Thanks Eric and Demis" The latest BOINC clients give a much higher benchmark than older versions, something like 10x. |
|
Joined: 4 Jan 25 Posts: 47 Credit: 237,363,099 RAC: 841,873 |
For reference, I found a GT 720 that successfully completed a task: 10 hrs 35 min. Better not to use it at all, IMHO. Grant Darwin NT, Australia. |
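To get a feel for how far the bound would have to rise to let such a slow card finish, the sketch below inverts the same assumed rule (time limit = FLOPS bound / speed estimate). The GT 720's own speed estimate does not appear in this thread, so the 87.88G figure from the GT 730 error message is used purely as a stand-in; the function name and the safety factor are hypothetical.

```python
GIGA = 1e9


def required_fpops_bound(runtime_s: float, flops_estimate: float,
                         safety_factor: float = 1.0) -> float:
    """Smallest FLOPS bound that permits runtime_s seconds of run time,
    assuming limit = bound / flops_estimate."""
    return runtime_s * flops_estimate * safety_factor


runtime_s = 10 * 3600 + 35 * 60       # 10 hrs 35 min
assumed_flops = 87.88 * GIGA          # ASSUMPTION: borrowed from the GT 730 message
old_bound = 100000.00 * GIGA          # bound shown in the error message

needed = required_fpops_bound(runtime_s, assumed_flops)
ratio = needed / old_bound
print(f"needed bound: {needed / GIGA:.0f}G, about {ratio:.1f}x the old bound")
```

With these stand-in numbers the bound would need to be a few dozen times larger, which illustrates why a modest bump helps mid-range cards more than the very slowest ones.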
Conan Joined: 3 Sep 11 Posts: 40 Credit: 18,018,270 RAC: 25,652 |
Is it normal for all my work units (that are still on my computer, 53 of them) to be wiped from my account on Gerasim? If I continue to run them will they be credited or do they disappear into the ether and not be counted? This has happened before more than once. Thanks Conan |
|
Joined: 18 Nov 25 Posts: 24 Credit: 271,620 RAC: 277 |
I think this is normal. Yesterday, the database was completely cleared of test tasks. All completed and received tasks have been awarded points. Attention. We're experiencing some kind of glitch in our provider's internal network today! A request has been sent to the provider. Judging by the diagnostics, this is the same event as the one mentioned in the post about magic: https://numberfields.asu.edu/NumberFields/forum_thread.php?id=667&postid=4093#4093 We apologize. |
SerVal Joined: 1 Jan 20 Posts: 60 Credit: 58,219,509 RAC: 81,946 |
Hi, Eric. **** Tasks Ready to send: 0 In progress: 0 Send work: Enabled **** @Eric Please provide participants with CPU tasks. (100k for Win-CPU, + 100k Linux-CPU.) **** Regards to all. :) |
Eric Driver Joined: 8 Jul 11 Posts: 1443 Credit: 931,111,089 RAC: 2,834,955 |
"Hi, Eric." Done. |