300day work unit
Author | Message |
---|---|
Joined: 9 Jan 16 Posts: 6 Credit: 11,031,524 RAC: 0 |
Take a look at the screenshot: Should I abort that WU, or should I wait a few days? I will not wait 290 days :D |
Joined: 28 Oct 11 Posts: 180 Credit: 257,294,379 RAC: 194,567 |
It's really helpful when people post references to tasks as searchable (or better, clickable) text. But you showed enough of the task name for me to find Host 27281, WU 12731079. Since your two predecessors have both aborted the task before you, you have the honour, sir. The task is yours, should you choose to accept it. I presume a nice fast Xeon server like that runs continuously and reliably: it should be an ideal platform for the task. Have a look at the thread "Long running wu_Qsqrt421_DS1x5 units - how long to let them run?". The longest so far is 28 days, I think, but yours doesn't fall into the slowest category (those with N2 in the interval [-78,-38] and an N1 range spanning zero). I think you can be confident the actual (rather than estimated) runtime will be well below 300 days, almost certainly below 30 days. And you should earn double credits at the end of it. |
Joined: 9 Jan 16 Posts: 6 Credit: 11,031,524 RAC: 0 |
Double credit sounds nice, and I will be fine with 30 days of runtime. I wasn't sure whether the 300 days shown is an accurate estimate or whether the task might be running in a loop or something like that. Thanks for your helpful answer, I will not abort that WU ;-) |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,360,812 RAC: 628,635 |
Richard - Thanks for responding in my absence. I returned late last night from conference/vacation and am now getting caught up on project issues that I have neglected over the last 2 weeks. |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
I have another long-running work unit that appears to be in the same batch as the original poster's. See wu_Qsqrt421_DS3x8_CV1_S815_N2_-194073_N1_805123to806926. The contents of my stderr.txt as of this writing are listed below: Checkpoint Flag = 0. Cvec Starting Index = 0. N1 Start = 0. N2 Start = 0. k1 Start = 0. k2 Start = 0. PolyCount starting value = 0. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 0 (sec). Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS3x8_CV1_S815_N2_-194073_N1_805123to806926.dat K = x^2 - 421 S = [2] Disc Bound = 256 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -194073 N2_MAX = -194073 N1_MIN = 805123 N1_MAX = 806926 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS3x8_CV1_S815_N2_-194073_N1_805123to806926_1_0 Now starting the targeted Martinet search: N2_L = -194073. N2_U = -194073. N2 = -194073. N1_L = 805123. N1_U = 806926. N1 = 805123. N1 = 805124. N1 = 805125. This task has run for 1 day, 10 hours, and 56 minutes as of this writing. Processing N1 = 805123 took over 24 hours; only after that long period was N1 = 805124 written. Another user who was assigned this work unit timed out. Because one of the cores in this dual-core 32-bit Intel Core Duo T2400 computer is tied up on this mammoth work unit, the computer is cancelling work units it could not start before their deadlines. (Don't criticize this user for not installing a 64-bit operating system or 64-bit BOINC because the CPU is incapable of running 64-bit code.) I did not post in the other thread because this work unit belongs to a different batch than the one that thread is discussing. |
Joined: 10 Dec 12 Posts: 5 Credit: 22,083,545 RAC: 0 |
Daniel, I was one of the ones that aborted that WU. I don't mind running long work units: I've completed about 3 so far, the longest being 29 days, and I have half a dozen still going, including one currently at 31 days. But that one, along with another I got, showed a progress of a fraction of a percent after several hours, so I aborted it. Richard may be right about the estimated runtime being wrong, but the progress was unlike any of the other long WUs I've had. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,360,812 RAC: 628,635 |
Hey Jesse - Sorry for the late response, somehow I missed your message. The DS3x8 batch should not suffer from the original problems that were discussed in the other threads. So I am sorry to hear this case is taking so long; maybe there is still an issue in my formula for breaking up the search space. I am going to guess that this WU will start to speed up as the N1 value increases. This should be the fast region which is why a bunch of N1s are lumped into the same WU, but sometimes the edge values can take longer (but not 24 hours). Something that might be happening, at least I've seen it when I run stand-alone, is that the output buffers sometimes are not flushed until they reach a certain size. When this happens, the output appears to be hung on 1 value of N1 for a long time and then all of a sudden the entire WU finishes, dumping the rest of the output to the screen at one time. I'm not sure if stderr acts the same way. |
Joined: 28 Oct 11 Posts: 180 Credit: 257,294,379 RAC: 194,567 |
It's not as simple as that, I'm afraid. I can see three different classes of long-running DS.x8 tasks. Examples: 1) wu_Qsqrt421_DS3x8_CV1_S815_N2_-194129_N1_805669_k2_-2 (just finished). I had four of these - now three - which reached 97% without drawing attention to themselves, but are extremely slow to finish the last 3%. Overall, they seem to be finishing in under 2 days - so hardly 'long' at all. 2) The ones I mentioned in the adjacent thread a couple of days ago: wu_Qsqrt421_DS1x8_CV1_S815_N2_-115_N1_-8062to-3289 (20 days), wu_Qsqrt421_DS1x8_CV1_S815_N2_-92_N1_-8049to-3280 (19 days), wu_Qsqrt421_DS1x8_CV1_S815_N2_-117_N1_-8062to-3291 (12 days). All are spending their slow-running time in the 48% - 53% area (so far - I don't know when they speed up again). 3) 'Slow starters': wu_Qsqrt421_DS3x8_CV1_S815_N2_-194088_N1_806767to808794 (2 days running, 0.275% progress), wu_Qsqrt421_DS1x5_CV2_S815_N2_-44_N1_-343to298 (1 day 21 hours, 17.653%). |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
I have been taught that stderr in C is not buffered. If you are using C++, cerr is the C++ counterpart of stderr and is also not buffered. C++ also provides a buffered stream tied to stderr, called clog. As I understand it, clog is easier on SSDs (without a buffer, every small write is issued separately, which adds wear) and faster on all media, but it is inappropriate for time-sensitive errors where any delay caused by a buffer is unacceptable. It is meant for log messages that should stay out of the program's regular output when that output is piped to a file. By the way, the work unit has sped up since the first N1 completed, but I still have a long way to go. I still want to try to complete this work unit, but I am at 0.340% as of this writing, after running it for 2 days, 11 hours, and 18 minutes. Here is my current stderr below: Checkpoint Flag = 0. Cvec Starting Index = 0. N1 Start = 0. N2 Start = 0. k1 Start = 0. k2 Start = 0. PolyCount starting value = 0. Stat Count 1 = 0. Stat Count 2 = 0. Stat Count 3 = 0. Elapsed Time = 0 (sec). Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS3x8_CV1_S815_N2_-194073_N1_805123to806926.dat K = x^2 - 421 S = [2] Disc Bound = 256 Skip = P^1 Num Congruences = 1 SCALE = 0.815000 N2_MIN = -194073 N2_MAX = -194073 N1_MIN = 805123 N1_MAX = 806926 |dK| = 421 Signature = [2,0] Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS3x8_CV1_S815_N2_-194073_N1_805123to806926_1_0 Now starting the targeted Martinet search: N2_L = -194073. N2_U = -194073. N2 = -194073. N1_L = 805123. N1_U = 806926. N1 = 805123. N1 = 805124. N1 = 805125. N1 = 805126. N1 = 805127. N1 = 805128. N1 = 805129. |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
I have aborted my result because the computer that had the tardy result managed to finish and report it. I therefore consider further processing of this work unit to be an inefficient use of energy and processor time which could be put to better use on another result from this project or another project. |
Joined: 9 Jan 16 Posts: 6 Credit: 11,031,524 RAC: 0 |
wu_Qsqrt421_DS3x8_CV1_S815_N2_-194114_N1_805809to806198_1 is one of my longest-running tasks (and the last of that kind): it has run for ~7 days so far and should finish in ~2.5 days. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,360,812 RAC: 628,635 |
"I have aborted my result because the computer that had the tardy result managed to finish and report it. I therefore consider further processing of this work unit to be an inefficient use of energy and processor time which could be put to better use on another result from this project or another project." Jesse - That makes sense. I went ahead and granted you manual credit for the lost CPU cycles. For anyone else in the same situation, let me know and I can do the same for you (private message works). |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,360,812 RAC: 628,635 |
"wu_Qsqrt421_DS3x8_CV1_S815_N2_-194114_N1_805809to806198_1 is one of my longest-running tasks (and the last of that kind): it has run for ~7 days so far and should finish in ~2.5 days." Ok Daniel. Thanks for hanging in there! |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
Thanks! |
Joined: 20 Dec 14 Posts: 17 Credit: 12,153,123 RAC: 0 |
By the way, I placed a suggestion in the wish list at http://numberfields.asu.edu/NumberFields/forum_thread.php?id=276 that could help you diagnose work units that are poorly split: either too short (a few seconds to a few minutes, which taxes your back end by making it process loads of short work units, a.k.a. runts, rather than a reasonable number of regular-sized ones) or giant. Basically, I would like future versions of your programs to place a UTC timestamp on each line of the stderr output. This may help you figure out which numbers are slow and which ones are fast, and allow you to adjust your work unit generator to produce fewer runts and giants. |
Joined: 8 Jul 11 Posts: 1346 Credit: 545,360,812 RAC: 628,635 |
"By the way, I placed a suggestion in the wish list at http://numberfields.asu.edu/NumberFields/forum_thread.php?id=276 that could help you diagnose work units that are poorly split: either too short (a few seconds to a few minutes, which taxes your back end by making it process loads of short work units, a.k.a. runts, rather than a reasonable number of regular-sized ones) or giant. Basically, I would like future versions of your programs to place a UTC timestamp on each line of the stderr output. This may help you figure out which numbers are slow and which ones are fast, and allow you to adjust your work unit generator to produce fewer runts and giants." Yes, that's a good suggestion. I will add it to the list! I do have some post-processing scripts which, given all the results from a batch of WUs, plot timing as a function of various loop indices. The data is a little "noisy" since each WU is processed by a different computer, but nonetheless it has helped me find better ways to break up the search space. |
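
To illustrate the timestamp suggestion above, here is a minimal sketch of how timestamped stderr lines might be post-processed to spot slow N1 values. It is not the project's actual script. The line format (an ISO-8601 UTC timestamp in front of each `N1 = ...` line), the function names, and the sample timestamps are all assumptions made up for the example. It also assumes, as the stderr excerpts in this thread suggest, that each `N1 = ...` line is written when work on that value begins, so the gap to the next line approximates the time spent on it.

```python
import re
from datetime import datetime, timezone

# Hypothetical line format (an assumption, not the app's real output):
#   "<UTC ISO-8601 timestamp>Z N1 = <value>."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\s+N1 = (-?\d+)\.")


def parse_progress(stderr_text):
    """Return a list of (n1, utc_datetime) for every timestamped N1 line."""
    entries = []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            entries.append((int(match.group(2)), stamp.replace(tzinfo=timezone.utc)))
    return entries


def seconds_per_n1(entries):
    """Map each N1 to the seconds elapsed until the next N1 line appeared.

    The last N1 has no following line, so it gets no duration.
    """
    return {
        n1: (next_stamp - stamp).total_seconds()
        for (n1, stamp), (_, next_stamp) in zip(entries, entries[1:])
    }


def slow_values(durations, threshold_seconds):
    """Return the N1 values whose duration exceeds the threshold, sorted."""
    return sorted(n1 for n1, secs in durations.items() if secs > threshold_seconds)


# Made-up sample data; the timestamps are invented for illustration only.
SAMPLE = """\
2016-02-01T00:00:00Z N1 = 805123.
2016-02-02T01:30:00Z N1 = 805124.
2016-02-02T01:30:20Z N1 = 805125.
"""

if __name__ == "__main__":
    durations = seconds_per_n1(parse_progress(SAMPLE))
    for n1, secs in durations.items():
        print(f"N1 = {n1}: {secs:.0f} s")
    print("slower than one hour:", slow_values(durations, 3600.0))
```

Because the durations come from a single host, comparing them across work units would still be subject to the hardware "noise" mentioned above.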