Talk about your long ones
Author | Message |
---|---|
Send message Joined: 11 Jun 14 Posts: 11 Credit: 45,376,032 RAC: 0 |
Task 10675756, work unit 9875276: sent 30 Apr 2015, 14:17:14 UTC; reported 3 May 2015, 19:14:48 UTC; Completed and validated; run time 276,695.77 s; CPU time 276,228.60 s; credit 7,307.92; application Get Decics with Bounded Discriminant v3.02. I believe this is the longest-running one I have ever had that completed and got credit. What is the longest one anyone has run that completed and got credit? |
Send message Joined: 15 Mar 15 Posts: 11 Credit: 113,280,935 RAC: 0 |
I have a long-running WU (wu12E10_SF73-0_Idx3_Grp58252of116272) that reports "Memory Leak Detected!!!". It's been running for 64 hours, and BOINC reports 190 hours to go (and rising). It's not likely to meet its June 3rd deadline. Abort? |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
I have a long-running WU (wu12E10_SF73-0_Idx3_Grp58252of116272) that reports "Memory Leak Detected!!!". It's been running for 64 hours, BOINC reports 190 to go (and rising). It's not likely to meet its June 3rd deadline. Well, with the 3-day grace period that means you actually have 6 days. Another thing to consider: the 190 hours reported by BOINC may not be accurate. It's up to you, but I'd say hang on a little longer. As for the memory leaks, those are only reported when the app is stopped. If it reports it multiple times on a single WU, it's a sign that something is periodically interrupting the process, which can prolong run times. Windows, as well as some AV software, runs various background tasks, which can cause the BOINC client to suspend jobs. To get around that, in the BOINC manager under computing preferences, I set the "while processor usage is less than" value to 0. I apologize if you already know all this. |
Send message Joined: 11 Jun 14 Posts: 11 Credit: 45,376,032 RAC: 0 |
Task 10830548, work unit 9890344: sent 24 May 2015, 20:27:34 UTC; reported 1 Jun 2015, 1:43:45 UTC; Completed and validated; run time 595,565.21 s; CPU time 595,057.10 s; credit 9,504.58; application Get Decics with Bounded Discriminant v3.02. I just completed the one listed above. It looks like it took 6.9 days to run on a 3500 MHz machine. |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
Task 10830548, work unit 9890344: sent 24 May 2015, 20:27:34 UTC; reported 1 Jun 2015, 1:43:45 UTC; Completed and validated; run time 595,565.21 s; CPU time 595,057.10 s; credit 9,504.58; application Get Decics with Bounded Discriminant v3.02. That might be a new record, especially since it's with a fast machine. This is the kind of case that will benefit from the new 64-bit Windows executables - with the new version it would have taken ~3.5 days instead (still long, but much better). |
Send message Joined: 16 Apr 14 Posts: 7 Credit: 6,694,951 RAC: 0 |
http://numberfields.asu.edu/NumberFields/results.php?userid=17911&offset=0&show_names=0&state=4&appid= 5 hours maximum, but I just do not understand it. :-) |
Send message Joined: 16 Apr 14 Posts: 7 Credit: 6,694,951 RAC: 0 |
But right now I am computing two GDBD 3.02 tasks, and they show 77 and 44 hours of time used. The completion time is unknown. |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
http://numberfields.asu.edu/NumberFields/results.php?userid=17911&offset=0&show_names=0&state=4&appid= That link seems to be broken. What exactly is your concern? |
Send message Joined: 16 Apr 14 Posts: 7 Credit: 6,694,951 RAC: 0 |
Yes, I removed them yesterday. I just wanted to find out how long new tasks take to compute on different CPUs. |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
Yes, I removed them yesterday. The run times vary by app and by batch. GBD WUs are faster than GD on average, but have periodic outliers that are quite a bit higher than the average. In technical terms, GBD has a smaller mean but a much larger variance than GD. On a 3.5 GHz Linux machine, GD is averaging about 7 hours per WU for the sf3_DS10x271 units and about 2 hours per WU for the more recent Qsqrt421 units. I run very few of the GBD WUs on my machines, so I can't say with certainty what the average runtimes are. If I had to guess based on the tests I have run and on the runtimes of results being returned, I would say GBD averages less than 1 hour per WU on a 3 GHz or faster machine. |
Send message Joined: 10 Jan 12 Posts: 8 Credit: 2,406,752 RAC: 40 |
I have two long-running results. For both, the progress bar has not changed for several days; I noticed it yesterday. But the results checkpoint about every minute (the files "GetBoundedDecics_state" and "boinc_task_state.xml" are updated every minute). http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10081686 CPU time now: 26 hours at 38.37% (server deadline 19 Jun 2015, 12:34:13 UTC) http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10094998 CPU time now: 19 hours at 35.95% (server deadline 21 Jun 2015, 9:58:28 UTC) Can you extend the deadline on the server for my results, so no one else gets these results? Or should I abort these results on my host? Matthias |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
I've two long running results I would let them continue. I don't know an easy way to extend the deadline on a result that is already in progress. But if it goes beyond the grace period and you end up losing the credit on it, let me know and I will manually give you credit (which is relatively easy to do). |
Send message Joined: 10 Jan 12 Posts: 8 Credit: 2,406,752 RAC: 40 |
Interesting: work unit http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10081686 has a valid result with a CPU time of less than 10,000 seconds. My result hasn't finished after more than 2 days of CPU time. The CPU performance is not so different based on the BOINC measurement. This looks strange to me. And the values in the "GetBoundedDecics_state" file of my result increase every minute. This morning I had the following values: 0 -2 30 8 -62 13324144 0 13322140 0 241474 Matthias |
Send message Joined: 10 Jan 12 Posts: 8 Credit: 2,406,752 RAC: 40 |
To me it looks like checkpointing does not work correctly, but I'm not sure. Below are the last two starts of the result, which runs around 8 hours each day. http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10081686 Reading checkpoint file. Checkpoint Flag = 1. a5 Starting Index = 0. a22 Starting Value = -2. a21 Starting Value = 30. a32 Starting Value = 8. a31 Starting Value = -62. PolyCount starting value = 13324144. Stat Count 1 = 0. Stat Count 2 = 13322140. Stat Count 3 = 0. Elapsed Time = 241474 (sec). Entering MartinetSearch routine... Disc Bound = 120000000000.00000000 Reading file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782.dat: K = y^2 - 73 TgtFlag = 0 a1 Index = 6 NumVals_a5 = 1 a5 values: -38 + -8w a22_L = 0 a22_U = 0 a21_L = 32 a21_U = 32 a32_L = 3 a32_U = 27 |dK| = 73 Signature = [2,0] a11 = 1 a12 = 2 sig1a1 = -8.5440037453175311678716483262397064346 sig2a1 = 8.5440037453175311678716483262397064346 Ca1_pre = 29.200000 Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782_0_0 Now starting the Martinet search: Doing case a5 = -38 + -8w... 2nd part of Martinet bound = 18.978846. Martinet bound = 48.178846. a22_L = -2. a22_U = 0. a22 = -2. a21_L = 21. a21_U = 32. a21 = 21. a21 = 22. a21 = 23. a21 = 24. a32_L = -11. a32_U = 13. a21 = 25. a32_L = -9. a32_U = 15. a21 = 26. a32_L = -7. a32_U = 17. a21 = 27. a32_L = -5. a32_U = 19. Reading checkpoint file. Checkpoint Flag = 1. a5 Starting Index = 0. a22 Starting Value = -2. a21 Starting Value = 27. a32 Starting Value = 8. a31 Starting Value = -30. PolyCount starting value = 14936179. Stat Count 1 = 0. Stat Count 2 = 14933914. Stat Count 3 = 0. Elapsed Time = 268104 (sec). Entering MartinetSearch routine... Disc Bound = 120000000000.00000000 Reading file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782.dat: K = y^2 - 73 TgtFlag = 0 a1 Index = 6 NumVals_a5 = 1 a5 values: -38 + -8w a22_L = 0 a22_U = 0 a21_L = 32 a21_U = 32 a32_L = 3 a32_U = 27 |dK| = 73 Signature = [2,0] a11 = 1 a12 = 2 sig1a1 = -8.5440037453175311678716483262397064346 sig2a1 = 8.5440037453175311678716483262397064346 Ca1_pre = 29.200000 Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782_0_0 Now starting the Martinet search: Doing case a5 = -38 + -8w... 2nd part of Martinet bound = 18.978846. Martinet bound = 48.178846. a22_L = -2. a22_U = 0. a22 = -2. a21_L = 21. a21_U = 32. a21 = 21. a21 = 22. a21 = 23. a21 = 24. a32_L = -11. a32_U = 13. a21 = 25. a32_L = -9. a32_U = 15. a21 = 26. a32_L = -7. a32_U = 17. a21 = 27. a32_L = -5. a32_U = 19. a21 = 28. a32_L = -3. a32_U = 21. Matthias |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
Yes, it looks like there's something wrong with the checkpointing. The a22 value for that WU should be fixed at 0, but your checkpoint file has it at -2. That would explain why it sat there forever. If you still have the stderr, could you send it to me via private message? That might help me to track down what happened. In the meantime, I'll try to replicate this on my test machine. Thanks! |
Send message Joined: 28 Oct 11 Posts: 180 Credit: 242,469,824 RAC: 148,949 |
I've aborted a few tasks recently on my laptop for extremely slow running - over 500,000 seconds in one case. Error tasks for computer 1289 Full stderr.txt seems to be viewable by clicking the task links. I've also noticed recently that there seems to be a significant likelihood that if you shut down BOINC while a Qsqrt task is running, and then restart it, it reaches a 'success' completion a couple of seconds later - even though it didn't previously appear to be anywhere near finishing. Restarting BOINC partway through the run is, of course, when the checkpoint file is read back in. |
Send message Joined: 10 Jan 12 Posts: 8 Credit: 2,406,752 RAC: 40 |
Hi Eric, stderr.txt is on the way. I had 4 results that seemed to run "forever". Matthias |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
Hi Eric, Ok. Thanks! You can abort those jobs - no need to waste computer resources on them. I will post back once I learn what's going on. |
Send message Joined: 8 Jul 11 Posts: 1341 Credit: 496,514,376 RAC: 571,174 |
It appears that only the 64-bit Windows versions had the problem. I believe it's fixed now and I uploaded the new binaries (both GD and GBD). Please let me know if you continue to see the erratic behavior. You may need to abort your current work queue in order to pick up the new apps. |
Send message Joined: 28 Oct 11 Posts: 180 Credit: 242,469,824 RAC: 148,949 |
I see I've picked up a new v2.05 app already. I'll abort unstarted v2.01 tasks, but let the ones which are already running complete without interruption. |
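
For anyone who wants to check their own stuck tasks, the checkpoint mismatch diagnosed above (a22 sitting at -2 when the work unit fixes it at 0) can be tested mechanically. What follows is only a rough sketch in Python, not the project's code. The field order of the "GetBoundedDecics_state" file is an assumption, inferred from the posted state values lining up one-for-one with the "Starting Value" lines in stderr; the function names `parse_state` and `out_of_range` are made up for illustration. Only the a22 bound is checked, since that is the one confirmed in the thread.

```python
# Assumed field order of the state file (not documented by the project):
# inferred by matching "0 -2 30 8 -62 13324144 0 13322140 0 241474"
# against the "Starting Value" lines printed in stderr.
FIELDS = ("a5_index", "a22", "a21", "a32", "a31",
          "poly_count", "stat1", "stat2", "stat3", "elapsed_sec")


def parse_state(line):
    """Split one whitespace-separated state line into named integer fields."""
    values = [int(tok) for tok in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def out_of_range(state, bounds):
    """Return the names of checkpoint fields lying outside their [lo, hi] bounds."""
    return [name for name, (lo, hi) in bounds.items()
            if not lo <= state[name] <= hi]


# Values posted earlier in the thread.
state = parse_state("0 -2 30 8 -62 13324144 0 13322140 0 241474")

# a22_L = 0 and a22_U = 0 come from the work unit's .dat readout in stderr.
wu_bounds = {"a22": (0, 0)}

bad_fields = out_of_range(state, wu_bounds)
print(bad_fields)  # ['a22']
```

A non-empty list means the checkpoint has drifted outside the range the work unit file allows, which matches the symptom of a task that checkpoints every minute but never finishes.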