Talk about your long ones

Message boards : Number crunching : Talk about your long ones

Grandpa
Joined: 11 Jun 14
Posts: 11
Credit: 45,376,032
RAC: 0
Message 1271 - Posted: 3 May 2015, 21:53:21 UTC

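Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application
10675756 | 9875276 | 30 Apr 2015, 14:17:14 UTC | 3 May 2015, 19:14:48 UTC | Completed and validated | 276,695.77 | 276,228.60 | 7,307.92 | Get Decics with Bounded Discriminant v3.02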


I do believe this is the longest-running one I have ever had that completed and got credit. What is the longest one ever run that completed and got credit?

GDLS
Joined: 15 Mar 15
Posts: 11
Credit: 113,280,935
RAC: 0
Message 1282 - Posted: 31 May 2015, 4:26:07 UTC

I have a long-running WU (wu12E10_SF73-0_Idx3_Grp58252of116272) that reports "Memory Leak Detected!!!". It's been running for 64 hours, and BOINC reports 190 hours to go (and rising). It's not likely to meet its June 3rd deadline.

Abort?

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1283 - Posted: 31 May 2015, 6:55:34 UTC - in response to Message 1282.  

Well, with the 3-day grace period, that means you actually have 6 days. Another thing to consider: the 190 hours reported by BOINC may not be accurate. It's up to you, but I'd say hang on a little longer.
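The "6 days" figure is just the reported deadline plus the grace period, counted from the day of the question. A minimal sketch, assuming the 3-day grace period mentioned here is a simple offset added to the deadline date; the helper name effective_deadline is hypothetical.

```python
from datetime import date, timedelta

# Grace period quoted in this thread; treated here as a plain offset (assumption).
GRACE = timedelta(days=3)


def effective_deadline(deadline: date, grace: timedelta = GRACE) -> date:
    """Hypothetical helper: last day a late result could still be reported."""
    return deadline + grace


posted = date(2015, 5, 31)   # day the question was asked
deadline = date(2015, 6, 3)  # deadline reported by BOINC
last_day = effective_deadline(deadline)
days_left = (last_day - posted).days

print(last_day, days_left)  # 2015-06-06 6
```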

As for the memory leaks, those are only reported when the app is stopped. If a leak is reported multiple times on a single WU, it's a sign that something is periodically interrupting the process, which can prolong run times. Windows, as well as some antivirus (AV) software, runs various background tasks, which can cause the BOINC client to suspend jobs. To get around that, in the BOINC Manager under computing preferences, I set the "while processor usage is less than" option to 0. I apologize if you already know all this.
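One way to apply this rule of thumb is to count how often the leak message appears in a task's stderr text: since a report only happens when the app is stopped, several reports on one WU point to repeated interruptions. A minimal sketch, assuming the message text is exactly "Memory Leak Detected!!!" as quoted above; the function names, the threshold of 2, and the sample stderr text are hypothetical.

```python
LEAK_MARKER = "Memory Leak Detected!!!"


def count_stops(stderr_text: str) -> int:
    """Count leak reports; each one corresponds to a stop of the app."""
    return stderr_text.count(LEAK_MARKER)


def looks_interrupted(stderr_text: str, threshold: int = 2) -> bool:
    """Hypothetical rule of thumb: repeated reports suggest periodic suspensions."""
    return count_stops(stderr_text) >= threshold


# Made-up stderr fragment for illustration only.
sample = (
    "Reading checkpoint file.\n"
    "Memory Leak Detected!!!\n"
    "Reading checkpoint file.\n"
    "Memory Leak Detected!!!\n"
)
print(count_stops(sample), looks_interrupted(sample))  # 2 True
```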

Grandpa
Joined: 11 Jun 14
Posts: 11
Credit: 45,376,032
RAC: 0
Message 1285 - Posted: 1 Jun 2015, 11:30:57 UTC

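Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application
10830548 | 9890344 | 24 May 2015, 20:27:34 UTC | 1 Jun 2015, 1:43:45 UTC | Completed and validated | 595,565.21 | 595,057.10 | 9,504.58 | Get Decics with Bounded Discriminant v3.02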


I just completed the one listed above; it looks like it took 6.9 days to run on a 3500 MHz machine.
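The day counts in this thread come from dividing the run time column (in seconds) by the 86,400 seconds in a day. A trivial sketch using the run times of the two tasks posted so far; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def run_time_days(seconds: float) -> float:
    """Convert a BOINC run time in seconds to days."""
    return seconds / SECONDS_PER_DAY


# Run times of the two tasks listed in this thread.
for seconds in (276_695.77, 595_565.21):
    print(f"{seconds:,.2f} s = {run_time_days(seconds):.1f} days")
```

The second task works out to about 6.9 days, matching the figure above; the first is about 3.2 days.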

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1287 - Posted: 1 Jun 2015, 19:43:43 UTC - in response to Message 1285.  

That might be a new record, especially since it's with a fast machine.

This is the kind of case that will benefit from the new 64-bit Windows executables: with the new version it would have taken ~3.5 days instead (still long, but much better).

jozef J
Joined: 16 Apr 14
Posts: 7
Credit: 6,694,951
RAC: 0
Message 1289 - Posted: 6 Jun 2015, 9:51:19 UTC


jozef J
Joined: 16 Apr 14
Posts: 7
Credit: 6,694,951
RAC: 0
Message 1290 - Posted: 6 Jun 2015, 12:54:44 UTC - in response to Message 1289.  

But right now I am computing two GDBD 3.02 tasks, and BOINC shows 77 and 44 hours of time used. The completion time is unknown.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1291 - Posted: 7 Jun 2015, 5:46:48 UTC - in response to Message 1289.  

http://numberfields.asu.edu/NumberFields/results.php?userid=17911&offset=0&show_names=0&state=4&appid=

5 hours maximum, but I just do not understand it... :-)


That link seems to be broken. What exactly is your concern?

jozef J
Joined: 16 Apr 14
Posts: 7
Credit: 6,694,951
RAC: 0
Message 1292 - Posted: 7 Jun 2015, 11:01:16 UTC - in response to Message 1291.  

Yes, I removed them yesterday.
I just wanted to find out how long new tasks take to compute on different CPUs.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1293 - Posted: 7 Jun 2015, 23:24:53 UTC - in response to Message 1292.  

The run times vary by app and by batch. GBD WUs are faster than GD on average, but have periodic outliers that are quite a bit higher than the average. In technical terms, GBD has a smaller mean but a much larger variance than GD.
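To illustrate "smaller mean but much larger variance", the sketch below compares two sets of run times with Python's statistics module. The numbers are invented for illustration only and are not project measurements: one set is steady around 7 hours, the other is mostly short with one long outlier.

```python
from statistics import mean, pvariance

# Invented run times in hours, for illustration only (not real project data).
gd_like = [6.5, 7.0, 7.5, 7.0, 6.8, 7.2]   # steady: every WU takes about 7 h
gbd_like = [0.5, 0.6, 0.4, 0.5, 0.6, 3.4]  # mostly short, one long outlier

for name, times in (("GD-like", gd_like), ("GBD-like", gbd_like)):
    print(f"{name}: mean={mean(times):.2f} h, variance={pvariance(times):.2f} h^2")
```

The GBD-like set has the lower mean (1.0 h vs 7.0 h) but the higher variance, because a single outlier dominates the spread.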

On a 3.5 GHz Linux machine, GD is averaging about 7 hours per WU for the sf3_DS10x271 units and about 2 hours per WU for the more recent Qsqrt421 units.

I run very few of the GBD WUs on my machines, so I can't say with certainty what the average runtimes are. If I had to guess based on the tests I have run and based on runtimes of results being returned, I would say GBD averages less than 1 hour per WU on a 3 GHz or faster machine.

Matthias Lehmkuhl
Joined: 10 Jan 12
Posts: 8
Credit: 2,141,406
RAC: 3,303
Message 1297 - Posted: 18 Jun 2015, 9:52:46 UTC

I have two long-running results.
For both, the progress bar has not changed for several days; I noticed it yesterday.
But the results checkpoint about every minute (the files "GetBoundedDecics_state" and "boinc_task_state.xml" are updated every minute).

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10081686
CPU time now: 26 hours at 38.37% (server deadline 19 Jun 2015, 12:34:13 UTC)

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10094998
CPU time now: 19 hours at 35.95% (server deadline 21 Jun 2015, 9:58:28 UTC)

Can you extend the server deadline for my results so that no one else gets them?
Or should I abort these results on my host?
Matthias
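If progress were linear, total CPU time could be estimated as CPU time so far divided by the fraction done (38.37% is 0.3837). A minimal sketch of that extrapolation for the two results above; the helper name is hypothetical, and the estimate is only meaningful if the progress fraction is actually advancing, which was not the case here since the bar had stalled.

```python
def projected_hours(cpu_hours: float, fraction_done: float) -> tuple[float, float]:
    """Linear extrapolation: (estimated total, estimated remaining) CPU hours."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = cpu_hours / fraction_done
    return total, total - cpu_hours


# CPU hours and progress fractions reported above.
for wuid, hours, frac in ((10081686, 26, 0.3837), (10094998, 19, 0.3595)):
    total, remaining = projected_hours(hours, frac)
    print(f"WU {wuid}: ~{total:.0f} h total, ~{remaining:.0f} h remaining")
```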

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1298 - Posted: 18 Jun 2015, 18:04:30 UTC - in response to Message 1297.  

I would let them continue. I don't know an easy way to extend the deadline on a result that is already in progress. But if it goes beyond the grace period and you end up losing the credit on it, let me know and I will manually give you credit (which is relatively easy to do).

Matthias Lehmkuhl
Joined: 10 Jan 12
Posts: 8
Credit: 2,141,406
RAC: 3,303
Message 1308 - Posted: 23 Jun 2015, 9:13:57 UTC - in response to Message 1298.  

Interesting: work unit
http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10081686
has a valid result with less than 10,000 seconds of CPU time.
My result hasn't finished after more than 2 days of CPU time,
even though the CPU performance of the two hosts is not very different according to BOINC's measurements.
This looks strange to me.
And the values in the "GetBoundedDecics_state" file of my result increase every minute.

This morning I had the following values:
0
-2
30
8
-62
13324144
0
13322140
0
241474
Matthias
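These ten numbers appear to line up one-to-one with the values the app prints when it reads its checkpoint (compare the "Starting Value" lines in the stderr excerpt in the next message: a5 index 0, a22 = -2, a21 = 30, and so on up to Elapsed Time = 241474). A minimal parser, assuming that order, one integer per line, and nothing else in the file; the class, field, and function names are hypothetical, and the mapping is inferred rather than documented.

```python
from dataclasses import dataclass, fields


@dataclass
class DecicState:
    """Checkpoint values, in the order inferred from the restart log (assumption)."""
    a5_index: int
    a22: int
    a21: int
    a32: int
    a31: int
    poly_count: int
    stat_count_1: int
    stat_count_2: int
    stat_count_3: int
    elapsed_sec: int


def parse_state(text: str) -> DecicState:
    """Parse a GetBoundedDecics_state-style file: one integer per line (assumed)."""
    values = [int(token) for token in text.split()]
    expected = len(fields(DecicState))
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return DecicState(*values)


# The ten values listed above.
sample = "0\n-2\n30\n8\n-62\n13324144\n0\n13322140\n0\n241474\n"
state = parse_state(sample)
print(state)
```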

Matthias Lehmkuhl
Joined: 10 Jan 12
Posts: 8
Credit: 2,141,406
RAC: 3,303
Message 1313 - Posted: 25 Jun 2015, 6:38:29 UTC

To me it looks like checkpointing does not work correctly, but I'm not sure.
Below is the output from the last two starts of this result, which runs about 8 hours each day.
http://numberfields.asu.edu/NumberFields/workunit.php?wuid=10081686

Reading checkpoint file.
Checkpoint Flag = 1.
a5 Starting Index = 0.
a22 Starting Value = -2.
a21 Starting Value = 30.
a32 Starting Value = 8.
a31 Starting Value = -62.
PolyCount starting value = 13324144.
Stat Count 1 = 0.
Stat Count 2 = 13322140.
Stat Count 3 = 0.
Elapsed Time = 241474 (sec).
Entering MartinetSearch routine...
Disc Bound = 120000000000.00000000
Reading file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782.dat:
K = y^2 - 73
TgtFlag = 0
a1 Index = 6
NumVals_a5 = 1
a5 values:
-38 + -8w
a22_L = 0
a22_U = 0
a21_L = 32
a21_U = 32
a32_L = 3
a32_U = 27
|dK| = 73
Signature = [2,0]
a11 = 1
a12 = 2
sig1a1 = -8.5440037453175311678716483262397064346
sig2a1 = 8.5440037453175311678716483262397064346
Ca1_pre = 29.200000
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782_0_0
Now starting the Martinet search:

Doing case a5 = -38 + -8w...
2nd part of Martinet bound = 18.978846.
Martinet bound = 48.178846.
a22_L = -2.
a22_U = 0.
a22 = -2.
a21_L = 21.
a21_U = 32.
a21 = 21.
a21 = 22.
a21 = 23.
a21 = 24.
a32_L = -11.
a32_U = 13.
a21 = 25.
a32_L = -9.
a32_U = 15.
a21 = 26.
a32_L = -7.
a32_U = 17.
a21 = 27.
a32_L = -5.
a32_U = 19.
Reading checkpoint file.
Checkpoint Flag = 1.
a5 Starting Index = 0.
a22 Starting Value = -2.
a21 Starting Value = 27.
a32 Starting Value = 8.
a31 Starting Value = -30.
PolyCount starting value = 14936179.
Stat Count 1 = 0.
Stat Count 2 = 14933914.
Stat Count 3 = 0.
Elapsed Time = 268104 (sec).
Entering MartinetSearch routine...
Disc Bound = 120000000000.00000000
Reading file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782.dat:
K = y^2 - 73
TgtFlag = 0
a1 Index = 6
NumVals_a5 = 1
a5 values:
-38 + -8w
a22_L = 0
a22_U = 0
a21_L = 32
a21_U = 32
a32_L = 3
a32_U = 27
|dK| = 73
Signature = [2,0]
a11 = 1
a12 = 2
sig1a1 = -8.5440037453175311678716483262397064346
sig2a1 = 8.5440037453175311678716483262397064346
Ca1_pre = 29.200000
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_12E10_SF73-0_Idx6_Grp24787of111782_0_0
Now starting the Martinet search:

Doing case a5 = -38 + -8w...
2nd part of Martinet bound = 18.978846.
Martinet bound = 48.178846.
a22_L = -2.
a22_U = 0.
a22 = -2.
a21_L = 21.
a21_U = 32.
a21 = 21.
a21 = 22.
a21 = 23.
a21 = 24.
a32_L = -11.
a32_U = 13.
a21 = 25.
a32_L = -9.
a32_U = 15.
a21 = 26.
a32_L = -7.
a32_U = 17.
a21 = 27.
a32_L = -5.
a32_U = 19.
a21 = 28.
a32_L = -3.
a32_U = 21.
Matthias
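Comparing the two "Reading checkpoint file." blocks above gives a rough idea of the work done between starts. A small sketch of that arithmetic using the printed values; the dictionary keys are just labels chosen here.

```python
# Values printed by the two checkpoint reads above.
first = {"a21": 30, "a31": -62, "poly_count": 13_324_144, "elapsed_sec": 241_474}
second = {"a21": 27, "a31": -30, "poly_count": 14_936_179, "elapsed_sec": 268_104}

elapsed = second["elapsed_sec"] - first["elapsed_sec"]
polys = second["poly_count"] - first["poly_count"]
print(f"elapsed between checkpoints: {elapsed} s (~{elapsed / 3600:.1f} h)")
print(f"polynomials counted: {polys:,}")

# a21 is lower at the second checkpoint (27) than at the first (30), which is
# consistent with the log: after each restart the loop starts again at a21 = 21.
print("a21 moved backwards:", second["a21"] < first["a21"])
```

About 7.4 hours of elapsed time separate the two checkpoints, in line with the roughly 8 hours per day mentioned above.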

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1314 - Posted: 25 Jun 2015, 16:05:51 UTC - in response to Message 1313.  

Yes, it looks like there's something wrong with the checkpointing. The a22 value for that WU should be fixed at 0, but your checkpoint file has it at -2. That would explain why it sat there forever. If you still have the stderr, could you send it to me via private message? That might help me track down what happened. In the meantime, I'll try to replicate this on my test machine. Thanks!
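A sanity check along these lines could flag such a checkpoint before resuming: compare each restored coefficient against the fixed range from the WU's .dat file (here a22_L = a22_U = 0) and report anything outside it. A minimal sketch; the function name and the idea of checking bounds on restart are hypothetical, not the project's actual fix.

```python
def out_of_range(state: dict[str, int], bounds: dict[str, tuple[int, int]]) -> list[str]:
    """Return names of restored values outside their [lower, upper] bound."""
    bad = []
    for name, (lower, upper) in bounds.items():
        if name in state and not lower <= state[name] <= upper:
            bad.append(name)
    return bad


# Values from the first checkpoint read in Matthias's log, and the a22 range
# from the .dat file (a22_L = 0, a22_U = 0).
checkpoint = {"a22": -2, "a21": 30, "a32": 8, "a31": -62}
wu_bounds = {"a22": (0, 0)}

problems = out_of_range(checkpoint, wu_bounds)
if problems:
    print("checkpoint inconsistent with WU bounds:", problems)
```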

Richard Haselgrove
Joined: 28 Oct 11
Posts: 179
Credit: 220,353,862
RAC: 127,899
Message 1315 - Posted: 25 Jun 2015, 17:36:16 UTC - in response to Message 1314.  

I've aborted a few tasks recently on my laptop for extremely slow running: over 500,000 seconds in one case.

Error tasks for computer 1289

Full stderr.txt seems to be viewable by clicking the task links.

I've also noticed recently that there seems to be a significant likelihood that if you shut down BOINC while a Qsqrt task is running, and then restart it, it reaches a 'success' completion a couple of seconds later - even though it didn't previously appear to be anywhere near finishing. Restarting BOINC partway through the run is, of course, when the checkpoint file is read back in.

Matthias Lehmkuhl
Joined: 10 Jan 12
Posts: 8
Credit: 2,141,406
RAC: 3,303
Message 1316 - Posted: 26 Jun 2015, 6:41:29 UTC - in response to Message 1314.  

Hi Eric,
stderr.txt is on the way.

I had 4 results that seemed to want to run "forever".
Matthias

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1317 - Posted: 26 Jun 2015, 7:41:16 UTC - in response to Message 1316.  

Ok. Thanks! You can abort those jobs - no need to waste computer resources on them.

I will post back once I learn what's going on.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 1318 - Posted: 26 Jun 2015, 9:32:23 UTC - in response to Message 1317.  

It appears that only the 64-bit Windows versions had the problem. I believe it's fixed now, and I have uploaded the new binaries (both GD and GBD).

Please let me know if you continue to see the erratic behavior. You may need to abort your current work queue in order to pick up the new apps.

Richard Haselgrove
Joined: 28 Oct 11
Posts: 179
Credit: 220,353,862
RAC: 127,899
Message 1319 - Posted: 26 Jun 2015, 10:03:08 UTC - in response to Message 1318.  

I see I've picked up a new v2.05 app already. I'll abort unstarted v2.01 tasks, but let the ones which are already running complete without interruption.

Copyright © 2024 Arizona State University