Some work restarts from zero after a short "pause"

Aurel
Joined: 25 Feb 13
Posts: 211
Credit: 8,816,142
RAC: 82
Message 1904 - Posted: 25 Nov 2017, 3:13:48 UTC
Last modified: 25 Nov 2017, 3:18:06 UTC

I paused some sf3 work (3 WUs), and after resuming they started again from 0%, as if they had ignored their checkpoint. (See the stderr output below: is checkpointing disabled?)
Those units had already run for more than 1 hour, so I decided to abort them in order to switch to another project for a while. The runtime also seems to have been reset.

Why does it start the BOINC runtime debugger? There is no need, because the exit status is "203 (0x000000CB) EXIT_ABORTED_VIA_GUI".


aborted by user
</message>
<stderr_txt>
Checkpoint Flag = 0.
Cvec Starting Index = 0.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 944
Credit: 102,326,990
RAC: 68,114
Message 1905 - Posted: 25 Nov 2017, 16:13:38 UTC - in response to Message 1904.  

It's possible that it never got to the point in the code where it checkpoints.
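
To illustrate how a task can lose more than an hour of progress this way, here is a minimal Python sketch of periodic checkpointing. It is not the project's actual code: the function names, the JSON file format, and the `checkpoint_every` interval are all assumptions made for illustration. Work done after the last saved checkpoint is repeated on restart, and if the first checkpoint was never reached, the task starts again from zero.

```python
import json
import os


def load_checkpoint(path):
    """Return the saved step index, or 0 if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except (FileNotFoundError, ValueError, KeyError):
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint via a temp file so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Process steps starting from the last checkpoint.

    stop_after (hypothetical parameter) simulates a pause or abort after that
    many steps. Returns the list of step indices processed during this call.
    """
    start = load_checkpoint(path)
    done = []
    for i in range(start, total_steps):
        if stop_after is not None and len(done) >= stop_after:
            break  # simulated pause: work since the last checkpoint is lost
        done.append(i)  # stand-in for the real computation
        if (i + 1) % checkpoint_every == 0:
            save_checkpoint(path, i + 1)
    return done
```

With `checkpoint_every=4`, a run that is stopped after 3 steps leaves no checkpoint behind, so the next run begins at step 0 again. A run stopped after 6 steps resumes at step 4, not step 6.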
Aurel
Message 1906 - Posted: 25 Nov 2017, 21:29:32 UTC - in response to Message 1905.  

Eric Driver wrote: "It's possible that it never got to the point in the code where it checkpoints."


Even after 1.5 hours? Hmm.

https://numberfields.asu.edu/NumberFields/result.php?resultid=21141746
It was at N2=16 when it decided to restart the WU. Maybe it's because my BOINC manager is not up to date...
Eric Driver
Message 1907 - Posted: 26 Nov 2017, 5:01:22 UTC - in response to Message 1906.  

Yeah, it looks like there may have been a problem with your system or the BOINC manager. The next computer that picked up that WU finished it successfully, and its stderr shows checkpointing working as expected.
Aurel
Message 1908 - Posted: 27 Nov 2017, 23:13:21 UTC - in response to Message 1907.  

Eric Driver wrote: "Yeah, it looks like there may have been a problem with your system or the BOINC manager. The next computer that picked up that WU finished it successfully and the stderr shows checkpointing working as expected."


Yes. I have now upgraded the manager, after I got the same problem on Amicable Numbers.
It should work now; I'll be back in a few days.
Aurel
Message 1909 - Posted: 30 Nov 2017, 0:08:01 UTC

Works fine now. :)

Sadly, this comeback will be very short, since I have my own subproject running on yoyo@home.

Copyright © 2019 Arizona State University