Long running wu_Qsqrt421_DS1x5 units - how long to let them run?

raddoc

Joined: 17 Dec 15
Posts: 6
Credit: 9,414,792
RAC: 3
Message 1695 - Posted: 20 Apr 2016, 15:18:39 UTC

Using application 2.08 (Windows x86_64)

wu Qsqrt421_D53x8_CV1_S815_N2 then

a. 194161_N1_805973_k2_-2_1 and
b. 194163_N1_805977_k2_-1_1

Remaining time listed as 00:00:00 for both and 100% complete

a shows CPU time last chkpt 3d 17:28:51
CPU time 4d 07:39:02
Elapsed time 5d 18:34:51

b shows CPU time last chkpt 3d 01:46:36
CPU time 3d 14:23:49
Elapsed time 4d 20:56:25

Are they finished but continue to run?
Should I abort them?
I have a 3rd related wu running on the same machine, for which the CPU time at last chkpt and the total CPU time are close, with 3+ days elapsed, 2+ days remaining, and 57+ percent complete.
Aurel
Joined: 25 Feb 13
Posts: 216
Credit: 9,899,302
RAC: 0
Message 1696 - Posted: 20 Apr 2016, 16:18:04 UTC - in response to Message 1695.  
Last modified: 20 Apr 2016, 16:18:29 UTC

I have seen the progress meter go to 100.000% and the WU still continues processing for another few hours. I believe what is happening is that the progress is really 99.9995% and the client is rounding it up to 100. No need to worry that it's stuck; the WU will eventually finish.
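The rounding effect suggested above can be illustrated with a minimal sketch. It assumes a display with three decimal places; `format_progress` is a hypothetical helper written for this illustration, not code from the BOINC client.

```python
def format_progress(fraction_done: float, decimals: int = 3) -> str:
    """Format a completion fraction (0.0 to 1.0) as a percentage string."""
    return f"{fraction_done * 100:.{decimals}f}%"


# A task that is not quite finished can still be displayed as fully complete,
# because the formatter rounds to the requested number of decimals.
almost_done = 0.9999996
print(format_progress(almost_done))   # displayed as 100.000%
print(almost_done < 1.0)              # True: the task is not actually finished
print(format_progress(0.99999))       # displayed as 99.999%
```

Whether this is what actually happens for a given WU depends on how the client obtains its progress value, so treat it only as one possible explanation.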

Just wait and drink tea. :)
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,816,478
RAC: 288,766
Message 1697 - Posted: 20 Apr 2016, 17:35:12 UTC - in response to Message 1695.  

It was discovered that there was a bug in the code that reports the progress to the client (it always reports 0), so BOINC is doing its own estimate, which is wrong.

Other than that, there is nothing wrong with the WU and it will eventually finish, but it could be another several days (so I hear from others; I personally have only waited a few more hours).

Version 2.10 of the app fixes the progress meter problem.
raddoc
Joined: 17 Dec 15
Posts: 6
Credit: 9,414,792
RAC: 3
Message 1698 - Posted: 21 Apr 2016, 3:47:51 UTC

Still going - drinking tea, etc. - I'll decide tomorrow . . .
raddoc
Joined: 17 Dec 15
Posts: 6
Credit: 9,414,792
RAC: 3
Message 1699 - Posted: 21 Apr 2016, 13:40:56 UTC

All 3 wus are still going

The first 2 still have remaining time listed as 00:00:00 for both and 100% complete

a) shows CPU time last chkpt 4d 14:28 (up from 3d 17:28:51)
CPU time 5d 01:21 (up from 4d 07:39:02)
Elapsed time 6d 16:58 (up from 5d 18:34:51)

b) shows CPU time last chkpt 4d 05:21 (up from 3d 01:46:36)
CPU time 4d 08:12 (up from 3d 14:23:49)
Elapsed time 5d 19:19 (up from 4d 20:56:25)

I assume this indicates progress, or should I abort them?
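The comparison above can be made mechanical. This is a minimal sketch, assuming the times are copied by hand from the task properties in the `Nd HH:MM` or `Nd HH:MM:SS` form used in this thread; `parse_duration` and `checkpoint_advanced` are hypothetical helpers, not part of BOINC.

```python
import re


def parse_duration(text: str) -> int:
    """Convert 'Nd HH:MM' or 'Nd HH:MM:SS' into a number of seconds."""
    match = re.fullmatch(r"(\d+)d (\d{2}):(\d{2})(?::(\d{2}))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # The seconds group is optional, so a missing group counts as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def checkpoint_advanced(previous: str, current: str) -> bool:
    """True if the CPU time at last checkpoint grew between two readings."""
    return parse_duration(current) > parse_duration(previous)


# "CPU time last chkpt" readings for tasks a) and b), taken from the posts above.
readings = {
    "a": ("3d 17:28:51", "4d 14:28"),
    "b": ("3d 01:46:36", "4d 05:21"),
}

for task, (previous, current) in readings.items():
    gained_hours = (parse_duration(current) - parse_duration(previous)) / 3600
    status = "advanced" if checkpoint_advanced(previous, current) else "did not advance"
    print(f"task {task}: checkpoint {status} ({gained_hours:.1f} h of CPU time)")
```

A checkpoint time that keeps growing only shows that the task is still writing checkpoints; it says nothing about how much work remains.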

The 3rd related wu running on the same machine has made demonstrable progress: the CPU time at last chkpt and the total CPU time remain close and are now 3d 08:48 & 3d 10:20, with 4+ days elapsed, 1+ days remaining, and 71.5% complete (previously 57+ percent).
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,816,478
RAC: 288,766
Message 1700 - Posted: 21 Apr 2016, 14:41:23 UTC - in response to Message 1699.  

Since they are making progress I would let them continue. I promise you they will finish.
Dj Ninja
Joined: 23 Feb 13
Posts: 29
Credit: 21,480,710
RAC: 0
Message 1701 - Posted: 22 Apr 2016, 12:46:06 UTC

Besides the PrimeGrid race, I have two of these WUs left: one of them "stuck" at 100% and the other holding at 57% for several days. I will continue both of them; the program runs quite stably (excluding the PARI crashes) and I've done many of these WUs in the past. All of them have been granted credit.
raddoc
Joined: 17 Dec 15
Posts: 6
Credit: 9,414,792
RAC: 3
Message 1702 - Posted: 22 Apr 2016, 16:24:31 UTC

All 3 wus are still going

The first 2 still have remaining time listed as 00:00:00 for both and 100% complete

a) shows CPU time last chkpt 5d 11:30 (up from 4d 14:28 & 3d 17:28:51)
CPU time 5d 23:06 (up from 5d 01:21 & 4d 07:39:02)
Elapsed time 7d 19:46 (up from 6d 16:58 & 5d 18:34:51)

b) shows CPU time last chkpt 4d 18:56 (up from 4d 05:21 & 3d 01:46:36)
CPU time 5d 05:50 (up from 4d 08:12 & 3d 14:23:49)
Elapsed time 6d 22:06 (up from 5d 19:19 & 4d 20:56:25)

I assume this indicates enough progress to continue.

The 3rd related wu running on the same machine is gone (I assume completed)
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,816,478
RAC: 288,766
Message 1703 - Posted: 22 Apr 2016, 17:28:06 UTC - in response to Message 1702.  


> The 3rd related wu running on the same machine is gone (I assume completed)


It looks like you completed a long one yesterday. I wonder if that's it.
http://numberfields.asu.edu/NumberFields/results.php?hostid=26919
raddoc
Joined: 17 Dec 15
Posts: 6
Credit: 9,414,792
RAC: 3
Message 1705 - Posted: 23 Apr 2016, 16:53:34 UTC

wu 15560839 is still running; it is the one I listed as "a" in previous posts.

CPU time last chkpt 6d 08:51 from 5d 11:30 (up from 4d 14:28 & 3d 17:28:51)
CPU time 6d 19:41 from 5d 23:06 (up from 5d 01:21 & 4d 07:39:02)
Elapsed time 8d 20:00 from 7d 19:46 (up from 6d 16:58 & 5d 18:34:51)

This is truly long . . . but still running

15565332 and 15161473 both finished on 22 April and appear to be credited appropriately, thanks.

How do I upgrade the app to 2.10 - just wait until this wu finishes and then reset project or remove and then add project?
Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,816,478
RAC: 288,766
Message 1706 - Posted: 23 Apr 2016, 19:31:14 UTC - in response to Message 1705.  


> How do I upgrade the app to 2.10 - just wait until this wu finishes and then reset project or remove and then add project?


The client picks it up automatically. Looking at your task list, it looks like your more recent jobs have been using it, so you should be good.
raddoc
Joined: 17 Dec 15
Posts: 6
Credit: 9,414,792
RAC: 3
Message 1707 - Posted: 24 Apr 2016, 13:34:48 UTC

At last . . .

It finished with a CPU time of 624,877.40 sec and a total (elapsed) time of 805,908.80 sec, i.e. 173.58 hr and 223.86 hr, or 7.23 d and 9.33 d. I noticed that on most work units the total time is closer to the CPU time, and credit seems to be based on the CPU time.
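The conversion from seconds can be checked with a short script. This is a minimal sketch using only the two figures quoted in this post; the ratio of CPU time to elapsed time is an extra illustration, not a value reported by the client.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def to_hours(seconds: float) -> float:
    """Convert seconds to hours, rounded to two decimals."""
    return round(seconds / SECONDS_PER_HOUR, 2)


def to_days(seconds: float) -> float:
    """Convert seconds to days, rounded to two decimals."""
    return round(seconds / SECONDS_PER_DAY, 2)


cpu_seconds = 624_877.40      # CPU time reported for the finished task
elapsed_seconds = 805_908.80  # total (elapsed) time reported for the same task

print(f"CPU time:     {to_hours(cpu_seconds)} hr = {to_days(cpu_seconds)} d")
print(f"Elapsed time: {to_hours(elapsed_seconds)} hr = {to_days(elapsed_seconds)} d")

# Fraction of the elapsed time during which the task was actually using the CPU.
cpu_fraction = cpu_seconds / elapsed_seconds
print(f"CPU time / elapsed time: {cpu_fraction:.1%}")
```

A ratio well below 100% means the task spent part of its elapsed time waiting, which fits the observation that the gap between the two times was larger than on most work units.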

But I am glad that it did go to completion!


Copyright © 2024 Arizona State University