Extra long WUs

Message boards : News : Extra long WUs

Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1385 - Posted: 10 Dec 2015, 4:20:33 UTC

As some of you may be aware from the message boards, there are a small number of extra long WUs (about 6 per data set). These will eventually be fixed in the WU generator, but until then I am manually deleting them when I come across them.

If you happen to be crunching on one of these when I pull the plug, please let me know and I will manually grant you credit for the lost work.

I can't think of a better way to handle these troublesome WUs. They are like a hot potato that keeps getting passed around from one user to another. Most results fail with "no reply", others are "aborted via the GUI" (probably because users think they are stuck). With a 10 day grace period and 8 failures before the server cancels the WU, it takes a large amount of time before these WUs naturally disappear.
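
The timing in the paragraph above can be checked with a little arithmetic. This is only a rough sketch: it assumes each attempt runs one after another and uses its full 10 days before the server reissues the WU, which is a simplification and not a description of the actual server logic. The function name is made up for illustration. Under those assumptions, a troublesome WU can linger for up to roughly 80 days.

```python
def worst_case_lifetime_days(days_per_attempt: float, max_failures: int) -> float:
    """Upper bound on how long a WU lingers if every attempt times out.

    Assumption (not from the server code): attempts happen strictly one
    after another, and each uses its full allowance before being reissued.
    """
    return days_per_attempt * max_failures


# Figures quoted in the post: 10 days per attempt, 8 failures before cancellation.
print(worst_case_lifetime_days(10, 8))
```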

Note that these long WUs will finish normally within 2 or 3 days, so it doesn't hurt to let them continue if you think you may have one. It takes a history of "no replies" before I become aware of it, so the first few users that get a long one are safe from having it cancelled.
ID: 1385 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Matt Kowal
Joined: 12 Oct 13
Posts: 9
Credit: 13,629,549
RAC: 42,104
Message 1386 - Posted: 13 Dec 2015, 5:34:19 UTC - in response to Message 1385.  

I have no issue with long WUs. I let them finish.
The project appears to be moving along swimmingly.
Cheers.
ID: 1386 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Vitaly

Joined: 5 Jan 13
Posts: 16
Credit: 3,484,513
RAC: 2,736
Message 1387 - Posted: 13 Dec 2015, 10:37:00 UTC - in response to Message 1385.  

It looks like I have one:
http://numberfields.asu.edu/NumberFields/workunit.php?wuid=11936859

Do you mean it will be finished in 2-3 days?

Thanks.
ID: 1387 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1388 - Posted: 13 Dec 2015, 19:22:59 UTC - in response to Message 1387.  

It looks like I have one:
http://numberfields.asu.edu/NumberFields/workunit.php?wuid=11936859

Do you mean it will be finished in 2-3 days?

Thanks.


Yes, it should finish within a few days. Obviously, that will depend on how fast your machine is. I will add it to my watch list. Thanks!
ID: 1388 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Vitaly

Joined: 5 Jan 13
Posts: 16
Credit: 3,484,513
RAC: 2,736
Message 1391 - Posted: 17 Dec 2015, 20:37:29 UTC - in response to Message 1388.  

Is it possible to extend the deadline for my task by 2-3 days?

It has taken 75 hours on my machine so far, and 91 percent has been completed.

The task got stuck twice: at 9% and at 91%.
And I am stuck again.

Thanks.
ID: 1391 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Richard Haselgrove

Joined: 28 Oct 11
Posts: 98
Credit: 75,353,157
RAC: 59,115
Message 1392 - Posted: 17 Dec 2015, 22:35:08 UTC

My wu_12E10_SF53-0_Idx9_Grp75512of81406 is in much the same position. It's been running for 8.75 days, and is due to reach even the extended deadline (grace period) at 11:45 tomorrow morning.

Like Vitaly's, it moved very slowly from 0 to ~9%, and from ~90% onwards (exact transition points not observed). But it's now reached 95.796%, and I don't think it's ever stopped completely. Barring disasters, I see no reason why it shouldn't finish here, and it would be nice to supply one of the last half-dozen for this subfield. But it would hardly be worth sending out the next replication which would be automatically generated tomorrow, only to cancel it again later.
ID: 1392 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1393 - Posted: 18 Dec 2015, 5:06:45 UTC - in response to Message 1391.  

Is it possible to extend the deadline for my task by 2-3 days?

It has taken 75 hours on my machine so far, and 91 percent has been completed.

The task got stuck twice: at 9% and at 91%.
And I am stuck again.

Thanks.


It looks like you have until Dec 22nd, so I think you should be good. But let me know if it starts to go beyond the 22nd. Thanks!
ID: 1393 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1394 - Posted: 18 Dec 2015, 5:15:32 UTC - in response to Message 1392.  

My wu_12E10_SF53-0_Idx9_Grp75512of81406 is in much the same position. It's been running for 8.75 days, and is due to reach even the extended deadline (grace period) at 11:45 tomorrow morning.

Like Vitaly's, it moved very slowly from 0 to ~9%, and from ~90% onwards (exact transition points not observed). But it's now reached 95.796%, and I don't think it's ever stopped completely. Barring disasters, I see no reason why it shouldn't finish here, and it would be nice to supply one of the last half-dozen for this subfield. But it would hardly be worth sending out the next replication which would be automatically generated tomorrow, only to cancel it again later.


I believe if I "cancel" that WU then you will end up not getting credit, and I am uncertain if the final result gets returned to the server if it's been cancelled. Since I rarely cancel WUs, I really don't know how the server behaves under these circumstances. If anyone has ideas, I am open to suggestions.
ID: 1394 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Vitaly

Joined: 5 Jan 13
Posts: 16
Credit: 3,484,513
RAC: 2,736
Message 1395 - Posted: 18 Dec 2015, 6:05:45 UTC - in response to Message 1393.  

This is strange, but my machine shows that I have to return the task by December 19:
http://prntscr.com/9fk6vf

There is a 3-day gap.
I have this problem with all NumberFields tasks.

I hope I really do have until December 22.
ID: 1395 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1396 - Posted: 18 Dec 2015, 7:20:50 UTC - in response to Message 1395.  

This is strange, but my machine shows that I have to return the task by December 19:
http://prntscr.com/9fk6vf

There is a 3-day gap.
I have this problem with all NumberFields tasks.

I hope I really do have until December 22.


Here's my link to the result:
http://numberfields.asu.edu/NumberFields/result.php?resultid=12927950

Maybe the difference is the 3 day grace period. Whatever the cause, don't worry about it. I doubt anyone will be able to get it and return it before you. And if they did, just let me know and I will manually give you credit.
ID: 1396 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Richard Haselgrove

Joined: 28 Oct 11
Posts: 98
Credit: 75,353,157
RAC: 59,115
Message 1398 - Posted: 18 Dec 2015, 12:27:01 UTC - in response to Message 1396.  

Well, mine didn't make it within the grace period, although it's making progress and seems to have speeded up this morning - now at 96.046%. But a new copy has gone out anyway - WU 11855997 - and the new holder of the hot (cold?) potato is anonymous. Strange computer - 16-CPU Mac with an ancient GPU.
ID: 1398 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Richard Haselgrove

Joined: 28 Oct 11
Posts: 98
Credit: 75,353,157
RAC: 59,115
Message 1400 - Posted: 19 Dec 2015, 13:19:48 UTC - in response to Message 1398.  

The long one is back and validated. Looking at the anonymous new wingmate, I suggest it might be safe to cancel that task - I think (s)he might have shut down for the weekend, or even the holiday - plenty of tasks in progress, and no contact since yesterday. I doubt they've even started it yet.
ID: 1400 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1401 - Posted: 20 Dec 2015, 7:10:41 UTC - in response to Message 1400.  

The long one is back and validated. Looking at the anonymous new wingmate, I suggest it might be safe to cancel that task - I think (s)he might have shut down for the weekend, or even the holiday - plenty of tasks in progress, and no contact since yesterday. I doubt they've even started it yet.


Great! Thanks Richard. It looks like it capped your credit since it was so long - I will fix that in the morning.
ID: 1401 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Vitaly

Joined: 5 Jan 13
Posts: 16
Credit: 3,484,513
RAC: 2,736
Message 1403 - Posted: 20 Dec 2015, 7:29:40 UTC - in response to Message 1401.  
Last modified: 20 Dec 2015, 7:30:02 UTC

I am curious: what will happen to that cancelled task?
Does it mean that it will not be calculated at all?

Thanks.
ID: 1403 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 685
Credit: 46,245,231
RAC: 40,836
Message 1404 - Posted: 20 Dec 2015, 16:47:52 UTC - in response to Message 1403.  

I am curious: what will happen to that cancelled task?
Does it mean that it will not be calculated at all?

Thanks.


It turns out that a little while after Richard returned it, the newer host contacted the server, at which point the server automatically cancelled that task since it hadn't been started yet. So it all worked out well.
ID: 1404 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Richard Haselgrove

Joined: 28 Oct 11
Posts: 98
Credit: 75,353,157
RAC: 59,115
Message 1405 - Posted: 20 Dec 2015, 23:25:58 UTC - in response to Message 1401.  

I will fix that in the morning.

Wow - thanks Eric. I was needing a new toaster.
ID: 1405 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jamora

Joined: 11 Sep 12
Posts: 1
Credit: 40,723,221
RAC: 0
Message 1520 - Posted: 28 Jan 2016, 18:45:48 UTC

Seems I got two of those long-running WUs.

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12420118 has currently been running for 683 hours and is at ~51%

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12314856 has been running for 775 hours and is at ~43%

I'll keep crunching, maybe they'll finish one day...
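
For a rough idea of what "one day" might mean, the figures above can be extrapolated linearly. This is only a sketch under a strong assumption: that progress advances at a constant rate, which earlier posts in this thread suggest is not the case (progress was reported as very slow below ~9% and above ~90%). The function name is hypothetical, and the result should be read as a ballpark figure at best.

```python
def linear_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimate remaining run time, assuming progress is linear in time.

    Assumption: the reported percentage grows at a constant rate, which
    may not hold for these WUs.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    estimated_total = elapsed_hours / fraction_done
    return estimated_total - elapsed_hours


# Elapsed hours and progress as reported in the post above.
for wuid, elapsed, done in [(12420118, 683, 0.51), (12314856, 775, 0.43)]:
    print(wuid, round(linear_remaining_hours(elapsed, done)), "hours left (linear guess)")
```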
ID: 1520 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



Copyright © 2017 Arizona State University