abort?

Message boards : Number crunching : abort?
Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,450,603
RAC: 0
Message 478 - Posted: 10 Mar 2012, 5:02:09 UTC

I have some tasks on a slow computer.
Do I abort the tasks?
They have been running for 5 days and are about 50% done, with the deadline already reached.

On a side note: shouldn't the project or BOINC have prevented this?
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 480 - Posted: 10 Mar 2012, 7:40:04 UTC - in response to Message 478.  

> I have some tasks on a slow computer.
> Do I abort the tasks?
> They have been running for 5 days and are about 50% done, with the deadline already reached.
>
> On a side note: shouldn't the project or BOINC have prevented this?


I guess it depends on how slow it is and how far past the deadline it is. If it just passed the deadline, then you will have 7 more days to finish it. If it's going to be close, let me know and I will increase the grace period (I was considering doing this anyway).

As for whether BOINC could have prevented this, I'm not sure. My feeling is that it depends on a lot of factors, like which projects you're running, their resource shares, your preferences, etc. Maybe someone else can explain that better.
Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,450,603
RAC: 0
Message 483 - Posted: 10 Mar 2012, 13:34:20 UTC

The tasks have been running for about 168 hours and have reached about 57%.
They might need another 7 days or so.
(Processor: Atom D510, Ubuntu 64-bit)
frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 484 - Posted: 10 Mar 2012, 15:26:40 UTC - in response to Message 483.  

> The tasks have been running for about 168 hours and have reached about 57%.
> They might need another 7 days or so.
> (Processor: Atom D510, Ubuntu 64-bit)

that's a really slow host, and to make it worse you probably did not disable HT (hyper-threading).

way too slow for running bound decics with the current deadline.

you might be able to save 2 of them by suspending everything else, but that's just a guess..
Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,450,603
RAC: 0
Message 485 - Posted: 10 Mar 2012, 16:04:48 UTC

Thanks, frankhagen.
It runs 4 tasks at the same time.
However, for what it's worth, it runs completely and directly on solar power.
It only needs 15W to run, so in terms of energy efficiency, it is very good.
Total computing power is not exactly great.
frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 486 - Posted: 10 Mar 2012, 16:32:30 UTC - in response to Message 485.  

> It runs 4 tasks at the same time.

if you want to finish 2 of them, kill the other 2 and set boinc not to request new work. as i said, they will definitely run a lot faster, but whether they'll finish in a week is hard to tell.

> However, for what it's worth, it runs completely and directly on solar power. It only needs 15W to run, so in terms of energy efficiency, it is very good. Total computing power is not exactly great.

i definitely did not want to criticize you - it's just too tough a job for that host, at least as long as the default deadline is not extended.
but you should be able to run the other app - "Get Bounded Decics".

i'd limit boinc to use only 50% of processors on that host anyway...
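
The 50% limit suggested above can also be set outside the BOINC Manager through a preferences override file. A minimal sketch, assuming the client reads `global_prefs_override.xml` from its data directory and honours the `max_ncpus_pct` element; the helper name `build_override` is made up for illustration, and the location of the data directory varies by installation:

```python
import xml.etree.ElementTree as ET


def build_override(max_cpu_percent: float) -> str:
    """Return the text of a preferences override that limits CPU usage.

    Hypothetical helper. The element names are assumed to follow the
    format of BOINC's global_prefs_override.xml.
    """
    if not 0 < max_cpu_percent <= 100:
        raise ValueError("percentage must be in (0, 100]")
    root = ET.Element("global_preferences")
    # Corresponds to "use at most N% of the processors".
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_cpu_percent:.1f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_override(50))
```

On a host that reports 4 logical processors (as this one appears to, since it runs 4 tasks at once), a 50% limit should leave two tasks running. The output would be saved as `global_prefs_override.xml`, after which the client can be told to re-read it with `boinccmd --read_global_prefs_override`.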


Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 489 - Posted: 10 Mar 2012, 18:34:42 UTC - in response to Message 486.  

I am not familiar with the Atom D510, but it sounds like frank knows what he's talking about. I will increase the grace period by another week just for good measure.
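
Whether a late task is still worth running comes down to comparing its estimated finish time with the deadline plus the grace period. A rough sketch of that check, assuming results are accepted until deadline plus grace; the function name is hypothetical, the deadline is the one Rasputin42 reports elsewhere in this thread (time zone assumed), and the 14-day figure is one reading of "another week" on top of the original 7 days:

```python
from datetime import datetime, timedelta


def can_finish_in_time(now, hours_remaining, deadline, grace_days):
    """True if the estimated finish falls within deadline + grace period."""
    estimated_finish = now + timedelta(hours=hours_remaining)
    cutoff = deadline + timedelta(days=grace_days)
    return estimated_finish <= cutoff


if __name__ == "__main__":
    deadline = datetime(2012, 3, 9, 23, 0)  # "mar-9 11 pm"; time zone assumed
    now = datetime(2012, 3, 10, 13, 34)     # when "another 7 days" was estimated
    for grace_days in (7, 14):
        ok = can_finish_in_time(now, 7 * 24, deadline, grace_days)
        print(f"grace of {grace_days} days: finishes in time = {ok}")
```

With these numbers, a 7-day grace period would be missed by less than a day, while a 14-day one leaves almost a week of slack.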
frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 491 - Posted: 10 Mar 2012, 19:13:49 UTC - in response to Message 489.  

> I am not familiar with the Atom D510, but it sounds like frank knows what he's talking about.

i've got several of those things (even slower ones). i am not using them on boinc, but i've got a pretty good idea how they perform.

> I will increase the grace period by another week just for good measure.


that's exactly the right measure to take!


frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 519 - Posted: 14 Mar 2012, 17:31:44 UTC - in response to Message 491.  

just checked the server status. it's showing runtimes of up to 400h now for get decics.
you will not find a lot of guys out here who will dare to run such things.

if there is any chance to split the jobs so they take only a few hours, you'd probably be able to speed up progress.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 521 - Posted: 14 Mar 2012, 18:32:16 UTC - in response to Message 519.  

> just checked the server status. it's showing runtimes of up to 400h now for get decics.
> you will not find a lot of guys out here who will dare to run such things.
>
> if there is any chance to split the jobs so they take only a few hours, you'd probably be able to speed up progress.

I haven't checked, but I am sure that was from a slow host. Did you notice that the min was 29 hours, and the average was 66 hours (even with that 400 hour outlier averaged in)?

Actually, progress has improved substantially from where it was a week ago (due to extra credits). If it keeps up at this rate, it should complete the current batch within 4 weeks. But yes, you are right, faster WUs would improve user retention and result in faster progress. Rest assured, I have resolved the issue with my work generator so that future batches will have much faster WUs.
Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,450,603
RAC: 0
Message 533 - Posted: 15 Mar 2012, 10:13:28 UTC

My tasks have been running continuously for 290h (about 11 days).
The deadline was 9 March, 11 pm.
Calculating from the percentage done, it will take another 8 days to finish.
How is the remaining time calculated?
The progress of these tasks is very non-linear.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 535 - Posted: 15 Mar 2012, 14:34:50 UTC - in response to Message 533.  

> My tasks have been running continuously for 290h (about 11 days).
> The deadline was 9 March, 11 pm.
> Calculating from the percentage done, it will take another 8 days to finish.
> How is the remaining time calculated?
> The progress of these tasks is very non-linear.


A total of 19 days is still within the grace period, so you are good there.

The progress is more of a spatial progress through the search region, so yes the timing is very non-linear. Sometimes the first ~50% goes pretty fast. On my hosts, once I hit about 85% it usually finishes within about an hour. I would guess that once you hit about 65% you are about half way. That might be a better way for you to estimate the time remaining. What are the actual percentages they are at?
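
To make that estimate concrete, here is a rough sketch comparing a straight-line extrapolation with one that treats 65% progress as the halfway point in time. The piecewise-linear mapping and the function names are assumptions for illustration only; the real progress curve is not known and differs between WUs.

```python
def linear_remaining(elapsed_hours, percent_done):
    """Remaining hours if progress were proportional to run time."""
    total = elapsed_hours / (percent_done / 100.0)
    return total - elapsed_hours


def time_fraction(percent_done, knee_percent=65.0, knee_time=0.5):
    """Map progress % to the fraction of total run time already spent.

    Assumed model: the first `knee_percent` of progress takes `knee_time`
    of the run time, and the rest is spread evenly over what is left.
    """
    if percent_done <= knee_percent:
        return knee_time * percent_done / knee_percent
    rest = (percent_done - knee_percent) / (100.0 - knee_percent)
    return knee_time + (1.0 - knee_time) * rest


def adjusted_remaining(elapsed_hours, percent_done):
    """Remaining hours under the assumed 65%-is-halfway model."""
    total = elapsed_hours / time_fraction(percent_done)
    return total - elapsed_hours


if __name__ == "__main__":
    # Figures reported earlier in the thread: 168 hours at about 57%.
    print(round(linear_remaining(168, 57), 1), "hours (linear)")
    print(round(adjusted_remaining(168, 57), 1), "hours (65% = halfway)")
```

Under this model a task sitting at exactly 65% needs as long again as it has already run, which is noticeably more than the straight-line figure suggests.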

Thanks for being patient and hanging in there.
JohnMD

Joined: 21 Oct 11
Posts: 4
Credit: 1,305,128
RAC: 1,682
Message 537 - Posted: 15 Mar 2012, 19:12:26 UTC

I have 2 long-running Bounded WUs.
http://stat.la.asu.edu/NumberFields/workunit.php?wuid=781080
has been running 162 hours and has recently "timed out".
CPU time and %-complete advance continuously, but "remaining" has been stuck at 19 hours for the last couple of days.
http://stat.la.asu.edu/NumberFields/workunit.php?wuid=784564
is similar - just a day younger - and cannot get past 30 hours remaining.
Abort?
frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 538 - Posted: 15 Mar 2012, 19:49:52 UTC - in response to Message 537.  

there is always the possibility to manually grant credits even if such things happen. but that's a decision eric has to make.

since we most likely are only talking about a few tasks in that misery, this could be a solution.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 539 - Posted: 15 Mar 2012, 21:01:04 UTC - in response to Message 537.  

> I have 2 long-running Bounded WUs.
> http://stat.la.asu.edu/NumberFields/workunit.php?wuid=781080
> has been running 162 hours and has recently "timed out".
> CPU time and %-complete advance continuously, but "remaining" has been stuck at 19 hours for the last couple of days.
> http://stat.la.asu.edu/NumberFields/workunit.php?wuid=784564
> is similar - just a day younger - and cannot get past 30 hours remaining.
> Abort?


Periodically, the Bounded Decic app has a long WU. You are still well within the grace period, so it should finish and you should get credit. As long as the progress meter continues to move forward, then I wouldn't worry about it. Not sure why the remaining time should be stuck, as that usually goes down as the progress gets closer to 100%.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 540 - Posted: 15 Mar 2012, 21:04:09 UTC - in response to Message 538.  

> there is always the possibility to manually grant credits even if such things happen. but that's a decision eric has to make.
>
> since we most likely are only talking about a few tasks in that misery, this could be a solution.


I have no problem manually granting credit if it's warranted. The only problem is I am unsure how to do that. If someone knows the mechanism, please post it here.
JohnMD

Joined: 21 Oct 11
Posts: 4
Credit: 1,305,128
RAC: 1,682
Message 541 - Posted: 15 Mar 2012, 23:19:54 UTC - in response to Message 540.  
Last modified: 15 Mar 2012, 23:20:24 UTC

Thanks Eric for your interest.
781080 has been re-issued, so you're likely to get an alternative result anyway.
784564 will time out in a few hours - I guess that too will find a new home.
I'll continue for now - at least until there's feedback from one of these WUs.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 542 - Posted: 16 Mar 2012, 1:21:01 UTC - in response to Message 541.  

> Thanks Eric for your interest.
> 781080 has been re-issued, so you're likely to get an alternative result anyway.
> 784564 will time out in a few hours - I guess that too will find a new home.
> I'll continue for now - at least until there's feedback from one of these WUs.

Don't worry too much about it being reissued. Many times the reissued WU times out or gets aborted via the GUI, and the original late result becomes the canonical result. Whatever happens, you still get credit for it.
Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,450,603
RAC: 0
Message 543 - Posted: 16 Mar 2012, 22:03:46 UTC

I have two tasks at about 64.3%, with a run time so far of 321h (or 13.3 days).
If that is about the halfway point, they will run for 26 days total!
Even for a slow(ish) computer, these kinds of times are way, way too long!
Another task, also running for 321h, is at 82%.

At the other extreme:
Milkyway MT CPU tasks are running for between 5 and 40 seconds each.
The download and upload take longer than that, which is very inefficient.
That is ridiculous as well.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,762,198
RAC: 288,280
Message 544 - Posted: 17 Mar 2012, 7:11:59 UTC - in response to Message 543.  

> I have two tasks at about 64.3%, with a run time so far of 321h (or 13.3 days).
> If that is about the halfway point, they will run for 26 days total!
> Even for a slow(ish) computer, these kinds of times are way, way too long!
> Another task, also running for 321h, is at 82%.
>
> At the other extreme:
> Milkyway MT CPU tasks are running for between 5 and 40 seconds each.
> The download and upload take longer than that, which is very inefficient.
> That is ridiculous as well.

Yes, those times are quite obscene. It's probably best to not run the Get Decics app on this host (until the next batch is released). I would let the current WUs finish, so you can at least get credit for them.

I'm wondering, is the app being interrupted a lot? Because if so, it restarts from the last checkpoint. It should checkpoint every 5 minutes, but on a slower host it may checkpoint less often (since checkpointing occurs at a specific point in the loop, and it may take more than 5 minutes to get to that point). Note that every time the progress % goes up is an opportunity to checkpoint.
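
The effect of checkpointing only at a fixed point in the loop can be illustrated with a small simulation. This is a sketch, not the app's actual code: the function names and the per-iteration durations are made up, and it only assumes that a checkpoint is written at the end of an iteration once at least 5 minutes have passed since the previous one.

```python
def simulate_checkpoints(iteration_seconds, interval=300.0):
    """Return the times (in seconds) at which checkpoints get written.

    A checkpoint is only possible at the end of a loop iteration, and
    only once `interval` seconds have passed since the last one.
    """
    now = 0.0
    last_checkpoint = 0.0
    checkpoint_times = []
    for duration in iteration_seconds:
        now += duration  # the iteration runs to completion first
        if now - last_checkpoint >= interval:
            checkpoint_times.append(now)
            last_checkpoint = now
    return checkpoint_times


def work_lost(checkpoint_times, interrupted_at):
    """Seconds of work redone if the task is interrupted at a given time."""
    earlier = [t for t in checkpoint_times if t <= interrupted_at]
    return interrupted_at - max(earlier, default=0.0)


if __name__ == "__main__":
    fast_host = simulate_checkpoints([60.0] * 30)   # 1-minute iterations
    slow_host = simulate_checkpoints([900.0] * 4)   # 15-minute iterations
    print("fast host loses", work_lost(fast_host, 1700.0), "seconds")
    print("slow host loses", work_lost(slow_host, 1700.0), "seconds")
```

The longer each iteration takes, the more work every interruption throws away, so a host that is both slow and frequently interrupted pays twice.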

Copyright © 2024 Arizona State University