Longer deadline possible?

Message boards : Number crunching : Longer deadline possible?

ChertseyAl

Joined: 19 Aug 11
Posts: 45
Credit: 1,014,069
RAC: 0
Message 75 - Posted: 29 Aug 2011, 17:45:13 UTC

Normally I find a 3-day deadline fine for WUs from other projects, but the sometimes very long run times mean that I can't complete a one day cache within 3 days most times and have to abort a lot of WUs that won't finish in time. Can't afford to risk letting more than one WU run at the moment as I 'lost' a 42 hour WU that got reported 11 minutes late :(

If you need to keep to 3 days that's fine, I'll just have to keep aborting work, but 5 or 7 days would be helpful as I'd be more confident in letting more than one WU run as they'd have a good chance of completing :)

Cheers,

Al.
ID: 75 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 543,090,930
RAC: 610,516
Message 76 - Posted: 30 Aug 2011, 0:17:58 UTC - in response to Message 75.  

Sorry about that, especially since you lost the credit. I also noticed some hosts download a ton of work units, which they have no chance of finishing before the deadline. I will revisit the config settings when I get home tonight, and hopefully resolve some of these issues.
ID: 76 · Rating: 0
reklov

Joined: 21 Aug 11
Posts: 2
Credit: 1,011,939
RAC: 0
Message 77 - Posted: 30 Aug 2011, 5:50:35 UTC - in response to Message 76.  

I assume the strongly varying length of the WUs is responsible for the big downloads. On my CPU I had WUs with runtimes of less than one minute and others of more than 20 hours. Maybe you could see to it that the runtimes are more homogeneous?
ID: 77 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 543,090,930
RAC: 610,516
Message 78 - Posted: 30 Aug 2011, 7:22:01 UTC - in response to Message 77.  

I've been thinking about how to make the runtimes more consistent. It's not a trivial problem. I break up the search space into chunks of equal size. This is the easiest method, but unfortunately each region goes through a series of tests which lead to vastly different calculations, hence the huge discrepancy in run times.

With that said, I believe it is possible to determine in advance which regions will require more time, and then break those regions into smaller pieces. But it will require some time to develop and then test. I plan to look into this after I take care of a few other issues. Other than the occasional long run time, the app appears to be relatively stable.
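
A rough sketch of the splitting idea described above: estimate the cost of each region up front and keep bisecting any region whose estimate is over budget. Everything here is a hypothetical illustration, not the project's code: the one-dimensional `(lo, hi)` regions, the `estimate_cost` callback, the `toy_cost` model, and the budget value are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Half-open interval [lo, hi); a hypothetical stand-in for a search region.
Region = Tuple[int, int]


def split_by_cost(
    region: Region,
    estimate_cost: Callable[[Region], float],
    budget: float,
) -> List[Region]:
    """Recursively bisect a region until each piece's estimated cost fits the budget."""
    lo, hi = region
    # Stop when the piece is cheap enough or cannot be split any further.
    if estimate_cost(region) <= budget or hi - lo <= 1:
        return [region]
    mid = (lo + hi) // 2
    return (
        split_by_cost((lo, mid), estimate_cost, budget)
        + split_by_cost((mid, hi), estimate_cost, budget)
    )


def toy_cost(region: Region) -> float:
    """Made-up cost model: points below 100 cost 50 units each, all others cost 1."""
    lo, hi = region
    expensive = max(0, min(hi, 100) - lo)
    cheap = (hi - lo) - expensive
    return 50.0 * expensive + 1.0 * cheap


# The expensive end of the range is cut into small pieces, the cheap end stays coarse.
pieces = split_by_cost((0, 1000), toy_cost, budget=1500.0)
for piece in pieces:
    print(piece, toy_cost(piece))
```

The hard part in practice is the cost estimate itself, which this sketch simply assumes is available.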
ID: 78 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 543,090,930
RAC: 610,516
Message 79 - Posted: 30 Aug 2011, 7:31:50 UTC - in response to Message 75.  

Hi Al,

Just to let you know, I increased the grace period by a few days, so hopefully you won't have to abort any more. I also modified the settings so it should stop downloading an excessive number of WUs (not sure if you had that problem, but I did on my PC).

Eric
ID: 79 · Rating: 0
ChertseyAl

Joined: 19 Aug 11
Posts: 45
Credit: 1,014,069
RAC: 0
Message 86 - Posted: 30 Aug 2011, 17:37:21 UTC - in response to Message 79.  
Last modified: 30 Aug 2011, 17:52:18 UTC

Thanks Eric.

I guess in general I get more work than I could crunch in time, but that's because I seem to get long running WUs - Occasionally I get a batch of short ones which I can complete on time :)

The 'short' deadlines and long predicted runtimes force the project into panic mode and stop any other projects from running, which is why I only let one at a time run to give the other core a chance to do something else.

Al.

Edit

p.s. I really like crunching maths projects, it's just that this one takes a fair bit of 'babysitting' at the moment. I generally don't like having to abort WUs (although it makes no difference to the project as they get resent anyway), but every time I abort a WU a kitten dies (I made that part up). Be nice if the credit granting settled down too - It's all over the place at the moment :)

/Edit
ID: 86 · Rating: 0
Dagorath
Joined: 2 Sep 11
Posts: 57
Credit: 1,274,345
RAC: 0
Message 101 - Posted: 2 Sep 2011, 9:11:03 UTC - in response to Message 86.  

Thanks Eric.

I guess in general I get more work than I could crunch in time, but that's because I seem to get long running WUs - Occasionally I get a batch of short ones which I can complete on time :)


Nah, you get more work than you can complete on time because your cache is too big. If you have a reliable 24/7 connection then there is no reason to set "Connect about every..." to more than 0 and "Additional cache..." to more than 0.1.

The 'short' deadlines and long predicted runtimes force the project into panic mode and stop any other projects from running, which is why I only let one at a time run to give the other core a chance to do something else.


Keep a small cache and don't worry about panic mode. They may go into panic mode for a while but eventually that stops and then your other projects get their time too. The more you micro-manage it the worse you make the problem. If you like baby sitting and aborting and suspending then keep a large cache and fill your boots with the baby sitting. If you want to make it easy on yourself then keep a small cache and let BOINC do its thing. Been doing that for years and it works just fine.
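
A back-of-the-envelope sketch of why cache size matters when runtime estimates are off. This is not BOINC's actual work-fetch logic, just a simplified model: the helper names are made up, every task is assumed to take the same time, and the 2-hour estimate and 20-hour actual runtime are illustrative numbers (the thread reports runtimes anywhere from under a minute to 20+ hours).

```python
import math


def tasks_fetched(cache_days: float, est_hours_per_task: float, cores: int) -> int:
    """Tasks needed to fill the cache if every task ran for its estimated time."""
    return math.ceil(cache_days * 24 * cores / est_hours_per_task)


def hours_to_finish(n_tasks: int, actual_hours_per_task: float, cores: int) -> float:
    """Wall-clock hours to clear the queue with all cores running tasks back to back."""
    rounds = math.ceil(n_tasks / cores)
    return rounds * actual_hours_per_task


DEADLINE_HOURS = 3 * 24  # the 3-day deadline discussed in this thread

for cache_days in (1.0, 0.1):
    n = tasks_fetched(cache_days, est_hours_per_task=2.0, cores=2)
    needed = hours_to_finish(n, actual_hours_per_task=20.0, cores=2)
    verdict = "fits" if needed <= DEADLINE_HOURS else "misses the deadline"
    print(f"cache={cache_days} days: {n} tasks, {needed} h needed, {verdict}")
```

Under these assumptions a 1-day cache pulls in far more work than can be finished in 3 days once the tasks turn out to be long, while a 0.1-day cache stays within the deadline.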

BOINC FAQ Service
Official BOINC wiki
Installing BOINC on Linux
ID: 101 · Rating: 0
ChertseyAl

Joined: 19 Aug 11
Posts: 45
Credit: 1,014,069
RAC: 0
Message 103 - Posted: 2 Sep 2011, 17:52:37 UTC - in response to Message 101.  


Nah, you get more work than you can complete on time because your cache is too big.


Actually it's only 1 day, and mostly I'm NNW on all projects.


If you have a reliable 24/7 connection


I'm in the UK - Go figure. To be fair, this cable connection is actually very good on speed and uptime :) Way too expensive for what it is though :(

If you want to make it easy on yourself then keep a small cache and let BOINC do its thing. Been doing that for years and it works just fine.


OK. I will!

Actually, I don't babysit and micromanage - Probably only look at the machines twice a day - Don't have the time. Just like to keep an eye out for WUs that won't complete on time or projects with faulty WUs that I'd be shredding :)

Al.
ID: 103 · Rating: 0
Patrick Harnett*

Joined: 21 Aug 11
Posts: 1
Credit: 1,001,649
RAC: 0
Message 104 - Posted: 2 Sep 2011, 20:25:08 UTC

+1 vote on this. I was looking at my tasks and have two with very long run times invalidated (a 409,000 second and a 423,000 second (117 hours) task). Both were running on an i7 CPU.
ID: 104 · Rating: 0
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 543,090,930
RAC: 610,516
Message 108 - Posted: 3 Sep 2011, 0:36:23 UTC - in response to Message 104.  

+1 vote on this. I was looking at my tasks and have two with very long run times invalidated (a 409,000 second and a 423,000 second (117 hours) task). Both were running on an i7 CPU.


They should never be that long. The most I have seen on my computer is 20 hours, but on occasion I have seen some results returned that were 30+ hours.

I have an idea what the problem might be, because I did see some very long run times on my Windows computer about a month ago. The problem was that Windows or some other process (e.g. virus scanner) would do something periodically and cause the app to stop before it could checkpoint (space between checkpoints can sometimes be ~1 hour). I fixed this by setting "While processor usage is less than" to 0 in the preferences. You may not want to change this though, as it may slow your computer down. The real fix is for the app to checkpoint more often; the code change is done but I need to test it before I promote the app.

And I am sorry your results were invalidated. I know how frustrating that can be.
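
A generic sketch of the checkpointing pattern discussed above: save progress at a regular interval so that a stopped task resumes from the last save instead of redoing everything since the previous checkpoint. This does not use the BOINC API and is not the project's code; `run_with_checkpoints`, `save_state`, the JSON state file, and the `stop_after` interruption switch are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile
import time


def save_state(state, state_path):
    """Write to a temp file and rename, so a crash mid-write cannot corrupt the checkpoint."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)


def run_with_checkpoints(n_steps, state_path, checkpoint_every_s=60.0, stop_after=None):
    """Sum 0..n_steps-1, resuming from the last checkpoint if one exists.

    stop_after simulates the process being stopped after that many steps in this run.
    """
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "total": 0}

    last_save = time.monotonic()
    steps_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return None  # interrupted: anything since the last checkpoint is lost
        state["total"] += state["next_step"]  # stand-in for the real computation
        state["next_step"] += 1
        steps_this_run += 1
        # A shorter interval means less work is lost on each interruption.
        if time.monotonic() - last_save >= checkpoint_every_s:
            save_state(state, state_path)
            last_save = time.monotonic()
    save_state(state, state_path)
    return state["total"]


state_file = os.path.join(tempfile.mkdtemp(), "state.json")
# First run is interrupted after 500 steps; an interval of 0 checkpoints after every step.
run_with_checkpoints(1000, state_file, checkpoint_every_s=0.0, stop_after=500)
# Second run picks up at step 500 rather than starting from zero.
result = run_with_checkpoints(1000, state_file, checkpoint_every_s=0.0)
print(result)
```

With a long interval and frequent interruptions, each restart repeats up to one full interval of work, which is one way run times can balloon far beyond the normal range.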
ID: 108 · Rating: 0



Copyright © 2024 Arizona State University