Longer deadline possible?
Joined: 19 Aug 11 Posts: 45 Credit: 1,014,069 RAC: 0
Normally I find a 3-day deadline fine for WUs from other projects, but the sometimes very long run times mean that I can't complete a one-day cache within 3 days most of the time and have to abort a lot of WUs that won't finish in time. Can't afford to risk letting more than one WU run at the moment as I 'lost' a 42-hour WU that got reported 11 minutes late :( If you need to keep to 3 days that's fine, I'll just have to keep aborting work, but 5 or 7 days would be helpful, as I'd be more confident letting more than one WU run since they'd have a good chance of completing :) Cheers, Al.
Joined: 8 Jul 11 Posts: 1346 Credit: 548,310,138 RAC: 635,376
Sorry about that, especially since you lost the credit. I also noticed some hosts download a ton of work units, which they have no chance of finishing before the deadline. I will revisit the config settings when I get home tonight, and hopefully resolve some of these issues.
Joined: 21 Aug 11 Posts: 2 Credit: 1,011,939 RAC: 0
I assume the strongly varying lengths of the WUs play a role in the big downloads. On my CPU I have had WUs with run times of less than one minute and others of more than 20 hours. Maybe you could make the run times more homogeneous?
Joined: 8 Jul 11 Posts: 1346 Credit: 548,310,138 RAC: 635,376
I've been thinking about how to make the run times more consistent. It's not a trivial problem. I break up the search space into chunks of equal size. This is the easiest method, but unfortunately each region goes through a series of tests that lead to vastly different amounts of computation, hence the huge discrepancy in run times. That said, I believe it is possible to determine in advance which regions will require more time, and then break those regions into smaller pieces. But it will take some time to develop and test. I plan to look into this after I take care of a few other issues. Other than the occasional long run time, the app appears to be relatively stable.
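The idea of predicting which regions are expensive and cutting those into smaller pieces can be sketched as recursive bisection on an estimated cost. This is a minimal sketch under stated assumptions, not the project's code: it treats the search space as a one-dimensional interval, and `split_by_cost`, `estimate_cost`, `toy_cost`, and the `target` value are hypothetical stand-ins for whatever predictor the app could actually use.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]


def split_by_cost(
    lo: float,
    hi: float,
    estimate_cost: Callable[[float, float], float],
    target: float,
    min_width: float = 1e-9,
) -> List[Interval]:
    """Split [lo, hi) into contiguous chunks whose estimated cost is at most
    `target`, halving any chunk predicted to be too expensive.
    Chunks narrower than `min_width` are kept as-is so the loop terminates."""
    chunks: List[Interval] = []
    stack: List[Interval] = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        if estimate_cost(a, b) <= target or (b - a) <= min_width:
            chunks.append((a, b))
        else:
            mid = (a + b) / 2
            # Push the right half first so the left half is handled next,
            # which keeps the chunks in ascending order.
            stack.append((mid, b))
            stack.append((a, mid))
    return chunks


def toy_cost(a: float, b: float) -> float:
    """Hypothetical cost model: work per unit width is 100 on [0, 0.1)
    and 1 elsewhere, mimicking regions that trigger expensive tests."""
    hard = max(0.0, min(b, 0.1) - max(a, 0.0))
    return 100 * hard + (b - a - hard)


if __name__ == "__main__":
    # Equal-size chunks: the first one costs about 100x the others.
    equal = [(i / 10, (i + 1) / 10) for i in range(10)]
    print([round(toy_cost(a, b), 2) for a, b in equal])
    # Cost-aware chunks: many small pieces near 0, a few wide ones elsewhere.
    for a, b in split_by_cost(0.0, 1.0, toy_cost, target=1.0):
        print(f"[{a:.4f}, {b:.4f})  cost={toy_cost(a, b):.3f}")
```

With equal-size chunks the estimated costs differ by a factor of about 100, the same kind of spread described in this thread; bisection bounds every chunk by `target`, at the price of producing more, smaller work units in the hard regions.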
Joined: 8 Jul 11 Posts: 1346 Credit: 548,310,138 RAC: 635,376
"Normally I find a 3-day deadline fine for WUs from other projects, but the sometimes very long run times mean that I can't complete a one-day cache within 3 days most of the time and have to abort a lot of WUs that won't finish in time. Can't afford to risk letting more than one WU run at the moment as I 'lost' a 42-hour WU that got reported 11 minutes late :(" Hi Al, just to let you know, I increased the grace period by a few days, so hopefully you won't have to abort any more. I also modified the settings so the server should stop sending out an excessive number of WUs (not sure if you had that problem, but I did on my PC). Eric
Joined: 19 Aug 11 Posts: 45 Credit: 1,014,069 RAC: 0
Thanks Eric. I guess in general I get more work than I can crunch in time, but that's because I seem to get long-running WUs - occasionally I get a batch of short ones which I can complete on time :) The 'short' deadlines and long predicted runtimes force the project into panic mode and stop any other projects from running, which is why I only let one at a time run to give the other core a chance to do something else. Al. Edit p.s. I really like crunching maths projects, it's just that this one takes a fair bit of 'babysitting' at the moment. I generally don't like having to abort WUs (although it makes no difference to the project as they get resent anyway), but every time I abort a WU a kitten dies (I made that part up). It would be nice if the credit granting settled down too - it's all over the place at the moment :) /Edit
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0
"Thanks Eric." Nah, you get more work than you can complete on time because your cache is too big. If you have a reliable 24/7 connection then there is no reason to set "Connect about every..." to more than 0 and "Additional cache..." to more than 0.1 days. "The 'short' deadlines and long predicted runtimes force the project into panic mode and stop any other projects from running, which is why I only let one at a time run to give the other core a chance to do something else." Keep a small cache and don't worry about panic mode. The project may go into panic mode for a while, but eventually that stops and then your other projects get their time too. The more you micro-manage it, the worse you make the problem. If you like babysitting and aborting and suspending, then keep a large cache and fill your boots with the babysitting. If you want to make it easy on yourself then keep a small cache and let BOINC do its thing. Been doing that for years and it works just fine.
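The interaction of cache size, long run times, and a 3-day deadline can be illustrated with a toy deadline check. This is a simplified sketch under stated assumptions: it runs tasks earliest-deadline-first on cores dedicated to this project, with exact run-time estimates and no interruptions, which is not a reproduction of how the BOINC client actually schedules work. `Task` and `late_tasks` are hypothetical names, and the run times are illustrative, loosely based on the 42-hour WU mentioned earlier in the thread.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    runtime_h: float   # estimated run time, hours
    deadline_h: float  # hours from now until the report deadline


def late_tasks(tasks: List[Task], cores: int) -> List[str]:
    """Run `tasks` earliest-deadline-first on `cores` identical cores and
    return the names of tasks that would finish after their deadline."""
    free_at = [0.0] * cores  # min-heap: time at which each core becomes free
    late: List[str] = []
    for t in sorted(tasks, key=lambda t: t.deadline_h):
        start = heapq.heappop(free_at)  # earliest available core
        finish = start + t.runtime_h
        heapq.heappush(free_at, finish)
        if finish > t.deadline_h:
            late.append(t.name)
    return late


if __name__ == "__main__":
    # Three queued WUs with a 3-day (72 h) deadline.
    cache = [Task("A", 42, 72), Task("B", 20, 72), Task("C", 15, 72)]
    print(late_tasks(cache, cores=1))  # ['C']: 42 + 20 + 15 = 77 h > 72 h
    print(late_tasks(cache, cores=2))  # []
```

On one core the third WU misses the deadline even though each WU on its own fits easily; a predicted miss like this is roughly what pushes a client into panic mode, and a smaller cache simply means fewer tasks queued behind a long one.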
Joined: 19 Aug 11 Posts: 45 Credit: 1,014,069 RAC: 0
Actually it's only 1 day, and mostly I'm set to NNW (no new work) on all projects.
I'm in the UK - go figure. To be fair, this cable connection is actually very good on speed and uptime :) Way too expensive for what it is, though :( "If you want to make it easy on yourself then keep a small cache and let BOINC do its thing. Been doing that for years and it works just fine." OK. I will! Actually, I don't babysit and micromanage: I probably only look at the machines twice a day, as I don't have the time. I just like to keep an eye out for WUs that won't complete on time, or projects with faulty WUs that I'd be shredding :) Al.
Joined: 21 Aug 11 Posts: 1 Credit: 1,001,649 RAC: 0
+1 vote on this. I was looking at my tasks and two with very long run times were invalidated (a 409,000-second and a 423,000-second (117-hour) task). Both were running on an i7 CPU.
Joined: 8 Jul 11 Posts: 1346 Credit: 548,310,138 RAC: 635,376
"+1 vote on this. I was looking at my tasks and two with very long run times were invalidated (a 409,000-second and a 423,000-second (117-hour) task). Both were running on an i7 CPU." They should never be that long. The most I have seen on my computer is 20 hours, but on occasion I have seen results returned that took 30+ hours. I have an idea what the problem might be, because I saw some very long run times on my Windows computer about a month ago. The problem was that Windows or some other process (e.g. a virus scanner) would periodically do something that caused the app to stop before it could checkpoint. Since the interval between checkpoints can sometimes be ~1 hour, all the work since the last checkpoint was lost and had to be redone. I fixed this by setting "While processor usage is less than" to 0 in the preferences. You may not want to change this, though, as it may slow your computer down. The real fix is for the app to checkpoint more often; the code change is done, but I need to test it before I promote the app. And I am sorry your results were invalidated. I know how frustrating that can be.
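Checkpointing more often usually means saving state on a wall-clock interval rather than after a fixed amount of work, so an interruption can only discard about one interval of computation instead of up to an hour. Below is a minimal Python sketch under stated assumptions; the JSON state file and the names `run_with_checkpoints` and `_save` are hypothetical, not the project's app. A real BOINC application would typically ask the client whether to checkpoint via `boinc_time_to_checkpoint()` and report completion with `boinc_checkpoint_completed()`.

```python
import json
import os
import time
from typing import Callable


def _save(path: str, next_i: int, total: int) -> None:
    """Write the checkpoint to a temporary file, then swap it into place
    with os.replace (atomic on POSIX), so being killed mid-write leaves
    the previous checkpoint intact."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_i": next_i, "total": total}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run_with_checkpoints(
    n_items: int,
    work: Callable[[int], int],
    state_path: str,
    interval_s: float = 60.0,
) -> int:
    """Sum work(i) for i in range(n_items), saving progress to `state_path`
    whenever at least `interval_s` seconds have passed since the last save,
    so an interruption loses roughly one interval (plus the item in progress)."""
    next_i, total = 0, 0
    # Resume from a previous checkpoint if one exists.
    if os.path.exists(state_path):
        with open(state_path) as f:
            saved = json.load(f)
        next_i, total = saved["next_i"], saved["total"]

    last_save = time.monotonic()
    for i in range(next_i, n_items):
        total += work(i)
        if time.monotonic() - last_save >= interval_s:
            _save(state_path, i + 1, total)
            last_save = time.monotonic()
    _save(state_path, n_items, total)
    return total
```

The temporary-file-plus-`os.replace` step is the design choice that matters most here: a process that is stopped while writing should never corrupt the checkpoint it would resume from.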