Not enough available work

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 527,062,800
RAC: 572,020
Message 2762 - Posted: 11 May 2020, 6:39:44 UTC

There are reports of not enough available WUs.

I believe this is caused by the "accelerated retries" mechanism. So I have temporarily disabled this feature.

This mechanism normally works very well, but I think the current Pentathlon is pushing it to the limit. In particular, if too many WUs are being aborted, the work queue is flooded with "failed" jobs that now have a higher priority, and these jobs require a "reliable host". To make things worse, any host that aborts too many jobs is flagged as unreliable, so it won't be given any of these tasks.
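The starvation described above can be sketched as a toy model. This is not the project's scheduler code. The `Job`, `Host`, `visible_window` and `pick_job` names are hypothetical, and the fixed-size window of jobs a host can see is a simplifying assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int         # higher priority is offered first
    needs_reliable: bool  # True for an accelerated retry


@dataclass
class Host:
    name: str
    reliable: bool        # False once the host has aborted too many jobs


def visible_window(queue, slots):
    """Return the `slots` highest-priority jobs (assumed fixed-size window)."""
    # sorted() is stable, so jobs with equal priority keep their queue order.
    return sorted(queue, key=lambda job: -job.priority)[:slots]


def pick_job(queue, host, slots):
    """Return the first visible job this host may run, or None."""
    for job in visible_window(queue, slots):
        if job.needs_reliable and not host.reliable:
            continue  # retries are only handed to reliable hosts
        return job
    return None


# Four aborted WUs come back as high-priority retries ahead of four new WUs.
retries = [Job(f"retry{i}", priority=10, needs_reliable=True) for i in range(4)]
fresh = [Job(f"new{i}", priority=0, needs_reliable=False) for i in range(4)]
queue = retries + fresh

flaky = Host("flaky", reliable=False)
steady = Host("steady", reliable=True)

print(pick_job(queue, flaky, slots=4))   # window holds only retries: no work
print(pick_job(queue, steady, slots=4))  # a reliable host still gets a retry

# With the feature disabled, the same jobs are ordinary work for any host.
plain = [Job(job.name, priority=0, needs_reliable=False) for job in queue]
print(pick_job(plain, flaky, slots=4))
```

In this model, the more hosts that get flagged as unreliable, the fewer hosts can drain the retries, which is why switching the feature off lets work flow again.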

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 527,062,800
RAC: 572,020
Message 2774 - Posted: 21 May 2020, 4:16:09 UTC - in response to Message 2762.  

Things are more stable now and the number of resends has dropped substantially. So I am re-enabling the accelerated retries.

ChrisRaisin

Joined: 17 May 20
Posts: 2
Credit: 248,953
RAC: 0
Message 2784 - Posted: 25 May 2020, 11:00:47 UTC - in response to Message 2762.  
Last modified: 25 May 2020, 11:06:15 UTC

I have been suspending jobs because there appear to be many of them, all estimated to run at least 9 hours and all hogging the work from all other projects. Many of these other projects have tasks running for less than 2 minutes.

One of the NumberFields@home jobs is even reporting that it will take 410 days, 5 hours, 22 minutes and 45 seconds to complete (over a year!).

Every other job is showing an estimated completion time of 9 days, 11 hours and 11 minutes.

Surely this project should not be stopping other tasks (such as Coronavirus research) from running.

mikey

Joined: 7 Oct 11
Posts: 17
Credit: 17,470,176
RAC: 4,611
Message 2785 - Posted: 25 May 2020, 14:15:34 UTC - in response to Message 2784.  

It is not the project doing it; it's BOINC. It is running the workunits that need to be run first so they do not exceed their deadlines. Setting a smaller cache will give BOINC more flexibility to run your project choices. Most people with always-on internet, and no charges for different times of the day, run anything from a zero cache to something like 0.5 in both boxes, or even 0.5 in the first box and 0.25 in the second box. That will give you around one day's worth of workunits, depending on how long the tasks are.

Another way to affect things is the Resource Share you have set for each project: the ones with a higher resource share will get more CPU time than the ones with a lower resource share. BUT you will often still see all of the cores tied up on one project, instead of 3 cores on one project and 1 core on another, on a 4-core CPU. BOINC does not have the ability to set project A to use 1 CPU core and project B to use 3 CPU cores in that same 4-core example. There are some 3rd-party apps that can help with that, but they require a lot of manual attention and I don't recommend them for new crunchers. Try to use the built-in settings and BOINC will be easier to use in the long run.
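As a back-of-the-envelope illustration of the cache and resource share settings, here is a small sketch. The function names, the 4-core host, the 3-hour task length and the 100/300 shares are all made-up assumptions, and BOINC's real work fetch and scheduling are more involved than this arithmetic.

```python
def buffered_tasks(first_box_days, second_box_days, cores, task_hours):
    """Rough number of tasks needed to fill the cache (hypothetical helper)."""
    # Days of work in both boxes, converted to core-hours across all cores.
    core_hours = (first_box_days + second_box_days) * 24 * cores
    return core_hours / task_hours


def share_fractions(shares):
    """Long-run fraction of CPU time per project, from resource shares."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# 0.5 in the first box and 0.25 in the second, 4 cores, 3-hour tasks (assumed).
print(buffered_tasks(0.5, 0.25, cores=4, task_hours=3))  # 24.0

# Project B has three times the resource share of project A (assumed values).
print(share_fractions({"A": 100, "B": 300}))  # {'A': 0.25, 'B': 0.75}
```

The share fractions are a long-run average, not a per-core assignment, so all four cores can still be busy with one project at any given moment.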

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 527,062,800
RAC: 572,020
Message 2786 - Posted: 25 May 2020, 18:14:46 UTC - in response to Message 2784.  

Something is horribly wrong with those estimated times. If it's a task on the AMD GPU, then it is hung and you should kill it and disable beta apps for this project. If this is a CPU task, then it should only take a few hours. Is it possible you are running another project that happens to be multi-threaded? If so, the NumberFields task might only be allocated a small fraction of a core, which would obviously increase the runtime. In Task Manager, look at what the CPU percentage is for the NumberFields task; that might give you a hint as to what is going on.
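To put a rough number on the "fraction of a core" effect, here is a minimal sketch. The helper name, the 3-hour task and the 25% figure are assumptions for illustration, not measurements from this project.

```python
def wall_clock_hours(cpu_hours, core_fraction):
    """Elapsed time for a task that only gets part of one core (hypothetical)."""
    if not 0 < core_fraction <= 1:
        raise ValueError("core_fraction must be in (0, 1]")
    return cpu_hours / core_fraction


# A task needing 3 CPU-hours that is only getting 25% of a core (assumed).
print(wall_clock_hours(3, 0.25))  # 12.0
```

The smaller the CPU percentage shown for the task, the longer the elapsed time and the estimate will stretch.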

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 527,062,800
RAC: 572,020
Message 2815 - Posted: 7 Jun 2020, 4:28:26 UTC - in response to Message 2774.  

I am noticing a large number of resends again, so I will temporarily disable the accelerated retries just to be safe.


Copyright © 2024 Arizona State University