Not enough available work
Author | Message |
---|---|
Joined: 8 Jul 11 Posts: 1344 Credit: 527,062,800 RAC: 572,020 |
There are reports of not enough available WUs. I believe this is caused by the "accelerated retries" mechanism. So I have temporarily disabled this feature. This mechanism normally works very well, but I think the current Pentathlon is pushing it to the limit. In particular, if too many WUs are being aborted, then the work queue is flooded with "failed" jobs that now have a higher priority, and these jobs require a "reliable host". To make things worse, any host aborting too many jobs is flagged as unreliable, so won't be given any of these tasks. |
Joined: 8 Jul 11 Posts: 1344 Credit: 527,062,800 RAC: 572,020 |
Things are more stable now and the number of resends has dropped substantially. So I am re-enabling the accelerated retries. |
Joined: 17 May 20 Posts: 2 Credit: 248,953 RAC: 0 |
I have been suspending jobs because there appear to be many of them, all estimated to run at least 9 hours and all hogging the work from all other projects. Many of these other projects have tasks running for less than 2 minutes. One of the NumberFields@home jobs is even reporting that it will take 410 days 5 hrs 22 mins and 45 seconds to complete (over a year!). Every other job is showing 9 days 11 hours and 11 minutes estimated completion time. Surely this project should not be stopping other tasks (such as Coronavirus research) from running. |
Joined: 7 Oct 11 Posts: 17 Credit: 17,470,176 RAC: 4,611 |
"I have been suspending jobs because there appear to be many of them, all estimated to run at least 9 hours and all hogging the work from all other projects. Many of these other projects" It is not the Project doing it, it's BOINC doing it: it runs the workunits that need to be run first so they do not exceed their deadlines. Setting a smaller cache will give BOINC more flexibility to run your Project choices. Most people with always-on internet, and no charges for different times of the day, run anything from a zero cache to something like 0.5 in both boxes, or even 0.5 in the first box and 0.25 in the second box. That will give you around one day's worth of workunits, depending on how long the tasks are. Another way to affect things is the Resource Share you have set for each Project: the ones with the higher resource share will get more CPU time than ones with a lower resource share. BUT you will often still see all of the cores tied up on one Project instead of 3 cores on one Project and 1 core on another Project, on a 4-core CPU. BOINC does not have the ability to set that project A uses 1 CPU core and project B uses 3 CPU cores, in that same 4-core example. There are some 3rd party apps that can help with that, but they require a lot of manual attention and I don't recommend that for new crunchers. Try to use the built-in settings and BOINC will be easier to use in the long run. |
Joined: 8 Jul 11 Posts: 1344 Credit: 527,062,800 RAC: 572,020 |
"I have been suspending jobs because there appear to be many of them, all estimated to run at least 9 hours and all hogging the work from all other projects. Many of these other projects" Something is horribly wrong with those estimated times. If it's a task on the AMD GPU then it is hung and you should kill it and disable beta apps for this project. If this is a CPU task then it should only take a few hours. Is it possible you are running another project that happens to be multi-threaded? If so, the NumberFields task might only be allocated a small fraction of a core, which would obviously increase runtime. In Task Manager, look at what the CPU percentage is for the NumberFields task; that might give you a hint as to what is going on. |
Joined: 8 Jul 11 Posts: 1344 Credit: 527,062,800 RAC: 572,020 |
"Things are more stable now and the number of resends has dropped substantially. So I am re-enabling the accelerated retries." I am noticing a large number of resends again, so will temporarily disable the accelerated retries just to be safe. |
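
For anyone curious how the "accelerated retries" feedback loop described in the first post can starve hosts of work, here is a rough Python sketch. It is only a toy model, not the actual BOINC scheduler code: the `Host` and `Job` classes, the `pick_job` function, the abort threshold `MAX_ABORT_FRACTION`, and the bounded `WINDOW` of visible jobs are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical values: the thread does not say how a host is judged unreliable
# or how many queued jobs a single work request can see.
MAX_ABORT_FRACTION = 0.2
WINDOW = 4


@dataclass
class Host:
    name: str
    completed: int = 0
    aborted: int = 0

    @property
    def reliable(self) -> bool:
        """A host that aborts too large a share of its jobs is flagged unreliable."""
        total = self.completed + self.aborted
        if total == 0:
            return True
        return self.aborted / total <= MAX_ABORT_FRACTION


@dataclass
class Job:
    wu_id: int
    is_retry: bool = False  # resend of a failed or aborted job


def pick_job(queue, host):
    """Return the first job this host is allowed to run, or None."""
    # Retries get higher priority, so they sort ahead of fresh work.
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(queue, key=lambda job: not job.is_retry)
    # Only the head of the queue is considered per request (assumption).
    for job in ordered[:WINDOW]:
        if job.is_retry and not host.reliable:
            continue  # retries require a reliable host
        return job
    return None


flaky = Host("flaky", completed=2, aborted=8)
steady = Host("steady", completed=10, aborted=0)

# Queue flooded with retries, with fresh work stuck behind them.
queue = [Job(i, is_retry=True) for i in range(5)] + [Job(i) for i in range(5, 8)]

print(pick_job(queue, flaky))   # no work for the unreliable host
print(pick_job(queue, steady))  # the reliable host gets a retry

# With accelerated retries disabled, resends are treated like any other job.
plain_queue = [Job(job.wu_id) for job in queue]
print(pick_job(plain_queue, flaky))
```

In this toy model the unreliable host sees only retries at the head of the queue and is turned away, even though fresh work exists further back. Treating resends as ordinary jobs, which is roughly what disabling the feature amounts to here, lets that same host get work again.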