Not enough available work

Eric Driver (Project administrator, Project developer, Project tester, Project scientist) · Joined: 8 Jul 11 · Posts: 1391 · Credit: 700,496,167 · RAC: 822,335
Message 2762 - Posted: 11 May 2020, 6:39:44 UTC

There are reports of not enough available WUs. I believe this is caused by the "accelerated retries" mechanism, so I have temporarily disabled this feature. This mechanism normally works very well, but I think the current Pentathlon is pushing it to the limit. In particular, if too many WUs are being aborted, the work queue is flooded with "failed" jobs that now have a higher priority, and these jobs require a "reliable host". To make things worse, any host aborting too many jobs is flagged as unreliable, and so won't be given any of these tasks.
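
A toy model makes the starvation effect described above concrete. This is a hypothetical Python sketch, not BOINC's scheduler: the `Job` and `Host` classes, `pick_job`, the 5% abort threshold, and the fixed-size scan window are all assumptions made for illustration. With accelerated retries on, resends move to the head of the queue and are reserved for reliable hosts, so a host flagged as unreliable can find nothing it is allowed to take.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical values; real thresholds are project configuration not shown in this thread.
MAX_ERROR_RATE = 0.05  # hosts aborting more than 5% of their jobs count as unreliable
WINDOW = 4             # how many queued jobs the scheduler scans per work request (assumed)


@dataclass
class Job:
    name: str
    is_resend: bool  # True for a retry of a failed or aborted result


@dataclass
class Host:
    name: str
    completed: int = 0
    aborted: int = 0

    def is_reliable(self) -> bool:
        total = self.completed + self.aborted
        # A host with no history counts as reliable in this toy model.
        return total == 0 or self.aborted / total <= MAX_ERROR_RATE


def pick_job(host: Host, queue: list[Job], accelerated: bool = True) -> Job | None:
    """Return the first job in the scan window this host may receive, or None."""
    if accelerated:
        # Accelerated retries: resends jump ahead of fresh work (sorted() is stable).
        queue = sorted(queue, key=lambda job: not job.is_resend)
    for job in queue[:WINDOW]:
        if accelerated and job.is_resend and not host.is_reliable():
            continue  # resends are reserved for reliable hosts
        return job
    return None


# Fresh work was queued first; resends of aborted results arrived later.
queue = [Job(f"new_{i}", False) for i in range(10)] + [Job(f"resend_{i}", True) for i in range(6)]
steady = Host("steady", completed=100, aborted=1)  # about 1% aborted
flaky = Host("flaky", completed=50, aborted=20)    # about 29% aborted

print(pick_job(steady, queue))                     # Job(name='resend_0', is_resend=True)
print(pick_job(flaky, queue))                      # None: the window holds only resends
print(pick_job(flaky, queue, accelerated=False))   # Job(name='new_0', is_resend=False)
```

Turning the feature off (`accelerated=False`) lets the same unreliable host receive fresh work again, which mirrors why disabling accelerated retries works as a stopgap when aborts spike.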

Eric Driver · Message 2774 - Posted: 21 May 2020, 4:16:09 UTC - in response to Message 2762

Things are more stable now and the number of resends has dropped substantially. So I am re-enabling the accelerated retries.

ChrisRaisin · Joined: 17 May 20 · Posts: 2 · Credit: 248,953 · RAC: 0
Message 2784 - Posted: 25 May 2020, 11:00:47 UTC - in response to Message 2762. Last modified: 25 May 2020, 11:06:15 UTC

I have been suspending jobs because there appear to be many of them, all estimated to run at least 9 hours and all crowding out every other project. Many of these other projects have tasks that run for less than 2 minutes. One of the NumberFields@home jobs is even reporting that it will take 410 days, 5 hours, 22 minutes and 45 seconds to complete (over a year!). Every other job is showing an estimated completion time of 9 days, 11 hours and 11 minutes. Surely this project should not be stopping other tasks (such as coronavirus research) from running.

mikey · Joined: 7 Oct 11 · Posts: 17 · Credit: 17,470,176 · RAC: 0
Message 2785 - Posted: 25 May 2020, 14:15:34 UTC - in response to Message 2784

It is not the project doing it, it's BOINC: it runs the workunits that need to be run first so they do not exceed their deadlines. Setting a smaller cache will give BOINC more flexibility to run your project choices. Most people with always-on internet and no charges for different times of the day run anything from a zero cache to something like 0.5 in both boxes, or even 0.5 in the first box and 0.25 in the second. That will give you around one day's worth of workunits, depending on how long the tasks are.

Another way to affect things is the resource share you set for each project: projects with a higher resource share get more CPU time than ones with a lower resource share. BUT you will often still see all of the cores tied up on one project instead of, say, 3 cores on one project and 1 core on another on a 4-core CPU. BOINC cannot be told that project A uses 1 CPU core and project B uses 3 CPU cores, in that same 4-core example. There are some third-party apps that can help with that, but they require a lot of manual attention and I don't recommend them for new crunchers. Try to use the built-in settings and BOINC will be easier to use in the long run.
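
The arithmetic behind the cache and resource-share advice above can be sketched in a few lines. This is a hypothetical illustration, not BOINC code: the function names and the "OtherProject" name are made up, and it assumes BOINC asks for work when the queue drops below the first box's value and fills it up to the sum of both boxes. The share calculation only gives long-run averages; as noted above, at any given moment all cores may still be running one project.

```python
from __future__ import annotations


def buffer_window(min_days: float, additional_days: float) -> tuple[float, float]:
    """Return (refill threshold, fill target) in days of queued work.

    Assumption: work is requested when the queue falls below `min_days`
    and then topped up to `min_days + additional_days`.
    """
    return min_days, min_days + additional_days


def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Long-run fraction of CPU time each project is targeted to receive."""
    total = sum(shares.values())
    return {name: share / total for name, share in shares.items()}


def cpu_hours_per_day(shares: dict[str, float], cores: int) -> dict[str, float]:
    """Convert share fractions into average CPU-hours per day on `cores` cores."""
    return {name: frac * cores * 24 for name, frac in share_fractions(shares).items()}


# Illustrative numbers only ("OtherProject" is a placeholder name).
print(buffer_window(0.5, 0.25))            # (0.5, 0.75)
shares = {"NumberFields@home": 100, "OtherProject": 300}
print(share_fractions(shares))             # {'NumberFields@home': 0.25, 'OtherProject': 0.75}
print(cpu_hours_per_day(shares, cores=4))  # {'NumberFields@home': 24.0, 'OtherProject': 72.0}
```

Averaged over a day, a 1:3 share split on 4 cores works out to the equivalent of 1 and 3 cores, even though BOINC may get there by running whole batches of one project at a time.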

Eric Driver · Message 2786 - Posted: 25 May 2020, 18:14:46 UTC - in response to Message 2784

Something is horribly wrong with those estimated times. If it's a task on the AMD GPU, then it is hung and you should kill it and disable beta apps for this project. If this is a CPU task, then it should only take a few hours. Is it possible you are running another project that happens to be multi-threaded? If so, the NumberFields task might be allocated only a small fraction of a core, which would obviously increase runtime. In Task Manager, look at the CPU percentage for the NumberFields task; that might give you a hint as to what is going on.
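
The Task Manager check suggested above comes down to simple arithmetic. Here is a hypothetical sketch (the function names and the sample numbers are illustrative, not measurements from this thread), assuming, as in Windows Task Manager, that a process's CPU percentage is relative to all logical processors combined, so one fully used core on an 8-thread machine shows as 12.5%.

```python
def one_core_percent(logical_cpus: int) -> float:
    """Task Manager percentage that corresponds to one fully used logical CPU."""
    return 100.0 / logical_cpus


def core_fraction(observed_percent: float, logical_cpus: int) -> float:
    """Fraction of one core a single-threaded task is actually getting."""
    return observed_percent / one_core_percent(logical_cpus)


def wall_clock_hours(cpu_hours: float, fraction: float) -> float:
    """Wall-clock time for a task needing `cpu_hours` of full-core CPU time."""
    return cpu_hours / fraction


# Example: 8 logical CPUs, task shown at 3.125% (a quarter of one core).
frac = core_fraction(3.125, logical_cpus=8)
print(frac, wall_clock_hours(4.0, frac))  # 0.25 16.0
```

A task that would finish in a few hours on a full core stretches proportionally when a multi-threaded project leaves it only a slice, which is the effect being described.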

Eric Driver · Message 2815 - Posted: 7 Jun 2020, 4:28:26 UTC - in response to Message 2774

I am noticing a large number of resends again, so I will temporarily disable the accelerated retries just to be safe.