One host only gets one task at a time for GPU, another gets 65
Joined: 14 Nov 11 | Posts: 4 | Credit: 1,450,113 | RAC: 0
Why does this host https://numberfields.asu.edu/NumberFields/show_host_detail.php?hostid=2800357 only get one task at a time, yet this host https://numberfields.asu.edu/NumberFields/show_host_detail.php?hostid=2800361 gets 65? No tasks from any other project were in the queues.
Joined: 8 Jul 11 | Posts: 1344 | Credit: 542,901,202 | RAC: 606,851
> Why does this host https://numberfields.asu.edu/NumberFields/show_host_detail.php?hostid=2800357 only get one task at a time, yet this host https://numberfields.asu.edu/NumberFields/show_host_detail.php?hostid=2800361 gets 65? No tasks from any other project were in the queues.

That's hard to say. Are you using an app_config.xml file for either of them? Is one of the hosts relatively new? I've noticed new hosts can sometimes grab more tasks when first connected, and it takes a little time for the stats to settle down.
Joined: 14 Nov 11 | Posts: 4 | Credit: 1,450,113 | RAC: 0
Neither has an app_config. Neither host has run this project in a long time, but they're not new to it. My first thought was that the last time one of the hosts ran it, something happened that put it on the naughty list, limiting it to one task at a time. Climateprediction.net does this if you crash a task: the next task is only issued after 1 day, or after you return a correct task. Does this project use naughty lists?
Joined: 8 Jul 11 | Posts: 1344 | Credit: 542,901,202 | RAC: 606,851
> Neither has an app_config. Neither host has run this project in a long time, but they're not new to it. My first thought was that the last time one of the hosts ran it, something happened that put it on the naughty list, limiting it to one task at a time. Climateprediction.net does this if you crash a task: the next task is only issued after 1 day, or after you return a correct task. Does this project use naughty lists?

You're talking about the "daily_result_quota":

    <daily_result_quota> N </daily_result_quota>

    Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts.

Yes, we use that parameter. If that's the issue, it will eventually resolve itself as more correct results are returned. The only other thing I can think of is that you are using different preferences for the two hosts, and one of them has its "resource share" set too low.
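For concreteness, here is a rough sketch of how that parameter sits in a project's server-side config.xml. The quota value and the worked numbers in the comments are purely illustrative, not NumberFields' actual settings, and the exact rules for raising and lowering MRD live in the scheduler code.

    <!-- Illustrative sketch only; the quota value below is made up. -->
    <boinc>
        <config>
            <!-- Upper bound on MRD, the per-host "max results per day" field. -->
            <daily_result_quota>80</daily_result_quota>
        </config>
    </boinc>
    <!-- Example with made-up numbers: a host whose MRD has dropped to 1
         after errored tasks, with NCPUS = 4, one GPU and GM = 1, would be
         issued at most 1 * (4 + 1*1) = 5 jobs per 24 hours; returning good
         results raises MRD back toward daily_result_quota. -->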
Joined: 14 Nov 11 | Posts: 4 | Credit: 1,450,113 | RAC: 0
The single-task host now has plenty. But I don't understand the formula you quoted: I wasn't limited to one a day, I was limited to one at a time. Perhaps every good task returned increased it, so I was then due two a day, hence one more?
Joined: 8 Jul 11 | Posts: 1344 | Credit: 542,901,202 | RAC: 606,851
> The single-task host now has plenty.

It's hard to say exactly how it's implemented in the code, but that seemed like the closest thing to a "naughty list". Whatever the cause, at least it's working now.
Joined: 14 Nov 11 | Posts: 4 | Credit: 1,450,113 | RAC: 0
There's another anomaly I've spotted. The good host mentioned above wasn't asking for a task from NumberFields, even though it has only 1.6 days of PrimeGrid work left on its GPU and it has a 2-day buffer. However, when I go into the Projects tab of BoincTasks, I find another time estimate (which I can't see anywhere in the BOINC Manager) of just over 2 days.

It looks like there are three ways of working out the time left: the normal way, "fraction done exact" (which you can set in app_config), and this weird one, which seems to be the one used to decide when to fetch more work. No wonder I always get 2 weeks of PrimeGrid instead of 2 days. BOINC is an absolute mess, and if I complain, they blame the projects for using old or modified server software.
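For reference, the "fraction done exact" setting mentioned above is a per-application flag in a client-side app_config.xml placed in the project's data directory. A minimal sketch, assuming a placeholder app name (use the actual name the BOINC client reports for the project):

    <app_config>
        <app>
            <!-- Placeholder; replace with the application name the client reports. -->
            <name>example_app</name>
            <!-- Estimate remaining time from the task's reported fraction done
                 rather than from BOINC's runtime-based estimate. -->
            <fraction_done_exact/>
        </app>
    </app_config>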