One host only gets one task at a time for GPU, another gets 65

Mr P Hucker
Joined: 14 Nov 11
Posts: 4
Credit: 1,450,113
RAC: 0
Message 3681 - Posted: 27 Mar 2024, 11:36:33 UTC
Last modified: 27 Mar 2024, 11:37:15 UTC

Why does this host https://numberfields.asu.edu/NumberFields/show_host_detail.php?hostid=2800357 get one at a time, yet this host https://numberfields.asu.edu/NumberFields/show_host_detail.php?hostid=2800361 gets 65? No tasks from any other project were in the queues.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1344
Credit: 542,901,202
RAC: 606,851
Message 3682 - Posted: 27 Mar 2024, 15:23:38 UTC - in response to Message 3681.  

That's hard to say. Are you using an app_config.xml file for either of them? Is one of the hosts relatively new? I've noticed new hosts can sometimes grab more tasks when first connected and it takes a little time for the stats to settle down.
Mr P Hucker
Message 3684 - Posted: 30 Mar 2024, 15:46:51 UTC - in response to Message 3682.  

Neither has an app_config. Neither has run this project in a long time, but they're not new to it. My first thought was that the last time one of the hosts ran it, something happened to put it on the naughty list, limiting it to 1 task at a time. Climate Prediction does this if you crash a task: the next task is only issued after 1 day, or after you return a correct task. Does this project use naughty lists?
Eric Driver
Message 3685 - Posted: 31 Mar 2024, 4:26:33 UTC - in response to Message 3684.  

You're talking about the "daily_result_quota":
<daily_result_quota> N </daily_result_quota>
Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts. 
Yes, we use that parameter. If that's the issue, it will eventually resolve itself as more correct results are returned.

The only other thing I can think of is that you are using different preferences for the 2 hosts, and one of them has its "resource share" set too low.
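
To make the quoted daily_result_quota formula concrete, here is a minimal sketch of the arithmetic. The function name and the example host (8 CPUs, 1 GPU, GM of 1) are assumptions chosen for illustration; they are not taken from either host in this thread, and the real scheduler may apply further limits.

```python
def max_jobs_per_day(mrd: int, ncpus: int, ngpus: int, gpu_multiplier: int = 1) -> int:
    """Daily job cap from the quoted formula: MRD * (NCPUS + GM * NGPUS).

    mrd            -- the host's MRD field, at least 1
    gpu_multiplier -- the GM term; the default of 1 is an assumption
    """
    if mrd < 1:
        raise ValueError("MRD lies in the interval [1 .. daily_result_quota]")
    return mrd * (ncpus + gpu_multiplier * ngpus)


# Hypothetical host: 8 CPUs and 1 GPU.
penalised = max_jobs_per_day(mrd=1, ncpus=8, ngpus=1)    # MRD at its floor
healthy = max_jobs_per_day(mrd=50, ncpus=8, ngpus=1)     # MRD at an assumed quota of 50
print(penalised, healthy)
```

With these assumed numbers, a host whose MRD has dropped to 1 is capped at 9 jobs per day, while the same hardware at an MRD of 50 is capped at 450. The cap is a daily total, so on its own it does not explain a "one at a time" limit.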
Mr P Hucker
Message 3686 - Posted: 31 Mar 2024, 9:51:14 UTC - in response to Message 3685.  

The single task host now has plenty.

But I don't understand the formula you quoted. I wasn't limited to 1 a day; I was limited to one at a time. Perhaps every good task returned increased the quota, so I was then due 2 a day, hence one more?
Eric Driver
Message 3687 - Posted: 31 Mar 2024, 15:22:36 UTC - in response to Message 3686.  

It's hard to say exactly how it's implemented in the code, but that seemed like the closest thing to a "naughty list". Whatever the cause, at least it's working now.
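
As a rough illustration of how a per-host quota could recover as good results come back, the sketch below replays a sequence of results under a hypothetical rule: double the quota on a good result, subtract one on a bad result, and clamp to [1 .. daily_result_quota]. The rule and the function names are assumptions made for this example; the thread does not confirm how the server actually adjusts the value.

```python
def adjust_mrd(mrd: int, result_ok: bool, daily_result_quota: int) -> int:
    """Hypothetical rule (an assumption, not confirmed server behaviour):
    double on a good result, subtract one on a bad result,
    always staying within [1 .. daily_result_quota]."""
    if result_ok:
        return min(mrd * 2, daily_result_quota)
    return max(mrd - 1, 1)


def replay(outcomes, daily_result_quota, start=None):
    """Return the quota after each result; a host starts at the full quota by default."""
    mrd = daily_result_quota if start is None else start
    history = []
    for ok in outcomes:
        mrd = adjust_mrd(mrd, ok, daily_result_quota)
        history.append(mrd)
    return history


# A host stuck at the floor returns four good results in a row (quota of 10 is assumed).
print(replay([True, True, True, True], daily_result_quota=10, start=1))
```

Under this assumed rule the host goes from 1 to 2, 4, 8 and then 10, which would match the observation that the limit loosened quickly once correct results were returned.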
Mr P Hucker
Message 3688 - Posted: 31 Mar 2024, 15:32:52 UTC - in response to Message 3687.  
Last modified: 31 Mar 2024, 15:34:09 UTC

There's another anomaly I've spotted. The good host mentioned above wasn't asking for a task from NumberFields, even though it has only 1.6 days of PrimeGrid tasks left on its GPU, and it's got a 2-day buffer. However, when I go into the Projects tab of BoincTasks, I find another time estimate (which I can't see anywhere in the BOINC Manager) of just over 2 days. It looks like there are three ways of working out the time left: the normal way, "fraction done exact" (which you can set in app_config), and this weird one, which seems to be used to decide when to get more work. No wonder I always get 2 weeks of PrimeGrid instead of 2 days. BOINC is an absolute mess. If I complain, they blame the projects for using old or modified server software.

Copyright © 2024 Arizona State University