scheduler problem

Message boards : Number crunching : scheduler problem

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1343
Credit: 515,898,796
RAC: 580,555
Message 2319 - Posted: 28 Mar 2019, 17:40:42 UTC

You may have noticed the server scheduler has been very fickle of late.

I think part of the problem is related to how WUs have been segregated into batches (one for the normal app and one for the GPU app). Since the GPU app has been relatively stable and has agreed with the CPU version on every WU tested (over 400), I have decided to remove the batch discriminator (this will open up the DS14x12 WUs to the GPU app).

I now have over 100 GPU WUs. Interestingly enough, they are all of the DS14x12 variety (batch 0) and all have a higher priority (due to the accelerated retry mechanism). Ironically, I think this points to the root cause of the scheduler problem...

I think this may have fixed the problem; at least it has for me. Please let me know if you still have problems receiving tasks.
HK-Steve

Joined: 23 Jun 17
Posts: 5
Credit: 42,264,426
RAC: 0
Message 2323 - Posted: 29 Mar 2019, 13:10:49 UTC

Looks good so far.
Thanks, Eric.
mikey
Joined: 7 Oct 11
Posts: 17
Credit: 17,442,841
RAC: 26,645
Message 2499 - Posted: 21 Jul 2019, 22:34:00 UTC - in response to Message 2319.  

Are about equal numbers of CPU and GPU workunits being created, or are you focusing on GPU workunits right now? I ask because I have some CPU cores that can't get work here right now, and I'm wondering whether I should be trying the GPU workunits instead.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1343
Credit: 515,898,796
RAC: 580,555
Message 2501 - Posted: 22 Jul 2019, 3:07:27 UTC - in response to Message 2499.  

The same WUs work on both CPU and GPU, so you should be able to get tasks for either.
mikey
Joined: 7 Oct 11
Posts: 17
Credit: 17,442,841
RAC: 26,645
Message 2503 - Posted: 22 Jul 2019, 11:38:52 UTC - in response to Message 2501.  

Thank you, it seems to be working now. Hopefully the problem was what you suggested in the other thread about reissued workunits.

Copyright © 2024 Arizona State University