GPU not getting any tasks

nCtrl

Joined: 23 Nov 20
Posts: 4
Credit: 524,116
RAC: 0
Message 2931 - Posted: 8 Dec 2020, 21:19:54 UTC
Last modified: 8 Dec 2020, 22:14:21 UTC

I've been trying to get GPU tasks all evening, but the request always returns:
12/8/2020 9:06:18 PM | NumberFields@home | Requesting new tasks for NVIDIA GPU
12/8/2020 9:06:20 PM | NumberFields@home | Scheduler request completed: got 0 new tasks


According to the event log, the GPU is detected.

12/8/2020 8:42:51 PM |  | CUDA: NVIDIA GPU 0: GeForce GT 730M (driver version 425.31, CUDA version 10.1, compute capability 3.5, 1024MB, 824MB available, 582 GFLOPS peak)
12/8/2020 8:42:51 PM |  | OpenCL: NVIDIA GPU 0: GeForce GT 730M (driver version 425.31, device version OpenCL 1.2 CUDA, 1024MB, 824MB available, 582 GFLOPS peak)


From what I've read on similar topics on this forum, the driver version is OK, and OpenCL, CUDA and compute capability all meet the requirements. The OS is:

12/8/2020 8:42:51 PM |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19042.00)


Any clue as to why I'm not receiving any?

EDIT: After some digging in the debug output, I found this (I've removed the unnecessary bits):


12/8/2020 10:54:12 PM | | [work_fetch] ------- start work fetch state -------
12/8/2020 10:54:12 PM | | [work_fetch] target work buffer: 86400.00 + 172800.00 sec
12/8/2020 10:54:12 PM | | [work_fetch] --- state for NVIDIA GPU ---
12/8/2020 10:54:12 PM | | [work_fetch] shortfall 259200.00 nidle 1.00 saturated 0.00 busy 0.00
12/8/2020 10:54:12 PM | NumberFields@home | [work_fetch] share 1.000

12/8/2020 10:54:12 PM | | [work_fetch] ------- end work fetch state -------
12/8/2020 10:54:12 PM | NumberFields@home | choose_project: scanning
12/8/2020 10:54:12 PM | NumberFields@home | NVIDIA GPU needs work - buffer low
12/8/2020 10:54:12 PM | NumberFields@home | checking NVIDIA GPU
12/8/2020 10:54:12 PM | NumberFields@home | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 259200.00
12/8/2020 10:54:12 PM | NumberFields@home | NVIDIA GPU set_request: 259199.999999

12/8/2020 10:54:12 PM | NumberFields@home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (259200.00 sec, 1.00 inst) Intel GPU (0.00 sec, 0.00 inst)
12/8/2020 10:54:12 PM | NumberFields@home | Sending scheduler request: To fetch work.
12/8/2020 10:54:12 PM | NumberFields@home | Requesting new tasks for NVIDIA GPU
12/8/2020 10:54:12 PM | NumberFields@home | [sched_op] NVIDIA GPU work request: 259200.00 seconds; 1.00 devices
12/8/2020 10:54:14 PM | NumberFields@home | Scheduler request completed: got 0 new tasks
12/8/2020 10:54:14 PM | NumberFields@home | Project requested delay of 31 seconds
12/8/2020 10:54:14 PM | NumberFields@home | [work_fetch] backing off NVIDIA GPU 2191 sec
[...]
12/8/2020 11:05:54 PM | NumberFields@home | can't fetch NVIDIA GPU: project is backed off


For some reason it seems I'm being denied the WUs and put into a waiting loop of roughly 30 minutes, over and over (which is what I'm guessing "project is backed off" means), unless the reason is that you ran out of NVIDIA work units. (Somebody with an AMD GPU tried at the same time and got tasks, so whatever this is, it looks NVIDIA-specific.)
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,830,278
RAC: 288,937
Message 2932 - Posted: 8 Dec 2020, 22:22:21 UTC - in response to Message 2931.  

Just above the lines you bold faced I see this:
[work_fetch] share 0.000 blocked by project preferences

So it looks like something is set wrong with your project preferences. Do you have "Use NVIDIA GPU" unchecked or the "resource share" set to zero?
nCtrl

Message 2933 - Posted: 9 Dec 2020, 4:26:43 UTC - in response to Message 2932.  
Last modified: 9 Dec 2020, 4:36:03 UTC

Resource share is 100, and "use NVIDIA GPU" is indeed checked.

What you see must be for the CPU: the setting "use CPU" is unchecked in my preferences, as I want to receive only GPU tasks on this computer. Maybe I trimmed too much out of the log. Here is a fuller version:

12/9/2020 5:12:59 AM |  | [work_fetch] ------- start work fetch state -------
12/9/2020 5:12:59 AM |  | [work_fetch] target work buffer: 86400.00 + 172800.00 sec
12/9/2020 5:12:59 AM |  | [work_fetch] --- project states ---
12/9/2020 5:12:59 AM | NumberFields@home | [work_fetch] REC 0.000 prio -0.000 can request work
12/9/2020 5:12:59 AM |  | [work_fetch] --- state for CPU ---
12/9/2020 5:12:59 AM |  | [work_fetch] shortfall 1036800.00 nidle 4.00 saturated 0.00 busy 0.00
12/9/2020 5:12:59 AM | NumberFields@home | [work_fetch] share 0.000 blocked by project preferences
12/9/2020 5:12:59 AM |  | [work_fetch] --- state for NVIDIA GPU ---
12/9/2020 5:12:59 AM |  | [work_fetch] shortfall 259200.00 nidle 1.00 saturated 0.00 busy 0.00
12/9/2020 5:12:59 AM | NumberFields@home | [work_fetch] share 0.000 project is backed off  (resource backoff: 55.12, inc 600.00)
12/9/2020 5:12:59 AM |  | [work_fetch] --- state for Intel GPU ---
12/9/2020 5:12:59 AM |  | [work_fetch] shortfall 259200.00 nidle 1.00 saturated 0.00 busy 0.00
12/9/2020 5:12:59 AM | NumberFields@home | [work_fetch] share 0.000 no applications
12/9/2020 5:12:59 AM |  | [work_fetch] ------- end work fetch state -------
12/9/2020 5:12:59 AM | NumberFields@home | choose_project: scanning
12/9/2020 5:12:59 AM | NumberFields@home | can't fetch CPU: blocked by project preferences
12/9/2020 5:12:59 AM | NumberFields@home | can't fetch NVIDIA GPU: project is backed off
12/9/2020 5:12:59 AM | NumberFields@home | can't fetch Intel GPU: no applications

As you can see, the CPU share is indeed 0.0 since I've blocked it in the preferences, whereas the NVIDIA GPU share is 0.0 because of this "project is backed off" reason.
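As a side note for anyone reading these logs: the shortfall figures are consistent with the target work buffer multiplied by the number of idle instances. The Python sketch below only redoes that arithmetic with the values from the log. The function name `expected_shortfall` is made up for illustration, and the formula is an assumption inferred from these numbers, not something taken from the BOINC source.

```python
def expected_shortfall(min_buffer_sec, extra_buffer_sec, idle_instances):
    """Seconds of work needed to fill the buffer on every idle instance.

    Assumption: with nothing queued, the shortfall is simply
    (minimum buffer + additional buffer) * number of idle instances.
    """
    return (min_buffer_sec + extra_buffer_sec) * idle_instances


# From the log line "target work buffer: 86400.00 + 172800.00 sec"
MIN_BUFFER = 86400.0     # 1 day
EXTRA_BUFFER = 172800.0  # 2 days

gpu_shortfall = expected_shortfall(MIN_BUFFER, EXTRA_BUFFER, 1)  # nidle 1.00
cpu_shortfall = expected_shortfall(MIN_BUFFER, EXTRA_BUFFER, 4)  # nidle 4.00

print(gpu_shortfall)  # compare with "shortfall 259200.00" for the NVIDIA GPU
print(cpu_shortfall)  # compare with "shortfall 1036800.00" for the CPU
```

So the client is asking for a full three days of GPU work each time, which suggests the request itself is fine and the refusal happens on the server side.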

Basically the loop it's in is:
1) Request GPU tasks -> Denied, project is backed off (X), where X behaves like some time delay.
2) When time reaches 0, it retries
12/9/2020 5:13:59 AM | NumberFields@home | can't fetch CPU: blocked by project preferences
12/9/2020 5:13:59 AM | NumberFields@home | can fetch NVIDIA GPU <-------
12/9/2020 5:13:59 AM | NumberFields@home | NVIDIA GPU needs work - buffer low <-------
12/9/2020 5:13:59 AM | NumberFields@home | checking CPU
12/9/2020 5:13:59 AM | NumberFields@home | CPU can't fetch: blocked by project preferences
12/9/2020 5:13:59 AM | NumberFields@home | checking NVIDIA GPU
12/9/2020 5:13:59 AM | NumberFields@home | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 259200.00 <--------
12/9/2020 5:13:59 AM | NumberFields@home | NVIDIA GPU set_request: 259200.000000
12/9/2020 5:13:59 AM | NumberFields@home | Requesting new tasks for NVIDIA GPU <------
12/9/2020 5:14:01 AM | NumberFields@home | Scheduler request completed: got 0 new tasks <--------
12/9/2020 5:14:01 AM | NumberFields@home | Project requested delay of 31 seconds
12/9/2020 5:14:01 AM | NumberFields@home | [work_fetch] backing off NVIDIA GPU 1488 sec <--------

3) Repeat

So it does realize it lacks GPU work, and it does realize it has to fetch, but when it does so it gets none and returns to this "project is backed off" state for another half an hour or so.
I've tried to detach/re-attach and reset but to no avail.
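The loop described above can be spotted mechanically in a saved event log. Below is a minimal Python sketch, assuming the log lines look exactly like the ones pasted in this post; the function name `summarize` and the regular expressions are illustrative and not part of BOINC.

```python
import re

# Patterns based only on the log lines quoted in this thread.
REPLY_RE = re.compile(r"Scheduler request completed: got (?P<n>\d+) new tasks")
BACKOFF_RE = re.compile(
    r"\[work_fetch\] backing off (?P<resource>.+?) (?P<secs>\d+) sec"
)


def summarize(log_lines):
    """Return (tasks per scheduler reply, list of (resource, backoff seconds))."""
    replies = []
    backoffs = []
    for line in log_lines:
        match = REPLY_RE.search(line)
        if match:
            replies.append(int(match.group("n")))
            continue
        match = BACKOFF_RE.search(line)
        if match:
            backoffs.append((match.group("resource"), int(match.group("secs"))))
    return replies, backoffs


sample = [
    "12/9/2020 5:13:59 AM | NumberFields@home | Requesting new tasks for NVIDIA GPU <------",
    "12/9/2020 5:14:01 AM | NumberFields@home | Scheduler request completed: got 0 new tasks <--------",
    "12/9/2020 5:14:01 AM | NumberFields@home | Project requested delay of 31 seconds",
    "12/9/2020 5:14:01 AM | NumberFields@home | [work_fetch] backing off NVIDIA GPU 1488 sec <--------",
]

replies, backoffs = summarize(sample)
for resource, secs in backoffs:
    # Convert the backoff to minutes to make the waiting time easier to read.
    print(f"{resource}: backed off {secs} s (about {secs / 60:.1f} min)")
```

A run of zero-task replies, each followed by a backoff for the same resource, is the pattern seen here.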
Eric Driver
Message 2934 - Posted: 9 Dec 2020, 5:12:45 UTC - in response to Message 2933.  

So I dug a little deeper. According to this:
https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
GT 730M is compute capability 3.0
GT 730 is compute capability 3.5
The server says your card is GT 730M, but the client says you have compute capability 3.5 (from your event log). I think this inconsistency is the problem. I checked your host in the database and this is what it has:
[BOINC|7.16.11][CUDA|GeForce GT 730M|1|1024MB|42531|102][INTEL|Intel(R) HD Graphics 4600|1|1629MB||102][vbox|6.1.12|1|1]

No mention of the compute capability, but it does say GT 730M. I'm not sure how it's determining the compute capability, but it's acting as if it's 3.0, and that's the reason it's not giving you work. This is my best guess at this point. I will dig deeper tomorrow, when I have more time.
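To make the host record above easier to read, here is a small Python sketch that splits it into its bracketed entries. The helper name `parse_host_descriptor` is hypothetical, and reading `42531` as driver version 425.31 is an assumption based on the driver version shown in the event log earlier in the thread.

```python
import re


def parse_host_descriptor(text):
    """Split '[A|b|c][D|e]' into {'A': ['b', 'c'], 'D': ['e']}."""
    entries = {}
    for chunk in re.findall(r"\[([^\]]*)\]", text):
        kind, *fields = chunk.split("|")
        entries[kind] = fields
    return entries


descriptor = (
    "[BOINC|7.16.11][CUDA|GeForce GT 730M|1|1024MB|42531|102]"
    "[INTEL|Intel(R) HD Graphics 4600|1|1629MB||102][vbox|6.1.12|1|1]"
)

host = parse_host_descriptor(descriptor)
print(host["CUDA"])

# Assumed encoding: the fourth CUDA field is the driver version without the dot.
driver = host["CUDA"][3]
print(f"driver {driver[:3]}.{driver[3:]}")

# None of the CUDA fields looks like a compute capability such as "3.0" or "3.5".
print(any(field in ("3.0", "3.5") for field in host["CUDA"]))
```

This only confirms what is said above: the record carries the model name, count, memory and driver, but nothing that directly states the compute capability.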
nCtrl

Message 2936 - Posted: 9 Dec 2020, 15:39:40 UTC - in response to Message 2934.  
Last modified: 9 Dec 2020, 15:41:07 UTC

I don't think that page is correct. I've run the deviceQuery tool from the CUDA Toolkit and this is the result:


Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 730M"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            758 MHz (0.76 GHz)
  Memory Clock rate:                             1001 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS


So it looks like it is indeed compute capability 3.5. Thanks for looking into this.
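For anyone who wants to compare their own card against what a project expects, the relevant values can be pulled out of the deviceQuery text with a few lines of Python. This is only a sketch that assumes the output format shown above; `parse_device_query` is a made-up helper name.

```python
import re


def parse_device_query(output):
    """Extract compute capability and core counts from deviceQuery text."""
    cap = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)", output
    )
    cores = re.search(
        r"\(\s*(\d+)\) Multiprocessors, \(\s*(\d+)\) CUDA Cores/MP:\s*(\d+) CUDA Cores",
        output,
    )
    return {
        "compute_capability": (int(cap.group(1)), int(cap.group(2))) if cap else None,
        "multiprocessors": int(cores.group(1)) if cores else None,
        "cores_per_mp": int(cores.group(2)) if cores else None,
        "total_cores": int(cores.group(3)) if cores else None,
    }


# Two lines copied from the deviceQuery output above.
sample_output = """\
  CUDA Capability Major/Minor version number:    3.5
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
"""

info = parse_device_query(sample_output)
print(info["compute_capability"])
# Sanity check: multiprocessors * cores per multiprocessor should equal the total.
print(info["multiprocessors"] * info["cores_per_mp"] == info["total_cores"])
```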
Eric Driver
Message 2937 - Posted: 9 Dec 2020, 18:37:09 UTC - in response to Message 2936.  

That web page could indeed be wrong. But just as a test, I changed the plan class to allow compute capability 3.0 or higher. You may need to click the update button to see the changes.

The scheduler logs show this:
Sending reply to [HOST#2782766]: 0 results, delay req 31.00
This is uninformative, but is consistent with what you are seeing. I increased the debug level to see if it can give any more useful information.
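To illustrate why lowering the plan class threshold could matter, here is a rough Python sketch of a minimum compute capability gate. Everything in it is an assumption for illustration: the function names are hypothetical, the `major * 100 + minor` encoding is assumed, and `OLD_MIN` is a stand-in, since the previous threshold is not stated in this thread. It is not the actual scheduler code.

```python
def encode_compcap(major, minor):
    """Encode a compute capability as major * 100 + minor (assumed encoding)."""
    return major * 100 + minor


def passes_min_compcap(major, minor, min_compcap):
    """True if the card's compute capability meets the plan class minimum."""
    return encode_compcap(major, minor) >= min_compcap


# Hypothetical thresholds. OLD_MIN is illustrative only; NEW_MIN reflects
# "compute capability 3.0 or higher" as described above.
OLD_MIN = encode_compcap(3, 5)
NEW_MIN = encode_compcap(3, 0)

# A card that the server treats as 3.0 is rejected by the stricter gate...
print(passes_min_compcap(3, 0, OLD_MIN))
# ...but accepted once the minimum is lowered to 3.0.
print(passes_min_compcap(3, 0, NEW_MIN))
# A card treated as 3.5 passes either way.
print(passes_min_compcap(3, 5, OLD_MIN), passes_min_compcap(3, 5, NEW_MIN))
```

If the host starts receiving work only after the threshold is lowered, that would be consistent with the server treating the card as 3.0 rather than 3.5.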
nCtrl

Message 2940 - Posted: 9 Dec 2020, 19:16:58 UTC - in response to Message 2937.  

That somehow seems to have done it: I've received GPU tasks. Thank you very much!
Eric Driver
Message 2941 - Posted: 9 Dec 2020, 19:23:50 UTC - in response to Message 2940.  

nCtrl wrote: "That somehow seems to have done it, I've received gpu tasks. Thank you very much!"


Yes, the scheduler log now shows results being sent.

Somehow BOINC thought your compcap was 3.0 even though deviceQuery showed it as 3.5. There were problems in the past with compcap 3.0, so we should keep an eye on this, just in case deviceQuery is wrong and your compcap is not what you think it is.



Copyright © 2024 Arizona State University