GPU not getting any tasks

nCtrl Joined: 23 Nov 20 Posts: 4 Credit: 524,116 RAC: 0	Message 2931 - Posted: 8 Dec 2020, 21:19:54 UTC Last modified: 8 Dec 2020, 22:14:21 UTC

I've been trying to get GPU tasks all evening, but the scheduler always returns:

12/8/2020 9:06:18 PM \| NumberFields@home \| Requesting new tasks for NVIDIA GPU
12/8/2020 9:06:20 PM \| NumberFields@home \| Scheduler request completed: got 0 new tasks

According to the event log, the GPU is detected:

12/8/2020 8:42:51 PM \| \| CUDA: NVIDIA GPU 0: GeForce GT 730M (driver version 425.31, CUDA version 10.1, compute capability 3.5, 1024MB, 824MB available, 582 GFLOPS peak)
12/8/2020 8:42:51 PM \| \| OpenCL: NVIDIA GPU 0: GeForce GT 730M (driver version 425.31, device version OpenCL 1.2 CUDA, 1024MB, 824MB available, 582 GFLOPS peak)

From what I've read in similar topics on this forum, the driver version is OK, and OpenCL, CUDA and compute capability all meet the requirements. The OS is:

12/8/2020 8:42:51 PM \| \| OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19042.00)

Any clue as to why I'm not receiving any?

EDIT: After some digging in the debug output, I found this (I've removed the unnecessary bits):

12/8/2020 10:54:12 PM \| \| [work_fetch] ------- start work fetch state -------
12/8/2020 10:54:12 PM \| \| [work_fetch] target work buffer: 86400.00 + 172800.00 sec
12/8/2020 10:54:12 PM \| \| [work_fetch] --- state for NVIDIA GPU ---
12/8/2020 10:54:12 PM \| \| [work_fetch] shortfall 259200.00 nidle 1.00 saturated 0.00 busy 0.00
12/8/2020 10:54:12 PM \| NumberFields@home \| [work_fetch] share 1.000
12/8/2020 10:54:12 PM \| \| [work_fetch] ------- end work fetch state -------
12/8/2020 10:54:12 PM \| NumberFields@home \| choose_project: scanning
12/8/2020 10:54:12 PM \| NumberFields@home \| NVIDIA GPU needs work - buffer low
12/8/2020 10:54:12 PM \| NumberFields@home \| checking NVIDIA GPU
12/8/2020 10:54:12 PM \| NumberFields@home \| [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 259200.00
12/8/2020 10:54:12 PM \| NumberFields@home \| NVIDIA GPU set_request: 259199.999999
12/8/2020 10:54:12 PM \| NumberFields@home \| [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (259200.00 sec, 1.00 inst) Intel GPU (0.00 sec, 0.00 inst)
12/8/2020 10:54:12 PM \| NumberFields@home \| Sending scheduler request: To fetch work.
12/8/2020 10:54:12 PM \| NumberFields@home \| Requesting new tasks for NVIDIA GPU
12/8/2020 10:54:12 PM \| NumberFields@home \| [sched_op] NVIDIA GPU work request: 259200.00 seconds; 1.00 devices
12/8/2020 10:54:14 PM \| NumberFields@home \| Scheduler request completed: got 0 new tasks
12/8/2020 10:54:14 PM \| NumberFields@home \| Project requested delay of 31 seconds
12/8/2020 10:54:14 PM \| NumberFields@home \| [work_fetch] backing off NVIDIA GPU 2191 sec
[...]
12/8/2020 11:05:54 PM \| NumberFields@home \| can't fetch NVIDIA GPU: project is backed off

For some reason it seems I'm being denied the WUs and put on a 30-minute waiting loop over and over (which is what I'm guessing "project is backed off" means), unless the reason is that you ran out of NVIDIA work units. (Somebody with an AMD GPU tried at the same time and got tasks, so whatever this is, it looks NVIDIA-specific.)

Eric Driver Project administrator Project developer Project tester Project scientist Joined: 8 Jul 11 Posts: 1434 Credit: 803,309,589 RAC: 894,380	Message 2932 - Posted: 8 Dec 2020, 22:22:21 UTC - in response to Message 2931.

Just above the lines you boldfaced I see this:

[work_fetch] share 0.000 blocked by project preferences

So it looks like something is set wrong in your project preferences. Do you have "Use NVIDIA GPU" unchecked or the "resource share" set to zero?

nCtrl Joined: 23 Nov 20 Posts: 4 Credit: 524,116 RAC: 0	Message 2933 - Posted: 9 Dec 2020, 4:26:43 UTC - in response to Message 2932. Last modified: 9 Dec 2020, 4:36:03 UTC

Resource share is 100, and "use NVIDIA GPU" is indeed checked. What you see must be for the CPU: the setting "use CPU" is unchecked in my preferences, as I want to receive only GPU tasks on this computer. Maybe I trimmed too much out of the log:

12/9/2020 5:12:59 AM \| \| [work_fetch] ------- start work fetch state -------
12/9/2020 5:12:59 AM \| \| [work_fetch] target work buffer: 86400.00 + 172800.00 sec
12/9/2020 5:12:59 AM \| \| [work_fetch] --- project states ---
12/9/2020 5:12:59 AM \| NumberFields@home \| [work_fetch] REC 0.000 prio -0.000 can request work
12/9/2020 5:12:59 AM \| \| [work_fetch] --- state for CPU ---
12/9/2020 5:12:59 AM \| \| [work_fetch] shortfall 1036800.00 nidle 4.00 saturated 0.00 busy 0.00
12/9/2020 5:12:59 AM \| NumberFields@home \| [work_fetch] share 0.000 blocked by project preferences
12/9/2020 5:12:59 AM \| \| [work_fetch] --- state for NVIDIA GPU ---
12/9/2020 5:12:59 AM \| \| [work_fetch] shortfall 259200.00 nidle 1.00 saturated 0.00 busy 0.00
12/9/2020 5:12:59 AM \| NumberFields@home \| [work_fetch] share 0.000 project is backed off (resource backoff: 55.12, inc 600.00)
12/9/2020 5:12:59 AM \| \| [work_fetch] --- state for Intel GPU ---
12/9/2020 5:12:59 AM \| \| [work_fetch] shortfall 259200.00 nidle 1.00 saturated 0.00 busy 0.00
12/9/2020 5:12:59 AM \| NumberFields@home \| [work_fetch] share 0.000 no applications
12/9/2020 5:12:59 AM \| \| [work_fetch] ------- end work fetch state -------
12/9/2020 5:12:59 AM \| NumberFields@home \| choose_project: scanning
12/9/2020 5:12:59 AM \| NumberFields@home \| can't fetch CPU: blocked by project preferences
12/9/2020 5:12:59 AM \| NumberFields@home \| can't fetch NVIDIA GPU: project is backed off
12/9/2020 5:12:59 AM \| NumberFields@home \| can't fetch Intel GPU: no applications

As you can see, the CPU share is indeed 0.0 since I've blocked it in the preferences, whereas the NVIDIA GPU share is 0.0 because of this "project is backed off" reason. Basically the loop it's in is:

1) Request GPU tasks -> Denied, project is backed off (X), where X behaves like some time delay.
2) When the time reaches 0, it retries:

12/9/2020 5:13:59 AM \| NumberFields@home \| can't fetch CPU: blocked by project preferences
12/9/2020 5:13:59 AM \| NumberFields@home \| can fetch NVIDIA GPU <-------
12/9/2020 5:13:59 AM \| NumberFields@home \| NVIDIA GPU needs work - buffer low <-------
12/9/2020 5:13:59 AM \| NumberFields@home \| checking CPU
12/9/2020 5:13:59 AM \| NumberFields@home \| CPU can't fetch: blocked by project preferences
12/9/2020 5:13:59 AM \| NumberFields@home \| checking NVIDIA GPU
12/9/2020 5:13:59 AM \| NumberFields@home \| [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 259200.00 <--------
12/9/2020 5:13:59 AM \| NumberFields@home \| NVIDIA GPU set_request: 259200.000000
12/9/2020 5:13:59 AM \| NumberFields@home \| Requesting new tasks for NVIDIA GPU <------
12/9/2020 5:14:01 AM \| NumberFields@home \| Scheduler request completed: got 0 new tasks <--------
12/9/2020 5:14:01 AM \| NumberFields@home \| Project requested delay of 31 seconds
12/9/2020 5:14:01 AM \| NumberFields@home \| [work_fetch] backing off NVIDIA GPU 1488 sec <--------

3) Repeat

So it does realize it lacks GPU work, and it does realize it has to fetch, but when it does so it gets none and returns to this "project is backed off" state for another half an hour. I've tried detaching/re-attaching and resetting the project, but to no avail.
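
The request/deny/back-off loop described above can be pulled out of a pasted event log mechanically. The following is a minimal sketch, not part of BOINC: the function name `summarize_fetch_cycles` and the dictionary keys are hypothetical, and the patterns are taken only from the message texts quoted in this thread, so other client versions may word them differently.

```python
import re

# Message fragments as they appear in the log excerpts in this thread.
# A trailing "<-----" annotation (added by hand in the post) is ignored.
REQUEST_RE = re.compile(r"Requesting new tasks for (.+?)(?:\s*<-+)?$")
REPLY_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")
BACKOFF_RE = re.compile(r"\[work_fetch\] backing off (.+) (\d+) sec")


def summarize_fetch_cycles(lines):
    """Return one dict per work request: resource, tasks received, back-off."""
    cycles = []
    current = None
    for line in lines:
        line = line.strip()
        match = REQUEST_RE.search(line)
        if match:
            # A new request starts a new cycle.
            current = {"resource": match.group(1), "tasks": None, "backoff_sec": None}
            cycles.append(current)
            continue
        if current is None:
            continue  # skip anything logged before the first request
        match = REPLY_RE.search(line)
        if match:
            current["tasks"] = int(match.group(1))
            continue
        match = BACKOFF_RE.search(line)
        if match:
            current["backoff_sec"] = int(match.group(2))
    return cycles


# Lines copied from the second retry in this thread (plain "|" separators).
SAMPLE_LOG = [
    "12/9/2020 5:13:59 AM | NumberFields@home | Requesting new tasks for NVIDIA GPU <------",
    "12/9/2020 5:14:01 AM | NumberFields@home | Scheduler request completed: got 0 new tasks",
    "12/9/2020 5:14:01 AM | NumberFields@home | Project requested delay of 31 seconds",
    "12/9/2020 5:14:01 AM | NumberFields@home | [work_fetch] backing off NVIDIA GPU 1488 sec",
]

if __name__ == "__main__":
    for cycle in summarize_fetch_cycles(SAMPLE_LOG):
        print(cycle)
```

A cycle with `tasks` equal to 0 followed by a non-empty `backoff_sec` is the pattern reported here: the client asks, the scheduler sends nothing, and the client waits before asking again.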

Eric Driver Project administrator Project developer Project tester Project scientist Joined: 8 Jul 11 Posts: 1434 Credit: 803,309,589 RAC: 894,380	Message 2934 - Posted: 9 Dec 2020, 5:12:45 UTC - in response to Message 2933.

So I dug a little deeper. According to this:
https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications

GT 730M is compute capability 3.0
GT 730 is compute capability 3.5

The server says your card is a GT 730M, but the client says you have compute capability 3.5 (from your event log). I think this inconsistency is the problem. I checked your host in the database and this is what it has:

[BOINC\|7.16.11][CUDA\|GeForce GT 730M\|1\|1024MB\|42531\|102][INTEL\|Intel(R) HD Graphics 4600\|1\|1629MB\|\|102][vbox\|6.1.12\|1\|1]

No mention of the compute capability, but it does say GT 730M. I'm not sure how it's determining the compute capability, but it's acting as if it's 3.0, and that's the reason it's not giving you work. This is my best guess at this point. I will dig deeper tomorrow, when I have more time.
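
To make the host record quoted above easier to read, here is a small sketch that splits it into its bracketed groups. The function name `parse_host_descriptor` is hypothetical, and the field names in `COPROC_FIELDS` are assumptions inferred by matching the values against the event log (driver 425.31, OpenCL 1.2); they are not confirmed against the BOINC database schema.

```python
import re

# ASSUMED meaning of the values after the type tag in a GPU group.
COPROC_FIELDS = ("model", "count", "memory", "driver_version", "opencl_version")


def parse_host_descriptor(descriptor):
    """Split "[A|b|c][D|e]" into {"A": ["b", "c"], "D": ["e"]}."""
    # The forum paste escapes the separators as "\|"; undo that first.
    descriptor = descriptor.replace("\\|", "|")
    entries = {}
    for group in re.findall(r"\[([^\[\]]*)\]", descriptor):
        kind, *fields = group.split("|")
        entries[kind] = fields
    return entries


# The record exactly as quoted in the thread, with plain "|" separators.
HOST_RECORD = (
    "[BOINC|7.16.11]"
    "[CUDA|GeForce GT 730M|1|1024MB|42531|102]"
    "[INTEL|Intel(R) HD Graphics 4600|1|1629MB||102]"
    "[vbox|6.1.12|1|1]"
)

if __name__ == "__main__":
    parsed = parse_host_descriptor(HOST_RECORD)
    # Pair the CUDA values with the assumed field names for display.
    for name, value in zip(COPROC_FIELDS, parsed["CUDA"]):
        print(f"{name}: {value}")
```

The CUDA group carries five values and none of them is a compute capability, which matches the observation that the stored record cannot by itself settle whether the card is 3.0 or 3.5.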

nCtrl Joined: 23 Nov 20 Posts: 4 Credit: 524,116 RAC: 0	Message 2936 - Posted: 9 Dec 2020, 15:39:40 UTC - in response to Message 2934. Last modified: 9 Dec 2020, 15:41:07 UTC

I don't think that page is correct. I ran the deviceQuery tool from the CUDA toolkit and this is the result:

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 730M"
  CUDA Driver Version / Runtime Version 10.1 / 10.1
  CUDA Capability Major/Minor version number: 3.5
  Total amount of global memory: 1024 MBytes (1073741824 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
  GPU Max Clock rate: 758 MHz (0.76 GHz)
  Memory Clock rate: 1001 Mhz
  Memory Bus Width: 64-bit
  L2 Cache Size: 524288 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA): Yes
  Device supports Compute Preemption: No
  Supports Cooperative Kernel Launch: No
  Supports MultiDevice Co-op Kernel Launch: No
  Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
  Compute Mode:
  < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

So it looks like it is indeed compute capability 3.5. Thanks for looking into this.

Eric Driver Project administrator Project developer Project tester Project scientist Joined: 8 Jul 11 Posts: 1434 Credit: 803,309,589 RAC: 894,380	Message 2937 - Posted: 9 Dec 2020, 18:37:09 UTC - in response to Message 2936.

That web page could indeed be wrong. But just as a test, I changed the plan class to allow compute capability 3.0 or higher. You may need to click the update button to see the changes.

The scheduler logs show this:

Sending reply to [HOST#2782766]: 0 results, delay req 31.00

This is uninformative, but it is consistent with what you are seeing. I increased the debug level to see if it can give any more useful information.
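
As an illustration of why lowering the plan-class minimum can change the outcome, here is a sketch of a minimum-compute-capability test. Everything in it is an assumption made for illustration: the thread does not say what the previous minimum was (3.5 is used below only as a hypothetical stricter value), and encoding the capability as major * 100 + minor is an assumed convention, not something the thread confirms about the server.

```python
def encode_compcap(major, minor):
    """Encode a compute capability as one integer (ASSUMED: major * 100 + minor)."""
    return major * 100 + minor


def meets_min_compcap(major, minor, min_compcap):
    """True if the device's capability is at least the plan-class minimum."""
    return encode_compcap(major, minor) >= min_compcap


# Hypothetical thresholds: the stricter one is a guess, the relaxed one
# corresponds to "compute capability 3.0 or higher" from the post above.
STRICT_MIN = encode_compcap(3, 5)
RELAXED_MIN = encode_compcap(3, 0)

if __name__ == "__main__":
    # A card the server treats as 3.0, and the same card treated as 3.5.
    for major, minor in [(3, 0), (3, 5)]:
        print(
            f"{major}.{minor}:",
            "strict" if meets_min_compcap(major, minor, STRICT_MIN) else "-",
            "relaxed" if meets_min_compcap(major, minor, RELAXED_MIN) else "-",
        )
```

Under these assumptions, a host treated as 3.0 fails the stricter test and passes the relaxed one, while a host treated as 3.5 passes both. That is consistent with the guess that the server was acting as if the card were 3.0.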

nCtrl Joined: 23 Nov 20 Posts: 4 Credit: 524,116 RAC: 0	Message 2940 - Posted: 9 Dec 2020, 19:16:58 UTC - in response to Message 2937.

That somehow seems to have done it: I've received GPU tasks. Thank you very much!

Eric Driver Project administrator Project developer Project tester Project scientist Joined: 8 Jul 11 Posts: 1434 Credit: 803,309,589 RAC: 894,380	Message 2941 - Posted: 9 Dec 2020, 19:23:50 UTC - in response to Message 2940.

"That somehow seems to have done it, I've received gpu tasks. Thank you very much!"

Yes, the scheduler log now shows results being sent. Somehow BOINC thought your compute capability was 3.0 even though deviceQuery showed it as 3.5. There were problems in the past with compute capability 3.0, so we should keep an eye on this, just in case deviceQuery is wrong and your compute capability is not what you think it is.