Support for Intel GPUs

JagDoc

Joined: 7 Sep 13
Posts: 2
Credit: 14,111,742
RAC: 6,179
Message 3159 - Posted: 18 Nov 2021, 22:00:06 UTC


Vato

Joined: 4 Dec 19
Posts: 6
Credit: 11,698,585
RAC: 71,264
Message 3160 - Posted: 19 Nov 2021, 1:46:48 UTC

Aborted both of mine after 9+ hours with no sign of any work being done or any actual GPU usage

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3161 - Posted: 19 Nov 2021, 5:30:54 UTC - in response to Message 3158.  

If this will help, it ran for 2.5 hours before I aborted all wus.

https://numberfields.asu.edu/NumberFields/result.php?resultid=124654535


If I understand this correctly, it hung while trying to compile the openCL code, and the debug dump occurred as a result of aborting it.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3162 - Posted: 19 Nov 2021, 5:39:02 UTC - in response to Message 3159.  

First wu finished after 3h36m:
https://numberfields.asu.edu/NumberFields/result.php?resultid=124655746


Well, at least it finished. But the run time suggests either a bad build of the OpenCL code or a GPU that is not powerful enough for this app. This is the same thing we saw with older Nvidia/AMD cards. Contention may also be playing a role, for example if other things were running on the GPU at the same time or all the CPU cores were maxed out (recall that the app needs a small portion of a CPU core).

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3163 - Posted: 19 Nov 2021, 5:41:52 UTC - in response to Message 3160.  

Looking at the stderr.txt, it hung while trying to build the openCL, the same problem others are having.

Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,470,702
RAC: 128,691
Message 3164 - Posted: 19 Nov 2021, 6:22:17 UTC - in response to Message 3162.  

I've run intel_gpu tasks under BOINC for a number of projects - originally SETI@Home, and currently both Einstein and WCG/Covid. All my iGPU experience is under Windows.

Contention certainly needs to be considered carefully. All GPU applications require some degree of CPU support while running, but the OpenCL language running under Windows on an iGPU is an outlier. It doesn't need much CPU time, but it wants CPU access FAST. That CPU support can be made available in two different ways: either by ensuring that a CPU core is kept free of any continuously running BOINC-style workload (which is wasteful), or - as first recognised at Einstein - by setting the CPU part of the iGPU application to run at real-time priority.

!!!

That latter suggestion clearly has to be handled with care by experienced users, but it works for me. It perhaps gives some insight into what's going on under the hood. Unfortunately, the iGPU is the ugly sister of the GPGPU programming world, and I don't know of anyone who has really got to grips with programming it properly under BOINC - too little time, too much to do.

Drago75

Joined: 6 Feb 21
Posts: 4
Credit: 3,337,063
RAC: 0
Message 3165 - Posted: 19 Nov 2021, 8:13:42 UTC

My HD620 finished 3 WUs last night while the tablet was not being used for anything. The CPU cores were also free. Run times varied between 2h16m and 3h45m.

Dirk Broer

Joined: 15 Aug 12
Posts: 10
Credit: 4,211,960
RAC: 9,336
Message 3166 - Posted: 20 Nov 2021, 0:12:52 UTC - in response to Message 3139.  

I just added support for Intel GPUs. There are 2 versions - 64 bit linux and 64 bit windows, both require openCL 1.1 or higher.


My WUs error out on the glibc version (2.27) of Linux Mint 19.3 (the OpenCL used is 2.0)

<stderr_txt>
../../projects/numberfields.asu.edu_NumberFields/GetDecics_4.02_x86_64-pc-linux-gnu__opencl_intel: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ../../projects/numberfields.asu.edu_NumberFields/GetDecics_4.02_x86_64-pc-linux-gnu__opencl_intel)
../../projects/numberfields.asu.edu_NumberFields/GetDecics_4.02_x86_64-pc-linux-gnu__opencl_intel: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ../../projects/numberfields.asu.edu_NumberFields/GetDecics_4.02_x86_64-pc-linux-gnu__opencl_intel)
../../projects/numberfields.asu.edu_NumberFields/GetDecics_4.02_x86_64-pc-linux-gnu__opencl_intel: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ../../projects/numberfields.asu.edu_NumberFields/GetDecics_4.02_x86_64-pc-linux-gnu__opencl_intel)
</stderr_txt>

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3167 - Posted: 20 Nov 2021, 17:52:14 UTC

Update on the glibc problem (only applies to linux users):

The root cause of the problem is that I upgraded my build system in the last year and now have a newer version of glibc. Anyone using a version of glibc older than 2.33 will have this problem. The problem seems to be with a new symbol "fstat" which was not even in the old binaries (according to objdump).

I think the solution is for me to downgrade my system to an earlier version. This will break other things for me, so I am hesitant to do this in the near term. Since this app is a beta version and most runtimes take over 2 hours on the UHD 620/630 cards and hang on lesser cards, it seems like this app will not get much use until the newer intel cards come out in 2022. Hopefully by then I can improve my build system, so we can stop having these compatibility issues.
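
For anyone who wants to check a host against a failed task, here is a minimal sketch that pulls the required glibc versions out of loader errors like the ones quoted earlier in this thread and compares the highest one with a host version. The function names are illustrative only and are not part of the app or of BOINC; the host version is passed in as a string (on Linux it could be read with Python's `platform.libc_ver()`).

```python
import re

def parse_version(text):
    """Turn a dotted version string such as '2.27' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def required_glibc(stderr_text):
    """Return the highest GLIBC_x.y named in "not found" loader errors, or None."""
    pattern = r"version `GLIBC_(\d+(?:\.\d+)+)' not found"
    versions = [parse_version(m.group(1)) for m in re.finditer(pattern, stderr_text)]
    return max(versions) if versions else None

def host_is_new_enough(host_version, stderr_text):
    """True if the host glibc satisfies every version named in the errors."""
    needed = required_glibc(stderr_text)
    return needed is None or parse_version(host_version) >= needed

# Shortened copies of the three error lines reported above.
sample = (
    "libm.so.6: version `GLIBC_2.29' not found\n"
    "libc.so.6: version `GLIBC_2.33' not found\n"
    "libc.so.6: version `GLIBC_2.32' not found\n"
)

print(required_glibc(sample))              # highest version the binary asks for
print(host_is_new_enough("2.27", sample))  # Linux Mint 19.3 case from this thread
```

Versions are compared as integer tuples rather than strings, so that a hypothetical 2.9 would not sort above 2.33.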

Pag

Joined: 21 Mar 17
Posts: 1
Credit: 166,200
RAC: 0
Message 3168 - Posted: 21 Nov 2021, 14:18:27 UTC

All my WUs finished with the error "Error during calculations".

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3169 - Posted: 21 Nov 2021, 16:41:13 UTC - in response to Message 3168.  

According to the stderr, it failed to build the OpenCL code. At least it exited gracefully instead of the OpenCL compiler hanging indefinitely, which sometimes happens for others.

kotenok2000

Joined: 27 Dec 19
Posts: 6
Credit: 622,748
RAC: 135
Message 3170 - Posted: 21 Nov 2021, 22:40:25 UTC

It didn't find my nvidia gpu
GPU Summary String = [CUDA|NVIDIAGeForceGTX1650|1|4095MB|49676|300].
Loading GPU lookup table from file.
GPU was not found in the lookup table. Using default values:
numBlocks = 1024.
threadsPerBlock = 32.
polyBufferSize = 32768.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3171 - Posted: 22 Nov 2021, 0:16:47 UTC - in response to Message 3170.  

That's because it's not in the lookup table. But no worries, performance should still be relatively good with the default values.
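
As a rough illustration of the fallback behaviour shown in that log, the sketch below looks a card up by name and falls back to the defaults when there is no entry. The default values are the ones printed in the log; the table entry, the function names, and the assumption that the card name is the second "|"-separated field of the summary string are all hypothetical and not taken from the app's source.

```python
# Defaults as printed in the stderr output quoted in this thread.
DEFAULTS = {"numBlocks": 1024, "threadsPerBlock": 32, "polyBufferSize": 32768}

# Hypothetical entry for illustration only; not from the project's real table.
LOOKUP_TABLE = {
    "ExampleGPU": {"numBlocks": 2048, "threadsPerBlock": 64, "polyBufferSize": 65536},
}

def gpu_name(summary):
    """Extract the card name, assumed to be the second '|'-separated field."""
    fields = summary.strip().strip("[].").split("|")
    return fields[1]

def settings_for(summary, table=LOOKUP_TABLE):
    """Return (settings, found): the table entry if present, else the defaults."""
    name = gpu_name(summary)
    found = name in table
    return dict(table.get(name, DEFAULTS)), found

settings, found = settings_for("[CUDA|NVIDIAGeForceGTX1650|1|4095MB|49676|300].")
print(found)     # whether the card had its own entry
print(settings)  # the settings that would be used
```

A missing entry is therefore not an error: the task still runs, just with generic settings rather than ones tuned for that card.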

Bill F

Joined: 27 Sep 21
Posts: 11
Credit: 1,862,316
RAC: 3,533
Message 3172 - Posted: 22 Nov 2021, 0:56:20 UTC - in response to Message 3171.  

The NVIDIA GeForce GTX 1650 (NVIDIAGeForceGTX1650|1|4095MB|49676|300) is a fairly mid-range GPU that I believe has been around for a while. Do the project developers have to build the entire lookup table themselves, or are there common industry tables available?

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3173 - Posted: 22 Nov 2021, 2:57:22 UTC - in response to Message 3172.  

The settings in the lookup table are unique to this project, so yes it would have to be built up manually. The default settings should be pretty good for most cards. The idea was that over time some entries could be added to the lookup table to improve performance on specific cards.

Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,470,702
RAC: 128,691
Message 3174 - Posted: 22 Nov 2021, 12:51:55 UTC

I'm slightly surprised that you've felt the need to research and supply additional data about GPU compute devices. Shouldn't that be made available to you and all other application developers, either by the BOINC platform, or from the manufacturer's driver tools?

I've looked into what is made available to you by BOINC / OpenCL for my iGPUs.

field                        coproc_info.xml            sched_request_...xml
name                         Intel(R) HD Graphics 4600  Intel(R) HD Graphics 4600
vendor                       Intel(R) Corporation       Intel(R) Corporation
vendor_id                    32902                      32902
available                    1                          1
half_fp_config               0                          0
single_fp_config             158                        158
double_fp_config             0                          0
endian_little                1                          1
execution_capabilities       1                          1
global_mem_size              1360632218                 1360632218
local_mem_size               65536                      65536
max_clock_frequency          400                        400
max_compute_units            20                         20
nv_compute_capability_major  0                          0
nv_compute_capability_minor  0                          0
amd_simd_per_compute_unit    0                          0
amd_simd_width               0                          0
amd_simd_instruction_width   0                          0
opencl_platform_version      OpenCL 1.2                 OpenCL 1.2
opencl_device_version        OpenCL 1.2                 OpenCL 1.2
opencl_driver_version        10.18.14.5162              10.18.14.5162
device_num                   0
peak_flops                   64000000000
opencl_available_ram         1360632218
opencl_device_index          0
warn_bad_cuda                0

That's a pretty eclectic list. The first column is taken from an internal BOINC file called 'coproc_info.xml', and the second from the sched_request_...xml file sent by our clients to your servers. I don't know why the nv and amd fields are present in the iGPU report - presumably it was simpler to use a common data structure across all vendors.

Anyway, numBlocks, threadsPerBlock and polyBufferSize are clearly all missing. How important are they, where can we get them from, and should we ask the BOINC developers to do the heavy lifting?

Dirk Broer

Joined: 15 Aug 12
Posts: 10
Credit: 4,211,960
RAC: 9,336
Message 3175 - Posted: 22 Nov 2021, 17:01:47 UTC - in response to Message 3167.  

Forget (most) Linux users then, as the repositories of the most commonly used distros/versions go no further than glibc 2.31.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3176 - Posted: 22 Nov 2021, 17:55:53 UTC - in response to Message 3174.  

I think there is some confusion here on the purpose of the lookup table. Let me attempt to clear this up. The parameters "numBlocks" and "threadsPerBlock" are CUDA terminology and describe how to break up the data on the GPU. There is different but equivalent terminology for openCL (I forget the exact names, but it's not important). The optimal parameters are somewhat dependent on the algorithm itself, so the only way to determine them is to run several WUs at different settings to see which is best. The idea was that willing volunteers could do this analysis on their specific card and then report back the optimal setting to be added to the lookup table, with the hope that anybody with that same card would see the same improvement.

For those interested, the testing process is described here. This can be a tedious process, but several volunteers have helped, and their results have led to several of the entries in the table. Note that improvement over the default settings is usually less than 10%, so don't expect to see huge performance gains.
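
To give a feel for the bookkeeping involved, here is a minimal sketch of how a volunteer might compare timings across settings and work out the gain over the defaults. The timings are made up purely for illustration, and the function names are hypothetical; this is not the project's actual test procedure.

```python
from statistics import mean

def best_setting(timings):
    """timings maps (numBlocks, threadsPerBlock) to a list of run times in seconds.

    Returns the setting with the lowest average run time, plus all averages.
    """
    averages = {setting: mean(runs) for setting, runs in timings.items()}
    return min(averages, key=averages.get), averages

def improvement_percent(averages, default, best):
    """Percentage reduction in average run time relative to the default setting."""
    return 100.0 * (averages[default] - averages[best]) / averages[default]

# Made-up run times (seconds) for a few WUs at each setting; illustration only.
timings = {
    (1024, 32): [1000.0, 1020.0],  # the default numBlocks / threadsPerBlock
    (2048, 32): [950.0, 970.0],
    (1024, 64): [1005.0, 1035.0],
}

best, averages = best_setting(timings)
gain = improvement_percent(averages, (1024, 32), best)
print(best)
print(f"{gain:.1f}% faster than the default")
```

Averaging over several WUs per setting matters because individual run times vary from one WU to the next.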

Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,470,702
RAC: 128,691
Message 3177 - Posted: 26 Nov 2021, 15:46:54 UTC

My UHD 620 laptop survived the transition to Windows 11 (and preserved the BOINC v7.16.20 installation intact, somewhat to my surprise) - details at host 1232275.

I've put through task 125207453 as a test: it returned a valid result in just under two hours, so there's another data point. I've commented with concern at other projects that their invocation of the -cl-mad-enable OpenCL compiler flag has led to validation errors, especially on this machine: but I would guess that the fused multiply + add opcode is more likely to cause problems with floating point arithmetic, rather than here. Still, I mention it just in case...

This machine is an ultraportable, and after an earlier Windows 10 update, the fan only operates at zero or full speed: that doesn't sound good. I'll keep it in reserve and available for testing, but I won't be running it full time.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,932,398
RAC: 289,636
Message 3178 - Posted: 26 Nov 2021, 17:25:23 UTC - in response to Message 3177.  

Yes, we only use integer ops here.
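
A small sketch of why that distinction matters, using Python doubles as a stand-in for GPU floating point (an assumption for illustration; this is not the app's code). A fused multiply-add rounds once, while separate multiply and add round twice, so the two can disagree. Integer arithmetic has no rounding step, so there is nothing to disagree about. Note that how a given OpenCL device actually implements the mad operation is device-dependent; the fused result below is computed exactly with `Fraction` as an idealised model.

```python
from fractions import Fraction

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate operations: the product is rounded to a double before the add.
unfused = a * b + c

# Idealised fused multiply-add: exact arithmetic, rounded once at the end.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(unfused)
print(fused)

# The same shape of computation in integers is exact whichever way it is evaluated.
ia, ib, ic = 2**30 + 1, 2**30 - 1, -(2**60)
print(ia * ib + ic)
```

The two floating-point results differ because the exact product, 1 - 2^-60, rounds to 1.0 when stored as a double on its own.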

The good news is the results are correct. This tells me the build process is working and the intel GPUs are capable of compiling the openCL and running it successfully. The only thing I don't like are the run times, but I think this is because the intel GPUs are not powerful enough to handle this app. I look forward to seeing how well the new Intel Arc GPUs fare after they debut next year.

For now I will leave this as a beta app. The following cards have returned successful results: HD500, HD515, Gen9, UHD620, UHD630.

Copyright © 2024 Arizona State University