|
1)
Message boards :
Number crunching :
Running slow on Intel GPU
(Message 3942)
Posted 24 Aug 2025 by esek
Post: There may be an issue with the Intel graphics driver. On PrimeGrid, other users' Arc A580s complete GFN-18 and GFN-19 tasks faster than my Arc B580 and other users' Arc B580s.

Other user's A580, GFN-18: Run time 678.70
https://www.primegrid.com/result.php?resultid=1972100672
Other user's A580, GFN-19: Run time 2,060.71
https://www.primegrid.com/result.php?resultid=1972095096
B580, GFN-18: Run time 813.03
https://www.primegrid.com/result.php?resultid=1971845683
B580, GFN-19: Run time 2,336.39
https://www.primegrid.com/result.php?resultid=1971829718
Other user's B580, GFN-18: Run time 866.14
https://www.primegrid.com/result.php?resultid=1972316750 |
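To put the gap in perspective, the run times quoted above can be turned into ratios. This is only a rough sketch: the dictionary keys and the `slowdown` helper are names made up for illustration, and each card is represented by a single result, so the ratios are indicative rather than a benchmark.

```python
# Run times copied from the PrimeGrid results quoted in the post.
# The ratios below do not depend on the time unit.
run_times = {
    ("A580 (other user)", "GFN-18"): 678.70,
    ("A580 (other user)", "GFN-19"): 2060.71,
    ("B580", "GFN-18"): 813.03,
    ("B580", "GFN-19"): 2336.39,
    ("B580 (other user)", "GFN-18"): 866.14,
}


def slowdown(card, baseline, task):
    """Return how many times longer `card` took than `baseline` on `task`."""
    return run_times[(card, task)] / run_times[(baseline, task)]


baseline = "A580 (other user)"
for card, task in run_times:
    if card != baseline:
        ratio = slowdown(card, baseline, task)
        print(f"{card} vs {baseline} on {task}: {ratio:.3f}x the run time")
```

With these numbers, the B580 results take roughly 1.13x to 1.28x as long as the A580 results on the same task type.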
|
2)
Message boards :
Number crunching :
Running slow on Intel GPU
(Message 3940)
Posted 24 Aug 2025 by esek
Post: Here are WU run times under Linux. Lowering the threadsPerBlock value reduces the run time on the Intel GPU. Also, the x86_64-pc-linux-gnu application does not occupy an entire CPU thread; its CPU time is significantly shorter than its run time. There may be an issue with the windows_x86_64 application.

Arch Linux [6.16.3-arch1-1|libc 2.42]

https://numberfields.asu.edu/NumberFields/result.php?resultid=263784218
GPU Summary String = [CAL|AMDRadeonGraphics(radeonsi,raphael_mendocino,LLVM20.1.8,DRM3.64,6.16.3-arch1-1)|1|2048MB||300][INTEL|Intel(R)Arc(TM)B580Graphics|1|11605MB||300].
Loading GPU lookup table from file.
GPU was not found in the lookup table. Using default values:
numBlocks = 1024.
threadsPerBlock = 32.
polyBufferSize = 32768.
Run time 1 hours 12 min 39 sec
CPU time 3 min 59 sec

https://numberfields.asu.edu/NumberFields/result.php?resultid=263790057
GPU Summary String = [CAL|AMDRadeonGraphics(radeonsi,raphael_mendocino,LLVM20.1.8,DRM3.64,6.16.3-arch1-1)|1|2048MB||300][INTEL|Intel(R)Arc(TM)B580Graphics|1|11605MB||300].
Loading GPU lookup table from file.
GPU found in lookup table:
GPU Name = B580.
numBlocks = 2560.
threadsPerBlock = 4.
polyBufferSize = 10240.
Run time 9 min 45 sec
CPU time 1 min 38 sec |
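The two Linux results above can be compared numerically. The sketch below parses the duration strings as they appear on the result pages and derives the speedup and the CPU-time share; `to_seconds` is a hypothetical helper written for this comparison, not part of any BOINC tool.

```python
import re


def to_seconds(text):
    """Convert a duration such as '1 hours 12 min 39 sec' to seconds."""
    unit_seconds = {"hours": 3600, "hour": 3600, "min": 60, "sec": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hours?|min|sec)", text):
        total += int(value) * unit_seconds[unit]
    return total


# Values copied from the two Linux results quoted in the post.
default_run = to_seconds("1 hours 12 min 39 sec")  # threadsPerBlock = 32
default_cpu = to_seconds("3 min 59 sec")
tuned_run = to_seconds("9 min 45 sec")             # threadsPerBlock = 4
tuned_cpu = to_seconds("1 min 38 sec")

speedup = default_run / tuned_run
default_cpu_share = default_cpu / default_run
tuned_cpu_share = tuned_cpu / tuned_run

print(f"speedup from tuning: {speedup:.2f}x")
print(f"CPU share, default values: {default_cpu_share:.1%}")
print(f"CPU share, B580 table entry: {tuned_cpu_share:.1%}")
```

For these two results, the tuned entry is about 7.5x faster, and in both cases the CPU time is a small fraction of the run time, unlike the Windows results where the two are equal.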
|
3)
Message boards :
Number crunching :
Running slow on Intel GPU
(Message 3938)
Posted 24 Aug 2025 by esek
Post: Setting a small threadsPerBlock reduces the task's runtime from over 3000 seconds to around 400 seconds.

https://numberfields.asu.edu/NumberFields/result.php?resultid=263760673
GPU was not found in the lookup table. Using default values:
numBlocks = 1024.
threadsPerBlock = 32.
polyBufferSize = 32768.
Run time 55 min 58 sec
CPU time 55 min 58 sec

https://numberfields.asu.edu/NumberFields/result.php?resultid=263760666
GPU found in lookup table:
GPU Name = B580.
numBlocks = 2560.
threadsPerBlock = 4.
polyBufferSize = 10240.
Run time 13 min 31 sec
CPU time 13 min 31 sec

On Linux, modifying the lookup table changes the file size, and the client then refuses to start the app.

https://numberfields.asu.edu/NumberFields/result.php?resultid=263769261
<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
couldn't start app: Task file gpuLookupTable_v402.txt: file has the wrong size</message>
]]> |
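The two configurations logged above can be compared side by side. In both of them polyBufferSize equals numBlocks × threadsPerBlock; this is only an observation from these two logs, not a documented rule of the application. The `LaunchConfig` class and its field names are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    """Launch parameters as printed in the task's stderr output."""
    num_blocks: int
    threads_per_block: int
    poly_buffer_size: int

    @property
    def total_threads(self):
        # Total work-items per launch, assuming a plain blocks x threads grid.
        return self.num_blocks * self.threads_per_block


default_cfg = LaunchConfig(num_blocks=1024, threads_per_block=32, poly_buffer_size=32768)
b580_cfg = LaunchConfig(num_blocks=2560, threads_per_block=4, poly_buffer_size=10240)

for name, cfg in (("default", default_cfg), ("B580 entry", b580_cfg)):
    matches = cfg.total_threads == cfg.poly_buffer_size
    print(f"{name}: {cfg.total_threads} threads, matches polyBufferSize: {matches}")

# Run times from the two Windows results quoted in the post, in seconds.
default_run = 55 * 60 + 58
b580_run = 13 * 60 + 31
windows_speedup = default_run / b580_run
print(f"speedup: {windows_speedup:.2f}x")
```

Note that the tuned entry launches fewer threads in total (10240 versus 32768) and is still about 4.1x faster for this pair of results.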
|
4)
Message boards :
Number crunching :
Running slow on Intel GPU
(Message 3937)
Posted 24 Aug 2025 by esek
Post: I'm using the latest driver, 32.0.101.7026. I've found that lowering threadsPerBlock to a small value such as 4 or 8 can improve performance on Intel GPUs. |
|
5)
Message boards :
Number crunching :
Running slow on Intel GPU
(Message 3935)
Posted 23 Aug 2025 by esek
Post: I noticed that this caused high memory reads on Intel GPUs, but not on NVIDIA GPUs. Even on older NVIDIA GPUs with smaller caches, the bandwidth bottleneck wasn't as severe. This could be one of the root causes of the problem.

VTune showed that memory activity was close to 100%, XVE SBID stalls were close to 100%, and XVE array activity and XVE pipeline utilization were very low. Compared with other projects, this may be abnormal. Intel may have some issue with OpenCL that causes a severe bandwidth bottleneck. However, Intel GPUs also tend to be slower in other projects.

As for CPU time, this may be a common issue on Intel. In GFN-18, the Intel GPU task also fully occupies a CPU thread, but reducing its CPU time by running a large number of CPU tasks and lowering the priority of GPU tasks does not result in longer task execution times. On NVIDIA GPUs, larger GFN tasks have a lower CPU time ratio. On Intel GPUs, the CPU may be busy-waiting rather than sleeping while the GPU performs computations. However, on Amicable Numbers and AP27, the CPU time is smaller and the task does not always occupy an entire thread.

GPU XVE Stall Reasons
XVE Instruction Fetch Stall 0.9%
XVE Barrier Stall 0.0%
XVE Dist or Acc Stall 48.8%
XVE Send Stall 4.0%
XVE Pipe Stall 0.1%
XVE SBID Stall 99.1%
XVE Control Stall 1.1%
XVE Other Stall 0.0%

GPU XVE Pipelines
ALU0 and ALU1 Utilization 0.0%
ALU0 and XMX Utilization 0.0%
Multiple Pipe Utilization 0.0%
XVE ALU0 pipeline active 0.1%
XVE ALU1 pipeline active 1.2%
XVE XMX pipeline active 0.0% |
|
6)
Message boards :
Number crunching :
Running slow on Intel GPU
(Message 3933)
Posted 23 Aug 2025 by esek
Post: On the B580, a workunit takes about 7 times as long to complete as on a 2080 Ti.
https://imgur.com/a/hti7VZj

GPU Vector Engine
XVE Arrays Active: 1.0%
Idle: 7.5%
Stalled: 91.5%

GPU Computing Threads Dispatch
XVE Threads Occupancy 80.0%
Thread Dispatcher Active 0.0%

GPU L3 Cache Bandwidth and Misses
Average Bandwidth, GB/sec
L3 Read: 289.754 GB/sec
L3 Write: 269.693 GB/sec
L3 Misses: 1,906,192,273 misses/sec
L3 Input Available 24.8%
L3 Output Ready 11.6%
L3 Busy 100.0%
L3 Stalled 14.6%
SQ Full 2.2%

GPU Memory Access
Average Bandwidth, GB/sec
Read: 237.404 GB/sec
Write: 74.790 GB/sec
GPU Memory Active 99.8% |
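As a rough sanity check on the memory figures above, the read and write bandwidths can be summed and compared against the card's peak. The 456 GB/s peak used here is an assumption taken from the B580's published memory specification, not a value from the VTune output, so the resulting fraction is only an estimate.

```python
# Average bandwidths from the VTune summary quoted in the post, in GB/s.
mem_read = 237.404
mem_write = 74.790
l3_read = 289.754
l3_write = 269.693

# ASSUMPTION: theoretical peak memory bandwidth of the Arc B580 in GB/s.
# This number is not in the VTune output; replace it if your card differs.
assumed_peak = 456.0

mem_total = mem_read + mem_write
l3_total = l3_read + l3_write
peak_fraction = mem_total / assumed_peak

print(f"memory traffic: {mem_total:.3f} GB/s ({peak_fraction:.1%} of assumed peak)")
print(f"L3 traffic: {l3_total:.3f} GB/s")
```

Under that assumption, the kernel moves roughly two thirds of the peak memory bandwidth while the XVE arrays are active only 1.0% of the time, which is consistent with the post's reading that the workload is memory bound on this GPU.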