Posts by esek

1) Message boards : Number crunching : Running slow on Intel GPU (Message 3942)
Posted 24 Aug 2025 by esek
Post:
There may be some issues with the Intel graphics driver. On PrimeGrid, other users' Arc A580s complete GFN-18 and GFN-19 tasks faster than both my Arc B580 and other users' Arc B580s.

Other user's A580 GFN-18: Run time 678.70
https://www.primegrid.com/result.php?resultid=1972100672
Other user's A580 GFN-19: Run time 2,060.71
https://www.primegrid.com/result.php?resultid=1972095096
B580 GFN-18: Run time 813.03
https://www.primegrid.com/result.php?resultid=1971845683
B580 GFN-19: Run time 2,336.39
https://www.primegrid.com/result.php?resultid=1971829718
Other user's B580 GFN-18: Run time 866.14
https://www.primegrid.com/result.php?resultid=1972316750
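
For a quick comparison, the run times listed above can be turned into ratios against the A580. This is only a rough sketch using the figures from this post; the dictionary layout and the function name are my own.

```python
# Run times in seconds, copied from the results linked above.
run_times = {
    "GFN-18": {
        "A580 (other user)": 678.70,
        "B580 (mine)": 813.03,
        "B580 (other user)": 866.14,
    },
    "GFN-19": {
        "A580 (other user)": 2060.71,
        "B580 (mine)": 2336.39,
    },
}

def slowdown_vs_a580(task):
    """Return {card: run time / A580 run time} for one task type."""
    base = run_times[task]["A580 (other user)"]
    return {
        card: seconds / base
        for card, seconds in run_times[task].items()
        if not card.startswith("A580")
    }

for task in run_times:
    for card, ratio in slowdown_vs_a580(task).items():
        print(f"{task} {card}: {ratio:.2f}x the A580 run time")
```
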
2) Message boards : Number crunching : Running slow on Intel GPU (Message 3940)
Posted 24 Aug 2025 by esek
Post:
Here are WU run times under Linux. Lowering the threadsPerBlock value reduces the run time on the Intel GPU.

Also, the x86_64-pc-linux-gnu application does not have the problem of occupying an entire CPU thread; the CPU time is significantly shorter than the run time. There may be some issue with the windows_x86_64 application.

Arch Linux [6.16.3-arch1-1|libc 2.42]
https://numberfields.asu.edu/NumberFields/result.php?resultid=263784218
GPU Summary String = [CAL|AMDRadeonGraphics(radeonsi,raphael_mendocino,LLVM20.1.8,DRM3.64,6.16.3-arch1-1)|1|2048MB||300][INTEL|Intel(R)Arc(TM)B580Graphics|1|11605MB||300].
Loading GPU lookup table from file.
GPU was not found in the lookup table.  Using default values:
  numBlocks = 1024.
  threadsPerBlock = 32.
  polyBufferSize = 32768.
Run time 	1 hours 12 min 39 sec
CPU time 	3 min 59 sec 

https://numberfields.asu.edu/NumberFields/result.php?resultid=263790057
GPU Summary String = [CAL|AMDRadeonGraphics(radeonsi,raphael_mendocino,LLVM20.1.8,DRM3.64,6.16.3-arch1-1)|1|2048MB||300][INTEL|Intel(R)Arc(TM)B580Graphics|1|11605MB||300].
Loading GPU lookup table from file.
GPU found in lookup table:
  GPU Name = B580.
  numBlocks = 2560.
  threadsPerBlock = 4.
  polyBufferSize = 10240.
Run time 	9 min 45 sec
CPU time 	1 min 38 sec 
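
To put numbers on the two Linux results above, here is a small sketch that converts the reported durations to seconds and computes the speedup and the CPU-to-run-time ratio. The helper name and the dictionary are my own; the durations are copied from the two results.

```python
import re

def to_seconds(text):
    """Convert a duration such as '1 hours 12 min 39 sec' to seconds."""
    units = {"hours": 3600, "min": 60, "sec": 1}
    pairs = re.findall(r"(\d+)\s*(hours|min|sec)", text)
    return sum(int(number) * units[unit] for number, unit in pairs)

# (run time, CPU time) for each setting, as reported above.
runs = {
    "threadsPerBlock = 32": ("1 hours 12 min 39 sec", "3 min 59 sec"),
    "threadsPerBlock = 4": ("9 min 45 sec", "1 min 38 sec"),
}

for label, (run, cpu) in runs.items():
    r, c = to_seconds(run), to_seconds(cpu)
    print(f"{label}: run {r} s, CPU {c} s, CPU/run = {c / r:.1%}")

speedup = (to_seconds(runs["threadsPerBlock = 32"][0])
           / to_seconds(runs["threadsPerBlock = 4"][0]))
print(f"speedup: {speedup:.1f}x")
```
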
3) Message boards : Number crunching : Running slow on Intel GPU (Message 3938)
Posted 24 Aug 2025 by esek
Post:
Setting a small threadsPerBlock reduces the task's runtime from over 3000 seconds to around 400 seconds.
https://numberfields.asu.edu/NumberFields/result.php?resultid=263760673
GPU was not found in the lookup table.  Using default values:
  numBlocks = 1024.
  threadsPerBlock = 32.
  polyBufferSize = 32768.
Run time	55 min 58 sec
CPU time	55 min 58 sec

https://numberfields.asu.edu/NumberFields/result.php?resultid=263760666
GPU found in lookup table:
  GPU Name = B580.
  numBlocks = 2560.
  threadsPerBlock = 4.
  polyBufferSize = 10240.
Run time	13 min 31 sec
CPU time	13 min 31 sec

On Linux, modifying the lookup table changes the file size, and the client then refuses to start the application.
https://numberfields.asu.edu/NumberFields/result.php?resultid=263769261
<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
couldn't start app: Task file gpuLookupTable_v402.txt: file has the wrong size</message>
]]>
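
The message above suggests the file's size is compared with an expected size before the app starts. The sketch below only illustrates that kind of check and is not BOINC's actual code; the function name is hypothetical and a temporary file stands in for the real lookup table.

```python
import os
import tempfile

def size_matches(path, expected_bytes):
    """Return True if the file at `path` is exactly `expected_bytes` long."""
    return os.path.getsize(path) == expected_bytes

# Temporary stand-in file (hypothetical contents, not the real table).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("original contents\n")
    path = f.name

expected = os.path.getsize(path)           # size recorded before editing
ok_before = size_matches(path, expected)   # unmodified file passes

with open(path, "a") as f:                 # any edit that adds bytes...
    f.write("extra line\n")
ok_after = size_matches(path, expected)    # ...makes the check fail

os.remove(path)
print(ok_before, ok_after)
```
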
4) Message boards : Number crunching : Running slow on Intel GPU (Message 3937)
Posted 24 Aug 2025 by esek
Post:
I'm using the latest driver, 32.0.101.7026.
I've found that lowering threadsPerBlock to smaller values like 4 or 8 can improve performance on Intel GPUs.
5) Message boards : Number crunching : Running slow on Intel GPU (Message 3935)
Posted 23 Aug 2025 by esek
Post:
I noticed that this caused high memory reads on Intel GPUs, but not on NVIDIA GPUs. Even on older NVIDIA GPUs with smaller caches, bandwidth bottlenecks weren't as severe. This could be one of the root causes of the problem.

VTune showed that memory activity was close to 100%, XVE SBID stalls were close to 100%, and XVE array activity and XVE pipeline utilization were very low. Compared with other projects, this may be abnormal.

Intel may have some issues with OpenCL, causing a severe bandwidth bottleneck. However, Intel GPUs also tend to be slower in other projects.

As for CPU time, this may be a common issue for Intel. In GFN-18, Intel also fully occupies a CPU thread, but reducing its CPU time by running a large number of CPU tasks and lowering the priority of GPU tasks does not result in longer task execution times.

On Nvidia GPUs, larger GFN tasks have a lower CPU time ratio. On Intel GPUs, the CPU may be waiting rather than sleeping while the GPU performs computations. However, on Amicable Numbers and AP27, the CPU time is smaller and does not always occupy an entire thread.
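
To illustrate the difference between waiting and sleeping, here is a minimal sketch that has nothing to do with the actual application or driver code. It compares the CPU time used by a busy-wait loop with that of a blocking sleep over the same wall-clock interval; all function names are my own.

```python
import time

def cpu_used(wait, duration=0.2):
    """Return the CPU seconds this process uses while `wait` runs."""
    start = time.process_time()
    wait(duration)
    return time.process_time() - start

def busy_wait(duration):
    # Poll the clock in a tight loop, like a host thread spinning
    # until the GPU signals completion.
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass

def blocking_wait(duration):
    # Sleep until the deadline; the OS deschedules the thread meanwhile.
    time.sleep(duration)

spin_cpu = cpu_used(busy_wait)
sleep_cpu = cpu_used(blocking_wait)
print(f"busy wait: {spin_cpu:.3f} s CPU, blocking wait: {sleep_cpu:.3f} s CPU")
```

With busy waiting, CPU time roughly tracks run time, which is the pattern seen in the Windows results; with blocking waits, CPU time stays far below run time.
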

GPU XVE Stall Reasons
XVE Instruction Fetch Stall: 0.9%
XVE Barrier Stall: 0.0%
XVE Dist or Acc Stall: 48.8%
XVE Send Stall: 4.0%
XVE Pipe Stall: 0.1%
XVE SBID Stall: 99.1%
XVE Control Stall: 1.1%
XVE Other Stall: 0.0%

GPU XVE Pipelines
ALU0 and ALU1 Utilization: 0.0%
ALU0 and XMX Utilization: 0.0%
Multiple Pipe Utilization: 0.0%
XVE ALU0 pipeline active: 0.1%
XVE ALU1 pipeline active: 1.2%
XVE XMX pipeline active: 0.0%
6) Message boards : Number crunching : Running slow on Intel GPU (Message 3933)
Posted 23 Aug 2025 by esek
Post:
On the B580, a workunit takes about 7 times as long as on a 2080 Ti.
https://imgur.com/a/hti7VZj

GPU Vector Engine
XVE Arrays
Active: 1.0%
Idle: 7.5%
Stalled: 91.5%

GPU Computing Threads Dispatch
XVE Threads Occupancy: 80.0%
Thread Dispatcher Active: 0.0%

GPU L3 Cache Bandwidth and Misses
Average Bandwidth
L3 Read: 289.754 GB/sec
L3 Write: 269.693 GB/sec
L3 Misses: 1,906,192,273 misses/sec
L3 Input Available: 24.8%
L3 Output Ready: 11.6%
L3 Busy: 100.0%
L3 Stalled: 14.6%
SQ Full: 2.2%

GPU Memory Access
Average Bandwidth
Read: 237.404 GB/sec
Write: 74.790 GB/sec
GPU Memory Active: 99.8%
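
As a rough check on the memory figures above, the sketch below adds the read and write bandwidth and compares the total with a peak value. The peak of 456 GB/sec is my assumption for this card and does not come from VTune, so treat the resulting percentage as approximate.

```python
# Values reported by VTune above, in GB/sec.
read_gbps = 237.404
write_gbps = 74.790

# Assumed peak memory bandwidth in GB/sec (not from VTune; adjust as needed).
ASSUMED_PEAK_GBPS = 456.0

total_gbps = read_gbps + write_gbps
fraction_of_peak = total_gbps / ASSUMED_PEAK_GBPS
print(f"total: {total_gbps:.3f} GB/sec, "
      f"{fraction_of_peak:.1%} of assumed peak")
```
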







Copyright © 2026 Arizona State University