Posts by Richard Haselgrove

1) Message boards : Number crunching : More than one task per GPU? (Message 3383)
Posted 19 Oct 2022 by Richard Haselgrove
Post:
Best to refer people directly to the User Manual.

In this case, there's a whole section on app_config.xml files: 'Client configuration', under 'Project-level configuration'.
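For the question in the thread title, the relevant app_config.xml setting is `<gpu_usage>` inside `<gpu_versions>`. A value of 0.5 tells the client that each task needs half a GPU, so it runs two at once. The short Python sketch below writes such a file. The app name `GetDecics` is taken from elsewhere on this page, and the 0.5 / 0.1 values are illustrative assumptions, not project recommendations. The file goes in the project's folder under the BOINC data directory, and the client picks it up after 'Options > Read config files' or a restart.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text that runs 1/gpu_usage tasks per GPU for one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Fraction of one GPU each task claims: 0.5 -> two tasks share a GPU.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # Fraction of a CPU core the client budgets for each GPU task.
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Illustrative values only (an assumption), not project recommendations.
    print(build_app_config("GetDecics", gpu_usage=0.5, cpu_usage=0.1))
```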
2) Message boards : News : Future of the Project (Message 3305)
Posted 10 Aug 2022 by Richard Haselgrove
Post:
Sorry to hear that. But it confirms a decision of my own.

All of us will have been incurring elevated electricity prices during the current emergencies, and here in the UK, we face another significant price jump in less than two months - at the beginning of October.

In recent years, I've been concentrating on ever-more-powerful GPU crunching, but I think this price rise means the time has come to cut back on that indulgence. So I'll be taking a dozen - reasonably modern and powerful - GPUs out of service, at least temporarily. Over the same period, GPU projects have increasingly encroached on CPU time, with applications using wasteful spin-wait synchronisation loops.

So mothballing the GPUs will release a similar number of CPU cores, and many of them will find their way here. We'll see whose hardware fails first, but I hope to give you a decent shove towards your objectives before either of us succumbs.
3) Message boards : Number crunching : My user listing has disappeared from top participants stats pages (Message 3185)
Posted 10 Dec 2021 by Richard Haselgrove
Post:
You're currently listed at number 20, so you can relax. The pages are cached, and they're not all updated at the same instant. It can sometimes happen that you're in the process of crossing a page boundary when the snapshots are taken - you might fall into the gap, or you might even appear on both! It should sort itself out at the next refresh.
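A toy model of that effect, assuming 20 users per page (an assumption; the real page size may differ) and a user whose rank changes between 20 and 21 between the moment page 1 is cached and the moment page 2 is cached:

```python
PAGE_SIZE = 20  # assumed number of users per stats page

def make_ranking(my_rank: int, total: int = 40) -> list[str]:
    """A ranking snapshot of `total` users with "me" at 1-based position `my_rank`."""
    ranking = [f"user{i}" for i in range(1, total)]
    ranking.insert(my_rank - 1, "me")
    return ranking

def page(ranking: list[str], number: int) -> list[str]:
    """Users shown on 1-based page `number` of a snapshot."""
    start = (number - 1) * PAGE_SIZE
    return ranking[start:start + PAGE_SIZE]

def pages_showing_me(rank_when_page1_cached: int, rank_when_page2_cached: int) -> list[int]:
    """Which pages list "me" when each page is cached from a different snapshot."""
    shown = []
    if "me" in page(make_ranking(rank_when_page1_cached), 1):
        shown.append(1)
    if "me" in page(make_ranking(rank_when_page2_cached), 2):
        shown.append(2)
    return shown

if __name__ == "__main__":
    print(pages_showing_me(21, 20))  # [] : climbing across the boundary, into the gap
    print(pages_showing_me(20, 21))  # [1, 2] : slipping across it, listed twice
```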
4) Message boards : Number crunching : HTTPS for Master URL please (Message 3180)
Posted 28 Nov 2021 by Richard Haselgrove
Post:
Project admins should pester David Anderson to ask why there is still no sign of a v7.18 release for all platforms.
5) Message boards : News : Support for Intel GPUs (Message 3177)
Posted 26 Nov 2021 by Richard Haselgrove
Post:
My UHD 620 laptop survived the transition to Windows 11 (and preserved the BOINC v7.16.20 installation intact, somewhat to my surprise) - details at host 1232275.

I've put through task 125207453 as a test: it returned a valid result in just under two hours, so there's another data point. I've raised concerns at other projects that their use of the -cl-mad-enable OpenCL compiler flag has led to validation errors, especially on this machine. But I would guess that the fused multiply + add opcode is more likely to cause problems in floating-point-heavy applications than in this one. Still, I mention it just in case...
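To see why contracting a*b + c can change a floating-point result, compare rounding the product before the addition with rounding only once at the end, which is what a fused multiply-add does. This Python sketch emulates the single rounding with exact `Fraction` arithmetic; the operands are chosen purely for illustration. OpenCL's `-cl-mad-enable` permits a `mad` that the specification allows to be computed with reduced accuracy, so fusing is only one of the ways results can shift; the flag concerns floating-point expressions, not exact integer work.

```python
from fractions import Fraction

def separate_mul_add(a: float, b: float, c: float) -> float:
    """a*b rounded to double, then + c rounded again (two roundings)."""
    return a * b + c

def fused_mul_add(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded to double once (emulates an FMA)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

if __name__ == "__main__":
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    # Exact a*b is 1 - 2**-60, which rounds to 1.0 as a double,
    # so the two-step version loses the tiny remainder entirely.
    print(separate_mul_add(a, b, c))  # 0.0
    print(fused_mul_add(a, b, c))     # -2**-60, roughly -8.67e-19
```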

This machine is an ultraportable, and after an earlier Windows 10 update, the fan only operates at zero or full speed: that doesn't sound good. I'll keep it in reserve and available for testing, but I won't be running it full time.
6) Message boards : News : Support for Intel GPUs (Message 3174)
Posted 22 Nov 2021 by Richard Haselgrove
Post:
I'm slightly surprised that you've felt the need to research and supply additional data about GPU compute devices. Shouldn't that be made available to you and all other application developers, either by the BOINC platform, or from the manufacturer's driver tools?

I've looked into what is made available to you by BOINC / OpenCL for my iGPUs.

field				coproc_info.xml			sched_request
name				Intel(R) HD Graphics 4600	Intel(R) HD Graphics 4600
vendor				Intel(R) Corporation		Intel(R) Corporation
vendor_id			32902				32902
available			1				1
half_fp_config			0				0
single_fp_config		158				158
double_fp_config		0				0
endian_little			1				1
execution_capabilities		1				1
global_mem_size			1360632218			1360632218
local_mem_size			65536				65536
max_clock_frequency		400				400
max_compute_units		20				20
nv_compute_capability_major	0				0
nv_compute_capability_minor	0				0
amd_simd_per_compute_unit	0				0
amd_simd_width			0				0
amd_simd_instruction_width	0				0
opencl_platform_version		OpenCL 1.2			OpenCL 1.2
opencl_device_version		OpenCL 1.2			OpenCL 1.2
opencl_driver_version		10.18.14.5162			10.18.14.5162
device_num			0	
peak_flops			64000000000	
opencl_available_ram		1360632218	
opencl_device_index		0	
warn_bad_cuda			0
That's a pretty eclectic list. The first column is taken from an internal BOINC file called 'coproc_info.xml', and the second from the sched_request_...xml file sent by our clients to your servers. I don't know why the nv and amd fields are present in the iGPU report - presumably it was simpler to use a common data structure across all vendors.

Anyway, numBlocks, threadsPerBlock and polyBufferSize are clearly all missing. How important are they, where can we get them from, and should we ask the BOINC developers to do the heavy lifting?
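Two entries in that list can at least be cross-checked against each other and against the stderr output in message 3146 further down. `peak_flops` is exactly 20 compute units × 400 MHz × 8, which suggests (my inference from these numbers, not something confirmed here) that BOINC derives it from `max_compute_units` and `max_clock_frequency` with a fixed multiplier of 8. And `global_mem_size` of 1360632218 bytes is 1297.6 binary megabytes, which truncates to the 1297MB shown in the GPU Summary String.

```python
MIB = 1024 * 1024

def peak_flops_estimate(compute_units: int, clock_mhz: int, ops_per_cu_per_cycle: int = 8) -> int:
    """Peak FLOPS as units x clock x ops per cycle (factor 8 inferred, not documented)."""
    return compute_units * clock_mhz * 1_000_000 * ops_per_cu_per_cycle

def whole_mib(num_bytes: int) -> int:
    """Bytes to whole binary megabytes, truncated."""
    return num_bytes // MIB

if __name__ == "__main__":
    # Values from the Intel HD Graphics 4600 listing above.
    print(peak_flops_estimate(20, 400))  # 64000000000, same as peak_flops
    print(whole_mib(1360632218))         # 1297
```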
7) Message boards : News : Support for Intel GPUs (Message 3164)
Posted 19 Nov 2021 by Richard Haselgrove
Post:
Contention may also be playing a role - if other things were running on the GPU at the same time or all the CPU cores were maxed out (recall, the app needs a small portion of a cpu core).
I've run intel_gpu tasks under BOINC for a number of projects - originally SETI@Home, and currently both Einstein and WCG/Covid. All my iGPU experience is under Windows.

Contention certainly needs to be considered carefully. All GPU applications require some degree of CPU support while running, but the OpenCL language running under Windows on an iGPU is an outlier. It doesn't need much CPU time, but it wants CPU access FAST. That CPU support can be made available in two different ways: either by ensuring that a CPU core is free from a continuously running BOINC-style workload (which is wasteful), or - as first recognised at Einstein - by setting the CPU part of the iGPU application to run at real-time priority.

!!!

That latter suggestion clearly has to be handled with care by experienced users, but it works for me. It perhaps gives some insight into what's going on under the hood. Unfortunately, the iGPU is the ugly sister of the GPGPU programming world, and I don't know of anyone who has really got to grips with programming it properly under BOINC - too little time, too much to do.
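The trade-off behind that CPU support, spin-waiting versus blocking, can be shown in a few lines of generic Python. This is only an analogy under simple assumptions (a timer thread standing in for the GPU, `threading.Event` standing in for the driver's completion signal), not how any OpenCL driver is actually written. The spinning waiter burns CPU for the whole wait but notices completion at once; the blocking waiter uses almost no CPU but depends on the OS waking it promptly, which is where priority comes in.

```python
import threading
import time

def spin_wait(done: threading.Event) -> None:
    """Busy-poll the flag: lowest wake-up latency, but keeps a core busy throughout."""
    while not done.is_set():
        pass

def blocking_wait(done: threading.Event) -> None:
    """Sleep until signalled: almost no CPU, but relies on the OS to wake us quickly."""
    done.wait()

def cpu_seconds_while_waiting(waiter, gpu_time: float = 0.2) -> float:
    """CPU time this thread spends in `waiter` while a simulated 'kernel' runs."""
    done = threading.Event()
    # Analogy only: a timer thread plays the GPU signalling completion.
    timer = threading.Timer(gpu_time, done.set)
    start = time.thread_time()
    timer.start()
    waiter(done)
    timer.join()
    return time.thread_time() - start

if __name__ == "__main__":
    print(f"spin-wait CPU:     {cpu_seconds_while_waiting(spin_wait):.3f} s")
    print(f"blocking wait CPU: {cpu_seconds_while_waiting(blocking_wait):.3f} s")
```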
8) Message boards : News : Support for Intel GPUs (Message 3154)
Posted 18 Nov 2021 by Richard Haselgrove
Post:
Probably best to just abort them when you see this behavior.
Yes, I'll probably do that before I go to bed this evening, but we may as well get as much information from them as possible.

I have one machine with

02-Nov-2021 12:05:09 [---] OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 21.20.16.5103, device version OpenCL 2.0, 1298MB, 1298MB available, 202 GFLOPS peak)
(host 33342)

That has reached
Now starting the targeted Martinet search:
  Num Cvecs = 50.
    Doing Cvec 1.
    Doing Cvec 2.
    Doing Cvec 3.
but it's taken two and a half hours - barely better than an abacus! And that one has no task_state.xml file either. It is, however, showing a steady and appropriate progress value of 5.003%, so maybe that was a false alarm.

I also have a UHD 620 machine, but it's not crunching at the moment. Maybe tomorrow.
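If that 5.003% is genuine progress reported by the app, a straight-line extrapolation gives a rough idea of the total runtime on the HD 530. This assumes progress advances at a steady rate, which may well not hold for this search.

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total = elapsed / fraction done (assumes a steady rate)."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done

if __name__ == "__main__":
    total = projected_total_hours(2.5, 0.05003)  # 2.5 h elapsed, 5.003% done
    print(f"about {total:.0f} hours in total")    # about 50 hours in total
```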
9) Message boards : News : Support for Intel GPUs (Message 3150)
Posted 18 Nov 2021 by Richard Haselgrove
Post:
Still running, after an hour and three quarters. I'm even more convinced it's stalled, so I'll have a deeper dig later.

The workunit has these settings:

    <rsc_fpops_est>30000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>700000000000000000000.000000</rsc_fpops_bound>
That means that BOINC will allow it to run for over four centuries before it intervenes to abort it. I don't think I've quite got that much patience...
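The 'four centuries' follows from how, as far as I know, the client sets its elapsed-time limit: it divides `rsc_fpops_bound` by the speed it has assumed for the app version. That speed isn't shown here, so this sketch backs it out from the initial runtime estimate of 10 min 42 sec quoted in message 3147, assuming this is the same task and that the estimate was simply `rsc_fpops_est` divided by that same speed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def elapsed_time_limit_s(fpops_est: float, fpops_bound: float, est_runtime_s: float) -> float:
    """Time limit = fpops_bound / assumed speed, with speed backed out of the estimate."""
    # Assumption: the runtime estimate was fpops_est / the same assumed speed.
    assumed_flops = fpops_est / est_runtime_s   # about 4.7e10 FLOPS for these inputs
    return fpops_bound / assumed_flops

if __name__ == "__main__":
    limit = elapsed_time_limit_s(3.0e13, 7.0e20, 10 * 60 + 42)
    print(f"{limit / SECONDS_PER_YEAR:.0f} years")  # 475 years
```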
10) Message boards : News : Support for Intel GPUs (Message 3149)
Posted 18 Nov 2021 by Richard Haselgrove
Post:
First task has reached an indicated 99% progress after 50 minutes, but I strongly suspect this is BOINC's imaginary 'pseudo progress' from a stalled app. I'll keep an eye on it.
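When an app never reports its fraction done, the client displays an estimate of its own. My assumption is that recent clients use a curve of the form `1 - exp(-elapsed / estimate)`, which creeps towards 100% without ever reaching it; treat that exact formula as unconfirmed. With the initial estimate of 10 min 42 sec (message 3147) and 50 minutes elapsed, it does give the 99% shown.

```python
import math

def pseudo_fraction_done(elapsed_s: float, estimated_s: float) -> float:
    """Assumed client fallback progress: approaches 1.0 asymptotically, never reaches it."""
    return 1.0 - math.exp(-elapsed_s / estimated_s)

if __name__ == "__main__":
    estimate = 10 * 60 + 42  # initial runtime estimate, seconds
    for minutes in (10, 20, 50, 100):
        print(minutes, f"{pseudo_fraction_done(minutes * 60, estimate):.1%}")
```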
11) Message boards : News : Support for Intel GPUs (Message 3147)
Posted 18 Nov 2021 by Richard Haselgrove
Post:
And this doesn't look right, either.

D:\BOINCdata\slots\5>dir
 Volume in drive D has no label.
 Volume Serial Number is E0ED-E51A

 Directory of D:\BOINCdata\slots\5

18/11/2021  16:06    <DIR>          .
18/11/2021  16:06    <DIR>          ..
18/11/2021  16:06                 0 boinc_lockfile
18/11/2021  16:06               117 GetDecics_4.02_windows_x86_64__opencl_intel
30/01/2021  10:51               349 gpuLookupTable.txt
30/01/2021  10:51            34,028 gpuMultiPrec.h
18/11/2021  16:06               113 in
18/11/2021  16:06            10,923 init_data.xml
30/01/2021  10:51               728 mp_int.h
18/11/2021  16:06               128 out
30/01/2021  10:51            76,646 pdtKernel.cl
18/11/2021  16:06               281 stderr.txt
              10 File(s)        123,313 bytes
I don't see any sign of the app recording progress. I'd expect to see a 'boinc_task_state.xml' file by now.

Initial runtime estimate was 10 min 42 sec (BOINC estimates are notoriously unreliable for a new app): it's been running for 18:30, but I'm not sure it's done anything.
12) Message boards : News : Support for Intel GPUs (Message 3146)
Posted 18 Nov 2021 by Richard Haselgrove
Post:
This doesn't look right. From stderr, while it's running:

GPU Summary String = [CUDA|NVIDIAGeForceGTX1660SUPER|2|4095MB|47212|300][INTEL|Intel(R)HDGraphics4600|1|1297MB||102].
Loading GPU lookup table from file.
GPU found in lookup table:
  GPU Name = GTX1660.
  numBlocks = 8192.
  threadsPerBlock = 32.
  polyBufferSize = 262144.
The "Intel HD Graphics 4600" is right (present on the machine, OpenCL driver installed, known to BOINC, ready for use), but the GPU Name and (probably) metrics are from the wrong manufacturer.
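The summary string itself does carry enough to pick the right device. Here is a minimal Python sketch of selecting the entry whose vendor tag matches the device BOINC assigned to the task. The meaning of the fields after the name (count, memory, and so on) is my guess, and nothing here reflects how the app actually does its lookup; but an app that simply took the first bracketed entry would end up with the GTX 1660, which matches the stderr above.

```python
import re

SUMMARY = ("[CUDA|NVIDIAGeForceGTX1660SUPER|2|4095MB|47212|300]"
           "[INTEL|Intel(R)HDGraphics4600|1|1297MB||102]")

def parse_gpu_summary(summary: str) -> list[list[str]]:
    """Split '[A|B|...][C|D|...]' into one list of '|'-separated fields per device."""
    return [group.split("|") for group in re.findall(r"\[([^\]]*)\]", summary)]

def pick_device(entries: list[list[str]], vendor: str) -> list[str]:
    """Return the first entry whose vendor tag (field 0) matches, e.g. 'INTEL'."""
    for fields in entries:
        if fields[0] == vendor:
            return fields
    raise LookupError(f"no {vendor} device in GPU summary")

if __name__ == "__main__":
    entries = parse_gpu_summary(SUMMARY)
    print(entries[0][1])                     # NVIDIAGeForceGTX1660SUPER (wrong for this task)
    print(pick_device(entries, "INTEL")[1])  # Intel(R)HDGraphics4600
```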
13) Message boards : News : Support for Intel GPUs (Message 3145)
Posted 18 Nov 2021 by Richard Haselgrove
Post:
Just had this happen:

18/11/2021 15:36:28 | NumberFields@home | Scheduler request completed: got 42 new tasks
18/11/2021 15:36:28 | NumberFields@home | [sched_op] estimated total Intel GPU task duration: 26357 seconds
18/11/2021 15:36:30 | NumberFields@home | Started download of GetDecics_4.02_windows_x86_64__opencl_intel
The download of the .exe file is going very slowly (21 MB at a speed of 16 KBps), but it's getting there. Should be finished in 15 mins or so. I'll be watching host 1291 - any particular feedback you'd like?

Edit:
18/11/2021 16:05:58 | NumberFields@home | Finished download of GetDecics_4.02_windows_x86_64__opencl_intel
18/11/2021 16:06:04 | NumberFields@home | Starting task wu_sf3_DS-16x271-15_Grp669648of1000000_0
Nearly half an hour to download! A caching server (e.g. Cloudflare) can help with that - otherwise, just have patience. It'll be better when the initial rush has died down.
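For the record, the two log timestamps put the download at 29 min 28 s, an average of roughly 12 kB/s for the 21 MB file. The sketch treats MB as 10^6 bytes, which is an assumption; with binary megabytes the average comes out slightly higher.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"  # timestamp format as it appears in this event log

def download_seconds(started: str, finished: str) -> int:
    """Whole seconds between two event-log timestamps."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return int(delta.total_seconds())

if __name__ == "__main__":
    secs = download_seconds("18/11/2021 15:36:30", "18/11/2021 16:05:58")
    print(f"{secs} s = {secs // 60} min {secs % 60} s")  # 1768 s = 29 min 28 s
    # Assumption: 21 MB means 21 * 10**6 bytes.
    print(f"average {21e6 / secs / 1000:.1f} kB/s")      # average 11.9 kB/s
```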
14) Message boards : Number crunching : HTTPS for Master URL please (Message 3128)
Posted 3 Oct 2021 by Richard Haselgrove
Post:
Another reason for not acting too precipitately: the SSL expiry problem only affected clients running on the Windows platform. They use a static ca-bundle.crt file, which can't be automatically updated. Other platforms use the operating system's security bundle, so they get updates with system updates.

The emergency release we are expecting 'real soon now' will be for Windows only. The http->https patch will only be fully effective when a planned release is made across all platforms. That's been overdue for a while, but there's no sign of a plan for when it might take place.
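The difference between the two approaches can be seen in Python's `ssl` module, used here purely as an analogy (the BOINC client itself is C++ and uses libcurl). A context built from a fixed bundle file trusts exactly what was in that file when it shipped, while a default context loads whatever certificate store the operating system currently provides. The bundle path parameter is a placeholder.

```python
import ssl
from typing import Optional

def make_tls_context(bundle_path: Optional[str] = None) -> ssl.SSLContext:
    """Verify servers against a shipped bundle file, or against the OS trust store."""
    if bundle_path is not None:
        # Static bundle (placeholder path): roots are frozen until the file is replaced.
        return ssl.create_default_context(cafile=bundle_path)
    # No bundle given: load the system's default CA certificates, which are
    # refreshed by ordinary operating-system updates.
    return ssl.create_default_context()

if __name__ == "__main__":
    ctx = make_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: certificates are checked
```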
15) Message boards : Number crunching : HTTPS for Master URL please (Message 3124)
Posted 2 Oct 2021 by Richard Haselgrove
Post:
Please keep an eye on future developments - specifically, https://github.com/BOINC/boinc/pull/4539, "all master URL update to https".

David Anderson wrote that yesterday, when he should have been fixing a problem caused by the expiry of an SSL certificate the day before.

Judging by his comment, "Note: code you wrote a long time ago sometimes doesn't seem to make any sense at all", I don't expect this feature to be fully tested in time for the emergency release we expect in the next few days, but there's hope that the problems you encountered will be reduced once the next recommended release is in widespread use.
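The idea in that pull request's title is simple to sketch, though this is only an illustration in Python and not the PR's actual code: rewrite an `http://` master URL to `https://` and leave everything else untouched. The NumberFields URL is used purely as example input.

```python
from urllib.parse import urlsplit, urlunsplit

def upgrade_master_url(url: str) -> str:
    """Return url with an http scheme switched to https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return url
    return urlunsplit(parts._replace(scheme="https"))

if __name__ == "__main__":
    print(upgrade_master_url("http://numberfields.asu.edu/NumberFields/"))
    # https://numberfields.asu.edu/NumberFields/
```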
16) Message boards : Number crunching : Team credit lost ! (Message 3035)
Posted 30 Jan 2021 by Richard Haselgrove
Post:
... your tasks still show "in progress"....
If you click through to the workunit (as distinct from the task), they show as 'WU cancelled'. I've also just got a batch (across several machines, including this one) of 'download failed' - the scheduler issued the work, but the data file couldn't be found. I think that's all part of the process of the database healing itself.

I'm not worried about any of these - though it might be worth giving what remains of the database a good spring-clean at the end of this run. You might want to do that anyway to check that you've got all the results you were expecting.

But if you've got a load of unmatched upload files, and a load of incomplete database records, a query might be able to flip the database state to 'needs validation', and recover them.
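To make that idea concrete, here is a toy version against an in-memory SQLite database. The schema is deliberately simplified and hypothetical (the real BOINC server tables have many more columns and their own state codes): it marks for validation every workunit that has at least one result whose upload file is present.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical, heavily simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE workunit (id INTEGER PRIMARY KEY, need_validate INTEGER);
        CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER,
                             upload_present INTEGER);
        INSERT INTO workunit VALUES (1, 0), (2, 0), (3, 0);
        -- Results 10 and 12 have their upload files; result 11 does not.
        INSERT INTO result VALUES (10, 1, 1), (11, 2, 0), (12, 3, 1);
        """
    )
    return conn

def flag_recoverable(conn: sqlite3.Connection) -> int:
    """Set need_validate on workunits that have an uploaded result; return rows changed."""
    cur = conn.execute(
        """
        UPDATE workunit SET need_validate = 1
        WHERE need_validate = 0
          AND id IN (SELECT workunitid FROM result WHERE upload_present = 1)
        """
    )
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    print(flag_recoverable(make_demo_db()))  # 2 (workunits 1 and 3)
```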
17) Message boards : Number crunching : Team credit lost ! (Message 3033)
Posted 30 Jan 2021 by Richard Haselgrove
Post:
Each of my machines is making an unusual report every time it requests new work:

30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399197of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399198of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399199of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399200of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399201of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399202of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399347of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399689of2000000_0 (expired)

They appear at the top of https://numberfields.asu.edu/NumberFields/results.php?hostid=1291&offset=0&show_names=1&state=1: judging by the dates, they're probably in a similar state.
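To pull those task names out of the event log and cross-check them against that results page, a small Python filter is enough. The regular expression is written against the lines quoted above and may need adjusting for other message wording.

```python
import re

LOST_TASK = re.compile(r"Didn't resend lost task (\S+) \(expired\)")

def expired_lost_tasks(log_lines):
    """Return task names from 'Didn't resend lost task ... (expired)' event-log lines."""
    names = []
    for line in log_lines:
        match = LOST_TASK.search(line)
        if match:
            names.append(match.group(1))
    return names

if __name__ == "__main__":
    sample = [
        "30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task "
        "wu_sf3_DS-16x271-1_Grp399197of2000000_0 (expired)",
        "18/11/2021 15:36:28 | NumberFields@home | Scheduler request completed: got 42 new tasks",
    ]
    print(expired_lost_tasks(sample))  # ['wu_sf3_DS-16x271-1_Grp399197of2000000_0']
```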
18) Message boards : News : Finally recovering from hard drive crash. (Message 3026)
Posted 29 Jan 2021 by Richard Haselgrove
Post:
Yes, I see the new datestamps on the export files - that should do it.

But my credit now (131,860,347) is still below the last export before the crash (132,015,902). So the first movement will be downwards. Nothing we can do about that.
19) Message boards : News : New and improved apps coming soon (Message 2984)
Posted 5 Jan 2021 by Richard Haselgrove
Post:
We used to use that trick a lot at SETI@Home, which was open-source from the very beginning. The volunteers (collectively) had far more time for optimising the various apps than the project staff, and the staff actively encouraged volunteers to help speed things up - provided quality and accuracy were maintained.

There are four main tags to watch for:

<app_name>GetDecics</app_name>
<version_num>400</version_num>
<platform>windows_x86_64</platform>
<plan_class>default</plan_class>

Two appear in <workunit>, and three appear in <result> (<version_num> appears in both). The complete set of four has to be consistent for each task, and they have to match an <app_version> which is already defined on your system. Apart from that, it's easy....

Some people even devised scripts or applications to automate the process - look for the term 'reschedule' at SETI.
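The consistency rule above can be expressed as a read-only check. This Python sketch works on a cut-down, hand-written fragment of client_state.xml containing only the tags in question (a real file has far more), and reports any result whose app_name / version_num / platform / plan_class combination has no matching app_version. It edits nothing; the client should always be shut down before anyone edits the real file.

```python
import xml.etree.ElementTree as ET

# Cut-down illustration: real client_state.xml files contain many more tags.
SAMPLE = """
<client_state>
  <app_version>
    <app_name>GetDecics</app_name>
    <version_num>400</version_num>
    <platform>windows_x86_64</platform>
    <plan_class>default</plan_class>
  </app_version>
  <workunit>
    <name>wu_a</name>
    <app_name>GetDecics</app_name>
    <version_num>400</version_num>
  </workunit>
  <result>
    <name>wu_a_0</name>
    <wu_name>wu_a</wu_name>
    <platform>windows_x86_64</platform>
    <version_num>400</version_num>
    <plan_class>default</plan_class>
  </result>
</client_state>
"""

def inconsistent_results(root: ET.Element) -> list[str]:
    """Names of results whose (app, version, platform, plan_class) match no app_version."""
    known = {
        (av.findtext("app_name"), av.findtext("version_num"),
         av.findtext("platform"), av.findtext("plan_class", default=""))
        for av in root.findall("app_version")
    }
    workunits = {wu.findtext("name"): wu for wu in root.findall("workunit")}
    bad = []
    for res in root.findall("result"):
        wu = workunits.get(res.findtext("wu_name"))
        # version_num appears in both <workunit> and <result>; they must agree.
        if wu is None or wu.findtext("version_num") != res.findtext("version_num"):
            bad.append(res.findtext("name"))
            continue
        key = (wu.findtext("app_name"), res.findtext("version_num"),
               res.findtext("platform"), res.findtext("plan_class", default=""))
        if key not in known:
            bad.append(res.findtext("name"))
    return bad

if __name__ == "__main__":
    print(inconsistent_results(ET.fromstring(SAMPLE)))  # []
```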
20) Message boards : News : New and improved apps coming soon (Message 2980)
Posted 5 Jan 2021 by Richard Haselgrove
Post:
Yes, the new version 4.01 is running fine on 'Linux Mint 19.1 Tessa' (host 1697845).




Copyright © 2022 Arizona State University