1)
Message boards :
Number crunching :
My user listing has disappeared from top participants stats pages
(Message 3185)
Posted 10 Dec 2021 by Richard Haselgrove Post: You're currently listed at number 20, so you can relax. The pages are cached, and they're not all updated at the same instant. It can sometimes happen that you're in the process of crossing a page boundary when the snapshots are taken - you might fall into the gap, or you might even appear on both! It should sort itself out at the next refresh.
2)
Message boards :
Number crunching :
HTTPS for Master URL please
(Message 3180)
Posted 28 Nov 2021 by Richard Haselgrove Post: Project admins should pester David Anderson to ask why there is still no sign of a v7.18 release for all platforms.
3)
Message boards :
News :
Support for Intel GPUs
(Message 3177)
Posted 26 Nov 2021 by Richard Haselgrove Post: My UHD 620 laptop survived the transition to Windows 11 (and preserved the BOINC v7.16.20 installation intact, somewhat to my surprise) - details at host 1232275. I've put through task 125207453 as a test: it returned a valid result in just under two hours, so there's another data point.

I've commented with concern at other projects that their invocation of the -cl-mad-enable OpenCL compiler flag has led to validation errors, especially on this machine: but I would guess that the fused multiply + add opcode is more likely to cause problems with floating point arithmetic, rather than here. Still, I mention it just in case...

This machine is an ultraportable, and after an earlier Windows 10 update, the fan only operates at zero or full speed: that doesn't sound good. I'll keep it in reserve and available for testing, but I won't be running it full time.
4)
Message boards :
News :
Support for Intel GPUs
(Message 3174)
Posted 22 Nov 2021 by Richard Haselgrove Post: I'm slightly surprised that you've felt the need to research and supply additional data about GPU compute devices. Shouldn't that be made available to you and all other application developers, either by the BOINC platform, or from the manufacturer's driver tools?

I've looked into what is made available to you by BOINC / OpenCL for my iGPUs.

name: Intel(R) HD Graphics 4600 | Intel(R) HD Graphics 4600
vendor: Intel(R) Corporation | Intel(R) Corporation
vendor_id: 32902 | 32902
available: 1 | 1
half_fp_config: 0 | 0
single_fp_config: 158 | 158
double_fp_config: 0 | 0
endian_little: 1 | 1
execution_capabilities: 1 | 1
global_mem_size: 1360632218 | 1360632218
local_mem_size: 65536 | 65536
max_clock_frequency: 400 | 400
max_compute_units: 20 | 20
nv_compute_capability_major: 0 | 0
nv_compute_capability_minor: 0 | 0
amd_simd_per_compute_unit: 0 | 0
amd_simd_width: 0 | 0
amd_simd_instruction_width: 0 | 0
opencl_platform_version: OpenCL 1.2 | OpenCL 1.2
opencl_device_version: OpenCL 1.2 | OpenCL 1.2
opencl_driver_version: 10.18.14.5162 | 10.18.14.5162
device_num: 0
peak_flops: 64000000000
opencl_available_ram: 1360632218
opencl_device_index: 0
warn_bad_cuda: 0

That's a pretty eclectic list. The first column is taken from an internal BOINC file called 'coproc_info.xml', and the second from the sched_request_...xml file sent by our clients to your servers. I don't know why the nv and amd fields are present in the iGPU report - presumably it was simpler to use a common data structure across all vendors.

Anyway, numBlocks, threadsPerBlock and polyBufferSize are clearly all missing. How important are they, where can we get them from, and should we ask the BOINC developers to do the heavy lifting?
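As an aside on the single_fp_config value of 158 in that list: if BOINC is simply passing through the raw OpenCL CL_DEVICE_SINGLE_FP_CONFIG bitfield (an assumption on my part, not something confirmed in this thread), it can be decoded with the bit values from the OpenCL headers. The helper name decode_fp_config is made up for this sketch.

```python
# Bit values of cl_device_fp_config as defined in the OpenCL headers (cl.h).
FP_CONFIG_BITS = {
    1 << 0: "CL_FP_DENORM",
    1 << 1: "CL_FP_INF_NAN",
    1 << 2: "CL_FP_ROUND_TO_NEAREST",
    1 << 3: "CL_FP_ROUND_TO_ZERO",
    1 << 4: "CL_FP_ROUND_TO_INF",
    1 << 5: "CL_FP_FMA",
    1 << 6: "CL_FP_SOFT_FLOAT",
    1 << 7: "CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT",
}


def decode_fp_config(value):
    """Return the names of the capability bits set in an fp_config value.

    Hypothetical helper: assumes `value` is the raw OpenCL bitfield.
    """
    return [name for bit, name in FP_CONFIG_BITS.items() if value & bit]


if __name__ == "__main__":
    for flag in decode_fp_config(158):
        print(flag)
```

Under that assumption, 158 = 128 + 16 + 8 + 4 + 2, so the denormal and FMA bits are not among those reported for single precision on this device.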
5)
Message boards :
News :
Support for Intel GPUs
(Message 3164)
Posted 19 Nov 2021 by Richard Haselgrove Post: Contention may also be playing a role - if other things were running on the GPU at the same time or all the CPU cores were maxed out (recall, the app needs a small portion of a cpu core).

I've run intel_gpu tasks under BOINC for a number of projects - originally SETI@Home, and currently both Einstein and WCG/Covid. All my iGPU experience is under Windows.

Contention certainly needs to be considered carefully. All GPU applications require some degree of CPU support while running, but the OpenCL language running under Windows on an iGPU is an outlier. It doesn't need much CPU time, but it wants CPU access FAST. That CPU support can be made available in two different ways: either by ensuring that a CPU core is free from a continuously running BOINC-style workload (which is wasteful), or - as first recognised at Einstein - by setting the CPU part of the iGPU application to run at real-time priority. !!!

That latter suggestion clearly has to be handled with care by experienced users, but it works for me. It perhaps gives some insight into what's going on under the hood. Unfortunately, the iGPU is the ugly sister of the GPGPU programming world, and I don't know of anyone who has really got to grips with programming it properly under BOINC - too little time, too much to do.
6)
Message boards :
News :
Support for Intel GPUs
(Message 3154)
Posted 18 Nov 2021 by Richard Haselgrove Post: Probably best to just abort them when you see this behavior.

Yes, I'll probably do that before I go to bed this evening, but we may as well get as much information from them as possible. I have one machine with

02-Nov-2021 12:05:09 [---] OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 21.20.16.5103, device version OpenCL 2.0, 1298MB, 1298MB available, 202 GFLOPS peak)

(host 33342) That has reached

Now starting the targeted Martinet search:
Num Cvecs = 50.
Doing Cvec 1.
Doing Cvec 2.
Doing Cvec 3.

but it's taken two and a half hours - barely better than an abacus! - but that too has no task_state.xml file. It is, however, showing a steady and appropriate progress value of 5.003%, so maybe that was a false alarm.

I also have a UHD 620 machine, but it's not crunching at the moment. Maybe tomorrow.
7)
Message boards :
News :
Support for Intel GPUs
(Message 3150)
Posted 18 Nov 2021 by Richard Haselgrove Post: Still running, after an hour and three quarters. I'm even more convinced it's stalled, so I'll have a deeper dig later. Workunit has settings

<rsc_fpops_est>30000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>700000000000000000000.000000</rsc_fpops_bound>

That means that BOINC will allow it to run for over four centuries before it intervenes to abort it. I don't think I've quite got that much patience...
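For anyone wanting to check the "four centuries" figure, here is a rough sketch of the arithmetic. It assumes the client's limit is simply rsc_fpops_bound divided by the speed BOINC believes the app runs at, and it takes that speed from the initial runtime estimate of 10 min 42 sec mentioned elsewhere in this thread. Both of those are my assumptions, and the function name is made up for illustration.

```python
# Values copied from the workunit settings quoted above.
RSC_FPOPS_EST = 30000000000000.0
RSC_FPOPS_BOUND = 700000000000000000000.0

# Assumption: the initial runtime estimate of 10 min 42 sec reported
# elsewhere in this thread applies to this task.
INITIAL_ESTIMATE_SECONDS = 10 * 60 + 42

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def max_runtime_years(fpops_est, fpops_bound, estimate_seconds):
    """Estimate how long a task may run before the bound is reached.

    Assumes the limit is fpops_bound divided by the speed implied by
    the runtime estimate (fpops_est / estimate_seconds).
    """
    implied_flops = fpops_est / estimate_seconds
    limit_seconds = fpops_bound / implied_flops
    return limit_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = max_runtime_years(
        RSC_FPOPS_EST, RSC_FPOPS_BOUND, INITIAL_ESTIMATE_SECONDS
    )
    print(f"Limit is roughly {years:.0f} years")
```

The bound is about 23 million times the estimate, so a ten-minute estimate stretches to a little under five centuries on these assumptions.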
8)
Message boards :
News :
Support for Intel GPUs
(Message 3149)
Posted 18 Nov 2021 by Richard Haselgrove Post: First task has reached an indicated 99% progress after 50 minutes, but I strongly suspect this is BOINC's imaginary 'pseudo progress' from a stalled app. I'll keep an eye on it.
9)
Message boards :
News :
Support for Intel GPUs
(Message 3147)
Posted 18 Nov 2021 by Richard Haselgrove Post: And this doesn't look right, either.

D:\BOINCdata\slots\5>dir
 Volume in drive D has no label.
 Volume Serial Number is E0ED-E51A

 Directory of D:\BOINCdata\slots\5

18/11/2021  16:06    <DIR>          .
18/11/2021  16:06    <DIR>          ..
18/11/2021  16:06                 0 boinc_lockfile
18/11/2021  16:06               117 GetDecics_4.02_windows_x86_64__opencl_intel
30/01/2021  10:51               349 gpuLookupTable.txt
30/01/2021  10:51            34,028 gpuMultiPrec.h
18/11/2021  16:06               113 in
18/11/2021  16:06            10,923 init_data.xml
30/01/2021  10:51               728 mp_int.h
18/11/2021  16:06               128 out
30/01/2021  10:51            76,646 pdtKernel.cl
18/11/2021  16:06               281 stderr.txt
              10 File(s)        123,313 bytes

I don't see any sign of the app recording progress. I'd expect to see a 'boinc_task_state.xml' file by now. Initial runtime estimate was 10 min 42 sec (BOINC estimates are notoriously unreliable for a new app): it's been running for 18:30, but I'm not sure it's done anything.
10)
Message boards :
News :
Support for Intel GPUs
(Message 3146)
Posted 18 Nov 2021 by Richard Haselgrove Post: This doesn't look right. From stderr, in running:

GPU Summary String = [CUDA|NVIDIAGeForceGTX1660SUPER|2|4095MB|47212|300][INTEL|Intel(R)HDGraphics4600|1|1297MB||102].
Loading GPU lookup table from file.
GPU found in lookup table:
GPU Name = GTX1660.
numBlocks = 8192.
threadsPerBlock = 32.
polyBufferSize = 262144.

The "Intel HD Graphics 4600" is right (present on the machine, OpenCL driver installed, known to BOINC, ready for use), but the GPU Name and (probably) metrics are from the wrong manufacturer.
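To illustrate the point, here is a minimal sketch of splitting that summary string into its bracketed entries and picking the one for the vendor the app version is meant to run on. This is not the app's actual code: the function names are made up, and I'm only assuming that the first two fields of each entry are the vendor tag and the device name. The meaning of the remaining fields isn't stated here, so they are carried along untouched.

```python
import re

SUMMARY = (
    "[CUDA|NVIDIAGeForceGTX1660SUPER|2|4095MB|47212|300]"
    "[INTEL|Intel(R)HDGraphics4600|1|1297MB||102]"
)


def parse_gpu_summary(summary):
    """Split a GPU summary string into one dict per bracketed entry.

    Assumption: field 0 is a vendor tag and field 1 a device name.
    Remaining fields are kept as raw strings without interpretation.
    """
    entries = []
    for group in re.findall(r"\[([^\]]*)\]", summary):
        fields = group.split("|")
        entries.append(
            {"vendor": fields[0], "name": fields[1], "rest": fields[2:]}
        )
    return entries


def pick_vendor(entries, vendor):
    """Return only the entries whose vendor tag matches exactly."""
    return [entry for entry in entries if entry["vendor"] == vendor]


if __name__ == "__main__":
    for entry in pick_vendor(parse_gpu_summary(SUMMARY), "INTEL"):
        print(entry["name"])
```

A lookup keyed on the INTEL entry would at least start from the right device name, rather than from the first GPU in the string.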
11)
Message boards :
News :
Support for Intel GPUs
(Message 3145)
Posted 18 Nov 2021 by Richard Haselgrove Post: Just had this happen:

18/11/2021 15:36:28 | NumberFields@home | Scheduler request completed: got 42 new tasks

The download of the .exe file is going very slowly (21 MB at a speed of 16 KBps), but it's getting there. Should be finished in 15 mins or so. I'll be watching host 1291 - any particular feedback you'd like?

Edit:

18/11/2021 16:05:58 | NumberFields@home | Finished download of GetDecics_4.02_windows_x86_64__opencl_intel

Nearly half an hour to download! A caching server (e.g. Cloudflare) can help with that - otherwise, just have patience. It'll be better when the initial rush has died down.
12)
Message boards :
Number crunching :
HTTPS for Master URL please
(Message 3128)
Posted 3 Oct 2021 by Richard Haselgrove Post: Another reason for not acting too precipitately: the SSL expiry problem only affected clients running on the Windows platform. They use a static ca-bundle.crt file, which can't be automatically updated. Other platforms use the operating system's security bundle, so they get updates with system updates.

The emergency release we are expecting 'real soon now' will be for Windows only. The http->https patch will only be fully effective when a planned release is made across all platforms. That's been overdue for a while, but there's no sign of a plan for when it might take place.
13)
Message boards :
Number crunching :
HTTPS for Master URL please
(Message 3124)
Posted 2 Oct 2021 by Richard Haselgrove Post: Please keep an eye on future developments - specifically, https://github.com/BOINC/boinc/pull/4539, "all master URL update to https". David Anderson wrote that yesterday, when he should have been fixing a problem caused by the expiry of an SSL certificate the day before. Judging by his comment, "Note: code you wrote a long time ago sometimes doesn't seem to make any sense at all", I don't expect this feature to be fully tested in time for the emergency release we expect in the next few days, but there's hope that the problems you encountered will be reduced when the next recommended release reaches widespread coverage.
14)
Message boards :
Number crunching :
Team credit lost !
(Message 3035)
Posted 30 Jan 2021 by Richard Haselgrove Post: ... your tasks still show "in progress"....

If you click through to the workunit (as distinct from the task), they show as 'WU cancelled'. I've also just got a batch (across several machines, including this one) of 'download failed' - scheduler issued the work, but the data file couldn't be found. I think that's all part of the process of the database healing itself.

I'm not worried about any of these - though it might be worth giving what remains of the database a good spring-clean at the end of this run. You might want to do that anyway to check that you've got all the results you were expecting. But if you've got a load of unmatched upload files, and a load of incomplete database records, a query might be able to flip the database state to 'needs validation', and recover them.
15)
Message boards :
Number crunching :
Team credit lost !
(Message 3033)
Posted 30 Jan 2021 by Richard Haselgrove Post: Each of my machines is making an unusual report every time it requests new work:

30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399197of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399198of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399199of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399200of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399201of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399202of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399347of2000000_0 (expired)
30/01/2021 18:42:13 | NumberFields@home | Didn't resend lost task wu_sf3_DS-16x271-1_Grp399689of2000000_0 (expired)

They appear at the top of https://numberfields.asu.edu/NumberFields/results.php?hostid=1291&offset=0&show_names=1&state=1: judging by the dates, they're probably in a similar state.
16)
Message boards :
News :
Finally recovering from hard drive crash.
(Message 3026)
Posted 29 Jan 2021 by Richard Haselgrove Post: Yes, I see the new datestamps on the export files - that should do it. But my credit now (131,860,347) is still below the last export before the crash (132,015,902). So the first movement will be downwards. Nothing we can do about that.
17)
Message boards :
News :
New and improved apps coming soon
(Message 2984)
Posted 5 Jan 2021 by Richard Haselgrove Post: We used to use that trick a lot at SETI@Home, which was open-source from the very beginning. The volunteers (collectively) had far more time for optimising the various apps than the project staff, and the staff actively encouraged volunteers to help speed things up - provided quality and accuracy were maintained.

There are four main tags to watch for:

<app_name>GetDecics</app_name>
<version_num>400</version_num>
<platform>windows_x86_64</platform>
<plan_class>default</plan_class>

Two appear in <workunit>, and three appear in <result> (<version_num> appears in both). The complete set of four has to be consistent for each task, and they have to match an <app_version> which is already defined on your system. Apart from that, it's easy....

Some people even devised scripts or applications to automate the process - look for the term 'reschedule' at SETI.
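The consistency rule above can be sketched as a small read-only check. The sample XML below is a cut-down, made-up stand-in for client_state.xml, and I'm assuming that <app_name> and <version_num> are the two tags in <workunit>, that <platform>, <version_num> and <plan_class> are the three in <result>, and that <name> / <wu_name> link the two chunks. Treat the layout and the function names as illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for client_state.xml.
SAMPLE = """
<client_state>
  <app_version>
    <app_name>GetDecics</app_name>
    <version_num>400</version_num>
    <platform>windows_x86_64</platform>
    <plan_class>default</plan_class>
  </app_version>
  <workunit>
    <name>wu_example</name>
    <app_name>GetDecics</app_name>
    <version_num>400</version_num>
  </workunit>
  <result>
    <name>wu_example_0</name>
    <wu_name>wu_example</wu_name>
    <platform>windows_x86_64</platform>
    <version_num>400</version_num>
    <plan_class>default</plan_class>
  </result>
</client_state>
"""


def text(node, tag):
    """Return the stripped text of a child tag, or '' if it is absent."""
    child = node.find(tag)
    return child.text.strip() if child is not None and child.text else ""


def unmatched_results(xml_text):
    """List result names whose four tags match no defined app_version."""
    root = ET.fromstring(xml_text)
    known = {
        (text(a, "app_name"), text(a, "version_num"),
         text(a, "platform"), text(a, "plan_class"))
        for a in root.iter("app_version")
    }
    workunits = {text(w, "name"): w for w in root.iter("workunit")}
    bad = []
    for result in root.iter("result"):
        wu = workunits.get(text(result, "wu_name"))
        # version_num appears in both chunks and has to agree.
        if wu is None or text(wu, "version_num") != text(result, "version_num"):
            bad.append(text(result, "name"))
            continue
        key = (text(wu, "app_name"), text(result, "version_num"),
               text(result, "platform"), text(result, "plan_class"))
        if key not in known:
            bad.append(text(result, "name"))
    return bad


if __name__ == "__main__":
    print(unmatched_results(SAMPLE))
```

The check only reads the data. An empty list means every task in the sample lines up with an <app_version>.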
18)
Message boards :
News :
New and improved apps coming soon
(Message 2980)
Posted 5 Jan 2021 by Richard Haselgrove Post: Yes, the new version 4.01 is running fine on 'Linux Mint 19.1 Tessa' (host 1697845)
19)
Message boards :
News :
New and improved apps coming soon
(Message 2979)
Posted 5 Jan 2021 by Richard Haselgrove Post: No, it's not a public feature of the client - it's just making use of the way BOINC stores the data defining the tasks in the cache. For each task, there's a <workunit> and a <result> xml chunk in client_state.xml. At the time I wrote that, the only difference between a GPU task and a CPU task was the <plan_class> line in the <result> chunk. Text editor, search'n'replace, and 'cuda30' turned into 'default'. Voila.

I've converted the GPU tasks back to run on the CPU, so they won't be wasted or need to be resent.

Richard, The new cuda app is version 4.01

Now, we'll have to tweak the <version_num> lines in both <workunit> and <result>, but it's still doable - but it takes care and understanding. As we used to say, "For advanced users only. At your own risk."

But I've still got a Mint 19 machine, so I can try the new app.
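For the curious, the search'n'replace step can be sketched like this. It works on an in-memory string, not on a live client_state.xml, and only touches <plan_class> lines inside <result> chunks so that any <app_version> definitions are left alone. The function name and the sample text are made up for illustration, and the same "advanced users, own risk" warning applies to doing anything like it for real.

```python
import re


def retarget_results(xml_text, old_class, new_class):
    """Swap the plan class inside <result> chunks only.

    Hypothetical helper: plain text substitution, restricted to the
    text between <result> and </result>.
    """
    old_tag = f"<plan_class>{old_class}</plan_class>"
    new_tag = f"<plan_class>{new_class}</plan_class>"

    def fix_chunk(match):
        # Replace only within the matched <result>...</result> chunk.
        return match.group(0).replace(old_tag, new_tag)

    return re.sub(r"<result>.*?</result>", fix_chunk, xml_text,
                  flags=re.DOTALL)


# Made-up fragment for demonstration purposes.
EXAMPLE = """<app_version>
  <plan_class>cuda30</plan_class>
</app_version>
<result>
  <name>wu_example_0</name>
  <plan_class>cuda30</plan_class>
</result>
"""

if __name__ == "__main__":
    print(retarget_results(EXAMPLE, "cuda30", "default"))
```

In the output the <app_version> chunk still says cuda30, while the <result> chunk now says default.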
20)
Message boards :
News :
New and improved apps coming soon
(Message 2975)
Posted 4 Jan 2021 by Richard Haselgrove Post: No need to apologise - it was my decision to upgrade. The machine was largely idle because GPUGrid has come to the end of its current research run, and I'm not really enthused by make-work projects like Collatz and PrimeGrid. I'd been thinking of upgrading for a while, and you gave me an excuse to get my brain into gear after the holidays.

It's generally running smoothly, but I hit two tasks today which seemed to get stuck in an endless loop.

Task 105827986 (from wu_sf3_DS-16x270_Grp 3738460 of 3932160)
Task 105983915 (from wu_sf3_DS-16x270_Grp 3639160 of 3932160)

I don't monitor the GPU loadings, but I do run a monitor which displays the 'CPU efficiency' of the task - %age of time the CPU is under load. That's typically 20% for the Linux GPU tasks, but dropped well down below 10% for these: elapsed time continued to rise, but task progress froze at 90.880% and 90.629% respectively. I paused them a couple of times each (which for GPU tasks removes them completely from memory): they re-started from checkpoint OK, but froze again at the same point. In the end, I aborted them.

I noticed the lookup file, but didn't explore it in detail. BOINC itself concentrates on the "compute capability", which is invariant for each iteration of the NVidia architecture - it determines such things as the number of shaders per multiplex, and the minimum CUDA level required in the driver. Do you plan to document your usage anywhere?