Posts by Richard Haselgrove

21) Message boards : News : New and improved apps coming soon (Message 2984)
Posted 5 Jan 2021 by Richard Haselgrove
Post:
We used to use that trick a lot at SETI@home, which was open-source from the very beginning. The volunteers (collectively) had far more time for optimising the various apps than the project staff, and the staff actively encouraged volunteers to help speed things up - provided quality and accuracy were maintained.

There are four main tags to watch for:

<app_name>GetDecics</app_name>
<version_num>400</version_num>
<platform>windows_x86_64</platform>
<plan_class>default</plan_class>

Two appear in <workunit>, and three appear in <result> (<version_num> appears in both). The complete set of four has to be consistent for each task, and it has to match an <app_version> which is already defined on your system. Apart from that, it's easy...

Some people even devised scripts or applications to automate the process - look for the term 'reschedule' at SETI.
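
The consistency rule above can be sketched as a small check. This is only an illustration under stated assumptions: the XML fragment is a hypothetical, heavily simplified stand-in for client_state.xml, and it assumes each <result> points at its <workunit> through a <wu_name> tag and that both chunks carry a <name> tag. It reads a string and touches no file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; a real client_state.xml holds many more fields.
SAMPLE = """
<client_state>
  <app_version>
    <app_name>GetDecics</app_name>
    <version_num>400</version_num>
    <platform>windows_x86_64</platform>
    <plan_class>default</plan_class>
  </app_version>
  <workunit>
    <name>wu_1</name>
    <app_name>GetDecics</app_name>
    <version_num>400</version_num>
  </workunit>
  <result>
    <name>wu_1_0</name>
    <wu_name>wu_1</wu_name>
    <version_num>400</version_num>
    <platform>windows_x86_64</platform>
    <plan_class>cuda30</plan_class>
  </result>
</client_state>
"""


def text(elem, tag):
    """Return the stripped text of a child tag, or '' if the tag is missing."""
    return (elem.findtext(tag) or "").strip()


def inconsistent_results(xml_text):
    """List result names whose four tags match no defined <app_version>."""
    root = ET.fromstring(xml_text)
    known = {
        (text(av, "app_name"), text(av, "version_num"),
         text(av, "platform"), text(av, "plan_class"))
        for av in root.iter("app_version")
    }
    workunits = {text(wu, "name"): wu for wu in root.iter("workunit")}
    bad = []
    for res in root.iter("result"):
        wu = workunits.get(text(res, "wu_name"))
        # <version_num> appears in both chunks and must agree.
        if wu is None or text(wu, "version_num") != text(res, "version_num"):
            bad.append(text(res, "name"))
            continue
        key = (text(wu, "app_name"), text(res, "version_num"),
               text(res, "platform"), text(res, "plan_class"))
        if key not in known:
            bad.append(text(res, "name"))
    return bad


print(inconsistent_results(SAMPLE))
```

In the sample, the task is flagged because no 'cuda30' <app_version> is defined; changing its <plan_class> to 'default' would make the set of four consistent again.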
22) Message boards : News : New and improved apps coming soon (Message 2980)
Posted 5 Jan 2021 by Richard Haselgrove
Post:
Yes, the new version 4.01 is running fine on 'Linux Mint 19.1 Tessa' (host 1697845)
23) Message boards : News : New and improved apps coming soon (Message 2979)
Posted 5 Jan 2021 by Richard Haselgrove
Post:
I've converted the GPU tasks back to run on the CPU, so they won't be wasted or need to be resent.
Richard,

I forgot to ask you what you meant by the above statement. Is this a newer feature of the client that I am not aware of? There have been times when I wished I could do such a conversion.
No, it's not a public feature of the client - it's just making use of the way BOINC stores the data defining the tasks in the cache. For each task, there's a <workunit> and a <result> XML chunk in client_state.xml. At the time I wrote that, the only difference between a GPU task and a CPU task was the <plan_class> line in the <result> chunk. Text editor, search'n'replace, and 'cuda30' turned into 'default'. Voila.

The new cuda app is version 4.01
Now, we'll have to tweak the <version_num> lines in both <workunit> and <result>, but it's still doable - but it takes care and understanding. As we used to say, "For advanced users only. At your own risk."
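
As a rough sketch of the edit described here, the function below changes <plan_class> and <version_num> for one task on an in-memory copy. Everything in the fragment is hypothetical (task names, the 401 and 400 version numbers, the platform string, and the assumption that <result> links to its <workunit> via <wu_name>). It is not a tool for a live client_state.xml, and the result is only valid if a matching <app_version> is already defined.

```python
import xml.etree.ElementTree as ET


def retarget_task(root, result_name, plan_class, version_num):
    """Edit one task's <result> and <workunit> chunks in memory.

    Returns True if the named result was found and changed.
    """
    for res in root.iter("result"):
        if res.findtext("name") != result_name:
            continue
        wu_name = res.findtext("wu_name")
        res.find("plan_class").text = plan_class
        res.find("version_num").text = str(version_num)
        # <version_num> must stay identical in the matching <workunit>.
        for wu in root.iter("workunit"):
            if wu.findtext("name") == wu_name:
                wu.find("version_num").text = str(version_num)
        return True
    return False


# Hypothetical, simplified fragment for illustration only.
STATE = """<client_state>
<workunit><name>wu_1</name><app_name>GetDecics</app_name><version_num>401</version_num></workunit>
<result><name>wu_1_0</name><wu_name>wu_1</wu_name><platform>x86_64-pc-linux-gnu</platform><version_num>401</version_num><plan_class>cuda30</plan_class></result>
</client_state>"""

root = ET.fromstring(STATE)
changed = retarget_task(root, "wu_1_0", plan_class="default", version_num=400)
print(changed,
      root.find("result/plan_class").text,
      root.find("workunit/version_num").text)
```

Anyone trying this for real would want the client stopped and a backup of client_state.xml taken first. As above: for advanced users only, at your own risk.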

But I've still got a Mint 19 machine, so I can try the new app.
24) Message boards : News : New and improved apps coming soon (Message 2975)
Posted 4 Jan 2021 by Richard Haselgrove
Post:
No need to apologise - it was my decision to upgrade. The machine was largely idle because GPUGrid has come to the end of its current research run, and I'm not really enthused by make-work projects like Collatz and PrimeGrid. I'd been thinking of upgrading for a while, and you gave me an excuse to get my brain into gear after the holidays.

It's generally running smoothly, but I hit two tasks today which seemed to get stuck in an endless loop.

Task 105827986 (from wu_sf3_DS-16x270_Grp 3738460 of 3932160)
Task 105983915 (from wu_sf3_DS-16x270_Grp 3639160 of 3932160)

I don't monitor the GPU loadings, but I do run a monitor which displays the 'CPU efficiency' of the task - the percentage of time the CPU is under load. That's typically 20% for the Linux GPU tasks, but dropped well below 10% for these: elapsed time continued to rise, but task progress froze at 90.880% and 90.629% respectively. I paused them a couple of times each (which for GPU tasks removes them completely from memory): they re-started from checkpoint OK, but froze again at the same point. In the end, I aborted them.
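
For illustration, the two symptoms described here (low CPU efficiency, and progress frozen while elapsed time keeps rising) could be checked along these lines. This is a minimal sketch, not the monitor mentioned above: the function names, the sample format, and the 600-second window are all assumptions made for the example.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Fraction of wall-clock time the task spent on the CPU."""
    return cpu_seconds / elapsed_seconds if elapsed_seconds > 0 else 0.0


def looks_stalled(samples, window=600.0):
    """Decide whether progress has frozen.

    samples: list of (elapsed_seconds, fraction_done) in time order.
    Returns True if fraction_done has not risen over at least `window`
    seconds of elapsed time (window is an arbitrary example value).
    """
    if len(samples) < 2:
        return False
    last_elapsed, last_done = samples[-1]
    for elapsed, done in reversed(samples[:-1]):
        if last_elapsed - elapsed >= window:
            return last_done <= done
    return False


# Hypothetical readings: progress stuck at 90.880% while time moves on.
frozen = [(0, 0.50), (300, 0.9088), (900, 0.9088), (1500, 0.9088)]
print(cpu_efficiency(120, 600), looks_stalled(frozen))
```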

I noticed the lookup file, but didn't explore it in detail. BOINC itself concentrates on the "compute capability", which is invariant for each iteration of the NVidia architecture - it determines such things as the number of shaders per multiprocessor, and the minimum CUDA level required in the driver. Do you plan to document your usage anywhere?
25) Message boards : News : New and improved apps coming soon (Message 2972)
Posted 3 Jan 2021 by Richard Haselgrove
Post:
After what feels like a very long day (the promised 'upgrade in place' tool appears not to exist), host 2248192 now reports itself as "Linux Mint 20". That claims to include 'Ubuntu GLIBC 2.31', but it's evidently backwards compatible - both GTX 1660 SUPER cards are now happily crunching with the new app.
26) Message boards : News : New and improved apps coming soon (Message 2969)
Posted 2 Jan 2021 by Richard Haselgrove
Post:
Going to leave it for tonight. I've converted the GPU tasks back to run on the CPU, so they won't be wasted or need to be resent. I'll have another look at upgrading in daylight tomorrow, now I know what I'm looking for.
27) Message boards : News : New and improved apps coming soon (Message 2965)
Posted 2 Jan 2021 by Richard Haselgrove
Post:
Hit a snag on host 2248192

/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by [cuda 30 app])
Investigating.

Seems I've got

GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1.3) stable release version 2.27.
and it seems the free-standing downloads of 2.29 are no longer available. May try upgrading from Mint 'Tessa' 19.1 to Mint 'Mate' 20 ???
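
A quick way to see whether a host meets a GLIBC requirement like the one in that error is to compare version numbers numerically. The sketch below uses Python's standard platform.libc_ver(); the helper names are made up for this example, and the "2.29" threshold is simply taken from the error message above.

```python
import platform


def version_tuple(version):
    """Turn '2.29' into (2, 29) so versions compare as numbers, not text."""
    return tuple(int(part) for part in version.split("."))


def glibc_satisfies(installed, required):
    """True if the installed glibc version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


# libc_ver() returns e.g. ('glibc', '2.27'); it may return ('', '') elsewhere.
lib, installed = platform.libc_ver()
if lib == "glibc" and installed:
    verdict = "ok" if glibc_satisfies(installed, "2.29") else "too old"
    print(installed, verdict)
else:
    print("glibc version not detected on this system")
```

Comparing tuples rather than strings matters here: as text, "2.9" would sort after "2.29".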
28) Message boards : Number crunching : **Wish List** WU' Limits (Message 2929)
Posted 23 Nov 2020 by Richard Haselgrove
Post:
Sounds like you also need to tone down your "Store up to an additional --- days of work" setting. That's the one that leads to large bunches at each work request.
29) Message boards : News : Server problems this past weekend (Message 2921)
Posted 16 Nov 2020 by Richard Haselgrove
Post:
No, it goes right back to the start of the problem:

14-Nov-2020 20:29:27 [NumberFields@home] Error reported by file upload server: can't open log file '../log_mimas/file_upload_handler.log' (errno: 9)
is the earliest on this machine.

I think it can happen if/when you run short on disk space?
30) Message boards : News : Server problems this past weekend (Message 2919)
Posted 16 Nov 2020 by Richard Haselgrove
Post:
Welcome back. I've sent one batch up, and all seemed well - I'll do the others in stages.

Did you see the specific error message on uploads?

16/11/2020 14:43:17 | NumberFields@home | [error] Error reported by file upload server: can't open log file '../log_mimas/file_upload_handler.log' (errno: 9)
Might be helpful.
31) Message boards : News : New GPU OpenCL versions available (Message 2889)
Posted 3 Oct 2020 by Richard Haselgrove
Post:
Someone pointed out to me that CUDA has a blockingSync flag that limits the cpu usage without much hit to the performance. My first attempt at using it caused a crash and I haven't gone back to it yet. It's on my to-do list along with a hundred other things.
Synchronisation is definitely one of the black arts of GPU programming. David Anderson wrote a useful primer for BOINC application programmers 10 years ago - https://boinc.berkeley.edu/trac/wiki/GpuSync - but so far as I know has never revisited the subject since.

The native CUDA language for NVidia probably has the best support for various synchronisation methods - since David wrote, CUDA 5.0 has added stream callbacks, and I'm told they're the most efficient - but I've never seen them used in a BOINC science application. I've also been told they were not added to the OpenCL interface (though https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clSetEventCallback.html suggests otherwise).

The CPU load of a GPU application varies hugely according to the architecture of the GPU, the programming language chosen, and the sync model used. I tried to start a debate on tackling those issues within BOINC a year ago - https://github.com/BOINC/boinc/issues/2949 - but interest has been, to say the least, minimal. We are now up to CUDA 11.1, with doubled FP32 pathways on the RTX 30x0 range, but still nothing moves - it's all left to individual project programmers like yourself, with hundreds of other priorities to juggle.
32) Message boards : News : Expired SSL certificates in BOINC Client -- User Action Required (Message 2827)
Posted 12 Jun 2020 by Richard Haselgrove
Post:
We're good again - thanks, Eric.

Running on Windows 7. I restored the old ca-bundle dated 30 July 2019, and got this log:

12/06/2020 11:12:38 | NumberFields@home | Sending scheduler request: Requested by user.
12/06/2020 11:12:38 | NumberFields@home | Requesting new tasks for CPU
12/06/2020 11:12:38 | NumberFields@home | [sched_op] CPU work request: 1793.75 seconds; 0.00 devices
12/06/2020 11:12:38 | NumberFields@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
12/06/2020 11:12:38 | NumberFields@home | [sched_op] Intel GPU work request: 0.00 seconds; 0.00 devices
12/06/2020 11:12:38 | NumberFields@home | [http] HTTP_OP::init_post(): https://numberfields.asu.edu/NumberFields_cgi/cgi
12/06/2020 11:12:38 | NumberFields@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Info: Found bundle for host numberfields.asu.edu: 0x502b020 [can pipeline]
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Info: Re-using existing connection! (#4862) with host numberfields.asu.edu
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Info: Connected to numberfields.asu.edu (129.219.51.76) port 443 (#4862)
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: POST /NumberFields_cgi/cgi HTTP/1.1
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Host: numberfields.asu.edu
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.17.0)
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Accept: */*
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Accept-Language: en_GB
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Content-Length: 22335
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server: Expect: 100-continue
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Sent header to server:
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
12/06/2020 11:12:38 | NumberFields@home | [http] [ID#1] Info: We are completely uploaded and fine
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: HTTP/1.1 200 OK
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: Date: Fri, 12 Jun 2020 10:12:44 GMT
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: Server: Apache/2.4.29 (Ubuntu)
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: Vary: Accept-Encoding
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: Content-Encoding: gzip
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: Content-Length: 1591
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server: Content-Type: text/xml
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Received header from server:
12/06/2020 11:12:39 | NumberFields@home | [http] [ID#1] Info: Connection #4862 to host numberfields.asu.edu left intact
12/06/2020 11:12:40 | NumberFields@home | Scheduler request completed: got 1 new tasks

The download worked as well:

12/06/2020 11:12:42 | NumberFields@home | Started download of sf3_DS-15x271_Grp5038225of6553600.dat
12/06/2020 11:12:44 | NumberFields@home | Finished download of sf3_DS-15x271_Grp5038225of6553600.dat

(I won't bother you with the full http --> https redirect trace)

No detail recorded about the certificate negotiation, but both connections completed over port 443, so secure.
33) Message boards : News : Expired SSL certificates in BOINC Client -- User Action Required (Message 2826)
Posted 11 Jun 2020 by Richard Haselgrove
Post:
Late night here, but I'll test in the morning and report. Hopefully good news!
34) Message boards : News : Batch plan (Message 2820)
Posted 8 Jun 2020 by Richard Haselgrove
Post:
I've noticed that I have received a few of the sf7_DS-16x4 tasks, which match the {2,11} row in your new table with 7680 tasks in the batch.

They're coming in with a 3 day deadline, but with a runtime estimate nearly three times the baseload batch (20,000 GFLOPs instead of 7,000 GFLOPs). That's going to skew the runtime estimates, if they're really 'relatively quick' runners.
35) Message boards : News : Expired SSL certificates in BOINC Client -- User Action Required (Message 2819)
Posted 8 Jun 2020 by Richard Haselgrove
Post:
But I agree the recent drops are due to the SSL certificate issue. Not sure what to do about it, and I don't think disabling SSL is a good idea.
Both LHC@home and Rosetta@home managed to modify their server certificates in a way that enabled work to flow even for people with the older client version / certificate bundle. You could reach out to those two projects, either over the BOINC projects email list, or privately in case they don't want to be public about their security settings.
36) Message boards : Number crunching : bok's stats not updating (Message 2812)
Posted 6 Jun 2020 by Richard Haselgrove
Post:
The same feed supplies all the statistics sites, and BOINCstats is updating normally - now.

BOINCstats stopped updating for a couple of days: it couldn't access the feed, because of the same https expired-certificate bug that affected all of us. Willy bypassed the https bug, and the stats started again.

If anyone notices that their favourite stats site has stopped updating, try initiating a conversation with the operator of the site, like "Delayed updates from some projects - certificate error?"

It's always worth checking https://numberfields.asu.edu/NumberFields/stats/ (or a similar url for any project) to verify if a stats problem is local or remote.
37) Message boards : News : Expired SSL certificates in BOINC Client -- User Action Required (Message 2808)
Posted 1 Jun 2020 by Richard Haselgrove
Post:
If any Windows user, 64-bit only, is still affected by this, there is a hotfix v7.16.7 of BOINC available from https://boinc.berkeley.edu/download.php
38) Message boards : News : Expired SSL certificates in BOINC Client -- User Action Required (Message 2799)
Posted 31 May 2020 by Richard Haselgrove
Post:
the CRT file already installed is valid from 1998 till 2028 !
The certificate file is a bundle, containing 133 different certificates.

The Microsoft Windows tool only shows you the expiry date of the one on the top of the pile - the first certificate in the bundle.

The problem here is caused by certificate number 6 in the bundle, which expired yesterday. Follow any of the suggestions for editing/replacing your bundle, and you will be able to contact this project's servers again.
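
To see that a bundle really is a pile of separate certificates, it can be split on the PEM markers and the blocks counted or indexed. The sketch below does only that, on placeholder text rather than a real ca-bundle.crt; reading each certificate's expiry date would need an X.509 parser, which is not shown here.

```python
import re

# One PEM block: everything between the BEGIN and END marker lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_bundle(bundle_text):
    """Return every PEM certificate block in the bundle, in file order."""
    return PEM_BLOCK.findall(bundle_text)


# Placeholder bodies for illustration; these are not real certificates.
FAKE = "\n".join(
    f"Label {i}\n-----BEGIN CERTIFICATE-----\nAAAA{i}\n-----END CERTIFICATE-----"
    for i in range(1, 4)
)

certs = split_bundle(FAKE)
print(len(certs))  # number of certificates found in the placeholder bundle
```

With a real bundle read from disk, certs[5] would be "certificate number 6" in the sense used above, assuming the count starts at one.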
39) Message boards : News : Expired SSL certificates in BOINC Client -- User Action Required (Message 2795)
Posted 31 May 2020 by Richard Haselgrove
Post:
(it should be in your BOINC root directory).
This problem primarily affects Windows, and some (but not all) versions of Linux.

The file needing replacement can be found in the program directory for Windows - most commonly C:\Program Files\BOINC
40) Message boards : Number crunching : Upload server down? (Message 2792)
Posted 30 May 2020 by Richard Haselgrove
Post:
Thanks Eric, but you can relax - not your end.

Problem has been tracked down to an expired certificate in the ca-bundle.crt shipped with the client - see https://github.com/BOINC/boinc/issues/3789

I have hacked my bundle file with a replacement certificate, as described in "Peer certificate cannot be authenticated with given CA certificates" at BOINC dev, and confirmed it works here.

Toby Broom - moderator at LHC - has made a downloadable file with the edit - details in the thread.


Copyright © 2024 Arizona State University