61)
Message boards :
News :
GPU app - beta version for linux nvidia
(Message 2354)
Posted 2 Apr 2019 by Richard Haselgrove Post: I disagree. Payment should be for the work done in searching - the same number of credits for the search task, whatever the device used. GPUs will win out vastly in the number of credits awarded per unit of time - per second, per hour, per day, however you choose to measure it. Your own statement that you've found more candidate values since the GPU app was released confirms that you're conducting more searches. Good on you - that's your reward. You don't need to be compensated twice - once for doing more searches, and again for doing searches on a different device.
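A minimal sketch of the arithmetic behind this argument, assuming a fixed award per task. The credit figure and runtimes below are hypothetical, not NumberFields@home values: the credit rate is simply credit per task multiplied by tasks completed per hour.

```python
# Hypothetical figures for illustration only; not actual NumberFields@home values.
CREDIT_PER_TASK = 100.0  # same award for the same WU, whatever the device


def credit_per_hour(task_runtime_seconds: float,
                    credit_per_task: float = CREDIT_PER_TASK) -> float:
    """Credit rate for a device that completes one task every task_runtime_seconds."""
    tasks_per_hour = 3600.0 / task_runtime_seconds
    return credit_per_task * tasks_per_hour


cpu_rate = credit_per_hour(7200.0)  # assumed: CPU needs 2 hours per task
gpu_rate = credit_per_hour(360.0)   # assumed: GPU needs 6 minutes per task
print(f"CPU: {cpu_rate:.0f} credits/hour, GPU: {gpu_rate:.0f} credits/hour")
```

With these assumed runtimes the GPU earns twenty times as much per hour, and all of that advantage comes from completing more searches; the per-task award never changes.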
62)
Message boards :
Number crunching :
Too much credit ?
(Message 2353)
Posted 2 Apr 2019 by Richard Haselgrove Post: I disagree. Payment should be for the work done in searching - the same number of credits for the search task, whatever the device used. GPUs will win out vastly in the number of credits awarded per unit of time - per second, per hour, per day, however you choose to measure it. Your own statement that you've found more candidate values since the GPU app was released confirms that you're conducting more searches. Good on you - that's your reward. You don't need to be compensated twice - once for doing more searches, and again for doing searches on a different device.
63)
Message boards :
Number crunching :
Too much credit ?
(Message 2347)
Posted 31 Mar 2019 by Richard Haselgrove Post:
> Sure, I can send you my observations via PM if that works for you. Or would the BOINC projects mailing list be better?
Just seen that you've found the 'outlier' setting :-) CreditNew doesn't kick in until host_app_version.pfc_avg and app_version.pfc_avg have usable values, so you'll probably have to unleash the outliers for a few days to fill up the tables. But there are still genuine outliers in the data, so we'll have to - eventually - find a way of distinguishing them: not by an absolute time value, but by something relative.
You'd probably feel able to speak more freely in PM, but I think we need to get this on the record sometime, so I think the time has come for a formal report to the projects mailing list. In the meantime - being Windows only - my GPUs haven't joined the party, but my CPUs are plodding along exactly as normal. I request that the evidence in https://boincstats.com/en/stats/122/user/detail/1969/charts be taken into account.
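One way to read "something relative" is to judge each runtime against the same host's recent history rather than against a fixed cutoff. The sketch below is hypothetical: the `factor` threshold and the use of the median are assumptions for illustration, not how CreditNew itself flags outliers.

```python
from statistics import median
from typing import Sequence


def is_runtime_outlier(runtime: float, recent_runtimes: Sequence[float],
                       factor: float = 5.0) -> bool:
    """Hypothetical relative outlier test for a single task runtime (seconds).

    A task counts as an outlier if it ran more than `factor` times longer,
    or less than 1/`factor` as long, as the median of the same host's
    recent runtimes for the same app version.
    """
    if not recent_runtimes:
        return False  # no history yet, so there is nothing to compare against
    typical = median(recent_runtimes)
    return runtime > factor * typical or runtime < typical / factor


history = [600.0, 620.0, 580.0, 610.0]
print(is_runtime_outlier(640.0, history))   # within the normal spread
print(is_runtime_outlier(30.0, history))    # suspiciously short
print(is_runtime_outlier(4000.0, history))  # suspiciously long
```

Using the median rather than the mean keeps a single runaway task from dragging the baseline along with it.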
64)
Message boards :
News :
GPU app - beta version for linux nvidia
(Message 2346)
Posted 31 Mar 2019 by Richard Haselgrove Post:
> How does the current credit/hour for the GPU compare to other projects?
If they are the same WUs, they should get the same credit - whatever hardware they are run on. The concept of "credit/hour for the GPU" is a movable feast. I run the same GPUs for both SETI and GPUGrid: the credit awarded is roughly in the ratio 1:20. SETI pays too low, GPUGrid pays much too high. I usually reckon that the staff at Einstein make the most reasonable attempt to follow the definition of the cobblestone (section Claimed and granted credit). The 'cobblestone' is the Sunday-best formal name for the credit, named after Jeff Cobb of SETI. A 1 GFLOPS device will earn 200 cobblestones per day. More coming in the other thread.
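The cobblestone definition above reduces to a fixed number of floating-point operations per credit. A minimal sketch, assuming only the figures stated in the post (a device sustaining 1 GFLOPS for a day earns 200 cobblestones); the operation count for a task is simply taken as an input here.

```python
SECONDS_PER_DAY = 86_400
REFERENCE_FLOPS = 1e9        # the 1 GFLOPS reference device
COBBLESTONES_PER_DAY = 200   # what the reference device earns in one day

# Floating-point operations worth one cobblestone: 1e9 * 86400 / 200 = 4.32e11
FLOP_PER_COBBLESTONE = REFERENCE_FLOPS * SECONDS_PER_DAY / COBBLESTONES_PER_DAY


def cobblestones_for_task(flop_count: float) -> float:
    """Credit for a task that performed flop_count floating-point operations."""
    return flop_count / FLOP_PER_COBBLESTONE


def cobblestones_per_day(sustained_flops: float) -> float:
    """Daily credit for a device sustaining the given rate (operations per second)."""
    return cobblestones_for_task(sustained_flops * SECONDS_PER_DAY)


print(cobblestones_per_day(1e9))   # the reference device
print(cobblestones_per_day(5e12))  # a hypothetical 5 TFLOPS GPU, if fully used
```

The definition counts floating-point operations only, which is why integer-heavy work sits awkwardly with it.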
65)
Message boards :
Number crunching :
Too much credit ?
(Message 2337)
Posted 30 Mar 2019 by Richard Haselgrove Post:
> ... and the message boards were quiet about it (until now).
It's a generally observed bit of psychology. If credit is too low, everyone complains. If credit is too high, there's an unspoken (and instinctive) conspiracy of silence. The definition of credit is 'work done' - the number of floating point operations computed during the course of the work. So, equal credit per task for the same WU is right - until somebody thinks up a more efficient algorithm and computes the task with less work, at which point there's no fair answer.
Incidentally, notice those weasel words 'floating point operations'. Would I be right in thinking that this project mostly utilises integer arithmetic? And would GPUs be especially efficient - i.e. 'fast' - when processing integers? BOINC makes no effort to assess the real speed of GPUs - unlike CPUs, which are benchmarked (badly) in both integer and floating point mode.
Eric, would it be possible for you to make notes on your credit experiences during this transition? I really think that BOINC should - belatedly - address the real effectiveness of these various credit schemes (especially at times of transition), and your cool-headed observations would be most helpful.
66)
Message boards :
Number crunching :
GPU Error
(Message 2276)
Posted 26 Mar 2019 by Richard Haselgrove Post: API 7.5 sounds like a BOINC version number. That has no effect whatsoever on the CUDA version needed or used.
> ok, system log shows: API 7.5 will be used for the app.
> The app was built with CUDA version 10.1, I believe.
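To see which CUDA version is actually in play (as opposed to the BOINC API version), an app can query the CUDA runtime with `cudaRuntimeGetVersion()` or `cudaDriverGetVersion()`. Both report a single integer encoded as 1000 × major + 10 × minor. A small decoding sketch, assuming you already have that integer:

```python
def cuda_version_string(encoded: int) -> str:
    """Decode a CUDA version integer (1000 * major + 10 * minor) into 'major.minor'."""
    major, remainder = divmod(encoded, 1000)
    minor = remainder // 10
    return f"{major}.{minor}"


print(cuda_version_string(10010))  # the encoding used for CUDA 10.1
```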
67)
Message boards :
News :
GPU app - beta version for linux nvidia
(Message 2244)
Posted 24 Mar 2019 by Richard Haselgrove Post:
> I wonder if I should be calling a function to set the device. I vaguely remember seeing something about that, but I completely forgot to follow up on it. Having only a single GPU, I was not perceptive to this "bug".
Yes, you should. See https://boinc.berkeley.edu/trac/wiki/AppCoprocessor#Deviceselection
Concentrate on boinc_get_init_data() - the older command line --device N is so old it can be relegated to an afterthought.
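The order of preference described above can be sketched as follows. This is a Python illustration, not BOINC API code: a C/C++ app would call `boinc_get_init_data()` and read `gpu_device_num` from the returned structure. The sketch assumes the same value appears as a `<gpu_device_num>` element in the slot's init_data.xml and that a negative value means no device was assigned; `select_gpu_device` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def select_gpu_device(init_data_xml: Optional[str], argv: Sequence[str]) -> int:
    """Hypothetical helper: choose the GPU the BOINC client assigned to this task.

    1. Preferred: <gpu_device_num> from init_data.xml (assumed layout).
    2. Fallback: the legacy '--device N' command-line argument.
    3. Otherwise: device 0.
    """
    if init_data_xml:
        root = ET.fromstring(init_data_xml)
        text = root.findtext("gpu_device_num")
        if text is not None and text.strip():
            device = int(text)
            if device >= 0:  # assumed: negative means "not assigned"
                return device
    # Legacy fallback: look for '--device N' on the command line.
    for i in range(len(argv) - 1):
        if argv[i] == "--device":
            return int(argv[i + 1])
    return 0


sample = "<app_init_data><gpu_device_num>1</gpu_device_num></app_init_data>"
print(select_gpu_device(sample, ["app"]))
```

A CUDA app would then pass the chosen number to `cudaSetDevice()` before doing any other CUDA work, so that each task runs on the GPU BOINC assigned rather than always on device 0.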
68)
Message boards :
News :
Server upgrade
(Message 2144)
Posted 25 Jan 2019 by Richard Haselgrove Post:
> It is not just a problem with the stats export. When I look at my account page here at NumberFields@home (but also at other projects) the information for NumberFields@home in the "Projects in which you are participating" list is wrong (the NumberFields@home specific data earlier on the page is correct though). In my case the data in the list is a copy from another project but definitely not the NumberFields@home data.
Likewise here. In my case, the cross-project line is copied from the immediately previous project line in alphabetical order. I've looked at the current stats export for my account:
<user>
  <id>1969</id>
  <name>Richard Haselgrove</name>
  <create_time>1319792132</create_time>
  <total_credit>96705660.619374</total_credit>
  <expavg_credit>34201.016603</expavg_credit>
  <expavg_time>1548365625.070331</expavg_time>
  <cpid>68aa4b6077c3fe48975e530f6ad94ca5</cpid>
</user>
which looks fine, and is also displayed correctly on BOINCstats. So this looks like a processing error at Netsoft Online (which provides the aggregated cross-project stats for this list and the cross-project certificate). You may need to liaise with James Drews (contact details at GitHub) to work out what's going on.
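Checking an export record like the one quoted can be scripted. A minimal sketch that parses that single `<user>` record and converts its Unix timestamps into readable dates; it handles one record only, not a full export file, and the field names are taken from the record itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

USER_XML = """<user>
  <id>1969</id>
  <name>Richard Haselgrove</name>
  <create_time>1319792132</create_time>
  <total_credit>96705660.619374</total_credit>
  <expavg_credit>34201.016603</expavg_credit>
  <expavg_time>1548365625.070331</expavg_time>
  <cpid>68aa4b6077c3fe48975e530f6ad94ca5</cpid>
</user>"""


def summarize_user(xml_text: str) -> dict:
    """Extract the key fields of one <user> record from a stats export."""
    user = ET.fromstring(xml_text)

    def utc(field: str) -> datetime:
        # Export timestamps are seconds since the Unix epoch.
        return datetime.fromtimestamp(float(user.findtext(field)), tz=timezone.utc)

    return {
        "id": int(user.findtext("id")),
        "name": user.findtext("name"),
        "created": utc("create_time"),
        "total_credit": float(user.findtext("total_credit")),
        "recent_average_credit": float(user.findtext("expavg_credit")),
        "recent_average_updated": utc("expavg_time"),
        "cpid": user.findtext("cpid"),
    }


for key, value in summarize_user(USER_XML).items():
    print(f"{key}: {value}")
```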
69)
Message boards :
News :
Server upgrade
(Message 2130)
Posted 11 Jan 2019 by Richard Haselgrove Post: Good to hear that things are under control. I've been watching the emails passing back and forth: you're in good hands - far better than mine - so it's appropriate for me to keep out of the noise.
70)
Message boards :
News :
Server upgrade
(Message 2127)
Posted 11 Jan 2019 by Richard Haselgrove Post: OK, I'll pass that upstream. Laurence Field of CERN (LHC@Home) is working to create a 'server stable' versioning system, and that may have been where you picked up the reference to v1.0.3 - I think that's what they're testing at the moment. v7.15.0 is the correct value for the client code contained in the master Git repo alongside the server stuff. I'll check out the latest state of play and report back.
71)
Message boards :
News :
Server upgrade
(Message 2125)
Posted 10 Jan 2019 by Richard Haselgrove Post: What exactly did you update? Can you identify the new server code version that you used? There have been changes to the db_dump code to accommodate GDPR (European privacy legislation). It's possible - wild guess here - that additional configuration may be necessary, to accommodate (or not) user choices about revealing or withholding their private stats information. I see you've written to the projects mailing list: I mentioned your report about the update to the development team in a conference call earlier this evening. Between them, I hope those will attract the attention of the people who know.
72)
Message boards :
News :
Vacation
(Message 2093)
Posted 23 Aug 2018 by Richard Haselgrove Post: Sorry to add to your vacation woes (feel free to postpone until you get back), but I've just found this in an event log:
23/08/2018 21:13:53 | NumberFields@home | Started download of sf4_DS-13x271-11_Grp156548of200000.dat
I'll nip downstairs and see if there's any more debris available to analyse.
Edit - found I still had the download URL, so got a fresh copy. Identical to the first, 8,295 bytes. From the error message above, the 'expected' MD5 is the one calculated on your server and stored in your database: the 'got' MD5 is the one BOINC calculated locally here. The download files have the same (independently calculated) MD5 as my local BOINC 'got' - which suggests that the value stored in your database is wrong. Task is 30805859: looking at Error tasks for computer 1288, this wasn't the only one.
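The reasoning above (a fresh, independent copy agrees with the client's 'got' value, so the stored 'expected' value is the odd one out) can be written down as a small check. A sketch using Python's standard hashlib module; `diagnose_md5_mismatch` is a hypothetical helper, not part of BOINC.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """MD5 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diagnose_md5_mismatch(expected: str, got: str, recomputed: str) -> str:
    """Decide which side looks wrong when a download fails its MD5 check.

    expected:   MD5 stored in the project database and sent by the server
    got:        MD5 the client computed on its downloaded copy
    recomputed: MD5 of a fresh, independently downloaded copy
    """
    expected, got, recomputed = expected.lower(), got.lower(), recomputed.lower()
    if expected == got:
        return "no mismatch"
    if recomputed == got:
        return "stored MD5 suspect"  # two independent copies agree with each other
    if recomputed == expected:
        return "first download suspect"  # the original copy was probably corrupted
    return "inconclusive"
```

Given the two hashes from the client's error message and a fresh copy of the file, `diagnose_md5_mismatch(expected, got, md5_of_file("sf4_DS-13x271-11_Grp156548of200000.dat"))` returns "stored MD5 suspect" in the situation described.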
73)
Message boards :
Number crunching :
I cant send finished wus
(Message 2018)
Posted 11 Apr 2018 by Richard Haselgrove Post: And we can see that WU (23257062 - task 25321670) was reported and validated by host 95038 at 2:10:51 UTC on 10 Apr 2018. We can't see the user name (it shows as 'anonymous'), consistent with the 'hidden' setting for your computers. But you might be able to see some host log activity around that time.
74)
Message boards :
Number crunching :
I cant send finished wus
(Message 2016)
Posted 11 Apr 2018 by Richard Haselgrove Post: Here is an absolutely off-the-wall observation, which may be completely unrelated - but it has some similarities with this situation. I present it as food for thought, nothing more.
Like HerrJeh, I have multiple machines - all are at my home, and share an internet connection. They share the same public IP address, but have different private IP addresses behind the router. I'm connected to a different BOINC project, which like this one runs every BOINC function from a single server. That other project distributes long-running (up to 18-hour) workunits, but regards them as time-critical - it likes to have them returned within 24 hours. So I don't want to get a cache of tasks in advance, and in fact the project often has no tasks immediately available on demand. All of which is a long way of explaining why I find myself repeatedly clicking the 'Update' button in an attempt to get new work before the current task completes.
What I am observing is that I can issue repeated requests every 30 seconds from one machine, and connect to the server every time. But if I try to connect from a different machine (same LAN, same IP) in between, the second machine can't connect. If I stop updating the first machine for a minute or two, the second machine can connect, and goes on connecting for as long as is needed and allowed by their 30-second backoff interval.
The nature of the server contact required doesn't affect the connection failures: I've just had a machine which couldn't connect to upload results, while I was requesting new work on a different machine. It isn't simply congestion at the server port: that would be more random. This observation is strictly about multiple connection attempts, closely spaced in time, from different computers sharing the same public IP address. It feels more like a server OS-level problem than a BOINC problem, and it's been consistent for weeks, if not months.
Their server is running:
11/04/2018 11:03:50 | | [http] [ID#1] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
My machines are all Windows 7/64 and run recent versions of BOINC (mostly v7.9 test builds). I haven't yet explored the http logs for further clues, but I'll give it a try when I have time.
75)
Message boards :
Number crunching :
I cant send finished wus
(Message 2009)
Posted 10 Apr 2018 by Richard Haselgrove Post:
> hosts that could return Decic tasks but not Septic tasks.
I noticed that in the logs, and couldn't understand it. Both types are uploading from my machines with no distinction. All communications - specifically including uploading - are handled by BOINC (the same BOINC in both cases), so there should be no difference. The build tools and libraries used in your applications should make no difference either - the comms happen after your app has already shut down. I looked at the destination of each type:
<upload_url>http://numberfields.asu.edu/NumberFields_cgi/file_upload_handler/</upload_url> (septics)
<upload_url>http://numberfields.asu.edu/NumberFields_cgi/file_upload_handler/</upload_url> (sf4_DS-13x271)
and my eye can't spot a difference. Is there a difference in the file size offered for upload?
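When two strings look identical, a character-by-character comparison rules out differences the eye cannot see, such as trailing whitespace or look-alike characters. A minimal sketch, applied to the two upload URLs as quoted; `first_difference` is a hypothetical helper.

```python
from typing import Optional


def first_difference(a: str, b: str) -> Optional[int]:
    """Index of the first differing character, or None if the strings are identical."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one string is a prefix of the other
    return None


septic_url = "http://numberfields.asu.edu/NumberFields_cgi/file_upload_handler/"
sf4_url = "http://numberfields.asu.edu/NumberFields_cgi/file_upload_handler/"
print(first_difference(septic_url, sf4_url))
```

For the URLs as quoted this prints None. The check is only as good as the copy-and-paste, so it is best run on the values taken directly from the client's state file (client_state.xml).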
76)
Message boards :
Number crunching :
I cant send finished wus
(Message 1999)
Posted 4 Apr 2018 by Richard Haselgrove Post:
> Maybe others are also unable to return or get new work?
I've not noticed any problems here, across multiple Windows machines. Here's the http_debug for a successful upload, for comparison. I don't think xfer_debug would add much extra information (though no harm in selecting it): these problems usually occur during the header negotiation.
04/04/2018 09:26:18 | NumberFields@home | [http] [ID#15238] Sent header to server: Accept: */*
If the problem is widespread, one possible cause is the security certificate verification you see here. I'm using a test version of BOINC (v7.9.2) which includes an updated ca-bundle.crt file: the latest test versions (currently v7.9.3, soon to be v7.9.4) are available at https://boinc.berkeley.edu/download_all.php
77)
Message boards :
News :
Minor server overload problems
(Message 1973)
Posted 27 Feb 2018 by Richard Haselgrove Post: I was in a conference call this evening with a couple of *very* experienced BOINC server administrators. One of them thought that on a lightly loaded project, a database connection might well time out between active requests: the other had never seen such a thing. On reflection, both thought there might be some useful information about a possible database server stoppage in the MySQL logs. They suggested I copy your question to the boinc_projects mailing list, both to remind them to look again, and to get some broader responses from the community. OK if I do that in the morning?
78)
Message boards :
News :
Minor server overload problems
(Message 1971)
Posted 27 Feb 2018 by Richard Haselgrove Post: The server has gone into the 'feeder not running' maintenance mode again - 1-hour backoff on scheduler requests, no reporting and no new work. Edit - sorry, working now. Panic over.
79)
Message boards :
News :
Database crash
(Message 1939)
Posted 8 Jan 2018 by Richard Haselgrove Post: Yes, things are getting back to normal here too - reported completed work and got a few new tasks. I'm getting some of those "permanent HTTP error" messages too. They seem to be data files missing from the download storage area (HTTP 404), so of course the downloads fail for all replications of the workunit. You'll have a few jobs to re-issue when all this is over.
80)
Message boards :
News :
Minor server overload problems
(Message 1929)
Posted 31 Dec 2017 by Richard Haselgrove Post: It's fairly well known that a stopped feeder puts the server into a form of maintenance mode - nothing gets done, volunteer hosts are backed off for 1 hour (as my log shows). I presume this is deliberate to stop the situation getting worse until it can be inspected. What that doesn't say is why the feeder stopped in the first place - your second post about the lack of temp file space seems as good an explanation as any. At intervals throughout the day, I've seen the project come back up fully (so I could report and refill), then go into full maintenance mode with these boards down as well, then return to normal working (the current state: just reported and received new tasks). Which sounds like a good moment to go out and start celebrating the new year...