Vacation
Author | Message |
---|---|
Joined: 8 Jul 11 Posts: 1340 Credit: 478,056,506 RAC: 564,282 |
I will be on vacation for 2 weeks, and my internet access will be spotty. So I apologize in advance for any delays in responding to the boards. I generated enough work to last for the next two weeks, so hopefully I won't need to do any remote logins. Thanks all! |
Joined: 17 Mar 15 Posts: 10 Credit: 45,563,435 RAC: 0 |
I planned to wish you a good vacation, but . . . Sorry to have to say this, but I'm starting to get Download Checksum Errors on several files. The tasks that come in that way have already been sent to other computers with the same results, so it looks like the errors are in the workunit files themselves, not communications or whatever else. I hope that not all of your newly generated work behaves that way! PS I really do wish you a great vacation in any event! |
Joined: 30 Apr 18 Posts: 4 Credit: 972,300 RAC: 0 |
Add to the list: NO STATS exports have happened in almost 24 hours. https://numberfields.asu.edu/NumberFields/stats/
Index of /NumberFields/stats
Name              Last modified      Size
badge_team.gz     2018-08-22 09:53   2.9K
badge_user.gz     2018-08-22 09:53   36K
db_dump.xml       2018-08-22 09:53   1.1K
host.gz           2018-08-22 09:52   55M
host_deleted.gz   2018-08-22 09:53   95
tables.xml        2018-08-22 09:53   4.6K
team.gz           2018-08-22 09:53   161K
user.gz           2018-08-22 09:50   572K
user_deleted.gz   2018-08-22 09:50   94 |
Joined: 30 Apr 18 Posts: 4 Credit: 972,300 RAC: 0 |
Appears the server(s) are having major issues - I keep getting the "project is down for maintenance" message when requesting work or trying to access the project web site. |
Joined: 8 Jul 11 Posts: 1340 Credit: 478,056,506 RAC: 564,282 |
Well, I still have 1 day before I leave, so I will have a chance to look into these things. |
Joined: 8 Jul 11 Posts: 1340 Credit: 478,056,506 RAC: 564,282 |
Stats were being updated every 2 days - I changed it to every 24 hours. That explains that. The "project is down for maintenance" message is related to a problem I am currently working on. A particular user with 400k hosts is doing individual queries on every host (via an automated script, I assume) and is essentially acting like a denial-of-service attack. This happens once every hour, and has gotten progressively worse to the point where it is now taking about 15 minutes for these stupid queries to finish. During that time, it is very difficult for anyone else to squeeze a query in, hence it appears that the server is down. |
Joined: 28 Oct 11 Posts: 180 Credit: 238,345,434 RAC: 86,462 |
Sorry to add to your vacation woes (feel free to postpone until you get back), but I've just found this in an event log: 23/08/2018 21:13:53 | NumberFields@home | Started download of sf4_DS-13x271-11_Grp156548of200000.dat I'll nip downstairs and see if there's any more debris available to analyse. Edit - found I still had the download url, so got a fresh copy. Identical to the first, 8,295 bytes. From the error message above, the 'expected' MD5 is the one calculated on your server and stored in your database: the 'got' MD5 is the one BOINC calculated locally here. The download files have the same (independently calculated) MD5 as my local BOINC 'got' - which suggests that the value stored in your database is wrong. Task is 30805859: looking at Error tasks for computer 1288, this wasn't the only one. |
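For anyone who wants to repeat the independent check described in the post above, here is a minimal sketch in Python. It is not BOINC's own code: the function names `md5_of_file` and `checksum_matches` are made up for illustration, and the 'expected' value is assumed to be the hex string reported in the client's error message.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """Compare a locally computed MD5 with an 'expected' hex string.

    `expected_md5` is assumed to be the value shown in the error message;
    surrounding whitespace and letter case are ignored.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

If two independently downloaded copies give the same local digest and neither matches the 'expected' value, the stored value is the more likely culprit, which is the reasoning used in the post.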
Joined: 8 Jul 11 Posts: 1340 Credit: 478,056,506 RAC: 564,282 |
Hey Richard - Thanks for the analysis. I have also been looking into these download errors and have been trying to piece together what happened. I diff'd some of the files in question against the original and they are identical, so like you said it's the database value that's off. I'm starting to think this is related to the other database problem. There were several times when the database locked up in the middle of creating work. The create work script copies the WU file into a subdirectory and then adds a database entry (not sure of the order of these operations). When the database locks up, this script crashes and I have to rerun it (but only on those files not already copied). Sometimes it throws a "duplicate entry error" which tells me it already has a db entry but hasn't copied the file yet. Maybe this is the root cause of the problem? If this is the problem it should be a relatively small number of files affected I would think. |
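One way to find out how many files are affected would be to compare every recorded checksum against the file actually on disk. The sketch below is only an illustration under stated assumptions: `recorded_md5` is a hypothetical dict of file name to MD5 hex string exported from the database by whatever means is available, the files are assumed to sit directly in one directory, and `find_inconsistent_files` is an invented name, not part of the BOINC server tools.

```python
import hashlib
from pathlib import Path


def find_inconsistent_files(download_dir, recorded_md5):
    """Return {file name: "missing" | "mismatch"} for inconsistent entries.

    `recorded_md5` is a hypothetical mapping of file name -> MD5 hex string
    taken from the database; `download_dir` is assumed to be a flat directory.
    """
    problems = {}
    for name, expected in recorded_md5.items():
        path = Path(download_dir) / name
        if not path.is_file():
            # A database entry exists but the file was never copied.
            problems[name] = "missing"
            continue
        actual = hashlib.md5(path.read_bytes()).hexdigest()
        if actual != expected.strip().lower():
            # The file exists but the stored checksum does not match it.
            problems[name] = "mismatch"
    return problems
```

An empty result would mean every recorded entry agrees with its file; anything listed as "missing" corresponds to the "duplicate entry" case where the database row exists without the file.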
Joined: 17 Mar 15 Posts: 10 Credit: 45,563,435 RAC: 0 |
I received about 45 of the error files back when I first posted about this problem. Since then I've only received one which I guess may have been a resend. There have been no additional errors, so it looks like there were indeed only a few tasks involved. |
Joined: 5 Jan 13 Posts: 43 Credit: 40,927,133 RAC: 1 |
I want to report that I have three erroneous WUs: https://numberfields.asu.edu/NumberFields/workunit.php?wuid=27643537 https://numberfields.asu.edu/NumberFields/workunit.php?wuid=27638607 https://numberfields.asu.edu/NumberFields/workunit.php?wuid=27622803 Best Regards. |