Vacation

Message boards : News : Vacation

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,257,202
RAC: 235,524
Message 2087 - Posted: 23 Aug 2018, 4:59:21 UTC

I will be on vacation for 2 weeks, and my internet access will be spotty. So I apologize in advance for any delays in responding to the boards.

I generated enough work to last for the next two weeks, so hopefully I won't need to do any remote logins.

Thanks all!
ID: 2087
gomeyer

Joined: 17 Mar 15
Posts: 10
Credit: 45,563,435
RAC: 0
Message 2088 - Posted: 23 Aug 2018, 10:12:57 UTC
Last modified: 23 Aug 2018, 10:41:54 UTC

I planned to wish you a good vacation, but . . .

Sorry to have to say this, but I'm starting to get Download Checksum Errors on several files. The tasks that come in that way have already been sent to other computers with the same results, so it looks like the errors are in the workunit files themselves, not communications or whatever else. I hope that not all of your newly generated work behaves that way!

PS I really do wish you a great vacation in any event!
ID: 2088
Dr Who Fan
Joined: 30 Apr 18
Posts: 4
Credit: 972,300
RAC: 0
Message 2089 - Posted: 23 Aug 2018, 14:27:01 UTC

Add to the list:
NO STATS exports have happened in almost 24 hours.
https://numberfields.asu.edu/NumberFields/stats/
Index of /NumberFields/stats

Name              Last modified     Size
badge_team.gz     2018-08-22 09:53  2.9K
badge_user.gz     2018-08-22 09:53  36K
db_dump.xml       2018-08-22 09:53  1.1K
host.gz           2018-08-22 09:52  55M
host_deleted.gz   2018-08-22 09:53  95
tables.xml        2018-08-22 09:53  4.6K
team.gz           2018-08-22 09:53  161K
user.gz           2018-08-22 09:50  572K
user_deleted.gz   2018-08-22 09:50  94


ID: 2089
Dr Who Fan
Joined: 30 Apr 18
Posts: 4
Credit: 972,300
RAC: 0
Message 2090 - Posted: 23 Aug 2018, 15:30:20 UTC

It appears the server(s) are having major issues: I keep getting the "project is down for maintenance" message when requesting work or trying to access the project web site.

ID: 2090
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,257,202
RAC: 235,524
Message 2091 - Posted: 23 Aug 2018, 16:14:09 UTC - in response to Message 2088.  

Well, I still have 1 day before I leave, so I will have a chance to look into these things.
ID: 2091
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,257,202
RAC: 235,524
Message 2092 - Posted: 23 Aug 2018, 17:51:00 UTC - in response to Message 2091.  

Stats were being updated every 2 days - I changed it to every 24 hours. That explains that.

The "project is down for maintenance" message is related to a problem I am currently working on. A particular user with 400k hosts is doing individual queries on every host (via an automated script, I assume) and is essentially acting like a denial of service attack. This happens once every hour, and has gotten progressively worse, to the point where it now takes about 15 minutes for these stupid queries to finish. During that time, it is very difficult for anyone else to squeeze a query in, hence it appears that the server is down.
ID: 2092
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 223,174,342
RAC: 119,803
Message 2093 - Posted: 23 Aug 2018, 21:11:53 UTC
Last modified: 23 Aug 2018, 21:32:26 UTC

Sorry to add to your vacation woes (feel free to postpone until you get back), but I've just found this in an event log:

23/08/2018 21:13:53 | NumberFields@home | Started download of sf4_DS-13x271-11_Grp156548of200000.dat
23/08/2018 21:13:54 | NumberFields@home | [error] MD5 check failed for sf4_DS-13x271-11_Grp156548of200000.dat
23/08/2018 21:13:54 | NumberFields@home | [error] expected d41d8cd98f00b204e9800998ecf8427e, got 4b6d8299823a72f0621e8968af8165f4
23/08/2018 21:13:54 | NumberFields@home | [error] Checksum or signature error for sf4_DS-13x271-11_Grp156548of200000.dat
I'll nip downstairs and see if there's any more debris available to analyse.

Edit - found I still had the download url, so got a fresh copy. Identical to the first, 8,295 bytes.

From the error message above, the 'expected' MD5 is the one calculated on your server and stored in your database: the 'got' MD5 is the one BOINC calculated locally here. The download files have the same (independently calculated) MD5 as my local BOINC 'got' - which suggests that the value stored in your database is wrong.

Task is 30805859: looking at Error tasks for computer 1288, this wasn't the only one.
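
For anyone who wants to repeat the comparison described above, here is a minimal Python sketch. It is not BOINC's own code; the function names are made up for illustration, and only the 'expected' value is taken from the log. One thing the sketch shows: d41d8cd98f00b204e9800998ecf8427e is the well-known MD5 of zero bytes of input, which would be consistent with the stored value having been computed from an empty or not-yet-written file, though nothing in the log proves that.

```python
import hashlib

# 'expected' value copied from the event log above.
EXPECTED_FROM_LOG = "d41d8cd98f00b204e9800998ecf8427e"


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks (illustrative helper, not the BOINC client's code)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# The MD5 of empty input equals the 'expected' value from the log.
is_empty_input_hash = md5_hex(b"") == EXPECTED_FROM_LOG
```

Running file_md5 on a freshly downloaded copy of the .dat file and comparing the result with the 'got' value is the same check I did by hand.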
ID: 2093
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,257,202
RAC: 235,524
Message 2094 - Posted: 23 Aug 2018, 22:47:16 UTC - in response to Message 2093.  

Hey Richard - Thanks for the analysis.
I have also been looking into these download errors and have been trying to piece together what happened. I diff'd some of the files in question against the original and they are identical, so like you said it's the database value that's off.

I'm starting to think this is related to the other database problem. There were several times when the database locked up in the middle of creating work. The create work script copies the WU file into a subdirectory and then adds a database entry (not sure of the order of these operations). When the database locks up, this script crashes and I have to rerun it (but only on those files not already copied). Sometimes it throws a "duplicate entry error", which tells me it already has a db entry but hasn't copied the file yet. Maybe this is the root cause of the problem? If so, I would expect only a relatively small number of files to be affected.
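
A rough sketch of how the affected files could be counted, assuming the stored MD5 values can be dumped into a plain dict of file name to hash. Everything here is hypothetical (the function name, the file names in the self-check, the dict standing in for a database query); it is not the actual create work script, just one way to compare stored hashes against the files on disk.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path

# MD5 of zero bytes; a stored value equal to this hints that the hash was
# taken before the file contents were in place.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def find_md5_mismatches(stored: dict[str, str], download_dir: Path) -> list[tuple[str, str]]:
    """Compare stored hashes with hashes of the files on disk.

    `stored` maps file name -> MD5 recorded in the database (a hypothetical
    stand-in for a real query). Returns (file name, reason) pairs.
    """
    problems = []
    for name, stored_md5 in sorted(stored.items()):
        path = download_dir / name
        if not path.is_file():
            problems.append((name, "db entry exists but file is missing"))
            continue
        actual = hashlib.md5(path.read_bytes()).hexdigest()
        if actual == stored_md5:
            continue
        if stored_md5 == EMPTY_MD5:
            problems.append((name, "stored hash is the empty-file MD5"))
        else:
            problems.append((name, "stored hash differs from file"))
    return problems


# Small self-check with made-up file names and contents.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "good.dat").write_bytes(b"abc")
    (d / "bad.dat").write_bytes(b"xyz")
    stored = {
        "good.dat": hashlib.md5(b"abc").hexdigest(),
        "bad.dat": EMPTY_MD5,
        "gone.dat": EMPTY_MD5,
    }
    report = find_md5_mismatches(stored, d)
```

If the report comes back short, that would support the idea that only the files caught in a crash are affected.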
ID: 2094
gomeyer

Joined: 17 Mar 15
Posts: 10
Credit: 45,563,435
RAC: 0
Message 2095 - Posted: 25 Aug 2018, 10:17:01 UTC
Last modified: 25 Aug 2018, 10:18:13 UTC

I received about 45 of the error files back when I first posted about this problem. Since then I've only received one, which I guess may have been a resend. There have been no additional errors, so it looks like there were indeed only a few tasks involved.
ID: 2095
Vitaly

Joined: 5 Jan 13
Posts: 43
Credit: 39,133,815
RAC: 55,204
Message 2096 - Posted: 27 Aug 2018, 19:32:25 UTC

I want to report that I have three erroneous WUs:

https://numberfields.asu.edu/NumberFields/workunit.php?wuid=27643537
https://numberfields.asu.edu/NumberFields/workunit.php?wuid=27638607
https://numberfields.asu.edu/NumberFields/workunit.php?wuid=27622803

Best Regards.
ID: 2096



Copyright © 2024 Arizona State University