Server Outage on Friday May 3rd

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1341
Credit: 510,647,991
RAC: 557,416
Message 821 - Posted: 1 May 2013, 19:26:34 UTC

We will be taking the project server down this Friday to perform the previously mentioned upgrade. We plan to take it down at approx 11am and hope to be back up by 3pm (AZ time). During this time, all project functionality will be disabled, including the main website and the forums.
Eric Driver
Message 822 - Posted: 1 May 2013, 19:45:09 UTC - in response to Message 821.  

And just a reminder...

If you haven't already, you might need to remove and re-add the project in order to pick up the alias. This can also be done after the upgrade, but you risk losing any work that completes after the old server is shut down.

You can check which URL you are using by looking at the client_state.xml file (I forget where it's located). If there are any references to stat.la.asu.edu, then you need to update the URLs. Unless I am mistaken, you can manually change this file and then you won't have to reattach to the project. You will need to replace all occurrences of "stat.la.asu.edu" with "numberfields.asu.edu". Hopefully the more experienced crunchers will correct me if I am wrong about this...
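
For anyone who would rather script the check and the replacement described above, here is a minimal Python sketch. It is only an illustration, not an official tool: the function names are made up for this example, the path to client_state.xml is not filled in because its location depends on your installation, and it assumes the BOINC client is stopped first so the client does not overwrite the edited file. It also writes a ".bak" copy before changing anything.

```python
from pathlib import Path

# Host names taken from the post above.
OLD_HOST = b"stat.la.asu.edu"
NEW_HOST = b"numberfields.asu.edu"


def count_old_references(data: bytes) -> int:
    """Return how many times the old host name appears in the file contents."""
    return data.count(OLD_HOST)


def update_state_file(path: Path) -> int:
    """Replace every occurrence of the old host name in the given file.

    Returns the number of occurrences found. If there are none, the file
    is left untouched. Otherwise a backup named '<file>.bak' is written
    next to the original before the original is rewritten.
    """
    original = path.read_bytes()
    hits = count_old_references(original)
    if hits:
        backup = path.with_name(path.name + ".bak")
        backup.write_bytes(original)
        path.write_bytes(original.replace(OLD_HOST, NEW_HOST))
    return hits
```

To use it, call update_state_file(Path(...)) with the location of your own client_state.xml. A return value of 0 means the file had no references to the old host and nothing was changed.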
Eric Driver
Message 827 - Posted: 4 May 2013, 9:16:55 UTC - in response to Message 822.  

The server upgrade was completed by about 1pm, and as far as I can tell, the new server seems to be working. The Cox DNS hasn't been updated yet so I can't fully test the new system, but I can log in remotely to the server and see WUs going out and results coming back; and the log files don't show anything unusual.

But please let me know if you have any problems whatsoever.
Richard Haselgrove
Joined: 28 Oct 11
Posts: 180
Credit: 246,636,459
RAC: 155,794
Message 829 - Posted: 4 May 2013, 9:48:39 UTC

Congratulations Eric, all seems to be working well here. I hope you can go off and get some well-earned rest.

A note on those urls:

I think I joined the project at an intermediate stage. I joined using the 'alias' url (numberfields.asu.edu), but I picked up the 'raw' scheduler address (stat.la.asu.edu). That meant that my first update requests after the upgrade failed ("couldn't connect to server").

If this happens to users, don't panic, and don't bother with the manual adjustment that Eric mentions. Just wait until BOINC has tried a few more times (or hurry it along with the 'update' button). After the tenth failure, you should see

Fetching scheduler list
Master file download succeeded

in your message/event log. Once that has happened, reporting completed work and fetching new work should proceed as normal (it did for me).
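
If you have saved your event log as text and want to confirm that the refresh happened, a small Python sketch like the following could do it. This is an assumption-laden illustration: the function name is hypothetical, and it only looks for the two message strings quoted above as substrings, since the exact layout of each log line (timestamps, project name) may differ between client versions.

```python
def scheduler_refreshed(log_lines):
    """Return True once 'Fetching scheduler list' has been seen and a later
    line contains 'Master file download succeeded'."""
    fetching_seen = False
    for line in log_lines:
        if "Fetching scheduler list" in line:
            fetching_seen = True
        elif fetching_seen and "Master file download succeeded" in line:
            return True
    return False
```

Pass it any iterable of lines, for example an open text file containing the copied log.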
Eric Driver
Message 834 - Posted: 4 May 2013, 17:44:20 UTC - in response to Message 829.  

Thanks Richard for the helpful hint! I didn't realize it self-corrected after the 10th failure.

If you are still having connection problems after the scheduler URL is corrected, try restarting your client. This is something I needed to do on my Linux boxes (which are running an older client version).
Copyright © 2024 Arizona State University