Orphaned processes on Windows

Message boards : Number crunching : Orphaned processes on Windows
MaynardVizzutti

Joined: 11 Apr 15
Posts: 4
Credit: 4,708,500
RAC: 250
Message 1267 - Posted: 26 Apr 2015, 16:22:35 UTC

I've had a few workunits that reach around 86% and then apparently become disconnected somehow: BOINC keeps accumulating "Elapsed" time, but the progress percentage never moves again. The Windows process isn't doing any work and never gets destroyed. Suspending the task with "leave in memory" disabled and then resuming it allows the work to continue the next time it gets scheduled, but with a new instance of GetDecics_2.00_windows_intelx86 *32. The old orphaned process continues to sit around.

Presumably the new instance picks up at the last checkpoint.

I believe task 10600400 is an example, but I'm not absolutely sure which tasks went with the problem.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1321
Credit: 409,470,002
RAC: 236,590
Message 1268 - Posted: 27 Apr 2015, 16:05:03 UTC - in response to Message 1267.  

I can't say what could be causing that. There was nothing out of the ordinary in the stderr, other than it looked like it had been stopped and restarted several times. Also, the results file looked fine.

Is this something that's been ongoing or did it just start happening?
MaynardVizzutti

Joined: 11 Apr 15
Posts: 4
Credit: 4,708,500
RAC: 250
Message 1272 - Posted: 4 May 2015, 12:37:30 UTC - in response to Message 1268.  

Thanks for your response, and sorry for the delay in responding to you.

I have not been around this project long enough to know whether it's an ongoing thing (since 11-APR), but I noticed three instances within a couple hours of each other, on the day of my original post.

Since then, I have seen no further occurrences. If you can recommend any information-gathering steps, I'll be happy to follow them if another case occurs.
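
For what it's worth, one rough way to spot a stuck instance next time might be to compare CPU time between two snapshots of the process list taken a minute or so apart. The Python sketch below is only an illustration under assumptions: the `ProcSample` record, the `find_stalled` helper, the 0.5 second threshold, and all PIDs and CPU values are made up for the example, and the snapshots would have to be collected by hand (for instance from the "CPU Time" column in Task Manager). A task that BOINC has deliberately suspended would also show no CPU growth, so a hit is a hint, not proof of an orphan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcSample:
    """One row of a process-list snapshot (hypothetical record type)."""
    pid: int
    name: str
    cpu_seconds: float  # total CPU time the process has used so far


def find_stalled(before, after, name_prefix, min_delta=0.5):
    """Return PIDs of matching processes whose CPU time barely advanced.

    `before` and `after` are two snapshots taken some time apart.
    `min_delta` (seconds of CPU time) is an arbitrary threshold.
    """
    earlier = {p.pid: p for p in before}
    stalled = []
    for p in after:
        if not p.name.startswith(name_prefix):
            continue  # not one of the worker processes we care about
        prev = earlier.get(p.pid)
        if prev is None or prev.name != p.name:
            continue  # new process or reused PID: no baseline to compare
        if p.cpu_seconds - prev.cpu_seconds < min_delta:
            stalled.append(p.pid)
    return stalled


# Made-up snapshots, roughly one minute apart.
WORKER = "GetDecics_2.00_windows_intelx86"
before = [
    ProcSample(4100, WORKER, 3600.0),   # old instance, stuck
    ProcSample(5200, WORKER, 120.0),    # new instance, still working
    ProcSample(900, "boinc.exe", 50.0),
]
after = [
    ProcSample(4100, WORKER, 3600.0),
    ProcSample(5200, WORKER, 179.5),
    ProcSample(900, "boinc.exe", 50.0),
]

stalled_pids = find_stalled(before, after, "GetDecics")
print(stalled_pids)
```

Matching on the name prefix keeps the BOINC client itself out of the result, and skipping PIDs that have no earlier sample avoids flagging an instance that only just started.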


Copyright © 2024 Arizona State University