abort?

Message boards : Number crunching : abort?



Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,107,815
RAC: 0
Message 545 - Posted: 17 Mar 2012, 15:45:21 UTC

No, it runs without any interruptions, and it runs only BOINC.
Running Ubuntu 11.04 (64-bit).
I did notice that the CPU utilization is nowhere near 100% for any of the tasks.
My guess is that this is due to the application.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 546 - Posted: 17 Mar 2012, 18:05:41 UTC - in response to Message 545.  

No, it runs without any interruptions, and it runs only BOINC.
Running Ubuntu 11.04 (64-bit).
I did notice that the CPU utilization is nowhere near 100% for any of the tasks.
My guess is that this is due to the application.


That might be why it's so slow. My CPU usage is about 100% on Linux (according to the top command). On Windows, it is 95% because I set it that way in the preferences. You may want to double-check your preferences, but the app itself has no control over that.
Conan

Joined: 3 Sep 11
Posts: 26
Credit: 2,370,588
RAC: 0
Message 547 - Posted: 18 Mar 2012, 0:04:07 UTC - in response to Message 545.  

No, it runs without any interruptions, and it runs only BOINC.
Running Ubuntu 11.04 (64-bit).
I did notice that the CPU utilization is nowhere near 100% for any of the tasks.
My guess is that this is due to the application.


G'Day Rasputin42,

I am running 64-bit Linux (Fedora 16), and my CPU utilisation sits at around 99% all the time (my computers do virtually nothing but BOINC).

So perhaps CPU throttling is kicking in to keep your CPU temperature down?

Also check that "Use 100% of CPU" is selected in your account preferences.

Also check that the option "While processor usage is less than ? percent" is set to 0 (zero). Zero means no restriction on CPU usage; any other figure will restrict the amount of work the CPU does.
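Conan's description of that preference can be written down as a small predicate. This is a hypothetical model built only from the description above, not code from the BOINC client: the function name and the idea of a single measured usage percentage are assumptions.

```cpp
#include <cassert>

// Hypothetical model of the "While processor usage is less than ? percent"
// preference, based only on the forum description (not BOINC client code).
// limit_pct == 0 means "no restriction"; any other value lets tasks run
// only while the measured processor usage stays below that limit.
bool may_run_tasks(double measured_usage_pct, double limit_pct) {
    assert(limit_pct >= 0.0 && limit_pct <= 100.0);
    if (limit_pct == 0.0) {
        return true;  // zero disables the check entirely
    }
    return measured_usage_pct < limit_pct;
}
```

With the limit left at 0, `may_run_tasks` returns true whatever the measured usage is, which is why 0 is the recommended setting for a machine that should crunch flat out.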

Hope this can be of help.

Conan

Rasputin42

Joined: 5 Nov 11
Posts: 25
Credit: 1,107,815
RAC: 0
Message 549 - Posted: 18 Mar 2012, 13:06:10 UTC

I checked the settings and they are OK.
However, I finished one (of the four) tasks.
The other three are listed as "Timed out - no response" in my account.
Does this mean that I will not get any credit?
The CPU time for the finished task is 1,083,110.00 s (about 300 h, or 12.5 days).
Total run time: 1,276,200.57 s (about 14.75 days).
In terms of credit, this works out to about 5 credits/hour.
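The duration figures above can be checked with a little arithmetic. The sketch below (function names are illustrative only) converts the two durations and also divides CPU time by run time; with these numbers the ratio is roughly 85%, which fits the earlier observation that CPU utilization was well below 100%.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative helpers for converting task durations given in seconds.
constexpr double kSecondsPerHour = 3600.0;
constexpr double kSecondsPerDay = 86400.0;

constexpr double to_hours(double seconds) { return seconds / kSecondsPerHour; }
constexpr double to_days(double seconds) { return seconds / kSecondsPerDay; }

// CPU time divided by wall-clock run time: 1.0 would mean the task
// had a CPU core to itself for the whole run.
double cpu_fraction(double cpu_seconds, double run_seconds) {
    assert(run_seconds > 0.0);
    return cpu_seconds / run_seconds;
}

void print_task_summary(double cpu_seconds, double run_seconds) {
    std::printf("CPU time: %.1f h (%.2f days)\n",
                to_hours(cpu_seconds), to_days(cpu_seconds));
    std::printf("Run time: %.1f h (%.2f days)\n",
                to_hours(run_seconds), to_days(run_seconds));
    std::printf("CPU/run: %.1f%%\n",
                100.0 * cpu_fraction(cpu_seconds, run_seconds));
}
```

Calling `print_task_summary(1083110.00, 1276200.57)` prints a CPU time of 300.9 h (12.54 days), a run time of 354.5 h (14.77 days) and a CPU/run ratio of 84.9%.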
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 550 - Posted: 18 Mar 2012, 18:36:58 UTC - in response to Message 549.  

I checked the settings and they are OK.
However, I finished one (of the four) tasks.
The other three are listed as "Timed out - no response" in my account.
Does this mean that I will not get any credit?
The CPU time for the finished task is 1,083,110.00 s (about 300 h, or 12.5 days).
Total run time: 1,276,200.57 s (about 14.75 days).
In terms of credit, this works out to about 5 credits/hour.


You should still get the credit if it's within the grace period. If not, then let me know and I can manually give you credit (I have been told this can be done, but I haven't tried yet).
Conan

Joined: 3 Sep 11
Posts: 26
Credit: 2,370,588
RAC: 0
Message 553 - Posted: 18 Mar 2012, 23:17:08 UTC

G'Day Eric and Rasputin42,

WU 726362 has been reissued and returned by none other than yourself, Eric. When a result has already been returned and only one result is needed, BOINC normally gives 0.00 credit to the second result returned.

So if Rasputin42 returns this work unit now, he won't get any credit for it, as you already have the credit for that result.

If Rasputin42 has not started this WU, then it might be a good idea to abort it.

Is this correct, Eric?

Conan
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 556 - Posted: 19 Mar 2012, 1:31:31 UTC - in response to Message 553.  

G'Day Eric and Rasputin42,

WU 726362 has been reissued and returned by none other than yourself, Eric. When a result has already been returned and only one result is needed, BOINC normally gives 0.00 credit to the second result returned.

So if Rasputin42 returns this work unit now, he won't get any credit for it, as you already have the credit for that result.

If Rasputin42 has not started this WU, then it might be a good idea to abort it.

Is this correct, Eric?

Conan


Well, isn't that embarrassing. The wiki page said that the user could still return a result if it was within the grace period; I had assumed they would still get credit for it, but I could be wrong.

If Rasputin hasn't started it, then he should definitely abort it. And if what you say is true (regarding credits), then he should probably abort it anyway, and I will have to give him credit manually.
Richard Haselgrove

Joined: 28 Oct 11
Posts: 128
Credit: 106,945,858
RAC: 30,496
Message 558 - Posted: 19 Mar 2012, 8:31:36 UTC - in response to Message 556.  

G'Day Eric and Rasputin42,

WU 726362 has been reissued and returned by none other than yourself, Eric. When a result has already been returned and only one result is needed, BOINC normally gives 0.00 credit to the second result returned.

So if Rasputin42 returns this work unit now, he won't get any credit for it, as you already have the credit for that result.

If Rasputin42 has not started this WU, then it might be a good idea to abort it.

Is this correct, Eric?

Conan

Well, isn't that embarrassing. The wiki page said that the user could still return a result if it was within the grace period; I had assumed they would still get credit for it, but I could be wrong.

If Rasputin hasn't started it, then he should definitely abort it. And if what you say is true (regarding credits), then he should probably abort it anyway, and I will have to give him credit manually.

If Rasputin hasn't started the task, then the BOINC client on his computer should have aborted it. (I've watched that behaviour on my computers during this episode, so I can vouch for the accuracy of that statement - although I'm running different versions of BOINC from Rasputin).

If Rasputin has started the task, and aborts it now, we'll never know whether the Wiki is accurate.

Better if he lets it run and reports it in the normal way. Then, if BOINC awards it zero credit, you can award the points manually, and I can amend the Wiki (if someone can post a link to the page).
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 561 - Posted: 19 Mar 2012, 17:29:04 UTC - in response to Message 558.  


If Rasputin hasn't started the task, then the BOINC client on his computer should have aborted it. (I've watched that behaviour on my computers during this episode, so I can vouch for the accuracy of that statement - although I'm running different versions of BOINC from Rasputin).

If Rasputin has started the task, and aborts it now, we'll never know whether the Wiki is accurate.

Better if he lets it run and reports it in the normal way. Then, if BOINC awards it zero credit, you can award the points manually, and I can amend the Wiki (if someone can post a link to the page).


I agree. Here is the link to the wiki page:
http://boinc.berkeley.edu/trac/wiki/ProjectOptions

Here is the relevant text from that page:
<report_grace_period>x</report_grace_period>
A "grace period" for task reporting. A task is considered timed out (and a new replica generated) if it is not reported by client_deadline + x.

According to this, a new task should not have been generated yet. I'm starting to think I am using the wrong units. I have been assuming hours, when maybe it's actually seconds?
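The rule in the quoted wiki text is simple enough to write down, and doing so makes the units question concrete. In this sketch the function name, the use of Unix-epoch seconds, and the grace value of 48 in the usage note are assumptions for illustration; whether the boundary is `>` or `>=` is also a guess.

```cpp
#include <cassert>
#include <cstdint>

using EpochSeconds = std::int64_t;  // seconds since 1970-01-01 00:00:00 UTC

// Sketch of the wiki rule: a task is considered timed out (and a new
// replica generated) if it has not been reported by client_deadline + x.
// grace_seconds must already be in SECONDS: an hours figure passed in
// unconverted silently shrinks the grace period by a factor of 3600.
// Using '>' rather than '>=' at the boundary is an assumption.
bool is_timed_out(EpochSeconds now, EpochSeconds client_deadline,
                  std::int64_t grace_seconds) {
    assert(grace_seconds >= 0);
    return now > client_deadline + grace_seconds;
}
```

For a task reported one hour after its deadline, `is_timed_out(deadline + 3600, deadline, 48)` is true, while `is_timed_out(deadline + 3600, deadline, 48 * 3600)` is false: the same "48" gives either 48 seconds or 48 hours of grace depending on the unit the scheduler applies.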
Richard Haselgrove

Joined: 28 Oct 11
Posts: 128
Credit: 106,945,858
RAC: 30,496
Message 563 - Posted: 19 Mar 2012, 20:42:17 UTC - in response to Message 561.  
Last modified: 19 Mar 2012, 20:59:09 UTC

Here is the relevant text from that page:
<report_grace_period>x</report_grace_period>
A "grace period" for task reporting. A task is considered timed out (and a new replica generated) if it is not reported by client_deadline + x.

According to this, a new task should not have been generated yet. I'm starting to think I am using the wrong units. I have been assuming hours, when maybe it's actually seconds?

If we look at the component bits of the WU that started this, we see:

Task 796428:
Sent 21 Feb 2012 | 18:24:50 UTC
Received 28 Feb 2012 | 18:25:13 UTC

Task 865007:
Created 28 Feb 2012 | 18:25:14 UTC
Sent 3 Mar 2012 | 12:37:53 UTC
Deadline 10 Mar 2012 | 4:15:29 UTC

Task 887065:
Created 10 Mar 2012 | 4:15:30 UTC
Sent 10 Mar 2012 | 8:53:51 UTC

That's only 1 second between the disposal of the previous task by whatever route, and the creation of the replacement. That doesn't sound like a grace period in my book, even if your value was presumed to be denominated in seconds. I did wonder if the grace period might actually be after the replacement task was generated, but before it was added to the 'ready to send' queue - but that doesn't seem to work either.
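The one-second gaps can be recomputed directly from the UTC timestamps listed above. This sketch uses Howard Hinnant's well-known days-from-civil-date calculation so that it needs nothing beyond the standard library; the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
std::int64_t days_from_civil(std::int64_t y, int m, int d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2) ? 1 : 0;                       // treat Jan/Feb as end of previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const std::int64_t yoe = y - era * 400;      // year of era, [0, 399]
    const std::int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const std::int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + doe - 719468;
}

// Seconds since the Unix epoch for a UTC date and time of day.
std::int64_t utc_seconds(std::int64_t y, int mo, int d, int h, int mi, int s) {
    return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
}
```

Both gaps come out as exactly 1 s: from task 796428 being received (28 Feb 18:25:13) to task 865007 being created (18:25:14), and from task 865007's deadline (10 Mar 4:15:29) to task 887065 being created (4:15:30).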

I'll keep poking around while we wait to see what happens when Rasputin's task returns - but we may have to call in David for assistance again.

Edit: changeset 23040 implies that the value in your control interface should indeed be entered in hours, but the figure you enter is multiplied by 3600 so that it is stored and used in seconds.

Edit 2: looking at http://boinc.berkeley.edu/trac/browser/boinc/trunk/boinc/sched/sched_config.cpp

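        // grace_period_hours: value given in hours, converted to seconds
        if (xp.parse_double("grace_period_hours", x)) {
            report_grace_period = (int)(x*3600);
            continue;
        }
        // report_grace_period: value read directly, in seconds
        if (xp.parse_int("report_grace_period", report_grace_period)) continue;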

What tag are you using? report_grace_period is seconds, grace_period_hours is - er, what it says on the tin.
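In the sched_config.cpp excerpt above, both tags end up in the same seconds-valued `report_grace_period` variable, so whichever tag is parsed last wins. Below is a minimal standalone model of that behaviour. It uses hypothetical (tag, value) pairs in place of the real XML parser and assumes a default of 0 when neither tag is present.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for parsed config elements: (tag name, numeric value).
using ConfigEntry = std::pair<std::string, double>;

// Models the two branches of the excerpt: both tags write the same
// seconds-valued variable, so the last matching tag in the list wins.
int grace_seconds_from(const std::vector<ConfigEntry>& entries) {
    int report_grace_period = 0;  // assumed default when neither tag appears
    for (const auto& [tag, value] : entries) {
        if (tag == "grace_period_hours") {
            report_grace_period = static_cast<int>(value * 3600);  // hours -> seconds
        } else if (tag == "report_grace_period") {
            report_grace_period = static_cast<int>(value);  // already in seconds
        }
    }
    return report_grace_period;
}
```

The same number gives 48 under `report_grace_period` but 172800 under `grace_period_hours`, a factor of 3600, which is the kind of mismatch being discussed here.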
frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 564 - Posted: 19 Mar 2012, 21:04:54 UTC - in response to Message 563.  

OMFG!

And again: badly documented, and no clue which strings to pull. :(
Richard Haselgrove

Joined: 28 Oct 11
Posts: 128
Credit: 106,945,858
RAC: 30,496
Message 565 - Posted: 19 Mar 2012, 21:23:53 UTC - in response to Message 564.  

Wiki updated for the confusion over units, though I don't think that's the real underlying problem.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 566 - Posted: 19 Mar 2012, 21:37:30 UTC - in response to Message 565.  

Wiki updated for the confusion over units, though I don't think that's the real underlying problem.


Thanks Richard. That was quick.

I am pretty sure I was using "report_grace_period", otherwise I wouldn't have questioned the units. I can't ssh into the project server from work, so I have put in a call to Greg to take a look at the config file. He will be able to tell us exactly what is going on there.
Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 45
Credit: 7,144,042
RAC: 0
Message 567 - Posted: 19 Mar 2012, 22:00:32 UTC - in response to Message 566.  

Wiki updated for the confusion over units, though I don't think that's the real underlying problem.


Thanks Richard. That was quick.

I am pretty sure I was using "report_grace_period", otherwise I wouldn't have questioned the units. I can't ssh into the project server from work, so I have put in a call to Greg to take a look at the config file. He will be able to tell us exactly what is going on there.


It was report_grace_period. I changed it to grace_period_hours.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 568 - Posted: 20 Mar 2012, 7:11:47 UTC - in response to Message 563.  


If we look at the component bits of the WU that started this, we see:

Task 796428:
Sent 21 Feb 2012 | 18:24:50 UTC
Received 28 Feb 2012 | 18:25:13 UTC

Task 865007:
Created 28 Feb 2012 | 18:25:14 UTC
Sent 3 Mar 2012 | 12:37:53 UTC
Deadline 10 Mar 2012 | 4:15:29 UTC

Task 887065:
Created 10 Mar 2012 | 4:15:30 UTC
Sent 10 Mar 2012 | 8:53:51 UTC

That's only 1 second between the disposal of the previous task by whatever route, and the creation of the replacement. That doesn't sound like a grace period in my book, even if your value was presumed to be denominated in seconds. I did wonder if the grace period might actually be after the replacement task was generated, but before it was added to the 'ready to send' queue - but that doesn't seem to work either.


Did you notice that, for task 865007, the time between when it was sent and the deadline was 7 days 3.63 hours? In the WU template, <delay_bound> is set to exactly 7 days, so where is the extra 3.63 hours coming from? Could the grace period be included in the deadline, or could there be a communication delay between the scheduler and the client that is causing the discrepancy?
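The Sent-to-Deadline interval for task 865007 can be checked against a 7-day delay_bound directly from the timestamps quoted above. This minimal sketch assumes both timestamps fall in the same month (true here: 3 Mar and 10 Mar 2012), so no calendar library is needed; the helper names are illustrative. Taking the figures exactly as quoted, the interval is 574,656 s (6 days 15 h 37 m 36 s), which is 30,144 s (about 8.37 hours) short of the 604,800 s in 7 days.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Seconds from the start of a month to day d, hh:mm:ss. Only valid for
// comparing two timestamps in the SAME month (an assumption of this sketch).
std::int64_t seconds_into_month(int d, int hh, int mm, int ss) {
    assert(d >= 1 && d <= 31);
    return (d - 1) * std::int64_t{86400} + hh * 3600 + mm * 60 + ss;
}

constexpr std::int64_t kDelayBound = 7 * 86400;  // 7 days, per the WU template

// Prints the sent-to-deadline interval and how far it is from delay_bound
// (positive: longer than 7 days; negative: shorter).
void compare_to_delay_bound(std::int64_t sent, std::int64_t deadline) {
    const std::int64_t interval = deadline - sent;
    const std::int64_t diff = interval - kDelayBound;
    std::printf("interval: %lld s, difference from 7 days: %+lld s (%+.2f h)\n",
                static_cast<long long>(interval), static_cast<long long>(diff),
                diff / 3600.0);
}
```

`compare_to_delay_bound(seconds_into_month(3, 12, 37, 53), seconds_into_month(10, 4, 15, 29))` prints `interval: 574656 s, difference from 7 days: -30144 s (-8.37 h)`.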
Richard Haselgrove

Joined: 28 Oct 11
Posts: 128
Credit: 106,945,858
RAC: 30,496
Message 569 - Posted: 20 Mar 2012, 8:28:53 UTC - in response to Message 568.  

Did you notice that, for task 865007, the time between when it was sent and the deadline was 7 days 3.63 hours? In the WU template, <delay_bound> is set to exactly 7 days, so where is the extra 3.63 hours coming from? Could the grace period be included in the deadline, or could there be a communication delay between the scheduler and the client that is causing the discrepancy?

To be honest, I don't have a clue.

But by complete coincidence, David fixed a bug like that last night:

Changeset 25458

scheduler: fix a bug in assigning deadlines to resent jobs.
From Kevin.

The 'Kevin' in question will be Kevin Reed of IBM/World Community Grid - if anyone has noticed a discussion about deadlines there, it might be useful to pass it on.

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 926
Credit: 97,224,939
RAC: 55,139
Message 570 - Posted: 20 Mar 2012, 20:37:27 UTC - in response to Message 569.  

Well, the WU that started this whole discussion has now disappeared from the database, no doubt due to db_purge doing its job. My fault; I should have disabled db_purge.

If and when Rasputin returns this result, I assume it will just end up in the bit bucket. I bet he wouldn't have gotten credit anyway, due to the invalid grace period. The question now is how to assign manual credit (as I promised) to a non-existent result...

frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 571 - Posted: 21 Mar 2012, 17:06:57 UTC - in response to Message 570.  

Well, the WU that started this whole discussion has now disappeared from the database, no doubt due to db_purge doing its job. My fault; I should have disabled db_purge.


The question now is how to assign manual credit (as I promised) to a non-existent result...


No way. Now you can only edit the user record and raise his credit.

Hoshione

Joined: 19 Aug 11
Posts: 1
Credit: 558,615
RAC: 511
Message 573 - Posted: 22 Mar 2012, 18:12:37 UTC

Good evening, the same thing happened to me.

Six WUs have been running at the same time for between 62 and 89 h and have reached between 55 and 59 percent, with an estimated 55 to 66 h remaining. The deadline for all of them is today at 23:58. What should I do?

Is there any chance of extending the computation time?

Thanks in advance for a quick reply.

Hoshione

a proud member of http://www.seti-germany.de/
frankhagen

Joined: 19 Aug 11
Posts: 76
Credit: 2,002,860
RAC: 0
Message 574 - Posted: 22 Mar 2012, 18:46:57 UTC - in response to Message 573.  

Good evening, the same thing happened to me.

Six WUs have been running at the same time for between 62 and 89 h and have reached between 55 and 59 percent, with an estimated 55 to 66 h remaining. The deadline for all of them is today at 23:58. What should I do?


Suspend all other projects and give them another day.






Copyright © 2019 Arizona State University