Need more Time!

Message boards : Number crunching : Need more Time!

Testing32b

Joined: 2 Apr 16
Posts: 2
Credit: 3,389
RAC: 0
Message 1642 - Posted: 10 Apr 2016, 11:08:43 UTC

Hello, I need more time for a task. Is that possible?
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1643 - Posted: 10 Apr 2016, 15:57:23 UTC - in response to Message 1642.  

Hello, I need more time for a task. Is that possible?


It looks like you only have 2 in progress and the earliest is due April 20th. Or are you referring to some other WU that has already timed out?
Dj Ninja

Joined: 23 Feb 13
Posts: 29
Credit: 21,480,710
RAC: 0
Message 1646 - Posted: 12 Apr 2016, 0:35:40 UTC

I have to make the same request because of the following two WUs:

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=15161036
http://numberfields.asu.edu/NumberFields/workunit.php?wuid=15161091

Both WUs are still running, 57% and 71% done after ~272 hours.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1647 - Posted: 12 Apr 2016, 4:39:46 UTC - in response to Message 1646.  

I can't extend the deadline after it has already passed, but the server still accepts late results, so it doesn't hurt to let it continue. If someone manages to return it before you, let me know and I will get you the credit.
Dj Ninja

Joined: 23 Feb 13
Posts: 29
Credit: 21,480,710
RAC: 0
Message 1654 - Posted: 14 Apr 2016, 2:56:26 UTC

I got the first one (#15161091) of my "fat" WUs down.
Unfortunately, it found nothing and might have blown the credit cap once again.

# The search is complete. Stats:
# Inspected 1511495782 polynomials.
# Num Polys passing PolDisc test = 0 (0%).
# Num Polys passing irreducibility test = 0 (0%).
# Num Polys passing field disc test = 0 (0%).
# Elapsed Time = 1158462 (sec)

The second fat WU is still going at 57% and 322 hours of running time.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1655 - Posted: 14 Apr 2016, 5:56:52 UTC - in response to Message 1654.  

I got the first one (#15161091) of my "fat" WUs down.
Unfortunately, it found nothing and might have blown the credit cap once again.

# The search is complete. Stats:
# Inspected 1511495782 polynomials.
# Num Polys passing PolDisc test = 0 (0%).
# Num Polys passing irreducibility test = 0 (0%).
# Num Polys passing field disc test = 0 (0%).
# Elapsed Time = 1158462 (sec)

The second fat WU is still going at 57% and 322 hours of running time.


Ok, thanks for the report. I took care of the credit cap issue.
Roadranner

Joined: 3 Sep 12
Posts: 2
Credit: 15,509,853
RAC: 92
Message 1656 - Posted: 14 Apr 2016, 7:09:21 UTC

I finished another long runner after 32 days, but another user had already completed it:

wu_Qsqrt421_DS3x8_CV1_S815_N2_-194088_N1_805269to806765

Could you please have a look at it?
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,373,642
RAC: 128,024
Message 1657 - Posted: 14 Apr 2016, 7:29:10 UTC - in response to Message 1656.  

Easier to find as WU 12731776
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1658 - Posted: 14 Apr 2016, 14:53:55 UTC - in response to Message 1656.  

I finished another long runner after 32 days, but another user had already completed it:

wu_Qsqrt421_DS3x8_CV1_S815_N2_-194088_N1_805269to806765

Could you please have a look at it?


I took care of that.

Anyone else reading this... if you were denied credit on a long runner because of missing a deadline, let me know and I can adjust your credit.

Thanks!
Testing32b

Joined: 2 Apr 16
Posts: 2
Credit: 3,389
RAC: 0
Message 1660 - Posted: 15 Apr 2016, 11:02:46 UTC

I am a bit disappointed. I have been running a task for 3 days now and it just does not finish. I am at 99.683% and I gain 0.100% per 12 hours. I am sorry, but your programming seems to be bad.
MaynardVizzutti

Joined: 11 Apr 15
Posts: 4
Credit: 4,705,812
RAC: 7
Message 1661 - Posted: 15 Apr 2016, 13:21:11 UTC - in response to Message 1660.  

[...] I have been running a task for 3 days now and it just does not finish.


Welcome to the project!

This is not unusual with some work units. I have processed several such units, as have many other volunteers. The first 80% or so goes pretty fast, then progress slows greatly. I have spent a day or two on the last 1% of a unit.

In almost all cases, the built-in deadline grace period is long enough to cover the completion of the work unit, and when it's not, the project administrators are very accommodating about awarding appropriate credit for lost time.

I predict your work unit will complete normally on its own. If it goes on for several more days, you might want to mention it again here.



Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1662 - Posted: 15 Apr 2016, 16:29:22 UTC - in response to Message 1660.  

I am a bit disappointed. I have been running a task for 3 days now and it just does not finish. I am at 99.683% and I gain 0.100% per 12 hours. I am sorry, but your programming seems to be bad.


It's not a matter of programming. The cases that are exhibiting the strange behavior are the Qsqrt421 cases. This is a special search that leverages the existing GetDecics app so that I don't have to write a new one. The GetDecics app was not originally designed for a base field with such a large discriminant (i.e. disc=421), and unfortunately this has strange side effects on timing.

The basic issue is this. If you plot timing as a function of loop index, you get something that vaguely resembles a Gaussian. The lower-discriminant base fields have a nice large variance, which gives something closer to a uniform timing distribution when you break the region into pieces (we use larger pieces over the tails of the Gaussian and smaller pieces around the mean). The problem with the large base field cases (Qsqrt421) is that the variance of the Gaussian gets very small and the timing plot starts to resemble a delta function. The code that breaks up the search space estimates the mean of the Gaussian, and when the variance is very small, any error in that estimate has a much larger impact on timing. When the estimate of the mean is off, the end of a WU starts climbing up the delta function, and you end up spending 99% of your time in the last 1% of the search.
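
The effect described above can be sketched with a toy model. This is not the project's work generator: the Gaussian cost profile, the index range 0 to 1000, the 20 pieces, and the 5-index error in the estimated mean are all assumptions chosen for illustration. The pieces are cut so that each would cost the same if the estimated mean were exact, and the sketch then measures how uneven they are under the true mean.

```python
import math


def gaussian_cdf(x, mean, sigma):
    """Cumulative cost up to loop index x for a Gaussian timing profile."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def split_equal_cost(est_mean, sigma, lo, hi, n_pieces):
    """Choose piece edges so each piece has equal cost under the ESTIMATED mean."""
    c_lo = gaussian_cdf(lo, est_mean, sigma)
    c_hi = gaussian_cdf(hi, est_mean, sigma)
    edges = [lo]
    for i in range(1, n_pieces):
        target = c_lo + (c_hi - c_lo) * i / n_pieces
        a, b = lo, hi
        for _ in range(100):  # bisection on the monotone CDF
            mid = 0.5 * (a + b)
            if gaussian_cdf(mid, est_mean, sigma) < target:
                a = mid
            else:
                b = mid
        edges.append(0.5 * (a + b))
    edges.append(hi)
    return edges


def worst_to_average(true_mean, est_mean, sigma, lo=0.0, hi=1000.0, n_pieces=20):
    """Ratio of the most expensive piece to the average piece, under the TRUE mean."""
    edges = split_equal_cost(est_mean, sigma, lo, hi, n_pieces)
    costs = [
        gaussian_cdf(b, true_mean, sigma) - gaussian_cdf(a, true_mean, sigma)
        for a, b in zip(edges, edges[1:])
    ]
    return max(costs) / (sum(costs) / len(costs))


# Same error in the estimated mean (5 indices), two different variances.
wide = worst_to_average(true_mean=500.0, est_mean=505.0, sigma=200.0)
narrow = worst_to_average(true_mean=500.0, est_mean=505.0, sigma=5.0)
print(f"wide variance:   worst piece = {wide:.2f} x average")
print(f"narrow variance: worst piece = {narrow:.2f} x average")
```

With the wide profile the pieces stay close to equal; with the narrow one a single piece ends up several times more expensive than the average, which is the kind of "fat" WU reported in this thread.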

There is nothing I can do about this, other than writing a new app for this specific case and maybe using trickle-up WUs (which somebody suggested in another thread). I increased the deadline and the grace period to help alleviate the problem, which is much easier than writing a whole new app, especially since this "special" search will be coming to an end soon.

If I have some time later, I will try to post some timing plots so you can see what I am talking about.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1663 - Posted: 15 Apr 2016, 16:44:14 UTC - in response to Message 1662.  

Something else to be aware of...
I have seen the progress meter go to 100.000% and the WU still continues processing for another few hours. I believe what is happening is that the progress is really 99.9995% and the client is rounding it up to 100. No need to worry that it's stuck; the WU will eventually finish.
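
A minimal sketch of that rounding effect, assuming the meter simply formats the fraction to three decimal places (the helper name and the sample fraction are hypothetical, not taken from the BOINC client):

```python
def display_percent(fraction_done):
    """Format a 0..1 fraction the way a three-decimal progress meter would."""
    return f"{fraction_done * 100.0:.3f}%"


fraction = 0.999996  # hypothetical: 99.9996% of the work is done
shown = display_percent(fraction)
finished = fraction >= 1.0

print(shown, finished)
```

The displayed value rounds to 100.000% even though the task has not finished.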
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,373,642
RAC: 128,024
Message 1664 - Posted: 15 Apr 2016, 22:43:40 UTC - in response to Message 1663.  

Something else to be aware of...
I have seen the progress meter go to 100.000% and the WU still continues processing for another few hours. I believe what is happening is that the progress is really 99.9995% and the client is rounding it up to 100. No need to worry that it's stuck; the WU will eventually finish.

It's even worse than that. I have a current task which has been running over 5 days, and is displaying 100%. (Don't worry, I've seen several like that, and most of them have completed already - it can take as long as it wants. I think this one has already been showing 100% for well over a day.)

The key file to investigate is boinc_task_state.xml in the task's slot directory.

It says:

<active_task>
    <project_master_url>http://numberfields.asu.edu/NumberFields/</project_master_url>
    <result_name>wu_Qsqrt421_DS3x8_CV1_S815_N2_-194161_N1_805982_k2_-1_0</result_name>
    <checkpoint_cpu_time>421149.300000</checkpoint_cpu_time>
    <checkpoint_elapsed_time>499853.935949</checkpoint_elapsed_time>
    <fraction_done>0.000000</fraction_done>
    <peak_working_set_size>7725056</peak_working_set_size>
    <peak_swap_size>304140288</peak_swap_size>
    <peak_disk_usage>13256</peak_disk_usage>
</active_task>

<fraction_done> would normally be filled in by your application - I think you've said that the overhead of adding reporting at the very innermost loop would be too high, so you've left it a little further out. But that means that in this particular parameter space, all the progress comes from the BOINC client's attempt to reassure the user that all is well, by inventing its own pseudo-progress to report.
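
For anyone who wants to run the same check on their own task, here is a rough sketch that reads the two relevant fields. It assumes the file parses as well-formed XML, as the excerpt above does; the helper name read_task_state is made up for this example, and the data is copied from the excerpt rather than read from a slot directory.

```python
import xml.etree.ElementTree as ET

# Excerpt copied from the post above (only the fields used here).
TASK_STATE = """
<active_task>
    <result_name>wu_Qsqrt421_DS3x8_CV1_S815_N2_-194161_N1_805982_k2_-1_0</result_name>
    <checkpoint_cpu_time>421149.300000</checkpoint_cpu_time>
    <checkpoint_elapsed_time>499853.935949</checkpoint_elapsed_time>
    <fraction_done>0.000000</fraction_done>
</active_task>
"""


def read_task_state(xml_text):
    """Return the task name, app-reported fraction, and elapsed time in days."""
    root = ET.fromstring(xml_text.strip())
    return {
        "result_name": root.findtext("result_name"),
        "fraction_done": float(root.findtext("fraction_done")),
        "elapsed_days": float(root.findtext("checkpoint_elapsed_time")) / 86400.0,
    }


state = read_task_state(TASK_STATE)
print(state)
if state["fraction_done"] == 0.0 and state["elapsed_days"] > 1.0:
    # The app itself has reported nothing, so any progress on screen
    # probably comes from the client's own estimate.
    print("Application-reported progress is still zero.")
```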

By design, pseudo-progress tends asymptotically to a fraction of 1 (100%), but never reaches it. Because this task has run so far beyond its initial estimate (probably 7 hours on this machine), the asymptotic limit has become indistinguishable from 1 (to three decimal places). It's not the first time that BOINC coding has failed to cope transparently with extreme cases.
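
To illustrate the asymptotic behaviour, here is a toy curve. The exponential form is an assumption picked for the example and is not the BOINC client's actual formula; only the 7 hour estimate and the elapsed time come from this post.

```python
import math


def pseudo_progress(elapsed_seconds, estimated_seconds):
    """Illustrative only: rises toward 1.0 (mathematically never reaching it)."""
    return 1.0 - math.exp(-elapsed_seconds / estimated_seconds)


estimate = 7 * 3600.0  # the "probably 7 hours" initial estimate
for elapsed in (3600.0, estimate, 5 * estimate, 499853.935949):
    p = pseudo_progress(elapsed, estimate)
    print(f"{elapsed / 3600.0:8.1f} h  ->  {p * 100.0:.3f}%")

late = pseudo_progress(499853.935949, estimate)
```

After roughly 139 hours against a 7 hour estimate, the value is still below 1 but prints as 100.000 at three decimal places.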

The checkpoint (state) file for this task contains

0
-194161
805982
-1
91
3518541053
0
0
0
499851

and stderr's report on the Martinet search has reached

Now starting the targeted Martinet search:
    N2_L = -194161.
    N2_U = -194161.
      N2 = -194161.
        N1_L = 805982.
        N1_U = 805982.
          N1 = 805982.
            k2 range: -1 => -1.
            k2 = -1.
            k1 range: 76 => 136.
            k1 = 76.
            k1 = 77.
            k1 = 78.
            k1 = 79.
            k1 = 80.
            k1 = 81.
            k1 = 82.
            k1 = 83.
            k1 = 84.
            k1 = 85.
            k1 = 86.
            k1 = 87.
            k1 = 88.
            k1 = 89.
            k1 = 90.
            k1 = 91.

if that helps you track down what it's up to.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1666 - Posted: 16 Apr 2016, 3:49:24 UTC - in response to Message 1664.  

Richard,

Thanks for that information; it helped me to track down the problem.

My formula for fraction_done had one term that was the ratio of two integers. I needed to cast them to float; otherwise the ratio got truncated to 0. The issue was more noticeable on the WUs where the outer loops only process one value, so the truncated term was the dominant term in the formula (all other terms go to zero).
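
A small sketch of this kind of bug. The app's real formula is not shown in this thread, so the single-term formula and both function names below are hypothetical, and the k1 values are borrowed from the stderr log above purely as sample numbers. Python's // operator stands in for C-style integer division, which it matches for non-negative operands.

```python
def fraction_done_truncating(k1, k1_lo, k1_hi):
    """Buggy variant: integer / integer truncates, as in C without a cast."""
    return float((k1 - k1_lo) // (k1_hi - k1_lo + 1))


def fraction_done_fixed(k1, k1_lo, k1_hi):
    """Fixed variant: convert to float before dividing."""
    return float(k1 - k1_lo) / float(k1_hi - k1_lo + 1)


# Sample numbers only: k1 range 76 => 136, currently at k1 = 91.
buggy = fraction_done_truncating(91, 76, 136)
fixed = fraction_done_fixed(91, 76, 136)
print(buggy, fixed)
```

The truncating version stays at 0 until the loop is completely finished, while the fixed version moves with every step.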

I originally thought things were working because the progress meter was going up; I didn't realize that BOINC was doing its own progress estimation. This also explains why my Linux machines, which use an older client, go directly from 0 to 100% on these types of WUs.

This is not too critical, as it only affects the progress meter, but I will try to get updated apps out there soon.
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,373,642
RAC: 128,024
Message 1669 - Posted: 16 Apr 2016, 13:43:23 UTC - in response to Message 1666.  

Always glad to help. We tracked that one down just in time - task 16717469 has just finished and reported all by itself, taking the evidence with it.
Dj Ninja

Joined: 23 Feb 13
Posts: 29
Credit: 21,480,710
RAC: 0
Message 1675 - Posted: 17 Apr 2016, 2:30:13 UTC

The second fat WU has finally given up the ghost after 16 days of working time, but again with no noticeable outcome. Credit cap?
# Inspected 2017069449 polynomials.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,708,838
RAC: 288,154
Message 1676 - Posted: 17 Apr 2016, 3:46:11 UTC - in response to Message 1675.  

The second fat WU has finally given up the ghost after 16 days of working time, but again with no noticeable outcome. Credit cap?
# Inspected 2017069449 polynomials.


Took care of the credit cap.

The Qsqrt421 searches are looking for an extremely rare field, so most of the time nothing will be found, but many polynomials will be tested.
Dj Ninja

Joined: 23 Feb 13
Posts: 29
Credit: 21,480,710
RAC: 0
Message 1690 - Posted: 19 Apr 2016, 3:32:00 UTC

Is user rwild reading here?

What about WU #15040289? Are you still working on it, or is it up to me?

I'm participating in the PrimeGrid race over the next few days. I'll be back!
Copyright © 2024 Arizona State University