Long running wu_Qsqrt421_DS1x5 units - how long to let them run?

Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1552 - Posted: 23 Feb 2016, 14:52:37 UTC - in response to Message 1551.  

After 44 days my last long running wu (task #13680649) ended up in a computation error. :-(
On the day before I saw the wu stopping for a few times without any reason.


Sorry to hear that. It looks like the problem was in acquiring the lock file, not in the app. Either way, I granted you credit for the lost CPU cycles. Hopefully one of your wingmen can finish it.
ID: 1552 · Rating: 0
Jesse Viviano

Joined: 20 Dec 14
Posts: 17
Credit: 12,153,123
RAC: 0
Message 1560 - Posted: 2 Mar 2016, 3:51:18 UTC

I think that I will need a time extension on result 15014223, which is part of work unit 12346684. My stderr is below:
Checkpoint Flag = 0.
Cvec Starting Index = 0.
N1 Start = 0.
N2 Start = 0.
k1 Start = 0.
k2 Start = 0.
PolyCount starting value = 0.
Stat Count 1 = 0.
Stat Count 2 = 0.
Stat Count 3 = 0.
Elapsed Time = 0 (sec).
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat
    K = x^2 - 421
    S = [2]
    Disc Bound = 16
    Skip = P^1
    Num Congruences = 1
    SCALE = 0.815000
    N2_MIN = -54
    N2_MAX = -54
    N1_MIN = -502
    N1_MAX = 447
    |dK| = 421
    Signature = [2,0]
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0
Now starting the targeted Martinet search:
    N2_L = -54.
    N2_U = -54.
      N2 = -54.
        N1_L = -502.
        N1_U = 447.
          N1 = -502.
          N1 = -501.
          N1 = -500.
          N1 = -499.
          N1 = -498.
          N1 = -497.
          N1 = -496.
          N1 = -495.
          N1 = -494.
          N1 = -493.
          N1 = -492.
          N1 = -491.
          N1 = -490.
          N1 = -489.
          N1 = -488.
          N1 = -487.
          N1 = -486.
          N1 = -485.
          N1 = -484.
          N1 = -483.
          N1 = -482.
          N1 = -481.
          N1 = -480.
          N1 = -479.
          N1 = -478.
          N1 = -477.
          N1 = -476.
          N1 = -475.
          N1 = -474.
          N1 = -473.
          N1 = -472.
          N1 = -471.
          N1 = -470.
          N1 = -469.
          N1 = -468.
          N1 = -467.
          N1 = -466.
          N1 = -465.
          N1 = -464.
          N1 = -463.
          N1 = -462.
          N1 = -461.
          N1 = -460.
          N1 = -459.
          N1 = -458.
          N1 = -457.
          N1 = -456.
          N1 = -455.
          N1 = -454.
          N1 = -453.
          N1 = -452.
          N1 = -451.
          N1 = -450.
          N1 = -449.
          N1 = -448.
          N1 = -447.
          N1 = -446.
          N1 = -445.
          N1 = -444.
          N1 = -443.
          N1 = -442.
          N1 = -441.
          N1 = -440.
          N1 = -439.
          N1 = -438.
          N1 = -437.
          N1 = -436.
          N1 = -435.
          N1 = -434.
          N1 = -433.
          N1 = -432.
          N1 = -431.
          N1 = -430.
          N1 = -429.
          N1 = -428.
          N1 = -427.
          N1 = -426.
          N1 = -425.
          N1 = -424.
          N1 = -423.
          N1 = -422.
          N1 = -421.
          N1 = -420.
          N1 = -419.
          N1 = -418.
          N1 = -417.
          N1 = -416.
          N1 = -415.
          N1 = -414.
          N1 = -413.
          N1 = -412.
          N1 = -411.
          N1 = -410.
          N1 = -409.
          N1 = -408.
          N1 = -407.
          N1 = -406.
          N1 = -405.
          N1 = -404.
          N1 = -403.
          N1 = -402.
          N1 = -401.
          N1 = -400.
          N1 = -399.
          N1 = -398.
          N1 = -397.
          N1 = -396.
          N1 = -395.
          N1 = -394.
          N1 = -393.
          N1 = -392.
          N1 = -391.
          N1 = -390.
          N1 = -389.
          N1 = -388.
          N1 = -387.
          N1 = -386.
          N1 = -385.
          N1 = -384.
          N1 = -383.
          N1 = -382.
          N1 = -381.
          N1 = -380.
          N1 = -379.
          N1 = -378.
          N1 = -377.
          N1 = -376.
          N1 = -375.
          N1 = -374.
          N1 = -373.
          N1 = -372.
          N1 = -371.
          N1 = -370.
          N1 = -369.
          N1 = -368.
          N1 = -367.
          N1 = -366.
          N1 = -365.
          N1 = -364.
          N1 = -363.
          N1 = -362.
          N1 = -361.
          N1 = -360.
          N1 = -359.
          N1 = -358.
          N1 = -357.
          N1 = -356.
          N1 = -355.
          N1 = -354.
          N1 = -353.
          N1 = -352.
          N1 = -351.
          N1 = -350.
          N1 = -349.
          N1 = -348.
          N1 = -347.
          N1 = -346.
          N1 = -345.
          N1 = -344.
          N1 = -343.
          N1 = -342.
          N1 = -341.
          N1 = -340.
          N1 = -339.
          N1 = -338.
          N1 = -337.
          N1 = -336.
          N1 = -335.
          N1 = -334.
          N1 = -333.
          N1 = -332.
          N1 = -331.
          N1 = -330.
          N1 = -329.
          N1 = -328.
          N1 = -327.
          N1 = -326.
          N1 = -325.
          N1 = -324.
          N1 = -323.
          N1 = -322.
          N1 = -321.
          N1 = -320.
          N1 = -319.
          N1 = -318.
          N1 = -317.
          N1 = -316.
          N1 = -315.
          N1 = -314.
          N1 = -313.
          N1 = -312.
          N1 = -311.
          N1 = -310.
          N1 = -309.
          N1 = -308.
          N1 = -307.
          N1 = -306.
          N1 = -305.
          N1 = -304.
          N1 = -303.
          N1 = -302.
          N1 = -301.
          N1 = -300.
          N1 = -299.
          N1 = -298.
          N1 = -297.
          N1 = -296.
          N1 = -295.
          N1 = -294.
          N1 = -293.
          N1 = -292.
          N1 = -291.
          N1 = -290.
          N1 = -289.
          N1 = -288.
          N1 = -287.
          N1 = -286.
          N1 = -285.
          N1 = -284.
          N1 = -283.
          N1 = -282.
          N1 = -281.
          N1 = -280.
          N1 = -279.
          N1 = -278.
          N1 = -277.
          N1 = -276.
          N1 = -275.
          N1 = -274.
          N1 = -273.
          N1 = -272.
          N1 = -271.
          N1 = -270.
          N1 = -269.
          N1 = -268.
          N1 = -267.
          N1 = -266.
          N1 = -265.
          N1 = -264.
          N1 = -263.
          N1 = -262.
          N1 = -261.
          N1 = -260.
          N1 = -259.
          N1 = -258.
          N1 = -257.
          N1 = -256.
          N1 = -255.
          N1 = -254.
          N1 = -253.
          N1 = -252.
          N1 = -251.
          N1 = -250.
          N1 = -249.
          N1 = -248.
          N1 = -247.
          N1 = -246.
          N1 = -245.
          N1 = -244.
          N1 = -243.
          N1 = -242.
          N1 = -241.
          N1 = -240.
          N1 = -239.
          N1 = -238.
          N1 = -237.
          N1 = -236.
          N1 = -235.
          N1 = -234.
          N1 = -233.
          N1 = -232.
          N1 = -231.
          N1 = -230.
          N1 = -229.
          N1 = -228.
          N1 = -227.
          N1 = -226.
          N1 = -225.
          N1 = -224.
          N1 = -223.
          N1 = -222.
          N1 = -221.
          N1 = -220.
          N1 = -219.
          N1 = -218.
          N1 = -217.
          N1 = -216.
          N1 = -215.
          N1 = -214.
          N1 = -213.
          N1 = -212.
          N1 = -211.
          N1 = -210.
          N1 = -209.
          N1 = -208.
          N1 = -207.
          N1 = -206.
          N1 = -205.
          N1 = -204.
          N1 = -203.
          N1 = -202.
          N1 = -201.
          N1 = -200.
          N1 = -199.
          N1 = -198.
          N1 = -197.
          N1 = -196.
          N1 = -195.
          N1 = -194.
          N1 = -193.
          N1 = -192.
          N1 = -191.
          N1 = -190.
          N1 = -189.
          N1 = -188.
          N1 = -187.
          N1 = -186.
          N1 = -185.
          N1 = -184.
          N1 = -183.
          N1 = -182.
          N1 = -181.
          N1 = -180.
          N1 = -179.
          N1 = -178.
          N1 = -177.
          N1 = -176.
          N1 = -175.
          N1 = -174.
          N1 = -173.
          N1 = -172.
          N1 = -171.
          N1 = -170.
          N1 = -169.
          N1 = -168.
          N1 = -167.
          N1 = -166.
          N1 = -165.
          N1 = -164.
          N1 = -163.
          N1 = -162.
          N1 = -161.
          N1 = -160.
          N1 = -159.
          N1 = -158.
          N1 = -157.
          N1 = -156.
          N1 = -155.
          N1 = -154.
          N1 = -153.
          N1 = -152.
          N1 = -151.
          N1 = -150.
          N1 = -149.
          N1 = -148.
          N1 = -147.
          N1 = -146.
          N1 = -145.
          N1 = -144.
          N1 = -143.
          N1 = -142.
          N1 = -141.
          N1 = -140.
          N1 = -139.
          N1 = -138.
          N1 = -137.
          N1 = -136.
Reading checkpoint file.
Checkpoint Flag = 1.
Cvec Starting Index = 0.
N1 Start = -137.
N2 Start = -54.
k1 Start = 90.
k2 Start = 6.
PolyCount starting value = 1626159753.
Stat Count 1 = 0.
Stat Count 2 = 0.
Stat Count 3 = 0.
Elapsed Time = 1194533 (sec).
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat
    K = x^2 - 421
    S = [2]
    Disc Bound = 16
    Skip = P^1
    Num Congruences = 1
    SCALE = 0.815000
    N2_MIN = -54
    N2_MAX = -54
    N1_MIN = -502
    N1_MAX = 447
    |dK| = 421
    Signature = [2,0]
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0
Now starting the targeted Martinet search:
    N2_L = -54.
    N2_U = -54.
      N2 = -54.
        N1_L = -137.
        N1_U = 447.
          N1 = -137.
          N1 = -136.
Reading checkpoint file.
Checkpoint Flag = 1.
Cvec Starting Index = 0.
N1 Start = -137.
N2 Start = -54.
k1 Start = 90.
k2 Start = 6.
PolyCount starting value = 1626159753.
Stat Count 1 = 0.
Stat Count 2 = 0.
Stat Count 3 = 0.
Elapsed Time = 1194533 (sec).
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat
    K = x^2 - 421
    S = [2]
    Disc Bound = 16
    Skip = P^1
    Num Congruences = 1
    SCALE = 0.815000
    N2_MIN = -54
    N2_MAX = -54
    N1_MIN = -502
    N1_MAX = 447
    |dK| = 421
    Signature = [2,0]
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0
Now starting the targeted Martinet search:
    N2_L = -54.
    N2_U = -54.
      N2 = -54.
        N1_L = -137.
        N1_U = 447.
          N1 = -137.
          N1 = -136.
          N1 = -135.
          N1 = -134.
Reading checkpoint file.
Checkpoint Flag = 1.
Cvec Starting Index = 0.
N1 Start = -134.
N2 Start = -54.
k1 Start = 89.
k2 Start = 4.
PolyCount starting value = 1788972354.
Stat Count 1 = 0.
Stat Count 2 = 0.
Stat Count 3 = 0.
Elapsed Time = 1263973 (sec).
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat
    K = x^2 - 421
    S = [2]
    Disc Bound = 16
    Skip = P^1
    Num Congruences = 1
    SCALE = 0.815000
    N2_MIN = -54
    N2_MAX = -54
    N1_MIN = -502
    N1_MAX = 447
    |dK| = 421
    Signature = [2,0]
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0
Now starting the targeted Martinet search:
    N2_L = -54.
    N2_U = -54.
      N2 = -54.
        N1_L = -134.
        N1_U = 447.
          N1 = -134.
Reading checkpoint file.
Checkpoint Flag = 1.
Cvec Starting Index = 0.
N1 Start = -134.
N2 Start = -54.
k1 Start = 89.
k2 Start = 4.
PolyCount starting value = 1788972354.
Stat Count 1 = 0.
Stat Count 2 = 0.
Stat Count 3 = 0.
Elapsed Time = 1263973 (sec).
Reading file ../../projects/numberfields.asu.edu_NumberFields/Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447.dat
    K = x^2 - 421
    S = [2]
    Disc Bound = 16
    Skip = P^1
    Num Congruences = 1
    SCALE = 0.815000
    N2_MIN = -54
    N2_MAX = -54
    N1_MIN = -502
    N1_MAX = 447
    |dK| = 421
    Signature = [2,0]
Opening output file ../../projects/numberfields.asu.edu_NumberFields/wu_Qsqrt421_DS1x5_CV2_S815_N2_-54_N1_-502to447_6_0
Now starting the targeted Martinet search:
    N2_L = -54.
    N2_U = -54.
      N2 = -54.
        N1_L = -134.
        N1_U = 447.
          N1 = -134.

The few interruptions were due to an OS update, a Microsoft Edge crash (rare, fortunately!), a video card driver update, and shutting down BOINC to play a couple of games that I bought on Steam on February 29, 2016 (Papers, Please and Ys I & II Chronicles+). I shut BOINC down before updating a video card driver because of my GPU projects: changing the driver while a program is computing on the GPU crashes that program and results in a computation error. Apart from those exceptions, I have let this work unit crunch almost continuously from February 15 until now.

If this work unit fails, could you reissue it after chopping it up into much smaller work units? Each value of N1 generally seems to take multiple hours; my estimate is around 3 to 4 hours per value of N1.
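One rough way to sanity-check a per-N1 estimate is to read it off the checkpoint lines in the stderr. The sketch below is only an illustration: the excerpt is copied from the two distinct checkpoints in the log above, the helper names are hypothetical, and it ignores the k1/k2 position inside an N1 value, so the figures are coarse.

```python
import re

# Checkpoint lines copied from the stderr above (two distinct checkpoints).
STDERR_EXCERPT = """\
Checkpoint Flag = 1.
N1 Start = -137.
Elapsed Time = 1194533 (sec).
Checkpoint Flag = 1.
N1 Start = -134.
Elapsed Time = 1263973 (sec).
"""

N1_MIN, N1_MAX = -502, 447  # range printed in the stderr header


def checkpoints(text):
    """Return distinct (N1 start, elapsed seconds) pairs in log order."""
    n1 = [int(v) for v in re.findall(r"N1 Start = (-?\d+)\.", text)]
    secs = [int(v) for v in re.findall(r"Elapsed Time = (\d+) \(sec\)", text)]
    pairs = []
    for pair in zip(n1, secs):
        if pair not in pairs:  # the log repeats a checkpoint after each restart
            pairs.append(pair)
    return pairs


def hours_per_n1(first, last):
    """Average hours per N1 value between two (N1, seconds) points."""
    (n1_a, t_a), (n1_b, t_b) = first, last
    return (t_b - t_a) / (n1_b - n1_a) / 3600.0


cps = checkpoints(STDERR_EXCERPT)
recent = hours_per_n1(cps[0], cps[-1])          # between the two checkpoints
average = hours_per_n1((N1_MIN, 0), cps[-1])    # since the start of the task
remaining = N1_MAX - cps[-1][0] + 1             # N1 values still to process

print(f"recent: {recent:.2f} h per N1, average so far: {average:.2f} h per N1")
print(f"N1 values remaining: {remaining}")
```

With these numbers the recent rate comes out near 6.4 hours per N1 value and the average since the start near 0.95 hours, which suggests the cost per N1 value is not constant across the range. Any projection of the remaining 582 values from a single rate should be treated as a guess.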
ID: 1560 · Rating: 0
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1562 - Posted: 2 Mar 2016, 14:59:31 UTC - in response to Message 1560.  

There are now about 20 of these long WUs left (down from over 100). At some point soon, I will cancel the rest and just run them offline.

With regard to extending the deadline, that won't help with a task already in progress. But I can grant you manual credit if someone beats you to the finish line.
ID: 1562 · Rating: 0
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,381,002
RAC: 127,954
Message 1567 - Posted: 2 Mar 2016, 23:14:56 UTC - in response to Message 1562.  

I've had two of these running for a while now (my longest one succumbed to an unfortunate BOINC coding error). Looks like I've recently picked up a third:



That one would be WU 12731691.
ID: 1567 · Rating: 0
Jesse Viviano

Joined: 20 Dec 14
Posts: 17
Credit: 12,153,123
RAC: 0
Message 1568 - Posted: 3 Mar 2016, 2:25:09 UTC - in response to Message 1562.  

What I am afraid of is that another user gets the work unit after my result times out and then aborts the replacement result. If timeouts are counted as errors, that abort would push the error count to the limit of 8, and the work unit would be automatically cancelled before anyone can complete and report it. There are currently three timeouts, two aborts, and one computation error. If timeouts are not counted as errors when deciding whether to automatically cancel a work unit, then I do not have much to worry about. If they are, then my work and the work of the other wingmen who have decided to keep computing could be wasted by a premature automatic cancellation.
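The two scenarios can be spelled out with a small tally. This is only a sketch of the reasoning in this post, not BOINC's actual server logic: the outcome labels, the function names, and the rule "cancel once the count reaches 8" are assumptions taken from the description above.

```python
from collections import Counter

MAX_ERRORS = 8  # limit quoted in the post; the cancellation rule itself is assumed


def error_count(outcomes, timeouts_count_as_errors):
    """Count results that would be charged against the error limit."""
    tally = Counter(outcomes)
    errors = tally["abort"] + tally["compute_error"]
    if timeouts_count_as_errors:
        errors += tally["timeout"]
    return errors


def would_cancel(outcomes, timeouts_count_as_errors):
    """True if the tally reaches the assumed limit."""
    return error_count(outcomes, timeouts_count_as_errors) >= MAX_ERRORS


# State described in the post: three timeouts, two aborts, one computation error.
current = ["timeout"] * 3 + ["abort"] * 2 + ["compute_error"]

# Feared sequence: this result times out, then the replacement is aborted.
feared = current + ["timeout", "abort"]

for flag in (True, False):
    print(
        f"timeouts counted={flag}: "
        f"errors={error_count(feared, flag)}, cancel={would_cancel(feared, flag)}"
    )
```

Under these assumptions the feared sequence reaches 8 only if timeouts are counted; if they are not, the tally stays at 4.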
ID: 1568 · Rating: 0
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1569 - Posted: 3 Mar 2016, 5:44:03 UTC - in response to Message 1568.  

The timeouts are not considered errors, so I don't believe they count towards the cancellation limit. Somebody please correct me if I am wrong.
ID: 1569 · Rating: 0
Profile Beyond

Joined: 12 Aug 12
Posts: 7
Credit: 20,464,039
RAC: 0
Message 1578 - Posted: 5 Mar 2016, 3:24:09 UTC - in response to Message 1544.  

It has recently come to my attention that the Qsqrt421 cases suffer from the same problem that the Bounded app did a couple weeks ago. I am currently looking into a similar fix for these WUs.

The stderr for this WU looks particularly bad. I suspect it could take at least another 6 days to finish. I won't feel bad if you decide to kill it. Either way I will get you manual credit for the lost CPU cycles.

If anyone else has one of these bad WUs please report, either by message board or private message, so I can try and remove them from the system.

Sorry for the inconvenience!

I have 2 of these WUs running at the moment. One at 271 hours and the other at 385 hours. Leave them run?

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12420089

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12404866

Thanks/Ed

The second one is now at 775 hours, still progressing (says 50% now, up from 49% yesterday).

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12404866

No happy returns from anyone yet.
ID: 1578 · Rating: 0
Profile GDLS

Joined: 15 Mar 15
Posts: 11
Credit: 113,280,935
RAC: 0
Message 1579 - Posted: 5 Mar 2016, 3:41:55 UTC - in response to Message 1578.  

>>>The second one is now at 775 hours, still progressing (says 50% now, up from 49% yesterday).
>>>http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12404866
>>>No happy returns from anyone yet.

I've been running that one too. On an i7-2600. For 1354 hours now. It's currently at 50.565%.

Sigh.
ID: 1579 · Rating: 0
Profile Beyond

Joined: 12 Aug 12
Posts: 7
Credit: 20,464,039
RAC: 0
Message 1582 - Posted: 5 Mar 2016, 21:18:33 UTC - in response to Message 1579.  

Wow! Mine's at 50.079%, 793 hours at the moment.
AMD Phenom II X6 1055T CPU.
ID: 1582 · Rating: 0
Profile GDLS

Joined: 15 Mar 15
Posts: 11
Credit: 113,280,935
RAC: 0
Message 1583 - Posted: 6 Mar 2016, 14:39:43 UTC - in response to Message 1582.  

>>>Wow! Mine's at 50.079%, 793 hours at the moment.
>>>AMD Phenom II X6 1055T CPU.

Your BOINC benchmark numbers are quite a bit higher than I'd expect a 1055T and Windows to produce. Are you overclocking?
ID: 1583 · Rating: 0
Jesse Viviano

Joined: 20 Dec 14
Posts: 17
Credit: 12,153,123
RAC: 0
Message 1598 - Posted: 13 Mar 2016, 17:27:38 UTC

The first computer to get work unit 12346684 has finished it. I have aborted my result, and I suggest that everyone else still computing this work unit abort it too, so that our energy is either put to better use on some other work unit or saved. (I regrettably have to do the latter because the weather in my area is now too hot for the heat from crunching loads of work units to be acceptable. I am living with my parents, and they do not want the extra heat, which means higher air conditioning bills. I will be back in the fall, when they will accept the heat.)
ID: 1598 · Rating: 0
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1601 - Posted: 14 Mar 2016, 6:18:14 UTC - in response to Message 1598.  

I granted you credit for the lost CPU cycles. I hope that eases the pain somewhat.
ID: 1601 · Rating: 0
Profile Aurel

Joined: 25 Feb 13
Posts: 216
Credit: 9,899,302
RAC: 0
Message 1603 - Posted: 16 Mar 2016, 23:44:26 UTC - in response to Message 1601.  

CPU time: 70 days 19 hours 21 min. 15 sec.

This might be the longest one. Amazing.
ID: 1603 · Rating: 0
Profile Beyond

Joined: 12 Aug 12
Posts: 7
Credit: 20,464,039
RAC: 0
Message 1627 - Posted: 28 Mar 2016, 18:01:12 UTC - in response to Message 1578.  

It has recently come to my attention that the Qsqrt421 cases suffer from the same problem that the Bounded app did a couple weeks ago. I am currently looking into a similar fix for these WUs.

The stderr for this WU looks particularly bad. I suspect it could take at least another 6 days to finish. I won't feel bad if you decide to kill it. Either way I will get you manual credit for the lost CPU cycles.

If anyone else has one of these bad WUs please report, either by message board or private message, so I can try and remove them from the system.

Sorry for the inconvenience!


Thanks/Ed

The second one is now at 775 hours, still progressing (says 50% now, up from 49% yesterday).

http://numberfields.asu.edu/NumberFields/workunit.php?wuid=12404866

No happy returns from anyone yet.

Just got home from vacation and checked this WU again. It's at 1342 hours and 52%. Looks like it's been completed by Capital Avionics on 3/24. I assume that I should abort it?
ID: 1627 · Rating: 0
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1629 - Posted: 29 Mar 2016, 0:45:47 UTC - in response to Message 1627.  

Yes, you can abort that one. I'll give you credit for it when I get home later. Thanks!
ID: 1629 · Rating: 0
Profile Beyond

Joined: 12 Aug 12
Posts: 7
Credit: 20,464,039
RAC: 0
Message 1630 - Posted: 29 Mar 2016, 1:45:59 UTC - in response to Message 1629.  

Thanks Eric. Aborted at 1349 hours. Longest one I've seen so far.
ID: 1630 · Rating: 0
Vik

Joined: 9 Apr 15
Posts: 2
Credit: 9,058,122
RAC: 0
Message 1665 - Posted: 16 Apr 2016, 3:18:14 UTC

I've noticed the same situation for two work units, due on 10 April.
For the first, the "Time Remaining" column has stayed at either 1 second remaining or 0 seconds remaining for about three days, and the Elapsed time column is now over 4 days.
For the second, the "Time Remaining" column has remained at around 20 seconds for today, with over two days elapsed.
ID: 1665 · Rating: 0
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1667 - Posted: 16 Apr 2016, 4:09:44 UTC - in response to Message 1665.  

I don't see the WUs you speak of. The earliest ones I see are due April 21; all other WUs on your task list appear to have completed successfully. Do you have a WU name or id?
ID: 1667 · Rating: 0
Richard Haselgrove

Joined: 28 Oct 11
Posts: 179
Credit: 220,381,002
RAC: 127,954
Message 1668 - Posted: 16 Apr 2016, 8:02:40 UTC - in response to Message 1667.  
Last modified: 16 Apr 2016, 8:02:55 UTC

Vik is possibly reading the 'due date' off BOINC Manager. The slow one we were discussing in the other thread is showing a due date of 16 Apr 2016, 23:56:47 locally, but 26 Apr 2016, 22:56:47 UTC on the website. The difference is made up of 1 hour for time zone offset, and 10 days grace period allowed by the project.
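The arithmetic behind those two timestamps can be checked in a few lines. This is a sketch under one assumption: the local clock is taken to be UTC+1, which is the direction the two timestamps imply; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is one hour ahead of UTC (the post only says "1 hour").
local_tz = timezone(timedelta(hours=1))

# Due date as shown locally in BOINC Manager.
manager_due = datetime(2016, 4, 16, 23, 56, 47, tzinfo=local_tz)

# Grace period allowed by the project, per the post.
grace = timedelta(days=10)

# Deadline as it would appear on the website, in UTC.
server_due = (manager_due + grace).astimezone(timezone.utc)

print(server_due.strftime("%d %b %Y, %H:%M:%S UTC"))
```

Under that assumption the result matches the website value quoted above, 26 Apr 2016, 22:56:47 UTC.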
ID: 1668 · Rating: 0
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,731,838
RAC: 288,248
Message 1670 - Posted: 16 Apr 2016, 15:46:03 UTC - in response to Message 1668.  

Oh that's right, I forgot about the difference between what the manager shows and the true deadline on the server.
ID: 1670 · Rating: 0


Copyright © 2024 Arizona State University