Error in the PARI system
Author | Message |
---|---|
Joined: 19 Aug 11 Posts: 7 Credit: 100,106 RAC: 0 |
I caught the output of a computation error on Linux AMD64, with BOINC core client and manager version 6.10.58 and ia32-libs installed. Most WUs complete successfully. The client app version is 1.03. The stderr output follows: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> APP: Checkpoint Flag = 0. APP: a21 Starting Value = -10000000. APP: PolyCount starting value = 0. APP: Stat Count 1 = 0. APP: Stat Count 2 = 0. APP: Stat Count 3 = 0. APP: Reading file ../../projects/stat.la.asu.edu_NumberFields/wu_12E10_SF-3-0_Idx2_Grp9019of11232.dat Disc Bound = 120000000000.00000000 K = y^2 + 3 TgtFlag = 0 a1 Index = 2 a5 = 4 + 8w a22 = 2 |dK| = 3 Signature = [0,1] sig1w = [1, -0.50000000000000000000000000000000000000 - 0.86602540378443864676372317075293618347*I] sig2w = [1, -0.50000000000000000000000000000000000000 + 0.86602540378443864676372317075293618347*I] a11 = 0 a12 = 1 sig1a1 = -0.50000000000000000000000000000000000000 - 0.86602540378443864676372317075293618347*I sig2a1 = -0.50000000000000000000000000000000000000 + 0.86602540378443864676372317075293618347*I Ca1_pre = 0.400000 APP: Opening output file ../../projects/stat.la.asu.edu_NumberFields/wu_12E10_SF-3-0_Idx2_Grp9019of11232_1_0 APP: Now starting the Martinet search: 2nd part of Martinet bound = 28.284271. Martinet bound = 28.684271. a21_L = -6. a21_U = 7. APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. APP: GetDecics checkpoint done APP: Starting GetDecics checkpoint. 
APP: GetDecics checkpoint done *** segmentation fault: bug in PARI or calling program. *** Error in the PARI system. End of program. </stderr_txt> ]]> I searched on "libpari" and "PARI/GP" and found various references to the same output message. A more current discussion thread can be found here: http://old.nabble.com/Trapping-errors-in-library-mode-td31332062.html This could be a hard one to debug. Timing shows it ran for 4,890.10 seconds before the error occurred. HTH |
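For anyone collecting these reports, the PARI diagnostics can be pulled out of a saved stderr dump with a few lines of Python. This is only a rough sketch: the function names `pari_messages` and `is_fatal` are made up for illustration, and it assumes that PARI marks each diagnostic with `***`, as in the output quoted in this thread.

```python
def pari_messages(stderr_text):
    """Return the PARI diagnostics found in a BOINC stderr dump.

    Assumption: every PARI diagnostic is introduced by '***', as in the
    output quoted in this thread.
    """
    # Keep only the part inside <stderr_txt> ... </stderr_txt> when the
    # BOINC wrapper tags are present; otherwise use the whole text.
    body = stderr_text.split("<stderr_txt>")[-1].split("</stderr_txt>")[0]
    messages = []
    # Everything before the first '***' is ordinary application output.
    for chunk in body.split("***")[1:]:
        text = " ".join(chunk.split())  # collapse runs of whitespace
        if text:
            messages.append(text)
    return messages


def is_fatal(messages):
    """True if PARI reported that it ended the program."""
    return any(m.startswith("Error in the PARI system") for m in messages)


if __name__ == "__main__":
    sample = (
        "APP: GetDecics checkpoint done\n"
        "  ***   segmentation fault: bug in PARI or calling program.\n"
        "  ***   Error in the PARI system. End of program.\n"
    )
    for message in pari_messages(sample):
        print(message)
    print("fatal:", is_fatal(pari_messages(sample)))
```

Grouping tasks by the first message (segmentation fault, overflow, and so on) would show whether the different reports share one underlying cause.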
Joined: 8 Jul 11 Posts: 1354 Credit: 562,926,032 RAC: 691,336 |
Thanks for bringing this to my attention. After reading the link, I think my best bet will be to update my version of the PARI libraries. My goal for this weekend is to hunt down these last few remaining bugs. |
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
I'm getting PARI errors too. http://stat.la.asu.edu/NumberFields/result.php?resultid=59267 *** overflow in t_INT-->t_INT assignment. *** Error in the PARI system. End of program. http://stat.la.asu.edu/NumberFields/result.php?resultid=59281 *** bug in PARI/GP (Segmentation Fault), please report *** Error in the PARI system. End of program. If you browse through my errored tasks you'll also see I've been getting "Maximum disk size exceeded", exit code -177, but I haven't seen any more of those since allotting more disk space to BOINC, so ignore them. BOINC FAQ Service Official BOINC wiki Installing BOINC on Linux |
Joined: 8 Jul 11 Posts: 1354 Credit: 562,926,032 RAC: 691,336 |
"I'm getting PARI errors too." I am aware of the PARI seg faults. It seems like every time I find a bug and fix it, another one appears. Needless to say, that's at the top of my to-do list. Out of curiosity, what did you need to set your maximum disk usage to? I just checked and mine was set to 20 GB, which I'm sure is overkill. I've noticed a few other users with exit code -177, so this information might help them set their preferences. |
Joined: 8 Jul 11 Posts: 46 Credit: 7,144,042 RAC: 0 |
"I'm getting PARI errors too." But we should have very low disk requirements. We don't have big data sets to scan like Einstein, and the work units are tiny. I don't see how we ever triggered that. |
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
I had 20 GB allotted on both of my hosts before I joined NumberFields because I run a project (Test4Theory@home) on both of them that needs 10 GB. When I noticed the disk space errors a couple of days ago I bumped it up to 30 GB. That seems to have fixed my 2-core computer, but I just now checked the i7 (8 cores with HT) and it's still overrunning the disk space, probably because it runs up to 7 NumberFields tasks concurrently. I've bumped that one up to 40 GB now and will continue to monitor it. I wonder how BOINC determines which project exceeds the limit? None of my other projects get that error. How does it know NumberFields is to blame and not Test4Theory, for example? All projects draw from the one well. It's not like they each have their own individual allotment. |
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
Hmmm. It looks like 40 GB isn't enough. I just had 3 tasks all crash at the same time with exit code -177 on my i7. They are task numbers 59501, 58172 and 58115. There were 7 NumberFields tasks running concurrently. Those 3 crashed at exactly the same time; the other 4 are still running. Why not the other 4 too? Why not all 8 tasks? The max disk space was exceeded, so why not all 8 tasks? What differentiates those 3 from the others? I'm beginning to think it's not the limit in BOINC preferences that's being exceeded. I think it's the rsc_disk_bound that's exceeded. But that wouldn't give an exit code -177, unless the exit code lookup table is screwed up or the code that does the lookup is wonky? OK, I've boosted the limit in BOINC prefs to 50 GB. We'll see what happens. |
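One way to check the per-task theory is to measure how much disk each running task actually uses and compare it with a per-task bound. The sketch below does only that. It assumes each task runs in its own directory under a `slots` folder in the BOINC data directory, and the 500 MB bound is a made-up placeholder, not the project's real rsc_disk_bound. The helper names `directory_bytes` and `over_bound` are hypothetical, and this is not how BOINC itself does the accounting.

```python
import os


def directory_bytes(path):
    """Total size in bytes of the regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def over_bound(path, bound_bytes):
    """True if the directory uses more than bound_bytes of disk."""
    return directory_bytes(path) > bound_bytes


if __name__ == "__main__":
    slots = "slots"              # assumption: run from the BOINC data directory
    bound = 500 * 1024 * 1024    # hypothetical per-task bound, in bytes
    if os.path.isdir(slots):
        for name in sorted(os.listdir(slots)):
            task_dir = os.path.join(slots, name)
            if os.path.isdir(task_dir):
                used = directory_bytes(task_dir)
                flag = "OVER" if used > bound else "ok"
                print(f"{task_dir}: {used} bytes ({flag})")
```

If only some task directories grow far beyond the others, that would fit the observation that 3 tasks crashed while 4 kept running.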
Joined: 8 Jul 11 Posts: 1354 Credit: 562,926,032 RAC: 691,336 |
"Hmmm. It looks like 40 GB isn't enough. I just had 3 tasks all crash at the same time with exit code -177 on my i7. They are task numbers 59501, 58172 and 58115. There were 7 NumberFields tasks running concurrently. Those 3 crashed at exactly the same time; the other 4 are still running. Why not the other 4 too? Why not all 8 tasks? The max disk space was exceeded, so why not all 8 tasks? What differentiates those 3 from the others?" You shouldn't need that much memory. I think something else is wrong. The ones that crashed with code -177 had strange-looking results in the stderr.txt. You might need to reset the project. I saw similar results on my machine when I ran standalone and tried to manually change the checkpoint file (I was testing the checkpointing). |
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
I've reset NumberFields and lowered BOINC's disk space allotment to 20 GB. I'll let you know if I get more -177 errors. |
Joined: 19 Aug 11 Posts: 1 Credit: 412,883 RAC: 6 |
http://stat.la.asu.edu/NumberFields/result.php?resultid=59309 *** impossible assignment I-->I *** Error in the PARI system. End of program. |
Joined: 8 Jul 11 Posts: 1354 Credit: 562,926,032 RAC: 691,336 |
"I've reset NumberFields and lowered BOINC's disk space allotment to 20 GB. I'll let you know if I get more -177 errors." OK. Thanks! |
Joined: 8 Jul 11 Posts: 1354 Credit: 562,926,032 RAC: 691,336 |
http://stat.la.asu.edu/NumberFields/result.php?resultid=59309 Thanks for reporting! I have been able to replicate these errors in the PARI/GP interactive shell. I submitted a bug report to them and they are currently working on it. Hopefully I will have a workaround soon. The good news is that it's pretty rare (about 1 or 2 for every 10,000 WUs). |
Joined: 23 Aug 11 Posts: 2 Credit: 10,004,759 RAC: 0 |
One task, wu_12E10_SF-3-0_Idx3_Grp12251of14586b_7, errored out after an interrupted run resumed from a checkpoint (BOINC went into panic mode for another project). <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> . . APP: GetBoundedDecics checkpointed. *** overflow in t_INT-->t_INT assignment. *** Error in the PARI system. End of program. </stderr_txt> ]]> X6 1090T, Ubuntu 10.04.3 64-bit |
Joined: 19 Aug 11 Posts: 2 Credit: 263,904 RAC: 0 |
"One task, wu_12E10_SF-3-0_Idx3_Grp12251of14586b_7, errored out after an interrupted run resumed from a checkpoint (BOINC went into panic mode for another project)." I can confirm this with another workunit. |
Joined: 8 Jul 11 Posts: 1354 Credit: 562,926,032 RAC: 691,336 |
"One task, wu_12E10_SF-3-0_Idx3_Grp12251of14586b_7, errored out after an interrupted run resumed from a checkpoint (BOINC went into panic mode for another project)." This is the same PARI bug that's been haunting us for weeks now. The PARI people have confirmed that it's a bug, and I am just waiting to hear back from them to get a fix. |