Error in the PARI system

AMDave
Joined: 19 Aug 11
Posts: 7
Credit: 100,106
RAC: 0
Message 65 - Posted: 25 Aug 2011, 9:00:37 UTC
Last modified: 25 Aug 2011, 9:06:34 UTC

Caught the output of a computation error on Linux AMD64
BOINC core and manager ver 6.10.58
ia32libs installed
most wu's complete successfully
client app ver is 1.03
output follows
Stderr output

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
APP: Checkpoint Flag = 0.
APP: a21 Starting Value = -10000000.
APP: PolyCount starting value = 0.
APP: Stat Count 1 = 0.
APP: Stat Count 2 = 0.
APP: Stat Count 3 = 0.
APP: Reading file ../../projects/stat.la.asu.edu_NumberFields/wu_12E10_SF-3-0_Idx2_Grp9019of11232.dat
Disc Bound = 120000000000.00000000
    K = y^2 + 3
    TgtFlag = 0
    a1 Index = 2
    a5 = 4 + 8w
    a22 = 2
    |dK| = 3
    Signature = [0,1]
  sig1w = [1, -0.50000000000000000000000000000000000000 - 0.86602540378443864676372317075293618347*I]
  sig2w = [1, -0.50000000000000000000000000000000000000 + 0.86602540378443864676372317075293618347*I]
  a11 = 0
  a12 = 1
  sig1a1 = -0.50000000000000000000000000000000000000 - 0.86602540378443864676372317075293618347*I
  sig2a1 = -0.50000000000000000000000000000000000000 + 0.86602540378443864676372317075293618347*I
  Ca1_pre = 0.400000
APP: Opening output file ../../projects/stat.la.asu.edu_NumberFields/wu_12E10_SF-3-0_Idx2_Grp9019of11232_1_0
APP: Now starting the Martinet search:
      2nd part of Martinet bound = 28.284271.
      Martinet bound = 28.684271.
            a21_L = -6.
            a21_U = 7.
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
APP: Starting GetDecics checkpoint.
APP: GetDecics checkpoint done
  ***   segmentation fault: bug in PARI or calling program.
  ***   Error in the PARI system. End of program.

</stderr_txt>
]]>


I searched for "libpari" and "PARI/GP" and found various references to the same output message.
A more current discussion thread can be found here:
http://old.nabble.com/Trapping-errors-in-library-mode-td31332062.html

This could be a hard one to debug.
Timing shows it ran for 4,890.10 seconds before the error occurred.

HTH
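
For anyone comparing these stderr dumps, here is a minimal Python sketch that sorts the `***` lines into rough categories. The patterns are copied from the messages quoted in this thread; the function name `classify_pari_errors` and the category labels are hypothetical and are not part of the NumberFields app or the PARI library.

```python
import re

# Hypothetical category labels; the patterns come from messages quoted in this thread.
PARI_PATTERNS = [
    ("segfault", re.compile(r"segmentation fault", re.IGNORECASE)),
    ("int_overflow", re.compile(r"overflow in t_INT-->t_INT assignment")),
    ("impossible_assignment", re.compile(r"impossible assignment")),
]


def classify_pari_errors(stderr_text):
    """Return one category label per recognized PARI error line ('***' prefix)."""
    found = []
    for line in stderr_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("***"):
            continue  # ordinary APP: progress output
        for label, pattern in PARI_PATTERNS:
            if pattern.search(stripped):
                found.append(label)
                break  # first matching category wins
    return found


sample = """APP: GetDecics checkpoint done
  ***   segmentation fault: bug in PARI or calling program.
  ***   Error in the PARI system. End of program.
"""
print(classify_pari_errors(sample))
```

The generic "Error in the PARI system" trailer is deliberately left unclassified, since it follows every failure and carries no extra information.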

Eric Driver
Project administrator
Project developer
Project tester
Project scientist
Joined: 8 Jul 11
Posts: 1318
Credit: 403,759,898
RAC: 288,589
Message 66 - Posted: 25 Aug 2011, 16:14:37 UTC - in response to Message 65.  

Thanks for bringing this to my attention. After reading the link, I think my best bet will be to update my version of the pari libraries. My goal for this weekend is to hunt down these last few remaining bugs.

Dagorath
Joined: 2 Sep 11
Posts: 57
Credit: 1,274,345
RAC: 0
Message 179 - Posted: 10 Sep 2011, 6:09:19 UTC

I'm getting PARI errors too.

http://stat.la.asu.edu/NumberFields/result.php?resultid=59267
*** overflow in t_INT-->t_INT assignment.
*** Error in the PARI system. End of program.

http://stat.la.asu.edu/NumberFields/result.php?resultid=59281
*** bug in PARI/GP (Segmentation Fault), please report
*** Error in the PARI system. End of program.

If you browse through my errored tasks you'll also see I've been getting "Maximum disk size exceeded", exit code -177, but I haven't seen any more of those since allotting more disk space to BOINC, so ignore them.

Eric Driver
Message 180 - Posted: 10 Sep 2011, 22:25:32 UTC - in response to Message 179.  

I am aware of the pari seg faults. It seems like every time I find a bug and fix it, another one appears. Needless to say, that's on the top of my list of to-do's.

Out of curiosity, what did you need to set your maximum disk usage to? I just checked and mine was set to 20GB which I'm sure is overkill. I've noticed a few other users with exit code -177, so this information might help them set their preferences.

Greg Tucker
Project administrator
Project developer
Project tester
Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 181 - Posted: 11 Sep 2011, 0:42:39 UTC - in response to Message 180.  

But we should have very low disk requirements. We don't have big data sets to scan like Einstein, and the work units are tiny. I don't see how we ever triggered that.

Dagorath
Message 182 - Posted: 11 Sep 2011, 6:03:42 UTC - in response to Message 181.  

I had 20 GB allotted on both of my hosts before I joined NumberFields because I run a project (Test4Theory@home) on both of them that needs 10 GB. When I noticed the disk space errors a couple days ago I bumped it up to 30 GB. That seems to have fixed my 2 core computer but I just now checked the i7 (8 cores with HT) and it's still overrunning the disk space, probably because it runs up to 7 NumberFields tasks concurrently. I've bumped that one up to 40GB now and will continue to monitor it.

I wonder how BOINC determines which project exceeds the limit? None of my other projects get that error. How does it know NumberFields is to blame and not Test4Theory, for example? All projects draw from the 1 well. It's not like they each have their individual allotment.

Dagorath
Message 183 - Posted: 11 Sep 2011, 7:16:47 UTC - in response to Message 182.  

Hmmm. It looks like 40 GB isn't enough. I just had 3 tasks all crash at the same time with exit code -177 on my i7. They are task numbers 59501, 58172 and 58115. There were 7 NumberFields tasks running concurrently. Those 3 crashed at exactly the same time; the other 4 are still running. Why not the other 4 too? Why not all 8 tasks? The max disk space was exceeded so why not all 8 tasks? What differentiates those 3 from the others?

I'm beginning to think it's not the limit in BOINC preferences that's being exceeded. I think it's the rsc_disk_bound that's exceeded. But that wouldn't give an exit code -177, unless the exit code lookup table is screwed up or the code that does the lookup is wonky?

OK, I've boosted the limit in BOINC prefs to 50 GB. We'll see what happens.
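
To my understanding, exit code -177 corresponds to BOINC's ERR_RSC_LIMIT_EXCEEDED, which is reported when a single task's files grow past that workunit's rsc_disk_bound, rather than when the global disk preference is hit. That would explain why only some of the running tasks fail. Below is a rough Python sketch of such a per-task check, assuming the bound is in bytes and the task's files live in one directory. The helper names are hypothetical and this is not the BOINC client's actual code.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def exceeds_disk_bound(task_dir, rsc_disk_bound):
    """True if this one task's files are larger than its per-task bound (bytes)."""
    return dir_size_bytes(task_dir) > rsc_disk_bound


# Demo: a throwaway directory stands in for one task's working directory.
with tempfile.TemporaryDirectory() as slot:
    with open(os.path.join(slot, "output.dat"), "wb") as f:
        f.write(b"x" * 2048)
    print(exceeds_disk_bound(slot, 1024))  # bound smaller than the 2048-byte file
    print(exceeds_disk_bound(slot, 4096))  # bound larger than the file
```

Because the check is per task, raising the overall BOINC disk allowance would not be expected to change the outcome if one task's output file is growing without limit.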

Eric Driver
Message 184 - Posted: 11 Sep 2011, 20:01:08 UTC - in response to Message 183.  

You shouldn't need that much disk space. I think something else is wrong. The tasks that crashed with code -177 had strange-looking results in stderr.txt. You might need to reset the project. I saw similar results on my machine when I ran standalone and tried to manually change the checkpoint file (I was testing the checkpointing).

Dagorath
Message 185 - Posted: 12 Sep 2011, 2:54:02 UTC - in response to Message 184.  

I've reset NumberFields and lowered BOINC's disk space allotment to 20 GB. I'll let you know if I get more -177 errors.

[BOINC@Poland]emik
Joined: 19 Aug 11
Posts: 1
Credit: 17,413
RAC: 0
Message 186 - Posted: 12 Sep 2011, 3:20:01 UTC

http://stat.la.asu.edu/NumberFields/result.php?resultid=59309

*** impossible assignment I-->I
*** Error in the PARI system. End of program.

Eric Driver
Message 187 - Posted: 12 Sep 2011, 3:52:41 UTC - in response to Message 185.  

Ok. Thanks!

Eric Driver
Message 188 - Posted: 12 Sep 2011, 4:00:22 UTC - in response to Message 186.  

Thanks for reporting!

I have been able to replicate these errors in the PARI/GP interactive shell. I submitted a bug report to the PARI developers and they are currently working on it. Hopefully I will have a workaround soon. The good news is that the error is pretty rare (about 1 or 2 for every 10,000 WUs).
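
For a rough feel of what a rate of 1 or 2 failures per 10,000 work units means in practice, here is a small Python sketch. It assumes failures are independent, and it uses 11232, the group count in the work unit file name from the first post, purely as an illustrative batch size. Both function names are made up for this example.

```python
def expected_failures(num_wus, failures_per_10000):
    """Expected number of failed work units at a given per-10,000 rate."""
    return num_wus * failures_per_10000 / 10000


def prob_at_least_one(num_wus, failures_per_10000):
    """Chance of at least one failure, assuming independent work units."""
    p = failures_per_10000 / 10000
    return 1 - (1 - p) ** num_wus


# Illustrative batch size only; see the assumptions stated above.
BATCH = 11232
for rate in (1, 2):
    print(rate, expected_failures(BATCH, rate), round(prob_at_least_one(BATCH, rate), 3))
```

Even at this low rate, a batch of that size would be expected to hit the bug once or twice, which is consistent with several volunteers seeing it independently.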

nenym
Joined: 23 Aug 11
Posts: 2
Credit: 10,004,759
RAC: 0
Message 193 - Posted: 13 Sep 2011, 1:22:42 UTC
Last modified: 13 Sep 2011, 1:26:35 UTC

One task, wu_12E10_SF-3-0_Idx3_Grp12251of14586b_7, errored out after an interrupted run out of checkpoint (BOINC panic mode triggered by another project):
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
.
.
APP: GetBoundedDecics checkpointed.
  ***   overflow in t_INT-->t_INT assignment.
  ***   Error in the PARI system. End of program.

</stderr_txt>
]]>
X6 1090T Ubuntu 10.04.3 64bit

Sysadm@Nbg
Joined: 19 Aug 11
Posts: 2
Credit: 263,904
RAC: 0
Message 196 - Posted: 14 Sep 2011, 15:52:18 UTC - in response to Message 193.  

I can confirm this with another workunit.

Eric Driver
Message 197 - Posted: 14 Sep 2011, 19:19:26 UTC - in response to Message 196.  

This is that same pari bug that's been haunting us for weeks now. The pari people have confirmed that it's a bug and I am just waiting to hear back from them to get a fix.


Copyright © 2024 Arizona State University