1)
Message boards :
Number crunching :
SIGILL on Minimum Discriminant Septics WUs
(Message 2055)
Posted 18 May 2018 by mugnaio [TNAA] Post: "Could you reboot your system to see if the errors go away, before I go down a rabbit hole looking for obscure memory bugs?" I rebooted a couple of days ago and ran a hundred Septics jobs without further errors...
2)
Message boards :
Number crunching :
SIGILL on Minimum Discriminant Septics WUs
(Message 2043)
Posted 8 May 2018 by mugnaio [TNAA] Post: "Could you reboot your system to see if the errors go away, before I go down a rabbit hole looking for obscure memory bugs?" At the moment I'm processing Decic Fields WUs and they seem to run fine. I'll reboot and switch back to Septics in the next few days, when I can check them more closely.
3)
Message boards :
Number crunching :
SIGILL on Minimum Discriminant Septics WUs
(Message 2039)
Posted 7 May 2018 by mugnaio [TNAA] Post: "I will look into this when I get home later." This same host (ID 87320) successfully processed some of those WUs and many others in the last 2 days. Many of the WUs were resent and completed successfully, others are still in progress, and some others failed for different reasons, so I think you may be right. If I'm not mistaken, this is the stock 7.6 client for Debian 9, running on an AMD Ryzen. Feel free to ask if you need more information.
4)
Message boards :
Number crunching :
SIGILL on Minimum Discriminant Septics WUs
(Message 2035)
Posted 7 May 2018 by mugnaio [TNAA] Post: Yesterday I noticed that many (>80) Septics WUs ended in error. Looking at the stderr output of some of them, I saw that they all failed with the same error:
    SIGILL: illegal instruction
    Stack trace (4 frames):
    [0x40a132] [0xa7f370] [0x7d84c2] [0x7ffef7767c50]
(taken from https://numberfields.asu.edu/NumberFields/result.php?resultid=26488960). The last address in the stack trace changes from one result to another, while the first three seem to be always the same. Sometimes (e.g. https://numberfields.asu.edu/NumberFields/result.php?resultid=26492750) the message points to a bug:
    SIGILL: illegal instruction
    *** bug in PARI/GP (Segmentation Fault), please report.
    *** Error in the PARI system. End of program.
The run time at which the WUs fail ranges from 200 to 20000 seconds.
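The comparison described in this post (same leading frames, different last address) can be checked mechanically once several stderr outputs have been saved. Below is a minimal sketch, assuming the stderr text has the shape quoted in the post. The helper names `parse_frames` and `common_prefix` are made up for illustration, and the last address in `sample_b` is a hypothetical value, not taken from any real result.

```python
import re

# Matches one bracketed hex address such as [0x40a132].
FRAME_RE = re.compile(r"\[(0x[0-9a-fA-F]+)\]")


def parse_frames(stderr_text):
    """Return the frame addresses that follow a 'Stack trace' header."""
    _, _, tail = stderr_text.partition("Stack trace")
    return FRAME_RE.findall(tail)


def common_prefix(traces):
    """Return the leading frames that are identical in every trace."""
    prefix = []
    for frames in zip(*traces):
        if len(set(frames)) != 1:
            break
        prefix.append(frames[0])
    return prefix


# First sample: the trace quoted in the post.
sample_a = (
    "SIGILL: illegal instruction\n"
    "Stack trace (4 frames):\n"
    "[0x40a132] [0xa7f370] [0x7d84c2] [0x7ffef7767c50]\n"
)
# Second sample: same first three frames, HYPOTHETICAL last address.
sample_b = (
    "SIGILL: illegal instruction\n"
    "Stack trace (4 frames):\n"
    "[0x40a132] [0xa7f370] [0x7d84c2] [0x7ffd12345678]\n"
)

traces = [parse_frames(sample_a), parse_frames(sample_b)]
print("shared leading frames:", common_prefix(traces))
```

With real data, each sample would be the stderr text copied from a failed result page; a shared prefix across many results suggests the crash always happens at the same place in the application.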