Process got signal 11
Author | Message |
---|---|
Joined: 19 Aug 11 Posts: 10 Credit: 1,001,866 RAC: 0 |
I haven't had this error for a long, long time, but trying this project out on my quad with openSuse 11.2 and BOINC version 6.6.37, I get nothing but the same error message. Two examples: 23350, 17248. I wonder what could be wrong here; I don't have that problem with any other project anymore. Life is Science, and Science rules. To the universe and beyond Proud member of BOINC@Heidelberg My BOINC-Stats |
Joined: 19 Aug 11 Posts: 10 Credit: 1,001,866 RAC: 0 |
Strange, I got the same problem on my X2 under openSuse 11.2 as well: 29755, 29735. Does no one know what could cause this? On which Linux distro was the app compiled? The only thing I can think of is that it simply doesn't work with my openSuse distribution... |
Joined: 8 Jul 11 Posts: 1326 Credit: 414,630,572 RAC: 263,122 |
Yes, that is strange. The app was compiled on Fedora Core 15 with gcc as a static app. It looks like it's getting the signal 11 before it even starts the main part of the app. Maybe it's an issue with the manager trying to initialize the app. The app uses the server stable version of the boinc code which is fairly new (and stable). |
Joined: 19 Aug 11 Posts: 7 Credit: 100,106 RAC: 0 |
For your Windows machine, I can't suggest anything other than a project reset (see below). But for your Linux machine [edit: if you are running a 64-bit Linux kernel], we cannot rule out the ia32lib dependency if the client app uses a 'wrapper'. The client app may be compiled as 64-bit, but we have seen wrappers around 64-bit client apps that still have a 32-bit dependency. This can be resolved by installing ia32lib (the 32-bit compatibility libraries; the package is named ia32-libs on Debian and Ubuntu). But that would only apply if ALL of your WUs are failing on the Linux box. If only some of them fail, it could be some extraordinary parameters in the WU that cause the client app to SIGSEGV. In the latter case (intermittent signal 11s), try resetting the project (or detaching and re-attaching) to get a 'clean' set of the latest project files. Apparently this can occur when files get 'mixed up'. A couple of days ago our machines were processing work with 3 different versions of the client app at the same time (1.01, 1.02 and 1.03). That could have done it. HTH |
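One way to check the 32-bit question raised here is to look at the ELF class that `file` reports for each binary in the project directory. This is only a minimal sketch, assuming the `file` utility is installed; the `elf_class` helper and the `PROJECT_DIR` variable are illustrative names, not part of BOINC.

```shell
#!/bin/sh
# Classify a description string printed by `file -b` as 32-bit or 64-bit ELF.
elf_class() {
    case "$1" in
        *"ELF 32-bit"*) echo 32 ;;
        *"ELF 64-bit"*) echo 64 ;;
        *)              echo unknown ;;
    esac
}

# PROJECT_DIR is a hypothetical variable; point it at the BOINC project folder.
dir="${PROJECT_DIR:-.}"
for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Print each file name with its detected ELF class.
    printf '%s: %s\n' "$f" "$(elf_class "$(file -b "$f" 2>/dev/null)")"
done
```

Any file reported as 32 on a 64-bit system would need the 32-bit compatibility libraries; if everything is reported as 64, that dependency is unlikely to be the cause.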
Joined: 19 Aug 11 Posts: 10 Credit: 1,001,866 RAC: 0 |
"The app was compiled on Fedora Core 15 with gcc as a static app." I'm no Linux expert, but shouldn't this be dynamic? I don't know, maybe I'm mixing this up with libraries. "For your windows machine, I can't suggest anything other than a project reset (see below)." There's no problem crunching WUs on Windows; only the 64-bit Linux app causes problems, on both computers. My AMD X2 has Win XP 32-bit and openSuse 11.2 64-bit on it; my AMD X4 has three OSes: Win Vista, Win 7 and openSuse 11.2, all 64-bit. So I'm currently crunching only with my 32-bit box here. "But for your linux machine [edit- if you are running 64-bit linux kernel -edit], we cannot rule out the ia32lib dependency if the client app uses a 'wrapper'." I think I have installed both the 64-bit libs AND the 32-bit libs on BOTH machines, as I have already crunched other projects on Linux that only worked with 32-bit. But I will check later today to be on the safe side; maybe something broke there. If it isn't that, I don't know what other libraries I should install. Besides, I saw that the downloaded app is an x86_64 app, so probably no 32-bit lib is necessary anyway, is it? "But that would only be if ALL of your wu's are failing on the linux box." Yep, exactly that is the case, on both machines. :-( "In the latter case (intermittent signal 11's) try resetting the project (or detach and re-attach) to get a 'clean' set of the latest project files." I don't think anything was mixed up, as I'm not crunching much on Linux, and this was a completely new attach to NumberFields on both computers just to see how it worked in comparison to the 32-bit Win app, especially with the credits. Well, it doesn't matter; if I don't get this to work on Linux, I will wait for a Windows 64-bit app. I hope there will be one? |
Joined: 19 Aug 11 Posts: 7 Credit: 100,106 RAC: 0 |
Thanks for the quick reply. I'm out of ideas then. Your experience makes it more likely that some extraordinary WU parameters are causing the client app to crash. I checked my results and I have had a few errors on 64-bit Linux and Win7, but not many. Perhaps the admin can query the results database and see if there are any trends in the errors. Sounds like fun ;) |
Joined: 8 Jul 11 Posts: 1326 Credit: 414,630,572 RAC: 263,122 |
"I'm no linux expert but shouldn't this be dynamic? Don't know, maybe I'm mixing this up with libraries." I think it should be static, because then you don't have to worry about different users having different versions of libraries, which can cause crashes. "Well, doesn't matter, if I don't get this to work on Linux I will wait for a Windows 64-bit app. I hope there will be one?" Yes, we are hoping to port it to 64-bit Windows, as well as 32-bit Linux and Mac. "Your experience makes it more likely to be some extraordinary WU parameters causing the client app to crash then." Yes, that's a good idea. I plan to look into these computation errors starting on Friday, when I will finally have some time. I have noticed a handful of failures on my own machines too, so it's not just certain hosts. I ran one of those failures standalone and it completed just fine, so this might be a hard problem to track down. Eric |
Joined: 23 Aug 11 Posts: 5 Credit: 84,950,248 RAC: 0 |
I was getting the same error (signal 11) on my older openSuse versions. I don't remember which versions they were. I moved up to 11.4 and it worked. I do have one computer that was on 11.3 and it is working fine at that level, but it also has an Intel processor instead of AMD Athlon 64 X2 processors. |
Joined: 8 Jul 11 Posts: 1326 Credit: 414,630,572 RAC: 263,122 |
Good to know. Thank You! |
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
"I'm no linux expert but shouldn't this be dynamic? Don't know, maybe I'm mixing this up with libraries." I agree. If your app were being installed by a package manager, then any missing shared libs would be installed as needed, but it's not being installed via a package manager. Leave it static, please. It's not using a lot of memory, so it's not a problem. BOINC FAQ Service Official BOINC wiki Installing BOINC on Linux |
Joined: 17 Aug 11 Posts: 3 Credit: 4,606,840 RAC: 0 |
I am having similar problems on my RHEL5 x64 hosts: all tasks end almost immediately with "signal 11" errors. You said you compiled on Fedora 15, so I guessed the issue might be related to a newer glibc. So I tested on my RHEL6 host, and it ran its first task with no problems. The next task is not running as quickly but didn't die immediately, so it looks good. Doing some testing on the RHEL5 computer: % ldd GetBoundedDecics_1.07_x86_64-pc-linux-gnu not a dynamic executable % ./GetBoundedDecics_1.07_x86_64-pc-linux-gnu FATAL: kernel too old Segmentation fault (core dumped) Versions: RHEL5 has glibc-2.5 and kernel-2.6.18; RHEL6 has glibc-2.12 and kernel-2.6.32. I am not positive which difference matters, but something changed enough between older and newer versions of the Linux distros that older ones cannot run this static executable. From other reports here and my tests it looks like: openSuse 11.2 - does not work; openSuse 11.3 & 11.4 - works; RHEL 5 - does not work; RHEL 6 - works; Fedora 15 - works; Fedora 12, 13, 14 - assume works based on RHEL6 working. Can anyone else confirm or deny my version conclusions? If this information is correct, can it be documented somewhere by the project admins? |
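The "FATAL: kernel too old" message is printed by glibc's startup code when the running kernel is older than the minimum version the C library was built for. That minimum is recorded in the binary's `.note.ABI-tag` ELF note, which `readelf -n` (and the "for GNU/Linux x.y.z" part of `file` output) will show. The sketch below compares that value with the running kernel. The helper names and the `APP` variable are hypothetical, and the `2.6.32` default is only an illustrative value borrowed from the RHEL6 kernel mentioned above, not a confirmed requirement of this app.

```shell
#!/bin/sh
# Return success if version $1 >= version $2 (numeric compare on x.y.z fields).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" \
        | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$2" ]
}

# Strip a distro suffix such as "-15-generic" from `uname -r` output.
kernel_base() {
    printf '%s\n' "${1%%-*}"
}

# Extract the minimum kernel version from `readelf -n` output on stdin,
# which contains a line like "OS: Linux, ABI: 2.6.32".
min_kernel_from_note() {
    sed -n 's/.*ABI: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

required="2.6.32"   # illustrative default, an assumption
# APP is a hypothetical variable holding the path to the science app binary.
if [ -n "${APP:-}" ] && [ -f "$APP" ]; then
    found="$(readelf -n "$APP" 2>/dev/null | min_kernel_from_note)"
    if [ -n "$found" ]; then required="$found"; fi
fi

running="$(kernel_base "$(uname -r)")"
if version_ge "$running" "$required"; then
    echo "kernel $running meets the $required minimum"
else
    echo "kernel $running is older than $required; expect 'FATAL: kernel too old'"
fi
```

Run with `APP=./GetBoundedDecics_1.07_x86_64-pc-linux-gnu` set, this would read the requirement from the binary itself rather than relying on the default.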
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
Works on Fedora 14 and Ubuntu 10.10. |
Joined: 3 Sep 11 Posts: 30 Credit: 8,040,545 RAC: 5,399 |
Also working on Fedora 11 (my Fedora 14 machine, like Dagorath's, is also running fine). Conan |
Joined: 19 Aug 11 Posts: 10 Credit: 1,001,866 RAC: 0 |
glennpat wrote: "I moved up to 11.4 and it worked..." Well, I finally decided to move up to 11.4 as well, because my previous Suse version is no longer supported anyway. I wasn't sure about the upgrade because I feared losing all the settings, bookmarks etc. that I had set up before, but it seems I did the upgrade the correct way and all is fine. :-) Jeff17 wrote: "openSuse 11.3 & 11.4 - works" Nevertheless, I wish I could run the project now. I can't even get BOINC running on 11.4 and I don't know what's wrong. I have always installed BOINC with the sh command on the desktop and then run BOINC and BOINCmgr by clicking on the icons. But it looks like this doesn't work anymore with 11.4, not even with a startup script, which I have already tried. In the system monitor I can see that BOINC remains in memory when I click it, but the BOINC manager itself simply doesn't start. Permissions etc. are all set up correctly. I don't get it... |
Joined: 8 Jul 11 Posts: 46 Credit: 7,144,042 RAC: 0 |
"I am having similar problems on my RHEL5 x64 hosts," We fixed the issue that was causing some Suse distros to fail. However, it looks like we are using a syscall that was added after kernel 2.6.18. I'm not sure which one, or whether we can avoid it. Looking through the kernel git log for the syscall table, I think the last one was added in 2008. Perhaps you could run strace on your RHEL5 system to see which syscall fails, and I can look up when it was added. |
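For the strace request above, one approach is to save the trace with `strace -o trace.log ./<app binary>` and then look at the calls made just before the crash. The following is a rough sketch only; `calls_before_segv` and the `TRACE_LOG` variable are made-up names for illustration.

```shell
#!/bin/sh
# Print the last N lines of an strace log (on stdin) that precede the
# first SIGSEGV marker, i.e. the final syscalls before the crash.
calls_before_segv() {
    n="${1:-5}"
    awk '/--- SIGSEGV/ { exit } { print }' | tail -n "$n"
}

# TRACE_LOG is a hypothetical variable; it defaults to trace.log as
# written by: strace -o trace.log ./<app binary>
log="${TRACE_LOG:-trace.log}"
if [ -f "$log" ]; then
    calls_before_segv 5 < "$log"
fi
```

If the last lines show a syscall returning an error such as ENOSYS, that call is the likely culprit; if they show only a `uname` call followed by a write of an error message, the check is happening in the C library rather than in a missing syscall.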
Joined: 2 Sep 11 Posts: 57 Credit: 1,274,345 RAC: 0 |
"Nevertheless, I wish I could run the project now." If you installed BOINC 6.12.x, you can run into this problem. The client and manager binaries are no longer static builds, so you can be missing some shared libraries, depending on your Linux distro and version. It sounds like the BOINC client (boinc in the system monitor) is OK for you. It's just the BOINC manager that needs some additional shared libraries. Do you know how to identify which libraries it needs and how to find and install them? Another, easier fix for this problem is to use the manager from BOINC 6.10.58. Or install BOINC from the repos, but you don't want to do that if you use a GPU for crunching. |
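To identify which shared libraries the manager is missing, `ldd` lists every dependency and marks unresolved ones as "not found". A small sketch, assuming the manager binary is called `boincmgr` and sits in the current directory; the `missing_libs` helper and the `BOINCMGR` variable are illustrative names.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the names of unresolved libraries.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# BOINCMGR is a hypothetical variable; the default path is an assumption.
mgr="${BOINCMGR:-./boincmgr}"
if [ -x "$mgr" ]; then
    ldd "$mgr" 2>/dev/null | missing_libs
fi
```

Each name printed can then be looked up in the distro's package search to find the package that provides it.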
Joined: 19 Aug 11 Posts: 10 Credit: 1,001,866 RAC: 0 |
"If you installed BOINC 6.12.x you can run into this problem. The client and manager binaries are no longer static builds so you can be missing some shared libraries, depending on your Linux distro and version." Thank you for the tips. It looks like I found the problem myself by accident! I'm (mostly) using a self-compiled, optimized BOINC client for Suse from a teammate, and the last one (6.10.60) was one of those too. But "inspired" by your 6.10.58 advice, I downloaded and tried the official BOINC version, and this one works! I don't know what my teammate did wrong there; normally it should work as well... |
Joined: 12 Jul 12 Posts: 9 Credit: 10,000,929 RAC: 0 |
"I am having similar problems on my RHEL5 x64 hosts," boinc@plum:~/BOINC/projects/numberfields.asu.edu_NumberFields$ ./GetBoundedDecics_2.03_x86_64-pc-linux-gnu FATAL: kernel too old Segmentation fault (core dumped) boinc@plum:~/BOINC/projects/numberfields.asu.edu_NumberFields$ strace ./GetBoundedDecics_2.03_x86_64-pc-linux-gnu execve("./GetBoundedDecics_2.03_x86_64-pc-linux-gnu", ["./GetBoundedDecics_2.03_x86_64-p"...], [/* 19 vars */]) = 0 uname({sys="Linux", node="plum", ...}) = 0 open("/dev/tty", O_RDWR|O_NONBLOCK|O_NOCTTY) = 3 writev(3, [{"FATAL: kernel too old\n", 22}], 1FATAL: kernel too old ) = 22 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aece8177000 --- SIGSEGV (Segmentation fault) @ 0 (0) --- +++ killed by SIGSEGV (core dumped) +++ Process 12651 detached boinc@plum:~/BOINC/projects/numberfields.asu.edu_NumberFields$ uname -a Linux plum 2.6.20-15-generic #2 SMP Sun Apr 15 06:17:24 UTC 2007 x86_64 GNU/Linux. I will upgrade that machine to a modern kernel now that I have found this thread. |