Process got signal 11

Profile DoctorNow
Joined: 19 Aug 11
Posts: 10
Credit: 1,001,866
RAC: 0
Message 48 - Posted: 23 Aug 2011, 4:56:05 UTC

I haven't had this error for a long, long time, but trying this project out on my quad with openSuse 11.2 and BOINC version 6.6.37, I get nothing but the same error message.
Two examples:
23350, 17248
I wonder what could be wrong here; I don't have that problem with any other project anymore.

Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
My BOINC-Stats
Profile DoctorNow
Message 54 - Posted: 24 Aug 2011, 5:55:53 UTC

Strange, I got the same problem on my X2 under openSuse 11.2 as well:
29755, 29735
Does no one know what could cause this?
On what Linux distro was the app compiled? The only thing I can think of is that it simply does not work with my openSuse distribution...
Profile Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,667,438
RAC: 287,904
Message 56 - Posted: 24 Aug 2011, 7:39:04 UTC - in response to Message 54.  

Yes, that is strange. The app was compiled on Fedora Core 15 with gcc as a static app. It looks like it's getting the signal 11 before it even starts the main part of the app. Maybe it's an issue with the manager trying to initialize the app. The app uses the server stable version of the boinc code which is fairly new (and stable).
AMDave

Joined: 19 Aug 11
Posts: 7
Credit: 100,106
RAC: 0
Message 58 - Posted: 24 Aug 2011, 9:21:44 UTC
Last modified: 24 Aug 2011, 9:37:34 UTC

For your Windows machine, I can't suggest anything other than a project reset (see below).

But for your Linux machine [edit- if you are running a 64-bit Linux kernel -edit], we cannot rule out the ia32lib dependency if the client app uses a 'wrapper'.
The client app may be compiled in 64-bit, but we have witnessed wrappers around 64-bit client apps that still have a 32-bit dependency.
This can be resolved by installing ia32lib.

But that would only apply if ALL of your WUs are failing on the Linux box.
If it is only some of them, then it could be some extraordinary parameters in the WU that cause the client app to SIGSEGV.

Reference

In the latter case (intermittent signal 11's) try resetting the project (or detach and re-attach) to get a 'clean' set of the latest project files.
Apparently this can occur when files get 'mixed up'.
A couple of days ago our machines were processing work with 3 different versions of the client app at the same time (1.01, 1.02 and 1.03).
That could have done it.

Reference

HTH
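
One way to check the 32-bit question directly is to read the ELF class byte of the downloaded binary (and of any wrapper sitting next to it). The sketch below is a minimal Python illustration, not part of the project's tooling. The function names are made up for this note, and it only inspects byte 4 of the ELF header (1 means 32-bit, 2 means 64-bit); it says nothing about which libraries a binary needs.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# Values of e_ident[EI_CLASS] as defined by the ELF format.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Classify the first bytes of a file as a 32-bit or 64-bit ELF image."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    try:
        return ELF_CLASSES[header[4]]
    except KeyError:
        raise ValueError(f"unknown ELF class byte: {header[4]}") from None


def elf_class_of_file(path: str) -> str:
    """Read just enough of a file on disk to classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(5))


if __name__ == "__main__":
    # Print the class of every file named on the command line.
    for path in sys.argv[1:]:
        print(path, elf_class_of_file(path))
```

Saved under a hypothetical name such as elfclass.py, it could be run against every executable in the project directory to see whether a 32-bit file is mixed in.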
Profile DoctorNow
Message 59 - Posted: 24 Aug 2011, 10:36:30 UTC - in response to Message 58.  

Eric Driver wrote:
The app was compiled on Fedora Core 15 with gcc as a static app.

I'm no Linux expert, but shouldn't this be dynamic? I don't know, maybe I'm mixing this up with libraries.

AMDave wrote:
For your windows machine, I can't suggest anything other than a project reset (see below).

No problem crunching WUs on Windows; only the Linux 64-bit app causes problems, on both computers. My AMD X2 has Win XP 32-bit and openSuse 11.2 64-bit on it; my AMD X4 has three OSes: Win Vista, Win 7 and openSuse 11.2, all 64-bit.
So I'm currently crunching only with my 32-bit box here.

AMDave wrote:
But for your linux machine [edit- if you are running 64-bit linux kernel -edit], we cannot rule out the ia32lib dependency if the client app uses a 'wrapper'.
The client app may be compiled in 64-bit but we have witnessed wrappers around 64-bit client apps that still have a 32-bit dependency.
This can be resolved by installing ia32lib

I think I have installed both the 64-bit libs AND the 32-bit libs on BOTH machines, as I have already crunched other projects on Linux that only worked with 32-bit. But I will check it later today to be on the safe side; maybe something broke there. If it isn't that, I don't know what other libraries I should install.
Besides that, I saw that the app being downloaded is an x86_64 one, so there is probably no 32-bit lib necessary anyway, is there?

AMDave wrote:
But that would only be if ALL of your wu's are failing on the linux box.

Yep, exactly that is the case, on both machines. :-(

AMDave wrote:
In the latter case (intermittent signal 11's) try resetting the project (or detach and re-attach) to get a 'clean' set of the latest project files.
Apparently this can occur when files get 'mixed up'.

I don't think anything was mixed up, as I'm not crunching much on Linux, and this was a completely new attach to NumberFields on both computers, just to see how it worked in comparison to the 32-bit Win app, especially with the credits.

Well, it doesn't matter; if I don't get this to work on Linux I will wait for a Windows 64-bit app. I hope there will be one?
AMDave

Message 60 - Posted: 24 Aug 2011, 12:53:19 UTC - in response to Message 59.  

Thanks for the quick reply.
I'm out of ideas then.
Your experience makes it more likely to be some extraordinary WU parameters causing the client app to crash then.
I checked my results and I have had a few errors on 64-bit Linux and Win7, but not many.
Perhaps admin can query the results database and see if there are any trends in errors.
Sounds like fun ;)
Profile Eric Driver
Message 62 - Posted: 24 Aug 2011, 16:46:16 UTC - in response to Message 59.  

DoctorNow wrote:
I'm no linux expert but shouldn't this be dynamic? Don't know, maybe I'm mixing this up with libraries.

I think it should be static, because then you don't have to worry about different users having different versions of libraries, which can cause crashes.

DoctorNow wrote:
Well, doesn't matter, if I don't get this to work on Linux I will wait for a Windows 64-bit app. I hope there will be one?

Yes, we are hoping to get that ported to 64-bit Windows, as well as 32-bit Linux and Mac.

AMDave wrote:
Your experience makes it more likely to be some extraordinary WU parameters causing the client app to crash then.
I checked my results and I have had a few errors on 64-bit Linux and Win7, but not many.
Perhaps admin can query the results database and see if there are any trends in errors.
Sounds like fun ;)

Yes, that's a good idea. I plan to look into these computation errors starting on Friday, when I will finally have some time. I have noticed a handful of failures on my own machines too, so it's not just certain hosts. I ran one of those failures standalone and it completed just fine, so this might be a hard problem to track down.

Eric
glennpat

Joined: 23 Aug 11
Posts: 5
Credit: 84,950,248
RAC: 0
Message 68 - Posted: 28 Aug 2011, 21:15:47 UTC

I was getting the same error (signal 11) on my older openSuse versions. I don't remember which versions they were. I moved up to 11.4 and it worked. I do have one computer that was on 11.3 and it is working fine at that level, but it also has an Intel processor instead of AMD Athlon 64 X2 processors.
Profile Eric Driver
Message 70 - Posted: 29 Aug 2011, 3:14:01 UTC - in response to Message 68.  

Good to know. Thank You!
Dagorath
Joined: 2 Sep 11
Posts: 57
Credit: 1,274,345
RAC: 0
Message 102 - Posted: 2 Sep 2011, 9:24:40 UTC - in response to Message 62.  

DoctorNow wrote:
I'm no linux expert but shouldn't this be dynamic? Don't know, maybe I'm mixing this up with libraries.

Eric Driver wrote:
I think it should be static, because then you dont have to worry about different users having different versions of libraries, which can cause crashes.

I agree. If your app were being installed by a package manager, then any missing shared libs would be installed as needed, but it's not being installed via a package manager. Leave it static, please. It's not using a lot of memory, so it's not a problem.

BOINC FAQ Service
Official BOINC wiki
Installing BOINC on Linux
Profile Jeff17
Joined: 17 Aug 11
Posts: 3
Credit: 4,606,840
RAC: 0
Message 214 - Posted: 21 Sep 2011, 4:47:42 UTC
Last modified: 21 Sep 2011, 4:48:40 UTC

I am having similar problems on my RHEL5 x64 hosts: all tasks end almost immediately with "signal 11" errors. You said you compiled on Fedora 15, so I guessed the issue might be related to a newer glibc. So I tested on my RHEL6 host and it ran its first task with no problems. The next task is not running as quickly but didn't die immediately, so it looks good.

Doing some testing on the RHEL5 computer
% ldd GetBoundedDecics_1.07_x86_64-pc-linux-gnu
not a dynamic executable

% ./GetBoundedDecics_1.07_x86_64-pc-linux-gnu
FATAL: kernel too old
Segmentation fault (core dumped)

RHEL5
glibc-2.5
kernel-2.6.18

RHEL6
glibc-2.12
kernel-2.6.32

I am not positive what the difference is that matters but something changed enough between older and newer versions of the Linux distros that older ones cannot run this static executable.

From other reports here and my tests it looks like:
openSuse 11.2 - does not work
openSuse 11.3 & 11.4 - works
RHEL 5 - does not work
RHEL 6 - works
Fedora 15 - works
Fedora 12,13,14 - assume works based on RHEL6 working

Can anyone else confirm/deny my version conclusions?
If this information is correct, can it be documented somewhere by the project admins?
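
The list above suggests the deciding factor may be the kernel version rather than the distro name. A minimal Python sketch of that comparison follows. The threshold is an assumption: the reports so far only bracket it (2.6.18 fails, 2.6.32 works), so `ASSUMED_MINIMUM` is a placeholder and not a documented requirement of the app. The function names are invented for illustration.

```python
import platform
import re

# Placeholder threshold, NOT a confirmed requirement of the application.
ASSUMED_MINIMUM = (2, 6, 32)


def parse_kernel_release(release: str) -> tuple:
    """Turn a release string such as '2.6.20-15-generic' into (2, 6, 20)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Missing components (e.g. '3.0') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def kernel_at_least(release: str, minimum: tuple) -> bool:
    """True if the release is the same as or newer than the minimum."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    verdict = "ok" if kernel_at_least(running, ASSUMED_MINIMUM) else "too old"
    print(running, verdict)
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "2.6.9" would sort after "2.6.32".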
Dagorath
Message 215 - Posted: 21 Sep 2011, 5:43:21 UTC - in response to Message 214.  

Profile Conan
Joined: 3 Sep 11
Posts: 30
Credit: 7,701,817
RAC: 3,703
Message 216 - Posted: 21 Sep 2011, 13:26:52 UTC

Also working on Fedora 11 (my Fedora 14 machine, like Dagorath's, is also running fine).

Conan
Profile DoctorNow
Message 217 - Posted: 21 Sep 2011, 14:04:55 UTC - in response to Message 214.  

glennpat wrote:
I moved up to 11.4 and it worked...

Well, I finally decided to move up to 11.4 as well, because my previous Suse version is not supported anymore anyway. I wasn't sure about the upgrade because I feared losing all the settings, bookmarks etc. that I had set up before, but it seems I did the upgrade the correct way and all is fine. :-)

Jeff17 wrote:
openSuse 11.3 & 11.4 - works

Nevertheless, I wish I could run the project now.
I can't even get BOINC running on 11.4 and I don't know what's wrong.
I have always installed BOINC with the sh command on the desktop and then run BOINC and BOINCmgr by clicking on the icons.
But it looks like this doesn't work anymore with 11.4, not even with a startup script, which I have already tried.
In the system monitor I can see that BOINC remains in memory when I click it, but the BOINC manager itself simply doesn't start. Permissions etc. are all set up correctly.
I don't get it...
Profile Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 219 - Posted: 21 Sep 2011, 18:15:07 UTC - in response to Message 214.  

Jeff17 wrote:
I am having similar problems on my RHEL5 x64 hosts,
...
% ./GetBoundedDecics_1.07_x86_64-pc-linux-gnu
FATAL: kernel too old
Segmentation fault (core dumped)


We fixed the issue that was causing some Suse distros to fail. However, it looks like we are using a syscall that was added since kernel 2.6.18. I'm not sure which one or if we can avoid it. Looking through the kernel git log for the syscall table I think the last one was added in 2008. Perhaps you could run an strace on your RHEL5 system to see which syscall fails and I can look up when it was added.
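
If an strace log does show a syscall that the old kernel lacks, it would normally appear as a call returning -1 with ENOSYS ("Function not implemented"). The following Python sketch filters a saved log for such lines. It assumes the usual strace line layout, `name(args) = -1 ERRNO (description)`; the function names and the idea of passing a log file on the command line are hypothetical.

```python
import re
import sys

# Matches strace lines for calls that failed, optionally prefixed by "[pid N]".
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?P<name>\w+)\(.*\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<text>.*)\)\s*$"
)


def failed_syscalls(strace_output: str):
    """Yield (syscall, errno_name, description) for each call that returned -1."""
    for line in strace_output.splitlines():
        match = FAILED_CALL.match(line)
        if match:
            yield match.group("name"), match.group("errno"), match.group("text")


def unimplemented_syscalls(strace_output: str) -> list:
    """Names of syscalls the kernel rejected with ENOSYS, sorted and de-duplicated."""
    return sorted(
        {name for name, errno_name, _ in failed_syscalls(strace_output)
         if errno_name == "ENOSYS"}
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: first save a log with `strace -o trace.log ./app`, then pass trace.log.
    with open(sys.argv[1]) as handle:
        print(unimplemented_syscalls(handle.read()))
```

An empty result would suggest the crash is not caused by a missing syscall, at least not one that strace could observe before the process died.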

Dagorath
Message 221 - Posted: 21 Sep 2011, 19:43:10 UTC - in response to Message 217.  
Last modified: 21 Sep 2011, 19:43:37 UTC

DoctorNow wrote:
Nevertheless, I wish I could run the project now.
I don't even get BOINC running on 11.4 and I don't know what's wrong.
I always installed BOINC with the sh command on the desktop and then runned BOINC and BOINCmgr by clicking on the icons.
But it looks like this doesn't work anymore with 11.4, not even with a startup script which I already tried.
On the sys monitor I can see that BOINC remains in memory when I click it but the BOINCmanager itself simply doesn't start. Rights etc. are all set up correct.
I don't get it...


If you installed BOINC 6.12.x you can run into this problem. The client and manager binaries are no longer static builds so you can be missing some shared libraries, depending on your Linux distro and version.

It sounds like BOINC client (boinc in sys monitor) is OK for you. It's just BOINC manager that needs some additional shared libraries. Do you know how to identify which libraries it needs and how to find and install them?

Another, easier, fix for this problem is to use the manager from BOINC 6.10.58. Or install BOINC from repos but you don't want to do that if you use a GPU for crunching.
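
Missing shared libraries can usually be listed with `ldd`, which prints `=> not found` next to any library the loader cannot resolve. A small Python sketch that wraps this is below. The helper names are invented for illustration, and the path to the manager binary is an assumption that depends on where BOINC was installed.

```python
import subprocess
import sys


def missing_libraries(ldd_output: str) -> list:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


def missing_libraries_of(binary_path: str) -> list:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True
    )
    return missing_libraries(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 missing_libs.py /path/to/boincmgr  (path is a placeholder)
    for library in missing_libraries_of(sys.argv[1]):
        print(library)
```

Each name it prints can then be looked up in the distro's package search to find the package that provides it.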
Profile DoctorNow
Message 223 - Posted: 22 Sep 2011, 5:39:46 UTC - in response to Message 221.  

Dagorath wrote:
If you installed BOINC 6.12.x you can run into this problem. The client and manager binaries are no longer static builds so you can be missing some shared libraries, depending on your Linux distro and version.

It sounds like BOINC client (boinc in sys monitor) is OK for you. It's just BOINC manager that needs some additional shared libraries. Do you know how to identify which libraries it needs and how to find and install them?

Another, easier, fix for this problem is to use the manager from BOINC 6.10.58. Or install BOINC from repos but you don't want to do that if you use a GPU for crunching.

Thank you for the tips.
Looks like I found the problem myself by accident!
I'm (mostly) using a self-compiled, optimized BOINC client for Suse from a team mate; the last one (6.10.60) was one of those as well.
But "inspired" by your 6.10.58 advice I now downloaded and tried the official BOINC version, and this one works!
I don't know what my team mate did wrong there; normally it should work as well...
fractal

Joined: 12 Jul 12
Posts: 9
Credit: 10,000,929
RAC: 0
Message 687 - Posted: 13 Jul 2012, 1:09:02 UTC - in response to Message 219.  

Jeff17 wrote:
I am having similar problems on my RHEL5 x64 hosts,
...
% ./GetBoundedDecics_1.07_x86_64-pc-linux-gnu
FATAL: kernel too old
Segmentation fault (core dumped)


Greg Tucker wrote:
We fixed the issue that was causing some Suse distros to fail. However, it looks like we are using a syscall that was added since kernel 2.6.18. I'm not sure which one or if we can avoid it. Looking through the kernel git log for the syscall table I think the last one was added in 2008. Perhaps you could run an strace on your RHEL5 system to see which syscall fails and I can look up when it was added.


boinc@plum:~/BOINC/projects/numberfields.asu.edu_NumberFields$ ./GetBoundedDecics_2.03_x86_64-pc-linux-gnu
FATAL: kernel too old
Segmentation fault (core dumped)
boinc@plum:~/BOINC/projects/numberfields.asu.edu_NumberFields$ strace ./GetBoundedDecics_2.03_x86_64-pc-linux-gnu
execve("./GetBoundedDecics_2.03_x86_64-pc-linux-gnu", ["./GetBoundedDecics_2.03_x86_64-p"...], [/* 19 vars */]) = 0
uname({sys="Linux", node="plum", ...}) = 0
open("/dev/tty", O_RDWR|O_NONBLOCK|O_NOCTTY) = 3
writev(3, [{"FATAL: kernel too old\n", 22}], 1FATAL: kernel too old
) = 22
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aece8177000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++
Process 12651 detached
boinc@plum:~/BOINC/projects/numberfields.asu.edu_NumberFields$ uname -a
Linux plum 2.6.20-15-generic #2 SMP Sun Apr 15 06:17:24 UTC 2007 x86_64 GNU/Linux

-----

I will upgrade that machine to a modern kernel now that I found this thread.
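
In this trace the program stops right after `uname`, before any syscall fails, which is consistent with a startup version check rather than a missing syscall. As far as I know, glibc-linked binaries normally carry a GNU ABI-tag ELF note recording the minimum kernel they were built for (the value that `file` reports as "for GNU/Linux x.y.z"). The Python sketch below reads that note from a 64-bit little-endian ELF file. It is an illustration under those assumptions, with a made-up function name, and has not been run against the project's binary.

```python
import struct
import sys

PT_NOTE = 4          # program header type for note segments
NT_GNU_ABI_TAG = 1   # note type used for the GNU ABI tag
ELF_NOTE_OS_LINUX = 0


def minimum_kernel(elf: bytes):
    """Return (major, minor, patch) from the GNU ABI-tag note, or None if absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    (phoff,) = struct.unpack_from("<Q", elf, 0x20)           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", elf, 0x36)  # e_phentsize, e_phnum
    for index in range(phnum):
        base = phoff + index * phentsize
        (p_type,) = struct.unpack_from("<I", elf, base)
        if p_type != PT_NOTE:
            continue
        (p_offset,) = struct.unpack_from("<Q", elf, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", elf, base + 32)
        pos, end = p_offset, p_offset + p_filesz
        while pos + 12 <= end:
            namesz, descsz, note_type = struct.unpack_from("<III", elf, pos)
            name_start = pos + 12
            # Name and descriptor are each padded to a 4-byte boundary.
            desc_start = name_start + (namesz + 3) // 4 * 4
            name = elf[name_start:name_start + namesz]
            if name == b"GNU\x00" and note_type == NT_GNU_ABI_TAG and descsz >= 16:
                os_id, major, minor, patch = struct.unpack_from("<IIII", elf, desc_start)
                if os_id == ELF_NOTE_OS_LINUX:
                    return (major, minor, patch)
            pos = desc_start + (descsz + 3) // 4 * 4
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        print(minimum_kernel(handle.read()))
```

If the value it reports is newer than the kernel shown by `uname -r`, upgrading the kernel, as planned above, is the likely fix.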

Copyright © 2024 Arizona State University