Comp errors on 1.05 for Gentoo Linux
Author | Message |
---|---|
Joined: 19 Aug 11 Posts: 9 Credit: 164,423 RAC: 0 |
I get instant compute errors with 1.05 tasks. <core_client_version>6.12.33</core_client_version> |
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
It looks like the error is occurring right after start-up, before the real meat of the code. That's very strange. Could you possibly reset the project to see if the problem goes away? It looks like user Bok might be having the same problem, but other than that it seems to be working fine on other Linux platforms. |
Joined: 19 Aug 11 Posts: 10 Credit: 1,004,925 RAC: 0 |
I just reset the project and still get the same thing, unfortunately. Do you need an strace? |
Joined: 19 Aug 11 Posts: 9 Credit: 164,423 RAC: 0 |
1.03 works, 1.05 doesn't. Not sure what a reset would have to do with that. If it were a dynamic executable I could see what library is missing. There's probably a version mismatch of some library. I seem to be running into this more and more often: apps will run on Ubuntu, but not so much on other distros. I don't think Bok runs Ubuntu either. So, just a guess: perhaps 1.05 was built against whatever library versions the compiling computer's distro is currently at, and other distros aren't using the same library versions? I just had to upgrade 'gmp' on one of my Gentoo boxes to run Yafu. Perhaps some library of the compiling computer for 1.05 got upgraded ahead of other distros? Was it the Fedora computer or some other? I'd be willing to upgrade a library if I knew which one it was. |
Joined: 19 Aug 11 Posts: 10 Credit: 1,004,925 RAC: 0 |
I run CentOS6 on this box. Agree on the library issue, I tried an ldd first as well but with it being dynamic.. If I knew how to put in the arguments I could run strace on it. I tried a few things but they're obviously not correct; strace doesn't seem to give much and fails with: --- SIGFPE (Floating point exception) @ 0 (0) --- +++ killed by SIGFPE (core dumped) +++ Floating point exception (core dumped). I can't tell whether that's because of the incorrect arguments. |
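One way to check whether the binary is statically or dynamically linked, without relying on ldd, is to look at its ELF program headers: a dynamically linked executable carries a PT_INTERP entry naming the runtime loader, and a fully static one does not. The following is a minimal Python sketch, not part of the project's tooling. The function name `elf_link_kind` is made up for illustration, and it only handles 64-bit little-endian ELF files such as the x86_64 build discussed here.

```python
import struct
import sys

PT_DYNAMIC = 2  # program header type: dynamic linking information
PT_INTERP = 3   # program header type: path of the runtime loader


def elf_link_kind(data):
    """Classify a 64-bit little-endian ELF image by its program headers."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    # In the ELF64 header, e_phoff sits at byte 32,
    # e_phentsize at byte 54 and e_phnum at byte 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    types = set()
    for i in range(e_phnum):
        # p_type is the first 4-byte field of every program header.
        (p_type,) = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)
        types.add(p_type)
    if PT_INTERP in types:
        return "dynamic"
    if PT_DYNAMIC in types:
        return "static-pie or shared object"
    return "static"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        print(elf_link_kind(handle.read()))
```

Note that this only describes how the file was linked. It cannot show libraries a program loads later at run time, so a "static" result does not by itself rule out a library mismatch.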
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
With this recent build I did upgrade to the latest version of pari/gp (C library for number theory). When I built this library, it had the option to use gmp. Because it improves performance, I decided to use it. The old executable did not. So this might be the problem. My dynamic version is libgmp.so.3.5.2. With that said, this was supposed to be a static build. I just did a rebuild and verified that the linker is using libgmp.a, the static version. But I guess it's worth a shot, if you want to upgrade to version 3.5.2 of gmp and see if that's the problem. And sorry for causing you guys such a headache. |
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
"I just had to upgrade 'gmp' on one of my Gentoo boxes to run Yafu. Perhaps some library of the compiling computer for 1.05 got upgraded ahead of other distros? Was it the Fedora computer or some other?" Oops, I missed your question the first time. This was built on Fedora 15. |
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
"I run CentOS6 on this box. Agree on the library issue, I tried an ldd first as well but with it being dynamic.." I think you are basically talking about how to run stand-alone. If you can capture one of the WUs, rename it to "in" and put it in the same directory as the binary, then just run the binary (no command-line args). Thanks for taking the time to help me debug this thing! |
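The stand-alone procedure described above (copy a work unit next to the binary as "in", then run the binary with no arguments) could be scripted roughly as follows. This is a sketch under assumptions: the helper names `run_standalone` and `describe_exit` are hypothetical, and the paths passed in are whatever the user has on disk. It relies only on Python's standard library, where a negative return code from `subprocess.run` means the child was killed by that signal number on POSIX systems.

```python
import shutil
import signal
import subprocess
from pathlib import Path


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by a signal.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_standalone(binary, workunit):
    """Copy the work unit next to the binary as 'in' and run the binary there."""
    binary_path = Path(binary).resolve()
    shutil.copy(workunit, binary_path.parent / "in")
    # No command-line arguments: the application reads "in" from its
    # working directory.
    result = subprocess.run([str(binary_path)], cwd=binary_path.parent)
    return describe_exit(result.returncode)
```

A crash like the one reported in this thread would come back as "killed by SIGFPE", which separates a genuine crash from a usage error that merely exits with a non-zero status.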
Joined: 19 Aug 11 Posts: 10 Credit: 1,004,925 RAC: 0 |
3.5.0 is the latest release of gmp available for CentOS6 from the main repos. Whilst I can update to the latest with a build I tend to try and keep to the repos on my database servers (which this is). I'll see if I can find an update on one of the 3rd party repos like atomic. p.s. I tried running strace after renaming the file to 'in' but it didn't give anything more than a floating point exception and core dumped. Anything else to try in the meantime? |
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
"3.5.0 is the latest release of gmp available for CentOS6 from the main repos. Whilst I can update to the latest with a build I tend to try and keep to the repos on my database servers (which this is)." I have no other ideas. Let's wait to see if 7ri9991 can easily upgrade to the same version and if the problem goes away for him. Then we will know for sure if this is the problem. In the meantime, I will look into what I can do on my side. I still don't understand why my static executable is relying on dynamic libraries. If that's the case, what's the point of building a static executable? |
Joined: 19 Aug 11 Posts: 10 Credit: 1,004,925 RAC: 0 |
If it helps at all here is the strace. [root@dbase2 stat.la.asu.edu_NumberFields]# strace ./GetBoundedDecics_1.05_x86_64-pc-linux-gnu execve("./GetBoundedDecics_1.05_x86_64-pc-linux-gnu", ["./GetBoundedDecics_1.05_x86_64-p"...], [/* 22 vars */]) = 0 uname({sys="Linux", node="dbase2", ...}) = 0 brk(0) = 0xea6000 brk(0xea71c0) = 0xea71c0 arch_prctl(ARCH_SET_FS, 0xea68a0) = 0 set_tid_address(0xea6b70) = 16342 set_robust_list(0xea6b80, 0x18) = 0 rt_sigaction(SIGRTMIN, {0x7ee090, [], SA_RESTORER|SA_SIGINFO, 0x7f6620}, NULL, 8) = 0 rt_sigaction(SIGRT_1, {0x7ee120, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f6620}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0 brk(0xec81c0) = 0xec81c0 brk(0xec9000) = 0xec9000 futex(0xbbf260, FUTEX_WAKE_PRIVATE, 2147483647) = 0 open("stderr.txt", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=554, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf4000 fstat(3, {st_mode=S_IFREG|0644, st_size=554, ...}) = 0 lseek(3, 554, SEEK_SET) = 554 dup3(3, 2, 0) = 2 close(3) = 0 munmap(0x7f8d7adf4000, 4096) = 0 rt_sigaction(SIGILL, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGILL, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0 rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGABRT, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0 rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGBUS, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0 rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSEGV, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0 rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSYS, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0 rt_sigaction(SIGPIPE, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGPIPE, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0 open("init_data.xml", O_RDONLY) = 
-1 ENOENT (No such file or directory) brk(0xec8000) = 0xec8000 open("boinc_lockfile", O_WRONLY|O_CREAT, 0664) = 3 fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 stat("init_data.xml", 0x7fff459486d0) = -1 ENOENT (No such file or directory) open("/etc/localtime", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0 fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf4000 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 3519 lseek(4, -2252, SEEK_CUR) = 1267 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 4096) = 2252 lseek(4, 3518, SEEK_SET) = 3518 close(4) = 0 munmap(0x7f8d7adf4000, 4096) = 0 write(2, "10:16:14 (16342): Can't open ini"..., 73) = 73 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8d7adf1000 mprotect(0x7f8d7adf1000, 4096, PROT_NONE) = 0 clone(child_stack=0x7f8d7adf3e50, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8d7adf49d0, tls=0x7f8d7adf4700, child_tidptr=0x7f8d7adf49d0) = 16343 rt_sigaction(SIGALRM, {0x75e340, [], SA_RESTORER|SA_RESTART, 0x7f6620}, NULL, 8) = 0 setitimer(ITIMER_REAL, {it_interval={0, 100000}, it_value={0, 100000}}, NULL) = 0 lstat("in", {st_mode=S_IFREG|0644, st_size=414, ...}) = 0 stat("in", {st_mode=S_IFREG|0644, st_size=414, ...}) = 0 open("in", O_RDONLY) = 4 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 fstat(4, {st_mode=S_IFREG|0644, st_size=414, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf0000 read(4, "[y^2 + 3, [0, 1], -3, 2, [Mat([1"..., 4096) = 414 close(4) = 0 munmap(0x7f8d7adf0000, 4096) = 0 lstat("out", 0x7fff459488c0) = -1 ENOENT (No such file or directory) stat("out", 0x7fff45948890) = -1 ENOENT (No such file or directory) 
lstat("GetBoundedDecics_state", 0x7fff459488c0) = -1 ENOENT (No such file or directory) stat("GetBoundedDecics_state", 0x7fff45948890) = -1 ENOENT (No such file or directory) stat("GetBoundedDecics_state", 0x7fff45948ac0) = -1 ENOENT (No such file or directory) write(2, "APP: Checkpoint Flag = 0.\n", 26) = 26 write(2, "APP: a21 Starting Value = -10000"..., 37) = 37 write(2, "APP: a32 Starting Value = -10000"..., 37) = 37 write(2, "APP: PolyCount starting value = "..., 35) = 35 write(2, "APP: Stat Count 1 = 0.\n", 23) = 23 write(2, "APP: Stat Count 2 = 0.\n", 23) = 23 write(2, "APP: Stat Count 3 = 0.\n", 23) = 23 getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0 geteuid() = 0 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) close(4) = 0 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) close(4) = 0 open("/etc/nsswitch.conf", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=1688, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf0000 read(4, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1688 read(4, "", 4096) = 0 close(4) = 0 munmap(0x7f8d7adf0000, 4096) = 0 open("/etc/ld.so.cache", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=71751, ...}) = 0 mmap(NULL, 71751, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f8d7addf000 close(4) = 0 open("/lib64/libnss_files.so.2", O_RDONLY) = 4 read(4, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20!\0\0\0\0\0\0"..., 832) = 832 fstat(4, {st_mode=S_IFREG|0755, st_size=61624, ...}) = 0 mmap(NULL, 2147728, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7f8d7abd2000 mprotect(0x7f8d7abde000, 2093056, PROT_NONE) = 0 mmap(0x7f8d7addd000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0xb000) = 
0x7f8d7addd000 close(4) = 0 open("/lib64/libc.so.6", O_RDONLY) = 4 read(4, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\355!v:\0\0\0"..., 832) = 832 fstat(4, {st_mode=S_IFREG|0755, st_size=1838296, ...}) = 0 mmap(0x3a76200000, 3664040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x3a76200000 mprotect(0x3a76375000, 2097152, PROT_NONE) = 0 mmap(0x3a76575000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x175000) = 0x3a76575000 mmap(0x3a7657a000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3a7657a000 close(4) = 0 open("/lib64/ld-linux-x86-64.so.2", O_RDONLY) = 4 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\n\240u:\0\0\0"..., 832) = 832 fstat(4, {st_mode=S_IFREG|0755, st_size=150672, ...}) = 0 mmap(0x3a75a00000, 2228520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x3a75a00000 mprotect(0x3a75a1e000, 2097152, PROT_NONE) = 0 mmap(0x3a75c1e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1e000) = 0x3a75c1e000 mmap(0x3a75c20000, 296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3a75c20000 close(4) = 0 mprotect(0x3a75c1e000, 4096, PROT_READ) = 0 mprotect(0x3a76575000, 16384, PROT_READ) = 0 mprotect(0x7f8d7addd000, 4096, PROT_READ) = 0 munmap(0x7f8d7addf000, 71751) = 0 mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8c7abd2000 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 4 fcntl(4, F_GETFD) = 0x1 (flags FD_CLOEXEC) fstat(4, {st_mode=S_IFREG|0644, st_size=1731, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf0000 read(4, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1731 close(4) = 0 munmap(0x7f8d7adf0000, 4096) = 0 --- SIGFPE (Floating point exception) @ 0 (0) --- +++ killed by SIGFPE (core dumped) +++ |
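The trace above is long, so it can help to pull out just the shared objects that were opened successfully. The sketch below is a rough Python filter for strace text; the function name `shared_objects_opened` and the regular expressions are illustrative assumptions, not a general strace parser. Run over the trace in the previous post, it would list /lib64/libnss_files.so.2, /lib64/libc.so.6 and /lib64/ld-linux-x86-64.so.2, all opened shortly after geteuid() and the read of /etc/nsswitch.conf. That pattern is consistent with glibc loading its name-service (NSS) modules at run time, which it does even in statically linked programs. Whether that is what triggers the SIGFPE here is not established by the trace alone.

```python
import re

# Matches open("path", ...) = N or openat(AT_FDCWD, "path", ...) = N.
_OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:AT_FDCWD, )?"([^"]+)"[^)]*\)\s*=\s*(-?\d+)'
)
# File names ending in .so or .so.<digits>... (ld.so.cache does not match).
_SO_RE = re.compile(r"\.so(?:\.\d+)*$")


def shared_objects_opened(trace):
    """Return the shared-object paths that were opened successfully, in order."""
    seen = []
    for path, ret in _OPEN_RE.findall(trace):
        if int(ret) >= 0 and _SO_RE.search(path) and path not in seen:
            seen.append(path)
    return seen
```

If a binary that was meant to be static turns out to open system libraries like this, comparing the library versions on the build machine and on the failing host is a reasonable next step.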
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
Thanks Bok! Nothing stands out. Maybe Greg (the other admin) will notice something I don't. |
Joined: 19 Aug 11 Posts: 9 Credit: 164,423 RAC: 0 |
"3.5.0 is the latest release of gmp available for CentOS6 from the main repos. Whilst I can update to the latest with a build I tend to try and keep to the repos on my database servers (which this is)." Well, as I said, I had to upgrade gmp to run Yafu. I'm at gmp-5.0.2. I'm willing to upgrade, not downgrade. The lowest version even available to me is 4.3.2. Perhaps Gentoo versioning is slightly different from Fedora versioning? Is it possible for 3.5.2 on Fedora to equal 5.0.2 on Gentoo? |
Joined: 19 Aug 11 Posts: 10 Credit: 1,004,925 RAC: 0 |
I think that's your gmp version, look for libgmp and see what that is. Run find / -name 'libgmp*' (it should be in /usr/lib64/). |
Joined: 19 Aug 11 Posts: 9 Credit: 164,423 RAC: 0 |
"I think that's your gmp version, look for libgmp and see what that is." Output of locate libgmp: /usr/lib64/libgmp.a /usr/lib64/libgmp.la /usr/lib64/libgmp.so /usr/lib64/libgmp.so.10 /usr/lib64/libgmp.so.10.0.2 /usr/lib64/libgmpxx.a /usr/lib64/libgmpxx.la /usr/lib64/libgmpxx.so /usr/lib64/libgmpxx.so.4 /usr/lib64/libgmpxx.so.4.2.2 /usr/lib64/engines/libgmp.so |
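The numbers after ".so" in these file names are the shared-library version, which is numbered separately from the package release. That is how a gmp-5.0.2 package can install libgmp.so.10.0.2, as the listing above shows, while another release installs libgmp.so.3.5.2. The following Python sketch splits such file names into a library name and a version tuple so the two numbering schemes are not confused. The function name `split_soname` is made up for illustration.

```python
import re

# lib<name>.so followed by zero or more ".<digits>" components.
_LIB_RE = re.compile(r"^(lib[\w+-]+?)\.so((?:\.\d+)*)$")


def split_soname(path):
    """Return (library name, version tuple), or None if not a shared library."""
    name = path.rsplit("/", 1)[-1]
    match = _LIB_RE.match(name)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(2).split(".") if part)
    return match.group(1), version
```

For the listing above, `split_soname("/usr/lib64/libgmp.so.10.0.2")` gives `("libgmp", (10, 0, 2))`, while the static archive `libgmp.a` gives `None`.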
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
Ok. I backed out one of the "fixes" I made. It looks like the program was dying right around the call to a function which "initializes" a stack-like workspace. One of the inputs to this initialization routine is a max_prime which it uses to create a table of primes. The table of primes is supposed to make factorization of large numbers faster. The bigger this table, the better. Anyways, I lowered it back down to the original value I had been using (1 million if you care to know). I only made this change on the Linux version, as the Windows version was fine. The new Linux version is 1.06. Give this one a shot and let's see what happens. If this doesn't fix the problem, then the next step would be to back out to the older version of the pari/gp library with gmp disabled. |
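To make the role of max_prime concrete, here is a small sieve of Eratosthenes in Python that builds a table of all primes up to a given limit. It is only an illustration of the idea of a precomputed prime table, not the pari/gp library's actual initialization code, and the function name `prime_table` is hypothetical. The table has to be built and held in memory at start-up, so a larger limit costs more memory and set-up time before any real work begins.

```python
def prime_table(max_prime):
    """Return a list of all primes <= max_prime using a sieve of Eratosthenes."""
    if max_prime < 2:
        return []
    is_prime = bytearray([1]) * (max_prime + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= max_prime:
        if is_prime[p]:
            # Cross out every multiple of p starting at p*p.
            count = len(range(p * p, max_prime + 1, p))
            is_prime[p * p :: p] = bytes(count)
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]
```

With the limit of 1 million mentioned above, `len(prime_table(1_000_000))` is 78498.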
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
It looks like you're still having the problem with version 1.06. At least I know basically where the problem is now. |
Joined: 19 Aug 11 Posts: 9 Credit: 164,423 RAC: 0 |
"It looks like you're still having the problem with version 1.06." Been keeping an eye on me? |
Joined: 8 Jul 11 Posts: 1355 Credit: 572,666,992 RAC: 778,017 |
"It looks like you're still having the problem with version 1.06." Yep. I got nothin better to do than monitor this server... |
Joined: 19 Aug 11 Posts: 10 Credit: 1,004,925 RAC: 0 |
Yeah, same thing for mine with 1.06. |