Comp errors on 1.05 for Gentoo Linux
| Author | Message |
|---|---|
trigggl (Joined: 19 Aug 11, Posts: 9, Credit: 164,423, RAC: 0) |
I get instant compute errors with 1.05 tasks. `<core_client_version>6.12.33</core_client_version>` |
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
It looks like the error is occurring right after start-up, before the real meat of the code. That's very strange. Could you reset the project to see if the problem goes away? It looks like user Bok might be having the same problem, but other than that it seems to be working fine on other Linux platforms. |
Bok (Joined: 19 Aug 11, Posts: 10, Credit: 1,004,925, RAC: 0) |
I just reset the project and unfortunately still get the same thing. Do you need an strace? |
trigggl (Joined: 19 Aug 11, Posts: 9, Credit: 164,423, RAC: 0) |
1.03 works, 1.05 doesn't. Not sure what a reset would have to do with that. If it were a dynamic executable I could see what library is missing. There's probably a version mismatch in some library. I seem to run into this more and more often: apps run on Ubuntu, but not so much on other distros. I don't think Bok runs Ubuntu either. So, just a guess: perhaps 1.05 was built against the library versions of whatever distro the compiling computer is currently running, and other distros aren't using the same library versions? I just had to upgrade 'gmp' on one of my Gentoo boxes to run Yafu. Perhaps some library of the compiling computer for 1.05 got upgraded ahead of other distros? Was it the Fedora computer or some other? I'd be willing to upgrade a library if I knew which one it was. |
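trigggl's point about dynamic executables can be made concrete. For a dynamically linked binary, `ldd` lists each shared library it needs and marks unresolved ones as `not found`; for a static binary, glibc's `ldd` only reports that it is not a dynamic executable, so there is nothing to inspect. A minimal sketch, assuming a POSIX shell; `missing_libs` is a hypothetical helper name. Note that `ldd` may execute the program to resolve its dependencies, so its man page advises against running it on untrusted binaries.

```sh
#!/bin/sh
# Hypothetical helper: report whether BIN is dynamically linked and, if so,
# which of its shared libraries the runtime linker cannot find.
missing_libs() {
    bin=$1
    # ldd exits non-zero for static binaries (and for non-ELF files),
    # so there is no dependency list to check.
    if ! deps=$(ldd "$bin" 2>/dev/null); then
        echo "$bin: not a dynamic executable; ldd cannot list its libraries"
        return 1
    fi
    # Newer glibc ldd may print "statically linked" (exit 0) for static-PIE.
    case $deps in
        *"statically linked"*)
            echo "$bin: statically linked; ldd cannot list its libraries"
            return 1 ;;
    esac
    # Unresolved dependencies show up as "libfoo.so.N => not found".
    missing=$(printf '%s\n' "$deps" | grep 'not found' || true)
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 2
    fi
    echo "$bin: dynamic, all libraries found"
}
# Usage: missing_libs ./GetBoundedDecics_1.05_x86_64-pc-linux-gnu
```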
Bok (Joined: 19 Aug 11, Posts: 10, Credit: 1,004,925, RAC: 0) |
I run CentOS6 on this box. Agree on the library issue, I tried an ldd first as well but with it being dynamic.. If I knew how to pass in the arguments I could run strace on it. I tried a few things, but they're obviously not correct; strace doesn't seem to give much and fails with `--- SIGFPE (Floating point exception) @ 0 (0) ---`, `+++ killed by SIGFPE (core dumped) +++`, `Floating point exception (core dumped)`. But I can't tell whether that's because of the incorrect arguments. |
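When the shell prints `Floating point exception (core dumped)`, the program was killed by SIGFPE, which a shell records as exit status 128 + 8 = 136. On x86-64 Linux the kernel also raises SIGFPE for integer division by zero (and for dividing the most negative integer by -1), so the name alone does not prove that floating-point arithmetic went wrong. A minimal sketch, assuming a POSIX shell, that runs a command and names the signal if it was killed; `describe_exit` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: run a command and say whether it exited normally
# or was killed by a signal.
describe_exit() {
    if "$@"; then status=0; else status=$?; fi
    # Shells report a death by signal N as exit status 128 + N, and
    # `kill -l STATUS` maps such a status back to the signal name.
    # A status above 128 can also be an ordinary exit code, so fall back
    # to the number when kill -l does not recognize it.
    if [ "$status" -gt 128 ] && name=$(kill -l "$status" 2>/dev/null); then
        echo "killed by SIG$name"
    else
        echo "exited with status $status"
    fi
    return "$status"
}
# Usage: describe_exit ./GetBoundedDecics_1.05_x86_64-pc-linux-gnu
```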
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
With this recent build I did upgrade to the latest version of pari/gp (a C library for number theory). When I built this library, it had the option to use gmp. Because it improves performance, I decided to use it. The old executable did not, so this might be the problem. My dynamic version is libgmp.so.3.5.2. That said, this was supposed to be a static build. I just did a rebuild and verified that the linker is using libgmp.a, the static version. But I guess it's worth a shot, if you want to upgrade to version 3.5.2 of gmp, to see if that's the problem. And sorry for causing you guys such a headache. |
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
Quoting trigggl: "I just had to upgrade 'gmp' on one of my Gentoo boxes to run Yafu. Perhaps some library of the compiling computer for 1.05 got upgraded ahead of other distros? Was it the Fedora computer or some other?" Oops, I missed your question the first time. This was built on Fedora 15. |
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
Quoting Bok: "I run CentOS6 on this box. Agree on the library issue, I tried an ldd first as well but with it being dynamic.." I think you are basically talking about how to run the app stand-alone. If you can capture one of the WUs (work units), rename it to "in" and put it in the same directory as the binary. Then just run the binary (no command-line args). Thanks for taking the time to help me debug this thing! |
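Eric's stand-alone recipe can be scripted so each attempt starts from a clean directory. A minimal sketch, assuming a POSIX shell; `prepare_standalone` and the paths in the usage comments are hypothetical. In the usage line, strace's `-f` also follows the extra thread the app starts (Bok's trace shows a `clone()` call), and `-o` writes the trace to a file instead of mixing it with the app's own stderr. Bok's trace also shows the app failing to open `init_data.xml` and carrying on, so the BOINC client's files do not appear to be required for this kind of test.

```sh
#!/bin/sh
# Hypothetical helper: copy the app binary and one captured work unit into
# a scratch directory, with the work unit renamed to "in".
prepare_standalone() {
    bin=$1 wu=$2 dir=$3
    mkdir -p "$dir" || return 1
    cp "$bin" "$dir/" || return 1
    chmod +x "$dir/$(basename "$bin")" || return 1
    # The app reads its input from a file named "in" in the current
    # directory and is started with no command-line arguments.
    cp "$wu" "$dir/in"
}
# Usage (paths are placeholders):
#   prepare_standalone ./GetBoundedDecics_1.05_x86_64-pc-linux-gnu ./captured_wu /tmp/nf-run
#   cd /tmp/nf-run && strace -f -o trace.txt ./GetBoundedDecics_1.05_x86_64-pc-linux-gnu
```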
Bok (Joined: 19 Aug 11, Posts: 10, Credit: 1,004,925, RAC: 0) |
3.5.0 is the latest release of gmp available for CentOS6 from the main repos. Whilst I can update to the latest with a build I tend to try and keep to the repos on my database servers (which this is). I'll see if I can find an update in one of the third-party repos like atomic. P.S. I tried running strace after renaming the file to 'in', but it didn't give anything more than a floating point exception and a core dump. Anything else to try in the meantime? |
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
Quoting Bok: "3.5.0 is the latest release of gmp available for CentOS6 from the main repos. Whilst I can update to the latest with a build I tend to try and keep to the repos on my database servers (which this is)." I have no other ideas. Let's wait to see if 7ri9991 can easily upgrade to the same version and whether the problem goes away for him. Then we will know for sure if this is the problem. In the meantime, I will look into what I can do on my side. I still don't understand why my static executable is relying on dynamic libraries. If that's the case, what's the point of building a static executable? |
Bok (Joined: 19 Aug 11, Posts: 10, Credit: 1,004,925, RAC: 0) |
If it helps at all, here is the strace:
```
[root@dbase2 stat.la.asu.edu_NumberFields]# strace ./GetBoundedDecics_1.05_x86_64-pc-linux-gnu
execve("./GetBoundedDecics_1.05_x86_64-pc-linux-gnu", ["./GetBoundedDecics_1.05_x86_64-p"...], [/* 22 vars */]) = 0
uname({sys="Linux", node="dbase2", ...}) = 0
brk(0) = 0xea6000
brk(0xea71c0) = 0xea71c0
arch_prctl(ARCH_SET_FS, 0xea68a0) = 0
set_tid_address(0xea6b70) = 16342
set_robust_list(0xea6b80, 0x18) = 0
rt_sigaction(SIGRTMIN, {0x7ee090, [], SA_RESTORER|SA_SIGINFO, 0x7f6620}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7ee120, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f6620}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0xec81c0) = 0xec81c0
brk(0xec9000) = 0xec9000
futex(0xbbf260, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("stderr.txt", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=554, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf4000
fstat(3, {st_mode=S_IFREG|0644, st_size=554, ...}) = 0
lseek(3, 554, SEEK_SET) = 554
dup3(3, 2, 0) = 2
close(3) = 0
munmap(0x7f8d7adf4000, 4096) = 0
rt_sigaction(SIGILL, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGILL, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0
rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGABRT, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0
rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGBUS, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0
rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSEGV, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0
rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSYS, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, {0x763bb0, [], SA_RESTORER, 0x7f6620}, NULL, 8) = 0
open("init_data.xml", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(0xec8000) = 0xec8000
open("boinc_lockfile", O_WRONLY|O_CREAT, 0664) = 3
fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
stat("init_data.xml", 0x7fff459486d0) = -1 ENOENT (No such file or directory)
open("/etc/localtime", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf4000
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 3519
lseek(4, -2252, SEEK_CUR) = 1267
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 4096) = 2252
lseek(4, 3518, SEEK_SET) = 3518
close(4) = 0
munmap(0x7f8d7adf4000, 4096) = 0
write(2, "10:16:14 (16342): Can't open ini"..., 73) = 73
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8d7adf1000
mprotect(0x7f8d7adf1000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f8d7adf3e50, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8d7adf49d0, tls=0x7f8d7adf4700, child_tidptr=0x7f8d7adf49d0) = 16343
rt_sigaction(SIGALRM, {0x75e340, [], SA_RESTORER|SA_RESTART, 0x7f6620}, NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={0, 100000}, it_value={0, 100000}}, NULL) = 0
lstat("in", {st_mode=S_IFREG|0644, st_size=414, ...}) = 0
stat("in", {st_mode=S_IFREG|0644, st_size=414, ...}) = 0
open("in", O_RDONLY) = 4
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=414, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf0000
read(4, "[y^2 + 3, [0, 1], -3, 2, [Mat([1"..., 4096) = 414
close(4) = 0
munmap(0x7f8d7adf0000, 4096) = 0
lstat("out", 0x7fff459488c0) = -1 ENOENT (No such file or directory)
stat("out", 0x7fff45948890) = -1 ENOENT (No such file or directory)
lstat("GetBoundedDecics_state", 0x7fff459488c0) = -1 ENOENT (No such file or directory)
stat("GetBoundedDecics_state", 0x7fff45948890) = -1 ENOENT (No such file or directory)
stat("GetBoundedDecics_state", 0x7fff45948ac0) = -1 ENOENT (No such file or directory)
write(2, "APP: Checkpoint Flag = 0.\n", 26) = 26
write(2, "APP: a21 Starting Value = -10000"..., 37) = 37
write(2, "APP: a32 Starting Value = -10000"..., 37) = 37
write(2, "APP: PolyCount starting value = "..., 35) = 35
write(2, "APP: Stat Count 1 = 0.\n", 23) = 23
write(2, "APP: Stat Count 2 = 0.\n", 23) = 23
write(2, "APP: Stat Count 3 = 0.\n", 23) = 23
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
geteuid() = 0
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(4) = 0
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(4) = 0
open("/etc/nsswitch.conf", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=1688, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf0000
read(4, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1688
read(4, "", 4096) = 0
close(4) = 0
munmap(0x7f8d7adf0000, 4096) = 0
open("/etc/ld.so.cache", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=71751, ...}) = 0
mmap(NULL, 71751, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f8d7addf000
close(4) = 0
open("/lib64/libnss_files.so.2", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20!\0\0\0\0\0\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=61624, ...}) = 0
mmap(NULL, 2147728, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7f8d7abd2000
mprotect(0x7f8d7abde000, 2093056, PROT_NONE) = 0
mmap(0x7f8d7addd000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0xb000) = 0x7f8d7addd000
close(4) = 0
open("/lib64/libc.so.6", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\355!v:\0\0\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=1838296, ...}) = 0
mmap(0x3a76200000, 3664040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x3a76200000
mprotect(0x3a76375000, 2097152, PROT_NONE) = 0
mmap(0x3a76575000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x175000) = 0x3a76575000
mmap(0x3a7657a000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3a7657a000
close(4) = 0
open("/lib64/ld-linux-x86-64.so.2", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\n\240u:\0\0\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=150672, ...}) = 0
mmap(0x3a75a00000, 2228520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x3a75a00000
mprotect(0x3a75a1e000, 2097152, PROT_NONE) = 0
mmap(0x3a75c1e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1e000) = 0x3a75c1e000
mmap(0x3a75c20000, 296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3a75c20000
close(4) = 0
mprotect(0x3a75c1e000, 4096, PROT_READ) = 0
mprotect(0x3a76575000, 16384, PROT_READ) = 0
mprotect(0x7f8d7addd000, 4096, PROT_READ) = 0
munmap(0x7f8d7addf000, 71751) = 0
mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8c7abd2000
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
fcntl(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fstat(4, {st_mode=S_IFREG|0644, st_size=1731, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d7adf0000
read(4, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1731
close(4) = 0
munmap(0x7f8d7adf0000, 4096) = 0
--- SIGFPE (Floating point exception) @ 0 (0) ---
+++ killed by SIGFPE (core dumped) +++
```
|
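Eric's question about why a static executable relies on dynamic libraries can be checked against the trace above. After `geteuid()` the program tries the nscd socket, reads `/etc/nsswitch.conf`, opens `/lib64/libnss_files.so.2`, `/lib64/libc.so.6` and `/lib64/ld-linux-x86-64.so.2` from the CentOS host, reads `/etc/passwd`, and is killed by SIGFPE right after. That sequence is consistent with a `getpwuid()`-style user lookup going through glibc's Name Service Switch (NSS), which loads its modules at run time even in statically linked programs; glibc's linker warns that such programs need, at run time, the shared libraries from the glibc version used for linking, and here a binary linked on Fedora 15 is loading CentOS 6's libraries. That makes a glibc version mismatch a plausible suspect, though nothing in the thread confirms it. A minimal sketch for repeating the check on other hosts: the hypothetical `runtime_libs` helper lists the shared objects a process opened successfully, given a file written by `strace -o`.

```sh
#!/bin/sh
# Hypothetical helper: list the shared objects a traced process opened
# successfully, from a file written by `strace -o FILE` (the PID prefixes
# that -f adds are tolerated).
runtime_libs() {
    grep -E '^([0-9]+ +)?open(at)?\(.*\.so(\.[0-9]+)*"' "$1" |
        grep -v '= -1 ' |                   # drop failed opens (ENOENT etc.)
        sed -E 's/^[^"]*"([^"]+)".*/\1/' |  # keep only the quoted path
        sort -u
}
# Usage: runtime_libs trace.txt
```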
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
Thanks Bok! Nothing stands out. Maybe Greg (the other admin) will notice something I don't. |
trigggl (Joined: 19 Aug 11, Posts: 9, Credit: 164,423, RAC: 0) |
Quoting Bok: "3.5.0 is the latest release of gmp available for CentOS6 from the main repos. Whilst I can update to the latest with a build I tend to try and keep to the repos on my database servers (which this is)." Well, as I said, I had to upgrade gmp to run Yafu. I'm at gmp-5.0.2. I'm willing to upgrade, not downgrade. The lowest version even available to me is 4.3.2. Perhaps Gentoo versioning is slightly different from Fedora versioning? Is it possible for 3.5.2 on Fedora to equal 5.0.2 on Gentoo? |
Bok (Joined: 19 Aug 11, Posts: 10, Credit: 1,004,925, RAC: 0) |
I think that's your gmp version; look for libgmp and see what that is with `find / -name 'libgmp*'`. It should be in /usr/lib64/. |
trigggl (Joined: 19 Aug 11, Posts: 9, Credit: 164,423, RAC: 0) |
Quoting Bok: "I think that's your gmp version, look for libgmp and see what that is."
```
$ locate libgmp
/usr/lib64/libgmp.a
/usr/lib64/libgmp.la
/usr/lib64/libgmp.so
/usr/lib64/libgmp.so.10
/usr/lib64/libgmp.so.10.0.2
/usr/lib64/libgmpxx.a
/usr/lib64/libgmpxx.la
/usr/lib64/libgmpxx.so
/usr/lib64/libgmpxx.so.4
/usr/lib64/libgmpxx.so.4.2.2
/usr/lib64/engines/libgmp.so
```
|
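trigggl's question of whether 3.5.2 on Fedora could equal 5.0.2 on Gentoo mixes two numbering schemes. gmp-5.0.2 is a GMP release, while the numbers in file names such as libgmp.so.10.0.2 in this listing and Eric's libgmp.so.3.5.2 are libtool shared-library versions, which GMP does not keep in step with its releases. Assuming GMP's usual libtool numbering, libgmp.so.10.0.x comes from the 5.0 series and libgmp.so.3.5.x from the 4.3 series, so "3.5.2" would point at a GMP 4.3 release rather than 5.0.2; that mapping is worth confirming against GMP's release notes or the distro package (on RPM-based systems, `rpm -qf /usr/lib64/libgmp.so.3` names the owning package and its version). The release number itself is recorded in the version macros of the installed gmp.h, which the hypothetical `gmp_release` helper below reads. It assumes the macros are plain `#define NAME value` lines; on multilib distributions they may sit in an architecture-specific header included from gmp.h.

```sh
#!/bin/sh
# Hypothetical helper: print the GMP release recorded in a gmp.h-style
# header (e.g. 5.0.2) from its __GNU_MP_VERSION* macros; exit 1 if absent.
gmp_release() {
    awk '
        $1 == "#define" && $2 == "__GNU_MP_VERSION"            { major = $3 }
        $1 == "#define" && $2 == "__GNU_MP_VERSION_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "__GNU_MP_VERSION_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }' "$1"
}
# Usage: gmp_release /usr/include/gmp.h
```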
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
Ok. I backed out one of the "fixes" I made. It looks like the program was dying right around the call to a function which "initializes" a stack-like workspace. One of the inputs to this initialization routine is a max_prime, which it uses to create a table of primes. The table of primes is supposed to make factorization of large numbers faster; the bigger this table, the better. Anyway, I lowered it back down to the original value I had been using (1 million, if you care to know). I only made this change on the Linux version, as the Windows version was fine. The new Linux version is 1.06. Give this one a shot and let's see what happens. If this doesn't fix the problem, the next step would be to back out to the older version of the pari/gp library with gmp disabled. |
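The initialization routine Eric describes, one that sets up a stack-like workspace and takes a max_prime bound for a prime table, matches the shape of PARI's C entry point `pari_init(size, maxprime)`, though the thread doesn't show his actual call. To give a feel for what that bound controls, the sketch below counts how many primes a table bounded by max_prime would hold. It is a plain sieve of Eratosthenes, not PARI's implementation, and `count_primes` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: count the primes <= MAX with a sieve of
# Eratosthenes, i.e. how many entries a prime table bounded by MAX holds.
# Illustrative only; this is not how PARI builds or stores its table.
count_primes() {
    awk -v max="$1" 'BEGIN {
        lim = max + 0
        n = 0
        for (i = 2; i <= lim; i++) {
            if (i in composite) {
                delete composite[i]   # no longer needed; frees memory
                continue
            }
            n++
            # Smaller multiples of i were already marked by smaller primes.
            for (j = i * i; j <= lim; j += i)
                composite[j] = 1
        }
        print n
    }'
}
# Usage: count_primes 1000000
```

With the bound Eric went back to, `count_primes 1000000` should print 78498, the number of primes below one million; the entry count grows roughly like max_prime / ln(max_prime).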
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
It looks like you're still having the problem with version 1.06. At least I know basically where the problem is now. |
trigggl (Joined: 19 Aug 11, Posts: 9, Credit: 164,423, RAC: 0) |
Quoting Eric Driver: "It looks like you're still having the problem with version 1.06." Been keeping an eye on me? |
Eric Driver (Joined: 8 Jul 11, Posts: 1425, Credit: 781,762,645, RAC: 805,022) |
Quoting: "It looks like you're still having the problem with version 1.06." Yep. I got nothin' better to do than monitor this server... |
Bok (Joined: 19 Aug 11, Posts: 10, Credit: 1,004,925, RAC: 0) |
Yeah, same thing for mine with 1.06. |