Comp errors on 1.05 for Gentoo Linux

Message boards : Number crunching : Comp errors on 1.05 for Gentoo Linux

Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 146 - Posted: 6 Sep 2011, 5:25:00 UTC - in response to Message 143.  

I don't think it has anything to do with library versions on the client. I think something early is ill-conditioned or relies on a failed syscall that leads to the exception. Any chance someone can run it under a debugger? You would run gdb <app name>, then type r to run it. After the exception it should trap and give you a line number. That would be great, since we have no way to reproduce it. Any takers?
Dagorath

Joined: 2 Sep 11
Posts: 57
Credit: 1,274,345
RAC: 0
Message 150 - Posted: 6 Sep 2011, 6:45:26 UTC - in response to Message 146.  
Last modified: 6 Sep 2011, 6:46:48 UTC

I'll bite. Followed your directions and got:

Reading symbols from /home/kim/BOINC/projects/stat.la.asu.edu_NumberFields/GetBoundedDecics_1.06_x86_64-pc-linux-gnu...done.
(gdb) r
Starting program: /home/kim/BOINC/projects/stat.la.asu.edu_NumberFields/GetBoundedDecics_1.06_x86_64-pc-linux-gnu

Program exited with code 01.
(gdb)


It took about 1 sec to get from "Starting program" to "Program exited with code 01". Does that sound right? I cd'd to the NumberFields dir in BOINC/projects before running gdb.

It's now paused at the gdb prompt. I'll let it sit there until I hear back from you.

BTW, I'm on Ubuntu 10.10 not Gentoo.
Bok

Joined: 19 Aug 11
Posts: 10
Credit: 1,004,925
RAC: 0
Message 151 - Posted: 6 Sep 2011, 6:47:03 UTC
Last modified: 6 Sep 2011, 7:07:33 UTC

I tried the debuginfo-install command, but it didn't find anything to install. It looks like the debuginfo repos for CentOS 6 haven't been updated yet. Do I need a version compiled with -g?

Reading symbols from /root/BOINC/projects/stat.la.asu.edu_NumberFields/GetBoundedDecics_1.06_x86_64-pc-linux-gnu...done.
(gdb) r
Starting program: /root/BOINC/projects/stat.la.asu.edu_NumberFields/GetBoundedDecics_1.06_x86_64-pc-linux-gnu

Program received signal SIGFPE, Arithmetic exception.
0x0000003a76277e13 in _int_free () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.7.el6.x86_64

Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,775,998
RAC: 288,739
Message 152 - Posted: 6 Sep 2011, 7:12:32 UTC - in response to Message 150.  

Thanks for volunteering. Unfortunately, it's only crashing on Gentoo and CentOS. Your "crash" might be due to the way it was run. Run it the same way you did, but first copy one of the WUs (work units) to a file named "in". (When running stand-alone, the app looks for a file named "in", as defined in the WU template.) It should then run normally for you and not crash.
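A hypothetical sketch of the stand-alone behaviour described above, assuming (the thread does not confirm it) that the app simply exits with status 1 when it cannot open its input file "in". That would be consistent with the roughly one-second "Program exited with code 01" in Message 150. The function name check_standalone_input() is invented for illustration and is not the project's code; any mapping of the logical name "in" to a physical file by the BOINC runtime is left out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not the project's code. Stand-alone runs look for
 * the work unit in a file literally named "in" (per the thread); here we
 * ASSUME the app exits with status 1 when that file cannot be opened. */
static int check_standalone_input(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        perror(path);           /* e.g. "in: No such file or directory" */
        return EXIT_FAILURE;    /* 1 on Linux; gdb shows it as "exited with code 01" */
    }

    /* ... a real app would parse the work unit here ... */
    fclose(fp);
    return EXIT_SUCCESS;
}
```

A main() built this way would just return check_standalone_input("in"); copying a WU to a file named in, in the working directory, before typing r in gdb skips that early exit.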
trigggl

Joined: 19 Aug 11
Posts: 9
Credit: 164,423
RAC: 0
Message 153 - Posted: 6 Sep 2011, 10:19:18 UTC - in response to Message 146.  
Last modified: 6 Sep 2011, 10:59:06 UTC

Well, this doesn't look real helpful, but here it is.

This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /tmp/GetBoundedDecics_1.06_x86_64-pc-linux-gnu...done.
(gdb) r
Starting program: /tmp/GetBoundedDecics_1.06_x86_64-pc-linux-gnu

Program received signal SIGFPE, Arithmetic exception.
0x00000031fee72b4c in ?? () from /lib64/libc.so.6


Perhaps there's a USE flag I need to enable in my glibc?

emerge -vp glibc

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild R ] sys-libs/glibc-2.12.2 USE="(multilib) nls -debug -gd -glibc-omitfp (-hardened) -profile (-selinux) -vanilla" 0 kB
trigggl

Joined: 19 Aug 11
Posts: 9
Credit: 164,423
RAC: 0
Message 154 - Posted: 6 Sep 2011, 11:29:53 UTC

I know this isn't a Gentoo forum, but maybe the output of emerge --info will help.

Portage 2.1.10.11 (default/linux/amd64/10.0/desktop/gnome, gcc-4.4.5, glibc-2.12.2-r0, 2.6.37-gentoo-r4 x86_64)
=================================================================
System uname: Linux-2.6.37-gentoo-r4-x86_64-Dual_Core_AMD_Opteron-tm-_Processor_285-with-gentoo-2.0.3
Timestamp of tree: Sun, 04 Sep 2011 20:00:01 +0000
distcc 3.1 x86_64-pc-linux-gnu [enabled]
app-shells/bash:          4.1_p9
dev-lang/python:          2.7.1-r1, 3.1.3-r1
dev-util/cmake:           2.8.4-r1
dev-util/pkgconfig:       0.26
sys-apps/baselayout:      2.0.3
sys-apps/openrc:          0.8.3-r1
sys-apps/sandbox:         2.4
sys-devel/autoconf:       2.13, 2.68
sys-devel/automake:       1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:       2.21.1-r1
sys-devel/gcc:            4.4.5
sys-devel/gcc-config:     1.4.1-r1
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82-r1
sys-kernel/linux-headers: 2.6.36.1 (virtual/os-headers)
sys-libs/glibc:           2.12.2
Repositories: gentoo x-portage
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8-sse3 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=k8-sse3 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps=y"
FEATURES="assume-digests binpkg-logs distcc distlocks ebuild-locks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-march=k8-sse3 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -O2 -pipe"
GENTOO_MIRRORS=" http://129.123.104.64/gentoo "
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en en_US"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa amd64 bash-completion berkdb bluetooth branding bzip2 cairo cdda cdr cli consolekit cracklib crypt cups cxx dbus dri dts dvd dvdr eds emboss encode evo exif fam firefox flac fortran gdu gif gnome gnome-keyring gpm gstreamer gtk iconv imap ipv6 jpeg lcms ldap libnotify libwww mad maildir mmx mng modules mp3 mp4 mpeg mudflap multilib mysql nautilus ncurses nls nptl nptlonly ogg opengl openmp pam pango pcre pdf perl png policykit ppds pppd python qt3support qt4 readline sasl sdl session spell sse sse2 ssl startup-notification svg sysfs tcpd tiff truetype udev unicode usb vorbis x264 xcb xml xorg xulrunner xv xvid zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en en_US" PHP_TARGETS="php5-3" QEMU_SOFTMMU_TARGETS="i386 x86-64" QEMU_USER_TARGETS="i386 x86-64" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia fglrx" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, LANG, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 158 - Posted: 6 Sep 2011, 15:47:15 UTC - in response to Message 151.  

OK, thanks. It looks to me like a memory issue; I believe _int_free() is part of the malloc/free path. You already have symbol information, so don't worry about adding more. If you could get a backtrace after the failure (type bt at the gdb prompt), that may confirm it. Is your system under heavy memory load? Perhaps you could also post the output of vmstat.

Thanks for your help on this.
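Where gdb isn't installed, a rough in-process stand-in for bt is glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. This is a minimal sketch, not something the project is known to use; install_sigfpe_backtrace() and dump_backtrace() are hypothetical names. backtrace_symbols_fd() is chosen over backtrace_symbols() because it does not call malloc(), which matters when the crash is inside free() itself, as in Bok's _int_free() trace.

```c
#define _POSIX_C_SOURCE 200809L
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical helper: write the current call stack to fd, one frame per
 * line. Returns the number of frames captured. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, n, fd);   /* does not call malloc() */
    return n;
}

/* SIGFPE handler: print a backtrace, then exit. For a real arithmetic
 * fault, returning would typically re-run the faulting instruction. */
static void on_sigfpe(int sig)
{
    static const char msg[] = "SIGFPE caught, backtrace follows:\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)r;
    (void)sig;
    dump_backtrace(STDERR_FILENO);
    _exit(1);
}

/* Install the handler; returns 0 on success, -1 on error (like sigaction). */
static int install_sigfpe_backtrace(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc and allocate memory, so
     * do it now instead of inside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}
```

Calling install_sigfpe_backtrace() at the top of main() would make a crash like this print the stack to stderr. Without debug info, the program's own frames may appear only as addresses (names generally need an executable linked with -rdynamic); those can be mapped back to functions afterwards with addr2line -f -e <binary>.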
Bok

Joined: 19 Aug 11
Posts: 10
Credit: 1,004,925
RAC: 0
Message 159 - Posted: 6 Sep 2011, 16:36:11 UTC
Last modified: 6 Sep 2011, 16:37:46 UTC

Sure thing. The system has 16 GB of RAM, and almost all of it is free right now.


(gdb) bt
#0  0x0000003a76277e13 in _int_free () from /lib64/libc.so.6
#1  0x0000003a76265eed in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007ffff7de1f3a in _nss_files_getpwuid_r () from /lib64/libnss_files.so.2
#3  0x000000000085345d in getpwuid_r ()
#4  0x0000000000853029 in getpwuid ()
#5  0x000000000068a5e5 in pari_get_homedir ()
#6  0x000000000068abea in path_expand ()
#7  0x000000000068af29 in gp_expand_path ()
#8  0x000000000069cca0 in pari_init_opts ()
#9  0x000000000041150e in MartinetSearch(char*, char*, int, long, long, long*) ()
#10 0x0000000000403196 in main ()
Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 161 - Posted: 6 Sep 2011, 17:36:28 UTC - in response to Message 159.  

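Bok, this is great. Just what we need. I think we know what the problem is from this. The pari library (a collection of source code for the number field math) calls getpwuid() during initialization: in your backtrace, pari_init_opts() expands a path, and pari_get_homedir() looks up your home directory with getpwuid(). It is using an outdated version of getpwuid(); some systems have a workaround, but others don't deal with it so well. We need to patch pari so that it doesn't call this function, and let the pari developers know.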
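One way to "patch it so that it doesn't call this function" is to read the home directory from $HOME and skip the passwd lookup altogether. This is a hypothetical sketch: home_dir_sketch() is an invented name, and it is neither pari's pari_get_homedir() nor the patch the project actually applied. Simply switching to getpwuid_r() would probably not be enough, since frames #3 and #4 of Bok's backtrace show getpwuid() already going through getpwuid_r() into libnss_files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for a getpwuid()-based home directory lookup.
 * Copies $HOME into buf (of size len) and returns buf. Returns NULL if
 * HOME is unset, empty, or too long for buf; the caller should then skip
 * "~" expansion rather than fall back to getpwuid(). */
static char *home_dir_sketch(char *buf, size_t len)
{
    const char *home = getenv("HOME");
    size_t n;

    if (home == NULL || home[0] == '\0')
        return NULL;

    n = strlen(home);
    if (n >= len)               /* not enough room for the string plus '\0' */
        return NULL;

    memcpy(buf, home, n + 1);   /* copy including the terminating '\0' */
    return buf;
}
```

Returning NULL instead of falling back to getpwuid() is the point of the design: a path containing ~ would then just not be expanded, rather than triggering the lookup that crashed.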

Thanks for the effort on this. You rock Bok!

--Greg
Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 162 - Posted: 6 Sep 2011, 17:43:44 UTC - in response to Message 153.  

This is helpful, thanks. It looks like the same issue Bok is having. We need to patch the pari source again and see if we can eliminate this outdated reference. Perhaps you can beta test for us when we do?
trigggl

Joined: 19 Aug 11
Posts: 9
Credit: 164,423
RAC: 0
Message 167 - Posted: 6 Sep 2011, 21:27:22 UTC - in response to Message 162.  

Sure thing.
Bok

Joined: 19 Aug 11
Posts: 10
Credit: 1,004,925
RAC: 0
Message 169 - Posted: 6 Sep 2011, 23:56:22 UTC - in response to Message 161.  

You are totally welcome. Though I live in North Carolina, Tempe is almost my second home; I'll be back out there in a few weeks :)
trigggl

Joined: 19 Aug 11
Posts: 9
Credit: 164,423
RAC: 0
Message 170 - Posted: 7 Sep 2011, 0:50:30 UTC

I live in Our Kansas (Arkansas) and I'm also a Greg T.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,775,998
RAC: 288,739
Message 176 - Posted: 7 Sep 2011, 22:36:47 UTC - in response to Message 170.  

Thanks, Bok and 7ri9991 (Greg), for helping us debug this. I will promote the new executable tonight when I get home (I can't ssh to the server from work). I should have it done by 10 pm AZ time.
-- Eric
Greg Tucker
Project administrator
Project developer
Project tester

Joined: 8 Jul 11
Posts: 46
Credit: 7,144,042
RAC: 0
Message 177 - Posted: 7 Sep 2011, 23:14:39 UTC - in response to Message 170.  

Thanks to the beta testers, we have confirmed the fix. It was the second, and hopefully last, bug we have found in the pari source. Thanks again, guys.
Eric Driver
Project administrator
Project developer
Project tester
Project scientist

Joined: 8 Jul 11
Posts: 1318
Credit: 403,775,998
RAC: 288,739
Message 178 - Posted: 8 Sep 2011, 5:05:05 UTC - in response to Message 177.  

The new Linux executable is now available.


Copyright © 2024 Arizona State University