View Issue Details

ID: 0002727
Project: GNUnet
Category: file-sharing service
View Status: public
Last Update: 2013-12-24 20:54
Reporter: Matthias Wachs
Assigned To: Christian Grothoff
Priority: low
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version: 0.10.0
Target Version: 0.10.0
Fixed in Version: 0.10.0
Summary: 0002727: Segfaults in FS on sparcbot
Description: Kernel logs on sparcbot


[2013-01-05 20:50:24] gnunet-service-[144230]: segfault at 8 ip 00000000702cf7e8 (rpc 000000007029328c) sp 00000000ffd0c9a0 error 30001 in libc-2.13.so[70240000+16c000]
[2013-01-05 20:57:58] gnunet-service-[148622]: segfault at 8 ip 00000000702cf7e8 (rpc 000000007029328c) sp 00000000ffeee9b0 error 30001 in libc-2.13.so[70240000+16c000]
[2013-01-05 20:58:25] gnunet-service-[149168]: segfault at 0 ip 000000007031f7e8 (rpc 00000000702e328c) sp 00000000ffa269a0 error 30001 in libc-2.13.so[70290000+16c000]
[2013-01-05 20:58:26] gnunet-service-[149205]: segfault at 0 ip 00000000702fb7e8 (rpc 00000000702bf28c) sp 00000000ffb249a0 error 30001 in libc-2.13.so[7026c000+16c000]
[2013-01-05 20:58:49] gnunet-service-[149395]: segfault at 0 ip 00000000702cf7e8 (rpc 000000007029328c) sp 00000000ffc069a0 error 30001 in libc-2.13.so[70240000+16c000]
[2013-01-05 23:04:16] gnunet-service-[206960]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000fff9b140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-05 23:07:01] gnunet-service-[207101]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ffef9140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-06 03:25:23] gnunet-service-[10777]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ff971140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-07 14:23:57] gnunet-service-[198992]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ffdcf140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-07 17:47:08] gnunet-service-[4041]: segfault at 0 ip 000000000001eba8 (rpc 00000000700c59a8) sp 00000000ffe9f140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-07 17:47:08] gnunet-service-[4042]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ffba5140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-08 11:33:31] gnunet-service-[136331]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ff967140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-08 17:04:24] gnunet-service-[202254]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ffe1f140 error 30001 in gnunet-service-fs[10000+1c000]
[2013-01-08 17:07:12] gnunet-service-[202393]: segfault at 0 ip 000000000001eba8 (rpc 00000000700799a8) sp 00000000ffa53140 error 30001 in gnunet-service-fs[10000+1c000]
Additional Information: Correlates to buildbot fs builds

https://gnunet.org/buildbot/builders/lenny-sparc64-wachs/builds/103/steps/tests%20fs/logs/stdio

Jan 08 17:19:53-644221
Tags: No tags attached.

Activities

Matthias Wachs

2013-01-09 16:18

manager   ~0006766

Try to reproduce by running fs tests manually

Matthias Wachs

2013-01-09 16:42

manager   ~0006767

ran fs tests manually: passed without coredump

Matthias Wachs

2013-01-09 16:43

manager   ~0006768

found older coredump in fs dir, don't know if this is the problem:

root@mamasparc:~/buildbot/lenny-sparc64-wachs/build/src/fs# gdb .libs/gnunet-service-fs core
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/buildbot/lenny-sparc64-wachs/build/src/fs/.libs/gnunet-service-fs...done.

warning: core file may not match specified executable file.
[New LWP 202393]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1".
Core was generated by `/tmp/gnbuild/lib/gnunet/libexec/gnunet-service-fs -c /tmp/testbed4a10EF/1/confi'.
Program terminated with signal 11, Segmentation fault.
#0 put_migration_continuation (cls=0x1498f0, success=0, min_expiration=<error reading variable: Cannot access memory at address 0x0>, msg=0x0) at gnunet-service-fs_pr.c:965
965 {
(gdb) bt
#0 put_migration_continuation (cls=0x1498f0, success=0, min_expiration=<error reading variable: Cannot access memory at address 0x0>, msg=0x0) at gnunet-service-fs_pr.c:965
#1 0x7007aee4 in process_result_message (cls=0x4ba28, msg=0x0) at datastore_api.c:1169
#2 0x700799b0 in timeout_queue_entry (cls=0x828c8, tc=0xffa53378) at datastore_api.c:398
#3 0x7019b190 in run_ready (ws=0x3da98, rs=0x3da10) at scheduler.c:597
#4 GNUNET_SCHEDULER_run (task=<optimized out>, task_cls=<optimized out>) at scheduler.c:786
#5 0x701a6f38 in GNUNET_SERVICE_run (argc=3, argv=0xffa53714, service_name=0x28080 "fs", options=<optimized out>, task=<optimized out>, task_cls=<optimized out>) at service.c:1815
#6 0x00013008 in main (argc=3, argv=0xffa53714) at gnunet-service-fs.c:709
(gdb)

Matthias Wachs

2013-01-09 19:57

manager   ~0006769

could reproduce with ./perf_gnunet_service_fs_p2p_respect


Core was generated by `/tmp/gnbuild/lib/gnunet/libexec/gnunet-service-fs -c /tmp/testbedte0z5P/2/confi'.
Program terminated with signal 11, Segmentation fault.
#0 put_migration_continuation (cls=0x9d388, success=0,
    min_expiration=<error reading variable: Cannot access memory at address 0x0>, msg=0x0)
    at gnunet-service-fs_pr.c:965
965 {
(gdb) bt
#0 put_migration_continuation (cls=0x9d388, success=0,
    min_expiration=<error reading variable: Cannot access memory at address 0x0>, msg=0x0)
    at gnunet-service-fs_pr.c:965
#1 0xf7d82ee4 in process_result_message (cls=0x4ba18, msg=0x0) at datastore_api.c:1169
#2 0xf7d819b0 in timeout_queue_entry (cls=0x9be60, tc=0xff95d7f8) at datastore_api.c:398
#3 0xf7cdf190 in run_ready (ws=0x3da88, rs=0x3da00) at scheduler.c:597
#4 GNUNET_SCHEDULER_run (task=<optimized out>, task_cls=<optimized out>) at scheduler.c:786
#5 0xf7ceaf38 in GNUNET_SERVICE_run (argc=3, argv=0xff95db94, service_name=0x28080 "fs",
    options=<optimized out>, task=<optimized out>, task_cls=<optimized out>)
    at service.c:1815
#6 0x00013008 in main (argc=3, argv=0xff95db94) at gnunet-service-fs.c:709
(gdb)

Christian Grothoff

2013-01-24 17:18

manager   ~0006793

Very strange: line 965 is above the beginning of the function, and EVERY dereference in the file (except for 'cls' derefs) has _already_ been explicitly checked against NULL (since the end of 2012!). So there cannot be a NULL dereference anywhere in the function!

Matthias Wachs

2013-01-25 10:13

manager   ~0006794

root@mamasparc:~/buildbot/lenny-sparc64-wachs/build/src/fs# ls -l core
-rw------- 1 root root 2080768 Jan 25 09:49 core
root@mamasparc:~/buildbot/lenny-sparc64-wachs/build/src/fs# gdb .libs/gnunet-service-fs core
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/buildbot/lenny-sparc64-wachs/build/src/fs/.libs/gnunet-service-fs...done.

warning: core file may not match specified executable file.
[New LWP 219160]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1".
Core was generated by `/tmp/gnbuild/lib/gnunet/libexec/gnunet-service-fs -c /tmp/testbedkuitAu/0/confi'.
Program terminated with signal 11, Segmentation fault.
#0 put_migration_continuation (cls=0x43a08, success=0,
    min_expiration=<error reading variable: Cannot access memory at address 0x0>, msg=0x0) at gnunet-service-fs_pr.c:965
965 {
(gdb) bt
#0 put_migration_continuation (cls=0x43a08, success=0,
    min_expiration=<error reading variable: Cannot access memory at address 0x0>, msg=0x0) at gnunet-service-fs_pr.c:965
#1 0x7007aee4 in process_result_message (cls=0x4b030, msg=0x0) at datastore_api.c:1169
#2 0x700799b0 in timeout_queue_entry (cls=0xaef18, tc=0xffdcf8d8) at datastore_api.c:398
#3 0x70147190 in run_ready (ws=0x3cd38, rs=0x3ccb0) at scheduler.c:597
#4 GNUNET_SCHEDULER_run (task=<optimized out>, task_cls=<optimized out>) at scheduler.c:786
#5 0x70152f38 in GNUNET_SERVICE_run (argc=3, argv=0xffdcfc74, service_name=0x28080 "fs", options=<optimized out>,
    task=<optimized out>, task_cls=<optimized out>) at service.c:1815
#6 0x00013008 in main (argc=3, argv=0xffdcfc74) at gnunet-service-fs.c:709
(gdb)

Matthias Wachs

2013-01-25 10:53

manager   ~0006795

Could reproduce with perf_gnunet_service_fs_p2p_index

Christian Grothoff

2013-01-25 13:32

manager   ~0006796

put_migration_continuation should never be called from process_result_message, as the function has the WRONG SIGNATURE for that call location. So datastore_api.c must have suffered some internal (memory) corruption and then likely called the wrong function, possibly with a bogus closure, and thus we crash.

Christian Grothoff

2013-11-18 21:31

manager   ~0007653

Stack trace with SVN HEAD:

Reading symbols from /tmp/gnbuild/lib/gnunet/libexec/gnunet-service-fs...done.

warning: core file may not match specified executable file.
[New LWP 103372]
Core was generated by `/tmp/gnbuild/lib/gnunet/libexec/gnunet-service-fs -c /tmp/testbedgdXR8u/0/confi'.
Program terminated with signal 11, Segmentation fault.
#0 put_migration_continuation (cls=0x43938, success=0,
    min_expiration=<error reading variable: Could not find type for DW_OP_GNU_const_type>, msg=0x0) at gnunet-service-fs_pr.c:940
940 {
(gdb) ba
#0 put_migration_continuation (cls=0x43938, success=0,
    min_expiration=<error reading variable: Could not find type for DW_OP_GNU_const_type>, msg=0x0) at gnunet-service-fs_pr.c:940
#1 0x7007efe4 in process_result_message (cls=0x4b850, msg=0x0) at datastore_api.c:1172
#2 0x7007dad0 in timeout_queue_entry (cls=0xfd2c8, tc=0xffc0387c) at datastore_api.c:398
#3 0x701467e0 in run_ready (ws=0x3c8d0, rs=0x3c848) at scheduler.c:593
#4 GNUNET_SCHEDULER_run (task=<optimized out>, task_cls=task_cls@entry=0xffc039b0) at scheduler.c:808
#5 0x70151a80 in GNUNET_SERVICE_run (argc=argc@entry=3, argv=argv@entry=0xffc03c14, service_name=service_name@entry=0x27d50 "fs",
    options=options@entry=GNUNET_SERVICE_OPTION_NONE, task=task@entry=0x138c0 <run>, task_cls=task_cls@entry=0x0) at service.c:1478
#6 0x00013148 in main (argc=3, argv=0xffc03c14) at gnunet-service-fs.c:721

Christian Grothoff

2013-11-18 21:33

manager   ~0007654

The function 'put_migration_continuation' is called, but its signature does not match the type expected at datastore_api.c:1172. Clearly the h->queue_head.qc union (!) does not contain the type of pointer that was expected here.
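
To illustrate the failure mode described in this note, here is a minimal sketch of a queue entry that stores one of two continuation pointers in a union. All names here (`StatusCont`, `ResultCont`, `QueueContext`, `QueueEntry`, `deliver_result_timeout`) are hypothetical and are not the real GNUnet definitions. The `kind` tag in particular is only an illustration device; nothing in this report says datastore_api.c carries such a tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical continuation types; the real GNUnet typedefs differ. */
typedef void (*StatusCont) (void *cls, int success,
                            uint64_t min_expiration, const char *msg);
typedef void (*ResultCont) (void *cls, const void *data, size_t size);

/* One slot holds exactly one of the two pointer types. */
union QueueContext
{
  StatusCont sc;
  ResultCont rc;
};

/* Assumption for this sketch: a tag records which member is valid. */
enum EntryKind { ENTRY_STATUS, ENTRY_RESULT };

struct QueueEntry
{
  enum EntryKind kind;
  union QueueContext qc;
  void *cls;
};

/* Example callbacks that only count how often they were invoked. */
static int status_calls;
static int result_calls;

static void
on_status (void *cls, int success, uint64_t min_expiration, const char *msg)
{
  (void) cls; (void) success; (void) min_expiration; (void) msg;
  status_calls++;
}

static void
on_result (void *cls, const void *data, size_t size)
{
  (void) cls; (void) data; (void) size;
  result_calls++;
}

/* Deliver a timeout (no data) to an entry that expects a result.
   Returns 0 on success, -1 if the entry holds the other union member. */
static int
deliver_result_timeout (const struct QueueEntry *qe)
{
  if (ENTRY_RESULT != qe->kind)
    return -1; /* calling qe->qc.rc here would use the wrong signature */
  qe->qc.rc (qe->cls, NULL, 0);
  return 0;
}
```

Without a check of this kind, reading `qc.rc` from an entry that actually stored a status continuation calls that function with the wrong argument list, which would be consistent with the unreadable `min_expiration` argument in the backtraces above.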

Christian Grothoff

2013-11-18 21:45

manager   ~0007655

timeout_queue_entry in datastore_api.c did not ensure that the queue slot for which the 'response_proc' was being invoked was at the head of the list. This caused a mismatch in which functions were called next. This only showed up on sparc due to timing (it needs a timeout!). Fixed in SVN 30791.
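
The head-of-list requirement described in this note can be sketched as follows. This is not the change made in SVN 30791, whose details are not quoted here; it is one plausible way to enforce the invariant, and all names (`Entry`, `Handle`, `process_head`, `move_to_head`, `timeout_entry`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct Entry
{
  struct Entry *next;
  int id;
};

struct Handle
{
  struct Entry *queue_head;
  int last_processed; /* id of the entry the processor acted on */
};

/* Like the processor described above, this always acts on the queue head. */
static void
process_head (struct Handle *h)
{
  struct Entry *qe = h->queue_head;

  if (NULL == qe)
    return;
  h->queue_head = qe->next;
  qe->next = NULL;
  h->last_processed = qe->id;
}

/* Unlink qe from its current position and relink it at the head.
   Returns 0 on success, -1 if qe is not in the queue. */
static int
move_to_head (struct Handle *h, struct Entry *qe)
{
  struct Entry **pos = &h->queue_head;

  while ((NULL != *pos) && (qe != *pos))
    pos = &(*pos)->next;
  if (NULL == *pos)
    return -1;
  *pos = qe->next;
  qe->next = h->queue_head;
  h->queue_head = qe;
  return 0;
}

/* Timeout handler: make sure the timed-out entry is the head before
   invoking the head-based processor. */
static int
timeout_entry (struct Handle *h, struct Entry *qe)
{
  if ((h->queue_head != qe) && (0 != move_to_head (h, qe)))
    return -1;
  process_head (h);
  return 0;
}
```

With a queue a -> b -> c, calling `timeout_entry (&h, &b)` processes b and leaves a -> c. Without the head check, the processor would have acted on a while the timeout belonged to b.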

Christian Grothoff

2013-11-18 21:58

manager   ~0007656

Fixed in SVN 30792.

Issue History

Date Modified Username Field Change
2013-01-09 16:17 Matthias Wachs New Issue
2013-01-09 16:17 Matthias Wachs Status new => assigned
2013-01-09 16:17 Matthias Wachs Assigned To => Christian Grothoff
2013-01-09 16:18 Matthias Wachs Note Added: 0006766
2013-01-09 16:42 Matthias Wachs Note Added: 0006767
2013-01-09 16:43 Matthias Wachs Note Added: 0006768
2013-01-09 19:57 Matthias Wachs Note Added: 0006769
2013-01-24 17:18 Christian Grothoff Note Added: 0006793
2013-01-25 10:13 Matthias Wachs Note Added: 0006794
2013-01-25 10:53 Matthias Wachs Note Added: 0006795
2013-01-25 13:32 Christian Grothoff Note Added: 0006796
2013-01-31 10:32 Christian Grothoff Relationship added related to 0002724
2013-01-31 10:33 Christian Grothoff Relationship deleted related to 0002724
2013-07-10 23:46 Christian Grothoff Assigned To Christian Grothoff =>
2013-07-10 23:46 Christian Grothoff Priority normal => low
2013-07-10 23:46 Christian Grothoff Status assigned => confirmed
2013-11-18 21:19 Christian Grothoff Assigned To => Christian Grothoff
2013-11-18 21:19 Christian Grothoff Status confirmed => assigned
2013-11-18 21:31 Christian Grothoff Note Added: 0007653
2013-11-18 21:33 Christian Grothoff Note Added: 0007654
2013-11-18 21:45 Christian Grothoff Note Added: 0007655
2013-11-18 21:58 Christian Grothoff Note Added: 0007656
2013-11-18 21:58 Christian Grothoff Status assigned => resolved
2013-11-18 21:58 Christian Grothoff Fixed in Version => 0.10.0
2013-11-18 21:58 Christian Grothoff Resolution open => fixed
2013-11-18 21:58 Christian Grothoff Target Version => 0.10.0
2013-12-24 20:54 Christian Grothoff Status resolved => closed