View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002572 | GNUnet | DHT service | public | 2012-09-30 18:46 | 2012-11-05 18:33 |
Reporter | LRN | Assigned To | LRN | ||
Priority | high | Severity | crash | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | Git master | ||||
Target Version | 0.9.4 | Fixed in Version | 0.9.4 | ||
Summary | 0002572: Crash in dht service during test_stream_local | ||||
Description | subj | ||||
Additional Information | Reading symbols from d:\progs\gnunet\bin\gnunet-service-dht.exe...done. Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 12860.0xbe4] 0x74fd280d in KERNELBASE!DeleteAce () from C:\Windows\syswow64\KernelBase.dll (gdb) up #1 0x62ac6abc in GNUNET_abort () at common_logging.c:291 291 DebugBreak (); (gdb) #2 0x62afa2ac in GNUNET_SCHEDULER_cancel (task=13451671603782742029) at scheduler.c:918 918 GNUNET_assert (0); (gdb) #3 0x62acbd7f in connect_fail_continuation (connection=0x1d59c78) at connection.c:564 564 GNUNET_SCHEDULER_cancel (connection->nth.timeout_task); (gdb) p/x *connection->nth Structure has no component named operator*. (gdb) p/x connection->nth $1 = {notify_ready = 0xbaadf00d, notify_ready_cls = 0xbaadf00d, connection = 0xbaadf00d, transmit_timeout = {abs_value = 0xbaadf00dbaadf00d}, timeout_task = 0xbaadf00dbaadf00d, notify_size = 0xbaadf00d} (gdb) p/x *connection $2 = {cfg = 0x1d64930, ap_head = 0x1d584a8, ap_tail = 0xbaadf00d, addr = 0xbaadf00d, hostname = 0xbaadf00d, sock = 0xbaadf00d, receiver = 0xbaadf00d, receiver_cls = 0xbaadf00d, write_buffer = 0xbaadf00d, write_buffer_size = 0xbaadf00d, write_buffer_off = 0xbaadf00d, write_buffer_pos = 0xbaadf00d, addrlen = 0xbaadf00d, read_task = 0xbaadf00dbaadf00d, write_task = 0xbaadf00dbaadf00d, dns_active = 0xbaadf00d, nth = {notify_ready = 0xbaadf00d, notify_ready_cls = 0xbaadf00d, connection = 0xbaadf00d, transmit_timeout = {abs_value = 0xbaadf00dbaadf00d}, timeout_task = 0xbaadf00dbaadf00d, notify_size = 0xbaadf00d}, receive_timeout = {abs_value = 0xbaadf00dbaadf00d}, max = 0xbaadf00d, port = 0xf00d, persist = 0xbaad} (gdb) bt #0 0x74fd280d in KERNELBASE!DeleteAce () from C:\Windows\syswow64\KernelBase.dll #1 0x62ac6abc in GNUNET_abort () at common_logging.c:291 #2 0x62afa2ac in GNUNET_SCHEDULER_cancel (task=13451671603782742029) at scheduler.c:918 #3 0x62acbd7f in connect_fail_continuation (connection=0x1d59c78) at connection.c:564 #4 0x62acc653 in connect_probe_continuation (cls=0x1d10440, tc=0x28fc70) at connection.c:662 #5 0x62af95bc in run_ready (rs=0x739fe8, ws=0x73b000) at scheduler.c:602 #6 0x62af9d5e in GNUNET_SCHEDULER_run (task=0x62b060b7 <service_task>, task_cls=0x28fe08) at scheduler.c:790 #7 0x62b06e16 in GNUNET_SERVICE_run (argc=3, argv=0x1d0bce0, service_name=0x41009c <__register_frame_info+4259996> "dht", options=GNUNET_SERVICE_OPTION_NONE, task=0x4015cc <run>, task_cls=0x0) at service.c:1804 #8 0x004017a6 in main (argc=3, argv=0x1d0bce0) at gnunet-service-dht.c:184 | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
related to | 0002495 | closed | Christian Grothoff | Memory corruption in core / client. |
|
Log of dht service prior to the crash is attached - https://gnunet.org/bugs/file_download.php?file_id=553&type=bug . My understanding is that testcase is shutting down, other services are down, and dht can't connect to them anymore, but still tries that (for some reason), even though it did get a GNUNET_SCHEDULER_REASON_SHUTDOWN signal in its parent control handler. |
|
I've triple-checked the task management code in connection.c itself --- I cannot find anything wrong there :-(. The super-high task ID indicates that we're using bad memory (use-after-free, etc.), but connection.c itself seems fine. So it could be memory corruption anywhere in the DHT service :-(. |
|
$ valgrind --trace-children=yes .libs/test_stream_local ==20365== Command: /home/grothoff/bin/gnunet-service-dht -c /tmp/test_stream_localucZR0a/0/config ==20365== ==20365== Jump to the invalid address stated on the next line ==20365== at 0x5D6: ??? ==20365== Address 0x5d6 is not stack'd, malloc'd or (recently) free'd ==20365== ==20365== ==20365== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==20365== Bad permissions for mapped region at address 0x5D6 ==20365== at 0x5D6: ??? ==20365== ==20365== Command: /home/grothoff/bin/gnunet-service-dht -c /tmp/test_stream_localucZR0a/0/config ==20365== ==20365== Jump to the invalid address stated on the next line ==20365== at 0x5D6: ??? ==20365== Address 0x5d6 is not stack'd, malloc'd or (recently) free'd ==20365== ==20365== ==20365== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==20365== Bad permissions for mapped region at address 0x5D6 ==20365== at 0x5D6: ??? ==20365== --- just as funky, could be very much related. |
|
Looks a bit different every time: ==20416== Command: /home/grothoff/bin/gnunet-service-dht -c /tmp/test_stream_localhr2lFR/0/config ==20416== ==20416== Jump to the invalid address stated on the next line ==20416== at 0x5D6: ??? ==20416== by 0x40056C7: open_verify (dl-load.c:1911) ==20416== by 0x4005D57: open_path (dl-load.c:2172) ==20416== by 0x4007D41: _dl_map_object (dl-load.c:2378) ==20416== by 0x400D211: openaux (dl-deps.c:65) ==20416== by 0x400DBD5: _dl_catch_error (dl-error.c:178) ==20416== by 0x400C2DB: _dl_map_object_deps (dl-deps.c:247) ==20416== by 0x40032B6: dl_main (rtld.c:1815) ==20416== by 0x4014DDD: _dl_sysdep_start (dl-sysdep.c:244) ==20416== by 0x4001422: _dl_start (rtld.c:342) ==20416== by 0x4000AF7: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so) ==20416== by 0x2: ??? ==20416== Address 0x5d6 is not stack'd, malloc'd or (recently) free'd ==20416== ==20416== ==20416== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==20416== Bad permissions for mapped region at address 0x5D6 ==20416== at 0x5D6: ??? ==20416== by 0x40056C7: open_verify (dl-load.c:1911) ==20416== by 0x4005D57: open_path (dl-load.c:2172) ==20416== by 0x4007D41: _dl_map_object (dl-load.c:2378) ==20416== by 0x400D211: openaux (dl-deps.c:65) ==20416== by 0x400DBD5: _dl_catch_error (dl-error.c:178) ==20416== by 0x400C2DB: _dl_map_object_deps (dl-deps.c:247) ==20416== by 0x40032B6: dl_main (rtld.c:1815) ==20416== by 0x4014DDD: _dl_sysdep_start (dl-sysdep.c:244) ==20416== by 0x4001422: _dl_start (rtld.c:342) ==20416== by 0x4000AF7: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so) ==20416== by 0x2: ??? |
|
Very timing dependent -- if test_stream_local is run without valgrind on *everything*, the bug doesn't appear (because the dht-service isn't even started!). |
|
yes, it depends on timing. If i run it under full debug logging, dht service just hangs without crashing. |
|
Diagnosis from LRN: (02:39:50 PM) LRN: connect_fail_continuation() calls signal_receive_error(), which calls the receiver callback, which destroys the connection completely. Then the control goes back to connect_fail_continuation(), which then proceeds to cancel connection's nth.timeout_task and call, and even calls signal_transmit_error() again (but it never reaches that, crashes on cancelling) (02:40:32 PM) LRN: so the problem is either that the receiver callback kills the connection OR that connection calls the receiver first, and does cancellation second. (02:41:36 PM) LRN: ah, wait, i'm confusing signal_receive_error() and signal_transmit_error() (02:42:20 PM) LRN: ok, it doesn't matter. But then the problem is definitely in receiver killing the connection (03:12:33 PM) LRN: receive_handler is process_ats_message, if gdb should be beleived Possibly fixed in SVN 24225. |
|
Fixed in r24222, r24223, r24225. |
Date Modified | Username | Field | Change |
---|---|---|---|
2012-09-30 18:46 | LRN | New Issue | |
2012-09-30 20:16 | LRN | File Added: dht-12860.log | |
2012-09-30 20:16 | LRN | Note Added: 0006393 | |
2012-09-30 20:16 | LRN | Note Edited: 0006393 | |
2012-10-06 16:04 | Christian Grothoff | Note Added: 0006404 | |
2012-10-06 16:06 | Christian Grothoff | Note Added: 0006405 | |
2012-10-06 16:06 | Christian Grothoff | Note Edited: 0006405 | |
2012-10-06 16:08 | Christian Grothoff | Note Added: 0006406 | |
2012-10-06 16:11 | Christian Grothoff | Note Added: 0006407 | |
2012-10-07 02:54 | LRN | Note Added: 0006415 | |
2012-10-07 12:51 | Christian Grothoff | Priority | normal => high |
2012-10-07 12:51 | Christian Grothoff | Status | new => confirmed |
2012-10-07 12:51 | Christian Grothoff | Product Version | => Git master |
2012-10-08 15:31 | Christian Grothoff | Note Added: 0006423 | |
2012-10-08 16:12 | LRN | Note Added: 0006424 | |
2012-10-08 16:18 | Christian Grothoff | Status | confirmed => resolved |
2012-10-08 16:18 | Christian Grothoff | Fixed in Version | => 0.9.4 |
2012-10-08 16:18 | Christian Grothoff | Resolution | open => fixed |
2012-10-08 16:18 | Christian Grothoff | Assigned To | => LRN |
2012-10-08 23:39 | Christian Grothoff | Relationship added | related to 0002495 |
2012-11-05 17:50 | Christian Grothoff | Target Version | => 0.9.4 |
2012-11-05 18:33 | Christian Grothoff | Status | resolved => closed |