View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003615 | GNUnet | file-sharing service | public | 2015-01-19 19:08 | 2018-06-07 00:25 |
Reporter | amatus | Assigned To | Christian Grothoff | ||
Priority | normal | Severity | crash | Reproducibility | have not tried |
Status | closed | Resolution | fixed | ||
Product Version | Git master | ||||
Target Version | 0.11.0pre66 | Fixed in Version | 0.11.0pre66 | ||
Summary | 0003615: assertion failed in schedule_transmission, GNUNET_assert (NULL != cp->cth); | ||||
Description | My gnunet-service-fs from rev 34880 crashed and dumped core. Backtrace and some inspection are below. | ||||
Additional Information | (gdb) bt full #0 0xb752cddc in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 resultvar = <optimized out> resultvar = <optimized out> pid = -1217998848 selftid = 11624 #1 0xb752e463 in __GI_abort () at abort.c:89 save_stage = 2 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 0, 544104778, 3078479872, 3077870288, 1, 3218252592, 3078405279, 3077870728, 3073032928, 1, 1, 0, 0, 1, 1, 3218252564, 3077819504, 3077347628, 3077337644, 3218252580, 134632792, 134516168, 3075494164, 3218252728, 3077409258, 134591004, 3077817652, 32784, 0, 3218252728, 3078430752}}, sa_flags = 0, sa_restorer = 0xb752e320 <__GI_abort>} sigs = {__val = {32, 0 <repeats 31 times>}} #2 0xb76d8024 in GNUNET_abort () at common_logging.c:289 No locals. #3 0x0804ccf6 in schedule_transmission (pth=pth@entry=0xcc23fa8) at gnunet-service-fs_cp.c:414 cp = <optimized out> target = {public_key = { q_y = "9=\256\204\004T\256\346\025\vVU?\331y\214\265\373\361\201\274\\`\216\202}\222\334\002\374\336-"}} __FUNCTION__ = "schedule_transmission" #4 0x0804e8df in GSF_peer_transmit_ (cp=0xcb53750, is_query=0, priority=0, timeout=..., size=32784, gmc=0x8058420 <transmit_message>, gmc_cls=0xbcd6f00) at gnunet-service-fs_cp.c:1428 pth = 0xcc23fa8 pos = <optimized out> prev = <optimized out> __FUNCTION__ = "GSF_peer_transmit_" #5 0x08057d3f in transmit_content (peer=peer@entry=0xbcd6f00, block=block@entry=0xd345300) at gnunet-service-fs_push.c:301 msize = 32784 msg = 0x12406800 i = <optimized out> ppd = 0xcb53750 ret = 0 __FUNCTION__ = "transmit_content" #6 0x08057fde in process_migration_content (cls=0x0, key=0xbfd2a580, size=32768, data=0xbfd2a5c0, type=GNUNET_BLOCK_TYPE_FS_DBLOCK, priority=0, anonymity=1, expiration=..., uid=5775) at gnunet-service-fs_push.c:559 mb = 0xd345300 pos = 0xbcd6f00 __FUNCTION__ = "process_migration_content" #7 0xb7768b81 in process_result_message (cls=0x9c2a150, msg=0xbfd2a550) at datastore_api.c:1260 h = 0x9c2a150 qe = <optimized out> rc = {proc = 0x8057df0 <process_migration_content>, proc_cls = 0x0} dm = 0xbfd2a550 was_transmitted = <optimized out> __FUNCTION__ = "process_result_message" #8 0xb7768e45 in receive_cb (cls=0x9c2a150, msg=0xbfd2a550) at datastore_api.c:592 h = 0x9c2a150 qe = <optimized out> #9 0xb76d3ea5 in receive_task (cls=0x9c1aff8, tc=0xbfd32674) at client.c:595 client = 0x9c1aff8 handler = 0xb7768e10 <receive_cb> cmsg = 0xc7d9638 handler_cls = 0x9c2a150 mbuf = 0xbfd2a550 "\200p" msg = 0xbfd2a550 __FUNCTION__ = "receive_task" #10 0xb77055fe in run_ready (ws=<optimized out>, rs=<optimized out>) at scheduler.c:587 p = GNUNET_SCHEDULER_PRIORITY_DEFAULT pos = 0x1cad39c8 tc = {reason = GNUNET_SCHEDULER_REASON_TIMEOUT, read_ready = 0x9c29c38, write_ready = 0x9c29cc0} #11 GNUNET_SCHEDULER_run (task=0xb770b5d0 <service_task>, task_cls=0xbfd32774) at scheduler.c:816 timeout = {rel_value_us = 25769803776} ret = 0 c = -73 '\267' __FUNCTION__ = "GNUNET_SCHEDULER_run" #12 0xb770e839 in GNUNET_SERVICE_run (argc=3, argv=0xbfd32984, service_name=0x805b254 "fs", options=GNUNET_SERVICE_OPTION_NONE, task=0x0, task_cls=0x0) at service.c:1502 (gdb) p *pth->cp $3 = {ppd = {atsi = 0x1330c3e0, last_client_replies = {0x148ceee8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, last_p2p_replies = {0, 0, 0, 0, 0, 0, 0, 0}, avg_reply_delay = {rel_value_us = 0}, migration_delay = { rel_value_us = 0}, migration_blocked_until = { abs_value_us = 1421431868190762}, last_request_times = {{ abs_value_us = 1421456435846699}, {abs_value_us = 1421456435847009}, { abs_value_us = 1421456435847390}, {abs_value_us = 1421456435847617}, { abs_value_us = 1421456485048354}, {abs_value_us = 1421456485048693}, { abs_value_us = 1421456485048977}, {abs_value_us = 1421456485049335}, { abs_value_us = 1421456485049671}, {abs_value_us = 1421456485049904}, { abs_value_us = 1421456485050116}, {abs_value_us = 1421456485050327}, { abs_value_us = 1421456485050537}, {abs_value_us = 1421456435846095}, { abs_value_us = 1421456435846301}, {abs_value_us = 1421456435846500}}, transmission_delay = 0xbf71e38, avg_priority = 0, pid = 2, respect = 0, pending_queries = 0, pending_replies = 2}, last_migration_block = { abs_value_us = 1421176954366197}, mig_revive_task = 0x0, pth_head = 0xcc23fa8, pth_tail = 0xcc23fa8, delayed_head = 0x0, delayed_tail = 0x0, migration_pth = 0x0, rc = 0x0, rc_delay_task = 0x0, request_map = 0x0, cth = 0x0, inc_preference = 0, cth_in_progress = 1, disk_respect = 0, last_p2p_replies_woff = 0, last_client_replies_woff = 0, last_request_times_off = 6205, did_reserve = 1, creation_cb = 0x804b780 <connected_peer_cb>, creation_cb_cls = 0x0, respect_iterate_req = 0x158} | ||||
Tags | No tags attached. | ||||
related to | 0003699 | acknowledged | assertion failed in GNUNET_PEER_change_rc: id < size |
|
This is totally bizarre. The assertion is "pid != 0", and your own stack trace shows that "pid = 2". Why would it then fail!? Hardware problem? Compiler issue? |
|
I hit this again with the latest code (rev 35298) and again gdb showed pid != 0. I looked at the disassembly and the actual code looks correct so I think we can assume gdb is confused when it tries to print out the current value of pid. The backtrace should be accurate though. |
|
Maybe compiling with -O0 would be helpful so we can see what's really going on? |
|
Now we're getting somewhere: Program terminated with signal SIGABRT, Aborted. #0 0xb751dddc in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0xb751dddc in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0xb751f463 in __GI_abort () at abort.c:89 #2 0xb76c8a25 in GNUNET_abort_ () at common_logging.c:289 #3 0x0804cc91 in schedule_transmission (pth=0x1080b2c0) at gnunet-service-fs_cp.c:447 #4 0x0804f9b1 in GSF_peer_transmit_ (cp=0xa2f09f0, is_query=0, priority=0, timeout=..., size=29456, gmc=0x805a043 <transmit_message>, gmc_cls=0xb92a8b0) at gnunet-service-fs_cp.c:1429 #5 0x0805a3af in transmit_content (peer=0xb92a8b0, block=0xbb43370) at gnunet-service-fs_push.c:301 #6 0x0805abcd in process_migration_content (cls=0x0, key=0xbfa47400, size=29440, data=0xbfa47440, type=GNUNET_BLOCK_TYPE_FS_IBLOCK, priority=0, anonymity=1, expiration=..., uid=33832) at gnunet-service-fs_push.c:559 #7 0xb774c3dc in process_result_message (cls=0x921b158, msg=0xbfa473d0) at datastore_api.c:1260 #8 0xb774a6dc in receive_cb (cls=0x921b158, msg=0xbfa473d0) at datastore_api.c:592 #9 0xb76c4a42 in receive_task (cls=0x920bf90, tc=0xbfa4e7a4) at client.c:618 #10 0xb76fc3f0 in run_ready (rs=0x921ac68, ws=0x920b8c8) at scheduler.c:587 #11 0xb76fcc91 in GNUNET_SCHEDULER_run (task=0xb77077f5 <service_task>, task_cls=0xbfa4e9d0) at scheduler.c:816 #12 0xb7709393 in GNUNET_SERVICE_run (argc=3, argv=0xbfa4ebb4, service_name=0x805e43b "fs", options=GNUNET_SERVICE_OPTION_NONE, task=0x804c692 <run>, task_cls=0x0) at service.c:1503 #13 0x0804c8e4 in main (argc=3, argv=0xbfa4ebb4) at gnunet-service-fs.c:827 (gdb) p *pth $1 = {next = 0x0, prev = 0x0, transmission_request_start_time = { abs_value_us = 1425304033489875}, timeout = { abs_value_us = 18446744073709551615}, timeout_task = 0xc67ad28, gmc = 0x805a043 <transmit_message>, cp = 0xa2f09f0, gmc_cls = 0xb92a8b0, size = 29456, is_query = 0, was_reserved = 0, priority = 0} (gdb) p *cp $2 = {ppd = {last_client_replies = {0xd4b8868, 0xa6d8a78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, last_p2p_replies = {0, 0, 0, 0, 0, 0, 0, 0}, avg_reply_delay = {rel_value_us = 0}, migration_delay = { rel_value_us = 0}, migration_blocked_until = { abs_value_us = 1425303247771586}, last_request_times = {{ abs_value_us = 1425304025260973}, {abs_value_us = 1425304025261348}, { abs_value_us = 1425304025261703}, {abs_value_us = 1425304030754220}, { abs_value_us = 1425304030754877}, {abs_value_us = 1425304030757134}, { abs_value_us = 1425304030757423}, {abs_value_us = 1425304030757672}, { abs_value_us = 1425304030757964}, {abs_value_us = 1425304030758249}, { abs_value_us = 1425304030758542}, {abs_value_us = 1425304031137648}, { abs_value_us = 1425304022642251}, {abs_value_us = 1425304022642515}, { abs_value_us = 1425304025260096}, {abs_value_us = 1425304025260600}}, transmission_delay = 0xfa5ae18, avg_priority = 0, pid = 2, respect = 0, pending_queries = 0, pending_replies = 2}, last_migration_block = { abs_value_us = 0}, mig_revive_task = 0x0, pth_head = 0x1080b2c0, pth_tail = 0x1080b2c0, delayed_head = 0x0, delayed_tail = 0x0, migration_pth = 0x0, rc = 0x0, rc_delay_task = 0x0, request_map = 0x0, cth = 0x0, inc_preference = 0, cth_in_progress = 1, disk_respect = 0, last_p2p_replies_woff = 0, last_client_replies_woff = 0, last_request_times_off = 6428, did_reserve = 1, creation_cb = 0x804c24e <connected_peer_cb>, creation_cb_cls = 0x0, respect_iterate_req = 0x0} |
|
I forgot to mention, that is from a peer at rev 35318. |
|
fs logs from rev 35341: Mar 05 16:59:16-515206 fs-5478 ERROR Assertion failed at core_api.c:1344. Mar 05 16:59:16-515323 fs-5478 ERROR Assertion failed at gnunet-service-fs_cp.c:455. |
|
Sending gnunet-service-fs_cp.c Sending gnunet-service-fs_push.c Transmitting file data .. Committed revision 35767. This was a fix related to 0003615: we might (theoretically) get a migration stop before we finished the peerstore lookup; in that case, we would do a GSF_push_start() twice (once via revive migration, and once from peerstore lookup succeeding), and then on disconnect only stop once, keeping the migration going for a disconnected peer. That still does not explain the bug report, as the '*cp' in the report doesn't show BADFOOD, and we'd expect a free'd cp in this case to be used for the transmission. Still, this is a possible issue which can be fixed by ignoring migration stop messages until after peerstore lookup succeeded. |
|
Summary of what is known about the bug: FS content migration decides to migrate content to some other peer. CORE says we're not connected to that peer, we (core_api.c:1344 assertion). FS dies (unexpected failure to transmit), at fs_cp:455. The "_push" module is usually notified when a peer connects / disconnects, adding / removing it from the list of valid migration targets. It is also notified when the other peer wants us to stop migrating data to it (temporarily). After SVN 35767, that logic seems robust. But 35767 ought to not have been the cause, as we don't see the BADFOOD from a GNUNET_free() in the trace, unless that was disabled (amatus?). Additional assertions were added to prevent a peer from being added twice to the push list in the future. Given those changes, I would like to know if the bug is still reproduceable. |
|
I don't think I disabled BADFOOD, but maybe this memory was allocated again in the mean time. |
|
That's a quite plausible explanation, please let me know if my patch fixes it... |
|
It has not reoccurred. |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-01-19 19:08 | amatus | New Issue | |
2015-02-01 13:16 | Christian Grothoff | Note Added: 0008832 | |
2015-02-01 13:16 | Christian Grothoff | Assigned To | => Christian Grothoff |
2015-02-01 13:16 | Christian Grothoff | Status | new => feedback |
2015-02-23 19:05 | amatus | Note Added: 0008923 | |
2015-02-23 19:05 | amatus | Status | feedback => assigned |
2015-02-28 14:45 | Christian Grothoff | Note Added: 0008933 | |
2015-02-28 18:35 | Christian Grothoff | Status | assigned => feedback |
2015-03-02 17:04 | amatus | Note Added: 0008961 | |
2015-03-02 17:04 | amatus | Status | feedback => assigned |
2015-03-02 17:05 | amatus | Note Added: 0008962 | |
2015-03-02 17:42 | amatus | Relationship added | related to 0003699 |
2015-03-05 19:38 | amatus | Note Added: 0008981 | |
2015-03-05 19:38 | amatus | Summary | assertion failed in schedule_transmission, 0 != cp->ppd.pid => assertion failed in schedule_transmission, GNUNET_assert (NULL != cp->cth); |
2015-03-05 19:39 | amatus | Note Edited: 0008981 | |
2015-03-05 19:40 | amatus | Note Edited: 0008981 | |
2015-03-21 00:36 | Christian Grothoff | Target Version | => 0.11.0 |
2015-05-21 12:08 | Christian Grothoff | Note Added: 0009147 | |
2015-05-21 13:42 | Christian Grothoff | Note Added: 0009148 | |
2015-05-21 13:42 | Christian Grothoff | Status | assigned => feedback |
2015-05-21 13:43 | Christian Grothoff | Assigned To | Christian Grothoff => amatus |
2015-05-21 18:22 | amatus | Note Added: 0009149 | |
2015-05-21 18:22 | amatus | Status | feedback => assigned |
2015-05-21 18:50 | Christian Grothoff | Note Added: 0009150 | |
2015-06-19 02:44 | amatus | Note Added: 0009298 | |
2015-06-19 06:50 | Christian Grothoff | Status | assigned => resolved |
2015-06-19 06:50 | Christian Grothoff | Fixed in Version | => 0.11.0pre66 |
2015-06-19 06:50 | Christian Grothoff | Resolution | open => fixed |
2015-06-19 06:50 | Christian Grothoff | Assigned To | amatus => Christian Grothoff |
2015-06-19 06:50 | Christian Grothoff | Target Version | 0.11.0 => 0.11.0pre66 |
2018-06-07 00:25 | Christian Grothoff | Status | resolved => closed |