View Issue Details

IDProjectCategoryView StatusLast Update
0003615GNUnetfile-sharing servicepublic2018-06-07 00:25
Reporteramatus Assigned ToChristian Grothoff  
PrioritynormalSeveritycrashReproducibilityhave not tried
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0003615: assertion failed in schedule_transmission, GNUNET_assert (NULL != cp->cth);
DescriptionMy gnunet-service-fs from rev 34880 crashed and dumped core. Backtrace and some inspection are below.
Additional Information(gdb) bt full
#0 0xb752cddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = <optimized out>
        resultvar = <optimized out>
        pid = -1217998848
        selftid = 11624
#1 0xb752e463 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
          sa_mask = {__val = {0, 0, 544104778, 3078479872, 3077870288, 1,
              3218252592, 3078405279, 3077870728, 3073032928, 1, 1, 0, 0, 1,
              1, 3218252564, 3077819504, 3077347628, 3077337644, 3218252580,
              134632792, 134516168, 3075494164, 3218252728, 3077409258,
              134591004, 3077817652, 32784, 0, 3218252728, 3078430752}},
          sa_flags = 0, sa_restorer = 0xb752e320 <__GI_abort>}
        sigs = {__val = {32, 0 <repeats 31 times>}}
#2 0xb76d8024 in GNUNET_abort () at common_logging.c:289
No locals.
#3 0x0804ccf6 in schedule_transmission (pth=pth@entry=0xcc23fa8)
    at gnunet-service-fs_cp.c:414
        cp = <optimized out>
        target = {public_key = {
            q_y = "9=\256\204\004T\256\346\025\vVU?\331y\214\265\373\361\201\274\\`\216\202}\222\334\002\374\336-"}}
        __FUNCTION__ = "schedule_transmission"
#4 0x0804e8df in GSF_peer_transmit_ (cp=0xcb53750, is_query=0, priority=0,
    timeout=..., size=32784, gmc=0x8058420 <transmit_message>,
    gmc_cls=0xbcd6f00) at gnunet-service-fs_cp.c:1428
        pth = 0xcc23fa8
        pos = <optimized out>
        prev = <optimized out>
        __FUNCTION__ = "GSF_peer_transmit_"
#5 0x08057d3f in transmit_content (peer=peer@entry=0xbcd6f00,
    block=block@entry=0xd345300) at gnunet-service-fs_push.c:301
        msize = 32784
        msg = 0x12406800
        i = <optimized out>
        ppd = 0xcb53750
        ret = 0
        __FUNCTION__ = "transmit_content"
#6 0x08057fde in process_migration_content (cls=0x0, key=0xbfd2a580,
    size=32768, data=0xbfd2a5c0, type=GNUNET_BLOCK_TYPE_FS_DBLOCK, priority=0,
    anonymity=1, expiration=..., uid=5775) at gnunet-service-fs_push.c:559
        mb = 0xd345300
        pos = 0xbcd6f00
        __FUNCTION__ = "process_migration_content"
#7 0xb7768b81 in process_result_message (cls=0x9c2a150, msg=0xbfd2a550)
    at datastore_api.c:1260
        h = 0x9c2a150
        qe = <optimized out>
        rc = {proc = 0x8057df0 <process_migration_content>, proc_cls = 0x0}
        dm = 0xbfd2a550
        was_transmitted = <optimized out>
        __FUNCTION__ = "process_result_message"
#8 0xb7768e45 in receive_cb (cls=0x9c2a150, msg=0xbfd2a550)
    at datastore_api.c:592
        h = 0x9c2a150
        qe = <optimized out>
#9 0xb76d3ea5 in receive_task (cls=0x9c1aff8, tc=0xbfd32674) at client.c:595
        client = 0x9c1aff8
        handler = 0xb7768e10 <receive_cb>
        cmsg = 0xc7d9638
        handler_cls = 0x9c2a150
        mbuf = 0xbfd2a550 "\200p"
        msg = 0xbfd2a550
        __FUNCTION__ = "receive_task"
#10 0xb77055fe in run_ready (ws=<optimized out>, rs=<optimized out>)
    at scheduler.c:587
        p = GNUNET_SCHEDULER_PRIORITY_DEFAULT
        pos = 0x1cad39c8
        tc = {reason = GNUNET_SCHEDULER_REASON_TIMEOUT,
          read_ready = 0x9c29c38, write_ready = 0x9c29cc0}
#11 GNUNET_SCHEDULER_run (task=0xb770b5d0 <service_task>, task_cls=0xbfd32774)
    at scheduler.c:816
        timeout = {rel_value_us = 25769803776}
        ret = 0
        c = -73 '\267'
        __FUNCTION__ = "GNUNET_SCHEDULER_run"
#12 0xb770e839 in GNUNET_SERVICE_run (argc=3, argv=0xbfd32984,
    service_name=0x805b254 "fs", options=GNUNET_SERVICE_OPTION_NONE, task=0x0,
    task_cls=0x0) at service.c:1502




(gdb) p *pth->cp
$3 = {ppd = {atsi = 0x1330c3e0, last_client_replies = {0x148ceee8, 0x0, 0x0,
      0x0, 0x0, 0x0, 0x0, 0x0}, last_p2p_replies = {0, 0, 0, 0, 0, 0, 0, 0},
    avg_reply_delay = {rel_value_us = 0}, migration_delay = {
      rel_value_us = 0}, migration_blocked_until = {
      abs_value_us = 1421431868190762}, last_request_times = {{
        abs_value_us = 1421456435846699}, {abs_value_us = 1421456435847009}, {
        abs_value_us = 1421456435847390}, {abs_value_us = 1421456435847617}, {
        abs_value_us = 1421456485048354}, {abs_value_us = 1421456485048693}, {
        abs_value_us = 1421456485048977}, {abs_value_us = 1421456485049335}, {
        abs_value_us = 1421456485049671}, {abs_value_us = 1421456485049904}, {
        abs_value_us = 1421456485050116}, {abs_value_us = 1421456485050327}, {
        abs_value_us = 1421456485050537}, {abs_value_us = 1421456435846095}, {
        abs_value_us = 1421456435846301}, {abs_value_us = 1421456435846500}},
    transmission_delay = 0xbf71e38, avg_priority = 0, pid = 2, respect = 0,
    pending_queries = 0, pending_replies = 2}, last_migration_block = {
    abs_value_us = 1421176954366197}, mig_revive_task = 0x0,
  pth_head = 0xcc23fa8, pth_tail = 0xcc23fa8, delayed_head = 0x0,
  delayed_tail = 0x0, migration_pth = 0x0, rc = 0x0, rc_delay_task = 0x0,
  request_map = 0x0, cth = 0x0, inc_preference = 0, cth_in_progress = 1,
  disk_respect = 0, last_p2p_replies_woff = 0, last_client_replies_woff = 0,
  last_request_times_off = 6205, did_reserve = 1,
  creation_cb = 0x804b780 <connected_peer_cb>, creation_cb_cls = 0x0,
  respect_iterate_req = 0x158}

TagsNo tags attached.

Relationships

related to 0003699 acknowledged assertion failed in GNUNET_PEER_change_rc: id < size 

Activities

Christian Grothoff

2015-02-01 13:16

manager   ~0008832

This is totally bizarre. The assertion is "pid != 0", and your own stack trace shows that "pid = 2". Why would it then fail!? Hardware problem? Compiler issue?

amatus

2015-02-23 19:05

developer   ~0008923

I hit this again with the latest code (rev 35298) and again gdb showed pid != 0. I looked at the disassembly and the actual code looks correct so I think we can assume gdb is confused when it tries to print out the current value of pid. The backtrace should be accurate though.

Christian Grothoff

2015-02-28 14:45

manager   ~0008933

Maybe compiling with -O0 would be helpful so we can see what's really going on?

amatus

2015-03-02 17:04

developer   ~0008961

Now we're getting somewhere:

Program terminated with signal SIGABRT, Aborted.
#0 0xb751dddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0xb751dddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0xb751f463 in __GI_abort () at abort.c:89
#2 0xb76c8a25 in GNUNET_abort_ () at common_logging.c:289
#3 0x0804cc91 in schedule_transmission (pth=0x1080b2c0)
    at gnunet-service-fs_cp.c:447
#4 0x0804f9b1 in GSF_peer_transmit_ (cp=0xa2f09f0, is_query=0, priority=0,
    timeout=..., size=29456, gmc=0x805a043 <transmit_message>,
    gmc_cls=0xb92a8b0) at gnunet-service-fs_cp.c:1429
#5 0x0805a3af in transmit_content (peer=0xb92a8b0, block=0xbb43370)
    at gnunet-service-fs_push.c:301
#6 0x0805abcd in process_migration_content (cls=0x0, key=0xbfa47400,
    size=29440, data=0xbfa47440, type=GNUNET_BLOCK_TYPE_FS_IBLOCK, priority=0,
    anonymity=1, expiration=..., uid=33832) at gnunet-service-fs_push.c:559
#7 0xb774c3dc in process_result_message (cls=0x921b158, msg=0xbfa473d0)
    at datastore_api.c:1260
#8 0xb774a6dc in receive_cb (cls=0x921b158, msg=0xbfa473d0)
    at datastore_api.c:592
#9 0xb76c4a42 in receive_task (cls=0x920bf90, tc=0xbfa4e7a4) at client.c:618
#10 0xb76fc3f0 in run_ready (rs=0x921ac68, ws=0x920b8c8) at scheduler.c:587
#11 0xb76fcc91 in GNUNET_SCHEDULER_run (task=0xb77077f5 <service_task>,
    task_cls=0xbfa4e9d0) at scheduler.c:816
#12 0xb7709393 in GNUNET_SERVICE_run (argc=3, argv=0xbfa4ebb4,
    service_name=0x805e43b "fs", options=GNUNET_SERVICE_OPTION_NONE,
    task=0x804c692 <run>, task_cls=0x0) at service.c:1503
#13 0x0804c8e4 in main (argc=3, argv=0xbfa4ebb4) at gnunet-service-fs.c:827
(gdb) p *pth
$1 = {next = 0x0, prev = 0x0, transmission_request_start_time = {
    abs_value_us = 1425304033489875}, timeout = {
    abs_value_us = 18446744073709551615}, timeout_task = 0xc67ad28,
  gmc = 0x805a043 <transmit_message>, cp = 0xa2f09f0, gmc_cls = 0xb92a8b0,
  size = 29456, is_query = 0, was_reserved = 0, priority = 0}
(gdb) p *cp
$2 = {ppd = {last_client_replies = {0xd4b8868, 0xa6d8a78, 0x0, 0x0, 0x0, 0x0,
      0x0, 0x0}, last_p2p_replies = {0, 0, 0, 0, 0, 0, 0, 0},
    avg_reply_delay = {rel_value_us = 0}, migration_delay = {
      rel_value_us = 0}, migration_blocked_until = {
      abs_value_us = 1425303247771586}, last_request_times = {{
        abs_value_us = 1425304025260973}, {abs_value_us = 1425304025261348}, {
        abs_value_us = 1425304025261703}, {abs_value_us = 1425304030754220}, {
        abs_value_us = 1425304030754877}, {abs_value_us = 1425304030757134}, {
        abs_value_us = 1425304030757423}, {abs_value_us = 1425304030757672}, {
        abs_value_us = 1425304030757964}, {abs_value_us = 1425304030758249}, {
        abs_value_us = 1425304030758542}, {abs_value_us = 1425304031137648}, {
        abs_value_us = 1425304022642251}, {abs_value_us = 1425304022642515}, {
        abs_value_us = 1425304025260096}, {abs_value_us = 1425304025260600}},
    transmission_delay = 0xfa5ae18, avg_priority = 0, pid = 2, respect = 0,
    pending_queries = 0, pending_replies = 2}, last_migration_block = {
    abs_value_us = 0}, mig_revive_task = 0x0, pth_head = 0x1080b2c0,
  pth_tail = 0x1080b2c0, delayed_head = 0x0, delayed_tail = 0x0,
  migration_pth = 0x0, rc = 0x0, rc_delay_task = 0x0, request_map = 0x0,
  cth = 0x0, inc_preference = 0, cth_in_progress = 1, disk_respect = 0,
  last_p2p_replies_woff = 0, last_client_replies_woff = 0,
  last_request_times_off = 6428, did_reserve = 1,
  creation_cb = 0x804c24e <connected_peer_cb>, creation_cb_cls = 0x0,
  respect_iterate_req = 0x0}

amatus

2015-03-02 17:05

developer   ~0008962

I forgot to mention, that is from a peer at rev 35318.

amatus

2015-03-05 19:38

developer   ~0008981

Last edited: 2015-03-05 19:40

fs logs from rev 35341:

Mar 05 16:59:16-515206 fs-5478 ERROR Assertion failed at core_api.c:1344.
Mar 05 16:59:16-515323 fs-5478 ERROR Assertion failed at gnunet-service-fs_cp.c:455.

Christian Grothoff

2015-05-21 12:08

manager   ~0009147

Sending gnunet-service-fs_cp.c
Sending gnunet-service-fs_push.c
Transmitting file data ..
Committed revision 35767.

This was a fix related to 0003615: we might (theoretically) get a migration stop before we finished the peerstore lookup; in that case, we would do a GSF_push_start() twice (once via revive migration, and once from peerstore lookup succeeding), and then on disconnect only stop once, keeping the migration going for a disconnected peer. That still does not explain the bug report, as the '*cp' in the report doesn't show BADFOOD, and we'd expect a free'd cp in this case to be used for the transmission. Still, this is a possible issue which can be fixed by ignoring migration stop messages until after peerstore lookup succeeded.

Christian Grothoff

2015-05-21 13:42

manager   ~0009148

Summary of what is known about the bug:

FS content migration decides to migrate content to some other peer.
CORE says we're not connected to that peer, we (core_api.c:1344 assertion).
FS dies (unexpected failure to transmit), at fs_cp:455.

The "_push" module is usually notified when a peer connects / disconnects, adding / removing it from the list of valid migration targets. It is also notified when the other peer wants us to stop migrating data to it (temporarily). After SVN 35767, that logic seems robust. But 35767 ought to not have been the cause, as we don't see the BADFOOD from a GNUNET_free() in the trace, unless that was disabled (amatus?).

Additional assertions were added to prevent a peer from being added twice to the push list in the future. Given those changes, I would like to know if the bug is still reproduceable.

amatus

2015-05-21 18:22

developer   ~0009149

I don't think I disabled BADFOOD, but maybe this memory was allocated again in the mean time.

Christian Grothoff

2015-05-21 18:50

manager   ~0009150

That's a quite plausible explanation, please let me know if my patch fixes it...

amatus

2015-06-19 02:44

developer   ~0009298

It has not reoccurred.

Issue History

Date Modified Username Field Change
2015-01-19 19:08 amatus New Issue
2015-02-01 13:16 Christian Grothoff Note Added: 0008832
2015-02-01 13:16 Christian Grothoff Assigned To => Christian Grothoff
2015-02-01 13:16 Christian Grothoff Status new => feedback
2015-02-23 19:05 amatus Note Added: 0008923
2015-02-23 19:05 amatus Status feedback => assigned
2015-02-28 14:45 Christian Grothoff Note Added: 0008933
2015-02-28 18:35 Christian Grothoff Status assigned => feedback
2015-03-02 17:04 amatus Note Added: 0008961
2015-03-02 17:04 amatus Status feedback => assigned
2015-03-02 17:05 amatus Note Added: 0008962
2015-03-02 17:42 amatus Relationship added related to 0003699
2015-03-05 19:38 amatus Note Added: 0008981
2015-03-05 19:38 amatus Summary assertion failed in schedule_transmission, 0 != cp->ppd.pid => assertion failed in schedule_transmission, GNUNET_assert (NULL != cp->cth);
2015-03-05 19:39 amatus Note Edited: 0008981
2015-03-05 19:40 amatus Note Edited: 0008981
2015-03-21 00:36 Christian Grothoff Target Version => 0.11.0
2015-05-21 12:08 Christian Grothoff Note Added: 0009147
2015-05-21 13:42 Christian Grothoff Note Added: 0009148
2015-05-21 13:42 Christian Grothoff Status assigned => feedback
2015-05-21 13:43 Christian Grothoff Assigned To Christian Grothoff => amatus
2015-05-21 18:22 amatus Note Added: 0009149
2015-05-21 18:22 amatus Status feedback => assigned
2015-05-21 18:50 Christian Grothoff Note Added: 0009150
2015-06-19 02:44 amatus Note Added: 0009298
2015-06-19 06:50 Christian Grothoff Status assigned => resolved
2015-06-19 06:50 Christian Grothoff Fixed in Version => 0.11.0pre66
2015-06-19 06:50 Christian Grothoff Resolution open => fixed
2015-06-19 06:50 Christian Grothoff Assigned To amatus => Christian Grothoff
2015-06-19 06:50 Christian Grothoff Target Version 0.11.0 => 0.11.0pre66
2018-06-07 00:25 Christian Grothoff Status resolved => closed