View Issue Details

IDProjectCategoryView StatusLast Update
0004909GNUnetcadet servicepublic2018-06-07 00:24
ReporterChristian GrothoffAssigned Toamatus 
Status closedResolutionfixed 
Platformi7OSDebian GNU/LinuxOS Versionsqueeze
Product VersionSVN HEAD 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0004909: GCPP_del_connection () invariant assertion (still) fails
DescriptionI've seen (and investigated!) this before, couldn't find it. It seems to happen 'rarely', don't have a deterministic way to reproduce yet. What is known is that the path is always too short to contain that connection at the specified 'off', OTOH whenever we shrink the path length we do check that we have no connection attached anymore :-(.

#0 0xb773dd40 in __kernel_vsyscall ()
#1 0xb6efbdc0 in raise () from /lib/i386-linux-gnu/
#2 0xb6efd287 in abort () from /lib/i386-linux-gnu/
#3 0xb765081b in GNUNET_abort_ () at common_logging.c:293
#4 0xb779d85b in GCPP_del_connection (path=0xb85e9eb8, off=1, cc=0xb85f9848) at gnunet-service-cadet-new_paths.c:182
#5 0xb7783f17 in GCC_destroy (cc=0xb85f9848) at gnunet-service-cadet-new_connection.c:244
#6 0xb77842f6 in GCC_destroy_without_tunnel (cc=0xb85f9848) at gnunet-service-cadet-new_connection.c:305
#7 0xb7795c90 in destroy_t_connection (cls=0xb85ea3c8, ct=0xb85e8578) at gnunet-service-cadet-new_tunnels.c:1997
#8 0xb779cb18 in GCT_iterate_connections (t=0xb85ea3c8, iter=0xb7795b6a <destroy_t_connection>, iter_cls=0xb85ea3c8) at gnunet-service-cadet-new_tunnels.c:3287
#9 0xb7795e2a in destroy_tunnel (cls=0xb85ea3c8) at gnunet-service-cadet-new_tunnels.c:2017
#10 0xb77969e7 in GCT_destroy_tunnel_now (t=0xb85ea3c8) at gnunet-service-cadet-new_tunnels.c:2132
#11 0xb7771f4b in destroy_tunnels_now (cls=0x0, pid=0xb85e9098, value=0xb85e9098) at gnunet-service-cadet-new.c:343
#12 0xb76755fe in GNUNET_CONTAINER_multipeermap_iterate (map=0xb85ed350, it=0xb7771f0f <destroy_tunnels_now>, it_cls=0x0) at container_multipeermap.c:343
#13 0xb77a627e in GCP_iterate_all (iter=0xb7771f0f <destroy_tunnels_now>, cls=0x0) at gnunet-service-cadet-new_peer.c:1121
#14 0xb7772032 in shutdown_rest () at gnunet-service-cadet-new.c:392
#15 0xb77721bb in shutdown_task (cls=0x0) at gnunet-service-cadet-new.c:434
#16 0xb76bb301 in run_ready (rs=0xb85ec090, ws=0xb85ec118) at scheduler.c:620
#17 0xb76bbdff in GNUNET_SCHEDULER_run (task=0xb76d655a <service_main>, task_cls=0xbfb0a5b4) at scheduler.c:887
#18 0xb76dd9c5 in GNUNET_SERVICE_ruN_ (argc=3, argv=0xbfb0a904, service_name=0xb77a8147 "cadet", options=GNUNET_SERVICE_OPTION_NONE, service_init_cb=0xb7775baa <run>,
    connect_cb=0xb7774a03 <client_connect_cb>, disconnect_cb=0xb777540b <client_disconnect_cb>, cls=0x0, handlers=0xbfb0a6b0) at service_new.c:1844
#19 0xb7776621 in main (argc=3, argv=0xbfb0a904) at gnunet-service-cadet-new.c:1435
TagsNo tags attached.



2017-07-02 11:55

developer   ~0012295

Last edited: 2017-07-02 11:55

View 2 revisions

> I've seen (and investigated!) this before, couldn't find it. It seems to happen 'rarely', don't have a deterministic way to reproduce yet. What is known is that the path is always too short to contain that connection at the specified 'off', OTOH whenever we shrink the path length we do check that we have no connection attached anymore :-(.

Do you need any help on this? Could someone help with this?
If so, could you describe how you came to produce this bug?


2017-08-20 22:52

developer   ~0012381

If this bug is not reproducible so far, shouldn't we just move it over to the version after the next version release?

What makes this bug special to block the next version snapshot?

Christian Grothoff

2017-08-21 09:27

manager   ~0012382

The bug is reproduceable, just not (yet) deterministically reproducable. It just happens "at random", but I've seen it repeatedly and I am sure it is still there.

Also, it's not actually the only blocker, there is also a segv in transport, just lacks a proper report as I didn't even get a decent stack trace yet.

Ultimately, what is 'special' is that these bugs are known and produce segfaults, which IMO is not acceptable.


2017-08-21 09:36

developer   ~0012383

Okay, thanks for clearing that up.

Yes, I agree (too many/too obvious) segfaults should not be in a commit considered a release version.

Christian Grothoff

2017-08-21 13:34

manager   ~0012386

Now, this is interesting: I just got a slightly _different_ assertion violation in CADET, so this may help diagnose what is wrong:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f2f861923fa in __GI_abort () at abort.c:89
#2 0x00007f2f878495ee in GNUNET_abort_ () at common_logging.c:293
#3 0x000000000042019f in extend_path (path=0xf39f40, peers=0x7ffc83faf568, num_peers=2, force=0)
    at gnunet-service-cadet_paths.c:437
#4 0x0000000000420714 in GCPP_try_path_from_dht (get_path=0xf3e690, get_path_length=1, put_path=0xf3e650, put_path_length=2)
    at gnunet-service-cadet_paths.c:543
#5 0x000000000041501a in dht_get_id_handler (cls=0xf08c30, exp=..., key=0xf3e610, get_path=0xf3e690, get_path_length=1,
    put_path=0xf3e650, put_path_length=2, type=GNUNET_BLOCK_TYPE_DHT_HELLO, size=90, data=0xf3e6b0)
    at gnunet-service-cadet_dht.c:125
#6 0x00007f2f8721fbe8 in process_client_result (cls=0xf3e5f0, key=0xf3e610, value=0xf22aa0) at dht_api.c:787
#7 0x00007f2f8785516a in GNUNET_CONTAINER_multihashmap_get_multiple (map=0xeeebb0, key=0xf3e610,
    it=0x7f2f8721f80a <process_client_result>, it_cls=0xf3e5f0) at container_multihashmap.c:816
#8 0x00007f2f8721fc38 in handle_client_result (cls=0xeef3e0, msg=0xf3e5f0) at dht_api.c:813
#9 0x00007f2f878739d2 in GNUNET_MQ_inject_message (mq=0xef0680, mh=0xf3e5f0) at mq.c:252
#10 0x00007f2f878462a1 in recv_message (cls=0xef0530, msg=0xf3e5f0) at client.c:315
#11 0x00007f2f87873057 in GNUNET_MST_from_buffer (mst=0xeeed20, buf=0x0, size=0, purge=0, one_shot=0) at mst.c:232
#12 0x00007f2f87873737 in GNUNET_MST_read (mst=0xeeed20, sock=0xef3f70, purge=0, one_shot=0) at mst.c:359
#13 0x00007f2f8784674a in receive_ready (cls=0xef0530) at client.c:397
#14 0x00007f2f87886470 in run_ready (rs=0xeedd70, ws=0xeede00) at scheduler.c:670
#15 0x00007f2f87886dbb in GNUNET_SCHEDULER_run (task=0x7f2f8788a678 <service_main>, task_cls=0x7ffc83fafe10)
    at scheduler.c:937
#16 0x00007f2f8788f7c4 in GNUNET_SERVICE_run_ (argc=3, argv=0x7ffc83fb0398, service_name=0x4258bf "cadet",
    options=GNUNET_SERVICE_OPTION_NONE, service_init_cb=0x407176 <run>, connect_cb=0x40642c <client_connect_cb>,
    disconnect_cb=0x406d03 <client_disconnect_cb>, cls=0x0, handlers=0x7ffc83faff70) at service.c:1846
#17 0x0000000000407c1f in main (argc=3, argv=0x7ffc83fb0398) at gnunet-service-cadet.c:1461


2017-11-19 20:15

developer   ~0012585

I've hit that last assert a few times now and I'm looking into it. I'm running commit 6249fe609735bb4feb6cab00698928c80ba94f68

Christian Grothoff

2017-11-26 12:05

manager   ~0012602

Amatus made this commit fixing an off-by-one in CADET. He writes it 'might be related'; however, I don't see how precisely it would relate to this issue. Still, great catch ;-)

index e75fd393c..79eed0dcc 100644
--- a/src/cadet/gnunet-service-cadet_paths.c
+++ b/src/cadet/gnunet-service-cadet_paths.c
@@ -498,8 +498,8 @@ GCPP_try_path_from_dht (const struct GNUNET_PeerIdentity *get_path,
     const struct GNUNET_PeerIdentity *pid;
     pid = (off < get_path_length)
- ? &get_path[get_path_length - off]
- : &put_path[get_path_length + put_path_length - off];
+ ? &get_path[get_path_length - off - 1]
+ : &put_path[get_path_length + put_path_length - off - 1];
     cpath[off - skip] = GCP_get (pid,
     /* Check that no peer is twice on the path */


2017-11-26 14:56

developer   ~0012603

Turns out it didn't fix this, I'm still hitting the assert and the one at gnunet-service-cadet_paths.c:182


2017-11-27 04:30

developer   ~0012605

Here's a DEBUG log of the issue [with comments]:

Nov 26 11:13:28-597945 dht-api-11474 DEBUG Giving 434 byte reply for KCXBTR8Z to application (GP: GJD7, PP: KCXB-8Q08)
Nov 26 11:13:28-630107 cadet-pat-11474 DEBUG check_match found match with path GJD7(0x6030000b7d60)
Nov 26 11:13:28-642055 cadet-pat-11474 DEBUG Trying to extend existing path GJD7(0x6030000b7d60) by additional links discovered from DHT
[GCP_path_entry_add() called]
Nov 26 11:13:28-660442 cadet-per-11474 DEBUG Discovered that peer P(KCXB) is on path GJD7-8Q08-KCXB(0x6030000b7d60) at offset 2
Nov 26 11:13:28-660491 cadet-tun-11474 DEBUG Considering GJD7-8Q08-KCXB(0x6030000b7d60) for Tunnel KCXB
Nov 26 11:13:28-682706 cadet-con-11474 DEBUG Creating Connection RTVFFZ (Tunnel KCXB) using path GJD7-8Q08-KCXB(0x6030000b7d60)
[GCPP_add_connection() is called]
Nov 26 11:13:28-708120 cadet-pat-11474 DEBUG Adding connection Connection RTVFFZ (Tunnel KCXB) to path GJD7-8Q08-KCXB(0x6030000b7d60) at offset 2
[entry->cc is assigned]
Nov 26 11:13:28-708174 cadet-per-11474 DEBUG Adding connection Connection RTVFFZ (Tunnel KCXB) to peer P(GJD7)
Nov 26 11:13:28-708207 cadet-per-11474 DEBUG Adding connection Connection RTVFFZ (Tunnel KCXB) to peer P(8Q08)
Nov 26 11:13:28-708239 cadet-per-11474 DEBUG Creating MQM 0x60400009c090 for peer P(GJD7)
Nov 26 11:13:28-742454 cadet-con-11474 DEBUG Core MQ for Connection RTVFFZ (Tunnel KCXB) became available in state 0
Nov 26 11:13:28-742527 util-scheduler-11474 DEBUG Adding task: 0x60c00027bb00
Nov 26 11:13:28-742592 cadet-tun-11474 DEBUG Found interesting path GJD7-8Q08-KCXB(0x6030000b7d60) for Tunnel KCXB, created Connection RTVFFZ (Tunnel KCXB)
Nov 26 11:13:28-742628 cadet-per-11474 DEBUG Discovered that peer P(8Q08) is on path GJD7-8Q08-KCXB(0x6030000b7d60) at offset 1
Nov 26 11:13:28-742690 cadet-per-11474 DEBUG Detatching path GJD7-8Q08-KCXB(0x6030000b7d60) from peer P(GJD7)
Nov 26 11:13:28-775223 cadet-per-11474 DEBUG Decided to not attach path 0x5555557d8340 to peer P(KCXB) due to undesirability
[GCP_attach_path() returned NULL so we check entry->cc]
Nov 26 11:13:28-775277 cadet-11474 ERROR Assertion failed at gnunet-service-cadet_paths.c:437.

It's clear that entry->cc got assigned in GCPP_add_connection() because of the call to GCP_path_entry_add() earlier in extend_path() so I don't know why the code was expecting it to be NULL. I'm guessing there's extra clean-up to add here and remove the assert, or we should not be cleaning it up because the path is in use now? Maybe we needed to force the GCP_attach_path()?


2017-11-28 04:29

developer   ~0012608

Also of note is that most of the same logic from extend_path() is in GCPP_try_path_from_dht() and the NULL == entry->cc assertion is missing. These should be combined.

Christian Grothoff

2017-12-03 17:17

manager   ~0012623

We indeed ought to use 'force' here (see 862e488e0..90c803401), but what is really awkward is that we have the GCP_attach_path() decide that the new path is uninteresting (why???) while the connection was created eagerly. So the logic in "GCPP_get_desirability()" has a serious disagreement with the logic at gnunet-service-cadet_tunnels.c:2485+.

So a clean fix probably would involve refactoring that logic to ensure that we have _one_ consistent test for "path interesting".


2017-12-29 16:58

developer   ~0012709

Fixed in commit 5c05ea3ca25382022d21766b6e2d876c4dabd95d

Issue History

Date Modified Username Field Change
2017-02-21 18:01 Christian Grothoff New Issue
2017-02-21 18:01 Christian Grothoff Status new => assigned
2017-02-21 18:01 Christian Grothoff Assigned To => Bart Polot
2017-02-21 18:36 Christian Grothoff Assigned To Bart Polot => Christian Grothoff
2017-07-02 11:55 nikita Note Added: 0012295
2017-07-02 11:55 nikita Note Edited: 0012295 View Revisions
2017-08-20 22:52 nikita Note Added: 0012381
2017-08-21 09:27 Christian Grothoff Note Added: 0012382
2017-08-21 09:36 nikita Note Added: 0012383
2017-08-21 13:34 Christian Grothoff Note Added: 0012386
2017-11-19 20:15 amatus Note Added: 0012585
2017-11-26 12:05 Christian Grothoff Note Added: 0012602
2017-11-26 14:56 amatus Note Added: 0012603
2017-11-27 04:30 amatus Note Added: 0012605
2017-11-28 04:29 amatus Note Added: 0012608
2017-12-03 17:17 Christian Grothoff Note Added: 0012623
2017-12-29 16:57 amatus Assigned To Christian Grothoff => amatus
2017-12-29 16:58 amatus Status assigned => resolved
2017-12-29 16:58 amatus Resolution open => fixed
2017-12-29 16:58 amatus Fixed in Version => 0.11.0
2017-12-29 16:58 amatus Note Added: 0012709
2018-06-05 22:25 Christian Grothoff Fixed in Version 0.11.0 => 0.11.0pre66
2018-06-07 00:24 Christian Grothoff Status resolved => closed