View Issue Details

IDProjectCategoryView StatusLast Update
0003846GNUnetcadet servicepublic2018-06-07 00:25
ReporteramatusAssigned ToChristian Grothoff 
PrioritynormalSeveritycrashReproducibilityalways
Status closedResolutionfixed 
Platformx86OSDebianOS VersionJessie
Product VersionSVN HEAD 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0003846: Assertion failed (GNUNET_CONTAINER_multihashmap_remove) in GCP_remove_connection (rev 35954)
DescriptionProgram terminated with signal SIGABRT, Aborted.
#0 0x04161ddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x04161ddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x04163463 in __GI_abort () at abort.c:89
#2 0x0405d98a in GNUNET_abort_ () at common_logging.c:290
#3 0x08067f88 in GCP_remove_connection (peer=0x4a93df8, c=0x6062b58)
    at gnunet-service-cadet_peer.c:2181
#4 0x08057356 in unregister_neighbors (c=0x6062b58)
    at gnunet-service-cadet_connection.c:1582
#5 0x0805a230 in GCC_destroy (c=0x6062b58)
    at gnunet-service-cadet_connection.c:2804
#6 0x0805a0a0 in GCC_new (cid=0xbeb09648, t=0x0, path=0x5f40cc8, own_pos=2)
    at gnunet-service-cadet_connection.c:2766
#7 0x08057b55 in GCC_handle_create (cls=0x0, peer=0xbeb09624,
    message=0xbeb09644) at gnunet-service-cadet_connection.c:1811
#8 0x040cd726 in main_notify_handler (cls=0x4634440, msg=0xbeb09620)
    at core_api.c:967
#9 0x04059028 in receive_task (cls=0x46352b0, tc=0xbeb09814) at client.c:618
#10 0x040928b4 in run_ready (rs=0x4527358, ws=0x4527410) at scheduler.c:587
#11 0x04093142 in GNUNET_SCHEDULER_run (task=0x409dcd0 <service_task>,
    task_cls=0xbeb09a40) at scheduler.c:867
#12 0x0409f86e in GNUNET_SERVICE_run (argc=7, argv=0xbeb09c34,
    service_name=0x806d314 "cadet", options=GNUNET_SERVICE_OPTION_NONE,
    task=0x8069be5 <run>, task_cls=0x0) at service.c:1503
#13 0x08069e77 in main (argc=7, argv=0xbeb09c34) at gnunet-service-cadet.c:174
(gdb) f 3
#3 0x08067f88 in GCP_remove_connection (peer=0x4a93df8, c=0x6062b58)
    at gnunet-service-cadet_peer.c:2181
2181 gnunet-service-cadet_peer.c: No such file or directory.
(gdb) p *peer
$1 = {id = 11, last_contact = {abs_value_us = 1434723237915931},
  path_head = 0x4a94280, path_tail = 0x4a94280, search_h = 0x0,
  search_delayed = 0x0, tunnel = 0x0, connections = 0x4a944a8,
  core_transmit = 0x0, tmt_time = {abs_value_us = 0}, queue_head = 0x0,
  queue_tail = 0x0, queue_n = 0, hello = 0x4e7c798}
(gdb) p *peer->connections
$2 = {map = 0x4a944f0, size = 1, map_length = 16, use_small_entries = 0,
  modification_counter = 1}
(gdb) p *c
$3 = {t = 0x0, fwd_fc = {c = 0x6062b58, queue_n = 0, queue_max = 11,
    last_pid_sent = 4294967295, last_pid_recv = 4294967295, recv_bitmap = 0,
    last_ack_sent = 0, last_ack_recv = 0, poll_task = 0x0, poll_time = {
      rel_value_us = 1000000}, poll_msg = 0x0, ack_msg = 0x0}, bck_fc = {
    c = 0x6062b58, queue_n = 0, queue_max = 11, last_pid_sent = 4294967295,
    last_pid_recv = 4294967295, recv_bitmap = 0, last_ack_sent = 0,
    last_ack_recv = 0, poll_task = 0x0, poll_time = {rel_value_us = 1000000},
    poll_msg = 0x0, ack_msg = 0x0}, perf = 0x0, id = {
    bits = "\255\307\b\033+-\034\320?k\354\230\060\030{\273\032\067\377\024$\337$$'\245\241\300L\305_\376"}, path = 0x6056818, fwd_maintenance_task = 0x0,
  bck_maintenance_task = 0x0, maintenance_q = 0x0, next_peer = 0x4a915e0,
  prev_peer = 0x4a93df8, state = CADET_CONNECTION_NEW, own_pos = 2,
  pending_messages = 0, destroy = 2, create_retry = 0}

gnunet-service-cadet was run under valgrind and no memory badness was reported when hitting this assertion.
This happened after restarting gnunet-service-cadet after a previous crash.
TagsNo tags attached.

Relationships

has duplicate 0003873 closedBart Polot Assertion failed at gnunet-service-cadet_peer.c:2222. 
related to 0003845 closedBart Polot Assertion failure (c->next_peer == peer) in unregister_neighbors (rev 35954) 

Activities

amatus

2015-06-24 19:09

developer   ~0009331

SVN rev 35978 still hits this assertion. New error messages are printed out:

Jun 24 16:11:49-825775 cadet-p2p-7630 ERROR core wait time 1526348 µs
Jun 24 16:11:57-208685 cadet-p2p-7630 ERROR core wait time 1810139 µs
Jun 24 16:12:35-871996 cadet-p2p-7630 ERROR core wait time 3006385 µs
Jun 24 16:13:06-559054 cadet-p2p-7630 ERROR core wait time 7381775 µs
Jun 24 16:13:33-010045 cadet-p2p-7630 ERROR core wait time 5246260 µs
Jun 24 16:13:39-973830 cadet-p2p-7630 ERROR core wait time 2006685 µs
Jun 24 16:13:52-347106 cadet-p2p-7630 ERROR core wait time 2666171 µs
Jun 24 16:14:01-146560 cadet-p2p-7630 ERROR core wait time 7768246 µs
Jun 24 16:14:10-283293 cadet-p2p-7630 ERROR core wait time 4631520 µs
Jun 24 16:14:33-392780 cadet-p2p-7630 ERROR core wait time 4526869 µs
Jun 24 16:14:33-453537 cadet-p2p-7630 ERROR core wait time 4560109 µs
Jun 24 16:14:35-003190 cadet-7630 ERROR Assertion failed at gnunet-service-cadet_peer.c:2199.

Jun 24 16:17:30-335802 cadet-p2p-7637 ERROR core wait time 1109652 µs
Jun 24 16:17:48-483554 cadet-p2p-7637 ERROR core wait time 11343884 µs
Jun 24 16:17:50-826244 cadet-p2p-7637 ERROR core wait time 1415831 µs
Jun 24 16:17:56-706457 cadet-p2p-7637 ERROR core wait time 3268081 µs
Jun 24 16:18:09-540902 cadet-p2p-7637 ERROR core wait time 5441439 µs
Jun 24 16:18:14-860535 cadet-p2p-7637 ERROR core wait time 4287178 µs
Jun 24 16:18:20-694135 cadet-p2p-7637 ERROR core wait time 1796187 µs
Jun 24 16:18:20-694288 cadet-p2p-7637 ERROR core wait time 1450599 µs
Jun 24 16:18:37-436791 cadet-p2p-7637 ERROR core wait time 3483678 µs
Jun 24 16:18:47-915556 cadet-p2p-7637 ERROR core wait time 24561400 µs
Jun 24 16:19:03-961717 cadet-p2p-7637 ERROR core wait time 9317277 µs
Jun 24 16:19:03-961910 cadet-p2p-7637 ERROR core wait time 8231655 µs
Jun 24 16:19:07-275967 cadet-p2p-7637 ERROR core wait time 2231404 µs
Jun 24 16:19:11-584880 cadet-p2p-7637 ERROR core wait time 1841162 µs
Jun 24 16:19:19-633443 cadet-p2p-7637 ERROR core wait time 3378745 µs
Jun 24 16:19:19-633620 cadet-p2p-7637 ERROR core wait time 3378896 µs
Jun 24 16:19:26-867448 cadet-p2p-7637 ERROR core wait time 4963454 µs
Jun 24 16:19:33-174265 cadet-p2p-7637 ERROR core wait time 3921485 µs
Jun 24 16:19:33-174399 cadet-p2p-7637 ERROR core wait time 3816158 µs
Jun 24 16:19:40-782140 cadet-p2p-7637 ERROR core wait time 1006007 µs
Jun 24 16:19:51-215917 cadet-p2p-7637 ERROR core wait time 5232503 µs
Jun 24 16:19:52-821536 cadet-p2p-7637 ERROR core wait time 2303988 µs
Jun 24 16:19:55-298745 cadet-p2p-7637 ERROR core wait time 3048631 µs
Jun 24 16:20:07-257621 cadet-p2p-7637 ERROR core wait time 12146145 µs
Jun 24 16:20:19-863831 cadet-p2p-7637 ERROR core wait time 9055750 µs
Jun 24 16:20:39-746406 cadet-p2p-7637 ERROR core wait time 3665470 µs
Jun 24 16:20:39-746590 cadet-p2p-7637 ERROR core wait time 3638215 µs
Jun 24 16:20:39-767277 cadet-p2p-7637 ERROR core wait time 3543392 µs
Jun 24 16:20:43-496210 cadet-p2p-7637 ERROR core wait time 2619660 µs
Jun 24 16:20:43-496355 cadet-p2p-7637 ERROR core wait time 2422327 µs
Jun 24 16:20:43-496431 cadet-p2p-7637 ERROR core wait time 2386799 µs
Jun 24 16:21:10-309795 cadet-p2p-7637 ERROR core wait time 25751076 µs
Jun 24 16:21:11-272378 cadet-p2p-7637 ERROR core wait time 25411188 µs
Jun 24 16:21:14-250999 cadet-p2p-7637 ERROR core wait time 2761715 µs
Jun 24 16:21:32-164995 cadet-p2p-7637 ERROR core wait time 16183941 µs
Jun 24 16:21:37-040363 cadet-p2p-7637 ERROR core wait time 5188508 µs
Jun 24 16:21:37-058879 cadet-p2p-7637 ERROR core wait time 3831963 µs
Jun 24 16:21:39-050295 cadet-p2p-7637 ERROR core wait time 6921646 µs
Jun 24 16:21:50-005058 cadet-p2p-7637 ERROR core wait time 11864860 µs
Jun 24 16:21:51-615700 cadet-p2p-7637 ERROR core wait time 11535796 µs
Jun 24 16:21:51-615871 cadet-p2p-7637 ERROR core wait time 12357191 µs
Jun 24 16:22:06-158601 cadet-p2p-7637 ERROR core wait time 13403850 µs
Jun 24 16:22:33-593809 cadet-p2p-7637 ERROR core wait time 25050929 µs
Jun 24 16:22:33-621438 cadet-p2p-7637 ERROR core wait time 4680544 µs
Jun 24 16:22:44-272479 cadet-p2p-7637 ERROR core wait time 9446562 µs
Jun 24 16:22:44-272642 cadet-p2p-7637 ERROR core wait time 35687409 µs
Jun 24 16:22:44-291908 cadet-p2p-7637 ERROR core wait time 8111919 µs
Jun 24 16:22:44-465022 cadet-p2p-7637 ERROR core wait time 35792148 µs
Jun 24 16:23:03-892711 cadet-p2p-7637 ERROR core wait time 18432260 µs
Jun 24 16:23:03-979330 cadet-p2p-7637 ERROR core wait time 18603614 µs
Jun 24 16:23:04-022888 cadet-p2p-7637 ERROR core wait time 18391794 µs
Jun 24 16:23:04-427159 cadet-p2p-7637 ERROR core wait time 18731278 µs
Jun 24 16:23:17-497803 cadet-p2p-7637 ERROR core wait time 12441777 µs
Jun 24 16:23:17-497955 cadet-p2p-7637 ERROR core wait time 12261175 µs
Jun 24 16:23:17-498035 cadet-p2p-7637 ERROR core wait time 2837817 µs
Jun 24 16:23:24-238451 cadet-p2p-7637 ERROR core wait time 5246310 µs
Jun 24 16:23:40-920026 cadet-p2p-7637 ERROR core wait time 20741797 µs
Jun 24 16:23:40-941020 cadet-p2p-7637 ERROR core wait time 20716256 µs
Jun 24 16:23:53-981582 cadet-p2p-7637 ERROR core wait time 11698883 µs
Jun 24 16:24:04-867139 cadet-p2p-7637 ERROR core wait time 21204620 µs
Jun 24 16:24:04-886729 cadet-p2p-7637 ERROR core wait time 21089378 µs
Jun 24 16:24:04-886854 cadet-p2p-7637 ERROR core wait time 9830636 µs
Jun 24 16:24:06-520914 cadet-p2p-7637 ERROR core wait time 1463508 µs
Jun 24 16:24:26-612671 cadet-p2p-7637 ERROR core wait time 13728272 µs
Jun 24 16:24:26-693633 cadet-p2p-7637 ERROR core wait time 13863223 µs
Jun 24 16:24:28-886638 cadet-p2p-7637 ERROR core wait time 22357393 µs
Jun 24 16:24:29-017782 cadet-p2p-7637 ERROR core wait time 1013363 µs
Jun 24 16:24:44-730806 cadet-p2p-7637 ERROR core wait time 13391372 µs
Jun 24 16:24:44-730972 cadet-p2p-7637 ERROR core wait time 13284756 µs
Jun 24 16:24:48-394109 cadet-p2p-7637 ERROR core wait time 1186329 µs
Jun 24 16:24:48-394252 cadet-p2p-7637 ERROR core wait time 1056908 µs
Jun 24 16:24:48-394329 cadet-p2p-7637 ERROR core wait time 1028680 µs
Jun 24 16:25:27-148906 cadet-p2p-7637 ERROR core wait time 1202636 µs
Jun 24 16:25:27-359345 cadet-p2p-7637 ERROR core wait time 1084818 µs
Jun 24 16:26:03-514570 cadet-p2p-7637 ERROR core wait time 1252936 µs
Jun 24 16:26:03-985843 cadet-p2p-7637 ERROR core wait time 35566135 µs
Jun 24 16:26:17-827476 cadet-p2p-7637 ERROR core wait time 11867350 µs
Jun 24 16:26:30-167180 cadet-p2p-7637 ERROR core wait time 23702592 µs
Jun 24 16:26:45-710540 cadet-p2p-7637 ERROR core wait time 26625994 µs
Jun 24 16:26:47-933963 cadet-p2p-7637 ERROR core wait time 16670937 µs
Jun 24 16:26:57-323420 cadet-p2p-7637 ERROR core wait time 39494267 µs
Jun 24 16:26:58-214701 cadet-p2p-7637 ERROR core wait time 9147146 µs
Jun 24 16:27:09-533290 cadet-p2p-7637 ERROR core wait time 10237966 µs
Jun 24 16:27:09-533464 cadet-p2p-7637 ERROR core wait time 20378605 µs
Jun 24 16:27:15-500351 cadet-p2p-7637 ERROR core wait time 4643506 µs
Jun 24 16:27:15-500502 cadet-p2p-7637 ERROR core wait time 4349448 µs
Jun 24 16:27:50-960755 cadet-p2p-7637 ERROR core wait time 34168373 µs
Jun 24 16:27:51-130636 cadet-p2p-7637 ERROR core wait time 5069534 µs
Jun 24 16:27:57-167752 cadet-p2p-7637 ERROR core wait time 4683136 µs
Jun 24 16:28:03-329494 cadet-p2p-7637 ERROR core wait time 6161592 µs
Jun 24 16:28:09-318579 cadet-p2p-7637 ERROR core wait time 5988942 µs
Jun 24 16:28:30-321834 cadet-7637 ERROR Assertion failed at gnunet-service-cadet_peer.c:2199.

Bart Polot

2015-06-24 19:17

manager   ~0009332

The new messages are most probably a CORE bug:

 Jun 24 16:27:50-960755 cadet-p2p-7637 ERROR core wait time 34168373 µs

This means that CADET waited for 34 seconds before being able to send a message. All kinds of timeouts are possible in these conditions...

Bart Polot

2015-06-25 05:20

manager   ~0009337

I believe r35985 should have fixed this.

amatus

2015-06-25 06:29

developer   ~0009340

Still happening in rev 35987

Bart Polot

2015-06-26 04:22

manager   ~0009341

I'm pretty sure r35993 should finally kill this one.

amatus

2015-06-28 01:40

developer   ~0009357

Looks fixed.

amatus

2015-06-28 17:40

developer   ~0009362

Last edited: 2015-06-28 18:01

View 2 revisions

Hit this again in SVN rev 36015

Bart Polot

2015-06-28 18:49

manager   ~0009363

Are you sure it's not 0003845?

Bart Polot

2015-06-29 14:58

manager   ~0009368

Ok, I had it wrong, 3845 is fixed, this one is the one that still happens.

Could you confirm that you stacktrace looks like this?

#0 0xb77a9d40 in __kernel_vsyscall ()
#1 0xb7510307 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2 0xb75119c3 in __GI_abort () at abort.c:89
#3 0xb7721ef3 in GNUNET_abort_ () at common_logging.c:290
#4 0x08074245 in GCP_remove_connection (peer=0x83a2290, c=0x83a5a10) at gnunet-service-cadet_peer.c:2219
#5 0x0805b8a9 in unregister_neighbors (c=0x83a5a10) at gnunet-service-cadet_connection.c:1588
#6 0x0805fdf0 in GCC_destroy (c=0x83a5a10) at gnunet-service-cadet_connection.c:2813
#7 0x08058a64 in conn_message_sent (cls=0x8393110, c=0x83a5a10, sent=1, type=258, pid=0, fwd=1, size=100, wait=...)
    at gnunet-service-cadet_connection.c:638
#8 0x080710fa in GCP_queue_destroy (queue=0x8396338, clear_cls=0, sent=1, pid=0) at gnunet-service-cadet_peer.c:1349
#9 0x0807090b in queue_send (cls=0x83a1f70, size=364, buf=0x83a79a0) at gnunet-service-cadet_peer.c:1229
#10 0xb76f44f8 in transmit_message (cls=0x8393360, size=420, buf=0x83a7968) at core_api.c:676
#11 0xb771e574 in client_notify (cls=0x83a7670, size=420, buf=0x83a7968) at client.c:1201
#12 0xb772b447 in process_notify (connection=0x8392ef0) at connection.c:1261
#13 0xb772be4e in transmit_ready (cls=0x8392ef0, tc=0xbfaaada4) at connection.c:1405
#14 0xb775963f in run_ready (rs=0x83a0368, ws=0x83a03f0) at scheduler.c:587
#15 0xb7759f42 in GNUNET_SCHEDULER_run (task=0xb7765f2b <service_task>, task_cls=0xbfaaafd0) at scheduler.c:867
#16 0xb7767bcb in GNUNET_SERVICE_run (argc=3, argv=0xbfaab1c4,
    service_name=0x807d0ba "\a\b\276j\a\b\312j\a\b\303k\a\b\326j\a\b\342j\a\b\356j\a\b\303k\a\b\372j\a\b\006k\a\b\022k\a\b\036k\a\b*k\a\b\303k\a\b\303k\a\b\303k\a\b6k\a\b\246j\a\bBk\a\b\303k\a\b\303k\a\bNk\a\bZk\a\b\303k\a\b\303k\a\b\303k\a\bfk\a\brk\a\b{k\a\b\204k\a\b\215k\a\b\226k\a\b\237k\a\b\250k\a\b\261k\a\b\272k\a\bgnunet-service-cadet.c",
    options=GNUNET_SERVICE_OPTION_NONE, task=0x8076c91 <shutdown_task+106>, task_cls=0x0) at service.c:1503
#17 0x08076fe3 in run (cls=0x8077170 <__udivdi3+128>, server=0x0, c=0x0) at gnunet-service-cadet.c:155
#18 0xb74fba63 in __libc_start_main (main=0x3, argc=134523120, argv=0x0, init=0x804a911 <_start+33>,
    fini=0x8076fab <run+617>, rtld_fini=0x3, stack_end=0xbfaab1c4) at libc-start.c:287
#19 0xb77cc000 in ?? () from /lib/ld-linux.so.2


The key part is in frame 7 (conn_message_sent), type is always 258 for me.

amatus

2015-06-29 16:34

developer   ~0009369

This is the backtrace from rev 36015

Program terminated with signal SIGABRT, Aborted.
#0 0xb6f1eddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0xb6f1eddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0xb6f20463 in __GI_abort () at abort.c:89
#2 0xb7148f2d in GNUNET_abort_ () at common_logging.c:290
#3 0x0808ad46 in GCP_remove_connection (peer=0xb4109680, c=0xb2f01240)
    at gnunet-service-cadet_peer.c:2219
#4 0x080666f5 in unregister_neighbors (c=0xb2f01240)
    at gnunet-service-cadet_connection.c:1589
#5 0x0806b5b1 in GCC_destroy (c=0xb2f01240)
    at gnunet-service-cadet_connection.c:2824
#6 0x08065d4c in connection_timeout (c=0xb2f01240, fwd=1)
    at gnunet-service-cadet_connection.c:1420
#7 0x08065df2 in connection_fwd_timeout (cls=0xb2f01240, tc=0xbfcd03d0)
    at gnunet-service-cadet_connection.c:1441
#8 0xb71c4498 in run_ready (rs=0xb2b03370, ws=0xb2b032a0) at scheduler.c:587
#9 0xb71c51b7 in GNUNET_SCHEDULER_run (task=0xb71df0c5 <service_task>,
    task_cls=0xbfcd0930) at scheduler.c:867
#10 0xb71e2240 in GNUNET_SERVICE_run (argc=7, argv=0xbfcd0bb4,
    service_name=0x809a340 "cadet", options=GNUNET_SERVICE_OPTION_NONE,
    task=0x808f307 <run>, task_cls=0x0) at service.c:1503
#11 0x0808f59f in main (argc=7, argv=0xbfcd0bb4) at gnunet-service-cadet.c:174

amatus

2015-06-30 00:55

developer   ~0009376

Here is a backtrace from rev 36024 showing type=258:

#0 0x00007fbef284e107 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fbef284f4e8 in __GI_abort () at abort.c:89
#2 0x00007fbef3d0e96a in GNUNET_abort_ () at common_logging.c:290
#3 0x000000000042361e in GCP_remove_connection (peer=0x198bf90, c=0x19aa9b0)
    at gnunet-service-cadet_peer.c:2219
#4 0x0000000000411273 in unregister_neighbors (c=0x19aa9b0)
    at gnunet-service-cadet_connection.c:1589
#5 0x00000000004145ba in GCC_destroy (c=0x19aa9b0)
    at gnunet-service-cadet_connection.c:2824
#6 0x000000000040f0ac in conn_message_sent (cls=0x19a9b70, c=0x19aa9b0,
    sent=1, type=258, pid=0, fwd=1, size=100, wait=...)
    at gnunet-service-cadet_connection.c:643
#7 0x0000000000420e86 in GCP_queue_destroy (queue=0x198eee0, clear_cls=0,
    sent=1, pid=0) at gnunet-service-cadet_peer.c:1351
#8 0x00000000004208eb in queue_send (cls=0x1988110, size=332, buf=0x199ed38)
    at gnunet-service-cadet_peer.c:1240
#9 0x00007fbef38e1a2b in transmit_message (cls=0x19740e0, size=388,
    buf=0x199ed00) at core_api.c:676
#10 0x00007fbef3d0b48a in client_notify (cls=0x19a8e50, size=388,
    buf=0x199ed00) at client.c:1201
#11 0x00007fbef3d1754c in process_notify (connection=0x1974200)
    at connection.c:1261
#12 0x00007fbef3d17d21 in transmit_ready (cls=0x1974200, tc=0x7ffc3fee6550)
    at connection.c:1405
#13 0x00007fbef3d46c43 in run_ready (rs=0x19736e0, ws=0x1973770)
    at scheduler.c:587
#14 0x00007fbef3d474d7 in GNUNET_SCHEDULER_run (
    task=0x7fbef3d52a4e <service_task>, task_cls=0x7ffc3fee68e0)
    at scheduler.c:867
#15 0x00007fbef3d54676 in GNUNET_SERVICE_run (argc=7, argv=0x7ffc3fee6b78,
    service_name=0x429998 "cadet", options=GNUNET_SERVICE_OPTION_NONE,
    task=0x42557f <run>, task_cls=0x0) at service.c:1503
#16 0x000000000042581e in main (argc=7, argv=0x7ffc3fee6b78)
    at gnunet-service-cadet.c:174

Christian Grothoff

2015-06-30 11:36

manager   ~0009381

I got it too, with SVN HEAD of today:

#0 0x00007f45778df107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f45778e04e8 in __GI_abort () at abort.c:89
#2 0x00007f4578da3fc0 in GNUNET_abort_ () at common_logging.c:290
#3 0x00000000004300c1 in GCP_remove_connection (peer=0x186d7b0, c=0x1874b20) at gnunet-service-cadet_peer.c:2219
#4 0x000000000041591c in unregister_neighbors (c=0x1874b20) at gnunet-service-cadet_connection.c:1586
#5 0x000000000041a447 in GCC_destroy (c=0x1874b20) at gnunet-service-cadet_connection.c:2824
#6 0x00000000004127cc in conn_message_sent (cls=0x1866a60, c=0x1874b20, sent=0, type=258, pid=0, fwd=0, size=100, wait=...) at gnunet-service-cadet_connection.c:643
#7 0x000000000042cb8f in GCP_queue_destroy (queue=0x1879e40, clear_cls=1, sent=0, pid=0) at gnunet-service-cadet_peer.c:1351
#8 0x000000000042d794 in GCP_queue_cancel (peer=0x186ec20, c=0x1874b20) at gnunet-service-cadet_peer.c:1517
#9 0x0000000000414684 in connection_cancel_queues (c=0x1874b20, fwd=0) at gnunet-service-cadet_connection.c:1249
#10 0x000000000041ad36 in GCC_notify_broken (c=0x1874b20, peer=0x186ec20) at gnunet-service-cadet_connection.c:3089
#11 0x0000000000429658 in notify_broken (cls=0x186ec20, key=0x7fff507e2b20, value=0x1874b20) at gnunet-service-cadet_peer.c:364
#12 0x00007f4578db5423 in GNUNET_CONTAINER_multihashmap_iterate (map=0x1870380, it=0x42958b <notify_broken>, it_cls=0x186ec20) at container_multihashmap.c:358
#13 0x0000000000429dbd in core_disconnect (cls=0x0, peer=0x1867040) at gnunet-service-cadet_peer.c:480
#14 0x00007f4578971d08 in disconnect_and_free_peer_entry (cls=0x184f9d0, key=0x7fff507e2dc8, value=0x1866ff0) at core_api.c:389
#15 0x00007f45789753ad in main_notify_handler (cls=0x184f9d0, msg=0x7fff507e2dc0) at core_api.c:913
#16 0x00007f4578d9e7db in receive_task (cls=0x184f360, tc=0x7fff507e2ec0) at client.c:618
#17 0x00007f4578dded4c in run_ready (rs=0x184c990, ws=0x184ca20) at scheduler.c:587
#18 0x00007f4578ddf657 in GNUNET_SCHEDULER_run (task=0x7f4578dec132 <service_task>, task_cls=0x7fff507e3250) at scheduler.c:867
#19 0x00007f4578dede6b in GNUNET_SERVICE_run (argc=3, argv=0x7fff507e34e8, service_name=0x43a12a "cadet", options=GNUNET_SERVICE_OPTION_NONE, task=0x432f1d <run>, task_cls=0x0) at service.c:1503
#20 0x000000000043328c in main (argc=3, argv=0x7fff507e34e8) at gnunet-service-cadet.c:174

(gdb) bt full
#0 0x00007f45778df107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 27452
        selftid = 27452
#1 0x00007f45778e04e8 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x28, sa_sigaction = 0x28}, sa_mask = {__val = {0, 140734543833008, 139936361745927, 140733193388033, 0, 1, 139936335072552, 16, 140734543833008, 4208400, 139936361771813, 1,
              139936335935008, 0, 139936338868112, 139936338862176}}, sa_flags = 2034288384, sa_restorer = 0x7f45778f59fa <_IO_vfprintf_internal+22490>}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f4578da3fc0 in GNUNET_abort_ () at common_logging.c:290
No locals.
#3 0x00000000004300c1 in GCP_remove_connection (peer=0x186d7b0, c=0x1874b20) at gnunet-service-cadet_peer.c:2219
        __FUNCTION__ = "GCP_remove_connection"
#4 0x000000000041591c in unregister_neighbors (c=0x1874b20) at gnunet-service-cadet_connection.c:1586
        peer = 0x186d7b0
        __FUNCTION__ = "unregister_neighbors"
#5 0x000000000041a447 in GCC_destroy (c=0x1874b20) at gnunet-service-cadet_connection.c:2824
        __FUNCTION__ = "GCC_destroy"
#6 0x00000000004127cc in conn_message_sent (cls=0x1866a60, c=0x1874b20, sent=0, type=258, pid=0, fwd=0, size=100, wait=...) at gnunet-service-cadet_connection.c:643
        p = 0x0
        fc = 0x1874b70
        q = 0x1866a60
        usecsperbyte = 1.9762625833649862e-322
        forced = 1
        __FUNCTION__ = "conn_message_sent"
#7 0x000000000042cb8f in GCP_queue_destroy (queue=0x1879e40, clear_cls=1, sent=0, pid=0) at gnunet-service-cadet_peer.c:1351
        wait_time = {rel_value_us = 110263569}
        peer = 0x186ec20
        connection_destroyed = 0
        __FUNCTION__ = "GCP_queue_destroy"
#8 0x000000000042d794 in GCP_queue_cancel (peer=0x186ec20, c=0x1874b20) at gnunet-service-cadet_peer.c:1517
        q = 0x1879e40
        next = 0x1879e40
        prev = 0x186e7a0
        connection_destroyed = 0
        __FUNCTION__ = "GCP_queue_cancel"
#9 0x0000000000414684 in connection_cancel_queues (c=0x1874b20, fwd=0) at gnunet-service-cadet_connection.c:1249
        fc = 0x1874b70
        peer = 0x186ec20
        __FUNCTION__ = "connection_cancel_queues"
#10 0x000000000041ad36 in GCC_notify_broken (c=0x1874b20, peer=0x186ec20) at gnunet-service-cadet_connection.c:3089
        hop = 0x186ec20
        fwd = 1
        __FUNCTION__ = "GCC_notify_broken"
#11 0x0000000000429658 in notify_broken (cls=0x186ec20, key=0x7fff507e2b20, value=0x1874b20) at gnunet-service-cadet_peer.c:364
        peer = 0x186ec20
        c = 0x1874b20
        __FUNCTION__ = "notify_broken"
#12 0x00007f4578db5423 in GNUNET_CONTAINER_multihashmap_iterate (map=0x1870380, it=0x42958b <notify_broken>, it_cls=0x186ec20) at container_multihashmap.c:358
        bme = 0x1871e70
        nxt = 0x0
        count = 1
        i = 14
        me = {sme = 0x1871e70, bme = 0x1871e70}
        kc = {bits = {1936134110, 3582693111, 677836696, 4254488402, 3265454343, 1598125268, 627729676, 1047629589, 0, 0, 0, 0, 0, 0, 0, 0}}
        __FUNCTION__ = "GNUNET_CONTAINER_multihashmap_iterate"
#13 0x0000000000429dbd in core_disconnect (cls=0x0, peer=0x1867040) at gnunet-service-cadet_peer.c:480
        p = 0x186ec20
        direct_path = 0x1867180
        own_id = "9QXF", '\000' <repeats 11 times>
        __FUNCTION__ = "core_disconnect"
#14 0x00007f4578971d08 in disconnect_and_free_peer_entry (cls=0x184f9d0, key=0x7fff507e2dc8, value=0x1866ff0) at core_api.c:389
        h = 0x184f9d0
        th = 0x1867080
        pr = 0x1866ff0
        __FUNCTION__ = "disconnect_and_free_peer_entry"
#15 0x00007f45789753ad in main_notify_handler (cls=0x184f9d0, msg=0x7fff507e2dc0) at core_api.c:913
        h = 0x184f9d0
        m = 0x4571fddfcc8e6223
        cnm = 0x38
        dnm = 0x7fff507e2dc0
        ntm = 0x690442854600e800
        em = 0x38
---Type <return> to continue, or q <return> to quit---
        smr = 0x403710 <_start>
        mh = 0x4b278564d25
        init = 0x2a
        pr = 0x1866ff0
        th = 0x0
        hpos = 0
        trigger = 0
        msize = 40
        et = 0
        __FUNCTION__ = "main_notify_handler"
#16 0x00007f4578d9e7db in receive_task (cls=0x184f360, tc=0x7fff507e2ec0) at client.c:618
        client = 0x184f360
        handler = 0x7f45789743d7 <main_notify_handler>
        cmsg = 0x1874570
        handler_cls = 0x184f9d0
        msize = 40
        mbuf = 0x7fff507e2dc0 ""
        msg = 0x7fff507e2dc0
        __FUNCTION__ = "receive_task"
#17 0x00007f4578dded4c in run_ready (rs=0x184c990, ws=0x184ca20) at scheduler.c:587
        p = GNUNET_SCHEDULER_PRIORITY_DEFAULT
        pos = 0x186be70
        tc = {reason = GNUNET_SCHEDULER_REASON_TIMEOUT, read_ready = 0x184c990, write_ready = 0x184ca20}
        __FUNCTION__ = "run_ready"
#18 0x00007f4578ddf657 in GNUNET_SCHEDULER_run (task=0x7f4578dec132 <service_task>, task_cls=0x7fff507e3250) at scheduler.c:867
        rs = 0x184c990
        ws = 0x184ca20
        timeout = {rel_value_us = 0}
        ret = 0
        shc_int = 0x1863e50
        shc_term = 0x184dbf0
        shc_quit = 0x184dd70
        shc_hup = 0x184de30
        shc_pipe = 0x184dcb0
        last_tr = 34833
        busy_wait_warning = 0
        pr = 0x1862c00
        c = 0 '\000'
        __FUNCTION__ = "GNUNET_SCHEDULER_run"
#19 0x00007f4578dede6b in GNUNET_SERVICE_run (argc=3, argv=0x7fff507e34e8, service_name=0x43a12a "cadet", options=GNUNET_SERVICE_OPTION_NONE, task=0x432f1d <run>, task_cls=0x0) at service.c:1503
        err = 0
        ret = 3
        cfg_fn = 0x184a7c0 "/home/grothoff/.config/gnunet.conf"
        opt_cfg_fn = 0x184a920 "/home/grothoff/.config/gnunet.conf"
        loglev = 0x0
        logfile = 0x0
        do_daemonize = 0
        i = 4402973
        skew_offset = 139936363926784
        skew_variance = 140734543836208
        clock_offset = 4208400
        sctx = {cfg = 0x184a7f0, server = 0x184e150, addrs = 0x184c520, service_name = 0x43a12a "cadet", task = 0x432f1d <run>, task_cls = 0x0, v4_denied = 0x0, v6_denied = 0x0, v4_allowed = 0x184a810, v6_allowed = 0x1863fd0,
          my_handlers = 0x184e240, addrlens = 0x1863f30, lsocks = 0x0, shutdown_task = 0x184e1e0, timeout = {rel_value_us = 18446744073709551615}, ret = 1, ready_confirm_fd = -1, require_found = 1, match_uid = 1, match_gid = 1,
          options = GNUNET_SERVICE_OPTION_NONE}
        cfg = 0x184a7f0
        xdg = 0x7fff507e3db0 "/home/grothoff/.config"
        service_options = {{shortName = 99 'c', name = 0x7f4578dfc6ed "config", argumentHelp = 0x7f4578dfc6f4 "FILENAME", description = 0x7f4578dfc700 "use configuration file FILENAME", require_argument = 1,
            processor = 0x7f4578dcc089 <GNUNET_GETOPT_set_string>, scls = 0x7fff507e3308}, {shortName = 100 'd', name = 0x7f4578dfc720 "daemonize", argumentHelp = 0x0, description = 0x7f4578dfc730 "do daemonize (detach from terminal)",
            require_argument = 0, processor = 0x7f4578dcc05c <GNUNET_GETOPT_set_one>, scls = 0x7fff507e32f4}, {shortName = 104 'h', name = 0x7f4578dfc754 "help", argumentHelp = 0x0, description = 0x7f4578dfc759 "print this help",
            require_argument = 0, processor = 0x7f4578dcbb27 <GNUNET_GETOPT_format_help_>, scls = 0x0}, {shortName = 76 'L', name = 0x7f4578dfc769 "log", argumentHelp = 0x7f4578dfc76d "LOGLEVEL",
            description = 0x7f4578dfc778 "configure logging to use LOGLEVEL", require_argument = 1, processor = 0x7f4578dcc089 <GNUNET_GETOPT_set_string>, scls = 0x7fff507e3300}, {shortName = 108 'l', name = 0x7f4578dfc79a "logfile",
            argumentHelp = 0x7f4578dfc7a2 "LOGFILE", description = 0x7f4578dfc7b0 "configure logging to write logs to LOGFILE", require_argument = 1, processor = 0x7f4578dcc089 <GNUNET_GETOPT_set_string>, scls = 0x7fff507e32f8}, {
            shortName = 118 'v', name = 0x7f4578dfc7db "version", argumentHelp = 0x0, description = 0x7f4578dfc7e3 "print the version number", require_argument = 0, processor = 0x7f4578dcbae1 <GNUNET_GETOPT_print_version_>,
            scls = 0x7f4578dfc7fc}, {shortName = 0 '\000', name = 0x0, argumentHelp = 0x0, description = 0x0, require_argument = 0, processor = 0x0, scls = 0x0}}
        __FUNCTION__ = "GNUNET_SERVICE_run"
#20 0x000000000043328c in main (argc=3, argv=0x7fff507e34e8) at gnunet-service-cadet.c:174

Christian Grothoff

2015-07-13 23:29

manager   ~0009430

With the latest instrumentation, I got this violation of the invariant, which seems interesting:

(gdb) ba
#0 0x00007f5c56909107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f5c5690a4e8 in __GI_abort () at abort.c:89
#2 0x00007f5c57dce480 in GNUNET_abort_ () at common_logging.c:290
#3 0x00000000004302f7 in GCP_check_connection (peer=0x208b980, c=0x2094630) at gnunet-service-cadet_peer.c:2213
#4 0x00000000004133c2 in check_neighbours (c=0x2094630) at gnunet-service-cadet_connection.c:806
#5 0x00000000004133ec in check_connection (cls=0x0, key=0x7ffea0de53a0, value=0x2094630) at gnunet-service-cadet_connection.c:824
#6 0x00007f5c57ddf8e3 in GNUNET_CONTAINER_multihashmap_iterate (map=0x2074c80, it=0x4133c4 <check_connection>, it_cls=0x0)
    at container_multihashmap.c:358
#7 0x0000000000413410 in check_connections () at gnunet-service-cadet_connection.c:835
#8 0x000000000041a0b6 in GCC_new (cid=0x7ffea0de54a0, t=0x2094ff0, path=0x2093b50, own_pos=0) at gnunet-service-cadet_connection.c:2838
#9 0x000000000040fc08 in GCT_use_path (t=0x2094ff0, p=0x2093b50) at gnunet-service-cadet_tunnel.c:3778
#10 0x000000000042ee4b in GCP_connect (peer=0x208b980) at gnunet-service-cadet_peer.c:1889
#11 0x000000000042b6e3 in search_handler (cls=0x208b980, path=0x208b900) at gnunet-service-cadet_peer.c:1032
#12 0x000000000043144f in dht_get_id_handler (cls=0x2094e90, exp=..., key=0x7ffea0de5820, get_path=0x7ffea0de5880, get_path_length=0,
    put_path=0x7ffea0de5860, put_path_length=1, type=GNUNET_BLOCK_TYPE_DHT_HELLO, size=90, data=0x7ffea0de5880)
    at gnunet-service-cadet_dht.c:192
#13 0x00007f5c57794e1d in process_reply (cls=0x7ffea0de5800, key=0x7ffea0de5820, value=0x2095400) at dht_api.c:740
#14 0x00007f5c57de04c3 in GNUNET_CONTAINER_multihashmap_get_multiple (map=0x2078570, key=0x7ffea0de5820,
    it=0x7f5c57794946 <process_reply>, it_cls=0x7ffea0de5800) at container_multihashmap.c:816
#15 0x00007f5c57795b0a in service_message_handler (cls=0x2078590, msg=0x7ffea0de5800) at dht_api.c:1016
#16 0x00007f5c57dc8c9b in receive_task (cls=0x20789f0, tc=0x7ffea0de59b0) at client.c:618
#17 0x00007f5c57e0aaf3 in run_ready (rs=0x2073c10, ws=0x2073ca0) at scheduler.c:587
#18 0x00007f5c57e0b3fe in GNUNET_SCHEDULER_run (task=0x7f5c57e17ed9 <service_task>, task_cls=0x7ffea0de5d40) at scheduler.c:867
#19 0x00007f5c57e19c12 in GNUNET_SERVICE_run (argc=3, argv=0x7ffea0de5fd8, service_name=0x43a56a "cadet",
    options=GNUNET_SERVICE_OPTION_NONE, task=0x43333f <run>, task_cls=0x0) at service.c:1503
#20 0x00000000004336ae in main (argc=3, argv=0x7ffea0de5fd8) at gnunet-service-cadet.c:174

Christian Grothoff

2015-07-13 23:31

manager   ~0009431

Additionally, this one suggests that we probably need to add a few more
places with check_connections(), as here the invariant is false at the
beginning of a function:

#0 0x00007f7cdce00107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) ba
#0 0x00007f7cdce00107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f7cdce014e8 in __GI_abort () at abort.c:89
#2 0x00007f7cde2c5480 in GNUNET_abort_ () at common_logging.c:290
#3 0x00000000004302f7 in GCP_check_connection (peer=0x2013750, c=0x202d550) at gnunet-service-cadet_peer.c:2213
#4 0x00000000004133a4 in check_neighbours (c=0x202d550) at gnunet-service-cadet_connection.c:805
#5 0x00000000004133ec in check_connection (cls=0x0, key=0x7ffdfe8f0e20, value=0x202d550) at gnunet-service-cadet_connection.c:824
#6 0x00007f7cde2d68e3 in GNUNET_CONTAINER_multihashmap_iterate (map=0x200cc80, it=0x4133c4 <check_connection>, it_cls=0x0)
    at container_multihashmap.c:358
#7 0x0000000000413410 in check_connections () at gnunet-service-cadet_connection.c:835
#8 0x0000000000416313 in GCC_handle_create (cls=0x0, peer=0x7ffdfe8f10c4, message=0x7ffdfe8f10e4)
    at gnunet-service-cadet_connection.c:1813
#9 0x00007f7cdde96a0c in main_notify_handler (cls=0x200d6f0, msg=0x7ffdfe8f10c0) at core_api.c:967
#10 0x00007f7cde2bfc9b in receive_task (cls=0x200cfd0, tc=0x7ffdfe8f1220) at client.c:618
#11 0x00007f7cde301af3 in run_ready (rs=0x200bc10, ws=0x200bca0) at scheduler.c:587
#12 0x00007f7cde3023fe in GNUNET_SCHEDULER_run (task=0x7f7cde30eed9 <service_task>, task_cls=0x7ffdfe8f15b0) at scheduler.c:867
#13 0x00007f7cde310c12 in GNUNET_SERVICE_run (argc=3, argv=0x7ffdfe8f1848, service_name=0x43a56a "cadet",
    options=GNUNET_SERVICE_OPTION_NONE, task=0x43333f <run>, task_cls=0x0) at service.c:1503
#14 0x00000000004336ae in main (argc=3, argv=0x7ffdfe8f1848) at gnunet-service-cadet.c:174

Christian Grothoff

2015-07-14 23:56

manager   ~0009432

Gotcha: invariant violated at the END of core_connect, but passed at the beginning:

#0 0x00007f20911a3107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f20911a44e8 in __GI_abort () at abort.c:89
#2 0x00007f2092668480 in GNUNET_abort_ () at common_logging.c:290
#3 0x00000000004305d3 in GCP_check_connection (peer=0xc7b6c0, c=0xca7fe0) at gnunet-service-cadet_peer.c:2251
#4 0x00000000004133f6 in check_neighbours (c=0xca7fe0) at gnunet-service-cadet_connection.c:814
#5 0x000000000041343e in check_connection (cls=0x0, key=0x7ffd368c1a80, value=0xca7fe0) at gnunet-service-cadet_connection.c:834
#6 0x00007f20926798e3 in GNUNET_CONTAINER_multihashmap_iterate (map=0xc61c80, it=0x413416 <check_connection>, it_cls=0x0)
    at container_multihashmap.c:358
#7 0x0000000000413470 in GCC_check_connections () at gnunet-service-cadet_connection.c:847
#8 0x000000000042a058 in core_connect (cls=0x0, peer=0xc9ff40) at gnunet-service-cadet_peer.c:448
#9 0x00007f2092238fa4 in main_notify_handler (cls=0xc626f0, msg=0x7ffd368c1cc0) at core_api.c:882
#10 0x00007f2092662c9b in receive_task (cls=0xc61fd0, tc=0x7ffd368c1dc0) at client.c:618
#11 0x00007f20926a4af3 in run_ready (rs=0xc60c10, ws=0xc60ca0) at scheduler.c:587
#12 0x00007f20926a53fe in GNUNET_SCHEDULER_run (task=0x7f20926b1ed9 <service_task>, task_cls=0x7ffd368c2150) at scheduler.c:867
#13 0x00007f20926b3c12 in GNUNET_SERVICE_run (argc=3, argv=0x7ffd368c23e8, service_name=0x43a8ba "cadet",
    options=GNUNET_SERVICE_OPTION_NONE, task=0x433693 <run>, task_cls=0x0) at service.c:1503
#14 0x0000000000433a02 in main (argc=3, argv=0x7ffd368c23e8) at gnunet-service-cadet.c:174
(gdb)

Christian Grothoff

2015-07-15 00:01

manager   ~0009433

AHA!

How about this: we have an existing 'struct CadetConnection' A-US-B, but no connection to say 'A'. Then A connects to us (core_connect). At the beginning, the invariant passes because (NULL == peer->connections) for the non-connected peer. Then 'core_connect' *creates* peer->connections HT (BUT does NOT add existing CadetConnections to the new map). Then afterwards the check fails, as the CadetConnection is NOT in the map. If we then tear down the CadetConnection (i.e. because 'B' disconnects), we try to remove from 'A' but the remove fails (connection not in map!).

I'm not sure how we can have a 'struct CadetConnection' from A-US-B without a connection A-US, but the fact that we check "NULL == peer->connections" (in GCP_remove_connection) suggests this at least plausible.

Christian Grothoff

2015-07-15 00:04

manager   ~0009434

From cadet_peer.c:2251 on assertion failure:

(gdb) print *peer
$1 = {id = 20, last_contact = {abs_value_us = 1436910759928036}, path_head = 0xc99ee0, path_tail = 0xca6c40, search_h = 0xc9ef60,
  search_delayed = 0x0, tunnel = 0xcaacb0, connections = 0xcab350, core_transmit = 0x0, tmt_time = {abs_value_us = 0}, queue_head = 0x0,
  queue_tail = 0x0, queue_n = 0, hello = 0xca9590}
(gdb) print *c
$2 = {t = 0x0, fwd_fc = {c = 0xca7fe0, queue_n = 0, queue_max = 11, last_pid_sent = 4294967295, last_pid_recv = 4294967295,
    recv_bitmap = 0, last_ack_sent = 0, last_ack_recv = 0, poll_task = 0x0, poll_time = {rel_value_us = 1000000}, poll_msg = 0x0,
    ack_msg = 0x0}, bck_fc = {c = 0xca7fe0, queue_n = 0, queue_max = 11, last_pid_sent = 4294967295, last_pid_recv = 4294967295,
    recv_bitmap = 0, last_ack_sent = 0, last_ack_recv = 0, poll_task = 0x0, poll_time = {rel_value_us = 1000000}, poll_msg = 0x0,
    ack_msg = 0x0}, perf = 0x0, id = {bits = "\t9I)\222\063A\004\326\363\353\226j\365j\366\226w\344\\h\220\rj2Z0l\267\262]\205"},
  path = 0xca8140, fwd_maintenance_task = 0xca7160, bck_maintenance_task = 0x0, maintenance_q = 0x0, next_peer = 0xc7b6c0,
  prev_peer = 0xc7c570, state = CADET_CONNECTION_DESTROYED, own_pos = 3, pending_messages = 1, destroy = 1, create_retry = 0}
(gdb) print *c->next_peer
$3 = {id = 20, last_contact = {abs_value_us = 1436910759928036}, path_head = 0xc99ee0, path_tail = 0xca6c40, search_h = 0xc9ef60,
  search_delayed = 0x0, tunnel = 0xcaacb0, connections = 0xcab350, core_transmit = 0x0, tmt_time = {abs_value_us = 0}, queue_head = 0x0,
  queue_tail = 0x0, queue_n = 0, hello = 0xca9590}
(gdb) print *c->prev_peer
$4 = {id = 28, last_contact = {abs_value_us = 1436910759928010}, path_head = 0xc7aa70, path_tail = 0xc9b200, search_h = 0xc84370,
  search_delayed = 0x0, tunnel = 0xc86640, connections = 0xc7b9c0, core_transmit = 0xc81618, tmt_time = {
    abs_value_us = 1436910751616026}, queue_head = 0xca8ad0, queue_tail = 0xca8ad0, queue_n = 1, hello = 0xc9e170}

=> so 'c' was not in the map for c's 'next_peer'

At cadet_peer.c:448 at the same time:
(gdb) print *mp
$5 = {id = 20, last_contact = {abs_value_us = 1436910759928036}, path_head = 0xc99ee0, path_tail = 0xca6c40, search_h = 0xc9ef60,
  search_delayed = 0x0, tunnel = 0xcaacb0, connections = 0xcab350, core_transmit = 0x0, tmt_time = {abs_value_us = 0}, queue_head = 0x0,
  queue_tail = 0x0, queue_n = 0, hello = 0xca9590}

=> so the peer that we established a connection to was '20' == c's 'next_peer'

This confirms the previous theory that by initializing 'mp->connections' we cause the invariant to fail.

Christian Grothoff

2015-07-15 00:06

manager   ~0009435

So now the 1M $$ question: why did we have a connection via a 'next' peer which is NOT currently connected?

It cannot (by design) be due to disconnect, on disconnect we clean up connections associated with the disconnecting peer.

That leaves that we somehow *create* a connection via peers that we are not currently connected to. => Another invariant to check? Or Bart, do you have an explanation from the design?

Christian Grothoff

2015-07-15 00:28

manager   ~0009436

Ok, my stronger check of the invariant fails during disconnect -- a connection remains with a disconnected peer. Closer inspection easily explains this as well: the connection in 'GCC_notify_broken' is not fully destroyed as a last 'disconnect notification' is scheduled (send_broken()) for the OTHER peer that remains. While that one is in the queue, the comments say the connection object will live.

Thus, if then the peer re-connects, the connection may still be around, but wouldn't be re-added to the peer's connection list. If it *then* gets killed, (which is normal, after all it will eventually transmit the notification...) the code will attempt to remove it from it's neighbour's queue and then this violates the connections-must-be-in-peer-list assertion.

Possibly a good answer: remove it immediately from the disconnecting peer AND mark this in the data structure, so we don't try doing so again (so we can keep the invariant).

Christian Grothoff

2015-07-15 01:03

manager   ~0009437

This is not the only issue in this domain. Sometimes c->destroy is set to YES, the actual destruction is deferred; however, the connection is then anyway returned via the hashmap and in some cases the 'destroy' flag is not checked. This can then in particular cause havoc when a new (more or less identical) connection is to be created while the previous one is awaiting destruction. Very messy.

Christian Grothoff

2015-07-15 01:34

manager   ~0009438

Proposed fix in SVN 36070. Will leave at feedback for a bit to be sure...

Christian Grothoff

2015-07-15 14:16

manager   ~0009450

The fix caused a follow-up issue when connections were re-created. That's also now fixed (right after Bart's long commit series), so ~36083.

amatus

2015-07-15 19:19

developer   ~0009451

At rev 36090 I'm now hitting your new GNUNET_assert (NULL != ( (fwd) ? c->next_peer : c->prev_peer) ); in GCC_notify_broken.

Program terminated with signal SIGABRT, Aborted.
#0 0xb756eddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0xb756eddc in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0xb7570463 in __GI_abort () at abort.c:89
#2 0xb7740caa in GNUNET_abort_ () at common_logging.c:290
#3 0x0805b2d2 in GCC_notify_broken (c=0x99a2860, peer=0x9985318)
    at gnunet-service-cadet_connection.c:3276
#4 0x08064284 in notify_broken (cls=0x9985318, key=0xbffa25e0,
    value=0x99a2860) at gnunet-service-cadet_peer.c:364
#5 0xb774f851 in GNUNET_CONTAINER_multihashmap_iterate (map=0x9982d40,
    it=0x806425d <notify_broken>, it_cls=0x9985318)
    at container_multihashmap.c:358
#6 0x08064948 in core_disconnect (cls=0x0, peer=0x9992630)
    at gnunet-service-cadet_peer.c:483
#7 0xb77155dc in disconnect_and_free_peer_entry (cls=0x99809e8,
    key=0xbffa2828, value=0x9992600) at core_api.c:389
#8 0xb77181a7 in main_notify_handler (cls=0x99809e8, msg=0xbffa2820)
    at core_api.c:913
#9 0xb773c348 in receive_task (cls=0x9980040, tc=0xbffa28b4) at client.c:618
#10 0xb7777469 in run_ready (rs=0x998ee30, ws=0x998eeb8) at scheduler.c:587
#11 0xb7777cf7 in GNUNET_SCHEDULER_run (task=0xb7782885 <service_task>,
    task_cls=0xbffa2ae0) at scheduler.c:867
#12 0xb7784423 in GNUNET_SERVICE_run (argc=7, argv=0xbffa2cd4,
    service_name=0x806e5bc "cadet", options=GNUNET_SERVICE_OPTION_NONE,
    task=0x806ac6e <run>, task_cls=0x0) at service.c:1503
#13 0x0806af00 in main (argc=7, argv=0xbffa2cd4) at gnunet-service-cadet.c:174
(gdb) f 3
#3 0x0805b2d2 in GCC_notify_broken (c=0x99a2860, peer=0x9985318)
    at gnunet-service-cadet_connection.c:3276
3276 gnunet-service-cadet_connection.c: No such file or directory.
(gdb) p *peer
$1 = {id = 16, last_contact = {abs_value_us = 1436969489418527},
  path_head = 0x0, path_tail = 0x0, search_h = 0x0, search_delayed = 0x0,
  tunnel = 0x0, connections = 0x9982d40, core_transmit = 0x0, tmt_time = {
    abs_value_us = 0}, queue_head = 0x0, queue_tail = 0x0, queue_n = 0,
  hello = 0x99856b8}
(gdb) p *c
$2 = {t = 0x0, fwd_fc = {c = 0x99a2860, queue_n = 0, queue_max = 11,
    last_pid_sent = 4294967295, last_pid_recv = 4294967295, recv_bitmap = 0,
    last_ack_sent = 0, last_ack_recv = 0, poll_task = 0x0, poll_time = {
      rel_value_us = 1000000}, poll_msg = 0x0, ack_msg = 0x0}, bck_fc = {
    c = 0x99a2860, queue_n = 0, queue_max = 11, last_pid_sent = 4294967295,
    last_pid_recv = 4294967295, recv_bitmap = 0, last_ack_sent = 0,
    last_ack_recv = 0, poll_task = 0x0, poll_time = {rel_value_us = 1000000},
    poll_msg = 0x0, ack_msg = 0x0}, perf = 0x0, id = {
    bits = "\343\223\031\016\n\004\263^\275\231\224\365\272\361s $\253\061\066Q\234\233B\315L\320\251\236\340\355E"}, path = 0x999d9f0,
  fwd_maintenance_task = 0x999f218, bck_maintenance_task = 0x0,
  maintenance_q = 0x0, next_peer = 0x0, prev_peer = 0x0,
  state = CADET_CONNECTION_DESTROYED, own_pos = 3, pending_messages = 1,
  destroy = 1, was_removed = 1, create_retry = 0}
(gdb) p *hop
$3 = {id = 62, last_contact = {abs_value_us = 1436969489418517},
  path_head = 0x99a0418, path_tail = 0x99a0418, search_h = 0x0,
  search_delayed = 0x0, tunnel = 0x999e770, connections = 0x99a04c8,
  core_transmit = 0x999e93c, tmt_time = {abs_value_us = 1436969488768256},
  queue_head = 0x999ddf0, queue_tail = 0x999ef40, queue_n = 2, hello = 0x0}

Christian Grothoff

2015-07-15 19:41

manager   ~0009452

Yeah, after my last changes the assertion is just bogus. Removed in 36091.

Issue History

Date Modified Username Field Change
2015-06-19 16:46 amatus New Issue
2015-06-19 16:46 amatus Status new => assigned
2015-06-19 16:46 amatus Assigned To => Bart Polot
2015-06-19 16:52 amatus Platform amd64 => x86
2015-06-24 16:53 Bart Polot Status assigned => feedback
2015-06-24 19:09 amatus Note Added: 0009331
2015-06-24 19:09 amatus Status feedback => assigned
2015-06-24 19:17 Bart Polot Note Added: 0009332
2015-06-25 05:19 Bart Polot Relationship added related to 0003845
2015-06-25 05:20 Bart Polot Note Added: 0009337
2015-06-25 05:20 Bart Polot Status assigned => feedback
2015-06-25 06:29 amatus Note Added: 0009340
2015-06-25 06:29 amatus Status feedback => assigned
2015-06-26 04:22 Bart Polot Note Added: 0009341
2015-06-26 04:22 Bart Polot Status assigned => feedback
2015-06-26 13:35 Christian Grothoff Target Version => 0.11.0pre66
2015-06-28 01:40 amatus Note Added: 0009357
2015-06-28 01:40 amatus Status feedback => assigned
2015-06-28 03:48 Bart Polot Status assigned => resolved
2015-06-28 03:48 Bart Polot Fixed in Version => 0.11.0pre66
2015-06-28 03:48 Bart Polot Resolution open => fixed
2015-06-28 17:40 amatus Note Added: 0009362
2015-06-28 17:40 amatus Status resolved => feedback
2015-06-28 17:40 amatus Resolution fixed => reopened
2015-06-28 17:40 amatus Status feedback => assigned
2015-06-28 18:01 amatus Note Edited: 0009362 View Revisions
2015-06-28 18:49 Bart Polot Note Added: 0009363
2015-06-29 14:58 Bart Polot Note Added: 0009368
2015-06-29 16:34 amatus Note Added: 0009369
2015-06-30 00:55 amatus Note Added: 0009376
2015-06-30 11:32 Bart Polot Relationship added has duplicate 0003873
2015-06-30 11:36 Christian Grothoff Note Added: 0009381
2015-07-13 23:29 Christian Grothoff Note Added: 0009430
2015-07-13 23:31 Christian Grothoff Note Added: 0009431
2015-07-14 23:56 Christian Grothoff Note Added: 0009432
2015-07-15 00:01 Christian Grothoff Note Added: 0009433
2015-07-15 00:04 Christian Grothoff Note Added: 0009434
2015-07-15 00:06 Christian Grothoff Note Added: 0009435
2015-07-15 00:28 Christian Grothoff Note Added: 0009436
2015-07-15 01:03 Christian Grothoff Note Added: 0009437
2015-07-15 01:34 Christian Grothoff Note Added: 0009438
2015-07-15 01:45 Christian Grothoff Status assigned => feedback
2015-07-15 01:46 Christian Grothoff Assigned To Bart Polot => Christian Grothoff
2015-07-15 14:16 Christian Grothoff Note Added: 0009450
2015-07-15 19:19 amatus Note Added: 0009451
2015-07-15 19:19 amatus Status feedback => assigned
2015-07-15 19:41 Christian Grothoff Note Added: 0009452
2015-07-16 11:15 Christian Grothoff Status assigned => feedback
2015-08-03 13:37 Christian Grothoff Status feedback => resolved
2015-08-03 13:37 Christian Grothoff Resolution reopened => fixed
2018-06-07 00:25 Christian Grothoff Status resolved => closed