View Issue Details

IDProjectCategoryView StatusLast Update
0001880GNUnettransport servicepublic2011-11-30 18:38
ReporterLRN Assigned ToLRN  
PrioritynormalSeveritymajorReproducibilitysometimes
Status closedResolutionfixed 
Target Version0.9.0Fixed in Version0.9.0 
Summary0001880: Crash in transport service (neighbour entry is corrupt)
DescriptionHad this for a while now, but only bothered to look at the code today.

I've commented the assertion there (figured - it should process an entry without a connected neighbour just fine), but apparently cls here contains some invalid data.
Steps To ReproduceRun the test suite. Usually a more than one transport process will crash simultaneously over this (in tests that do have multiple transport processes), such as dht and mesh tests.
Additional InformationAttaching to process 8504
[New Thread 8504.0x2aa8]
[New Thread 8504.0x1b0c]
[New Thread 8504.0x2964]
[New Thread 8504.0x2d70]
[New Thread 8504.0x15a8]
Reading symbols from d:\progs\gnunet\bin\gnunet-service-transport.exe...done.
[Switching to Thread 8504.0x15a8]
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 8504.0x2aa8]
0x772515ee in ntdll!LdrQueryProcessModuleInformation () from E:\Windows\SysWOW64\ntdll.dll
[Switching to thread 1 (Thread 8504.0x2aa8)]
#0 0x772515ee in ntdll!LdrQueryProcessModuleInformation () from E:\Windows\SysWOW64\ntdll.dll
#0 0x772515ee in ntdll!LdrQueryProcessModuleInformation () from E:\Windows\SysWOW64\ntdll.dll
#1 0x772515ee in ntdll!LdrQueryProcessModuleInformation () from E:\Windows\SysWOW64\ntdll.dll
#2 0x7724015e in ntdll!LdrFindResource_U () from E:\Windows\SysWOW64\ntdll.dll
#3 0x0028f5fc in ?? ()
#4 0x00408c34 in _fu221__skip_log () at gnunet-service-transport_neighbours.c:1117
#5 0x6c382a2c in _fu35__skip_log () at plugin_transport_tcp.c:829
#6 0x6c386585 in libgnunet_plugin_transport_tcp_done (cls=0x1d5d370) at plugin_transport_tcp.c:2059
#7 0x624a7700 in GNUNET_PLUGIN_unload (library_name=0x1d60c70 "libgnunet_plugin_transport_tcp", arg=0x1d5d370) at plugin.c:270
#8 0x0040cbd7 in GST_plugins_unload () at gnunet-service-transport_plugins.c:160
#9 0x00401fcf in shutdown_task (cls=0x0, tc=0x28fca8) at gnunet-service-transport.c:453
#10 0x624ac9d1 in run_ready (rs=0x1d5c770, ws=0x1d5d140) at scheduler.c:684
#11 0x624acf10 in GNUNET_SCHEDULER_run (task=0x624b5a51 <service_task>, task_cls=0x28fe28) at scheduler.c:864
#12 0x624b6735 in GNUNET_SERVICE_run (argc=3, argv=0x1d5a250, serviceName=0x411293 "transport", opt=GNUNET_SERVICE_OPTION_NONE,
    task=0x40205f <run>, task_cls=0x0) at service.c:1584
#13 0x004023cc in main (argc=3, argv=0x1d5a250) at gnunet-service-transport.c:554
(gdb) up 4
#4 0x00408c34 in _fu221__skip_log () at gnunet-service-transport_neighbours.c:1117
1117 GNUNET_ATS_address_destroyed (GST_ats, &n->id, n->plugin_name, n->addr,
(gdb) p *n
$1 = {messages_head = 0x1d7c5b0, messages_tail = 0x1d85e78, is_active = 0x6e000101, session = 0x10101,
  plugin_name = 0xa8bcf3e9 <Address 0xa8bcf3e9 out of bounds>, addr = 0x661555e8, addrlen = 3410653863, id = {hashPubKey = {bits = {738968866,
        1832180779, 15829386, 286614, 1922684995, 3393576832, 1118916848, 738824394, 476219900, 2274401372, 235098225, 1426914419, 1774637927,
        546680069, 2293507062, 2799793571}}}, timeout_task = 5209587130130777514, keepalive_task = 16511268100620241461,
  transmission_task = 17543152867108499175, in_tracker = {consumption_since_last_update__ = -6068785878852531726, last_update__ = {
      abs_value = 16503045288224349061}, available_bytes_per_s__ = 3458884045, max_carry_s__ = 186741818}, bandwidth_in = {value__ = 1655610955},
  bandwidth_out = {value__ = 2051908698}, connect_ts = {abs_value = 6130166182246217722}, ats_suggest = 5588662537427971225,
  state_reset = 7813720629198082242, quota_violation_count = 486347848, state = -70825652}
(gdb) p n
$2 = (struct NeighbourMapEntry *) 0x1d68730
(gdb) l send_switch_address_continuation
1093 */
1094 static void
1095 send_switch_address_continuation (void *cls,
1096 const struct GNUNET_PeerIdentity *target,
1097 int success)
1098 {
1099 struct NeighbourMapEntry *n = cls;
1100
1101 GNUNET_assert (n != NULL);
1102 if (is_disconnecting (n))
(gdb)
1103 return; /* neighbour is going away */
1104
1105 /*GNUNET_assert (n->state == S_CONNECTED);*/
1106 if (GNUNET_YES != success)
1107 {
1108 #if DEBUG_TRANSPORT
1109 GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
1110 "Failed to switch connected peer `%s' to plugin `%s' address '%s' session %X, asking ATS for new address \n",
1111 GNUNET_i2s (&n->id), n->plugin_name,
1112 (n->addrlen ==
(gdb)
1113 0) ? "<inbound>" : GST_plugins_a2s (n->plugin_name, n->addr,
1114 n->addrlen), n->session);
1115 #endif
1116
1117 GNUNET_ATS_address_destroyed (GST_ats, &n->id, n->plugin_name, n->addr,
1118 n->addrlen, NULL);
1119
1120 if (n->ats_suggest != GNUNET_SCHEDULER_NO_TASK)
1121 GNUNET_SCHEDULER_cancel (n->ats_suggest);
1122 n->ats_suggest =
(gdb)
1123 GNUNET_SCHEDULER_add_delayed (ATS_RESPONSE_TIMEOUT, ats_suggest_cancel,
1124 n);
1125 GNUNET_ATS_suggest_address (GST_ats, &n->id);
1126 return;
1127 }
1128 }
TagsNo tags attached.
Attached Files
0001-Call-transport-disconnect-to-clear-session-message-q.patch (1,183 bytes)   
From 94580a0da287a5e9cb095e68f6344ad1fe39f1b9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=A0=D1=83=D1=81=D0=BB=D0=B0=D0=BD=20=D0=98=D0=B6=D0=B1=D1?=
 =?UTF-8?q?=83=D0=BB=D0=B0=D1=82=D0=BE=D0=B2?= <lrn1986@gmail.com>
Date: Sat, 5 Nov 2011 01:48:28 +0400
Subject: [PATCH] Call transport disconnect to clear session message queue

---
 .../gnunet-service-transport_neighbours.c          |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/transport/gnunet-service-transport_neighbours.c b/src/transport/gnunet-service-transport_neighbours.c
index e0af12b..14987d6 100644
--- a/src/transport/gnunet-service-transport_neighbours.c
+++ b/src/transport/gnunet-service-transport_neighbours.c
@@ -862,6 +862,14 @@ disconnect_neighbour (struct NeighbourMapEntry *n)
     return;
   change_state (n, S_DISCONNECT);
 
+  if (n->plugin_name != NULL)
+  {
+    struct GNUNET_TRANSPORT_PluginFunctions *papi;
+    papi = GST_plugins_find (n->plugin_name);
+    if (papi != NULL)
+      papi->disconnect (papi->cls, &n->id);
+  }
+
   while (NULL != (mq = n->messages_head))
   {
     GNUNET_CONTAINER_DLL_remove (n->messages_head, n->messages_tail, mq);
-- 
1.7.4

Activities

LRN

2011-11-04 20:20

developer   ~0004859

ok, send_switch_address_continuation() fails because struct NeighbourMapEntry *n = cls; contains junk
it gets cls from a sent_with_plugin() call, issued by GST_neighbours_switch_to_address_3way()
cls (n) is a result of a lookup_neighbour() call
and at the time of send_with_plugin() call it seems to be valid
lookup_neighbour() calls GNUNET_CONTAINER_multihashmap_get(), and the only way to remove multihashmap is to do GNUNET_CONTAINER_multihashmap_remove(), and only disconnect_neighbour() does that.
but before freeing the messages in the queue of the entry, disconnect_neighbour() DOES call their callbacks
aha, it empties the queue of the neighbour entry, while disconnect_session() (which also calls callbacks on the queued messages) empties the session queue
So the problem is that session queue contains messages that are processed with a neighbour entry context, but that entry might not be valid by then
session messages are queued by tcp_plugin_send(), which is likely called by send_with_plugin() (or by one of the validation functions...), which gets us back to GST_neighbours_switch_to_address_3way()

LRN

2011-11-04 22:49

developer   ~0004860

neighbour entry message queue is filled by GST_neighbours_send()

So, what to do?

1) It might be possible to use n->session to remove messages from session queue during disconnect_neighbour(). But it isn't, because n->session is opaque.

2) It might be possible to rewrite send_switch_address_continuation() to handle GNUNET_SYSERR correctly (that is, assume that cls is b0rked). I haven't tried that.

3) It might be possible to call tcp_plugin_disconnect() via papi from disconnect_neighbour(). That will clean the session. I did that, patch is attached below. Results are positive (no crashes).

LRN

2011-11-04 22:50

developer   ~0004861

Well, actually, it's not "below", it's "above", 0001-Call-transport-disconnect-to-clear-session-message-q.patch

Christian Grothoff

2011-11-05 19:54

manager   ~0004862

Patch has been applied to SVN as 18023.

Issue History

Date Modified Username Field Change
2011-11-04 19:51 LRN New Issue
2011-11-04 19:51 LRN Status new => assigned
2011-11-04 19:51 LRN Assigned To => Matthias Wachs
2011-11-04 20:20 LRN Note Added: 0004859
2011-11-04 22:49 LRN Note Added: 0004860
2011-11-04 22:50 LRN File Added: 0001-Call-transport-disconnect-to-clear-session-message-q.patch
2011-11-04 22:50 LRN Note Added: 0004861
2011-11-05 19:54 Christian Grothoff Note Added: 0004862
2011-11-05 19:54 Christian Grothoff Assigned To Matthias Wachs => LRN
2011-11-05 19:54 Christian Grothoff Status assigned => resolved
2011-11-05 19:54 Christian Grothoff Fixed in Version => 0.9.0
2011-11-05 19:54 Christian Grothoff Resolution open => fixed
2011-11-05 19:54 Christian Grothoff Product Version Git master => 0.9.0pre4
2011-11-05 19:54 Christian Grothoff Target Version => 0.9.0
2011-11-30 18:38 Christian Grothoff Status resolved => closed