View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002248 | GNUnet | transport service | public | 2012-04-01 18:43 | 2012-06-02 19:15 |
Reporter | Christian Grothoff | Assigned To | Matthias Wachs | ||
Priority | urgent | Severity | major | Reproducibility | have not tried |
Status | closed | Resolution | fixed | ||
Product Version | Git master | ||||
Target Version | 0.9.3 | Fixed in Version | 0.9.3 | ||
Summary | 0002248: significant discrepancy between core-level and transport-level connections for running peer | ||||
Description | I've been running a peer (with properly punched NAT) for a few hours now. I'm seeing a significant difference (3-to-8) between the number of connections 'core' is aware of and the number of connections transport has: grothoff@spec:~/svn/gnunet$ gnunet-core Peer `GJG6GNC6ME462F4QA5KHEC4FSQDRSQJI9HP88DKC057463F0FN072AO01JSI1E4A9PUI0CNUS88SH4SM9S0QUUHH24C3DGU8CU0G2OG' Peer `KNAPGJMNVI5LECGUVBF1CTLAR2VHDI7HH5TDBRPDHDPKL5LA27FIBD0F63HAMQCAN2R5ESVA3CB6ICQRGN6UKSQF3EKVH3ISUKMGQC8' Peer `RL7P1AO4M4LOI3JPVER5FJKG6A847K3O86NS4Q07J1721371FO7RB6IFVMNI1NKLFTRCM7O5AGJPQ3UDD8MST8QQHLNOC4GUL6731O0' grothoff@spec:~/svn/gnunet$ gnunet-transport -ni Peer `BEKL': tcp 120.108.205.237:2086 Peer `RL7P': tcp 131.159.74.67:2086 Peer `R69Q': tcp 203.178.133.3:2086 Peer `KUPL': udp 134.2.197.47:2086 Peer `KNAP': udp 98.119.204.191:2086 Peer `IVT2': tcp 71.177.216.36:2086 Peer `GJG6': udp 98.119.204.191:12086 Peer `R6OV': tcp 188.40.170.101:2086 Looking at the statistics, 'core' confirmed 47 session keys and terminated 44 sessions due to 'transport-level disconnect' (which leaves 3). So either core got erroneous disconnects from transport (maybe because of ATS session switch?) or core failed to get connect notifications for 5 additional connections. Transport told stats that 25 (!) peers are connected (far more than the 8 being listed -- bad counting?), 39 (!) TCP sessions are active (bad counting?), so the numbers are still very wrong here. | ||||
Additional Information | grothoff@spec:~/svn/gnunet$ gnunet-statistics -s core core # encrypted bytes given to transport : 113658666 core # bytes encrypted : 110385656 core # avg payload per encrypted message : 2037 core # bytes of payload decrypted : 201784291 core # bytes decrypted : 202497359 core # type maps received : 580 core # type map refreshes sent : 594 core # entries in session map : 3 core # sessions terminated by timeout : 3 core # keepalive messages sent : 412 core # messages discarded (session disconnected) : 2 core # neighbour entries allocated : 11 core # key exchanges stopped : 44 core # sessions terminated by transport disconnect : 44 core # PONG messages created : 137 core # PING messages received : 190 core # session keys confirmed via PONG : 47 core # PONG messages decrypted : 447 core # PONG messages received : 687 core # SET_KEY messages decrypted : 51 core # session keys received : 51 core # SET_KEY and PING messages created : 119 core # key exchanges initiated : 96 core # updates to my type map : 20 core # failed to decrypt message (no session key) : 730 core # send requests dropped (disconnected) : 2 core # rekey operations confirmed via PONG : 23 core # REKEY operations performed : 23 grothoff@spec:~/svn/gnunet$ gnunet-statistics -s transport transport # bytes transmitted via TCP : 92504494 transport # bytes currently in TCP buffers : 860 transport # keepalives sent : 13187 transport # peers connected : 25 transport # TCP sessions active : 39 transport # bytes discarded by TCP (disconnect) : 2385476 transport # network-level TCP disconnect events : 19413 transport # bytes received via TCP : 74552828 transport # IPv4 broadcast HELLO beacons received via udp : 1748 transport # fragments retransmitted : 23264 transport # fragments transmitted : 40981 transport # bytes in message queue for other peers : 114448070 transport # bytes payload received for other peers : 113739780 transport # fragmentation transmissions completed : 1898 transport # fragment acknowledgements received : 2771 transport # total size of fragmented messages : 20507051 transport # messages fragmented : 1901 transport # REQUEST CONNECT messages received : 11720 transport # ms throttling suggested : 30563912 transport # PONGs unicast via reliable transport : 1305 transport # PING messages received : 1305 transport # PONG messages received : 1297 transport # PING without HELLO messages sent : 2128 transport # address revalidations started : 2785 transport # TCP WELCOME messages received : 216 transport # address records discarded : 1391 transport # KEEPALIVE_RESPONSE messages discarded (not expected): 80 transport # peers disconnected due to timeout : 975 transport # acknowledgements sent for fragment : 93550 transport # duplicate fragments received : 86514 transport # fragments received : 190458 transport # messages defragmented : 7374 transport # PING message for different peer received : 70 transport # fast reconnects failed : 26 transport # failed connection attempts due to timeout : 3266 transport # transport-service disconnect requests for TCP : 5 transport # bytes payload dropped (other peer was not connected): 42836 transport # PONGs dropped, no matching pending validation : 1 transport # peers disconnected due to external request : 4 transport # bytes payload discarded due to not connected peer : 5845839 transport # refreshed my HELLO : 127 transport # bandwidth quota violations by other peers : 449 transport # bytes discarded by TCP (timeout) : 294 transport # KEEPALIVE messages discarded (not connected) : 4 | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
|
Had a peer running for several hours showing the following statistics: 9P6V "transport " "# peers connected " == 4 9P6V "core " "# neighbour entries allocated " == 4 9P6V "core " "# entries in session map " == 3 9P6V "topology " "# peers connected " == 3 In this test transport connections and core notifications are equal, but core stats are inconsistent... ATM Try to analyze this |
|
Added logfile test.log.tar.gz for the test mentioned above running for 15 hours showing a consistent picture of the analysis mentioned above |
|
Analysis with the new integration-tests/connection_watchdog tool running for 15 hours with tcp: - core and transport (dis)connect notifications are consistent - transport (dis)connect notifications && transport statistics value "# peers connected" are consistent - transport (dis)connect notifications && core statistics value "# entries in session map" && "# neighbour entries allocated" are consistent not consistent: transport " # TCP sessions active " && "# peers connected": tcp sessions << peers connected |
|
Snapshot: mwachs@fulcrum:~/gnunet/gnunet-debug/src/integration-tests$ gnunet-transport -i -n; netstat -a inet -n | grep 2086; gnunet-statistics -s transport -n "# TCP sessions active" Peer `RL7P': tcp [2001:4ca0:2001:42:230:48ff:febb:4bca]:2086 Peer `R69Q': tcp 203.178.133.3:2086 Peer `LJR8': tcp 193.174.155.36:2086 Peer `KNAP': tcp [2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086 Peer `IVT2': tcp [2001:470:d:ef4::11]:2086 Peer `94CH': tcp 83.169.43.154:2086 Peer `1G1M': tcp 132.252.152.193:2086 tcp 0 0 0.0.0.0:2086 0.0.0.0:* LISTEN tcp 0 0 131.159.20.52:43843 193.174.155.36:2086 ESTABLISHED tcp 0 1 131.159.20.52:52159 192.168.1.101:2086 SYN_SENT tcp 0 0 131.159.20.52:58340 203.178.133.3:2086 ESTABLISHED tcp 0 0 131.159.20.52:35071 132.252.152.193:2086 ESTABLISHED tcp 0 0 131.159.20.52:41852 83.169.43.154:2086 ESTABLISHED tcp 0 0 131.159.20.52:34521 131.159.74.67:2086 ESTABLISHED tcp6 0 0 :::2086 :::* LISTEN tcp6 0 0 2001:4ca0:2001:11:52934 2001:4ca0:2001:42::2086 ESTABLISHED tcp6 0 0 2001:4ca0:2001:11:59229 2001:470:d:ef4::11:2086 ESTABLISHED tcp6 0 0 2001:4ca0:2001:11:50269 2a01:4f8:101:281:7:2086 ESTABLISHED tcp6 0 0 2001:4ca0:2001:11:45970 2002:c32e:b925:0:2:2086 ESTABLISHED transport # TCP sessions active : 5 |
|
netstat, transport and peerinfo output matched + explanations: --- Peer connections with corresponding tcp connections: Peer `RL7P': tcp [2001:4ca0:2001:42:230:48ff:febb:4bca]:2086 tcp 0 0 131.159.20.52:34521 131.159.74.67:2086 ESTABLISHED tcp6 0 0 2001:4ca0:2001:11:52934 2001:4ca0:2001:42::2086 ESTABLISHED Peer `R69Q': tcp 203.178.133.3:2086 tcp 0 0 131.159.20.52:58340 203.178.133.3:2086 ESTABLISHED Peer `LJR8': tcp 193.174.155.36:2086 tcp 0 0 131.159.20.52:43843 193.174.155.36:2086 ESTABLISHED Peer `KNAP': tcp [2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086 tcp6 0 0 2001:4ca0:2001:11:45970 2002:c32e:b925:0:2:2086 ESTABLISHED Peer `IVT2': tcp [2001:470:d:ef4::11]:2086 tcp6 0 0 2001:4ca0:2001:11:59229 2001:470:d:ef4::11:2086 ESTABLISHED Peer `94CH': tcp 83.169.43.154:2086 tcp 0 0 131.159.20.52:41852 83.169.43.154:2086 ESTABLISHED Peer `1G1M': tcp 132.252.152.193:2086 tcp 0 0 131.159.20.52:35071 132.252.152.193:2086 ESTABLISHED ---- The following tcp connections are left after matching tp connections to peer connections # This address is not in peerinfo! tcp 0 1 131.159.20.52:52159 192.168.1.101:2086 SYN_SENT # This peer is not marked as connected in transport Peer `R6OVGTG5M5EKRS0MBLGI5VEOG360RCJS2TCUU7OCTGKN5CSBIGCC7C75IU6J1P3F4OQC2D21203VECHCKQ95IBUMO6C46SH5MPAO9J8' tcp6 0 0 2001:4ca0:2001:11:50269 2a01:4f8:101:281:7:2086 ESTABLISHED --- # Number of sessions is really too low! transport # TCP sessions active : 5 |
|
A whole afternoon running with just tcp: everything consistent with 7 peers |
|
Issue: number of TCP sessions is higher (~ 2 * connections)than number of transport connections, multiple TCP sessions to one peer. This sessions are initiated by validation, but not terminated. Analysis: ATS triggers every 15 seconds a connection switch for every peer to switch to every address!!!! Apr 05 18:15:11-936444 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED' Apr 05 18:15:11-936576 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED' Apr 05 18:15:11-936678 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED' Apr 05 18:15:11-936759 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED' Apr 05 18:15:11-936801 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED' Apr 05 18:15:11-936894 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED' Apr 05 18:15:11-936983 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED' Apr 05 18:15:11-937064 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED' Apr 05 18:15:11-937160 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED' Apr 05 18:15:11-937240 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED' Apr 05 18:15:11-937330 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED' Apr 05 18:15:11-937418 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED' Apr 05 18:15:11-937458 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED' Apr 05 18:15:11-937548 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED' Apr 05 18:15:11-937636 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED' Apr 05 18:15:11-937724 transport-31608 ERROR ATS tells us to switch to address '188.40.170.101:2086' session (nil) for peer `R6OV' in state `S_NOT_CONNECTED' Apr 05 18:15:11-937789 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED' Apr 05 18:15:26-939589 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED' Apr 05 18:15:26-939703 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED' Apr 05 18:15:26-939794 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED' Apr 05 18:15:26-939858 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED' Apr 05 18:15:26-939904 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED' Apr 05 18:15:26-939962 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED' Apr 05 18:15:26-940030 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED' Apr 05 18:15:26-940114 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED' Apr 05 18:15:26-940193 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED' Apr 05 18:15:26-940269 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED' Apr 05 18:15:26-940336 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED' Apr 05 18:15:26-940401 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED' Apr 05 18:15:26-940442 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED' Apr 05 18:15:26-940507 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED' Apr 05 18:15:26-940571 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED' Apr 05 18:15:26-940653 transport-31608 ERROR ATS tells us to switch to address '188.40.170.101:2086' session (nil) for peer `R6OV' in state `S_NOT_CONNECTED' Apr 05 18:15:26-940717 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED' ISSUE! |
|
Found an issue: transport only in- or decreases the statistics value "# peers connected" When transport dies, the statistics value "# peers connected" stays the same When it restarts, it increases the value for the new connected peers Example: - transport is connected to 7 peers - transport dies and restarts - transport connects to 7 peers -> statistics value is 14! |
|
Core uses statistics_set to set the values "# entries in session map" "# neighbour entries allocated" |
|
TCP sets the session counter initial to 0 |
|
SVN 21414 contains a relevant fix by re-starting the key-exchange code on timeout. |
|
With revision 21639 I analyzed the situation after the latest changes: - 2 Peers with TCP and UDP connected directly running over night on my machine: (so we sure have no connectivity problem or version incompatibilities) All services (transport, core, fs, dht) say "peers connected" = 1 for both peers So there no timeouts on upper layers ... - Running a peer with "real gnunet" core and transport values "peers connected" "# neighbour entries allocated" and core are consistent (with values around 15 - 17 peers) |
|
The issue is still very much alive. I am trying to get my peer from at home to connect to gnunet.org, and it fails. I have 1 inbound TCP connection, 1 transport-level connection but zero core connections. Address validations have clearly succeeded, but neither does transport build connections as desired, nor does core consistently use those connections that transport thinks are up. |
|
Ok, looks much, much better now, so I'm willing to accept this one as resolved ;-). |
Date Modified | Username | Field | Change |
---|---|---|---|
2012-04-01 18:43 | Christian Grothoff | New Issue | |
2012-04-01 18:43 | Christian Grothoff | Status | new => assigned |
2012-04-01 18:43 | Christian Grothoff | Assigned To | => Matthias Wachs |
2012-04-04 10:04 | Matthias Wachs | Note Added: 0005693 | |
2012-04-04 10:07 | Matthias Wachs | File Added: test.log.tar.gz | |
2012-04-04 10:08 | Matthias Wachs | Note Added: 0005694 | |
2012-04-05 08:56 | Matthias Wachs | Note Added: 0005697 | |
2012-04-05 09:02 | Matthias Wachs | Note Added: 0005698 | |
2012-04-05 09:13 | Matthias Wachs | Note Added: 0005699 | |
2012-04-05 09:17 | Matthias Wachs | Note Edited: 0005699 | |
2012-04-05 17:19 | Matthias Wachs | Note Added: 0005701 | |
2012-04-05 17:52 | Matthias Wachs | Note Added: 0005702 | |
2012-04-05 18:15 | Matthias Wachs | Note Edited: 0005702 | |
2012-04-05 18:15 | Matthias Wachs | Note Edited: 0005702 | |
2012-04-05 18:16 | Matthias Wachs | Note Edited: 0005702 | |
2012-05-04 16:44 | Matthias Wachs | Note Added: 0005782 | |
2012-05-04 16:50 | Matthias Wachs | Note Added: 0005783 | |
2012-05-04 16:58 | Matthias Wachs | Note Added: 0005784 | |
2012-05-10 15:51 | Christian Grothoff | Note Added: 0005855 | |
2012-05-31 11:25 | Matthias Wachs | Note Added: 0005967 | |
2012-05-31 11:25 | Matthias Wachs | Status | assigned => feedback |
2012-05-31 23:52 | Christian Grothoff | Note Added: 0005974 | |
2012-05-31 23:52 | Christian Grothoff | Status | feedback => assigned |
2012-06-01 15:18 | Christian Grothoff | Note Added: 0005978 | |
2012-06-01 15:18 | Christian Grothoff | Status | assigned => resolved |
2012-06-01 15:18 | Christian Grothoff | Fixed in Version | => 0.9.3 |
2012-06-01 15:18 | Christian Grothoff | Resolution | open => fixed |
2012-06-02 19:15 | Christian Grothoff | Status | resolved => closed |