View Issue Details

IDProjectCategoryView StatusLast Update
0002248GNUnettransport servicepublic2012-06-02 19:15
ReporterChristian Grothoff Assigned ToMatthias Wachs  
PriorityurgentSeveritymajorReproducibilityhave not tried
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.9.3Fixed in Version0.9.3 
Summary0002248: significant discrepancy between core-level and transport-level connections for running peer
DescriptionI've been running a peer (with properly punched NAT) for a few hours now. I'm seeing a significant difference (3-to-8) between the number of connections 'core' is aware of and the number of connections transport has:

grothoff@spec:~/svn/gnunet$ gnunet-core
Peer `GJG6GNC6ME462F4QA5KHEC4FSQDRSQJI9HP88DKC057463F0FN072AO01JSI1E4A9PUI0CNUS88SH4SM9S0QUUHH24C3DGU8CU0G2OG'
Peer `KNAPGJMNVI5LECGUVBF1CTLAR2VHDI7HH5TDBRPDHDPKL5LA27FIBD0F63HAMQCAN2R5ESVA3CB6ICQRGN6UKSQF3EKVH3ISUKMGQC8'
Peer `RL7P1AO4M4LOI3JPVER5FJKG6A847K3O86NS4Q07J1721371FO7RB6IFVMNI1NKLFTRCM7O5AGJPQ3UDD8MST8QQHLNOC4GUL6731O0'
grothoff@spec:~/svn/gnunet$ gnunet-transport -ni
Peer `BEKL': tcp 120.108.205.237:2086
Peer `RL7P': tcp 131.159.74.67:2086
Peer `R69Q': tcp 203.178.133.3:2086
Peer `KUPL': udp 134.2.197.47:2086
Peer `KNAP': udp 98.119.204.191:2086
Peer `IVT2': tcp 71.177.216.36:2086
Peer `GJG6': udp 98.119.204.191:12086
Peer `R6OV': tcp 188.40.170.101:2086

Looking at the statistics, 'core' confirmed 47 session keys and terminated 44 sessions due to 'transport-level disconnect' (which leaves 3). So either core got erroneous disconnects from transport (maybe because of ATS session switch?) or core failed to get connect notifications for 5 additional connections.

Transport told stats that 25 (!) peers are connected (far more than the 8 being listed -- bad counting?), 39 (!) TCP sessions are active (bad counting?), so the numbers are still very wrong here.
Additional Informationgrothoff@spec:~/svn/gnunet$ gnunet-statistics -s core
 core # encrypted bytes given to transport : 113658666
 core # bytes encrypted : 110385656
 core # avg payload per encrypted message : 2037
 core # bytes of payload decrypted : 201784291
 core # bytes decrypted : 202497359
 core # type maps received : 580
 core # type map refreshes sent : 594
 core # entries in session map : 3
 core # sessions terminated by timeout : 3
 core # keepalive messages sent : 412
 core # messages discarded (session disconnected) : 2
 core # neighbour entries allocated : 11
 core # key exchanges stopped : 44
 core # sessions terminated by transport disconnect : 44
 core # PONG messages created : 137
 core # PING messages received : 190
 core # session keys confirmed via PONG : 47
 core # PONG messages decrypted : 447
 core # PONG messages received : 687
 core # SET_KEY messages decrypted : 51
 core # session keys received : 51
 core # SET_KEY and PING messages created : 119
 core # key exchanges initiated : 96
 core # updates to my type map : 20
 core # failed to decrypt message (no session key) : 730
 core # send requests dropped (disconnected) : 2
 core # rekey operations confirmed via PONG : 23
 core # REKEY operations performed : 23
grothoff@spec:~/svn/gnunet$ gnunet-statistics -s transport
 transport # bytes transmitted via TCP : 92504494
 transport # bytes currently in TCP buffers : 860
 transport # keepalives sent : 13187
 transport # peers connected : 25
 transport # TCP sessions active : 39
 transport # bytes discarded by TCP (disconnect) : 2385476
 transport # network-level TCP disconnect events : 19413
 transport # bytes received via TCP : 74552828
 transport # IPv4 broadcast HELLO beacons received via udp : 1748
 transport # fragments retransmitted : 23264
 transport # fragments transmitted : 40981
 transport # bytes in message queue for other peers : 114448070
 transport # bytes payload received for other peers : 113739780
 transport # fragmentation transmissions completed : 1898
 transport # fragment acknowledgements received : 2771
 transport # total size of fragmented messages : 20507051
 transport # messages fragmented : 1901
 transport # REQUEST CONNECT messages received : 11720
 transport # ms throttling suggested : 30563912
 transport # PONGs unicast via reliable transport : 1305
 transport # PING messages received : 1305
 transport # PONG messages received : 1297
 transport # PING without HELLO messages sent : 2128
 transport # address revalidations started : 2785
 transport # TCP WELCOME messages received : 216
 transport # address records discarded : 1391
 transport # KEEPALIVE_RESPONSE messages discarded (not expected): 80
 transport # peers disconnected due to timeout : 975
 transport # acknowledgements sent for fragment : 93550
 transport # duplicate fragments received : 86514
 transport # fragments received : 190458
 transport # messages defragmented : 7374
 transport # PING message for different peer received : 70
 transport # fast reconnects failed : 26
 transport # failed connection attempts due to timeout : 3266
 transport # transport-service disconnect requests for TCP : 5
 transport # bytes payload dropped (other peer was not connected): 42836
 transport # PONGs dropped, no matching pending validation : 1
 transport # peers disconnected due to external request : 4
 transport # bytes payload discarded due to not connected peer : 5845839
 transport # refreshed my HELLO : 127
 transport # bandwidth quota violations by other peers : 449
 transport # bytes discarded by TCP (timeout) : 294
 transport # KEEPALIVE messages discarded (not connected) : 4
TagsNo tags attached.
Attached Files
test.log.tar.gz (75,487 bytes)

Activities

Matthias Wachs

2012-04-04 10:04

manager   ~0005693

Had a peer running for several hours showing the following statistics:

9P6V "transport " "# peers connected " == 4
9P6V "core " "# neighbour entries allocated " == 4
9P6V "core " "# entries in session map " == 3
9P6V "topology " "# peers connected " == 3

In this test transport connections and core notifications are equal, but core stats are inconsistent...

ATM Try to analyze this

Matthias Wachs

2012-04-04 10:08

manager   ~0005694

Added logfile test.log.tar.gz for the test mentioned above running for 15 hours showing a consistent picture of the analysis mentioned above

Matthias Wachs

2012-04-05 08:56

manager   ~0005697

Analysis with the new integration-tests/connection_watchdog tool running for 15 hours with tcp:

- core and transport (dis)connect notifications are consistent
- transport (dis)connect notifications && transport statistics value "# peers connected" are consistent
- transport (dis)connect notifications && core statistics value "# entries in session map" && "# neighbour entries allocated" are consistent

not consistent: transport " # TCP sessions active " && "# peers connected": tcp sessions << peers connected

Matthias Wachs

2012-04-05 09:02

manager   ~0005698

Snapshot:

mwachs@fulcrum:~/gnunet/gnunet-debug/src/integration-tests$ gnunet-transport -i -n; netstat -a inet -n | grep 2086; gnunet-statistics -s transport -n "# TCP sessions active"
Peer `RL7P': tcp [2001:4ca0:2001:42:230:48ff:febb:4bca]:2086
Peer `R69Q': tcp 203.178.133.3:2086
Peer `LJR8': tcp 193.174.155.36:2086
Peer `KNAP': tcp [2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086
Peer `IVT2': tcp [2001:470:d:ef4::11]:2086
Peer `94CH': tcp 83.169.43.154:2086
Peer `1G1M': tcp 132.252.152.193:2086
tcp 0 0 0.0.0.0:2086 0.0.0.0:* LISTEN
tcp 0 0 131.159.20.52:43843 193.174.155.36:2086 ESTABLISHED
tcp 0 1 131.159.20.52:52159 192.168.1.101:2086 SYN_SENT
tcp 0 0 131.159.20.52:58340 203.178.133.3:2086 ESTABLISHED
tcp 0 0 131.159.20.52:35071 132.252.152.193:2086 ESTABLISHED
tcp 0 0 131.159.20.52:41852 83.169.43.154:2086 ESTABLISHED
tcp 0 0 131.159.20.52:34521 131.159.74.67:2086 ESTABLISHED
tcp6 0 0 :::2086 :::* LISTEN
tcp6 0 0 2001:4ca0:2001:11:52934 2001:4ca0:2001:42::2086 ESTABLISHED
tcp6 0 0 2001:4ca0:2001:11:59229 2001:470:d:ef4::11:2086 ESTABLISHED
tcp6 0 0 2001:4ca0:2001:11:50269 2a01:4f8:101:281:7:2086 ESTABLISHED
tcp6 0 0 2001:4ca0:2001:11:45970 2002:c32e:b925:0:2:2086 ESTABLISHED
transport # TCP sessions active : 5

Matthias Wachs

2012-04-05 09:13

manager   ~0005699

Last edited: 2012-04-05 09:17

netstat, transport and peerinfo output matched + explanations:

---
Peer connections with corresponding tcp connections:


Peer `RL7P': tcp [2001:4ca0:2001:42:230:48ff:febb:4bca]:2086
tcp 0 0 131.159.20.52:34521 131.159.74.67:2086 ESTABLISHED
tcp6 0 0 2001:4ca0:2001:11:52934 2001:4ca0:2001:42::2086 ESTABLISHED

Peer `R69Q': tcp 203.178.133.3:2086
tcp 0 0 131.159.20.52:58340 203.178.133.3:2086 ESTABLISHED

Peer `LJR8': tcp 193.174.155.36:2086
tcp 0 0 131.159.20.52:43843 193.174.155.36:2086 ESTABLISHED

Peer `KNAP': tcp [2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086
tcp6 0 0 2001:4ca0:2001:11:45970 2002:c32e:b925:0:2:2086 ESTABLISHED

Peer `IVT2': tcp [2001:470:d:ef4::11]:2086
tcp6 0 0 2001:4ca0:2001:11:59229 2001:470:d:ef4::11:2086 ESTABLISHED

Peer `94CH': tcp 83.169.43.154:2086
tcp 0 0 131.159.20.52:41852 83.169.43.154:2086 ESTABLISHED

Peer `1G1M': tcp 132.252.152.193:2086
tcp 0 0 131.159.20.52:35071 132.252.152.193:2086 ESTABLISHED

----
The following tcp connections are left after matching tp connections to peer connections

# This address is not in peerinfo!
tcp 0 1 131.159.20.52:52159 192.168.1.101:2086 SYN_SENT


# This peer is not marked as connected in transport
Peer `R6OVGTG5M5EKRS0MBLGI5VEOG360RCJS2TCUU7OCTGKN5CSBIGCC7C75IU6J1P3F4OQC2D21203VECHCKQ95IBUMO6C46SH5MPAO9J8'
tcp6 0 0 2001:4ca0:2001:11:50269 2a01:4f8:101:281:7:2086 ESTABLISHED

---
# Number of sessions is really too low!

transport # TCP sessions active : 5

Matthias Wachs

2012-04-05 17:19

manager   ~0005701

A whole afternoon running with just tcp: everything consistent with 7 peers

Matthias Wachs

2012-04-05 17:52

manager   ~0005702

Last edited: 2012-04-05 18:16

Issue: number of TCP sessions is higher (~ 2 * connections)than number of transport connections, multiple TCP sessions to one peer. This sessions are initiated by validation, but not terminated.

Analysis:
ATS triggers every 15 seconds a connection switch for every peer to switch to every address!!!!

Apr 05 18:15:11-936444 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED'
Apr 05 18:15:11-936576 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED'
Apr 05 18:15:11-936678 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED'
Apr 05 18:15:11-936759 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED'
Apr 05 18:15:11-936801 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED'
Apr 05 18:15:11-936894 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED'
Apr 05 18:15:11-936983 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED'
Apr 05 18:15:11-937064 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED'
Apr 05 18:15:11-937160 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED'
Apr 05 18:15:11-937240 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED'
Apr 05 18:15:11-937330 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED'
Apr 05 18:15:11-937418 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED'
Apr 05 18:15:11-937458 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED'
Apr 05 18:15:11-937548 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED'
Apr 05 18:15:11-937636 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED'
Apr 05 18:15:11-937724 transport-31608 ERROR ATS tells us to switch to address '188.40.170.101:2086' session (nil) for peer `R6OV' in state `S_NOT_CONNECTED'
Apr 05 18:15:11-937789 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED'

Apr 05 18:15:26-939589 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED'
Apr 05 18:15:26-939703 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED'
Apr 05 18:15:26-939794 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED'
Apr 05 18:15:26-939858 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED'
Apr 05 18:15:26-939904 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED'
Apr 05 18:15:26-939962 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED'
Apr 05 18:15:26-940030 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED'
Apr 05 18:15:26-940114 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED'
Apr 05 18:15:26-940193 transport-31608 ERROR ATS tells us to switch to address '132.252.152.193:2086' session 0x25b8710 for peer `1G1M' in state `S_CONNECTED'
Apr 05 18:15:26-940269 transport-31608 ERROR ATS tells us to switch to address '71.177.216.36:2086' session 0x25b1aa0 for peer `IVT2' in state `S_CONNECTED'
Apr 05 18:15:26-940336 transport-31608 ERROR ATS tells us to switch to address '[2002:c32e:b925:0:224:21ff:fef0:9eb9]:2086' session 0x25b3c50 for peer `KNAP' in state `S_CONNECTED'
Apr 05 18:15:26-940401 transport-31608 ERROR ATS tells us to switch to address '134.2.197.47:2086' session 0x25cf160 for peer `KUPL' in state `S_NOT_CONNECTED'
Apr 05 18:15:26-940442 transport-31608 ERROR ATS tells us to switch to address '193.174.155.36:2086' session 0x25b7f80 for peer `LJR8' in state `S_CONNECTED'
Apr 05 18:15:26-940507 transport-31608 ERROR ATS tells us to switch to address '[2a01:488:66:1000:53a9:2b9a:0:1]:2086' session 0x25a7430 for peer `94CH' in state `S_CONNECTED'
Apr 05 18:15:26-940571 transport-31608 ERROR ATS tells us to switch to address '203.178.133.3:2086' session 0x25ba680 for peer `R69Q' in state `S_CONNECTED'
Apr 05 18:15:26-940653 transport-31608 ERROR ATS tells us to switch to address '188.40.170.101:2086' session (nil) for peer `R6OV' in state `S_NOT_CONNECTED'
Apr 05 18:15:26-940717 transport-31608 ERROR ATS tells us to switch to address '131.159.74.67:2086' session 0x25bae70 for peer `RL7P' in state `S_CONNECTED'

ISSUE!

Matthias Wachs

2012-05-04 16:44

manager   ~0005782

Found an issue:

transport only in- or decreases the statistics value "# peers connected"
When transport dies, the statistics value "# peers connected" stays the same
When it restarts, it increases the value for the new connected peers

Example:
- transport is connected to 7 peers
- transport dies and restarts
- transport connects to 7 peers
-> statistics value is 14!

Matthias Wachs

2012-05-04 16:50

manager   ~0005783

Core uses statistics_set to set the values "# entries in session map" "# neighbour entries allocated"

Matthias Wachs

2012-05-04 16:58

manager   ~0005784

TCP sets the session counter initial to 0

Christian Grothoff

2012-05-10 15:51

manager   ~0005855

SVN 21414 contains a relevant fix by re-starting the key-exchange code on timeout.

Matthias Wachs

2012-05-31 11:25

manager   ~0005967

With revision 21639

I analyzed the situation after the latest changes:

- 2 Peers with TCP and UDP connected directly running over night on my machine:
(so we sure have no connectivity problem or version incompatibilities)

All services (transport, core, fs, dht) say "peers connected" = 1 for both peers

So there no timeouts on upper layers ...

- Running a peer with "real gnunet"

core and transport values "peers connected" "# neighbour entries allocated" and core are consistent (with values around 15 - 17 peers)

Christian Grothoff

2012-05-31 23:52

manager   ~0005974

The issue is still very much alive. I am trying to get my peer from at home to connect to gnunet.org, and it fails. I have 1 inbound TCP connection, 1 transport-level connection but zero core connections. Address validations have clearly succeeded, but neither does transport build connections as desired, nor does core consistently use those connections that transport thinks are up.

Christian Grothoff

2012-06-01 15:18

manager   ~0005978

Ok, looks much, much better now, so I'm willing to accept this one as resolved ;-).

Issue History

Date Modified Username Field Change
2012-04-01 18:43 Christian Grothoff New Issue
2012-04-01 18:43 Christian Grothoff Status new => assigned
2012-04-01 18:43 Christian Grothoff Assigned To => Matthias Wachs
2012-04-04 10:04 Matthias Wachs Note Added: 0005693
2012-04-04 10:07 Matthias Wachs File Added: test.log.tar.gz
2012-04-04 10:08 Matthias Wachs Note Added: 0005694
2012-04-05 08:56 Matthias Wachs Note Added: 0005697
2012-04-05 09:02 Matthias Wachs Note Added: 0005698
2012-04-05 09:13 Matthias Wachs Note Added: 0005699
2012-04-05 09:17 Matthias Wachs Note Edited: 0005699
2012-04-05 17:19 Matthias Wachs Note Added: 0005701
2012-04-05 17:52 Matthias Wachs Note Added: 0005702
2012-04-05 18:15 Matthias Wachs Note Edited: 0005702
2012-04-05 18:15 Matthias Wachs Note Edited: 0005702
2012-04-05 18:16 Matthias Wachs Note Edited: 0005702
2012-05-04 16:44 Matthias Wachs Note Added: 0005782
2012-05-04 16:50 Matthias Wachs Note Added: 0005783
2012-05-04 16:58 Matthias Wachs Note Added: 0005784
2012-05-10 15:51 Christian Grothoff Note Added: 0005855
2012-05-31 11:25 Matthias Wachs Note Added: 0005967
2012-05-31 11:25 Matthias Wachs Status assigned => feedback
2012-05-31 23:52 Christian Grothoff Note Added: 0005974
2012-05-31 23:52 Christian Grothoff Status feedback => assigned
2012-06-01 15:18 Christian Grothoff Note Added: 0005978
2012-06-01 15:18 Christian Grothoff Status assigned => resolved
2012-06-01 15:18 Christian Grothoff Fixed in Version => 0.9.3
2012-06-01 15:18 Christian Grothoff Resolution open => fixed
2012-06-02 19:15 Christian Grothoff Status resolved => closed