View Issue Details

IDProjectCategoryView StatusLast Update
0003993GNUnettestbed servicepublic2018-06-07 00:24
ReporterFlorian DoldAssigned ToChristian Grothoff 
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Product VersionSVN HEAD 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0003993: testbed connections very slow with almost idle CPU
DescriptionI'm using testbed in the consensus profiler. While about 3 peers work fine, with anything larger it takes ages to connect the peers (about 3 minutes for 5 peers). Larger testbeds used to work completely fine quite a while ago, and I'm not sure if it's a problem with my configuration, with testbed itself or with transport/core.


Here's a log snippet (for 5 peers):

Sep 28 21:07:17-489476 testbed-OC-11750 DEBUG 0xc: Trail 14 to connect to peer 6DEV
Sep 28 21:07:17-489569 transport-11764 INFO Asked to connect to peer `6DEV' (state: S_CONNECTED)
Sep 28 21:07:17-489589 transport-11764 INFO Ignoring request to try to connect, already connected to `6DEV'!
[repeated]
Sep 28 21:07:17-860209 transport-11757 INFO Neighbour `TY5J' changed state to S_CONNECTED with timeout Mon Sep 28 21:12:17 2015
[same msg for other peers]
Sep 28 21:07:22-716072 transport-11765 INFO Neighbour `6DEV' changed state to S_CONNECTED with timeout Mon Sep 28 21:12:22 2015
Sep 28 21:07:37-648413 testbed-11750 DEBUG Queueing message of type 473, size 73 for sending
Sep 28 21:07:37-648530 testbed-OC-11750 DEBUG 0xc: Cleaning up occ
Sep 28 21:07:37-648573 testbed-connectionpool-11750 DEBUG Cleaning up get handle 0xf96c40 for service 2, peer 0


There seems to be a combination of two things going wrong here (I suppose, I'm not to familiar with testbed):

- Testbed itself is overly eager with the exponential backoff. Transport seems to be finished at 21:07:22, but testbed waits until 21:07:37 and idles in between.

- Transport seems to connect rather slowly, I'm not sure why. What also surprises me is that my CPU is practically idle while the peers connect.
Steps To Reproduce$ gnunet-consensus-profiler -p 3 -c bad_testbed.conf
# ... works

$ gnunet-consensus-profiler -p 4 -c bad_testbed.conf
# ... hangs
Additional Information$ cat bad_testbed.conf
[testbed]
OVERLAY_TOPOLOGY = CLIQUE

[core]
FORCESTART = NO

[fs]
FORCESTART = NO

[dht]
FORCESTART = NO

[cadet]
FORCESTART = NO

[nse]
FORCESTART = NO
WORKBITS = 0

[revocation]
FORCESTART = NO

[topology]
FORCESTART = NO

[hostlist]
FORCESTART = NO

[gns]
FORCESTART = NO
TagsNo tags attached.

Activities

Christian Grothoff

2015-10-26 16:27

manager   ~0009793

Is this still an issue?

Florian Dold

2015-10-26 17:49

developer   ~0009796

It is still an issue in r36587, but not an urgent one.

The output looks different though:

$ gnunet-testbed-profiler -c bad_testbed.conf -p 4
Establishing links. Please wait
..Oct 26 17:46:53-757516 nat-12868 WARNING Asked to validate one of my addresses (...) and validation failed!
Oct 26 17:46:53-985245 nat-12859 WARNING Error processing a STUN request: 5
Oct 26 17:46:57-915862 nat-12858 WARNING Error processing a STUN request: 5
Oct 26 17:46:57-924604 nat-12868 WARNING Error processing a STUN request: 5
Oct 26 17:46:57-932245 nat-12867 WARNING Error processing a STUN request: 5
Oct 26 17:47:13-367768 nat-12859 WARNING Asked to validate one of my addresses (...) and validation failed!
F
Aborting due to very high failure rate
2 links succeeded
1 links failed due to timeouts
Oct 26 17:47:22-810771 testbed-api-topology-12849 WARNING Error while establishing a link: 0x9: Timeout during TRANSPORT_try_connect() at peer DK5W -- Retrying
Oct 26 17:47:22-811083 gnunet-testbed-profiler-12849 WARNING Aborting

Florian Dold

2015-10-26 17:52

developer   ~0009797

(And again, it works with -p 3 and the same configuration, which is the odd part here)

Christian Grothoff

2015-10-27 14:04

manager   ~0009804

$ ./gnunet-testbed-profiler -c bad_testbed.conf -p4
Establishing links. Please wait
..F

Before 'F' is printed, in another console:

$ ps ax | grep gnunet
22522 pts/6 S+ 0:00 /home/grothoff/svn/gnunet/src/testbed/.libs/lt-gnunet-testbed-profiler -c bad_testbed.conf -p4
22536 pts/6 S+ 0:00 gnunet-helper-testbed
22537 pts/6 S+ 0:00 gnunet-service-testbed -c /tmp/testbed-helperFTFzpE/0/config
22538 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/0/config
22539 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/1/config
22540 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/2/config
22541 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/3/config
22542 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-core -c /tmp/testbedugpCkD/1/config
22543 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-core -c /tmp/testbedugpCkD/0/config
22544 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/1/config
22545 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/0/config
22546 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/1/config
22547 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/0/config
22548 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/1/config
22549 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/0/config
22550 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/1/config
22551 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/0/config
22552 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/0/config
22553 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/1/config
22554 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/3/config
22555 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/2/config
22556 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/3/config
22557 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/2/config
22558 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/3/config
22559 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/3/config
22560 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/2/config
22561 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/3/config
22562 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/2/config
22563 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/2/config
22566 pts/1 S+ 0:00 grep gnunet

So basically, with this config, we only really start 2 peers.

Christian Grothoff

2015-10-27 14:41

manager   ~0009805

Ok, so the cause is simple: to establish a connection between peers A and B, testbed connects to transport of both peers (to get the HELLO from A and offer it to B), but only to CORE of "A" (to monitor for the connection being 'up'). Now, as testbed never connects to CORE of "B", the CORE of "B" is never started, and then the connection is never established (at the CORE level), so testbed (monitoring CORE of A) eventually times out with an error (and we see only two CORE processes instead of four).

Now, if we have 3 peers, we are lucky: A->B and B->C and C->A connections start all 3 CORE services. But with 4, we (by chance) don't end up with a cycle like that, and things break.

Sree Harsha Totakura

2015-10-27 14:42

developer   ~0009806

No, we are starting 4 peers. In the output 4 instances of ARM, TRANSPORT are started.

Sree Harsha Totakura

2015-10-27 14:44

developer   ~0009807

So, shouldn't enabling FORCESTART in CORE fix this?

Christian Grothoff

2015-10-27 14:54

manager   ~0009808

I've "fixed" this issue by documenting the problem in the developer handbook at https://gnunet.org/testbed-caveats

Issue History

Date Modified Username Field Change
2015-09-28 21:21 Florian Dold New Issue
2015-09-28 21:21 Florian Dold Status new => assigned
2015-09-28 21:21 Florian Dold Assigned To => Sree Harsha Totakura
2015-10-26 16:27 Christian Grothoff Note Added: 0009793
2015-10-26 17:49 Florian Dold Note Added: 0009796
2015-10-26 17:49 Florian Dold Priority urgent => normal
2015-10-26 17:49 Florian Dold Steps to Reproduce Updated View Revisions
2015-10-26 17:49 Florian Dold Additional Information Updated View Revisions
2015-10-26 17:52 Florian Dold Note Added: 0009797
2015-10-27 14:04 Christian Grothoff Note Added: 0009804
2015-10-27 14:41 Christian Grothoff Note Added: 0009805
2015-10-27 14:42 Sree Harsha Totakura Note Added: 0009806
2015-10-27 14:44 Sree Harsha Totakura Note Added: 0009807
2015-10-27 14:54 Christian Grothoff Note Added: 0009808
2015-10-27 14:55 Christian Grothoff Status assigned => resolved
2015-10-27 14:55 Christian Grothoff Fixed in Version => 0.11.0pre66
2015-10-27 14:55 Christian Grothoff Resolution open => fixed
2015-10-27 14:55 Christian Grothoff Assigned To Sree Harsha Totakura => Christian Grothoff
2015-10-27 14:55 Christian Grothoff Target Version => 0.11.0pre66
2018-06-07 00:24 Christian Grothoff Status resolved => closed