View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003993 | GNUnet | testbed service | public | 2015-09-28 21:21 | 2018-06-07 00:24 |
Reporter | Florian Dold | Assigned To | Christian Grothoff | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | Git master | ||||
Target Version | 0.11.0pre66 | Fixed in Version | 0.11.0pre66 | ||
Summary | 0003993: testbed connections very slow with almost idle CPU | ||||
Description | I'm using testbed in the consensus profiler. While about 3 peers work fine, with anything larger it takes ages to connect the peers (about 3 minutes for 5 peers). Larger testbeds used to work completely fine quite a while ago, and I'm not sure if it's a problem with my configuration, with testbed itself or with transport/core. Here's a log snippet (for 5 peers): Sep 28 21:07:17-489476 testbed-OC-11750 DEBUG 0xc: Trail 14 to connect to peer 6DEV Sep 28 21:07:17-489569 transport-11764 INFO Asked to connect to peer `6DEV' (state: S_CONNECTED) Sep 28 21:07:17-489589 transport-11764 INFO Ignoring request to try to connect, already connected to `6DEV'! [repeated] Sep 28 21:07:17-860209 transport-11757 INFO Neighbour `TY5J' changed state to S_CONNECTED with timeout Mon Sep 28 21:12:17 2015 [same msg for other peers] Sep 28 21:07:22-716072 transport-11765 INFO Neighbour `6DEV' changed state to S_CONNECTED with timeout Mon Sep 28 21:12:22 2015 Sep 28 21:07:37-648413 testbed-11750 DEBUG Queueing message of type 473, size 73 for sending Sep 28 21:07:37-648530 testbed-OC-11750 DEBUG 0xc: Cleaning up occ Sep 28 21:07:37-648573 testbed-connectionpool-11750 DEBUG Cleaning up get handle 0xf96c40 for service 2, peer 0 There seems to be a combination of two things going wrong here (I suppose, I'm not to familiar with testbed): - Testbed itself is overly eager with the exponential backoff. Transport seems to be finished at 21:07:22, but testbed waits until 21:07:37 and idles in between. - Transport seems to connect rather slowly, I'm not sure why. What also surprises me is that my CPU is practically idle while the peers connect. | ||||
Steps To Reproduce | $ gnunet-consensus-profiler -p 3 -c bad_testbed.conf # ... works $ gnunet-consensus-profiler -p 4 -c bad_testbed.conf # ... hangs | ||||
Additional Information | $ cat bad_testbed.conf [testbed] OVERLAY_TOPOLOGY = CLIQUE [core] FORCESTART = NO [fs] FORCESTART = NO [dht] FORCESTART = NO [cadet] FORCESTART = NO [nse] FORCESTART = NO WORKBITS = 0 [revocation] FORCESTART = NO [topology] FORCESTART = NO [hostlist] FORCESTART = NO [gns] FORCESTART = NO | ||||
Tags | No tags attached. | ||||
|
Is this still an issue? |
|
It is still an issue in r36587, but not an urgent one. The output looks different though: $ gnunet-testbed-profiler -c bad_testbed.conf -p 4 Establishing links. Please wait ..Oct 26 17:46:53-757516 nat-12868 WARNING Asked to validate one of my addresses (...) and validation failed! Oct 26 17:46:53-985245 nat-12859 WARNING Error processing a STUN request: 5 Oct 26 17:46:57-915862 nat-12858 WARNING Error processing a STUN request: 5 Oct 26 17:46:57-924604 nat-12868 WARNING Error processing a STUN request: 5 Oct 26 17:46:57-932245 nat-12867 WARNING Error processing a STUN request: 5 Oct 26 17:47:13-367768 nat-12859 WARNING Asked to validate one of my addresses (...) and validation failed! F Aborting due to very high failure rate 2 links succeeded 1 links failed due to timeouts Oct 26 17:47:22-810771 testbed-api-topology-12849 WARNING Error while establishing a link: 0x9: Timeout during TRANSPORT_try_connect() at peer DK5W -- Retrying Oct 26 17:47:22-811083 gnunet-testbed-profiler-12849 WARNING Aborting |
|
(And again, it works with -p 3 and the same configuration, which is the odd part here) |
|
$ ./gnunet-testbed-profiler -c bad_testbed.conf -p4 Establishing links. Please wait ..F Before 'F' is printed, in another console: $ ps ax | grep gnunet 22522 pts/6 S+ 0:00 /home/grothoff/svn/gnunet/src/testbed/.libs/lt-gnunet-testbed-profiler -c bad_testbed.conf -p4 22536 pts/6 S+ 0:00 gnunet-helper-testbed 22537 pts/6 S+ 0:00 gnunet-service-testbed -c /tmp/testbed-helperFTFzpE/0/config 22538 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/0/config 22539 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/1/config 22540 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/2/config 22541 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-arm -c /tmp/testbedugpCkD/3/config 22542 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-core -c /tmp/testbedugpCkD/1/config 22543 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-core -c /tmp/testbedugpCkD/0/config 22544 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/1/config 22545 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/0/config 22546 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/1/config 22547 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/0/config 22548 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/1/config 22549 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/0/config 22550 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/1/config 22551 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/0/config 22552 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/0/config 22553 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/1/config 22554 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/3/config 22555 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-transport -c /tmp/testbedugpCkD/2/config 22556 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/3/config 22557 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-peerinfo -c /tmp/testbedugpCkD/2/config 22558 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/3/config 22559 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/3/config 22560 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-statistics -c /tmp/testbedugpCkD/2/config 22561 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/3/config 22562 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-ats -c /tmp/testbedugpCkD/2/config 22563 pts/6 S+ 0:00 /home/grothoff/lib//gnunet/libexec/gnunet-service-resolver -c /tmp/testbedugpCkD/2/config 22566 pts/1 S+ 0:00 grep gnunet So basically, with this config, we only really start 2 peers. |
|
Ok, so the cause is simple: to establish a connection between peers A and B, testbed connects to transport of both peers (to get the HELLO from A and offer it to B), but only to CORE of "A" (to monitor for the connection being 'up'). Now, as testbed never connects to CORE of "B", the CORE of "B" is never started, and then the connection is never established (at the CORE level), so testbed (monitoring CORE of A) eventually times out with an error (and we see only two CORE processes instead of four). Now, if we have 3 peers, we are lucky: A->B and B->C and C->A connections start all 3 CORE services. But with 4, we (by chance) don't end up with a cycle like that, and things break. |
|
No, we are starting 4 peers. In the output 4 instances of ARM, TRANSPORT are started. |
|
So, shouldn't enabling FORCESTART in CORE fix this? |
|
I've "fixed" this issue by documenting the problem in the developer handbook at https://gnunet.org/testbed-caveats |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-09-28 21:21 | Florian Dold | New Issue | |
2015-09-28 21:21 | Florian Dold | Status | new => assigned |
2015-09-28 21:21 | Florian Dold | Assigned To | => Sree Harsha Totakura |
2015-10-26 16:27 | Christian Grothoff | Note Added: 0009793 | |
2015-10-26 17:49 | Florian Dold | Note Added: 0009796 | |
2015-10-26 17:49 | Florian Dold | Priority | urgent => normal |
2015-10-26 17:49 | Florian Dold | Steps to Reproduce Updated | |
2015-10-26 17:49 | Florian Dold | Additional Information Updated | |
2015-10-26 17:52 | Florian Dold | Note Added: 0009797 | |
2015-10-27 14:04 | Christian Grothoff | Note Added: 0009804 | |
2015-10-27 14:41 | Christian Grothoff | Note Added: 0009805 | |
2015-10-27 14:42 | Sree Harsha Totakura | Note Added: 0009806 | |
2015-10-27 14:44 | Sree Harsha Totakura | Note Added: 0009807 | |
2015-10-27 14:54 | Christian Grothoff | Note Added: 0009808 | |
2015-10-27 14:55 | Christian Grothoff | Status | assigned => resolved |
2015-10-27 14:55 | Christian Grothoff | Fixed in Version | => 0.11.0pre66 |
2015-10-27 14:55 | Christian Grothoff | Resolution | open => fixed |
2015-10-27 14:55 | Christian Grothoff | Assigned To | Sree Harsha Totakura => Christian Grothoff |
2015-10-27 14:55 | Christian Grothoff | Target Version | => 0.11.0pre66 |
2018-06-07 00:24 | Christian Grothoff | Status | resolved => closed |