View Issue Details

IDProjectCategoryView StatusLast Update
0001796GNUnetUDP transportpublic2011-11-30 18:38
ReporterChristian Grothoff Assigned ToMatthias Wachs  
PriorityurgentSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Target Version0.9.0 
Summary0001796: testcase test_transport_api_unreliability_udp fails with timeout
Description.Sep 19 13:17:39-927044 test_transport_api_unreliability_udp-12657 ERROR Timeout occurred while waiting for transmit_ready
Sep 19 13:17:39-927496 test_transport_api_unreliability_udp-12657 ERROR Reliability failed: Last message sent 32, Next message scheduled 33, Last message received 32, Message expected 32
FAIL: test_transport_api_unreliability_udp
......................................................................................................

Could be that the timeout is simply set too low (seems to happen mostly on slower machines).
Steps To Reproducerun the unreliabile UDP test on some of the slower (?) buildbots, such as sparc64-wachs or lenny-powerpc or even freebsd7-amd64-wachs.
Additional Informationunreliability now uses max_bps cfg value and timeout is increased
TagsNo tags attached.
Attached Files
test_0009.log.tar.xz (275,576 bytes)

Relationships

related to 0001827 closedMatthias Wachs UDP unreliability test fails on sparcbot 
related to 0001813 closedChristian Grothoff Receiving UDP flood for several minutes, rev 17144 

Activities

Matthias Wachs

2011-09-23 16:43

manager   ~0004627

Increasing the timeout and enforcing a quota does not solve the problem!

With different quotas (10000, 20000, 25000, 40000, 50000) the UDP receive queue is running full and packets get lost.

The receive queue is not slowly filling but suddenly!

Matthias Wachs

2011-09-23 16:57

manager   ~0004628

Using netstat -uacn :

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 95920 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 105856 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 104200 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Proto Recv-Q Send-Q Local Address Foreign Address State
udp 109168 0 0.0.0.0:12040 0.0.0.0:*
udp 0 0 0.0.0.0:12050 0.0.0.0:*

Matthias Wachs

2011-09-23 17:17

manager   ~0004629

cat /proc/sys/net/ipv4/udp_mem
47424 63232 94848

Matthias Wachs

2011-09-23 18:17

manager   ~0004630

using constant packet sizes:

size 500 quota 600 works
size 5000 quot 6000 works
size 15000 quota 16000 fails

Christian Grothoff

2011-09-29 00:12

manager   ~0004634

On my system, the test seems to fail because in gnunet-service-transport.c, this call:
<code>
      ret =
          GST_neighbours_calculate_receive_delay (peer,
                                                  (message ==
                                                   NULL) ? 0 :
                                                  ntohs (message->size),
                                                  &do_forward);
      if (do_forward == GNUNET_YES)
</code>
determines that the sender has violated the quota, sets do_forward to NO and then drops virtually everything.

If I manually force the "if" branch to be taken, the test passes. This raises the question why UDP violates the quota.

Partially, the cause is obvious: for TCP/HTTP, we ALSO use flow-control to throttle the sender. For UDP, the flow-control doesn't exist, so we are truly at the mercy of the sender. Now, it could be that the bandwidth calculation in neighbours_calculate_receive_delay is wrong, or that the sender fails to throttle.

Christian Grothoff

2011-09-29 00:24

manager   ~0004635

Other tests don't override or HAVE a quota (the MAX_BPS in [transport-udp] is unrelated). The reason is simple: core defines the quota, and the tests don't have core so their quota should be infinity. So why does do_forward get set to NO again?

Christian Grothoff

2011-09-29 00:29

manager   ~0004636

Hmm. The bandwidth tracker is set to 5000000 -- same value as MAX_BPS from the configuration. I wonder why. IIRC MAX_BPS was supposed to define OUTGOING limits for UDP/Fragmentation, not INBOUND limits. Also, if that was the (enforced) outgoing limit, we should be fine even it it was the inbound limit. Strange.

Christian Grothoff

2011-09-29 00:30

manager   ~0004637

Further testing shows that clearly UDP's MAX_BPS *is* used in the bandwidth tracker of gnunet-service-transport_neighbours.c. Why? It has NOTHING to do there (UDP option vs. service, to begin with!).

Christian Grothoff

2011-09-29 00:32

manager   ~0004638

This is odd code (from test_transport_api_unreliability.c):

  if (cls == p1)
  {
    char * sec;
    long long unsigned int l_bps;

    GNUNET_asprintf(&sec, "transport-%s", test_plugin);
    if (GNUNET_CONFIGURATION_have_value (p1->cfg, sec, "MAX_BPS"))
    {
      GNUNET_CONFIGURATION_get_value_number (p1->cfg, sec, "MAX_BPS",
                                             &l_bps);
      max_bps_p1 = l_bps;
    }
    else
      max_bps_p1 = 1024 * 1024 * 1024;
    GNUNET_free (sec);

    GNUNET_TRANSPORT_set_quota (p1->th, &p2->id,
                                GNUNET_BANDWIDTH_value_init (max_bps_p1),
                                GNUNET_BANDWIDTH_value_init (max_bps_p1));
  }

Christian Grothoff

2011-09-29 00:45

manager   ~0004639

Ok, removing it "solves" the problem. I think the issue with the above is that we set a low quota (lower than what we CAN transmit), and the inbound quota ends up being enforced by dropping. So far so good, this works if the outbound quota is fine.

But the 'outbound quota' has exactly the same value (and should be enforced on the sender side). This then suggests to me that maybe the 'do_forward' check might be too strict!? (the quota itself should overall be obeyed, we've got another test for that).

Christian Grothoff

2011-09-29 00:50

manager   ~0004640

Oh, better reason. The MAX_BPS was 500,000 for peer1 and 5,000,000 for peer2. Now, with the set_quota code above, one peer was explicitly configured to transmit 5 MB/s, while the other was explicitly configured to reject anything more than 500kb/s. As a result, on most systems (that can do more than 500kb/s), the quota WAS being violated => caboom.

Anyway, now that this is all understood, I'll just remove the crazy MAX_BPS-quota-setting code from the unreliability test driver.

Christian Grothoff

2011-09-29 00:51

manager   ~0004641

Fixed in SVN 17095.

Matthias Wachs

2011-09-29 14:35

manager   ~0004646

Cannot see any changes...tests fail exactly as before

34 messages transmitted, then nothing happens.
Receiving peers UDP receive queue is full

Matthias Wachs

2011-09-29 14:38

manager   ~0004647

BTW: Increasing the notify transmit ready just delays when the test is failing.

With UDP verbose enabled the sender sends all the time, but receiver does not receive a complete message

Christian Grothoff

2011-09-29 15:02

manager   ~0004648

Which system? At least the buildbots behave very differently now..

Christian Grothoff

2011-10-28 14:25

manager   ~0004785

Uploaded some UDP logs from LRN.

Matthias Wachs

2011-11-16 20:55

manager   ~0004933

very slow but passing

Issue History

Date Modified Username Field Change
2011-09-19 16:13 Christian Grothoff New Issue
2011-09-19 16:13 Christian Grothoff Status new => assigned
2011-09-19 16:13 Christian Grothoff Assigned To => Matthias Wachs
2011-09-19 16:19 Christian Grothoff Priority normal => urgent
2011-09-23 14:30 Matthias Wachs Status assigned => confirmed
2011-09-23 16:03 Matthias Wachs Reproducibility sometimes => always
2011-09-23 16:03 Matthias Wachs Status confirmed => feedback
2011-09-23 16:03 Matthias Wachs Resolution open => fixed
2011-09-23 16:03 Matthias Wachs Additional Information Updated
2011-09-23 16:43 Matthias Wachs Note Added: 0004627
2011-09-23 16:57 Matthias Wachs Note Added: 0004628
2011-09-23 17:17 Matthias Wachs Note Added: 0004629
2011-09-23 17:17 Matthias Wachs Status feedback => confirmed
2011-09-23 18:17 Matthias Wachs Note Added: 0004630
2011-09-29 00:12 Christian Grothoff Note Added: 0004634
2011-09-29 00:24 Christian Grothoff Note Added: 0004635
2011-09-29 00:29 Christian Grothoff Note Added: 0004636
2011-09-29 00:30 Christian Grothoff Note Added: 0004637
2011-09-29 00:32 Christian Grothoff Note Added: 0004638
2011-09-29 00:45 Christian Grothoff Note Added: 0004639
2011-09-29 00:50 Christian Grothoff Note Added: 0004640
2011-09-29 00:51 Christian Grothoff Note Added: 0004641
2011-09-29 00:51 Christian Grothoff Status confirmed => resolved
2011-09-29 00:51 Christian Grothoff Fixed in Version => 0.9.0pre4
2011-09-29 00:51 Christian Grothoff Assigned To Matthias Wachs => Christian Grothoff
2011-09-29 14:35 Matthias Wachs Note Added: 0004646
2011-09-29 14:35 Matthias Wachs Status resolved => feedback
2011-09-29 14:35 Matthias Wachs Resolution fixed => reopened
2011-09-29 14:38 Matthias Wachs Note Added: 0004647
2011-09-29 15:02 Christian Grothoff Note Added: 0004648
2011-09-29 15:02 Christian Grothoff Status feedback => assigned
2011-10-04 13:47 Christian Grothoff Assigned To Christian Grothoff => Matthias Wachs
2011-10-20 11:23 Christian Grothoff Relationship added related to 0001827
2011-10-20 19:55 Christian Grothoff Relationship added related to 0001813
2011-10-25 17:04 Christian Grothoff Target Version 0.9.0pre4 => 0.9.0
2011-10-26 00:12 Christian Grothoff Severity crash => minor
2011-10-26 00:13 Christian Grothoff Priority urgent => high
2011-10-28 14:24 Christian Grothoff File Added: test_0009.log.tar.xz
2011-10-28 14:25 Christian Grothoff Note Added: 0004785
2011-11-04 11:18 Christian Grothoff Priority high => urgent
2011-11-04 11:18 Christian Grothoff Severity minor => major
2011-11-16 20:55 Matthias Wachs Note Added: 0004933
2011-11-16 20:55 Matthias Wachs Status assigned => resolved
2011-11-16 20:55 Matthias Wachs Resolution reopened => fixed
2011-11-30 18:38 Christian Grothoff Status resolved => closed