View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001796 | GNUnet | UDP transport | public | 2011-09-19 16:13 | 2011-11-30 18:38 |
Reporter | Christian Grothoff | Assigned To | Matthias Wachs | ||
Priority | urgent | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Target Version | 0.9.0 | ||||
Summary | 0001796: testcase test_transport_api_unreliability_udp fails with timeout | ||||
Description | .Sep 19 13:17:39-927044 test_transport_api_unreliability_udp-12657 ERROR Timeout occurred while waiting for transmit_ready Sep 19 13:17:39-927496 test_transport_api_unreliability_udp-12657 ERROR Reliability failed: Last message sent 32, Next message scheduled 33, Last message received 32, Message expected 32 FAIL: test_transport_api_unreliability_udp ...................................................................................................... Could be that the timeout is simply set too low (seems to happen mostly on slower machines). | ||||
Steps To Reproduce | run the unreliabile UDP test on some of the slower (?) buildbots, such as sparc64-wachs or lenny-powerpc or even freebsd7-amd64-wachs. | ||||
Additional Information | unreliability now uses max_bps cfg value and timeout is increased | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
related to | 0001827 | closed | Matthias Wachs | UDP unreliability test fails on sparcbot |
related to | 0001813 | closed | Christian Grothoff | Receiving UDP flood for several minutes, rev 17144 |
|
Increasing the timeout and enforcing a quota does not solve the problem! With different quotas (10000, 20000, 25000, 40000, 50000) the UDP receive queue is running full and packets get lost. The receive queue is not slowly filling but suddenly! |
|
Using netstat -uacn : Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 0 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 95920 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 105856 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 104200 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* Proto Recv-Q Send-Q Local Address Foreign Address State udp 109168 0 0.0.0.0:12040 0.0.0.0:* udp 0 0 0.0.0.0:12050 0.0.0.0:* |
|
cat /proc/sys/net/ipv4/udp_mem 47424 63232 94848 |
|
using constant packet sizes: size 500 quota 600 works size 5000 quot 6000 works size 15000 quota 16000 fails |
|
On my system, the test seems to fail because in gnunet-service-transport.c, this call: <code> ret = GST_neighbours_calculate_receive_delay (peer, (message == NULL) ? 0 : ntohs (message->size), &do_forward); if (do_forward == GNUNET_YES) </code> determines that the sender has violated the quota, sets do_forward to NO and then drops virtually everything. If I manually force the "if" branch to be taken, the test passes. This raises the question why UDP violates the quota. Partially, the cause is obvious: for TCP/HTTP, we ALSO use flow-control to throttle the sender. For UDP, the flow-control doesn't exist, so we are truly at the mercy of the sender. Now, it could be that the bandwidth calculation in neighbours_calculate_receive_delay is wrong, or that the sender fails to throttle. |
|
Other tests don't override or HAVE a quota (the MAX_BPS in [transport-udp] is unrelated). The reason is simple: core defines the quota, and the tests don't have core so their quota should be infinity. So why does do_forward get set to NO again? |
|
Hmm. The bandwidth tracker is set to 5000000 -- same value as MAX_BPS from the configuration. I wonder why. IIRC MAX_BPS was supposed to define OUTGOING limits for UDP/Fragmentation, not INBOUND limits. Also, if that was the (enforced) outgoing limit, we should be fine even it it was the inbound limit. Strange. |
|
Further testing shows that clearly UDP's MAX_BPS *is* used in the bandwidth tracker of gnunet-service-transport_neighbours.c. Why? It has NOTHING to do there (UDP option vs. service, to begin with!). |
|
This is odd code (from test_transport_api_unreliability.c): if (cls == p1) { char * sec; long long unsigned int l_bps; GNUNET_asprintf(&sec, "transport-%s", test_plugin); if (GNUNET_CONFIGURATION_have_value (p1->cfg, sec, "MAX_BPS")) { GNUNET_CONFIGURATION_get_value_number (p1->cfg, sec, "MAX_BPS", &l_bps); max_bps_p1 = l_bps; } else max_bps_p1 = 1024 * 1024 * 1024; GNUNET_free (sec); GNUNET_TRANSPORT_set_quota (p1->th, &p2->id, GNUNET_BANDWIDTH_value_init (max_bps_p1), GNUNET_BANDWIDTH_value_init (max_bps_p1)); } |
|
Ok, removing it "solves" the problem. I think the issue with the above is that we set a low quota (lower than what we CAN transmit), and the inbound quota ends up being enforced by dropping. So far so good, this works if the outbound quota is fine. But the 'outbound quota' has exactly the same value (and should be enforced on the sender side). This then suggests to me that maybe the 'do_forward' check might be too strict!? (the quota itself should overall be obeyed, we've got another test for that). |
|
Oh, better reason. The MAX_BPS was 500,000 for peer1 and 5,000,000 for peer2. Now, with the set_quota code above, one peer was explicitly configured to transmit 5 MB/s, while the other was explicitly configured to reject anything more than 500kb/s. As a result, on most systems (that can do more than 500kb/s), the quota WAS being violated => caboom. Anyway, now that this is all understood, I'll just remove the crazy MAX_BPS-quota-setting code from the unreliability test driver. |
|
Fixed in SVN 17095. |
|
Cannot see any changes...tests fail exactly as before 34 messages transmitted, then nothing happens. Receiving peers UDP receive queue is full |
|
BTW: Increasing the notify transmit ready just delays when the test is failing. With UDP verbose enabled the sender sends all the time, but receiver does not receive a complete message |
|
Which system? At least the buildbots behave very differently now.. |
|
Uploaded some UDP logs from LRN. |
|
very slow but passing |
Date Modified | Username | Field | Change |
---|---|---|---|
2011-09-19 16:13 | Christian Grothoff | New Issue | |
2011-09-19 16:13 | Christian Grothoff | Status | new => assigned |
2011-09-19 16:13 | Christian Grothoff | Assigned To | => Matthias Wachs |
2011-09-19 16:19 | Christian Grothoff | Priority | normal => urgent |
2011-09-23 14:30 | Matthias Wachs | Status | assigned => confirmed |
2011-09-23 16:03 | Matthias Wachs | Reproducibility | sometimes => always |
2011-09-23 16:03 | Matthias Wachs | Status | confirmed => feedback |
2011-09-23 16:03 | Matthias Wachs | Resolution | open => fixed |
2011-09-23 16:03 | Matthias Wachs | Additional Information Updated | |
2011-09-23 16:43 | Matthias Wachs | Note Added: 0004627 | |
2011-09-23 16:57 | Matthias Wachs | Note Added: 0004628 | |
2011-09-23 17:17 | Matthias Wachs | Note Added: 0004629 | |
2011-09-23 17:17 | Matthias Wachs | Status | feedback => confirmed |
2011-09-23 18:17 | Matthias Wachs | Note Added: 0004630 | |
2011-09-29 00:12 | Christian Grothoff | Note Added: 0004634 | |
2011-09-29 00:24 | Christian Grothoff | Note Added: 0004635 | |
2011-09-29 00:29 | Christian Grothoff | Note Added: 0004636 | |
2011-09-29 00:30 | Christian Grothoff | Note Added: 0004637 | |
2011-09-29 00:32 | Christian Grothoff | Note Added: 0004638 | |
2011-09-29 00:45 | Christian Grothoff | Note Added: 0004639 | |
2011-09-29 00:50 | Christian Grothoff | Note Added: 0004640 | |
2011-09-29 00:51 | Christian Grothoff | Note Added: 0004641 | |
2011-09-29 00:51 | Christian Grothoff | Status | confirmed => resolved |
2011-09-29 00:51 | Christian Grothoff | Fixed in Version | => 0.9.0pre4 |
2011-09-29 00:51 | Christian Grothoff | Assigned To | Matthias Wachs => Christian Grothoff |
2011-09-29 14:35 | Matthias Wachs | Note Added: 0004646 | |
2011-09-29 14:35 | Matthias Wachs | Status | resolved => feedback |
2011-09-29 14:35 | Matthias Wachs | Resolution | fixed => reopened |
2011-09-29 14:38 | Matthias Wachs | Note Added: 0004647 | |
2011-09-29 15:02 | Christian Grothoff | Note Added: 0004648 | |
2011-09-29 15:02 | Christian Grothoff | Status | feedback => assigned |
2011-10-04 13:47 | Christian Grothoff | Assigned To | Christian Grothoff => Matthias Wachs |
2011-10-20 11:23 | Christian Grothoff | Relationship added | related to 0001827 |
2011-10-20 19:55 | Christian Grothoff | Relationship added | related to 0001813 |
2011-10-25 17:04 | Christian Grothoff | Target Version | 0.9.0pre4 => 0.9.0 |
2011-10-26 00:12 | Christian Grothoff | Severity | crash => minor |
2011-10-26 00:13 | Christian Grothoff | Priority | urgent => high |
2011-10-28 14:24 | Christian Grothoff | File Added: test_0009.log.tar.xz | |
2011-10-28 14:25 | Christian Grothoff | Note Added: 0004785 | |
2011-11-04 11:18 | Christian Grothoff | Priority | high => urgent |
2011-11-04 11:18 | Christian Grothoff | Severity | minor => major |
2011-11-16 20:55 | Matthias Wachs | Note Added: 0004933 | |
2011-11-16 20:55 | Matthias Wachs | Status | assigned => resolved |
2011-11-16 20:55 | Matthias Wachs | Resolution | reopened => fixed |
2011-11-30 18:38 | Christian Grothoff | Status | resolved => closed |