View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update
---|---|---|---|---|---
0003596 | GNUnet | cadet service | public | 2014-12-24 02:51 | 2018-06-07 00:25

Field | Value
---|---
Reporter | Christian Grothoff
Assigned To | Bart Polot
Priority | normal
Severity | tweak
Reproducibility | always
Status | closed
Resolution | fixed
Product Version | Git master
Target Version | 0.11.0pre66
Fixed in Version | 0.11.0pre66

Summary: 0003596: CADET "kills" gnunet-service-dht with > 20000 new requests / minute (on startup)

Description:

    Dec 24 02:43:55 -- peer started

    $ date; gnunet-statistics -s dht | grep client
    Wed Dec 24 02:45:30 CET 2014
    dht # GET requests from clients injected: 47047
    dht # GET STOP requests received from clients: 47044
    dht # GET requests received from clients: 47046
    dht # RESULTS queued for clients: 177276
    dht # PUT requests received from clients: 6

    grothoff@pixel:~$ date; gnunet-statistics -s dht | grep client
    Wed Dec 24 02:46:03 CET 2014
    dht # GET requests from clients injected: 60655
    dht # GET STOP requests received from clients: 60653
    dht # RESULTS queued for clients: 231716
    dht # GET requests received from clients: 60655
    dht # PUT requests received from clients: 6

The result is 100% CPU usage of the DHT service and high bandwidth consumption.

Additional Information:

CADET's "bad paths" statistics seem to correlate with the DHT requests:

    grothoff@pixel:~$ gnunet-statistics -s cadet
    cadet # data on unknown channel: 26
    cadet # messages received: 107
    cadet # channels: 2
    cadet # peers: 7
    cadet # connections: 0
    cadet # bad paths: 60655
    cadet # local bad paths: 60655
    cadet # data retransmitted: 3
    cadet # duplicate PID: 1
    cadet # duplicate PONG messages: 1
    cadet # keepalives sent: 2
    cadet # clients: 3

Note that after a while, the situation stabilizes (at those 60k requests for me, for now). However, issuing 60k requests and then canceling them all again is obviously bad style...

Tags: No tags attached.
related to | 0003247 | closed | Christian Grothoff | 100% CPU usage in MESH and FS after search |
(0008775) Bart Polot, 2015-01-15 16:17
Added a 100ms delay in case of DHT/core connection mismatch in r34889. |
(0008779) Christian Grothoff, 2015-01-15 17:51

That kind of mismatch does not explain the behavior, especially if it happens after a peer has been running for hours with core disconnected, and then CADET starts hitting the DHT with tons of requests. You do need to impose a strict limit on the number of parallel queries (with a configuration option) and a rate limit (how many queries are issued at most per second).
(0008780) Bart Polot, 2015-01-15 17:54

The behavior happens when the DHT returns a path that is not valid, for instance when the first specified hop is not our neighbor. The fix does impose a rate limit (not yet configurable) of 10 requests per second *per peer*, which I think is very reasonable...
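A per-peer rate limit like the one described is commonly implemented as a token bucket. The sketch below is a hypothetical illustration of that idea in Python, not CADET's actual code; all names (`TokenBucket`, `may_query`) are invented here:

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""
    def __init__(self, rate=10.0, capacity=10.0, now=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.now = now            # injectable clock, eases testing
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per peer: requests beyond the allowance are simply not issued.
buckets = {}

def may_query(peer_id):
    bucket = buckets.setdefault(peer_id, TokenBucket(rate=10.0))
    return bucket.allow()
```

With a 10/s rate this caps steady-state load at 10 DHT GETs per second per peer, while still permitting a short burst right after startup.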
(0008781) Christian Grothoff, 2015-01-15 19:18
Is that a bug in DHT, or just caused by the DHT having 'outdated' information (as in, the path existed in the past)? Also, why do you cancel the GET and re-issue it, instead of just keeping it running? I would have expected you to just ignore this 'useless' result, but keep the GET going. |
(0008782) Bart Polot, 2015-01-15 19:21

It is caused by cached info in the DHT, just as in the case you presented, where CORE is disconnected from a neighbor (or all of them) but the DHT still has the result of the old GET path. The re-issuing attempts to force the DHT to issue a new GET that would go through a new set of peers towards some STORE peer, but caching seems to interfere with this, at least until the result times out :(
(0008783) Christian Grothoff, 2015-01-15 19:24

So then re-issuing the GET is exactly the worst thing to do. You will immediately get the cached information *again*, since for a new request this is 'new' information. If instead you keep the 'old' GET active ("want more results"), then the DHT will go out to find more results (unless your block plugin said that this was the last possible result, which you should avoid if there can be more than one). So definitely just drop/ignore that 'bad' path, and keep the GET open.
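The suggested client behavior, ignore an unusable result but leave the GET running, can be sketched roughly as follows. This is a hypothetical illustration, not GNUnet code; `is_valid_path`, `on_dht_result`, and the `active_gets` set are all invented names:

```python
def is_valid_path(path, neighbors):
    """A result path is usable only if its first hop is a current neighbor."""
    return len(path) > 0 and path[0] in neighbors

def on_dht_result(path, data, neighbors, active_gets, key):
    # A result arriving over a bad path is dropped, but the GET stays
    # open ("want more results"): cancelling and re-issuing it would
    # just be answered from the cache with the same bad path again.
    if not is_valid_path(path, neighbors):
        return None               # ignore; the GET in active_gets stays active
    active_gets.discard(key)      # satisfied; now it is safe to stop the GET
    return data
```

The key point is that only a *usable* result stops the query; a cached, stale path never does.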
(0008784) Bart Polot, 2015-01-15 19:28

If I leave the current one active I seem to *never* get anything anymore. I haven't debugged it yet, but my theory is that, as the contents are the same, the DHT treats it as a duplicate and discards any new results. I thought of adding a DHT option to treat "same data with a different path" as a different result, but there are a million more important things to do before that...
(0008786) Christian Grothoff, 2015-01-15 19:30
As I said, this may be caused by the return value from the block plugin. |
(0008787) Bart Polot, 2015-01-15 19:31
Look into block plugin, keep GET open. |
(0008788) Christian Grothoff, 2015-01-15 19:32

Ah, yes, that's another good point: if the result is byte-wise identical, then the typical bloom-filter logic (which is in the plugin) will determine the result to be a duplicate. I am not 100% sure it is sufficient to adjust the plugin logic (as there might be other places in the code that also check for byte-wise identity, like maybe the database), but that's a place to start: modify the plugin to say that different paths really mean different business. Now, that could also be dangerous, as it may cause excessive storage of duplicates (a, aba, ababa, abababa are all different paths if taken literally), so to limit cost we might want something more complex to tell whether the paths are different in an *interesting* sense.
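The "different paths mean different business" idea amounts to folding the path into the duplicate-detection key, instead of hashing the payload alone. A minimal sketch, with invented names (`result_key`, `is_new_result`) and a plain set standing in for the plugin's bloom filter:

```python
import hashlib

def result_key(data, path):
    """Duplicate-detection key that includes the result path, so
    byte-identical data seen over a different path is not filtered."""
    h = hashlib.sha256(data)
    h.update(b"/".join(p.encode() for p in path))
    return h.digest()

seen = set()  # stands in for the bloom filter in the block plugin

def is_new_result(data, path):
    k = result_key(data, path)
    if k in seen:
        return False
    seen.add(k)
    return True
```

As the note warns, taken literally this treats a, aba, ababa as three distinct results, so some path normalization would still be needed on top to keep storage bounded.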
(0008789) Bart Polot, 2015-01-15 19:34

The paths should be "easy" to filter, as CADET always eliminates peers that appear in a path twice (a very common occurrence when concatenating GET + PUT paths).
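The elimination of repeated peers can be sketched as below. This is an invented illustration of the idea, not CADET's implementation, and it glosses over the orientation of the two path halves; the point is only that a repeated peer short-cuts the route back to its first occurrence, which collapses trivial variants like a, aba, ababa to the same path:

```python
def combined_path(get_path, put_path):
    """Concatenate two path halves, dropping any peer seen twice by
    truncating back to its first occurrence (short-cutting the loop)."""
    out = []
    for peer in list(get_path) + list(put_path):
        if peer in out:
            del out[out.index(peer) + 1:]  # cut the loop at the repeat
        else:
            out.append(peer)
    return out
```

After this normalization, two paths that differ only by loops compare equal, which is exactly what a duplicate filter over (data, path) would want.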
(0009003) Christian Grothoff, 2015-03-08 13:00
Seems fine now, at least I don't see crazy high loads anymore. |
Date Modified | Username | Field | Change |
---|---|---|---|
2014-12-24 02:51 | Christian Grothoff | New Issue | |
2014-12-24 02:51 | Christian Grothoff | Status | new => assigned |
2014-12-24 02:51 | Christian Grothoff | Assigned To | => Bart Polot |
2014-12-24 02:52 | Christian Grothoff | Priority | immediate => urgent |
2014-12-24 02:52 | Christian Grothoff | Summary | CADET "kills" gnunet-service-dht with > 20000 new requests / minute => CADET "kills" gnunet-service-dht with > 20000 new requests / minute (on startup) |
2014-12-24 02:53 | Christian Grothoff | Relationship added | related to 0003247 |
2015-01-15 16:17 | Bart Polot | Note Added: 0008775 | |
2015-01-15 16:17 | Bart Polot | Status | assigned => resolved |
2015-01-15 16:17 | Bart Polot | Fixed in Version | => Git master |
2015-01-15 16:17 | Bart Polot | Resolution | open => fixed |
2015-01-15 17:51 | Christian Grothoff | Note Added: 0008779 | |
2015-01-15 17:51 | Christian Grothoff | Status | resolved => feedback |
2015-01-15 17:51 | Christian Grothoff | Resolution | fixed => reopened |
2015-01-15 17:54 | Bart Polot | Note Added: 0008780 | |
2015-01-15 19:18 | Christian Grothoff | Note Added: 0008781 | |
2015-01-15 19:18 | Christian Grothoff | Status | feedback => assigned |
2015-01-15 19:21 | Bart Polot | Note Added: 0008782 | |
2015-01-15 19:24 | Christian Grothoff | Note Added: 0008783 | |
2015-01-15 19:28 | Bart Polot | Note Added: 0008784 | |
2015-01-15 19:30 | Christian Grothoff | Note Added: 0008786 | |
2015-01-15 19:31 | Bart Polot | Note Added: 0008787 | |
2015-01-15 19:31 | Bart Polot | Status | assigned => acknowledged |
2015-01-15 19:32 | Christian Grothoff | Note Added: 0008788 | |
2015-01-15 19:34 | Bart Polot | Note Added: 0008789 | |
2015-01-15 20:26 | Bart Polot | Priority | urgent => normal |
2015-01-15 20:26 | Bart Polot | Severity | major => tweak |
2015-03-08 13:00 | Christian Grothoff | Note Added: 0009003 | |
2015-03-08 13:00 | Christian Grothoff | Status | acknowledged => resolved |
2015-03-08 13:00 | Christian Grothoff | Fixed in Version | Git master => 0.11.0pre66 |
2015-03-08 13:00 | Christian Grothoff | Resolution | reopened => fixed |
2018-06-07 00:25 | Christian Grothoff | Status | resolved => closed |