View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003596||GNUnet||cadet service||public||2014-12-24 02:51||2018-06-07 00:25|
|Reporter||Christian Grothoff||Assigned To||Bart Polot|
|Product Version||SVN HEAD|
|Target Version||0.11.0pre66||Fixed in Version||0.11.0pre66|
|Summary||0003596: CADET "kills" gnunet-service-dht with > 20000 new requests / minute (on startup)|
|Description||Dec 24 02:43:55 -- peer started|
$ date ;gnunet-statistics -s dht | grep client
Wed Dec 24 02:45:30 CET 2014
dht # GET requests from clients injected: 47047
dht # GET STOP requests received from clients: 47044
dht # GET requests received from clients: 47046
dht # RESULTS queued for clients: 177276
dht # PUT requests received from clients: 6
grothoff@pixel:~$ date ;gnunet-statistics -s dht | grep client
Wed Dec 24 02:46:03 CET 2014
dht # GET requests from clients injected: 60655
dht # GET STOP requests received from clients: 60653
dht # RESULTS queued for clients: 231716
dht # GET requests received from clients: 60655
dht # PUT requests received from clients: 6
The result is 100% CPU usage of the DHT service and high bandwidth consumption.
|Additional Information||CADET's stats of "bad paths" seems to correlate with the DHT requests:|
grothoff@pixel:~$ gnunet-statistics -s cadet
cadet # data on unknown channel: 26
cadet # messages received: 107
cadet # channels: 2
cadet # peers: 7
cadet # connections: 0
cadet # bad paths: 60655
cadet # local bad paths: 60655
cadet # data retransmitted: 3
cadet # duplicate PID: 1
cadet # duplicate PONG messages: 1
cadet # keepalives sent: 2
cadet # clients: 3
Note that after a while, the situation stabilizes (at those 60k requests, for now, for me). However, issuing 60k requests and then cancelling them all again is obviously bad style...
|Tags||No tags attached.|
||Added a 100ms delay in case of DHT/core connection mismatch in r34889.|
That kind of mismatch does not explain the behavior, especially when it happens after a peer has been running for hours with core disconnected, and then CADET starts hitting the DHT with tons of requests.
You do need to impose a strict limit on the number of parallel queries (with a configuration option) and a rate limit (how many queries are issued at most per second).
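The two limits asked for above could be combined in a token bucket: a hard cap on concurrent GETs plus a refill rate for starting new ones. A minimal sketch (all names and constants hypothetical, not the actual GNUnet configuration or API):

```c
#include <time.h>

#define MAX_PARALLEL_GETS 64   /* would come from a configuration option */
#define GETS_PER_SECOND   10   /* token-bucket refill rate */

struct GetLimiter {
  unsigned int active;   /* currently running GETs */
  double tokens;         /* tokens available for starting new GETs */
  time_t last_refill;    /* when tokens were last added */
};

/* Returns 1 if a new GET may start now, 0 if it must wait. */
static int
limiter_try_start (struct GetLimiter *l, time_t now)
{
  l->tokens += (double) (now - l->last_refill) * GETS_PER_SECOND;
  if (l->tokens > GETS_PER_SECOND)
    l->tokens = GETS_PER_SECOND;        /* bucket holds one second's worth */
  l->last_refill = now;
  if ( (l->active >= MAX_PARALLEL_GETS) || (l->tokens < 1.0) )
    return 0;
  l->tokens -= 1.0;
  l->active++;
  return 1;
}

/* To be called when a GET completes or is cancelled. */
static void
limiter_get_done (struct GetLimiter *l)
{
  if (l->active > 0)
    l->active--;
}
```

With these numbers a burst like the 60k GETs above would be smoothed to at most 10 new GETs per second, with never more than 64 in flight.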
The behavior happens when DHT returns a path that is not valid, for instance, the first specified hop is not our neighbor.
This does impose a rate (not yet configurable) of 10 requests per second *per peer*, which I think is very reasonable...
Is that a bug in DHT, or just caused by the DHT having 'outdated' information (as in, the path existed in the past)?
Also, why do you cancel the GET and re-issue it, instead of just keeping it running? I would have expected you to just ignore this 'useless' result, but keep the GET going.
It is caused by cached info in the DHT, just as in the case you presented: CORE is disconnected from a neighbor (or all of them), but the DHT still has the result of the old GET path.
The re-issuing attempts to force the DHT to issue a new GET that would go through a new set of peers towards some STORE peer, but caching seems to interfere with this until the result times out :(
So then re-issuing the GET is exactly the worst thing to do. You will immediately get the cached information *again*, as for a new request this is 'new' information.
If instead you keep the 'old' GET active ("want more results"), then the DHT will go out to find more results (unless your block plugin said that this was the last possible result, which you should avoid if there can be more than one).
So definitely just drop/ignore that 'bad' path, and keep the GET open.
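The "drop and keep the GET open" approach argued for here amounts to validating the path in the result callback and simply returning on failure, never calling get-stop/get-start. A sketch (the peer type, neighbor table, and handler signature are hypothetical stand-ins; in GNUnet the real callback would be the GNUNET_DHT_GetIterator passed to GNUNET_DHT_get_start):

```c
#include <string.h>

struct PeerId { unsigned char bits[32]; };

/* Tiny stand-in for the CORE neighbor set. */
static struct PeerId neighbors[4];
static unsigned int num_neighbors;

static int
is_neighbor (const struct PeerId *p)
{
  for (unsigned int i = 0; i < num_neighbors; i++)
    if (0 == memcmp (p, &neighbors[i], sizeof *p))
      return 1;
  return 0;
}

/* Returns 1 if the path was used, 0 if it was silently ignored.
 * Crucially, the invalid case does NOT cancel and re-issue the GET;
 * that would only fetch the same cached result again. */
static int
handle_path_result (const struct PeerId *path, unsigned int path_len)
{
  if ( (path_len > 0) && (! is_neighbor (&path[0])) )
    return 0;                 /* bad path: ignore it, keep the GET running */
  /* ... use the path to try to build a connection ... */
  return 1;
}
```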
If I leave the current one active I seem to *never* get anything anymore. Haven't debugged it yet, but my theory is that, as the contents are the same, the DHT treats it as a duplicate and discards any new results.
I thought of adding a DHT option to "treat same data with different path" as different results, but there are a million more important things to do before that...
||As I said, this may be caused by the return value from the block plugin.|
||Look into block plugin, keep GET open.|
||Ah, yes, that's another good point, if the result is byte-wise identical, then the typical bloom-filter logic (which is in the plugin) will determine the result to be a duplicate. I am not 100% sure if it is sufficient to adjust the plugin logic (as there might be other places in the code that also go for byte-wise identical, like maybe the database), but that's a place to start: modify the plugin to say different paths mean really different business. Now, that could also be dangerous as it may cause excessive storage of duplicates (a, aba, ababa, abababa are all different paths if taken literally), so to limit cost we might want something more complex to tell if the paths are different in an *interesting* sense.|
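The plugin change suggested above boils down to how the duplicate-detection key is derived: keying on the payload bytes alone filters byte-identical results from different paths, while mixing the path into the key lets them through. A toy sketch of the difference (FNV-1a as a stand-in hash; nothing here is the actual block-plugin interface):

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t
fnv1a (uint64_t h, const void *buf, size_t len)
{
  const unsigned char *p = buf;
  for (size_t i = 0; i < len; i++)
    h = (h ^ p[i]) * 1099511628211ULL;
  return h;
}

/* Current behaviour: the duplicate key depends on the payload only,
 * so the same data via a new path is still filtered as a duplicate. */
static uint64_t
dup_key_payload (const void *data, size_t len)
{
  return fnv1a (14695981039346656037ULL, data, len);
}

/* Proposed behaviour: mix the path into the key, so a result with a
 * different path gets a different key and survives the bloom filter. */
static uint64_t
dup_key_with_path (const void *data, size_t len,
                   const void *path, size_t path_len)
{
  uint64_t h = fnv1a (14695981039346656037ULL, data, len);
  return fnv1a (h, path, path_len);
}
```

As noted, this trades duplicate suppression for storage of near-duplicates, hence the suggestion to only count paths as different when they differ in an *interesting* sense.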
||The paths should be "easy" to filter, as CADET always eliminates peers that appear in a path twice (a very common occurrence when concatenating GET + PUT paths).|
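That elimination can be sketched as follows (a hypothetical illustration, not the CADET implementation): whenever a peer occurs twice, everything between the two occurrences is cut out, so `a, b, a, c` collapses to `a, c`.

```c
#include <stddef.h>

/* Shortens ids[0..len) in place by removing loops; returns the new
 * length. Quadratic, which is fine for the short paths involved here. */
static size_t
eliminate_loops (unsigned int *ids, size_t len)
{
  for (size_t i = 0; i < len; i++)
    for (size_t j = i + 1; j < len; j++)
      if (ids[i] == ids[j])
      {
        /* Keep the first occurrence, drop everything up to and
         * including the second one, then rescan from position i. */
        for (size_t k = j + 1; k < len; k++)
          ids[i + 1 + (k - j - 1)] = ids[k];
        len -= (j - i);
        j = i;
      }
  return len;
}
```

Applied to a concatenated GET + PUT path, this removes the shared STORE peer and any detour around it, so "a, aba, ababa, ..." style variants all reduce to the same literal path.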
||Seems fine now, at least I don't see crazy high loads anymore.|
|2014-12-24 02:51||Christian Grothoff||New Issue|
|2014-12-24 02:51||Christian Grothoff||Status||new => assigned|
|2014-12-24 02:51||Christian Grothoff||Assigned To||=> Bart Polot|
|2014-12-24 02:52||Christian Grothoff||Priority||immediate => urgent|
|2014-12-24 02:52||Christian Grothoff||Summary||CADET "kills" gnunet-service-dht with > 20000 new requests / minute => CADET "kills" gnunet-service-dht with > 20000 new requests / minute (on startup)|
|2014-12-24 02:53||Christian Grothoff||Relationship added||related to 0003247|
|2015-01-15 16:17||Bart Polot||Note Added: 0008775|
|2015-01-15 16:17||Bart Polot||Status||assigned => resolved|
|2015-01-15 16:17||Bart Polot||Fixed in Version||=> SVN HEAD|
|2015-01-15 16:17||Bart Polot||Resolution||open => fixed|
|2015-01-15 17:51||Christian Grothoff||Note Added: 0008779|
|2015-01-15 17:51||Christian Grothoff||Status||resolved => feedback|
|2015-01-15 17:51||Christian Grothoff||Resolution||fixed => reopened|
|2015-01-15 17:54||Bart Polot||Note Added: 0008780|
|2015-01-15 19:18||Christian Grothoff||Note Added: 0008781|
|2015-01-15 19:18||Christian Grothoff||Status||feedback => assigned|
|2015-01-15 19:21||Bart Polot||Note Added: 0008782|
|2015-01-15 19:24||Christian Grothoff||Note Added: 0008783|
|2015-01-15 19:28||Bart Polot||Note Added: 0008784|
|2015-01-15 19:30||Christian Grothoff||Note Added: 0008786|
|2015-01-15 19:31||Bart Polot||Note Added: 0008787|
|2015-01-15 19:31||Bart Polot||Status||assigned => acknowledged|
|2015-01-15 19:32||Christian Grothoff||Note Added: 0008788|
|2015-01-15 19:34||Bart Polot||Note Added: 0008789|
|2015-01-15 20:26||Bart Polot||Priority||urgent => normal|
|2015-01-15 20:26||Bart Polot||Severity||major => tweak|
|2015-03-08 13:00||Christian Grothoff||Note Added: 0009003|
|2015-03-08 13:00||Christian Grothoff||Status||acknowledged => resolved|
|2015-03-08 13:00||Christian Grothoff||Fixed in Version||SVN HEAD => 0.11.0pre66|
|2015-03-08 13:00||Christian Grothoff||Resolution||reopened => fixed|
|2018-06-07 00:25||Christian Grothoff||Status||resolved => closed|