View Issue Details

IDProjectCategoryView StatusLast Update
0003596GNUnetcadet servicepublic2018-06-07 00:25
ReporterChristian Grothoff Assigned ToBart Polot  
PrioritynormalSeveritytweakReproducibilityalways
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0003596: CADET "kills" gnunet-service-dht with > 20000 new requests / minute (on startup)
DescriptionDec 24 02:43:55 -- peer started

$ date ;gnunet-statistics -s dht | grep client
Wed Dec 24 02:45:30 CET 2014
          dht # GET requests from clients injected: 47047
          dht # GET STOP requests received from clients: 47044
          dht # GET requests received from clients: 47046
          dht # RESULTS queued for clients: 177276
          dht # PUT requests received from clients: 6
grothoff@pixel:~$ date ;gnunet-statistics -s dht | grep client
Wed Dec 24 02:46:03 CET 2014
          dht # GET requests from clients injected: 60655
          dht # GET STOP requests received from clients: 60653
          dht # RESULTS queued for clients: 231716
          dht # GET requests received from clients: 60655
          dht # PUT requests received from clients: 6


The result is 100% CPU usage of the DHT service and high bandwidth consumption.
Additional InformationCADET's stats of "bad paths" seems to correlate with the DHT requests:

grothoff@pixel:~$ gnunet-statistics -s cadet
                                                               
        cadet # data on unknown channel: 26
        cadet # messages received: 107
        cadet # channels: 2
        cadet # peers: 7
        cadet # connections: 0
        cadet # bad paths: 60655
        cadet # local bad paths: 60655
        cadet # data retransmitted: 3
        cadet # duplicate PID: 1
        cadet # duplicate PONG messages: 1
        cadet # keepalives sent: 2
        cadet # clients: 3


Note that after a while, the situation stabilizes (at those 60k requests for now for me). However, issuing 60k requests and then canceling them all all again is obviously bad style...
TagsNo tags attached.

Relationships

related to 0003247 closedChristian Grothoff 100% CPU usage in MESH and FS after search 

Activities

Bart Polot

2015-01-15 16:17

manager   ~0008775

Added a 100ms delay in case of DHT/core connection mismatch in r34889.

Christian Grothoff

2015-01-15 17:51

manager   ~0008779

That kind of missmatch does not explain the behavior, especially if it happens after a peer is running for hours and core is disconnected, and then CADET starts hitting DHT with tons of requests.

You do need to impose a strict limit on the # parallel queries (with configuration option) and the rate limit (how many queries are issued at most per second).

Bart Polot

2015-01-15 17:54

manager   ~0008780

The behavior happens when DHT returns a path that is not valid, for instance, the first specified hop is not our neighbor.

This does impose a rate (not yet configurable) of 10 requests per second *per peer*, which I think is very reasonable...

Christian Grothoff

2015-01-15 19:18

manager   ~0008781

Is that a bug in DHT, or just caused by the DHT having 'outdated' information (as in, the path existed in the past)?

Also, why do you cancel the GET and re-issue it, instead of just keeping it running? I would have expected you to just ignore this 'useless' result, but keep the GET going.

Bart Polot

2015-01-15 19:21

manager   ~0008782

Is is caused by cached info in the DHT, just as the case you presented, where CORE is disconnected from a neighbor (or all) but the DHT still has the result of the old GET path.

The reissiueing attemps to force the DHT to issue a new GET that would go through a new set of peers towards some STORE peer, but caching seems to interfere with this, unless until the result times out :(

Christian Grothoff

2015-01-15 19:24

manager   ~0008783

So then re-issuing the GET is exactly the worst thing to do. You will immediately get the cached information *again*, as for a new request this is 'new' information.

If instead you keep the 'old' GET active ("want more results"), then the DHT will go out to find more results (unless your block plugin said that this was the last possible result, which you should avoid if there can be more than one).

So definitively just drop/ignore that 'bad' path, and keep the GET open.

Bart Polot

2015-01-15 19:28

manager   ~0008784

If I leave the current one active I seem to *never* get anything anymore. Haven't debugged it yet, but my theory is that as the contents are the same, DHT treats it as duplicate and discards any new results.

I thought of adding a DHT option to "treat same data with different path" as different results, but there are a million more important things to do before that...

Christian Grothoff

2015-01-15 19:30

manager   ~0008786

As I said, this may be caused by the return value from the block plugin.

Bart Polot

2015-01-15 19:31

manager   ~0008787

Look into block plugin, keep GET open.

Christian Grothoff

2015-01-15 19:32

manager   ~0008788

Ah, yes, that's another good point, if the result is byte-wise identical, then the typical bloom-filter logic (which is in the plugin) will determine the result to be a duplicate. I am not 100% sure if it is sufficient to adjust the plugin logic (as there might be other places in the code that also go for byte-wise identical, like maybe the database), but that's a place to start: modify the plugin to say different paths mean really different business. Now, that could also be dangerous as it may cause excessive storage of duplicates (a, aba, ababa, abababa are all different paths if taken literally), so to limit cost we might want something more complex to tell if the paths are different in an *interesting* sense.

Bart Polot

2015-01-15 19:34

manager   ~0008789

The paths should be "easy" to filter, as CADET always eliminates peers that appear in a path twice (as it is a very common ocurrence when concatenating GET + PUT paths).

Christian Grothoff

2015-03-08 13:00

manager   ~0009003

Seems fine now, at least I don't see crazy high loads anymore.

Issue History

Date Modified Username Field Change
2014-12-24 02:51 Christian Grothoff New Issue
2014-12-24 02:51 Christian Grothoff Status new => assigned
2014-12-24 02:51 Christian Grothoff Assigned To => Bart Polot
2014-12-24 02:52 Christian Grothoff Priority immediate => urgent
2014-12-24 02:52 Christian Grothoff Summary CADET "kills" gnunet-service-dht with > 20000 new requests / minute => CADET "kills" gnunet-service-dht with > 20000 new requests / minute (on startup)
2014-12-24 02:53 Christian Grothoff Relationship added related to 0003247
2015-01-15 16:17 Bart Polot Note Added: 0008775
2015-01-15 16:17 Bart Polot Status assigned => resolved
2015-01-15 16:17 Bart Polot Fixed in Version => Git master
2015-01-15 16:17 Bart Polot Resolution open => fixed
2015-01-15 17:51 Christian Grothoff Note Added: 0008779
2015-01-15 17:51 Christian Grothoff Status resolved => feedback
2015-01-15 17:51 Christian Grothoff Resolution fixed => reopened
2015-01-15 17:54 Bart Polot Note Added: 0008780
2015-01-15 19:18 Christian Grothoff Note Added: 0008781
2015-01-15 19:18 Christian Grothoff Status feedback => assigned
2015-01-15 19:21 Bart Polot Note Added: 0008782
2015-01-15 19:24 Christian Grothoff Note Added: 0008783
2015-01-15 19:28 Bart Polot Note Added: 0008784
2015-01-15 19:30 Christian Grothoff Note Added: 0008786
2015-01-15 19:31 Bart Polot Note Added: 0008787
2015-01-15 19:31 Bart Polot Status assigned => acknowledged
2015-01-15 19:32 Christian Grothoff Note Added: 0008788
2015-01-15 19:34 Bart Polot Note Added: 0008789
2015-01-15 20:26 Bart Polot Priority urgent => normal
2015-01-15 20:26 Bart Polot Severity major => tweak
2015-03-08 13:00 Christian Grothoff Note Added: 0009003
2015-03-08 13:00 Christian Grothoff Status acknowledged => resolved
2015-03-08 13:00 Christian Grothoff Fixed in Version Git master => 0.11.0pre66
2015-03-08 13:00 Christian Grothoff Resolution reopened => fixed
2018-06-07 00:25 Christian Grothoff Status resolved => closed