View Issue Details

ID: 0003671
Project: GNUnet
Category: file-sharing service
View Status: public
Last Update: 2018-06-07 00:25
Reporter: Matthias Wachs
Assigned To: Christian Grothoff
Priority: urgent
Severity: major
Reproducibility: sometimes
Status: closed
Resolution: fixed
Product Version: SVN HEAD
Target Version: 0.11.0pre66
Fixed in Version: 0.11.0pre66
Summary: 0003671: FS running with 100% CPU load
Description: I saw FS running with high CPU load on different systems:

On gnunet9 with 6 core connections:

gnunet9 28124 92.9 0.3 581332 449908 ? R 15:01 145:10 /usr/bin/valgrind.bin /home/gnunet9/lib//gnunet/libexec/gnunet-service-fs -c /home/gnunet9/.config/gnunet.conf

Had the same issue on my system without valgrind....
Steps To Reproduce: gnunet9 statistics

   revocation # peers connected: 5
   revocation # revocation set unions failed: 45
           fs # running average P2P latency (ms): 4
           fs # replies transmitted to other peers: 18062
           fs # query messages sent to other peers: 603364
           fs # P2P searches destroyed due to ultimate reply: 25401
           fs # replies received for other peers: 27077
           fs # replies received and matched: 29461
           fs # GAP PUT messages received: 11274
           fs # migration stop messages received: 4291
           fs # query plan entries: 49687
           fs # GET requests received (from other peers): 276613
           fs # average retransmission delay (ms): 14928
           fs # Pending requests active: 20523
           fs # replies dropped: 930
           fs # Datastore lookups concluded (error queueing): 16228
           fs # P2P query messages received and processed: 65428
           fs # P2P searches active: 8567
           fs # peers connected: 8
           fs # migration stop messages sent: 66
           fs # delay heap timeout (ms): 12
    transport # bytes transmitted via TCP: 890378228
    transport # bytes currently in TCP buffers: 32872
    transport # messages transmitted to other peers: 138789
    transport # bytes in message queue for other peers: 322527
    transport # bytes payload received: 929174402
    transport # bytes total received: 929405745
    transport # bytes received via TCP: 929405745
    transport # REQUEST CONNECT messages received: 66387
    transport # PONGs unicast via reliable transport: 701
    transport # successful address checks during validation: 701
    transport # PING messages received: 701
    transport # ATS suggestions received: 127423
    transport # address revalidations started: 2714
    transport # Addresses in validation map: 73
    transport # keepalives sent: 944
    transport # address records discarded: 1671
    transport # peers connected: 5
    transport # Addresses given to ATS: 24
    transport # TCP sessions active: 6
    transport # network-level TCP disconnect events: 1257
    transport # DISCONNECT messages received: 6
    transport # KEEPALIVE_RESPONSE messages discarded (wrong nonce): 33
    transport # ATS suggestions ignored (failed to create session): 5
    transport # bytes discarded by TCP (timeout): 9120
    transport # validations running: 357
    transport # validations succeeded: 101
    transport # PONG messages received: 101
    transport # PINGs for address validation sent: 1171
    transport # refreshed my HELLO: 11
    transport # transport addresses: 2
    transport # validations delayed by global throttle: 2451
    transport # SYN messages sent: 280
    transport # SYN messages received: 979
    transport # bytes discarded by TCP (disconnect): 324188
    transport # TCP WELCOME messages received: 293
    transport # DISCONNECT messages sent: 12
    transport # disconnected from peer upon explicit request: 103
    transport # ms throttling suggested: 192101
    transport # ACK messages received: 143
    transport # SYN_ACK messages sent: 146
    transport # SYN_ACK messages received: 58
    transport # Successful attempts to switch addresses: 9
    transport # Attempts to switch addresses: 10
    transport # bytes payload discarded due to not connected peer: 398475
    transport # bytes payload dropped (other peer was not connected): 40695
    transport # transmission failures for messages to other peers: 8
    transport # messages timed out while in transport queue: 1
    transport # KEEPALIVE_RESPONSE messages discarded (not expected): 1
 datastore-api # GET ZERO ANONYMITY requests executed: 19
 datastore-api # queue overflows: 167919
          gns Current zone iteration interval (ms): 342857
          gns Number of public records in DHT: 42
          gns Number of zone iterations: 1
          gns Current zone iteration interval (in ms): 342857
    datacache # requests received: 94352
    datacache # bytes stored: 0
    datacache # items stored: 2880
    datacache # requests filtered by bloom filter: 40920
     topology # peers connected: 5
     topology # HELLO messages received: 285
     topology # connect requests issued to transport: 999
     topology # HELLO messages gossipped: 265
         core # encrypted bytes given to transport: 890691742
         core # bytes encrypted: 949437483
         core # avg payload per encrypted message: 6629
         core # bytes of messages of type 147 received: 29513380
         core # bytes of payload decrypted: 917091596
         core # bytes decrypted: 919215772
         core # bytes of messages of type 148 received: 8268136
         core # bytes of messages of type 262 received: 552072
         core # bytes of messages of type 137 received: 75396416
         core # bytes of messages of type 138 received: 829915973
         core # bytes of messages of type 139 received: 8528
         core # bytes of messages of type 256 received: 557568
         core # bytes of messages of type 257 received: 19692
         core # messages discarded (session disconnected): 32
         core # bytes of messages of type 258 received: 216200
         core # type maps received: 292
         core # neighbour entries allocated: 5
         core # key exchanges stopped: 118
         core # peers connected: 5
         core # sessions terminated by transport disconnect: 118
         core # bytes of messages of type 17 received: 10600
         core # bytes of messages of type 146 received: 739647
         core # type map refreshes sent: 348
         core # old ephemeral keys ignored: 36
         core # PONG messages created: 68
         core # PING messages received: 68
         core # messages discarded (expired prior to transmission): 69
         core # rekey operations confirmed via PONG: 1
         core # PONG messages decrypted: 64
         core # PONG messages received: 64
         core # EPHEMERAL_KEY messages received: 97
         core # ephemeral keys received: 97
         core # bytes of messages of type 322 received: 16632
         core # bytes of messages of type 280 received: 1914236
         core # bytes of messages of type 269 received: 3200
         core # bytes of messages of type 268 received: 10000
         core # send requests dropped (disconnected): 84
         core # keepalive messages sent: 7
         core # valid typemap confirmations received: 53
         core # session keys confirmed via PONG: 63
         core # key exchanges initiated: 123
         core # outdated typemap confirmations received: 3
         core # updates to my type map: 8
         core # bytes of messages of type 266 received: 468
         core # DATA message dropped (out of order): 11
        cadet # DHT search: 17859
        cadet # connections: 0
        cadet # bad paths: 18858
        cadet # local bad paths: 17855
        cadet # messages forwarded: 2215
        cadet # unknown connection: 392
        cadet # peers: 23
        cadet # control on unknown connection: 163
        cadet # DHT announce: 4
        cadet # keepalives sent: 8
        cadet # tunnels: 0
        cadet # channels: 0
        cadet # messages received: 881
        cadet # duplicate PONG messages: 5
        cadet # keepalives received: 14
        cadet # DHT announce skipped (no hello): 2
        cadet # poll on unknown connection: 1
        cadet # messages dropped due to wrong hop: 6
          nse # peers connected: 6
          nse # flood messages received: 104
          nse # flood messages transmitted: 126
          nse # flood messages ignored (had closer already): 10
          nse # nodes in the network (estimate): 21
          nse # estimated network diameter: 5
          nse # flood messages started: 15
          nse # flood messages discarded (clock skew too large): 4
          ats # address updates received: 216626
          ats # performance updates given to clients: 280622
          ats # reservation requests processed: 822667
          ats # address suggestions made: 147156
          ats # preference change requests processed: 12362
          ats # ATS addresses total: 5
          ats # ATS active addresses WAN total: 5
          ats # ATS addresses WAN total: 5
          ats # addresses: 5
          ats # addresses destroyed: 370
          ats # ATS active addresses UNSPECIFIED total: 0
          ats # ATS addresses UNSPECIFIED total: 0
          ats # ATS active addresses total: 375
          ats # addresses created: 375
          dht # Bytes transmitted to other peers: 33550697
          dht # RESULTS queued for clients: 19758
          dht # GET requests received from clients: 14499
          dht # ITEMS stored in datacache: 26254
          dht # P2P RESULT bytes received: 12018053
          dht # P2P RESULTS received: 25510
          dht # GET messages queued for transmission: 92890
          dht # GET requests routed: 72684
          dht # Good REPLIES matched against routing table: 14498
          dht # Duplicate REPLIES matched against routing table: 5570146
          dht # REPLIES ignored for CLIENTS (no match): 634248
          dht # DHT requests combined: 13888
          dht # Entries added to routing table: 14445
          dht # P2P GET bytes received: 2999124
          dht # P2P GET requests received: 10495
          dht # Peers excluded from routing due to Bloomfilter: 30569
          dht # P2P messages dropped due to full queue: 1534
          dht # Duplicate REPLIES to CLIENT request dropped: 526742
          dht # Good RESULTS found in datacache: 539030
          dht # GET requests given to datacache: 11803
          dht # Duplicate RESULTS found in datacache: 143513
          dht # GET STOP requests received from clients: 7355
          dht # GET requests from clients injected: 5118
          dht # RESULT messages queued for transmission: 2079
          dht # peers connected: 10
          dht # Bytes of bandwidth requested from core: 2330924
          dht # Peer selection failed: 2375
          dht # P2P FIND PEER requests processed: 117
          dht # FIND PEER requests ignored due to Bloomfilter: 93
          dht # PUT messages queued for transmission: 21
          dht # PUT requests routed: 57
          dht # P2P PUT bytes received: 6117
          dht # P2P PUT requests received: 10
          dht # P2P GET requests ONLY routed: 226
          dht # Preference updates given to core: 29
          dht # FIND PEER messages initiated: 7
          dht # PUT requests received from clients: 47
          dht # HELLOs obtained from peerinfo: 30
          dht # Queued messages discarded (peer disconnected): 180
          dht # Messages dropped (CORE timeout): 7
          dht # requests TTL-dropped: 5
          dht # Network size estimates received: 3
     peerinfo # peers known: 115
    datastore # requests filtered by bloomfilter: 289579
    datastore # GET requests received: 289579
    datastore # GET ZERO ANONYMITY requests received: 21
    datastore # cache size: 12500000
    datastore # quota: 100000000
     datastore # bytes stored: 16387
     datastore # bytes used in file-sharing datastore `sqlite': 46851
      hostlist hostlist requests processed: 504840
      hostlist bytes in hostlist: 1736
      hostlist HELLOs without addresses encountered (ignored): 11684
      hostlist hostlist requests refused (not HTTP GET): 9
      hostlist expired addresses encountered: 35077067
Tags: No tags attached.

Activities

Christian Grothoff

2015-02-10 17:54

manager   ~0008861

The main "problem" I see is FS reporting 8 connections with transport/core only reporting 5. CPU load right now doesn't even seem high (despite valgrind).

Christian Grothoff

2015-02-11 20:03

manager   ~0008872

A broken ATS reservation API might have caused FS to go more crazy than expected; that was fixed today, which should limit FS query transmission rates.

Christian Grothoff

2015-02-28 22:20

manager   ~0008950

I'm seeing 'high' CPU usage still. The peer seems to be sending 74:70 replies per minute (which fits bandwidth constraints), but gets 800 requests per minute (is that OK with reservation?), and similarly does 800 DB requests per minute. Processing 800 requests per minute (with database access, etc.) is what pushes the CPU up.

Christian Grothoff

2015-02-28 22:23

manager   ~0008951

FS _also_ takes 1 GB of RAM -- after "handling" about 100k requests/local replies. That matches each reply taking ~10k (plausible) and pending replies (those that couldn't be transmitted due to bandwidth restrictions) still staying around in RAM.
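A back-of-the-envelope check of the figures in this note (the 1 GB and ~100k counts are from the report; the per-reply cost is derived, and the helper name is just an illustration):

```c
/* Implied average memory cost of one pending reply: ~1 GB resident split
 * across ~100k pending requests/local replies works out to roughly 10 KB
 * per pending reply, consistent with unsent replies piling up in RAM. */
long per_reply_bytes(long ram_bytes, long pending_replies)
{
  return ram_bytes / pending_replies; /* 1e9 / 1e5 = 10000 bytes (~10 KB) */
}
```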

Christian Grothoff

2015-03-01 14:48

manager   ~0008952

Fixed CPU and RAM issues by limiting message queue for replies.
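The fix described above can be sketched as bounding the per-peer queue of pending replies, so replies that cannot be transmitted no longer accumulate without limit. This is a minimal illustration, not the actual GNUnet FS code; `MAX_REPLY_QUEUE`, `enqueue_reply`, and the struct names are hypothetical:

```c
#include <stdlib.h>

#define MAX_REPLY_QUEUE 16 /* hypothetical per-peer cap on pending replies */

struct Reply
{
  struct Reply *next;
  size_t size;
};

struct PeerQueue
{
  struct Reply *head;  /* oldest pending reply */
  struct Reply *tail;  /* newest pending reply */
  unsigned int len;
};

/* Enqueue a reply for transmission; once the cap is reached, drop the
 * oldest queued reply first so memory use stays bounded. */
void enqueue_reply(struct PeerQueue *q, size_t size)
{
  struct Reply *r;

  if (MAX_REPLY_QUEUE == q->len)
  {
    struct Reply *old = q->head;
    q->head = old->next;
    if (NULL == q->head)
      q->tail = NULL;
    free(old);
    q->len--;
  }
  r = calloc(1, sizeof(*r));
  if (NULL == r)
    abort();
  r->size = size;
  if (NULL == q->tail)
    q->head = r;
  else
    q->tail->next = r;
  q->tail = r;
  q->len++;
}
```

With such a cap, queue length (and hence RAM) stays constant no matter how many replies are generated, at the cost of dropping the oldest unsent replies under sustained backpressure.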

Issue History

Date Modified Username Field Change
2015-02-10 17:40 Matthias Wachs New Issue
2015-02-10 17:40 Matthias Wachs Description Updated View Revisions
2015-02-10 17:54 Christian Grothoff Note Added: 0008861
2015-02-11 00:36 Christian Grothoff Status new => acknowledged
2015-02-11 20:03 Christian Grothoff Note Added: 0008872
2015-02-28 22:20 Christian Grothoff Note Added: 0008950
2015-02-28 22:20 Christian Grothoff Assigned To => Christian Grothoff
2015-02-28 22:20 Christian Grothoff Status acknowledged => assigned
2015-02-28 22:23 Christian Grothoff Note Added: 0008951
2015-02-28 22:23 Christian Grothoff Priority normal => urgent
2015-02-28 22:23 Christian Grothoff Target Version => 0.11.0pre66
2015-03-01 14:48 Christian Grothoff Note Added: 0008952
2015-03-01 14:48 Christian Grothoff Status assigned => resolved
2015-03-01 14:48 Christian Grothoff Fixed in Version => 0.11.0pre66
2015-03-01 14:48 Christian Grothoff Resolution open => fixed
2018-06-07 00:25 Christian Grothoff Status resolved => closed