View Issue Details

ID:               0004024
Project:          GNUnet
Category:         file-sharing service
View Status:      public
Last Update:      2018-06-07 00:24
Reporter:         Christian Grothoff
Assigned To:      Christian Grothoff
Priority:         high
Severity:         major
Reproducibility:  sometimes
Status:           closed
Resolution:       fixed
Platform:         i7
OS:               Debian GNU/Linux
OS Version:       squeeze
Product Version:  Git master
Target Version:   0.11.0pre66
Fixed in Version: 0.11.0pre66

Summary: 0004024: CPU utilization of FS on gnunet9 is too high
Description:
Traces suggest the time is spent in gnunet-service-fs_pe.c::get_latest() and merge_pr(), where we do a linear scan of a linked list. The number of entries there should be bounded by the number of connected peers, but is not.

Somehow, peers can create a 'struct GSF_PendingRequest', which creates a plan entry (struct GSF_PendingRequestPlanBijection), then the request is cleaned up (GSF_pending_request_cancel_), but without 'full_cleanup'. Somehow, the PR and the BI survive. The next time the peer issues the same request, the check in gnunet-service-fs_cp::test_exist_cb() fails to detect that we still have a matching PR (well, CP is right: it cancel'ed it after all). The new PR is then added to the same PE, so now we have two BI/PRs for the same query and peer.

This can happen a lot, so we end up with a long DLL of BIs and excessive CPU consumption.
Steps To Reproduce:
Run gnunet9 peer for a while.
Additional Information:
(gdb) print *rp->pe_head->next_PE->next_PE->next_PE->next_PE->next_PE->next_PE->next_PE->next_PE->next_PE->next_PE->pr
$26 = {public_data = {query = {bits = {3431383598, 4291201156, 2335342679, 3877743069, 3061985059, 3582666198, 612382203, 951268149,
        1161128333, 1040336393, 1794210090, 2831188678, 3075402478, 1938878346, 3024631312, 693392574}}, target = 0x0,
    pr_head = 0x3c9e000, pr_tail = 0x250dc50, ttl = {abs_value_us = 1445551114366873}, start_time = {abs_value_us = 1445815582366873},
    anonymity_level = 1, priority = 0, original_priority = 0, num_transmissions = 4, respect_offered = 0, options = GSF_PRO_DEFAULTS,
    type = GNUNET_BLOCK_TYPE_FS_DBLOCK, results_found = 0, has_started = 1}, rh = 0x0, rh_cls = 0x1bb1fd0, replies_seen = 0x0, bf = 0x0,
  hnode = 0x5c94d10, qe = 0x0, gh = 0x0, cadet_request = 0x0, llc_cont = 0x0, llc_cont_cls = 0x0,
  local_result = GNUNET_BLOCK_EVALUATION_OK_MORE, sender_pid = 0, origin_pid = 24, qe_start = {abs_value_us = 1445815582366947},
  warn_task = 0x0, local_result_offset = 9215370345274876929, first_uid = 0, cadet_retry_count = 0, replies_seen_count = 0,
  replies_seen_size = 0, mingle = 0, have_first_uid = 1}
(gdb) print *rp->pe_head->next_PE->next_PE->next_PE->next_PE->pr
$27 = {public_data = {query = {bits = {3431383598, 4291201156, 2335342679, 3877743069, 3061985059, 3582666198, 612382203, 951268149,
        1161128333, 1040336393, 1794210090, 2831188678, 3075402478, 1938878346, 3024631312, 693392574}}, target = 0x0,
    pr_head = 0x72e0340, pr_tail = 0x889dc30, ttl = {abs_value_us = 1445548631500674}, start_time = {abs_value_us = 1445815473500674},
    anonymity_level = 1, priority = 0, original_priority = 0, num_transmissions = 5, respect_offered = 0, options = GSF_PRO_DEFAULTS,
    type = GNUNET_BLOCK_TYPE_FS_DBLOCK, results_found = 0, has_started = 1}, rh = 0x0, rh_cls = 0x226c430, replies_seen = 0x0, bf = 0x0,
  hnode = 0x7565ec0, qe = 0x0, gh = 0x0, cadet_request = 0x0, llc_cont = 0x0, llc_cont_cls = 0x0,
  local_result = GNUNET_BLOCK_EVALUATION_OK_MORE, sender_pid = 0, origin_pid = 24, qe_start = {abs_value_us = 1445815473500690},
  warn_task = 0x0, local_result_offset = 2230422963627760641, first_uid = 0, cadet_retry_count = 0, replies_seen_count = 0,
  replies_seen_size = 0, mingle = 0, have_first_uid = 1}

Tags: No tags attached.

Activities

Christian Grothoff

2015-10-26 01:44

manager   ~0009787

I just noticed that the "pr_head" of the respective PR is non-NULL. So the PR *knows* that it is still associated with BIs. But 'pr->rh' is only set to NULL just before we call GSF_plan_notify_request_done() where we have a 'while (NULL != prd->pr_head)' loop, which cannot exit until all entries are removed.

So either I'm missing a case where 'rh' is NULL'ed (grep strongly disagrees),
or the PEs are created just *after* GSF_pending_request_cancel_ (pr, GNUNET_NO)
is called!

Christian Grothoff

2015-10-26 01:53

manager   ~0009788

Aha! When a new peer connected, we were going over the 'pending' requests to see if any of those might be good to send to the new connection. In this case, we might take an 'inactive' but pending request, and add it to the new peer. Badness results. Fixed in SVN 36578, together with a bunch of fresh assertions.

Christian Grothoff

2015-10-26 11:36

manager   ~0009789

Can confirm this is fixed.

Issue History

Date Modified Username Field Change
2015-10-26 01:34 Christian Grothoff New Issue
2015-10-26 01:34 Christian Grothoff Status new => assigned
2015-10-26 01:34 Christian Grothoff Assigned To => Christian Grothoff
2015-10-26 01:44 Christian Grothoff Note Added: 0009787
2015-10-26 01:53 Christian Grothoff Note Added: 0009788
2015-10-26 01:55 Christian Grothoff Status assigned => feedback
2015-10-26 11:36 Christian Grothoff Note Added: 0009789
2015-10-26 11:36 Christian Grothoff Status feedback => assigned
2015-10-26 11:36 Christian Grothoff Status assigned => resolved
2015-10-26 11:36 Christian Grothoff Fixed in Version => 0.11.0pre66
2015-10-26 11:36 Christian Grothoff Resolution open => fixed
2018-06-07 00:24 Christian Grothoff Status resolved => closed