View Issue Details

IDProjectCategoryView StatusLast Update
0007617GNUnetutil librarypublic2024-02-29 22:46
Reporterulfvonbelow Assigned ToChristian Grothoff  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.19.4Fixed in Version0.19.4 
Summary0007617: Task can delay scheduler pass indefinitely under certain circumstances, even when yielding regularly
DescriptionIf the task that was the last one at the start of the current pass is deleted before it is reached, or if a priority switch causes the currently-executing ready queue to change to one that doesn't contain that task, GNUNET_SCHEDULER_do_work will run until the current ready queue is empty. This may happen after an arbitrarily long time, or never. This blocking will continue even if higher-priority tasks become available.
Steps To ReproduceSee first attached patch for test cases
Additional InformationFix in second attached patch
Tagspatch
Attached Files
0001-UTIL-add-test-demonstrating-scheduler-bug-don-t-run-.patch (5,310 bytes)   
From 079fac47d7d68bd62c9c3ab6ebdf04ca379823ed Mon Sep 17 00:00:00 2001
From: ulfvonbelow <strilen@tilde.club>
Date: Sat, 28 Jan 2023 15:30:54 -0600
Subject: [PATCH 1/2] UTIL: add test demonstrating scheduler bug, don't run it
 by default.

These demonstrate a bug in the scheduler by which a task can prevent any other
task from running for an arbitrarily long time despite regularly yielding to
the scheduler.  It is caused by a faulty check in GNUNET_SCHEDULER_do_work
that assumes that the task that was the last in the queue when the pass began
will still be in the same relative position when the pass ends, and uses this
assumption to detect the end of the current pass.  This assumption fails when
the last task of the current pass is canceled after the pass has started.  It
also fails when we schedule a higher-priority task to run immediately, causing
work_priority to immediately switch such that we now process a queue that
doesn't contain the pass-ending task we're looking for.

These tests are built, but not run by 'make check' yet, since they currently
fail.  You can manually verify that they do currently fail.
---
 src/util/Makefile.am                       | 12 +++++
 src/util/test_scheduler_hogging_cancel.c   | 51 ++++++++++++++++++++
 src/util/test_scheduler_hogging_priority.c | 55 ++++++++++++++++++++++
 3 files changed, 118 insertions(+)
 create mode 100644 src/util/test_scheduler_hogging_cancel.c
 create mode 100644 src/util/test_scheduler_hogging_priority.c

diff --git a/src/util/Makefile.am b/src/util/Makefile.am
index ed01558eb..81b8a93b7 100644
--- a/src/util/Makefile.am
+++ b/src/util/Makefile.am
@@ -174,6 +174,8 @@ endif
 noinst_PROGRAMS = \
  gnunet-config-diff \
  test_common_logging_dummy \
+ test_scheduler_hogging_cancel \
+ test_scheduler_hogging_priority \
  gnunet-crypto-tvg
 
 if ENABLE_TEST_RUN
@@ -587,6 +589,16 @@ test_scheduler_delay_SOURCES = \
 test_scheduler_delay_LDADD = \
  libgnunetutil.la
 
+test_scheduler_hogging_cancel_SOURCES = \
+ test_scheduler_hogging_cancel.c
+test_scheduler_hogging_cancel_LDADD = \
+ libgnunetutil.la
+
+test_scheduler_hogging_priority_SOURCES = \
+ test_scheduler_hogging_priority.c
+test_scheduler_hogging_priority_LDADD = \
+ libgnunetutil.la
+
 test_service_SOURCES = \
  test_service.c
 test_service_LDADD = \
diff --git a/src/util/test_scheduler_hogging_cancel.c b/src/util/test_scheduler_hogging_cancel.c
new file mode 100644
index 000000000..7611338b3
--- /dev/null
+++ b/src/util/test_scheduler_hogging_cancel.c
@@ -0,0 +1,51 @@
+#include "gnunet_util_lib.h"
+#include <unistd.h>
+
+static int count = 0;
+static int final_count;
+static struct GNUNET_SCHEDULER_Task *t4;
+
+static void end (void *cls)
+{
+  final_count = count;
+  count = 5000;
+  GNUNET_SCHEDULER_shutdown ();
+}
+
+static void self_rescheduling (void *cls)
+{
+  if (0 == count)
+  {
+    GNUNET_SCHEDULER_cancel (t4);
+    GNUNET_SCHEDULER_add_delayed_with_priority (GNUNET_TIME_UNIT_MILLISECONDS,
+                                                GNUNET_SCHEDULER_PRIORITY_URGENT,
+                                                &end,
+                                                NULL);
+    sleep (1);
+    /* end should be added to ready queue on next scheduler pass for certain
+       now */
+  }
+  if (++count < 5000)
+    {
+      GNUNET_SCHEDULER_add_now (&self_rescheduling, NULL);
+    }
+}
+
+static void to_be_canceled (void *cls)
+{
+  /* Don't run me! */
+}
+
+
+static void init (void *cls)
+{
+  GNUNET_SCHEDULER_add_now (&self_rescheduling, NULL);
+  t4 = GNUNET_SCHEDULER_add_now (&to_be_canceled, NULL);
+}
+
+
+int main (int argc, char **argv)
+{
+  GNUNET_SCHEDULER_run (&init, NULL);
+  return final_count < 5000 ? 0 : 1;
+}
diff --git a/src/util/test_scheduler_hogging_priority.c b/src/util/test_scheduler_hogging_priority.c
new file mode 100644
index 000000000..217a39ce7
--- /dev/null
+++ b/src/util/test_scheduler_hogging_priority.c
@@ -0,0 +1,55 @@
+#include "gnunet_util_lib.h"
+#include <unistd.h>
+
+static int count = 0;
+static int final_count;
+
+static void end (void *cls)
+{
+  final_count = count;
+  count = 5000;
+  GNUNET_SCHEDULER_shutdown ();
+}
+
+static void self_rescheduling (void *cls)
+{
+  if (count == 0)
+  {
+    GNUNET_SCHEDULER_add_delayed_with_priority (GNUNET_TIME_UNIT_MILLISECONDS,
+                                                GNUNET_SCHEDULER_PRIORITY_URGENT,
+                                                &end,
+                                                NULL);
+    sleep(1);
+    /* end should be added to ready queue on next scheduler pass for certain
+       now */
+  }
+  if (++count < 5000)
+  {
+    GNUNET_SCHEDULER_add_now (&self_rescheduling, NULL);
+  }
+}
+
+
+static void noop (void *cls)
+{
+}
+
+static void indirection (void *cls)
+{
+  GNUNET_SCHEDULER_add_with_reason_and_priority (&self_rescheduling, NULL,
+                                                 GNUNET_SCHEDULER_REASON_STARTUP,
+                                                 GNUNET_SCHEDULER_PRIORITY_HIGH);
+}
+
+static void init (void *cls)
+{
+  GNUNET_SCHEDULER_add_now (&indirection, NULL);
+  GNUNET_SCHEDULER_add_now (&noop, NULL);
+}
+
+
+int main (int argc, char **argv)
+{
+  GNUNET_SCHEDULER_run (&init, NULL);
+  return final_count < 5000 ? 0 : 1;
+}
-- 
2.38.1

0002-UTIL-use-dedicated-marker-in-ready-queue.patch (4,780 bytes)   
From 5367cdadf1130aa96868cac2870b0a101ba44569 Mon Sep 17 00:00:00 2001
From: ulfvonbelow <strilen@tilde.club>
Date: Sat, 28 Jan 2023 16:08:45 -0600
Subject: [PATCH 2/2] UTIL: use dedicated marker in ready queue.

This inserts a dedicated dummy marker task at the end of the ready queue at
the start of a pass.  Because this marker task isn't visible to users of the
scheduler, it can't be canceled while the pass is being run.  Additionally,
switching which ready queue is being run partway through by scheduling a
higher-priority task to immediately run also places this dummy marker.  This
resolves both erroneous cases by which a pass can accidentally run an
unbounded number of tasks.

This also modifies GNUNET_SCHEDULER_get_load to not be misled by this extra
dummy task, and adds the now-passing test cases to the test suite.
---
 src/util/Makefile.am |  4 ++--
 src/util/scheduler.c | 46 ++++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/src/util/Makefile.am b/src/util/Makefile.am
index 81b8a93b7..431bd7d0d 100644
--- a/src/util/Makefile.am
+++ b/src/util/Makefile.am
@@ -174,8 +174,6 @@ endif
 noinst_PROGRAMS = \
  gnunet-config-diff \
  test_common_logging_dummy \
- test_scheduler_hogging_cancel \
- test_scheduler_hogging_priority \
  gnunet-crypto-tvg
 
 if ENABLE_TEST_RUN
@@ -323,6 +321,8 @@ check_PROGRAMS = \
  test_resolver_api.nc \
  test_scheduler \
  test_scheduler_delay \
+ test_scheduler_hogging_cancel \
+ test_scheduler_hogging_priority \
  test_service \
  test_strings \
  test_strings_to_data \
diff --git a/src/util/scheduler.c b/src/util/scheduler.c
index f3b220c4a..b57e2db66 100644
--- a/src/util/scheduler.c
+++ b/src/util/scheduler.c
@@ -248,6 +248,13 @@ struct GNUNET_SCHEDULER_Task
   struct GNUNET_AsyncScopeSave scope;
 };
 
+/**
+ * Placed at the end of a ready queue to indicate where a scheduler run pass
+ * ends. The next, prev, in_ready_list and priority fields are the only ones
+ * that should be used.
+ */
+static struct GNUNET_SCHEDULER_Task pass_end_marker;
+
 
 /**
  * A struct representing an event the select driver is waiting for
@@ -503,6 +510,27 @@ get_timeout ()
   return timeout;
 }
 
+static void remove_pass_end_marker ()
+{
+  if (pass_end_marker.in_ready_list)
+  {
+    GNUNET_CONTAINER_DLL_remove (ready_head[pass_end_marker.priority],
+                                 ready_tail[pass_end_marker.priority],
+                                 &pass_end_marker);
+    pass_end_marker.in_ready_list = GNUNET_NO;
+  }
+}
+
+static void set_work_priority (enum GNUNET_SCHEDULER_Priority p)
+{
+  remove_pass_end_marker ();
+  GNUNET_CONTAINER_DLL_insert_tail (ready_head[p],
+                                    ready_tail[p],
+                                    &pass_end_marker);
+  pass_end_marker.priority = p;
+  pass_end_marker.in_ready_list = GNUNET_YES;
+  work_priority = p;
+}
 
 /**
  * Put a task that is ready for execution into the ready queue.
@@ -518,7 +546,7 @@ queue_ready_task (struct GNUNET_SCHEDULER_Task *task)
                                     ready_tail[p],
                                     task);
   if (p > work_priority)
-    work_priority = p;
+    set_work_priority (p);
   task->in_ready_list = GNUNET_YES;
   ready_count++;
 }
@@ -752,6 +780,9 @@ GNUNET_SCHEDULER_get_load (enum GNUNET_SCHEDULER_Priority p)
        NULL != pos;
        pos = pos->next)
     ret++;
+  if (pass_end_marker.in_ready_list && pass_end_marker.priority == p)
+    // Don't count the dummy marker
+    ret--;
   return ret;
 }
 
@@ -2050,8 +2081,9 @@ GNUNET_SCHEDULER_do_work (struct GNUNET_SCHEDULER_Handle *sh)
 
     /* process all *existing* tasks at this priority
        level, then yield */
-    last = ready_tail[work_priority];
-    while (NULL != (pos = ready_head[work_priority]))
+    set_work_priority (work_priority);
+    while (NULL != (pos = ready_head[work_priority])
+           && pos != &pass_end_marker)
     {
       GNUNET_CONTAINER_DLL_remove (ready_head[work_priority],
                                    ready_tail[work_priority],
@@ -2121,14 +2153,8 @@ GNUNET_SCHEDULER_do_work (struct GNUNET_SCHEDULER_Handle *sh)
       active_task = NULL;
       dump_backtrace (pos);
       destroy_task (pos);
-      /* pointer 'pos' was free'd, but we can still safely check for
-         pointer equality still. */
-      if (pos == last)
-        break; /* All tasks that _were_ ready when we started were
-                  executed. New tasks may have been added in the
-                  meantime, but we should check with the OS to
-                  be sure no higher-priority actions are pending! */
     }
+    remove_pass_end_marker ();
   }
   shutdown_if_no_lifeness ();
   if (0 == ready_count)
-- 
2.38.1

Activities

Christian Grothoff

2023-02-06 16:13

manager   ~0019768

Nice catch & very nice patch(es). Fixed as suggested in eabc1baaf..35ca280a7.

schanzen

2023-06-01 20:26

administrator   ~0020215

released some time ago

Issue History

Date Modified Username Field Change
2023-01-29 21:17 ulfvonbelow New Issue
2023-01-29 21:17 ulfvonbelow Tag Attached: bug
2023-01-29 21:17 ulfvonbelow Tag Attached: patch
2023-01-29 21:17 ulfvonbelow File Added: 0001-UTIL-add-test-demonstrating-scheduler-bug-don-t-run-.patch
2023-01-29 21:17 ulfvonbelow File Added: 0002-UTIL-use-dedicated-marker-in-ready-queue.patch
2023-02-06 05:23 schanzen Assigned To => Christian Grothoff
2023-02-06 05:23 schanzen Status new => assigned
2023-02-06 05:23 schanzen Target Version => 0.19.4
2023-02-06 16:13 Christian Grothoff Note Added: 0019768
2023-02-06 16:13 Christian Grothoff Status assigned => resolved
2023-02-06 16:13 Christian Grothoff Resolution open => fixed
2023-02-06 16:13 Christian Grothoff Fixed in Version => 0.19.4
2023-06-01 20:26 schanzen Note Added: 0020215
2023-06-01 20:26 schanzen Status resolved => closed
2024-02-29 22:46 Christian Grothoff Tag Detached: bug