View Issue Details

ID: 0008978
Project: GNUnet
Category: util library
View Status: public
Last Update: 2024-07-08 23:33
Reporter: thejackimonster
Assigned To:
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: new
Resolution: open
Product Version: Git master
Summary: 0008978: Scheduler: ready_count underflow
Description: While testing the new voice chat feature in messenger-gtk, I ran into a crash caused by the scheduler. An assertion aborted the process because the `ready_count` variable had underflowed. I'm not sure how this is possible, but it seems that one of its decrements can run while the counter is already zero.

It might be caused by multiple threads accessing the GNUnet APIs, but I'm not 100% sure about this because I haven't yet been able to capture a backtrace that points to that.
Steps To Reproduce: The upstream version of messenger-gtk occasionally runs into this issue during a voice call. The application crashes on the assertion that expects `ready_count` in the scheduler to be zero when all task queues are empty. However, it seems possible for the counter to underflow to the value 4294967295.
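
The value 4294967295 is what a 32-bit unsigned counter wraps to when it is decremented at zero. A minimal sketch of that arithmetic follows, assuming `ready_count` is an `unsigned int` (which matches the reported value); the function names are hypothetical and are not part of scheduler.c.

```c
#include <assert.h>
#include <limits.h>

/* Unguarded decrement: unsigned arithmetic wraps around instead of
   going negative, so 0 - 1 yields UINT_MAX. */
static unsigned int
decrement_unchecked (unsigned int count)
{
  return count - 1u;
}

/* Guarded decrement (hypothetical helper): aborts at the faulty call
   site instead of letting the wrapped value surface later. */
static unsigned int
decrement_checked (unsigned int count)
{
  assert (count > 0);
  return count - 1u;
}
```

On a platform where `unsigned int` is 32 bits wide, `decrement_unchecked (0)` returns 4294967295. A guard like the one in `decrement_checked` would move the failure to the decrement that actually goes wrong, which might make the backtrace more useful.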
Additional Information: Assertion that triggers the crash:
https://git.gnunet.org/gnunet.git/tree/src/lib/util/scheduler.c#n2084

Locations of decrement:
https://git.gnunet.org/gnunet.git/tree/src/lib/util/scheduler.c#n1033
https://git.gnunet.org/gnunet.git/tree/src/lib/util/scheduler.c#n2095

Messenger-GTK:
https://git.gnunet.org/messenger-gtk.git/
Tags: No tags attached.

Activities

thejackimonster

2024-07-02 00:23

developer   ~0022781

Okay, so I tried to rule out multiple threads as the cause by using the code from gnunet-gtk to combine the GNUnet scheduler with the g_main_loop from GTK. But I could still reproduce the issue and trigger the assertion. I'm not sure how this is possible in a single-threaded environment, but it looks like it is. So there's definitely something wrong in the scheduler logic.

thejackimonster

2024-07-04 22:29

developer   ~0022787

I think I found the actual cause of `ready_count` being wrong. The internal function `remove_pass_end_marker()` removed a task from the ready queue without adjusting `ready_count`. So I patched this:
https://git.gnunet.org/gnunet.git/commit/?id=2dc422a9652b90c4c6e472aa4e7e349c77aa8098

The scheduler test case also contained a workaround for this existing issue, so it verified a wrong `ready_count` as accurate...
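
To illustrate the kind of bookkeeping error described in this note, here is a minimal sketch of a doubly linked queue with a separate element counter. It is not the code from scheduler.c: all `demo_*` names and both structs are hypothetical, and it only shows how a removal path that skips the counter leaves the counter and the list out of sync.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task node and queue; not GNUnet's actual structures. */
struct DemoTask
{
  struct DemoTask *prev;
  struct DemoTask *next;
};

struct DemoQueue
{
  struct DemoTask *head;
  struct DemoTask *tail;
  unsigned int count; /* must always equal the number of linked nodes */
};

/* Append a task and account for it. */
static void
demo_push (struct DemoQueue *q, struct DemoTask *t)
{
  t->next = NULL;
  t->prev = q->tail;
  if (NULL != q->tail)
    q->tail->next = t;
  else
    q->head = t;
  q->tail = t;
  q->count++;
}

/* Unlink a task from the list without touching the counter. */
static void
demo_unlink (struct DemoQueue *q, struct DemoTask *t)
{
  if (NULL != t->prev)
    t->prev->next = t->next;
  else
    q->head = t->next;
  if (NULL != t->next)
    t->next->prev = t->prev;
  else
    q->tail = t->prev;
  t->prev = NULL;
  t->next = NULL;
}

/* Faulty removal: the list shrinks but the counter does not. */
static void
demo_remove_unaccounted (struct DemoQueue *q, struct DemoTask *t)
{
  demo_unlink (q, t);
}

/* Correct removal: list and counter change together. */
static void
demo_remove (struct DemoQueue *q, struct DemoTask *t)
{
  assert (q->count > 0);
  demo_unlink (q, t);
  q->count--;
}

/* Count the linked nodes, to compare against the stored counter. */
static unsigned int
demo_length (const struct DemoQueue *q)
{
  unsigned int n = 0;

  for (const struct DemoTask *t = q->head; NULL != t; t = t->next)
    n++;
  return n;
}
```

After `demo_remove_unaccounted()`, `demo_length()` and `count` disagree by one. Any check that compares the counter against the real queue state, like the assertion linked above, can then fail long after the faulty removal happened.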

thejackimonster

2024-07-05 02:08

developer   ~0022788

I can still run into the same assertion, which crashes my application, after those changes. So I'm still investigating.

thejackimonster

2024-07-08 23:33

developer   ~0022799

From my testing today, it seems the issue came from GStreamer running on a different thread than the GNUnet scheduler. After synchronizing the pulled sample data with the scheduler thread, I was able to write it via messaging without reproducing the crash so far. I'll keep this issue open for now, but it seems like the cause has been found.
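
For anyone hitting the same problem, one possible way to hand sample data from a media thread to the scheduler thread is sketched below. It is a sketch under assumptions, not the actual messenger-gtk change: the `demo_handoff_*` names, the fixed slot sizes, and the plain pthread mutex are all hypothetical. The idea is that the media thread only ever calls the push function, while the scheduler thread pops samples and is the only thread that calls GNUnet APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CAPACITY 16     /* hypothetical number of buffered samples */
#define DEMO_SAMPLE_BYTES 64 /* hypothetical maximum sample size */

/* Mutex-protected ring buffer shared between the two threads. */
struct DemoHandoff
{
  pthread_mutex_t lock;
  unsigned char slots[DEMO_CAPACITY][DEMO_SAMPLE_BYTES];
  size_t sizes[DEMO_CAPACITY];
  size_t head; /* index of the oldest stored sample */
  size_t used; /* number of stored samples */
};

/* Called from the media thread. Returns 1 if the sample was stored,
   0 if it is too large or the buffer is full. */
static int
demo_handoff_push (struct DemoHandoff *h, const void *data, size_t size)
{
  int stored = 0;

  pthread_mutex_lock (&h->lock);
  if ((size <= DEMO_SAMPLE_BYTES) && (h->used < DEMO_CAPACITY))
  {
    size_t slot = (h->head + h->used) % DEMO_CAPACITY;

    memcpy (h->slots[slot], data, size);
    h->sizes[slot] = size;
    h->used++;
    stored = 1;
  }
  pthread_mutex_unlock (&h->lock);
  return stored;
}

/* Called from the scheduler thread. Copies the oldest sample into
   `out` (at least DEMO_SAMPLE_BYTES large) and returns 1, or returns
   0 if nothing is pending. */
static int
demo_handoff_pop (struct DemoHandoff *h, void *out, size_t *size)
{
  int found = 0;

  assert ((NULL != out) && (NULL != size));
  pthread_mutex_lock (&h->lock);
  if (h->used > 0)
  {
    *size = h->sizes[h->head];
    memcpy (out, h->slots[h->head], *size);
    h->head = (h->head + 1) % DEMO_CAPACITY;
    h->used--;
    found = 1;
  }
  pthread_mutex_unlock (&h->lock);
  return found;
}
```

A buffer would be set up with `struct DemoHandoff h = { .lock = PTHREAD_MUTEX_INITIALIZER };`. How the scheduler thread learns that new samples are pending (polling from a periodic task, a pipe, or something else) is left open here.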

Issue History

Date Modified Username Field Change
2024-06-25 15:23 thejackimonster New Issue
2024-06-25 15:23 thejackimonster Additional Information Updated
2024-07-02 00:23 thejackimonster Note Added: 0022781
2024-07-04 22:29 thejackimonster Note Added: 0022787
2024-07-05 02:08 thejackimonster Note Added: 0022788
2024-07-08 23:33 thejackimonster Note Added: 0022799