View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005250||GNUnet||util library||public||2018-01-08 15:43||2018-06-07 00:26|
|Product Version||SVN HEAD|
|Target Version||0.11.0||Fixed in Version||0.11.0pre66|
|Summary||0005250: Scheduler asserion fails with mhd-related components|
|Description||The gnunet-gns-proxy (and also the rest service) currently fail when handling incoming connections. Outputs:|
gnunet-rest-server-24073 ERROR Assertion failed at scheduler.c:2010
gnunet-gns-proxy-24963 ERROR Assertion failed at scheduler.c:2010
|Steps To Reproduce||1. Start gnunet with gns|
2. Start gnunet-gns-proxy
3. curl --socks5-hostname localhost:7777 test.gnu -v
4. Observe log
|Tags||No tags attached.|
I just went back to 542bf871c6eebdc645884960559e15ce400939e6 (before the driver shenanigans) and it works.
I did not pinpoint the exact commit that broke it though.
||Assined to lurchi due to history pinpoint|
Yep. Pinpointed it to https://gnunet.org/git/gnunet.git/commit/?id=dca8eb9c44dbdfa3fc572dde40c1e0927cfee8db.
Before that one it still works.
I can't reproduce it yet. I ran
I attached the log output of gnunet-gls-proxy after I ran curl. Looks OK, doesn't it? Can you maybe generate a coredump and get a backtrace?
Jan 08 17:28:51-959104 util-scheduler-1246 DEBUG Running task 0x659db0
Jan 08 17:28:51-959210 util-network-1246 DEBUG Accepting connection on `[::]:7777'
Jan 08 17:28:51-959252 gnunet-gns-proxy-1246 DEBUG Got an inbound connection, waiting for data
Jan 08 17:28:51-959296 util-scheduler-1246 DEBUG destroying task 0x659db0
Jan 08 17:28:51-959316 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959336 util-scheduler-1246 DEBUG Running task 0x659e40
Jan 08 17:28:51-959362 gnunet-gns-proxy-1246 DEBUG Processing 3 bytes of socks data in state 0
Jan 08 17:28:51-959385 util-scheduler-1246 DEBUG destroying task 0x659e40
Jan 08 17:28:51-959401 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959418 util-scheduler-1246 DEBUG Running task 0x659840
Jan 08 17:28:51-959464 util-scheduler-1246 DEBUG destroying task 0x659840
Jan 08 17:28:51-959501 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959530 util-scheduler-1246 DEBUG Running task 0x652280
Jan 08 17:28:51-959550 gnunet-gns-proxy-1246 DEBUG Processing 15 bytes of socks data in state 1
Jan 08 17:28:51-959571 gnunet-gns-proxy-1246 DEBUG Requested connection is to test.gnu:80
Jan 08 17:28:51-959588 gns-api-1246 DEBUG Trying to lookup `test.gnu' in GNS
Jan 08 17:28:51-959615 gnunet-gns-proxy-1246 DEBUG mq: sending message of type 500, queue empty (MQ: 0x658a40)
Jan 08 17:28:51-959636 util-scheduler-1246 DEBUG canceling task 0x659840
Jan 08 17:28:51-959650 util-scheduler-1246 DEBUG destroying task 0x659840
Jan 08 17:28:51-959662 util-scheduler-1246 DEBUG destroying task 0x652280
Jan 08 17:28:51-959677 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959693 util-scheduler-1246 DEBUG Running task 0x6526b0
Jan 08 17:28:51-959706 gnunet-gns-proxy-1246 DEBUG client: message of type 500 trying to send with socket 0x6588f0 (MQ: 0x658a40
Jan 08 17:28:51-959727 gnunet-gns-proxy-1246 DEBUG client: sending message of type 500 successful
Jan 08 17:28:51-959750 util-scheduler-1246 DEBUG Adding task 0x652280
Jan 08 17:28:51-959769 util-scheduler-1246 DEBUG destroying task 0x6526b0
Jan 08 17:28:51-959794 util-scheduler-1246 DEBUG select timeout = 0 ms
Jan 08 17:28:51-959818 util-scheduler-1246 DEBUG Running task 0x652280
Jan 08 17:28:51-959838 util-scheduler-1246 DEBUG destroying task 0x652280
Jan 08 17:28:51-959864 util-scheduler-1246 DEBUG select timeout = forever
#0 0x00007f48cfd579fb in raise () from /lib64/libc.so.6
#1 0x00007f48cfd59800 in abort () from /lib64/libc.so.6
#2 0x00007f48d1702959 in GNUNET_abort_ () at common_logging.c:284
#3 0x00007f48d17334b0 in GNUNET_SCHEDULER_run_from_driver (
sh=sh@entry=0x7f48d19549b0 <sh>) at scheduler.c:1912
#4 0x00007f48d1733cbd in select_loop (cls=0x7ffd601d43f0,
sh=0x7f48d19549b0 <sh>) at scheduler.c:2314
#5 0x00007f48d1734050 in GNUNET_SCHEDULER_run_with_driver (
task_cls=task_cls@entry=0x7ffd601d4490) at scheduler.c:2124
#6 0x00007f48d17342d5 in GNUNET_SCHEDULER_run (
task_cls=task_cls@entry=0x7ffd601d4490) at scheduler.c:689
#7 0x00007f48d172ce72 in GNUNET_PROGRAM_run2 (argc=<optimized out>,
argv=0x18780e0, binaryName=binaryName@entry=0x409adb "gnunet-gns-proxy",
binaryHelp=<optimized out>, options=options@entry=0x7ffd601d4680,
task=task@entry=0x4070c0 <run>, task_cls=<optimized out>,
run_without_scheduler=<optimized out>) at program.c:312
#8 0x00007f48d172d1af in GNUNET_PROGRAM_run (argc=<optimized out>,
binaryHelp=<optimized out>, options=options@entry=0x7ffd601d4680,
task=task@entry=0x4070c0 <run>, task_cls=0x0) at program.c:350
---Type <return> to continue, or q <return> to quit---
#9 0x0000000000403660 in main (argc=<optimized out>, argv=<optimized out>)
I traced this a bit:
Jan 08 19:30:35-136238 gnunet-gns-proxy-22929 ERROR Adding task 0xa70e040
Jan 08 19:30:35-136464 gnunet-gns-proxy-22929 ERROR Deleting task 0xa70da60
Jan 08 19:30:35-136695 util-scheduler-22929 DEBUG destroying task 0xa70d9a0
Jan 08 19:30:35-136934 util-scheduler-22929 DEBUG select timeout = forever
Jan 08 19:30:35-137262 util-scheduler-22929 DEBUG Running task 0xa70ddb0
Jan 08 19:30:35-183503 gnunet-gns-proxy-22929 ERROR Adding task 0xa7167f0
Jan 08 19:30:35-184965 util-scheduler-22929 DEBUG Adding task 0xa716680
Jan 08 19:30:35-187405 util-scheduler-22929 DEBUG Adding task 0xa716900
Jan 08 19:30:35-187869 gnunet-gns-proxy-22929 ERROR Deleting task 0xa70de70
Jan 08 19:30:35-188305 util-scheduler-22929 DEBUG destroying task 0xa70ddb0
Jan 08 19:30:35-190737 util-scheduler-22929 DEBUG select timeout = 14998264 µs
Jan 08 19:30:35-191207 util-scheduler-22929 DEBUG Running task 0xa716680
Jan 08 19:30:35-207386 util-scheduler-22929 DEBUG canceling task 0xa716900
Jan 08 19:30:35-209432 util-scheduler-22929 DEBUG destroying task 0xa716900
Jan 08 19:30:35-330282 util-scheduler-22929 DEBUG Adding task 0xa7206c0
Jan 08 19:30:35-343409 util-scheduler-22929 DEBUG Adding task 0xa720970
Jan 08 19:30:35-343966 gnunet-gns-proxy-22929 ERROR Deleting task 0xa7167f0
Jan 08 19:30:35-344459 util-scheduler-22929 DEBUG destroying task 0xa716680
Jan 08 19:30:35-346669 util-scheduler-22929 DEBUG select timeout = 0 ms
Jan 08 19:30:35-347846 util-scheduler-22929 DEBUG Running task 0xa720970
Jan 08 19:30:35-348749 util-scheduler-22929 DEBUG Adding task 0xa720da0
Jan 08 19:30:35-350317 util-scheduler-22929 ERROR driver could not delete task 0xa720970
Is it really necessary to remove the task from the DLL in scheduler.c "select_del"?
I looks to me like this task is never scheduled but instantly run. Thus never in the DLL and then the assertion kicks in?
I looked at gnunet-gns-proxy.c and tried to make sure that all tasks are NULLed immediately after canceling and that the SCHEDULER_add functions don't get invalid input (I found the logic in do_accept a little confusing). See https://gnunet.org/git/gnunet.git/commit/?id=5e233bcb4b48593af84c551738369fe89bc77251
I doubt that it fixes the issue though.
Will look at your suggestion.
That commit actually made it work. Let me recheck to verify.
I tried to do the same for the rest server, but that still suffers from the assertion...
||I really don't know why #1 that commit fixes anything and #2 What I need to do for the rest server (or even plugins) so that this assertion is not triggered. Can you explain?|
The problem you described can occur when
1. a task is canceled twice or
2. a task that was already run is canceled
So in that commit I tried to make sure that
1. task pointers are NULLed immediately after GNUNET_SCHEDULER_cancel was called on them
2. task pointers are NULLed in the scheduled function
The changes in do_accept I only did because I found the code confusing.
I guess it's possible that the old scheduler code did not complain when GNUNET_SCHEDULER_cancel was called on a non-existing task. The new code does complain because it's a programming error. It's also possible that the new scheduler code is still buggy of course.
It's unfortunate that I can't reproduce the problem. Should I look at the rest-server anyway? What plugins are suffering from this problem? What do you do to reproduce the problem there?
I think I found the problem bit I do not know if the mitigation I found is sane:
What happends is that if I call SCHEDULER_add_select with a set of FDs where 0 are set. This is allowed when adding the select task, a task with fds != NULL is created and added to the queue. When it becomes ready (e.g. timeout) it is "run" and then the assertion fails becuase there is not actually an entry for this task in the scheduled_head list.
If I add an
if (0 == task->fds_len)
into select_del it fixes the bug.
Please check 16f04cf4fb1a7b489b3672d19818ffd1fd8d57f1 for a proposed fix
There was logic for detecting if GNUNET_SCHEDULER_add_select is called with no fdsets/empty fdsets, so no task is queued in the first place and the driver does not have to care about "empty tasks". The logic was missing some cases.
I tried to fix it in c66e64df31713e5e97297300ec8b39536da1369b. Can you test please?
In this commit you introduced another assertion that fails now in scheduler.c:1800.
The reason why this assertion fails is the same. I call add_select with 0 fdsets set. Note that I do not know that before calling add_select. See rest/gnunet-rest-server.c:450 and following.
The scheduler needs to gracely handle the case where 0 fds are set and add_select is called with them.
If that assertion is remove, we are back to square 1(the original assertion at line 2009 now).
Ok, I think I got it. The scheduler was relying on the maximum FD number field of GNUNET_NETWORK_FDSet (nsds), but it seems libmicrohttpd does not provide the maximum FD number per FD set but only from all its FD sets (see implementation of MHD_get_fdset in https://gnunet.org/git/libmicrohttpd.git/tree/src/microhttpd/daemon.c). So the scheduler sometimes gets empty FD sets although their maximum FD number is greater NULL.
In 67bb148acd0de26403e4dd89659efeec8889a2c5 I tried to consider that. Can you test please?
||Fixed. Thank you.|
||Fixed thank you|
|2018-01-08 15:43||schanzen||New Issue|
|2018-01-08 16:27||schanzen||Note Added: 0012742|
|2018-01-08 16:31||schanzen||Assigned To||=> lurchi|
|2018-01-08 16:31||schanzen||Status||new => assigned|
|2018-01-08 16:32||schanzen||Note Added: 0012743|
|2018-01-08 16:53||schanzen||Note Added: 0012744|
|2018-01-08 17:31||lurchi||Note Added: 0012745|
|2018-01-08 17:31||lurchi||Note Added: 0012746|
|2018-01-08 18:23||schanzen||Note Added: 0012747|
|2018-01-08 20:33||schanzen||Note Added: 0012748|
|2018-01-08 20:33||schanzen||Note Edited: 0012748||View Revisions|
|2018-01-08 20:36||lurchi||Note Added: 0012749|
|2018-01-09 10:50||schanzen||Note Added: 0012750|
|2018-01-09 11:13||schanzen||Note Added: 0012751|
|2018-01-09 16:17||lurchi||Note Added: 0012756|
|2018-01-09 16:22||lurchi||Note Edited: 0012756||View Revisions|
|2018-01-10 04:20||schanzen||Note Added: 0012759|
|2018-01-10 04:30||schanzen||Status||assigned => feedback|
|2018-01-10 04:30||schanzen||Note Added: 0012761|
|2018-01-10 04:30||schanzen||Note Edited: 0012761||View Revisions|
|2018-01-10 17:25||lurchi||Note Added: 0012762|
|2018-01-10 17:44||schanzen||Note Added: 0012763|
|2018-01-10 17:44||schanzen||Status||feedback => assigned|
|2018-01-10 23:22||lurchi||Note Added: 0012764|
|2018-01-11 08:17||schanzen||Note Added: 0012765|
|2018-01-11 08:17||schanzen||Status||assigned => resolved|
|2018-01-11 08:17||schanzen||Resolution||open => fixed|
|2018-01-11 08:17||schanzen||Note Added: 0012766|
|2018-06-07 00:24||Christian Grothoff||Status||resolved => closed|
|2018-06-07 00:26||Christian Grothoff||Fixed in Version||=> 0.11.0pre66|