View Issue Details

IDProjectCategoryView StatusLast Update
0005250GNUnetutil librarypublic2018-06-07 00:26
ReporterschanzenAssigned Tolurchi 
PriorityhighSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Product VersionSVN HEAD 
Target Version0.11.0Fixed in Version0.11.0pre66 
Summary0005250: Scheduler asserion fails with mhd-related components
DescriptionThe gnunet-gns-proxy (and also the rest service) currently fail when handling incoming connections. Outputs:

gnunet-rest-server-24073 ERROR Assertion failed at scheduler.c:2010
or
gnunet-gns-proxy-24963 ERROR Assertion failed at scheduler.c:2010
Steps To Reproduce1. Start gnunet with gns
2. Start gnunet-gns-proxy
3. curl --socks5-hostname localhost:7777 test.gnu -v
4. Observe log
TagsNo tags attached.

Activities

schanzen

2018-01-08 16:27

developer   ~0012742

I just went back to 542bf871c6eebdc645884960559e15ce400939e6 (before the driver shenanigans) and it works.

I did not pinpoint the exact commit that broke it though.

schanzen

2018-01-08 16:32

developer   ~0012743

Assined to lurchi due to history pinpoint

schanzen

2018-01-08 16:53

developer   ~0012744

Yep. Pinpointed it to https://gnunet.org/git/gnunet.git/commit/?id=dca8eb9c44dbdfa3fc572dde40c1e0927cfee8db.
Before that one it still works.

lurchi

2018-01-08 17:31

developer   ~0012745

I can't reproduce it yet. I ran

export GNUNET_FORCE_LOG=";;;;DEBUG"
gnunet-gns-proxy-setup-ca
gnunet-gns-import.sh
gnunet-gns-proxy

I attached the log output of gnunet-gls-proxy after I ran curl. Looks OK, doesn't it? Can you maybe generate a coredump and get a backtrace?

lurchi

2018-01-08 17:31

developer   ~0012746

Jan 08 17:28:51-959104 util-scheduler-1246 DEBUG Running task 0x659db0
Jan 08 17:28:51-959210 util-network-1246 DEBUG Accepting connection on `[::]:7777'
Jan 08 17:28:51-959252 gnunet-gns-proxy-1246 DEBUG Got an inbound connection, waiting for data
Jan 08 17:28:51-959296 util-scheduler-1246 DEBUG destroying task 0x659db0
Jan 08 17:28:51-959316 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959336 util-scheduler-1246 DEBUG Running task 0x659e40
Jan 08 17:28:51-959362 gnunet-gns-proxy-1246 DEBUG Processing 3 bytes of socks data in state 0
Jan 08 17:28:51-959385 util-scheduler-1246 DEBUG destroying task 0x659e40
Jan 08 17:28:51-959401 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959418 util-scheduler-1246 DEBUG Running task 0x659840
Jan 08 17:28:51-959464 util-scheduler-1246 DEBUG destroying task 0x659840
Jan 08 17:28:51-959501 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959530 util-scheduler-1246 DEBUG Running task 0x652280
Jan 08 17:28:51-959550 gnunet-gns-proxy-1246 DEBUG Processing 15 bytes of socks data in state 1
Jan 08 17:28:51-959571 gnunet-gns-proxy-1246 DEBUG Requested connection is to test.gnu:80
Jan 08 17:28:51-959588 gns-api-1246 DEBUG Trying to lookup `test.gnu' in GNS
Jan 08 17:28:51-959615 gnunet-gns-proxy-1246 DEBUG mq: sending message of type 500, queue empty (MQ: 0x658a40)
Jan 08 17:28:51-959636 util-scheduler-1246 DEBUG canceling task 0x659840
Jan 08 17:28:51-959650 util-scheduler-1246 DEBUG destroying task 0x659840
Jan 08 17:28:51-959662 util-scheduler-1246 DEBUG destroying task 0x652280
Jan 08 17:28:51-959677 util-scheduler-1246 DEBUG select timeout = forever
Jan 08 17:28:51-959693 util-scheduler-1246 DEBUG Running task 0x6526b0
Jan 08 17:28:51-959706 gnunet-gns-proxy-1246 DEBUG client: message of type 500 trying to send with socket 0x6588f0 (MQ: 0x658a40
Jan 08 17:28:51-959727 gnunet-gns-proxy-1246 DEBUG client: sending message of type 500 successful
Jan 08 17:28:51-959750 util-scheduler-1246 DEBUG Adding task 0x652280
Jan 08 17:28:51-959769 util-scheduler-1246 DEBUG destroying task 0x6526b0
Jan 08 17:28:51-959794 util-scheduler-1246 DEBUG select timeout = 0 ms
Jan 08 17:28:51-959818 util-scheduler-1246 DEBUG Running task 0x652280
Jan 08 17:28:51-959838 util-scheduler-1246 DEBUG destroying task 0x652280
Jan 08 17:28:51-959864 util-scheduler-1246 DEBUG select timeout = forever

schanzen

2018-01-08 18:23

developer   ~0012747

GDB backtrace:

#0 0x00007f48cfd579fb in raise () from /lib64/libc.so.6
#1 0x00007f48cfd59800 in abort () from /lib64/libc.so.6
#2 0x00007f48d1702959 in GNUNET_abort_ () at common_logging.c:284
#3 0x00007f48d17334b0 in GNUNET_SCHEDULER_run_from_driver (
    sh=sh@entry=0x7f48d19549b0 <sh>) at scheduler.c:1912
#4 0x00007f48d1733cbd in select_loop (cls=0x7ffd601d43f0,
    sh=0x7f48d19549b0 <sh>) at scheduler.c:2314
#5 0x00007f48d1734050 in GNUNET_SCHEDULER_run_with_driver (
    driver=driver@entry=0x1878260,
    task=task@entry=0x7f48d172c7a0 <program_main>,
    task_cls=task_cls@entry=0x7ffd601d4490) at scheduler.c:2124
#6 0x00007f48d17342d5 in GNUNET_SCHEDULER_run (
    task=task@entry=0x7f48d172c7a0 <program_main>,
    task_cls=task_cls@entry=0x7ffd601d4490) at scheduler.c:689
#7 0x00007f48d172ce72 in GNUNET_PROGRAM_run2 (argc=<optimized out>,
    argv=0x18780e0, binaryName=binaryName@entry=0x409adb "gnunet-gns-proxy",
    binaryHelp=<optimized out>, options=options@entry=0x7ffd601d4680,
    task=task@entry=0x4070c0 <run>, task_cls=<optimized out>,
    run_without_scheduler=<optimized out>) at program.c:312
#8 0x00007f48d172d1af in GNUNET_PROGRAM_run (argc=<optimized out>,
    argv=<optimized out>,
    binaryName=binaryName@entry=0x409adb "gnunet-gns-proxy",
    binaryHelp=<optimized out>, options=options@entry=0x7ffd601d4680,
    task=task@entry=0x4070c0 <run>, task_cls=0x0) at program.c:350
---Type <return> to continue, or q <return> to quit---
#9 0x0000000000403660 in main (argc=<optimized out>, argv=<optimized out>)
    at gnunet-gns-proxy.c:3310

schanzen

2018-01-08 20:33

developer   ~0012748

Last edited: 2018-01-08 20:33

View 2 revisions

I traced this a bit:
Jan 08 19:30:35-136238 gnunet-gns-proxy-22929 ERROR Adding task 0xa70e040
Jan 08 19:30:35-136464 gnunet-gns-proxy-22929 ERROR Deleting task 0xa70da60
Jan 08 19:30:35-136695 util-scheduler-22929 DEBUG destroying task 0xa70d9a0
Jan 08 19:30:35-136934 util-scheduler-22929 DEBUG select timeout = forever
Jan 08 19:30:35-137262 util-scheduler-22929 DEBUG Running task 0xa70ddb0
Jan 08 19:30:35-183503 gnunet-gns-proxy-22929 ERROR Adding task 0xa7167f0
Jan 08 19:30:35-184965 util-scheduler-22929 DEBUG Adding task 0xa716680
Jan 08 19:30:35-187405 util-scheduler-22929 DEBUG Adding task 0xa716900
Jan 08 19:30:35-187869 gnunet-gns-proxy-22929 ERROR Deleting task 0xa70de70
Jan 08 19:30:35-188305 util-scheduler-22929 DEBUG destroying task 0xa70ddb0
Jan 08 19:30:35-190737 util-scheduler-22929 DEBUG select timeout = 14998264 ┬Ás
Jan 08 19:30:35-191207 util-scheduler-22929 DEBUG Running task 0xa716680
Jan 08 19:30:35-207386 util-scheduler-22929 DEBUG canceling task 0xa716900
Jan 08 19:30:35-209432 util-scheduler-22929 DEBUG destroying task 0xa716900
Jan 08 19:30:35-330282 util-scheduler-22929 DEBUG Adding task 0xa7206c0
Jan 08 19:30:35-343409 util-scheduler-22929 DEBUG Adding task 0xa720970
Jan 08 19:30:35-343966 gnunet-gns-proxy-22929 ERROR Deleting task 0xa7167f0
Jan 08 19:30:35-344459 util-scheduler-22929 DEBUG destroying task 0xa716680
Jan 08 19:30:35-346669 util-scheduler-22929 DEBUG select timeout = 0 ms
Jan 08 19:30:35-347846 util-scheduler-22929 DEBUG Running task 0xa720970
Jan 08 19:30:35-348749 util-scheduler-22929 DEBUG Adding task 0xa720da0
Jan 08 19:30:35-350317 util-scheduler-22929 ERROR driver could not delete task 0xa720970

Is it really necessary to remove the task from the DLL in scheduler.c "select_del"?
I looks to me like this task is never scheduled but instantly run. Thus never in the DLL and then the assertion kicks in?

lurchi

2018-01-08 20:36

developer   ~0012749

I looked at gnunet-gns-proxy.c and tried to make sure that all tasks are NULLed immediately after canceling and that the SCHEDULER_add functions don't get invalid input (I found the logic in do_accept a little confusing). See https://gnunet.org/git/gnunet.git/commit/?id=5e233bcb4b48593af84c551738369fe89bc77251

I doubt that it fixes the issue though.

Will look at your suggestion.

schanzen

2018-01-09 10:50

developer   ~0012750

That commit actually made it work. Let me recheck to verify.
I tried to do the same for the rest server, but that still suffers from the assertion...

schanzen

2018-01-09 11:13

developer   ~0012751

I really don't know why #1 that commit fixes anything and #2 What I need to do for the rest server (or even plugins) so that this assertion is not triggered. Can you explain?

lurchi

2018-01-09 16:17

developer   ~0012756

Last edited: 2018-01-09 16:22

View 2 revisions

The problem you described can occur when

1. a task is canceled twice or
2. a task that was already run is canceled

So in that commit I tried to make sure that
1. task pointers are NULLed immediately after GNUNET_SCHEDULER_cancel was called on them
2. task pointers are NULLed in the scheduled function

The changes in do_accept I only did because I found the code confusing.

I guess it's possible that the old scheduler code did not complain when GNUNET_SCHEDULER_cancel was called on a non-existing task. The new code does complain because it's a programming error. It's also possible that the new scheduler code is still buggy of course.

It's unfortunate that I can't reproduce the problem. Should I look at the rest-server anyway? What plugins are suffering from this problem? What do you do to reproduce the problem there?

schanzen

2018-01-10 04:20

developer   ~0012759

I think I found the problem bit I do not know if the mitigation I found is sane:

What happends is that if I call SCHEDULER_add_select with a set of FDs where 0 are set. This is allowed when adding the select task, a task with fds != NULL is created and added to the queue. When it becomes ready (e.g. timeout) it is "run" and then the assertion fails becuase there is not actually an entry for this task in the scheduled_head list.

If I add an

if (0 == task->fds_len)
  return GNUNET_OK;

into select_del it fixes the bug.

schanzen

2018-01-10 04:30

developer   ~0012761

Last edited: 2018-01-10 04:30

View 2 revisions

Please check 16f04cf4fb1a7b489b3672d19818ffd1fd8d57f1 for a proposed fix

lurchi

2018-01-10 17:25

developer   ~0012762

Good catch!

There was logic for detecting if GNUNET_SCHEDULER_add_select is called with no fdsets/empty fdsets, so no task is queued in the first place and the driver does not have to care about "empty tasks". The logic was missing some cases.

I tried to fix it in c66e64df31713e5e97297300ec8b39536da1369b. Can you test please?

schanzen

2018-01-10 17:44

developer   ~0012763

In this commit you introduced another assertion that fails now in scheduler.c:1800.

The reason why this assertion fails is the same. I call add_select with 0 fdsets set. Note that I do not know that before calling add_select. See rest/gnunet-rest-server.c:450 and following.
The scheduler needs to gracely handle the case where 0 fds are set and add_select is called with them.
If that assertion is remove, we are back to square 1(the original assertion at line 2009 now).

lurchi

2018-01-10 23:22

developer   ~0012764

Ok, I think I got it. The scheduler was relying on the maximum FD number field of GNUNET_NETWORK_FDSet (nsds), but it seems libmicrohttpd does not provide the maximum FD number per FD set but only from all its FD sets (see implementation of MHD_get_fdset in https://gnunet.org/git/libmicrohttpd.git/tree/src/microhttpd/daemon.c). So the scheduler sometimes gets empty FD sets although their maximum FD number is greater NULL.

In 67bb148acd0de26403e4dd89659efeec8889a2c5 I tried to consider that. Can you test please?

schanzen

2018-01-11 08:17

developer   ~0012765

Fixed. Thank you.

schanzen

2018-01-11 08:17

developer   ~0012766

Fixed thank you

Issue History

Date Modified Username Field Change
2018-01-08 15:43 schanzen New Issue
2018-01-08 16:27 schanzen Note Added: 0012742
2018-01-08 16:31 schanzen Assigned To => lurchi
2018-01-08 16:31 schanzen Status new => assigned
2018-01-08 16:32 schanzen Note Added: 0012743
2018-01-08 16:53 schanzen Note Added: 0012744
2018-01-08 17:31 lurchi Note Added: 0012745
2018-01-08 17:31 lurchi Note Added: 0012746
2018-01-08 18:23 schanzen Note Added: 0012747
2018-01-08 20:33 schanzen Note Added: 0012748
2018-01-08 20:33 schanzen Note Edited: 0012748 View Revisions
2018-01-08 20:36 lurchi Note Added: 0012749
2018-01-09 10:50 schanzen Note Added: 0012750
2018-01-09 11:13 schanzen Note Added: 0012751
2018-01-09 16:17 lurchi Note Added: 0012756
2018-01-09 16:22 lurchi Note Edited: 0012756 View Revisions
2018-01-10 04:20 schanzen Note Added: 0012759
2018-01-10 04:30 schanzen Status assigned => feedback
2018-01-10 04:30 schanzen Note Added: 0012761
2018-01-10 04:30 schanzen Note Edited: 0012761 View Revisions
2018-01-10 17:25 lurchi Note Added: 0012762
2018-01-10 17:44 schanzen Note Added: 0012763
2018-01-10 17:44 schanzen Status feedback => assigned
2018-01-10 23:22 lurchi Note Added: 0012764
2018-01-11 08:17 schanzen Note Added: 0012765
2018-01-11 08:17 schanzen Status assigned => resolved
2018-01-11 08:17 schanzen Resolution open => fixed
2018-01-11 08:17 schanzen Note Added: 0012766
2018-06-07 00:24 Christian Grothoff Status resolved => closed
2018-06-07 00:26 Christian Grothoff Fixed in Version => 0.11.0pre66