View Issue Details

ID: 0004034
Project: GNUnet
Category: testbed service
View Status: public
Last Update: 2018-06-07 00:34
Reporter: Sree Harsha Totakura
Assigned To: Sree Harsha Totakura
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: confirmed
Resolution: open
Product Version: SVN HEAD
Target Version:
Fixed in Version:
Summary: 0004034: Crash while running multi-host experiments
Description: Testbed crashes while running multi-host experiments
Additional Information:
warning: core file may not match specified executable file.
[New LWP 26773]

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `gnunet-service-testbed -c /tmp/totakura/testbed-helperBYFziJ/0/config'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GNUNET_TESTBED_remove_opc_ (c=0x0, opc=opc@entry=0x2265900) at testbed_api.c:350
350 GNUNET_assert (NULL != c->opc_map);
(gdb) bt
#0 GNUNET_TESTBED_remove_opc_ (c=0x0, opc=opc@entry=0x2265900) at testbed_api.c:350
#1 0x00007f37fe4db01f in GNUNET_TESTBED_forward_operation_msg_cancel_ (
    opc=0x2265900) at testbed_api.c:1324
#2 0x0000000000406e09 in GST_clear_fopcq () at gnunet-service-testbed.c:738
#3 0x0000000000407070 in shutdown_task (cls=<optimized out>, tc=<optimized out>)
    at gnunet-service-testbed.c:790
#4 0x00007f37ff16e9a6 in run_ready (ws=0x2206ad0, rs=0x2206a40) at scheduler.c:587
#5 GNUNET_SCHEDULER_run (task=task@entry=0x7f37ff175b60 <service_task>,
    task_cls=task_cls@entry=0x7fffbd24a760) at scheduler.c:868
#6 0x00007f37ff178f6c in GNUNET_SERVICE_run (argc=<optimized out>,
    argv=<optimized out>, service_name=service_name@entry=0x41a892 "testbed",
    options=options@entry=GNUNET_SERVICE_OPTION_NONE,
    task=task@entry=0x4049e0 <testbed_run>, task_cls=task_cls@entry=0x0)
    at service.c:1503
#7 0x0000000000404119 in main (argc=<optimized out>, argv=<optimized out>)
    at gnunet-service-testbed.c:963
(gdb) bt full
#0 GNUNET_TESTBED_remove_opc_ (c=0x0, opc=opc@entry=0x2265900) at testbed_api.c:350
        __FUNCTION__ = "GNUNET_TESTBED_remove_opc_"
#1 0x00007f37fe4db01f in GNUNET_TESTBED_forward_operation_msg_cancel_ (
    opc=0x2265900) at testbed_api.c:1324
No locals.
#2 0x0000000000406e09 in GST_clear_fopcq () at gnunet-service-testbed.c:738
        fopc = 0x2267140
        __FUNCTION__ = "GST_clear_fopcq"
#3 0x0000000000407070 in shutdown_task (cls=<optimized out>, tc=<optimized out>)
    at gnunet-service-testbed.c:790
        mq_entry = <optimized out>
        id = <optimized out>
        __FUNCTION__ = "shutdown_task"
#4 0x00007f37ff16e9a6 in run_ready (ws=0x2206ad0, rs=0x2206a40) at scheduler.c:587
        p = GNUNET_SCHEDULER_PRIORITY_SHUTDOWN
        pos = 0x2222f70
        tc = {reason = GNUNET_SCHEDULER_REASON_SHUTDOWN, read_ready = 0x2206a40,
          write_ready = 0x2206ad0}
#5 GNUNET_SCHEDULER_run (task=task@entry=0x7f37ff175b60 <service_task>,
    task_cls=task_cls@entry=0x7fffbd24a760) at scheduler.c:868
        rs = 0x2206a40
        ws = 0x2206ad0
        timeout = <optimized out>
        ret = <optimized out>
        shc_int = 0x2206b80
        shc_term = 0x2206c40
        shc_quit = 0x2206dc0
        shc_hup = 0x2206e80
        shc_pipe = 0x2206d00
        last_tr = 858
        busy_wait_warning = 0
        pr = 0x22069c0
        c = 0 '\000'
        __FUNCTION__ = "GNUNET_SCHEDULER_run"
#6 0x00007f37ff178f6c in GNUNET_SERVICE_run (argc=<optimized out>,
    argv=<optimized out>, service_name=service_name@entry=0x41a892 "testbed",
    options=options@entry=GNUNET_SERVICE_OPTION_NONE,
    task=task@entry=0x4049e0 <testbed_run>, task_cls=task_cls@entry=0x0)
    at service.c:1503
        err = 0
        ret = <optimized out>
        cfg_fn = 0x21fc700 "~/.config/gnunet.conf"
        opt_cfg_fn = 0x21fc850 "/tmp/totakura/testbed-helperBYFziJ/0/config"
        loglev = 0x0
        logfile = 0x0
        do_daemonize = 0
        i = <optimized out>
        skew_offset = 139878479320632
        skew_variance = 139878479342184
        clock_offset = <optimized out>
        sctx = {cfg = 0x21fc720, server = 0x2207280, addrs = 0x22068c0,
          service_name = 0x41a892 "testbed", task = 0x4049e0 <testbed_run>,
          task_cls = 0x0, v4_denied = 0x0, v6_denied = 0x0, v4_allowed = 0x2205800,
          v6_allowed = 0x22069e0, my_handlers = 0x21fc740, addrlens = 0x2215830,
          lsocks = 0x0, shutdown_task = 0x2207310, timeout = {
            rel_value_us = 18446744073709551615}, ret = 1, ready_confirm_fd = -1,
          require_found = 1, match_uid = 1, match_gid = 1,
          options = GNUNET_SERVICE_OPTION_NONE}
        cfg = 0x21fc720
        xdg = <optimized out>
        service_options = {{shortName = 99 'c', name = 0x7f37ff185723 "config",
            argumentHelp = 0x7f37ff18572a "FILENAME",
            description = 0x7f37ff1857f0 "use configuration file FILENAME",
            require_argument = 1,
            processor = 0x7f37ff15eca0 <GNUNET_GETOPT_set_string>,
            scls = 0x7fffbd24a690}, {shortName = 100 'd',
            name = 0x7f37ff186fbb "daemonize", argumentHelp = 0x0,
            description = 0x7f37ff187318 "do daemonize (detach from terminal)",
            require_argument = 0,
            processor = 0x7f37ff15ec90 <GNUNET_GETOPT_set_one>,
            scls = 0x7fffbd24a684}, {shortName = 104 'h',
            name = 0x7f37ff18573e "help", argumentHelp = 0x0,
            description = 0x7f37ff185733 "print this help", require_argument = 0,
            processor = 0x7f37ff15e930 <GNUNET_GETOPT_format_help_>, scls = 0x0}, {
            shortName = 76 'L', name = 0x7f37ff185743 "log",
            argumentHelp = 0x7f37ff185747 "LOGLEVEL",
            description = 0x7f37ff185810 "configure logging to use LOGLEVEL",
            require_argument = 1,
            processor = 0x7f37ff15eca0 <GNUNET_GETOPT_set_string>,
            scls = 0x7fffbd24a698}, {shortName = 108 'l',
            name = 0x7f37ff185750 "logfile",
            argumentHelp = 0x7f37ff181484 "LOGFILE",
            description = 0x7f37ff185838 "configure logging to write logs to LOGFILE",
            require_argument = 1,
            processor = 0x7f37ff15eca0 <GNUNET_GETOPT_set_string>,
            scls = 0x7fffbd24a6a0}, {shortName = 118 'v',
            name = 0x7f37ff185758 "version", argumentHelp = 0x0,
            description = 0x7f37ff185760 "print the version number",
            require_argument = 0,
            processor = 0x7f37ff15e910 <GNUNET_GETOPT_print_version_>,
            scls = 0x7f37ff185779}, {shortName = 0 '\000', name = 0x0,
            argumentHelp = 0x0, description = 0x0, require_argument = 0,
            processor = 0x0, scls = 0x0}}
        __FUNCTION__ = "GNUNET_SERVICE_run"
#7 0x0000000000404119 in main (argc=<optimized out>, argv=<optimized out>)
    at gnunet-service-testbed.c:963
No locals.
(gdb)
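In frame #0, `c` is `0x0`, so evaluating `c->opc_map` inside `GNUNET_assert` dereferences a NULL pointer before the assertion can report anything; the process dies with SIGSEGV instead of an assertion failure. Frame #1 presumably passes `opc->c` as `c`, which is also what the discussion below assumes. The following is a minimal sketch of a defensive variant. `struct controller`, `struct op_context` and `remove_opc_checked` are hypothetical stand-ins, not the real GNUnet testbed structures or API, and plain `assert` plus return codes replace `GNUNET_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real GNUnet testbed structures differ. */
struct controller
{
  void *opc_map;          /* container of pending operation contexts */
};

struct op_context
{
  struct controller *c;   /* controller this operation belongs to */
};

/* Hypothetical defensive "remove operation context" call.
 * Testing c->opc_map while c is NULL faults before any assertion can
 * fire; checking c first turns the SIGSEGV into a logged error.
 * Returns 0 on success, -1 if c is NULL, -2 if c has no opc_map. */
int
remove_opc_checked (struct controller *c, struct op_context *opc)
{
  assert (NULL != opc);   /* a NULL opc would be a caller bug */
  if (NULL == c)
  {
    fprintf (stderr, "remove_opc: NULL controller for opc %p\n",
             (void *) opc);
    return -1;
  }
  if (NULL == c->opc_map)
    return -2;
  /* A real implementation would remove opc from c->opc_map here. */
  return 0;
}
```

A check like this would only surface the symptom more clearly; it does not explain how `opc->c` became NULL, which is the open question in the thread.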
Tags: No tags attached.

Relationships

related to 0004035 (closed, Christian Grothoff): Crash in Testbed

Activities

Christian Grothoff

2015-10-29 11:07

manager   ~0009829

Hmm. This seems VERY far away from anything I touched recently (didn't mess with testbed_api.c at all). Anyway, looks like we didn't expect 'opc->c' to be NULL, so maybe adding assertions when 'opc->c' is initialized to check for non-NULLness is a good idea?

Christian Grothoff

2015-10-29 11:09

manager   ~0009830

Ok, that can't be it, as each time we set "opc->c" we also deref 'c' already and thus implicitly check for non-NULL-ness. Memory corruption!? Yikes.

Christian Grothoff

2015-10-29 11:31

manager   ~0009833

I've tried to reproduce this using 'localhost' twice in 'hosts.txt' given to the profiler with -H. The bug failed to appear, and it was also valgrind-clean. Strange.

Christian Grothoff

2015-10-29 11:32

manager   ~0009834

Oh, to clarify: I ran gnunet-service-testbed under valgrind by modifying the helper to spawn valgrind, which then started the testbed service 'indirectly'.

Sree Harsha Totakura

2015-10-29 11:37

developer   ~0009836

The testbed profiler starts two testbed controller instances on the first two hosts listed in hosts.txt, but only the second instance is used to spawn peers; the first instance acts as the master controller. Since this bug arises when you use at least two hosts, you need to populate hosts.txt with three hosts. The first and second host may be the same, but from the third host on, they must all be different, because multiple controller instances spawning peers on the same host will lead to port conflicts.
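The layout rule can be expressed as a small check. This is a hypothetical helper, not part of the profiler. It assumes hosts are plain hostname strings in file order and that index 0 is the master controller. It reads the rule as "every peer-spawning host (index 1 onward) must be unique", which is one interpretation of the comment. It also requires at least three entries, so that peers are spawned on at least two hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check of a profiler host list (hostnames in file order).
 * hosts[0] runs the master controller and spawns no peers, so it may
 * repeat hosts[1]. Every host from hosts[1] on spawns peers and must be
 * unique, since two peer-spawning controllers on one host would clash
 * over ports. Returns 1 if the layout is acceptable, 0 otherwise. */
int
host_layout_ok (const char *const *hosts, size_t n)
{
  assert (NULL != hosts || 0 == n);
  if (n < 3)              /* master + at least two peer-spawning hosts */
    return 0;
  for (size_t i = 1; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (0 == strcmp (hosts[i], hosts[j]))
        return 0;
  return 1;
}
```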

Sree Harsha Totakura

2015-10-29 11:40

developer   ~0009837

Just to note: do not use `localhost' or loopback IP addresses like `127.0.0.1' in hosts.txt. The reason is that controllers on other hosts use this address to reach the controller.

So, always use an IP address which at least has a link scope.
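A minimal sketch of screening hosts.txt entries for this mistake, assuming IPv4 literals or plain hostnames. `host_entry_reachable` is a hypothetical helper, not part of the testbed. It only rejects `localhost` and the 127.0.0.0/8 loopback range; it does not resolve other names or check for the global scope that the later comments say is actually required.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 0 for entries that remote controllers
 * could not use to reach this host ("localhost" or any IPv4 loopback
 * address in 127.0.0.0/8), 1 otherwise. Only IPv4 literals are
 * inspected; other hostnames are not resolved. */
int
host_entry_reachable (const char *host)
{
  struct in_addr a;

  if (0 == strcmp (host, "localhost"))
    return 0;
  if (1 == inet_pton (AF_INET, host, &a))
  {
    /* s_addr is in network byte order; convert before testing. */
    if (127 == (ntohl (a.s_addr) >> 24))
      return 0;
  }
  return 1;
}
```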

Christian Grothoff

2015-10-29 11:46

manager   ~0009838

I'm trying with "fe80::230:48ff:febb:4bb2" (in various variations), but I always get errors because SSH fails to recognize the IPv6 address:

ssh: Could not resolve hostname fe80: Name or service not known

Christian Grothoff

2015-10-29 11:49

manager   ~0009840

Hmm. Tried with names in /etc/hosts for the IPs, now I get:
grothoff@pixel:~/svn/gnunet/src/testbed$ ssh -6 local3
ssh: connect to host local3 port 22: Invalid argument

Stupid SSH. Grrr.

Christian Grothoff

2015-10-29 11:51

manager   ~0009841

Ah, right, need to specify interface. Silly me.

Christian Grothoff

2015-10-29 11:55

manager   ~0009842

Hmm. Neither SSH directly nor /etc/hosts support specifying the interface with any canonical syntax. Sree: how did you get this to work with link scoped addresses?

Sree Harsha Totakura

2015-10-29 13:38

developer   ~0009846

Oops, I never tried it with IPv6. Sorry.

Sree Harsha Totakura

2015-10-29 13:41

developer   ~0009847

I was wrong in saying that you need at least a link-scoped address; you need globally scoped ones.

Sree Harsha Totakura

2015-10-29 13:44

developer   ~0009848

Also, IPv6 addresses do not work in the hosts file, because anything after the first `:' is taken as a port number.
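The effect of splitting at the first `:` can be illustrated with a naive parser. `split_host_port` is hypothetical and is not the testbed's actual hosts-file parser; it assumes entries of the form `host[:port]`. Applied to the link-local address tried earlier, it yields the host name `fe80`, which is consistent with the `ssh: Could not resolve hostname fe80` error reported above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical naive parser for "host[:port]" entries: copies everything
 * before the first ':' into host (buffer size len, len > 0) and parses
 * the rest as a port. Returns the port, 0 if there is no ':', or -1 if
 * the text after the ':' is not a valid port number. */
long
split_host_port (const char *entry, char *host, size_t len)
{
  const char *colon = strchr (entry, ':');
  size_t hlen = (NULL == colon) ? strlen (entry) : (size_t) (colon - entry);
  char *end;
  long port;

  assert (len > 0);
  if (hlen >= len)
    hlen = len - 1;       /* truncate to fit the buffer */
  memcpy (host, entry, hlen);
  host[hlen] = '\0';
  if (NULL == colon)
    return 0;
  port = strtol (colon + 1, &end, 10);
  if (end == colon + 1 || '\0' != *end || port < 1 || port > 65535)
    return -1;
  return port;
}
```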

Christian Grothoff

2015-10-29 14:13

manager   ~0009852

I still cannot reproduce this. I now get various issues from using IPv6 global addresses, but not the crash. :-(

Sree Harsha Totakura

2015-10-29 17:10

developer   ~0009855

This is rare. I believe it only happens at the master controller when a slave controller dies unexpectedly, which is what happened during this crash.

Sree Harsha Totakura

2015-10-29 17:21

developer   ~0009856

testbed_api_peers.c:853 seems to be the only place where opc->c could possibly have been set to NULL, but that code is not called from the testbed controller, and this crash happens in the controller. :-(

Issue History

Date Modified Username Field Change
2015-10-28 17:58 Sree Harsha Totakura New Issue
2015-10-28 17:58 Sree Harsha Totakura Status new => assigned
2015-10-28 17:58 Sree Harsha Totakura Assigned To => Sree Harsha Totakura
2015-10-28 18:02 Sree Harsha Totakura Relationship added related to 0004035
2015-10-29 11:07 Christian Grothoff Note Added: 0009829
2015-10-29 11:09 Christian Grothoff Note Added: 0009830
2015-10-29 11:31 Christian Grothoff Note Added: 0009833
2015-10-29 11:32 Christian Grothoff Note Added: 0009834
2015-10-29 11:37 Sree Harsha Totakura Note Added: 0009836
2015-10-29 11:40 Sree Harsha Totakura Note Added: 0009837
2015-10-29 11:46 Christian Grothoff Note Added: 0009838
2015-10-29 11:49 Christian Grothoff Note Added: 0009840
2015-10-29 11:51 Christian Grothoff Note Added: 0009841
2015-10-29 11:55 Christian Grothoff Note Added: 0009842
2015-10-29 13:38 Sree Harsha Totakura Note Added: 0009846
2015-10-29 13:41 Sree Harsha Totakura Note Added: 0009847
2015-10-29 13:44 Sree Harsha Totakura Note Added: 0009848
2015-10-29 14:13 Christian Grothoff Note Added: 0009852
2015-10-29 17:10 Sree Harsha Totakura Note Added: 0009855
2015-10-29 17:21 Sree Harsha Totakura Note Added: 0009856
2017-02-26 02:16 Christian Grothoff Product Version => SVN HEAD
2017-02-26 02:16 Christian Grothoff Target Version => 0.11.0
2018-06-07 00:34 Christian Grothoff Status assigned => confirmed
2018-06-07 00:34 Christian Grothoff Target Version 0.11.0 =>