View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004034 | GNUnet | testbed service | public | 2015-10-28 17:58 | 2018-06-07 00:34 |
Reporter | Sree Harsha Totakura | Assigned To | Sree Harsha Totakura | ||
Priority | normal | Severity | crash | Reproducibility | sometimes |
Status | confirmed | Resolution | open | ||
Product Version | Git master | ||||
Summary | 0004034: Crash while running multi-host experiments | ||||
Description | Testbed crashes while running multi-host experiments | ||||
Additional Information | warning: core file may not match specified executable file.
[New LWP 26773]
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `gnunet-service-testbed -c /tmp/totakura/testbed-helperBYFziJ/0/config'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GNUNET_TESTBED_remove_opc_ (c=0x0, opc=opc@entry=0x2265900) at testbed_api.c:350
350 GNUNET_assert (NULL != c->opc_map);
(gdb) bt
#0 GNUNET_TESTBED_remove_opc_ (c=0x0, opc=opc@entry=0x2265900) at testbed_api.c:350
#1 0x00007f37fe4db01f in GNUNET_TESTBED_forward_operation_msg_cancel_ (opc=0x2265900) at testbed_api.c:1324
#2 0x0000000000406e09 in GST_clear_fopcq () at gnunet-service-testbed.c:738
#3 0x0000000000407070 in shutdown_task (cls=<optimized out>, tc=<optimized out>) at gnunet-service-testbed.c:790
#4 0x00007f37ff16e9a6 in run_ready (ws=0x2206ad0, rs=0x2206a40) at scheduler.c:587
#5 GNUNET_SCHEDULER_run (task=task@entry=0x7f37ff175b60 <service_task>, task_cls=task_cls@entry=0x7fffbd24a760) at scheduler.c:868
#6 0x00007f37ff178f6c in GNUNET_SERVICE_run (argc=<optimized out>, argv=<optimized out>, service_name=service_name@entry=0x41a892 "testbed", options=options@entry=GNUNET_SERVICE_OPTION_NONE, task=task@entry=0x4049e0 <testbed_run>, task_cls=task_cls@entry=0x0) at service.c:1503
#7 0x0000000000404119 in main (argc=<optimized out>, argv=<optimized out>) at gnunet-service-testbed.c:963
(gdb) bt full
#0 GNUNET_TESTBED_remove_opc_ (c=0x0, opc=opc@entry=0x2265900) at testbed_api.c:350
  __FUNCTION__ = "GNUNET_TESTBED_remove_opc_"
#1 0x00007f37fe4db01f in GNUNET_TESTBED_forward_operation_msg_cancel_ (opc=0x2265900) at testbed_api.c:1324
  No locals.
#2 0x0000000000406e09 in GST_clear_fopcq () at gnunet-service-testbed.c:738
  fopc = 0x2267140
  __FUNCTION__ = "GST_clear_fopcq"
#3 0x0000000000407070 in shutdown_task (cls=<optimized out>, tc=<optimized out>) at gnunet-service-testbed.c:790
  mq_entry = <optimized out>
  id = <optimized out>
  __FUNCTION__ = "shutdown_task"
#4 0x00007f37ff16e9a6 in run_ready (ws=0x2206ad0, rs=0x2206a40) at scheduler.c:587
  p = GNUNET_SCHEDULER_PRIORITY_SHUTDOWN
  pos = 0x2222f70
  tc = {reason = GNUNET_SCHEDULER_REASON_SHUTDOWN, read_ready = 0x2206a40, write_ready = 0x2206ad0}
#5 GNUNET_SCHEDULER_run (task=task@entry=0x7f37ff175b60 <service_task>, task_cls=task_cls@entry=0x7fffbd24a760) at scheduler.c:868
  rs = 0x2206a40
  ws = 0x2206ad0
  timeout = <optimized out>
  ret = <optimized out>
  shc_int = 0x2206b80
  shc_term = 0x2206c40
  shc_quit = 0x2206dc0
  shc_hup = 0x2206e80
  shc_pipe = 0x2206d00
  last_tr = 858
  busy_wait_warning = 0
  pr = 0x22069c0
  c = 0 '\000'
  __FUNCTION__ = "GNUNET_SCHEDULER_run"
#6 0x00007f37ff178f6c in GNUNET_SERVICE_run (argc=<optimized out>, argv=<optimized out>, service_name=service_name@entry=0x41a892 "testbed", options=options@entry=GNUNET_SERVICE_OPTION_NONE, task=task@entry=0x4049e0 <testbed_run>, task_cls=task_cls@entry=0x0) at service.c:1503
  err = 0
  ret = <optimized out>
  cfg_fn = 0x21fc700 "~/.config/gnunet.conf"
  opt_cfg_fn = 0x21fc850 "/tmp/totakura/testbed-helperBYFziJ/0/config"
  loglev = 0x0
  logfile = 0x0
  do_daemonize = 0
  i = <optimized out>
  skew_offset = 139878479320632
  skew_variance = 139878479342184
  clock_offset = <optimized out>
  sctx = {cfg = 0x21fc720, server = 0x2207280, addrs = 0x22068c0, service_name = 0x41a892 "testbed", task = 0x4049e0 <testbed_run>, task_cls = 0x0, v4_denied = 0x0, v6_denied = 0x0, v4_allowed = 0x2205800, v6_allowed = 0x22069e0, my_handlers = 0x21fc740, addrlens = 0x2215830, lsocks = 0x0, shutdown_task = 0x2207310, timeout = {rel_value_us = 18446744073709551615}, ret = 1, ready_confirm_fd = -1, require_found = 1, match_uid = 1, match_gid = 1, options = GNUNET_SERVICE_OPTION_NONE}
  cfg = 0x21fc720
  xdg = <optimized out>
  service_options = {{shortName = 99 'c', name = 0x7f37ff185723 "config", argumentHelp = 0x7f37ff18572a "FILENAME", description = 0x7f37ff1857f0 "use configuration file FILENAME", require_argument = 1, processor = 0x7f37ff15eca0 <GNUNET_GETOPT_set_string>, scls = 0x7fffbd24a690}, {shortName = 100 'd', name = 0x7f37ff186fbb "daemonize", argumentHelp = 0x0, description = 0x7f37ff187318 "do daemonize (detach from terminal)", require_argument = 0, processor = 0x7f37ff15ec90 <GNUNET_GETOPT_set_one>, scls = 0x7fffbd24a684}, {shortName = 104 'h', name = 0x7f37ff18573e "help", argumentHelp = 0x0, description = 0x7f37ff185733 "print this help", require_argument = 0, processor = 0x7f37ff15e930 <GNUNET_GETOPT_format_help_>, scls = 0x0}, {shortName = 76 'L', name = 0x7f37ff185743 "log", argumentHelp = 0x7f37ff185747 "LOGLEVEL", description = 0x7f37ff185810 "configure logging to use LOGLEVEL", require_argument = 1, processor = 0x7f37ff15eca0 <GNUNET_GETOPT_set_string>, scls = 0x7fffbd24a698}, {shortName = 108 'l', name = 0x7f37ff185750 "logfile", argumentHelp = 0x7f37ff181484 "LOGFILE", description = 0x7f37ff185838 "configure logging to write logs to LOGFILE", require_argument = 1, processor = 0x7f37ff15eca0 <GNUNET_GETOPT_set_string>, scls = 0x7fffbd24a6a0}, {shortName = 118 'v', name = 0x7f37ff185758 "version", argumentHelp = 0x0, description = 0x7f37ff185760 "print the version number", require_argument = 0, processor = 0x7f37ff15e910 <GNUNET_GETOPT_print_version_>, scls = 0x7f37ff185779}, {shortName = 0 '\000', name = 0x0, argumentHelp = 0x0, description = 0x0, require_argument = 0, processor = 0x0, scls = 0x0}}
  __FUNCTION__ = "GNUNET_SERVICE_run"
#7 0x0000000000404119 in main (argc=<optimized out>, argv=<optimized out>) at gnunet-service-testbed.c:963
  No locals.
(gdb) | ||||
Tags | No tags attached. | ||||
related to | 0004035 | closed | Christian Grothoff | Crash in Testbed |
|
Hmm. This seems VERY far away from anything I touched recently (didn't mess with testbed_api.c at all). Anyway, looks like we didn't expect 'opc->c' to be NULL, so maybe adding assertions when 'opc->c' is initialized to check for non-NULLness is a good idea? |
|
Ok, that can't be it, as each time we set "opc->c" we also deref 'c' already and thus implicitly check for non-NULL-ness. Memory corruption!? Yikes. |
|
I've tried to reproduce this using 'localhost' twice in 'hosts.txt' given to the profiler with -H. The bug failed to appear, and it was also valgrind-clean. Strange. |
|
Oh, to clarify: I did valgrind gnunet-service-testbed by modifying the helper to spawn valgrind and then start testbed 'indirectly'. |
|
The testbed profiler starts two testbed controller instances on the first two hosts listed in hosts.txt, but only one of them (the second instance) is used to spawn peers. The first instance acts as the master controller. So, given that this bug arises when you use at least two hosts, you need to populate hosts.txt with three hosts. The first and the second host can be the same, but from the third host onward they should all be different, because starting multiple controller instances that spawn peers on the same host leads to port conflicts. |
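The layout rule above can be sketched as a small check. This is only an illustration: `hosts_layout_ok` is a hypothetical helper, not part of GNUnet, and it assumes one reading of the rule, namely that every host from the second entry onward spawns peers and must therefore be unique, while the first entry (the master controller) may repeat the second.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GNUnet code. Returns 1 if the host list has at
   least three entries and all entries from the second one onward are
   pairwise distinct (assumed reading of the rule in the note above). */
int
hosts_layout_ok (const char *const *hosts, size_t n)
{
  if (n < 3)
    return 0;
  for (size_t i = 1; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (0 == strcmp (hosts[i], hosts[j]))
        return 0; /* two peer-spawning controllers on one host */
  return 1;
}
```

With this reading, a list that repeats the first host as the second entry passes, while a list that repeats the second host as the third entry is rejected.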
|
Just to note: do not use `localhost' or loopback IP addresses like `127.0.0.1' in hosts.txt. The reason is that controllers on other hosts use this address to reach the controller. So always use an IP address that has at least link scope. |
|
I'm trying with "fe80::230:48ff:febb:4bb2" (in various variations), but I always get errors because SSH fails to recognize the v6 IP: ssh: Could not resolve hostname fe80: Name or service not known |
|
Hmm. Tried with names in /etc/hosts for the IPs, now I get: grothoff@pixel:~/svn/gnunet/src/testbed$ ssh -6 local3 ssh: connect to host local3 port 22: Invalid argument Stupid SSH. Grrr. |
|
Ah, right, need to specify interface. Silly me. |
|
Hmm. Neither SSH directly nor /etc/hosts support specifying the interface with any canonical syntax. Sree: how did you get this to work with link scoped addresses? |
|
Oops, I never tried it with IPv6. Sorry. |
|
I was wrong in saying that you need at least a link-scoped address; you need a globally scoped one. |
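To check a hosts.txt candidate before using it, a rough classifier along these lines may help. It is a sketch, not GNUnet code: `classify_address` and the `addr_scope` enum are hypothetical names, and `SCOPE_OTHER` only means "neither loopback nor link-local", which is a necessary but not sufficient condition for being reachable from the other hosts.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical classification, for illustration only. */
enum addr_scope { SCOPE_INVALID, SCOPE_LOOPBACK, SCOPE_LINK_LOCAL, SCOPE_OTHER };

/* Classify a numeric IPv4 or IPv6 address given as text.
   Host names such as "localhost" are reported as SCOPE_INVALID. */
enum addr_scope
classify_address (const char *text)
{
  struct in_addr v4;
  struct in6_addr v6;

  if (1 == inet_pton (AF_INET, text, &v4))
  {
    uint32_t a = ntohl (v4.s_addr);

    if ((a >> 24) == 127)
      return SCOPE_LOOPBACK;   /* 127.0.0.0/8 */
    if ((a >> 16) == 0xA9FE)
      return SCOPE_LINK_LOCAL; /* 169.254.0.0/16 */
    return SCOPE_OTHER;
  }
  if (1 == inet_pton (AF_INET6, text, &v6))
  {
    if (IN6_IS_ADDR_LOOPBACK (&v6))
      return SCOPE_LOOPBACK;   /* ::1 */
    if (IN6_IS_ADDR_LINKLOCAL (&v6))
      return SCOPE_LINK_LOCAL; /* fe80::/10 */
    return SCOPE_OTHER;
  }
  return SCOPE_INVALID;
}
```

Under this sketch, `127.0.0.1` is reported as loopback and `fe80::230:48ff:febb:4bb2` as link-local, so both would be rejected for hosts.txt.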
|
And IPv6 does not work in the hosts file, as anything after `:' is taken as a port number. |
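Why the colon matters can be seen in a simplified parser. The code below is not the actual GNUnet hosts-file parser; `parse_host_line` is a hypothetical stand-in that splits at the first `:' the way the note describes, and the default port of 22 is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified host-line parser (NOT GNUnet's implementation).
   Everything after the first ':' is treated as the port.
   Returns 0 on success, -1 if the line cannot be parsed. */
int
parse_host_line (const char *line, char *host, size_t host_len,
                 unsigned int *port)
{
  const char *colon = strchr (line, ':');
  size_t n = (NULL == colon) ? strlen (line) : (size_t) (colon - line);
  char *end;
  unsigned long p;

  if ((0 == n) || (n >= host_len))
    return -1;
  memcpy (host, line, n);
  host[n] = '\0';
  *port = 22; /* assumed default when no port is given */
  if (NULL == colon)
    return 0;
  p = strtoul (colon + 1, &end, 10);
  if ((end == colon + 1) || ('\0' != *end) || (0 == p) || (p > 65535))
    return -1; /* text after ':' is not a valid port number */
  *port = (unsigned int) p;
  return 0;
}
```

For an IPv6 literal such as `2001:db8::1`, this kind of split yields the host `2001` and then fails on the "port" `db8::1`, which matches the SSH error about resolving `fe80` reported earlier in the thread.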
|
I still cannot reproduce this. I now get various issues when using global IPv6 addresses, but not the crash. :-( |
|
This is rare. I believe this only happens at the master controller if a slave controller dies unexpectedly. That is what happened during this crash. |
|
testbed_api_peers.c:853 seems to be the only place where opc->c could possibly have been set to NULL; but that code is not called from the testbed controller, and this crash happens in the controller. :-( |
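Until the place where opc->c becomes NULL is found, a guard in the removal path could at least turn the segfault into a log line that identifies the affected operation. The sketch below is a diagnostic idea under stated assumptions, not a proposed patch: `struct Controller`, `struct OperationContext` and `remove_opc_guarded` are hypothetical stand-ins, and the real structures in testbed_api.c differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real testbed structures. */
struct Controller
{
  unsigned int opc_count; /* number of operation contexts still tracked */
};

struct OperationContext
{
  struct Controller *c;  /* owning controller; NULL in frame #0 of the trace */
  unsigned long long id; /* operation identifier, used for logging only */
};

/* Remove opc from its controller. Returns 1 if it was removed and 0 if
   opc had no controller, instead of dereferencing a NULL pointer. */
int
remove_opc_guarded (struct OperationContext *opc)
{
  if (NULL == opc->c)
  {
    fprintf (stderr, "opc %llu has no controller; skipping removal\n",
             opc->id);
    return 0;
  }
  if (opc->c->opc_count > 0)
    opc->c->opc_count--;
  opc->c = NULL; /* opc is no longer tracked by any controller */
  return 1;
}
```

Called from a shutdown cleanup loop like the one in frame #2, the logged identifier would show which forwarded operation lost its controller, which may help confirm the theory about a slave controller dying unexpectedly.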
Date Modified | Username | Field | Change |
---|---|---|---|
2015-10-28 17:58 | Sree Harsha Totakura | New Issue | |
2015-10-28 17:58 | Sree Harsha Totakura | Status | new => assigned |
2015-10-28 17:58 | Sree Harsha Totakura | Assigned To | => Sree Harsha Totakura |
2015-10-28 18:02 | Sree Harsha Totakura | Relationship added | related to 0004035 |
2015-10-29 11:07 | Christian Grothoff | Note Added: 0009829 | |
2015-10-29 11:09 | Christian Grothoff | Note Added: 0009830 | |
2015-10-29 11:31 | Christian Grothoff | Note Added: 0009833 | |
2015-10-29 11:32 | Christian Grothoff | Note Added: 0009834 | |
2015-10-29 11:37 | Sree Harsha Totakura | Note Added: 0009836 | |
2015-10-29 11:40 | Sree Harsha Totakura | Note Added: 0009837 | |
2015-10-29 11:46 | Christian Grothoff | Note Added: 0009838 | |
2015-10-29 11:49 | Christian Grothoff | Note Added: 0009840 | |
2015-10-29 11:51 | Christian Grothoff | Note Added: 0009841 | |
2015-10-29 11:55 | Christian Grothoff | Note Added: 0009842 | |
2015-10-29 13:38 | Sree Harsha Totakura | Note Added: 0009846 | |
2015-10-29 13:41 | Sree Harsha Totakura | Note Added: 0009847 | |
2015-10-29 13:44 | Sree Harsha Totakura | Note Added: 0009848 | |
2015-10-29 14:13 | Christian Grothoff | Note Added: 0009852 | |
2015-10-29 17:10 | Sree Harsha Totakura | Note Added: 0009855 | |
2015-10-29 17:21 | Sree Harsha Totakura | Note Added: 0009856 | |
2017-02-26 02:16 | Christian Grothoff | Product Version | => Git master |
2017-02-26 02:16 | Christian Grothoff | Target Version | => 0.11.0 |
2018-06-07 00:34 | Christian Grothoff | Status | assigned => confirmed |
2018-06-07 00:34 | Christian Grothoff | Target Version | 0.11.0 => |