View Issue Details

ID: 0005569
Project: GNUnet
Category: other
View Status: public
Last Update: 2019-02-28 11:17
Reporter: nikita
Assigned To: Christian Grothoff
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Platform: amd64
OS: NetBSD
OS Version: CURRENT
Product Version: Git master
Target Version: 0.11.0
Fixed in Version: 0.11.0
Summary: 0005569: tests hang
Description: Tests run on their own pass (I think), but when run on the latest commit with a simple gmake check in the root of the source directory, the run gets stuck, seemingly at random, in statistics.

CG: I'm changing the title, as it seems this happens randomly in _any_ test on shutdown, at least for me. But with very low probability overall.
Additional Information:

gmake[3]: Entering directory '/home/ng0/src/gnunet/gnunet/src/statistics'
gmake[4]: Entering directory '/home/ng0/src/gnunet/gnunet/src/statistics'
PASS: test_statistics_api
PASS: test_statistics_api_loop
PASS: test_statistics_api_watch
PASS: test_statistics_api_watch_zero_value
Tags: No tags attached.



nikita

2019-02-15 00:00

developer   ~0013786

Wrong, they also hang when simply run in src/statistics.
No idea why I assumed a difference.


nikita

2019-02-15 00:06

developer   ~0013787

It's confusing, because in the python3.7 migration ticket I tested them and they passed without hanging.

Christian Grothoff

2019-02-15 00:23

manager   ~0013788

I also see some tests randomly hang, with a process stuck like this:

(gdb) ba
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:63
#1 0x00007f6ea4dfddcb in __unregister_atfork (dso_handle=0x7f6ea4eb5940 <atfork_lock>, dso_handle@entry=0x55a865ebd0b8) at register-atfork.c:80
#2 0x00007f6ea4d314a9 in __cxa_finalize (d=0x55a865ebd0b8) at cxa_finalize.c:107
#3 0x000055a865eba233 in __do_global_dtors_aux ()
#4 0x00007ffe871814b0 in ?? ()
#5 0x00007f6ea510f686 in _dl_fini () at dl-fini.c:138
Backtrace stopped: frame did not save the PC

This is VERY, very odd.

Christian Grothoff

2019-02-15 00:24

manager   ~0013789

Adding amatus to the monitors on this, as he might have a clue, being an expert on obscure issues.

Christian Grothoff

2019-02-16 21:21

manager   ~0013833

Likely fixed as of e98a4e07e..4611c473f. For posterity, here's Florian Weimer's diagnosis of this one:

* Christian Grothoff:

> I'm seeing some _very_ odd behavior with processes hanging on exit (?)
> with GNU libc 2.28-6 on Debian (amd64 threadripper). This seems to
> happen at random (for random tests, with very low frequency!) in the
> GNUnet (Git master) testsuite when a child process is about to exit.

It looks like you call exit from a signal handler, see:

/**
 * Signal handler called for signals that should cause us to shutdown.
 */
static void
sighandler_shutdown ()
{
  static char c;
  int old_errno = errno; /* backup errno */

  if (getpid () != my_pid)
    exit (1); /* we have fork'ed since the signal handler was created,
               * ignore the signal, see discussion */
  GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle
                          (shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE),
                          &c, sizeof (c));
  errno = old_errno;
}
In general, this results in undefined behavior because exit (unlike
_exit) is not an async-signal-safe function.
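
A minimal sketch of an async-signal-safe variant, not the committed
GNUnet fix: exit() is replaced with _exit(), which skips the
atexit()/destructor machinery visible in the backtrace above, and
only async-signal-safe calls (getpid, _exit, write) remain. The
globals shutdown_pipe_fd and my_pid are hypothetical stand-ins for
GNUnet's pipe handle and saved PID.

#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-ins for GNUnet's shutdown pipe and saved PID. */
static int shutdown_pipe_fd;
static pid_t my_pid;

static void
sighandler_shutdown_safe (int sig)
{
  static char c;
  int old_errno = errno;   /* write() may clobber errno */

  (void) sig;
  if (getpid () != my_pid)
    _exit (1);             /* forked child: exit immediately, without
                            * running atexit() handlers or destructors */
  (void) write (shutdown_pipe_fd, &c, sizeof (c));
  errno = old_errno;
}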

I suspect you either call the exit function while a fork is in progress,
or, since you register this signal handler multiple times for different
signals:

  sh->shc_int = GNUNET_SIGNAL_handler_install (SIGINT,
                                               &sighandler_shutdown);
  sh->shc_term = GNUNET_SIGNAL_handler_install (SIGTERM,
                                                &sighandler_shutdown);

one call to exit might interrupt another call to exit if both signals
are delivered to the process.
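
A minimal sketch of one way to rule out that second scenario, using
plain POSIX sigaction() rather than the GNUNET_SIGNAL_handler_install
API: listing the sibling signal in sa_mask blocks it while the handler
runs, so the two deliveries serialize instead of interrupting each
other.

#include <signal.h>
#include <string.h>

/* Install one handler for SIGINT and SIGTERM such that whichever
 * signal arrives first blocks the other until the handler returns. */
static void
install_shutdown_handlers (void (*handler) (int))
{
  struct sigaction sa;

  memset (&sa, 0, sizeof (sa));
  sa.sa_handler = handler;
  sigemptyset (&sa.sa_mask);
  sigaddset (&sa.sa_mask, SIGINT);
  sigaddset (&sa.sa_mask, SIGTERM);
  sigaction (SIGINT, &sa, NULL);
  sigaction (SIGTERM, &sa, NULL);
}

With this, even back-to-back SIGINT and SIGTERM run the handler
strictly one invocation after the other.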

The deadlock you see was introduced in commit
27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork handlers"),
first released in glibc 2.28. The fork deadlock will be gone (in the
single-threaded case) if Debian updates to the current
release/2.28/master branch because we backported commit
60f80624257ef84eacfd9b400bda1b5a5e8e7816 ("nptl: Avoid fork handler lock
for async-signal-safe fork [BZ #24161]") there.

But this will not help you. Even without the deadlock, I expect you
will still experience some random corruption during exit, but it's
going to be difficult to spot.


Issue History

Date Modified Username Field Change
2019-02-14 23:57 nikita New Issue
2019-02-14 23:59 nikita OS => NetBSD
2019-02-14 23:59 nikita OS Version => CURRENT
2019-02-14 23:59 nikita Platform => amd64
2019-02-14 23:59 nikita Product Version => Git master
2019-02-15 00:00 nikita Note Added: 0013786
2019-02-15 00:06 nikita Note Added: 0013787
2019-02-15 00:23 Christian Grothoff Note Added: 0013788
2019-02-15 00:24 Christian Grothoff Note Added: 0013789
2019-02-16 14:49 Christian Grothoff Assigned To => Christian Grothoff
2019-02-16 14:49 Christian Grothoff Status new => assigned
2019-02-16 15:48 Christian Grothoff Category statistics service => other
2019-02-16 15:48 Christian Grothoff Summary statistics tests hang => tests hang
2019-02-16 15:48 Christian Grothoff Description Updated
2019-02-16 15:48 Christian Grothoff Additional Information Updated
2019-02-16 21:21 Christian Grothoff Status assigned => resolved
2019-02-16 21:21 Christian Grothoff Resolution open => fixed
2019-02-16 21:21 Christian Grothoff Note Added: 0013833
2019-02-16 21:21 Christian Grothoff Fixed in Version => 0.11.0
2019-02-16 21:21 Christian Grothoff Target Version => 0.11.0
2019-02-28 11:17 Christian Grothoff Status resolved => closed