View Issue Details

ID: 0005569
Project: GNUnet
Category: other
View Status: public
Last Update: 2019-02-28 11:17
Reporter: ng0
Assigned To: Christian Grothoff
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Platform: amd64
OS: NetBSD
OS Version: CURRENT
Product Version: SVN HEAD
Target Version: 0.11.0
Fixed in Version: 0.11.0
Summary: 0005569: tests hang
Description: The tests pass when run on their own (I think), but when run on the latest commit with a simple gmake check in the root of the source directory, the run gets stuck, seemingly at random, in statistics.

CG: I'm changing the title, as it seems this happens randomly in _any_ test on shutdown, at least for me. But with very low probability overall.
Additional Information:
gmake[3]: Entering directory '/home/ng0/src/gnunet/gnunet/src/statistics'
gmake[4]: Entering directory '/home/ng0/src/gnunet/gnunet/src/statistics'
PASS: test_statistics_api
PASS: test_statistics_api_loop
PASS: test_statistics_api_watch
PASS: test_statistics_api_watch_zero_value
[hangs]
Tags: No tags attached.

Activities

ng0

2019-02-15 00:00

developer   ~0013786

Wrong, they also hang when simply run in src/statistics.
No idea why I assumed a difference.

ng0

2019-02-15 00:06

developer   ~0013787

It's confusing because in the python3.7 migration ticket I tested them successfully, without any hang.

Christian Grothoff

2019-02-15 00:23

manager   ~0013788

I also see some tests randomly hang, with a process stuck like this:

(gdb) ba
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:63
#1 0x00007f6ea4dfddcb in __unregister_atfork (dso_handle=0x7f6ea4eb5940 <atfork_lock>, dso_handle@entry=0x55a865ebd0b8) at register-atfork.c:80
#2 0x00007f6ea4d314a9 in __cxa_finalize (d=0x55a865ebd0b8) at cxa_finalize.c:107
#3 0x000055a865eba233 in __do_global_dtors_aux ()
#4 0x00007ffe871814b0 in ?? ()
#5 0x00007f6ea510f686 in _dl_fini () at dl-fini.c:138
Backtrace stopped: frame did not save the PC

This is VERY, very odd.

Christian Grothoff

2019-02-15 00:24

manager   ~0013789

Adding amatus to the monitors on this, as he might have a clue as expert for obscure issues.

Christian Grothoff

2019-02-16 21:21

manager   ~0013833

Likely fixed as of e98a4e07e..4611c473f. For posterity, here's Florian Weimer's diagnosis of this one:

* Christian Grothoff:

> I'm seeing some _very_ odd behavior with processes hanging on exit (?)
> with GNU libc 2.28-6 on Debian (amd64 threadripper). This seems to
> happen at random (for random tests, with very low frequency!) in the
> GNUnet (Git master) testsuite when a child process is about to exit.

It looks like you call exit from a signal handler, see
src/util/scheduler.c:

/**
 * Signal handler called for signals that should cause us to shutdown.
 */
static void
sighandler_shutdown ()
{
  static char c;
  int old_errno = errno; /* backup errno */

  if (getpid () != my_pid)
    exit (1); /* we have fork'ed since the signal handler was created,
                                 * ignore the signal, see https://gnunet.org/vfork discussion */
  GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle
                          (shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE),
                          &c, sizeof (c));
  errno = old_errno;
}

In general, this results in undefined behavior because exit (unlike
_exit) is not an async-signal-safe function.

I suspect you either call the exit function while a fork is in progress,
or since you register this signal handler multiple times for different
signals:

  sh->shc_int = GNUNET_SIGNAL_handler_install (SIGINT,
                                               &sighandler_shutdown);
  sh->shc_term = GNUNET_SIGNAL_handler_install (SIGTERM,
                                                &sighandler_shutdown);

one call to exit might interrupt another call to exit if both signals
are delivered to the process.

The deadlock you see was introduced in commit
27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork handlers"),
first released in glibc 2.28. The fork deadlock will be gone (in the
single-threaded case) if Debian updates to the current
release/2.28/master branch because we backported commit
60f80624257ef84eacfd9b400bda1b5a5e8e7816 ("nptl: Avoid fork handler lock
for async-signal-safe fork [BZ #24161]") there.

But this will not help you. Even without the deadlock, I expect you
still experience some random corruption during exit, but it's going to
be difficult to spot.

Thanks,
Florian

Issue History

Date Modified Username Field Change
2019-02-14 23:57 ng0 New Issue
2019-02-14 23:59 ng0 OS => NetBSD
2019-02-14 23:59 ng0 OS Version => CURRENT
2019-02-14 23:59 ng0 Platform => amd64
2019-02-14 23:59 ng0 Product Version => SVN HEAD
2019-02-15 00:00 ng0 Note Added: 0013786
2019-02-15 00:06 ng0 Note Added: 0013787
2019-02-15 00:23 Christian Grothoff Note Added: 0013788
2019-02-15 00:24 Christian Grothoff Note Added: 0013789
2019-02-16 14:49 Christian Grothoff Assigned To => Christian Grothoff
2019-02-16 14:49 Christian Grothoff Status new => assigned
2019-02-16 15:48 Christian Grothoff Category statistics service => other
2019-02-16 15:48 Christian Grothoff Summary statistics tests hang => tests hang
2019-02-16 15:48 Christian Grothoff Description Updated
2019-02-16 15:48 Christian Grothoff Additional Information Updated
2019-02-16 21:21 Christian Grothoff Status assigned => resolved
2019-02-16 21:21 Christian Grothoff Resolution open => fixed
2019-02-16 21:21 Christian Grothoff Note Added: 0013833
2019-02-16 21:21 Christian Grothoff Fixed in Version => 0.11.0
2019-02-16 21:21 Christian Grothoff Target Version => 0.11.0
2019-02-28 11:17 Christian Grothoff Status resolved => closed