View Issue Details

ID: 0002780
Project: GNUnet
Category: testbed service
View Status: public
Last Update: 2013-12-24 20:55
Reporter: Bart Polot
Assigned To: Sree Harsha Totakura
Priority: low
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: Git master
Target Version: 0.10.0
Fixed in Version: 0.10.0
Summary: 0002780: Testbed doesn't clean up properly on error (II)
Description:
I start regex-profiler.
I see lots of gnunet-rsa processes running.
I get lots of:
Feb 07 12:59:42-846187 testbed-27207 WARNING A forwarded operation has timed out
probably because of the whole RSA-ECC port

I am left with 518 processes doing:
find / -mount -type f -exec cp {} /dev/null ;

And eating up my CPU.
Additional Information:
bart 5347 0.0 0.0 0 0 ? Z 12:59 0:00 \_ [gnunet-service-] <defunct>
bart 5352 0.0 0.0 23832 1488 ? S 12:59 0:00 \_ /tmp/bartgnunet/lib/gnunet/libexec/gnunet-ser
bart 7307 0.2 0.0 30244 1512 ? S 12:59 0:00 \_ /tmp/bartgnunet/lib/gnunet/libexec/gnunet
bart 10862 0.1 0.0 8192 1436 ? S 12:59 0:00 \_ find / -mount -type f -exec cp {} /de
Tags: No tags attached.
Attached Files
sl.c (430 bytes)   
#include <unistd.h>
#include <stdio.h>
#include <signal.h>

#define SLEEP 10

int main ()
{
  unsigned int t = SLEEP;
  
  if (SIG_ERR == signal (SIGTERM, SIG_IGN))
  {
    printf ("Error setting up sig handler\n");
    return 1;
  }
  if (SIG_ERR == signal (SIGINT, SIG_IGN))
  {
    printf ("Error setting up sig handler\n");
    return 1;
  }
  while (0 != (t = sleep (t)))
  {
    printf ("Interrupted.\n");
  }
  return 0;
}

Relationships

parent of 0002781 (closed, Sree Harsha Totakura): API should facilitate killing of a peer with SIGKILL.

Activities

Bart Polot

2013-02-07 13:06

reporter   ~0006841

ps xf shows:

 4656 ? S 0:00 \_ /tmp/bartgnunet/lib/gnunet/libexec/gnunet-service-arm -c /tmp/testbediwdZxm/265/co
 6234 ? S 0:00 | \_ /tmp/bartgnunet/lib/gnunet/libexec/gnunet-service-core -c /tmp/testbediwdZxm/26
 9621 ? S 0:00 | \_ find / -mount -type f -exec cp {} /dev/null ;

Bart Polot

2013-02-07 13:07

reporter   ~0006842

killall -9 find

ps aux | grep gnunet | wc
    555 7212 84382

Bart Polot

2013-02-07 13:13

reporter   ~0006843

Testbed should probably do something along the lines of 'pkill -9 gnunet' if shutdown has not completed after a while...

Sree Harsha Totakura

2013-02-07 13:36

updater   ~0006844

This is likely a problem with core not being able to find the needed ECC keys and trying to generate them. The 'find' you observe is run to gather the entropy needed by the random number generator.

Sree Harsha Totakura

2013-02-07 13:38

updater   ~0006845

Moving this to the 'testing' section, as it deals with distributing the key files.

Bart Polot

2013-02-07 13:40

reporter   ~0006846

My point is: the testbed has failed. After it fails it should not leave stuff running on the system. Therefore, killall -9 after a while (and before exiting the main process) is required.
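
The "kill after a while" idea can be sketched as a terminate-then-kill escalation. This is only an illustration under stated assumptions: `stop_child` and the `grace_ms` parameter are hypothetical names, not part of the GNUnet or testbed API, and the sketch handles a single direct child rather than a whole process tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not a GNUnet function).
 * Sends SIGTERM to the child `pid`, polls for up to `grace_ms`
 * milliseconds for it to exit, then sends SIGKILL and blocks until
 * the child has been reaped.
 * Returns 0 if the child was reaped during the grace period,
 * 1 if SIGKILL was sent, -1 on error. */
static int
stop_child (pid_t pid, unsigned int grace_ms)
{
  const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
  unsigned int waited = 0;
  int status;
  pid_t ret;

  /* kill() with pid <= 0 would signal a whole process group. */
  assert (pid > 0);
  if (0 != kill (pid, SIGTERM))
    return -1;
  while (waited < grace_ms)
  {
    ret = waitpid (pid, &status, WNOHANG);
    if (ret == pid)
      return 0;               /* exited and reaped */
    if ((-1 == ret) && (EINTR != errno))
      return -1;
    nanosleep (&step, NULL);
    waited += 10;
  }
  /* Grace period is over: SIGKILL cannot be caught or ignored. */
  if (0 != kill (pid, SIGKILL))
    return -1;
  while (-1 == waitpid (pid, &status, 0))
    if (EINTR != errno)
      return -1;
  return 1;
}
```

A caller would invoke something like `stop_child (pid, 5000)` for each process it started, before the main process exits. The 5 second grace period is an arbitrary example value.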

Bart Polot

2013-02-07 13:42

reporter   ~0006847

The fact that the keys are not there is a separate bug. This is about testbed cleaning up after itself if things go badly.

Sree Harsha Totakura

2013-02-07 14:00

updater   ~0006848

Moving back to testbed :)

Sree Harsha Totakura

2013-02-07 15:22

updater   ~0006851

The driver running the testbed should block in its call to GNUNET_TESTBED_controller_stop() while the testbed is being killed, and the call should return only once the testbed has terminated.

Sree Harsha Totakura

2013-02-12 16:53

updater   ~0006866

The problem seems to be related to the way SSH handles SIGTERM.

Compile the attached program sl.c and run it through ssh as:
"ssh localhost -o BatchMode=yes /tmp/a.out"

The process a.out takes about 10 seconds to complete. If you terminate the SSH connection by pressing Ctrl-C in the meantime, you will observe that a.out is still running. This means that SSH does not wait for its child processes to terminate.

Sree Harsha Totakura

2013-02-12 18:16

updater   ~0006867

Fixed in SVN 26081 by signalling termination to the HELPER by closing its STDIN.
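
The general technique of signalling shutdown by closing a helper's stdin can be sketched as follows. This is not the code from SVN 26081: the function names `start_helper`, `stop_helper` and `wait_for_stdin_eof` are hypothetical, and the sketch assumes the helper treats end-of-file on stdin as its shutdown request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Helper side: block until the controlling end closes the pipe.
 * read() returns 0 once every write end has been closed. */
static void
wait_for_stdin_eof (int fd)
{
  char buf[256];
  ssize_t n;

  do
    n = read (fd, buf, sizeof (buf));
  while ((n > 0) || ((-1 == n) && (EINTR == errno)));
}

/* Controller side: fork a helper whose stdin is a pipe.
 * Stores the write end in *ctl_fd; returns the pid or -1. */
static pid_t
start_helper (int *ctl_fd)
{
  int fds[2];
  pid_t pid;

  assert (NULL != ctl_fd);
  if (0 != pipe (fds))
    return -1;
  pid = fork ();
  if (-1 == pid)
  {
    close (fds[0]);
    close (fds[1]);
    return -1;
  }
  if (0 == pid)
  {
    /* The helper must not keep the write end, or EOF never arrives. */
    close (fds[1]);
    dup2 (fds[0], STDIN_FILENO);
    close (fds[0]);
    /* A real helper would do its work here (or exec another binary). */
    wait_for_stdin_eof (STDIN_FILENO);
    _exit (0);
  }
  close (fds[0]);
  *ctl_fd = fds[1];
  return pid;
}

/* Controller side: request shutdown by closing the helper's stdin,
 * then block until it exits. Returns its exit code, or -1. */
static int
stop_helper (pid_t pid, int ctl_fd)
{
  int status;

  close (ctl_fd);
  while (-1 == waitpid (pid, &status, 0))
    if (EINTR != errno)
      return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

One appeal of this design is that it needs no signal delivery: the helper sees end-of-file as soon as the last write end of its stdin is closed, including when the controlling process exits without calling `stop_helper`.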

Issue History

Date Modified | Username | Field | Change
2013-02-07 13:05 | Bart Polot | New Issue |
2013-02-07 13:05 | Bart Polot | Status | new => assigned
2013-02-07 13:05 | Bart Polot | Assigned To | => Sree Harsha Totakura
2013-02-07 13:06 | Bart Polot | Note Added: 0006841 |
2013-02-07 13:07 | Bart Polot | Note Added: 0006842 |
2013-02-07 13:13 | Bart Polot | Note Added: 0006843 |
2013-02-07 13:36 | Sree Harsha Totakura | Note Added: 0006844 |
2013-02-07 13:38 | Sree Harsha Totakura | Note Added: 0006845 |
2013-02-07 13:38 | Sree Harsha Totakura | Category | testbed service => testing library
2013-02-07 13:38 | Sree Harsha Totakura | Summary | Testbed doesn't clean up properly on error (II) => Host keys are not copied. Was "Testbed doesn't clean up properly on error (II)"
2013-02-07 13:40 | Bart Polot | Note Added: 0006846 |
2013-02-07 13:42 | Bart Polot | Note Added: 0006847 |
2013-02-07 14:00 | Sree Harsha Totakura | Note Added: 0006848 |
2013-02-07 14:00 | Sree Harsha Totakura | Category | testing library => testbed service
2013-02-07 14:00 | Sree Harsha Totakura | Summary | Host keys are not copied. Was "Testbed doesn't clean up properly on error (II)" => Testbed doesn't clean up properly on error (II)
2013-02-07 14:33 | Sree Harsha Totakura | Relationship added | parent of 0002781
2013-02-07 15:22 | Sree Harsha Totakura | Note Added: 0006851 |
2013-02-12 16:53 | Sree Harsha Totakura | Note Added: 0006866 |
2013-02-12 16:53 | Sree Harsha Totakura | File Added: sl.c |
2013-02-12 18:16 | Sree Harsha Totakura | Note Added: 0006867 |
2013-02-12 18:16 | Sree Harsha Totakura | Status | assigned => resolved
2013-02-12 18:16 | Sree Harsha Totakura | Fixed in Version | => 0.10.0
2013-02-12 18:16 | Sree Harsha Totakura | Resolution | open => fixed
2013-03-18 14:56 | Christian Grothoff | Target Version | => 0.10.0
2013-12-24 20:55 | Christian Grothoff | Status | resolved => closed