View Issue Details

ID: 0002819 | Project: GNUnet | Category: ARM service | View Status: public | Last Update: 2013-12-24 20:55
Reporter: LRN | Assigned To: LRN
Priority: normal | Severity: minor | Reproducibility: always
Status: closed | Resolution: fixed
Product Version: Git master
Target Version: 0.10.0 | Fixed in Version: 0.10.0
Summary: 0002819: gnunet-arm -I never times out when arm is not running
Description: (see summary)
Additional Information: Within a few seconds (2-5, regardless of the -T value), the following is printed to the log:
INFO Trying to connect to `[::1]:2087' (0x4f73b8)
INFO Trying to connect to `127.0.0.1:2087' (0x4f73b8)
INFO Failed to establish TCP connection to `localhost:2087', no further addresses to try.
ERROR Assertion failed at arm_api.c:683.
Error communicating with ARM. ARM not running?
INFO Trying to connect to `[::1]:2087' (0x4efcc8)
INFO Trying to connect to `127.0.0.1:2087' (0x4efcc8)
INFO Failed to establish TCP connection to `localhost:2087', no further addresses to try.

The process then hangs until it is killed.
Tags: No tags attached.

Relationships

has duplicate 0002827 (closed, LRN): gnunet-arm -I does not terminate and does work when arm is not running

Activities

LRN

2013-03-14 08:35

reporter   ~0006964

Should now work with recent ARM changes.

Matthias Wachs

2013-03-14 09:41

reporter   ~0006968

Last edited: 2013-03-14 09:42

mwachs@fulcrum:~/gnunet/gnunet-debug$ time gnunet-arm -I -T 20
Failed to request a list of services: Request timed out
Error communicating with ARM. ARM not running?

real 0m0.024s
user 0m0.004s
sys 0m0.000s

Working!

Matthias Wachs

2013-03-14 09:44

reporter   ~0006969

But with -e, I still have to interrupt it when not using a timeout:

mwachs@fulcrum:~/gnunet/gnunet-debug$ time gnunet-arm -e
^CFailed to send a stop request to the ARM service: We disconnected from ARM before we could send a request

!!!-> real 0m40.313s
user 0m0.008s
sys 0m0.000s


vs

mwachs@fulcrum:~/gnunet/gnunet-debug$ time gnunet-arm -e -T 5
Failed to send a stop request to the ARM service: Request timed out

!!!-> real 0m0.010s
user 0m0.000s
sys 0m0.004s

LRN

2013-03-14 10:18

reporter   ~0006970

That really depends on what behaviour we want.
Right now gnunet-arm doesn't do any smart-guessing; it just does what it is told, in a particular order (in case you ask for multiple things at once).
This means that any operation other than "Start ARM" will hang (until it times out; the default timeout is 1-2 minutes) if ARM is not already running. gnunet-arm keeps trying to connect to ARM during that period, so if ARM is started (by something else), it will eventually connect (note that the reconnect delay increases exponentially) and do what it was told to do.
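To make the remark about the exponentially increasing reconnect delay concrete, here is a minimal C sketch that counts how many connection attempts fit into a given timeout when the wait between attempts doubles after each failure. The function name `count_reconnect_attempts`, the starting delay and the cap are hypothetical parameters chosen for illustration; they are not taken from arm_api.c.

```c
#include <stdint.h>

/* Hypothetical sketch (not GNUnet code): how many connection attempts
 * fit into `timeout_ms` if the first attempt is immediate and the wait
 * between attempts starts at `initial_ms`, doubles after every
 * failure, and is capped at `max_delay_ms`. */
unsigned int
count_reconnect_attempts (uint64_t initial_ms,
                          uint64_t max_delay_ms,
                          uint64_t timeout_ms)
{
  uint64_t elapsed = 0;
  uint64_t delay;
  unsigned int attempts = 1;           /* immediate first attempt */

  if (initial_ms == 0)
    initial_ms = 1;                    /* avoid an endless zero-delay loop */
  if (max_delay_ms < initial_ms)
    max_delay_ms = initial_ms;
  delay = initial_ms;

  while (elapsed + delay <= timeout_ms)
  {
    elapsed += delay;                  /* wait, then retry */
    attempts++;
    /* exponential backoff, capped (and safe against overflow) */
    delay = (delay > max_delay_ms / 2) ? max_delay_ms : delay * 2;
  }
  return attempts;
}
```

For example, `count_reconnect_attempts (100, 60000, 1000)` yields 4 attempts (at 0, 100, 300 and 700 ms). With a backoff like this, a client notices a freshly started ARM only after an ever-growing lag.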

So, what behaviour do you want instead? Should it immediately refuse to do anything other than "Start ARM" if ARM is not running (and "Start ARM" is not among the things it was told to do)?

Matthias Wachs

2013-03-14 10:30

reporter   ~0006973

Last edited: 2013-03-14 10:31

From my perspective (user perspective):

If ARM is not running and I call gnunet-arm -e:

Immediate shutdown + "Error communicating with ARM. ARM not running?"

Even better, just a simplified message: "ARM not running?"

LRN

2013-03-14 10:53

reporter   ~0006974

What if I run:
gnunet-arm -k resolver -r (stop `resolver' service, restart ARM)?
gnunet-arm -r (restart ARM)?
gnunet-arm -e -s (stop ARM, start ARM)?
What should it do if ARM is not running when you run gnunet-arm with these arguments?

Right now gnunet-arm is hard-coded to do all operations in this order:
1) kill a service, if asked to kill a service
2) stop ARM, if asked to stop ARM or restart ARM
3) start ARM, if asked to start ARM or restart ARM
4) run a service, if asked to run a service
5) get a list of running services, if asked to get a list of running services
6) finish (gnunet-arm exits).
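The fixed ordering above can be expressed as a small planning function. This is a hypothetical sketch: the `ArmRequests` flags, the `ArmStep` values and `plan_steps` are illustrative names, not gnunet-arm's actual internals. The mapping of -k, -e, -s, -r and -I to steps follows this report; mapping -i to "run a service" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the user asked for on the command line. */
struct ArmRequests
{
  bool kill_service;   /* -k SERVICE */
  bool stop_arm;       /* -e */
  bool start_arm;      /* -s */
  bool restart_arm;    /* -r: stop ARM, then start it again */
  bool run_service;    /* -i SERVICE (assumed mapping) */
  bool list_services;  /* -I */
};

enum ArmStep
{
  STEP_KILL_SERVICE,
  STEP_STOP_ARM,
  STEP_START_ARM,
  STEP_RUN_SERVICE,
  STEP_LIST_SERVICES,
  STEP_DONE
};

#define ARM_MAX_STEPS 6

/* Fill `plan` with the requested steps in the fixed order (1)-(6)
 * and return how many entries were written (always ends with DONE). */
size_t
plan_steps (const struct ArmRequests *req, enum ArmStep plan[ARM_MAX_STEPS])
{
  size_t n = 0;

  if (req->kill_service)
    plan[n++] = STEP_KILL_SERVICE;                  /* (1) */
  if (req->stop_arm || req->restart_arm)
    plan[n++] = STEP_STOP_ARM;                      /* (2) */
  if (req->start_arm || req->restart_arm)
    plan[n++] = STEP_START_ARM;                     /* (3) */
  if (req->run_service)
    plan[n++] = STEP_RUN_SERVICE;                   /* (4) */
  if (req->list_services)
    plan[n++] = STEP_LIST_SERVICES;                 /* (5) */
  plan[n++] = STEP_DONE;                            /* (6) */
  return n;
}
```

With -k, -r and -I, the plan is kill, stop, start, list, done: restart simply contributes both step (2) and step (3).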

Since there is no API to check whether ARM is running (other than running a service test), step (1) will always hang if ARM isn't running, until ARM is started or until it times out.
This can be fixed by running a service test immediately on startup and using the result to decide whether to perform any of the actions the user asked for. Still, even if we know that ARM is not running, that leaves the question: what should we do? Especially if '-s' or '-r' is one of the arguments.

Christian Grothoff

2013-03-14 13:03

manager   ~0006976

The bigger issue is this: you cannot quickly tell that ARM is not running.

ARM might just be starting, and it might just take a while for ARM to start listening. Imagine this shell command:

$ gnunet-arm -s; gnunet-arm -e

This *must* succeed, leaving ARM stopped at the end, on virtually all systems. gnunet-arm -s will terminate, possibly *before* gnunet-service-arm has opened the listen socket. Thus gnunet-arm -e must wait some time to be sure that ARM is not just starting up or taking a while.
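One way to "wait some time to be sure ARM is not just starting" is a polling loop with a deadline. The loop probes ARM and reports "ARM not running?" only once the probe has kept failing for the whole waiting period. The sketch below is hypothetical. `ArmProbe`, `wait_for_arm` and the `FakeArm`/`fake_probe` stand-in (which pretends ARM becomes reachable on the N-th probe) are illustrative names, and a real probe would attempt a connection to ARM. Only time spent sleeping is counted, not time spent inside the probe.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Probe callback: returns true once ARM accepts connections.  A real
 * probe would try to connect to ARM; here it is left abstract. */
typedef bool (*ArmProbe) (void *cls);

static void
sleep_ms (uint64_t ms)
{
  struct timespec ts;

  ts.tv_sec = (time_t) (ms / 1000);
  ts.tv_nsec = (long) (ms % 1000) * 1000000L;
  nanosleep (&ts, NULL);
}

/* Keep probing every `interval_ms` until the probe succeeds or about
 * `max_wait_ms` has been spent sleeping between probes; only then
 * give up.  Returns true if ARM became reachable in time. */
bool
wait_for_arm (ArmProbe probe, void *cls,
              uint64_t interval_ms, uint64_t max_wait_ms)
{
  uint64_t waited = 0;

  if (interval_ms == 0)
    interval_ms = 1;                   /* guarantee progress */
  for (;;)
  {
    if (probe (cls))
      return true;                     /* ARM is up, maybe just started */
    if (waited + interval_ms > max_wait_ms)
      return false;                    /* waited long enough */
    sleep_ms (interval_ms);
    waited += interval_ms;
  }
}

/* Demo stand-in: "ARM" becomes reachable on the ready_after-th probe. */
struct FakeArm
{
  unsigned int calls;
  unsigned int ready_after;
};

bool
fake_probe (void *cls)
{
  struct FakeArm *arm = cls;

  return ++arm->calls >= arm->ready_after;
}
```

The right value for `max_wait_ms` is the open question raised in the next note. The sketch only shows where that number plugs in.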

LRN

2013-03-14 13:40

reporter   ~0006977

Options:
1) Make sure -s doesn't return until ARM is really ready.
2) Make sure gnunet-arm waits enough time (how much is 'enough'?) before it concludes that ARM isn't running.

(1) is trivial to implement for -s. Unfortunately, we can't do the same for -i.
(2) requires some measurements. What is the worst startup time for ARM?
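Option (1) is commonly implemented with a readiness pipe between the launcher and the daemon. The launcher forks, then blocks until the child writes a byte saying "I am ready". A minimal POSIX sketch follows. It is not how gnunet-arm implements -s. `ServiceInit`, `start_and_wait_ready` and the `demo_init_*` hooks are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical init hook: returns 0 once the service is ready
 * (for ARM, once its listen socket is open). */
typedef int (*ServiceInit) (void);

/* Fork the service and block until it reports readiness over a pipe.
 * Returns true only if the child signalled readiness; false if it
 * failed or exited before doing so.  The parent does not wait for the
 * child to exit, just as a launcher would not wait for a daemon. */
bool
start_and_wait_ready (ServiceInit init)
{
  int fds[2];
  char byte = 0;
  ssize_t n;

  if (pipe (fds) != 0)
    return false;

  pid_t pid = fork ();
  if (pid < 0)
  {
    close (fds[0]);
    close (fds[1]);
    return false;
  }
  if (pid == 0)
  {
    /* Child: stands in for gnunet-service-arm. */
    close (fds[0]);
    if (init () == 0)
    {
      ssize_t w = write (fds[1], "R", 1);   /* "I am listening now" */
      (void) w;
    }
    close (fds[1]);
    /* A real service would enter its main loop here; the demo exits. */
    _exit (0);
  }

  /* Parent: stands in for `gnunet-arm -s`.  Closing our copy of the
   * write end means read() returns 0 (EOF) if the child dies early. */
  close (fds[1]);
  do
    n = read (fds[0], &byte, 1);
  while (n < 0 && errno == EINTR);
  close (fds[0]);
  return n == 1 && byte == 'R';
}

/* Demo init hooks for trying the sketch out. */
int demo_init_ok (void) { return 0; }
int demo_init_fail (void) { return -1; }
```

If the child fails before signalling, the parent sees end-of-file on the pipe and can report an error instead of returning success. Option (2) has no such signal to wait for, which is why it needs a measured worst-case startup time.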

Also:
A) Keep things as-is (i.e. if you ask gnunet-arm to do things that require ARM connection, and ARM is not running, then it tries until it times out).
B) Keep things as-is, except when -s is also specified. If it is, start ARM first (regardless of what other options are specified), then do the rest.

(A) is simple and straightforward (users always have the option of using -T to prevent gnunet-arm from hanging for too long).
(B) allows users to just add `-s' everywhere, so that ARM is started if it isn't running. That said, I'm not sure how ARM will behave if multiple instances are started concurrently (which one will win? Could it be that none of them initializes correctly because they keep stepping on each other's toes?).

P.S. Do note that -T takes milliseconds, unless you explicitly add the unit of time measure (i.e. "-T 5s" is 5 seconds, "-T 5" is 5 milliseconds).
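The rule in the P.S. (a bare number is milliseconds, and an explicit unit overrides that) can be sketched as a small parser. This is a hypothetical illustration: `parse_timeout_ms` is not GNUnet code, and the accepted unit spellings ("ms", "s", "min") are assumptions. The note only confirms that "5" means 5 ms and "5s" means 5 seconds.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical -T parser: a bare number is milliseconds; an explicit
 * unit suffix overrides that.  Only "ms", "s" and "min" are handled
 * here (assumed spellings).  Returns false on malformed input. */
bool
parse_timeout_ms (const char *arg, uint64_t *out_ms)
{
  char *end;
  unsigned long long value;
  uint64_t mult;

  if (arg == NULL || !isdigit ((unsigned char) arg[0]))
    return false;                      /* must start with a number */
  errno = 0;
  value = strtoull (arg, &end, 10);
  if (errno == ERANGE)
    return false;

  if (*end == '\0' || strcmp (end, "ms") == 0)
    mult = 1;                          /* no unit: milliseconds */
  else if (strcmp (end, "s") == 0)
    mult = 1000;
  else if (strcmp (end, "min") == 0)
    mult = 60 * 1000;
  else
    return false;                      /* unknown unit */

  if (value > UINT64_MAX / mult)
    return false;                      /* result would overflow */
  *out_ms = (uint64_t) value * mult;
  return true;
}
```

Under this rule, the `-T 20` and `-T 5` runs in the earlier transcripts asked for 20 ms and 5 ms timeouts, which fits their very short run times.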

Christian Grothoff

2013-03-18 13:03

manager   ~0006988

I'm actually fine with the situation as-is.

Christian Grothoff

2013-03-18 13:06

manager   ~0006989

One more remark: I had somehow expected that the various ARM operations (service list, service start/stop) would now return an operation handle that could be used to cancel the operation (or at least the invocation of the callback, since the operation may of course already be in progress in gnunet-service-arm). Is that still going to happen, or is there a reason not to do this?
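A common shape for such cancellable operations is a handle kept in a list of pending requests and matched to replies by an ID. Cancelling unlinks the handle, so a late reply is simply dropped and the callback never runs. The sketch below is hypothetical: `ArmHandle`, `ArmOperation`, `arm_op_start`, `arm_op_cancel`, `arm_handle_reply` and `record_status` are illustrative names, not the GNUnet ARM API, and the actual message transmission is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

typedef void (*ArmResultCallback) (void *cls, int status);

struct ArmOperation
{
  struct ArmOperation *next;
  uint32_t id;                   /* matches a reply to its request */
  ArmResultCallback cb;
  void *cb_cls;
};

struct ArmHandle
{
  struct ArmOperation *pending;  /* requests awaiting a reply */
  uint32_t next_id;
};

/* Queue a request and hand back a handle the caller may cancel. */
struct ArmOperation *
arm_op_start (struct ArmHandle *h, ArmResultCallback cb, void *cb_cls)
{
  struct ArmOperation *op = malloc (sizeof *op);

  if (op == NULL)
    return NULL;
  op->id = h->next_id++;
  op->cb = cb;
  op->cb_cls = cb_cls;
  op->next = h->pending;
  h->pending = op;
  /* A real client would now transmit the request tagged with op->id. */
  return op;
}

/* Remove the operation with `id` from the pending list, if present. */
static struct ArmOperation *
unlink_op (struct ArmHandle *h, uint32_t id)
{
  for (struct ArmOperation **p = &h->pending; *p != NULL; p = &(*p)->next)
    if ((*p)->id == id)
    {
      struct ArmOperation *op = *p;
      *p = op->next;
      return op;
    }
  return NULL;
}

/* Cancel a still-pending operation: its callback will never run,
 * even if the service replies later. */
void
arm_op_cancel (struct ArmHandle *h, struct ArmOperation *op)
{
  free (unlink_op (h, op->id));
}

/* Called when a reply for request `id` arrives from the service. */
void
arm_handle_reply (struct ArmHandle *h, uint32_t id, int status)
{
  struct ArmOperation *op = unlink_op (h, id);

  if (op == NULL)
    return;                      /* cancelled earlier: drop the reply */
  op->cb (op->cb_cls, status);
  free (op);
}

/* Demo callback: records the status it was given. */
void
record_status (void *cls, int status)
{
  *(int *) cls = status;
}
```

This matches the "abort" button case discussed later. Cancelling only guarantees that the caller is not called back; gnunet-service-arm may still process the request.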

Christian Grothoff

2013-03-18 13:07

manager   ~0006990

Oh, and I don't understand the split in 'monitor_alloc' and 'monitor'. That should be ONE function. There is no reason to split the API like that AFAIK.

LRN

2013-03-18 13:14

reporter   ~0006991

I can add operation handles, if you think there's a demand for them.

_monitor_alloc and _monitor are split because _alloc and _connect are split too (I made a copy of that code, then adapted it to monitor semantics).
As it stands, they could indeed be merged into one function.

That said, the monitor code has one drawback: it has no "connected/disconnected" callback, which means that it can't report misconfiguration errors (i.e. when CLIENT_connect() fails due to a bad config). If I add a connection status callback to _monitor(), then the split between _monitor_alloc() and _monitor() may be necessary after all.

Christian Grothoff

2013-03-18 13:32

manager   ~0006992

I think operation handles would make sense; how would you start a service in the GUI and handle the case that the user hits the "abort" button because ARM takes too long to respond?

Also, I think the 'ARM_connect' function is already a misnomer, as it is not the dual of 'ARM_disconnect'. ARM_connect should be "list_services" or something like that. (and maybe the ARM_disconnect should be renamed to 'ARM_free', to mirror 'ARM_alloc'). In any case, there is no need for this in the monitor API, so we should reduce the number of functions there.

The monitor API is fine with respect to misconfiguration: all you do is return 'NULL' from monitor_start IFF CLIENT_connect fails (returns NULL). Otherwise, the configuration is fine and CLIENT_connect can always be (safely) re-tried (reconnect, etc.). All of our APIs are going towards NOT reporting asynchronous errors for CLIENT_connect but only return NULL if the cfg is bogus.
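The convention described here can be sketched as a single monitor entry point with no separate allocation step. A bogus configuration is reported synchronously by returning NULL, and any later connection loss is silently retried. All names below (`Config`, `client_connect`, `monitor_start`, `monitor_handle_disconnect`, `monitor_stop`) are hypothetical stand-ins. In particular, `client_connect` only validates a made-up `service_address` field instead of opening a real connection.

```c
#include <stdlib.h>

/* Hypothetical configuration: only a service address is modelled. */
struct Config
{
  const char *service_address;
};

/* Hypothetical connection object. */
struct Connection
{
  const struct Config *cfg;
};

/* Stand-in for CLIENT_connect: fails (NULL) only if the configuration
 * is unusable; it never waits for the service to be reachable. */
static struct Connection *
client_connect (const struct Config *cfg)
{
  if (cfg == NULL || cfg->service_address == NULL
      || cfg->service_address[0] == '\0')
    return NULL;
  struct Connection *conn = malloc (sizeof *conn);
  if (conn != NULL)
    conn->cfg = cfg;
  return conn;
}

typedef void (*MonitorCallback) (void *cls, const char *service, int status);

struct Monitor
{
  struct Connection *conn;
  const struct Config *cfg;
  MonitorCallback cb;
  void *cb_cls;
};

/* One call replaces the _monitor_alloc/_monitor pair: NULL means
 * "configuration is bogus"; there is no asynchronous error callback. */
struct Monitor *
monitor_start (const struct Config *cfg, MonitorCallback cb, void *cb_cls)
{
  struct Connection *conn = client_connect (cfg);
  if (conn == NULL)
    return NULL;
  struct Monitor *m = malloc (sizeof *m);
  if (m == NULL)
  {
    free (conn);
    return NULL;
  }
  m->conn = conn;
  m->cfg = cfg;
  m->cb = cb;
  m->cb_cls = cb_cls;
  return m;
}

/* On connection loss: the config was already validated, so simply
 * reconnect (a real client would schedule this with a backoff). */
void
monitor_handle_disconnect (struct Monitor *m)
{
  free (m->conn);
  m->conn = client_connect (m->cfg);
}

void
monitor_stop (struct Monitor *m)
{
  free (m->conn);
  free (m);
}
```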

LRN

2013-03-18 13:44

reporter   ~0006993

Wait, what? "ARM_connect" -> "list_services"?
ARM_connect establishes a connection to ARM, and tells you when it goes up or down. This connection can be used to start/stop/list services.
ARM_disconnect disconnects, and also frees the handle. So, why "list_services"?

Maybe simply renaming "ARM_disconnect" to "ARM_disconnect_and_free" would do?

Christian Grothoff

2013-03-22 09:50

manager   ~0007001

Ok, so can we call this bug resolved? I don't see anything urgent that is left...

LRN

2013-03-22 10:39

reporter   ~0007002

If you'd like things to remain as-is, then yes.
I haven't implemented operation handles yet, but since they are not needed for anything (yet), they can wait (and are not really related to the original issue).

Call it resolved.

Issue History

Date Modified Username Field Change
2013-03-06 22:38 LRN New Issue
2013-03-14 08:35 LRN Relationship added has duplicate 0002827
2013-03-14 08:35 LRN Note Added: 0006964
2013-03-14 09:41 Matthias Wachs Note Added: 0006968
2013-03-14 09:42 Matthias Wachs Note Edited: 0006968
2013-03-14 09:44 Matthias Wachs Note Added: 0006969
2013-03-14 10:18 LRN Note Added: 0006970
2013-03-14 10:30 Matthias Wachs Note Added: 0006973
2013-03-14 10:31 Matthias Wachs Note Edited: 0006973
2013-03-14 10:53 LRN Note Added: 0006974
2013-03-14 13:03 Christian Grothoff Note Added: 0006976
2013-03-14 13:40 LRN Note Added: 0006977
2013-03-18 13:03 Christian Grothoff Note Added: 0006988
2013-03-18 13:06 Christian Grothoff Note Added: 0006989
2013-03-18 13:07 Christian Grothoff Note Added: 0006990
2013-03-18 13:14 LRN Note Added: 0006991
2013-03-18 13:32 Christian Grothoff Note Added: 0006992
2013-03-18 13:44 LRN Note Added: 0006993
2013-03-22 09:50 Christian Grothoff Note Added: 0007001
2013-03-22 09:50 Christian Grothoff Assigned To => LRN
2013-03-22 09:50 Christian Grothoff Status new => feedback
2013-03-22 10:39 LRN Note Added: 0007002
2013-03-22 10:39 LRN Status feedback => assigned
2013-03-22 10:42 Christian Grothoff Status assigned => resolved
2013-03-22 10:42 Christian Grothoff Fixed in Version => 0.10.0
2013-03-22 10:42 Christian Grothoff Resolution open => fixed
2013-03-22 10:42 Christian Grothoff Product Version => Git master
2013-03-22 10:42 Christian Grothoff Target Version => 0.10.0
2013-12-24 20:55 Christian Grothoff Status resolved => closed