View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0002819 | GNUnet | ARM service | public | 2013-03-06 22:38 | 2013-12-24 20:55 |
| Reporter | LRN | Assigned To | LRN | ||
| Priority | normal | Severity | minor | Reproducibility | always |
| Status | closed | Resolution | fixed | ||
| Product Version | Git master | ||||
| Target Version | 0.10.0 | Fixed in Version | 0.10.0 | ||
| Summary | 0002819: gnunet-arm -I never times out when arm is not running | ||||
| Description | subj | ||||
| Additional Information | Within a few seconds (2-5, regardless of the value given to -T), this is printed to the log: INFO Trying to connect to `[::1]:2087' (0x4f73b8) INFO Trying to connect to `127.0.0.1:2087' (0x4f73b8) INFO Failed to establish TCP connection to `localhost:2087', no further addresses to try. ERROR Assertion failed at arm_api.c:683. Error communicating with ARM. ARM not running? INFO Trying to connect to `[::1]:2087' (0x4efcc8) INFO Trying to connect to `127.0.0.1:2087' (0x4efcc8) INFO Failed to establish TCP connection to `localhost:2087', no further addresses to try. After that, the process hangs until it is killed. | ||||
| Tags | No tags attached. | ||||
|
|
Should now work with recent ARM changes. |
|
|
mwachs@fulcrum:~/gnunet/gnunet-debug$ time gnunet-arm -I -T 20 Failed to request a list of services: Request timed out Error communicating with ARM. ARM not running? real 0m0.024s user 0m0.004s sys 0m0.000s Working! |
|
|
But with -e I still have to interrupt when not using a timeout: mwachs@fulcrum:~/gnunet/gnunet-debug$ time gnunet-arm -e ^CFailed to send a stop request to the ARM service: We disconnected from ARM before we could send a request !!!-> real 0m40.313s user 0m0.008s sys 0m0.000s vs mwachs@fulcrum:~/gnunet/gnunet-debug$ time gnunet-arm -e -T 5 Failed to send a stop request to the ARM service: Request timed out !!!-> real 0m0.010s user 0m0.000s sys 0m0.004s |
|
|
That really depends on which behaviour we want. Right now gnunet-arm doesn't do any smart-guessing, it just does what it is told, in a particular order (in case you ask multiple things at once). Which means that any operation other than "Start ARM" will hang (until it times out; default timeout is 1-2 minutes) if ARM is not running already. It does try to connect to ARM over that period of time, so if ARM is started (by something else), it will eventually connect (do note that the reconnect delay increases exponentially) and do what it was told to do. So... what behaviour do you want instead of that? Should it immediately refuse to do anything (other than "Start ARM") if ARM is not running (and if "Start ARM" is not among the things it was told to do)? |
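The retry behaviour described in the note above (keep trying to connect, with a reconnect delay that grows exponentially, until the overall timeout expires) can be sketched roughly as follows. This is an illustration only, not the gnunet-arm code: `connect_with_backoff`, `try_connect`, the 0.1 s starting delay, and the doubling factor are all assumptions made for the example.

```python
import time
from typing import Callable


def connect_with_backoff(
    try_connect: Callable[[], bool],
    timeout_s: float,
    initial_delay_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_connect() until it succeeds or timeout_s has elapsed.

    The pause between attempts doubles after every failure. The starting
    delay and the factor of 2 are illustrative values, not gnunet-arm's.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if try_connect():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Overall timeout reached: give up instead of hanging forever.
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay *= 2
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; a caller would normally leave them at their defaults.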
|
|
From my perspective (user perspective): If arm is not running and I called gnunet-arm -e: Immediate shutdown + "Error communicating with ARM. ARM not running?" Even better just a simplified message: "ARM not running?" |
|
|
What if i run: gnunet-arm -k resolver -r (stop `resolver' service, restart ARM)? gnunet-arm -r (restart ARM)? gnunet-arm -e -s (stop ARM, start ARM)? What should it do if ARM is not running when you run gnunet-arm with these arguments? Right now gnunet-arm is hard-coded to do all operations in this order: 1) kill a service, if asked to kill a service 2) stop ARM, if asked to stop ARM or restart ARM 3) start ARM, if asked to start ARM or restart ARM 4) run a service, if asked to run a service 5) get a list of running services, if asked to get a list of running services 6) finish (gnunet-arm exits). Since there is no API to check whether ARM is running or not (other than running a service test), if ARM isn't running, step (1) will always hang until ARM is started, or until it times out. This can be fixed by running a service test immediately on startup, and using the result to determine whether to do any of the actions the user asked for. Still, even if we know that ARM is not running, that leaves the question: what should we do? Especially if '-s' or '-r' is one of the arguments. |
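To make the ordering problem concrete, here is a small model of the hard-coded sequence listed above, plus a check for which step would block if a service test at startup reported that ARM is down. All names (`plan_operations`, `first_blocked_step`, the step strings) are made up for this sketch, and it deliberately treats "start ARM" as making ARM available immediately, which the real tool cannot assume.

```python
from typing import List, Optional


def plan_operations(
    kill: str = "",
    stop_arm: bool = False,
    start_arm: bool = False,
    restart_arm: bool = False,
    run: str = "",
    list_services: bool = False,
) -> List[str]:
    """Return the requested steps in the fixed order described in the note."""
    steps = []
    if kill:
        steps.append(f"kill service {kill}")
    if stop_arm or restart_arm:
        steps.append("stop ARM")
    if start_arm or restart_arm:
        steps.append("start ARM")
    if run:
        steps.append(f"run service {run}")
    if list_services:
        steps.append("list services")
    return steps


def first_blocked_step(steps: List[str], arm_running: bool) -> Optional[str]:
    """Return the first step that needs ARM while ARM is down, or None.

    Simplification: "start ARM" is assumed to bring ARM up at once.
    """
    for step in steps:
        if step == "start ARM":
            arm_running = True
            continue
        if not arm_running:
            return step
    return None
```

With ARM down, `-k resolver -r` maps to `plan_operations(kill="resolver", restart_arm=True)`, and the first blocked step is the kill, because it comes before the start. The open question from the note (what to do once that is known) is not answered by the sketch.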
|
|
The bigger issue is this: you cannot quickly tell that ARM is not running. ARM might just be starting, and it might just take a while for ARM to start listening. Imagine this shell command: $ gnunet-arm -s; gnunet-arm -e This *must* succeed with ARM not running anymore in the end on virtually all systems. gnunet-arm -s will terminate, possibly *before* gnunet-service-arm has opened the listen socket. Thus gnunet-arm -e must wait some time to be sure ARM is not just 'starting' / taking a while. |
|
|
Options: 1) Make sure -s doesn't return until ARM is really ready. 2) Make sure gnunet-arm waits enough time (how much is 'enough'?) before it concludes that ARM isn't running. (1) is trivial to implement, for -s. A pity we can't implement the same for -i. (2) requires some measurements. What is the worst startup time for ARM? Also: A) Keep things as-is (i.e. if you ask gnunet-arm to do things that require an ARM connection, and ARM is not running, then it tries until it times out). B) Keep things as-is, except when -s is also specified. If it is, start ARM first (regardless of what other options are specified), then do the rest. (A) is simple and straightforward (users always have the option of using -T to prevent gnunet-arm from hanging for too long). (B) allows users to just add `-s' everywhere, so if ARM isn't running, it will be started. That said, i'm not sure how ARM will perform if multiple instances are started concurrently (which one will win? Is there a possibility that none will be able to initialize correctly due to massive toe-stepping?). P.S. Do note that -T takes milliseconds, unless you explicitly add a time unit (i.e. "-T 5s" is 5 seconds, "-T 5" is 5 milliseconds). |
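The P.S. about -T is easy to trip over, so here is a tiny sketch of the rule as stated: a bare number is milliseconds, and an "s" suffix means seconds. `parse_timeout_ms` is a hypothetical helper, not the parser gnunet-arm uses, and it rejects every other suffix simply because the note names no other units.

```python
import re


def parse_timeout_ms(text: str) -> int:
    """Convert a -T style value to milliseconds.

    Covers only the two forms named in the note: "5" (milliseconds)
    and "5s" (seconds). Anything else raises ValueError.
    """
    match = re.fullmatch(r"(\d+)(s?)", text.strip())
    if match is None:
        raise ValueError(f"unsupported timeout value: {text!r}")
    number, unit = match.groups()
    # No suffix means milliseconds; "s" means seconds.
    return int(number) * (1000 if unit == "s" else 1)
```

Read this way, the earlier `gnunet-arm -I -T 20` run asked for a 20 millisecond timeout, which is consistent with it returning after 0.024s.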
|
|
I'm actually fine with the situation as-is. |
|
|
One more remark: somehow I had expected that for the various ARM operations (service list, service start/stop) there'd now be some operation handle returned that could be used to cancel the operation (or at least, the calling of the callback, as the operation may of course already be in progress in gnunet-service-arm). Is that still going to happen, or is there a reason not to do this? |
|
|
Oh, and I don't understand the split in 'monitor_alloc' and 'monitor'. That should be ONE function. There is no reason to split the API like that AFAIK. |
|
|
I can add operation handles, if you think there's a demand for them. _monitor_alloc and _monitor are split because _alloc and _connect are split too (i've made a copy of the code, then adjusted it to monitor semantics). As-is, they may be merged into one function indeed. That said, monitor code has one bad feature - it has no "connected/disconnected" callback, which means that it can't report misconfiguration errors (i.e. when CLIENT_connect() fails due to bad config). If i add connection status callback to _monitor(), then the split between _monitor_alloc() and _monitor() may be necessary after all. |
|
|
I think operation handles would make sense; how would you start a service in the GUI and handle the case that the user hits the "abort" button because ARM takes too long to respond? Also, I think the 'ARM_connect' function is already a misnomer, as it is not the dual of 'ARM_disconnect'. ARM_connect should be "list_services" or something like that. (and maybe the ARM_disconnect should be renamed to 'ARM_free', to mirror 'ARM_alloc'). In any case, there is no need for this in the monitor API, so we should reduce the number of functions there. The monitor API is fine with respect to misconfiguration: all you do is return 'NULL' from monitor_start IFF CLIENT_connect fails (returns NULL). Otherwise, the configuration is fine and CLIENT_connect can always be (safely) re-tried (reconnect, etc.). All of our APIs are going towards NOT reporting asynchronous errors for CLIENT_connect but only return NULL if the cfg is bogus. |
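For the GUI "abort" case, an operation handle could look roughly like the sketch below: every request returns a handle, and cancelling the handle only guarantees that the callback is never invoked, since the request itself may already be in progress on the service side. `Operation`, `FakeArmClient`, `request_start`, and `deliver_replies` are hypothetical names for this illustration and are not part of the ARM API.

```python
from typing import Callable, List, Optional, Tuple


class Operation:
    """Handle for one pending request; cancel() suppresses its callback."""

    def __init__(self, callback: Callable[[str], None]) -> None:
        self._callback: Optional[Callable[[str], None]] = callback

    def cancel(self) -> None:
        # The request may already be running remotely; we only promise
        # that the caller will not be called back.
        self._callback = None

    def _complete(self, result: str) -> None:
        callback, self._callback = self._callback, None
        if callback is not None:
            callback(result)


class FakeArmClient:
    """Stand-in for an ARM connection; replies arrive on deliver_replies()."""

    def __init__(self) -> None:
        self._pending: List[Tuple[Operation, str]] = []

    def request_start(self, service: str, callback: Callable[[str], None]) -> Operation:
        op = Operation(callback)
        self._pending.append((op, f"{service} started"))
        return op

    def deliver_replies(self) -> None:
        pending, self._pending = self._pending, []
        for op, result in pending:
            op._complete(result)
```

A GUI would keep the returned `Operation` next to its progress dialog and call `cancel()` when the user hits "abort"; a late reply from the service is then dropped silently.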
|
|
Wait, what? "ARM_connect" -> "list_services"? ARM_connect establishes a connection to ARM, and tells you when it goes up or down. This connection can be used to start/stop/list services. ARM_disconnect disconnects, and also frees the handle. So, why "list_services"? Maybe simply renaming "ARM_disconnect" to "ARM_disconnect_and_free" would do? |
|
|
Ok, so can we call this bug resolved? I don't see anything urgent that is left... |
|
|
If you like things to remain as-is - then yes. I haven't implemented operation handles yet, but since they are not needed for anything (yet), they can wait (and are not really related to the original issue). Call it resolved. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2013-03-06 22:38 | LRN | New Issue | |
| 2013-03-14 08:35 | LRN | Relationship added | has duplicate 0002827 |
| 2013-03-14 08:35 | LRN | Note Added: 0006964 | |
| 2013-03-14 09:41 | Matthias Wachs | Note Added: 0006968 | |
| 2013-03-14 09:42 | Matthias Wachs | Note Edited: 0006968 | |
| 2013-03-14 09:44 | Matthias Wachs | Note Added: 0006969 | |
| 2013-03-14 10:18 | LRN | Note Added: 0006970 | |
| 2013-03-14 10:30 | Matthias Wachs | Note Added: 0006973 | |
| 2013-03-14 10:31 | Matthias Wachs | Note Edited: 0006973 | |
| 2013-03-14 10:53 | LRN | Note Added: 0006974 | |
| 2013-03-14 13:03 | Christian Grothoff | Note Added: 0006976 | |
| 2013-03-14 13:40 | LRN | Note Added: 0006977 | |
| 2013-03-18 13:03 | Christian Grothoff | Note Added: 0006988 | |
| 2013-03-18 13:06 | Christian Grothoff | Note Added: 0006989 | |
| 2013-03-18 13:07 | Christian Grothoff | Note Added: 0006990 | |
| 2013-03-18 13:14 | LRN | Note Added: 0006991 | |
| 2013-03-18 13:32 | Christian Grothoff | Note Added: 0006992 | |
| 2013-03-18 13:44 | LRN | Note Added: 0006993 | |
| 2013-03-22 09:50 | Christian Grothoff | Note Added: 0007001 | |
| 2013-03-22 09:50 | Christian Grothoff | Assigned To | => LRN |
| 2013-03-22 09:50 | Christian Grothoff | Status | new => feedback |
| 2013-03-22 10:39 | LRN | Note Added: 0007002 | |
| 2013-03-22 10:39 | LRN | Status | feedback => assigned |
| 2013-03-22 10:42 | Christian Grothoff | Status | assigned => resolved |
| 2013-03-22 10:42 | Christian Grothoff | Fixed in Version | => 0.10.0 |
| 2013-03-22 10:42 | Christian Grothoff | Resolution | open => fixed |
| 2013-03-22 10:42 | Christian Grothoff | Product Version | => Git master |
| 2013-03-22 10:42 | Christian Grothoff | Target Version | => 0.10.0 |
| 2013-12-24 20:55 | Christian Grothoff | Status | resolved => closed |