View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003799 | GNUnet | other | public | 2015-05-29 02:09 | 2018-06-07 00:24 |
Reporter | jah | Assigned To | Christian Grothoff | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | x86_64 | OS | Trisquel | OS Version | 7.0, Belenos |
Product Version | Git master | ||||
Target Version | 0.11.0pre66 | Fixed in Version | 0.11.0pre66 | ||
Summary | 0003799: gnunet-publish client hangs after unindex error | ||||
Description | The publish-service won't publish files (for some reason as yet undetermined) and the client receives a GNUNET_FS_STATUS_UNINDEX_START which causes it to hang until the user sends SIGINT. Then the client receives GNUNET_FS_STATUS_UNINDEX_ERROR which again causes it to hang until a further SIGINT. Then the client process ends with a non-zero exit status. | ||||
Steps To Reproduce | $ gnunet-publish -s Music/some.mp3 Publishing `/path/to/Music/some.mp3' done. URI is `gnunet://fs/chk/DF8D[snipped].5853556'. $ gnunet-publish -V Music/some.mp3 Preprocessing complete. Ctrl+c ^CError publishing: Insufficient space for publishing: Failed to receive status response from database.. Unexpected status: 30 Ctrl+c ^CUnexpected status: 34 | ||||
Additional Information | I've attached the output from a gdb session: publish.gdb.out. Gnunet is in multi-user mode and publishing is being attempted as a user in the gnunet group. I've no idea why publishing isn't working, but the problem isn't relevant to this issue with the client anyway... In gnunet-publish-gtk the issue manifests as a progress bar stuck at 0%. | ||||
Tags | No tags attached. | ||||
Attached Files | gnunet-publish_unistop.patch (479 bytes)
Index: gnunet-publish.c =================================================================== --- gnunet-publish.c (revision 35841) +++ gnunet-publish.c (working copy) @@ -312,6 +312,9 @@ _("Cleanup after abort failed.\n")); GNUNET_FS_unindex_stop (info->value.unindex.uc); return NULL; + case GNUNET_FS_STATUS_UNINDEX_STOPPED: + GNUNET_break (NULL == pc); + return NULL; default: FPRINTF (stderr, _("Unexpected status: %d\n"), | ||||
|
I think the real problem here is that your user is not in the group 'gnunet', and that's why you are hitting some (admittedly buggy) error handling paths in the logic. So add your $USER to the 'gnunet' group so that you can access the datastore, re-login and hopefully this will 'go away'. |
|
Ok, tried to fix the error handling in SvN 35841, so if you didn't fix the 'basic' issue of your user not being in 'gnunet', I'd appreciate feedback on whether the fix works. |
|
r35841 doesn't help much:- $ gnunet-publish -V Music/various/08-luniz-i_got_5_on_it.mp3 Preprocessing complete. <hangs> Ctrl-c ^CError publishing: Insufficient space for publishing: Failed to receive status response from database.. Starting cleanup after abort <hangs> Ctrl-c ^CCleanup after abort failed. Unexpected status: 36 $ By the way, the user already is in the gnunet group:- $ id -G gnunet 130 $ id -G 1000 4 24 27 30 46 109 122 130 |
|
Hmm. That suggests my initial diagnostic of where the issue came from was wrong. What do you get for gnunet-statistics -s datastore |
|
To better reproduce the issue: set GNUNET_RUNTIME_DIR in the system's config to something other than /tmp/gnunet-system-runtime (e.g. /tmp/gnunet). Attached is publish.debug.log which was run through the following to remove config loading messages:- grep -v 'DEBUG Asked' | grep -v 'DEBUG Retrieved' | grep -v 'DEBUG Expanded' | grep -v 'DEBUG Config' | grep -iv 'deserializing' | grep -iv tmp SIGINTs were sent at 'May 30 13:13:01' and 'May 30 13:18:25', which roughly correspond to line 5362 and the end of the file. |
|
The reason this issue happened for me was that I'd set GNUNET_RUNTIME_DIR incorrectly so that the client processes were trying to talk over the default named sockets, but the services were configured differently (and erroneously). I would have realised this sooner if I'd run gnunet-publish with DEBUG logging. |
|
It looks as if the requests to the datastore only timed-out because of the SIGINTs:- grep -i datastore publish.debug.log | grep -v 'Failed to connect' ... May 30 13:13:01-381249 datastore-api-5012 DEBUG Timeout of request in datastore queue ... May 30 13:18:25-436292 datastore-api-5012 DEBUG Timeout of request in datastore queue ... |
|
Ok, so that's still equivalent to not being in the 'gnunet' group: you didn't communicate because you had the wrong path, I suspected you had the wrong permissions. Still, the error handling should be fixed. |
|
Is the expected behaviour of the client to attempt repeatedly attempt to transmit over a unix socket that has no listener, forever? Would it make more sense to detect the absence of anything listening at the other end? |
|
Well, the problem is that the listener may (re)start anytime. So in general GNUnet's client APIs all are supposed to always keep trying until it works. So yes, this is expected behavior. |
|
Understood, thank you. Attached is a patch to handle GNUNET_FS_STATUS_UNINDEX_STOPPED (see #c9190, above). |
|
Is the problem fixed with that patch? (Still need to look at it...) |
|
My patch just handles one more status message, but please do check that it's the right thing to do. As to your r35841, I will examine the difference it makes more closely and come back to you. Certainly the behaviour now is "Sending REMOVE request to DATASTORE service" (see publish.debug.log:5470) after sending the first of the two SIGINTs required to halt the client. |
|
Well, the line + GNUNET_break (NULL == pc); is a bit unnatural in the patch, as theoretically the unindex could complete before the publish stop is finished. So I'll leave that one out. Also, we should fix it so we do not need two CTRL-C. I'll look into this, now that the diagnostic is a bit clearer. |
|
Reproduced it by simply not starting the peer. Fixed the need for double-CTRL-C by finding the 2nd place where the unindex-on-error was initiated and then skipping unindex if we never stored anything in the DB (which is the case on connect failure). Also improved the error message (SVN 35849). Thanks for reporting! |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-05-29 02:09 | jah | New Issue | |
2015-05-29 02:09 | jah | File Added: publish.gdb.out | |
2015-05-29 14:07 | Christian Grothoff | Assigned To | => Christian Grothoff |
2015-05-29 14:07 | Christian Grothoff | Status | new => assigned |
2015-05-29 14:46 | Christian Grothoff | Note Added: 0009170 | |
2015-05-29 15:07 | Christian Grothoff | Note Added: 0009179 | |
2015-05-29 15:07 | Christian Grothoff | Status | assigned => feedback |
2015-05-29 19:24 | jah | Note Added: 0009190 | |
2015-05-29 19:24 | jah | Status | feedback => assigned |
2015-05-29 20:43 | Christian Grothoff | Note Added: 0009191 | |
2015-05-30 14:29 | jah | File Added: publish.debug.log | |
2015-05-30 14:29 | jah | Note Added: 0009195 | |
2015-05-30 14:38 | jah | Note Added: 0009196 | |
2015-05-30 15:19 | jah | Note Added: 0009197 | |
2015-05-30 21:25 | jah | Note Edited: 0009197 | |
2015-05-31 12:58 | Christian Grothoff | Note Added: 0009198 | |
2015-05-31 16:58 | jah | Note Added: 0009205 | |
2015-05-31 17:15 | Christian Grothoff | Note Added: 0009207 | |
2015-05-31 18:10 | jah | File Added: gnunet-publish_unistop.patch | |
2015-05-31 18:12 | jah | Note Added: 0009208 | |
2015-05-31 18:59 | Christian Grothoff | Note Added: 0009209 | |
2015-05-31 19:28 | jah | Note Added: 0009210 | |
2015-05-31 21:36 | Christian Grothoff | Note Added: 0009213 | |
2015-05-31 21:45 | Christian Grothoff | Note Added: 0009214 | |
2015-05-31 21:45 | Christian Grothoff | Status | assigned => resolved |
2015-05-31 21:45 | Christian Grothoff | Fixed in Version | => 0.11.0pre66 |
2015-05-31 21:45 | Christian Grothoff | Resolution | open => fixed |
2015-05-31 21:45 | Christian Grothoff | Target Version | => 0.11.0pre66 |
2018-06-07 00:24 | Christian Grothoff | Status | resolved => closed |