View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004690 | GNUnet | rps service | public | 2016-09-27 18:33 | 2018-06-07 00:24 |
Reporter | ch3 | Assigned To | ch3 | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | Git master | ||||
Target Version | 0.11.0pre66 | Fixed in Version | 0.11.0pre66 | ||
Summary | 0004690: problems sending rps-service -> rps-api | ||||
Description | It seems like every message from the service to the api fails. (The other way round there is no problem at all.) To be able to look at the vack trace, I put GNUNET_assert (0); in the mq_error_handler () in the rps_api.c As far as I can tell from looking at the code, receive_helper (cls=0xaa3ae0, buf=0x7ffc6acce1b0, available=0, addr=0xaa42c0, addrlen=110, errCode=0) at client.c:533 decides, that this signal timed out. This is caused by being called with available=0 from receive_ready (cls=0xaa4340) at connection.c:1189. The 0 comes from a call to GNUNET_NETWORK_socket_recv (connection->sock, buffer, connection->max); previously in receive_ready () at connection.c:1170. From this point on I don't understand what's happening. The cause is probably some simple mistake I made during the migration to service_new. (gdb) bt #0 0x00007fc52704e04f in raise () from /usr/lib/libc.so.6 #1 0x00007fc52704f47a in abort () from /usr/lib/libc.so.6 #2 0x00007fc5293f25f7 in GNUNET_abort_ () at common_logging.c:293 #3 0x00007fc52966d549 in mq_error_handler (cls=0xab50f0, error=GNUNET_MQ_ERROR_READ) at rps_api.c:202 #4 0x00007fc529420945 in GNUNET_MQ_inject_error (mq=0xaa3b60, error=GNUNET_MQ_ERROR_READ) at mq.c:330 #5 0x00007fc529421d85 in handle_client_message (cls=0xaa3b60, msg=0x0) at mq.c:758 #6 0x00007fc5293eea41 in receive_helper (cls=0xaa3ae0, buf=0x7ffc6acce1b0, available=0, addr=0xaa42c0, addrlen=110, errCode=0) at client.c:533 #7 0x00007fc5293fb7e9 in receive_ready (cls=0xaa4340) at connection.c:1189 #8 0x00007fc529433107 in run_ready (rs=0x9e4b40, ws=0x9e4bd0) at scheduler.c:620 #9 0x00007fc529433a52 in GNUNET_SCHEDULER_run (task=0x7fc52942d93f <program_main>, task_cls=0x7ffc6acde4d0) at scheduler.c:887 #10 0x00007fc52942e517 in GNUNET_PROGRAM_run2 (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>, task_cls=0x9e3a00, run_without_scheduler=0) at program.c:300 #11 0x00007fc52942e66e in GNUNET_PROGRAM_run (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>, task_cls=0x9e3a00) at program.c:339 #12 0x00007fc5291cccc6 in GNUNET_TESTBED_test_run (testname=0x408381 "test-rps-single-req", cfg_filename=0x40856b "test_rps.conf", num_peers=5, event_mask=0, cc=0x0, cc_cls=0x0, test_master=0x405e47 <run>, test_master_cls=0x0) at testbed_api_test.c:151 #13 0x0000000000406ba7 in main (argc=1, argv=0x7ffc6acde7f8) at test_rps.c:1532 | ||||
Steps To Reproduce | put GNUNET_assert (0); into the mq_error_handler () at rps_api.c:192 and run one of test_rps_* | ||||
Additional Information | (gdb) bt f #0 0x00007fc52704e04f in raise () from /usr/lib/libc.so.6 No symbol table info available. #1 0x00007fc52704f47a in abort () from /usr/lib/libc.so.6 No symbol table info available. #2 0x00007fc5293f25f7 in GNUNET_abort_ () at common_logging.c:293 No locals. #3 0x00007fc52966d549 in mq_error_handler (cls=0xab50f0, error=GNUNET_MQ_ERROR_READ) at rps_api.c:202 h = 0xab50f0 __FUNCTION__ = "mq_error_handler" #4 0x00007fc529420945 in GNUNET_MQ_inject_error (mq=0xaa3b60, error=GNUNET_MQ_ERROR_READ) at mq.c:330 __FUNCTION__ = "GNUNET_MQ_inject_error" #5 0x00007fc529421d85 in handle_client_message (cls=0xaa3b60, msg=0x0) at mq.c:758 mq = 0xaa3b60 state = 0xab5180 #6 0x00007fc5293eea41 in receive_helper (cls=0xaa3ae0, buf=0x7ffc6acce1b0, available=0, addr=0xaa42c0, addrlen=110, errCode=0) at client.c:533 client = 0xaa3ae0 remaining = {rel_value_us = 18446744073709551615} receive_handler = 0x7fc529421d49 <handle_client_message> receive_handler_cls = 0xaa3b60 __FUNCTION__ = "receive_helper" #7 0x00007fc5293fb7e9 in receive_ready (cls=0xaa4340) at connection.c:1189 connection = 0xaa4340 tc = 0x7fc52966b0b0 <tc> buffer = "\022-\001\335\000\000\000\004\000\000\000\000\000\000\000\065\327\213!\005\tFI\377\vK\245\222\370\340\377\336(\254\335\366\376aO\005K\343\341D\347\337\065\002LZx\001\305\\[s\342H\262~\327\257\360D\354y\331X\256mc\217;\346A\200\000E\v\211\021\242۽\035\016\205\f\262\315\016\267E\302\333ލ9\277\375|Y\027\251@\302\b\214\367\304\304L\230\272dVf\345\275R\363c\276\231\305\323q\020\305\367\232\336j\031\003\317\357\270N\277q\361\333\305\355m\355\263:\206\241Z\375\272\\\305?\265\317y\177i=g\350\331z\337\300\312\331r\034̞\227Q\254\215l\363\316\357\353^\253\347w\315\066\246\276\033Cup\264\065\070н\036\326T\342\371"... ret = 0 receiver = 0x7fc5293ee7a9 <receive_helper> __FUNCTION__ = "receive_ready" #8 0x00007fc529433107 in run_ready (rs=0x9e4b40, ws=0x9e4bd0) at scheduler.c:620 p = GNUNET_SCHEDULER_PRIORITY_DEFAULT pos = 0xa923d0 __FUNCTION__ = "run_ready" #9 0x00007fc529433a52 in GNUNET_SCHEDULER_run (task=0x7fc52942d93f <program_main>, task_cls=0x7ffc6acde4d0) at scheduler.c:887 rs = 0x9e4b40 ws = 0x9e4bd0 timeout = {rel_value_us = 27953715} ret = 1 shc_int = 0xa0e090 shc_term = 0xa0e150 shc_quit = 0xa0e2d0 shc_hup = 0xa0e390 shc_pipe = 0xa0e210 last_tr = 395 busy_wait_warning = 0 pr = 0xa0dfb0 c = 91 '[' __FUNCTION__ = "GNUNET_SCHEDULER_run" #10 0x00007fc52942e517 in GNUNET_PROGRAM_run2 (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>, task_cls=0x9e3a00, run_without_scheduler=0) at program.c:300 cc = {args = 0x7ffc6acde688, cfgfile = 0x9e4d90 "test_rps.conf", task = 0x7fc5291cc9c8 <run>, task_cls = 0x9e3a00, cfg = 0x9e3a60} path = 0x9e4a90 "test-rps-single-req-9888" loglev = 0x0 logfile = 0x0 cfg_fn = 0x9e4a70 "~/.config/gnunet.conf" xdg = 0x0 ret = 3 cnt = 5 ---Type <return> to continue, or q <return> to quit--- skew_offset = 0 skew_variance = 0 clock_offset = 0 cfg = 0x9e3a60 defoptions = {{shortName = 99 'c', name = 0x7fc529458310 "config", argumentHelp = 0x7fc529458317 "FILENAME", description = 0x7fc529458320 "use configuration file FILENAME", require_argument = 1, processor = 0x7fc52941c9fa <GNUNET_GETOPT_set_string>, scls = 0x7ffc6acde4d8}, {shortName = 104 'h', name = 0x7fc529458340 "help", argumentHelp = 0x0, description = 0x7fc529458345 "print this help", require_argument = 0, processor = 0x7fc52941c451 <GNUNET_GETOPT_format_help_>, scls = 0x7fc5291d43d9}, {shortName = 76 'L', name = 0x7fc529458355 "log", argumentHelp = 0x7fc529458359 "LOGLEVEL", description = 0x7fc529458368 "configure logging to use LOGLEVEL", require_argument = 1, processor = 0x7fc52941c9fa <GNUNET_GETOPT_set_string>, scls = 0x7ffc6acde4c8}, {shortName = 108 'l', name = 0x7fc52945838a "logfile", argumentHelp = 0x7fc529458392 "LOGFILE", description = 0x7fc5294583a0 "configure logging to write logs to LOGFILE", require_argument = 1, processor = 0x7fc52941c9fa <GNUNET_GETOPT_set_string>, scls = 0x7ffc6acde4c0}, {shortName = 118 'v', name = 0x7fc5294583cb "version", argumentHelp = 0x0, description = 0x7fc5294583d3 "print the version number", require_argument = 0, processor = 0x7fc52941c40b <GNUNET_GETOPT_print_version_>, scls = 0x7fc5294583ec}} allopts = 0x9e4b20 gargs = 0x0 lpfx = 0x9e4c80 "\020v\241" spc = 0x0 __FUNCTION__ = "GNUNET_PROGRAM_run2" #11 0x00007fc52942e66e in GNUNET_PROGRAM_run (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>, task_cls=0x9e3a00) at program.c:339 No locals. #12 0x00007fc5291cccc6 in GNUNET_TESTBED_test_run (testname=0x408381 "test-rps-single-req", cfg_filename=0x40856b "test_rps.conf", num_peers=5, event_mask=0, cc=0x0, cc_cls=0x0, test_master=0x405e47 <run>, test_master_cls=0x0) at testbed_api_test.c:151 argv2 = {0x9e39c0 "test-rps-single-req", 0x7fc5291d43a0 "-c", 0x9e39e0 "test_rps.conf", 0x0} options = {{shortName = 0 '\000', name = 0x0, argumentHelp = 0x0, description = 0x0, require_argument = 0, processor = 0x0, scls = 0x0}} rc = 0x9e3a00 ret = 0 __FUNCTION__ = "GNUNET_TESTBED_test_run" #13 0x0000000000406ba7 in main (argc=1, argv=0x7ffc6acde7f8) at test_rps.c:1532 ret_value = -1467828757 __FUNCTION__ = "main" | ||||
Tags | No tags attached. | ||||
|
> Your service is not starting most likely. The last time I checked, I got nice logs from the service. It was able to receive requests from the client and, according to the logs, it was going to send a reply. > A quick test revealed that if you have it installed it cannot bind to > the UNIXPATH set in rps.conf. I have now set it to > UNIXPATH = $GNUNET_USER_RUNTIME_DIR/gnunet-service-rps.sock > and it works. Did not change a thing for me. > However, the test fails to start the service as well... valgrind is > giving me strange errors. Maybe cleanup the test_rps.conf configuration > before looking at the code... > > Did the test run successfully before the change?? The test did run successfully for a long time. Unfortunately there were quite a few changes before the migration to the service_new that I did not test. |
|
My valgrind tells me: ==4953== Source and destination overlap in memcpy(0x742ba60, 0x742ba80, 64) ==4953== at 0x4C301F3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018) ==4953== by 0x40BDDE: rem_from_list (gnunet-service-rps.c:402) ==4953== by 0x40BDDE: do_round (gnunet-service-rps.c:1994) ==4953== by 0x50841DF: run_ready (scheduler.c:620) ==4953== by 0x50841DF: GNUNET_SCHEDULER_run (scheduler.c:887) ==4953== by 0x50922A0: GNUNET_SERVICE_ruN_ (service_new.c:1724) ==4953== by 0x402CFD: main (gnunet-service-rps.c:2428) ==4953== rvice-rps.c:907) Can you check if it might have to be a memmove there? This code looks fishy (gnunet-service-rps.c:402) |
|
Yes, you are right. Fixed it. (I still want to get rid of the whole function, but other things are more important right now.) Fixed also another issue in the service. Now, the assertion in the api is not triggered anymore, the service seems to be able to transmit some messages to the api and I got some new assertion failures. It's too late now. Will take care about that the next days. |
|
> Now, the assertion in the api is not triggered anymore, the service seems to be able to transmit some messages to the api and I got some new assertion failures. It's too late now. Will take care about that the next days. That was too quick. The assertion triggers again. Everything looks like before. |
|
The assertion in the api does not give you any indication of what is wrong... you have to debug the service. It seems your cadet usage is wrong: Sep 30 08:01:54-492077 rps-7602 INFO Ready to receive requests from clients Sep 30 08:01:55-021007 rps-7602 WARNING Trying to destroy the context of a client that still has pending requests. Going to clean those Sep 30 08:02:23-188606 rps-7602 ERROR Assertion failed at cadet_api.c:1575. |
|
This I am aware of. It is related to this issue: https://gnunet.org/bugs/view.php?id=4622 (Cadet has not yet an implementation for the cancellation of messages via mq. And so I cannot cancel pending messages before the destruction of the channel.) Do you think this is what causes it? I am not sure but I think rps was working when this issue first occured. Sep 30 13:16:20-310095 rps-peers-7481 DEBUG Destroying send channel Sep 30 13:16:20-321921 rps-7481 ERROR Assertion failed at cadet_api.c:1575. Sep 30 13:16:20-323062 cadet-api-7481 WARNING Channel destroyed without cancelling transmission requests Sep 30 13:16:20-324068 mq-7481 WARNING Got error 2, but no handler installed |
|
Can you tell me what message should be called? Because if it is Seed it doesn't really send an answer, right? It even seems to drop the client. I suggest first fully moving to the new API btw and not test halfway through. |
|
I think we are getting closer to the problem here. (I thought I migrated the important parts - please tell me what I still have to do, because for some reason I'm missing it.) > Because if it is Seed it doesn't really send an answer, right? correct. > It even seems to drop the client. Yes, actually I was dropping the client connection after having successfully handled a seed. Thanks for the hint. > I suggest first fully moving to the new API btw and not test halfway through. Sounds good. Problem is that I don't know what I missed. I thought, I did pay attention to everything. |
|
Ok, I might (for now) not resolve the problem of the timeouts, but I will try to come up with a proper handling of timeouts. Up until now, a connection error causes the client to reconnect to the server with the consequence that the client canceles and deletes all the request the clients sent. The caller of the API gets not notified. I see two possible ways of solving that: a) let the client API handle it (resending the requests, ...) without notifying the caller b) let the caller do it. I will try to go for a). |
|
See other APIs. They do a) on reconnect. |
|
client - service communication seems to work now. Still there are problems. |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-09-27 18:33 | ch3 | New Issue | |
2016-09-27 18:33 | ch3 | Status | new => assigned |
2016-09-27 18:33 | ch3 | Assigned To | => ch3 |
2016-09-29 19:43 | ch3 | Note Added: 0011207 | |
2016-09-29 22:15 | schanzen | Note Added: 0011209 | |
2016-09-30 01:49 | ch3 | Note Added: 0011210 | |
2016-09-30 11:01 | ch3 | Note Added: 0011211 | |
2016-09-30 11:07 | schanzen | Note Added: 0011212 | |
2016-09-30 13:30 | ch3 | Note Added: 0011213 | |
2016-09-30 14:02 | schanzen | Note Added: 0011214 | |
2016-09-30 15:30 | ch3 | Note Added: 0011217 | |
2016-10-04 13:07 | ch3 | Note Added: 0011236 | |
2016-10-04 17:06 | schanzen | Note Added: 0011237 | |
2016-10-04 22:25 | ch3 | Note Added: 0011245 | |
2016-10-04 22:27 | ch3 | Status | assigned => resolved |
2016-10-04 22:27 | ch3 | Resolution | open => fixed |
2016-10-04 22:27 | ch3 | Fixed in Version | => Git master |
2016-10-07 00:52 | Christian Grothoff | Fixed in Version | Git master => 0.11.0pre66 |
2016-10-07 00:52 | Christian Grothoff | Target Version | => 0.11.0pre66 |
2018-06-07 00:24 | Christian Grothoff | Status | resolved => closed |