View Issue Details

IDProjectCategoryView StatusLast Update
0004690GNUnetrps servicepublic2018-06-07 00:24
Reporterch3Assigned Toch3 
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Product VersionSVN HEAD 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0004690: problems sending rps-service -> rps-api
DescriptionIt seems like every message from the service to the api fails.
(The other way round there is no problem at all.)

To be able to look at the vack trace, I put
GNUNET_assert (0);
in the mq_error_handler () in the rps_api.c

As far as I can tell from looking at the code,
receive_helper (cls=0xaa3ae0, buf=0x7ffc6acce1b0, available=0, addr=0xaa42c0, addrlen=110, errCode=0) at client.c:533
decides, that this signal timed out.
This is caused by being called with available=0 from
receive_ready (cls=0xaa4340) at connection.c:1189.
The 0 comes from a call to
GNUNET_NETWORK_socket_recv (connection->sock, buffer, connection->max);
previously in receive_ready () at connection.c:1170.
From this point on I don't understand what's happening.

The cause is probably some simple mistake I made during the migration to service_new.

(gdb) bt
#0 0x00007fc52704e04f in raise () from /usr/lib/libc.so.6
#1 0x00007fc52704f47a in abort () from /usr/lib/libc.so.6
#2 0x00007fc5293f25f7 in GNUNET_abort_ () at common_logging.c:293
#3 0x00007fc52966d549 in mq_error_handler (cls=0xab50f0, error=GNUNET_MQ_ERROR_READ) at rps_api.c:202
#4 0x00007fc529420945 in GNUNET_MQ_inject_error (mq=0xaa3b60, error=GNUNET_MQ_ERROR_READ) at mq.c:330
#5 0x00007fc529421d85 in handle_client_message (cls=0xaa3b60, msg=0x0) at mq.c:758
#6 0x00007fc5293eea41 in receive_helper (cls=0xaa3ae0, buf=0x7ffc6acce1b0, available=0, addr=0xaa42c0, addrlen=110, errCode=0) at client.c:533
#7 0x00007fc5293fb7e9 in receive_ready (cls=0xaa4340) at connection.c:1189
#8 0x00007fc529433107 in run_ready (rs=0x9e4b40, ws=0x9e4bd0) at scheduler.c:620
#9 0x00007fc529433a52 in GNUNET_SCHEDULER_run (task=0x7fc52942d93f <program_main>, task_cls=0x7ffc6acde4d0) at scheduler.c:887
#10 0x00007fc52942e517 in GNUNET_PROGRAM_run2 (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>,
    task_cls=0x9e3a00, run_without_scheduler=0) at program.c:300
#11 0x00007fc52942e66e in GNUNET_PROGRAM_run (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>,
    task_cls=0x9e3a00) at program.c:339
#12 0x00007fc5291cccc6 in GNUNET_TESTBED_test_run (testname=0x408381 "test-rps-single-req", cfg_filename=0x40856b "test_rps.conf", num_peers=5, event_mask=0, cc=0x0, cc_cls=0x0, test_master=0x405e47 <run>,
    test_master_cls=0x0) at testbed_api_test.c:151
#13 0x0000000000406ba7 in main (argc=1, argv=0x7ffc6acde7f8) at test_rps.c:1532
Steps To Reproduceput
GNUNET_assert (0);
into the mq_error_handler () at rps_api.c:192
and run one of test_rps_*
Additional Information(gdb) bt f
#0 0x00007fc52704e04f in raise () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007fc52704f47a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#2 0x00007fc5293f25f7 in GNUNET_abort_ () at common_logging.c:293
No locals.
#3 0x00007fc52966d549 in mq_error_handler (cls=0xab50f0, error=GNUNET_MQ_ERROR_READ) at rps_api.c:202
        h = 0xab50f0
        __FUNCTION__ = "mq_error_handler"
#4 0x00007fc529420945 in GNUNET_MQ_inject_error (mq=0xaa3b60, error=GNUNET_MQ_ERROR_READ) at mq.c:330
        __FUNCTION__ = "GNUNET_MQ_inject_error"
#5 0x00007fc529421d85 in handle_client_message (cls=0xaa3b60, msg=0x0) at mq.c:758
        mq = 0xaa3b60
        state = 0xab5180
#6 0x00007fc5293eea41 in receive_helper (cls=0xaa3ae0, buf=0x7ffc6acce1b0, available=0, addr=0xaa42c0, addrlen=110, errCode=0) at client.c:533
        client = 0xaa3ae0
        remaining = {rel_value_us = 18446744073709551615}
        receive_handler = 0x7fc529421d49 <handle_client_message>
        receive_handler_cls = 0xaa3b60
        __FUNCTION__ = "receive_helper"
#7 0x00007fc5293fb7e9 in receive_ready (cls=0xaa4340) at connection.c:1189
        connection = 0xaa4340
        tc = 0x7fc52966b0b0 <tc>
        buffer = "\022-\001\335\000\000\000\004\000\000\000\000\000\000\000\065\327\213!\005\tFI\377\vK\245\222\370\340\377\336(\254\335\366\376aO\005K\343\341D\347\337\065\002LZx\001\305\\[s\342H\262~\327\257\360D\354y\331X\256mc\217;\346A\200\000E\v\211\021\242۽\035\016\205\f\262\315\016\267E\302\333ލ9\277\375|Y\027\251@\302\b\214\367\304\304L\230\272dVf\345\275R\363c\276\231\305\323q\020\305\367\232\336j\031\003\317\357\270N\277q\361\333\305\355m\355\263:\206\241Z\375\272\\\305?\265\317y\177i=g\350\331z\337\300\312\331r\034̞\227Q\254\215l\363\316\357\353^\253\347w\315\066\246\276\033Cup\264\065\070н\036\326T\342\371"...
        ret = 0
        receiver = 0x7fc5293ee7a9 <receive_helper>
        __FUNCTION__ = "receive_ready"
#8 0x00007fc529433107 in run_ready (rs=0x9e4b40, ws=0x9e4bd0) at scheduler.c:620
        p = GNUNET_SCHEDULER_PRIORITY_DEFAULT
        pos = 0xa923d0
        __FUNCTION__ = "run_ready"
#9 0x00007fc529433a52 in GNUNET_SCHEDULER_run (task=0x7fc52942d93f <program_main>, task_cls=0x7ffc6acde4d0) at scheduler.c:887
        rs = 0x9e4b40
        ws = 0x9e4bd0
        timeout = {rel_value_us = 27953715}
        ret = 1
        shc_int = 0xa0e090
        shc_term = 0xa0e150
        shc_quit = 0xa0e2d0
        shc_hup = 0xa0e390
        shc_pipe = 0xa0e210
        last_tr = 395
        busy_wait_warning = 0
        pr = 0xa0dfb0
        c = 91 '['
        __FUNCTION__ = "GNUNET_SCHEDULER_run"
#10 0x00007fc52942e517 in GNUNET_PROGRAM_run2 (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>,
    task_cls=0x9e3a00, run_without_scheduler=0) at program.c:300
        cc = {args = 0x7ffc6acde688, cfgfile = 0x9e4d90 "test_rps.conf", task = 0x7fc5291cc9c8 <run>, task_cls = 0x9e3a00, cfg = 0x9e3a60}
        path = 0x9e4a90 "test-rps-single-req-9888"
        loglev = 0x0
        logfile = 0x0
        cfg_fn = 0x9e4a70 "~/.config/gnunet.conf"
        xdg = 0x0
        ret = 3
        cnt = 5
---Type <return> to continue, or q <return> to quit---
        skew_offset = 0
        skew_variance = 0
        clock_offset = 0
        cfg = 0x9e3a60
        defoptions = {{shortName = 99 'c', name = 0x7fc529458310 "config", argumentHelp = 0x7fc529458317 "FILENAME", description = 0x7fc529458320 "use configuration file FILENAME", require_argument = 1,
            processor = 0x7fc52941c9fa <GNUNET_GETOPT_set_string>, scls = 0x7ffc6acde4d8}, {shortName = 104 'h', name = 0x7fc529458340 "help", argumentHelp = 0x0, description = 0x7fc529458345 "print this help",
            require_argument = 0, processor = 0x7fc52941c451 <GNUNET_GETOPT_format_help_>, scls = 0x7fc5291d43d9}, {shortName = 76 'L', name = 0x7fc529458355 "log", argumentHelp = 0x7fc529458359 "LOGLEVEL",
            description = 0x7fc529458368 "configure logging to use LOGLEVEL", require_argument = 1, processor = 0x7fc52941c9fa <GNUNET_GETOPT_set_string>, scls = 0x7ffc6acde4c8}, {shortName = 108 'l',
            name = 0x7fc52945838a "logfile", argumentHelp = 0x7fc529458392 "LOGFILE", description = 0x7fc5294583a0 "configure logging to write logs to LOGFILE", require_argument = 1,
            processor = 0x7fc52941c9fa <GNUNET_GETOPT_set_string>, scls = 0x7ffc6acde4c0}, {shortName = 118 'v', name = 0x7fc5294583cb "version", argumentHelp = 0x0,
            description = 0x7fc5294583d3 "print the version number", require_argument = 0, processor = 0x7fc52941c40b <GNUNET_GETOPT_print_version_>, scls = 0x7fc5294583ec}}
        allopts = 0x9e4b20
        gargs = 0x0
        lpfx = 0x9e4c80 "\020v\241"
        spc = 0x0
        __FUNCTION__ = "GNUNET_PROGRAM_run2"
#11 0x00007fc52942e66e in GNUNET_PROGRAM_run (argc=3, argv=0x7ffc6acde670, binaryName=0x408381 "test-rps-single-req", binaryHelp=0x7fc5291d43d9 "nohelp", options=0x7ffc6acde630, task=0x7fc5291cc9c8 <run>,
    task_cls=0x9e3a00) at program.c:339
No locals.
#12 0x00007fc5291cccc6 in GNUNET_TESTBED_test_run (testname=0x408381 "test-rps-single-req", cfg_filename=0x40856b "test_rps.conf", num_peers=5, event_mask=0, cc=0x0, cc_cls=0x0, test_master=0x405e47 <run>,
    test_master_cls=0x0) at testbed_api_test.c:151
        argv2 = {0x9e39c0 "test-rps-single-req", 0x7fc5291d43a0 "-c", 0x9e39e0 "test_rps.conf", 0x0}
        options = {{shortName = 0 '\000', name = 0x0, argumentHelp = 0x0, description = 0x0, require_argument = 0, processor = 0x0, scls = 0x0}}
        rc = 0x9e3a00
        ret = 0
        __FUNCTION__ = "GNUNET_TESTBED_test_run"
#13 0x0000000000406ba7 in main (argc=1, argv=0x7ffc6acde7f8) at test_rps.c:1532
        ret_value = -1467828757
        __FUNCTION__ = "main"
TagsNo tags attached.

Activities

ch3

2016-09-29 19:43

developer   ~0011207

> Your service is not starting most likely.
The last time I checked, I got nice logs from the service. It was able
to receive requests from the client and, according to the logs, it was
going to send a reply.

> A quick test revealed that if you have it installed it cannot bind to
> the UNIXPATH set in rps.conf. I have now set it to 
> UNIXPATH = $GNUNET_USER_RUNTIME_DIR/gnunet-service-rps.sock
> and it works.
Did not change a thing for me.


> However, the test fails to start the service as well... valgrind is
> giving me strange errors. Maybe cleanup the test_rps.conf configuration
> before looking at the code...
>
> Did the test run successfully before the change??
The test did run successfully for a long time.
Unfortunately there were quite a few changes before the migration to the service_new that I did not test.

schanzen

2016-09-29 22:15

developer   ~0011209

My valgrind tells me:
==4953== Source and destination overlap in memcpy(0x742ba60, 0x742ba80, 64)
==4953== at 0x4C301F3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
==4953== by 0x40BDDE: rem_from_list (gnunet-service-rps.c:402)
==4953== by 0x40BDDE: do_round (gnunet-service-rps.c:1994)
==4953== by 0x50841DF: run_ready (scheduler.c:620)
==4953== by 0x50841DF: GNUNET_SCHEDULER_run (scheduler.c:887)
==4953== by 0x50922A0: GNUNET_SERVICE_ruN_ (service_new.c:1724)
==4953== by 0x402CFD: main (gnunet-service-rps.c:2428)
==4953==
rvice-rps.c:907)


Can you check if it might have to be a memmove there? This code looks fishy (gnunet-service-rps.c:402)

ch3

2016-09-30 01:49

developer   ~0011210

Yes, you are right. Fixed it. (I still want to get rid of the whole function, but other things are more important right now.)

Fixed also another issue in the service.

Now, the assertion in the api is not triggered anymore, the service seems to be able to transmit some messages to the api and I got some new assertion failures. It's too late now. Will take care about that the next days.

ch3

2016-09-30 11:01

developer   ~0011211

> Now, the assertion in the api is not triggered anymore, the service seems to be able to transmit some messages to the api and I got some new assertion failures. It's too late now. Will take care about that the next days.

That was too quick. The assertion triggers again. Everything looks like before.

schanzen

2016-09-30 11:07

developer   ~0011212

The assertion in the api does not give you any indication of what is wrong... you have to debug the service. It seems your cadet usage is wrong:

Sep 30 08:01:54-492077 rps-7602 INFO Ready to receive requests from clients
Sep 30 08:01:55-021007 rps-7602 WARNING Trying to destroy the context of a client that still has pending requests. Going to clean those
Sep 30 08:02:23-188606 rps-7602 ERROR Assertion failed at cadet_api.c:1575.

ch3

2016-09-30 13:30

developer   ~0011213

This I am aware of.

It is related to this issue:
https://gnunet.org/bugs/view.php?id=4622
(Cadet has not yet an implementation for the cancellation of messages via mq. And so I cannot cancel pending messages before the destruction of the channel.)
Do you think this is what causes it?
I am not sure but I think rps was working when this issue first occured.

Sep 30 13:16:20-310095 rps-peers-7481 DEBUG Destroying send channel
Sep 30 13:16:20-321921 rps-7481 ERROR Assertion failed at cadet_api.c:1575.
Sep 30 13:16:20-323062 cadet-api-7481 WARNING Channel destroyed without cancelling transmission requests
Sep 30 13:16:20-324068 mq-7481 WARNING Got error 2, but no handler installed

schanzen

2016-09-30 14:02

developer   ~0011214

Can you tell me what message should be called? Because if it is Seed it doesn't really send an answer, right? It even seems to drop the client.
I suggest first fully moving to the new API btw and not test halfway through.

ch3

2016-09-30 15:30

developer   ~0011217

I think we are getting closer to the problem here. (I thought I migrated the important parts - please tell me what I still have to do, because for some reason I'm missing it.)

> Because if it is Seed it doesn't really send an answer, right?
correct.

> It even seems to drop the client.
Yes, actually I was dropping the client connection after having successfully handled a seed. Thanks for the hint.

> I suggest first fully moving to the new API btw and not test halfway through.
Sounds good. Problem is that I don't know what I missed. I thought, I did pay attention to everything.

ch3

2016-10-04 13:07

developer   ~0011236

Ok, I might (for now) not resolve the problem of the timeouts, but I will try to come up with a proper handling of timeouts.

Up until now, a connection error causes the client to reconnect to the server with the consequence that the client canceles and deletes all the request the clients sent. The caller of the API gets not notified.

I see two possible ways of solving that:
a) let the client API handle it (resending the requests, ...) without notifying the caller
b) let the caller do it.

I will try to go for a).

schanzen

2016-10-04 17:06

developer   ~0011237

See other APIs. They do a) on reconnect.

ch3

2016-10-04 22:25

developer   ~0011245

client - service communication seems to work now. Still there are problems.

Issue History

Date Modified Username Field Change
2016-09-27 18:33 ch3 New Issue
2016-09-27 18:33 ch3 Status new => assigned
2016-09-27 18:33 ch3 Assigned To => ch3
2016-09-29 19:43 ch3 Note Added: 0011207
2016-09-29 22:15 schanzen Note Added: 0011209
2016-09-30 01:49 ch3 Note Added: 0011210
2016-09-30 11:01 ch3 Note Added: 0011211
2016-09-30 11:07 schanzen Note Added: 0011212
2016-09-30 13:30 ch3 Note Added: 0011213
2016-09-30 14:02 schanzen Note Added: 0011214
2016-09-30 15:30 ch3 Note Added: 0011217
2016-10-04 13:07 ch3 Note Added: 0011236
2016-10-04 17:06 schanzen Note Added: 0011237
2016-10-04 22:25 ch3 Note Added: 0011245
2016-10-04 22:27 ch3 Status assigned => resolved
2016-10-04 22:27 ch3 Resolution open => fixed
2016-10-04 22:27 ch3 Fixed in Version => SVN HEAD
2016-10-07 00:52 Christian Grothoff Fixed in Version SVN HEAD => 0.11.0pre66
2016-10-07 00:52 Christian Grothoff Target Version => 0.11.0pre66
2018-06-07 00:24 Christian Grothoff Status resolved => closed