View Issue Details

IDProjectCategoryView StatusLast Update
0005337Talertwisterpublic2019-12-20 19:12
ReporterMarcello Stanisci Assigned ToChristian Grothoff  
PrioritynormalSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
Product Versiongit (master) 
Target Version0.6Fixed in Version0.6 
Summary0005337: Machine state not changed to "download started"
DescriptionIf the twister is run under Valgrind, then it fails _always_ to
set the machine state as REQUEST_STATE_DOWNLOAD_STARTED, making
the first assert from curl_download_cb() fail.
Steps To Reproduceit suffices to start the Web server which is found under
src/test/ (twister repo), then start the twister under Valgrind (using the
config file from the same directory), and finally issue a POST request to
http://127.0.0.1:8888/ uploading a dummy JSON.
Additional InformationLogs from this kind of error.

May 25 10:54:21-522105 twister-5532 ERROR Assertion failed at taler-twister-service.c:550.
==5532==
==5532== Process terminating with default action of signal 6 (SIGABRT)
==5532== at 0x5CB7E7B: raise (raise.c:51)
==5532== by 0x5CB9230: abort (abort.c:79)
==5532== by 0x5812918: GNUNET_abort_ (common_logging.c:284)
==5532== by 0x10AEF7: curl_download_cb (taler-twister-service.c:549)
==5532== by 0x5387291: ??? (in /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4.4.0)
==5532== by 0x539F3E9: ??? (in /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4.4.0)
==5532== by 0x53A96F5: ??? (in /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4.4.0)
==5532== by 0x53AA330: curl_multi_perform (in /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4.4.0)
==5532== by 0x10C3E5: curl_task_download (taler-twister-service.c:738)
==5532== by 0x58435DE: GNUNET_SCHEDULER_do_work (scheduler.c:2106)
==5532== by 0x584436F: select_loop (scheduler.c:2407)
==5532== by 0x584436F: GNUNET_SCHEDULER_run (scheduler.c:727)
==5532== by 0x584864C: GNUNET_SERVICE_run_ (service.c:1847)
TagsNo tags attached.

Activities

Marcello Stanisci

2018-06-01 16:49

reporter   ~0012976

Last edited: 2018-06-01 17:18

Commit c5a65df2f40125fdd7d35f298220 should fix this. The problem was
that if the Web server responds before reading any POSTed data, then the
curl download callback gets called _before_ the curl upload callback; moreover,
it seems that the latter NEVER gets called in such scenarios.

This way, no function ever set the state as "download started" and
so the Twister got in an inconsistent state: download callback in action
and state NOT as "download started".

This fix just tolerates such scenarios by setting the state as "download
started" INSIDE the download callback, printing a warning.

The issue is left open for internal review.

Christian Grothoff

2018-06-03 16:07

manager   ~0012977

Did you push the commit? I don't see it in the master branch, and dev/marcello is untouched for 3 months.

Marcello Stanisci

2018-09-07 12:36

reporter   ~0013228

After several tries I can confirm that: if the problem is reproduced as in the description above, the libcurl's upload callback does NEVER get called (therefore no one sets the state as "download started").

On the other hand, if the scenario is the same but Valgrind is NOT used, then the upload callback goes first and sets the state as expected.

Note, we already set two timeout options like CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT to 600 seconds, so unless there is some other
crazy timeout option to use, I'd say the problem is not even time-related.

And yes, the mentioned commit above did get pushed on master.

Marcello Stanisci

2018-09-17 22:25

reporter   ~0013234

Additional issues:

* if the download callback returns CURL_WRITEFUNC_PAUSE then the progress callback does NEVER get called (as the manpage suggests), therefore making it impossible to wake things up.

* if the easy handle is instantiated using curl_easy_pause(handle, CURLPAUSE_RECV), then _no_ effect is produced: the download callback will just be called as if it were never get paused.

Those issues arose because we do want to keep the state machine as it is, namely we do want the state to be set as "download started" when the download callback fires up. And to do so, we need to pause the download callback.

Christian Grothoff

2018-09-17 23:50

manager   ~0013235

If we returned CURL_WRITEFUNC_PAUSE, we need to call
https://linux.die.net/man/3/curl_easy_pause to have the progress callback be called again later.

Marcello Stanisci

2018-09-17 23:58

reporter   ~0013236

Sure, and as of the docs, the best place to call curl_easy_pause is the progress function (that I was not able to get invoked); but now it is fixed (commit 26b8340), it just needed to be enabled by setting CURLOPT_NOPROGRESS to 0.

But another problem arises: the progress function's argument 'dltotal' (= expected number of bytes to download) is passed as 0 always, and so it is not possible to understand when it's the right time to unpause the download callback from within the progress function..

The idea was to compare 'ultotal' and 'ulnow', and unpause the download callback whenever 'ulnow' reached 'ultotal'; but apparently this is not possible.

Christian Grothoff

2018-09-18 12:22

manager   ~0013237

I'm confused. You added the progress_cb -- but for what? We don't need it.

Christian Grothoff

2018-09-18 12:22

manager   ~0013238

I've pushed a possible fix in 26b8340..324f6db. Basically, the code now remembers that it suspended the curl download and resumes it if necessary.

Marcello Stanisci

2018-09-18 12:25

reporter   ~0013239

the progress_cb is told to be the best place to unpause what has been paused. See curl_easy_pause man page.

Marcello Stanisci

2018-09-18 12:36

reporter   ~0013240

26b8340..324f6db doesn't work. There are two problems with it: (1) once the download_cb fires up, the upload_cb doesn't get called _anymore_, even when the former goes to sleep, therefore no unpausing happens. (2) Your patch unpauses regardless of whether _all_ the bytes got uploaded first.

Marcello Stanisci

2018-09-18 12:37

reporter   ~0013241

Sorry, (2) doesn't apply on the upload_cb.

Christian Grothoff

2018-09-18 13:01

manager   ~0013242

Please let me know if this helps, or if there is another issue left.

Marcello Stanisci

2018-09-18 15:01

reporter   ~0013243

No it does not. As (1) says, the upload cb does never get called, therefore NO unpausing happens.

We should find a way to tell libcurl "do run the upload cb first for all POSTs".

Christian Grothoff

2018-09-18 17:42

manager   ~0013244

Could you get a packet capture of the interaction that happens with wireshark and attach it to this bug report? I wonder if we're missing something, like a 100 continue or so.

Marcello Stanisci

2018-09-18 18:16

reporter   ~0013245

Last edited: 2018-09-18 18:17

Wireshark only catches the first POST, and not the one between the Twister and the test webserver. Below you see the (nice) HTTP request when netcat is used as the test webserver. The Expect: header is something that I added now, and did NOT fix things.

The request was generated with:
$ curl -d '{}' -H 'Content-Type: application/json' http://localhost:8888/

marcello@laptop01:~$ nc -l -p 8080
POST / HTTP/1.1
Host: localhost:8888
User-Agent: curl/7.52.1
Accept: */*
Content-Type: application/json
Content-Length: 2
Expect: 100-continue

{}

Marcello Stanisci

2018-09-20 16:51

reporter   ~0013246

2f69f626b0e47 fixes it.

Christian Grothoff

2018-09-20 17:15

manager   ~0013247

I don't think the proposed fix is sane, it is likely to create subtle problems and almost certainly does not fix the root cause.

Christian Grothoff

2018-09-20 20:12

manager   ~0013249

9ad1ebf8f6b126fdef9a0753305b77f30ebbfa3f fixes the "odd" behavior of the test web server.

51165a5b431fc5859afc81dba47833ad1e1f35d6 adjust the proxy to deal with odd behavior of test web server and (slightly) odd handling of such behavior by curl.

The "odd" behavior was that the test Web server returned a (non-error) HTTP response before the upload was complete, causing curl to abandon the upload (but the proxy "insisting" that it should be completed).

With the change, the Web server waits with the response until the upload is done, and the proxy _would_ abort uploads (and move towards the download phase) for Web servers that do return a response 'early'.

Issue History

Date Modified Username Field Change
2018-06-01 16:39 Marcello Stanisci New Issue
2018-06-01 16:39 Marcello Stanisci Status new => assigned
2018-06-01 16:39 Marcello Stanisci Assigned To => Marcello Stanisci
2018-06-01 16:43 Marcello Stanisci Target Version => 0.6
2018-06-01 16:49 Marcello Stanisci Note Added: 0012976
2018-06-01 17:13 Marcello Stanisci Note Edited: 0012976
2018-06-01 17:18 Marcello Stanisci Note Edited: 0012976
2018-06-03 16:07 Christian Grothoff Note Added: 0012977
2018-09-07 12:36 Marcello Stanisci Note Added: 0013228
2018-09-17 22:25 Marcello Stanisci Note Added: 0013234
2018-09-17 23:50 Christian Grothoff Note Added: 0013235
2018-09-17 23:58 Marcello Stanisci Note Added: 0013236
2018-09-18 12:22 Christian Grothoff Note Added: 0013237
2018-09-18 12:22 Christian Grothoff Note Added: 0013238
2018-09-18 12:25 Marcello Stanisci Note Added: 0013239
2018-09-18 12:36 Marcello Stanisci Note Added: 0013240
2018-09-18 12:37 Marcello Stanisci Note Added: 0013241
2018-09-18 13:01 Christian Grothoff Note Added: 0013242
2018-09-18 15:01 Marcello Stanisci Note Added: 0013243
2018-09-18 17:42 Christian Grothoff Note Added: 0013244
2018-09-18 18:16 Marcello Stanisci Note Added: 0013245
2018-09-18 18:17 Marcello Stanisci Note Edited: 0013245
2018-09-20 16:51 Marcello Stanisci Note Added: 0013246
2018-09-20 16:52 Marcello Stanisci Status assigned => resolved
2018-09-20 16:52 Marcello Stanisci Resolution open => fixed
2018-09-20 17:14 Christian Grothoff Assigned To Marcello Stanisci => Christian Grothoff
2018-09-20 17:15 Christian Grothoff Status resolved => assigned
2018-09-20 17:15 Christian Grothoff Note Added: 0013247
2018-09-20 20:12 Christian Grothoff Status assigned => resolved
2018-09-20 20:12 Christian Grothoff Note Added: 0013249
2018-09-20 20:13 Christian Grothoff Product Version => git (master)
2018-09-20 20:13 Christian Grothoff Fixed in Version => 0.6
2019-12-20 19:12 Christian Grothoff Status resolved => closed