View Issue Details

IDProjectCategoryView StatusLast Update
0006578Talermechant backendpublic2021-08-24 16:23
ReporterFlorian Dold Assigned ToFlorian Dold  
PrioritynormalSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
Target Version0.8Fixed in Version0.8 
Summary0006578: merchant backend /pay is unreasonably slow on gv
DescriptionThe following integration test command fails on gv, while it succeeds on my (much, much slower) laptop.

$ su - taler-internal
$ cd wallet-core/packages/taler-integrationtests/
$ ./testrunner 'test-payment-multiple'

The /pay reques takes >5s and thus times out.
TagsNo tags attached.

Activities

Christian Grothoff

2020-09-06 23:39

manager   ~0016889

Eh, I just tried it, and the test passed:

test logs and config can be found under /tmp/taler-integrationtest-1254bh
-----------------------------------
Tests finished
1/1 tests succeeded
-----------------------------------
taler-internal@gv:~/wallet-core/packages/taler-integrationtests$

Christian Grothoff

2020-09-06 23:41

manager   ~0016890

Ran it a 2nd time, got the timeout:
{
  "type": "error",
  "operation": "confirmPay",
  "id": "reqid-1",
  "error": {
    "code": 7003,
    "hint": "Error: WALLET_NETWORK_ERROR",
    "details": {
      "requestUrl": "http://localhost:8083/orders/2020.250-02GAH1DS9VAC4/pay",
      "requestMethod": "POST"
    },
    "message": "timeout of 5000ms exceeded"
  }
}

FATAL: test failed with exception Error: test assertion failed
    at GlobalTestState.assertTrue (/home/taler-internal/wallet-core/packages/taler-integrationtests/src/harness.ts:360:13)
    at /home/taler-internal/wallet-core/packages/taler-integrationtests/src/test-payment-multiple.ts:151:5
    at Generator.next (<anonymous>)
    at fulfilled (/home/taler-internal/wallet-core/node_modules/.pnpm/tslib@2.0.1/node_modules/tslib/tslib.js:111:62)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
shutting down
killing process 1146552
process bank exited
killing process 1146610
process exchange-wirewatch-testexchange-1 exited
killing process 1146611
process exchange-httpd-testexchange-1 exited
killing process 1146652
process merchant-testmerchant-1 exited
test logs and config can be found under /tmp/taler-integrationtest-Q2noSM
-----------------------------------
Tests finished
0/1 tests succeeded
These tests failed:
lib/test-payment-multiple.js
-----------------------------------

Christian Grothoff

2020-09-06 23:56

manager   ~0016891

Last edited: 2020-09-06 23:57

I think this is not a speed issue. I see one semi-interesting warnings in the logs:
Sep 06 23:33:59-214872 taler-merchant-httpd-1146652(6FR2W3FJDRB1WBW9MW6B075QWW) WARNING Exchange does not support `
x-taler-bank' wire method (will retry later)
This could result in a minor timing issue, but doesn't look like it.

Next, we see the deposit confirmation submission to the (non-existing) auditor being logged. So around this time, the deposits must also be running:

Sep 06 23:33:59-333737 taler-merchant-httpd-1146652(6FR2W3FJDRB1WBW9MW6B075QWW) INFO No auditor available for excha
nge `http://localhost:8081/'. Not submitting deposit confirmations.
-> repeated 80 times over 5 seconds, then finally:
Sep 06 23:34:04-418830 taler-merchant-httpd-1146652(6FR2W3FJDRB1WBW9MW6B075QWW) INFO No auditor available for exchange `http://localhost:8081/'. Not submitting deposit confirmations.
Sep 06 23:34:04-428854 taler-merchant-httpd-1146652(6FR2W3FJDRB1WBW9MW6B075QWW) INFO Subtracting total refunds from paid amount: TESTKUDOS:0
Sep 06 23:34:04-428881 taler-merchant-httpd-1146652(6FR2W3FJDRB1WBW9MW6B075QWW) INFO Order `2020.250-02GAH1DS9VAC4' (TD9E9VDN) was fully paid

Alas, at this point we have already hit the timeout (5s).

Christian Grothoff

2020-09-07 00:00

manager   ~0016892

Looking at the exchange, I see this (80 times):

Sep 06 23:34:04-352006 taler-exchange-httpd-1146611 INFO Handling new request
Sep 06 23:34:04-352062 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Handling request (POST) for UR
L '/coins/TC6JHVD08SXKENNPT8PKN32A5P18XPW9JKYYXH3SZMD2ZCTWBRVG/deposit'
Sep 06 23:34:04-352161 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Handling request (POST) for UR
L '/coins/TC6JHVD08SXKENNPT8PKN32A5P18XPW9JKYYXH3SZMD2ZCTWBRVG/deposit'
Sep 06 23:34:04-352191 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Handling request (POST) for UR
L '/coins/TC6JHVD08SXKENNPT8PKN32A5P18XPW9JKYYXH3SZMD2ZCTWBRVG/deposit'
Sep 06 23:34:04-355276 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Coin 1M3EKWS8 yielded 0 transa
ctions of type get_deposit_with_coin_pub
Sep 06 23:34:04-355480 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Coin 1M3EKWS8 yielded 0 transa
ctions of type get_refresh_session_by_coin
Sep 06 23:34:04-355681 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Coin 1M3EKWS8 yielded 0 transa
ctions of type get_refunds_by_coin
Sep 06 23:34:04-355703 taler-exchange-httpd-1146611(YWTKWXRKTX9RYDP9RK1BFRC9N0) INFO Inserting deposit to be execut
ed at end of time (18446744073709551615/1599428036000000)
Sep 06 23:34:04-368662 taler-exchange-httpd-1146611 INFO Request completed

A common delta per request is around 80ms, 80x80 > 5000.

Christian Grothoff

2020-09-07 00:11

manager   ~0016893

Analysis of time spent suggests it is where it should be: at the times where we do crypto and/or database interactions.
We have a 2.67 GHz cpu, using 2048 bit RSA crypto. Verification takes about 0.5 ms. EdDSA sign at 86 µs, verify at 127 µs. So that's 0.7 ms expected (verify denom sig, verify deposit permission sig, generate deposit confirmation sig). So let's say 1 ms for crypto. Means database is the costly part.

# hdparm -t /dev/{sda,sdb,md0}

roughly reports:

Timing buffered disk reads: 510 MB in 3.00 seconds = 169.75 MB/sec

My home PC has disks that are 3x and 15x and 16x faster than that. And this is *linear read*. Not sure what database *write* (random write) looks like. So _maybe_ this is just the abysmal disk + sequential execution?

Christian Grothoff

2020-09-07 00:21

manager   ~0016894

Also, I checked and the *merchant backend* has a timeout for /pay set to 30s. Where do the 5s come from? Based on the logs, I strongly suspect it is the wallet. So my conclusion: forcing the exchange to operate sequentially (so DB has to really do a full data sync per request, cannot combine multiple transactions into one log entry), plus no way to utilize multiple CPU cores (JSON parsing, crypto, context switching) means it *does* take 80ms/transaction. This is also not far off, given that with my 16x faster disk, we did 1000 transactions/second on 32 cores.

So take 1 transaction at 32 milliseconds. So go up 32x (single-core) and 16 (disk speed) and voila: 32 * 16 * 80 = 40960 ms. So you'd *expect* this to take 40s. Of course, the 32 cores were partially (2/3rds?) busy simulating clients, and you have other 'losses' from parallelism, and the 16x disk speed increase doesn't help if at say 8x the disk wasn't the bottleneck anymore, so the factors don't quite multiply like this. BUT: This shows that the 80ms/transaction is NOT at all far off the mark for this system (with non-SSD hard disks).

To conclude: 80 coins is a LOT. If you send this many, 5s timeout is too short, *especially* if the exchange processes coins sequentially.
Suggestion: Use 5s + (number-of-coins / 20) * 5s for the wallet timeout, but max at 30s as that's the merchant's own limit. And > 100 coins is just insane. So if you do that, log, warn, bitch, alert, whatever that your coin selection is going really, really wrong ;-).

Florian Dold

2020-09-07 12:31

manager   ~0016897

Both the merchant and the wallet now implement a dynamic request timeout for /pay, as discussed based on the number of coins.

Issue History

Date Modified Username Field Change
2020-09-06 18:12 Florian Dold New Issue
2020-09-06 18:12 Florian Dold Status new => assigned
2020-09-06 18:12 Florian Dold Assigned To => Christian Grothoff
2020-09-06 23:39 Christian Grothoff Note Added: 0016889
2020-09-06 23:41 Christian Grothoff Note Added: 0016890
2020-09-06 23:56 Christian Grothoff Note Added: 0016891
2020-09-06 23:57 Christian Grothoff Note Edited: 0016891
2020-09-07 00:00 Christian Grothoff Note Added: 0016892
2020-09-07 00:11 Christian Grothoff Note Added: 0016893
2020-09-07 00:21 Christian Grothoff Note Added: 0016894
2020-09-07 00:21 Christian Grothoff Assigned To Christian Grothoff => Florian Dold
2020-09-07 12:31 Florian Dold Status assigned => resolved
2020-09-07 12:31 Florian Dold Resolution open => fixed
2020-09-07 12:31 Florian Dold Note Added: 0016897
2020-10-03 14:08 Christian Grothoff Fixed in Version => 0.8
2020-10-03 14:10 Christian Grothoff Target Version => 0.8
2021-08-24 16:23 Christian Grothoff Status resolved => closed