View Issue Details

IDProjectCategoryView StatusLast Update
0008452Talerexchangepublic2024-03-07 20:47
ReporterFlorian Dold Assigned ToChristian Grothoff  
PrioritynormalSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
Target Version0.9.4Fixed in Version0.9.4 
Summary0008452: deposit long-polling does not work
Description# check out branch dev/dold/deposit-longpolling
$ taler-harness run-integrationtests deposit

This will hang in deposit long-polling, whereas the master branch that currently uses no long-polling but just retries with back-off does not hang.

The aggregator is run (one-shot), but GET /deposits/$H_WIRE/$MERCHANT_PUB/$H_CONTRACT_TERMS/$COIN_PUB is stuck and does not return.
TagsNo tags attached.

Relationships

child of 0008413 closedFlorian Dold wallet-core does not use long polling for deposit tracking 

Activities

Christian Grothoff

2024-02-21 20:59

manager   ~0021421

I did a visual inspection of the code, and all of the logic required seems to be there...

Christian Grothoff

2024-02-21 21:42

manager   ~0021430

I see *empty* log files for the aggregator when running your test, which either suggests it did NOT run OK, *or* you didn't log on INFO level, which means the logging is pretty useless. Could you please check that the aggregator is run at INFO for all of your test suites?

Christian Grothoff

2024-02-21 21:44

manager   ~0021431

Ah, looks like the test didn't even get to that point yet, let me look into it.

Christian Grothoff

2024-02-21 21:45

manager   ~0021432

Test fails with latest exchange because now, due to x-taler-bank validation, the missing ?receiver-name= of the payto://-URI in your test stops you early:

2024-02-21T20:15:21.097Z harness.ts INFO running command taler-exchange-offline -c '/tmp/taler-integrationtests-7kI6Oc/deposit/exchange-testexchange-1.conf' enable-account payto://x-taler-bank/localhost/myexchange upload
FATAL: test failed for exchange-offline
Unexpected exit code 2: "taler-exchange-offline"
2024-02-21T21:15:21.113731+0100 taler-exchange-offline-55636 ERROR payto URI is malformed: 'receiver-name' parameter missing

CommandError: Unexpected exit code 2
    at ChildProcess.<anonymous> (/home/grothoff/lib/taler-harness/node_modules/taler-harness/dist/taler-harness-bundled.cjs:45588:11)
    at ChildProcess.emit (node:events:513:28)
    at ChildProcess._handle.onexit (node:internal/child_process:293:12) {
  logName: 'exchange-offline',
  command: 'taler-exchange-offline'

Florian Dold

2024-02-28 01:32

manager   ~0021541

I've fixed the receiver-name issue on master and also in the dev/dold/deposit-longpolling branch (with a rebase / force-push).

Here's the commit where deposit again fails now due to the longpolled request not returning a response:

commit 4aacad96d0d02293d2d665e99a9bdf1c4adaf421 (HEAD -> dev/dold/deposit-longpolling, origin/dev/dold/deposit-longpolling)
Author: Florian Dold <florian@dold.me>
Date: Wed Feb 28 01:29:22 2024 +0100

    wallet-core: enable deposit long-polling

Christian Grothoff

2024-02-29 00:57

manager   ~0021573

Wrong condition, fixed in 3f0bfbeb..295be9df.

Florian Dold

2024-03-06 08:51

manager   ~0021711

The long-poller returns now, but it only returns after 30 seconds instead of immediately when the aggregator runs.

Florian Dold

2024-03-06 09:08

manager   ~0021715

I've added a -t run of taler-exchange-transfer after a -t run of taler-exchange-wire, but it still hangs 30 seconds.

Christian Grothoff

2024-03-06 22:21

manager   ~0021737

9173d274..b8c5d140 fixes it by working around Postgres #18380.

Christian Grothoff

2024-03-06 22:32

manager   ~0021738

The following bug has been logged on the website:

Bug reference: 18380
Logged by: Christian Grothoff
Email address: grothoff@gnu.org
PostgreSQL version: 16.2
Operating system: Debian GNU/Linux
Description:

We tested with Postgres 14, 15, 16.1 and 16.2; same results.

Overview:
========

We use LISTEN + NOTIFY to implement some long polling. This usually works.
However, we stop getting NOTIFY messages after a LISTENing client issued a
particular SELECT query with a LEFT JOIN in the LISTENing client. Disabling
"enable_mergejoin" fixes the issue. We believe that the "enable_mergejoin"
logic somehow corrupts the notify state (likely an obscure rare memory
corruption issue). The bug is 100% reproduceable on several of our
systems.

How to reproduce:
===============
* Install GNUnet 0.21.0 [or: git.gnunet.org/gnunet.git master]
* Install GNU Taler exchange, checkout tag 'postgres-bug-mergejoin' [from
git.taler.net/exchange.git]
* Install taler-wallet-core [from git.taler.net/taler-wallet-core]
* run $ taler-harness run-integrationtests deposit

Buggy outcome
=============

If enable_mergejoin is ON, the test reports this at the end:
      parent: got result
{"name":"deposit","timeSec":35.168,"status":"pass"}
Note that the 35s is due to all kinds of things in the test, none related to
actual postgres query performance.
This is NOT a performance issue. We can set the long-poll threshold higher,
and the time will go up accordingly.

Expected behavior with work-around
==============================

 If enable_mergejoin is OFF, long-polling works and the test reports
something like this at the end:
      parent: got result
{"name":"deposit","timeSec":16.094,"status":"pass"}

This is basically the fastest the integration test will run (~15-17
seconds), the NOTIFY happens as expected.

Relevant SQL query break-down
==========================

The query that causes things to break is in
exchange/src/exchangedb/pg_lookup_transfer_by_deposit.c. The buggy
sub-clause is marked with
#if BUG below:

    PREPARE (pg,
             "get_deposit_without_wtid",
             "SELECT"
             " bdep.wire_salt"
             ",wt.payto_uri"
             ",cdep.amount_with_fee"
             ",denom.fee_deposit"
             ",bdep.wire_deadline"
#if BUG
             ",agt.legitimization_requirement_serial_id"
#endif
             ",aml.status"
             ",aml.kyc_requirement"
             " FROM coin_deposits cdep"
             " JOIN batch_deposits bdep"
             " USING (batch_deposit_serial_id)"
             " JOIN wire_targets wt"
             " USING (wire_target_h_payto)"
             " JOIN known_coins kc"
             " ON (kc.coin_pub = cdep.coin_pub)"
             " JOIN denominations denom"
             " USING (denominations_serial)"
#if BUG
             " LEFT JOIN aggregation_transient agt "
             " ON ( (bdep.wire_target_h_payto = agt.wire_target_h_payto)
AND"
             " (bdep.merchant_pub = agt.merchant_pub) )"
#endif
             " LEFT JOIN aml_status aml"
             " ON (wt.wire_target_h_payto = aml.h_payto)"
             " WHERE cdep.coin_pub=$1"
             " AND bdep.merchant_pub=$3"
             " AND bdep.h_contract_terms=$2"
             " LIMIT 1;");

If we #define BUG to 0 for the code above, we do NOT have to disable the
merge_join to get the desired behavior.

Issue History

Date Modified Username Field Change
2024-02-20 16:20 Florian Dold New Issue
2024-02-20 16:20 Florian Dold Status new => assigned
2024-02-20 16:20 Florian Dold Assigned To => Christian Grothoff
2024-02-20 16:20 Florian Dold Relationship added child of 0008413
2024-02-20 16:21 Florian Dold Steps to Reproduce Updated
2024-02-21 20:59 Christian Grothoff Note Added: 0021421
2024-02-21 21:42 Christian Grothoff Note Added: 0021430
2024-02-21 21:44 Christian Grothoff Note Added: 0021431
2024-02-21 21:45 Christian Grothoff Note Added: 0021432
2024-02-28 01:32 Florian Dold Note Added: 0021541
2024-02-29 00:57 Christian Grothoff Note Added: 0021573
2024-02-29 00:57 Christian Grothoff Status assigned => resolved
2024-02-29 00:57 Christian Grothoff Resolution open => fixed
2024-02-29 00:57 Christian Grothoff Fixed in Version => 0.9.4
2024-02-29 00:57 Christian Grothoff Target Version 0.10 => 0.9.4
2024-03-06 08:51 Florian Dold Status resolved => assigned
2024-03-06 08:51 Florian Dold Note Added: 0021711
2024-03-06 09:08 Florian Dold Note Added: 0021715
2024-03-06 22:21 Christian Grothoff Status assigned => resolved
2024-03-06 22:21 Christian Grothoff Note Added: 0021737
2024-03-06 22:32 Christian Grothoff Note Added: 0021738
2024-03-07 20:47 Christian Grothoff Status resolved => closed