View Issue Details

IDProjectCategoryView StatusLast Update
0008971Talerlibeufin-bankpublic2024-07-26 00:11
Reportersebasjm Assigned ToAntoine A  
PriorityhighSeveritymajorReproducibilitysometimes
Status closedResolutionfixed 
Product Version0.12 
Target Version0.12Fixed in Version0.12 
Summary0008971: getting 500 from time to time
Descriptioncurl 'http://bank.taler.test:1180/accounts/admin/transactions?delta=-6' \
  -H 'Accept: application/json' \
  -H 'Accept-Language: en-US,en;q=0.9,es;q=0.8' \
  -H 'Authorization: Bearer secret-token:DKVKR1RS6H8CV3X918DVPPMKSBY6H1C5AD0NFM4BG0KM9QDVSSW0' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://bank.taler.test:1180/webui/' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36' \
  --compressed \
  --insecure

HTTP/1.1 500 Internal Server Error
Server: nginx/1.18.0 (Ubuntu)
Date: Wed, 19 Jun 2024 15:30:27 GMT
Content-Type: application/json
Content-Length: 56
Connection: keep-alive
Vary: Origin
Access-Control-Allow-Credentials: true

{"code":5111,"hint":"Transaction serialization failure"}


org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2412)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:371)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:502)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:419)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:194)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:137)
        at tech.libeufin.common.db.TransactionKt.oneOrNull(transaction.kt:47)
        at tech.libeufin.bank.db.TokenDAO$access$2.invokeSuspend(TokenDAO.kt:88)
        at tech.libeufin.bank.db.TokenDAO$access$2.invoke(TokenDAO.kt)
        at tech.libeufin.bank.db.TokenDAO$access$2.invoke(TokenDAO.kt)
        at tech.libeufin.common.db.DbPool$conn$2.invokeSuspend(DbPool.kt:56)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
        at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)
19-Jun-2024T12:29:51.185 INFO libeufin-bank - 500 Internal Server Error, GET /monitor, BANK_SOFT_EXCEPTION Transaction serializatio>
TagsNo tags attached.

Activities

Antoine A

2024-06-20 13:44

developer   ~0022686

We use serializable transaction isolation levels for determinism and this can lead to serialization errors for write transactions such as the creation of a new wire transfer. libeufin-bank already retries automatically but only for a limited amount of time before failing. However, a non-trivial read transaction such as /monitor can cause a serialization error when it doesn't matter (we expect the statistics to only grow but they can safely be a little out of date), we could relax the transaction isolation level there. We should also examine the impact of garbage collection.

Antoine A

2024-06-20 14:26

developer   ~0022688

after further testing, it's the garbage collection that creates these serialization errors, which is why you get them at a constant interval (every 15 min)

sebasjm

2024-06-20 14:32

developer   ~0022689

i was contstantly seeing the errors but in different endpoints loading the admin dashboard.

so garbage collector seems more plausible since no write transaction ia being triggered AFAIK

Christian Grothoff

2024-06-23 10:18

manager   ~0022711

In general, whenever we do a PG transaction that is *not* a single statement (so we do explicit BEGIN, FOO, BAR, COMMIT) we must have a retry-loop (general rule: 3x) around the transaction server-side and not throw a 500 on the first serialization error.

Moreover, if you get these, it is *very* likely that you are missing an INDEX and thus conflicting more than you should. This could be the GC (lacking an index and 'where' clause over timestamps and thus walking everything), but equally likely is the specific failed query.

Thus, the correct fix is to (1) add a retry loop, and (2) investigate the use of indices/look for slow queries in your Postgres logs (and possibly log slow queries more aggressively as per PG configuration).

Antoine A

2024-06-25 21:50

developer   ~0022720

Fixed retry loop in ac8a642657d8b346a5cf83d3864f6426b1733cc0
Fixed slow query and missing indexes in 7153ffba0a060cb6131421933fd8f978d4c9487a

Issue History

Date Modified Username Field Change
2024-06-19 17:42 sebasjm New Issue
2024-06-19 17:42 sebasjm Status new => assigned
2024-06-19 17:42 sebasjm Assigned To => Antoine A
2024-06-19 18:12 sebasjm Priority normal => high
2024-06-19 18:12 sebasjm Severity minor => major
2024-06-20 13:44 Antoine A Note Added: 0022686
2024-06-20 14:26 Antoine A Note Added: 0022688
2024-06-20 14:32 sebasjm Note Added: 0022689
2024-06-23 10:18 Christian Grothoff Note Added: 0022711
2024-06-25 21:50 Antoine A Status assigned => resolved
2024-06-25 21:50 Antoine A Resolution open => fixed
2024-06-25 21:50 Antoine A Note Added: 0022720
2024-07-26 00:11 Christian Grothoff Fixed in Version => 0.12
2024-07-26 00:11 Christian Grothoff Status resolved => closed