View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0008971 | Taler | libeufin-bank | public | 2024-06-19 17:42 | 2024-07-26 00:11 |
Reporter | sebasjm | Assigned To | Antoine A | | |
Priority | high | Severity | major | Reproducibility | sometimes |
Status | closed | Resolution | fixed | | |
Product Version | 0.12 | | | | |
Target Version | 0.12 | Fixed in Version | 0.12 | | |
Summary | 0008971: getting 500 from time to time | | | | |
Description |

Request:

```shell
curl 'http://bank.taler.test:1180/accounts/admin/transactions?delta=-6' \
  -H 'Accept: application/json' \
  -H 'Accept-Language: en-US,en;q=0.9,es;q=0.8' \
  -H 'Authorization: Bearer secret-token:DKVKR1RS6H8CV3X918DVPPMKSBY6H1C5AD0NFM4BG0KM9QDVSSW0' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://bank.taler.test:1180/webui/' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36' \
  --compressed \
  --insecure
```

Response:

```
HTTP/1.1 500 Internal Server Error
Server: nginx/1.18.0 (Ubuntu)
Date: Wed, 19 Jun 2024 15:30:27 GMT
Content-Type: application/json
Content-Length: 56
Connection: keep-alive
Vary: Origin
Access-Control-Allow-Credentials: true

{"code":5111,"hint":"Transaction serialization failure"}
```

Server-side stack trace:

```
org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2412)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:371)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:502)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:419)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:194)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:137)
	at tech.libeufin.common.db.TransactionKt.oneOrNull(transaction.kt:47)
	at tech.libeufin.bank.db.TokenDAO$access$2.invokeSuspend(TokenDAO.kt:88)
	at tech.libeufin.bank.db.TokenDAO$access$2.invoke(TokenDAO.kt)
	at tech.libeufin.bank.db.TokenDAO$access$2.invoke(TokenDAO.kt)
	at tech.libeufin.common.db.DbPool$conn$2.invokeSuspend(DbPool.kt:56)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)
```

Log line (truncated):

```
19-Jun-2024T12:29:51.185 INFO libeufin-bank - 500 Internal Server Error, GET /monitor, BANK_SOFT_EXCEPTION Transaction serializatio>
```
Tags | No tags attached. | ||||
|
We use the serializable transaction isolation level for determinism, and this can lead to serialization errors for write transactions such as the creation of a new wire transfer. libeufin-bank already retries automatically, but only for a limited amount of time before failing. However, a non-trivial read transaction such as /monitor can cause a serialization error even when it doesn't matter (we expect the statistics to only grow, but they can safely be a little out of date), so we could relax the transaction isolation level there. We should also examine the impact of garbage collection.
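A minimal sketch of the relaxation suggested above, assuming the /monitor statistics are read in a single read-only transaction (the SELECT is a placeholder, not the actual libeufin-bank query):

```sql
-- A read-only transaction at REPEATABLE READ cannot raise a serialization
-- failure in PostgreSQL, and slightly stale statistics are acceptable here.
-- (SERIALIZABLE READ ONLY DEFERRABLE is an alternative that keeps full
-- serializability at the cost of possibly waiting before starting.)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT ...;  -- the /monitor aggregation queries
COMMIT;
```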
|
After further testing, it's the garbage collection that creates these serialization errors, which is why they occur at a constant interval (every 15 minutes).
|
I was constantly seeing the errors, but in different endpoints while loading the admin dashboard. So the garbage collector seems more plausible, since no write transaction is being triggered AFAIK.
|
In general, whenever we do a PG transaction that is *not* a single statement (so we do explicit BEGIN, FOO, BAR, COMMIT) we must have a retry-loop (general rule: 3x) around the transaction server-side and not throw a 500 on the first serialization error. Moreover, if you get these, it is *very* likely that you are missing an INDEX and thus conflicting more than you should. This could be the GC (lacking an index and 'where' clause over timestamps and thus walking everything), but equally likely is the specific failed query. Thus, the correct fix is to (1) add a retry loop, and (2) investigate the use of indices/look for slow queries in your Postgres logs (and possibly log slow queries more aggressively as per PG configuration). |
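The server-side retry loop described above can be sketched as follows. This is a hypothetical illustration, not the libeufin-bank code: it retries the whole transaction up to three times when PostgreSQL reports a serialization failure (SQLState 40001) and only surfaces the error, which the caller would map to a 500, after the retries are exhausted.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

public class SerializableRetry {
    // PostgreSQL SQLState for "could not serialize access ..."
    static final String SERIALIZATION_FAILURE = "40001";

    // Run tx, retrying up to 3 times on serialization conflicts.
    // Any other SQLException is rethrown immediately.
    static <T> T retrySerializable(Callable<T> tx) throws Exception {
        SQLException last = null;
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                return tx.call(); // would execute BEGIN ... COMMIT against the DB
            } catch (SQLException e) {
                if (!SERIALIZATION_FAILURE.equals(e.getSQLState()))
                    throw e;      // not a serialization conflict: don't retry
                last = e;         // conflict: retry the whole transaction
            }
        }
        throw last; // retries exhausted -> caller maps this to a 500
    }

    public static void main(String[] args) throws Exception {
        // Simulate a transaction that conflicts twice, then commits.
        int[] calls = {0};
        String result = retrySerializable(() -> {
            if (++calls[0] < 3)
                throw new SQLException(
                    "could not serialize access due to concurrent update",
                    SERIALIZATION_FAILURE);
            return "committed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Note that the retry must wrap the entire BEGIN...COMMIT block, not a single statement, since after a serialization failure the whole transaction is rolled back and must be replayed from the start.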
|
Fixed retry loop in ac8a642657d8b346a5cf83d3864f6426b1733cc0.
Fixed slow query and missing indexes in 7153ffba0a060cb6131421933fd8f978d4c9487a.
Date Modified | Username | Field | Change |
---|---|---|---|
2024-06-19 17:42 | sebasjm | New Issue | |
2024-06-19 17:42 | sebasjm | Status | new => assigned |
2024-06-19 17:42 | sebasjm | Assigned To | => Antoine A |
2024-06-19 18:12 | sebasjm | Priority | normal => high |
2024-06-19 18:12 | sebasjm | Severity | minor => major |
2024-06-20 13:44 | Antoine A | Note Added: 0022686 | |
2024-06-20 14:26 | Antoine A | Note Added: 0022688 | |
2024-06-20 14:32 | sebasjm | Note Added: 0022689 | |
2024-06-23 10:18 | Christian Grothoff | Note Added: 0022711 | |
2024-06-25 21:50 | Antoine A | Status | assigned => resolved |
2024-06-25 21:50 | Antoine A | Resolution | open => fixed |
2024-06-25 21:50 | Antoine A | Note Added: 0022720 | |
2024-07-26 00:11 | Christian Grothoff | Fixed in Version | => 0.12 |
2024-07-26 00:11 | Christian Grothoff | Status | resolved => closed |