View Issue Details

ID: 0008302
Project: Taler
Category: deployment and operations
View Status: public
Last Update: 2024-04-12 18:10
Reporter: javier.sepulveda
Assigned To: javier.sepulveda
Priority: high
Severity: text
Reproducibility: have not tried
Status: assigned
Resolution: reopened
Product Version: git (master)
Target Version: 0.11
Summary: 0008302: Current monitoring tools - Write documentation
Description: At this point we have Netdata, Grafana and Kuma up and running.
So write documentation about what exactly is being monitored by each program.
Tags: No tags attached.

Activities

javier.sepulveda

2024-02-07 14:18

administrator   ~0021193

Agreed to create a completely new .rst file under docs.git

javier.sepulveda

2024-02-14 14:16

administrator   ~0021290

Making good progress drafting this document. I now have a solid skeleton of it and am getting into the details of each section.

javier.sepulveda

2024-03-05 13:56

administrator   ~0021672

Pushed the reviewed draft to Git. It still needs to be linked somewhere in the menu.
Even though this document refers to Loki and Promtail, we don't have them installed on taler.net yet. Installing those packages is easy, but configuring the logs is another story. I first need to learn how to manage Loki and Promtail through the Grafana admin panel, and after that learn how to track specific logs.
There is a separate open bug for that: 0008303.

Christian Grothoff

2024-03-07 09:29

manager   ~0021746

Did you forget to git add the images?

/research/taler/docs/taler-monitoring-infrastructure.rst:24: WARNING: image file not readable: images/taler-monitoring-infrastructure.png
/research/taler/docs/taler-monitoring-infrastructure.rst:91: WARNING: image file not readable: images/grafana-postgres-exporter.png
/research/taler/docs/taler-monitoring-infrastructure.rst:108: WARNING: image file not readable: images/uptime-kuma-from-grafana.png
/research/taler/docs/taler-monitoring-infrastructure.rst:170: WARNING: image file not readable: images/kuma.png
/research/taler/docs/taler-monitoring-infrastructure.rst:197: WARNING: image file not readable: images/uptime-kuma-edit.png
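
Warnings like the ones above can be caught before pushing by checking that every image referenced from an .rst file exists in the working tree. The following is a minimal sketch in Python, not part of the docs.git tooling: the helper name `missing_images` is hypothetical, it only looks at plain `.. image::` and `.. figure::` directives, and it assumes image paths are relative to the directory of the .rst file.

```python
import re
from pathlib import Path

# Matches reStructuredText ".. image:: path" and ".. figure:: path" directives.
DIRECTIVE = re.compile(r"^\s*\.\.\s+(?:image|figure)::\s+(\S+)\s*$")


def missing_images(rst_path):
    """Return (line_number, image_path) pairs whose files do not exist.

    Hypothetical helper: substitution definitions and absolute paths are
    not handled in this sketch.
    """
    rst_path = Path(rst_path)
    missing = []
    lines = rst_path.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        match = DIRECTIVE.match(line)
        if not match:
            continue
        target = match.group(1)
        # Skip remote images; only local files can be missing from a checkout.
        if "://" in target:
            continue
        # Assumption: paths are relative to the directory of the .rst file.
        if not (rst_path.parent / target).is_file():
            missing.append((number, target))
    return missing
```

Calling `missing_images("taler-monitoring-infrastructure.rst")` from the docs directory would list the line numbers and paths of images that are referenced but not present. Note that this only checks the working tree; a file that exists locally but is untracked still has to be found with `git status`.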

javier.sepulveda

2024-03-07 09:32

administrator   ~0021747

Yes. I realized that yesterday and added it to my CalDAV tasks. I will do that in no time. Thank you.

javier.sepulveda

2024-03-07 12:53

administrator   ~0021755

I have checked this, as I was sure I had copied those images to the images folder beforehand, so I thought that "git add ." would have included them.
Now I can see that the .gitignore prevents these .png files from being tracked:

# generated images
*.png

So I am not sure what the plan is now. Unless I can change the .gitignore file, I won't be able to commit these image files.
I can find a way to add them manually, but I am not sure this is what we want to do every time we add image files to the .rst files.

javier.sepulveda

2024-03-11 09:08

administrator   ~0021832

Added the missing image files using: git add -f
I will add that to my notes so that I take it into account next time.
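
The behaviour seen in this thread follows from how .gitignore rules are evaluated: a pattern without a slash, such as `*.png`, matches file names at any depth, and when several patterns match a path, the last one decides. A later pattern prefixed with `!` re-includes a path. The Python sketch below is a simplified approximation of that precedence, not Git's actual implementation; `is_ignored` is a hypothetical helper, and the `!images/*.png` rule is only an illustrative assumption, not a change that was made in docs.git (the files were added with `git add -f`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Approximate gitignore matching: the last matching pattern wins.

    Simplification: fnmatch lets "*" cross "/" boundaries, and directory
    rules are not modelled, so this is only a sketch of the precedence.
    """
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments carry no rule
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if "/" in pattern:
            # Patterns containing a slash are matched against the whole path.
            hit = fnmatchcase(path, pattern.lstrip("/"))
        else:
            # Patterns without a slash are matched against the file name only.
            hit = fnmatchcase(PurePosixPath(path).name, pattern)
        if hit:
            ignored = not negated
    return ignored


current = ["# generated images", "*.png"]
proposed = current + ["!images/*.png"]  # hypothetical re-include rule

print(is_ignored("images/kuma.png", current))
print(is_ignored("images/kuma.png", proposed))
```

With the current rules the image path is reported as ignored, which is why "git add ." skipped it; with the hypothetical negation rule appended it is no longer ignored, while .png files outside images/ stay ignored.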

Christian Grothoff

2024-03-12 13:59

manager   ~0021864

Last edited: 2024-03-18 12:19

Reopening now to give feedback on both documentation and the setup.

1) In the main figure at the top, the architecture diagram should consistently use the real hostnames (not "TALER OPS", not "external server"). Also, we have 6 internal machines: firefly.gnunet.org (BFH), sam.gnunet.org (TUM), gv.taler.net (BFH), $tbd.taler.net (TUE), anastasis.lu (contabo) and taler-ops.ch (hosttech). They are at 5 different hosting providers -- that should be *plenty* to run the Uptime Kuma servers _within_ our own infrastructure instead of depending on any external hosts. I propose we run one at TUE, and a second one at contabo. Also, we should monitor _all_ of our hosts, not just taler.net and taler-ops.ch. In fact, we should now *also* start monitoring exchange.e.netzbon-basel.ch ;-). The diagram should also say where Grafana is hosted.

2) Grafana dashboards: Marco Boss developed some Taler-specific Grafana dashboards as part of his BS thesis. You can find them in grid5k.git, and you should probably unearth and deploy them (and then document that).

3) The Node exporter, Postgres exporter and Prometheus seem perfect targets for an Ansible playbook for taler-ops.ch (and in the future other hosts).

4) Managing logs: it is totally unclear from what you write *where* this is done.

Issue History

Date Modified | Username | Field | Change
2024-02-01 10:17 | javier.sepulveda | New Issue |
2024-02-01 10:17 | javier.sepulveda | Status | new => assigned
2024-02-01 10:17 | javier.sepulveda | Assigned To | => javier.sepulveda
2024-02-07 14:18 | javier.sepulveda | Note Added: 0021193 |
2024-02-14 14:16 | javier.sepulveda | Note Added: 0021290 |
2024-02-14 14:28 | Christian Grothoff | Severity | minor => text
2024-02-14 14:28 | Christian Grothoff | Target Version | => 0.11
2024-03-05 13:56 | javier.sepulveda | Note Added: 0021672 |
2024-03-05 13:56 | javier.sepulveda | Assigned To | javier.sepulveda => Christian Grothoff
2024-03-05 13:56 | javier.sepulveda | Status | assigned => feedback
2024-03-07 09:29 | Christian Grothoff | Note Added: 0021746 |
2024-03-07 09:30 | Christian Grothoff | Assigned To | Christian Grothoff => javier.sepulveda
2024-03-07 09:32 | javier.sepulveda | Note Added: 0021747 |
2024-03-07 12:53 | javier.sepulveda | Note Added: 0021755 |
2024-03-11 09:08 | javier.sepulveda | Note Added: 0021832 |
2024-03-12 09:03 | javier.sepulveda | Status | feedback => resolved
2024-03-12 09:03 | javier.sepulveda | Resolution | open => fixed
2024-03-12 13:46 | Christian Grothoff | Product Version | => git (master)
2024-03-12 13:46 | Christian Grothoff | Fixed in Version | => 0.10
2024-03-12 13:46 | Christian Grothoff | Target Version | 0.11 => 0.10
2024-03-12 13:48 | Christian Grothoff | Assigned To | javier.sepulveda => Christian Grothoff
2024-03-12 13:48 | Christian Grothoff | Status | resolved => feedback
2024-03-12 13:48 | Christian Grothoff | Resolution | fixed => reopened
2024-03-12 13:59 | Christian Grothoff | Note Added: 0021864 |
2024-03-13 17:19 | Christian Grothoff | View Status | private => public
2024-03-18 12:19 | Christian Grothoff | Note Edited: 0021864 |
2024-03-18 12:19 | Christian Grothoff | Assigned To | Christian Grothoff => javier.sepulveda
2024-03-18 12:20 | Christian Grothoff | Status | feedback => assigned
2024-04-12 18:10 | Christian Grothoff | Priority | normal => high
2024-04-12 18:10 | Christian Grothoff | Fixed in Version | 0.10 =>
2024-04-12 18:10 | Christian Grothoff | Target Version | 0.10 => 0.11