0008302
Summary: 0008302: Current monitoring tools - Write documentation
Description: At this point we have Netdata, Grafana and Kuma up and running.
Write documentation about what exactly each program monitors.
2024-02-07 14:18

administrator   ~0021193

Agreed to create a completely new .rst file under docs.git


2024-02-14 14:16

administrator   ~0021290

Making quite good progress with drafting this document. Right now I have quite a nice skeleton of it, and am getting into the detail of each section.


2024-03-05 13:56

administrator   ~0021672

Pushed the reviewed draft to git. It still needs to be linked somewhere in the menu.
Even though I refer to Loki and Promtail in this document, we don't have them (Loki + Promtail) installed yet. Installing those packages is easy, but
configuring the logs is another story. I first need to learn how to manage Loki + Promtail through
the Grafana admin panel, and after that learn how to track specific logs.
There is now a separate open bug for that: 0008303.

Christian Grothoff

2024-03-07 09:29

manager   ~0021746

Did you forget to Git add images?

/research/taler/docs/taler-monitoring-infrastructure.rst:24: WARNING: image file not readable: images/taler-monitoring-infrastructure.png
/research/taler/docs/taler-monitoring-infrastructure.rst:91: WARNING: image file not readable: images/grafana-postgres-exporter.png
/research/taler/docs/taler-monitoring-infrastructure.rst:108: WARNING: image file not readable: images/uptime-kuma-from-grafana.png
/research/taler/docs/taler-monitoring-infrastructure.rst:170: WARNING: image file not readable: images/kuma.png
/research/taler/docs/taler-monitoring-infrastructure.rst:197: WARNING: image file not readable: images/uptime-kuma-edit.png
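
Warnings like the ones above can be caught before pushing. The following is only a minimal sketch, not part of the docs.git tooling: the function name `missing_images` is hypothetical, and it assumes image paths are written relative to the directory passed as `base_dir` (a leading "/" is treated as relative to that directory too). It only looks at plain `image` and `figure` directives, not substitution definitions.

```python
import re
from pathlib import Path

# Matches ".. image:: path" and ".. figure:: path" at the start of a line.
IMAGE_DIRECTIVE = re.compile(
    r"^[ \t]*\.\.[ \t]+(?:image|figure)::[ \t]+(\S+)", re.MULTILINE
)


def missing_images(rst_text, base_dir):
    """Return the referenced image paths that do not exist under base_dir."""
    base = Path(base_dir)
    missing = []
    for ref in IMAGE_DIRECTIVE.findall(rst_text):
        if "://" in ref:
            continue  # remote image, nothing to check on disk
        # Assumption: base_dir is both the document directory and the
        # source root, so a leading "/" can simply be stripped.
        if not (base / ref.lstrip("/")).is_file():
            missing.append(ref)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_images.py some-document.rst
    rst_path = Path(sys.argv[1])
    for ref in missing_images(rst_path.read_text(encoding="utf-8"), rst_path.parent):
        print(f"{rst_path}: image file not found: {ref}")
```

Note that this checks the working tree only. A file that exists locally but was never committed (the case here) still passes, so it complements rather than replaces a build from a fresh checkout.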


2024-03-07 09:32

administrator   ~0021747

Yes. I realized that yesterday and added it to my CalDAV tasks. I will do that in no time. Thank you.


2024-03-07 12:53

administrator   ~0021755

I have checked this, as I was sure I had copied those images to the images folder beforehand, so I thought "git add ." would have included them.
Now I can see that the .gitignore prevents these .png files from being tracked:

# generated images

So, I am not sure what the plan is now. As long as I can't change the .gitignore file, I won't be able to commit these image files.
I can find a way to upload them manually, but I am not sure this is what we want to do every time we add image files to the rst files.


2024-03-11 09:08

administrator   ~0021832

Added the missing image files using: git add -f
I will add that to my notes so I take it into account next time.
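
For the notes, a rough sketch of how to find out up front which files git would silently skip. It wraps `git check-ignore -q`, which exits with 0 for an ignored path and 1 for a path that is not ignored. The helper name `ignored_by_git` is hypothetical, and the paths in the usage line are only examples.

```python
import subprocess


def ignored_by_git(paths, repo_dir="."):
    """Return the subset of paths that git ignores inside repo_dir."""
    ignored = []
    for path in paths:
        # -q suppresses output; only the exit status is used.
        result = subprocess.run(
            ["git", "check-ignore", "-q", path],
            cwd=repo_dir,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            ignored.append(path)
        elif result.returncode != 1:
            # Any other status means git itself failed (e.g. not a repository).
            raise RuntimeError(f"git check-ignore failed for {path!r}")
    return ignored


if __name__ == "__main__":
    # Example paths only; anything printed here needs "git add -f".
    for path in ignored_by_git(["images/kuma.png", "images/uptime-kuma-edit.png"]):
        print("ignored, needs 'git add -f':", path)
```

Running `git check-ignore -v <path>` by hand additionally prints which .gitignore line matched, which helps when deciding whether to change the rule or keep force-adding.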

Christian Grothoff

2024-03-12 13:59

manager   ~0021864

Last edited: 2024-03-18 12:19

Reopening now to give feedback on both documentation and the setup.

1) In the main figure on top, it would be good to consistently have the real hostnames (not: TALER OPS, not: "external server") in the architecture diagram. Also, we have 6 internal machines: (BFH), (TUM), (BFH), $ (TUE), (contabo) and (hosttech). They are at 5 different hosters -- that should be *plenty* to run the uptime kuma servers _within_ our own infrastructure instead of depending on any external hosts. I propose we run one at TUE, and a second one at contabo. Also, we should monitor _all_ of our hosts, not just and In fact, we should now *also* start monitoring ;-). The diagram should also say where grafana is hosted.

2) Grafana dashboards: Marco Boss developed some Taler-specific Grafana dashboards as part of his BS thesis. You can find them in grid5k.git; you should probably unearth and deploy them (and then document that).

3) The Node exporter, Postgres exporter and Prometheus seem perfect targets for an Ansible playbook for (and in the future other hosts).

4) Managing logs: it is totally unclear from what you write *where* this is done.

