View Issue Details

ID: 0005487
Project: Taler
Category: deployment and operations
View Status: public
Last Update: 2020-07-05 00:07
Reporter: Florian Dold
Assigned To:
Priority: normal
Severity: feature
Reproducibility: have not tried
Status: confirmed
Resolution: open
Product Version:
Target Version:
Fixed in Version:
Summary: 0005487: set up system monitoring on gv
Description: On tripwire, we previously ran into the issue of a full /tmp directory.

With gv, we should have monitoring in place to inform us of abnormal states, such as full disks or excessive memory use.
Tags: No tags attached.

Activities

nikita

2019-07-15 18:55

developer   ~0014680

ack, will get to this after GSoC.

buckE

2020-04-13 03:37

reporter   ~0015634

Please list all things to monitor here. They do not have to be in one note, but as you think of them, add them so we have a full picture when it's time to start this.

buckE

2020-06-19 08:54

reporter   ~0016320

Please create a list of things we should monitor. The list will help determine the best solution. If it's just RAM and disk space, we could have a simple Bash script and cron job. If the list is larger, we can do something else.
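
For illustration only, here is a minimal sketch of what such a RAM and disk check could look like. It is written in Python rather than Bash, and the paths, the 90% thresholds, and all function names are assumptions, not anything agreed in this issue. It also assumes a Linux host, since it reads /proc/meminfo.

```python
import shutil


def disk_used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def mem_used_fraction(meminfo_text):
    """Parse the text of /proc/meminfo and return the fraction of RAM in use."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def check(paths=("/", "/tmp"), disk_limit=0.90, mem_limit=0.90,
          meminfo_path="/proc/meminfo"):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    for path in paths:
        try:
            frac = disk_used_fraction(path)
        except OSError as exc:
            problems.append(f"cannot check disk at {path}: {exc}")
            continue
        if frac > disk_limit:
            problems.append(f"disk at {path} is {frac:.0%} full")
    try:
        with open(meminfo_path) as f:
            frac = mem_used_fraction(f.read())
    except OSError as exc:
        problems.append(f"cannot read {meminfo_path}: {exc}")
    else:
        if frac > mem_limit:
            problems.append(f"memory is {frac:.0%} used")
    return problems


if __name__ == "__main__":
    # Print only when something is wrong, so a cron run stays silent otherwise.
    for problem in check():
        print(problem)
```

Run from cron, a script like this would only produce output when a threshold is exceeded, and cron mails any job output to the owning user (or to MAILTO if it is set), which would give notification by e-mail without a separate alerting tool.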

Also, Christian should be involved, because I think he will probably use it (as root) most often. Maybe he would like something that he doesn't have to check but that notifies him by e-mail when there is a problem. Maybe he prefers a GUI like Zabbix. Etc. But let's start with a list of things to monitor.

Florian Dold

2020-06-24 18:18

manager   ~0016356

We need two things:

1. General system monitoring, such as disks, memory, CPU usage.
2. Application-specific monitoring, which will go over log files and extract "stuff" from them. What this stuff is depends to some degree on the application, but we want to at least collect log lines that are at the ERROR log level and alert if they occur too frequently.
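
To make the second point concrete, here is a rough sketch of counting ERROR lines in a sliding time window. The log format (an ISO 8601 timestamp, then the level, then the message), the function name, and the thresholds are all assumptions for illustration; the real Taler log formats may differ and would need their own parsing.

```python
from collections import deque
from datetime import datetime, timedelta


def error_bursts(lines, max_errors=5, window=timedelta(minutes=1)):
    """Yield the timestamp of each ERROR line that pushes the number of
    ERROR lines seen within `window` above `max_errors`.

    Assumed line format: "<ISO timestamp> <LEVEL> <message>".
    Lines that do not match are skipped.
    """
    recent = deque()  # timestamps of ERROR lines still inside the window
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2 or parts[1] != "ERROR":
            continue
        try:
            ts = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # not a timestamp we understand
        recent.append(ts)
        # Drop errors that have fallen out of the window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) > max_errors:
            yield ts


if __name__ == "__main__":
    sample = [
        "2020-06-24T18:18:00 ERROR first failure",
        "2020-06-24T18:18:05 INFO unrelated",
        "2020-06-24T18:18:06 ERROR second failure",
        "2020-06-24T18:18:09 ERROR third failure",
    ]
    for ts in error_bursts(sample, max_errors=2):
        print(f"too many errors around {ts.isoformat()}")
```

The window and threshold would have to be tuned per application, since what counts as too frequent depends on how noisy each service normally is.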

NetData (https://github.com/netdata/netdata) might be a good start for system monitoring.

buckE

2020-06-25 07:52

reporter   ~0016358

Ah. Well, if you already have a project in mind that you want to use, it's now a matter of how Christian feels.

This is the quickstart: https://github.com/netdata/netdata#quickstart

For the .deb: https://github.com/netdata/netdata/blob/master/packaging/installer/README.md

And a good overview: https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-00.md#netdata-fundamentals

Here is information on opting out of the analytics: https://github.com/netdata/netdata/blob/master/docs/anonymous-statistics.md

From the website it looks to have some nice features, including integration with PostgreSQL, nginx, syslog.

But I really can't imagine doing this on a live system without testing on a staging server first. The next step is getting Christian's opinion. Assigning to him.

Christian Grothoff

2020-06-25 09:16

manager   ~0016361

netdata seems fine. I do see the point that we need to sort out server staging. Don't have a good plan for that yet.

buckE

2020-06-26 11:27

reporter   ~0016369

> do see the point that we need to sort out server staging. Don't have a good plan for that yet.

I recommend a cheap VPS. They can be *very* cheap. We don't need anything fast, or even very secure just to do staging (except deployment.git stuff maybe). All we need to do is to duplicate the taler.net environment reasonably well.

Issue History

Date Modified Username Field Change
2018-11-27 10:53 Florian Dold New Issue
2019-07-15 18:55 nikita Assigned To => nikita
2019-07-15 18:55 nikita Status new => assigned
2019-07-15 18:55 nikita Note Added: 0014680
2020-04-13 02:45 Christian Grothoff Assigned To nikita => buckE
2020-04-13 02:45 Christian Grothoff Severity minor => feature
2020-04-13 03:37 buckE Note Added: 0015634
2020-06-19 08:54 buckE Note Added: 0016320
2020-06-19 08:54 buckE Assigned To buckE => Florian Dold
2020-06-24 18:18 Florian Dold Note Added: 0016356
2020-06-25 07:52 buckE Note Added: 0016358
2020-06-25 07:52 buckE Assigned To Florian Dold => Christian Grothoff
2020-06-25 09:16 Christian Grothoff Note Added: 0016361
2020-06-26 11:27 buckE Note Added: 0016369
2020-07-05 00:07 Christian Grothoff Assigned To Christian Grothoff =>
2020-07-05 00:07 Christian Grothoff Status assigned => confirmed