Server monitoring for web projects with DataDog
I learned about DataDog in 2017 from one of my favorite IT podcasts, DevZen. Since then it has been my reliable assistant for monitoring my servers and the health of the sites I maintain. In this article I will describe which parts of the DataDog agent I use and how I configure them to collect metrics and set up notifications. A free account is enough for me: I also tried a paid plan, but it turned out to be too expensive for my needs.
Infrastructure: how I run websites
First, a few words about how I run multiple sites on the same server: I use docker and docker-compose, with nginx or traefik acting as the load balancer and reverse proxy. Each site runs in its own environment, in its own subnet, with only the services it needs. Projects are completely isolated from each other at both the network level and the file system level.
DataDog agent installation
First, install the official DataDog agent using the command from the official documentation. Your account's API key will already be substituted into the command, so you only need to run it:
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=XXX DD_SITE="datadoghq.com" bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
Next, if you use traefik (I recently started to), install the integration that collects traefik metrics with a separate command. It must be run as the dd-agent user:
sudo -u dd-agent datadog-agent integration install -t datadog-traefik==1.0.0
Also, since we will be working with docker, we need to add the dd-agent user to the docker group so that it can interact with the service:
sudo usermod -aG docker dd-agent
Configuring DataDog Agent
After the agent is installed, several configuration files must be created or edited.
Docker
To activate docker monitoring:
/etc/datadog-agent/conf.d/docker.d/conf.yaml
init_config:
instances:
  - url: "unix://var/run/docker.sock"
    new_tag_names: true
Volumes
We adjust the list of partitions so that the agent does not collect information about the virtual disks created by docker. Otherwise the disk metrics will be skewed:
/etc/datadog-agent/conf.d/disk.d/conf.yaml
init_config:
instances:
  - use_mount: false
    file_system_blacklist:
      - tmpfs
      - none
      - shm
      - nsfs
      - netns
      - binfmt_misc
      - autofs
    mount_point_blacklist:
      - /var/lib/docker/(containers|overlay2)/
      - /run/docker/
      - /sys/kernel/debug/
      - /run/user/1000/
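To make the effect of mount_point_blacklist more concrete, here is a minimal Python sketch that applies the same patterns to a few mount points. It assumes the patterns are regular expressions matched from the start of the path; the `is_excluded` helper and the sample mount points are made up for illustration, and this is not the agent's actual code.

```python
import re

# Patterns copied from mount_point_blacklist in the config above.
MOUNT_POINT_PATTERNS = [
    r"/var/lib/docker/(containers|overlay2)/",
    r"/run/docker/",
    r"/sys/kernel/debug/",
    r"/run/user/1000/",
]

# Assumption: each pattern is a regex anchored at the start of the mount point.
_COMPILED = [re.compile(pattern) for pattern in MOUNT_POINT_PATTERNS]


def is_excluded(mount_point):
    """Return True if the mount point matches any excluded pattern."""
    return any(rx.match(mount_point) for rx in _COMPILED)


# Hypothetical mount points, as they might appear on a docker host.
samples = [
    "/",
    "/home",
    "/var/lib/docker/overlay2/3f2a/merged",
    "/run/docker/netns/default",
]
kept = [m for m in samples if not is_excluded(m)]
print(kept)  # ['/', '/home']
```

Under this assumption a path without the trailing slash, such as /run/user/1000 itself, would not match its pattern, so it may be worth checking the disk metrics after the first run to confirm that only the expected partitions remain.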
Traefik
If you are using traefik, enable its check with this config:
/etc/datadog-agent/conf.d/traefik.d/conf.yaml
init_config:
instances:
  - host: localhost
In addition, the following block must be added to the traefik YAML configuration file:
metrics:
  datadog:
    addEntryPointsLabels: true
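With this block enabled, traefik pushes its metrics to the agent's DogStatsD listener, which by default accepts UDP datagrams on port 8125 of the local host. The sketch below shows roughly what such a datagram looks like. The metric name and tag are invented for illustration and are not the names traefik actually emits.

```python
import socket


def dogstatsd_datagram(name, value, metric_type, tags=None):
    """Build one DogStatsD datagram: <name>:<value>|<type>[|#key:value,...]."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; 8125 is the default DogStatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("utf-8"), (host, port))


# Hypothetical counter ("c") with one tag, similar in shape to a per-entrypoint metric.
packet = dogstatsd_datagram(
    "example.requests.total", 1, "c", {"entrypoint": "websecure"}
)
print(packet)  # example.requests.total:1|c|#entrypoint:websecure
```

Calling `send(packet)` on a host where the agent is running would submit the sample metric. Because the transport is UDP, nothing fails on the sender's side if the agent is down.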
Nginx
If you are using nginx, enable its check with this config:
/etc/datadog-agent/conf.d/nginx.d/conf.yaml
init_config:
instances:
  - nginx_status_url: http://localhost:8831/nginx_status
We also create a config that opens a port for the nginx status page:
/etc/nginx/sites-enabled/nginx_status
server {
    listen 8831;
    location /nginx_status {
        stub_status;
    }
}
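For reference, the status page returns a small plain-text report that the agent turns into metrics. The following Python sketch parses a sample body in the format documented for nginx's stub_status module. The `parse_stub_status` function is a hypothetical helper, not part of the agent.

```python
import re

# Sample response body in the documented stub_status format.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(body):
    """Extract the counters from a stub_status response body."""
    active = int(re.search(r"Active connections:\s*(\d+)", body).group(1))
    # The three totals sit on their own line: accepts, handled, requests.
    accepts, handled, requests = map(
        int,
        re.search(
            r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", body, re.MULTILINE
        ).groups(),
    )
    reading, writing, waiting = map(
        int,
        re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", body
        ).groups(),
    )
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


stats = parse_stub_status(SAMPLE)
print(stats["active"], stats["requests"], stats["waiting"])  # 291 31070465 106
```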
HTTP protocol checks
DataDog offers a wide range of checks based on the HTTP protocol: the presence of a string in the response, headers, response codes, timeouts, and so on. More details can be found in the official documentation. My example applies only the simplest health checks to the main page. In real life this file can get very long and cover many different site URLs, including especially "heavy" and slow ones.
/etc/datadog-agent/conf.d/http_check.d/conf.yaml
init_config:
instances:
  - name: akvilon.expert
    url: https://akvilon.expert
    timeout: 5
    http_response_status_code: 200
    seconds_warning: 3
  - name: volvofix.ru
    url: https://volvofix.ru
    timeout: 5
    http_response_status_code: 200
    seconds_warning: 3
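To illustrate what such a check boils down to, here is a rough Python sketch that probes a URL and decides whether it is healthy. It models only the `timeout` and `http_response_status_code` options from the config above. The `probe` and `evaluate` functions are hypothetical and are not the agent's implementation.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout: float = 5.0) -> Tuple[Optional[int], float]:
    """Fetch the URL once; return (status code or None, elapsed seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except OSError:
        status = None  # DNS failure, refused connection, timeout, ...
    return status, time.monotonic() - started


def evaluate(
    status: Optional[int],
    elapsed: float,
    expected_status: int = 200,
    timeout: float = 5.0,
) -> str:
    """Return "OK" or "CRITICAL" for a single probe result."""
    if status is None or elapsed > timeout:
        return "CRITICAL"  # no usable answer within the timeout
    if status != expected_status:
        return "CRITICAL"  # answered, but not with the expected code
    return "OK"


print(evaluate(200, 0.4))   # OK
print(evaluate(500, 0.4))   # CRITICAL
print(evaluate(None, 5.0))  # CRITICAL
```

On a machine with network access, `evaluate(*probe("https://akvilon.expert"))` would run the same logic against a real site. The sketch leaves out `seconds_warning` on purpose: in the agent's example config that option is described together with the certificate-expiration settings rather than with response time, so it is worth checking the documentation before relying on it as a slowness threshold.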
Once the configuration is complete and the agent has been restarted, our host and the list of services found on it will appear in the Infrastructure List section.
Based on the received data, you can easily set up alerts for events such as:
- lack of free space on disks or in RAM
- website performance problems: downtime or long timeouts
- imminent expiration of TLS certificates
- downtime of the entire server
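As an illustration of the logic behind the certificate alert in the list above, here is a small Python sketch that computes how many days remain before a certificate expires. The 14-day threshold, the sample date, and both helper functions are assumptions for the example; in DataDog itself this is configured as a monitor rather than written by hand.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def should_alert(
    not_after: str, warn_days: float = 14, now: Optional[float] = None
) -> bool:
    """True when the certificate expires within warn_days."""
    return days_until_expiry(not_after, now) <= warn_days


# Same string format as the "notAfter" field returned by SSLSocket.getpeercert().
NOT_AFTER = "Jan  5 09:34:43 2030 GMT"  # made-up expiry date

# Pretend "now" is exactly ten days before expiry so the result is reproducible.
fake_now = ssl.cert_time_to_seconds(NOT_AFTER) - 10 * SECONDS_PER_DAY
print(days_until_expiry(NOT_AFTER, fake_now))            # 10.0
print(should_alert(NOT_AFTER, warn_days=14, now=fake_now))  # True
```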
After using DataDog for free for several years, I am convinced that it is a great tool and an indispensable assistant for infrastructure monitoring. It also has a million other features that are useful when maintaining web projects, for example synthetic browser tests.
I hope you found this information helpful!