Evgenii Goryaev
Development, support and optimization

Server monitoring for web projects with DataDog


I learned about DataDog in 2017 from one of my favorite IT podcasts, DevZen. Since then it has been my reliable assistant for monitoring my servers and the health of the sites I maintain. In this article I will describe which parts of the DataDog agent I use and how I configure them to collect metrics and set up notifications. A free account is enough for me: I also tried a paid one, but it turned out to be too expensive for my needs, and the free one suits me perfectly.

Infrastructure: how do I run websites

First, a few words about how I run multiple sites on the same server: I use docker and docker-compose. Nginx or traefik acts as the load balancer and reverse proxy. Each site runs in its own environment, in its own subnet, with only the services it needs. The projects are completely isolated from each other at both the network level and the file system level:

Server scheme

DataDog agent installation

First, install the official DataDog agent using the command from the official documentation. The command shown there already contains the API key of your DataDog account, so you only need to run it:

DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=XXX DD_SITE="datadoghq.com" bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"

Next, if you use traefik (I recently started to), install the module that collects traefik metrics with a separate command. It must be run as the dd-agent user:

sudo -u dd-agent datadog-agent integration install -t datadog-traefik==1.0.0

Also, since we will be working with docker, we need to add the dd-agent user to the docker group so that it can interact with the service:

usermod -aG docker dd-agent

Configuring DataDog Agent

After the agent is installed, several configuration files must be created or edited.

Docker

To activate docker monitoring:

/etc/datadog-agent/conf.d/docker.d/conf.yaml

init_config:
instances:
    - url: "unix://var/run/docker.sock"
      new_tag_names: true

Volumes

Adjust the disk check so that it does not collect information about the virtual file systems and mounts created by docker. Otherwise they will distort the disk metrics:

/etc/datadog-agent/conf.d/disk.d/conf.yaml

init_config:
instances:
  - use_mount: false
    file_system_blacklist:
      - tmpfs
      - none
      - shm
      - nsfs
      - netns
      - binfmt_misc
      - autofs
    mount_point_blacklist:
      - /var/lib/docker/(containers|overlay2)/
      - /run/docker/
      - /sys/kernel/debug/
      - /run/user/1000/
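
To see which mount points the patterns above would drop, the matching can be approximated in a few lines of Python. This is only a sketch, not the agent's code: it assumes each mount_point_blacklist entry is treated as a regular expression matched from the start of the mount path, and the helper name is_excluded and the sample paths are made up for illustration.

```python
import re

# Patterns copied from the mount_point_blacklist section above.
MOUNT_POINT_BLACKLIST = [
    r"/var/lib/docker/(containers|overlay2)/",
    r"/run/docker/",
    r"/sys/kernel/debug/",
    r"/run/user/1000/",
]

# Assumption: entries are regular expressions anchored at the start of the path.
_EXCLUDE = re.compile("|".join(f"(?:{p})" for p in MOUNT_POINT_BLACKLIST))


def is_excluded(mount_point: str) -> bool:
    """Return True if this sketch would skip the given mount point."""
    return _EXCLUDE.match(mount_point) is not None


if __name__ == "__main__":
    # Hypothetical mount points, chosen only to exercise the patterns.
    samples = [
        "/",
        "/home",
        "/var/lib/docker/overlay2/3f2a/merged",
        "/run/docker/netns/default",
    ]
    for path in samples:
        print(f"{path}: {'skipped' if is_excluded(path) else 'collected'}")
```

Under the same assumption, a pattern that ends with a slash matches only paths below that directory, so a mount at exactly /run/user/1000 would still be collected. It is worth comparing the patterns against the output of `mount` on your own server.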

Traefik

If you are using traefik, then activate it using the config:

/etc/datadog-agent/conf.d/traefik.d/conf.yaml

init_config:
instances:
  - host: localhost

In addition, the following block must be added to the traefik YAML configuration file:

metrics:
  datadog:
    addEntryPointsLabels: true
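
With this block traefik pushes its metrics to the agent over the DogStatsD protocol, which is plain text sent over UDP. The sketch below shows what such a datagram looks like. It assumes the agent listens on the default address 127.0.0.1:8125; the metric name and tag in the usage note are invented, and the function names are hypothetical helpers rather than part of any DataDog library.

```python
import socket

# Assumption: the agent's DogStatsD listener uses the default UDP address.
DOGSTATSD_ADDR = ("127.0.0.1", 8125)


def format_metric(name, value, metric_type="c", tags=None):
    """Build one DogStatsD datagram: name:value|type|#tag1,tag2."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(tags)
    return line.encode("utf-8")


def send_metric(payload, addr=DOGSTATSD_ADDR):
    """Send one datagram over UDP; the sender gets no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

For example, `send_metric(format_metric("demo.requests", 1, "c", ["entrypoint:web"]))` sends the text `demo.requests:1|c|#entrypoint:web`. If traefik runs in a container, localhost inside the container is not the host, so the address of the agent would have to be set explicitly in the traefik configuration.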

Nginx

If you are using nginx, then activate it using this config:

/etc/datadog-agent/conf.d/nginx.d/conf.yaml

init_config:
instances:
  - nginx_status_url: http://localhost:8831/nginx_status

Also create an nginx config that exposes the status page on a separate port:

/etc/nginx/sites-enabled/nginx_status

server {
    listen 8831;
    location /nginx_status {
        stub_status;
    }
}
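
The status page is a small plain-text document, and the agent turns its counters into metrics. To check by hand what the page returns, something like the following Python sketch can be used. The sample numbers are invented, and the function names parse_stub_status and fetch_stub_status are hypothetical helpers, not part of the agent.

```python
import re
import urllib.request

# Typical stub_status output; the numbers here are invented for illustration.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Extract the seven counters from an nginx stub_status page."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    totals = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(n) for n in totals.groups())
    reading, writing, waiting = (int(n) for n in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def fetch_stub_status(url="http://localhost:8831/nginx_status", timeout=5.0):
    """Download the status page from the port configured above and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_stub_status(resp.read().decode("ascii"))


if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

Calling `fetch_stub_status()` on the server itself is a quick way to confirm that the port is open and the page is served before looking for the metrics in DataDog.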

HTTP protocol checks

DataDog offers a wide range of checks based on the HTTP protocol: the presence of a string in the response, headers, response codes, timeouts, and so on. More details can be found in the official documentation. My example applies only the simplest health checks to the main page. In practice this file can grow very long and cover many different URLs of a site, including especially "heavy" and slow ones.

/etc/datadog-agent/conf.d/http_check.d/conf.yaml

init_config:
instances:
  - name: akvilon.expert
    url: https://akvilon.expert
    timeout: 5
    http_response_status_code: 200
    seconds_warning: 3

  - name: volvofix.ru
    url: https://volvofix.ru
    timeout: 5
    http_response_status_code: 200
    seconds_warning: 3
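
The idea behind such a check (request the page, compare the status code, watch the response time) can be sketched in Python for a quick manual test. This is an illustration of the concept, not the agent's implementation: the functions probe and classify and the parameters expected_status and slow_after are hypothetical names that do not correspond to agent options.

```python
import time
import urllib.error
import urllib.request


def classify(status, elapsed, expected_status=200, slow_after=3.0):
    """Map one probe result to OK, WARNING or CRITICAL."""
    if status != expected_status:
        return "CRITICAL"  # wrong status code, or no response at all
    if elapsed > slow_after:
        return "WARNING"   # the page answered, but too slowly
    return "OK"


def probe(url, timeout=5.0):
    """Fetch url once; return (status code or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered with an error status
    except (urllib.error.URLError, OSError):
        status = None      # DNS failure, refused connection or timeout
    return status, time.monotonic() - start
```

For example, `classify(*probe("https://akvilon.expert"))` returns one of the three strings for a single request. Note that urlopen follows redirects, so the status code is that of the final response.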

Once the configuration is complete and the agent has been restarted so that it picks up the new files, the host and the list of services found on it will appear in the Infrastructure List section.

Hosts in the DataDog dashboard

Based on the received data, you can easily set up alerts for events such as:

  • lack of free space on disks or in RAM
  • website performance problems: downtime or long timeouts
  • imminent expiration of TLS certificates
  • downtime of the entire server

After using DataDog for free for several years, I am convinced that it is a great tool and an indispensable assistant for infrastructure monitoring. It also has many other features that are useful when maintaining web projects, for example synthetic browser tests.

I hope you found this information helpful!