Technical Guide

Server Monitoring: The Essential KPIs

Monitoring the right indicators lets you detect problems before they impact your users. Here are the essential server metrics and how to configure them.

The 8 essential server monitoring KPIs

1. CPU Usage

The processor is often the first bottleneck. Excessively high CPU usage slows down all operations.

  • Normal: < 70%
  • Warning: 70-85%
  • Critical: > 85%

Also monitor: Load average (1, 5, 15 min), I/O wait, steal time (VM)
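The three CPU levels above can be expressed as a small classification function. This is a minimal sketch, not a monitoring agent: the names classify and current_load_per_core are hypothetical, and dividing the 1-minute load average by the core count is a common rule of thumb rather than something this guide prescribes.

```python
import os


def classify(percent, warning=70.0, critical=85.0):
    """Map a usage percentage to the guide's levels.

    Defaults follow the CPU thresholds: < 70 normal, 70-85 warning, > 85 critical.
    """
    if percent > critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "normal"


def current_load_per_core():
    """Return the 1-minute load average divided by the CPU count (Unix only)."""
    one_min, _five_min, _fifteen_min = os.getloadavg()
    return one_min / (os.cpu_count() or 1)
```

For example, classify(91.5) returns "critical", while classify(85.0) still counts as "warning" because the critical level starts strictly above 85%.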

2. Memory (RAM)

Running out of RAM forces the system to use swap (disk), which severely degrades performance.

  • Normal: < 80%
  • Warning: 80-90%
  • Critical: > 90% or active swap

Also monitor: Swap usage, OOM killer events, cache/buffers
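On Linux, these figures can be read from /proc/meminfo. The sketch below parses text in that format and applies the memory thresholds above. It rests on two assumptions: used memory is computed as MemTotal minus MemAvailable, and any non-zero swap usage is treated as "active swap", which is a simplification (a stricter check would look at swap-in/swap-out rates). The function name memory_status is hypothetical.

```python
def memory_status(meminfo_text):
    """Return (used_percent, swap_used_kb, level) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB

    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    used_percent = 100.0 * (total - available) / total

    # Assumption: "active swap" is approximated by any swap currently in use.
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)

    if used_percent > 90 or swap_used > 0:
        level = "critical"
    elif used_percent >= 80:
        level = "warning"
    else:
        level = "normal"
    return used_percent, swap_used, level
```

On a live host, the input would typically come from open("/proc/meminfo").read().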

3. Disk Space

A full disk can completely lock up a server. Databases and logs are the usual culprits.

  • Normal: < 70%
  • Warning: 70-85%
  • Critical: > 85%

Also monitor: Inodes, I/O latency, SMART status (physical drives)
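Disk space and inode usage can both be read with the Python standard library. This is a rough sketch with hypothetical function names; os.statvfs is only available on Unix-like systems, and the percentage is computed as used/total, which may differ slightly from what df reports because of reserved blocks.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Return the used space of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return None  # pseudo-filesystems may report no capacity
    return 100.0 * usage.used / usage.total


def inode_usage_percent(path="/"):
    """Return the inode usage in percent, or None if the filesystem reports no inodes."""
    stats = os.statvfs(path)  # Unix only
    if stats.f_files == 0:
        return None
    return 100.0 * (stats.f_files - stats.f_ffree) / stats.f_files
```

A filesystem can be unusable with free space left if it has run out of inodes, which is why the two values are checked separately.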

4. Network

Bandwidth and network latency directly impact the end-user experience.

  • Throughput: inbound/outbound traffic (Mbps)
  • Latency: response time (ms)
  • Packet loss: dropped packets (%)
  • Connections: number of active TCP connections
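Two of these network indicators are easy to approximate in a few lines. The sketch below measures latency as the time needed to open a TCP connection (an assumption: ICMP ping usually requires elevated privileges, so TCP connect time is used as a stand-in) and derives packet loss from sent/received counters. Both function names are hypothetical.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def packet_loss_percent(sent, received):
    """Return the share of probes that received no reply, in percent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent
```

With 200 probes sent and 197 replies received, packet_loss_percent(200, 197) gives 1.5.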

5. Availability (Uptime)

The most important metric: is your server responding to requests?

Checks to configure:

  • ICMP ping (network availability)
  • HTTP check (status 200, response time)
  • Critical port checks (SSH, database)
  • Application-level check (/health endpoint)
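The port and HTTP checks in this list can be sketched with the standard library alone. The function names, the timeouts, and the rule that only status 200 counts as "up" are assumptions made for illustration; a real setup would normally rely on a dedicated monitoring tool rather than a script like this.

```python
import http.client
import socket
import time
import urllib.error
import urllib.request


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_http(url, timeout=5.0):
    """Return (is_up, status_code, elapsed_seconds) for a GET request to url."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False, None, time.perf_counter() - start
    return status == 200, status, time.perf_counter() - start
```

The same check_http function can target an application-level endpoint such as /health, and the elapsed time can be compared with a response-time threshold of your choosing.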

6. Application Services

Verify that your critical services are running and responding correctly.

  • Web server (Apache/Nginx): processes, workers, connections
  • Database (MySQL/PostgreSQL): connections, queries/s, slow queries
  • PHP-FPM: active workers, queue, processing time
  • Cache (Redis/Memcached): hit ratio, memory usage
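As an illustration of the cache hit ratio, Redis reports the counters keyspace_hits and keyspace_misses in the output of its INFO command. The sketch below computes the ratio from text in that format. It assumes the INFO output has already been fetched (for example with redis-cli); the function name cache_hit_ratio is hypothetical.

```python
def cache_hit_ratio(info_text):
    """Return hits / (hits + misses) from Redis INFO-style text, or None if no lookups."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        key, separator, value = line.partition(":")
        if separator:
            stats[key] = value

    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None
```

With 900 hits and 100 misses the function returns 0.9. What counts as a healthy ratio depends on the workload, so no threshold is suggested here.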

7. Backups

A backup that has not been verified does not exist. Monitor your backups.

  • Last backup status (success/failure)
  • Date of last successful backup
  • Backup size (anomaly detection)
  • Available storage space
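The freshness and size checks above can be combined into one function. This is a sketch under stated assumptions: the 26-hour maximum age (a daily backup plus some slack) and the 50% size tolerance are illustrative values, and the function name backup_problems is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def backup_problems(last_success, size_bytes, previous_sizes, now,
                    max_age=timedelta(hours=26), tolerance=0.5):
    """Return a list of detected problems: "stale" and/or "size_anomaly"."""
    problems = []

    # Freshness: the last successful backup must be recent enough.
    if now - last_success > max_age:
        problems.append("stale")

    # Size anomaly: compare against the median of earlier backups.
    if previous_sizes:
        baseline = median(previous_sizes)
        if baseline > 0 and abs(size_bytes - baseline) / baseline > tolerance:
            problems.append("size_anomaly")

    return problems
```

A backup that suddenly shrinks to a fraction of its usual size often points to an empty dump or a failed export, even when the job itself reports success.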

8. Security

Detect intrusion attempts and abnormal behaviour.

  • Failed SSH authentication attempts
  • Connections from unknown IP addresses
  • Modifications to critical files
  • SSL certificates nearing expiration
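For the certificate check, the remaining validity can be computed from the notAfter field of the server certificate. The sketch below uses Python's ssl module; the function names are hypothetical, and the alert threshold (for example 30 days before expiry) is left to you.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the notAfter string of the certificate presented by host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now_ts=None):
    """Return the number of days until the certificate expires (negative if expired)."""
    if now_ts is None:
        now_ts = time.time()
    expires_ts = ssl.cert_time_to_seconds(not_after)
    return (expires_ts - now_ts) / 86400.0
```

Note that fetch_not_after uses default certificate verification, so an already expired or otherwise invalid certificate raises an ssl.SSLError instead of returning a date; that error is itself worth alerting on.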

Alerting best practices

Avoid alert fatigue

Too many alerts kill alerting. If your team receives 100 notifications a day, they will stop paying attention. Set realistic thresholds and group similar alerts together.
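One simple way to group similar alerts is to merge every alert that shares the same check and severity into a single notification listing the affected hosts. The sketch below assumes alerts are dictionaries with "host", "check" and "severity" keys; this structure and the function name group_alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Group alerts by (check, severity) and return the sorted hosts for each group."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert["check"], alert["severity"])].append(alert["host"])
    # De-duplicate hosts so a flapping check does not repeat the same name.
    return {key: sorted(set(hosts)) for key, hosts in groups.items()}
```

Ten servers crossing the same disk threshold then produce one message instead of ten.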

Use time-based thresholds

A 3-second CPU spike is not a problem. Configure alerts to trigger only when a threshold stays exceeded for a sustained period (X minutes).
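The idea of a time-based threshold can be sketched as follows: an alert fires only if every sample stays above the threshold for at least a given duration. The function name sustained_breach and the (timestamp, value) sample format are assumptions; most monitoring tools offer an equivalent built-in setting.

```python
def sustained_breach(samples, threshold, min_duration_s):
    """Return True if values stay above threshold for at least min_duration_s.

    samples is a chronologically ordered list of (timestamp_seconds, value).
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of a new breach
            if timestamp - breach_start >= min_duration_s:
                return True
        else:
            breach_start = None  # the breach was interrupted, start over
    return False
```

A single sample at 95% between two normal readings does not trigger anything, whereas six consecutive one-minute samples at 90% with a 300-second duration do.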

Prioritise alerts

Distinguish critical alerts (SMS/phone call) from informational alerts (email). Web server down = immediate call. Disk at 75% = email for action within 24 hours.
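This prioritisation can be written down as a small routing table. The sketch below mirrors the two examples above; the severity names, the channel identifiers and the function name route_alert are hypothetical, and "immediate" is represented as a zero response delay.

```python
from datetime import timedelta

# Assumed policy, taken from the two examples in the text.
POLICY = {
    "critical": {"channels": ("sms", "phone"), "respond_within": timedelta(0)},
    "informational": {"channels": ("email",), "respond_within": timedelta(hours=24)},
}


def route_alert(severity):
    """Return the notification channels and response deadline for a severity."""
    try:
        return POLICY[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
```

Keeping the policy in one table makes it easy to review who gets woken up for what.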

Monitoring included in our managed services

Our plans start from EUR 70/month (business hours) or EUR 150/month (24x7) and include monitoring of all these KPIs, with intelligent alerting and incident response. Need managed Proxmox supervision? We manage your clusters too.