Server Monitoring: The Essential KPIs
Monitoring the right indicators lets you detect problems before they impact your users. Here are the essential server metrics and how to configure them.
The 8 essential server monitoring KPIs
1. CPU Usage
The processor is often the first bottleneck. Excessively high CPU usage slows down all operations.
- Normal: < 70%
- Warning: 70-85%
- Critical: > 85%
Also monitor: Load average (1, 5, 15 min), I/O wait, steal time (VM)
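The thresholds above can be expressed as a small classifier. This is a minimal Python sketch, not a finished agent: the function names are hypothetical, and the per-core normalisation of the load average is a common rule of thumb rather than something this article prescribes.

```python
import os


def classify_cpu(usage_percent: float) -> str:
    """Map a CPU usage percentage to the thresholds listed above."""
    if usage_percent > 85:
        return "critical"
    if usage_percent >= 70:
        return "warning"
    return "normal"


def load_per_core(load_avg: float, cores: int) -> float:
    """Divide a load average by the core count (rule of thumb: above 1.0 means queued work)."""
    return load_avg / max(cores, 1)


def current_load_per_core() -> tuple:
    """Read the 1, 5 and 15 minute load averages (Unix only) and normalise them."""
    cores = os.cpu_count() or 1
    return tuple(load_per_core(value, cores) for value in os.getloadavg())
```

Calling current_load_per_core() on a Linux host returns three values, one per averaging window. Comparing the 1-minute value with the 15-minute value gives a rough idea of whether load is rising or falling.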
2. Memory (RAM)
Running out of RAM forces the system to use swap (disk), which severely degrades performance.
- Normal: < 80%
- Warning: 80-90%
- Critical: > 90% or active swap
Also monitor: Swap usage, OOM killer events, cache/buffers
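On Linux, these values can be read from /proc/meminfo. The sketch below is illustrative only: the function names are hypothetical, "used" is computed from the MemAvailable field, and "active swap" is interpreted as any swap currently in use, which is an assumption (you may prefer to alert on swap-in/swap-out rates instead).

```python
def parse_meminfo(text: str) -> dict:
    """Parse the content of /proc/meminfo into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_status(info: dict) -> str:
    """Apply the memory thresholds listed above to parsed meminfo values."""
    used_percent = 100 * (1 - info["MemAvailable"] / info["MemTotal"])
    # Assumption: any swap in use counts as "active swap".
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    if used_percent > 90 or swap_used > 0:
        return "critical"
    if used_percent >= 80:
        return "warning"
    return "normal"
```

A typical call would be memory_status(parse_meminfo(open("/proc/meminfo").read())).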
3. Disk Space
A full disk can completely lock up a server. Databases and logs are the usual culprits.
- Normal: < 70%
- Warning: 70-85%
- Critical: > 85%
Also monitor: Inodes, I/O latency, SMART status (physical drives)
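Both disk space and inode usage can be checked with the Python standard library. This is a minimal sketch with hypothetical function names. The inode check relies on os.statvfs, which is only available on Unix-like systems.

```python
import os
import shutil


def classify_disk(used_percent: float) -> str:
    """Map a disk usage percentage to the thresholds listed above."""
    if used_percent > 85:
        return "critical"
    if used_percent >= 70:
        return "warning"
    return "normal"


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use.

    This may differ slightly from `df`, which excludes root-reserved blocks.
    """
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def inode_used_percent(path: str = "/") -> float:
    """Percentage of inodes in use; 0.0 when the filesystem does not report inodes."""
    stats = os.statvfs(path)
    if stats.f_files == 0:
        return 0.0
    return 100 * (stats.f_files - stats.f_ffree) / stats.f_files
```

A filesystem can run out of inodes while still showing free space, so it is worth passing both percentages through classify_disk.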
4. Network
Bandwidth and network latency directly impact the end-user experience.
- Throughput: inbound/outbound traffic (Mbps)
- Latency: response time (ms)
- Packet loss: dropped packets (%)
- Connections: number of active TCP connections
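Throughput and packet loss are both derived values: throughput from two readings of a byte counter, packet loss from sent and received probe counts. The sketch below shows the arithmetic only. The function names are hypothetical, and counter resets between two readings are not handled.

```python
def throughput_mbps(bytes_before: int, bytes_after: int, interval_seconds: float) -> float:
    """Convert two readings of an interface byte counter into megabits per second."""
    return (bytes_after - bytes_before) * 8 / interval_seconds / 1_000_000


def packet_loss_percent(sent: int, received: int) -> float:
    """Share of probes that received no reply, as a percentage."""
    if sent == 0:
        return 0.0
    return 100 * (sent - received) / sent
```

For example, 1,250,000 bytes transferred in one second corresponds to 10 Mbps.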
5. Availability (Uptime)
The most important metric: is your server responding to requests?
Checks to configure:
- ICMP ping (network availability)
- HTTP check (status 200, response time)
- Critical port checks (SSH, database)
- Application-level check (/health endpoint)
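The port check and the HTTP check from the list above can be sketched with the standard library alone. The function names, timeouts, and the 2-second response-time limit are assumptions chosen for illustration. ICMP ping is left out because it normally requires raw-socket privileges.

```python
import socket
import time
import urllib.error
import urllib.request


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, timeout: float = 5.0, max_seconds: float = 2.0) -> dict:
    """Request `url` and report the status code, elapsed time, and overall result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # the server answered, but with an error status
    except OSError:
        # Covers URLError, timeouts, and refused connections.
        return {"ok": False, "status": None, "seconds": time.monotonic() - start}
    elapsed = time.monotonic() - start
    return {"ok": status == 200 and elapsed <= max_seconds, "status": status, "seconds": elapsed}
```

The same http_check function can be pointed at a /health endpoint for the application-level check, provided your application exposes one.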
6. Application Services
Verify that your critical services are running and responding correctly.
- Web server (Apache/Nginx): processes, workers, connections
- Database (MySQL/PostgreSQL): connections, queries/s, slow queries
- PHP-FPM: active workers, queue, processing time
- Cache (Redis/Memcached): hit ratio, memory usage
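As an example of an application-level metric, the cache hit ratio is hits divided by total lookups. The sketch below assumes you already have the text output of the Redis INFO command, which reports keyspace_hits and keyspace_misses. The function names are hypothetical.

```python
def parse_redis_info(text: str) -> dict:
    """Parse Redis INFO output ("key:value" lines, "#" section headers) into a dict."""
    stats = {}
    for line in text.splitlines():
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            stats[key] = value.strip()
    return stats


def hit_ratio(hits: int, misses: int):
    """Fraction of lookups served from cache, or None when there were no lookups."""
    total = hits + misses
    return hits / total if total else None


def redis_hit_ratio(info_text: str):
    """Compute the hit ratio from the keyspace counters in INFO output."""
    stats = parse_redis_info(info_text)
    return hit_ratio(int(stats["keyspace_hits"]), int(stats["keyspace_misses"]))
```

These counters are cumulative since the server started, so comparing two readings taken a few minutes apart gives a more current picture than a single value.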
7. Backups
A backup that has not been verified might as well not exist. Monitor your backups, and specifically the result of the verification job, not just the apparent success of the backup job.
- Last backup status (success/failure)
- Date of last successful backup
- Backup size (anomaly detection)
- Available storage space
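Two of these checks lend themselves to simple rules: the age of the last successful backup, and a size that deviates sharply from recent backups. In the sketch below, the 26-hour maximum age (a daily backup plus some slack) and the 50% tolerance are assumptions to adapt to your own schedule, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def backup_is_fresh(last_success: datetime, now: datetime,
                    max_age: timedelta = timedelta(hours=26)) -> bool:
    """True if the last successful backup is recent enough."""
    return now - last_success <= max_age


def size_is_anomalous(size: int, previous_sizes: list, tolerance: float = 0.5) -> bool:
    """True if `size` deviates from the median of previous sizes by more than `tolerance`."""
    if not previous_sizes:
        return False  # nothing to compare against yet
    baseline = median(previous_sizes)
    if baseline == 0:
        return size != 0
    return abs(size - baseline) / baseline > tolerance
```

A backup that suddenly shrinks by more than half often points to a missing source directory or a failed dump, even when the job itself reports success.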
8. Security
Detect intrusion attempts and abnormal behaviour.
- Failed SSH authentication attempts
- Connections from unknown IP addresses
- Modifications to critical files
- SSL certificates nearing expiration
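Two of these signals are easy to sketch in Python: counting failed SSH logins per source IP, and computing how many days remain on a certificate. The regular expression assumes the usual OpenSSH "Failed password for ... from IP port N" log line, so adjust it if your log format differs. All function names are hypothetical.

```python
import re
import socket
import ssl
from collections import Counter

# Assumption: standard OpenSSH log wording for failed password logins.
FAILED_SSH = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port")


def failed_ssh_by_ip(log_lines) -> Counter:
    """Count failed SSH password attempts per source IP address."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_SSH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Days between `now_epoch` and a certificate's notAfter date string."""
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect over TLS and return the notAfter field of the server certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

In practice you would call days_until_expiry(fetch_not_after(host), time.time()) and alert a few weeks before the value reaches zero.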
Alerting best practices
Avoid alert fatigue
Too many alerts kill alerting. If your team receives 100 notifications a day, they will stop paying attention. Set realistic thresholds and group similar alerts together.
Use time-based thresholds
A 3-second CPU spike is not a problem. Configure alerts to trigger only when a threshold has been exceeded continuously for a set duration, for example 5 minutes.
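The idea can be sketched as a small state machine that remembers when the current breach started. This is an illustrative Python sketch, and the class and parameter names are hypothetical. Most monitoring tools offer an equivalent setting built in.

```python
class SustainedThreshold:
    """Fire only when a value stays above `threshold` for `duration_seconds`."""

    def __init__(self, threshold: float, duration_seconds: float):
        self.threshold = threshold
        self.duration_seconds = duration_seconds
        self._breach_started = None  # timestamp of the first sample above the threshold

    def update(self, value: float, timestamp: float) -> bool:
        """Record one sample and return True if the alert should fire."""
        if value <= self.threshold:
            self._breach_started = None  # back to normal: reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = timestamp
        return timestamp - self._breach_started >= self.duration_seconds
```

With SustainedThreshold(85, 300), a CPU reading above 85% only raises an alert once readings have stayed above 85% for five minutes. A single reading at or below the threshold resets the timer.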
Prioritise alerts
Distinguish critical alerts (SMS/phone call) from informational alerts (email). Web server down = immediate call. Disk at 75% = email for action within 24 hours.
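This routing can be kept in a small table that maps each severity to its channels and expected response time. The sketch below mirrors the two examples above. The names are hypothetical, and treating unknown severities as critical is an assumption made so that no alert is silently dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    channels: tuple
    respond_within_hours: float


ROUTES = {
    "critical": Route(channels=("phone", "sms"), respond_within_hours=0),
    "info": Route(channels=("email",), respond_within_hours=24),
}


def route_alert(severity: str) -> Route:
    """Return the notification route for a severity level."""
    # Assumption: an unrecognised severity is escalated rather than ignored.
    return ROUTES.get(severity, ROUTES["critical"])
```

Here a "web server down" alert would be raised as critical, and a "disk at 75%" alert as info.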
Monitor clock drift too
Clock drift is a silent but critical KPI: if your servers drift out of sync (NTP daemon down, upstream source lost), log timestamps become inconsistent and time-sensitive authentication (Kerberos or TOTP, for example) can fail. Add an NTP offset check and consider pointing at a sovereign NTP/NTS source backed by a GPS-disciplined Stratum 1 server.
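An offset check compares the local clock with a time server using the standard NTP calculation over four timestamps: client send (t0), server receive (t1), server transmit (t2), and client receive (t3). The sketch below uses a plain, unauthenticated SNTP query from the standard library. It does not implement NTS, and the function names are hypothetical.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds + 32-bit fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP offset; a positive value means the local clock is behind the server."""
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server: str, timeout: float = 3.0) -> float:
    """Send one SNTP request to `server` and return the estimated offset in seconds."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(request, (server, 123))
        data, _ = sock.recvfrom(512)
        t3 = time.time()
    words = struct.unpack("!12I", data[:48])
    t1 = ntp_to_unix(words[8], words[9])    # server receive timestamp
    t2 = ntp_to_unix(words[10], words[11])  # server transmit timestamp
    return clock_offset(t0, t1, t2, t3)
```

The alert threshold on the absolute offset is for you to choose. It depends on how tightly your logs and authentication systems need to agree.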
Monitoring included in our managed services
Our plans start from EUR 70/month (business hours) or EUR 150/month (24x7) and include monitoring of all these KPIs, with intelligent alerting and incident response. Need managed Proxmox supervision? We manage your clusters too.