The 8 essential server monitoring KPIs

1. CPU Usage

The CPU is often the first bottleneck: sustained high usage slows down every process on the server.

Normal: < 70%
Warning: 70-85%
Critical: > 85%

Also monitor: Load average (1, 5, 15 min), I/O wait, steal time (VM)
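
The CPU thresholds above can be turned into a small classifier. This is only a sketch: the names `classify_cpu` and `load_per_core` are hypothetical, and placing exactly 70% and 85% in the warning band is an assumption, since the thresholds do not say where the boundaries belong. Dividing the load average by the core count is one common way to make it comparable across machines.

```python
import os


def classify_cpu(usage_percent: float) -> str:
    """Map a CPU usage percentage onto the normal/warning/critical bands."""
    if usage_percent > 85:
        return "critical"
    if usage_percent >= 70:  # assumption: boundary values count as warning
        return "warning"
    return "normal"


def load_per_core() -> float:
    """Return the 1-minute load average divided by the core count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)
```

A per-core load that stays above 1.0 suggests more runnable work than the CPUs can absorb, even when the usage percentage looks acceptable.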

2. Memory (RAM)

Running out of RAM forces the system to use swap (disk), which severely degrades performance.

Normal: < 80%
Warning: 80-90%
Critical: > 90% or active swap

Also monitor: Swap usage, OOM killer events, cache/buffers
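
On Linux, memory and swap usage can be derived from the contents of `/proc/meminfo`. The sketch below assumes that file format; the function names are hypothetical. It uses `MemAvailable` rather than `MemFree` so that reclaimable cache and buffers do not count as used memory.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])  # values are in kB
    return values


def memory_status(text: str) -> tuple:
    """Return (used memory in percent, swap in use in kB)."""
    info = parse_meminfo(text)
    used_percent = 100 * (1 - info["MemAvailable"] / info["MemTotal"])
    swap_used_kb = info["SwapTotal"] - info["SwapFree"]
    return used_percent, swap_used_kb
```

On a real server the input would come from something like `open("/proc/meminfo").read()`.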

3. Disk Space

A full disk can completely lock up a server. Databases and logs are the usual culprits.

Normal: < 70%
Warning: 70-85%
Critical: > 85%

Also monitor: Inodes, I/O latency, SMART status (physical drives)
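
Disk space and inode usage can both be read with the Python standard library. This is a minimal sketch for Unix-like systems; the function names are hypothetical. Checking inodes separately matters because a filesystem can run out of them while free space remains.

```python
import os
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of disk space used on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total if usage.total else 0.0


def inode_usage_percent(path: str = "/") -> float:
    """Return the percentage of inodes used (Unix only)."""
    stats = os.statvfs(path)
    if stats.f_files == 0:  # some filesystems do not report inode counts
        return 0.0
    return 100 * (stats.f_files - stats.f_ffree) / stats.f_files
```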

4. Network

Bandwidth and network latency directly impact the end-user experience.

Throughput: inbound/outbound traffic (Mbps)
Latency: response time (ms)
Packet loss: dropped packets (%)
Connections: number of active TCP connections
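
Throughput and packet loss are usually derived from counters rather than measured directly. The sketch below shows the arithmetic only; the function names are hypothetical, and the counter values are assumed to come from whatever source you sample (interface byte counters, ping statistics).

```python
def throughput_mbps(bytes_before: int, bytes_after: int, interval_s: float) -> float:
    """Convert two byte-counter samples taken interval_s apart into Mbps."""
    bits = (bytes_after - bytes_before) * 8
    return bits / interval_s / 1_000_000


def packet_loss_percent(sent: int, received: int) -> float:
    """Return the share of packets that never came back, in percent."""
    if sent == 0:
        return 0.0
    return 100 * (sent - received) / sent
```

For example, 12,500,000 bytes transferred in 10 seconds works out to 10 Mbps.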

5. Availability (Uptime)

The most important metric: is your server responding to requests?

Checks to configure:

ICMP ping (network availability)
HTTP check (status 200, response time)
Critical port checks (SSH, database)
Application-level check (/health endpoint)
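
Two of the checks above, the port check and the HTTP check, can be sketched with the standard library alone. The function names and default timeouts are assumptions, as is the `/health` URL in the usage note; a dedicated monitoring tool would add retries and checks from several locations.

```python
import socket
import time
import urllib.error
import urllib.request


def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_http(url: str, timeout: float = 5.0) -> tuple:
    """Return (healthy, elapsed seconds); healthy means an HTTP 200 response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            healthy = response.status == 200
    except (urllib.error.URLError, OSError):
        healthy = False
    return healthy, time.monotonic() - start
```

A call such as `check_http("https://example.com/health")` would cover the application-level check, and `check_port("example.com", 22)` the SSH port.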

6. Application Services

Verify that your critical services are running and responding correctly.

Web server (Apache/Nginx): processes, workers, connections
Database (MySQL/PostgreSQL): connections, queries/s, slow queries
PHP-FPM: active workers, queue length, processing time
Cache (Redis/Memcached): hit ratio, memory usage
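
As an illustration of one of these metrics, the cache hit ratio for Redis can be computed from the `keyspace_hits` and `keyspace_misses` counters that the `INFO` command reports. The sketch below takes the raw `INFO` text as input, so how you fetch it (client library, `redis-cli`) is left open; the function names are hypothetical.

```python
from typing import Optional


def parse_redis_info(text: str) -> dict:
    """Parse the key:value lines of Redis INFO output, skipping section headers."""
    stats = {}
    for line in text.splitlines():
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            stats[key] = value
    return stats


def cache_hit_ratio(info_text: str) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups yet."""
    stats = parse_redis_info(info_text)
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None
```

The counters are cumulative since the server started, so comparing two samples over time gives a more current picture than a single reading.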

7. Backups

A backup that has not been verified does not exist. Monitor your backups.

Last backup status (success/failure)
Date of last successful backup
Backup size (anomaly detection)
Available storage space
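
The freshness and size checks above can be combined into one function. This is a sketch under stated assumptions: the function name, the 26-hour maximum age (a daily backup plus some slack) and the 50% size tolerance are all hypothetical values to adapt to your own schedule.

```python
from datetime import datetime, timedelta


def backup_problems(
    last_success: datetime,
    size_bytes: int,
    previous_sizes: list,
    now: datetime,
    max_age: timedelta = timedelta(hours=26),  # assumption: daily backups
    tolerance: float = 0.5,  # assumption: flag sizes more than 50% off average
) -> list:
    """Return a list of detected problems; an empty list means the backup looks fine."""
    problems = []
    if now - last_success > max_age:
        problems.append("stale")
    if previous_sizes:
        average = sum(previous_sizes) / len(previous_sizes)
        if average > 0 and abs(size_bytes - average) / average > tolerance:
            problems.append("size anomaly")
    return problems
```

A backup that suddenly shrinks to a fraction of its usual size often means a dump failed partway, even if the job reported success.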

8. Security

Detect intrusion attempts and abnormal behaviour.

Failed SSH authentication attempts
Connections from unknown IP addresses
Modifications to critical files
SSL certificates nearing expiration
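
Failed SSH attempts can be counted per source address from the authentication log. The sketch below assumes the usual OpenSSH "Failed password for ... from ADDRESS port ..." message; log formats vary with version and configuration, so treat the pattern as a starting point. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed OpenSSH message, e.g.
# "Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2"
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port")


def failed_ssh_attempts(log_lines) -> Counter:
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `failed_ssh_attempts(lines).most_common(10)` then lists the most active sources.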

Alerting best practices

Avoid alert fatigue

Too many alerts kill alerting. If your team receives 100 notifications a day, they will stop paying attention. Set realistic thresholds and group similar alerts together.
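
One simple way to group similar alerts is to send a single notification per check that lists every affected host. This is a sketch only: the `(host, check, message)` tuple shape and the function name are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts) -> dict:
    """Group (host, check, message) alerts into {check: sorted list of hosts}."""
    grouped = defaultdict(set)
    for host, check, _message in alerts:
        grouped[check].add(host)
    return {check: sorted(hosts) for check, hosts in grouped.items()}
```

Twenty servers crossing the same disk threshold then produce one notification instead of twenty.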

Use time-based thresholds

A 3-second CPU spike is not a problem. Configure alerts to trigger only when a threshold stays exceeded for a sustained period, such as several minutes.
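
A time-based threshold can be sketched as a small state machine that remembers when the current breach began. The class name and the numbers in the usage note are hypothetical, and the timestamps are assumed to be seconds from any monotonic clock.

```python
class SustainedThreshold:
    """Fire only when a value stays above a threshold for duration_s seconds."""

    def __init__(self, threshold: float, duration_s: float):
        self.threshold = threshold
        self.duration_s = duration_s
        self._breach_started = None  # timestamp of the first sample above threshold

    def update(self, value: float, timestamp: float) -> bool:
        """Record a sample and return True if the alert should fire."""
        if value <= self.threshold:
            self._breach_started = None  # back to normal: reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = timestamp
        return timestamp - self._breach_started >= self.duration_s
```

With `SustainedThreshold(85, 300)`, a short spike that drops back below 85% resets the timer, and the alert only fires once usage has stayed above 85% for five minutes.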

Prioritise alerts

Distinguish critical alerts (SMS or phone call) from informational alerts (email). For example, a web server that is down warrants an immediate call, while a disk at 75% warrants an email to be acted on within 24 hours.
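
This split can be expressed as a routing table from severity to notification channels. The severity and channel names below are hypothetical, and falling back to email for unknown severities is an assumption rather than a recommendation.

```python
# Hypothetical routing table: severity -> notification channels
ROUTES = {
    "critical": ["sms", "phone"],
    "informational": ["email"],
}


def channels_for(severity: str) -> list:
    """Return the channels to notify; unknown severities fall back to email."""
    return ROUTES.get(severity, ["email"])
```

Keeping the mapping in one place makes it easy to review which events are allowed to wake someone up.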

Server Monitoring:
The Essential KPIs
