SLA vs SLO vs SLI: Understanding IT Service Level Commitments
SLA (Service Level Agreement), SLO (Service Level Objective), and SLI (Service Level Indicator) form the backbone of every managed services contract. This guide explains what they mean, how they differ, and how to negotiate them effectively -- aligned with ITIL and ISO 20000 best practices.
1. What Is an SLA (Service Level Agreement)?
SLA Definition
A Service Level Agreement (SLA) is a formal contract between a service provider and a client that defines the expected level of service. It specifies measurable commitments -- such as uptime, incident response time, and resolution time -- along with the consequences (typically service credits or penalties) when those commitments are not met.
The SLA is the binding document that governs the relationship between your organisation and your managed services provider. It establishes what "normal service" looks like and what constitutes a breach. Without a clear SLA, disputes are common because each party interprets service expectations differently.
In the ITIL framework and ISO 20000 standard, the SLA sits at the top of the service level management hierarchy. It is a customer-facing agreement -- as opposed to OLAs (Operational Level Agreements) which are internal, or UCs (Underpinning Contracts) which govern third-party suppliers.
Key Components of an SLA
Time-Based Commitments
- Response time by priority level
- Resolution time / MTTR targets
- Coverage hours (business hours, 24/7)
- Maintenance window notification periods
Availability Targets
- Guaranteed uptime (99%, 99.9%, 99.99%)
- Scheduled maintenance exclusions
- Measurement methodology
- Reporting period (monthly, quarterly)
For context: a cluster spread across three interconnected Equinix Paris datacenters is the kind of architecture required to hold 99.95%+ over a year -- and exclusions matter just as much as the headline number.
Priority Levels
- P1/Critical: total service outage
- P2/Major: significant degradation
- P3/Minor: limited impact
- P4/Low: service request or inquiry
Penalties and Remedies
- Service credits for SLA breaches
- Penalty caps
- Escalation procedures
- Exclusions (force majeure, client fault)
2. What Is an SLO (Service Level Objective)?
SLO Definition
A Service Level Objective (SLO) is a specific, measurable target within an SLA. While the SLA is the overall contract, each SLO defines a single goal -- for example, "99.9% uptime measured monthly" or "P1 incident response within 1 hour." SLOs are the individual promises that make an SLA concrete.
Think of SLOs as the building blocks of your SLA. A single SLA typically contains multiple SLOs covering different aspects of service quality: availability, response time, throughput, error rate, and so on.
The concept of SLOs was popularised by Google's Site Reliability Engineering (SRE) framework, which treats SLOs as the primary mechanism for balancing reliability with development velocity. Even outside the SRE context, SLOs provide a structured way to define "good enough" service.
Examples of Common SLOs
- Availability SLO: 99.95% uptime per calendar month
- Response Time SLO: P1 incidents acknowledged within 15 minutes
- Resolution Time SLO: P1 incidents resolved within 4 hours
- Latency SLO: 95th percentile API response under 200ms
SLOs and Error Budgets
A powerful concept linked to SLOs is the error budget. If your availability SLO is 99.9%, your error budget is 0.1% -- roughly 43 minutes of downtime per month. This budget can be "spent" on deployments, maintenance, or incidents. When the budget is exhausted, the team should prioritise reliability over new features.
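As a rough illustration of the arithmetic, the sketch below computes an error budget and how much of it is left after some downtime. The function names and the 30-day month are assumptions made for this example, not part of any standard.

```python
def error_budget_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     period_days: int = 30) -> float:
    """Minutes of budget left; negative once the budget is exhausted."""
    return error_budget_minutes(slo_percent, period_days) - downtime_minutes


# Assumed example: a 99.9% SLO over a 30-day month, with 30 minutes already lost.
budget = error_budget_minutes(99.9)
left = budget_remaining(99.9, downtime_minutes=30)
print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

A negative remaining value is the signal to pause risky changes and prioritise reliability work.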
Internal vs External SLOs
Best practice is to set internal SLOs that are tighter than your external SLA. If your SLA promises 99.9% uptime, target 99.95% internally. This gives your team a buffer to catch and fix issues before they become contractual breaches.
3. What Is an SLI (Service Level Indicator)?
SLI Definition
A Service Level Indicator (SLI) is the actual measurement used to evaluate whether an SLO is being met. It is a quantitative metric -- a number derived from real system data -- that tells you how your service is performing right now. Without SLIs, SLOs are just aspirational targets with no way to verify compliance.
SLIs are typically expressed as ratios or percentages. For example, an availability SLI might be calculated as: successful requests / total requests * 100. A response time SLI might be the 95th percentile latency over a rolling 5-minute window.
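The two calculations above can be sketched as follows. This is a minimal example: the sample latencies are made up, the percentile uses the simple nearest-rank method (monitoring tools may interpolate differently), and treating "no traffic" as 100% available is an assumption.

```python
import math


def availability_sli(successful: int, total: int) -> float:
    """Successful requests as a percentage of all requests."""
    if total == 0:
        return 100.0  # assumption: no traffic counts as no failures
    return successful / total * 100


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies (ms) collected over one 5-minute window.
latencies_ms = [120, 95, 180, 210, 130, 110, 90, 400, 150, 140]

sli_availability = availability_sli(successful=9985, total=10000)
sli_p95 = percentile(latencies_ms, 95)
print(f"availability: {sli_availability:.2f}%, p95 latency: {sli_p95} ms")
```

Note how a single slow request (400 ms) dominates the 95th percentile in a small sample; larger windows give more stable SLIs.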
Common SLIs in Managed Services
- Uptime percentage: minutes of availability / total minutes in period
- Ticket response time: time from ticket creation to first human response
- MTTR: average time from incident report to service restoration
- Error rate: percentage of failed requests or transactions
Choosing Good SLIs
A well-chosen SLI should reflect the user's experience, not just internal system health. CPU utilisation is a poor SLI because a server can be at 90% CPU and still serving requests perfectly. Request success rate is a better SLI because it directly measures what the user cares about.
Watch Out for Vanity SLIs
Some providers advertise very short response times (e.g., "15-minute response") that only correspond to an automated email acknowledgement. A genuine response time SLI should measure the time until a qualified human engineer begins working on your incident -- not just when a bot sends an auto-reply.
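One way to keep this SLI honest is to compute it from ticket events and ignore anything sent by automation. The sketch below assumes a hypothetical event format, a list of `(timestamp, actor_type)` pairs where `actor_type` is either `"bot"` or `"human"`; real ticketing systems expose this differently.

```python
from datetime import datetime


def first_human_response_minutes(created_at, events):
    """Minutes from ticket creation to the first human action.

    Returns None if no human has responded yet. Bot events are ignored.
    """
    human_times = [ts for ts, actor in events
                   if actor == "human" and ts >= created_at]
    if not human_times:
        return None
    return (min(human_times) - created_at).total_seconds() / 60


created = datetime(2024, 1, 15, 14, 0)
events = [
    (datetime(2024, 1, 15, 14, 1), "bot"),     # automated acknowledgement
    (datetime(2024, 1, 15, 14, 18), "human"),  # engineer starts working
]
response_minutes = first_human_response_minutes(created, events)
print(response_minutes)
```

Counting the bot event would report a 1-minute response; counting only human actions reports 18 minutes.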
4. SLA vs SLO vs SLI: Key Differences
| Criterion | SLI | SLO | SLA |
|---|---|---|---|
| What it is | A measurement | A target | A contract |
| Example | Current uptime is 99.97% | Internal target uptime: 99.95% | Contract guarantees 99.9% |
| Audience | Engineering team | Engineering + management | Customer-facing |
| Consequences of breach | Investigation triggered | Internal escalation | Service credits / penalties |
| Binding? | No | Internally only | Yes, legally |
| Framework reference | Google SRE, ITIL | Google SRE, ITIL | ITIL, ISO 20000 |
Practical Example
Scenario: Your production web server goes down at 2:00 PM. You report the incident immediately.
SLI
The monitoring system records the outage start time. The ticket system records an engineer was assigned at 2:18 PM. Response time SLI = 18 minutes.
SLO
Your internal target says P1 incidents should be responded to within 30 minutes. SLO met (18 min < 30 min).
SLA
The contract guarantees response within 1 hour. SLA met (18 min < 60 min). No service credits triggered.
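The scenario above can be expressed as a short check. The date is arbitrary and the function name is made up for this sketch; the thresholds are the ones from the scenario, and treating a response that lands exactly on the limit as "met" is an assumption.

```python
from datetime import datetime, timedelta

SLO_RESPONSE = timedelta(minutes=30)  # internal objective
SLA_RESPONSE = timedelta(hours=1)     # contractual commitment


def evaluate_response(reported: datetime, engineer_assigned: datetime) -> dict:
    """Derive the response-time SLI and compare it with the SLO and the SLA."""
    sli = engineer_assigned - reported
    return {
        "sli_minutes": sli.total_seconds() / 60,
        "slo_met": sli <= SLO_RESPONSE,
        "sla_met": sli <= SLA_RESPONSE,
    }


result = evaluate_response(
    reported=datetime(2024, 1, 15, 14, 0),
    engineer_assigned=datetime(2024, 1, 15, 14, 18),
)
print(result)
```

A response at 2:45 PM would miss the internal SLO while still meeting the SLA, which is exactly the buffer a tighter internal target is meant to provide.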
5. Incident Response Time and MTTR Explained
Two of the most important SLOs in any managed services contract are response time and resolution time. Understanding the difference -- and how each is measured -- is critical when evaluating providers.
Response Time (Time to Respond)
Response time measures the interval between an incident being reported and a qualified engineer starting to work on it. It is sometimes called TTR (Time to Respond) or, in ITIL terminology, the initial response target.
What it includes:
- Ticket receipt and registration
- Assignment to a competent engineer
- Start of diagnosis or intervention
- First communication to the client
What it does NOT guarantee:
- Problem resolution
- Service restoration
- Total downtime duration
Resolution Time / MTTR
Resolution time measures the interval between an incident being reported and the service being fully restored. It is often tracked as MTTR (Mean Time to Repair, sometimes read as Mean Time to Restore), which is the average of that interval across incidents. A resolution commitment is much stronger than a response commitment because it guarantees the outcome, not just the beginning of work.
What it guarantees:
- Service is accessible to users again
- Critical functionality is operational
- Incident is resolved (even if root cause analysis follows)
MTTR vs root cause fix:
MTTR covers service restoration, not necessarily the permanent fix. A provider may restore service via a workaround (meeting the MTTR target) and then address the root cause separately. This distinction matters when reading SLA fine print.
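Because MTTR is an average, it can look healthy while an individual incident still exceeds a per-incident resolution target. The sketch below illustrates this with made-up incident data; the function names and the 4-hour target are assumptions for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttr_hours(incidents) -> float:
    """Mean time from report to service restoration, in hours."""
    durations = [(restored - reported).total_seconds() / 3600
                 for reported, restored in incidents]
    return mean(durations)


def resolution_breaches(incidents, target: timedelta):
    """Incidents whose restoration took longer than the per-incident target."""
    return [(reported, restored) for reported, restored in incidents
            if restored - reported > target]


# Hypothetical (reported, restored) pairs for one month.
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),     # 2 h
    (datetime(2024, 3, 8, 22, 0), datetime(2024, 3, 9, 3, 0)),     # 5 h
    (datetime(2024, 3, 20, 14, 0), datetime(2024, 3, 20, 16, 0)),  # 2 h
]

mttr = mttr_hours(incidents)
breached = resolution_breaches(incidents, target=timedelta(hours=4))
print(f"MTTR: {mttr:.1f} h, incidents over target: {len(breached)}")
```

Here the mean is 3 hours, yet one incident took 5 hours. When reading a contract, check whether the resolution commitment applies to each incident or only to the average.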
Why MTTR Commitments Cost More
Committing to MTTR is inherently risky for a provider: they cannot always predict incident complexity upfront. This is why contracts with MTTR guarantees typically cost more than response-time-only contracts. At RDEM Systems, we include a 4-hour response time (or 1-hour with our Critical plan) because we prefer realistic, measurable commitments over promises that are difficult to keep. See our managed server plans with guaranteed SLAs.
A second reason MTTR commitments carry a premium: enforcing them requires staffing coverage outside business hours. A 4-hour MTTR that only applies from 9 to 5 is not really a 4-hour MTTR. For organisations without an internal NOC, the usual answer is to outsource the night-and-weekend tier — and the real cost of 24/7 on-call support is the line item most buyers underestimate when they sign an MTTR-backed SLA.
6. Standard SLA Tiers and Metrics
Response and resolution times vary by incident priority. The table below shows typical ranges found in managed services contracts, organised by ITIL-style incident priority levels:
| Priority | Description | Response Time | Resolution Target |
|---|---|---|---|
| P1 - Critical | Total service outage, major business impact | 15 min - 1h | 4h - 8h |
| P2 - Major | Significant degradation, workaround available | 1h - 4h | 8h - 24h |
| P3 - Minor | Limited impact, few users affected | 4h - 8h | 24h - 72h |
| P4 - Low | Service request, inquiry, improvement | 8h - 24h | Best effort |
At RDEM Systems
Our managed services plans include a 4-hour response time for critical incidents, with 24/7/365 on-call support. For mission-critical infrastructure, we offer a 1-hour response time option. Explore our dedicated server management with guaranteed SLAs.
24/7 On-Call -- 4h Response
Included in the 24x7 plan at 150 EUR/month/server (or 70 EUR/month business hours only)
24/7 On-Call -- 1h Response
6,000 EUR/month (fleet package)
Uptime SLOs: What the Nines Really Mean
Availability targets are often expressed as "nines" -- but the practical difference between each level is dramatic:
| Availability | Downtime / Month | Downtime / Year | Typical Use Case |
|---|---|---|---|
| 99% (two nines) | 7h 18min | 3.65 days | Internal tools, dev environments |
| 99.9% (three nines) | 43 min | 8.76 hours | Business applications, SaaS |
| 99.95% | 21 min | 4.38 hours | E-commerce, customer portals |
| 99.99% (four nines) | 4.3 min | 52.6 min | Financial systems, healthcare |
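The figures in the table can be reproduced with a few lines of arithmetic. This sketch assumes a 365-day year and an average month of 730 hours (8,760 / 12); the table values are rounded, so small differences are expected.

```python
HOURS_PER_YEAR = 365 * 24              # 8760
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730, an average month


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float) -> float:
    """Maximum downtime, in hours, that still meets the availability target."""
    return period_hours * (100 - availability_percent) / 100


for level in (99.0, 99.9, 99.95, 99.99):
    per_month_min = allowed_downtime_hours(level, HOURS_PER_MONTH) * 60
    per_year_h = allowed_downtime_hours(level, HOURS_PER_YEAR)
    print(f"{level}%: {per_month_min:.1f} min/month, {per_year_h:.2f} h/year")
```

The same function works for any reporting period, which matters because a monthly measurement window is stricter than a yearly one for the same percentage.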
How Are SLA Metrics Measured?
The measurement methodology must be precisely defined in your SLA to prevent disputes. Here are the key considerations:
When the Clock Starts
The timer typically starts at ticket creation. But watch for nuances:
- Automated ticket (monitoring alert): clock starts when the alert fires
- Manual ticket (phone/email): clock starts when the ticket is logged in the system
- Business hours only? Verify whether the clock runs outside coverage hours
Often overlooked: the timestamps used to prove SLA compliance are only defensible if the provider's clocks are traceably synchronized. That's the point of running a sovereign NTP/NTS infrastructure with GPS-backed Stratum 1 — neutral evidence that a ticket really opened when claimed.
When the Clock Pauses
Most SLAs define conditions where the timer is suspended:
- Awaiting client input: the provider needs information or access from you
- Access denied: the provider cannot reach the affected system
- Third-party dependency: the issue depends on a hosting provider, vendor, or ISP
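A pause-aware timer can be sketched as below. The pause intervals are hypothetical and assumed not to overlap each other; a real implementation would also need to handle coverage hours and time zones.

```python
from datetime import datetime, timedelta


def sla_elapsed(start: datetime, end: datetime, pauses) -> timedelta:
    """Time counted against the SLA between start and end.

    pauses is a list of (pause_start, pause_end) intervals, assumed
    non-overlapping. Only the part of each pause inside [start, end] counts.
    """
    paused = timedelta()
    for pause_start, pause_end in pauses:
        overlap_start = max(start, pause_start)
        overlap_end = min(end, pause_end)
        if overlap_end > overlap_start:
            paused += overlap_end - overlap_start
    return (end - start) - paused


reported = datetime(2024, 1, 15, 14, 0)
restored = datetime(2024, 1, 15, 19, 30)
pauses = [
    # Awaiting client input: provider asked for access at 15:00, got it at 16:15.
    (datetime(2024, 1, 15, 15, 0), datetime(2024, 1, 15, 16, 15)),
]
counted = sla_elapsed(reported, restored, pauses)
print(counted)
```

The wall-clock outage is 5 h 30 min, but only 4 h 15 min counts against the SLA. This is why the pause conditions deserve as much scrutiny as the headline target.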
When the Clock Stops
Response time met when:
An engineer has begun diagnosis and sent the first update to the client.
Resolution time met when:
The service is restored and functional (confirmed by the client, or automatically after X hours without objection).
Measurement Example
P1 incident reported: 2:00 PM
Contractual response time: 1 hour
Engineer begins work: 2:42 PM
Actual response time: 42 minutes
Result: SLA met (42 min < 60 min)
7. How to Negotiate an SLA
A good SLA strikes a balance between your actual needs and an acceptable cost. Here are practical tips for negotiating effectively.
Assess Your Real Requirements
Do you genuinely need a 15-minute response time at 3 AM on a Sunday? If your business does not operate 24/7, business-hours coverage may be sufficient -- and considerably cheaper. Map SLA tiers to actual business impact using a framework like BIA (Business Impact Analysis).
Prioritise Your Services Correctly
Not every server deserves the same SLA. A revenue-generating e-commerce platform needs a short response time. A development environment can wait. Tiered SLAs reduce costs without sacrificing protection where it matters.
Be Sceptical of Unrealistic Targets
A 15-minute response time looks appealing on paper, but if it is not achievable, the provider will find loopholes: pausing the clock, reclassifying priority levels, or counting automated replies as "responses." Realistic commitments are more valuable than impressive numbers.
Scrutinise the Exclusions
An SLA packed with exclusions (broad force majeure clauses, third-party outages, maintenance windows) may never actually apply. Read the fine print carefully. A good SLA clearly defines what is excluded -- and the list should be short.
Ask for Historical Performance Data
A reputable provider can show you their SLA compliance statistics. If the rate is close to 100%, the SLA targets are realistic. If they refuse to share performance data, that is a red flag. Transparency is a hallmark of a trustworthy managed services partner.
Align with Industry Standards
Reference established frameworks when negotiating. ITIL provides a mature incident management process. ISO 20000 defines SLA requirements for IT service management certification. Using these standards gives your negotiation a solid foundation and avoids subjective arguments.
8. Penalties and Service Credits
An SLA without penalties is just a statement of intent. Financial consequences give teeth to the commitments and incentivise the provider to meet them consistently.
Service Credits
The most common remedy: when the SLA is breached, the client receives a credit on the next invoice. Typical structures:
- 5-10% of monthly fee per P1 SLA breach
- 2-5% for P2 breaches
- Cap usually at 20-30% of monthly fee
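To see how such a structure plays out, here is a minimal sketch of a capped credit calculation. The rates (5% per P1 breach, 2% per P2 breach), the 20% cap, and the 1,000 EUR fee are illustrative values picked from the ranges above, not terms from any specific contract.

```python
def service_credit(monthly_fee: float, p1_breaches: int, p2_breaches: int,
                   p1_rate: float = 0.05, p2_rate: float = 0.02,
                   cap: float = 0.20) -> float:
    """Credit owed for a month, limited to a fraction of the monthly fee."""
    uncapped = monthly_fee * (p1_breaches * p1_rate + p2_breaches * p2_rate)
    return min(uncapped, monthly_fee * cap)


# Hypothetical month: 1,000 EUR fee, two P1 breaches and one P2 breach.
credit = service_credit(1000, p1_breaches=2, p2_breaches=1)
# A very bad month: five P1 breaches would exceed the cap.
capped_credit = service_credit(1000, p1_breaches=5, p2_breaches=0)
print(f"credit: {credit:.2f} EUR, capped: {capped_credit:.2f} EUR")
```

Once the cap is reached, further breaches cost the provider nothing extra, which is one reason contractual remedies beyond credits are worth negotiating.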
Contractual Remedies
Beyond financial penalties, a well-drafted SLA may include:
- Right to terminate without notice after X SLA breaches
- Mandatory post-incident review (RCA)
- Required improvement plan with measurable milestones
Understand the Limits
SLA penalties almost never cover consequential damages (lost revenue, customer churn, reputational harm). For those risks, you need separate insurance. The SLA is an incentive mechanism, not a full indemnification. To estimate the actual financial impact of downtime on your business, try our downtime cost calculator.
Managed Services with Contractual SLA -- from 70 EUR/month
At RDEM Systems, SLA commitments are contractual and measurable. No "best effort," no fine print. Our 3 plans cover any hosting provider:
Essential
70 EUR/mo
7 AM - 10 PM, 7 days/week
Pro
150 EUR/mo
24/7 -- 4h response
Critical
250 EUR/mo
24/7 -- 1h response
We manage your servers regardless of hosting provider: OVHcloud, Scaleway, Hetzner, Contabo, IONOS, or any other provider.
Clear SLAs. Measurable Commitments.
At RDEM Systems, our commitments are straightforward: 4-hour response time included, 1-hour option available. No fine print, no surprises. See our managed services plans with guaranteed SLAs.