SLA (Service Level Agreement), SLO (Service Level Objective), and SLI (Service Level Indicator) form the backbone of every managed services contract. This guide explains what they mean, how they differ, and how to negotiate them effectively -- aligned with ITIL and ISO 20000 best practices.
A Service Level Agreement (SLA) is a formal contract between a service provider and a client that defines the expected level of service. It specifies measurable commitments -- such as uptime, incident response time, and resolution time -- along with the consequences (typically service credits or penalties) when those commitments are not met.
The SLA is the binding document that governs the relationship between your organisation and your managed services provider. It establishes what "normal service" looks like and what constitutes a breach. Without a clear SLA, disputes are common because each party interprets service expectations differently.
In the ITIL framework and ISO 20000 standard, the SLA sits at the top of the service level management hierarchy. It is a customer-facing agreement, as opposed to OLAs (Operational Level Agreements), which are internal, and UCs (Underpinning Contracts), which govern third-party suppliers.
A Service Level Objective (SLO) is a specific, measurable target within an SLA. While the SLA is the overall contract, each SLO defines a single goal -- for example, "99.9% uptime measured monthly" or "P1 incident response within 1 hour." SLOs are the individual promises that make an SLA concrete.
Think of SLOs as the building blocks of your SLA. A single SLA typically contains multiple SLOs covering different aspects of service quality: availability, response time, throughput, error rate, and so on.
The concept of SLOs was popularised by Google's Site Reliability Engineering (SRE) framework, which treats SLOs as the primary mechanism for balancing reliability with development velocity. Even outside the SRE context, SLOs provide a structured way to define "good enough" service.
A powerful concept linked to SLOs is the error budget. If your availability SLO is 99.9%, your error budget is 0.1% -- roughly 43 minutes of downtime per month. This budget can be "spent" on deployments, maintenance, or incidents. When the budget is exhausted, the team should prioritise reliability over new features.
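The arithmetic behind an error budget can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names are hypothetical, the month is assumed to be 30 days long, and the two incident durations are made-up sample values.

```python
def error_budget_minutes(slo_percent: float, period_days: float = 30.0) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - slo_percent) / 100.0


def remaining_budget_minutes(
    slo_percent: float,
    downtime_minutes: list[float],
    period_days: float = 30.0,
) -> float:
    """Budget left after recorded downtime; a negative value means it is overspent."""
    return error_budget_minutes(slo_percent, period_days) - sum(downtime_minutes)


budget = error_budget_minutes(99.9)                   # assumes a 30-day month
left = remaining_budget_minutes(99.9, [12.0, 20.5])   # two hypothetical incidents
print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

With a 30-day month the budget comes out at about 43.2 minutes, in line with the "roughly 43 minutes" figure above. A remaining value at or below zero is the signal to prioritise reliability work.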
Best practice is to set internal SLOs that are tighter than your external SLA. If your SLA promises 99.9% uptime, target 99.95% internally. This gives your team a buffer to catch and fix issues before they become contractual breaches.
A Service Level Indicator (SLI) is the actual measurement used to evaluate whether an SLO is being met. It is a quantitative metric -- a number derived from real system data -- that tells you how your service is performing right now. Without SLIs, SLOs are just aspirational targets with no way to verify compliance.
SLIs are typically expressed as ratios or percentages. For example, an availability SLI might be calculated as: successful requests / total requests * 100. A response time SLI might be the 95th percentile latency over a rolling 5-minute window.
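As a rough sketch, both kinds of SLI can be computed from raw request data as follows. The function names, the request counts, and the latency samples are illustrative assumptions. The percentile uses the simple nearest-rank method, which is only one of several common definitions, and monitoring tools may use a different one.

```python
import math


def availability_sli(successful: int, total: int) -> float:
    """Availability SLI as a percentage: successful requests / total requests * 100."""
    if total == 0:
        raise ValueError("no requests in the measurement window")
    return successful / total * 100.0


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples in the measurement window")
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100.0 * len(ordered)), 1)
    return ordered[rank - 1]


# Hypothetical samples from one 5-minute window
latencies_ms = [120, 95, 110, 480, 105, 130, 98, 101, 115, 900]

print(f"availability: {availability_sli(9_985, 10_000):.2f}%")
print(f"p95 latency: {percentile_nearest_rank(latencies_ms, 95)} ms")
```

With only ten samples the 95th percentile is simply the slowest request, which is why percentile SLIs are usually computed over windows with many more data points.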
A well-chosen SLI should reflect the user's experience, not just internal system health. CPU utilisation is a poor SLI because a server can run at 90% CPU and still serve requests perfectly. Request success rate is a better SLI because it directly measures what the user cares about.
Some providers advertise very short response times (e.g., "15-minute response") that only correspond to an automated email acknowledgement. A genuine response time SLI should measure the time until a qualified human engineer begins working on your incident -- not just when a bot sends an auto-reply.
| Criterion | SLI | SLO | SLA |
|---|---|---|---|
| What it is | A measurement | A target | A contract |
| Example | Current uptime is 99.97% | Target uptime: 99.9% | Contract guarantees 99.9% |
| Audience | Engineering team | Engineering + management | Customer-facing |
| Consequences of breach | Investigation triggered | Internal escalation | Service credits / penalties |
| Binding? | No | Internally only | Yes, legally |
| Framework reference | Google SRE, ITIL | Google SRE, ITIL | ITIL, ISO 20000 |
Scenario: Your production web server goes down at 2:00 PM. You report the incident immediately.
SLI: The monitoring system records the outage start time. The ticket system records that an engineer was assigned at 2:18 PM. Response time SLI = 18 minutes.
SLO: Your internal target says P1 incidents should be responded to within 30 minutes. SLO met (18 min < 30 min).
SLA: The contract guarantees response within 1 hour. SLA met (18 min < 60 min). No service credits triggered.
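The scenario above can be expressed as one measurement checked against two thresholds. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes that a response landing exactly on the limit still counts as met, which a real contract should state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTargets:
    slo_minutes: float  # internal target (tighter)
    sla_minutes: float  # contractual commitment (looser)


def evaluate_response(sli_minutes: float, targets: ResponseTargets) -> dict[str, bool]:
    """Compare one measured response time (the SLI) against the SLO and the SLA."""
    return {
        "slo_met": sli_minutes <= targets.slo_minutes,
        "sla_met": sli_minutes <= targets.sla_minutes,
    }


p1 = ResponseTargets(slo_minutes=30, sla_minutes=60)
print(evaluate_response(18, p1))  # {'slo_met': True, 'sla_met': True}
print(evaluate_response(45, p1))  # {'slo_met': False, 'sla_met': True}
```

The second call shows why the internal buffer is useful: a 45-minute response misses the internal SLO and should trigger escalation, but the contract is not yet breached.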
Two of the most important SLOs in any managed services contract are response time and resolution time. Understanding the difference -- and how each is measured -- is critical when evaluating providers.
Response time measures the interval between an incident being reported and a qualified engineer starting to work on it. It is sometimes called TTR (Time to Respond) or, in ITIL terminology, the initial response target.
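Resolution time -- often tracked as MTTR (Mean Time to Repair, also expanded as Mean Time to Restore or Recover) -- measures the interval between an incident being reported and the service being fully restored. Strictly speaking, MTTR is an average across incidents, whereas a contractual resolution target applies to each incident individually. Either way, it is a much stronger commitment than response time because it guarantees the outcome, not just the beginning of work.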
MTTR covers service restoration, not necessarily the permanent fix. A provider may restore service via a workaround (meeting the MTTR target) and then address the root cause separately. This distinction matters when reading SLA fine print.
Committing to MTTR is inherently risky for a provider: they cannot always predict incident complexity upfront. This is why contracts with MTTR guarantees typically cost more than response-time-only contracts. At RDEM Systems, we include a 4-hour response time (or 1-hour with our Critical plan) because we prefer realistic, measurable commitments over promises that are difficult to keep. See our managed server plans with guaranteed SLAs.
Response and resolution times vary by incident priority. The table below shows typical ranges commonly found in managed services contracts, organised by ITIL-style incident priority levels:
| Priority | Description | Response Time | Resolution Target |
|---|---|---|---|
| P1 - Critical | Total service outage, major business impact | 15 min - 1h | 4h - 8h |
| P2 - Major | Significant degradation, workaround available | 1h - 4h | 8h - 24h |
| P3 - Minor | Limited impact, few users affected | 4h - 8h | 24h - 72h |
| P4 - Low | Service request, inquiry, improvement | 8h - 24h | Best effort |
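When comparing provider offers, it can help to hold the table above as data. The following is a minimal sketch under stated assumptions: the type names, the `TYPICAL_TARGETS` mapping, and the helper function are hypothetical, and the values are simply the ranges from the table, not a standard.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Range(NamedTuple):
    fastest: timedelta
    slowest: timedelta


class PriorityTargets(NamedTuple):
    response: Range
    resolution: Optional[Range]  # None stands for "best effort"


HOUR = timedelta(hours=1)

# Ranges copied from the table above
TYPICAL_TARGETS = {
    "P1": PriorityTargets(Range(timedelta(minutes=15), 1 * HOUR), Range(4 * HOUR, 8 * HOUR)),
    "P2": PriorityTargets(Range(1 * HOUR, 4 * HOUR), Range(8 * HOUR, 24 * HOUR)),
    "P3": PriorityTargets(Range(4 * HOUR, 8 * HOUR), Range(24 * HOUR, 72 * HOUR)),
    "P4": PriorityTargets(Range(8 * HOUR, 24 * HOUR), None),
}


def response_within_typical_range(priority: str, offered: timedelta) -> bool:
    """True if an offered response time is no slower than the typical upper bound."""
    return offered <= TYPICAL_TARGETS[priority].response.slowest


print(response_within_typical_range("P1", timedelta(minutes=30)))  # True
print(response_within_typical_range("P3", timedelta(hours=12)))    # False
```

An offer outside the typical range is not necessarily a bad one. It is a prompt to check whether the price and the business impact justify the difference.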
Our managed services plans include a 4-hour response time for critical incidents, with 24/7/365 on-call support. For mission-critical infrastructure, we offer a 1-hour response time option. Explore our dedicated server management with guaranteed SLAs.
24/7 On-Call -- 4h Response: Included in the 24x7 plan at 150 EUR/month/server (or 70 EUR/month business hours only)
24/7 On-Call -- 1h Response: 6,000 EUR/month (fleet package)
Availability targets are often expressed as "nines" -- but the practical difference between each level is dramatic:
| Availability | Downtime / Month | Downtime / Year | Typical Use Case |
|---|---|---|---|
| 99% (two nines) | 7h 18min | 3.65 days | Internal tools, dev environments |
| 99.9% (three nines) | 43 min | 8.76 hours | Business applications, SaaS |
| 99.95% | 21 min | 4.38 hours | E-commerce, customer portals |
| 99.99% (four nines) | 4.3 min | 52.6 min | Financial systems, healthcare |
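The figures in the table can be reproduced with a short calculation. This sketch assumes a 365-day year and an average month of one twelfth of a year (730 hours). The constant and function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24               # 8760, non-leap year assumed
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730, average month


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Downtime permitted by an availability target over a period, in hours."""
    return period_hours * (100.0 - availability_percent) / 100.0


for pct in (99.0, 99.9, 99.95, 99.99):
    monthly_min = allowed_downtime_hours(pct, HOURS_PER_MONTH) * 60
    yearly_h = allowed_downtime_hours(pct, HOURS_PER_YEAR)
    print(f"{pct}%: {monthly_min:.1f} min/month, {yearly_h:.2f} h/year")
```

The monthly values depend on the month length chosen, so the output can differ from the table by a fraction of a minute where the table rounds down. The measurement period and month length should therefore be written into the SLA.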
The measurement methodology must be precisely defined in your SLA to prevent disputes. Here are the key considerations:
The timer typically starts at ticket creation. But watch for nuances:
Most SLAs define conditions where the timer is suspended:
Response time met when: an engineer has begun diagnosis and sent the first update to the client.
Resolution time met when: the service is restored and functional (confirmed by the client, or automatically after X hours without objection).
P1 incident reported: 2:00 PM
Contractual response time: 1 hour
Engineer begins work: 2:42 PM
Actual response time: 42 minutes
Result: SLA met (42 min < 60 min)
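The same check can be done from ticket timestamps. The sketch below is a simplified illustration: the function name and the calendar date are arbitrary assumptions, and it does not model any clock suspension, which a real implementation would need to subtract according to the contract.

```python
from datetime import datetime, timedelta


def response_time(reported_at: datetime, work_started_at: datetime) -> timedelta:
    """Elapsed time between the incident report and an engineer starting work."""
    if work_started_at < reported_at:
        raise ValueError("work cannot start before the incident is reported")
    return work_started_at - reported_at


reported = datetime(2024, 1, 15, 14, 0)    # 2:00 PM (the date is arbitrary)
started = datetime(2024, 1, 15, 14, 42)    # 2:42 PM
contractual = timedelta(hours=1)

elapsed = response_time(reported, started)
minutes = elapsed.total_seconds() / 60
print(f"response time: {minutes:.0f} min, SLA met: {elapsed <= contractual}")
```

Using timezone-aware timestamps from a single source (for example the ticketing system) avoids disputes caused by clock differences between the client and the provider.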
A good SLA strikes a balance between your actual needs and an acceptable cost. Here are practical tips for negotiating effectively.
Do you genuinely need a 15-minute response time at 3 AM on a Sunday? If your business does not operate 24/7, business-hours coverage may be sufficient -- and considerably cheaper. Map SLA tiers to actual business impact using a method such as a Business Impact Analysis (BIA).
Not every server deserves the same SLA. A revenue-generating e-commerce platform needs a short response time. A development environment can wait. Tiered SLAs reduce costs without sacrificing protection where it matters.
A 15-minute response time looks appealing on paper, but if it is not achievable, the provider will find loopholes: pausing the clock, reclassifying priority levels, or counting automated replies as "responses." Realistic commitments are more valuable than impressive numbers.
An SLA packed with exclusions (broad force majeure clauses, third-party outages, maintenance windows) may never actually apply. Read the fine print carefully. A good SLA clearly defines what is excluded -- and the list should be short.
A reputable provider can show you their SLA compliance statistics. If the rate is close to 100%, the SLA targets are realistic. If they refuse to share performance data, that is a red flag. Transparency is a hallmark of a trustworthy managed services partner.
Reference established frameworks when negotiating. ITIL provides a mature incident management process. ISO 20000 defines SLA requirements for IT service management certification. Using these standards gives your negotiation a solid foundation and avoids subjective arguments.
An SLA without penalties is just a statement of intent. Financial consequences give teeth to the commitments and incentivise the provider to meet them consistently.
The most common remedy: when the SLA is breached, the client receives a credit on the next invoice. Typical structures:
Beyond financial penalties, a well-drafted SLA may include:
SLA penalties almost never cover consequential damages (lost revenue, customer churn, reputational harm). For those risks, you need separate insurance. The SLA is an incentive mechanism, not a full indemnification. To estimate the actual financial impact of downtime on your business, try our downtime cost calculator.
At RDEM Systems, SLA commitments are contractual and measurable. No "best effort," no fine print. Our 3 plans cover any hosting provider:
| Plan | Price | Coverage |
|---|---|---|
| Essential | 70 EUR/mo | 7 AM - 10 PM, 7 days/week |
| Pro | 150 EUR/mo | 24/7 -- 4h response |
| Critical | 250 EUR/mo | 24/7 -- 1h response |
We manage your servers regardless of hosting provider: OVHcloud, Scaleway, Hetzner, Contabo, IONOS, or any other provider.
At RDEM Systems, our commitments are straightforward: 4-hour response time included, 1-hour option available. No fine print, no surprises. See our managed services plans with guaranteed SLAs.