What Is Uptime Monitoring?
Uptime monitoring checks whether your website or service is reachable and working as expected — then alerts you quickly when it isn’t. This guide explains how monitoring works, what to measure, and how to choose intervals and alerts that reduce downtime.
Short definition
Uptime monitoring measures availability — the fraction of requests that succeed over time — and alerts you when availability drops.
How uptime monitoring works
1) A scheduled check runs
A monitoring system sends a request to your site (often HTTP/HTTPS) at a fixed interval and records the result.
2) It records the outcome
The system logs status codes, response time, and errors to build an availability timeline.
3) Alerts fire on failures
When failures meet your alert criteria (e.g., multiple consecutive checks), notifications go out via email or other channels.
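The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular monitoring product: the names CheckResult, http_check, and ConsecutiveFailureAlert are hypothetical, the default threshold of three consecutive failures is an assumed value, and treating every 4xx/5xx response as a failed check is an assumption you may want to relax.

```python
from dataclasses import dataclass
from typing import Optional
import time
import urllib.error
import urllib.request


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]        # HTTP status code, or None if no response arrived
    elapsed_s: float             # time spent on the request, in seconds
    error: Optional[str] = None  # short description of what went wrong, if anything


def http_check(url: str, timeout_s: float = 10.0) -> CheckResult:
    """Steps 1 and 2: send one request and record what happened."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return CheckResult(True, resp.status, time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status (assumed to be a failure).
        return CheckResult(False, exc.code, time.monotonic() - start, str(exc))
    except OSError as exc:
        # No usable response: DNS failure, refused connection, timeout, and so on.
        # urllib.error.URLError is a subclass of OSError, so it is caught here too.
        return CheckResult(False, None, time.monotonic() - start, str(exc))


class ConsecutiveFailureAlert:
    """Step 3: fire once, when `threshold` checks in a row have failed."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def observe(self, result: CheckResult) -> bool:
        if result.ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        # True only on the check that reaches the threshold, so a long
        # outage produces one notification rather than one per check.
        return self.failures == self.threshold
```

A scheduler would call http_check once per interval, pass each result to observe, and send a notification whenever observe returns True.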
What to measure
Availability
Availability is the fraction of well‑formed requests that succeed. It’s commonly expressed as “nines” (e.g., 99.9%).
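To make the "nines" concrete, here is a small sketch that computes availability from request counts and converts a target into a downtime allowance. The function names are illustrative, and the conversion to minutes assumes traffic is spread evenly over the period, which real traffic rarely is.

```python
def availability(successes: int, total: int) -> float:
    """Fraction of requests that succeeded, between 0.0 and 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def downtime_budget_minutes(target_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime a target such as 99.9 allows over the period.

    Assumes availability can be read as a fraction of time, which only
    holds when requests arrive at a roughly constant rate.
    """
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    return (1 - target_pct / 100) * period_days * 24 * 60
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, and 99.99% leaves a little over 4 minutes.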
Latency
Slow responses can be an early warning sign. Tracking response time helps you see issues before a full outage.
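One way to act on response times is to watch a high percentile rather than the average, since a few very slow requests can hide behind a healthy mean. The sketch below uses the nearest-rank percentile method; the 1000 ms threshold and the choice of the 95th percentile are assumptions for illustration, not recommendations from this guide.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def latency_warning(samples_ms: Sequence[float],
                    threshold_ms: float = 1000.0,
                    pct: float = 95.0) -> bool:
    """True when the chosen percentile of recent response times exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms
```

With ten recent samples where nine are near 130 ms and one is 2400 ms, the median stays low while the 95th percentile crosses the threshold, which is the early warning described above.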
Business‑critical endpoints
Monitoring the homepage alone isn’t enough. Check the actions that matter most (login, checkout, booking).
Choosing check intervals
Faster checks = faster detection
Monitoring intervals directly affect time to detection. Shorter intervals reduce how long an outage can go unnoticed.
Balance cost and noise
Very short intervals can increase costs and false positives. Use multi‑check confirmation to avoid noisy alerts.
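The trade-off can be put into numbers with a rough upper bound on detection time. This sketch assumes checks start on a fixed schedule, the request timeout is shorter than the interval, and alerts require a set number of consecutive failures; the function name and parameters are illustrative.

```python
def worst_case_detection_s(interval_s: float,
                           confirmations: int = 1,
                           timeout_s: float = 0.0) -> float:
    """Upper bound, in seconds, from the start of an outage to the alert.

    The outage may begin just after a successful check, so up to one full
    interval passes before the first failing check starts. Each additional
    confirmation adds another interval, and the last check can take up to
    `timeout_s` before it is recorded as failed.
    """
    if interval_s <= 0 or confirmations < 1 or timeout_s < 0:
        raise ValueError("interval_s > 0, confirmations >= 1, timeout_s >= 0 required")
    return interval_s * confirmations + timeout_s
```

For example, 5-minute checks with three confirmations and a 10-second timeout can take up to 910 seconds (about 15 minutes) to alert, while 1-minute checks with the same settings take at most 190 seconds.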
What should trigger an alert?
5xx errors
Server‑side failures indicate your app can’t fulfill requests.
Timeouts
If requests exceed your timeout threshold, users experience it as downtime even if the server eventually responds.
DNS failures
If DNS can’t resolve your domain, the site is unreachable.
503 Service Unavailable
503 indicates the server is temporarily unable to handle requests, often due to overload or maintenance.
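The triggers listed above can be told apart in code so that alerts say what went wrong. The following is a sketch for checks made with Python's urllib; the function name and the returned labels are hypothetical, and the extra "connection_error" label for other network failures is an addition not covered in the list above.

```python
import socket
import urllib.error
from typing import Optional


def classify_failure(status: Optional[int] = None,
                     error: Optional[BaseException] = None) -> Optional[str]:
    """Return a label for an alert-worthy result, or None if it is not one."""
    if isinstance(error, urllib.error.HTTPError):
        # The server responded; classify by its status code instead.
        status, error = error.code, None
    elif isinstance(error, urllib.error.URLError) and isinstance(error.reason, BaseException):
        # urllib wraps low-level errors in URLError; unwrap to see the cause.
        error = error.reason

    if isinstance(error, socket.gaierror):
        return "dns_failure"          # the hostname could not be resolved
    if isinstance(error, (TimeoutError, socket.timeout)):
        return "timeout"              # no response within the timeout
    if error is not None:
        return "connection_error"     # assumed catch-all for other network errors
    if status == 503:
        return "service_unavailable"  # temporary overload or maintenance
    if status is not None and 500 <= status <= 599:
        return "server_error"         # any other 5xx response
    return None                       # 2xx-4xx are not in the trigger list above
```

Keeping 503 separate from other 5xx codes is a design choice: a 503 during a planned maintenance window may deserve a quieter notification than an unexpected 500.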
Best practices
Monitor all critical components
The AWS Well‑Architected reliability guidance recommends monitoring all components and business KPIs so that failures are detected quickly.
Use multiple locations
Multi‑location checks reduce false positives caused by local network issues.
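A simple way to combine results from several locations is a quorum rule: treat the site as down only when more than a set share of locations fail at once. This is a sketch under that assumption; the function name, the location names in the usage note, and the default majority quorum are all illustrative.

```python
from typing import Mapping


def confirmed_down(results: Mapping[str, bool], quorum: float = 0.5) -> bool:
    """Decide whether an outage is confirmed across locations.

    `results` maps a location name to True if the check succeeded there.
    Returns True when the share of failing locations is strictly greater
    than `quorum` (0.5 means a strict majority).
    """
    if not results:
        raise ValueError("results must not be empty")
    failures = sum(1 for ok in results.values() if not ok)
    return failures / len(results) > quorum
```

With three locations, a failure seen only from "us-east" does not confirm an outage, while failures from both "us-east" and "eu-west" do.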
Track user‑level success
Monitoring success of real user flows aligns uptime with customer experience, not just server health.
FAQ
How often should I check my site?
Many small businesses start with 5‑minute checks and move to 1‑minute checks for mission‑critical pages.
Is a single failed check enough to alert?
Usually no. Multi‑check confirmation reduces false positives from brief network issues.
Do I need to monitor more than the homepage?
Yes. Monitor the user actions that matter most to your business.
What’s the difference between uptime and performance?
Uptime measures whether requests succeed. Performance measures how fast they complete.
Sources
Google SRE Book: availability defined as the fraction of successful well‑formed requests, and “nines” as a common availability shorthand.
AWS Well‑Architected Reliability: monitor all components and business KPIs to detect failures quickly.
RFC 7231 (now obsoleted by RFC 9110, which keeps the same definition): 503 Service Unavailable indicates temporary overload or maintenance.