Observability · Intermediate · 6h

Prometheus & Grafana.

Scraping metrics and building dashboards that matter.

What are Prometheus and Grafana?

Prometheus collects metrics by scraping endpoints your services expose, stores them as time series, and lets you query them with PromQL. Grafana turns those queries into dashboards and alerts. Together they are the de facto open-source monitoring stack.
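To make the pull model concrete, here is a minimal sketch using only the Python standard library. It serves a counter at /metrics in the Prometheus text exposition format and then fetches it over HTTP, which is roughly what a Prometheus scrape does. The metric name http_requests_total, the REQUESTS_TOTAL dictionary, and the helper names are assumptions made for illustration; a real service would normally use an official client library instead of rendering the text by hand.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters keyed by HTTP status (illustration only).
REQUESTS_TOTAL = {"200": 0, "500": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'http_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def scrape(url):
    """Fetch a metrics endpoint once, as a scraper would on each interval."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo():
    """Start the endpoint on a free local port, scrape it once, and stop."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return scrape(f"http://127.0.0.1:{port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the scraped text. The point of the design is that the service only exposes its current values; the collector decides when and how often to read them.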

Why it matters

Metrics are only useful if you can collect, query, and visualize them, and this pair is what most teams use to do it. Building a dashboard that surfaces real problems — and an alert that fires on them — is a daily DevOps task and a common interview topic.

What to learn

  • The pull model: Prometheus scrapes targets
  • Exposing metrics in the Prometheus format
  • Metric types: counter, gauge, histogram, summary
  • PromQL basics for querying
  • Recording rules and alerting rules
  • Building Grafana dashboards
  • Alertmanager and routing notifications
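Counters and PromQL's rate() are easier to reason about with a small model. The sketch below approximates what a query such as rate(http_requests_total[1m]) computes over the raw samples in a window: the total increase, corrected for counter resets, divided by the elapsed time. It is a simplification under stated assumptions: the function names are hypothetical, and the real rate() also extrapolates toward the window boundaries, which this version does not.

```python
def counter_increase(samples):
    """Total increase of a counter over (timestamp_seconds, value) samples.

    Samples must be sorted by time. A drop in value is treated as a counter
    reset (for example a process restart), so counting resumes from zero.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average increase across the window, or None if undefined."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return counter_increase(samples) / elapsed


# Five scrapes at a 15-second interval, 30 new requests between each scrape.
window = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(simple_rate(window))
```

This is also why a counter is the right type for request totals and a gauge is not: a counter only ever goes up or resets, so a drop can be interpreted unambiguously.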

Common pitfall

The common mistake is building dashboards full of every metric you can graph, so the one that matters is lost in a wall of charts. A good dashboard answers "is the service healthy?" at a glance, using a few key signals such as error rate, latency, and saturation. Vanity metrics that nobody acts on are clutter.

Practice

Expose a metric from a service, scrape it with Prometheus, and write a PromQL query for its request rate. Build a Grafana dashboard with error rate and latency, and add an alert that fires when the error rate crosses a threshold. Done when the alert fires on a deliberately broken request.
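Before writing the alert, it can help to model how a threshold alert with a hold duration behaves. The sketch below mimics the inactive, pending, and firing states of a Prometheus alerting rule that has a "for" clause: the error ratio must stay above the threshold for a minimum time before the alert fires. The class name, field names, and the 5% threshold are assumptions chosen for illustration, and this is not how Prometheus itself is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorRateAlert:
    """Toy model of an alerting rule with a threshold and a hold duration."""

    threshold: float      # for example 0.05 means 5% of requests failing
    for_seconds: float    # how long the condition must hold before firing
    pending_since: Optional[float] = None

    def evaluate(self, now, errors_per_sec, requests_per_sec):
        """Return "inactive", "pending", or "firing" for one evaluation."""
        if requests_per_sec > 0:
            ratio = errors_per_sec / requests_per_sec
        else:
            ratio = 0.0  # no traffic: treat as healthy in this sketch
        if ratio <= self.threshold:
            self.pending_since = None  # condition cleared, start over
            return "inactive"
        if self.pending_since is None:
            self.pending_since = now  # first evaluation above threshold
        if now - self.pending_since >= self.for_seconds:
            return "firing"
        return "pending"


alert = ErrorRateAlert(threshold=0.05, for_seconds=60)
for now, errors, requests in [(0, 0.1, 10), (15, 2, 10), (75, 2, 10), (90, 0, 10)]:
    print(now, alert.evaluate(now, errors, requests))
```

The hold duration is the design choice worth noticing: a single failed request briefly raises the ratio, but the alert fires only if the failures persist, so the broken requests in this exercise need to continue for at least that long.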

Outcomes

  • Explain Prometheus's pull-based scraping model.
  • Expose and query metrics with PromQL.
  • Build a focused dashboard of key signals.
  • Define an alert that fires on a real problem.