What are the three pillars?
Observability rests on three kinds of telemetry. Logs are timestamped records of discrete events. Metrics are numeric measurements tracked over time, such as request rate or CPU usage. Traces follow a single request as it moves across services. Together they let you understand the internal state of a running system from the outside.
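To make the three shapes concrete, here is a minimal sketch in plain Python. The class names, field names, and sample values are all hypothetical, chosen for illustration rather than taken from any particular library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogRecord:
    """A timestamped record of one event."""
    timestamp: float
    level: str
    message: str


@dataclass
class MetricPoint:
    """One numeric sample of a named measurement at a point in time."""
    name: str
    timestamp: float
    value: float


@dataclass
class Span:
    """One timed step of a request; spans sharing a trace_id form a trace."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    start: float
    end: float


def services_in_trace(spans: List[Span]) -> List[str]:
    """Return the services a request touched, ordered by span start time."""
    return [s.service for s in sorted(spans, key=lambda s: s.start)]


# Hypothetical sample data: one failing checkout seen three ways.
t0 = 1_700_000_000.0
log = LogRecord(t0 + 0.28, "ERROR", "payment declined")
metric = [MetricPoint("requests_per_second", t0 + i, v)
          for i, v in enumerate([120.0, 118.0, 45.0])]
trace = [
    Span("t1", "b", "a", "payments", "charge", t0 + 0.05, t0 + 0.28),
    Span("t1", "a", None, "gateway", "GET /checkout", t0, t0 + 0.30),
]
```

In this sketch the log says what happened, the metric series shows how much traffic changed, and `services_in_trace(trace)` shows where the request went.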
Why it matters
You cannot fix what you cannot see, and in production you only see what your systems emit. Each pillar answers a different question: logs the "what," metrics the "how much," and traces the "where." Knowing which to reach for turns a vague outage into a targeted fix.
What to learn
- Logs: structured, leveled, and searchable
- Metrics: counters, gauges, histograms
- Traces: spans linked across services
- When to use each pillar
- Cardinality and why high-cardinality labels hurt
- Correlation: tying logs, metrics, and traces together
- OpenTelemetry as a vendor-neutral standard
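The cardinality item can be illustrated with simple arithmetic. The sketch below assumes a metrics backend that stores one time series per distinct combination of label values, which is a common design but not universal. The label names and counts are hypothetical, and the result is an upper bound because only combinations that actually occur get stored.

```python
from math import prod
from typing import Dict


def series_count(distinct_values_per_label: Dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each label multiplies the number of possible series by the number
    of distinct values it can take.
    """
    return prod(distinct_values_per_label.values())


# Hypothetical labels with small, bounded value sets.
bounded = {"method": 5, "status": 6, "route": 20}

# The same metric with one unbounded label added.
with_user_id = {**bounded, "user_id": 100_000}

bounded_series = series_count(bounded)          # 5 * 6 * 20
unbounded_series = series_count(with_user_id)   # 5 * 6 * 20 * 100_000
```

Under these assumed numbers the bounded metric tops out at 600 series, while adding `user_id` raises the ceiling to 60,000,000. Per-user detail usually belongs in logs or trace attributes instead.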
Common pitfall
Logging everything at full verbosity in production. The noise buries the signal, the storage bill explodes, and high-cardinality fields can overwhelm the backend that stores and indexes them. Log at sensible levels, use metrics for high-volume counts rather than a log line per event, and reserve verbose logging for when you are actively debugging.
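The advice above can be sketched with the standard library alone. This example counts every request in an in-memory counter and logs only failures. The logger name, the route, the `ListHandler` helper, and the simulated traffic are all hypothetical; a real service would export the counter through a metrics library rather than keep it in a `Counter`.

```python
import logging
from collections import Counter


class ListHandler(logging.Handler):
    """Test helper that keeps emitted records in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("checkout")
logger.setLevel(logging.WARNING)   # production level: DEBUG lines are dropped
captured = ListHandler()
logger.addHandler(captured)

# Metric: one integer per (route, status) pair, however many requests arrive.
request_counts: Counter = Counter()


def handle_request(route: str, status: int) -> None:
    request_counts[(route, status)] += 1
    if status >= 500:
        # A failure is worth a log line a human can read later.
        logger.error("request failed", extra={"route": route, "status": status})
    else:
        # Verbose detail, only emitted when the level is lowered for debugging.
        logger.debug("request ok", extra={"route": route, "status": status})


# Simulated traffic: 1000 requests, every 250th one fails.
for i in range(1000):
    handle_request("/checkout", 500 if i % 250 == 0 else 200)
```

After the loop the counter holds two entries covering all 1000 requests, while the log contains only the 4 failures.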
Resources
Primary (free):
- OpenTelemetry — Documentation · docs
- Google SRE — Monitoring distributed systems · docs
- Grafana — Observability fundamentals · docs
Practice
Instrument a small service to emit all three: structured logs, a request-count metric, and a trace across one downstream call. For a sample failure, decide which pillar you would consult first and why. Done when you can answer "what," "how much," and "where" from your own telemetry.
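One possible starting point for the exercise, using only the standard library so the moving parts stay visible. Everything here is an illustrative assumption: the service names, the `log` and `span` helpers, and the in-memory `LOGS`, `REQUESTS`, and `SPANS` stores. In a real service a tracing and metrics library such as OpenTelemetry would typically take over these roles.

```python
import json
import time
import uuid
from collections import Counter
from contextlib import contextmanager

LOGS = []             # structured log lines (JSON strings)
REQUESTS = Counter()  # request-count metric, keyed by endpoint
SPANS = []            # finished spans


def log(level, message, trace_id, **fields):
    """Emit one structured log line that carries the trace id."""
    LOGS.append(json.dumps({"ts": time.time(), "level": level,
                            "msg": message, "trace_id": trace_id, **fields}))


@contextmanager
def span(name, trace_id, parent_id=None):
    """Time a unit of work and record it as a span when the block exits."""
    record = {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
              "parent_id": parent_id, "name": name, "start": time.time()}
    try:
        yield record
    finally:
        record["end"] = time.time()
        SPANS.append(record)


def call_inventory(trace_id, parent_id):
    """Stand-in for the downstream call; it joins the caller's trace."""
    with span("inventory.lookup", trace_id, parent_id):
        log("INFO", "inventory checked", trace_id, sku="abc")


def handle_checkout():
    trace_id = uuid.uuid4().hex
    REQUESTS["checkout"] += 1
    with span("checkout", trace_id) as root:
        log("INFO", "checkout started", trace_id)
        call_inventory(trace_id, root["span_id"])
    return trace_id


trace_id = handle_checkout()
```

The shared `trace_id` is what ties the pillars together: filtering `LOGS` and `SPANS` by it yields every log line and span for that one request, and `REQUESTS` gives the aggregate count.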
Outcomes
- Explain what logs, metrics, and traces each answer.
- Choose the right pillar for a given question.
- Avoid high-cardinality labels and over-verbose logging.
- Correlate the three pillars to debug an incident.