Role Overview
This role is eligible for our hybrid work model: two days in-office. As a Site Reliability Engineer – Observability, you will play a key part in maturing our observability capabilities by standardizing instrumentation, improving telemetry quality, and enabling faster root cause analysis, which directly reduces MTTR and MTTD.
Responsibilities
- Support and evolve end-to-end observability solutions for collecting, shipping, storing, and querying OpenTelemetry signals (metrics, logs, and traces) across infrastructure, containers, and Kubernetes environments.
- Administer and operate core observability platforms (Splunk, New Relic, ClickHouse, Grafana, Lightrun), including service onboarding, access management, configuration, upgrades, and ongoing platform health.
- Contribute to building and advancing a modern OpenTelemetry-based observability ecosystem that supports multiple telemetry types at scale.
- Improve and standar...