CoreWeave is a cloud platform that empowers AI innovation. Founded in 2017, it provides infrastructure, tools, and expertise to improve performance.
What You’ll Do
- Develop, optimize, and maintain network observability platforms. Use Python and Golang to create collectors, exporters, and dashboards that provide deep visibility into network health and performance.
- Collaborate with Network Engineering and Platform teams to ingest and unify logs, metrics, and events from various platforms (Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, SR Linux, etc.) into a single observability pipeline.
- Design and implement scalable telemetry solutions using protocols like gNMI, SNMP, and streaming analytics. Ensure advanced alerting and anomaly detection with Prometheus, Grafana, Alertmanager.
- Work closely with network developers, site reliability engineers, and security teams to integrate observability solutions across the broader infrastructu...