DataOps Engineer

  • Dormont Manufacturing Co
  • 08/06/2026
Full time Information Technology Telecommunications SQL Python Data Scientist Testing

Job Description

CoreWeave is The Essential Cloud for AI. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.

We're proud to be a Living Wage accredited Employer.

What You'll Do: The Monolith AI Platform Engineering Team at CoreWeave is responsible for building and scaling the data and workflow backbone that powers the world's most advanced engineering simulation and AI workflows. Our ambition is to become the super intelligent AI test lab for the engineering industry, helping customers ship science, faster. From high throughput data ingestion and feature pipelines to model training and real time inference, our platform delivers the performant, reliable, and trustworthy data foundation relied on by the world's largest engineering companies.

The Senior DataOps Engineer II will own and drive all things data observability and operations across our client estate - building the practices, tooling, and culture that make Monolith's data flows debuggable, auditable, and safe to evolve. You'll sit at the intersection of platform engineering, data engineering, and reliability, implementing end to end lineage and DataOps practices while mentoring data producers and consumers on how to manage data as a first class product.

You'll partner closely with Monolith's Product, Engineering and forward deployed teams, as well as with CoreWeave's infrastructure and AI platform groups, to turn fragmented, real world engineering data into well governed, observable, and operationally robust pipelines powering our SaaS platform and client specific deployments.

About the Role: We're seeking a Senior DataOps Engineer II who can act as the hands on owner for Monolith's data observability and operational surface: from batch and streaming pipelines running on our platform, through to the lineage, quality, and runbooks that keep customer environments healthy.

You'll define and roll out DataOps practices (CI/CD, infra as code, data SLOs, incident response) across the Monolith estate, implement end to end data lineage and observability, and serve as the go to mentor for engineering teams and client facing colleagues on best practice data management.

In this role, you will:
  • Own Monolith's Data Observability & Operations Surface
    • Design and implement the end to end observability stack for data workloads (metrics, logs, traces, and data quality signals) across batch and streaming pipelines.
    • Define and maintain operational SLOs/SLAs for critical data flows powering training, inference, and analytics, and ensure they are measurable and actionable.
    • Build dashboards, alerts, and runbooks that allow engineers and on call responders to quickly detect, triage, and remediate data incidents.
    • Standardise "golden paths" for how teams instrument pipelines, expose health signals, and respond to data related failures.
  • Implement Data Lineage, Quality & Governance
    • Deploy and maintain end to end data lineage for key domains - from client sources through transformations to features, models, and downstream analytics - so teams can debug, audit, and reason about change.
    • Define and roll out data quality checks (schema, freshness, completeness, distribution, drift) and ensure failures integrate cleanly into alerting and incident workflows.
    • Partner with Security, Compliance, and customer facing teams to encode data governance requirements (e.g., retention, residency, access controls) into our pipelines and tooling.
    • Help shape metadata models and catalog conventions so that producers and consumers can reliably discover, understand, and use shared datasets.
  • Enable DataOps Practices Across Teams
    • Establish CI/CD patterns for data pipelines and related infrastructure, including testing strategies, promotion workflows, and change management guardrails.
    • Drive adoption of infra as code for data infrastructure (e.g., pipeline orchestration, storage, observability components), reducing manual drift across environments.
    • Define and continuously improve DataOps processes - incident response, post incident review, change review, on call rotations - with a focus on learning rather than blame.
    • Evaluate and integrate best of breed DataOps and observability tooling where it accelerates our teams, balancing build vs. buy pragmatically.
  • Partner Across Monolith, CoreWeave & Clients
    • Work with Monolith platform, data, agent, and reliability teams to expose observability and lineage as shared services and patterns other engineers can build on.
    • Collaborate with CoreWeave infrastructure and AI platform teams to leverage underlying storage, compute, networking, and observability in service of robust data flows.
    • Serve as a technical escalation point for forward deployed and customer facing engineers when data issues cross service boundaries or require deeper architectural insight.
    • Mentor data producers (product teams, integrations, forward deployed engineers) and data consumers (data scientists, analysts, client engineers) on resilient schemas, contracts, and operational practices.
Who You Are:
  • Experience & Level
    • Typically 5-6+ years of experience in DataOps, Data Engineering, DevOps/SRE for data platforms, or similar roles, including end to end ownership of production data pipelines and their operations.
    • Proven track record of operating at Senior IC scope: leading cross team initiatives, introducing new practices/tooling, and improving reliability at the platform level.
  • DataOps, Pipelines & Tooling
    • Strong hands on experience designing, deploying, and operating data pipelines in production (batch and/or streaming), including failure modes, retries, and backfills.
    • Practical experience with data orchestration and ETL/ELT tooling (e.g., Airflow, Dagster, dbt, Temporal, or similar) and comfort evaluating and integrating new tools where appropriate.
    • Solid SQL and/or Spark skills and experience with at least one major analytical database or warehouse; familiarity with time series / telemetry data is a plus.
  • Observability, Lineage & Data Quality
    • Extensive experience implementing data observability - metrics, logging, tracing, dashboards, and alerting - for data centric workloads.
    • Hands on work with data quality frameworks and/or observability platforms to monitor freshness, completeness, schema changes, and anomalies.
    • Experience deploying and using data lineage or metadata/catalog solutions, and applying them to debugging, compliance, and change impact analysis.
  • Platform, Infrastructure & Automation
    • Comfortable working in containerised, cloud native environments (Kubernetes plus at least one major cloud provider); experience with GPU or compute intensive workloads is a bonus.
    • Strong automation mindset: infra as code, CI/CD, and configuration management for data infrastructure and observability components.
    • Proficient in Python for building tooling, pipeline glue, and platform integrations; additional languages are a plus.
  • Collaboration, Mentorship & Communication
    • Clear communicator who can explain complex data flows and failure modes to both deeply technical and non specialist audiences.
    • Experience mentoring engineers and data practitioners on better data management, observability, and operational hygiene - through documentation, examples, reviews, and office hours.
    • Comfortable working in a fast moving, high ambiguity environment where we balance rapid iteration with the safety and reliability demanded by enterprise engineering clients.
Preferred:
  • Experience in ML/AI platforms or MLOps environments where data pipelines power experimentation, training, and inference at scale.
  • Background with test, simulation, or time series data (e.g., physical test benches, battery labs, automotive/aerospace R&D).
  • Familiarity with feature stores, experiment tracking, or model registries and their interaction with upstream data pipelines.
  • Prior work in multi tenant SaaS platforms, especially those with strong compliance, observability, and uptime requirements.
  • Experience supporting or partnering closely with forward deployed / professional services teams in complex customer environments.
Wondering if you're a good fit? We believe in investing in our people, and value candidates who bring diverse experiences - even if you don't tick every single box. Here are a few qualities we've found compatible with our team. If some of this sounds like you, we'd love to talk:
  • Data obsessed operator - You care deeply about making data systems observable, predictable, and easy to reason about, not just "working most of the time."
  • Systems thinker - You enjoy mapping complex data flows across services, understanding failure modes, and designing for graceful degradation and rapid recovery.
  • Pragmatic - You know when to build the ideal abstraction and when to ship the smallest change that meaningfully reduces risk or toil.
  • Collaborative mentor