Technical Product Manager - AI Cloud Infrastructure

  • Era4
  • 23/06/2026
Full time, Information Technology, Telecommunications

Job Description

Era4 develops, owns, and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data centre facilities, Era4 combines brownfield regeneration with cleaner, more efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public sector organisations.

Role Summary

We are seeking a Technical Product Manager - AI Cloud Infrastructure to join our fast-scaling team. In this role, you will embed with engineering to act as the "First Customer," owning the continuous validation, reliability strategy, and technical documentation for our bare metal, VM, Kubernetes, and ML infrastructure. By treating testability as a core feature and shadowing real-world workflows, you will ensure our compute platform handles the demands of advanced AI training and engineering workloads. This is an opportunity to join a mission-led AI business that is redefining infrastructure, intelligence, and impact for enterprise customers.

Key Responsibilities
  • Execute integration testing in staging environments, work closely with platform engineers to build repeatable test frameworks, and shadow internal and external AI infrastructure engineers to translate their real-world usage patterns into automated in-house test cases.
  • Establish strict quality gates, performance SLOs, and scheduling benchmarks that our compute and orchestration services must pass before production deployment.
  • Review, refine, and author technical guides, API documentation, and CLI guides, using them as the blueprint to test the platform exactly as an external engineer would.
  • Partner with software and platform engineers to design robust validation suites, anticipating complex edge cases and structural failure modes across bare metal provisioning and Kubernetes cluster lifecycles.
  • Analyse infrastructure services, CLIs, and APIs from a developer's perspective to identify friction points, usability gaps, and reliability risks.
Required Skills / Experience
  • Technical familiarity with bare metal infrastructure (e.g., PXE booting, IPMI/Redfish), virtualisation layers (e.g., KVM), and container orchestration (Kubernetes or similar).
  • Track record of designing comprehensive test strategies, validation frameworks, and acceptance criteria for highly technical cloud-native, API, or infrastructure-as-a-service (IaaS) products.
  • Working knowledge of modern CI/CD pipelines, automated testing, and automation tooling (e.g., GitLab CI, GitHub Actions, Terraform, Ansible) to help engineering shape automated quality gates.
  • Proven experience in a highly technical role embedded directly within a core infrastructure or platform engineering team.
Desired Skills / Advantages
  • Direct exposure to high-performance computing (HPC) setups, large-scale cluster scheduling (e.g., Slurm), or infrastructure optimised for heavy AI/ML training workloads.
  • Experience using cloud observability, telemetry, and monitoring tools (e.g., Prometheus, Grafana, Datadog) to track and improve system reliability metrics.
  • Experience writing or structuring technical documentation, API reference guides, and developer tutorials from scratch.
Why Join Era4

You'll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.

Diversity & Inclusion

Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

London, United Kingdom (Visit to office required)