Salary: £72,702-£80,780
Location: Halifax or Leeds
Workstyle: Hybrid (at least two days a week, or 40% of the time on average, on site)
Job Description Summary
The Senior Public Cloud Infrastructure Engineer is responsible for building and operating large-scale Kubernetes platforms in a regulated environment, with a focus on automation, security, observability and reliability.
Key Responsibilities
- Platform Engineering (GKE)
- Design, build and operate scalable, resilient GKE environments
- Engineer multi-tenant Kubernetes clusters with strong workload isolation and platform guardrails
- Support shared and dedicated cluster patterns, including tenant onboarding
- Improve platform performance under production conditions (e.g. scaling, storage, node pressure)
- Automation & DevOps
- Build automation-first infrastructure using Terraform, CI/CD and GitOps
- Simplify cluster lifecycle management (provisioning, upgrades, add-ons)
- Develop self-service platform capabilities to improve developer experience
- Reliability & SRE
- Apply SRE practices to platform operations
- Support incident response, monitoring, observability and continuous improvement
- Diagnose issues across performance, scaling, storage and automation
- Contribute to a 24x7 on-call rotation
- Security & Compliance
- Implement policy-as-code controls (e.g. OPA Gatekeeper, RBAC, Workload Identity)
- Support audit, compliance and risk mitigation activities
- Ensure platforms are secure, supportable and aligned to control frameworks
- Networking & Platform Services
- Work with service mesh and ingress/egress patterns (e.g. Istio, Anthos, Cloud Service Mesh)
- Support cloud networking (VPCs, DNS, NAT, VPN, routing, connectivity)
- Integrate shared platform services (cert-manager, observability, cost tooling)
Essential Skills & Experience
- Strong experience in Platform Engineering, DevOps or SRE
- Proven delivery of production Kubernetes platforms, ideally GKE
- Experience with multi-tenant platform environments (shared clusters, isolation, scaling)
- Deep understanding of Kubernetes internals (scheduling, storage, node lifecycle, upgrades)
- Strong knowledge of Google Cloud Platform (GCP), including GKE IAM / Workload Identity, networking (VPC, DNS, NAT, ingress/egress) and storage patterns
- Experience with Infrastructure as Code (Terraform) using modular design
- Strong experience with CI/CD pipelines and GitOps workflows
- Coding/scripting skills (Python, Go or Bash)
- Strong troubleshooting and problem-solving skills
- Ability to own and deliver complex engineering outcomes
Desirable Experience
- Advanced GKE operational expertise (node pools, upgrades, scaling, security boundaries)
- Experience operating platforms at scale (multi-cluster, multi-tenant)
- Service mesh experience (e.g. Istio, mTLS, traffic management)
- Experience with policy-as-code (OPA Gatekeeper, Config Sync)
- Experience in regulated or compliance-heavy environments
- Strong focus on SRE and platform reliability improvements
- Nice to have: Experience with Backstage or self-service platform tooling
- Familiarity with Anthos Config Management / Config Sync
- Exposure to tools such as CoreDNS, cert-manager, Dynatrace, Cloudability, Infoblox
- Understanding of platform scaling challenges (ephemeral storage, workload density, resilience)
- Experience working with cloud providers on platform architecture
Benefits
- Performance bonus
- Generous pension
- Flexible benefits package
- Private healthcare
- 30 days holiday + bank holidays
- Share schemes
Flexible Working Options
Hybrid working, job sharing, and other flexible working options are supported.