Senior Site Reliability Engineer
Start: ASAP
Duration: 6 months +
Location: 3 days per week in central London
Rate: negotiable DoE, inside IR35
We're seeking a skilled Site Reliability Engineer to join a high-performing technology team within a leading professional services environment. You'll design and maintain reliable, secure, and scalable cloud infrastructure while driving automation and DevOps best practices.
Key Responsibilities
- Build and manage CI/CD pipelines, release automation, and infrastructure as code (IaC).
- Develop resilient cloud environments and optimize monitoring, alerting, and performance.
- Maintain observability tools (Prometheus, Grafana, Datadog, Splunk, etc.).
- Manage incident response, troubleshooting, and root cause analysis.
- Collaborate with teams to improve reliability, scalability, and deployment efficiency.
- Document and standardize technical processes and runbooks.
Skills & Experience
- 4+ years' experience in SRE, DevOps, or related roles.
- Advanced Kubernetes (EKS, GKE, AKS, or RKE); proficient with kubectl and Helm.
- Strong containerization experience (Docker; microservices with Java/Spring Boot).
- CI/CD tools: Jenkins, GitHub Actions, Azure DevOps, ArgoCD.
- IaC: Terraform or Pulumi (module development preferred).
- Observability & monitoring: Prometheus/Grafana, Datadog, Opsgenie, or similar.
- Familiarity with Git workflows, Python/Go scripting, and security tools (Vault, Qualys, etc.).