Site Reliability Engineer

  • Charles Simon Associates Ltd
  • Nottingham, Nottinghamshire
  • 08/09/2025
Full time, Information Technology, Telecommunications, Python, Testing

Job Description

Site Reliability Engineer - (SRE, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) - Permanent - Remote

Charles Simon Associates are proud to partner with a global technology business, headquartered in Nottinghamshire, in their search for a Site Reliability Engineer.

This is an exciting opportunity to join a company that is passionate about modern reliability practices. They're looking for someone who shares their drive for excellence. If you're enthusiastic about automation, scalability, and building resilient systems, this role is for you.

Location: Remote, with occasional travel to Nottinghamshire HQ

Salary: Up to £75,000 per annum

Skills/Requirements for the Site Reliability Engineer:

  • Proven experience in Site Reliability Engineering or similar roles
  • Strong Terraform skills and hands-on experience with live environment deployments
  • Solid Kubernetes and AKS expertise
  • Familiarity with monitoring tools (Datadog preferred; Azure Application Insights, Log Analytics, Grafana also considered)
  • Scripting/automation skills (PowerShell, Python, Bash)
  • Experience supporting web-based applications

Desirable:

  • Exposure to microservices architectures
  • Knowledge of Agile methodologies (Kanban, Scrum)
  • Experience with tools such as Puppet or Chef

What You'll Do:

As a Site Reliability Engineer, you will:

  • Design and implement SLOs, SLIs, and SLAs to align reliability with business needs
  • Build and maintain incident response frameworks (runbooks, postmortems, blameless RCA)
  • Enhance observability with tools such as Prometheus, Grafana, Datadog, and OpenTelemetry
  • Manage infrastructure as code (Terraform, Pulumi, or CloudFormation) for consistent deployments
  • Optimize cloud performance and costs through autoscaling, rightsizing, and lifecycle management
  • Introduce chaos engineering practices to validate resilience and recovery strategies
  • Champion cloud security best practices (secrets management, IAM policies, vulnerability scanning)
  • Collaborate with DevOps and platform teams to create paved-road deployment patterns and internal developer portals
  • Lead capacity planning and load testing to ensure systems scale effectively
  • Contribute to architectural decisions around reliability, latency, and fault tolerance
  • Share knowledge, mentor colleagues, and promote SRE culture across teams

Please send an up-to-date CV to be considered for this position.
