Site Reliability Engineer

Charles Simon Associates Ltd
Nottingham, Nottinghamshire
08/09/2025

Full time Information Technology Telecommunications Python Testing

Job Description

Site Reliability Engineer - (SRE, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) - Permanent - Remote

Charles Simon Associates are proud to partner with a global technology business, headquartered in Nottinghamshire, in their search for a Site Reliability Engineer.

This is an exciting opportunity to join a company that is passionate about modern reliability practices. They're looking for someone who shares their drive for excellence. If you're enthusiastic about automation, scalability, and building resilient systems, this role is for you.

Location: Remote, with occasional travel to Nottinghamshire HQ

Salary: Up to £75,000 per annum

Skills/Requirements for the Site Reliability Engineer:

Proven experience in Site Reliability Engineering or similar roles
Strong Terraform skills and hands-on experience with live environment deployments
Solid Kubernetes and AKS expertise
Familiarity with monitoring tools (Datadog preferred; Azure Application Insights, Log Analytics, Grafana also considered)
Scripting/automation skills (PowerShell, Python, Bash)
Experience supporting web-based applications

Desirable:

Exposure to microservices architectures
Knowledge of Agile methodologies (Kanban, Scrum)
Experience with tools such as Puppet or Chef

What You'll Do:

As a Site Reliability Engineer, you will:

Design and implement SLOs, SLIs, and SLAs to align reliability with business needs
Build and maintain incident response frameworks (runbooks, postmortems, blameless RCA)
Enhance observability with tools such as Prometheus, Grafana, Datadog, and OpenTelemetry
Manage infrastructure as code (Terraform, Pulumi, or CloudFormation) for consistent deployments
Optimize cloud performance and costs through autoscaling, rightsizing, and lifecycle management
Introduce chaos engineering practices to validate resilience and recovery strategies
Champion cloud security best practices (secrets management, IAM policies, vulnerability scanning)
Collaborate with DevOps and platform teams to create paved-road deployment patterns and internal developer portals
Lead capacity planning and load testing to ensure systems scale effectively
Contribute to architectural decisions around reliability, latency, and fault tolerance
Share knowledge, mentor colleagues, and promote SRE culture across teams

Please send an up-to-date CV to be considered for this position.
