Site Reliability Engineer, Azure, GCP, Automation
A key customer of ours is seeking several SRE candidates to help with a massive build-out and implementation across the GCP and Azure platforms.
They are looking for several Site Reliability Engineers (SREs) to help improve the reliability, performance and observability of our Azure and GCP environments. You'll work within a multidisciplinary engineering squad, supporting the delivery, operation and continuous improvement of our cloud-hosted services.
- Support the reliability and performance of the cloud platforms your squad owns.
- Use observability tools, metrics, logs and traces to detect and prevent issues.
- Contribute to incident response, post-incident reviews and problem management activities.
- Build automation that removes toil and improves operational efficiency.
- Work collaboratively with engineers, Product Owners and platform teams to balance delivery with operational health.
- Improve SLOs, error budgets and other product health measures.
- Take part in engineering ceremonies, knowledge sharing and squad-wide improvement initiatives.
Technical Skills
- Experience with Azure and/or GCP public cloud platforms.
- Understanding of observability (metrics, logs, traces) and its impact on system health.
- Experience with GitHub pipelines and Terraform modules.
- Exposure to SRE principles such as SLOs, SLIs and error budgets.
- Ability to contribute to automation using Python, PowerShell, Terraform, CI/CD, or similar tools.
- Solid knowledge of modern engineering practices including DevOps, Infrastructure as Code and automation.
McGregor Boyall is an equal opportunity employer and does not discriminate on any grounds.