BOSS Professional Services LTD
Macclesfield, Cheshire
Mid-Level Site Reliability Engineer (SRE)

Are you an experienced Site Reliability Engineer with a passion for building reliable, scalable systems that empower innovation? Our client is looking for a skilled mid-level SRE to join our growing technology team. In this role, you'll help ensure our infrastructure is stable, secure, and efficient, underpinning the applications that support our clients.

The Role
We are seeking a mid-level Site Reliability Engineer (SRE) to join our technology team and help ensure the smooth operation and reliability of our infrastructure. You'll play a vital role in maintaining uptime, managing deployments, and supporting other team members. This is a hands-on position suited to someone who thrives on problem-solving, process improvement, and cross-team communication.

What You'll Do:

Maintain & Improve Systems
- Ensure the reliability, performance, and availability of production systems.
- Perform regular updates, patching, and maintenance across environments.
- Manage infrastructure provisioning using Terraform, Ansible, and AWS.

Collaborate & Support
- Work closely with the junior SRE to develop their practical experience and technical confidence.
- Partner with developers, data scientists, and business users to resolve technical issues.

Automate & Optimise
- Contribute to configuration management and automation improvements.
- Identify and document standard operating procedures.
- Implement proactive monitoring measures to detect and prevent issues.

Monitor & Troubleshoot
- Troubleshoot system issues using logs, monitoring tools, and a methodical approach.
- Oversee and enhance system monitoring in Nagios, including the transition to Datadog.

Incident Management
- Support incident management processes, including post-mortems and follow-up actions.
- Communicate outcomes to customers clearly and effectively.

What We're Looking For:

Experience
- Proven experience in an SRE, DevOps, or Operations Engineering role.
- Strong working knowledge of AWS, Terraform, and Ansible.

Technical Skills
- Linux system administration and shell scripting.
- Networking fundamentals, containerisation, and infrastructure security best practices.
- Version control experience (e.g., Git).
- Strong troubleshooting and root cause analysis skills.

Desirable Skills
- Experience with Kubernetes and/or cloud platforms other than AWS.
- Familiarity with Nagios, Datadog, or similar monitoring tools.
- Exposure to CI/CD systems such as TeamCity, AWS CodeBuild, AWS CodePipeline, or ArgoCD.

Personal Attributes
- Proactive, curious, and process-driven.
- Enjoys collaboration and mentoring.
- Calm under pressure, especially during incidents.
- Flexible and adaptable to technical and business priorities.

Nice-to-Have
- Experience supporting scientific or data-intensive applications.
- Background in post-mortem facilitation and follow-up.
- Enthusiasm for observability, performance tuning, and cost optimisation.