Cambridge University Press & Assessment
Cambridge, Cambridgeshire
Job Title: Principal Developer Team Lead
Salary: £51,400 - £68,800
Location: Cambridge/Hybrid
Contract: Permanent

This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities.

We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.

About the role

We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS. You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack.

Your initial focus will be on two strategic priorities:
- Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enables Site Reliability Engineering practices across development and operations teams
- Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms

What You'll Do

Technical Leadership
- Lead migration of legacy applications to cloud-native AWS architectures
- Build DevOps automation to support SRE practices
- Establish AI/ML development standards and frameworks
- Set observability, monitoring, and incident response standards
- Promote best practices in web, event-driven, and cloud-native technologies
- Provide technical expertise and oversee code reviews

People Leadership
- Manage and mentor a team of 4-8 developers, providing coaching and development plans, and identifying training needs in AI/ML and SRE
- Support recruitment and foster a culture of continual improvement and wellbeing

Delivery & Collaboration
- Deliver software in agile squads
- Collaborate with architects, SREs, product owners, and infrastructure teams
- Liaise with stakeholders to identify education sector needs
- Plan and estimate migrations and feature delivery
- Coordinate with service management, security, and AWS experts

About you

Essential experience
- Degree or equivalent
- Proven technical team leadership
- Skilled in two or more modern programming languages
- Experience with AWS cloud and infrastructure
- DevOps skills: automation, CI/CD, infrastructure-as-code
- Understanding of SRE and observability
- Experience in web apps and modern frameworks
- Strong communicator with technical and non-technical audiences

Technical Expertise
- CI/CD pipelines, automation frameworks, and developer tooling
- Observability tools, monitoring, logging, and alerting systems
- Responsible AI practices and governance
- Event-driven architecture and microservices patterns
- Software design patterns and scalability best practices
- Security principles in cloud environments

Leadership Qualities
- Ability to set technical standards and provide thought leadership
- Experience balancing people management with hands-on contribution
- Strong mentoring and coaching skills
- Collaborative approach that builds trust across teams
- Passion for continuous learning in AI/ML and DevOps
- Promotes inclusion and continuous improvement

You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies.

If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.

Rewards and benefits

We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
- 28 days annual leave plus bank holidays
- Private medical and Permanent Health Insurance
- Discretionary annual bonus
- Group personal pension scheme
- Life assurance up to 4 x annual salary
- Green travel schemes

We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability.

Ready to pursue your potential? Apply now.

We review applications on an ongoing basis, with a closing date for all applications of 16th April 2026. As part of the application process you can expect:
- Two questions, each asking you to select one answer from multiple options.
- A 15-minute screening call with the Hiring Manager.
- First stage interview via MS Teams or in person. You will be provided with a brief to complete a role-related task, which will need to be returned by email in advance of your interview.

Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.

Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.

Why join us

Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe - for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.

Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity), culture, or social class/background. We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.
02/04/2026
Full time
We are partnering with a leading organisation in the data and analytics space to recruit an experienced Senior Site Reliability Engineer.

This is an opportunity to join a highly collaborative, technically strong SRE function working on large-scale, cloud-native platforms that support high-volume, high-speed data services. The team is expanding due to increased workload, and this role will become the eighth member of an established, supportive engineering group. You'll play a key part in driving cloud automation, improving system reliability, and supporting critical production environments.

Key Responsibilities
- Build, maintain, and improve AWS cloud infrastructure
- Develop automation using Terraform, Ansible, and Python
- Support incident response and troubleshoot performance issues
- Deliver routine maintenance, including patching and upgrades
- Enhance CI/CD pipelines (GitLab CI, GitHub CI)
- Contribute to Agile ceremonies and take ownership of user stories
- Implement new technologies and solutions to improve system reliability

What You Will Bring
- Strong commercial experience with AWS (essential)
- Solid understanding of Linux systems (RHEL, CentOS or similar)
- Scripting skills, ideally Python
- Hands-on experience with Terraform and/or Ansible
- Proficiency with Docker
- Exposure to CI/CD tooling and Agile ways of working
- Background in software engineering, systems engineering, or previous SRE roles
- Minimum 4 years' experience in a relevant technical discipline

Please note, this role is not suitable for candidates with Windows-only experience or engineers without hands-on AWS or Linux exposure.

Remote working is supported, with an on-site presence in Nottingham preferred, ideally once per week.
01/04/2026
Contractor
Junior Site Reliability Engineer
Central London (3 days a week in the office)
Up to £55,000 per annum + Bonus + Generous Benefits Package

We are working with an exciting technology company that is looking to bring in a Junior Site Reliability Engineer to help scale its cloud infrastructure and DevOps capability. They've built a high-performing engineering team and are now investing further into the platform side of things as demand grows. Think modern, cloud-native architecture, and a real emphasis on automation, scalability, and developer enablement. You'll join an experienced team you can learn and grow from.

Tech stack
- AWS (Core services - EC2, RDS, S3, IAM, etc.)
- Monitoring and Observability (Grafana, Prometheus, Datadog)
- Kubernetes (building and managing production clusters)
- Terraform (IaC provisioning)
- Python, Bash or Go (scripting, automation)
- GitHub Actions (CI/CD pipelines)

What They're Looking For
- Experience in AWS cloud infrastructure
- Previous experience working with Monitoring and Observability Tools - Datadog, Grafana or Prometheus
- Knowledge of how Kubernetes works
- Understanding of IaC - Terraform
- Experience with CI/CD (GitHub Actions or similar)
- A good communicator who enjoys working collaboratively across product and engineering

The client is willing to consider candidates without all the required skills and provide an environment to learn and grow on the job. Training and development is at the forefront of the business, and you will get plenty of opportunities to progress your career in whatever path you want.

Click APPLY NOW to be considered for this position!

AWS, SRE, Cloud, Kubernetes, EKS, Terraform, CI/CD, Automation etc.
01/04/2026
Full time
Senior Site Reliability Engineer - Active SC Required!
Up to £75,000 + benefits
Wokingham - Hybrid (UK-based)

We're seeking a Senior Site Reliability Engineer to play a key role in designing and operating highly reliable, scalable systems in a fast-paced environment. You'll act as a technical leader within the team, driving best practices across reliability engineering, automation, and system performance.

What you'll be doing:
- Designing and improving system reliability, scalability, and observability
- Leading incident management and driving root cause analysis
- Building and maintaining robust CI/CD pipelines and automation frameworks
- Partnering with development teams to embed SRE principles into the SDLC
- Mentoring junior engineers and promoting engineering best practices

What we're looking for:
- Strong experience in SRE, DevOps, or platform engineering roles
- Deep understanding of cloud infrastructure (AWS, Azure, or GCP)
- Hands-on experience with Kubernetes and containerised environments
- Strong scripting/programming skills (Python, Go, or similar)
- Experience with monitoring, alerting, and observability tooling
- Proven ability to troubleshoot complex distributed systems

Why apply?
- Opportunity to influence technical direction and best practices
- Work on large-scale, mission-critical systems
- Leadership exposure with clear progression to principal level
01/04/2026
Full time
Site Reliability Engineer (SRE) - Active SC required!
Up to £55,000 + benefits
Hybrid (UK-based)

We're looking for a Site Reliability Engineer to join a growing technology team delivering highly scalable, resilient systems across a range of enterprise environments. This is a fantastic opportunity for someone with a solid foundation in DevOps/SRE practices who wants to deepen their expertise in automation, reliability, and cloud-native technologies.

What you'll be doing:
- Supporting the reliability, availability, and performance of production systems
- Monitoring applications and infrastructure, responding to incidents and driving resolution
- Automating manual processes to improve efficiency and reduce risk
- Collaborating with engineering teams to improve system design and resilience
- Contributing to CI/CD pipelines and infrastructure-as-code practices

What we're looking for:
- Experience in an SRE, DevOps, or similar engineering role
- Knowledge of cloud platforms (AWS, Azure, or GCP)
- Familiarity with monitoring/logging tools (e.g. Prometheus, Grafana, ELK)
- Scripting or programming skills (e.g. Python, Bash, Go)
- Understanding of containers and orchestration (Docker/Kubernetes is a plus)

Why apply?
- Work with modern, cloud-native technologies
- Supportive environment with strong learning and development opportunities
- Clear progression path into senior SRE roles
01/04/2026
Full time
IT Manager (CDN, AWS & SRE Focus)
Manchester (Hybrid - 2 days in office)
Up to £80,000 + Benefits
Permanent, Full-Time

The Opportunity

Morson Edge are looking for an experienced IT Manager to lead and evolve a high-performing infrastructure and reliability function. This is a key leadership role where you'll shape strategy, improve system resilience, and drive best practices across CDN, AWS cloud environments, and Site Reliability Engineering (SRE). You'll work at the intersection of infrastructure, performance, and reliability, ensuring systems are scalable, secure, and always available.

What You'll Be Doing
- Lead, mentor, and develop a team of engineers across cloud infrastructure and SRE
- Own and optimise AWS environments, ensuring scalability, cost-efficiency, and security
- Manage and enhance CDN performance and delivery strategies
- Drive adoption of SRE principles including SLIs, SLOs, and error budgets
- Improve system observability through monitoring, logging, and alerting
- Collaborate with engineering and product teams to support high-availability services
- Oversee incident management, root cause analysis, and continuous improvement
- Define and implement infrastructure best practices and automation

What We're Looking For
- Proven experience in an IT Manager/Infrastructure Manager/SRE Lead role
- Strong expertise in AWS (EC2, Lambda, CloudFront, VPC, etc.)
- Solid understanding of Content Delivery Networks (CDN) and performance optimisation
- Experience implementing or working within SRE frameworks
- Knowledge of Infrastructure as Code (e.g. Terraform, CloudFormation)
- Strong background in monitoring tools (e.g. Prometheus, Grafana, Datadog)
- Excellent leadership and stakeholder management skills

Nice to Have
- Experience with containerisation (Docker, Kubernetes)
- Exposure to DevOps culture and CI/CD pipelines
- Security and compliance awareness in cloud environments

What's in It for You
- Salary up to £80,000
- Hybrid working (2 days per week in Manchester office)
- Pension scheme
- Training and development opportunities
- A chance to shape and lead a modern, cloud-first infrastructure function
01/04/2026
Full time
The Site Reliability Engineer plays a critical role in ensuring that our AI-driven, cloud-native platform is reliable, observable, secure, and able to scale with the organisation's growth. As we adopt intelligent agents, autonomous workflows, and increasingly complex distributed systems, the SRE ensures that resilience, performance, and operational excellence are built into everything we deliver. By partnering closely with Engineers, Architects, and the Engineering Manager, the SRE defines the patterns, tooling, and automation that enable fast, safe, and repeatable deployments. This role safeguards our production environment, drives continuous improvement across CI/CD and observability, and establishes the reliability practices that empower autonomous squads to move quickly without compromising stability. The SRE is essential to maintaining customer trust, supporting AI-first innovation, and ensuring our platform remains robust, secure, and highly available at scale.
In this position you will ensure the reliability, scalability, and security of our engineering systems. Working closely with the Engineering Manager and Head of Engineering, the SRE will identify priorities to remove friction from engineering teams, streamline processes, and enhance operational excellence. This role combines software engineering principles with systems administration to deliver robust, automated, cost-effective, and secure-by-design solutions.
Key Responsibilities
Reliability, Performance & Security: Design and implement strategies to improve system reliability, availability, and security. Ensure all solutions follow secure-by-design principles, incorporating cybersecurity best practices from inception through deployment. Conduct regular security reviews and collaborate with security teams to address vulnerabilities.
CI/CD Management: Own and optimise Continuous Integration and Continuous Deployment pipelines. Embed security checks (e.g., static analysis, dependency scanning) into CI/CD workflows. Ensure secure, efficient, and automated deployment processes across environments.
Monitoring & Observability: Implement and maintain monitoring solutions for infrastructure and applications. Develop dashboards and alerting systems to ensure proactive incident and security event management. Evaluate and integrate new observability tools as needed.
Automation & Tooling: Automate repetitive tasks to improve efficiency and reduce human error. Build and maintain internal tools that support engineering productivity and security compliance. Champion Infrastructure as Code (IaC) practices using tools like Terraform or ARM templates.
Cloud Infrastructure Management: Manage and optimise services across AWS and Azure environments. Ensure scalability, resilience, and security of service-based architectures. Implement cost management strategies to optimise cloud spend without compromising performance or security.
Incident Response & Root Cause Analysis: Lead incident response efforts, including security incidents, and conduct post-mortem reviews. Drive continuous improvement through lessons learned and preventive measures.
Skills & Experience
Proven experience in AWS and Azure cloud environments.
Strong background in CI/CD tools (e.g., Azure DevOps, Pipelines, GitHub Actions, Jenkins).
Expertise in monitoring and observability platforms (e.g., Prometheus, Grafana, Datadog).
Proficiency in scripting and automation (Python, Bash, PowerShell).
Familiarity with containerisation and orchestration (Docker, Kubernetes).
Solid understanding of networking, security, and cost optimisation in cloud environments.
Knowledge of cybersecurity principles, secure coding practices, and compliance frameworks.
A problem-solver with a proactive mindset.
Comfortable working in fast-paced, evolving environments.
Strong communicator who can bridge gaps between operations, development, and security teams.
Passionate about automation, scalability, cost efficiency, and security.
01/04/2026
Full time
Site Reliability Engineer
Central London (3 days a week in the office)
Up to £70,000 per annum + Bonus + Generous Benefits Package
We are working with an exciting technology company that is looking to bring in a Site Reliability Engineer to help scale its cloud infrastructure and DevOps capability. They've built a high-performing engineering team and are now investing further in the platform side of things as demand grows. Think modern, cloud-native architecture and a real emphasis on automation, scalability, and developer enablement. You'll have the autonomy to make technical decisions and help shape how platform engineering is done as the team continues to scale.
Tech stack
AWS (core services - EC2, RDS, S3, IAM, etc.)
Monitoring and observability (Grafana, Prometheus, Datadog)
Kubernetes (building and managing production clusters)
Terraform (IaC provisioning)
Python, Bash or Go (scripting, automation)
GitHub Actions (CI/CD pipelines)
What They're Looking For
Experience in AWS cloud infrastructure (ideally in a regulated or high-traffic environment)
Previous experience working with monitoring and observability tools
Hands-on Kubernetes know-how, specifically with EKS
Solid IaC experience with Terraform
Experience with containerisation (Docker, Helm) and CI/CD (GitHub Actions or similar)
Solid scripting/automation experience with Python, Bash or Go
A good communicator who enjoys working collaboratively across product and engineering
Desirable certifications - CKA, CKAD, AWS Solutions Architect etc.
The client is willing to consider candidates without all the required skills and will provide an environment to learn and grow on the job. Training and development are at the forefront of the business, and you will get plenty of opportunities to progress your career in whatever path you want.
Click APPLY NOW to be considered for this position!
AWS, SRE, Cloud, Kubernetes, EKS, Terraform, CI/CD, Automation etc.
01/04/2026
Full time
eDV DevOps Engineer / Site Reliability Engineer (SRE) - AWS, Kubernetes - Contract Outside IR35
We are supporting a specialist engineering consultancy delivering secure technology platforms to high-profile UK government organisations. They are seeking an eDV Cleared DevOps Engineer / Site Reliability Engineer (SRE) with strong experience across AWS, Kubernetes, Terraform, CI/CD and Linux environments to support the continued growth of critical cross-domain systems.
This contract role will focus on improving platform reliability, automation, infrastructure as code, observability and DevOps practices across both cloud and on-premise environments. You will work closely with software engineers, platform engineers and operations teams to ensure highly secure, scalable and resilient systems supporting sensitive government programmes.
Location: Cheltenham (Hybrid - 3 days onsite)
Rate: £500 - £650 per day, Outside IR35
Security Clearance: Active eDV Clearance required
Start Date: ASAP
As a DevOps / Site Reliability Engineer, you will be responsible for ensuring the availability, performance, and reliability of services supporting sensitive government programmes. You will collaborate with multiple feature development teams and BAU/support teams to evolve both cloud and on-premise infrastructure, delivery pipelines, and observability tooling. The role will focus on improving system reliability, monitoring, automation, and performance, while proactively identifying and mitigating operational risks. This position may also involve participation in an on-call rota, which could include occasional 24/7 call-out support.
Key Responsibilities
Collaborate with software engineering teams to improve subsystem reliability and performance.
Work with system administrators to automate operational processes and reduce manual effort.
Enhance monitoring and observability capabilities to proactively detect and resolve issues.
Support development environments to improve delivery speed and quality.
Contribute to the evolution of infrastructure, DevOps practices, and CI/CD pipelines.
Research and evaluate new technologies and tools to support engineering decisions.
Develop expertise across multiple technical and business domains.
Required Skills & Experience
Active eDV clearance is essential
Configuration management tools such as Ansible, Chef, or similar
Strong Terraform
Docker containers and container orchestration platforms (Kubernetes, OpenShift, Docker Swarm)
Maintaining and using CI/CD tooling such as Jenkins
Monitoring and observability experience with Prometheus, Grafana, or InfluxDB
Event-driven integration and messaging systems such as RabbitMQ or other AMQP solutions
Strong Linux command line, administration, and shell scripting experience
Solid understanding of relational databases and SQL
Network security protocols
Working with cloud platforms, ideally AWS (EC2, RDS, S3, Lambda)
Azure a plus
Please send your CV to Laura at (url removed) to progress matters.
Services Advertised are those of Employment Business.
31/03/2026
Contractor
Senior Site Reliability Engineer (SRE)
Remote
12-month contract (high chance of extension)
Job Description
Join a global pioneer in the video game industry and own the reliability of high-traffic, revenue-critical platforms used by millions worldwide. As a Senior SRE, you'll shape the architecture, improve platform-wide resiliency, and ensure services stay performant, scalable, and secure. This isn't just about maintaining a single system: you'll influence reliability across multiple services, driving improvements that touch the entire ecosystem.
Key Responsibilities
Lead incident response and troubleshooting for production systems, resolving high-severity issues and driving post-incident improvements.
Influence architecture to improve platform-wide reliability, resiliency, and operational efficiency, ensuring services remain available under heavy load.
Drive containerisation best practices and manage Kubernetes-based workloads at scale.
Build and maintain event-driven architectures that scale globally while ensuring fault-tolerance and high availability.
Automate infrastructure provisioning, deployment, and monitoring using Infrastructure as Code (Terraform, CloudFormation, Ansible, CDK).
Collaborate with engineering, product, and security teams to define SLOs, SLIs, and error budgets across services.
Provide mentorship, advocate SRE best practices, and ensure teams are empowered to deliver resilient, reliable systems.
Experience / Must-Have Skills
Extensive experience in AWS and AWS-managed services (EC2, Lambda, S3, VPC, CloudWatch, CloudTrail, IAM, EKS, Service Catalog, multi-account environments).
Strong Kubernetes / container orchestration experience, including EKS, OpenShift, Docker, and service mesh.
Deep understanding of networking fundamentals: DNS, VPCs, routing, load balancing, TCP/IP, firewall policies.
Proven track record in incident response and troubleshooting at scale.
Hands-on experience with infrastructure automation and CI/CD pipelines.
Experience designing event-driven architectures and resilient systems.
High level of autonomy, able to influence platform-wide decisions and architect for reliability across services.
Ability and desire to mentor junior staff.
Bonus: experience in gaming, interactive entertainment, or other high-traffic, global-scale platforms.
If you are interested in this role, please feel free to submit your CV.
31/03/2026
Contractor
Principal Developer Team Lead
Salary: £51,400 - £68,800
Location: Cambridge/Hybrid
Contract: Permanent
This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities.
We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.
About the role
We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS.
You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack. Your initial focus will be on two strategic priorities:
Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enables Site Reliability Engineering practices across development and operations teams
Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms.
What You'll Do
Technical Leadership
Lead migration of legacy applications to cloud-native AWS architectures
Build DevOps automation to support SRE practices
Establish AI/ML development standards and frameworks
Set observability, monitoring, and incident response standards
Promote best practices in web, event-driven, and cloud-native technologies
Provide technical expertise and oversee code reviews
People Leadership
Manage and mentor a team of 4–8 developers, providing coaching and development plans
Identify training needs in AI/ML and SRE
Support recruitment and foster a culture of continual improvement and wellbeing.
Delivery & Collaboration
Deliver software in agile squads
Collaborate with architects, SREs, product owners, and infrastructure teams
Liaise with stakeholders to identify education sector needs
Plan and estimate migrations and feature delivery
Coordinate with service management, security, and AWS experts
About you
Essential experience
Degree or equivalent
Proven technical team leadership
Skilled in two or more modern programming languages
Experience with AWS cloud and infrastructure
DevOps skills: automation, CI/CD, infrastructure-as-code
Understanding of SRE and observability
Experience in web-apps and modern frameworks
Strong communicator with technical and non-technical audiences
Technical Expertise
CI/CD pipelines, automation frameworks, and developer tooling
Observability tools, monitoring, logging, and alerting systems
Responsible AI practices and governance
Event-driven architecture and microservices patterns
Software design patterns and scalability best practices
Security principles in cloud environments
Leadership Qualities
Ability to set technical standards and provide thought leadership
Experience balancing people management with hands-on contribution
Strong mentoring and coaching skills
Collaborative approach that builds trust across teams
Passion for continuous learning in AI/ML and DevOps
Promotes inclusion and continuous improvement
You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies.
If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.
Rewards and benefits
We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
28 days annual leave plus bank holidays
Private medical and Permanent Health Insurance
Discretionary annual bonus
Group personal pension scheme
Life assurance up to 4 x annual salary
Green travel schemes
We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability.
Ready to pursue your potential? Apply now.
We review applications on an ongoing basis, with a closing date for all applications being 18 February 2026.
If you are shortlisted and progress through the stages, you can expect:
A 40-minute screening call with the Hiring Manager.
First-stage interview via MS Teams or in person. You will be provided with a brief to complete a role-related task, which will need to be returned by email in advance of your interview.
Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.
Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.
Why join us
Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe – for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.
Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity), cultural, or social class/background.
We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.
04/02/2026
Full time
Principal Developer Team Lead
Salary: £51,400 - £68,800
Location: Cambridge/Hybrid
Contract: Permanent
This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities.
We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.
About the role
We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS.
You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack. Your initial focus will be on two strategic priorities:
Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enables Site Reliability Engineering practices across development and operations teams
Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms.
What You'll Do
Technical Leadership
Lead migration of legacy applications to cloud-native AWS architectures
Build DevOps automation to support SRE practices
Establish AI/ML development standards and frameworks
Set observability, monitoring, and incident response standards
Promote best practices in web, event-driven, and cloud-native technologies
Provide technical expertise and oversee code reviews
People Leadership
Manage and mentor a team of 4–8 developers, providing coaching, development plan
Identifying training needs in AI/ML and SRE.
Support recruitment and foster a culture of continual improvement and wellbeing.
Delivery & Collaboration
Deliver software in agile squads
Collaborate with architects, SREs, product owners, and infrastructure teams
Liaise with stakeholders to identify education sector needs
Plan and estimate migrations and feature delivery
Coordinate with service management, security, and AWS experts
About you
Essential experience
Degree or equivalent
Proven technical team leadership
Skilled in two or more modern programming languages
Experience with AWS cloud and infrastructure
DevOps skills: automation, CI/CD, infrastructure-as-code
Understanding of SRE and observability
Experience in web-apps and modern frameworks
Strong communicator with technical and non-technical audiences
Technical Expertise
CI/CD pipelines, automation frameworks, and developer tooling
Observability tools, monitoring, logging, and alerting systems
Responsible AI practices and governance
Event-driven architecture and microservices patterns
Software design patterns and scalability best practices
Security principles in cloud environments
Leadership Qualities
Ability to set technical standards and provide thought leadership
Experience balancing people management with hands-on contribution
Strong mentoring and coaching skills
Collaborative approach that builds trust across teams
Passion for continuous learning in AI/ML and DevOps
Promotes inclusion and continuous improvement
You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies.
If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.
Rewards and benefits
We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package , featuring family-friendly and planet-friendly benefits including:
28 days annual leave plus bank holidays
Private medical and Permanent Health Insurance
Discretionary annual bonus
Group personal pension scheme
Life assurance up to 4 x annual salary
Green travel schemes
We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability.
Ready to pursue your potential? Apply now.
We review applications on an ongoing basis; the closing date for all applications is 18 February 2026.
If you are shortlisted and progress through the stages, you can expect:
A 40-minute screening call with the Hiring Manager.
First-stage interview via MS Teams or in person. You will be given a brief for a role-related task, which must be completed and returned by email in advance of your interview.
Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.
Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.
Why join us
Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe – for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.
Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity) or cultural or social class/background.
We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.
Cambridge University Press & Assessment
Cambridge/Hybrid (with 2-3 days per week in office)
Job Title: English Technology Platform SRE Team Lead
Salary: £68,600 - £91,700
Location: Cambridge/Hybrid (with 2-3 days per week in office)
Contract: Permanent
Hours: Full time
Are you ready to shape the future of technology platforms at the heart of Cambridge's academic excellence? Join us as our English Technology Platform SRE Team Lead and help drive innovation, reliability, and intelligent automation in a world-class environment.
We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.
About the role
The SRE Team Lead will lead a mature Site Reliability Engineering function within the Platform Operations Team, working closely with Platform Support and Engineering teams. This role demands strong thought leadership, technical depth, and strategic direction for the discipline, with a particular emphasis on leveraging AI-driven operations (AIOps) and FinOps practices to optimise reliability, performance, and cloud spend.
Although this is a hands-on technical role, the SRE Team Lead will also manage a small team of SREs, providing clear direction and ensuring consistent, data-driven, AI-enhanced service delivery across the platforms while working collaboratively with existing support and engineering groups.
Apply core SRE and DevOps principles—culture, automation, testing, measurement, and continuous improvement—to build and optimise pipelines focused on rapid, reliable software delivery. Integrate AIOps capabilities, such as automated anomaly detection and intelligent alerting, to further enhance operational excellence.
Work with Solutions Architecture, Development, and QA teams to automate processes wherever possible, creating and improving stable CI/CD pipelines for both software and infrastructure. Develop tools that enable rapid provisioning of environments and resources across all teams, incorporating AI-assisted automation where beneficial.
Use automation, observability, and monitoring tools to improve site reliability and proactively identify issues. Support development teams with troubleshooting, particularly in infrastructure, networking, and multi-tier application design. Serve as a subject matter expert for cloud services—especially AWS PaaS—while applying FinOps practices to ensure cloud cost transparency, optimisation, and efficient resource usage.
Create and maintain robust technical documentation for the infrastructure of the English platforms, including operational runbooks enhanced with predictive and AI-supported insights.
Stay engaged with developments in the SRE, DevOps, AIOps, and FinOps communities, continually introducing new practices and technologies to improve reliability, performance, automation, and cloud cost efficiency.
This position has been classified as a hybrid role, requiring the selected candidate to typically spend 40-60% of their time collaborating and connecting face-to-face at their dedicated location. Aside from our hybrid principles, other flexible working requests will be considered from the first day of employment, including other work arrangements should you require adjustments due to a disability or long-term health condition.
About you
A passion for site reliability engineering, a drive to understand, anticipate, and counter platform-related issues before they become problems, and a commitment to staying up to date with the latest technological trends and developments.
Strong communication skills that enable effective collaboration across technical leadership and business stakeholders, with the ability to present ideas and strategies clearly and persuasively.
Demonstrable soft skills in motivating, inspiring and leading a team (direct line management is not part of the role's remit).
Educated to degree level or equivalent, with a minimum of 5 years' proven experience in a systems administration or blended DevOps role.
Experience implementing technologies such as Terraform, GitHub Actions, and containerisation/orchestration tools, e.g. Docker and Kubernetes.
Expertise in monitoring tools such as New Relic, Grafana, Alert Manager and Site24x7.
Extensive knowledge of cloud computing infrastructure, especially Amazon Web Services (EKS, ECS, RDS, Route 53, etc.).
Excellent troubleshooting, debugging, communication and documentation skills.
Experience of working within an Agile product development environment.
For a detailed job description, please refer to the link at the bottom of the advert on our careers site.
We are a Disability Confident (DC) employer that is committed to equality and inclusion ensuring our recruitment process is accessible to all. The DC scheme's Offer of an Interview commitment applies to applicants who opt in, and disclose a disability or a long-term health condition, and best meet the minimum criteria for the role. In instances where interviewing all qualifying candidates is not practicable, we prioritise those who best meet the minimum criteria, as we would for applicants who do not have a disability or long-term health condition.
Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.
Rewards and benefits
We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
28 days annual leave plus bank holidays
Private medical and Permanent Health Insurance
Discretionary annual bonus
Group personal pension scheme
Life assurance up to 4 x annual salary
Green travel schemes
Ready to pursue your potential? Apply now.
We aim to support candidates by making our interview process clear and transparent. The closing date for all applications will be 4th February. We will review applications on an ongoing basis, and shortlisted candidates can expect interviews to take place shortly after it closes.
If you are shortlisted and progress through the stages, you can expect:
A 15-minute screening call with the Hiring Manager.
Final stage virtual interview via MS Teams.
If you require any reasonable adjustments during the recruitment process due to a disability or a long-term health condition, there will be an opportunity for you to inform us via the online application form. We will do our best to accommodate your needs.
Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.
We are committed to an equitable recruitment process. As such, applications must be submitted via our official online application procedure. Please refrain from sending your CV directly to our recruiters. If you experience technical difficulties or require additional support with submitting your online application, contact the Recruiter.
Why join us
Joining us is your opportunity to pursue potential. You will belong to a collaborative team that is exploring new and better ways to serve students, teachers and researchers across the globe – for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.
Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it is safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity) or cultural or social class/background.
We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.
If you are ready to take the next step in your Cambridge journey, we welcome your application. Together, we continue to shape a culture where everyone feels empowered to succeed and motivated to make a difference: for ourselves, for each other, and for learners worldwide.
21/01/2026
Full time
AVP Infrastructure Cloud Support - AWS, Terraform, Python, DevOps, SRE - Permanent
Job purpose
This role supports the AWS public cloud infrastructure and the implementation of Infrastructure as Code using Terraform. The role will work closely with the SRE and Engineering teams to ensure that the Cloud environment has sufficient observability and is appropriately managed.
What you will be doing:
Responsible for ensuring the Production service is prioritized, with all service incidents, problems and requests for cloud hosted services responded to and actioned.
Responsible for maintaining the reliability and security of the Cloud Hosted environments.
Improve Observability and Telemetry in the Cloud Hosted environments, utilizing SRE methodology to give SLAs, SLOs and SLIs.
Ensure risks within the Cloud hosted environment are documented and regularly reviewed, and that identified operational risk issues are captured with appropriate actions tracked to agreed timelines.
Define and implement standards and procedures to adhere to current best practice and drive continual service improvement.
Responsible for ensuring Security standards are implemented and maintained in the Cloud hosted environment, including delivery of upgrades and security updates to minimise risk and ensure stability for all cloud hosted services.
Responsible for maintaining service resilience for all cloud hosted services, including backup and disaster recovery processes. Where necessary, plan and conduct quarterly DR tests for all cloud hosted services, ensuring any findings are captured and addressed promptly.
What we're looking for:
Must have strong technical operational skills in supporting AWS Cloud Hosted environments and at least 3 years in an Infrastructure support role.
Strong understanding of Infrastructure as Code technologies, ideally including Terraform and Ansible.
Operational risk and control management processes, including an understanding of Security best practice and how to apply this safely within a Production environment.
Asset management and life cycle (EOS/EOL) process management.
Planning and leading disaster recovery fail-overs of IT systems and services.
Preferably experience of working in a regulated financial services/banking organization.
Able to understand and use AWS, including an understanding of AWS services, security and networking.
Knowledge of at least 1 programming language, preferably Python.
Knowledge of CI/CD specifically relating to Cloud Hosted environments, including an understanding of some of the associated tools: Git, Terraform, Ansible, Jenkins.
Permanent Role - Hybrid working (Central London based) - Candidate must be eligible to work in the UK
By applying to this job you are sending us your CV, which may contain personal information. Please refer to our Privacy Notice to understand how we process this information. In short, in order to supply you with work finding services, we will hold and process your personal data, and only with your express permission will we share this personal data with a client (or a third party working on behalf of the client) by email or by upload to the client's or third party's vendor management system. Giving us permission to send your CV to a client constitutes permission to share the personal data that would be necessary to consider your application, interview you (phone/video/face to face) and, if successful, hire you. Scope AT acts as an employment agency for Permanent Recruitment and an employment business for the supply of temporary workers. By applying for this job you accept the Terms and Conditions, Data Protection Policy, Privacy Notice and Disclaimers which can be found at our website.
06/10/2025
Full time
Senior Site Reliability Engineer
6 months
Remote
£Negotiable - INSIDE IR35
Tech Stack
Multiple Platforms and Applications
AWS and Azure - Cloud
Mainframe skills would be handy
Latest applications on Cloud
DevOps skills would be helpful
Attitude of being part of the team and owning the outcomes
Advocate - to change the culture to SRE
Disclaimer: This vacancy is being advertised by either Advanced Resource Managers Limited, Advanced Resource Managers IT Limited or Advanced Resource Managers Engineering Limited ("ARM"). ARM is a specialist talent acquisition and management consultancy. We provide technical contingency recruitment and a portfolio of more complex resource solutions. Our specialist recruitment divisions cover the entire technical arena, including some of the most economically and strategically important industries in the UK and the world today. We will never send your CV without your permission. Where the role is marked as Outside IR35 in the advertisement this is subject to receipt of a final Status Determination Statement from the end Client and may be subject to change.
06/10/2025
Contractor
About the role: Join Our Team at Holland & Barrett! Are you passionate about cloud security and looking to make a significant impact? Holland & Barrett is seeking a Cloud Security Specialist to help us define and implement our cloud security strategy. If you're an experienced professional eager to work with cutting-edge technology and collaborate with diverse teams, we want to hear from you! Key Responsibilities: Security Strategy: Help define and execute the Holland & Barrett cloud security strategy, partnering with platform and Site Reliability Engineering (SRE) teams to build robust infrastructure that supports our business. Perimeter Security: Establish platform perimeter security by implementing controls at ingress and egress points, including creating and maintaining an edge network with a Web Application Firewall (WAF), Distributed Denial of Service (DDoS) protection, and a Content Delivery Network (CDN). Access Control: Establish an access control baseline focusing on the principle of least privilege and segregation of duties. Monitor and enforce these controls once roles and permissions are set. Security Controls: Design, implement, and maintain security controls to prevent, detect, and remediate insecure configurations, including defining and disseminating secure AWS/infrastructure baselines. Standards Development: Own the development and maintenance of tailored security standards and guidelines, creating reusable resources for various development teams. AWS Security Services: Establish and manage AWS security services, including certificate authorities, encryption services, insecure configuration scanners, and security control canaries. Key requirements: Essential: 5+ years of experience in cloud security, particularly with AWS, and at least 2+ years in software development. Strong understanding of cloud and application security concepts, including secure coding practices, threat modeling, vulnerability management, and access control mechanisms. 
Experience with AWS, Kubernetes, Service Mesh, API gateways, and API Security (authentication and authorization). Proficiency in programming languages such as Python, JavaScript, GoLang, Terraform, CloudFormation (AWS), and AWS CDK. Familiarity with Agile methodologies like SCRUM, along with proven project management skills to manage multiple security projects effectively. Desired: Ability to work independently, take initiative, and maintain a keen attention to detail, ensuring high security standards. Strong communication and interpersonal skills, facilitating effective collaboration with both technical and non-technical teams. Why Holland & Barrett? At Holland & Barrett, we are dedicated to promoting health and well-being while ensuring the highest standards of cloud security. Join our team and be part of a company that values innovation and security. Ready to Make an Impact? If you're excited about cloud security and want to contribute to a secure future, apply now! We look forward to welcoming you to our team. We support flexibility and productivity of our employees by hybrid working arrangements. Although your role will be based in London (or Nuneaton, or Amsterdam) you will be required to travel only occasionally to our Hubs in Nuneaton or London or to any other location of H&B. What we offer: Pension company contribution = 3% Incentive scheme up to 10% of annual salary , based on company performance. Your wellbeing is paramount so you can get away and take 33 Days Holiday per year . Private Medical Care (Self after 1 year) Learning and Development opportunity with Holland & Barrett is a great base for career development long term. Career progression. Refer and Earn Scheme - as we're growing you can earn money by referring people to join us from your network. Epic Extras gives you access to exclusive benefits, free advice and savings from a range of retailers and providers. 
Stay healthy with Discounted Products - from day one you'll get a 25% discount (on top of other promotions) when you shop at H&B on anything that you buy. We all need a little help sometimes, so we offer Free 24/7 Confidential Advice & Colleague Welfare. Mental Health First Aiders - we have lots of qualified Mental Health First Aiders because it's all about your health & wellbeing. Stay active in the Onsite Gym at our Nuneaton Hub! We have colleague Reward and Recognition Schemes, so your hard work and loyalty won't go unnoticed. And many more! We're passionate about helping every colleague thrive across all dimensions of wellbeing, and we're committed to having a diverse and inclusive workplace. In line with our EPIC values (Expertise, Pioneering, Inclusive, Caring), we embrace and actively celebrate all our colleagues' unique and varying experiences, backgrounds, identities and cultures - I am me, we are H&B. Holland & Barrett does not accept unsolicited resumes from search firms/recruiters. Please do not forward resumes to our job alias, employees, or any other company location. Holland & Barrett is not and will not be responsible for any fees if a candidate is submitted by a search firm/recruiter unless otherwise agreed with respect to specific open position(s).
01/10/2025
Full time
Join a team at the heart of the global economy! The Department for Business and Trade ("DBT") and Inspire People are partnering together to bring you an exciting opportunity for an experienced Python Developer to support essential tooling and systems across DBT. This role is ideal for a Back End Python developer looking for career growth and exposure to cloud native systems with an SRE touch. You will join a team that ensures DBT's digital services work as users expect, working with development teams and giving them the tools for their job, including application performance monitoring, exception, log and metrics aggregation, dashboards, and declarative CI/CD pipelines. £55,400 to £74,600 (including allowances) plus excellent Civil Service benefits and pension. Salary is dependent on location and technical skills as assessed at interview. Flexible, hybrid working from London, Cardiff, Darlington, Edinburgh, Belfast, Birmingham or Salford. DBT's Digital, Data and Technology (DDaT) team develops and operates tools, services, and platforms that enable the UK government to provide world leading support to businesses in the UK and overseas. As a senior SRE developing Python, you will work to give development teams the tools for their job, including application performance monitoring, exception, log and metrics aggregation, dashboards, and declarative CI/CD (continuous integration/continuous delivery) pipelines. You'll evangelise service-level indicators, objectives, and error budgets to product teams, and negotiate them. You'll help build and scale our global product platform and participate in an on-call rota. 
The Tech Stack includes: Python and Django framework Serverless compute (Lambda) Amazon Web Services Azure Jenkins and AWS Codepipelines Terraform & CloudFormation Kubernetes Elastic Container Service (ECS) Elasticsearch PostgreSQL Sentry Redis Essential Skills and Experience You should be able to demonstrate: Experience and fluency in Python, writing clean and effective code. Cloud experience with either Amazon Web Services, Azure or Google Cloud. Ability to build code-defined, reliable, and well tested infrastructure on top of cloud computing systems (eg, Terraform, CloudFormation, Pulumi). Experience in designing, analysing, and troubleshooting distributed systems. Knowledge of Linux/Unix fundamentals and TCP/IP Networking. Ability to see user impact in the infrastructure changes. Desirable Skills and Experience While not essential, it would be ideal if you have demonstrable skills and experience of: Experience coding infrastructure (ie, Terraform, CloudFormation). Experience in defining and measuring Service Level Objectives. Experience in observability driven development. Experience in prototyping through reuse of existing Open Source components. In return, you can expect a planned, transparent progression with learning and development tailored to your role, an environment with flexible working options and a culture encouraging inclusion and diversity, plus the following benefits: Salary of £54,400 to £74,600 (including allowances) including annual allowance depending on location and experience Flexible, hybrid working from London, Cardiff, Darlington, Edinburgh, Belfast, Birmingham, Salford Annual leave starting at 26 days per annum plus statutory bank holidays rising to 33 days with service An excellent Civil Service pension scheme. 
If you are a Python Developer, DevOps Engineer, Site Reliability Engineer or Systems Administrator looking to enhance your career and make a difference across an expanding function, then apply today or contact Alison Whitehead at Inspire People in complete confidence for further information. Further Information: This role requires SC clearance, a condition of which is to have been present in the UK for 3 out of the past 5 years.
14/08/2023
Full time
Location Belfast, Cardiff, Darlington, Edinburgh, London About the job Summary Join a team at the heart of the global economy! We create digital services, data tools and technology for businesses to prosper around the world. Have a look at our video! Our Digital, Data and Technology team develops and operates tools, services, and platforms that enable the UK government to provide world leading support to businesses in the UK and overseas. You'll get to constantly push boundaries in an environment free of heavy legacy, driven by curiosity, social purpose, diversity of thought, entrepreneurship, and the aspiration to offer an incredible experience to all our users. Find out more on our blog, Digital Trade. Job description As our Lead Site Reliability Engineer, you will lead a team of site reliability engineers who are committed to delivering excellent services and continual improvement. You will drive adoption of best practices and be responsible for managing our platform hosting. You will influence our future hosting strategy, helping to develop the team's roadmap of work and lead the support of several service offerings including CI/CD, Account Management, Containerisation, Network Connectivity, Cloud Cost Optimisation, Service Performance and Availability. Working with development teams to create reusable components, enabling service delivery at pace, you will automate the oversight of systems at scale, covering a hybrid cloud environment, including but not limited to AWS, GovUK PaaS, & Azure. Responsibilities In your day-to-day role, you will: Set the SRE team's technical direction, working with the technology leadership team. Provide technical leadership & guidance to the SRE team through coaching and mentoring. Lead the sharing of knowledge and good practice to develop the team's capability. Identify and lead on modernization initiatives through continuous improvement. 
Give development teams the tools for their job, including infrastructure, APM, exception, log aggregation, dashboards, and declarative CI/CD pipelines. Collaboratively develop the future hosting strategy. Actively lead the support of service offerings (Account Management, Security, CI/CD, Automation, Containerization, Service Performance & cloud Infrastructure). Ensure security, stability, and capacity are embedded in services deployments. Champion the adoption of emerging technology to automate tasks & deployments. Solve complex issues using root cause analysis, progressing opportunities to improve reliability, security, capability of infrastructure, application, and site services. Essential Skills and Experience You'll have demonstrable skills and experience of: Team leadership: managing workload, coaching, and mentoring technical staff. Setting the direction for technical teams, and liaising with colleagues to establish requirements and identify, propose, and initiate work. Identifying good practices, sharing experiences, and championing your team's agenda, acting as the voice of the team. Working with cloud technology and the use of orchestration tools, developing infrastructure as code on top of cloud computing services. Deploying & managing CI/CD pipelines. Identification of process optimization opportunities and leading teams to deliver service improvements. Experience of information security, designing, quality-reviewing and quality-assuring solutions with security controls embedded. Designing and reviewing systems, selecting appropriate design standards, methods and tools and ensuring they are applied effectively. Benefits Learning and development tailored to your role An environment with flexible working options A culture encouraging inclusion and diversity A Civil Service pension with an average employer contribution of 27% Things you need to know Security Successful candidates must pass a disclosure and barring security check. 
Successful candidates must meet the security requirements before they can be appointed. The level of security needed is security check . See our vetting charter . People working with government assets must complete basic personnel security standard checks. Selection process details We are closely monitoring the situation regarding the coronavirus, and will be following central Government advice as it is issued. There is therefore a risk that recruitment to this post may be subject to change at short notice. In addition, where appropriate, you may be invited to attend a video interview. Please continue to follow the application process as normal and ensure that you check your emails regularly as all updates from us will be sent to you this way. Assessment and Interview As part of the application process you will be asked to upload a CV which outlines your experience, skills and fit for the role. At the sift stage for this role, Inspire People will assess you against the essential criteria listed above to compile a long list of applications. If you are progressed through to this stage, you will be asked to complete a short, pre-recorded video interview with Inspire People or provide written answers to questions. These applications will then be sifted by DIT hiring managers. Initial sifting will take place the week commencing 26th September, with CV submissions to DIT on the 30th September. Interviews will take place the week commencing 10th October. Please note that these dates are indicative and may be subject to change. At the interview stage for this role, we will assess your technical/specialist experience, outlined in the above role description, testing your ability through relevant assessments/presentations and ask you questions around Behaviours and Technical skills, which are part of the Civil Service Success Profiles . 
The technical element within the interview, where you will be asked a series of questions to demonstrate your specific professional skills and knowledge related directly to the job role and context, will assess against these Technical Skills: Availability and capacity management Information security Modern standards approach Programming and build (software engineering) Prototyping Service support Systems design Systems integration User focus You will also be assessed against the Behaviours of: Communicating and Influencing Developing Self and Others Changing and Improving Making Effective Decisions Offer Stage Appointments may be made to candidates in merit order based on location preferences. The salary we will offer is determined using interview performance. Scores at interview translate to proficiency levels and an associated salary. Once a successful candidate has a proficiency level and is part of the capability framework, they will be given opportunities to self-assess to progress through the pay scale within their grade during their time at DIT. For further explanation of proficiency levels and more information about DDaT click here. The Department for International Trade embraces and values diversity in all forms. We welcome and pride ourselves on the positive impact diversity has on the work we do, and we promote equality of opportunity throughout the organisation. As such, we run a Disability Confident Scheme (DCS) for candidates with disabilities who meet the minimum selection criteria. Candidates who pass the bar at interview but are not the highest scoring will be held on a 12-month reserve list for future appointments. Candidates who are judged to be a near miss at interview may be offered a post at the grade below the one advertised. If successful and transferring from another Government Department a criminal record check may be carried out. The Department for International Trade embraces and values diversity in all forms. 
We welcome and pride ourselves on the positive impact diversity has on the work we do, and we promote equality of opportunity throughout the organisation. Harmonised terms and conditions are attached. Please take time to read the document to determine how these may affect you. Please note the successful candidate will be expected to remain in post for a minimum of 18 months before being released for another role. Any move to the Department for International Trade from another employer will mean you can no longer access childcare vouchers. This includes moves between government departments. You may however be eligible for other government schemes, including Tax Free Childcare. Determine your eligibility at New entrants are expected to join on the minimum of the pay band. Reasonable adjustment If a person with disabilities is put at a substantial disadvantage compared to a non-disabled person, we have a duty to make reasonable changes to our processes. If you need a change to be made so that you can make your application, you should contact the DDaT Recruitment team before the closing date to discuss your needs. ..... click apply for full job details
23/09/2022
Full time
Site Reliability Engineer Our client, a leading global supplier for IT services, requires a Site Reliability Engineer - Virtualisation SME based at their client's offices in London. You may be able to work some days remotely. This is a 1 year temporary contract to start ASAP. Day rate: Competitive market rate We are looking for a Site Reliability Engineer - Virtualisation SME with 10+ years of experience having excellent knowledge of ESX VMWare and/or Nutanix HCI and of container orchestration platforms such as Docker and Kubernetes: Key Responsibilities Responsible for the reliability and efficiency of virtualisation infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil the OS and DB Platform Operations team must perform. Responsible for writing software to make the virtualisation infrastructure self-managing and self-service. Responsible for automation and continuous service improvement by developing Infrastructure as Code. Responsible for elimination of manual, repetitive, automatable, tactical tasks that are devoid of value. Responsible for availability, latency, performance, efficiency, change management, monitoring and capacity planning. Responsible for improving system performance, making effective use of resources, distributing load and reducing latency. Responsible for identifying SLOs (Service Level Objectives) that align the team to meet availability and latency objectives. Responsible for developing pro-active monitoring solutions that alert on symptoms and not just on outages. Responsible for performing detailed root cause analyses (RCAs) on incidents and outages to prevent future occurrence. Responsible for partnering with development teams to improve services via rigorous testing and release procedures. Responsible for actively sharing knowledge and best practices across the organisation. 
Responsible for identifying technical debt and partnering with application teams to build remediation plans. Responsible for developing standard operational procedures and producing effective documentation. Responsible for analysing workloads and devising suitable cloud migration strategies where appropriate. Responsible for participating in on-call rotation, triaging and addressing production issues as they arise. Responsible for performing the OS Platform Operations function as and when required. Responsible for mentoring and developing less experienced SAs and SREs. Responsible for identifying cost saving and optimisation opportunities within the customer business. Responsible for building strong relationships across the customer functions and business areas, underpinned by trust and the core values of the customer. Key Skills Essential: Excellent knowledge of ESX VMWare and/or Nutanix HCI. Excellent knowledge of Windows Server 2008/2012/2016/2019. Excellent knowledge of Windows OS tuning utilities and commands. Excellent knowledge of configuring Windows OS systems for optimal performance. Excellent knowledge of Windows clustering and high-availability solutions. Excellent knowledge of Microsoft Active Directory, LDAP and Kerberos. Excellent knowledge of TCP/IP Networking Protocols. Excellent knowledge of networking, storage, database and virtualization layers. Excellent knowledge of container orchestration platforms such as Docker and Kubernetes. Excellent knowledge of version control software such as GitHub and Subversion. Excellent knowledge of configuration management software such as Chef, Puppet, Ansible, Terraform and SaltStack. Excellent knowledge of "Infrastructure as Code" principles and practices. Excellent knowledge of continuous integration (CI) and continuous delivery (CD) principles and practices. Excellent knowledge of applications development using Agile and DevOps best practices. 
Excellent knowledge of operating system security and auditing methods. Excellent knowledge of security hardening principles in line with CIS industry benchmarks. Excellent knowledge of data security governance and regulations such as GDPR and SOX. Excellent knowledge of cloud computing - IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle. Desirable: Good working knowledge of RedHat Enterprise Linux (6.x, 7.x, 8.x) and Solaris (10.x and 11.x). Good working knowledge of Unix/Linux OS tuning utilities and commands. Good working knowledge of Unix/Linux system internals and Kernel tuning for optimal performance. Good working knowledge of Red Hat Satellite. Good working knowledge of Anti-Virus software such as McAfee and Sophos. Good working knowledge of Ivanti LANDESK and Symantec Altiris. Good working knowledge of ThinPrint and EquiTrack (Follow-Me Printing). Good working knowledge of Rubrik. Good working knowledge of EMC, HDS and Pure storage arrays. Good working knowledge of Dell PowerEdge, IBM xSeries and Cisco UCS hardware. Good working knowledge of EMC Networker, Data Domain and IBM Tivoli Storage Manager. Good working knowledge of Infoblox DNS. Good working knowledge of Icinga 2 and OpManager. Good working knowledge of IBM Tivoli and Netcool. Good working knowledge of GitHub, Subversion and TeamCity. Good working knowledge of BMC Control-M. Good working knowledge of CyberArk. Good working knowledge of Splunk and IBM QRadar. Good working knowledge of Qualys. Good working knowledge of SharePoint, JIRA and Confluence. Good working knowledge of ServiceNow and Serena Business Manager. 
Candidate Specifications Excellent communication and interpersonal skills Ability to handle pressure during outages and systematically resolve issues Excellent problem-solving skills Results driven, with a strong sense of accountability A proactive, motivated approach The ability to operate with urgency and prioritise work accordingly A structured and logical approach to work Attention to detail and accuracy Ability to perform well in a pressurised environment Ability to manage constructive conflict effectively The ability to manage large workloads and tight deadlines Able to communicate complex technical concepts to non-technical persons at all levels
23/09/2022
Contractor
Site Reliability Engineer - SRE - Outside IR35 - Fully Remote - 5-9 months Jefferson Frank are currently recruiting for a Site Reliability Engineer (SRE) to grow an infrastructure team from scratch for a managed IT services company who are looking to onboard someone as soon as possible. For this role we would like to find out about your recent projects, demonstrating your skill and proven application of the below technologies. Within the role you will: Grow a team of geographically distributed Infrastructure Engineers from scratch whilst mentoring the team throughout; Participate in technical design, driving engineering initiatives and owning product quality; Act as a point of contact between Product and Customer Service, peer engineering teams and other parts of the organisation; Assist in the migration from the old AWS OU to new AWS OU; Assist in the development and testing of IAM roles, Security policies and Service groups for members of the OU; Administration and support of Kubernetes as recommended by AWS; Assist in the Terraform refactoring and repository move to Gitlab. Must Haves: Experience working with Terraform; Experience working with cloud-based infrastructure (AWS, Google Cloud); 2+ years of programming experience; Experience working with containers and orchestration; Familiarity with CI/CD processes; Experience with monitoring tools; 3+ years of platform experience, with at least 2 of them in an enterprise-scale, fast-paced environment. 
Strong experience working with modern cloud-native technologies and modern software architecture (Micro-services-based architectures); Ongoing management of our clients' cloud infrastructure; Creating custom automation tooling; Monitoring system health and performance and creating custom tooling to do so; Ensuring system uptime and reliability meet the internal SLAs; Background in coding. Site Reliability Engineer | SRE | Terraform | Kubernetes | Microservices | Remote | Immediate Start If you like the sound of the role or know someone who would, regardless of experience, please don't hesitate to get in touch with me! You can call me direct, email (see below) or click apply on the advert and I will get back to you ASAP.
05/11/2021
Contractor
Fully Remote | Senior AWS Site Reliability Engineer | £80k-£100k Your new company Created in 2012, this exciting London based Fintech Company has a unique value proposition of enabling and empowering financial institutions to take their problem-solving abilities to the next level, further aligning with the rapid development of technology within businesses. Not only will you be working in the Heart of the City of London, but you will also be part of a rapidly growing Fintech company that has been ranked within the top 100 influential Companies to work alongside financial institutions. Your new role You will be a crucial senior engineer working on mission critical functions within the operations team. You must have an immense passion for technology and automation. A resilient approach and a problem-solving mind when dealing with complex problems are essential. You will be working alongside seasoned professionals in a high-intensity team environment. Your day to day will be diverse and consist of: Designing, innovating and developing a wide range of key systems, including improving the CI/CD process; Monitoring and maintaining a 100% cloud environment (AWS); Creating secure, reliable, repeatable production rollouts; Communicating ideas and decisions throughout the team; Ensuring that tasks are owned and fully completed with high quality; Working with multiple business units to gather requirements and collaboratively build solutions; Ongoing maintenance and upgrades on key components of the platform. What you'll need to succeed Excellent knowledge and practical skills regarding AWS (monitoring and maintaining); Excellent Linux/Unix administration skills, including networking and scripting; Experience using one or more of the following: Ansible, Puppet, Chef, Salt; Excellent teamwork and strong communication skills. 
Understanding of AWS, Google Cloud, Microsoft Azure or one of the other IaaS providers (Linode, Digital Ocean, OpenStack, VMWare, XEN) and of using Terraform, CloudFormation, boto or other orchestration tools. Intrinsic interest in and experience with development and scripting languages such as Python, JavaScript, Java, C++ and Bash. Experience with software such as Git. What you'll get in return You will be working at an emerging fintech company whose state-of-the-art offices are in the heart of London. You will also be entitled to flexible working, including 100% remote working if you would like it. An extremely competitive salary as well as other benefits such as subsidised healthcare, dental, pension and 25+ days annual leave. What you need to do now Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&C's, Privacy Policy and Disclaimers which can be found at hays.co.uk
04/11/2021
Full time