Cambridge University Press & Assessment
Cambridge, Cambridgeshire
Job Title: Principal Developer Team Lead
Salary: £51,400 - £68,800
Location: Cambridge/Hybrid
Contract: Permanent

This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities.

We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.

About the role

We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS. You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack.

Your initial focus will be on two strategic priorities:
- Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enable Site Reliability Engineering practices across development and operations teams
- Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms

What You'll Do

Technical Leadership
- Lead migration of legacy applications to cloud-native AWS architectures
- Build DevOps automation to support SRE practices
- Establish AI/ML development standards and frameworks
- Set observability, monitoring, and incident response standards
- Promote best practices in web, event-driven, and cloud-native technologies
- Provide technical expertise and oversee code reviews

People Leadership
- Manage and mentor a team of 4-8 developers, providing coaching and development plans
- Identify training needs in AI/ML and SRE
- Support recruitment and foster a culture of continual improvement and wellbeing

Delivery & Collaboration
- Deliver software in agile squads
- Collaborate with architects, SREs, product owners, and infrastructure teams
- Liaise with stakeholders to identify education sector needs
- Plan and estimate migrations and feature delivery
- Coordinate with service management, security, and AWS experts

About you

Essential experience
- Degree or equivalent
- Proven technical team leadership
- Skilled in two or more modern programming languages
- Experience with AWS cloud and infrastructure
- DevOps skills: automation, CI/CD, infrastructure-as-code
- Understanding of SRE and observability
- Experience in web-apps and modern frameworks
- Strong communicator with technical and non-technical audiences

Technical Expertise
- CI/CD pipelines, automation frameworks, and developer tooling
- Observability tools, monitoring, logging, and alerting systems
- Responsible AI practices and governance
- Event-driven architecture and microservices patterns
- Software design patterns and scalability best practices
- Security principles in cloud environments

Leadership Qualities
- Ability to set technical standards and provide thought leadership
- Experience balancing people management with hands-on contribution
- Strong mentoring and coaching skills
- Collaborative approach that builds trust across teams
- Passion for continuous learning in AI/ML and DevOps
- Promotes inclusion and continuous improvement

You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies.

If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.

Rewards and benefits

We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
- 28 days annual leave plus bank holidays
- Private medical and Permanent Health Insurance
- Discretionary annual bonus
- Group personal pension scheme
- Life assurance up to 4 x annual salary
- Green travel schemes

We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability.

Ready to pursue your potential? Apply now.

We review applications on an ongoing basis, with a closing date for all applications of 16th April 2026.

As part of the application process you can expect:
- Two multiple-choice questions, each requiring a single answer.
- A 15-minute screening call with the Hiring Manager.
- First stage interview via MS Teams or in person. You will be provided with a brief to complete a role-related task, which will need to be returned by email in advance of your interview.

Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.

Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.

Why join us

Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe - for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.

Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity) or cultural or social class/background.

We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.

Documents
Job-Description - Principal Developer Team Lead V01.00 .pdf (117.42 KB)

03/04/2026
Full time

Principal Site Reliability Engineer - Active SC Required!
Up to £100,000 + benefits
Wokingham - Hybrid (UK-based)

We're looking for a Principal Site Reliability Engineer to provide technical leadership across large-scale, complex platforms. This is a strategic role where you'll shape reliability engineering practices, influence architecture, and drive operational excellence across the organisation.

What you'll be doing:
- Defining and driving SRE strategy, standards, and best practices
- Architecting highly resilient, scalable, and secure systems
- Leading major incident reviews and driving organisational improvements
- Influencing platform and application design at an architectural level
- Championing automation, self-healing systems, and reliability by design
- Acting as a mentor and technical authority across multiple teams

What we're looking for:
- Extensive experience in SRE, DevOps, or platform engineering
- Proven track record of designing and operating large-scale distributed systems
- Deep expertise in cloud platforms and cloud-native architectures
- Strong experience with Kubernetes, infrastructure as code, and automation
- Excellent stakeholder management and leadership skills
- Ability to operate at both strategic and hands-on levels

Why apply?
- High-impact role with influence across engineering and architecture
- Opportunity to shape reliability strategy at scale
- Work with cutting-edge technologies in a complex environment

02/04/2026
Full time
Cambridge University Press & Assessment
Cambridge, Cambridgeshire
Job Title: Principal Developer Team Lead
Salary: £51,400 - £68,800
Location: Cambridge/Hybrid
Contract: Permanent

This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities.

We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.

About the role

We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS. You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack.

Your initial focus will be on two strategic priorities:
- Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enable Site Reliability Engineering practices across development and operations teams
- Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms

What You'll Do

Technical Leadership
- Lead migration of legacy applications to cloud-native AWS architectures
- Build DevOps automation to support SRE practices
- Establish AI/ML development standards and frameworks
- Set observability, monitoring, and incident response standards
- Promote best practices in web, event-driven, and cloud-native technologies
- Provide technical expertise and oversee code reviews

People Leadership
- Manage and mentor a team of 4-8 developers, providing coaching and development plans
- Identify training needs in AI/ML and SRE
- Support recruitment and foster a culture of continual improvement and wellbeing

Delivery & Collaboration
- Deliver software in agile squads
- Collaborate with architects, SREs, product owners, and infrastructure teams
- Liaise with stakeholders to identify education sector needs
- Plan and estimate migrations and feature delivery
- Coordinate with service management, security, and AWS experts

About you

Essential experience
- Degree or equivalent
- Proven technical team leadership
- Skilled in two or more modern programming languages
- Experience with AWS cloud and infrastructure
- DevOps skills: automation, CI/CD, infrastructure-as-code
- Understanding of SRE and observability
- Experience in web-apps and modern frameworks
- Strong communicator with technical and non-technical audiences

Technical Expertise
- CI/CD pipelines, automation frameworks, and developer tooling
- Observability tools, monitoring, logging, and alerting systems
- Responsible AI practices and governance
- Event-driven architecture and microservices patterns
- Software design patterns and scalability best practices
- Security principles in cloud environments

Leadership Qualities
- Ability to set technical standards and provide thought leadership
- Experience balancing people management with hands-on contribution
- Strong mentoring and coaching skills
- Collaborative approach that builds trust across teams
- Passion for continuous learning in AI/ML and DevOps
- Promotes inclusion and continuous improvement

You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies.

If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.

Rewards and benefits

We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
- 28 days annual leave plus bank holidays
- Private medical and Permanent Health Insurance
- Discretionary annual bonus
- Group personal pension scheme
- Life assurance up to 4 x annual salary
- Green travel schemes

We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability.

Ready to pursue your potential? Apply now.

We review applications on an ongoing basis, with a closing date for all applications of 16th April 2026.

As part of the application process you can expect:
- Two multiple-choice questions, each requiring a single answer.
- A 15-minute screening call with the Hiring Manager.
- First stage interview via MS Teams or in person. You will be provided with a brief to complete a role-related task, which will need to be returned by email in advance of your interview.

Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.

Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.

Why join us

Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe - for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.

Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity) or cultural or social class/background.

We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.

02/04/2026
Full time

About the role

The Site Reliability Engineer plays a critical role in ensuring that our AI-driven, cloud-native platform is reliable, observable, secure, and able to scale with the organisation's growth. As we adopt intelligent agents, autonomous workflows, and increasingly complex distributed systems, the SRE ensures that resilience, performance, and operational excellence are built into everything we deliver.

By partnering closely with Engineers, Architects, and the Engineering Manager, the SRE defines the patterns, tooling, and automation that enable fast, safe, and repeatable deployments. This role safeguards our production environment, drives continuous improvement across CI/CD and observability, and establishes the reliability practices that empower autonomous squads to move quickly without compromising stability. The SRE is essential to maintaining customer trust, supporting AI-first innovation, and ensuring our platform remains robust, secure, and highly available at scale.

In this position you will ensure the reliability, scalability, and security of our engineering systems. Working closely with the Engineering Manager and Head of Engineering, the SRE will identify priorities to remove friction from engineering teams, streamline processes, and enhance operational excellence. This role combines software engineering principles with systems administration to deliver robust, automated, cost-effective, and secure-by-design solutions.

Key Responsibilities

Reliability, Performance & Security:
- Design and implement strategies to improve system reliability, availability, and security.
- Ensure all solutions follow secure-by-design principles, incorporating cybersecurity best practices from inception through deployment.
- Conduct regular security reviews and collaborate with security teams to address vulnerabilities.

CI/CD Management:
- Own and optimise Continuous Integration and Continuous Deployment pipelines.
- Embed security checks (e.g., static analysis, dependency scanning) into CI/CD workflows.
- Ensure secure, efficient, and automated deployment processes across environments.

Monitoring & Observability:
- Implement and maintain monitoring solutions for infrastructure and applications.
- Develop dashboards and alerting systems to ensure proactive incident and security event management.
- Evaluate and integrate new observability tools as needed.

Automation & Tooling:
- Automate repetitive tasks to improve efficiency and reduce human error.
- Build and maintain internal tools that support engineering productivity and security compliance.
- Champion Infrastructure as Code (IaC) practices using tools like Terraform or ARM templates.

Cloud Infrastructure Management:
- Manage and optimise services across AWS and Azure environments.
- Ensure scalability, resilience, and security of service-based architectures.
- Implement cost management strategies to optimise cloud spend without compromising performance or security.

Incident Response & Root Cause Analysis:
- Lead incident response efforts, including security incidents, and conduct post-mortem reviews.
- Drive continuous improvement through lessons learned and preventive measures.

Skills & experience
- Proven experience in AWS and Azure cloud environments.
- Strong background in CI/CD tools (e.g., Azure DevOps, Pipelines, GitHub Actions, Jenkins).
- Expertise in monitoring and observability platforms (e.g., Prometheus, Grafana, Datadog).
- Proficiency in scripting and automation (Python, Bash, PowerShell).
- Familiarity with containerisation and orchestration (Docker, Kubernetes).
- Solid understanding of networking, security, and cost optimisation in cloud environments.
- Knowledge of cybersecurity principles, secure coding practices, and compliance frameworks.
- A problem-solver with a proactive mindset.
- Comfortable working in fast-paced, evolving environments.
- Strong communicator who can bridge gaps between operations, development, and security teams.
- Passionate about automation, scalability, cost efficiency, and security.

Benefits & culture

Part of the Zellis Group, Moorepay is a team of over 500 friendly professionals across four offices in Swinton (Manchester), Sheffield, Birmingham and Kochi (India). We're passionate about making Moorepay a fantastic place to work for every single one of our colleagues. The average length of service at Moorepay is 12 years, which speaks for itself.

To help make Moorepay such a great place to work, we focus on three things in our company culture: mental health support, maintaining a healthy work/life balance, and equal opportunities and inclusion for all.

Here's what you'll gain if you join our team:
- A career packed with opportunity, in a stable and growing company.
- A comprehensive programme of learning and development.
- Competitive base salary.
- 25 days annual leave, with the opportunity to buy more. You'll even get your birthday off as well!
- Private medical insurance.
- Life assurance 4x salary.
- Enhanced pension with up to 8.5% employer contributions.
- A huge range of additional flexible benefits across financial & personal wellbeing, lifestyle & leisure.
01/04/2026
Full time
Junior Site Reliability Engineer Central London (3 days a week in the office) Up to £55,000 per annum + Bonus + Generous Benefits Package We are working with an exciting technology company that are looking to bring in a Junior Site Reliability Engineer to help scale their cloud infrastructure and DevOps capability. They've built a high-performing engineering team and are now investing further into the platform side of things as demand grows. Think modern, cloud-native architecture, and a real emphasis on automation, scalability, and developer enablement. You'll join an experienced team you can learn and grow from. Tech stack AWS (Core services - EC2, RDS, S3, IAM, etc.) Monitoring and Observability: Grafana, Prometheus, Datadog Kubernetes (building and managing production clusters) Terraform (IaC provisioning) Python, Bash or Go (scripting, automation) GitHub Actions (CI/CD pipelines) What They're Looking For Experience in AWS cloud infrastructure Previous experience working with Monitoring and Observability Tools - Datadog, Grafana or Prometheus Knowledge of how Kubernetes works. Understanding of IaC - Terraform. Experience with CI/CD (GitHub Actions or similar) A good communicator who enjoys working collaboratively across product and engineering. The client is willing to consider candidates without all the required skills and provide an environment to learn and grow on the job. Training and development is at the forefront of the business, where you will get plenty of opportunities to progress your career in whatever path you want. Click APPLY NOW to be considered for this position! AWS, SRE, Cloud, Kubernetes, EKS, Terraform, CI/CD, Automation etc.
01/04/2026
Full time
Senior Site Reliability Engineer - Active SC Required! Up to £75,000 + benefits Wokingham - Hybrid (UK-based) We're seeking a Senior Site Reliability Engineer to play a key role in designing and operating highly reliable, scalable systems in a fast-paced environment. You'll act as a technical leader within the team, driving best practices across reliability engineering, automation, and system performance. What you'll be doing: Designing and improving system reliability, scalability, and observability Leading incident management and driving root cause analysis Building and maintaining robust CI/CD pipelines and automation frameworks Partnering with development teams to embed SRE principles into the SDLC Mentoring junior engineers and promoting engineering best practices What we're looking for: Strong experience in SRE, DevOps, or platform engineering roles Deep understanding of cloud infrastructure (AWS, Azure, or GCP) Hands-on experience with Kubernetes and containerised environments Strong scripting/programming skills (Python, Go, or similar) Experience with monitoring, alerting, and observability tooling Proven ability to troubleshoot complex distributed systems Why apply? Opportunity to influence technical direction and best practices Work on large-scale, mission-critical systems Leadership exposure with clear progression to principal level
01/04/2026
Full time
Site Reliability Engineer (SRE) - Active SC required! Up to £55,000 + benefits Hybrid (UK-based) We're looking for a Site Reliability Engineer to join a growing technology team delivering highly scalable, resilient systems across a range of enterprise environments. This is a fantastic opportunity for someone with a solid foundation in DevOps/SRE practices who wants to deepen their expertise in automation, reliability, and cloud-native technologies. What you'll be doing: Supporting the reliability, availability, and performance of production systems Monitoring applications and infrastructure, responding to incidents and driving resolution Automating manual processes to improve efficiency and reduce risk Collaborating with engineering teams to improve system design and resilience Contributing to CI/CD pipelines and infrastructure-as-code practices What we're looking for: Experience in an SRE, DevOps, or similar engineering role Knowledge of cloud platforms (AWS, Azure, or GCP) Familiarity with monitoring/logging tools (e.g. Prometheus, Grafana, ELK) Scripting or programming skills (e.g. Python, Bash, Go) Understanding of containers and orchestration (Docker/Kubernetes is a plus) Why apply? Work with modern, cloud-native technologies Supportive environment with strong learning and development opportunities Clear progression path into senior SRE roles
01/04/2026
Full time
IT Manager (CDN, AWS & SRE Focus) Manchester (Hybrid - 2 days in office) Up to £80,000 + Benefits Permanent, Full-Time The Opportunity Morson Edge are looking for an experienced IT Manager to lead and evolve a high-performing infrastructure and reliability function. This is a key leadership role where you'll shape strategy, improve system resilience, and drive best practices across CDN, AWS cloud environments, and Site Reliability Engineering (SRE). You'll work at the intersection of infrastructure, performance, and reliability, ensuring systems are scalable, secure, and always available. What You'll Be Doing Lead, mentor, and develop a team of engineers across cloud infrastructure and SRE Own and optimise AWS environments, ensuring scalability, cost-efficiency, and security Manage and enhance CDN performance and delivery strategies Drive adoption of SRE principles including SLIs, SLOs, and error budgets Improve system observability through monitoring, logging, and alerting Collaborate with engineering and product teams to support high-availability services Oversee incident management, root cause analysis, and continuous improvement Define and implement infrastructure best practices and automation What We're Looking For Proven experience in an IT Manager/Infrastructure Manager/SRE Lead role Strong expertise in AWS (EC2, Lambda, CloudFront, VPC, etc.)
Solid understanding of Content Delivery Networks (CDN) and performance optimisation Experience implementing or working within SRE frameworks Knowledge of Infrastructure as Code (e.g., Terraform, CloudFormation) Strong background in monitoring tools (e.g., Prometheus, Grafana, Datadog) Excellent leadership and stakeholder management skills Nice to Have Experience with containerisation (Docker, Kubernetes) Exposure to DevOps culture and CI/CD pipelines Security and compliance awareness in cloud environments What's in It for You Salary up to £80,000 Hybrid working (2 days per week in Manchester office) Pension scheme Training and development opportunities A chance to shape and lead a modern, cloud-first infrastructure function
01/04/2026
Full time
The Site Reliability Engineer plays a critical role in ensuring that our AI-driven, cloud-native platform is reliable, observable, secure, and able to scale with the organisation's growth. As we adopt intelligent agents, autonomous workflows, and increasingly complex distributed systems, the SRE ensures that resilience, performance, and operational excellence are built into everything we deliver. By partnering closely with Engineers, Architects, and the Engineering Manager, the SRE defines the patterns, tooling, and automation that enable fast, safe, and repeatable deployments. This role safeguards our production environment, drives continuous improvement across CI/CD and observability, and establishes the reliability practices that empower autonomous squads to move quickly without compromising stability. The SRE is essential to maintaining customer trust, supporting AI-first innovation, and ensuring our platform remains robust, secure, and highly available at scale. In this position you will ensure the reliability, scalability, and security of our engineering systems. Working closely with the Engineering Manager and Head of Engineering, the SRE will identify priorities to remove friction from engineering teams, streamline processes, and enhance operational excellence. This role combines software engineering principles with systems administration to deliver robust, automated, cost-effective, and secure-by-design solutions. Key Responsibilities Reliability, Performance & Security: Design and implement strategies to improve system reliability, availability, and security. Ensure all solutions follow secure-by-design principles, incorporating cybersecurity best practices from inception through deployment. Conduct regular security reviews and collaborate with security teams to address vulnerabilities. CI/CD Management: Own and optimise Continuous Integration and Continuous Deployment pipelines. 
Embed security checks (e.g., static analysis, dependency scanning) into CI/CD workflows. Ensure secure, efficient, and automated deployment processes across environments. Monitoring & Observability: Implement and maintain monitoring solutions for infrastructure and applications. Develop dashboards and alerting systems to ensure proactive incident and security event management. Evaluate and integrate new observability tools as needed. Automation & Tooling: Automate repetitive tasks to improve efficiency and reduce human error. Build and maintain internal tools that support engineering productivity and security compliance. Champion Infrastructure as Code (IaC) practices using tools like Terraform or ARM templates. Cloud Infrastructure Management: Manage and optimise services across AWS and Azure environments. Ensure scalability, resilience, and security of service-based architectures. Implement cost management strategies to optimise cloud spend without compromising performance or security. Incident Response & Root Cause Analysis: Lead incident response efforts, including security incidents, and conduct post-mortem reviews. Drive continuous improvement through lessons learned and preventive measures. Skills & Experience Proven experience in AWS and Azure cloud environments. Strong background in CI/CD tools (e.g., Azure DevOps, Pipelines, GitHub Actions, Jenkins). Expertise in monitoring and observability platforms (e.g., Prometheus, Grafana, Datadog). Proficiency in scripting and automation (Python, Bash, PowerShell). Familiarity with containerisation and orchestration (Docker, Kubernetes). Solid understanding of networking, security, and cost optimisation in cloud environments. Knowledge of cybersecurity principles, secure coding practices, and compliance frameworks. A problem-solver with a proactive mindset. Comfortable working in fast-paced, evolving environments. Strong communicator who can bridge gaps between operations, development, and security teams. 
Passionate about automation, scalability, cost efficiency, and security.
01/04/2026
Full time
Senior Site Reliability Engineer (Observability) Location: London/UK (Remote) Contract: 12 Months Initial Rate: £55 - £62 per hour, Inside IR35 Job Overview We are looking for a Senior Site Reliability Engineer with strong experience in Observability, Monitoring and Distributed Systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services. Responsibilities Design, deploy and scale observability platforms Manage and scale Prometheus monitoring systems Deploy and maintain large Elasticsearch clusters Build and maintain data pipelines using Kafka Develop alerting and monitoring frameworks Automate infrastructure using Terraform and Ansible Develop tools and scripts using Python, Go, Ruby or Bash Work with Linux systems (Debian/Ubuntu) Participate in on-call rotation Improve system reliability, performance and scalability Required Skills 5+ years' experience in Site Reliability Engineering / DevOps Strong Linux systems experience Observability and Monitoring tools experience: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana) Kafka Terraform / Infrastructure as Code Ansible / Configuration Management Programming experience (Python, Go, Ruby or Bash) Distributed systems and cloud infrastructure experience This is an urgent vacancy where the hiring manager is shortlisting for an interview immediately. Please apply with a copy of your CV or send it khushboo. Co. uk Randstad Technologies is acting as an Employment Business in relation to this vacancy.
01/04/2026
Contractor
Site Reliability Engineer
Central London (3 days a week in the office)
Up to £70,000 per annum + Bonus + Generous Benefits Package
We are working with an exciting technology company that are looking to bring in a Site Reliability Engineer to help scale their cloud infrastructure and DevOps capability. They've built a high-performing engineering team and are now investing further into the platform side of things as demand grows. Think modern, cloud-native architecture, and a real emphasis on automation, scalability, and developer enablement. You'll have the autonomy to make technical decisions and help shape how platform engineering is done as the team continues to scale.
Tech stack
AWS (Core services - EC2, RDS, S3, IAM, etc.)
Monitoring and Observability: Grafana, Prometheus, Datadog
Kubernetes (building and managing production clusters)
Terraform (IaC provisioning)
Python, Bash or Go (scripting, automation)
GitHub Actions (CI/CD pipelines)
What They're Looking For
Experience in AWS cloud infrastructure (ideally in a regulated or high-traffic environment)
Previous experience working with Monitoring and Observability Tools
Hands-on Kubernetes know-how, specifically with EKS
Solid IaC experience with Terraform
Experience with containerisation (Docker, Helm) and CI/CD (GitHub Actions or similar)
Solid scripting/Automation experience with Python, Bash or Go
A good communicator who enjoys working collaboratively across product and engineering
Desirable
Certifications - CKA, CKAD, AWS Solutions Architect etc.
The client is willing to consider candidates without all the required skills and provide an environment to learn and grow on the job. Training and development is at the forefront of the business, where you will get plenty of opportunities to progress your career in whatever path you want.
Click APPLY NOW to be considered for this position!
AWS, SRE, Cloud, Kubernetes, EKS, Terraform, CI/CD, Automation etc.
01/04/2026
Full time
Moorepay is transforming. We are a trusted leader in UK Payroll and HR solutions, but we aren't resting on our history. We are embarking on a major digital transformation to redefine how businesses manage their most important asset: their people.
As the Principal Software Solutions Architect, you'll be the technical authority responsible for defining, governing, and evolving the end-to-end architecture of our "AI First" platform, ensuring architectural consistency, secure-by-design principles, and long-term scalability across all engineering squads. Working closely with the Engineering Manager, Cloud & Platform Engineering Lead, and Product leadership, this role shapes our architectural strategy, drives technical excellence, and provides deep guidance to multiple autonomous squads as we scale towards high-performing, cloud-native teams. The Architect balances hands-on solution design, strategic planning, technical oversight, and stakeholder collaboration to keep the platform robust, secure, and ready for future growth.
This role defines the architectural backbone that enables the entire engineering organisation to scale effectively. As we transition to multiple autonomous squads, you will ensure our systems remain leading edge, secure, resilient, and consistent, enabling rapid product delivery while maintaining high standards of engineering excellence. You will leave an enduring impact on the platform's foundations, influencing everything from service boundaries to reliability strategies and cloud platform design.
This is a full time, permanent role working on a hybrid basis with 3 days per week in Manchester.
Key Responsibilities:
Team Leadership & Scaling
Define and maintain the technical architecture vision and roadmap across all squads.
Ensure alignment of architecture with business goals, engineering strategy, and long-term scalability.
Drive system-wide architectural decisions, providing clear technical direction for squads.
Evaluate emerging technologies and propose solutions that improve scalability, performance, and developer productivity.
Mentor senior engineers and influence technical leaders across the organisation.
Secure-by-Design & Compliance
Embed secure-by-design principles into architectural decisions.
Ensure threat modelling is performed for new features and major changes.
Champion secure coding standards and integration of security testing into the delivery pipeline.
Collaborate with security and compliance stakeholders to ensure solutions meet regulatory and governance requirements.
Promote design patterns that minimise risk across distributed systems.
Solution Design & Governance
Own the end-to-end architectural design for major platform components and new product capabilities, with a focus on AI First.
Work closely with Engineering Manager and Engineering Team Leads to ensure solutions are consistent, secure, and scalable.
Lead architecture reviews and ensure adherence to design standards, technical patterns, and best practices.
Produce solution blueprints, reference architectures, and technical documentation.
Validate that all solutions support operational excellence, reliability, and maintainability.
Cloud, Infrastructure, and Platform Architecture
Define scalable service-based architectures leveraging cloud-native patterns.
Work with the Lead SRE to ensure architectural designs account for:
Observability (metrics, logs, tracing)
Reliability (SLIs, SLOs, failover)
CI/CD automation
Infrastructure as code and environment design
Drive optimisation of compute, storage, and network resources across cloud platforms (Azure/AWS).
Engineering Collaboration & Technical Enablement
Partner with Engineering Manager to ensure squads have clear architectural guidance.
Support teams in breaking down complex technical problems into executable, scalable solutions.
Provide architectural input into backlog refinement, release planning, and prioritisation.
Act as the primary facilitator for cross-team architectural decision-making.
Communicate architectural decisions, trade-offs, and risks to both technical and non-technical stakeholders.
Continuous Improvement & Technology Standards
Define and maintain engineering standards, reusable patterns, and architectural principles.
Champion continuous improvement across code quality, security, performance, and operational readiness.
Foster a culture of technical excellence, experimentation, and innovation.
Skills & Experience
Essential:
Proven experience as a Principal Architect, Solutions Architect, or Senior Engineer leading architectural decisions in complex systems.
Strong understanding of AI technologies such as agents and models, both for accelerated design and delivery and for delivery of product capabilities.
Strong background in cloud-native architectures (microservices, event-driven, distributed systems).
Deep understanding of secure-by-design principles, threat modelling, cryptography basics, and modern security practices.
Experience with API design, integration patterns, domain-driven design (DDD), and Event Driven Design.
Ability to influence without authority and collaborate effectively across engineering, SRE, product, and leadership teams.
Exceptional communication skills, capable of simplifying complex technical topics for diverse stakeholders.
Extensive experience with modern programming platforms and frameworks (e.g., Node.js, C# .NET, React).
Strong grounding in cloud platforms (AWS/Azure), including networking, identity, observability, and cost optimisation.
Desirable:
Experience designing solutions in regulated or compliance-driven industries.
Background in DevOps, platform engineering, or SRE practices.
Experience scaling architectures to support high-growth environments.
Certification in cloud or architecture frameworks (AWS SA Pro, Azure Architect Expert, TOGAF, etc.).
01/04/2026
Full time
eDV DevOps Engineer / Site Reliability Engineer (SRE) - AWS, Kubernetes - Contract Outside IR35
We are supporting a specialist engineering consultancy delivering secure technology platforms to high-profile UK government organisations. They are seeking an eDV Cleared DevOps Engineer / Site Reliability Engineer (SRE) with strong experience across AWS, Kubernetes, Terraform, CI/CD and Linux environments to support the continued growth of critical cross-domain systems.
This contract role will focus on improving platform reliability, automation, infrastructure as code, observability and DevOps practices across both cloud and on-premise environments. You will work closely with software engineers, platform engineers and operations teams to ensure highly secure, scalable and resilient systems supporting sensitive government programmes.
Location: Cheltenham (Hybrid - 3 days onsite)
Rate: £500 - £650 per day, Outside IR35
Security Clearance: Active eDV Clearance required
Start Date: ASAP
As a DevOps / Site Reliability Engineer, you will be responsible for ensuring the availability, performance, and reliability of services supporting sensitive government programmes. You will collaborate with multiple feature development teams and BAU/support teams to evolve both cloud and on-premise infrastructure, delivery pipelines, and observability tooling. The role will focus on improving system reliability, monitoring, automation, and performance, while proactively identifying and mitigating operational risks. This position may also involve participation in an on-call rota, which could include occasional 24/7 call-out support.
Key Responsibilities:
Collaborate with software engineering teams to improve subsystem reliability and performance.
Work with system administrators to automate operational processes and reduce manual effort.
Enhance monitoring and observability capabilities to proactively detect and resolve issues.
Support development environments to improve delivery speed and quality.
Contribute to the evolution of infrastructure, DevOps practices, and CI/CD pipelines.
Research and evaluate new technologies and tools to support engineering decisions.
Develop expertise across multiple technical and business domains.
Required Skills & Experience
Active eDV clearance is essential
Configuration management tools such as Ansible, Chef, or similar
Strong Terraform
Docker containers and container orchestration platforms (Kubernetes, OpenShift, Docker Swarm)
Maintaining and using CI/CD tooling such as Jenkins
Monitoring and observability experience with Prometheus, Grafana, or InfluxDB
Event-driven integration and messaging systems such as RabbitMQ or other AMQP solutions
Strong Linux command line, administration, and shell scripting experience
Solid understanding of relational databases and SQL
Network security protocols
Working with cloud platforms, ideally AWS (EC2, RDS, S3, Lambda); Azure a plus
Please send your CV to Laura at (url removed) to progress matters.
Services Advertised are those of Employment Business.
31/03/2026
Contractor
Senior Site Reliability Engineer (Observability)
Location: London/UK (Remote)
Contract: 12 Months Initial
Rate: £55 - £62 per hour, Inside IR35
Job Overview
We are looking for a Senior Site Reliability Engineer with strong experience in Observability, Monitoring and Distributed Systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services.
Responsibilities
Design, deploy and scale observability platforms
Manage and scale Prometheus monitoring systems
Deploy and maintain large Elasticsearch clusters
Build and maintain data pipelines using Kafka
Develop alerting and monitoring frameworks
Automate infrastructure using Terraform and Ansible
Develop tools and scripts using Python, Go, Ruby or Bash
Work with Linux systems (Debian/Ubuntu)
Participate in on-call rotation
Improve system reliability, performance and scalability
Required Skills
5+ years experience in Site Reliability Engineering / DevOps
Strong Linux systems experience
Observability and Monitoring tools experience
Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
Kafka
Terraform / Infrastructure as Code
Ansible / Configuration Management
Programming experience (Python, Go, Ruby or Bash)
Distributed systems and cloud infrastructure experience
This is an urgent vacancy where the hiring manager is shortlisting for an interview immediately. Please apply with a copy of your CV or send it khushboo. Co. uk
Randstad Technologies is acting as an Employment Business in relation to this vacancy.
31/03/2026
Contractor
Principal Developer Team Lead
Salary: £51,400 - £68,800
Location: Cambridge/Hybrid
Contract: Permanent
This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities.
We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.
About the role
We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS.
You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack. Your initial focus will be on two strategic priorities:
Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enables Site Reliability Engineering practices across development and operations teams
Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms.
What You'll Do
Technical Leadership
Lead migration of legacy applications to cloud-native AWS architectures
Build DevOps automation to support SRE practices
Establish AI/ML development standards and frameworks
Set observability, monitoring, and incident response standards
Promote best practices in web, event-driven, and cloud-native technologies
Provide technical expertise and oversee code reviews
People Leadership
Manage and mentor a team of 4–8 developers, providing coaching and development plans.
Identify training needs in AI/ML and SRE.
Support recruitment and foster a culture of continual improvement and wellbeing.
Delivery & Collaboration
Deliver software in agile squads
Collaborate with architects, SREs, product owners, and infrastructure teams
Liaise with stakeholders to identify education sector needs
Plan and estimate migrations and feature delivery
Coordinate with service management, security, and AWS experts
About you
Essential experience
Degree or equivalent
Proven technical team leadership
Skilled in two or more modern programming languages
Experience with AWS cloud and infrastructure
DevOps skills: automation, CI/CD, infrastructure-as-code
Understanding of SRE and observability
Experience in web-apps and modern frameworks
Strong communicator with technical and non-technical audiences
Technical Expertise
CI/CD pipelines, automation frameworks, and developer tooling
Observability tools, monitoring, logging, and alerting systems
Responsible AI practices and governance
Event-driven architecture and microservices patterns
Software design patterns and scalability best practices
Security principles in cloud environments
Leadership Qualities
Ability to set technical standards and provide thought leadership
Experience balancing people management with hands-on contribution
Strong mentoring and coaching skills
Collaborative approach that builds trust across teams
Passion for continuous learning in AI/ML and DevOps
Promotes inclusion and continuous improvement
You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies.
If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.
Rewards and benefits
We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
28 days annual leave plus bank holidays
Private medical and Permanent Health Insurance
Discretionary annual bonus
Group personal pension scheme
Life assurance up to 4 x annual salary
Green travel schemes
We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability.
Ready to pursue your potential? Apply now.
We review applications on an ongoing basis, with a closing date for all applications being 18 February 2026.
If you are shortlisted and progressed through the stages, you can expect:
A 40-minute screening call with the Hiring Manager.
A first-stage interview via MS Teams or in person. You will be given a brief for a role-related task, which must be completed and returned by email in advance of your interview.
Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.
Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.
Why join us
Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe – for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.
Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity), cultural, or social class/background.
We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.
04/02/2026
Full time
Cambridge University Press & Assessment
Cambridge/Hybrid (with 2-3 days per week in office)
Job Title: English Technology Platform SRE Team Lead
Salary: £68,600 - £91,700
Location: Cambridge/Hybrid (with 2-3 days per week in office)
Contract: Permanent
Hours: Full time
Are you ready to shape the future of technology platforms at the heart of Cambridge's academic excellence? Join us as our English Technology Platform SRE Team Lead and help drive innovation, reliability, and intelligent automation in a world-class environment.
We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge.
About the role
The SRE Team Lead will lead a mature Site Reliability Engineering function within the Platform Operations Team, working closely with Platform Support and Engineering teams. This role demands strong thought leadership, technical depth, and strategic direction for the discipline, with a particular emphasis on leveraging AI-driven operations (AIOps) and FinOps practices to optimise reliability, performance, and cloud spend.
Although this is a hands-on technical role, the SRE Team Lead will also manage a small team of SREs, providing clear direction and ensuring consistent, data-driven, AI-enhanced service delivery across the platforms while working collaboratively with existing support and engineering groups.
Apply core SRE and DevOps principles—culture, automation, testing, measurement, and continuous improvement—to build and optimise pipelines focused on rapid, reliable software delivery. Integrate AIOps capabilities, such as automated anomaly detection and intelligent alerting, to further enhance operational excellence.
Work with Solutions Architecture, Development, and QA teams to automate processes wherever possible, creating and improving stable CI/CD pipelines for both software and infrastructure. Develop tools that enable rapid provisioning of environments and resources across all teams, incorporating AI-assisted automation where beneficial.
Use automation, observability, and monitoring tools to improve site reliability and proactively identify issues. Support development teams with troubleshooting, particularly in infrastructure, networking, and multi-tier application design. Serve as a subject matter expert for cloud services—especially AWS PaaS—while applying FinOps practices to ensure cloud cost transparency, optimisation, and efficient resource usage.
Create and maintain robust technical documentation for the infrastructure of the English platforms, including operational runbooks enhanced with predictive and AI-supported insights.
Stay engaged with developments in the SRE, DevOps, AIOps, and FinOps communities, continually introducing new practices and technologies to improve reliability, performance, automation, and cloud cost efficiency.
This position has been classified as a hybrid role, requiring the selected candidate to typically spend 40-60% of their time collaborating and connecting face-to-face at their dedicated location. Aside from our hybrid principles, other flexible working requests will be considered from the first day of employment, including other work arrangements should you require adjustments due to a disability or long-term health condition.
About you
A passion for site reliability engineering and a drive to understand, anticipate, and counter platform-related issues before they become problems, while staying up to date with the latest technological trends and developments.
Strong communication skills that enable effective collaboration across technical leadership and business stakeholders, with the ability to present ideas and strategies clearly and persuasively.
Demonstrable soft skills in motivating, inspiring and leading a team (direct line management is not part of the role's remit).
Educated to degree level or equivalent, with a minimum of 5 years' proven experience in a systems administration or blended DevOps role.
Experience implementing technologies such as Terraform, GitHub Actions, and containerisation/orchestration tools (e.g. Kubernetes and Docker).
Expertise in monitoring tools such as New Relic, Grafana, Alert Manager and Site24x7.
Extensive knowledge of cloud computing infrastructure, especially Amazon Web Services (EKS, ECS, RDS, Route 53, etc.).
Excellent troubleshooting, debugging, communication and documentation skills.
Experience of working within an Agile product development environment.
For a detailed job description, please refer to the link at the bottom of the advert on our careers site.
We are a Disability Confident (DC) employer that is committed to equality and inclusion ensuring our recruitment process is accessible to all. The DC scheme's Offer of an Interview commitment applies to applicants who opt in, and disclose a disability or a long-term health condition, and best meet the minimum criteria for the role. In instances where interviewing all qualifying candidates is not practicable, we prioritise those who best meet the minimum criteria, as we would for applicants who do not have a disability or long-term health condition.
Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for.
Rewards and benefits
We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including:
28 days annual leave plus bank holidays
Private medical and Permanent Health Insurance
Discretionary annual bonus
Group personal pension scheme
Life assurance up to 4 x annual salary
Green travel schemes
Ready to pursue your potential? Apply now.
We aim to support candidates by making our interview process clear and transparent. The closing date for all applications will be 4th February. We will review applications on an ongoing basis, and shortlisted candidates can expect interviews to take place shortly after it closes.
If you are shortlisted and progress through the stages, you can expect:
A 15-minute screening call with the Hiring Manager.
Final stage virtual interview via MS Teams.
If you require any reasonable adjustments during the recruitment process due to a disability or a long-term health condition, there will be an opportunity for you to inform us via the online application form. We will do our best to accommodate your needs.
Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry.
We are committed to an equitable recruitment process. As such, applications must be submitted via our official online application procedure. Please refrain from sending your CV directly to our recruiters. If you experience technical difficulties or require additional support with submitting your online application, contact the Recruiter.
Why join us
Joining us is your opportunity to pursue potential. You will belong to a collaborative team that is exploring new and better ways to serve students, teachers and researchers across the globe – for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.
Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it is safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity), cultural, or social class/background.
We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.
If you are ready to take the next step in your Cambridge journey, we welcome your application. Together, we continue to shape a culture where everyone feels empowered to succeed and motivated to make a difference, for ourselves, for each other, and for learners worldwide.
21/01/2026
Full time
Join us as a PostgreSQL SRE at Barclays, where you'll monitor and maintain the bank's critical technology infrastructure and resolve more complex technical issues while minimising disruption to operations. In this role you will assume a key technical leadership position. You will shape the direction of our database administration, ensuring our technological approaches are innovative and aligned with the Bank's business goals.
To be successful as a PostgreSQL SRE, you should have:
Experience as a Database Administrator, with a focus on PostgreSQL and similar database technologies such as Oracle or MS-SQL.
A background in implementing and leading SRE practices across large organisations or complex teams.
Hands-on experience with containers and Kubernetes.
Experience with DevOps automation tools such as code versioning (Git), JIRA, Ansible, and database CI/CD tools, and their implementation.
Some other highly valued skills may include:
Expertise with scripting languages (e.g. PowerShell, Python, Bash) for automation and migration tasks.
Experience of working with data migration tools and software.
Expertise in system configuration management tools such as Chef and Ansible for database server configurations.
You may be assessed on the key critical skills relevant for success in the role, such as risk and controls, change and transformation, business acumen, strategic thinking, and digital and technology, as well as job-specific technical skills.
This role can be based in our Knutsford or Glasgow office.
Purpose of the role
To apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of the systems, platforms, and technology.
Accountabilities
Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
Resolution, analysis and response to system outages and disruptions, and implementation of measures to prevent similar incidents from recurring.
Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
Monitoring and optimisation of system performance and resource usage, identification and resolution of bottlenecks, and implementation of best practices for performance tuning.
Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and close work with other teams to ensure smooth and efficient operations.
Staying informed of industry technology trends and innovations, and actively contributing to the organisation's technology communities to foster a culture of technical excellence and growth.
Assistant Vice President Expectations
To advise and influence decision making, contribute to policy development and take responsibility for operational effectiveness. Collaborate closely with other functions and business divisions.
Lead a team performing complex tasks, using well-developed professional knowledge and skills to deliver work that impacts the whole business function. Set objectives and coach employees in pursuit of those objectives, appraise performance relative to objectives, and determine reward outcomes.
If the position has leadership responsibilities, People Leaders are expected to demonstrate a clear set of leadership behaviours to create an environment for colleagues to thrive and deliver to a consistently excellent standard. The four LEAD behaviours are: L - Listen and be authentic, E - Energise and inspire, A - Align across the enterprise, D - Develop others.
Or, for an individual contributor, they will lead collaborative assignments, guide team members through structured assignments, and identify the need to include other areas of specialisation to complete assignments. They will identify new directions for assignments and/or projects, identifying a combination of cross-functional methodologies or practices to meet required outcomes.
Consult on complex issues, providing advice to People Leaders to support the resolution of escalated issues.
Identify ways to mitigate risk and develop new policies and procedures in support of the control and governance agenda.
Take ownership of managing risk and strengthening controls in relation to the work done.
Perform work that is closely related to that of other areas, which requires understanding of how areas coordinate and contribute to the achievement of the objectives of the organisation sub-function.
Collaborate with other areas of work, for business-aligned support areas, to keep up to speed with business activity and the business strategy.
Engage in complex analysis of data from multiple sources of information, internal and external, such as procedures and practices (in other areas, teams, companies, etc.), to solve problems creatively and effectively.
Communicate complex information. 'Complex' information could include sensitive information or information that is difficult to communicate because of its content or its audience.
Influence or convince stakeholders to achieve outcomes.
All colleagues will be expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship - our moral compass, helping us do what we believe is right. They will also be expected to demonstrate the Barclays Mindset - to Empower, Challenge and Drive - the operating manual for how we behave.
06/10/2025
Full time
Join us as a PostgreSQL SRE at Barclays where you'll effectively monitor and maintain the bank's critical technology infrastructure and resolve more complex technical issues, whilst minimizing disruption to operations. In this role you will assume a key technical leadership role. You will shape the direction of our database administration, ensuring our technological approaches are innovative and aligned with the Bank's business goals. To be successful as a PostgreSQL SRE, you should have: Experience as a Database Administrator, with a focus on PostgreSQL and similar database technologies such as Oracle or MS-SQL. A background in implementing and leading SRE practices across large organizations or complex teams. Hands-on experience on Containers and Kubernetes Experience with DevOps automation tools such as Code versioning (git), JIRA, Ansible, database CI/CD tools and their implementation. Some other highly valued skills may include: Expertise with scripting languages (e.g. PowerShell, Python, Bash) for automation/migration tasks Experience of working on Data migration tools and software's Expertise in system configuration management tools such as Chef, Ansible for database server configurations. You may be assessed on the key critical skills relevant for success in role, such as risk and controls, change and transformation, business acumen strategic thinking and digital and technology, as well as job-specific technical skills This role can be based in our Knutsford or Glasgow, office. Purpose of the role To apply software engineering techniques, automation, and best practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. 
Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience. Monitoring and optimisation of system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning. Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely with other teams to ensure smooth and efficient operations. Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth. Assistant Vice President Expectations To advise and influence decision making, contribute to policy development and take responsibility for operational effectiveness. Collaborate closely with other functions/ business divisions. Lead a team performing complex tasks, using well developed professional knowledge and skills to deliver on work that impacts the whole business function. Set objectives and coach employees in pursuit of those objectives, appraisal of performance relative to objectives and determination of reward outcomes If the position has leadership responsibilities, People Leaders are expected to demonstrate a clear set of leadership behaviours to create an environment for colleagues to thrive and deliver to a consistently excellent standard. The four LEAD behaviours are: L - Listen and be authentic, E - Energise and inspire, A - Align across the enterprise, D - Develop others. OR for an individual contributor, they will lead collaborative assignments and guide team members through structured assignments, identify the need for the inclusion of other areas of specialisation to complete assignments. 
They will identify new directions for assignments and/or projects, identifying a combination of cross-functional methodologies or practices to meet required outcomes. Consult on complex issues, providing advice to People Leaders to support the resolution of escalated issues. Identify ways to mitigate risk and develop new policies/procedures in support of the control and governance agenda. Take ownership for managing risk and strengthening controls in relation to the work done. Perform work that is closely related to that of other areas, which requires understanding of how areas coordinate and contribute to the achievement of the objectives of the organisation sub-function. Collaborate with other areas of work, for business-aligned support areas to keep up to speed with business activity and the business strategy. Engage in complex analysis of data from multiple sources of information, internal and external, such as procedures and practices (in other areas, teams, companies, etc.) to solve problems creatively and effectively. Communicate complex information. 'Complex' information could include sensitive information or information that is difficult to communicate because of its content or its audience. Influence or convince stakeholders to achieve outcomes. All colleagues will be expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship - our moral compass, helping us do what we believe is right. They will also be expected to demonstrate the Barclays Mindset - to Empower, Challenge and Drive - the operating manual for how we behave. Investment
06/10/2025
Full time
AVP Infrastructure Cloud Support - AWS, Terraform, Python, DevOps, SRE - Permanent. Job purpose: This role supports the AWS public cloud infrastructure and the implementation of Infrastructure as Code using Terraform. The role will work closely with the SRE and Engineering teams to ensure that the Cloud environment has sufficient observability and is appropriately managed. What you will be doing: Responsible for ensuring the Production service is prioritized, with all service incidents, problems and requests for cloud hosted services responded to and actioned. Responsible for maintaining the reliability and security of the Cloud Hosted environments. Improve Observability and Telemetry in the Cloud Hosted environments, utilizing SRE methodology to define SLAs, SLOs and SLIs. Ensure risks within the Cloud hosted environment are documented and regularly reviewed, and that identified operational risk issues are captured with appropriate actions tracked to agreed timelines. Define and implement standards and procedures to adhere to current best practice and drive continual service improvement. Responsible for ensuring Security standards are implemented and maintained in the Cloud hosted environment, including delivery of upgrades and security updates to minimise risk and ensure stability for all cloud hosted services. Responsible for maintaining service resilience for all cloud hosted services, including backup and disaster recovery processes. Where necessary, plan and conduct quarterly DR tests for all cloud hosted services, ensuring any findings are captured and addressed promptly. What we're looking for: Must have strong technical operational skills in supporting AWS Cloud Hosted environments and at least 3 years in an Infrastructure support role. Strong understanding of Infrastructure as Code technologies, ideally including Terraform and Ansible.
Operational risk and control management processes, including an understanding of Security best practice and how to apply this safely within a Production environment. Asset management and life cycle (EOS/EOL) process management. Planning and leading disaster recovery fail-overs of IT systems and services. Preferably experience of working in a regulated financial services/banking organization. Able to understand and use AWS, including an understanding of AWS services, security and networking. Knowledge of at least 1 programming language, preferably Python. Knowledge of CI/CD specifically relating to Cloud Hosted environments, including an understanding of some of the Infrastructure as Code tools: Git, Terraform, Ansible, Jenkins. Permanent Role - Hybrid working (Central London based) - Candidate must be eligible to work in the UK. By applying to this job you are sending us your CV, which may contain personal information. Please refer to our Privacy Notice to understand how we process this information. In short, in order to supply you with work finding services, we will hold and process your personal data, and only with your express permission will we share this personal data with a client (or a third party working on behalf of the client) by email or by upload to the client's or third party's vendor management system. By giving us permission to send your CV to a client, you give permission to share the personal data that would be necessary to consider your application, interview you (phone/video/face to face) and, if successful, hire you. Scope AT acts as an employment agency for Permanent Recruitment and an employment business for the supply of temporary workers. By applying for this job you accept the Terms and Conditions, Data Protection Policy, Privacy Notice and Disclaimers which can be found at our website.
06/10/2025
Full time
Cloud DevOps Support Engineer. Salary: £45-55k. Hybrid - Cardiff/Bristol. Join an industry-leading MSP and cloud consulting business at an exciting phase of growth. This is a fantastic opportunity to work with some of the top AWS and Azure partner talent in the sector, contributing to the management and evolution of high-scale operational environments. The Cloud DevOps Support Engineer position is predominantly operational (80%), with opportunities for rotation into project delivery and solution development to further enhance technical skills and cloud expertise. You'll play a critical part in optimising and supporting our customers' AWS and Azure environments, leveraging your Infrastructure-as-Code (IaC) proficiency, automation skills, and passion for cloud technology. This role suits a coder by nature who enjoys troubleshooting complex technical problems in cloud-native and hybrid settings, ensuring the highest standards of reliability, efficiency, and innovation. The successful candidate will be directly involved in managing our customer cloud platforms for a diverse enterprise client base, acting both as a trusted technical expert and a collaborative team player. You'll work side-by-side within a cross-functional squad supporting both day-to-day operational excellence and next-gen cloud adoption initiatives. AWS associate level certification is essential, along with a commitment to achieving professional certification; AI/ML experience is advantageous but not mandatory. Key technologies you will need to support include Windows and networking, with a blend of cloud-native PaaS expertise across security, serverless and AI/ML. Now is a great time to join and contribute to our operational maturity journey, benefit from best-in-class mentoring, and accelerate your career as we scale to meet ambitious growth targets.
What you'll be doing: Operational Cloud Support: Provide technical support and troubleshooting of AWS and Azure environments for enterprise customers, including incident management, monitoring, backup, and disaster recovery. Implement and maintain robust monitoring, alerting, and reporting frameworks to ensure SLA adherence and proactive issue detection. Support upgrades, patches, and problem resolution across cloud platforms with an automation-first mindset. Support cost optimisation (FinOps) and security posture improvement across client deployments. Automation, IaC, and CI/CD: Build, optimise, and manage Infrastructure-as-Code (IaC) templates and automation scripts, primarily using Terraform, CloudFormation, ARM/Azure Bicep, and related tools. Develop, maintain, and enhance CI/CD pipelines and GitOps workflows to accelerate cloud deployments and streamline operational changes. Participate in release management, change configuration, and cloud resource life cycle operations. Project Delivery & Skill Development: Rotate into project-based delivery assignments to participate in cloud migration, modernisation, and optimisation engagements, building hands-on expertise and expanding knowledge of new services (including AI/ML/GenAI when relevant). Contribute to knowledge sharing and continually develop skillsets by collaborating with cloud architects, engineers, and product specialists. Collaboration & Continuous Improvement: Work closely with service desk, SREs, developers, and security teams to resolve incidents, enhance reliability, and adopt best operational practices. Document technical solutions, create playbooks, and recommend process improvements to drive efficiency and standardisation. Promote a culture of automation, continuous learning, and operational excellence within the cloud team. What you need to succeed: Solid, hands-on experience supporting, configuring, and troubleshooting AWS and/or Azure environments in large-scale or MSP settings.
Diligent and client-focussed mentality, ensuring customer outcomes are maintained. Expertise in moving Windows Server workloads to AWS Workspaces or Azure AVD/Workspaces is advantageous. Proficiency in Infrastructure-as-Code (Terraform, CloudFormation, or equivalent), with a strong automation and scripting background (Python, PowerShell, or Bash). Direct experience with cloud platform operations, monitoring, and incident response, including root cause analysis and problem management. Demonstrated ability to manage CI/CD tools, source control (Git), and modern DevOps workflows. Enthusiasm for collaborating with diverse technical teams and mentoring less-experienced team members. Strong communication skills, both written and verbal, for engaging with technical peers, customers, and non-technical stakeholders. AWS Associate certification required, with a willingness to achieve AWS Professional (DevOps or Solutions Architect). Azure certification or experience highly valued. Experience or demonstrated interest in supporting AI/ML/GenAI operations is a plus but not essential. At Lucid, we celebrate difference and value diverse perspectives, underpinned by our values of Honesty, Integrity, and Pragmatism. We welcome applications from all suitably qualified or experienced candidates, regardless of personal characteristics. If you have a disability or health condition and seek support throughout the recruitment process, please do not hesitate to contact us.
02/10/2025
Full time