DevOps Engineer

Position Description

At CGI, we help organisations transform through secure, scalable, and innovative technology solutions that deliver measurable impact. As a DevOps & Platform Engineer, you will play a key role in designing and delivering modern cloud platforms that enable high-performing digital services for our clients. You'll work on complex, business-critical programmes where automation, resilience, and continuous improvement are central to success. Joining a collaborative and supportive engineering community, you'll have the opportunity to shape technical outcomes, influence delivery approaches, and grow your expertise across cloud-native technologies while contributing to meaningful client transformation.

CGI was recognised in the Sunday Times Best Places to Work List 2025 and has been named a UK 'Best Employer' by the Financial Times. We offer a competitive salary, an excellent pension, private healthcare, and a share scheme (3.5% + 3.5% matching), which makes you a CGI Partner, not just an employee. We are committed to inclusivity, building a genuinely diverse community of tech talent and inspiring everyone to pursue careers in our sector, including members of our Armed Forces, and we are proud to hold a Gold Award in recognition of our support of the Armed Forces Corporate Covenant. Join us and you'll be part of an open, friendly community of experts. We'll train and support you in taking your career wherever you want it to go.

Due to the secure nature of the programme, you will need to hold UK Security Clearance or be eligible to go through this clearance. This is a hybrid position.

Your future duties and responsibilities

In this role, you will design, build, and support secure cloud-native platforms that enable reliable and scalable digital services for CGI clients.
You'll contribute across the full platform engineering lifecycle, from infrastructure provisioning and CI/CD pipeline development through to operational support, observability, and continuous optimisation. Working within multidisciplinary delivery teams, you'll help drive engineering excellence while taking ownership of technical solutions and contributing to successful project outcomes. You will collaborate closely with architects, engineers, stakeholders, and clients to deliver resilient infrastructure and automation solutions that improve deployment efficiency, operational stability, and service reliability. Alongside hands-on engineering responsibilities, you'll support continuous improvement initiatives, mentor junior colleagues, and contribute to a culture of knowledge sharing, innovation, and high-quality delivery.

Key responsibilities
- Design & Deliver cloud-native infrastructure and platform solutions across AWS, Azure, or GCP
- Build & Maintain CI/CD pipelines to support automated testing, deployment, and release processes
- Develop & Automate Infrastructure as Code using Terraform, CloudFormation, Bicep, or similar tooling
- Manage & Optimise containerised environments using Docker and Kubernetes
- Monitor & Improve platform health through observability, alerting, and operational support practices
- Troubleshoot & Resolve infrastructure, deployment, and platform-related issues independently
- Implement & Support secure IAM controls, governance standards, and compliance requirements
- Collaborate & Contribute to solution design, peer reviews, delivery planning, and technical documentation
- Mentor & Support junior engineers through coaching and knowledge sharing
- Drive & Enhance engineering standards, automation capabilities, and operational excellence initiatives

Required qualifications to be successful in this role

To succeed in this role, you should have strong experience delivering modern DevOps and platform engineering solutions within cloud-native environments.
You'll bring hands-on expertise across cloud infrastructure, automation, CI/CD, Infrastructure as Code, and container orchestration technologies, alongside excellent troubleshooting and stakeholder engagement skills. You should be comfortable working in collaborative delivery teams, supporting operational reliability, and contributing to secure, scalable engineering outcomes within complex environments.

Essential qualifications and experience
- Proven experience delivering DevOps or platform engineering solutions within enterprise environments
- Strong knowledge of AWS, Azure, or Google Cloud Platform services and architecture principles
- Hands-on experience with Infrastructure as Code tools such as Terraform, CloudFormation, or Bicep
- Experience building and maintaining CI/CD pipelines using Azure DevOps, GitHub Actions, GitLab CI, Jenkins, or similar
- Strong understanding of containerisation and orchestration technologies, including Docker and Kubernetes
- Experience with observability and monitoring tooling such as Prometheus, Grafana, CloudWatch, or Azure Monitor
- Proficiency in scripting and automation using Bash, Python, PowerShell, or similar languages
- Ability to troubleshoot infrastructure and platform issues independently
- Strong communication skills, with experience working in collaborative and client-facing environments
- Foundation-level cloud or infrastructure certification such as AWS Cloud Practitioner, AZ-900, LFCS, or KCNA

Desirable experience
- Exposure to Site Reliability Engineering (SRE) practices
- Experience with feature flagging or progressive delivery approaches
- Knowledge of FinOps or cloud cost optimisation practices
- Experience supporting regulated or security-sensitive environments
- Associate-level certifications such as AWS SysOps Administrator, AZ-104, or Terraform Associate

Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging.
Here, you'll reach your full potential because you are invited to be an owner from day 1 as we work together to bring our Dream to life. That's why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company's strategy and direction. Your work creates value. You'll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise. You'll shape your career by joining a company built to grow and last. You'll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons. Come join our team, one of the largest IT and business consulting services firms in the world.
15/06/2026
Full time
DevSecOps Engineer

Position Description

At CGI, we are transforming how organisations deliver secure, scalable digital services, and our DevSecOps Engineers are at the heart of that change. In this role, you will design and operate resilient cloud platforms that power critical systems, embedding security and automation into every stage of the lifecycle. We combine deep engineering expertise with a culture of collaboration and accountability, enabling you to shape DevOps and SRE best practice while driving measurable impact for our clients. Here, your ideas are valued, your ownership makes a difference, and your growth is supported as you help us deliver innovative, high-value solutions across modern cloud environments.

CGI was recognised in the Sunday Times Best Places to Work List 2025 and has been named a UK 'Best Employer' by the Financial Times. We offer a competitive salary, an excellent pension, private healthcare, and a share scheme (3.5% + 3.5% matching), which makes you a CGI Partner, not just an employee. We are committed to inclusivity, building a genuinely diverse community of tech talent and inspiring everyone to pursue careers in our sector, including members of our Armed Forces, and we are proud to hold a Gold Award in recognition of our support of the Armed Forces Corporate Covenant. Join us and you'll be part of an open, friendly community of experts. We'll train and support you in taking your career wherever you want it to go.

Due to the secure nature of the programme, you will need to hold UK Security Clearance or be eligible to go through this clearance. This is a hybrid position.

Your future duties and responsibilities

In this role, you will design, build, secure and operate scalable cloud-native platforms across AWS, Azure, GCP or Oracle Cloud. You will take ownership of CI/CD pipelines, infrastructure as code and container platforms, embedding security and resilience from the outset.
Working closely with architects, developers and operations teams, you will champion DevSecOps and SRE principles, improving reliability, performance and delivery speed. You will also play a key role in shaping engineering standards, mentoring colleagues and driving continuous improvement. With the backing of a collaborative, expert community, you will innovate, automate and optimise platforms that underpin critical services, ensuring measurable outcomes for our clients. You will contribute to the development of CGI's cloud security engineering accelerators and building blocks.

Key responsibilities:

Required qualifications to be successful in this role

To succeed, you will bring hands-on experience in DevSecOps or SRE roles within modern cloud environments, alongside strong scripting and automation expertise. You will be comfortable working across infrastructure, platforms and pipelines, with a proactive approach to security, reliability and continuous improvement.

Essential qualifications:

Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you'll reach your full potential because you are invited to be an owner from day 1 as we work together to bring our Dream to life. That's why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company's strategy and direction. Your work creates value. You'll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise. You'll shape your career by joining a company built to grow and last. You'll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons.
Come join our team, one of the largest IT and business consulting services firms in the world.
15/06/2026
Full time
Site Reliability Engineer

Position Description

We are seeking an experienced and motivated Site Reliability Engineer (SRE) to join a high-performing team supporting multiple data product and platform groups. This role is focused on improving the reliability, scalability, observability, deployment, and operational support of critical data-driven platforms and services operating within complex production environments. The successful candidate will work closely with engineering, platform, and operational support teams to strengthen monitoring and alerting capabilities, improve logging and traceability, troubleshoot incidents, support deployments, and automate operational processes wherever possible. The environment includes Kubernetes, Helm, the ELK stack, and a broad range of modern Site Reliability Engineering and cloud platform practices.

This is a hands-on technical role suited to someone who thrives in fast-paced operational environments, enjoys solving complex production issues, and is passionate about automation, platform reliability, and continuous improvement. The role requires strong collaboration with both client stakeholders and engineering teams to ensure operational excellence, platform resilience, and service availability across critical systems.

Your future duties and responsibilities
- Support, maintain, and improve highly available production platforms and services across cloud and containerised environments.
- Manage and support Kubernetes clusters and Helm-based deployments across multiple environments.
- Enhance monitoring, alerting, logging, and observability solutions to improve operational visibility and system reliability.
- Investigate incidents, analyse logs, identify root causes, and drive timely resolution of production issues.
- Participate in incident response, post-incident reviews, and continuous service improvement activities.
- Automate operational tasks and repetitive support activities to reduce manual effort and improve platform efficiency.
- Collaborate with engineering and data platform teams to improve scalability, resilience, deployment reliability, and operational maturity.
- Develop and maintain operational documentation, support procedures, runbooks, and troubleshooting guides.
- Contribute to reliability engineering initiatives, including proactive monitoring, service health management, operational readiness, and platform optimisation.
- Support deployment activities, release processes, and production change management.

Required qualifications to be successful in this role
- Strong commercial experience in Site Reliability Engineering, DevOps, Platform Engineering, or Production Support environments.
- Strong hands-on experience with Kubernetes and Helm within enterprise or production environments.
- Proven experience supporting mission-critical production platforms and operational support environments.
- Strong experience with the ELK stack (Elasticsearch, Logstash, Kibana) for logging, monitoring, troubleshooting, and operational analysis.
- Demonstrated capability in log analysis, incident investigation, troubleshooting, and root cause analysis.
- Strong understanding of, and practical experience with, core SRE practices.
- Experience working with data platforms, analytics platforms, or data product teams would be highly advantageous.
- Experience with scripting and automation technologies such as Bash, Python, or similar would be beneficial.
- Exposure to CI/CD pipelines, Infrastructure as Code, cloud-native platforms, or observability tooling would be desirable.
- Strong communication, stakeholder engagement, and collaboration skills.
- Ability to work effectively within fast-paced operational support environments while managing competing priorities and deadlines.
Security Clearance
- Candidates must be willing and able to work onsite at the client location five days per week.
- Candidates must already hold current HLC clearance (mandatory requirement).
- Previous experience working within secure, government, defence, or highly regulated environments will be highly regarded.
- Due to client security requirements, only candidates meeting the required clearance criteria will be considered.

Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you'll reach your full potential because you are invited to be an owner from day 1 as we work together to bring our Dream to life. That's why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company's strategy and direction. Your work creates value. You'll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise. You'll shape your career by joining a company built to grow and last. You'll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons. Come join our team, one of the largest IT and business consulting services firms in the world.
15/06/2026
Full time
Site Reliability Engineer

Position Description

We are seeking an experienced and proactive Site Reliability Engineer (SRE) to join a team supporting multiple data product and platform groups. This role is focused on improving the reliability, scalability, observability, and operational performance of critical data-driven platforms and services across complex production environments. The successful candidate will work closely with engineering, platform, and support teams to strengthen monitoring and alerting capabilities, improve logging and traceability, troubleshoot production incidents, support deployments, and automate operational processes wherever possible. The environment includes Kubernetes, Helm, the ELK stack, and a strong focus on modern Site Reliability Engineering practices across cloud and platform services.

This is a hands-on technical role suited to someone who thrives in fast-paced operational environments and is passionate about reliability engineering, automation, and continuous improvement. The role requires strong collaboration with both client stakeholders and engineering teams to ensure platform stability, operational excellence, and high service availability.

Your future duties and responsibilities
- Support, maintain, and improve highly available production platforms and services across cloud and containerised environments.
- Manage and support Kubernetes clusters and Helm-based deployments across multiple environments.
- Implement and enhance monitoring, alerting, logging, and observability solutions to improve platform reliability and operational visibility.
- Investigate incidents, analyse logs, identify root causes, and drive timely resolution of production issues.
- Participate in incident response, post-incident reviews, and continuous operational improvement initiatives.
- Automate operational tasks and repetitive support activities to reduce manual effort and improve platform efficiency.
- Work closely with engineering and data platform teams to improve system resilience, scalability, deployment reliability, and operational maturity.
- Develop and maintain operational documentation, support procedures, runbooks, and troubleshooting guides.
- Contribute to reliability engineering practices including proactive monitoring, service health management, and operational readiness.
- Support deployment activities, release processes, and production change management.

Required qualifications to be successful in this role
- Strong commercial experience in Site Reliability Engineering, Platform Engineering, DevOps, or Production Support environments.
- Strong hands-on experience with Kubernetes and Helm in enterprise or production environments.
- Proven experience supporting mission-critical production platforms and operational support functions.
- Strong hands-on experience with the ELK stack (Elasticsearch, Logstash, Kibana) for logging, monitoring, troubleshooting, and operational analysis.
- Demonstrated capability in log analysis, incident investigation, troubleshooting, and root cause analysis.
- Strong understanding of, and practical experience with, core SRE practices including:
  - Monitoring and alerting
  - Incident management and response
  - Root cause analysis and post-incident reviews
  - Automation and operational improvement
  - Production support and reliability engineering
- Experience working with data platforms, analytics platforms, or data product teams would be highly advantageous.
- Experience with scripting and automation tools such as Bash, Python, or similar technologies is desirable.
- Exposure to CI/CD pipelines, Infrastructure as Code, and cloud-native environments would be beneficial.
- Strong communication, stakeholder engagement, and collaboration skills.
- Ability to work effectively in fast-paced support environments and manage competing priorities under pressure.
Security Clearance
- Resource must be willing and able to work onsite at the client location five days per week.
- Candidate must already hold current HLC clearance (mandatory requirement).
- Previous experience working within secure, government, defence, or highly regulated environments will be highly regarded.
- Due to client security requirements, only candidates meeting the required clearance criteria will be considered.

Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you'll reach your full potential because you are invited to be an owner from day 1 as we work together to bring our Dream to life. That's why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company's strategy and direction. Your work creates value. You'll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise. You'll shape your career by joining a company built to grow and last. You'll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons. Come join our team, one of the largest IT and business consulting services firms in the world.
15/06/2026
Full time
ABOUT IDBS
IDBS helps BioPharma organisations unlock the potential of AI/ML to improve the lives of patients. As a trusted long-term partner to 80% of the top 20 global BioPharma companies¹, IDBS delivers powerful cloud software and services specifically designed to meet the evolving needs of the BioPharma sector. IDBS, a Danaher company, leverages 35 years of scientific informatics expertise to help organisations design, execute and orchestrate processes; manage, contextualise and structure data; and gain valuable insights throughout the product lifecycle, from R&D through manufacturing. Known for its signature IDBS E-WorkBook software, IDBS has extended its flexible, scalable solutions to the IDBS Polar and PIMS cloud platforms to help scientists make smarter decisions with assured confidence in both GxP and non-GxP environments.

Do you want to join a DevOps team that keeps our cloud platform running at its best? We manage a high-availability Linux environment in AWS and empower development teams with the tools, automation, and CI/CD pipelines they rely on every day. Our DevOps team helps engineers build reliable software fast by providing shared infrastructure and strong SRE practices that ensure consistency, efficiency, and scalability. As a Principal DevOps Engineer, you'll lead the design and operation of our AWS/Linux platform, improve its reliability and scalability, and drive faster, safer deployments. You'll shape the technical roadmap, elevate DevOps standards, and deliver measurable impact, from reducing MTTR to increasing deployment frequency and improving consistency across environments.

What you'll be doing
- Lead the technical roadmap by partnering with Architecture and feature teams, influencing key design decisions and driving modern tooling and best practices.
- Optimise and operate scalable AWS infrastructure, ensuring strong performance, resilience, and cost efficiency through effective AWS Cost Management and FinOps practices.
- Drive DevOps excellence by building and evolving CI/CD pipelines, automation, and Infrastructure as Code to enable fast, safe, and reliable software delivery.
- Champion modern engineering practices by evaluating and introducing new technologies, running proofs of concept, and uplifting standards across the organisation.
- Mentor and support engineering teams while ensuring platform reliability and compliance, and participate in on-call/out-of-hours support when needed.

Here is what success in this role looks like
- Confidently operate and improve AWS/Linux platforms, delivering secure, scalable, and highly available systems while guiding teams on best practices and architectural decisions.
- Drive cloud cost efficiency through strong FinOps practices, reducing waste and optimising workloads with rightsizing, storage lifecycle policies, and AWS Savings Plans or Reserved Instances.
- Deliver high-quality IaC using Terraform (and/or Puppet for configuration management), creating modular, reusable, and policy-compliant code that enables consistent, multi-environment provisioning.
- Design and run production-grade Kubernetes environments, covering Helm, RBAC, networking/ingress, secrets management, scaling strategies, and deployment patterns, to deliver reliable, scalable, cloud-native infrastructure.
- Build and maintain robust CI/CD workflows using Git and/or Jenkins, including automated testing, security scanning, and environment promotion, resulting in improved deployment frequency, lower change failure rates, and similar delivery metrics.
- Automate operational and developer workflows using Bash, Python, or Java, eliminating manual effort and enabling self-service capabilities that empower engineering teams.
- Elevate engineering capability by mentoring others, improving documentation, and influencing platform-wide standards, ensuring best practices are embedded across teams and reflected in everyday development and operations.
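The delivery metrics this role is measured against (MTTR, deployment frequency, and change failure rate) can be computed from simple deployment records. The Python sketch below is illustrative only: the `Deployment` record and `delivery_metrics` function are hypothetical names, not IDBS tooling, and MTTR is simplified to the time from a failed deployment to restoration of service rather than from incident detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record used only for this illustration."""
    deployed_at: datetime
    failed: bool = False                    # did this change cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise deployment frequency, change failure rate and a simplified MTTR."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    if not deploys:
        return {"deploys_per_day": 0.0, "change_failure_rate": 0.0, "mttr_hours": 0.0}

    failures = [d for d in deploys if d.failed]
    # Only failures with a recorded restoration time contribute to MTTR.
    recovered = [d for d in failures if d.restored_at is not None]
    total_restore = sum((d.restored_at - d.deployed_at for d in recovered), timedelta())
    mttr_hours = (
        (total_restore / len(recovered)).total_seconds() / 3600 if recovered else 0.0
    )
    return {
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_hours": mttr_hours,
    }
```

For example, four deployments in a seven-day window, one of which failed and was restored two hours later, give a change failure rate of 0.25 and an MTTR of 2.0 hours.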
It would be a plus if you also possess:
- Hands-on experience with enterprise and cloud-native databases such as Oracle, MySQL, MongoDB, or PostgreSQL, supporting deeper insight into platform behaviour and performance.
- A strong passion for open-source technologies, with the ability to evaluate, advocate, and introduce community-driven tools that strengthen engineering capabilities.
- Relevant AWS certifications (e.g., AWS Solutions Architect - Associate/Professional or DevOps Engineer - Professional), demonstrating advanced cloud proficiency and architectural depth.

Flexible Work Arrangement
At IDBS, we believe in designing a better, more sustainable workforce. We recognise the benefits of flexible working arrangements for eligible roles and are committed to providing enriching careers, no matter the work arrangement. This position is eligible for a flexible work arrangement in which you can work part-time at the Company location identified above and part-time remotely from your home. Additional information about this work arrangement will be provided by your interview team. Explore the flexibility and challenge that working for IDBS can provide.

Additional Information
Join our winning team today. Together, we'll accelerate the real-life impact of tomorrow's science and technology. We partner with customers across the globe to help them solve their most complex challenges, architecting solutions that bring the power of science to life.
15/06/2026
Full time
Your Opportunity
As a Senior Engineer within Platform Engineering, you will lead the design, build, and evolution of our Internal Developer Platform (IDP), enabling consistent, secure, and scalable software delivery across the enterprise. This role combines DevOps engineering, platform architecture, and developer experience enablement, with a strong focus on CI/CD transformation (Azure DevOps to GitHub), platform tooling, and data platform integration (Snowflake, Databricks). You will act as a subject matter expert (SME) across DevOps tooling, automation, and platform reliability, driving best practices, standardisation, and self-service capabilities for engineering teams.

Responsibilities
- Design, build, and evolve enterprise platform services to support the IDP and enable scalable, secure, and self-service engineering environments.
- Lead DevOps transformation initiatives, including migration from Azure DevOps to GitHub, and implement standardised CI/CD pipelines, reusable workflows, and release automation frameworks.
- Develop and maintain Infrastructure as Code (IaC) solutions using Terraform, Bicep, or similar tools to provision and manage cloud infrastructure.
- Deliver and optimise cloud-native platforms on Azure (primary), ensuring scalability, resilience, and cost efficiency.
- Act as SME across DevOps tooling, including GitHub (Actions, Advanced Security), Nexus (artifact management), and Veracode (application security), embedding security controls into pipelines and platform services.
- Enable and support DevOps practices for core data platforms, including Snowflake and Databricks, covering environment provisioning, CI/CD integration, and access control models.
- Implement observability frameworks, including monitoring, logging, and alerting, and contribute to SRE practices such as SLIs/SLOs, reliability engineering, and incident management.
- Embed security and compliance standards into all platform components, ensuring auditability, policy enforcement, and alignment with enterprise governance requirements.
- Drive developer experience improvements through platform standardisation, self-service tooling, templates, and AI-enabled capabilities (e.g. Copilot, intelligent automation).
- Collaborate with Architecture, Cloud COE, SRE, and engineering teams to deliver consistent and governed platform capabilities across the organisation.
- Mentor junior engineers and contribute to technical leadership, standards definition, and engineering best practices.

Benefits
- Hybrid working and reasonable accommodations
- Generous holiday policies
- Excellent health and wellbeing benefits, including corporate membership to Wellhub
- Paid volunteer time
- Professional development support (courses, tuition/qualification reimbursement)
- Maternal/paternal leave benefits and family services
- All-employee events, including networking opportunities and social activities
- Lunch allowance for use within our subsidised onsite canteen
- Annual bonus opportunity: the position may be eligible to receive an annual discretionary bonus award from the profit pool
- Competitive compensation, pension/retirement plans, and various health, wellbeing, and lifestyle benefits
Qualifications
- Bachelor's or master's degree in computer science, engineering, or a related field
- 6+ years of experience in platform engineering, DevOps, or infrastructure roles
- Strong experience with cloud platforms (Azure preferred)
- Proficiency in containerisation (Docker, Kubernetes)
- Hands-on experience with CI/CD tools (GitHub, Azure DevOps, GitLab CI)
- Experience with IaC tools (Terraform, Pulumi, Ansible)
- Proven expertise in CI/CD pipeline design, automation, and standardisation using GitHub (Actions, Advanced Security) and Azure DevOps, including migration from Azure DevOps to GitHub
- Deep hands-on experience with Infrastructure as Code (Terraform, Bicep, or equivalent) and automated cloud provisioning
- Strong knowledge of the Azure cloud platform, including compute, networking, identity, and security services
- Experience implementing DevSecOps practices, including integration of SAST/DAST tools (e.g. Veracode), secrets management, and secure pipeline execution
- Expertise in artifact management (Nexus) and modern DevOps tooling ecosystems
- Experience enabling Internal Developer Platform (IDP) capabilities, including self-service provisioning, reusable templates, and platform standardisation
- Solid understanding of the software development lifecycle (SDLC), release engineering, and environment lifecycle management
- Experience working with data platforms (Snowflake and/or Databricks), including CI/CD integration, environment provisioning, and access control models
- Strong scripting/programming skills (Python, PowerShell, Bash) and an automation mindset
- Good understanding of security, networking, RBAC, and Zero Trust principles in cloud and DevOps environments
- Experience operating in regulated, enterprise-scale environments with a strong focus on governance, auditability, and compliance
- Strong communication, collaboration, and stakeholder management skills, with the ability to act as a hands-on SME and technical leader

Nice-to-have skills
- Certifications in cloud technologies or Kubernetes
- Experience building or contributing to an Internal Developer Platform (IDP)
- Familiarity with service mesh, API gateways, and platform observability tools
- Knowledge of FinOps, cost optimisation, and cloud governance
- Solid programming skills (Python, Go, or Java)
- Strong understanding of networking, security, and system architecture
- Exposure to AI-enabled development (e.g. GitHub Copilot, automation workflows)
- Relevant cloud or Kubernetes certifications

Potential for Growth
- Mentoring
- Leadership development programs

Compliance and Policies
Applicants should be willing to adhere to the provisions of our Investment Advisory Code of Ethics related to personal securities activities and other disclosure and certification requirements, including past political contributions and political activities. Applicants' past political contributions or activity may impact eligibility for this position. You will be expected to understand the regulatory obligations of the firm and abide by the regulated entity requirements and Janus Henderson Investors policies applicable to your role.

Equal Opportunity Employer
Janus Henderson Investors is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. All applications are subject to background checks.
14/06/2026
Full time
Career Choices Dewis Gyrfa Ltd
Manchester, Lancashire
Join us as a Senior Site Reliability Engineer
In this key role, you'll improve and drive the availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services.
You'll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way.
This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development.
You'll need the flexibility to support the team by working shifts and weekends on rotation.
What you'll do
As a Senior Site Reliability Engineer, you'll act as a hands-on expert responsible for ensuring the reliability, availability, and performance of critical production platforms. You'll lead the adoption of Site Reliability Engineering (SRE) practices, embedding resilience, observability, and operational excellence into distributed systems running on AWS and Kubernetes. You'll also take ownership of 24/7 production support models, ensuring systems are highly available and that incidents are effectively managed and learned from.
We'll also expect you to design and operate highly resilient AWS-based Kubernetes platforms (EKS) aligned with enterprise standards, while owning and continuously improving production reliability, availability, and Service Level Agreement and Service Level Objective (SLA/SLO) frameworks. You'll lead incident management, escalation, and 24/7 on-call practices, including post-incident reviews, and embed SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams. Furthermore, you'll implement infrastructure and platform automation using Terraform and GitOps methodologies, and drive self-healing, autoscaling, and failure recovery mechanisms using tools such as Karpenter.
In addition to this, you'll be:
Building secure and scalable networking and service communication, using technologies such as Cilium
Defining and operating observability platforms using Grafana, Prometheus, Loki, and Tempo
Partnering with DevOps and engineering teams to ensure production readiness and operational excellence
Leading complex troubleshooting across distributed systems and cloud-native environments
Developing reusable "golden paths", operational runbooks, and reliability patterns
Ensuring platforms meet regulatory, security, and operational risk requirements
Using data, Service Level Indicators (SLIs), and metrics to drive continuous improvement and proactive reliability enhancements
The skills you'll need
We're looking for a highly experienced Site Reliability Engineer with a strong background in operating large-scale, business-critical platforms and a passion for reliability engineering. You must also have deep expertise in managing production systems on AWS and Kubernetes (EKS), along with strong experience in 24/7 support models, incident management, and on-call leadership.
Moreover, you'll need to demonstrate advanced knowledge of SRE principles such as SLIs, SLOs, error budgets, and toil reduction, as well as proficiency in Terraform, GitOps, and cloud automation practices. Hands-on experience with GitLab continuous integration and continuous delivery (CI/CD) pipelines and Argo CD is also essential.
In addition, you'll need to bring:
A strong understanding of Kubernetes networking, security, and service mesh technologies, ideally Cilium
Experience scaling infrastructure using Karpenter and autoscaling strategies
Expertise in observability tooling, including Grafana, Prometheus, Loki, and Tempo
A proven ability to troubleshoot and resolve complex, cross-system production issues
Experience operating in regulated or high-security environments
Strong leadership, mentoring, and stakeholder engagement capabilities
The ability to balance reliability, risk, and delivery in a fast-paced environment
14/06/2026
Full time
Join the Growth Story at Envitia
Envitia operates at the heart of the UK's most sensitive and high-profile operations. Our approach is entirely mission-focused, enabling customers to see more, decide faster, and act with confidence across complex domains.
We deliver situational awareness and information advantage across C5ISR, fusing cyber, geospatial, and kinetic data into an operational picture. This transforms fragmented data into a shared understanding of the battlespace, delivering actionable insight when it matters most.
Our solutions are secure-by-design, built from the ground up to protect sensitive information and ensure resilience in contested environments. We simplify complex, heterogeneous data sources into a rich picture of fused intelligence, providing a digital backbone that underpins success.
We operate as small teams delivering fast value, blending deep technical expertise with agile methods to deliver measurable outcomes. Every engagement focuses on capability, clarity, and confidence for those who protect the nation. A career with Envitia lets you work on mission-driven innovation.
We're thrilled to announce that Envitia has been named one of the Sunday Times Top 100 Medium Sized Companies to Work For in 2025, a prestigious recognition that highlights our commitment to creating an outstanding workplace where innovation, collaboration, and personal growth thrive.
The Role
As a DevOps / Platform Engineer, you will design, build and operate secure cloud and platform capabilities within classified government environments. Working across AWS and cloud-native technologies, you will play a key role in delivering reliable, scalable and secure platforms that underpin mission-critical digital services. You'll operate within multidisciplinary Agile teams, collaborating closely with software engineers, test engineers, architects, SRE specialists and mission stakeholders to ensure robust, production-grade delivery.
This role suits someone who thrives in complex, secure environments and enjoys working across infrastructure automation, CI/CD and live service support, with a strong focus on reliability and continuous improvement.
Typical responsibilities include:
Building and maintaining secure DevSecOps pipelines to support continuous delivery
Developing Infrastructure as Code and automated platform capabilities
Supporting AWS-based environments, including Kubernetes, OpenShift, EKS and ECS
Implementing observability, monitoring, logging and alerting for live services
Supporting SRE practices, cloud migration activities and production platform operations
Job Responsibilities
Design, implement and maintain secure CI/CD pipelines to support efficient, automated software delivery
Develop and manage cloud-native infrastructure using AWS and Infrastructure as Code (e.g. Terraform, CloudFormation, Ansible)
Support containerised platforms and environments, including Kubernetes, OpenShift, EKS and ECS
Embed security controls, quality gates and compliance checks into DevSecOps workflows
Implement observability, monitoring, logging and alerting to support reliable live services
Contribute to SRE practices, improving service reliability, performance and operational resilience
Automate build, deployment and platform processes to reduce manual overhead and improve release confidence
Collaborate within Agile teams, supporting delivery, troubleshooting and continuous improvement
Provide technical guidance on platform engineering, cloud and deployment best practices
Support production environments, including incident resolution, environment management and release readiness
Skills Required
Experience working within UK Security & Intelligence, defence or other secure government environments
Strong knowledge of cloud, platform engineering and DevSecOps practices, with a focus on secure, automated delivery
Experience with AWS and container platforms such as Kubernetes, OpenShift, EKS or ECS
Proficiency in Infrastructure as Code and automation tooling (e.g. Terraform, CloudFormation, Ansible or similar)
Experience building and maintaining CI/CD pipelines, including automated build, test, deployment and release processes
Familiarity with security integration, quality gates and compliance controls within delivery pipelines
Experience implementing observability practices, including monitoring, logging, alerting and service metrics
Understanding of SRE principles, including reliability engineering, incident management and continuous improvement
Awareness of secure deployment, compliance requirements and operational security best practices
Security Clearance Requirements
The successful candidate must hold a current high-level security clearance.
Location
The role is expected to require up to 80% (4 days a week) onsite working at client locations in Cheltenham, London or Manchester, depending on project requirements.
What it's like to work at Envitia
At Envitia, we believe that our greatest asset is our talented and dedicated team. We are committed to fostering a work environment where every employee feels valued, supported, and motivated to excel. As part of this commitment, we offer a comprehensive range of benefits designed to enhance both your professional and personal well-being.
Annual Leave: 25 days plus your birthday off, with the option to buy and sell 5 days of holiday to suit your needs.
Private Healthcare Coverage: Our health plan is tailored to meet the diverse needs of our employees, with additional levels for family if required.
Training & Skills Development: Stay ahead in your career with ongoing training opportunities and skill development initiatives tailored to your evolving needs.
Fitness Reimbursement: We encourage an active lifestyle. Our fitness reimbursement program helps you stay fit by covering a portion of your gym membership or fitness-related expenses.
Life Assurance: Gain peace of mind with extensive life insurance coverage that provides financial protection for you and your loved ones.
Pension Contribution: Plan for your future with our pension options. We provide resources and support to help you build a secure financial foundation.
Perkbox Subscription: Enjoy exclusive discounts on a variety of products and services. From technology to entertainment, we've partnered with various businesses to bring you special perks.
Internal Reward Schemes: Be rewarded for your exceptional contributions through our employee recognition initiatives that celebrate your achievements.
Community Engagement & Volunteer Opportunities: Contribute to meaningful causes through company-sponsored volunteer programs, fostering a sense of community and social responsibility.
Inclusion at Envitia
At Envitia, we celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants from all backgrounds and walks of life. We believe that our strength lies in our differences, and we are dedicated to fostering a workplace where everyone feels valued, respected, and empowered. We encourage applications from people of all abilities, ages, genders, sexual orientations, races, ethnicities, and religions. We strive to support a culture of inclusion, accessibility, and work-life balance. If you require any accommodations during the application or interview process, please let us know.
13/06/2026
Full time
13/06/2026
Full time
Join the Growth Story at Envitia Envitia operates at the heart of the UK's most sensitive and high-profile operations. Our approach is entirely mission-focused, enabling customers to see more, decide faster, and act with confidence across complex domains. We deliver situational awareness and information advantage across C5ISR, fusing cyber, geospatial, and kinetic data into an operational picture. This transforms fragmented data into a shared understanding of the battlespace, delivering actionable insight when it matters most. Our solutions are secure-by-design, built from the ground up to protect sensitive information and ensure resilience in contested environments. We simplify complex heterogeneous data sources into a rich picture of fused intelligence, providing a digital backbone that underpins success. We operate as small teams delivering fast value, blending deep technical expertise with agile methods to deliver measurable outcomes. Every engagement focuses on capability, clarity, and confidence for those who protect the nation. A career with Envitia enables you to work with mission driven innovation. We're thrilled to announce that Envitia has been named one of the Sunday Times Top 100 Medium Sized Companies to Work For in 2025-a prestigious recognition that highlights our commitment to creating an outstanding workplace where innovation, collaboration, and personal growth thrive. The Role As a DevOps / Platform Engineer, you will design, build and operate secure cloud and platform capabilities within classified government environments. Working across AWS and cloud native technologies, you will play a key role in delivering reliable, scalable and secure platforms that underpin mission critical digital services. You'll operate within multi disciplinary Agile teams, collaborating closely with software engineers, test engineers, architects, SRE specialists and mission stakeholders to ensure robust, production grade delivery. 
This role suits someone who thrives in complex, secure environments and enjoys working across infrastructure automation, CI/CD and live service support, with a strong focus on reliability and continuous improvement.
Typical responsibilities include:
Building and maintaining secure DevSecOps pipelines to support continuous delivery
Developing Infrastructure as Code and automated platform capabilities
Supporting AWS-based environments, including Kubernetes, OpenShift, EKS and ECS
Implementing observability, monitoring, logging and alerting for live services
Supporting SRE practices, cloud migration activities and production platform operations
Job Responsibilities
Design, implement and maintain secure CI/CD pipelines to support efficient, automated software delivery
Develop and manage cloud-native infrastructure using AWS and Infrastructure as Code (e.g. Terraform, CloudFormation, Ansible)
Support containerised platforms and environments, including Kubernetes, OpenShift, EKS and ECS
Embed security controls, quality gates and compliance checks into DevSecOps workflows
Implement observability, monitoring, logging and alerting to support reliable live services
Contribute to SRE practices, improving service reliability, performance and operational resilience
Automate build, deployment and platform processes to reduce manual overhead and improve release confidence
Collaborate within Agile teams, supporting delivery, troubleshooting and continuous improvement
Provide technical guidance on platform engineering, cloud and deployment best practices
Support production environments, including incident resolution, environment management and release readiness
Skills Required
Experience working within UK Security & Intelligence, defence or other secure government environments
Strong knowledge of cloud, platform engineering and DevSecOps practices, with a focus on secure, automated delivery
Experience with AWS and container platforms such as Kubernetes, OpenShift, EKS or ECS
Proficiency in Infrastructure as Code and automation tooling (e.g. Terraform, CloudFormation, Ansible or similar)
Experience building and maintaining CI/CD pipelines, including automated build, test, deployment and release processes
Familiarity with security integration, quality gates and compliance controls within delivery pipelines
Experience implementing observability practices, including monitoring, logging, alerting and service metrics
Understanding of SRE principles, including reliability engineering, incident management and continuous improvement
Awareness of secure deployment, compliance requirements and operational security best practices
Security Clearance Requirements
The successful candidate must hold a current high-level security clearance.
Location
The role is expected to require up to 80% (four days a week) on-site working at client locations in Cheltenham, London or Manchester, depending on project requirements.
What it's like to work at Envitia
At Envitia, we believe that our greatest asset is our talented and dedicated team. We are committed to fostering a work environment where every employee feels valued, supported, and motivated to excel. As part of this commitment, we offer a comprehensive range of benefits designed to enhance both your professional and personal well-being.
Annual Leave: 25 days plus your birthday off, with the option to buy or sell 5 days of holiday to suit your needs.
Private Healthcare Coverage: Our health plan is tailored to meet the diverse needs of our employees, with additional levels of cover for family members if required.
Training & Skills Development: Stay ahead in your career with ongoing training opportunities and skill development initiatives tailored to your evolving needs.
Fitness Reimbursement: We encourage an active lifestyle. Our fitness reimbursement program helps you stay fit by covering a portion of your gym membership or fitness-related expenses.
Life Assurance: Gain peace of mind with extensive life insurance coverage that ensures financial protection for you and your loved ones.
Pension Contribution: Plan for your future with our pension options. We provide resources and support to help you build a secure financial foundation.
Perkbox Subscription: Enjoy exclusive discounts on a variety of products and services. From technology to entertainment, we've partnered with various businesses to bring you special perks.
Internal Reward Schemes: Be rewarded for your exceptional contributions through our employee recognition initiatives that celebrate your achievements.
Community Engagement & Volunteer Opportunities: Contribute to meaningful causes through company-sponsored volunteer programs, fostering a sense of community and social responsibility.
Inclusion at Envitia
At Envitia, we celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants from all backgrounds and walks of life. We believe that our strength lies in our differences, and we are dedicated to fostering a workplace where everyone feels valued, respected, and empowered. We encourage applications from people of all abilities, ages, genders, sexual orientations, races, ethnicities, and religions. We strive to support a culture of inclusion, accessibility, and work-life balance. If you require any accommodations during the application or interview process, please let us know.
Role - Senior Enterprise Architect
Technology - Enterprise Architect/Segment lead
Location - UK/Europe
Business Unit - STG
Compensation - Competitive (including bonus)
Job Description
AI-First Solutioning, Human + Agent Ways of Working & Large-Scale Modernisation
Your role
This is a senior strategic role within the Enterprise Strategic Architecture practice, focused on defining and delivering next-generation digital transformation programmes for leading global organisations. The successful candidate will bring together deep technology expertise and strong business acumen to help clients navigate complex, large-scale modernisation initiatives. As AI becomes central to how enterprises transform, this role is expanding in scope: the architect must be equally comfortable designing cloud-native platforms, structuring collaborative human and agent workflows, and embedding AI-driven capabilities as first-class components of the overall solution.
You will collaborate closely with sales and delivery teams across the full programme lifecycle - from shaping solutions during presales through to governing technical quality in delivery. You will engage with CDOs, CTOs, and senior digital leaders at client organisations, contribute to industry thinking through published viewpoints and speaking engagements, and play an active role in identifying emerging technology opportunities that can be developed into compelling propositions for the market.
Responsibilities
Strategic Thinking - Can articulate where AI agents replace human tasks vs. augment them in a $10M+ transformation context. Can draw a human + agent operating model for a business process, showing handoff logic, oversight points, and accountability chains. Understands that LLM inference is now a line item in programme budgets and can estimate it at rough order of magnitude (ROM) level for a given use-case volume.
Design Depth - Has personally designed or reviewed an agentic system in production, e.g. a multi-step reasoning pipeline, an autonomous code review agent, or a RAG-powered enterprise knowledge layer. Can explain prompt architecture decisions (system prompt structuring, context compression strategies, few-shot vs. zero-shot trade-offs) and how these affect both quality and cost. Understands model selection trade-offs: when to use frontier models vs. fine-tuned smaller models vs. cached completions.
Token Optimisation Fluency - Has operationalised token efficiency at scale: structured prompt libraries, semantic caching, chunk sizing for RAG pipelines, output length controls, and batching strategies. Can model cost per transaction for an AI-enabled workflow and present it as part of a business case. Understands how token spend interacts with context window limits across model families (GPT-4o, Claude, Gemini) and can make architecture trade-offs accordingly.
Must Have Skills
Agentic architecture design - Multi-agent orchestration, tool-use design, human-in-the-loop checkpoints, agent failure modes and recovery
Human + agent workflow design - Task decomposition across human and AI agents; escalation paths; accountability mapping in regulated environments
Expertise in leveraging coding agents (GitHub Copilot, Claude, Devin.ai and similar) to accelerate software delivery within a structured, governed engineering lifecycle
Full-stack application development - Architecture and delivery of modern full-stack applications; proficiency across frontend frameworks, API layers, backend services, and data tiers at enterprise scale
Modern CI/CD & delivery pipelines - Design and governance of automated delivery pipelines using tools such as Harness, GitHub Actions, ArgoCD and Tekton; trunk-based development, progressive delivery and release automation
High scalability integration - Architecting event-driven and streaming integration at scale using Apache Kafka and Kafka Streams; asynchronous messaging patterns, schema registries, and real-time data pipelines across distributed systems
NoSQL & enterprise data platforms - Design of polyglot persistence architectures spanning NoSQL stores (MongoDB, Cassandra, DynamoDB), enterprise caching layers (Redis, Hazelcast, Memcached) and search platforms (Elasticsearch, OpenSearch)
Hyperscaler resilience patterns - Building highly available, fault-tolerant solutions on AWS, Azure and GCP: multi-region active/active, chaos engineering, SRE practices, availability zone failover, and disaster recovery at cloud scale
Token economics & LLM costing - Prompt compression, context window sizing, model tier selection, cost-per-transaction modelling at enterprise scale
AI TCO & commercial modelling - Inference cost projections, build vs. buy for foundation models, ROI framing for AI-augmented delivery
Digital transformation leadership - AI-native programme design spanning cloud, integration, agentic capability layers and responsible AI governance
Enterprise integration patterns - Streaming, API, event-driven and real-time patterns, extending to RAG, vector stores, embedding services and LLM APIs as first-class integration nodes
Chief Architect leadership - Governing cross-domain architect teams while managing AI risk, hallucination mitigation and responsible AI policy at programme level
Multi-cloud architecture (15+ yrs) - Hybrid IaaS/PaaS, multi-AZ/region, IaC automation-first, DevSecOps, K8s orchestration
Influencing & stakeholder leadership - Builds and sustains networks across organisational boundaries through credibility and influence rather than authority; aligns diverse stakeholders (engineering, business, and executive) around a shared technology direction and drives teams to deliver outcomes in complex, matrixed environments
CXO communication - Articulates at the right level of abstraction and detail, from developer to board level
Preferred
Excellent planning skills for release planning and other delivery planning.
Excellent problem-solving skills.
Responsible for coaching and mentoring team members.
BFSI/FS domain experience.
Personal
High analytical skills.
High customer orientation.
High quality awareness.
Equal Opportunity Employer
All aspects of employment at Infosys are based on merit, competence and performance. We are committed to embracing diversity and creating an inclusive environment for all employees. Infosys is proud to be an equal opportunity employer.
13/06/2026
Full time
Thought Machine's mission is bold: to properly and permanently rid the world's banks of legacy technology. To achieve this, we have developed the foundations of modern banking through core and payments technology that runs natively in the cloud. What we are attempting is hard, and it means we need great people working together to build great technology. We have grown rapidly in the past few years, expanding our team to more than 550 people across offices in London, New York, Singapore and Sydney. We have raised more than $500m in funding and are now valued at $2.7bn. Our investors include Molten Ventures, Eurazeo, Intesa Sanpaolo, Temasek, Nyca Partners, JPMorgan Chase Strategic Investments, Standard Chartered Ventures, and more. We have created a culture that enables our team to produce the best work in the industry while ensuring we have fun along the way. We're regularly cited as having a fantastic workplace culture and have been recognised by Sifted magazine as having one of the highest Glassdoor ratings for a UK fintech company and the industry's most generous employee share package. Named one of the world's most innovative fintechs by Global Finance Magazine, we were also recognised by the Financial Times as one of Europe's fastest-growing companies for two consecutive years, and as a UK Best Employer for 2026.
Thought Machine's Site Reliability Engineers are the guardians of mission-critical systems for the world's most influential financial institutions. As a member of our elite, globally distributed team, you'll be entrusted with running and maintaining the robust production infrastructure that powers our customers' cutting-edge Core Banking and Payments platforms. This is an opportunity to make a tangible impact on the global financial landscape while collaborating with brilliant minds to solve complex engineering challenges. This role will be part of the Site Reliability Engineering team at Thought Machine HQ in London.
The team is deeply involved in tackling the technical challenges of executing Thought Machine's growth ambitions. Expect to work with senior stakeholders across the organisation and with our customers, on programmes and initiatives that are critical to the success of the company.
As an SRE at Thought Machine, you will be responsible for:
Supporting the product engineering teams in building highly fault-tolerant, scalable applications by participating in design discussions, RFCs and code reviews.
Contributing to the execution of department strategies such as disaster recovery, backup, redundancy, and capacity planning.
Participating in a global on-call rotation responsible for identifying and fixing bottlenecks in SaaS customer environments.
Regular maintenance of production systems that host Vault products.
Contributing to the evolution of our SaaS products by building features that foster exceptional reliability and an unparalleled user experience.
Implementing and testing DR strategies to ensure the highest level of resilience and fault tolerance of the platform.
Maintaining high-quality written documentation of assets, processes and runbooks that the team uses in its day-to-day operations.
Collaborating effectively with team members, actively participating in knowledge sharing, and continuously growing your technical understanding of Vault products.
What we're looking for:
You have experience successfully delivering engineering tasks and projects with a focus on reliability and scalability.
You have a good understanding of design patterns relevant to hosting and networking architectures.
You proactively champion product development, driven by a desire to build truly exceptional products, not just to solve immediate challenges.
You have a strong background in Python, Golang or Java, having used one of these languages to build production-level software.
You have experience working with Kubernetes or other container orchestration systems.
You have experience with automation/configuration management tools, e.g. Terraform, Puppet, Chef, Ansible.
You have a good understanding of one or more of the following areas: database administration, networking, observability tools (such as Prometheus and Jaeger) or automation infrastructure.
You have solid experience working with GCP or AWS.
Benefits:
Highly competitive salary
Pension plan (match up to 5%)
Life insurance (three times annual salary)
Competitive maternity leave (six months fully paid) and paternity leave (four weeks fully paid)
Shared parental leave (matched to our maternity leave for the same point in time)
25 days' holiday plus bank holidays
Flexible working hours
Cycle-to-work scheme
Electric car scheme
Season ticket loan
Access to outstanding learning materials and courses
Sports and hobby clubs, subsidised by Thought Machine
All the latest tech you need
Start the day properly with fresh fruit and cereals
Huge range of healthy (and not-so-healthy) snacks, smoothies and drinks
A talented and experienced team as your colleagues
An environment where we encourage learning and progress
Two charity days a year
Weekly food pop-up
We actively hire candidates who demonstrate technical excellence in their field and welcome people of all ages and backgrounds, providing everyone with equal access to professional development. You are encouraged to apply even if your experience doesn't exactly match the job description. We also encourage applications from those with different abilities, including candidates with ADHD, autism, dyslexia or dyspraxia.
13/06/2026
Full time
Requisition ID: 31464 - Posted - London - Janus Henderson
A career at Janus Henderson is more than a job; it's about investing in a brighter future together. Our Mission at Janus Henderson is to help clients define and achieve superior financial outcomes through differentiated insights, disciplined investments, and world-class service. We will do this by protecting and growing our core business, amplifying our strengths and diversifying where we have the right. Our Values are key to driving our success and are at the heart of everything we do:
Clients Come First - Always
Execution Supersedes Intention
Together We Win
Diversity Improves Results
Truth Builds Trust
If our mission, values, and purpose align with your own, we would love to hear from you!
Your opportunity
As a Senior Engineer within Platform Engineering, you will lead the design, build, and evolution of our Internal Developer Platform (IDP), enabling consistent, secure, and scalable software delivery across the enterprise. This role combines DevOps engineering, platform architecture, and developer experience enablement, with a strong focus on CI/CD transformation (Azure DevOps to GitHub), platform tooling, and data platform integration (Snowflake, Databricks). You will act as a subject matter expert (SME) across DevOps tooling, automation, and platform reliability, driving best practices, standardisation, and self-service capabilities for engineering teams.
Design, build, and evolve enterprise platform services to support the Internal Developer Platform (IDP) and enable scalable, secure, and self-service engineering environments.
Lead DevOps transformation initiatives, including migration from Azure DevOps to GitHub, and implement standardised CI/CD pipelines, reusable workflows, and release automation frameworks.
Develop and maintain Infrastructure as Code (IaC) solutions using Terraform, Bicep, or similar tools to provision and manage cloud infrastructure.
Deliver and optimise cloud-native platforms on Azure (primary), ensuring scalability, resilience, and cost efficiency.
Act as SME across DevOps tooling, including GitHub (Actions, Advanced Security), Nexus (artifact management), and Veracode (application security), embedding security controls into pipelines and platform services.
Enable and support DevOps practices for core data platforms, including Snowflake and Databricks, covering environment provisioning, CI/CD integration, and access control models.
Implement observability frameworks, including monitoring, logging, and alerting, and contribute to SRE practices such as SLIs/SLOs, reliability engineering, and incident management.
Embed security and compliance standards into all platform components, ensuring auditability, policy enforcement, and alignment with enterprise governance requirements.
Drive developer experience improvements through platform standardisation, self-service tooling, templates, and AI-enabled capabilities (e.g., Copilot, intelligent automation).
Collaborate with Architecture, Cloud COE, SRE, and engineering teams to deliver consistent and governed platform capabilities across the organisation.
Mentor junior engineers and contribute to technical leadership, standards definition, and engineering best practices.
What to expect when you join our firm
Hybrid working and reasonable accommodations
Excellent health and wellbeing benefits, including corporate membership to Wellhub
Paid volunteer time to step away from your desk and into the community
Support to grow through professional development courses, tuition/qualification reimbursement and more
Maternal/paternal leave benefits and family services
All-employee events, including networking opportunities and social activities
Lunch allowance for use within our subsidised onsite canteen
Must have skills
Bachelor's or master's degree in Computer Science, Engineering, or a related field
6+ years of experience in platform engineering, DevOps, or infrastructure roles
Strong experience with cloud platforms (Azure preferred)
Proficiency in containerisation (Docker, Kubernetes)
Hands-on experience with CI/CD tools (GitHub, Azure DevOps, GitLab CI)
Experience with IaC tools (Terraform, Pulumi, Ansible)
Strong experience in DevOps, Platform Engineering, or Infrastructure Engineering roles within enterprise environments
Proven expertise in CI/CD pipeline design, automation, and standardisation using GitHub (Actions, Advanced Security) and Azure DevOps, including migration from ADO to GitHub
Deep hands-on experience with Infrastructure as Code (Terraform, Bicep or equivalent) and automated cloud provisioning
Strong knowledge of the Azure cloud platform, including compute, networking, identity, and security services
Experience implementing DevSecOps practices, including integration of SAST/DAST tools (e.g., Veracode), secrets management, and secure pipeline execution
Expertise in artifact management (e.g., Nexus) and modern DevOps tooling ecosystems
Experience enabling Internal Developer Platform (IDP) capabilities, including self-service provisioning, reusable templates, and platform standardisation
Solid understanding of the software development lifecycle (SDLC), release engineering, and environment lifecycle management
Experience working with data platforms (Snowflake and/or Databricks), including CI/CD integration, environment provisioning, and access control models
Strong knowledge of containerisation and cloud-native technologies (Docker, Kubernetes)
Experience with observability and monitoring frameworks (e.g., Azure Monitor, Prometheus, Grafana) and an understanding of SRE practices (SLIs/SLOs, reliability engineering)
Strong scripting/programming skills (Python, PowerShell, Bash) and an automation mindset
Good understanding of security, networking, RBAC, and Zero Trust principles in cloud and DevOps environments
Exposure to AI-enabled developer tooling (e.g., GitHub Copilot, intelligent automation) and to improving developer experience
Experience operating in regulated, enterprise-scale environments with a strong focus on governance, auditability, and compliance
Strong communication, collaboration, and stakeholder management skills, with the ability to act as a hands-on SME and technical leader
Nice to have skills
Certifications in cloud technologies or Kubernetes
Experience building or contributing to an Internal Developer Platform (IDP)
Familiarity with service mesh, API gateways, and platform observability tools
Knowledge of FinOps, cost optimisation, and cloud governance
Solid programming skills (Python, Go, or Java)
Strong understanding of networking, security, and system architecture
Exposure to AI-enabled development (e.g., GitHub Copilot, automation workflows)
Supervisory responsibilities: No
Potential for growth
Regular training
Continuing education courses
Cross-functional collaboration
You will be expected to understand the regulatory obligations of the firm and to abide by the regulated entity requirements and JHI policies applicable to your role. At Janus Henderson Investors we're committed to an inclusive and supportive environment.
We believe diversity improves results, and we welcome applications from candidates from all backgrounds. Don't worry if you don't think you tick every box; we still want to hear from you! We understand everyone has different commitments, and while we can't accommodate every flexible working request, we're happy to be asked about work flexibility and our hybrid working environment. If you need any reasonable accommodations during our recruitment process, please get in touch and let us know at .
Annual Bonus Opportunity: Position may be eligible to receive an annual discretionary bonus award from the profit pool. The profit pool is funded based on Company profits. Individual bonuses are determined based on Company, department, team and individual performance.
Benefits: Janus Henderson is committed to offering eligible employees a comprehensive total rewards package that includes competitive compensation, pension/retirement plans, and various health, wellbeing and lifestyle benefits. To learn more about our offerings, please visit the Why Join Us section on the career page here.
Janus Henderson Investors is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. All applications are subject to background checks. Janus Henderson (including its subsidiaries) will not maintain existing or sponsor new industry registrations or licenses where not supported by an employee's job functions (as determined by Janus Henderson at its sole discretion). You should be willing to adhere to the provisions of our Investment Advisory Code of Ethics related to personal securities activities and other disclosure and certification requirements, including past political contributions and political activities.
12/06/2026
Full time
Platform Architect - Hybrid - £70,000-£100,000
About the Role:
We're partnering with a growing SaaS business to hire a senior Platform Architect to own the design, security, reliability, and operational management of its AWS platform and internal IT function. This is a hands-on leadership role in a lean organisation where you'll shape cloud architecture, modernise a legacy platform into a cloud-native environment, and provide senior oversight across platform engineering, security, SRE, CI/CD, and operational IT.
Key Responsibilities:
Own the AWS platform architecture and modernisation roadmap, including migration from a Java monolith to microservices on EKS.
Define standards for containers, runtime environments, observability, tenancy, security, and infrastructure automation.
Lead SRE practices, including SLIs/SLOs, incident management, DR/BCP planning, post-mortems, and operational resilience.
Own platform security, secure SDLC, CI/CD pipelines, IaC, and software supply chain governance.
Drive developer productivity through automation, self-service tooling, and platform standardisation.
Provide senior oversight of IT operations, including service desk governance, endpoint management, onboarding/offboarding, patching, ITAM, and MSP/vendor management.
Act as a senior escalation point for critical incidents, outages, and operational issues.
About You:
Experience in platform, infrastructure, or software engineering roles within SaaS environments.
Strong AWS expertise, including EKS, IAM, networking, KMS, RDS, and multi-account architecture.
Hands-on Kubernetes, CI/CD, Terraform, and cloud security experience.
Strong understanding of SRE, observability, incident response, and disaster recovery.
Experience operating in regulated environments, such as those subject to ISO 27001, SOC 2, or GxP.
Comfortable balancing strategic leadership with hands-on operational delivery.
AWS Solutions Architect - Professional certification required.
CKA or CKS certification highly desirable.
Platform Architect Hybrid £70,000-£100,000 Oscar Associates (UK) Limited is acting as an Employment Agency in relation to this vacancy. To understand more about what we do with your data please review our privacy policy in the privacy section of the Oscar website.
11/06/2026
Full time
Senior AWS Cloud Platform Engineer / DevOps Engineer (SC or DV Cleared) Location: Hybrid UK Security Clearance: Active SC or DV Clearance required Contract: Initial 6 months with extension potential Day Rate: Competitive (Inside IR35) Overview We are supporting a major national technology programme seeking an experienced AWS Cloud Platform Engineer to join a modern engineering team delivering cloud-native applications, platform services and automation capability within a secure environment. This role sits within a highly collaborative engineering function working alongside Software Engineers, Platform Engineers and Automation Specialists. The successful candidate will help improve software delivery velocity, deployment automation, platform reliability and engineering quality across AWS-hosted services and cloud-native platforms. This is not a traditional testing role. We are looking for an engineer with strong AWS, DevOps and platform engineering capability who understands modern automation, release engineering and cloud-native delivery practices. Key Responsibilities Design, build and maintain AWS cloud infrastructure and platform services. Develop and enhance Infrastructure as Code solutions using Terraform. Support containerised workloads and Kubernetes-based platforms. Build and optimise CI/CD pipelines to improve deployment speed and reliability. Work closely with software engineering teams to improve delivery workflows and release processes. Implement automation solutions that improve platform resilience, operational efficiency and deployment quality. Support cloud-native application delivery across microservices and distributed systems architectures. Contribute to DevOps best practice, platform engineering standards and automation strategies. Assist with cloud integration, deployment validation and release assurance activities. Essential Skills & Experience AWS Cloud Engineering Strong hands-on AWS experience across modern cloud-native environments.
Experience working with AWS services such as Lambda, ECS/EKS, S3, API Gateway, CloudWatch, SNS and SQS. AWS certification strongly preferred (Solutions Architect, DevOps Engineer, SysOps or Developer). Infrastructure as Code Strong Terraform experience. Experience building and maintaining cloud infrastructure through Infrastructure as Code. Exposure to CloudFormation or AWS CDK beneficial. DevOps & CI/CD Experience designing and maintaining CI/CD pipelines. GitLab CI, GitHub Actions, Jenkins, Azure DevOps or similar tooling. Strong understanding of deployment automation and release engineering practices. Containers & Platform Engineering Kubernetes experience (EKS preferred). Docker and containerised application delivery. Experience supporting cloud-native platforms and microservices environments. Development & Automation Python experience preferred. Experience building automation tooling and scripts. Understanding of REST APIs, integration patterns and distributed systems. Desirable Experience AWS DevOps Engineer Professional certification. AWS Solutions Architect certification. Experience working within Government, Defence, Law Enforcement or other highly regulated environments. Experience supporting platform engineering or SRE functions. Experience integrating automated validation and quality controls into CI/CD pipelines. Exposure to modern AI, machine learning or cloud-hosted AI services. Ideal Backgrounds We would particularly like to hear from candidates currently working as: AWS DevOps Engineer Cloud Platform Engineer Platform Engineer Site Reliability Engineer (SRE) Cloud Infrastructure Engineer AWS Cloud Engineer DevOps Platform Engineer Kubernetes Engineer Cloud Automation Engineer What Success Looks Like The successful candidate will help drive cloud-native engineering excellence, improving deployment automation, platform reliability and software delivery performance across a modern AWS environment. 
They will be comfortable operating across platform engineering, DevOps, cloud infrastructure and automation disciplines, working closely with software engineering teams to accelerate delivery and improve operational resilience.
09/06/2026
Full time
We are seeking an experienced Platform Engineering Lead (contract) to support the evolution of our cloud and application delivery strategy as we modernise our deployment, integration, and operational capabilities. This role will focus on building scalable, secure, and automated platform services that improve developer productivity, deployment consistency, operational resilience, and software delivery across engineering teams. The successful candidate will play a key role in transitioning from traditional infrastructure deployments toward modern cloud-native and platform engineering practices. Security will be a core component of the role, ensuring that platform services, deployment pipelines, and engineering standards are designed with secure-by-default principles. The role will help embed DevSecOps practices across the software delivery lifecycle, ensuring security controls are automated, repeatable, and integrated into engineering workflows rather than applied retrospectively. The role will also contribute to the development of platform security standards and recommendations, supporting secure application delivery, cloud governance, identity management, secrets handling, and compliance requirements across engineering environments. In addition, the role will support secure collaboration models for internal engineering teams, third-party development partners, and external data providers, ensuring appropriate identity management, access controls, environment segregation, and secure integration practices are implemented across platform services and delivery pipelines. Key Responsibilities Design and maintain CI/CD pipelines using Azure DevOps, GitHub Actions, or similar tooling. Implement automated multi-stage deployment pipelines across development, test, UAT, and production environments. Support blue/green and phased deployment strategies. Develop Infrastructure as Code solutions using Terraform and/or Bicep.
Build reusable infrastructure templates and standardised deployment patterns. Support cloud-native services and event-driven architectures using Azure technologies. Implement security controls within CI/CD pipelines including code scanning, dependency validation, secrets detection, and policy enforcement. Support secure identity, access control, and secrets management practices across cloud platforms and deployment pipelines. Support secure collaboration and integration with internal teams, third-party development partners, and external data providers. Contribute to platform security recommendations, engineering governance, and secure deployment standards. Support security testing and validation activities within pre-production and UAT environments. Contribute to platform architecture standards, engineering governance, and future technology strategy. Implement monitoring, logging, alerting, and operational resilience practices aligned to SRE principles. Support operational stability, incident management, and platform optimisation initiatives. Technical Skills & Experience Essential: Azure cloud platform experience, CI/CD pipeline engineering (Azure DevOps, GitHub Actions), Infrastructure as Code (Terraform and/or Bicep), Automation and scripting (PowerShell, Bash, Python), Experience with Azure Service Bus and serverless technologies, Security and DevSecOps practices, Experience implementing security controls within CI/CD pipelines, Identity and access management experience, Monitoring, logging, and operational support experience, Experience working within cloud-native or platform engineering environments. Desirable: GitOps practices, SRE / reliability engineering experience, Experience defining platform standards and reference architectures, Knowledge of security tooling including SAST/DAST, vulnerability scanning, and policy-as-code frameworks.
09/06/2026
Full time
An amazing Global Investment Client of ours located in Central London is looking for a Site Reliability Engineer to join their team on a permanent basis. This is a rare opportunity and the package offered for this role is up to £300k depending on skills and experience. About the Company The company is a leading provider of alternative investment solutions with approximately $63 billion of assets under management ("AUM") and over 550 employees worldwide including London, New York, Singapore and Hong Kong. One of their founding beliefs is that technology and data are at the core of the business, allowing them to build and maintain cutting-edge hardware and software solutions. The technology team is lean and has a culture that encourages interaction across all areas of the business on a global scale. Their aim is to use the best tool for the job, so there is the opportunity to be constantly learning and to use modern technologies. Their teams strive to push boundaries and think innovatively, creating an environment that is fast-paced, dynamic and successful. About the Role They are looking for an enthusiastic Site Reliability Engineer to join the SRE team in London. Their team is central to the business as they are responsible for the technology that underpins everything they do, so you will have a direct impact on the success of the company. From scaling for the huge volumes of data that drive their research process, to improving the reliability and speed of a rapidly evolving application estate, there is always a relentless focus on automation and efficiency at scale. The company's engineers own their varied technology stack end-to-end and are in constant search of incremental improvements, new technologies and ways of working to evolve their platform and give them a competitive edge. They are looking for people who want to find unique solutions for optimising efficiency and performance in a context where they are key enablers.
The ideal candidate will be passionate about improving reliability and removing toil by identifying opportunities for automation and building platforms to make the systems more "reliable by default". Responsibilities Evangelise the SRE mindset and implement best practices across the environment Understand the business and find ways to measure and enhance resilience across the application estate Eliminate the toil that emerges with complex, distributed systems by automating where possible Work both as an individual contributor and collaboratively to find new ways of improving the reliability, availability, security and performance of the infrastructure Accelerate the migration strategy to more cloud-native, distributed applications Improve productivity and developer experience through automation and interface improvements in local tool chains, IDEs, CI/CD. Requirements Expert-level scripting / coding skills in one or more languages (Python / Golang etc.) Expert knowledge of observability systems (Prometheus / ELK / Jaeger / OpenTelemetry / Service Meshes etc.) Experience with configuration management tools (Ansible / Puppet / Kapitan / Terraform) Experience with distributed data platforms (Kafka / Flink / Airflow) Comfortable using cloud-native and containerisation technologies (Kubernetes / Docker) Good Linux systems knowledge (experience with RHEL desirable) Broad knowledge across network technologies, server virtualisation and storage Self-starter, able to quickly pick up concepts, implement new ideas and think outside the box Focused on improving system reliability, availability, security, and performance through testing, automation, and standardisation Ability to articulate the "why" behind best practices simply Ability to build positive and collaborative relationships with colleagues across teams and geographies Benefits Food & Beverage: Complimentary breakfast and lunch for all employees plus on-site coffee bars and a wide variety of healthy snacks.
Annual Discretionary Bonuses: Reflecting firm and individual performance. Cycle to Work Initiative: Green loan scheme which employees are able to use for the purchase of bicycles. Employee Referral Programme: Bonus for each successful hire in the month your referral joins the company. Global Office Design: They aim to create a cohesive environment, regardless of region. They've designed office spaces to ensure everyone feels the connection no matter where you're located. Pension Scheme: Generous pension and retirement savings plans. Carbon Offset Programme: The company offsets its CO2 emissions annually and aims to sustainably source all office materials. Physical and Mental Fitness: Health and wellness benefits include an onsite gym & classes (LDN and NYC), gym subsidies in other regions, access to mental health support, and subscriptions to mindfulness platforms. Charity Donation Matching: Generous charity matching scheme and ample opportunities to become involved in the community. They offer charity of the year awards in each region and encourage employees to submit causes they're passionate about. Enhanced Caregiver Leave: Enhanced, flexible primary and secondary caregiver leave. Sabbatical: Generous sabbatical after you've been with the company for 8 years and every 4 years after that. Annual Training Allowance: Encourages personal and professional development. This allowance may be used towards conferences, seminars, and training courses which supplement extensive on-site training materials. Health and Life Insurance: Range of healthcare benefits to help you manage your personal, physical and emotional wellbeing.
09/06/2026
Full time
H&R Talent is seeking a Site Reliability Engineer to join their team in Central London. This permanent role offers a competitive salary up to £300k, depending on skills and experience. The candidate will contribute to the technology underpinning the business and improve system reliability while working in a dynamic and innovative environment. Responsibilities include automating processes, enhancing application resilience, and collaborating with a global team. Ideal candidates will have expertise in scripting, observability, and cloud-native technologies.
09/06/2026
Full time
Hybrid Working - London - 2 days a week on site. Lorien's leading banking client is looking for an exceptional AI Engineer with strong experience in Python, SQL, cloud-based AI/ML ecosystems and AWS SageMaker. This role will be building the pipelines, services, and monitoring capabilities that underpin AI observability and governance across the bank. This is a hands-on, high-impact role at the intersection of AI governance, distributed systems, observability, and platform engineering. You will develop core components of the platform, contribute to its evolution, and ensure our AI systems are measurable, transparent, and well controlled from model training through to production. The Ideal Candidate will have: Strong engineering foundations, with experience building scalable distributed systems or data platforms. Proficiency in Python, SQL, Java, and modern data processing frameworks. Experience working with cloud-based AI/ML ecosystems, particularly AWS SageMaker (required). This role is based in London. This role will be engaged via an umbrella company. Working in a Hybrid Model of 2 days a week on site. What You'll Do Contribute to the development of data pipelines, APIs, and services that power the AI Control Tower. Implement components supporting AI observability, guardrails, performance monitoring, and lifecycle controls. Develop integrations with model registries, feature stores, lineage tools, and governance systems. Write clean, well-tested, scalable code in Python, Java, SQL, and modern data/stream processing frameworks. Build high-throughput pipelines to capture metrics such as: model performance, drift, and degradation; operational and service health; security posture and policy adherence; guardrail compliance for ML and GenAI systems; and governance and risk indicators. Implement observability tooling using logging, metrics, tracing, and event-driven patterns.
Support monitoring and measurement of AI systems across development, deployment, and runtime environments. Work closely with data engineering, platform engineering, security, MLOps, and Independent Model Monitoring (IMM) teams. Contribute to integration efforts with AWS SageMaker, model pipelines, and enterprise data platforms. Use technologies such as AWS, SageMaker, Python, Java, Kafka, OpenTelemetry, and cloud-native monitoring stacks. Support governance and reporting workflows with automated checks, standardised metrics, and platform tooling. Understanding of monitoring frameworks, observability pipelines, and dashboards. Familiarity with event-driven architectures and messaging systems (Kafka, Vert.x, or similar). Knowledge of security engineering, IAM principles, encryption, and cloud security controls. Experience with CI/CD, infrastructure-as-code, and automated testing for data/ML systems. Helpful Experience: Exposure to MLOps, LLMOps, or model lifecycle management. Awareness of model risk and regulatory frameworks (e.g., SS1/23, NIST AI Risk Management Framework). Understanding of operational resilience concepts and SRE practices (SLIs/SLOs). Experience with data lineage or governance tooling (DataHub, Glue, Collibra). Interest in Responsible AI, explainability, fairness/bias, and governance automation. Guidant, Carbon60, Lorien & SRG - The Impellam Group Portfolio are acting as an Employment Business in relation to this vacancy.
09/06/2026
Full time
Music is Universal. It's the passionate and dedicated team at Universal Music who help make us the world's leading music company. From A&R to finance, legal to digital, sales to marketing, Universal Music is the place to grow and develop your career within a truly commercial and innovative business that leads in everything it does. Everyone is welcome to apply for our roles, and we are determined to ensure that no applicant or employee receives less favourable treatment because of gender, race, disability, sexual orientation, religion, belief, age, marital status, background, pregnancy, or caring responsibilities. We also recognise the importance of diversity of thought within our teams and are fully committed to embracing the talents of people with autism, dyslexia, ADHD, and other forms of neurocognitive variation. We will always seek to make appropriate adjustments to recruitment, workplaces, and work processes to be fully inclusive to people with different needs and working styles. If you need us to make any reasonable adjustments for you from application onwards, including alternatives to the online form or to disclose a neurocognitive condition, please email . Job Summary: We are UMG, the Universal Music Group. We are the world's leading music company. In everything we do, we are committed to artistry, innovation, and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute, and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world. As a Senior Observability Engineer, you will be a driving force for technical excellence and strategic vision within our global team.
You will be instrumental in architecting, building, and leading our comprehensive observability strategy to ensure the reliability, performance, and scalability of our critical IT systems. This senior role demands a passion for data-driven strategy, a commitment to automation, and the ability to mentor and lead. You will not only solve complex technical challenges but also influence the direction of observability practices across UMG globally, ensuring our technology landscape is as world-class as our music. Job Functions Architecture & Strategy: Lead the architectural design and strategic roadmap for our observability stack. Drive the vision for world-class monitoring, logging, tracing, and alerting solutions across our hybrid and cloud-native environments. Innovate & Automate: Spearhead the evaluation, selection, and implementation of cutting-edge observability tools and platforms (e.g., Dynatrace, OpenTelemetry, Prometheus, Grafana). Architect and build robust, automated observability pipelines. Take an active part in documenting and defining processes and best practices. Optimize & Analyze: Conduct deep-dive analysis of telemetry data to proactively identify performance bottlenecks, optimize resource utilization, and guide capacity planning. Lead & Mentor: Act as a technical leader and mentor for the observability team and wider engineering groups. Champion and enforce best practices, fostering a culture of proactive, data-informed decision-making. Drive Incident & Problem Management: Work with Operations teams on high-priority incident resolution, using deep analysis of telemetry data for swift root-cause identification. Drive post-incident reviews and implement long-term solutions to enhance system resilience. Collaborate & Influence: Partner with Development, SRE, and Infrastructure leaders to embed observability into the entire technology lifecycle.
Influence and drive the adoption of observability best practices across the global organization. Champion the use of observability in the global UMG environment. Make UMG the place to be: Mentor, manage, and genuinely lead the Observability team in a way that attracts and retains the best talent. UMG is a place where everyone can bring themselves fully to work and thrive; as a leader, you are a key part of this. Job Requirements Essential Qualifications Experience: 5-7+ years of hands-on experience in an Observability, Site Reliability Engineering (SRE), or DevOps role, with a proven track record of leading complex projects. Technical Leadership: Demonstrated experience in architecting and designing large-scale monitoring and observability solutions. Expert Level Tooling: Deep expertise with modern observability platforms (e.g., Dynatrace, AWS CloudWatch, Prometheus, Grafana, ELK Stack, Splunk, OpenTelemetry). Cloud & Infrastructure: Advanced knowledge of major cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible). Programming & Automation: Strong programming and scripting skills (e.g., Python, Go, Shell) with a focus on creating scalable automation and custom tooling. Problem Solving: Exceptional analytical and strategic problem-solving skills, with the ability to lead through complex technical challenges. Data Analysis: Expertise in analysing telemetry data and visualising it as meaningful information that drives action. Hands On: Demonstrable hands-on engineering and coding experience, with the ability to deep dive into existing and emerging technologies to identify opportunities and solutions. Containerization and Orchestration: Understanding of container technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes) to monitor and manage containerized applications.
Networking Knowledge: Understanding of networking principles and protocols to effectively monitor and troubleshoot network-related issues. Security Awareness: Awareness of security best practices and the ability to integrate security monitoring into observability processes. Communication & Influence: Excellent communication and interpersonal skills, capable of articulating a technical vision to diverse audiences and influencing senior stakeholders. Ability to collaborate with cross-functional teams, convey findings, and discuss improvements with development and operations teams. Continuous Learning: Given the dynamic nature of technology, a commitment to continuous learning and staying up to date with the latest trends in observability and monitoring. Self-motivated, with a high degree of initiative and excellent follow-up skills, along with strong analytical and problem-solving skills. Travel may be required but is not part of the regular work schedule. A bachelor's degree in a technology-related field, as well as 5+ years of relevant experience in the Observability field. Desired Qualifications Advanced Concepts: Proven experience with Chaos Engineering, AI-driven analytics, defining SLOs/SLIs, and advanced deployment strategies (Canary/Blue-Green). Software Engineering Foundation: Strong background in software engineering principles, database administration, and distributed systems architecture. Certifications: Relevant senior-level industry certifications (e.g., AWS Certified DevOps Engineer - Professional, Certified Kubernetes Administrator). Just So You Know The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change, and the jobholder's specific responsibilities and activities will vary and develop. Therefore, the job description should be seen as indicative and not as a permanent, definitive, or exhaustive statement.
Job Category Universal Music Group
08/06/2026
Full time