it job board logo
  • Home
  • Find IT Jobs
  • Register CV
  • Career Advice
  • Contact us
  • Employers
    • Register as Employer
    • Pricing Plans
  • Recruiting? Post a job
  • Sign in
  • Sign up
Sorry, that job is no longer available. Here are some results that may be similar to the job you were looking for.

32 jobs found

Current Search
sre engineer cloud native automation resilience
Site Reliability Engineer (SRE)
Air Apps
The Role
As a Site Reliability Engineer (SRE) at Air Apps, you will be responsible for ensuring the reliability, availability, and scalability of our systems. You will work at the intersection of software development and operations, implementing automation, monitoring, and performance optimization strategies to minimize downtime and improve system resilience. This is a fully onsite position, based at our office in Lisbon, where you will collaborate closely with cross-functional teams in person and contribute to a dynamic and fast-paced environment. We are open to supporting relocation efforts.

Responsibilities
  • Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
  • Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
  • Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
  • Optimize system performance, scalability, and incident response workflows to improve uptime.
  • Work closely with development and DevOps teams to improve system design for reliability.
  • Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
  • Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
  • Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
  • Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).
  • Participate in on-call rotations to quickly address system failures and minimize downtime.

Requirements
  • Around 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or System Engineering.
  • Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
  • Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
  • Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).
  • Strong Linux system administration and networking fundamentals.
  • Experience with incident management, debugging, and root cause analysis.
  • Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.
  • Knowledge of load balancing, failover strategies, and distributed systems.
  • Understanding of security best practices, access control, and compliance requirements.
  • Strong communication skills and the ability to collaborate with cross-functional teams.

What benefits are we offering?
  • Apple hardware ecosystem for work.
  • Annual Bonus.
  • Top-tier Health and Life Insurance for peace of mind.
  • Transportation Budget to support your commute needs.
  • Coverflex benefits package for meal allowances, well-being, and more.
  • Childcare support.
  • Air Conference - an opportunity to meet the team, collaborate, and grow together.
  • Pension Fund to support your long-term financial planning.
  • Urban Sports Club membership to keep you active.
  • Meals 100% free at the hub.
14/06/2026
Full time
Senior Engineer- Platform Engineering
LGBT Great
Your Opportunity
As a Senior Engineer within Platform Engineering, you will lead the design, build, and evolution of our Internal Developer Platform (IDP), enabling consistent, secure, and scalable software delivery across the enterprise. This role combines DevOps engineering, platform architecture, and developer experience enablement, with a strong focus on CI/CD transformation (Azure DevOps to GitHub), platform tooling, and data platform integration (Snowflake, Databricks). You will act as a subject matter expert (SME) across DevOps tooling, automation, and platform reliability, driving best practices, standardisation, and self-service capabilities for engineering teams.

Responsibilities
  • Design, build, and evolve enterprise platform services to support the IDP and enable scalable, secure, and self-service engineering environments.
  • Lead DevOps transformation initiatives, including migration from Azure DevOps to GitHub, and implement standardised CI/CD pipelines, reusable workflows, and release automation frameworks.
  • Develop and maintain Infrastructure as Code (IaC) solutions using Terraform, Bicep, or similar tools to provision and manage cloud infrastructure.
  • Deliver and optimise cloud-native platforms on Azure (primary), ensuring scalability, resilience, and cost efficiency.
  • Act as SME across DevOps tooling, including GitHub (Actions, Advanced Security), Nexus (artifact management), and Veracode (application security), embedding security controls into pipelines and platform services.
  • Enable and support DevOps practices for core data platforms, including Snowflake and Databricks, covering environment provisioning, CI/CD integration, and access control models.
  • Implement observability frameworks, including monitoring, logging, and alerting, and contribute to SRE practices such as SLIs/SLOs, reliability engineering, and incident management.
  • Embed security and compliance standards into all platform components, ensuring auditability, policy enforcement, and alignment with enterprise governance requirements.
  • Drive developer experience improvements through platform standardisation, self-service tooling, templates, and AI-enabled capabilities (e.g. Copilot, intelligent automation).
  • Collaborate with Architecture, Cloud COE, SRE, and engineering teams to deliver consistent and governed platform capabilities across the organisation.
  • Mentor junior engineers and contribute to technical leadership, standards definition, and engineering best practices.

Benefits
  • Hybrid working and reasonable accommodations
  • Generous holiday policies
  • Excellent health and wellbeing benefits including corporate membership to Wellhub
  • Paid volunteer time
  • Professional development support (courses, tuition/qualification reimbursement)
  • Maternal/paternal leave benefits and family services
  • All-employee events including networking opportunities and social activities
  • Lunch allowance for use within our subsidised onsite canteen
  • Annual bonus opportunity: position may be eligible to receive an annual discretionary bonus award from the profit pool.
  • Competitive compensation, pension/retirement plans, and various health, wellbeing, and lifestyle benefits.

Qualifications
  • Bachelor's or master's in computer science, engineering, or related field
  • 6+ years of experience in platform engineering, DevOps, or infrastructure roles
  • Strong experience with cloud platforms (Azure preferred)
  • Proficiency in containerisation (Docker, Kubernetes)
  • Hands-on experience with CI/CD tools (GitHub, Azure DevOps, GitLab CI)
  • Experience with IaC tools (Terraform, Pulumi, Ansible)
  • Proven expertise in CI/CD pipeline design, automation, and standardisation using GitHub (Actions, Advanced Security) and Azure DevOps, including migration from Azure DevOps to GitHub
  • Deep hands-on experience with Infrastructure as Code (Terraform, Bicep or equivalent) and automated cloud provisioning
  • Strong knowledge of Azure cloud platform, including compute, networking, identity, and security services
  • Experience implementing DevSecOps practices, including integration of SAST/DAST tools (e.g. Veracode), secrets management, and secure pipeline execution
  • Expertise in artifact management (Nexus) and modern DevOps tooling ecosystems
  • Experience enabling Internal Developer Platform (IDP) capabilities, including self-service provisioning, reusable templates, and platform standardisation
  • Solid understanding of software development lifecycle (SDLC), release engineering, and environment lifecycle management
  • Experience working with data platforms (Snowflake and/or Databricks), including CI/CD integration, environment provisioning, and access control models
  • Strong scripting/programming skills (Python, PowerShell, Bash) and automation mindset
  • Good understanding of security, networking, RBAC, and Zero Trust principles in cloud and DevOps environments
  • Experience operating in regulated, enterprise-scale environments with strong focus on governance, auditability, and compliance
  • Strong communication, collaboration, and stakeholder management skills, with ability to act as a hands-on SME and technical leader

Nice to have skills
  • Certifications in cloud technologies or Kubernetes
  • Experience building or contributing to an Internal Developer Platform (IDP)
  • Familiarity with service mesh, API gateways, and platform observability tools
  • Knowledge of FinOps, cost optimisation, and cloud governance
  • Solid programming skills (Python, Go, or Java)
  • Strong understanding of networking, security, and system architecture
  • Exposure to AI-enabled development (e.g. GitHub Copilot, automation workflows)
  • Relevant cloud or Kubernetes certifications

Potential for Growth
  • Mentoring
  • Leadership development programs

Compliance and Policies
Applicants should be willing to adhere to the provisions of our Investment Advisory Code of Ethics related to personal securities activities and other disclosure and certification requirements, including past political contributions and political activities. Applicants' past political contributions or activity may impact eligibility for this position. You will be expected to understand regulatory obligations of the firm and abide by the regulated entity requirements and Janus Henderson Investors policies applicable for your role.

Equal Opportunity Employer
Janus Henderson Investors is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. All applications are subject to background checks.
14/06/2026
Full time
Senior Site Reliability Engineer
Career Choices Dewis Gyrfa Ltd Manchester, Lancashire
Join us as a Senior Site Reliability Engineer
  • In this key role, you'll improve and drive the availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services
  • You'll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way
  • This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
  • You'll need to have the flexibility to support the team by working shifts and weekends on rotation

What you'll do
As a Senior Site Reliability Engineer, you'll act as a hands-on expert responsible for ensuring the reliability, availability, and performance of critical production platforms. You'll lead the adoption of Site Reliability Engineering (SRE) practices, embedding resilience, observability, and operational excellence into distributed systems running on AWS and Kubernetes. You'll also take ownership of 24/7 production support models, ensuring systems are highly available and that incidents are effectively managed and learned from.

We'll also expect you to design and operate highly resilient AWS-based Kubernetes platforms (EKS) aligned with enterprise standards while owning and continuously improving production reliability, availability, and Service Level Agreement or Service Level Objective (SLA/SLO) frameworks. You'll lead incident management, escalation, and 24/7 on-call practices, including post-incident reviews, and embed SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams. Furthermore, you'll implement infrastructure and platform automation using Terraform and GitOps methodologies and drive self-healing, auto-scaling, and failure recovery mechanisms using tools such as Karpenter.

In addition to this, you'll be:
  • Building secure and scalable networking and service communication using tools such as Cilium
  • Defining and operating observability platforms using Grafana, Prometheus, Loki, and Tempo
  • Partnering with DevOps and engineering teams to ensure production readiness and operational excellence
  • Leading complex troubleshooting across distributed systems and cloud-native environments
  • Developing reusable "golden paths," operational runbooks, and reliability patterns
  • Ensuring platforms meet regulatory, security, and operational risk requirements
  • Using data, Service Level Indicators (SLIs), and metrics to drive continuous improvement and proactive reliability enhancements

The skills you'll need
We're looking for a highly experienced Site Reliability Engineer with a strong background in operating large-scale, business-critical platforms and a passion for reliability engineering. You must also have deep expertise in managing production systems on AWS and Kubernetes (EKS), along with strong experience in 24/7 support models, incident management, and on-call leadership. Moreover, you'll need to demonstrate advanced knowledge of SRE principles such as SLIs, SLOs, error budgets, and toil reduction, as well as proficiency in Terraform, GitOps, and cloud automation practices. Hands-on experience with GitLab continuous integration and continuous delivery pipelines and Argo CD is also essential.

In addition, you'll have to bring:
  • A strong understanding of Kubernetes networking, security, and service mesh technologies, ideally using Cilium
  • Experience scaling infrastructure using Karpenter and auto-scaling strategies
  • Expertise in observability tooling, including Grafana, Prometheus, Loki and Tempo
  • A proven ability to troubleshoot and resolve complex, cross-system production issues
  • Experience operating in regulated or high-security environments
  • Strong leadership, mentoring, and stakeholder engagement capabilities
  • The ability to balance reliability, risk, and delivery in a fast-paced environment
14/06/2026
Full time
DevOps Engineer - Security & Intelligence
Envitia Cheltenham, Gloucestershire
Join the Growth Story at Envitia Envitia operates at the heart of the UK's most sensitive and high-profile operations. Our approach is entirely mission-focused, enabling customers to see more, decide faster, and act with confidence across complex domains. We deliver situational awareness and information advantage across C5ISR, fusing cyber, geospatial, and kinetic data into an operational picture. This transforms fragmented data into a shared understanding of the battlespace, delivering actionable insight when it matters most. Our solutions are secure-by-design, built from the ground up to protect sensitive information and ensure resilience in contested environments. We simplify complex heterogeneous data sources into a rich picture of fused intelligence, providing a digital backbone that underpins success. We operate as small teams delivering fast value, blending deep technical expertise with agile methods to deliver measurable outcomes. Every engagement focuses on capability, clarity, and confidence for those who protect the nation. A career with Envitia enables you to work with mission driven innovation. We're thrilled to announce that Envitia has been named one of the Sunday Times Top 100 Medium Sized Companies to Work For in 2025-a prestigious recognition that highlights our commitment to creating an outstanding workplace where innovation, collaboration, and personal growth thrive. The Role As a DevOps / Platform Engineer, you will design, build and operate secure cloud and platform capabilities within classified government environments. Working across AWS and cloud native technologies, you will play a key role in delivering reliable, scalable and secure platforms that underpin mission critical digital services. You'll operate within multi disciplinary Agile teams, collaborating closely with software engineers, test engineers, architects, SRE specialists and mission stakeholders to ensure robust, production grade delivery. 
This role is suited to someone who thrives in complex, secure environments and enjoys working across infrastructure automation, CI/CD and live service support, with a strong focus on reliability and continuous improvement. Typical responsibilities include: Building and maintaining secure DevSecOps pipelines to support continuous delivery Developing Infrastructure as Code and automated platform capabilities Supporting AWS based environments, including Kubernetes, OpenShift, EKS and ECS Implementing observability, monitoring, logging and alerting for live services Supporting SRE practices, cloud migration activities and production platform operations Job Responsibilities Design, implement and maintain secure CI/CD pipelines to support efficient, automated software delivery Develop and manage cloud native infrastructure using AWS and Infrastructure as Code (e.g. Terraform, CloudFormation, Ansible) Support containerised platforms and environments, including Kubernetes, OpenShift, EKS and ECS Embed security controls, quality gates and compliance checks into DevSecOps workflows Implement observability, monitoring, logging and alerting to support reliable live services Contribute to SRE practices, improving service reliability, performance and operational resilience Automate build, deployment and platform processes to reduce manual overhead and improve release confidence Collaborate within Agile teams, supporting delivery, troubleshooting and continuous improvement Provide technical guidance on platform engineering, cloud and deployment best practices Support production environments, including incident resolution, environment management and release readiness Skills Required Experience working within the UK Security & Intelligence, defence or other secure government environments Strong knowledge of cloud, platform engineering and DevSecOps practices, with a focus on secure, automated delivery Experience with AWS and container platforms such as Kubernetes, OpenShift, EKS or 
ECS Proficiency in Infrastructure as Code and automation tooling (e.g. Terraform, CloudFormation, Ansible or similar) Experience building and maintaining CI/CD pipelines, including automated build, test, deployment and release processes Familiarity with security integration, quality gates and compliance controls within delivery pipelines Experience implementing observability practices, including monitoring, logging, alerting and service metrics Understanding of SRE principles, including reliability engineering, incident management and continuous improvement Awareness of secure deployment, compliance requirements and operational security best practices Security Clearance Requirements The successful candidate must hold a current high level security clearance. Location It is anticipated the role will require up to 80% (4d/wk) onsite working at client locations - Cheltenham, London, Manchester (based on project requirements). What it's like to work at Envitia At Envitia, we believe that our greatest asset is our talented and dedicated team. We are committed to fostering a work environment where every employee feels valued, supported, and motivated to excel. As part of this commitment, we offer a comprehensive range of benefits designed to enhance both your professional and personal well being. Annual Leave: 25 days plus your birthday off. You will have the ability to buy and sell 5 days holiday to work around your needs. Private Healthcare Coverage: Our health plan is tailored to meet the diverse needs of our employees with additional levels for family if required. Training & Skills Development: Stay ahead in your career with ongoing training opportunities and skill development initiatives tailored to your evolving needs. Fitness Reimbursement: We encourage an active lifestyle. Our fitness reimbursement program helps you stay fit by covering a portion of your gym memberships or fitness related expenses. 
Life Assurance: Gain peace of mind with extensive life insurance coverage that ensures financial protection for you and your loved ones. Pension Contribution: Plan for your future with our pension options. We provide resources and support to help you build a secure financial foundation. Perkbox Subscription: Enjoy exclusive discounts on a variety of products and services. From technology to entertainment, we've partnered with various businesses to bring you special perks. Internal Reward Schemes: Be rewarded for your exceptional contributions through our employee recognition initiatives that celebrate your achievements. Community Engagement & Volunteer Opportunities: Contribute to meaningful causes with company-sponsored volunteer programs, fostering a sense of community and social responsibility. Inclusion at Envitia At Envitia, we celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants from all backgrounds and walks of life. We believe that our strength lies in our differences, and we are dedicated to fostering a workplace where everyone feels valued, respected, and empowered. We encourage applications from people of all abilities, ages, genders, sexual orientations, races, ethnicities, and religions. We strive to support a culture of inclusion, accessibility, and work-life balance. If you require any accommodations during the application or interview process, please let us know.
13/06/2026
Full time
DevOps Engineer - Security & Intelligence
Envitia
Join the Growth Story at Envitia Envitia operates at the heart of the UK's most sensitive and high-profile operations. Our approach is entirely mission-focused, enabling customers to see more, decide faster, and act with confidence across complex domains. We deliver situational awareness and information advantage across C5ISR, fusing cyber, geospatial, and kinetic data into an operational picture. This transforms fragmented data into a shared understanding of the battlespace, delivering actionable insight when it matters most. Our solutions are secure-by-design, built from the ground up to protect sensitive information and ensure resilience in contested environments. We simplify complex heterogeneous data sources into a rich picture of fused intelligence, providing a digital backbone that underpins success. We operate as small teams delivering fast value, blending deep technical expertise with agile methods to deliver measurable outcomes. Every engagement focuses on capability, clarity, and confidence for those who protect the nation. A career with Envitia enables you to work with mission-driven innovation. We're thrilled to announce that Envitia has been named one of the Sunday Times Top 100 Medium Sized Companies to Work For in 2025, a prestigious recognition that highlights our commitment to creating an outstanding workplace where innovation, collaboration, and personal growth thrive. The Role As a DevOps / Platform Engineer, you will design, build and operate secure cloud and platform capabilities within classified government environments. Working across AWS and cloud-native technologies, you will play a key role in delivering reliable, scalable and secure platforms that underpin mission-critical digital services. You'll operate within multi-disciplinary Agile teams, collaborating closely with software engineers, test engineers, architects, SRE specialists and mission stakeholders to ensure robust, production-grade delivery. 
This role is suited to someone who thrives in complex, secure environments and enjoys working across infrastructure automation, CI/CD and live service support, with a strong focus on reliability and continuous improvement. Typical responsibilities include: Building and maintaining secure DevSecOps pipelines to support continuous delivery Developing Infrastructure as Code and automated platform capabilities Supporting AWS based environments, including Kubernetes, OpenShift, EKS and ECS Implementing observability, monitoring, logging and alerting for live services Supporting SRE practices, cloud migration activities and production platform operations Job Responsibilities Design, implement and maintain secure CI/CD pipelines to support efficient, automated software delivery Develop and manage cloud native infrastructure using AWS and Infrastructure as Code (e.g. Terraform, CloudFormation, Ansible) Support containerised platforms and environments, including Kubernetes, OpenShift, EKS and ECS Embed security controls, quality gates and compliance checks into DevSecOps workflows Implement observability, monitoring, logging and alerting to support reliable live services Contribute to SRE practices, improving service reliability, performance and operational resilience Automate build, deployment and platform processes to reduce manual overhead and improve release confidence Collaborate within Agile teams, supporting delivery, troubleshooting and continuous improvement Provide technical guidance on platform engineering, cloud and deployment best practices Support production environments, including incident resolution, environment management and release readiness Skills Required Experience working within the UK Security & Intelligence, defence or other secure government environments Strong knowledge of cloud, platform engineering and DevSecOps practices, with a focus on secure, automated delivery Experience with AWS and container platforms such as Kubernetes, OpenShift, EKS or 
ECS Proficiency in Infrastructure as Code and automation tooling (e.g. Terraform, CloudFormation, Ansible or similar) Experience building and maintaining CI/CD pipelines, including automated build, test, deployment and release processes Familiarity with security integration, quality gates and compliance controls within delivery pipelines Experience implementing observability practices, including monitoring, logging, alerting and service metrics Understanding of SRE principles, including reliability engineering, incident management and continuous improvement Awareness of secure deployment, compliance requirements and operational security best practices Security Clearance Requirements The successful candidate must hold a current high level security clearance. Location It is anticipated the role will require up to 80% (4d/wk) onsite working at client locations - Cheltenham, London, Manchester (based on project requirements). What it's like to work at Envitia At Envitia, we believe that our greatest asset is our talented and dedicated team. We are committed to fostering a work environment where every employee feels valued, supported, and motivated to excel. As part of this commitment, we offer a comprehensive range of benefits designed to enhance both your professional and personal well being. Annual Leave: 25 days plus your birthday off. You will have the ability to buy and sell 5 days holiday to work around your needs. Private Healthcare Coverage: Our health plan is tailored to meet the diverse needs of our employees with additional levels for family if required. Training & Skills Development: Stay ahead in your career with ongoing training opportunities and skill development initiatives tailored to your evolving needs. Fitness Reimbursement: We encourage an active lifestyle. Our fitness reimbursement program helps you stay fit by covering a portion of your gym memberships or fitness related expenses. 
Life Assurance: Gain peace of mind with extensive life insurance coverage that ensures financial protection for you and your loved ones. Pension Contribution: Plan for your future with our pension options. We provide resources and support to help you build a secure financial foundation. Perkbox Subscription: Enjoy exclusive discounts on a variety of products and services. From technology to entertainment, we've partnered with various businesses to bring you special perks. Internal Reward Schemes: Be rewarded for your exceptional contributions through our employee recognition initiatives that celebrate your achievements. Community Engagement & Volunteer Opportunities: Contribute to meaningful causes with company-sponsored volunteer programs, fostering a sense of community and social responsibility. Inclusion at Envitia At Envitia, we celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants from all backgrounds and walks of life. We believe that our strength lies in our differences, and we are dedicated to fostering a workplace where everyone feels valued, respected, and empowered. We encourage applications from people of all abilities, ages, genders, sexual orientations, races, ethnicities, and religions. We strive to support a culture of inclusion, accessibility, and work-life balance. If you require any accommodations during the application or interview process, please let us know.
13/06/2026
Full time
DevOps Engineer - Security & Intelligence
Envitia Manchester, Lancashire
Join the Growth Story at Envitia Envitia operates at the heart of the UK's most sensitive and high-profile operations. Our approach is entirely mission-focused, enabling customers to see more, decide faster, and act with confidence across complex domains. We deliver situational awareness and information advantage across C5ISR, fusing cyber, geospatial, and kinetic data into an operational picture. This transforms fragmented data into a shared understanding of the battlespace, delivering actionable insight when it matters most. Our solutions are secure-by-design, built from the ground up to protect sensitive information and ensure resilience in contested environments. We simplify complex heterogeneous data sources into a rich picture of fused intelligence, providing a digital backbone that underpins success. We operate as small teams delivering fast value, blending deep technical expertise with agile methods to deliver measurable outcomes. Every engagement focuses on capability, clarity, and confidence for those who protect the nation. A career with Envitia enables you to work with mission-driven innovation. We're thrilled to announce that Envitia has been named one of the Sunday Times Top 100 Medium Sized Companies to Work For in 2025, a prestigious recognition that highlights our commitment to creating an outstanding workplace where innovation, collaboration, and personal growth thrive. The Role As a DevOps / Platform Engineer, you will design, build and operate secure cloud and platform capabilities within classified government environments. Working across AWS and cloud-native technologies, you will play a key role in delivering reliable, scalable and secure platforms that underpin mission-critical digital services. You'll operate within multi-disciplinary Agile teams, collaborating closely with software engineers, test engineers, architects, SRE specialists and mission stakeholders to ensure robust, production-grade delivery. 
This role is suited to someone who thrives in complex, secure environments and enjoys working across infrastructure automation, CI/CD and live service support, with a strong focus on reliability and continuous improvement. Typical responsibilities include: Building and maintaining secure DevSecOps pipelines to support continuous delivery Developing Infrastructure as Code and automated platform capabilities Supporting AWS based environments, including Kubernetes, OpenShift, EKS and ECS Implementing observability, monitoring, logging and alerting for live services Supporting SRE practices, cloud migration activities and production platform operations Job Responsibilities Design, implement and maintain secure CI/CD pipelines to support efficient, automated software delivery Develop and manage cloud native infrastructure using AWS and Infrastructure as Code (e.g. Terraform, CloudFormation, Ansible) Support containerised platforms and environments, including Kubernetes, OpenShift, EKS and ECS Embed security controls, quality gates and compliance checks into DevSecOps workflows Implement observability, monitoring, logging and alerting to support reliable live services Contribute to SRE practices, improving service reliability, performance and operational resilience Automate build, deployment and platform processes to reduce manual overhead and improve release confidence Collaborate within Agile teams, supporting delivery, troubleshooting and continuous improvement Provide technical guidance on platform engineering, cloud and deployment best practices Support production environments, including incident resolution, environment management and release readiness Skills Required Experience working within the UK Security & Intelligence, defence or other secure government environments Strong knowledge of cloud, platform engineering and DevSecOps practices, with a focus on secure, automated delivery Experience with AWS and container platforms such as Kubernetes, OpenShift, EKS or 
ECS Proficiency in Infrastructure as Code and automation tooling (e.g. Terraform, CloudFormation, Ansible or similar) Experience building and maintaining CI/CD pipelines, including automated build, test, deployment and release processes Familiarity with security integration, quality gates and compliance controls within delivery pipelines Experience implementing observability practices, including monitoring, logging, alerting and service metrics Understanding of SRE principles, including reliability engineering, incident management and continuous improvement Awareness of secure deployment, compliance requirements and operational security best practices Security Clearance Requirements The successful candidate must hold a current high level security clearance. Location It is anticipated the role will require up to 80% (4d/wk) onsite working at client locations - Cheltenham, London, Manchester (based on project requirements). What it's like to work at Envitia At Envitia, we believe that our greatest asset is our talented and dedicated team. We are committed to fostering a work environment where every employee feels valued, supported, and motivated to excel. As part of this commitment, we offer a comprehensive range of benefits designed to enhance both your professional and personal well being. Annual Leave: 25 days plus your birthday off. You will have the ability to buy and sell 5 days holiday to work around your needs. Private Healthcare Coverage: Our health plan is tailored to meet the diverse needs of our employees with additional levels for family if required. Training & Skills Development: Stay ahead in your career with ongoing training opportunities and skill development initiatives tailored to your evolving needs. Fitness Reimbursement: We encourage an active lifestyle. Our fitness reimbursement program helps you stay fit by covering a portion of your gym memberships or fitness related expenses. 
Life Assurance: Gain peace of mind with extensive life insurance coverage that ensures financial protection for you and your loved ones. Pension Contribution: Plan for your future with our pension options. We provide resources and support to help you build a secure financial foundation. Perkbox Subscription: Enjoy exclusive discounts on a variety of products and services. From technology to entertainment, we've partnered with various businesses to bring you special perks. Internal Reward Schemes: Be rewarded for your exceptional contributions through our employee recognition initiatives that celebrate your achievements. Community Engagement & Volunteer Opportunities: Contribute to meaningful causes with company-sponsored volunteer programs, fostering a sense of community and social responsibility. Inclusion at Envitia At Envitia, we celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants from all backgrounds and walks of life. We believe that our strength lies in our differences, and we are dedicated to fostering a workplace where everyone feels valued, respected, and empowered. We encourage applications from people of all abilities, ages, genders, sexual orientations, races, ethnicities, and religions. We strive to support a culture of inclusion, accessibility, and work-life balance. If you require any accommodations during the application or interview process, please let us know.
13/06/2026
Full time
Senior Enterprise Architect-Enterprise Architect/Segment lead-UK
Infosys Limited
Role - Senior Enterprise Architect Technology - Enterprise Architect/Segment lead Location - UK/Europe Business Unit - STG Compensation - Competitive (including bonus) Job Description AI-First Solutioning, Human + Agent Ways of Working & Large-Scale Modernisation Your role This is a senior strategic role within the Enterprise Strategic Architecture practice, focused on defining and delivering next generation digital transformation programs for leading global organisations. The successful candidate will bring together deep technology expertise and strong business acumen to help clients navigate complex, large scale modernisation initiatives. As AI becomes central to how enterprises transform, this role is expanding in scope: the architect must be equally comfortable designing cloud native platforms, structuring human and agent collaborative workflows, and embedding AI driven capabilities as first class components of the overall solution. You will collaborate closely with sales and delivery teams across the full programme lifecycle - from shaping solutions during presales through to governing technical quality in delivery. You will engage with CDOs, CTOs, and senior digital leaders at client organisations, contribute to industry thinking through published viewpoints and speaking engagements, and play an active role in identifying emerging technology opportunities that can be developed into compelling propositions for the market. Responsibilities Strategic Thinking - Candidate can articulate where AI agents replace human tasks vs. augment them in a $10M+ transformation context. Can draw a human+agent operating model for a business process - showing handoff logic, oversight points, and accountability chains. Understands that LLM inference is now a line item in programme budgets and can estimate it at ROM level for a given use case volume. Design Depth - Has personally designed or reviewed an agentic system in production - e.g. 
a multi step reasoning pipeline, an autonomous code review agent, or a RAG powered enterprise knowledge layer. Can explain prompt architecture decisions (system prompt structuring, context compression strategies, few shot vs. zero shot trade offs) and how these affect both quality and cost. Understands model selection trade offs - when to use frontier models vs. fine tuned smaller models vs. cached completions. Token Optimization Fluency - Has operationalised token efficiency at scale - structured prompt libraries, semantic caching, chunk sizing for RAG pipelines, output length controls, batching strategies. Can model cost per transaction for an AI enabled workflow and present that as part of a business case. Understands how token spend interacts with context window limits across model families (GPT 4o, Claude, Gemini) and can make architecture trade offs accordingly. Must Have Skills Agentic architecture design - Multi agent orchestration, tool use design, human in the loop checkpoints, agent failure modes and recovery Human + agent workflow design - Task decomposition across human and AI agents; escalation paths; accountability mapping in regulated environments Expertise in leveraging coding agents - GitHub Copilot, Claude, Devin.ai and similar - to accelerate software delivery within a structured, governed engineering lifecycle Full stack application development - Architecture and delivery of modern full stack applications; proficiency across frontend frameworks, API layers, backend services, and data tiers at enterprise scale Modern CI/CD & delivery pipelines - Design and governance of automated delivery pipelines using tools such as Harness, GitHub Actions, ArgoCD and Tekton; trunk based development, progressive delivery and release automation High scalability integration - 
Architecting event driven and streaming integration at scale using Apache Kafka and Kafka Streams; asynchronous messaging patterns, schema registries, and real time data pipelines across distributed systems NoSQL & enterprise data platforms - Design of polyglot persistence architectures spanning NoSQL stores (MongoDB, Cassandra, DynamoDB), enterprise caching layers (Redis, Hazelcast, Memcached) and search platforms (Elasticsearch, OpenSearch) Hyperscaler resilience patterns - Building highly available, fault tolerant solutions on AWS, Azure and GCP - multi region active/active, chaos engineering, SRE practices, availability zone failover, and disaster recovery at cloud scale Token economics & LLM costing - Prompt compression, context window sizing, model tier selection, cost per transaction modelling at enterprise scale AI TCO & commercial modelling - Inference cost projections, build vs buy for foundation models, ROI framing for AI augmented delivery Digital transformation leadership - AI native programme design spanning cloud, integration, agentic capability layers and responsible AI governance Enterprise integration patterns - Streaming, API, event driven and real time patterns extending to RAG, vector stores, embedding services and LLM APIs as first class integration nodes Chief Architect leadership - Governing cross domain architect teams while managing AI risk, hallucination mitigation and responsible AI policy at programme level Multi cloud architecture (15+ yrs) - Hybrid IaaS/PaaS, multi az/region, IaC automation first, DevSecOps, K8s orchestration Influencing & stakeholder leadership - Builds and sustains networks across organisational boundaries through credibility and influence rather than authority; aligns diverse stakeholders - engineering, business, and executive - around a shared technology direction and drives teams to deliver outcomes in complex, matrixed environments CXO communication - Articulates at the right level of abstraction and detail from 
developer to board level Preferred Should be an excellent planner when it comes to release planning and other delivery planning. Should have excellent problem solving skills. Responsible for coaching and mentoring team members. BFSI/FS domain experience. Personal High analytical skills. High customer orientation. High quality awareness. Equal Opportunity Employer All aspects of employment at Infosys are based on merit, competence and performance. We are committed to embracing diversity and creating an inclusive environment for all employees. Infosys is proud to be an equal opportunity employer.
13/06/2026
Full time
Site Reliability Engineer
Thought Machine
Thought Machine's mission is bold - to properly and permanently rid the world's banks of legacy technology. To achieve this, we have developed the foundations of modern banking through core and payments technology which run natively in the cloud. What we are attempting is hard and means we need great people working together to build great technology. We have grown rapidly in the past few years - growing our team to more than 550 individuals across offices in London, New York, Singapore and Sydney. We have raised more than $500m in funding and are now valued at $2.7bn. Our investors include Molten Ventures, Eurazeo, Intesa Sanpaolo, Temasek, Nyca Partners, JPMorgan Chase Strategic Investments, Standard Chartered Ventures, and more. We have created a culture that enables our team to produce the best work in the industry while ensuring we have fun along the way. We're regularly cited as having a fantastic workplace culture and have been recognised by Sifted magazine as having one of the highest Glassdoor ratings for a UK fintech company and the industry's most generous employee share package. Named one of the world's most innovative fintechs by Global Finance Magazine, we were also recognised by the Financial Times as one of Europe's fastest-growing companies for two consecutive years, and as a UK Best Employer for 2026. Thought Machine's Site Reliability Engineers are the guardians of mission-critical systems for the world's most influential financial institutions. As a member of our elite, globally distributed team, you'll be entrusted with running and maintaining the robust production infrastructure that powers our customers' cutting-edge Core Banking and Payments platforms. This is an opportunity to make a tangible impact on the global financial landscape while collaborating with brilliant minds to solve complex engineering challenges. This role will be part of the Site Reliability Engineering team at Thought Machine HQ in London. 
The team is deeply involved in tackling the technical challenges of executing Thought Machine's growth ambitions - expect to be working with senior stakeholders in the organisation, our customers, and working on programmes and initiatives that are critical to the success of the company. As an SRE at Thought Machine, you will be responsible for: Supporting the product engineering teams in building highly fault-tolerant, scalable applications by participating in design discussions, engaging in RFCs and code reviews. Contributing to the execution of department strategies such as implementing disaster recovery, backup, redundancy, and capacity planning activities. Participating in a global on-call rotation responsible for identifying and fixing bottlenecks in SaaS customer environments. Regular maintenance of production systems that host Vault products. Contributing to the evolution of our SaaS products by building features that foster exceptional reliability and an unparalleled user experience. Implementing and testing DR strategies to ensure the highest level of resilience and fault tolerance of the platform. Maintaining high-quality written documentation of assets, processes and runbooks that are used by the team in their day-to-day operations. Collaborating effectively with team members, actively participating in knowledge sharing, and continuously growing your own technical understanding of Vault Products. What we're looking for: You have experience successfully delivering engineering tasks and projects with a focus on reliability and scalability. You possess a good understanding of design patterns relevant to hosting and networking architectures. You proactively champion product development, driven by a desire to build truly exceptional products, not just solve immediate challenges. You have a strong background working in either Python, Golang or Java, having used one of these programming languages to build production level software. 
You have experience working with Kubernetes or other container orchestration systems. You have experience with automation/configuration management, e.g. Terraform, Puppet, Chef, Ansible. You have a good understanding of one or more of the following areas: Database Administration, Networking, Observability Tools (such as Prometheus, Jaeger) or automation infrastructure. You have solid experience working with either GCP or AWS. Benefits: Highly competitive salary Pension plan (match up to 5%) Life insurance - three times annual salary Competitive maternity (six months fully paid) and paternity leave (four weeks fully paid) Shared parental leave (matched to our maternity leave for the same point in time) 25 days holiday and bank holidays Flexible working hours Cycle-to-work scheme Electric car scheme Season ticket loan Access to outstanding learning materials and courses Sports and hobby clubs, subsidised by Thought Machine All the latest tech you need Start the day properly with fresh fruit and cereals Huge range of healthy (and not-so-healthy) snacks, smoothies and drinks A talented and experienced team as your colleagues An environment where we encourage learning and progress Two charity days a year Weekly food pop-up We actively hire candidates who demonstrate technical excellence in their field and welcome people of all ages and backgrounds, providing everyone with equal access to professional development. You are encouraged to apply even if your experience doesn't accurately match the job description. We also encourage applications from those with different abilities, including candidates with ADHD, autism, dyslexia or dyspraxia.
13/06/2026
Full time
Senior Engineer- Platform Engineering
Janus Henderson U.S.
Requisition ID 31464 - Posted - London - Janus Henderson A career at Janus Henderson is more than a job, it's about investing in a brighter future together. Our Mission at Janus Henderson is to help clients define and achieve superior financial outcomes through differentiated insights, disciplined investments, and world class service. We will do this by protecting and growing our core business, amplifying our strengths and diversifying where we have the right. Our Values are key to driving our success, and are at the heart of everything we do: Clients Come First - Always Execution Supersedes Intention Together We Win Diversity Improves Results Truth Builds Trust If our mission, values, and purpose align with your own, we would love to hear from you! Your opportunity As a Senior Engineer within Platform Engineering, you will lead the design, build, and evolution of our Internal Developer Platform (IDP), enabling consistent, secure, and scalable software delivery across the enterprise. This role combines DevOps engineering, platform architecture, and developer experience enablement, with a strong focus on CI/CD transformation (Azure DevOps to GitHub), platform tooling, and data platform integration (Snowflake, Databricks). You will act as a subject matter expert (SME) across DevOps tooling, automation, and platform reliability, driving best practices, standardisation, and self service capabilities for engineering teams. Design, build, and evolve enterprise platform services to support the Internal Developer Platform (IDP) and enable scalable, secure, and self service engineering environments. Lead DevOps transformation initiatives, including migration from Azure DevOps to GitHub, and implement standardised CI/CD pipelines, reusable workflows, and release automation frameworks. Develop and maintain Infrastructure as Code (IaC) solutions using Terraform, Bicep, or similar tools to provision and manage cloud infrastructure. 
Deliver and optimise cloud-native platforms on Azure (primary), ensuring scalability, resilience, and cost efficiency. Act as SME across DevOps tooling, including GitHub (Actions, Advanced Security), Nexus (artifact management), and Veracode (application security), embedding security controls into pipelines and platform services. Enable and support DevOps practices for core data platforms, including Snowflake and Databricks, covering environment provisioning, CI/CD integration, and access control models. Implement observability frameworks, including monitoring, logging, and alerting, and contribute to SRE practices such as SLIs/SLOs, reliability engineering, and incident management. Embed security and compliance standards into all platform components, ensuring auditability, policy enforcement, and alignment with enterprise governance requirements. Drive developer experience improvements through platform standardisation, self-service tooling, templates, and AI-enabled capabilities (e.g., Copilot, intelligent automation). Collaborate with Architecture, Cloud COE, SRE, and engineering teams to deliver consistent and governed platform capabilities across the organisation. Mentor junior engineers and contribute to technical leadership, standards definition, and engineering best practices.
What to expect when you join our firm
  • Hybrid working and reasonable accommodations
  • Excellent Health and Wellbeing benefits including corporate membership to Wellhub
  • Paid volunteer time to step away from your desk and into the community
  • Support to grow through professional development courses, tuition/qualification reimbursement and more
  • Maternal/paternal leave benefits and family services
  • All employee events including networking opportunities and social activities
  • Lunch allowance for use within our subsidised onsite canteen
Must-have skills
  • Bachelor's or master's in computer science, Engineering, or related field
  • 6+ years of experience in platform engineering, DevOps, or infrastructure roles
  • Strong experience with cloud platforms (Azure preferred)
  • Proficiency in containerisation (Docker, Kubernetes)
  • Hands-on with CI/CD tools (GitHub, Azure DevOps, GitLab CI)
  • Experience with IaC tools (Terraform, Pulumi, Ansible)
  • Strong experience in DevOps, Platform Engineering, or Infrastructure Engineering roles within enterprise environments
  • Proven expertise in CI/CD pipeline design, automation, and standardisation using GitHub (Actions, Advanced Security) and Azure DevOps, including migration from ADO to GitHub
  • Deep hands-on experience with Infrastructure as Code (Terraform, Bicep or equivalent) and automated cloud provisioning
  • Strong knowledge of Azure cloud platform, including compute, networking, identity, and security services
  • Experience implementing DevSecOps practices, including integration of SAST/DAST tools (e.g., Veracode), secrets management, and secure pipeline execution
  • Expertise in artifact management (e.g., Nexus) and modern DevOps tooling ecosystems
  • Experience enabling Internal Developer Platform (IDP) capabilities, including self-service provisioning, reusable templates, and platform standardisation
  • Solid understanding of software development lifecycle (SDLC), release engineering, and environment lifecycle management
  • Experience working with data platforms (Snowflake and/or Databricks), including CI/CD integration, environment provisioning, and access control models
  • Strong knowledge of containerisation and cloud-native technologies (Docker, Kubernetes)
  • Experience with observability and monitoring frameworks (e.g., Azure Monitor, Prometheus, Grafana) and understanding of SRE practices (SLIs/SLOs, reliability engineering)
  • Strong scripting/programming skills (Python, PowerShell, Bash) and automation mindset
  • Good understanding of security, networking, RBAC, and Zero Trust principles in cloud and DevOps environments
  • Exposure to AI-enabled developer tooling (e.g., GitHub Copilot, intelligent automation) and improving developer experience
  • Experience operating in regulated, enterprise-scale environments with strong focus on governance, auditability, and compliance
  • Strong communication, collaboration, and stakeholder management skills, with ability to act as a hands-on SME and technical leader
Nice-to-have skills
  • Certifications in cloud technologies or Kubernetes
  • Experience building or contributing to an Internal Developer Platform (IDP)
  • Familiarity with service mesh, API gateways, and platform observability tools
  • Knowledge of FinOps, cost optimisation, and cloud governance
  • Solid programming skills (Python, Go, or Java)
  • Strong understanding of networking, security, and system architecture
  • Exposure to AI-enabled development (e.g., GitHub Copilot, automation workflows)
  • Relevant cloud or Kubernetes certifications
Supervisory responsibilities: No
Potential for growth
  • Regular training
  • Continuing education courses
  • Cross-functional collaboration
You will be expected to understand the regulatory obligations of the firm and abide by the regulated entity requirements and JHI policies applicable for your role. At Janus Henderson Investors we're committed to an inclusive and supportive environment.
We believe diversity improves results and we welcome applications from candidates from all backgrounds. Don't worry if you don't think you tick every box; we still want to hear from you! We understand everyone has different commitments and while we can't accommodate every flexible working request, we're happy to be asked about work flexibility and our hybrid working environment. If you need any reasonable accommodations during our recruitment process, please get in touch and let us know at . Annual Bonus Opportunity: Position may be eligible to receive an annual discretionary bonus award from the profit pool. The profit pool is funded based on Company profits. Individual bonuses are determined based on Company, department, team and individual performance. Benefits: Janus Henderson is committed to offering a comprehensive total rewards package to eligible employees that includes competitive compensation, pension/retirement plans, and various health, wellbeing and lifestyle benefits. To learn more about our offerings, please visit the Why Join Us section on the career page here. Janus Henderson Investors is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. All applications are subject to background checks. Janus Henderson (including its subsidiaries) will not maintain existing or sponsor new industry registrations or licenses where not supported by an employee's job functions (as determined by Janus Henderson at its sole discretion). You should be willing to adhere to the provisions of our Investment Advisory Code of Ethics related to personal securities activities and other disclosure and certification requirements, including past political contributions and political activities. Click apply for full job details.
12/06/2026
Full time
Oscar Technology
AWS Platform Architect
Oscar Technology
Platform Architect - Hybrid - £70,000-£100,000. About the Role: We're partnering with a growing SaaS business to hire a senior Platform Architect to own the design, security, reliability, and operational management of their AWS platform and internal IT function. This is a hands-on leadership role in a lean organisation where you'll shape cloud architecture, modernise a legacy platform into a cloud-native environment, and provide senior oversight across platform engineering, security, SRE, CI/CD, and operational IT. Key Responsibilities: Own the AWS platform architecture and modernisation roadmap, including migration from a Java monolith to microservices on EKS. Define standards for containers, runtime environments, observability, tenancy, security, and infrastructure automation. Lead SRE practices including SLIs/SLOs, incident management, DR/BCP planning, post-mortems, and operational resilience. Own platform security, secure SDLC, CI/CD pipelines, IaC, and software supply chain governance. Drive developer productivity through automation, self-service tooling, and platform standardisation. Provide senior oversight of IT operations including service desk governance, endpoint management, onboarding/offboarding, patching, ITAM, and MSP/vendor management. Act as a senior escalation point for critical incidents, outages, and operational issues. About You: Experience in platform, infrastructure, or software engineering within SaaS environments. Strong AWS expertise including EKS, IAM, networking, KMS, RDS, and multi-account architecture. Hands-on Kubernetes, CI/CD, Terraform, and cloud security experience. Strong understanding of SRE, observability, incident response, and disaster recovery. Experience operating within regulated environments such as ISO 27001, SOC 2, or GxP. Comfortable balancing strategic leadership with hands-on operational delivery. AWS Solutions Architect - Professional certification required. CKA or CKS certification highly desirable.
Oscar Associates (UK) Limited is acting as an Employment Agency in relation to this vacancy. To understand more about what we do with your data, please review our privacy policy in the privacy section of the Oscar website.
11/06/2026
Full time
La Fosse Associates
Senior AWS Cloud Platform Engineer / DevOps Engineer (SC or DV Cleared)
La Fosse Associates
Senior AWS Cloud Platform Engineer / DevOps Engineer (SC or DV Cleared). Location: Hybrid UK. Security Clearance: Active SC or DV Clearance required. Contract: Initial 6 months with extension potential. Day Rate: Competitive (Inside IR35). Overview: We are supporting a major national technology programme seeking an experienced AWS Cloud Platform Engineer to join a modern engineering team delivering cloud-native applications, platform services and automation capability within a secure environment. This role sits within a highly collaborative engineering function working alongside Software Engineers, Platform Engineers and Automation Specialists. The successful candidate will help improve software delivery velocity, deployment automation, platform reliability and engineering quality across AWS-hosted services and cloud-native platforms. This is not a traditional testing role. We are looking for an engineer with strong AWS, DevOps and platform engineering capability who understands modern automation, release engineering and cloud-native delivery practices. Key Responsibilities: Design, build and maintain AWS cloud infrastructure and platform services. Develop and enhance Infrastructure as Code solutions using Terraform. Support containerised workloads and Kubernetes-based platforms. Build and optimise CI/CD pipelines to improve deployment speed and reliability. Work closely with software engineering teams to improve delivery workflows and release processes. Implement automation solutions that improve platform resilience, operational efficiency and deployment quality. Support cloud-native application delivery across microservices and distributed systems architectures. Contribute to DevOps best practice, platform engineering standards and automation strategies. Assist with cloud integration, deployment validation and release assurance activities. Essential Skills & Experience. AWS Cloud Engineering: Strong hands-on AWS experience across modern cloud-native environments.
Experience working with AWS services such as Lambda, ECS/EKS, S3, API Gateway, CloudWatch, SNS and SQS. AWS certification strongly preferred (Solutions Architect, DevOps Engineer, SysOps or Developer). Infrastructure as Code: Strong Terraform experience. Experience building and maintaining cloud infrastructure through Infrastructure as Code. Exposure to CloudFormation or AWS CDK beneficial. DevOps & CI/CD: Experience designing and maintaining CI/CD pipelines. GitLab CI, GitHub Actions, Jenkins, Azure DevOps or similar tooling. Strong understanding of deployment automation and release engineering practices. Containers & Platform Engineering: Kubernetes experience (EKS preferred). Docker and containerised application delivery. Experience supporting cloud-native platforms and microservices environments. Development & Automation: Python experience preferred. Experience building automation tooling and scripts. Understanding of REST APIs, integration patterns and distributed systems. Desirable Experience: AWS DevOps Engineer Professional certification. AWS Solutions Architect certification. Experience working within Government, Defence, Law Enforcement or other highly regulated environments. Experience supporting platform engineering or SRE functions. Experience integrating automated validation and quality controls into CI/CD pipelines. Exposure to modern AI, machine learning or cloud-hosted AI services. Ideal Backgrounds: We would particularly like to hear from candidates currently working as: AWS DevOps Engineer, Cloud Platform Engineer, Platform Engineer, Site Reliability Engineer (SRE), Cloud Infrastructure Engineer, AWS Cloud Engineer, DevOps Platform Engineer, Kubernetes Engineer, Cloud Automation Engineer. What Success Looks Like: The successful candidate will help drive cloud-native engineering excellence, improving deployment automation, platform reliability and software delivery performance across a modern AWS environment.
They will be comfortable operating across platform engineering, DevOps, cloud infrastructure and automation disciplines, working closely with software engineering teams to accelerate delivery and improve operational resilience.
09/06/2026
Full time
The MRJ Group
Platform Engineering Lead
The MRJ Group City, Liverpool
We are seeking an experienced Platform Engineering Lead (contract) to support the evolution of our cloud and application delivery strategy as we modernise our deployment, integration, and operational capabilities. This role will focus on building scalable, secure, and automated platform services that improve developer productivity, deployment consistency, operational resilience, and software delivery across engineering teams. The successful candidate will play a key role in transitioning from traditional infrastructure deployments toward modern cloud-native and platform engineering practices. Security will be a core component of the role, ensuring that platform services, deployment pipelines, and engineering standards are designed with secure-by-default principles. The role will help embed DevSecOps practices across the software delivery lifecycle, ensuring security controls are automated, repeatable, and integrated into engineering workflows rather than applied retrospectively. The role will also contribute to the development of platform security standards and recommendations, supporting secure application delivery, cloud governance, identity management, secrets handling, and compliance requirements across engineering environments. In addition, the role will support secure collaboration models for internal engineering teams, third-party development partners, and external data providers, ensuring appropriate identity management, access controls, environment segregation, and secure integration practices are implemented across platform services and delivery pipelines. Key Responsibilities: Design and maintain CI/CD pipelines using Azure DevOps, GitHub Actions, or similar tooling. Implement automated multi-stage deployment pipelines across development, test, UAT, and production environments. Support blue/green and phased deployment strategies. Develop Infrastructure as Code solutions using Terraform and/or Bicep.
Build reusable infrastructure templates and standardised deployment patterns. Support cloud-native services and event-driven architectures using Azure technologies. Implement security controls within CI/CD pipelines including code scanning, dependency validation, secrets detection, and policy enforcement. Support secure identity, access control, and secrets management practices across cloud platforms and deployment pipelines. Support secure collaboration and integration with internal teams, third-party development partners, and external data providers. Contribute to platform security recommendations, engineering governance, and secure deployment standards. Support security testing and validation activities within pre-production and UAT environments. Contribute to platform architecture standards, engineering governance, and future technology strategy. Implement monitoring, logging, alerting, and operational resilience practices aligned to SRE principles. Support operational stability, incident management, and platform optimisation initiatives. Technical Skills & Experience. Essential: Azure cloud platform experience; CI/CD pipeline engineering (Azure DevOps, GitHub Actions); Infrastructure as Code (Terraform and/or Bicep); automation and scripting (PowerShell, Bash, Python); experience with Azure Service Bus and serverless technologies; security and DevSecOps practices; experience implementing security controls within CI/CD pipelines; identity and access management experience; monitoring, logging, and operational support experience; experience working within cloud-native or platform engineering environments. Desirable: GitOps practices; SRE / reliability engineering experience; experience defining platform standards and reference architectures; knowledge of security tooling including SAST/DAST, vulnerability scanning, and policy-as-code frameworks.
09/06/2026
Full time
Site Reliability Engineer (Applications)
H&R Talent
An amazing global investment client of ours, located in Central London, is looking for a Site Reliability Engineer to join their team on a permanent basis. This is a rare opportunity and the package offered for this role is up to £300k depending on skills and experience.

About the Company
The company is a leading provider of alternative investment solutions with approximately $63 billion of assets under management ("AUM") and over 550 employees worldwide including London, New York, Singapore and Hong Kong. One of their founding beliefs is that technology and data are at the core of the business, allowing them to build and maintain cutting-edge hardware and software solutions. The technology team is lean and has a culture that encourages interaction across all areas of the business on a global scale. Their aim is to use the best tool for the job, so there is the opportunity to be constantly learning and to use modern technologies. Their teams strive to push boundaries and think innovatively, creating an environment that is fast-paced, dynamic and successful.

About the Role
They are looking for an enthusiastic Site Reliability Engineer to join the SRE team in London. Their team is central to the business as they are responsible for the technology that underpins everything they do, so you will have a direct impact on the success of the company. From scaling for the huge volumes of data that drive their research process, to improving the reliability and speed of a rapidly evolving application estate, there is always a relentless focus on automation and efficiency at scale. The company's engineers own their varied technology stack, end-to-end, and are in constant search of incremental improvements, new technologies and ways of working to evolve their platform and give them a competitive edge. They are looking for people who want to find unique solutions for optimising efficiency and performance in a context where they are key enablers. The ideal candidate will be passionate about improving reliability and removing toil by identifying opportunities for automation and building platforms to make the systems more "reliable by default".

Responsibilities
  • Evangelise the SRE mindset and implement best practices across the environment
  • Understand the business and find ways to measure and enhance resilience across the application estate
  • Eliminate the toil that emerges with complex, distributed systems by automating where possible
  • Work both as an individual contributor and collaboratively to find new ways of improving the reliability, availability, security and performance of the infrastructure
  • Accelerate the migration strategy to more cloud-native, distributed applications
  • Improve productivity and developer experience through automation and interface improvements in local tool chains, IDEs, CI/CD

Requirements
  • Expert-level scripting / coding skills in one or more languages (Python / Golang etc.)
  • Expert knowledge of observability systems (Prometheus / ELK / Jaeger / OpenTelemetry / Service Meshes etc.)
  • Experience with configuration management tools (Ansible / Puppet / Kapitan / Terraform)
  • Experience with distributed data platforms (Kafka / Flink / Airflow)
  • Comfortable using cloud-native and containerisation technologies (Kubernetes / Docker)
  • Good Linux systems knowledge (experience with RHEL desirable)
  • Broad knowledge across network technologies, server virtualisation and storage
  • Self-starter, able to quickly pick up concepts, implement new ideas and think outside the box
  • Focused on improving system reliability, availability, security, and performance through testing, automation, and standardisation
  • Ability to simply articulate the "why" behind best practices
  • Ability to build positive and collaborative relationships with colleagues across teams and geographies

Benefits
  • Food & Beverage: Complimentary breakfast and lunch for all employees plus on-site coffee bars and a wide variety of healthy snacks.
  • Annual Discretionary Bonuses: Reflecting firm and individual performance.
  • Cycle to Work Initiative: Green loan scheme which employees are able to use for the purchase of bicycles.
  • Employee Referral Programme: Bonus for each successful hire in the month your referral joins the company.
  • Global Office Design: They aim to create a cohesive environment, regardless of region. They've designed office spaces to ensure everyone feels the connection no matter where you're located.
  • Pension Scheme: Generous pension and retirement savings plans.
  • Carbon Offset Programme: The company offsets its CO2 emissions annually and aims to sustainably source all office materials.
  • Physical and Mental Fitness: Health and wellness benefits include an onsite gym & classes (LDN and NYC), gym subsidies in other regions, access to mental health support, and subscriptions to mindfulness platforms.
  • Charity Donation Matching: Generous charity matching scheme and ample opportunities to become involved in the community. They offer charity of the year awards in each region and encourage employees to submit causes they're passionate about.
  • Enhanced Caregiver Leave: Enhanced, flexible primary and secondary caregiver leave.
  • Sabbatical: Generous sabbatical after you've been with the company for 8 years and every 4 years after that.
  • Annual Training Allowance: Encourage personal and professional development. This allowance may be used towards conferences, seminars, and training courses which supplement extensive on-site training materials.
  • Health and Life Insurance: Range of healthcare benefits to help you manage your personal, physical and emotional wellbeing.
09/06/2026
Full time
SRE Engineer: Cloud-Native, Automation & Resilience
H&R Talent
H&R Talent is seeking a Site Reliability Engineer to join their team in Central London. This permanent role offers a competitive salary up to £300k, depending on skills and experience. The candidate will contribute to the technology underpinning the business and improve system reliability while working in a dynamic and innovative environment. Responsibilities include automating processes, enhancing application resilience, and collaborating with a global team. Ideal candidates will have expertise in scripting, observability, and cloud-native technologies.
09/06/2026
Full time
AI Engineer
Lorien
Hybrid Working - London - 2 days a week on site.

Lorien's leading banking client is looking for an exceptional AI Engineer with strong experience in Python, SQL, cloud-based AI/ML ecosystems, and AWS SageMaker. This role will build the pipelines, services, and monitoring capabilities that underpin AI observability and governance across the bank. This is a hands-on, high-impact role at the intersection of AI governance, distributed systems, observability, and platform engineering. You will develop core components of the platform, contribute to its evolution, and ensure our AI systems are measurable, transparent, and well controlled from model training through to production.

The Ideal Candidate will have:
  • Strong engineering foundations, with experience building scalable distributed systems or data platforms.
  • Proficiency in Python, SQL, Java, and modern data processing frameworks.
  • Experience working with cloud-based AI/ML ecosystems, particularly AWS SageMaker (required).

This role is based in London. This role will be via Umbrella, working in a hybrid model of 2 days a week on site.

What You'll Do
  • Contribute to the development of data pipelines, APIs, and services that power the AI Control Tower.
  • Implement components supporting AI observability, guardrails, performance monitoring, and lifecycle controls.
  • Develop integrations with model registries, feature stores, lineage tools, and governance systems.
  • Write clean, well-tested, scalable code in Python, Java, SQL, and modern data/stream processing frameworks.
  • Build high-throughput pipelines to capture metrics such as:
    • Model performance, drift, and degradation
    • Operational and service health
    • Security posture and policy adherence
    • Guardrail compliance for ML and GenAI systems
    • Governance and risk indicators
  • Implement observability tooling using logging, metrics, tracing, and event-driven patterns.
  • Support monitoring and measurement of AI systems across development, deployment, and runtime environments.
  • Work closely with data engineering, platform engineering, security, MLOps, and Independent Model Monitoring (IMM) teams.
  • Contribute to integration efforts with AWS SageMaker, model pipelines, and enterprise data platforms.
  • Use technologies such as AWS, SageMaker, Python, Java, Kafka, OpenTelemetry, and cloud-native monitoring stacks.
  • Support governance and reporting workflows with automated checks, standardised metrics, and platform tooling.

  • Understanding of monitoring frameworks, observability pipelines, and dashboards.
  • Familiarity with event-driven architectures and messaging systems (Kafka, Vert.x, or similar).
  • Knowledge of security engineering, IAM principles, encryption, and cloud security controls.
  • Experience with CI/CD, infrastructure-as-code, and automated testing for data/ML systems.

Helpful Experience
  • Exposure to MLOps, LLMOps, or model lifecycle management.
  • Awareness of model risk and regulatory frameworks (e.g., SS1/23, NIST AI Risk Management Framework).
  • Understanding of operational resilience concepts and SRE practices (SLIs/SLOs).
  • Experience with data lineage or governance tooling (DataHub, Glue, Collibra).
  • Interest in Responsible AI, explainability, fairness/bias, and governance automation.

Guidant, Carbon60, Lorien & SRG - The Impellam Group Portfolio are acting as an Employment Business in relation to this vacancy.
09/06/2026
Full time
Sr. Observability Engineer
Dormont Manufacturing Co
Music is Universal

It's the passionate and dedicated team at Universal Music who help make us the world's leading music company. From A&R to finance, legal to digital, sales to marketing, Universal Music is the place to grow and develop your career within a truly commercial and innovative business that leads in everything it does.

Everyone is welcome to apply for our roles, and we are determined to ensure that no applicant or employee receives less favourable treatment because of gender, race, disability, sexual orientation, religion, belief, age, marital status, background, pregnancy, or caring responsibilities. We also recognise the importance of diversity of thought within our teams and are fully committed to embracing the talents of people with autism, dyslexia, ADHD, and other forms of neurocognitive variation. We will always seek to make appropriate adjustments to recruitment, workplaces, and work processes to be fully inclusive to people with different needs and working styles. If you need us to make any reasonable adjustments for you from application onwards, including alternatives to the online form or to disclose a neurocognitive condition, please email .

Job Summary
We are UMG, the Universal Music Group. We are the world's leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.

As a Senior Observability Engineer, you will be a driving force for technical excellence and strategic vision within our global team. You will be instrumental in architecting, building, and leading our comprehensive observability strategy to ensure the reliability, performance, and scalability of our critical IT systems. This senior role demands a passion for data-driven strategy, a commitment to automation, and the ability to mentor and lead. You will not only solve complex technical challenges but also influence the direction of observability practices across UMG globally, ensuring our technology landscape is as world-class as our music.

Job Functions
  • Architecture & Strategy: Lead the architectural design and strategic roadmap for our observability stack. Drive the vision for world-class monitoring, logging, tracing, and alerting solutions across our hybrid and cloud-native environments.
  • Innovate & Automate: Spearhead the evaluation, selection, and implementation of cutting-edge observability tools and platforms (e.g., Dynatrace, OpenTelemetry, Prometheus, Grafana). Architect and build robust, automated observability pipelines. Take an active part in documenting and defining processes and best practice.
  • Optimize & Analyze: Conduct deep-dive analysis of telemetry data to proactively identify performance bottlenecks, optimize resource utilization, and guide capacity planning.
  • Lead & Mentor: Act as a technical leader and mentor for the observability team and wider engineering groups. Champion and enforce best practices, fostering a culture of proactive and data-informed decision making.
  • Drive Incident & Problem Management: Work with Operations teams on high-priority incident resolution efforts, utilizing deep analysis of telemetry data for swift root cause identification. Drive post-incident reviews and implement long-term solutions to enhance system resilience.
  • Collaborate & Influence: Partner with Development, SRE, and Infrastructure leaders to embed observability into the entire technology lifecycle. Influence and drive the adoption of observability best practices across the global organization. Champion the use of observability in the global UMG environment.
  • Make UMG the place to be: Mentor, manage and genuinely lead the Observability team in a way that attracts and retains the best talent. UMG is a place where everyone can bring themselves fully to work and thrive; as a leader, you are a key part of this.

Job Requirements

Essential Qualifications
  • Experience: 5-7+ years of hands-on experience in an Observability, Site Reliability Engineering (SRE), or DevOps role, with a proven track record of leading complex projects.
  • Technical Leadership: Demonstrated experience in architecting and designing large-scale monitoring and observability solutions.
  • Expert Level Tooling: Deep expertise with modern observability platforms (e.g., Dynatrace, AWS CloudWatch, Prometheus, Grafana, ELK Stack, Splunk, OpenTelemetry).
  • Cloud & Infrastructure: Advanced knowledge of major cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible).
  • Programming & Automation: Strong programming and scripting skills (e.g., Python, Go, Shell) with a focus on creating scalable automation and custom tooling.
  • Problem Solving: Exceptional analytical and strategic problem-solving skills, with the ability to lead through complex technical challenges.
  • Data Analysis: Expertise in analysing and visualising telemetry data into meaningful information to drive actions.
  • Hands on: Demonstrable hands-on engineering and coding experience, with the ability to deep dive into existing and emerging technologies to identify opportunities and solutions.
  • Containerization and Orchestration: Understanding of container technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes) to monitor and manage containerized applications.
  • Networking Knowledge: Understanding of networking principles and protocols to effectively monitor and troubleshoot network-related issues.
  • Security Awareness: Awareness of security best practices and the ability to integrate security monitoring into observability processes.
  • Communication & Influence: Excellent communication and interpersonal skills, capable of articulating a technical vision to diverse audiences and influencing senior stakeholders. Ability to collaborate with cross-functional teams, convey findings, and discuss improvements with developers and operations teams.
  • Continuous Learning: Given the dynamic nature of technology, a commitment to continuous learning and staying updated on the latest trends in observability and monitoring.
  • Self-motivated with a high degree of initiative and excellent follow-up skills, along with strong analytical and problem-solving skills.
  • Travel may be required but is not part of the regular work schedule.
  • Bachelor's degree in a technology-related field as well as 5+ years of relevant experience within the Observability field.

Desired Qualifications
  • Advanced Concepts: Proven experience with Chaos Engineering, AI-driven analytics, defining SLOs/SLIs, and advanced deployment strategies (Canary/Blue-Green).
  • Software Engineering Foundation: Strong background in software engineering principles, database administration, and distributed systems architecture.
  • Certifications: Relevant senior-level industry certifications (e.g., AWS Certified DevOps Engineer - Professional, Certified Kubernetes Administrator).

Just So You Know
The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change and the jobholder's specific responsibilities and activities will vary and develop. Therefore, the job description should be seen as indicative and not as a permanent, definitive, and exhaustive statement.

Job Category: Universal Music Group
08/06/2026
Full time
Senior Site Reliability Engineer
iManage City, Belfast
Senior Site Reliability Engineer - iManage SRE is part of a global organization that leverages the latest technology to communicate with our colleagues across the globe. We organize ourselves into distributed teams - SRE teams are anchored to iManage offices across the globe. Tuesdays and Thursdays are dedicated to in office collaboration, rapid innovation, and developing a sense of belonging at iManage. Mondays and Fridays are reserved for focus time to get things done. Have the best of both work styles in a workplace that is intentional about belonging, collaboration, and accomplishment. Being a Senior Site Reliability Engineer at iManage means You are an engineer, a builder, and a systems thinker. You'll create middleware and platform guardrails that empower developers to innovate quickly and reliably. You combine deep technical judgment with empathy to eliminate customer pain, especially when working with enthusiastic teams stewarding the world's most privileged data. You uplift those around you, act as a subject matter expert, mentor others, and drive change. You chase contributing factors over root causes, value code over documentation, and documentation over process. You'll engage in and often lead architectural discussions, reduce toil, and deliver scalable, resilient platforms that support our customers and organization. As a Senior SRE, you'll help scale our cloud platform, collaborate across teams to promote standardization and resiliency, and participate in on call rotations. You'll become a key voice in observability, change management, and service scalability, providing guidance during complex technical decisions and high impact events. iManage is experiencing explosive growth in its flagship cloud product. We're seeking senior software and systems engineers specializing in reliability and platform services to join our transformative cloud journey. 
This requires rethinking technical decisions with a beginner's mindset and a focus on resilience and sustainability. If you write code, think in systems, embrace complexity and automation, and are passionate about service resilience and scalability - we want to talk to you. SRE Responsibilities Eliminate toil through automation and software development. Partner cross functionally with application teams and internal stakeholders. Create a modern, cloud native platform that is resilient, cost effective, and secure by default. Scale cloud infrastructure to support our Kubernetes based ecosystem. Maintain the freshness and utility of platform services. Improve the security posture of our products. Design automation, orchestration, observability, and disaster readiness into our products. Participate in production support and on call rotations, providing senior level guidance during critical events. Lead incident management and post incident retrospectives, coaching teams in these practices. Qualifications Experience writing design documents, postmortems, and refactoring application code. Built automation to reduce operational burden or developed internal SaaS tools. Ability to advocate for SRE principles (e.g., SLOs vs SLAs) and introduce them effectively. Experience in public cloud or hosted datacenter environments (Azure and AKS preferred). A passion for collaborative teamwork and influencing reliability best practices across teams. Bonus Points Hands on experience with Linux server stacks (Ubuntu/Debian preferred). Knowledge of cloud provisioning platforms (Terraform preferred). Exposure to configuration management tools (Chef preferred). Experience with containerization/clustering technologies (Docker preferred). Familiarity with observability and alerting tools (Prometheus/Grafana or ELK/EFK). Practical experience with CI/CD pipelines and rollout strategies. A bachelor's degree (or equivalent experience) in Computer Engineering or related field. 
Proficiency in one or more programming languages (e.g., Java, Python, Golang). Familiarity with scripting languages (e.g., PowerShell, Bash, Python, Ruby). Benefits Creating an inclusive environment where you're encouraged to help shape the culture. Market leading salary determined through a fair and consistent process, equitable for all employees. Annual performance based bonus. Enhanced parental leave (20 weeks for primary and 10 weeks for secondary caregiver at 100% pay). Matching pension contribution (up to 6%). Private medical insurance and cash plan. Group life cover, income protection, and critical illness protection. Flexible time off policy, 25 days of annual leave with additional flexibility. Wellness days each year to prioritize mental health and well being. Access to RethinkCare, a global behavioral health platform. We welcome those who come with a growth mindset and a hunger for learning; if you are excited about this role but your past experience doesn't align perfectly with every qualification, we encourage you to apply anyway. iManage is committed to providing an excellent candidate experience and will never ask you to engage in recruitment activity via text and exclusively communicate from emails using domain. If you have any concerns or questions about communications you have received, please send them to so our team members can review. iManage provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
08/06/2026
Full time
Senior Site Reliability Engineer
iManage
Senior Site Reliability Engineer - iManage SRE is part of a global organization that leverages the latest technology to communicate with our colleagues across the globe. We organize ourselves into distributed teams - SRE teams are anchored to iManage offices across the globe. Tuesdays and Thursdays are dedicated to in office collaboration, rapid innovation, and developing a sense of belonging at iManage. Mondays and Fridays are reserved for focus time to get things done. Have the best of both work styles in a workplace that is intentional about belonging, collaboration, and accomplishment. Being a Senior Site Reliability Engineer at iManage means You are an engineer, a builder, and a systems thinker. You'll create middleware and platform guardrails that empower developers to innovate quickly and reliably. You combine deep technical judgment with empathy to eliminate customer pain, especially when working with enthusiastic teams stewarding the world's most privileged data. You uplift those around you, act as a subject matter expert, mentor others, and drive change. You chase contributing factors over root causes, value code over documentation, and documentation over process. You'll engage in and often lead architectural discussions, reduce toil, and deliver scalable, resilient platforms that support our customers and organization. As a Senior SRE, you'll help scale our cloud platform, collaborate across teams to promote standardization and resiliency, and participate in on call rotations. You'll become a key voice in observability, change management, and service scalability, providing guidance during complex technical decisions and high impact events. iManage is experiencing explosive growth in its flagship cloud product. We're seeking senior software and systems engineers specializing in reliability and platform services to join our transformative cloud journey. 
This requires rethinking technical decisions with a beginner's mindset and a focus on resilience and sustainability. If you write code, think in systems, embrace complexity and automation, and are passionate about service resilience and scalability - we want to talk to you. SRE Responsibilities Eliminate toil through automation and software development. Partner cross functionally with application teams and internal stakeholders. Create a modern, cloud native platform that is resilient, cost effective, and secure by default. Scale cloud infrastructure to support our Kubernetes based ecosystem. Maintain the freshness and utility of platform services. Improve the security posture of our products. Design automation, orchestration, observability, and disaster readiness into our products. Participate in production support and on call rotations, providing senior level guidance during critical events. Lead incident management and post incident retrospectives, coaching teams in these practices. Qualifications Experience writing design documents, postmortems, and refactoring application code. Built automation to reduce operational burden or developed internal SaaS tools. Ability to advocate for SRE principles (e.g., SLOs vs SLAs) and introduce them effectively. Experience in public cloud or hosted datacenter environments (Azure and AKS preferred). A passion for collaborative teamwork and influencing reliability best practices across teams. Bonus Points Hands on experience with Linux server stacks (Ubuntu/Debian preferred). Knowledge of cloud provisioning platforms (Terraform preferred). Exposure to configuration management tools (Chef preferred). Experience with containerization/clustering technologies (Docker preferred). Familiarity with observability and alerting tools (Prometheus/Grafana or ELK/EFK). Practical experience with CI/CD pipelines and rollout strategies. A bachelor's degree (or equivalent experience) in Computer Engineering or related field. 
Proficiency in one or more programming languages (e.g., Java, Python, Golang). Familiarity with scripting languages (e.g., PowerShell, Bash, Python, Ruby). Benefits Creating an inclusive environment where you're encouraged to help shape the culture. Market leading salary determined through a fair and consistent process, equitable for all employees. Annual performance based bonus. Enhanced parental leave (20 weeks for primary and 10 weeks for secondary caregiver at 100% pay). Matching pension contribution (up to 6%). Private medical insurance and cash plan. Group life cover, income protection, and critical illness protection. Flexible time off policy, 25 days of annual leave with additional flexibility. Wellness days each year to prioritize mental health and well being. Access to RethinkCare, a global behavioral health platform. We welcome those who come with a growth mindset and a hunger for learning; if you are excited about this role but your past experience doesn't align perfectly with every qualification, we encourage you to apply anyway. iManage is committed to providing an excellent candidate experience and will never ask you to engage in recruitment activity via text and exclusively communicate from emails using domain. If you have any concerns or questions about communications you have received, please send them to so our team members can review. iManage provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
08/06/2026
Full time
Cloud Operations Manager Engineering Cheltenham
Finova Technologies Private Limited Cheltenham, Gloucestershire
Cloud Operations Manager - Cheltenham (3 days office based) About Finova Finova is the UK's largest financial services technology provider, supporting one in every five mortgages nationwide. Our agile, cloud-native solutions enable over 60 banks, building societies, specialist lenders, equity release providers and a network of 2,400+ brokers to stay ahead in a competitive market. Built on open architecture and backed by deep industry expertise, our platform is designed to scale. Each year, we process over £50 billion in loans, manage nearly £50 billion in savings, and support the digital servicing of more than 650,000 UK borrower accounts. Be part of a team that's driving innovation, enabling growth and shaping the future of UK lending. For Lenders Finova offers a flexible, modular technology suite designed to help lenders move faster, scale efficiently and deliver standout digital experiences. Financial Institutions use Finova to launch products faster, process applications up to 50% more efficiently and reduce operational costs - all while staying fully compliant in a fast-moving market. About the Role: As the Cloud Operations Manager, you will be responsible for the day-to-day operational health, stability, and continuous improvement of the cloud-hosted environments for multiple clients within our product suites. You will bridge the gap between high-level cloud strategy and technical execution, ensuring that our production environments are resilient, secure, and performant. Acting as the primary authority for operational delivery within your remit, you will lead a team of Cloud Operations Engineers. You will maintain high service availability while driving modern practices such as automation, toil reduction, and proactive resilience testing. A core focus of this role is leadership: providing daily mentorship, administrative oversight, and clear career pathways for your team. 
You will hold responsibility for operational gatekeeping and readiness, directly influencing disaster recovery drills, security remediation, and the seamless transition of new features into stable production services. About you: Technical Foundation: Deep experience in managing high-availability production environments (AWS or Azure). You are a technically grounded professional with a passion for operational stability. Leadership Aptitude: Proven ability to mentor junior peers, lead technical projects, or act as a 'sounding board' for complex engineering challenges. (This role suits an experienced manager or a Senior Cloud Engineer ready for management). Resilience Mindset: A strong drive to replace manual bottlenecks with scalable, code-based solutions and an understanding of SRE principles. Event Coordination: Exceptional organizational skills with the confidence to facilitate cross-team events like disaster recovery simulations, pen tests, and post-incident retrospectives. Process & Compliance: Familiarity with ITIL and working in ISO-aligned environments. Detail-oriented approach to assessing vulnerabilities and maintaining industry standards. Communication Skills: Ability to translate technical metrics into clear availability and performance updates for engineering teams, senior leadership, external clients, and third-party partners. Collaborative Approach: Eager to partner with the Head of Cloud Ops, Platform teams, and Finance to build a high-performing, cost-effective engineering culture. What will you be doing? Operational Excellence & Service Management: Manage day-to-day capacity, priorities, and workloads to consistently meet availability and performance SLAs. Embed ITIL-aligned service management practices (incident, problem, and change processes) to drive continuous improvement. Team Leadership & Development: Own the capability and growth of your team through formal line management, coaching, and career development. 
Maintain a high-quality technical wiki and skills matrix to eliminate silos. Automation & Toil Reduction: Eliminate operational waste by identifying repetitive manual tasks and directing Senior Engineers to prioritize automation and self-healing infrastructure. Operational Gatekeeping: Conduct readiness checks on new releases to ensure they meet strict standards for monitoring, documentation, and long-term supportability. Resilience & Security: Facilitate regular Game Days, disaster recovery drills, and coordinate third-party penetration testing and remediation. Lead blameless post-mortems after significant incidents. Escalation & On-Call Management: Act as a senior escalation point for platform/application issues. Manage (and participate in/provide cover for) the out-of-hours on-call rotation for P1/P2 incidents and deployments. Compliance & Audit: Feed into the departmental risk register and coordinate evidence, controls, and documentation required for regulatory and customer audits (e.g., ISO 27001/ISO 9001). FinOps & Stakeholder Collaboration: Act as the primary point of contact for internal/external queries. Collaborate with FinOps/Finance to improve cost transparency, tagging discipline, and resource optimization. What We Offer: Hybrid working We operate on a hybrid model that is primarily office based, requiring three days in the office each week, with the flexibility to work remotely for the remainder. Private medical insurance Comprehensive health cover, with the option to add your family to your plan, because your well-being matters to us. Life assurance & income protection We provide life assurance and income protection to give you peace of mind for the future. Family friendly policies Our enhanced family-friendly policy goes beyond maternity and paternity leave, offering paid time off for when plans change or alternative paths to parenthood are needed. Work from anywhere Some thrive in the office, others at home - and many do best with choice. 
With approval, Finova employees can work abroad for up to 4 weeks each year. Flexible holiday package Enjoy 25 days paid holiday allowance, plus all public holidays. You can also rebook any public holiday for a day that aligns with your personal beliefs or celebration calendar. We also offer holiday trading, allowing you to purchase or sell your holiday allowance. Company pension scheme With salary exchange, you save on tax and can build a secure future. Employee assistance programme We understand that mental health is just as important as physical health. Access to a 24/7 confidential counselling helpline ensures you have support when you need it. Electric car scheme Get a brand-new electric vehicle with salary sacrifice as a benefit, paid for through your gross monthly pay, saving on Income Tax and National Insurance. Health cash plan Our Health Cash Plan empowers you to prioritise your wellbeing by providing effortless reimbursement for everyday healthcare costs, from dental and optical visits to physiotherapy. Gym discounts Achieve your fitness goals for less with GymFlex, which offers significant savings on annual memberships at over 3,000 gyms and leisure centres nationwide. Perks that matter We fuel your day with a fully stocked pantry of fresh fruit and snacks and keep the team spirit high with weekly socials and events. Equal Opportunity Statement We value diversity and are committed to creating an inclusive environment for all employees. If you're passionate about this role but don't meet all the criteria, please reach out; we'd love to discuss how your skills and experiences align with our needs.
07/06/2026
Full time
Senior Site Reliability Engineer (SRE)
The Investigo Group
Role: Senior Site Reliability Engineer (SRE) - Kubernetes / OpenShift Location: Remote - UK (possible paid occasional travel to TIG Secure site locations as required) Job Type: Full-time, Permanent (37.5 hours) Salary: Competitive + benefits + package Security Clearance Requirements Please note that holding a current Security Clearance is not essential at the time of application, but eligibility is required. This role requires the successful candidate to be eligible for Security Check (SC) clearance. To meet this requirement, applicants must: Have the right to work in the UK Have lived in the UK continuously for the past 5 years Not have spent more than 6 months outside the UK in total during that period Be willing to undergo security vetting as part of the onboarding process About You You're an experienced SRE, Platform Engineer or Cloud Engineer with strong hands on experience running Kubernetes in production environments. You're comfortable working across Linux, Kubernetes, cloud native tooling, automation, observability, CI/CD and infrastructure as code. You understand that reliability, security and operational maturity are critical to how modern platforms support engineering teams and customer facing services. You enjoy treating infrastructure as a product, automating repeatable work, improving resilience, and building platforms that other engineers can rely on. You're calm under pressure, methodical during incidents, and able to turn operational challenges into long term improvements. You may have worked in a regulated, secure, government, defence, financial services, telecoms, managed services or cloud native environment, but most importantly you have operated Kubernetes at depth and understand the realities of production ownership. You're a senior individual contributor who can mentor others, influence engineering practice, and provide technical authority without needing formal line management responsibility. 
About the Role We're looking for a Senior Site Reliability Engineer (SRE) to help operate, harden and mature our production OKD / Kubernetes platforms. This is a hands on engineering role focused on reliability, automation, observability, GitOps, CI/CD and secure platform operations. You'll work across the full stack, from bare metal and virtualisation through to Kubernetes control plane operations, ingress, identity, monitoring, developer platform tooling and application delivery. The role will play a key part in improving the operational maturity of our platform estate, supporting the migration from VMware to KVM, strengthening GitOps and CI/CD practices, and helping ensure our platforms remain secure, scalable and aligned to the needs of regulated customer environments. You'll work closely with platform, application, AI, networking, security, QA and architecture teams to build reliable foundations that enable other engineering teams to deliver safely and at pace. This is not a ticket handling role. It is a senior engineering position where you'll be expected to own problems, drive improvements, and help shape how TIG operates critical cloud native infrastructure. About the Team You'll be joining our Cloud team, working closely with Platform Engineering and wider engineering teams responsible for the foundational platforms on which TIG's services run. This is a great opportunity to join a small, senior technical environment where you can have direct ownership, meaningful influence, and visibility across modern platform engineering, Kubernetes, automation, observability, security and cloud native delivery. Key Responsibilities Operate, harden and extend production OpenShift / OKD / Kubernetes clusters across on premises and hybrid environments. Support the migration from VMware to KVM, helping modernise the underlying compute and storage layer. Own and improve CI/CD processes across the full lifecycle of platform and application components. 
Work with platform and application engineers to support cloud native delivery using tools such as Helm and Kustomize. Develop and mature GitOps deployment practices using tools such as Argo CD or Flux. Maintain and improve core platform services including identity, ingress, observability, certificate management, service mesh and container registry capabilities. Build and operate observability across logs, metrics, traces, alerting, SLOs and error budgets. Improve platform hardening in line with secure and regulated environment requirements, including network policy, SELinux, image provenance, secret management and audit. Automate repeatable operational tasks using tools such as Ansible, Terraform, Helm, Kustomize, Go, Python or equivalent technologies. Lead incident response activity, support blameless post mortems and drive systemic fixes. Partner with networking and security teams on platform integration, segmentation, load balancing and accreditation evidence. Create and maintain clear technical documentation, runbooks, design notes and operational guidance. Mentor other engineers and act as a senior technical authority across cloud and Kubernetes operations. Participate in an on call rota, with appropriate compensation. Success in This Role Looks Like A more reliable, secure and measurable production Kubernetes estate. Improved platform observability, with meaningful alerting, SLOs and trend data that engineering teams actively use. Progress against the VMware to KVM migration, with a clear and automated path for the underlying infrastructure layer. A mature GitOps approach covering platform and application components, including rollback, drift detection and operational control. Improved CI/CD practices that help teams move at pace while considering security, QA and compliance earlier in the lifecycle. Well documented, supportable and scalable platform services. 
Stronger incident response, clearer runbooks and post mortems that lead to real operational improvements. Recognition as a technical authority for Kubernetes, cloud and platform operations across the organisation. What We're Looking For We're looking for a Senior Site Reliability Engineer (SRE) with strong experience operating production Kubernetes environments. This role is well suited to someone who combines deep technical capability with strong operational discipline. You'll be comfortable taking ownership of complex platform challenges, improving reliability, and working collaboratively across engineering, security, networking and architecture teams. Essential Experience & Skills Strong experience running production Kubernetes environments, not just consuming or deploying into them. Strong Linux fundamentals, including systemd, networking, storage and performance troubleshooting. Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS or GKE. Solid infrastructure as code experience, including Ansible plus Terraform or equivalent, alongside tools such as Helm and Kustomize. GitOps and CI/CD experience managing full application and component lifecycles, using tools such as Argo CD, Flux, GitHub Actions or similar. Experience with observability tooling such as Prometheus, Grafana, Elastic Stack / LGTM, OpenTelemetry or similar. Experience working with identity and access technologies such as OIDC, SAML, SCIM or Keycloak. Experience with virtualisation or infrastructure platforms such as KVM, libvirt or VMware. Scripting or tooling experience using Go, Python, shell scripting or similar. Strong troubleshooting, problem solving and analytical skills. Experience working in secure, regulated or enterprise scale environments. Strong communication skills, with the ability to produce clear documentation, runbooks, post mortems and technical guidance. Eligible to hold UK SC clearance. 
Desirable (Not Essential) Specific OpenShift or OKD experience, including operators, MachineConfig or SCCs. Service mesh experience such as Istio or Linkerd. Policy engine experience such as OPA, Gatekeeper or Kyverno. Cloud native application deployment experience using Helm, Terraform, Kustomize or similar. Storage experience such as Ceph, Longhorn, OpenShift Data Foundation or equivalent. Networking experience including BGP, VXLAN, Palo Alto or Juniper technologies. Software supply chain security experience, including SBOMs, image signing, admission control or tools such as Sigstore. Experience operating AI, ML or GPU enabled platforms. CKA, CKAD, CKS, Red Hat certifications or equivalent. Active or recent UK SC clearance. Recognised open source contributions to the Kubernetes ecosystem. Soft Skills & Behaviours Calm, structured and methodical under pressure. Strong written and verbal communication skills. Collaborative working style across platform, development, QA, security, networking and architecture teams. Strong sense of ownership and accountability. Automation first mindset, with a focus on removing repeatable manual work. Able to influence technical practice through evidence, example and credibility. Pragmatic and solutions focused approach to problem solving. Curious about why systems fail, not just how to bring them back online.
07/06/2026
Full time


© 2008-2026 IT Job Board