Browse IT Jobs | IT Job Board

Deloitte - Recruitment City, Belfast

Belfast, United Kingdom. Posted on 11/05/2026

Contract Location: Belfast
Contract Duration: 12 months initially (with potential extension up to 2 years)
Contract Start Date: Immediate
Contract Classification: Inside IR35

Role Overview
This is an opportunity to join a large Tier 1 financial services organization. The team is dedicated to evolving the application and data hosting environments into an integrated, hybrid multi-cloud ecosystem, primarily leveraging AWS and Google Cloud. We focus on building, automating, and optimizing the foundational cloud infrastructure and services that empower developers to innovate rapidly and securely. In this role, you will design, implement, and maintain scalable, resilient, and secure cloud platforms and solutions, embodying a "you build it, you run it" philosophy to ensure operational excellence.

Key Responsibilities
Cloud Platform Engineering: Design, implement, and manage core cloud infrastructure and services on AWS and GCP, focusing on compute, storage, networking, security, and identity management.
Infrastructure as Code (IaC): Develop, maintain, and optimize cloud infrastructure using IaC tools such as Terraform, CloudFormation (AWS), and Deployment Manager (GCP). Create reusable modules and blueprints to standardize deployments.
Automation & Orchestration: Automate the provisioning, configuration, and management of cloud resources and services. Develop scripts and tools (e.g., Python, Go, Bash) to streamline operational tasks and improve efficiency.
Containerization & Orchestration: Implement and manage containerization technologies (Docker) and orchestration platforms (Kubernetes, e.g., Amazon EKS, Google Kubernetes Engine) to support cloud-native application deployments.
Site Reliability Engineering (SRE): Embrace a "you build it, you run it" mindset. Take ownership of the reliability, performance, and availability of the cloud platforms and services you build. Implement robust monitoring, alerting, and logging solutions, and participate in on-call rotations to ensure rapid incident response and resolution.
Security & Compliance: Implement and enforce cloud security best practices, ensuring compliance with policies, industry standards, and regulatory requirements. Configure IAM roles, network security groups, firewalls, and encryption.
CI/CD Pipeline Integration: Collaborate with development teams to integrate cloud infrastructure provisioning and management into CI/CD pipelines, enabling automated and continuous delivery.
Documentation & Knowledge Sharing: Create and maintain comprehensive documentation for cloud architectures, configurations, and operational procedures. Mentor junior engineers and contribute to knowledge sharing within the team.

Required Qualifications & Skills
Proven experience as a Cloud Engineer, DevOps Engineer, or SRE in a mid-level or senior capacity.
Strong hands-on experience with public cloud platforms, specifically AWS and/or Google Cloud (GCP).
Infrastructure as Code (IaC) Hands-on Expertise: Demonstrable experience with the following:
Programming Languages: Python and Go.
Solid understanding of, and experience with, containerization technologies (Docker) and orchestration platforms (Kubernetes).
Experience implementing and managing CI/CD pipelines.
Strong understanding of networking concepts (VPC, subnets, routing, firewalls), security best practices, and IAM in a cloud environment.
Experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, Splunk, ELK stack, CloudWatch, Stackdriver).
A strong commitment to Site Reliability Engineering (SRE) principles and practices, including operational ownership.

Preferred Qualifications & Skills
Public cloud provider certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer, Google Cloud Professional Cloud Architect, Google Cloud Professional DevOps Engineer).
Experience with serverless computing (e.g., AWS Lambda, Google Cloud Functions).
Familiarity with configuration management tools such as Ansible.
Experience with cloud database services (relational and NoSQL).
Knowledge of disaster recovery and business continuity strategies in a multi-cloud environment.
Experience with GitOps principles and tools.

12/06/2026

Full time

Senior Software Engineer / SRE - Application Middleware

London, GBR. Posted today

Job Search Place Limited

Senior Software Engineer / SRE - Application Middleware
Location: London
Business Area: Engineering and CTO

Description & Requirements
Are you passionate about building high-performance systems that are fast, resilient, and operate at global scale? Join Bloomberg's Application Middleware SRE team, where you'll combine software engineering and systems expertise to keep the backbone of the Bloomberg Terminal running smoothly for hundreds of thousands of users around the world.

We're not your typical SRE team. We're embedded in a group that powers real-time connectivity, and we own systems where uptime isn't just important; it's essential to the global financial system. This is your opportunity to engineer resilience at scale, automate critical infrastructure, and shape reliability practices across one of the world's most powerful tech platforms.

The Team
We're the Site Reliability Engineering team within Bloomberg's Application Middleware group. Our mission: ensure that Bloomberg's core connectivity and messaging layers are resilient, scalable, and fully observable. We own systems that operate at high throughput and low latency, including:
Gateways: Secure, high-performance TCP/SSL entry points to our data centers
HFN & NSTP: A global HTTP CDN and SOCKS5 proxy network delivering fast access from any geography
Playlist Services: Dynamic path configuration systems optimizing user connectivity in real time
PGM Relays: Infrastructure for reliable multicast data delivery

We use automation, observability, and software engineering to detect issues before they impact customers and to reduce manual toil wherever we can.
What You'll Do
Build production-grade software that powers Bloomberg's global infrastructure
Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation
Collaborate across engineering teams to introduce automated, self-service operational workflows
Conduct deep systems analysis and root-cause investigations for complex distributed systems
Propose and prototype innovative approaches to reliability and risk mitigation
Contribute to design docs, runbooks, and post-incident reviews; clear communication is part of the job

You'll Need to Have
A degree in Computer Science, Engineering, Mathematics, or equivalent practical experience
Strong software engineering skills in any high-level language (we mainly use Python and C++)
A deep understanding of software system reliability and risk management, including how to identify potential points of failure and design mitigation strategies
A good understanding of data structures, algorithms, and system design
Experience navigating and improving large, distributed codebases
An ability to identify system risks and engineer around points of failure
Clear written and verbal communication, including technical documentation and incident analysis

We'd Love to See
We are building a team with a breadth of expertise and value depth in any of the following areas:
Systems Knowledge: A strong grasp of operating systems, fundamental networking protocols (TCP, UDP, multicast), or core database concepts as they apply to modern infrastructure.
Cluster Management: Experience with deployments, staging, and configuration management. Direct experience with Argo and/or Kubernetes or other pipeline management platforms is a significant advantage.
Machine Management at Scale: Experience with capacity planning and automating the lifecycle of large machine fleets.
System Observability and Monitoring: Deep understanding of SLIs/SLOs/SLAs, alerting, and building dashboards for complex systems.
Reliability in Distributed Systems: Knowledge of fault tolerance and the unique challenges of network and node failure in distributed environments.
Mentoring: Proven experience mentoring and growing junior engineers.

Discover what makes Bloomberg unique - watch our for an inside look at our culture, values, and the people behind our success.

Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.

Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email

12/06/2026

Full time

Senior Engineer - Platform Engineering

Janus Henderson U.S.

Requisition ID 31464 - Posted - London - Janus Henderson

A career at Janus Henderson is more than a job; it's about investing in a brighter future together.

Our Mission at Janus Henderson is to help clients define and achieve superior financial outcomes through differentiated insights, disciplined investments, and world-class service. We will do this by protecting and growing our core business, amplifying our strengths and diversifying where we have the right.

Our Values are key to driving our success, and are at the heart of everything we do:
Clients Come First - Always
Execution Supersedes Intention
Together We Win
Diversity Improves Results
Truth Builds Trust

If our mission, values, and purpose align with your own, we would love to hear from you!

Your opportunity
As a Senior Engineer within Platform Engineering, you will lead the design, build, and evolution of our Internal Developer Platform (IDP), enabling consistent, secure, and scalable software delivery across the enterprise. This role combines DevOps engineering, platform architecture, and developer experience enablement, with a strong focus on CI/CD transformation (Azure DevOps to GitHub), platform tooling, and data platform integration (Snowflake, Databricks). You will act as a subject matter expert (SME) across DevOps tooling, automation, and platform reliability, driving best practices, standardisation, and self-service capabilities for engineering teams.

Design, build, and evolve enterprise platform services to support the Internal Developer Platform (IDP) and enable scalable, secure, and self-service engineering environments.
Lead DevOps transformation initiatives, including migration from Azure DevOps to GitHub, and implement standardised CI/CD pipelines, reusable workflows, and release automation frameworks.
Develop and maintain Infrastructure as Code (IaC) solutions using Terraform, Bicep, or similar tools to provision and manage cloud infrastructure.
Deliver and optimise cloud-native platforms on Azure (primary), ensuring scalability, resilience, and cost efficiency.
Act as SME across DevOps tooling, including GitHub (Actions, Advanced Security), Nexus (artifact management), and Veracode (application security), embedding security controls into pipelines and platform services.
Enable and support DevOps practices for core data platforms, including Snowflake and Databricks, covering environment provisioning, CI/CD integration, and access control models.
Implement observability frameworks, including monitoring, logging, and alerting, and contribute to SRE practices such as SLIs/SLOs, reliability engineering, and incident management.
Embed security and compliance standards into all platform components, ensuring auditability, policy enforcement, and alignment with enterprise governance requirements.
Drive developer experience improvements through platform standardisation, self-service tooling, templates, and AI-enabled capabilities (e.g., Copilot, intelligent automation).
Collaborate with Architecture, Cloud COE, SRE, and engineering teams to deliver consistent and governed platform capabilities across the organisation.
Mentor junior engineers and contribute to technical leadership, standards definition, and engineering best practices.
What to expect when you join our firm
Hybrid working and reasonable accommodations
Excellent health and wellbeing benefits, including corporate membership to Wellhub
Paid volunteer time to step away from your desk and into the community
Support to grow through professional development courses, tuition/qualification reimbursement and more
Maternal/paternal leave benefits and family services
All-employee events, including networking opportunities and social activities
Lunch allowance for use within our subsidised onsite canteen

Must have skills
Bachelor's or master's degree in Computer Science, Engineering, or a related field
6+ years of experience in platform engineering, DevOps, or infrastructure roles
Strong experience with cloud platforms (Azure preferred)
Proficiency in containerisation (Docker, Kubernetes)
Hands-on experience with CI/CD tools (GitHub, Azure DevOps, GitLab CI)
Experience with IaC tools (Terraform, Pulumi, Ansible)
Strong experience in DevOps, Platform Engineering, or Infrastructure Engineering roles within enterprise environments
Proven expertise in CI/CD pipeline design, automation, and standardisation using GitHub (Actions, Advanced Security) and Azure DevOps, including migration from ADO to GitHub
Deep hands-on experience with Infrastructure as Code (Terraform, Bicep or equivalent) and automated cloud provisioning
Strong knowledge of the Azure cloud platform, including compute, networking, identity, and security services
Experience implementing DevSecOps practices, including integration of SAST/DAST tools (e.g., Veracode), secrets management, and secure pipeline execution
Expertise in artifact management (e.g., Nexus) and modern DevOps tooling ecosystems
Experience enabling Internal Developer Platform (IDP) capabilities, including self-service provisioning, reusable templates, and platform standardisation
Solid understanding of the software development lifecycle (SDLC), release engineering, and environment lifecycle management
Experience working with data platforms (Snowflake and/or Databricks), including CI/CD integration, environment provisioning, and access control models
Strong knowledge of containerisation and cloud-native technologies (Docker, Kubernetes)
Experience with observability and monitoring frameworks (e.g., Azure Monitor, Prometheus, Grafana) and understanding of SRE practices (SLIs/SLOs, reliability engineering)
Strong scripting/programming skills (Python, PowerShell, Bash) and an automation mindset
Good understanding of security, networking, RBAC, and Zero Trust principles in cloud and DevOps environments
Exposure to AI-enabled developer tooling (e.g., GitHub Copilot, intelligent automation) and improving developer experience
Experience operating in regulated, enterprise-scale environments with a strong focus on governance, auditability, and compliance
Strong communication, collaboration, and stakeholder management skills, with the ability to act as a hands-on SME and technical leader

Nice to have skills
Certifications in cloud technologies or Kubernetes
Experience building or contributing to an Internal Developer Platform (IDP)
Familiarity with service mesh, API gateways, and platform observability tools
Knowledge of FinOps, cost optimisation, and cloud governance
Solid programming skills (Python, Go, or Java)
Strong understanding of networking, security, and system architecture
Exposure to AI-enabled development (e.g., GitHub Copilot, automation workflows)

Supervisory responsibilities: No

Potential for growth: Regular training, continuing education courses, cross-functional collaboration

You will be expected to understand the regulatory obligations of the firm and abide by the regulated entity requirements and JHI policies applicable for your role.

At Janus Henderson Investors we're committed to an inclusive and supportive environment. We believe diversity improves results and we welcome applications from candidates from all backgrounds. Don't worry if you don't think you tick every box; we still want to hear from you! We understand everyone has different commitments, and while we can't accommodate every flexible working request, we're happy to be asked about work flexibility and our hybrid working environment. If you need any reasonable accommodations during our recruitment process, please get in touch and let us know at .

Annual Bonus Opportunity: Position may be eligible to receive an annual discretionary bonus award from the profit pool. The profit pool is funded based on Company profits. Individual bonuses are determined based on Company, department, team and individual performance.

Benefits: Janus Henderson is committed to offering a comprehensive total rewards package to eligible employees that includes competitive compensation, pension/retirement plans, and various health, wellbeing and lifestyle benefits. To learn more about our offerings, please visit the Why Join Us section on the careers page.

Janus Henderson Investors is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. All applications are subject to background checks. Janus Henderson (including its subsidiaries) will not maintain existing or sponsor new industry registrations or licenses where not supported by an employee's job functions (as determined by Janus Henderson at its sole discretion). You should be willing to adhere to the provisions of our Investment Advisory Code of Ethics related to personal securities activities and other disclosure and certification requirements, including past political contributions and political activities.

12/06/2026

Full time

Requisition ID31464-Posted -London-Janus Henderson A career at Janus Henderson is more than a job, it's about investing in a brighter future together. Our Mission at Janus Henderson is to help clients define and achieve superior financial outcomes through differentiated insights, disciplined investments, and world class service. We will do this by protecting and growing our core business, amplifying our strengths and diversifying where we have the right. Our Values are key to driving our success, and are at the heart of everything we do: Clients Come First - Always Execution Supersedes Intention Together We Win Diversity Improves Results Truth Builds Trust If our mission, values, and purpose align with your own, we would love to hear from you! Your opportunity As a Senior Engineer within Platform Engineering, you will lead the design, build, and evolution of our Internal Developer Platform (IDP), enabling consistent, secure, and scalable software delivery across the enterprise. This role combines DevOps engineering, platform architecture, and developer experience enablement, with a strong focus on CI/CD transformation (Azure DevOps to GitHub), platform tooling, and data platform integration (Snowflake, Databricks). You will act as a subject matter expert (SME) across DevOps tooling, automation, and platform reliability-driving best practices, standardisation, and self service capabilities for engineering teams. Design, build, and evolve enterprise platform services to support the Internal Developer Platform (IDP) and enable scalable, secure, and self service engineering environments. Lead DevOps transformation initiatives, including migration from Azure DevOps to GitHub, and implement standardised CI/CD pipelines, reusable workflows, and release automation frameworks. Develop and maintain Infrastructure as Code (IaC) solutions using Terraform, Bicep, or similar tools to provision and manage cloud infrastructure. 
Deliver and optimise cloud native platforms on Azure (primary), ensuring scalability, resilience, and cost efficiency. Act as SME across DevOps tooling, including GitHub (Actions, Advanced Security), Nexus (artifact management), and Veracode (application security), embedding security controls into pipelines and platform services. Enable and support DevOps practices for core data platforms, including Snowflake and Databricks, covering environment provisioning, CI/CD integration, and access control models. Implement observability frameworks, including monitoring, logging, and alerting, and contribute to SRE practices such as SLIs/SLOs, reliability engineering, and incident management. Embed security and compliance standards into all platform components, ensuring auditability, policy enforcement, and alignment with enterprise governance requirements. Drive developer experience improvements through platform standardisation, self service tooling, templates, and AI enabled capabilities (e.g., Copilot, intelligent automation). Collaborate with Architecture, Cloud COE, SRE, and engineering teams to deliver consistent and governed platform capabilities across the organisation. Mentor junior engineers and contribute to technical leadership, standards definition, and engineering best practices. 
What to expect when you join our firm
Hybrid working and reasonable accommodations
Excellent health and wellbeing benefits, including corporate membership to Wellhub
Paid volunteer time to step away from your desk and into the community
Support to grow through professional development courses, tuition/qualification reimbursement and more
Maternal/paternal leave benefits and family services
All-employee events, including networking opportunities and social activities
Lunch allowance for use within our subsidised onsite canteen

Must have skills
Bachelor's or master's degree in Computer Science, Engineering, or a related field
6+ years of experience in platform engineering, DevOps, or infrastructure roles
Strong experience with cloud platforms (Azure preferred)
Proficiency in containerisation (Docker, Kubernetes)
Hands-on experience with CI/CD tools (GitHub, Azure DevOps, GitLab CI)
Experience with IaC tools (Terraform, Pulumi, Ansible)
Strong experience in DevOps, Platform Engineering, or Infrastructure Engineering roles within enterprise environments
Proven expertise in CI/CD pipeline design, automation, and standardisation using GitHub (Actions, Advanced Security) and Azure DevOps, including migration from Azure DevOps (ADO) to GitHub
Deep hands-on experience with Infrastructure as Code (Terraform, Bicep or equivalent) and automated cloud provisioning
Strong knowledge of the Azure cloud platform, including compute, networking, identity, and security services
Experience implementing DevSecOps practices, including integration of SAST/DAST tools (e.g., Veracode), secrets management, and secure pipeline execution
Expertise in artifact management (e.g., Nexus) and modern DevOps tooling ecosystems
Experience enabling Internal Developer Platform (IDP) capabilities, including self-service provisioning, reusable templates, and platform standardisation
Solid understanding of the software development lifecycle (SDLC), release engineering, and environment lifecycle management
Experience working with data platforms (Snowflake and/or Databricks), including CI/CD integration, environment provisioning, and access control models
Strong knowledge of containerisation and cloud-native technologies (Docker, Kubernetes)
Experience with observability and monitoring frameworks (e.g., Azure Monitor, Prometheus, Grafana) and understanding of SRE practices (SLIs/SLOs, reliability engineering)
Strong scripting/programming skills (Python, PowerShell, Bash) and an automation mindset
Good understanding of security, networking, RBAC, and Zero Trust principles in cloud and DevOps environments
Exposure to AI-enabled developer tooling (e.g., GitHub Copilot, intelligent automation) and improving developer experience
Experience operating in regulated, enterprise-scale environments with a strong focus on governance, auditability, and compliance
Strong communication, collaboration, and stakeholder management skills, with the ability to act as a hands-on SME and technical leader

Nice to have skills
Certifications in cloud technologies or Kubernetes
Experience building or contributing to an Internal Developer Platform (IDP)
Familiarity with service mesh, API gateways, and platform observability tools
Knowledge of FinOps, cost optimisation, and cloud governance
Solid programming skills (Python, Go, or Java)
Strong understanding of networking, security, and system architecture
Exposure to AI-enabled development (e.g., GitHub Copilot, automation workflows)

Supervisory responsibilities: No

Potential for growth
Regular training
Continuing education courses
Cross-functional collaboration

You will be expected to understand the regulatory obligations of the firm and abide by the regulated entity requirements and JHI policies applicable to your role. At Janus Henderson Investors, we're committed to an inclusive and supportive environment.
We believe diversity improves results, and we welcome applications from candidates from all backgrounds. Don't worry if you don't think you tick every box; we still want to hear from you! We understand everyone has different commitments, and while we can't accommodate every flexible working request, we're happy to be asked about work flexibility and our hybrid working environment. If you need any reasonable accommodations during our recruitment process, please get in touch and let us know at .

Annual Bonus Opportunity: This position may be eligible to receive an annual discretionary bonus award from the profit pool. The profit pool is funded based on Company profits. Individual bonuses are determined based on Company, department, team and individual performance.

Benefits: Janus Henderson is committed to offering eligible employees a comprehensive total rewards package that includes competitive compensation, pension/retirement plans, and various health, wellbeing and lifestyle benefits. To learn more about our offerings, please visit the Why Join Us section on the career page here.

Janus Henderson Investors is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. All applications are subject to background checks. Janus Henderson (including its subsidiaries) will not maintain existing or sponsor new industry registrations or licenses where not supported by an employee's job functions (as determined by Janus Henderson at its sole discretion). You should be willing to adhere to the provisions of our Investment Advisory Code of Ethics related to personal securities activities and other disclosure and certification requirements, including past political contributions and political activities.

Senior Site Reliability Engineer

DWP Digital Blackpool, Lancashire

Site Reliability Engineer
Pay up to £80,664 plus 28.97% employer pension contributions, hybrid working, flexible hours, and a truly great work-life balance.

DWP. Digital with Purpose.

We have a fantastic opportunity to join our community of experts at DWP Digital as a Senior Site Reliability Engineer, within one of our SRE teams at the heart of Digital Transformation. We're using fresh ideas and leading-edge tech to build and maintain digital solutions that will be used by nearly every person in the UK, every day and at key moments in their lives. DWP is the UK's largest government department. We help people into work and make payments worth over £195bn a year to support and empower millions of people. The scale of what we do is extraordinary, and our purpose is unique. We'd love you to join us.

What skills, knowledge and experience will you need?
Demonstrable experience of reliability engineering, including capacity and performance management through monitoring, logging, and alerting.
Demonstrable experience of supporting a Live Service, including live operations, incident management, and continuous improvement.
Demonstrable experience of developing and supporting cloud-based applications in AWS.
Demonstrable experience of building and maintaining CI/CD pipelines.
Demonstrable experience of communicating effectively with stakeholders at multiple levels to provide feedback and support.
Demonstrable experience of using automation to remove toil through scripting and infrastructure and configuration as code.

You and your role
Your day will be all about making sure our applications and infrastructure are reliable, secure and ready for scale. You'll work closely with development teams from the design stage, helping them build systems that follow best practices and meet department standards. You'll lead by example, mentoring other SREs, guiding teams and driving improvements. A big part of your role will be creating and maintaining detailed runbooks so incidents can be resolved quickly and efficiently. You'll also automate repetitive tasks, reduce toil and make sure monitoring is in place so issues are spotted before they become problems. When major incidents happen, you'll take the lead in coordinating the right people and restoring services fast. You'll manage error budgets, review high-priority incidents and push a culture of engineering ownership across the organisation.

Details. Wages. Perks.
Location: You'll join us in one of our brilliant digital hubs in Birmingham, Blackpool, Leeds, Manchester, Newcastle or Sheffield, whichever is most convenient for you.
Hybrid Working: We work a hybrid model: you'll spend some time working at home and some time collaborating face to face in a hub.
Pay: We offer competitive pay of up to £80,664.
Pension: You'll get a brilliant Civil Service pension with employer contributions worth 28.97%, worth over £16,000 per year.
Holidays: A generous leave package starting at 26 days and rising to 31 days over time. You can also take up to 3 extra days off a month on flexi-time. You'll also get all the usual public holidays.

We have a broad benefits package built around your work-life balance, which includes:
Flexible working, including flexible hours and flex-friendly policies
Time off for volunteering and charitable giving
Bring your authentic self to work with 'I Can Be Me in DWP'
Discounts and savings on shopping, fun days out and more
Interest-free loans to buy a bike or a season ticket, so it's even easier for you to get to work and start making a difference
Professional development, coaching, mentoring and career progression opportunities

And we have an award-winning environment and culture:
DWP was recognised as 2024 Diversity Employer of the Year at the Computing Women in Tech Excellence Awards
Diverse and Inclusive Leadership at the Digital Leaders Awards 2024
Commended as Best Place to Work in the Digital category at the Computing Digital Technology Leaders Awards 2025
Recognised as one of the Best Public Sector Employers at the 2025 Women In Tech Employer Awards

Process: We know your time is valuable, so our application and selection process has just two stages:
Apply: complete your application on Civil Service Jobs. There'll be full instructions when you click through.
Interview: a single-stage online interview.
Click Apply for more information and to start your application.

12/06/2026

Full time

DevOps SRE

Test Triangle

Our Cloud Engineering team is seeking a seasoned and passionate Senior Cloud Engineer with deep hands-on development and cloud engineering expertise. In this role, you will serve as a key technical contributor within a cloud-focused engineering team, working on one of the Group's flagship initiatives: delivering a strategic platform on Google Cloud Platform (GCP) that enables the business to realise next-generation services aligned with the Bank's long-term vision.

Key Responsibilities
Architect, implement, and maintain highly resilient, scalable, and secure Kubernetes environments on GCP.
Engineer and optimise Kubernetes infrastructure to support multi-tenant workloads, ensuring robust isolation, resource efficiency, and operational scalability.
Design and enforce strong security controls, including OPA Gatekeeper policies, fine-grained RBAC, mTLS enforcement, and secure service mesh configurations.
Build, maintain, and enhance CI/CD pipelines enabling automated testing, seamless deployments, and continuous integration.
Diagnose and resolve complex system-level issues related to performance, scalability, networking, and automation.
Collaborate with cross-functional teams to deliver cloud-native solutions aligned with engineering best practices and business goals.

Required Skills & Experience
Core Cloud & DevOps Competencies
Extensive experience in DevOps or Site Reliability Engineering (SRE) roles across consumer or SaaS environments.
Strong expertise in deploying and managing production-grade Kubernetes clusters and containerised services.
Hands-on experience with Kubernetes (k8s) and containers in live, high-availability environments.
Proven experience designing and implementing CI/CD pipelines for automated build, test, and deployment workflows.
Proficiency in programming languages such as Python, Go, and Bash for automation and tooling.
Demonstrated ability to take ownership of engineering initiatives and drive them to successful completion.
Strong experience developing and managing Infrastructure as Code (IaC) using Terraform.
Exposure to managing the full product lifecycle of cloud-native core services.
Hands-on experience with GCP infrastructure and services.
Deep understanding of cloud networking concepts such as hybrid connectivity, VPN, NAT, IPAM, DNS, and routing.
Strong knowledge of cloud security, including KMS, PKI, encryption standards, and least-privilege access principles.
Experience with service mesh technologies such as Istio or Anthos for secure service-to-service communication and observability.
Competence in managing Istio telemetry and sidecar injection, and in enforcing mTLS.
Experience with Anthos Config Management, GitOps-driven provisioning, and Backstage GitOps workflows.
Understanding of shared Kubernetes services such as CoreDNS, cert-manager, Dynatrace, Cloudability, and Infoblox.
Familiarity with OPA Gatekeeper for policy enforcement and tenant isolation.

Security, Observability & Performance
Strong security mindset with a proven track record of designing secure, resilient cloud-native systems.
Experience implementing observability stacks, including Prometheus, Dynatrace, and OpenTelemetry.
Deep understanding of Linux internals, system performance tuning, and troubleshooting.
Familiarity with Aqua Security for container runtime protection.

CI/CD & Automation Tooling
Hands-on experience with Harness CI/CD for secure and automated deployment workflows.

Professional Attributes
Excellent verbal, written, and interpersonal communication skills, with the ability to explain complex technical concepts clearly.
Ability to work effectively in fast-paced, dynamic environments and adapt quickly to change.
Strong analytical and problem-solving abilities with a focus on delivering high-quality outcomes.

12/06/2026

Full time

Senior Cloud SRE - Kubernetes, GCP & CI/CD

Test Triangle

A leading technology firm in Greater London is seeking a Senior Cloud Engineer to contribute technical expertise within a cloud engineering team. This role involves architecting scalable Kubernetes environments on Google Cloud Platform (GCP) and ensuring robust security measures. The ideal candidate will have extensive experience in DevOps or Site Reliability Engineering, experience deploying production-grade Kubernetes clusters, proficiency with CI/CD pipelines, and programming skills in Python, Go, and Bash. Join this dynamic team to drive innovative cloud solutions.

12/06/2026

Full time

Staff Software Engineer, AI Reliability Engineering

Menlo Ventures

About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role
Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it.

AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths: every hop from the SDK through our network, API layers, serving infrastructure, and accelerators, and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, whether during an incident or while collaborating on projects. Reliability here is an emergent phenomenon that transcends any single team's boundaries, so someone has to zoom out and look at the whole picture. That's us, and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.

Responsibilities
Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.
Design and implement monitoring and observability systems across the token path.
Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.
Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.
Support the reliability of safeguard model serving, which is critical for both site reliability and Anthropic's safety commitments.

You may be a good fit if you
Have strong distributed systems, infrastructure, or reliability backgrounds: we're looking for reliability-minded software engineers and SREs.
Are curious and brave: comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don't have deep expertise yet.
Think holistically about how systems compose and where the seams are.
Can build lasting relationships across teams: our engagement model depends on being welcomed as teammates, not outsiders with opinions.
Care about users and feel ownership over outcomes, even for systems you don't own.
Have excellent communication and collaboration skills: you'll be partnering across the entire company.
Bring diverse experience: the team's strength comes from people who've built product stacks, scaled databases, run massive distributed systems, and everything in between.

Strong candidates may also
Have been an SRE, Production Engineer, or in a similar reliability-focused role on large-scale systems.
Have experience operating large-scale model serving or training infrastructure (>1000 GPUs).
Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).
Understand ML-specific networking optimizations like RDMA and InfiniBand.
Have expertise in AI-specific observability tools and frameworks.
Have experience with chaos engineering and systematic resilience testing.
Have contributed to open-source infrastructure or ML tooling.

The annual compensation range for this role is listed below. For sales roles, the range provided is the role's On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.
Annual Salary: £325,000-£390,000 GBP

Logistics
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you're ever unsure about a communication, don't click any links; visit directly for confirmed position openings.

How We're Different
We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. We value impact, advancing our long-term goals of steerable, trustworthy AI, over work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us!
Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

Guidance on Candidates' AI Usage
Learn about our policy for using AI in our application process

11/06/2026

Full time


Lithene Reliability Technician

Synthomer plc Stallingborough, Lincolnshire

Synthomer is a leading supplier of high-performance, highly specialised polymers and ingredients that play vital roles in key sectors such as coatings, construction, adhesives, and health and protection: growing markets that serve billions of end users worldwide. Headquartered in London, UK, and publicly listed there since 1971, we employ c.3,900 people across our 5 innovation centres of excellence and more than 29 manufacturing sites across Europe, North America and Asia. Around 20% of our sales volumes are from new and patent-protected products. At our innovation centres of excellence in the UK, Germany, China, Malaysia and Ohio, USA, we collaborate closely with our customers to develop new products and enhance existing ones tailored to their needs, with an increasing range of sustainability benefits. Since 2021, we have been proud holders of the London Stock Exchange Green Economy Mark, which recognises green technology businesses making a significant contribution to a more sustainable, low-carbon economy.

About the Role
We are looking for a proactive and detail-oriented Reliability Technician to join our Maintenance team. In this role, you will support the delivery of reliability and maintenance engineering activities, predominantly on our Lithene Plant, helping to improve performance, safety, and operational efficiency. Working closely with the Site Mechanical Maintenance Engineer, you will carry out inspections, monitoring, and repairs, while also contributing to the development of reliability systems and best practices.

Key Responsibilities
- Ensure full compliance with health, safety, and environmental procedures, including oversight of contractors when required.
- Carry out reliability inspections and monitoring.
- Service and maintain mechanical equipment.
- Support and contribute to the development of reliability tours and systems.
- Collaborate with production, engineering, and technical teams to resolve process, SHE, and performance issues.
- Identify and recommend improvements in inspection, testing, and reliability technologies.
- Provide regular updates and reports on reliability activities and progress.
- Assist in the investigation of incidents and failures related to reliability issues.
- Ensure all risk assessments and method statements are followed and adhered to.

What We're Looking For
Qualifications
- Ideally an HNC (or equivalent) in an engineering discipline (Mechanical preferred, other disciplines considered)
Experience & Skills
- Minimum 2 years' experience in a relevant engineering role
- Strong planning, reporting, and documentation skills
- Ability to work independently and make informed decisions
- Excellent communication and interpersonal skills, with the ability to engage across all levels of the business
- A proactive mindset, with a passion for continuous improvement and reliability excellence
- Good IT skills, with experience of using Microsoft Office
- Knowledge of the SAP S4 system would be advantageous but is not essential

Why Join Us?
- Be part of a collaborative and forward-thinking engineering environment
- Opportunities to contribute to site reliability improvements and innovation
- Work across departments and build strong cross-functional relationships
- Develop your technical expertise within a supportive team culture

Global Benefits Overview
- Competitive, market-aligned compensation
- Discretionary global bonus scheme
- Discretionary Long-Term Incentive Plan (LTIP) for senior positions
- Company car or car allowance (varies by region and role)
- Healthcare tailored to regional locations
- Parental leave and family support: maternity, paternity, adoption (aligned with regional policies)
- Working options: flexibility where it matters, based on role and business needs
- Learning & development opportunities: training, online platforms, buddy/mentorship programmes, and the internal Synthomer University with L&D offers
- Wellbeing support: employee assistance programme (EAP), mental health resources, wellbeing initiatives
- Retirement / pension contributions (plans vary by country)
- Culture of inclusion, where everyone can thrive
- Performance culture, with global reward & recognition programmes

11/06/2026

Full time

Remote Lead Site Reliability Engineer - CSRE Consulting

Live Nation

Live Nation is looking for a Lead Site Reliability Engineer to guide reliability consulting efforts across various teams within Ticketmaster. This role involves establishing collaboration among stakeholders, delivering measurable reliability improvements, and mentoring other engineers. Candidates should have deep knowledge of SRE principles, Kubernetes, and AWS, along with excellent communication skills. The position is full-time and offers the chance to work in a dynamic environment supporting live events.

11/06/2026

Full time

Lead Site Reliability Developer - CSRE Consulting

Live Nation

Job Summary
JOB DESCRIPTION - LEAD SITE RELIABILITY ENGINEER - CSRE CONSULTING
Location: London, United Kingdom
Division: Ticketmaster UK Limited
Line Manager: Engagement Lead, CSRE Consulting
Contract Terms: Permanent, 40 hours per week

THE TEAM
A career at Ticketmaster will challenge and engage you. We support the creators and producers of shows and live performances, while connecting more passionate fans to these events. The pace here is fast, the atmosphere is fun, and a passion for live events is a common thread that ties us together. As a global and growing business, we can truly offer a world of opportunities to expand your skills and develop your career. Visit any of our offices and you'll find a diverse mix of passionate employees, helping fans around the globe connect with the artists, teams and events they love. It truly is a unique and rewarding environment.

You will be part of the Central SRE Consulting team, which partners with product and platform engineering teams throughout Ticketmaster to improve reliability, resilience, and sustainable engineering practices. We often deliver through engagements that combine hands-on delivery with capability building, so teams can sustain improvements independently. The team's remit is to increase adoption and maturity of SRE principles across Ticketmaster and ensure our services are appropriately scaled and reliable.

We support teams across the globe, with many peers in the USA. Most of your teammates operate in UTC/UTC+1, and we are adding people in other time zones.

THE JOB
As a Lead Site Reliability Engineer in CSRE Consulting, you will lead reliability consulting work across multiple teams or a domain, aligning stakeholders on priorities and driving delivery of sustained improvements. You will translate reliability goals into sequenced workstreams, align dependencies, and ensure teams can maintain the mechanisms after you move on.

You will mentor other consultants, codify reusable patterns, and influence shared platforms so reliability improvements propagate beyond any single team or engagement.

WHAT YOU WILL BE DOING
- Lead consulting work from discovery through delivery by aligning stakeholders on priorities, sequencing work, and communicating measurable outcomes.
- Establish a working cadence and facilitate decision forums to surface risks, map dependencies, and drive clear ownership and timelines.
- Align product, platform, and engineering stakeholders on reliability targets and trade-offs using SLOs and error budgets.
- Partner regularly with Engineering Managers, product managers, Staff and Principal engineers, and platform leads to keep dependencies, decisions, and delivery aligned.
- Identify systemic risks across shared dependencies and coordinate remediation across multiple teams to reduce recurring incidents.
- Drive change adoption by embedding reliability mechanisms into partner team routines such as planning, production readiness reviews (PRRs), and on-call practices.
- Design and implement reusable reliability mechanisms, templates, and tooling that can be adopted across teams.
- Establish and evolve production readiness review practices with partner teams to improve launch quality and change safety.
- Drive observability strategy for partner domains by improving signal quality, alerting philosophy, and operational dashboards.
- Lead complex incident investigations and ensure learnings translate into durable fixes with clear owners and verification.
- Lead reliability-focused design and code reviews, and guide teams toward simpler, safer architectures.
- Mentor Senior engineers and other consultants through pairing, reviews, and structured coaching to multiply impact.
- Partner with internal platform engineering to influence roadmaps and deliver shared capabilities that accelerate SRE adoption.
- Improve CSRE Consulting playbooks and operating practices based on repeated patterns observed across teams.

WHAT YOU NEED TO KNOW (TECHNICAL SKILLS)
- Deep practical understanding of SRE principles, including SLO governance and error budget policy in practice.
- Proven ability to lead cross-team technical work and influence without authority.
- Strong experience designing and troubleshooting distributed systems with cross-service failure modes.
- Experience shaping observability and alerting strategy and improving operational signal quality.
- Strong Kubernetes and AWS experience, including governance and cost trade-offs.
- Ability to design reliability automation and tooling that is reusable and adopted by multiple teams.
- Experience leading production readiness and resilience practices, including DR validation and controlled testing.
- Strong software engineering fundamentals, with the ability to deliver and review high-quality changes in enterprise codebases.
- Advanced incident analysis skills focused on systemic risk reduction and organizational learning.
- Excellent communication skills, including exec-ready summaries and clear technical diagrams.

YOU (BEHAVIOURAL SKILLS)
- Lead with service and humility, creating clarity and momentum without relying on authority.
- Build relationships across teams and functions, and set clear expectations for how you partner and deliver.
- Facilitate alignment by framing problems, surfacing trade-offs, and running working sessions that end in decisions.
- Persuade with evidence and empathy, adapting your narrative for engineers, product, and senior stakeholders.
- Coach and mentor deliberately, helping others grow in reliability thinking and consulting craft.
- Maintain psychological safety while raising standards, giving direct feedback with respect.
- Stay persistent and patient in complex organizations, keeping work moving despite slow dependencies.
- Hold ambiguity comfortably and turn messy inputs into clear plans, options, and next steps.
- Favor simple mechanisms that scale adoption, not bespoke one-offs that require you to maintain them.
- Operate at a sustainable pace and discourage hero culture by designing systems that do not need it.
- Take pride in quality, including documentation and decision records that help teams sustain the work.
- Remain adaptable, switching between hands-on debugging, stakeholder management, and planning as needed.

LIFE AT TICKETMASTER
We are proud to be a part of Live Nation Entertainment, the world's largest live entertainment company.

Our vision at Ticketmaster is to connect people around the world to the live events they love. As the world's largest ticket marketplace and the leading global provider of enterprise tools and services for the live entertainment business, we are uniquely positioned to successfully deliver on that vision.

We do it all with an intense passion for Live and an inspiring and diverse culture driven by accessible leaders, attentive managers, and enthusiastic teams. If you're passionate about live entertainment like we are, and you want to work at a company dedicated to helping millions of fans experience it, we want to hear from you.

Our work is guided by our values:
- Reliability - We understand that fans and clients rely on us to power their live event experiences, and we rely on each other to make it happen.
- Teamwork - We believe individual achievement pales in comparison to the level of success that can be achieved by a team.
- Integrity - We are committed to the highest moral and ethical standards on behalf of the countless partners and stakeholders we represent.
- Belonging - We are committed to building a culture in which all people can be their authentic selves and have an equal voice and opportunities to thrive.

EQUAL OPPORTUNITIES
We are passionate and committed to our people and go beyond the rhetoric of diversity and inclusion. You will be working in an inclusive environment and be encouraged to bring your whole self to work. We will do all that we can to help you successfully balance your work and home life. As a growing business, we will encourage you to develop your professional and personal aspirations, enjoy new experiences, and learn from the talented people you will be working with. It's talent that matters to us, and we encourage applications from people irrespective of their gender, race, sexual orientation, religion, age, disability status or caring responsibilities.

Live Nation Entertainment will never request payment or equipment purchases as part of the hiring process. Recruiters will only contact candidates from official Live Nation or affiliated brand email domains.

11/06/2026

Full time

AWS Platform Architect

Oscar Technology

Platform Architect | Hybrid | £70,000-£100,000

About the Role
We're partnering with a growing SaaS business to hire a senior Platform Architect to own the design, security, reliability, and operational management of their AWS platform and internal IT function. This is a hands-on leadership role in a lean organisation where you'll shape cloud architecture, modernise a legacy platform into a cloud-native environment, and provide senior oversight across platform engineering, security, SRE, CI/CD, and operational IT.

Key Responsibilities
- Own the AWS platform architecture and modernisation roadmap, including migration from a Java monolith to microservices on EKS.
- Define standards for containers, runtime environments, observability, tenancy, security, and infrastructure automation.
- Lead SRE practices including SLIs/SLOs, incident management, DR/BCP planning, post-mortems, and operational resilience.
- Own platform security, secure SDLC, CI/CD pipelines, IaC, and software supply chain governance.
- Drive developer productivity through automation, self-service tooling, and platform standardisation.
- Provide senior oversight of IT operations, including service desk governance, endpoint management, onboarding/offboarding, patching, ITAM, and MSP/vendor management.
- Act as a senior escalation point for critical incidents, outages, and operational issues.

About You
- Experience in a platform, infrastructure, or software engineering role within SaaS environments.
- Strong AWS expertise, including EKS, IAM, networking, KMS, RDS, and multi-account architecture.
- Hands-on Kubernetes, CI/CD, Terraform, and cloud security experience.
- Strong understanding of SRE, observability, incident response, and disaster recovery.
- Experience operating within regulated environments (e.g. ISO 27001, SOC 2, or GxP).
- Comfortable balancing strategic leadership with hands-on operational delivery.
- AWS Solutions Architect - Professional certification required.
- CKA or CKS certification highly desirable.

Oscar Associates (UK) Limited is acting as an Employment Agency in relation to this vacancy. To understand more about what we do with your data, please review our privacy policy in the privacy section of the Oscar website.

11/06/2026

Full time

Remote Fintech SRE - Build Scalable, Automated Infra

Ant-Tech

Ant-Tech is looking for a talented Site Reliability Engineer to join its fintech team, focusing on building high-performance infrastructure for global financial institutions. This remote-first role offers a competitive salary of £90,000 to £110,000, plus shares and benefits. Key responsibilities include automating server provisioning, maintaining CI/CD pipelines, and ensuring network reliability. Successful candidates will have experience with Ansible, strong Linux knowledge, and excellent vendor management skills.

11/06/2026

Full time

Telemetry and Observability Engineer

Oscar Technology

Telemetry and Observability Engineer (Inside IR35)
London / Hybrid (3 days on-site)
I'm working with a global organisation building next-generation cloud-native and observability platforms at enterprise scale, and they're looking for a strong Senior Observability Engineer to join the team. This is a high-impact role focused on scalable telemetry pipelines, monitoring, alerting, reliability engineering, and embedding observability across complex distributed systems and Kubernetes environments.
Key experience needed:
- Observability / SRE / Platform Engineering background
- OpenTelemetry, Prometheus, Grafana, Splunk, Elastic, Loki, or Jaeger
- Kubernetes, microservices, and cloud-native platforms
- Python, Go, or Java
- Terraform, Helm, and IaC
- SLIs, SLOs, alerting, and reliability engineering
Financial services or regulated environment experience is a bonus.
Great opportunity to work with cutting-edge technology, influence engineering standards, and help shape observability at enterprise scale. Interested? Drop me a message or send over your CV.
Oscar Associates (UK) Limited is acting as an Employment Business in relation to this vacancy. To understand more about what we do with your data, please review our privacy policy in the privacy section of the Oscar website.

11/06/2026

Contractor

Senior Site Reliability Engineer

DWP Digital Manchester, Lancashire

Site Reliability Engineer
Pay up to £80,664 plus 28.97% employer pension contributions, hybrid working, flexible hours, and a truly great work-life balance.
DWP. Digital with Purpose.
We have a fantastic opportunity to join our community of experts at DWP Digital as a Senior Site Reliability Engineer, within one of our SRE teams at the heart of Digital Transformation. We're using fresh ideas and leading-edge tech to build and maintain digital solutions that will be used by nearly every person in the UK, every day and at key moments in their lives.
DWP is the UK's largest government department. We help people into work and make payments worth over £195bn a year to support and empower millions of people. The scale of what we do is extraordinary, and our purpose is unique. We'd love you to join us.
What skills, knowledge and experience will you need?
- Demonstrable experience of reliability engineering, including capacity and performance management through monitoring, logging, and alerting.
- Demonstrable experience of supporting a Live Service, including live operations, incident management, and continuous improvement.
- Demonstrable experience of developing and supporting cloud-based applications in AWS.
- Demonstrable experience of building and maintaining CI/CD pipelines.
- Demonstrable experience of communicating effectively with stakeholders at multiple levels to provide feedback and support.
- Demonstrable experience of using automation to remove toil with scripting, infrastructure as code, and configuration as code.
You and your role
Your day will be all about making sure our applications and infrastructure are reliable, secure and ready for scale. You'll work closely with development teams from the design stage, helping them build systems that follow best practices and meet department standards. You'll lead by example, mentoring other SREs, guiding teams and driving improvements.
A big part of your role will be creating and maintaining detailed runbooks so incidents can be resolved quickly and efficiently. You'll also automate repetitive tasks, reduce toil and make sure monitoring is in place so issues are spotted before they become problems. When major incidents happen, you'll take the lead in coordinating the right people and restoring services fast. You'll manage error budgets, review high-priority incidents and push a culture of engineering ownership across the organisation.
Details. Wages. Perks.
- Location: You'll join us in one of our brilliant digital hubs in Birmingham, Blackpool, Leeds, Manchester, Newcastle or Sheffield, whichever is most convenient for you.
- Hybrid Working: We work a hybrid model; you'll spend some time working at home and some time collaborating face to face in a hub.
- Pay: We offer competitive pay of up to £80,664.
- Pension: You'll get a brilliant Civil Service pension with employer contributions of 28.97%, worth over £16,000 per year.
- Holidays: A generous leave package starting at 26 days and rising to 31 days over time. You can also take up to 3 extra days off a month on flexi-time. You'll also get all the usual public holidays.
We have a broad benefits package built around your work-life balance, which includes:
- Flexible working, including flexible hours and flex-friendly policies
- Time off for volunteering and charitable giving
- Bring your authentic self to work with 'I Can Be Me in DWP'
- Discounts and savings on shopping, fun days out and more
- Interest-free loans to buy a bike or a season ticket, so it's even easier for you to get to work and start making a difference
- Professional development, coaching, mentoring and career progression opportunities
And we have an award-winning environment and culture. DWP has been:
- Recognised as 2024 Diversity Employer of the Year at the Computing Women in Tech Excellence Awards
- Recognised for Diverse and Inclusive Leadership at the Digital Leaders Awards 2024
- Commended as Best Place to Work in the Digital category at the Computing Digital Technology Leaders Awards 2025
- Recognised as one of the Best Public Sector Employers at the 2025 Women In Tech Employer Awards
Process: We know your time is valuable, so our application and selection process has just two stages:
- Apply: complete your application on Civil Service Jobs. There'll be full instructions when you click through.
- Interview: a single-stage interview online.
Click Apply for more information and to start your application.

11/06/2026

Full time

Site Reliability Engineer

Huxley Associates City, London

Site Reliability Engineer (Cloud & Automation) - London - 2 days on site per week
A leading global financial services organisation is seeking a Site Reliability Engineer (SRE) to drive reliability, automation, and performance across its cloud-hosted platforms.
The Opportunity
This role sits within a high-performing Platform Operations function, acting as a central point of expertise for SRE methodologies and automation. You will play a key role in improving system resilience, scalability, and operational excellence across a complex, regulated environment.
Key Responsibilities
- Lead the implementation of SRE best practices across cloud infrastructure
- Drive improvements in observability, alerting, and capacity planning (SLA / SLO / SLI)
- Identify and reduce operational toil through automation and remediation frameworks
- Build and enhance GitOps and Infrastructure-as-Code capabilities (e.g. Terraform, Ansible)
- Develop and review production-grade code to support automation initiatives
- Support incident management and on-call processes, ensuring production stability
- Contribute to post-incident reviews, embedding SRE principles to reduce risk
Requirements
- Demonstrable experience in SRE or infrastructure operations within cloud environments (AWS / GCP)
- Strong scripting skills (Python, Ansible, or PowerShell)
- Experience with Infrastructure as Code and GitOps methodologies
- Hands-on knowledge of observability / APM tools (e.g. Grafana, Datadog, Dynatrace)
- Proven experience managing incidents, root cause analysis, and on-call support
- Understanding of SLA/SLO/SLI frameworks and reliability engineering principles
Desirable
- Background in software development
- Experience working within regulated financial services environments
- Familiarity with ITIL and enterprise service management frameworks
- Relevant certifications (e.g. AWS, Terraform)
Why Apply
- Opportunity to shape cloud reliability strategy in a large-scale environment
- Work with modern tooling across automation, DevOps, and SRE practices
- Strong emphasis on engineering excellence and continuous improvement
- Competitive compensation and long-term career progression
To find out more about Huxley, please visit (url removed). Huxley, a trading division of SThree Partnership LLP, is acting as an Employment Business in relation to this vacancy. Registered office: 8 Bishopsgate, London, EC2N 4BQ, United Kingdom. Partnership Number OC(phone number removed). England and Wales.

11/06/2026

Full time

Senior Site Reliability Engineer - UK

black.ai

Location: London, Melbourne
Employment Type: Full time
Department: Engineering
Who We Are
Healthcare needs a better rhythm: one that keeps care continuous and deeply human. Heidi is building an AI Care Partner that works alongside clinicians to make that possible. We're a team of doctors, engineers, designers, researchers, and creatives building tools that help clinicians stay focused on what matters most: their patients.
In just 18 months, Heidi has given back more than 18 million hours to healthcare professionals, supporting 73 million patient visits in 116 countries. Today, more than two million patient visits each week are powered by Heidi worldwide. Backed by nearly $100 million in funding, we're growing in the US, UK, Canada, and Europe, partnering with leading health systems including the NHS, Beth Israel Lahey Health, and Monash Health.
The Role
This role sits in the core Platform/SRE team that owns production. You'll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi's platform. We're open to candidates who are strong mid-level SREs ready to take on more ownership, as well as senior SREs who enjoy being hands-on in operations. The role is intentionally ops-heavy and focused on keeping real systems healthy in production.
What you'll do
- Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
- Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
- Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
- Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
- Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
- Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
- Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
- Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.
What we're looking for
- 3-6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
- Experience supporting production systems and participating in on-call rotations.
- Comfortable debugging live systems under pressure.
- Experience operating cloud infrastructure (AWS preferred).
- Working knowledge of Kubernetes and containerised workloads.
- Infrastructure as Code experience (Terraform or similar).
- Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
- Scripting or automation experience (Python, Bash, or similar).
Nice to have
- Experience leading incidents or mentoring others during on-call.
- Experience in regulated or security-sensitive environments.
- Familiarity with databases, queues, and caches in production.
- Interest in reliability practices such as SLOs, error budgets, and capacity planning.
How We Work
- We own production: The Platform/SRE team is responsible for reliability and incident response.
- Incidents are blameless: We focus on learning and improving systems, not assigning fault.
- Practical over perfect: We prioritise improvements that reduce real operational pain.
- Calm under pressure: Clear thinking and communication matter during incidents.
What do we believe in?
Heidi builds for the future of healthcare, not just the next quarter, and our goals are ambitious because the world's health demands it. We believe in progress built through precision, pace, and ownership.
- Live Forever - Every release moves care forward: measured, safe, and built to last. Data guides us, but patients define the truth that matters.
- Practice Ownership - Decisions follow logic and proof, not hierarchy. Exceptional care demands exceptional standards in our work, our thinking, and our character.
- Small Cuts Heal Faster - Stability earns trust; speed delivers impact. Progress is about learning fast without breaking what people depend on.
- Make others better - Feedback is direct, kindness is constant, and excellence lifts everyone. Our success is measured by collective growth, not individual output.
Our mission is clear: expand the world's capacity to care, and do it without losing the humanity that makes care worth delivering.
Why you should join Heidi
- Real product momentum. We're not trying to generate interest; we're channeling it.
- Equity from day one. When Heidi wins, you win. You'll share directly in the success you help create.
- Unmatched impact. Play a pivotal role in defining and scaling customer success at a critical growth moment, all while working on a product that delivers tangible value to clinicians and patients every day.
- Work alongside world-class talent. Join a team of operators and builders who've scaled unicorns.
- Global reach. Help shape our international expansion as we bring Heidi to key international markets.
- Growth and balance. Enjoy a personal development budget, work from anywhere for a month, dedicated wellness days, and your birthday off to recharge.
- Flexibility that works. A hybrid environment, with 3 days in the office.
Heidi's commitment to Diversity, Equity and Inclusion
Heidi is dedicated to creating an equitable, inclusive, and supportive work environment that brings people together from diverse backgrounds, experiences, and perspectives. Our strength is in our differences. We're proud to be an equal opportunity employer and welcome all applicants, as we're committed to promoting a culture of opportunity for all.

10/06/2026

Full time

Azure Site Reliability Engineer

GCS Recruitment Knutsford, Cheshire

Azure Site Reliability Engineer (SRE)
Location: Glasgow / Knutsford (hybrid, 2 days a week in office)
Team: 6 UK / 5 India
Environment: Part of a wider multi-cloud engineering organisation (Azure, AWS, GCP)
Growth: Significant technical development opportunities across cloud engineering, automation, and platform build
Role Overview
We are looking for a hands-on Azure SRE who can design, build, and automate enterprise-grade Azure Landing Zones and cloud governance frameworks. This is not an application development role; it is a platform engineering role focused on controls, policies, guardrails, IaC, and DevOps automation. You will work as part of a global SRE function, collaborating with engineers in the UK and India to deliver secure, scalable, policy-driven Azure environments.
Key Responsibilities
- Azure Landing Zone engineering: build, enhance, and automate enterprise landing zones
- Azure governance & controls: develop Azure Policies, RBAC, blueprints, and compliance guardrails
- Infrastructure as Code delivery: author and maintain IaC using Bicep or Terraform
- DevOps engineering: implement CI/CD pipelines, Git workflows, and automated deployments
- Cloud platform reliability: ensure availability, performance, and operational excellence
- Cross-team collaboration: work with multi-cloud engineering teams to drive standards and best practices
Required Skills & Experience
- Azure expertise across policies, governance, networking, compute, identity, and platform services
- IaC development using Bicep or Terraform (must be able to write code, not just run templates)
- CI/CD pipelines using Azure DevOps or GitHub Actions
- Git repositories: strong understanding of branching, PRs, and code reviews
- DevOps mindset: automation-first, iterative delivery, shift-left engineering
- Strong scripting or development capability (PowerShell, Python, or similar)
- Experience working in distributed teams (UK + offshore)
GCS is acting as an Employment Agency in relation to this vacancy.

10/06/2026

Full time

Azure Platform Engineer: Landing Zones, Governance & IaC

GCS Recruitment Knutsford, Cheshire

GCS Recruitment is seeking an Azure Site Reliability Engineer (SRE) for a hybrid role based in Knutsford. This position involves designing, building, and automating enterprise-grade Azure Landing Zones, with a significant emphasis on governance frameworks and platform engineering. Ideal candidates will have strong Azure expertise, IaC development skills using Bicep or Terraform, and experience with CI/CD pipelines. Join a global SRE team focused on delivering secure, scalable Azure environments.

10/06/2026

Full time

AI Platform SRE Engineer - Onsite in Sheffield

Experis - ManpowerGroup Sheffield, Yorkshire

Experis - ManpowerGroup is looking for a Platform/SRE Engineer based in Sheffield, required to work onsite 3 days a week. This role involves managing production operations for the client's AI helpdesk platform, with a focus on deployment practices, observability, and system reliability. Candidates must have strong DevOps and SRE experience, familiarity with Docker and Kubernetes, and experience with AWS. The position offers a competitive rate of £525 per day via an umbrella company.

09/06/2026

Full time

Senior Site Reliability Engineer

Trades Workforce Solutions Wokingham, Berkshire

Senior Site Reliability Engineer (SRE)
Location: Wokingham (2 days/week onsite)
Type: Inside IR35
Rate: Up to £70.00 per hour (DOE)

We're looking for a Senior Site Reliability Engineer (SRE) to lead efforts in maintaining the reliability, performance, and scalability of mission-critical platforms and services. This role is ideal for someone who thrives at the intersection of software engineering, infrastructure, automation, and incident response. You'll be instrumental in defining and implementing the standards and systems that keep applications running smoothly across cloud and hybrid environments, including OpenShift clusters.

What You'll Be Responsible For
As a Senior SRE, you will:
Ensure high availability, strong performance, and low latency of critical systems across Azure, AWS, and OpenShift.
Design and implement robust observability systems (logging, monitoring, alerting) to detect and resolve issues proactively.
Lead and evolve incident management processes: runbooks, comms, postmortems, and root cause analysis.
Define and monitor SLIs, SLOs, and error budgets to balance innovation with stability.
Automate manual processes through infrastructure-as-code, scripting, and modern CI/CD pipelines.
Mentor engineering teams on best practices for deployment, reliability, scalability, and incident preparedness.
Support and scale OpenShift-based containerized applications, including upgrade strategies, patching, and workload optimization.

Core Responsibilities

Operations & Incident Management
Act as the senior escalation point for outages and critical incidents.
Lead post-incident reviews and implement long-term remediation plans.
Communicate platform health and risk posture to stakeholders at all levels.

Engineering & Automation
Build and improve CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, Jenkins, and GitLab.
Design scalable, fault-tolerant infrastructure with IaC tools (Terraform, Bicep).
Create internal tools and automation to accelerate development and reduce operational toil.

Strategic & Advisory
Architect cloud and container infrastructure, with a focus on OpenShift, Kubernetes, and hybrid deployments.
Collaborate with engineering, architecture, and security teams to embed reliability into the SDLC.
Promote advanced deployment strategies (blue-green, canary, rolling updates) and rollback readiness.
Drive a culture of reliability, observability, and operational excellence across engineering teams.

Technical Environment
Hands-on experience with many of the following is expected:
Cloud & Containers: Azure, AWS, OpenShift, Kubernetes, Docker, App Services, IaaS (EC2, VMs)
CI/CD & Automation: Terraform, Bicep, Azure DevOps, Jenkins, GitHub Actions, GitLab
Observability: Prometheus, Grafana, Datadog, ELK, Splunk, Application Insights, CloudWatch
Languages & Scripting: Python, C#, Bash, PowerShell
Networking: DNS, SSL/TLS, load balancing, WAF, proxies, CDN, Azure App Gateway
Databases: MSSQL, PostgreSQL, MongoDB, CosmosDB, DynamoDB
OS & Systems: Windows, Linux, Nginx, IIS

Ideal Candidate Profile
5+ years of experience in SRE, DevOps, or production engineering roles.
Expertise in operating high-availability, fast-paced production environments.
Solid engineering foundation with experience reading and writing production code.
Hands-on experience deploying, supporting, and scaling OpenShift environments.
Proven track record of leading incident response and improving system reliability.
Strong collaboration and mentoring abilities across infrastructure, development, and security teams.

What You'll Bring
Ability to balance operational risk with engineering velocity.
Strong communication skills across technical and non-technical audiences.
A passion for automating everything and eliminating manual work.
A mindset of ownership, continuous improvement, and technical leadership.

Ready to make reliability your legacy? If you're a senior SRE with OpenShift experience and a drive to solve complex operational challenges, we'd love to hear from you.

09/06/2026

Full time
