Browse IT Jobs | IT Job Board

Public Cloud Senior Infrastructure Engineer

Lloyds Bank plc Halifax, Yorkshire

Salary: £72,702-£80,780
Location: Halifax or Leeds
Workstyle: Hybrid (at least two days a week/on average 40% on site)

Job Description Summary
Senior Public Cloud Infrastructure Engineer - responsible for building and operating large-scale Kubernetes platforms in a regulated environment, focusing on automation, security, observability and reliability.

Key Responsibilities

Platform Engineering (GKE)
- Design, build and operate scalable, resilient GKE environments
- Engineer multi-tenant Kubernetes clusters with strong workload isolation and platform guardrails
- Support shared and dedicated cluster patterns, including tenant onboarding
- Improve platform performance under production conditions (e.g. scaling, storage, node pressure)

Automation & DevOps
- Build automation-first infrastructure using Terraform, CI/CD and GitOps
- Simplify cluster lifecycle management (provisioning, upgrades, add-ons)
- Develop self-service platform capabilities to improve developer experience

Reliability & SRE
- Apply SRE practices to platform operations
- Support incident response, monitoring, observability and continuous improvement
- Diagnose issues across performance, scaling, storage and automation
- Contribute to a 24x7 on-call rotation

Security & Compliance
- Implement policy-as-code controls (e.g. OPA Gatekeeper, RBAC, workload identity)
- Support audit, compliance and risk mitigation activities
- Ensure platforms are secure, supportable and aligned to control frameworks

Networking & Platform Services
- Work with service mesh and ingress/egress patterns (e.g. Istio, Anthos, Cloud Service Mesh)
- Support cloud networking (VPCs, DNS, NAT, VPN, routing, connectivity)
- Integrate shared platform services (cert-manager, observability, cost tooling)

Essential Skills & Experience
- Strong experience in Platform Engineering, DevOps or SRE
- Proven delivery of production Kubernetes platforms, ideally GKE
- Experience with multi-tenant platform environments (shared clusters, isolation, scaling)
- Deep understanding of Kubernetes internals (scheduling, storage, node lifecycle, upgrades)
- Strong knowledge of Google Cloud Platform (GCP), including GKE IAM / Workload Identity, Networking (VPC, DNS, NAT, ingress/egress), Storage patterns
- Experience with Infrastructure as Code (Terraform) using modular design
- Strong experience with CI/CD pipelines and GitOps workflows
- Coding/scripting (Python, Go or Bash)
- Strong troubleshooting and problem-solving skills
- Ability to own and deliver complex engineering outcomes

Desirable Experience
- Advanced GKE operational expertise (node pools, upgrades, scaling, security boundaries)
- Experience operating platforms at scale (multi-cluster, multi-tenant)
- Service mesh experience (e.g. Istio, mTLS, traffic management)
- Experience with policy as code (OPA Gatekeeper, Config Sync)
- Experience in regulated or compliance-heavy environments
- Strong focus on SRE and platform reliability improvements

Nice to have:
- Experience with Backstage or self-service platform tooling
- Familiarity with Anthos Config Management / Config Sync
- Exposure to tools such as CoreDNS, cert-manager, Dynatrace, Cloudability, Infoblox
- Understanding of platform scaling challenges (ephemeral storage, workload density, resilience)
- Experience working with cloud providers on platform architecture

Benefits
- Performance bonus
- Generous pension
- Flexible benefits package
- Private healthcare
- 30 days holiday + bank holidays
- Share schemes

Flexible Working Options
Hybrid working, job sharing, and other flexible working options are supported.

23/06/2026

Full time

Applications Support Manager - Vice President

Citi City, Belfast

Engineer the future of global finance. At Citi, our Tech team doesn't just support finance - we are helping to redefine it. Every day, $5 trillion crosses through our network. We do business in 180+ countries operating at a scale few can match. From deploying advanced AI to helping shape global markets, we build systems that matter. Look to join a team where your work helps influence economies, your ideas can drive innovation and outcomes, and your growth is backed by mentorship, continuous learning and flexibility with potential hybrid work opportunities. Help solve real-world challenges that touch millions and get the opportunity to build the future of finance with Citi Tech.

Key Responsibilities
- Hands-On Operational Leadership: Directly manage, mentor, and develop a technical support team while actively engaging in day-to-day operational tasks, incident response, and problem resolution for the Instant Payments application.
- Direct Operational Management: Take ownership of ensuring the operational stability and performance of the Instant Payments application across diverse cloud environments (Citi's Enterprise Cloud and Public Cloud), including active monitoring and system checks.
- Technical Implementation & Optimization: Lead the implementation, configuration, and continuous optimization of observability (monitoring, logging, tracing tools), resiliency (designing and implementing auto-healing and retry mechanisms), and recoverability (executing disaster recovery strategies) solutions for the cloud-native Instant Payments application. This includes writing and maintaining scripts for these functions.
- Service Level Execution & Improvement: Contribute to improving service levels by implementing operational efficiencies, performing incident management, problem management, and enhancing knowledge-sharing practices for the Instant Payments application.
- Application Onboarding & Technical Guidance: Actively participate in defining and implementing application onboarding guidelines and standards, and provide direct technical guidance to development teams on stability and supportability improvements for the Instant Payments application.
- Incident & Problem Resolution: Lead and execute troubleshooting efforts for complex technical issues, perform in-depth root cause analysis, and implement permanent fixes for the Instant Payments application.
- Cost Efficiency & Automation: Identify and implement opportunities for cost reduction and operational efficiencies through proactive analysis, performance tuning, and the development of automation scripts and tools. Ensure adherence to support process and tool standards.
- Technical Communication: Effectively communicate technical details, application status, operational risks, and support initiatives to product teams, development teams, and relevant stakeholders.
- Risk & Compliance: Directly ensure operational risk is managed effectively and compliance with applicable policies, rules, and regulations is maintained for the Instant Payments application support function.

Qualifications
- Progressive, hands-on experience in application support, Site Reliability Engineering (SRE), or technical operations for mission-critical, high-volume financial applications.
- Direct experience with cloud-native architectures, including configuration and management of microservices, containers (e.g., Kubernetes), and serverless technologies.
- Practical experience with major Public Cloud platforms (e.g., AWS, Azure, GCP) and enterprise private cloud environments.
- Track record in implementing and operating comprehensive observability stacks (e.g., Prometheus, Grafana, ELK stack, Jaeger, distributed tracing).
- Understanding and application of resiliency engineering principles (e.g., circuit breakers, bulkheads, retry mechanisms) and robust disaster recovery strategies.
- Strong technical background in instant payments or real-time financial transaction processing systems highly desirable.
- Expertise in automation, scripting (e.g., Python, Go, Shell), and infrastructure as code principles (e.g., Terraform, CloudFormation).
- Excellent communication, interpersonal, and team leadership skills, with the ability to manage and motivate a technical team while remaining deeply technical.
- Proven ability to troubleshoot and resolve complex technical issues independently, prioritize effectively, and make sound decisions under pressure.

Education
- Bachelor's/University degree in Computer Science, Engineering, or a related technical field is required.
- Relevant certifications (e.g., Public Cloud Certified Solutions Architect, Certified Kubernetes Administrator) preferred.

What we'll provide you
- 27 days annual leave (plus bank holidays)
- Discretionary annual performance-related bonus
- Private Medical Care & Life Insurance
- Employee Assistance Program
- Pension Plan
- Paid Parental Leave
- Special discounts for employees, family, and friends
- Access to an array of learning and development resources

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.

23/06/2026

Full time

Release Engineer

FNZ (UK) Ltd Birmingham, Staffordshire

Location: Birmingham, UK (commutable distance required)

About FNZ and the team
The Release Operations Engineer is part of the Figaro Release Operations team within Operational Support (CEO function), reporting through to the Head of Support Services and the Release Operations Team Leader. The team is responsible for deploying software and delivering accompanying documentation to FNZ Figaro clients, acting as a quality gatekeeper for release packages, and supporting internal code bases and environments used by developers and testers.

Role overview
You will play a key role in the deployment and delivery of software releases across FNZ platforms for internal and external FNZ Figaro clients, delivering to agreed timelines. Working closely with Delivery, Product and Support teams, you will execute release activities and continuously improve release processes. You may also support configuration and maintenance of underlying infrastructure used by Figaro to enable successful software deployment, ensuring documentation is provided and processes are followed accurately. This is a varied, technical role requiring strong problem-solving capability and the ability to communicate clearly with technical and non-technical stakeholders.

Key responsibilities

Release engineering & deployments
- Deploy FNZ Figaro software to internal and external environments using appropriate technology for the change.
- Support and coordinate release and deployment activities across hybrid environments, including Google Cloud Platform (GCP), Microsoft Azure, AWS, and on-prem IBM platforms.
- Support deployments using automation tools and frameworks.
- Carry out agreed application maintenance tasks.
- Lead or assist with software and infrastructure changes for internal FNZ Figaro and client environments.

Change management, governance & readiness
- Produce and maintain release documentation, including release notes, implementation plans, rollback plans, runbooks, and release communications.
- Contribute to Release Boards, ensuring releases follow the FNZ Change Management lifecycle and that Release Operations activities are completed.
- Coordinate technical and operational readiness activities (pre-checks, deployment windows, cutover plans, post-release validation).

Continuous improvement / automation / DevOps ways of working
- Participate in analysis, design, testing and implementation of process improvements and automation to improve efficiency and reliability of change delivery across FNZ Figaro teams.
- Participate in projects supporting delivery of change from initial design through to production implementation.

Support, stakeholder & team contribution
- Provide technical support to FNZ Figaro staff and clients, including working on client site when required.
- Independently investigate and diagnose support issues, proposing solutions and providing advice to resolve incidents.
- Provide mentoring, guidance and technical support to junior team members and wider FNZ Figaro teams as required.
- Manage work commitments with minimal referral, providing regular progress updates to the Release Operations Team Leader.
- Provide voluntary out-of-hours deployment cover, including weekends and public holidays.
- Work collaboratively with Release Management, Managed Service Operations, Infrastructure, Solutions Development, Product Owners and other FNZ departments.

Required experience and skills

Essential experience
- Experience in a Release Management / Release Engineering role.
- Experience in a client-facing support role.
- Experience documenting processes, procedures and policies for internal use.
- Experience working in regulated environments (financial services preferred).

Essential behaviours
- Highly self-motivated and delivery focused; confident taking initiative and working independently.
- Highly logical with proven problem-solving capability.
- Strong organisation, administration and time management.
- Clear written and verbal communicator; effective with internal and external stakeholders.

Technical skills
- Familiarity with JIRA and Confluence.
- Intermediate SQL skills.
- Familiarity with IBM i (iSeries/AS400) platforms.
- Knowledge of operating systems: IBM, Windows, Linux.
- Cloud knowledge: AWS and GCP (and deployment coordination across Azure is part of the environment).

Desirable
- Experience with Terraform, Jenkins, Infrastructure as Code, SDLC, CI/CD, automation, and DevOps methodologies.
- ITIL certification.

Applications will close May 13th. Early application is encouraged.

About FNZ
FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We know the foundation to do that already exists in the wealth management industry, but complexity holds firms back. We created wealth's growth platform to help. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations. All in a regulated financial institution. We partner with the world's leading financial institutions, with over US$2.4 trillion in assets on platform (AoP). Together with our clients, we empower nearly 30 million people across all wealth segments to invest in their future.

23/06/2026

Full time

Early in Career Network Engineer (Site Reliability Engineer) - UK

Cisco Systems, Inc.

Please note this posting is to advertise potential job opportunities. This exact role may not be open today but could open in the near future. When you apply, a Cisco representative may contact you directly if a relevant position opens.

Start Date: as soon as possible
Location: Feltham, United Kingdom (Hybrid work approach, working from the Feltham office 1-2 days per week.)

Meet the Team
We at Cisco are looking for a Site Reliability Engineer with a passion for technology and solid academic foundations in analytical disciplines. Cisco is a strong advocate of using its own enterprise networking, datacenter, collaboration products, and solutions internally; Cisco IT deploys all these technologies - the result being that Cisco IT accrues a great deal of experience in how to design, deploy, operate, and automate these solutions within a large global enterprise. In the Network Engineering Core Team, we are responsible for connecting our offices to our enterprise network across Cisco. We maintain and support the WAN and Core infrastructure, alongside several hardware and software remote access solutions, with an Agile, SRE mindset and have lots of fun along the way.

Your Impact
As a Site Reliability Engineer, daily activities of the role involve working within a large global team of DevOps Network Engineers, Product Owners, and Product Managers to enable the efficient running of all Cisco offices and remote/hybrid working solutions. You'll also have the opportunity to work on a variety of different projects across our technology portfolio. Activities include but are not limited to:
- Using creative problem-solving to provide Cisco with advanced, essential business capabilities.
- Developing technical prototype environments and concepts.
- Supporting existing platforms and network solutions, including but not limited to WAN, LAN, and Core.
- Working individually or as part of a team, depending on the project, using the SAFe methodology.
- Identifying and working on areas that can be automated to streamline processes within the team.

There will be some on-call work required as you become familiar with our network, but this is limited to 1 week in every 6, which will cover the working day during the week and include the weekend.

Minimum Qualifications
We are looking for someone who can demonstrate the following, including but not limited to:
- A recent/upcoming graduate of a Bachelor's degree (or higher) or a certification program (e.g. a Bootcamp or Apprenticeship). Equivalent experience accepted in lieu of these.
- A keen interest in some of the following technologies:
  - Networking (Routing, Switching, and WAN/SDWAN)
  - Automation / Programming - i.e. Python, Ansible, REST, APIs are advantageous but not essential
  - Virtualisation Technologies - VMware, OpenStack, Docker are advantageous
- Able to legally live and work in the country for which you're applying

Preferred Qualifications
- Strong analytical mindset
- Familiarity with design concepts

Why Cisco?
At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.

23/06/2026

Full time

Senior Site Reliability Engineer

Experian Ltd Nottingham, Nottinghamshire

Company Description Experian is a global data and technology company, powering opportunities for people and businesses around the world. We help to redefine lending practices, uncover and prevent fraud, simplify healthcare, create marketing solutions, and gain deeper insights into the automotive market, all using our unique combination of data, analytics and software. We also assist millions of people to realize their financial goals and help them save time and money. We operate across a range of markets, from financial services to healthcare, automotive, agribusiness, insurance, and many more industry segments. We invest in people and new advanced technologies to unlock the power of data. As a FTSE 100 Index company listed on the London Stock Exchange (EXPN), we have a team of 22,500 people across 32 countries. Our corporate headquarters are in Dublin, Ireland. Job Description We are looking for a Site Reliability Engineer to improve the reliability and performance of business-critical systems. Reporting to our Head of SRE, you will focus on AWS cloud infrastructure, DevOps tooling, and core SRE practices within a distributed, production environment. Main Responsibilities: Leadership & Strategy Define and implement SRE best practices across the organization. Proven expertise in production support, engineering, disaster recovery (DCR), automation, and cloud operations. Mentor and guide a team of SREs, fostering growth. Collaborate with senior stakeholders to align reliability goals with business objectives. Reliability & Performance Establish SLIs, SLOs, and SLAs for critical services and ensure adherence. Drive initiatives to improve system resilience and reduce operational toil. 
Excellent at designing systems that detect and remediate issues without manual intervention (self-healing systems, runbook automation). Exposure to tools like Gremlin, Chaos Monkey, and AWS FIS to simulate outages and improve fault tolerance. Incident Management Act as the primary point of escalation for critical production issues and lead major incident response, root cause analysis, and postmortems. Perform detailed post-incident investigations to identify underlying causes. Document findings and share learnings to prevent recurrence. Implement preventive measures and continuous improvement processes. Observability Champion monitoring, logging, and alerting strategies using tools like Prometheus, Grafana, ELK, and AWS CloudWatch. Build real-time dashboards to visualize system health and reliability metrics. Configure intelligent alerting based on anomaly detection and thresholds. Combine metrics, logs, and traces to enable root cause analysis and reduce Mean Time to Resolution (MTTR). Knowledge of AIOps or ML-based anomaly detection for proactive reliability management. Collaboration Work closely with development teams to integrate reliability into application design and deployment. Promote a culture of shared responsibility for uptime and performance across engineering teams. Qualifications Deep expertise with various AWS services. Advanced knowledge of monitoring and observability tools. Strong leadership capabilities with a focus on setting clear direction, aligning team efforts with organizational goals, and maintaining high levels of motivation and engagement across the team. Excellent communication skills, with the ability to articulate complex ideas, solutions, and feedback clearly to both technical and non-technical stakeholders. Adept at managing conflict constructively and facilitating consensus. 
Proven track record of building secure, mission-critical, high-volume transaction web-based software systems, preferably in regulated environments (finance and insurance industries). Hands-on technologist working in software development, including leading an SRE team. Additional Information Hybrid working, 2 days a week in our Nottingham office Great compensation package and discretionary bonus Core benefits include pension, Bupa healthcare, sharesave scheme and more 25 days annual leave with 8 bank holidays and 3 volunteering days. You can purchase additional annual leave. Our uniqueness is that we celebrate yours. Experian's culture and people are important differentiators. We take our people agenda very seriously and focus on what matters: DEI, work/life balance, development, authenticity, collaboration, wellness, reward & recognition, volunteering... the list goes on. Experian's people-first approach is award-winning: World's Best Workplaces 2024 (Fortune Top 25), Great Place To Work in 24 countries, and Glassdoor Best Places to Work 2024, to name a few. Check out Experian Life on social or our Careers Site to understand why. Experian is proud to be an Equal Opportunity and Affirmative Action employer. Innovation is an important part of Experian's DNA and practices, and our diverse workforce drives our success. Everyone can succeed at Experian and bring their whole self to work, irrespective of their gender, ethnicity, religion, colour, sexuality, physical ability or age. If you have a disability or special need that requires accommodation, please let us know at the earliest opportunity. Experian Careers - Creating a better tomorrow together

23/06/2026

Full time

Site Reliability Engineer

Onyx-Conseil Manchester, Lancashire

Overview We are seeking an experienced and motivated Site Reliability Engineer (SRE) to join a high-performing team supporting multiple data product and platform groups. This role is focused on improving the reliability, scalability, observability, deployment, and operational support of critical data-driven platforms and services operating within complex production environments. Responsibilities Work closely with engineering, platform, and operational support teams to strengthen monitoring and alerting capabilities. Improve logging and traceability. Troubleshoot incidents. Support deployments. Automate operational processes wherever possible. Environment The environment includes Kubernetes, Helm, the ELK stack, and a broad range of modern Site Reliability Engineering and cloud platform practices. Role Expectations This is a hands-on technical role suited to someone who thrives in fast-paced operational environments, enjoys solving complex production issues, and is passionate about automation, platform reliability, and continuous improvement. Collaboration The role requires strong collaboration with both client stakeholders and engineering teams to ensure operational excellence, platform resilience, and service availability across critical systems.

23/06/2026

Full time

Software Engineering III - AMDP

JPMorgan Chase & Co.

Join us to shape the future of AI/ML data platforms and make a real impact on how we deliver market-leading solutions. You will collaborate with talented colleagues, solve complex challenges, and help drive strategic change across our organization. At JPMorganChase, you'll find opportunities for growth, mentorship, and the chance to work with cutting-edge technologies. Your contributions will help us deliver resilient and innovative data solutions that power our business. As a Site Reliability Engineer in the AI/ML Data Platforms team, you will play a key role in building and supporting scalable, resilient data solutions. You will engage in root cause analysis and production changes, and collaborate with cross-functional teams to drive improvements. You will also mentor team members and partner with colleagues across our global network. Your work will directly impact the reliability and performance of our AI/ML platforms. Job Responsibilities Develop and support AI/ML solutions for troubleshooting and incident resolution Coordinate incident management coverage to ensure effective resolution of application issues Collaborate with cross-functional teams to perform root cause analysis and implement production changes Apply expertise in application development and support using technologies such as Databricks, Snowflake, AWS, and Kubernetes Mentor and guide team members to drive strategic change Build tools to automate repeated tasks and reduce operational toil Ensure compliance with risk controls and company standards Contribute to system design, resiliency, testing, operational stability, and disaster recovery Foster a collaborative team environment to achieve common goals Required Qualifications, Capabilities, and Skills Proficient in site reliability culture and principles, with experience implementing them within applications or platforms Skilled in running production incident calls and managing incident resolution Experience with observability, including monitoring, 
alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, or Splunk Strong understanding of SLI/SLO/SLA and error budgets Proficiency in Python or PySpark for AI/ML modeling Ability to automate tasks and reduce toil through tool development Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery Awareness of risk controls and compliance with organizational standards Ability to work collaboratively and build meaningful relationships Preferred Qualifications, Capabilities, and Skills Experience in an SRE or production support role with AWS Cloud, Databricks, Snowflake, or similar technologies AWS and Databricks certifications

23/06/2026

Full time

AI/ML Data Platform SRE - Scale, Resilience, Impact

JPMorgan Chase & Co.

JPMorgan Chase & Co. is looking for a Site Reliability Engineer to join their AI/ML Data Platforms team in Greater London. The role involves developing scalable data solutions, incident management, and mentoring team members. Candidates should be proficient in site reliability principles and Python, and have experience with AWS, Databricks, and observability tools like Grafana and Splunk. This position offers the opportunity for growth in a collaborative environment while contributing to innovative data solutions.

23/06/2026

Full time

Systems Engineer, Database Services

Amazon

Job ID: AWS EMEA SARL (UK Branch) Would you like to help implement innovative cloud computing solutions and solve the world's most complex technical problems? Do you have a deep passion and desire to engineer and operate the world's largest cloud computing infrastructure to provide a better world for future generations? Amazon Web Services (AWS) builds and operates some of the largest internet infrastructure on the planet, providing companies of all sizes with an infrastructure web services platform in the cloud. With AWS, customers provision compute power, storage, database, and other cloud resources as their business demands them. To meet the growing demand for AWS Services around the globe, we need exceptionally motivated people who are driven by learning and innovation. The AWS Database Services (DBS) team helps to support and operate some of the largest distributed database and analytics services in the world. From DynamoDB to OpenSearch, Glue and EMR, the DBS team manages many of the most heavily used services in AWS. If you join us, you'll be part of a world-class team in a dynamic environment that has the entrepreneurial feel of a start-up. This is an opportunity to operate and engineer systems on a massive scale and to gain world-class experience in cloud computing. You'll be surrounded by people who are passionate about their field, believe that first-class service is critical to customer success, and are committed to rapid iteration and continual improvement. 
Top Reasons to Join Our Team Be a catalyst to deliver truly disruptive products that are growing rapidly Define, build, own, and run services in high-growth environments Solve unique and first-order problems in foundational AWS services such as DynamoDB, OpenSearch and Glue Learn how to build and operate distributed systems at scale Build and influence the tools and utilities that are part of the AWS fleet running our internal services Key Job Responsibilities Our Systems Engineers are expected to be leaders in their teams by becoming subject matter experts on multiple AWS services. They develop, build, deploy, operate, sustain and grow their services in cloud production environments. They are able to utilise trends and metrics to identify opportunities for improvements. They help develop and refine procedures used by their team and internal customers. They are able to develop at a high standard and can deal with new and ambiguous problem domains while consistently delivering customer-impacting change. 
Systems Engineers Spend Time Operating the world's most advanced cloud computing infrastructure Simplifying and reinventing systems, processes, and tools to make things better for our customers and our builders Building unique solutions to our customers' technical problems Studying and learning from industry-recognised Amazon Senior and Principal Engineers Coaching, hiring and developing team members Basic Qualifications Experience in site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration Experience working with Linux Experience in any of the following: Python, Java, Perl, PHP, Ruby, Bash, Shell or equivalent Preferred Qualifications Knowledge of TCP/IP and networking protocols such as HTTP and DNS Experience designing and developing scripts to automate operational burdens and reviewing scripting changes to ensure they meet the standards for maintainability, scalability and security Experience working in a 24/7 production environment Experience with service-oriented architecture and web services Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice to know more about how we collect, use and transfer the personal data of our candidates. Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or any other legally protected status. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information.

23/06/2026

Full time

Senior Site Reliability Engineer

iManage

Senior Site Reliability Engineer - iManage SRE is part of a global organization that leverages the latest technology to communicate with our colleagues across the globe. We organize ourselves into distributed teams - SRE teams are anchored to iManage offices across the globe. Tuesdays and Thursdays are dedicated to in office collaboration, rapid innovation, and developing a sense of belonging at iManage. Mondays and Fridays are reserved for focus time to get things done. Have the best of both work styles in a workplace that is intentional about belonging, collaboration, and accomplishment. Being a Senior Site Reliability Engineer at iManage means You are an engineer, a builder, and a systems thinker. You'll create middleware and platform guardrails that empower developers to innovate quickly and reliably. You combine deep technical judgment with empathy to eliminate customer pain, especially when working with enthusiastic teams stewarding the world's most privileged data. You uplift those around you, act as a subject matter expert, mentor others, and drive change. You chase contributing factors over root causes, value code over documentation, and documentation over process. You'll engage in and often lead architectural discussions, reduce toil, and deliver scalable, resilient platforms that support our customers and organization. As a Senior SRE, you'll help scale our cloud platform, collaborate across teams to promote standardization and resiliency, and participate in on call rotations. You'll become a key voice in observability, change management, and service scalability, providing guidance during complex technical decisions and high impact events. iManage is experiencing explosive growth in its flagship cloud product. We're seeking senior software and systems engineers specializing in reliability and platform services to join our transformative cloud journey. 
This requires rethinking technical decisions with a beginner's mindset and a focus on resilience and sustainability. If you write code, think in systems, embrace complexity and automation, and are passionate about service resilience and scalability - we want to talk to you. SRE Responsibilities Eliminate toil through automation and software development. Partner cross functionally with application teams and internal stakeholders. Create a modern, cloud native platform that is resilient, cost effective, and secure by default. Scale cloud infrastructure to support our Kubernetes based ecosystem. Maintain the freshness and utility of platform services. Improve the security posture of our products. Design automation, orchestration, observability, and disaster readiness into our products. Participate in production support and on call rotations, providing senior level guidance during critical events. Lead incident management and post incident retrospectives, coaching teams in these practices. Qualifications Experience writing design documents and postmortems, and refactoring application code. Experience building automation to reduce operational burden or developing internal SaaS tools. Ability to advocate for SRE principles (e.g., SLOs vs SLAs) and introduce them effectively. Experience in public cloud or hosted datacenter environments (Azure and AKS preferred). A passion for collaborative teamwork and influencing reliability best practices across teams. Bonus Points Hands on experience with Linux server stacks (Ubuntu/Debian preferred). Knowledge of cloud provisioning platforms (Terraform preferred). Exposure to configuration management tools (Chef preferred). Experience with containerization/clustering technologies (Docker preferred). Familiarity with observability and alerting tools (Prometheus/Grafana or ELK/EFK). Practical experience with CI/CD pipelines and rollout strategies. A bachelor's degree (or equivalent experience) in Computer Engineering or related field. 
Proficiency in one or more programming languages (e.g., Java, Python, Golang). Familiarity with scripting languages (e.g., PowerShell, Bash, Python, Ruby). Benefits Creating an inclusive environment where you're encouraged to help shape the culture. Market leading salary determined through a fair and consistent process, equitable for all employees. Annual performance based bonus. Enhanced parental leave (20 weeks for primary and 10 weeks for secondary caregiver at 100% pay). Matching pension contribution (up to 6%). Private medical insurance and cash plan. Group life cover, income protection, and critical illness protection. Flexible time off policy, 25 days of annual leave with additional flexibility. Wellness days each year to prioritize mental health and well being. Access to RethinkCare, a global behavioral health platform. We welcome those who come with a growth mindset and a hunger for learning; if you are excited about this role but your past experience doesn't align perfectly with every qualification, we encourage you to apply anyway. iManage is committed to providing an excellent candidate experience and will never ask you to engage in recruitment activity via text and exclusively communicate from emails using domain. If you have any concerns or questions about communications you have received, please send them to so our team members can review. iManage provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

23/06/2026

Full time

Field CTO EMEA

Dynatrace LLC Maidenhead, Berkshire

Your role at Dynatrace Dynatrace is seeking a strategic, customer-facing Field CTO to serve as a senior technical advisor to executives, enterprise architects, and transformation leaders at our most important customers and prospects. This leader will connect business priorities to technical strategy, helping organizations use Dynatrace to improve resilience, accelerate innovation, strengthen security, and drive measurable business outcomes. The Field CTO operates at the intersection of executive engagement, technical vision, sales strategy, and market influence. This role partners closely with Sales, Solution Engineering, Product Management, Customer Success, Alliances, and Marketing to shape large, strategic opportunities and elevate Dynatrace's role as a trusted transformation partner. Act as the executive technical advisor for strategic accounts, engaging CIOs, CTOs, CISOs, VP Engineering, platform teams, and business stakeholders. Translate customer business goals into compelling transformation strategies powered by Dynatrace. Lead high-impact technical discovery and executive conversations around observability, cloud modernization, AI adoption, security, automation, and business outcomes. Shape account strategy with Sales and Solution Engineering teams for complex, multi-stakeholder deals. Develop board-level and executive-level narratives that connect platform capabilities to risk reduction, operational excellence, digital experience, and growth. Guide customers on modern observability and security operating models, including platform engineering, SRE, DevSecOps, and AI-assisted operations. Support large opportunities by validating architecture direction, differentiation, value realization, and long-term platform vision. Influence go-to-market strategy by bringing field insight back to Product, Marketing, and leadership teams. Represent Dynatrace externally through executive briefings, customer workshops, industry events, webinars, and thought leadership. Mentor field 
teams on executive engagement, storytelling, value selling, and strategic account planning. Help create reusable field assets, strategic points of view, and technical value frameworks for priority industries and use cases. Partner with Customer Success and Services to promote adoption strategies that expand platform value over time. What will help you succeed 12+ years of experience in enterprise technology, including senior roles in architecture, engineering, observability, cloud, security, or technical go-to-market leadership. 5+ years in a customer-facing leadership role such as Field CTO, Enterprise Architect, CTO Advisor, Chief Architect, VP/Senior Director of Solution Engineering, or similar. Strong executive presence with the ability to communicate equally well with C-level leaders and deeply technical teams. Proven experience supporting complex enterprise sales cycles and strategic digital transformation programs. Deep knowledge of cloud platforms, modern application architectures, distributed systems, platform engineering, and enterprise IT operations. Strong understanding of observability, application performance, infrastructure, log analytics, digital experience, automation, and security. Experience in observability, AIOps, application security, cloud-native platforms, or enterprise analytics. Ability to connect technical transformation to business KPIs, value realization, and organizational change. Excellent communication, presentation, and workshop facilitation skills. Willingness to travel based on customer and business needs. Familiarity with executive value frameworks, business case development, and enterprise transformation methodology. Experience working with Fortune 500 or large global organizations. Background in SaaS or platform companies serving engineering, operations, and security teams. Public speaking and thought leadership experience, including conferences, customer events, or published content. Knowledge of AI/LLM adoption patterns and how AI can improve 
operational and business decision-making. Why you will love being a Dynatracer Dynatrace is a leader in unified observability and security. We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance. Our employees work with the largest cloud providers, including AWS, Microsoft, and Google Cloud, and other leading partners worldwide to create strategic alliances. The Dynatrace platform uses cutting-edge technologies, including our own Davis hypermodal AI, to help our customers modernize and automate cloud operations, deliver software faster and more securely, and enable flawless digital experiences. Over 50% of the Fortune 100 companies are current customers of Dynatrace. Compensation and Rewards Note to Recruiters and Agencies: Thank you for your interest in Dynatrace. Please note that we do not accept unsolicited agency resumes - do not forward them via our website or directly to Dynatrace employees. Dynatrace will not pay fees for unsolicited resumes, and any resumes received this way will be considered the property of Dynatrace. Benefits and work-life perks We offer best-in-class core rewards, including paid time off, financial security benefits, retirement savings plans, and health insurance. 
Beyond that, you'll get other benefits and work-life perks designed to make your ride with us even more rewarding. Mental health support: Our Employee Assistance Program, powered by Telus Health, offers support for you and your family members. Wellness Days: Four company-designated extra paid days off for you to recharge batteries. Flexibility: Our hybrid working model and flexible working hours offer you the flexibility you need. Employee Stock Purchase Plan: Purchase company stock (NYSE: DT) at a discounted price and become a shareholder. Learn & develop: Company-wide learning perks, designated team's learning days, and more. Volunteering day: A day of paid volunteer time to support a community or cause you care about. Regular team events: We host Global Culture Parties, Family & Friends at Work Day, Global Breakfasts, Green Weeks, Pride Month, and beyond! International vibe: Most of our offices and teams are proudly multicultural. English is our shared language, but we embrace and learn from each other's cultures. Rewards vary depending on your employment type. Some benefits and perks also differ by location - explore your city to see what's available there. About Dynatrace Dynatrace (NYSE: DT) is the leading AI-powered observability and security platform. We're advancing observability for today's digital businesses, helping transform modern digital ecosystems' complexity into powerful business assets. Our AI-driven insights cut through the noise, allowing customers to focus on what truly matters by automating manual tasks and resolving issues with pinpoint accuracy. Dynatrace offers simplicity, clarity, and reliability at scale to ensure teams can make informed decisions, minimize downtime, and drive their business forward with confidence.

23/06/2026

Full time

Platform Engineer - Engine by Starling

Onyx-Conseil

Overview At Engine by Starling, we are on a mission to find and work with leading banks around the world who have the ambition to build rapid growth businesses, on our technology. Engine is Starling's software-as-a-service (SaaS) business, the technology that was built to power Starling, and two years ago we split out as a separate business. Starling has seen exceptional growth and success, and a large part of that is down to the fact that we have built our own modern technology from the ground up. This SaaS technology platform is now available to banks and financial institutions all around the world, enabling them to benefit from the innovative digital features and efficient back-office processes that have helped achieve Starling's success. Our purpose is underpinned by five values: Listen, Keep It Simple, Do The Right Thing, Own It, and Aim For Greatness. Hybrid Working We have a Hybrid approach to working here at Engine - our preference is that you're located within a commutable distance of one of our offices so that we're able to interact and collaborate in person. About Engineering at Engine by Starling The Cross Cutting Engineering team at Engine is the backbone of our innovation. We're dedicated to building and maintaining the reliable, scalable, and maintainable infrastructure and tooling that powers our entire software delivery pipeline - from the first line of code to seamless production deployment and ongoing operations. We own the lifecycle of our features, tackling complex challenges with a first-principles approach and fostering a multi-disciplinary environment where you're encouraged to explore and contribute across the platform. As a Platform Engineer at Engine, you'll be at the forefront of building and scaling our cutting-edge cloud-native banking platform across multiple global cloud providers and regions. 
We're looking for engineers with a strong SRE mindset, who embrace ownership of the entire software delivery pipeline, and are passionate about building internal tooling that empowers our technology teams to operate their applications flawlessly in production. Don't worry if you don't tick every box below! We value curiosity, a willingness to learn, and a desire to work across multiple disciplines. If you're excited by the challenges of building and operating a global, cloud-native platform, we encourage you to apply. We have a great team - read about our work with Women In Tech, a Day in the life of a Software Engineer at Engine and our interview with our Staff Platform Engineer. What you'll get to do? Building and Scaling Cloud Infrastructure: Design, build, and maintain our cloud infrastructure across multiple providers (including but not limited to GCP) and regions, ensuring scalability, reliability, and security. Building on Google Cloud: Contribute to the build-out and optimisation of our core Engine on Google Cloud Platform using Java and Kubernetes. Scaling our SaaS Release Tooling: Enhance and improve our multi-tenant, multi-region SaaS release and continuous deployment systems using Java, Golang, and Terraform at its core. Empowering Developers: Develop and maintain internal tooling using Java and Golang to improve developer experience and on-call efficiency. Automating Compliance and Security: Build automation solutions in Golang to enforce compliance and security controls across our platform. Driving Efficiency: Optimise the performance and reliability of our cloud environment with a strong focus on cost-effectiveness. Embracing Automation: Identify and implement automation opportunities to minimise manual processes across the platform lifecycle. Ensuring Security: Implement and maintain robust security practices to protect our platform and customer data. 
Championing Best Practices: Stay abreast of new technologies and industry changes, particularly in SRE practices and deployment automation, and share your knowledge with the team. Maintaining Compliance: Contribute to ensuring our platform adheres to relevant industry standards such as ISO27001, SOC2, and PCI-DSS. Collaborating and Learning: Work closely with cross-functional teams, share your expertise, and contribute to our vibrant learning culture. Aiming for Greatness: Strive for excellence in everything you do, maintaining a curious and inquisitive mindset. Documenting Solutions: Design and document scalable internal tooling clearly and comprehensively. Taking Ownership: Own features and improvements throughout their entire lifecycle. Participate in on-call: The option to join our on-call rota (not mandatory!) to deal with interesting technical issues and gain deep insights into our platform's behavior. Your place within the team will depend on your individual strengths and interests. We are generally open-minded when it comes to hiring and we care more about aptitude and attitude than specific experience or qualifications. For this role, we are looking for some specific additional skills - if you prefer Java only roles be sure to check out our other Software Engineer roles. What skills are essential Proven experience as a Site Reliability Engineer, DevOps Engineer, Platform Engineer or similar role. Strong proficiency in Golang and/or Java (if you have experience with only one of these that's fine, we'll expect you to pick up the other whilst you're here!). Hands-on experience with Google Cloud Platform (GCP). Solid understanding and practical experience with Kubernetes. Experience with Terraform or other Infrastructure-as-Code tools. Deep understanding of SRE principles and practices, including monitoring, alerting, incident management, and capacity planning. A strong focus on automation and a passion for eliminating manual tasks. 
Experience with building and maintaining CI/CD pipelines. Knowledge of security best practices in cloud environments. Excellent problem-solving and analytical skills. Strong collaboration and communication skills. A proactive and continuous learning mindset. Ability to design and document technical solutions effectively. What skills are desirable Experience with other cloud providers, particularly AWS. Contributions to open-source projects. Experience with database technologies, particularly Postgres. Familiarity with observability and monitoring systems, and a solid understanding of database monitoring, analysis, disaster recovery, and performance tuning. Familiarity with compliance standards such as ISO27001, SOC2, and PCI-DSS is a plus. Our Interview process Interviewing is a two-way process and we want you to have the time and opportunity to get to know us, as much as we are getting to know you. Our interviews are conversational and we want to get the best from you, so come with questions and be curious. In general you can expect the below, following a chat with one of our Talent Team: Initial interview with an Engineer - 45 minutes Take home technical test to be discussed in the next interview Technical interview with some Engineers - 1.5 hours Final interview with our CTO / deputy CTO - 45 minutes Benefits 33 days holiday (including public holidays, which you can take when it works best for you) An extra day's holiday for your birthday Annual leave is increased with length of service, and you can choose to buy or sell up to five extra days off 16 hours paid volunteering time a year Salary sacrifice, company enhanced pension scheme Life insurance at 4x your salary & group income protection Private Medical Insurance with VitalityHealth including mental health support and cancer care. 
Partner benefits include discounts with Waitrose, Mr&Mrs Smith and Peloton Generous family-friendly policies Incentives refer a friend scheme Perkbox membership giving access to retail discounts, a wellness platform for physical and mental health, and weekly free and boosted perks Access to initiatives like Cycle to Work, Salary Sacrificed Gym partnerships and Electric Vehicle (EV) leasing About Us You may be put off applying for a role because you don't tick every box. Forget that! While we can't accommodate every flexible working request, we're always open to discussion. So, if you're excited about working with us, but aren't sure if you're 100% there yet, get in touch anyway. We're on a mission to radically reshape banking - and that starts with our brilliant team. Whatever came before, we're proud to bring together people of all backgrounds and experiences who love working together to solve problems. Engine by Starling is an equal opportunity employer, and we're proud of our ongoing efforts to foster diversity & inclusion in the workplace. Individuals seeking employment at Engine by Starling are considered without regard to race, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, physical or mental disability, military or veteran status, or any other characteristic protected by applicable law. When you provide us with this information, you are doing so at your own consent, with full knowledge that we will process this personal data in accordance with our Privacy Notice. By submitting your application . click apply for full job details

23/06/2026

Full time

Senior AWS Engineer SRE

Spectrum IT Recruitment

London (Hybrid - 2 Days Onsite near Barbican Station)

Looking for an opportunity to make a real impact on a major government programme? We're recruiting a Cloud Operations Engineer SRE to join a large-scale project focused on onboarding a major public sector client and delivering highly secure customer contact solutions.

This is a key role within a growing engineering team, helping to build, support and optimise cloud platforms that underpin critical services. You'll work across AWS cloud infrastructure, Linux environments, container platforms and databases, helping to ensure secure, scalable and highly available systems.

What we're looking for:
Strong AWS cloud experience (EKS, ECS, EC2, RDS, IAM, VPC)
Linux systems administration expertise
Containerisation and Kubernetes experience
Terraform and Infrastructure as Code knowledge
Database administration experience (PostgreSQL, MySQL, Aurora or similar)
A passion for reliability, security and continuous improvement

Why apply?
Join a major long-term government project
Work on secure, mission-critical technology
Excellent opportunity to influence architecture and operational excellence
Be part of a global technology organisation investing heavily in growth and innovation

If you enjoy solving complex infrastructure challenges and want to play a key role in a high-profile programme, we'd love to hear from you.

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

23/06/2026

Full time

Graduate DevOps Engineer, AWS

Amazon

Job ID: AWS EMEA SARL (UK Branch)

AWS Utility Computing (UC) provides product innovations - from foundational services such as Amazon's Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to new product innovations that set AWS services apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.

The region service team is a customer-experience-oriented team looking for a self-motivated, talented engineer who can solve complex problems and improve service support. We need an engineer who brings a mix of operations and networking expertise and shares our passion to change the way our customers operate. A systems engineer will drive opportunities to automate and simplify daily operations and scale the organisational operations.

Key Job Responsibilities
Work proactively to solve potential problems and inefficiencies.
Communicate clearly and collaborate with others to deliver results with minimal supervision.
Participate in 24/7 on-call rotation to troubleshoot high severity issues.
Analyze dashboards and investigate metrics with a vision for improvements.
Troubleshoot and diagnose problems and work on solutions.
Create and maintain Standard Operating Procedures (SOPs) and runbooks for documentation.
Discuss radical new approaches to automate operational issues, assess risks and develop creative solutions.

You will need to be a UK national and be able to obtain and maintain a UK Government Security Clearance. Further details can be found here:

A Day in the Life
On a typical day, engineers dive deep into understanding the root cause of a customer issue, investigate why a metric is trending the wrong way, and consult with senior engineers. Engineers own their services and implement Operational Excellence best practices to make out-of-hours support painless, automating manual processes. Systems engineering roles focus on troubleshooting, innovating fixes and workarounds, maintaining software updates, and providing data and metrics that support capacity and efficiency. Engineers use Linux skills, networking knowledge, and clear communication to deliver results and thrive in an environment of ambiguity and change.

Basic Qualifications
Bachelor's degree in Computer Science or another technical degree or related experience.
Knowledge of networking fundamentals.
Experience working in a 24/7 production environment.
Experience in Linux systems administration and/or development.
Experience working in at least two of these languages: Python, Java, Perl, PHP, Ruby or Bash/Shell.

Preferred Qualifications
Knowledge of configuration management systems, such as Puppet, Chef, Ansible, or related systems.
Experience in site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration.
Experience in network capture and systems troubleshooting.
Experience building scripts, tooling, and automation for large scale computing environments.

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, please visit for more information. Please consult our Privacy Notice to know more about how we collect, use and transfer the personal data of our candidates.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

23/06/2026

Full time

Senior Platform Engineer

PassFort

Automata is transforming the way labs work with open, integrated automation. Our mission is to unlock the potential of labs and the potential of the people who work in them.

At Automata, we're on a mission to transform how scientists work by making automation accessible to every lab in the world. We believe that by giving labs the power to automate, we can unlock discoveries that will shape the future of life sciences - from diagnostics and drug discovery to synthetic biology. Our LINQ platform combines hardware and software to streamline workflows, making lab automation fast, flexible, and affordable. This means our customers can focus on groundbreaking research, while we take care of the rest.

Why Work at Automata?
Impact: Your work will directly contribute to advancements in science and medicine, supporting labs around the globe as they push boundaries in research and innovation.
Innovation: You'll be part of a team solving complex problems using cutting-edge technology.
Growth: We invest in our people through hands-on experience, professional development, and collaborative projects.
Community: Join a diverse, passionate team that values collaboration.

We are looking for a Senior Platform Engineer to help build, scale, and operate the foundational infrastructure powering Automata's LINQ platform. You will play a critical role in designing and maintaining robust, secure, and compliant systems that support deployments across cloud and on-premises environments (including bare metal).

In this role, you will be responsible for:
Designing, building and operating Kubernetes platforms across AWS and bare metal environments.
Managing and optimizing PostgreSQL databases, ensuring performance, resilience, and data integrity.
Developing and maintaining Infrastructure as Code (Terraform, Pulumi, Crossplane, or similar).
Implementing and managing GitOps workflows (ArgoCD) for consistent and repeatable deployments.
Supporting Windows OS provisioning and hybrid infrastructure environments.
Building and improving observability systems (OpenTelemetry, metrics, tracing, logging).
Contributing to network architecture and operations, including physical networking (Juniper/Mist).
Supporting deployments in regulated environments (ISO27001, SOC2, GxP).
Collaborating with field and customer teams on on-site deployments and troubleshooting.
Developing automation and internal tooling using Python, Go, Rust, or .NET.
Participating in building orchestration workflows using tools like Temporal.
Helping define and maintain golden paths for developers to improve productivity and reliability.
Contributing to documentation, onboarding materials, and operational runbooks.

What it takes
5+ years of experience in platform engineering, SRE, or infrastructure roles.
Strong experience with Kubernetes in production, including bare metal clusters.
Solid understanding of cloud platforms (AWS) and hybrid deployments.
Experience managing databases (PostgreSQL); DBA-level knowledge preferred.
Familiarity with GitOps and modern IaC practices.
Experience with observability tooling and distributed systems debugging.
Understanding of networking fundamentals, including physical infrastructure.
Experience working in regulated environments or with compliance frameworks.
Proficiency in at least one programming language (Python, Go, Rust, or .NET).
Strong communication skills and ability to work cross-functionally.
Willingness to travel to customer sites when needed.
Experience with automation/AI-assisted development workflows (e.g., Claude Code).

Nice to haves
Experience with Juniper/Mist networking.
Exposure to Temporal or similar workflow orchestration systems.
Background in supporting life sciences or GxP environments.

Why You'll Want to Join Us
You want to operate at the edge of your capability where expectations are high and impact is real.
You're motivated by ownership and autonomy, not hierarchy or process.
You want to build and scale something meaningful; our product directly saves lives.
You're currently under-leveraged, moving faster than your environment allows.
You want to work with people who are equally driven, pragmatic, and focused on outcomes.

We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Discrimination of any kind based on race, colour, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status is strictly prohibited.

23/06/2026

Full time

Senior AWS Engineer SRE

Spectrum IT Recruitment

Senior AWS Engineer SRE London (Hybrid - 2 Days Onsite near Barbican Station) Looking for an opportunity to make a real impact on a major government programme? We're recruiting a Cloud Operations Engineer SRE to join a large-scale project focused on onboarding a major public sector client and delivering highly secure customer contact solutions. This is a key role within a growing engineering team, helping to build, support and optimise cloud platforms that underpin critical services. You'll work across AWS cloud infrastructure, Linux environments, container platforms and databases, helping to ensure secure, scalable and highly available systems. What we're looking for: Strong AWS cloud experience (EKS, ECS, EC2, RDS, IAM, VPC) Linux systems administration expertise Containerisation and Kubernetes experience Terraform and Infrastructure as Code knowledge Database administration experience (PostgreSQL, MySQL, Aurora or similar) A passion for reliability, security and continuous improvement Why apply? Join a major long-term government project Work on secure, mission-critical technology Excellent opportunity to influence architecture and operational excellence Be part of a global technology organisation investing heavily in growth and innovation If you enjoy solving complex infrastructure challenges and want to play a key role in a high-profile programme, we'd love to hear from you. Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

23/06/2026

Full time

AWS SRE: Reliability, Observability & Cost Optimisation

Source Technology Limited

Source Technology Limited is seeking a Site Reliability Engineer in London for a hybrid role, requiring 3+ years of SRE experience, especially in Kubernetes. Responsibilities include improving system reliability, observability, and cost efficiency. The ideal candidate will work closely with development and platform teams, and should be familiar with operational and incident workflows. This full-time position offers a salary of £90,000 per annum.

22/06/2026

Full time

SRE Permanent London, Hybrid, AWS

Source Technology Limited

About the job Role: Site Reliability Engineer Type: Full time permanent role Location: Hybrid, London City - 3 days per week on site Salary: £90,000 per annum Industry: Technology - Gaming Platforms Our Client is a premier provider of high volume software solutions for the global iGaming and predictive analytics sector. With a footprint spanning the USA, UK, and Europe, they partner with industry leaders to engineer sophisticated platforms for sports wagering, prize based systems, and complex market simulation environments. Their vision is to lead the evolution of interactive technology through intelligent, data driven architecture that ensures seamless user experiences. The firm is driven by a culture of teamwork, transparency, and technical excellence. The role You will help shape and drive how the firm builds and operates reliable, observable, secure, and cost efficient systems on AWS. Working closely with development, platform, and incident management teams, you will define reliability in measurable terms and build the tooling and processes to achieve it, improving platform speed, stability, and scalability. Key responsibilities Partner with engineering teams to define, measure, and manage SLOs/SLIs, using error budgets to guide delivery decisions. Enhance observability across services (metrics, logs, traces) to detect and resolve issues proactively. Lead cost optimisation: monitor spend, right size workloads, tune autoscaling, and improve infrastructure efficiency. Improve production readiness via pre deployment checks, post release validation, and robust platform guardrails. Introduce and run chaos engineering experiments to strengthen resilience and recovery. Automate operational processes to reduce manual intervention and toil across the stack. Support major incident response, root cause analysis, and continual improvement actions. Collaborate cross functionally to raise standards for stability, security, performance, and compliance.
Required skills & experience 3+ years' experience in SRE, Platform, or DevOps roles within production environments. Strong Kubernetes operational experience (on prem and AWS EKS). Hands on experience defining and operating SLOs/SLIs, alerting, and incident workflows. Deep understanding of observability and telemetry (monitoring, logging, tracing). Infrastructure as Code with Terraform; experience with GitOps workflows and CI/CD. Scripting proficiency in Python, Bash, or Go. Proven ability to balance cost efficiency with reliability and performance. Excellent communication skills and the ability to work effectively across multiple teams. Strong Desirables for this role Experience running chaos engineering experiments. Exposure to high throughput, low latency systems. FinOps knowledge or cost management practices. AWS certifications (e.g., Solutions Architect, DevOps Engineer).

22/06/2026

Full time

Manager, Forward Deployed Engineer, TC, FS

hackajob

Manager, Forward Deployed Engineer, TC, FS Location: London Other locations: Primary Location Only Salary: Competitive Date: 9 Apr 2026 Job description Requisition ID: Location: UK (London CP / Manchester / Birmingham / Edinburgh/ Belfast) - Hybrid working with client-site travel as required. Contract: Permanent, full-time The opportunity Organisations are moving rapidly from AI experimentation to operational adoption. However, many struggle to translate ideas into secure, scalable and reliable production solutions. What you'll do Client facing engineering & delivery Lead technical delivery for AI solution areas, guiding teams in translating client needs into scalable engineering approaches. Engage with business and technology stakeholders to shape technical direction, communicate trade-offs and ensure alignment on solution outcomes. Support delivery teams in navigating complex client environments while ensuring engineering quality and reliability. Solution design & implementation Architect AI enabled services such as agents, RAG pipelines and supporting platform components. Ensure solutions are designed with reliability, observability and operational readiness in mind. Guide teams in implementing responsible AI controls, evaluation approaches and engineering best practices. Product mindset & continuous improvement Mentor engineers and support the development of strong engineering practices across squads. Lead technical reviews and help establish reusable patterns, accelerators and reference architectures. Contribute to internal knowledge sharing and external thought leadership around applied AI engineering. What we're looking for Essential skills & experience Software & systems engineering: Python/TypeScript, distributed systems, API/microservice design, testing/CI/CD. Applied AI/ML: building and operating ML/DL in production; expertise in NLP/CV/transformers and classical ML. 
LLM/RAG engineering: embeddings, vector stores (FAISS/Milvus/Pinecone), retrieval strategies, grounding and hallucination mitigation. LLMOps: prompt pipelines, automated evaluation, telemetry/drift monitoring, model versioning and release management. Cloud architecture: Azure (preferred) and/or AWS/GCP; Kubernetes/Docker; serverless; IAM and network security. Data engineering: Spark/Databricks, ETL/ELT; collaboration with platform/data teams to deliver cloud native data + AI architectures. Enterprise integration: legacy/LoB systems; design for reliability/observability (SLIs/SLOs) and operational readiness with runbooks/SRE practices. Product leadership: discovery facilitation, PRDs, acceptance criteria, prioritisation (RICE/MoSCoW), value/adoption metrics. Responsible AI & compliance: privacy by design, auditability and UK regulatory awareness (FCA, PRA, GDPR). Consulting capabilities: stakeholder management, client ready communication, time/budget/risk management and team leadership. Nice to have Big data/graph stacks (e.g., Hadoop, Cassandra, Neo4j) and streaming (Event Hub/Kafka). Azure/AWS Solutions Architect experience; optional governance/model risk/responsible AI credentials. Technical Certifications (preferred) Microsoft Azure AI Engineer Associate (AI 102) or Azure Data Scientist Associate. AWS Machine Learning Specialty or Google Professional ML Engineer. Databricks (Data Engineer/ML Engineer) and Kubernetes (CKA/CKAD). Azure/AWS Solutions Architect; optional model risk/responsible AI governance credentials. How you work You are hands on with engineering while setting the technical direction for delivery teams. You help teams navigate technical trade offs and ensure solutions meet enterprise standards for reliability and security. You care about quality, operational readiness and long term maintainability of systems delivered to clients. 
What we offer High impact work with leading organisations across sectors, within a collaborative engineering led AI capability. You will benefit from: Continuous development through the FDE Academy, strengthening the architecture and engineering leadership capabilities required to build AI systems at scale. Opportunities to participate in hackathons, engineering showcases and innovation challenges. Learning and certification support across cloud, AI and engineering platforms. Competitive compensation and benefits. Flexible hybrid working arrangements depending on client needs. Travel & Working Model Hybrid working and periodic travel to client sites across the UK (and occasionally internationally), discussed based on projects and location. Inclusion and accessibility EY is committed to building an inclusive culture where everyone can thrive. If you require adjustments or support during the recruitment process, we encourage you to let us know.

22/06/2026

Full time

NetBackup SRE: Reliability Engineer (UK/Bulgaria)

Job Search Place Limited

Cohesity Inc. is seeking a Designated Site Reliability Engineer to enhance stability and performance for customer NetBackup systems based in the United Kingdom or Bulgaria. The role includes troubleshooting and managing various cases effectively. Ideal candidates will possess solid expertise in technical support, particularly within NetBackup solutions. Cohesity offers benefits such as healthcare coverage, paid parental leave, and continuous learning opportunities.

21/06/2026

Full time

267 jobs found
