The successful candidate will be responsible for building team resilience, overseeing day-to-day operations, and ensuring effective resource allocation across support demands. They must demonstrate strong business acumen and confidently collaborate with stakeholders during critical incidents. The role requires ownership of incident calls, leading them independently until Subject Matter Experts (SMEs) are engaged, and coordinating resolution efforts effectively.

Flexibility is crucial, as the role includes out-of-office availability, including public holidays, and managing on-call responsibilities. The ideal candidate will possess excellent self-leadership and prioritization skills, and the ability to drive team performance while maintaining high service standards.

Key Responsibilities
Manage BAU support activities for Risk Intelligence applications during UTC morning hours
Oversee shift planning to ensure 24x5 onsite coverage and 24x7 on-call support
Provide out-of-hours and on-call support, including overnight production monitoring and weekend release support
Ensure team adherence to performance metrics and promote continuous improvement
Maintain effective documentation and processes to meet support and audit requirements
Develop and regularly review 24-hour runbooks for each business line
Lead incident management and resolution within the SRE team
Deliver daily, weekly, and monthly status reports to stakeholders
Collaborate with technical and business teams to manage stakeholder expectations

People Leadership & Site Management
Lead and mentor a team of SRE engineers, fostering a culture of ownership, learning, and engineering excellence
Drive career development, performance management, and technical capability growth
Partner with HR and Talent teams to support hiring and workforce planning
Promote diversity, equity, inclusion (DEI), well-being, and psychological safety
Represent the site in global SRE forums and contribute to offshore strategy and location planning
Support Business Continuity Planning (BCP) and Disaster Recovery (DR) readiness

Person Specification

Education
Bachelor's degree or equivalent experience, preferably in a technical discipline

Required Skills and Experience
Experience with Linux (Amazon Linux AMI) and Windows Server 2019 in cloud environments
Proficient in MySQL, PostgreSQL, MongoDB, and Aurora RDS
Familiarity with AWS DocumentDB, DynamoDB, and SQLite
Knowledge of MS SQL Always On Availability Groups and migration to Azure SQL Managed Instances
Hands-on experience with AWS SQS and AWS SES
Exposure to Amazon MSK, Coviant, and Cerberus
Solid understanding of AWS S3 and EFS, including frontend integration
Experience with Synapse Analytics and D365
Skilled in development using Spring Boot, Node.js, Python (Django, Flask, Apache Airflow), Java (Java 11, Lambdas), React, Angular, JavaScript, C# (.NET Framework), and PHP
Proficient in containerization and orchestration using Docker, Amazon ECS, EKS, and EC2

Additional Attributes
8+ years in production operations, SRE, or DevOps roles, with 3+ years in a leadership capacity
Demonstrable experience managing SRE or production support teams in complex environments
Strong collaboration and knowledge-sharing mentality
Experience in investment banking and familiarity with financial products
Excellent analytical and problem-solving skills
Effective communicator with technical and business stakeholders, including senior management

LSEG is committed to encouraging a diverse, equitable, and inclusive work environment, ensuring equal opportunities for all employees, regardless of their background. We offer great employee benefits to make sure everyone performs to the best of their abilities.

Join us and be part of a team that values innovation, quality, and continuous improvement.
If you're ready to take your career to the next level and make a significant impact, we'd love to hear from you.

LSEG is a leading global financial markets infrastructure and data provider. Our purpose is driving financial stability, empowering economies and enabling customers to create sustainable growth.

Our purpose is the foundation on which our culture is built. Our values of Integrity, Partnership, Excellence and Change underpin our purpose and set the standard for everything we do, every day. They go to the heart of who we are and guide our decision-making and everyday actions.

Working with us means that you will be part of a dynamic organisation of 25,000 people across 65 countries. At the same time, we value your individuality and enable you to bring your true self to work so you can help enrich our diverse workforce.

We are proud to be an equal opportunities employer. This means that we do not discriminate on the basis of anyone's race, religion, colour, national origin, gender, sexual orientation, gender identity, gender expression, age, marital status, veteran status, pregnancy or disability, or any other basis protected under applicable law. Conforming with applicable law, we can reasonably accommodate applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs.

You will be part of a collaborative and creative culture where we encourage new ideas. We are committed to sustainability across our global business and we are proud to partner with our customers to help them meet their sustainability objectives. Our charity, the LSEG Foundation, provides charitable grants to community groups that help people access economic opportunities and build a secure future with financial independence. Colleagues can get involved through fundraising and volunteering.
LSEG offers a range of tailored benefits and support, including healthcare, retirement planning, paid volunteering days and wellbeing initiatives.

Proud to share that LSEG in India is Great Place to Work certified (Jun '25 - Jun '26).

Career Stage: Senior Associate
26/06/2026
Full time
3-month initial contract, scope for extension. Inside IR35, (Apply online only) a day.
Location: Southampton, 2 days a week on site

Role Overview
We are seeking an experienced Principal Cloud Platform Engineer with strong DevOps leadership capability to support the design, delivery and continuous improvement of secure, scalable and resilient cloud platforms across Azure and AWS environments. The role will focus on building and governing cloud architecture patterns, landing zones, infrastructure standards and automation practices, while working closely with engineering, security, product and delivery teams.

The successful candidate will provide manager-level technical leadership across DevOps, cloud platforms, Infrastructure as Code, CI/CD, networking, security, observability and reliability engineering. They will help shape enterprise-scale transformation, hybrid cloud strategy and platform services aligned to the Azure Well-Architected Framework, ensuring solutions are robust, cost-effective and operationally ready.

Own and evolve Azure and AWS cloud architecture, platform patterns, guardrails, and design principles.
Provide architectural oversight across Terraform modules, CI/CD pipelines (Azure DevOps, GitHub Actions), networking patterns, and compute/storage design.
Evaluate platform changes, including major provider upgrades (AzureRM / Cloudflare), DR and high-availability improvements, cost optimisation strategies, and observability frameworks.
Lead technical designs for large-scale refactoring and provider upgrades, environment creation pipelines, secure container registry access, identity integration and Zero Trust patterns, and event-driven architectures with caching strategies.
Drive Cloudflare integration (CDN, WAF, edge, traffic management) aligned to Azure networking and security architecture.
Provide engineering governance through review of RFCs, design documents, and technical decisions, and collaborate with delivery teams, security, shared services, and product groups to maintain aligned architecture.

Requires 8-12+ years in cloud engineering/architecture, with deep expertise in Azure architecture (networking, App Services, Functions, APIM, PaaS, ACR, VNets, Private Endpoints, identity/security), strong Terraform experience (modules, pipelines, state management), and a strong CI/CD background. Experience designing for SRE (DR, failover, monitoring, logging, autoscaling, resilience) and working across multiple engineering teams is also required. Previous Cloudflare experience preferred.

This is an architecture-led platform role (not a pure Solution Architect); deep scripting expertise is not a primary requirement.
26/06/2026
Contractor
Site Reliability Engineer (AWS)
Reporting to: Director of Engineering
Location: London (Hybrid - we're flexible)
Job Type: Permanent

About Us
Camascope is a fast-growing technology company focused on empowering the care and medication sector with technology. We are a team of talented, caring, and ambitious individuals who are committed to making a difference in care. Our ecosystem connects pharmacies, care homes, and doctors to improve the lives of many.

There has never been a better time to join Camascope. Our team is growing and our product is reaching more users and partners every day. You will join a collaborative and passionate team. We love solving real problems and are committed to building the highest-quality solutions. If you are eager to make a meaningful impact in healthcare and thrive in a fast-paced startup environment, Camascope will be the perfect place for you.

What You'll Do
Own reliability - Maintain and improve our AWS infrastructure using Terraform, bringing your expertise and best practices
Champion observability - Partner with developers to implement effective monitoring, logging, and tracing strategies
Strengthen security - Work closely with the CISO to implement security best practices and ensure compliance
Optimise costs - Monitor cloud spend and implement FinOps best practices
Maintain CI/CD pipelines - Implement and maintain reliability and observability aspects of GitHub workflows and deployment pipelines
Incident response - Lead incidents, run blameless post-mortems, and drive continuous improvement
Enable developers - Mentor teams on SRE and observability practices, helping them quickly understand and resolve issues
Leverage AI tooling - Use AI-assisted development tools (e.g. GitHub Copilot) to accelerate infrastructure work, and explore AI-driven approaches to incident detection, root cause analysis, and remediation

What We're Looking For

Essential
3+ years in an SRE, Platform, or DevOps engineering role
AWS services: CloudWatch, X-Ray, Lambda, API Gateway, S3, SQS, Aurora PostgreSQL, DynamoDB, CloudFront, VPC, IAM, Security Groups
Python for scripting, tooling, and Lambda development
Terraform for Infrastructure as Code
GitHub (Actions, workflows, repository management)
Strong understanding of observability - metrics, logs, and traces
Good understanding of cloud security principles and best practices
Experience with cloud cost management and optimisation
Excellent communication skills for working with technical and non-technical colleagues
Self-starter who can prioritise and organise their own workload
Comfortable using AI-assisted development tools such as GitHub Copilot

Bonus Points For
Datadog for monitoring, APM, and log management
Azure experience: Front Door, Storage Accounts, App Service, Azure SQL Database, Application Insights
Previous experience in early-stage startups or scale-ups
Previous experience in healthcare or pharmacy tech
Experience working in regulated environments or with compliance frameworks
Experience with AI-driven DevOps tooling (e.g. AWS DevOps Agent or similar AI agents for incident resolution, root cause analysis, and operational improvement)
Experience with SLIs, SLOs, and error budgets

On Call
We have a 24/7 customer support team who handle day-to-day issues. We don't have a formal on-call engineering rota, but our platform supports care homes around the clock, so we're looking for someone who is happy to occasionally jump on a call with the team if critical platform issues arise out of hours (and part of your job will be making sure this isn't necessary!).

Why Join Us?
Join an established engineering team and have the opportunity to enhance and shape how we approach platform and reliability engineering
Make a meaningful impact in healthcare technology
Work with modern cloud-native infrastructure
Influence engineering culture and platform practices
Collaborate in an environment where your ideas matter
Grow with us as we scale

Benefits
Competitive salary (dependent on experience)
Pension scheme and healthcare benefits
Ongoing training and professional development
25 days annual leave excluding bank holidays

We welcome applications from candidates of all backgrounds. If you're excited about this role but don't meet 100% of the requirements, we encourage you to apply anyway.
26/06/2026
Full time
About the job you're considering
Are you a Senior or Lead Platform Engineer who thrives on solving complex infrastructure problems and building production-grade platforms that operate reliably at scale, while also influencing the design, strategy, and governance decisions that shape how teams deliver? Join our engineering team helping public sector clients build and continuously improve critical digital services using modern cloud-native and open-source tooling.

You'll be part of a strong, established community of digital specialists. Together, you will share your ideas, innovate and grow. Our engineers support each other to deliver and develop professionally, and you'll get to work alongside amazing people in one of the best cultures you can find.

Hybrid working: Your working location will vary depending on the client engagement, delivery phase, base location, and security requirements. You should expect a mix of home working, Capgemini offices, and client sites. More onsite presence is typically needed for discovery, workshops, and secure environments. This is not a 100% remote appointment.

Your Role

What You'll Build & Deliver
You'll help design, build, run and improve the platforms behind critical government services: systems that need to stay secure, observable and resilient even under national-scale load.

You'll help build secure multi-cloud landing zones (majority AWS), GitOps-driven platforms and internal developer platforms that give engineers true self-service. You'll modernise legacy estates into cloud-native architectures, unify observability across complex environments using OpenTelemetry and modern APM platforms, and drive FinOps-focused optimisation.

You'll contribute to and help shape engineering standards, SRE-aligned operability, CI/CD performance, supply-chain security, platform-as-a-product thinking and event-driven automation.
You'll collaborate closely with client teams, bringing clear thinking and bold ideas to co-create modern platforms that make a real national-scale difference.

Your scope & influence as a Senior/Lead Platform Engineer
Act as a senior technical voice across the engagement, influencing platform strategy, design choices and governance guardrails.
Lead technical workshops and design reviews with client stakeholders and engineering teams.
Set and evolve platform standards (security, reliability, operability, CI/CD, observability) and help teams adopt them in practice.
Define SLOs, improve reliability, and lead incident reviews.
Coach engineers through mentoring, pairing and pragmatic hands-on technical leadership.
Bring platform-as-a-product thinking: golden paths, reusable capabilities and sustainable operations.
Remain hands-on by actively contributing to designs, code, reviews and problem-solving while leading and mentoring others.

What this looks like in practice
Designing secure-by-default multi-account AWS landing zones (awareness of Azure/GCP patterns also relevant).
Building internal developer platforms (IDPs) and self-service automation that boost developer experience.
Deploying and operating hardened Kubernetes platforms (EKS/AKS).
Creating unified observability stacks with OpenTelemetry, Prometheus and Grafana.
Delivering cloud-native modernisation and transformation of legacy systems.
Engineering high-quality CI/CD pipelines with DevSecOps and supply-chain integrity.
Implementing event-driven, automated infrastructure capabilities.
Driving FinOps and cloud optimisation across complex estates.
Applying platform-as-a-product thinking to create scalable, reusable capabilities.
Providing hands-on technical skills and leadership, and collaborating with client teams.

We like engineers who are curious, opinionated about good engineering and comfortable taking ownership of systems that need to work every time, at national scale. If you want to solve hard problems, push modern engineering forward and deliver platforms that genuinely make a difference, you'll fit right in.

We're a Disability Confident Employer
Capgemini is proud to be a Disability Confident Employer (Level 2) under the UK Government's Disability Confident scheme. As part of our commitment to inclusive recruitment, we will offer an interview to all candidates who declare they have a disability and meet the minimum essential criteria for the role. Please opt in during the application process.

Your Security Clearance
Baseline Personnel Security Standard (BPSS)
To be successfully appointed to this role, you will need to undergo Baseline Personnel Security Standard checks. There are certain criteria and checks required for BPSS, and throughout the recruitment process you will be asked questions about your security clearance eligibility, such as, but not limited to, country of residence and nationality.

In addition to BPSS, you will also need SC (Security Check) clearance or be eligible for this level of clearance (by being a UK resident for at least 5 years and not having left the country for more than 28 consecutive days during this period).

Make it real - what does it mean for you?
You'd be joining an accredited Great Place to Work for Wellbeing in 2023. Employee wellbeing is vitally important to us as an organisation. We see a healthy and happy workforce as a critical component for us to achieve our organisational ambitions. To help support wellbeing, we have trained 'Mental Health Champions' across each of our business areas, and we have invested in wellbeing apps such as Thrive and Peppy.

You will reimagine what's possible: creating value for the world's leading organisations through technology to build a sustainable, more inclusive future. You will work with a range of clients, each with a unique set of business, technological and societal ambitions, which will make a real impact across the UK.

You will be empowered to explore, innovate, and progress.
You will benefit from Capgemini's 'learning for life' mindset, meaning you will have countless training and development opportunities, from think tanks to hackathons, and access to 250,000 courses with numerous external certifications from AWS, Microsoft, Harvard ManageMentor, cybersecurity qualifications and much more.
You'll bring your unique skills and perspectives to the team, inspiring and taking inspiration from your teammates as you unlock value in everything you do. You'll be joining a professional community of experts who have got your back and will support you every step of the way.

Why you should consider Capgemini
Growing clients' businesses while building a more sustainable, more inclusive future is a tough ask. But when you join Capgemini, you join a thriving company and become part of a diverse collective of free-thinkers, entrepreneurs and industry experts: a powerful source of energy that drives us all to find new ways technology can help us reimagine what's possible. It's why, together, we seek out opportunities that will transform the world's leading businesses. And it's how you'll gain the experiences and connections you need to shape your future. By learning from each other every day, sharing knowledge and always pushing yourself to do better, you'll build the skills you want. And you'll use them to help our clients leverage technology to grow their business and give innovation the human touch the world needs. So, it might not always be easy, but making the world a better place rarely is.

About Capgemini
Capgemini is a global business and technology transformation partner, helping organisations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries.
With its strong heritage of over 55 years, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions, leveraging strengths from strategy and design to engineering, all fuelled by its market-leading capabilities in AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2023 global revenues of €22.5 billion.
26/06/2026
Full time
Site Reliability Engineer (AWS)
Reporting to: Director of Engineering
Location: London (Hybrid - we're flexible)
Job Type: Permanent

About Us
Camascope is a fast-growing technology company focused on empowering the care and medication sector with technology. We are a team of talented, caring, and ambitious individuals who are committed to making a difference in care. Our ecosystem connects pharmacies, care homes, and doctors to improve the lives of many. There has never been a better time to join Camascope. Our team is growing and our product is reaching more users and partners every day. You will join a collaborative and passionate team. We love solving real problems and are committed to building the highest-quality solutions. If you are eager to make a meaningful impact in healthcare and thrive in a fast-paced startup environment, Camascope will be the perfect place for you.

What You'll Do
Own reliability - Maintain and improve our AWS infrastructure using Terraform, bringing your expertise and best practices
Champion observability - Partner with developers to implement effective monitoring, logging, and tracing strategies
Strengthen security - Work closely with the CISO to implement security best practices and ensure compliance
Optimise costs - Monitor cloud spend and implement FinOps best practices
Maintain CI/CD pipelines - Implement and maintain reliability and observability aspects of GitHub workflows and deployment pipelines
Incident response - Lead incidents, run blameless post-mortems, and drive continuous improvement
Enable developers - Mentor teams on SRE and observability practices, helping them quickly understand and resolve issues
Leverage AI tooling - Use AI-assisted development tools (e.g. GitHub Copilot) to accelerate infrastructure work, and explore AI-driven approaches to incident detection, root cause analysis, and remediation

What We're Looking For
Essential
3+ years in an SRE, Platform, or DevOps engineering role
AWS services: CloudWatch, X-Ray, Lambda, API Gateway, S3, SQS, Aurora PostgreSQL, DynamoDB, CloudFront, VPC, IAM, Security Groups
Python for scripting, tooling, and Lambda development
Terraform for Infrastructure as Code
GitHub (Actions, workflows, repository management)
Strong understanding of observability - metrics, logs, and traces
Good understanding of cloud security principles and best practices
Experience with cloud cost management and optimisation
Excellent communication skills for working with technical and non-technical colleagues
Self-starter who can prioritise and organise their own workload
Comfortable using AI-assisted development tools such as GitHub Copilot

Bonus Points For
Datadog for monitoring, APM, and log management
Azure experience: Front Door, Storage Accounts, App Service, Azure SQL Database, Application Insights
Previous experience in early-stage startups or scale-ups
Previous experience in healthcare or pharmacy technology
Experience working in regulated environments or with compliance frameworks
Experience with AI-driven DevOps tooling (e.g. AWS DevOps Agent or similar AI agents for incident resolution, root cause analysis, and operational improvement)
Experience with SLIs, SLOs, and error budgets

On Call
We have a 24/7 customer support team who handle day-to-day issues. We don't have a formal on-call engineering rota, but our platform supports care homes around the clock, so we're looking for someone who is happy to occasionally jump on a call with the team if critical platform issues arise out of hours (and part of your job will be making sure this isn't necessary!).

Why Join Us?
Join an established engineering team and have the opportunity to enhance and shape how we approach platform and reliability engineering
Make a meaningful impact in healthcare technology
Work with modern cloud-native infrastructure
Influence engineering culture and platform practices
Collaborate in an environment where your ideas matter
Grow with us as we scale

Benefits
Competitive salary (dependent on experience)
Pension scheme and healthcare benefits
Ongoing training and professional development
25 days annual leave excluding bank holidays

We welcome applications from candidates of all backgrounds. If you're excited about this role but don't meet 100% of the requirements, we encourage you to apply anyway.
26/06/2026
Full time
A global wealth management platform provider is seeking a Lead Site Reliability Engineer to ensure the reliability and performance of its platforms. The role involves deploying and integrating mission-critical systems, collaborating with various engineering teams, and using modern automation practices. Candidates should have deep expertise in Kubernetes, Terraform, and public cloud environments like AWS or Azure, along with strong problem-solving skills. The role offers a full-time contract based in Edinburgh, with a strong emphasis on teamwork and automation.
26/06/2026
Full time
Lead Site Reliability Engineer
Locations: Edinburgh, United Kingdom; London, United Kingdom
Time type: Full time
Posted: Today
Job requisition ID: REQ-16143

Role Purpose
The Site Reliability Engineer will work closely with Application, Infrastructure, and Network Engineering teams to ensure the reliability, scalability, and performance of FNZ platforms. This role focuses on deploying, integrating, and providing ongoing operational support for mission-critical systems, leveraging modern automation and cloud-native practices.

Key Responsibilities
Maintain high availability and performance of FNZ platforms.
Implement monitoring, alerting, and observability solutions to proactively detect and resolve issues.
Collaborate with engineering teams to design and implement robust deployment pipelines.
Ensure smooth integration of applications with infrastructure and network components.
Use Terraform for provisioning and managing infrastructure across environments.
Operate and optimize workloads on-premises and in the public cloud.
Manage and troubleshoot application delivery networks, load balancing, and traffic routing.
Configure and support F5 Distributed Cloud or similar CDN/ADC technologies.
Participate in on-call rotations, perform root cause analysis, and implement preventive measures.
Work cross-functionally with Application, Infrastructure, and Network Engineering teams to deliver reliable services.

Required Skills & Experience
Kubernetes (K8s): Deep understanding of container orchestration and cluster management.
Terraform: Strong experience in Infrastructure as Code for cloud and on-prem environments.
Public Cloud: Hands-on experience with AWS, Azure, or GCP.
F5 Distributed Cloud or Similar: Knowledge of CDN/ADC platforms and their integration.
Networking Fundamentals: Expertise in application delivery networks, load balancing, traffic routing, and troubleshooting.
Observability Tools: Familiarity with Splunk, New Relic, or similar.
Scripting & Automation: Proficiency in Terraform, Bash, or similar languages.

Desirable Skills
Experience with CI/CD pipelines and GitOps workflows.
Knowledge of SRE principles.
Familiarity with security best practices.

Key Attributes
Strong problem-solving and troubleshooting skills.
Ability to work collaboratively across multiple teams.
Passion for automation and reducing operational toil.

Reporting Line
Reports to: Head of Platform Operations/Application Engineering.
Works closely with the Application Engineering, Infrastructure Engineering, and Network Engineering teams.

About FNZ
FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We know the foundation to do that already exists in the wealth management industry, but complexity holds firms back. We created wealth's growth platform to help. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations, all within a regulated financial institution. We partner with the world's leading financial institutions, with over US$2.2 trillion in assets on platform (AoP). Together with our clients, we empower nearly 30 million people across all wealth segments to invest in their future.
26/06/2026
Full time
At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law.

About us
Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems. Our vision is to create autonomy that propels the world forward. Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving. In our fast-paced environment, big problems ignite us: we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future. At Wayve, your contributions matter. We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact. Make Wayve the experience that defines your career!

The role
As a Cloud Site Reliability Engineer at Wayve, you will build and scale the reliability foundations of our AI cloud platform. This includes our Model Development Platform (powering end-to-end model development from raw data to on-road experimentation) and our GPU Compute platform (large-scale, multi-tenant GPU fleets and scheduling systems driving model training and inference at scale). This is a founding Cloud SRE role. You won't inherit a mature SRE function; you'll help create it.
You will define the frameworks, automation, and operational standards that ensure our model development infrastructure, distributed systems, and large compute clusters operate predictably, efficiently, and at scale. This role sits at the intersection of AI research, large-scale cloud infrastructure, and production operations. Your work will directly enable faster model training, reliable experimentation, and scalable AI deployment by ensuring our cloud infrastructure is resilient and performant.

Key responsibilities

Reliability & Platform Ownership
Own the reliability, availability, and performance of the Model Dev Platform and GPU Compute environments.
Define and operationalise SLOs, SLIs, and error budgets across platform services.
Improve capacity planning, scaling strategies, and resource efficiency across large GPU-backed clusters.
Partner with ML, platform, and software teams to establish clear production readiness standards.

Incident Response & On-Call
Participate in a 24/7 on-call rotation as first-line response for cloud and cluster-related incidents.
Lead incident triage, escalation, communications, and root cause analysis.
Translate post-incident learning into durable architectural or automation improvements.
Continuously reduce alert noise and recurring operational burden.

Observability & Operational Excellence
Design and operate monitoring, logging, tracing, and alerting systems that enable rapid detection and recovery.
Build dashboards that reflect real user-centric platform health (not just infrastructure metrics).
Improve deployment safety through better change management, validation, and rollback mechanisms.

Automation & Tooling
Build automation for cluster operations, training workflows, remediation, and scaling tasks.
Implement self-healing patterns and resilient recovery workflows.
Harden CI/CD and release processes to improve deployment safety and velocity.
Support infrastructure as code and policy driven guardrails to ensure secure, reliable cloud environments. About you In order to set you up for success as a Cloud Site Reliability Engineer at Wayve, we're looking for the following skills and experience. Essential skills Proven experience in an SRE, Production Engineer, or Cloud Reliability role supporting large-scale cloud systems. Strong Kubernetes experience, including operating production clusters. Hands on experience running production workloads in AWS, GCP, or Azure. Experience operating complex distributed systems in production, ideally including compute-heavy or high-performance workloads. Experience working with large compute clusters; exposure to AI/ML training or inference workloads strongly preferred. Strong Linux fundamentals and proficiency in at least one scripting or systems language (e.g., Python, Go, C++) with a bias toward automation. Deep troubleshooting skills across networking, storage, distributed systems, and performance at scale. Experience designing and operating observability stacks (e.g., Datadog, Prometheus, Grafana, OpenTelemetry). Clear communication skills, including leading incidents, writing post mortems, and influencing teams to prioritise reliability improvements. Desirable skills Experience operating GPU backed environments or large scale ML infrastructure. Experience running model training or inference pipelines in production (MLOps). Familiarity with infrastructure as code (e.g., Terraform) and secure cloud production environments. Experience defining and running SLOs/SLIs and building reliability programs across multiple teams. Experience as an early or founding SRE hire establishing processes from scratch. Interest in helping shape and grow a Cloud SRE function, with potential to take on leadership responsibilities over time. This is a full time role based in our office in London (2 days a week in the office). 
At Wayve we want the best of all worlds so we operate a hybrid working policy that combines time together in our offices and workshops to fuel innovation, culture, relationships and learning, and time spent working from home. Wayve is committed to creating an inclusive interview experience. If you require any accommodations or adjustments to participate fully in our interview process, please let us know.
24/06/2026
Full time
Engineer the future of global finance. At Citi, our Tech team doesn't just support finance; we are helping to redefine it. Every day, $5 trillion crosses through our network. We do business in 180+ countries, operating at a scale few can match. From deploying advanced AI to helping shape global markets, we build systems that matter. Join a team where your work helps influence economies, your ideas can drive innovation and outcomes, and your growth is backed by mentorship, continuous learning and flexibility, with potential hybrid work opportunities. Help solve real-world challenges that touch millions and get the opportunity to build the future of finance with Citi Tech.

Key Responsibilities
Hands-On Operational Leadership: Directly manage, mentor, and develop a technical support team while actively engaging in day-to-day operational tasks, incident response, and problem resolution for the Instant Payments application.
Direct Operational Management: Take ownership of the operational stability and performance of the Instant Payments application across diverse cloud environments (Citi's Enterprise Cloud and Public Cloud), including active monitoring and system checks.
Technical Implementation & Optimization: Lead the implementation, configuration, and continuous optimization of observability (monitoring, logging, and tracing tools), resiliency (designing and implementing auto-healing and retry mechanisms), and recoverability (executing disaster recovery strategies) solutions for the cloud-native Instant Payments application. This includes writing and maintaining scripts for these functions.
Service Level Execution & Improvement: Contribute to improving service levels by implementing operational efficiencies, performing incident and problem management, and enhancing knowledge-sharing practices for the Instant Payments application.
Application Onboarding & Technical Guidance: Actively participate in defining and implementing application onboarding guidelines and standards, and provide direct technical guidance to development teams on stability and supportability improvements for the Instant Payments application.
Incident & Problem Resolution: Lead and execute troubleshooting efforts for complex technical issues, perform in-depth root cause analysis, and implement permanent fixes for the Instant Payments application.
Cost Efficiency & Automation: Identify and implement opportunities for cost reduction and operational efficiencies through proactive analysis, performance tuning, and the development of automation scripts and tools. Ensure adherence to support process and tool standards.
Technical Communication: Effectively communicate technical details, application status, operational risks, and support initiatives to product teams, development teams, and relevant stakeholders.
Risk & Compliance: Directly ensure that operational risk is managed effectively and that compliance with applicable policies, rules, and regulations is maintained for the Instant Payments application support function.

Qualifications
Progressive, hands-on experience in application support, Site Reliability Engineering (SRE), or technical operations for mission-critical, high-volume financial applications.
Direct experience with cloud-native architectures, including configuration and management of microservices, containers (e.g., Kubernetes), and serverless technologies.
Practical experience with major public cloud platforms (e.g., AWS, Azure, GCP) and enterprise private cloud environments.
A track record of implementing and operating comprehensive observability stacks (e.g., Prometheus, Grafana, ELK stack, Jaeger, distributed tracing).
Understanding and application of resiliency engineering principles (e.g., circuit breakers, bulkheads, retry mechanisms) and robust disaster recovery strategies.
A strong technical background in instant payments or real-time financial transaction processing systems is highly desirable.
Expertise in automation, scripting (e.g., Python, Go, Shell), and infrastructure as code principles (e.g., Terraform, CloudFormation).
Excellent communication, interpersonal, and team leadership skills, with the ability to manage and motivate a technical team while remaining deeply technical.
Proven ability to troubleshoot and resolve complex technical issues independently, prioritize effectively, and make sound decisions under pressure.

Education
A Bachelor's/University degree in Computer Science, Engineering, or a related technical field is required.
Relevant certifications (e.g., Public Cloud Certified Solutions Architect, Certified Kubernetes Administrator) are preferred.

What we'll provide you
27 days annual leave (plus bank holidays)
Discretionary annual performance-related bonus
Private Medical Care & Life Insurance
Employee Assistance Program
Pension Plan
Paid Parental Leave
Special discounts for employees, family, and friends
Access to an array of learning and development resources

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.
23/06/2026
Full time
Location: Birmingham, UK (commutable distance required)

About FNZ and the team
The Release Operations Engineer is part of the Figaro Release Operations team within Operational Support (CEO function), reporting through to the Head of Support Services and the Release Operations Team Leader. The team is responsible for deploying software and delivering accompanying documentation to FNZ Figaro clients, acting as a quality gatekeeper for release packages, and supporting internal code bases and environments used by developers and testers.

Role overview
You will play a key role in the deployment and delivery of software releases across FNZ platforms for internal and external FNZ Figaro clients, delivering to agreed timelines. Working closely with Delivery, Product and Support teams, you will execute release activities and continuously improve release processes. You may also support configuration and maintenance of the underlying infrastructure Figaro uses to enable successful software deployment, ensuring documentation is provided and processes are followed accurately. This is a varied, technical role requiring strong problem-solving capability and the ability to communicate clearly with technical and non-technical stakeholders.

Key responsibilities

Release engineering & deployments
Deploy FNZ Figaro software to internal and external environments using the appropriate technology for the change.
Support and coordinate release and deployment activities across hybrid environments, including Google Cloud Platform (GCP), Microsoft Azure, AWS, and on-prem IBM platforms.
Support deployments using automation tools and frameworks.
Carry out agreed application maintenance tasks.
Lead or assist with software and infrastructure changes for internal FNZ Figaro and client environments.

Change management, governance & readiness
Produce and maintain release documentation, including release notes, implementation plans, rollback plans, runbooks, and release communications.
Contribute to Release Boards, ensuring releases follow the FNZ Change Management lifecycle and that Release Operations activities are completed.
Coordinate technical and operational readiness activities (pre-checks, deployment windows, cutover plans, post-release validation).

Continuous improvement / automation / DevOps ways of working
Participate in the analysis, design, testing and implementation of process improvements and automation to improve the efficiency and reliability of change delivery across FNZ Figaro teams.
Participate in projects supporting delivery of change from initial design through to production implementation.

Support, stakeholder & team contribution
Provide technical support to FNZ Figaro staff and clients, including working on client site when required.
Independently investigate and diagnose support issues, proposing solutions and providing advice to resolve incidents.
Provide mentoring, guidance and technical support to junior team members and wider FNZ Figaro teams as required.
Manage work commitments with minimal referral, providing regular progress updates to the Release Operations Team Leader.
Provide voluntary out-of-hours deployment cover, including weekends and public holidays.
Work collaboratively with Release Management, Managed Service Operations, Infrastructure, Solutions Development, Product Owners and other FNZ departments.

Required experience and skills

Essential experience
Experience in a Release Management / Release Engineering role.
Experience in a client-facing support role.
Experience documenting processes, procedures and policies for internal use.
Experience working in regulated environments (financial services preferred).

Essential behaviours
Highly self-motivated and delivery-focused; confident taking initiative and working independently.
Highly logical, with proven problem-solving capability.
Strong organisation, administration and time management.
Clear written and verbal communicator; effective with internal and external stakeholders.

Technical skills
Familiarity with JIRA and Confluence.
Intermediate SQL skills.
Familiarity with IBM i (iSeries/AS400) platforms.
Knowledge of operating systems: IBM, Windows, Linux.
Cloud knowledge: AWS and GCP (deployment coordination across Azure is also part of the environment).

Desirable
Experience with Terraform, Jenkins, Infrastructure as Code, SDLC, CI/CD, automation, and DevOps methodologies.
ITIL certification.

Applications will close May 13th. Early application is encouraged.

About FNZ
FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We know the foundation to do that already exists in the wealth management industry, but complexity holds firms back. We created wealth's growth platform to help. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations, all within a regulated financial institution. We partner with the world's leading financial institutions, with over US$2.4 trillion in assets on platform (AoP). Together with our clients, we empower nearly 30 million people across all wealth segments to invest in their future.
23/06/2026
Full time
Senior Site Reliability Engineer - iManage SRE is part of a global organization that leverages the latest technology to communicate with our colleagues across the globe. We organize ourselves into distributed teams - SRE teams are anchored to iManage offices across the globe. Tuesdays and Thursdays are dedicated to in office collaboration, rapid innovation, and developing a sense of belonging at iManage. Mondays and Fridays are reserved for focus time to get things done. Have the best of both work styles in a workplace that is intentional about belonging, collaboration, and accomplishment. Being a Senior Site Reliability Engineer at iManage means You are an engineer, a builder, and a systems thinker. You'll create middleware and platform guardrails that empower developers to innovate quickly and reliably. You combine deep technical judgment with empathy to eliminate customer pain, especially when working with enthusiastic teams stewarding the world's most privileged data. You uplift those around you, act as a subject matter expert, mentor others, and drive change. You chase contributing factors over root causes, value code over documentation, and documentation over process. You'll engage in and often lead architectural discussions, reduce toil, and deliver scalable, resilient platforms that support our customers and organization. As a Senior SRE, you'll help scale our cloud platform, collaborate across teams to promote standardization and resiliency, and participate in on call rotations. You'll become a key voice in observability, change management, and service scalability, providing guidance during complex technical decisions and high impact events. iManage is experiencing explosive growth in its flagship cloud product. We're seeking senior software and systems engineers specializing in reliability and platform services to join our transformative cloud journey. 
This requires rethinking technical decisions with a beginner's mindset and a focus on resilience and sustainability. If you write code, think in systems, embrace complexity and automation, and are passionate about service resilience and scalability - we want to talk to you. sRE Responsibilities Eliminate TOIL through automation and software development. Partner cross functionally with application teams and internal stakeholders. Create a modern, cloud native platform that is resilient, cost effective, and secure by default. Scale cloud infrastructure to support our Kubernetes based ecosystem. Maintain the freshness and utility of platform services. Improve the security posture of our products. Design automation, orchestration, observability, and disaster readiness into our products. Participate in production support and on call rotations, providing senior level guidance during critical events. Lead incident management and post incident retrospectives, coaching teams in these practices. Qualifications Experience writing design documents, postmortems, and refactoring application code. Built automation to reduce operational burden or developed internal SaaS tools. Ability to advocate for SRE principles (e.g., SLOs vs SLAs) and introduce them effectively. Experience in public cloud or hosted datacenter environments (Azure and AKS preferred). A passion for collaborative teamwork and influencing reliability best practices across teams. Bonus Points Hands on experience with Linux server stacks (Ubuntu/Debian preferred). Knowledge of cloud provisioning platforms (Terraform preferred). Exposure to configuration management tools (Chef preferred). Experience with containerization/clustering technologies (Docker preferred). Familiarity with observability and alerting tools (Prometheus/Grafana or ELK/EFK). Practical experience with CI/CD pipelines and rollout strategies. A bachelor's degree (or equivalent experience) in Computer Engineering or related field. 
Proficiency in one or more programming languages (e.g., Java, Python, Golang). Familiarity with scripting languages (e.g., PowerShell, Bash, Python, Ruby). Benefits: Creating an inclusive environment where you're encouraged to help shape the culture. Market-leading salary determined through a fair and consistent process, equitable for all employees. Annual performance-based bonus. Enhanced parental leave (20 weeks for primary and 10 weeks for secondary caregivers at 100% pay). Matching pension contribution (up to 6%). Private medical insurance and cash plan. Group life cover, income protection, and critical illness protection. Flexible time off policy: 25 days of annual leave with additional flexibility. Wellness days each year to prioritize mental health and well-being. Access to RethinkCare, a global behavioral health platform. We welcome those who come with a growth mindset and a hunger for learning; if you are excited about this role but your past experience doesn't align perfectly with every qualification, we encourage you to apply anyway. iManage is committed to providing an excellent candidate experience and will never ask you to engage in recruitment activity via text and exclusively communicate from emails using domain. If you have any concerns or questions about communications you have received, please send them to so our team members can review. iManage provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
23/06/2026
Full time
Manager, Forward Deployed Engineer, TC, FS. Location: London. Other locations: Primary Location Only. Salary: Competitive. Date: 9 Apr 2026. Job description. Requisition ID: Location: UK (London CP / Manchester / Birmingham / Edinburgh / Belfast) - hybrid working with client-site travel as required. Contract: Permanent, full-time. The opportunity: Organisations are moving rapidly from AI experimentation to operational adoption. However, many struggle to translate ideas into secure, scalable and reliable production solutions. What you'll do: Client-facing engineering & delivery: Lead technical delivery for AI solution areas, guiding teams in translating client needs into scalable engineering approaches. Engage with business and technology stakeholders to shape technical direction, communicate trade-offs and ensure alignment on solution outcomes. Support delivery teams in navigating complex client environments while ensuring engineering quality and reliability. Solution design & implementation: Architect AI-enabled services such as agents, RAG pipelines and supporting platform components. Ensure solutions are designed with reliability, observability and operational readiness in mind. Guide teams in implementing responsible AI controls, evaluation approaches and engineering best practices. Product mindset & continuous improvement: Mentor engineers and support the development of strong engineering practices across squads. Lead technical reviews and help establish reusable patterns, accelerators and reference architectures. Contribute to internal knowledge sharing and external thought leadership around applied AI engineering. What we're looking for: Essential skills & experience: Software & systems engineering: Python/TypeScript, distributed systems, API/microservice design, testing/CI/CD. Applied AI/ML: building and operating ML/DL in production; expertise in NLP/CV/transformers and classical ML. 
LLM/RAG engineering: embeddings, vector stores (FAISS/Milvus/Pinecone), retrieval strategies, grounding and hallucination mitigation. LLMOps: prompt pipelines, automated evaluation, telemetry/drift monitoring, model versioning and release management. Cloud architecture: Azure (preferred) and/or AWS/GCP; Kubernetes/Docker; serverless; IAM and network security. Data engineering: Spark/Databricks, ETL/ELT; collaboration with platform/data teams to deliver cloud-native data + AI architectures. Enterprise integration: legacy/LoB systems; design for reliability/observability (SLIs/SLOs) and operational readiness with runbooks/SRE practices. Product leadership: discovery facilitation, PRDs, acceptance criteria, prioritisation (RICE/MoSCoW), value/adoption metrics. Responsible AI & compliance: privacy by design, auditability and UK regulatory awareness (FCA, PRA, GDPR). Consulting capabilities: stakeholder management, client-ready communication, time/budget/risk management and team leadership. Nice to have: Big data/graph stacks (e.g., Hadoop, Cassandra, Neo4j) and streaming (Event Hub/Kafka). Azure/AWS Solutions Architect experience; optional governance/model risk/responsible AI credentials. Technical Certifications (preferred): Microsoft Azure AI Engineer Associate (AI-102) or Azure Data Scientist Associate. AWS Machine Learning Specialty or Google Professional ML Engineer. Databricks (Data Engineer/ML Engineer) and Kubernetes (CKA/CKAD). Azure/AWS Solutions Architect; optional model risk/responsible AI governance credentials. How you work: You are hands-on with engineering while setting the technical direction for delivery teams. You help teams navigate technical trade-offs and ensure solutions meet enterprise standards for reliability and security. You care about quality, operational readiness and long-term maintainability of systems delivered to clients. 
What we offer: High-impact work with leading organisations across sectors, within a collaborative, engineering-led AI capability. You will benefit from: Continuous development through the FDE Academy, strengthening the architecture and engineering leadership capabilities required to build AI systems at scale. Opportunities to participate in hackathons, engineering showcases and innovation challenges. Learning and certification support across cloud, AI and engineering platforms. Competitive compensation and benefits. Flexible hybrid working arrangements depending on client needs. Travel & Working Model: Hybrid working and periodic travel to client sites across the UK (and occasionally internationally), discussed based on projects and location. Inclusion and accessibility: EY is committed to building an inclusive culture where everyone can thrive. If you require adjustments or support during the recruitment process, we encourage you to let us know.
22/06/2026
Full time
Senior Site Reliability Engineer (Private Cloud). Locations: Leeds, Manchester. Time type: Full time. Time left to apply: End Date: July 3, 2026. Job requisition ID: 157451. End Date: Thursday 02 July 2026. Salary Range: £72,702 - £80,780. We support flexible working. Flexible Working Options: Hybrid Working, Job Share. Job Description: JOB TITLE: Senior Site Reliability Engineer (Private Cloud). SALARY: £72,702 - £80,780. LOCATION(S): HOURS: Full-time - 35 hours per week. WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at our Bristol office sites. Colleagues with disabilities can be supported with workplace adjustments, including hybrid working expectations, in line with our Flexibility Works policy. About this opportunity: Our Private Cloud SRE (Site Reliability Engineering) team is looking for a passionate and experienced engineer to help run and evolve one of the Group's most critical platforms. As a Private Cloud SRE, you'll be a key contributor to the stability, performance, and scalability of services that support the Bank's digital transformation and long-term technology vision. You'll work hands-on with container platforms, VMware infrastructure, and observability tooling to ensure our services are resilient and efficient. You'll lead and participate in post-mortems, drive automation, and continuously improve the platform through engineering-led solutions. This role also involves working in Agile environments, collaborating across multiple teams and disciplines to deliver high-quality outcomes at pace. What you'll be doing: Support and enhance a wide range of platform technologies, including VMware infrastructure, container platforms and orchestration (e.g., Kubernetes, OpenShift), databases, and applications. 
Use Infrastructure as Code to manage environments and support CI/CD pipelines. Improve observability using tools such as Dynatrace, ensuring proactive monitoring and alerting. Lead and contribute to post-mortems to identify and implement long-term fixes. Troubleshoot complex issues across the platform stack, including infrastructure, networking, storage, databases, and applications. Work in Agile teams, collaborating with engineers, architects, and product owners across the organisation. Identify and implement automation opportunities to reduce manual effort and improve operational efficiency. Why join us? We're on an exciting journey to transform our Group and the way we're shaping finance for good. We're focusing on the future, investing in our technologies, workplaces, and colleagues to make our Group a great place for everyone. Including you. What we're looking for: At least 5 years' experience of DevOps principles, including Infrastructure as Code and CI/CD. 5+ years of experience with container platforms and orchestration (e.g., Docker, Kubernetes, OpenShift). Hands-on experience with VMware technologies in a production environment. Familiarity with observability platforms, such as Dynatrace. Proven ability to solve problems across a broad range of platform technologies. Experience with either Linux or Windows operating systems. An attitude focused on continuous improvement and reducing manual steps through automation. And any experience of these would be great: Experience with automation tools and APIs for infrastructure management. Exposure to configuration management tools (e.g., Ansible, Puppet). Leadership or mentoring experience in technical teams. Certifications in VMware or any major cloud provider (e.g., Azure, GCP, AWS). Background in system administration or software engineering with a strong interest in learning cloud-native practices. We know that great talent comes from many backgrounds. 
Whilst this job advert may reference specific years of experience, we recognise that skills are developed in many ways, so if you have relevant, transferable experience, we encourage you to apply. This is a place for you. Our ambition is to be the leading UK business for diversity, equity and inclusion, supporting our customers, colleagues and communities, and we're committed to creating an environment in which everyone can thrive, learn and develop. We also offer a wide-ranging benefits package, which includes: a generous pension contribution of up to 15%; an annual performance-related bonus; share schemes including free shares; benefits you can adapt to your lifestyle, such as discounted shopping; 30 days' holiday, with bank holidays on top; and a range of wellbeing initiatives and generous parental leave policies. Ready for a career where you'll learn and thrive? Apply today and find out more. At Lloyds Banking Group, we're driven by a clear purpose: to help Britain prosper. Across the Group, our colleagues are focused on making a difference to customers, businesses and communities. With us you'll have a key role to play in shaping the financial services of the future, whilst the scale and reach of our Group means you'll have many opportunities to learn, grow and develop. We keep your data safe, so we'll only ever ask you to provide confidential or sensitive information once you have formally been invited along to an interview or accepted a verbal offer to join us, which is when we run our background checks. We'll always explain what we need and why, with any request coming from a trusted Lloyds Banking Group person. We're focused on creating a values-led culture and are committed to building a workforce which reflects the diversity of the customers and communities we serve. Together we're building a truly inclusive workplace where all of our colleagues have the opportunity to make a real difference.
21/06/2026
Full time
Salary: £80,000 per year. Requirements: Strong experience in SRE, DevOps, or infrastructure engineering. Strong programming or scripting skills in at least one language such as Go, Python, or similar. In-depth experience with cloud platforms AWS and/or Azure. Experience with observability tools such as Prometheus, Grafana, or Datadog. Experience leading incident response and driving reliability improvements. Proficiency with container orchestration such as Kubernetes and Infrastructure-as-Code such as Terraform, Pulumi, or similar. Good understanding of networking, Linux OS, and distributed systems. Collaborative mindset with strong communication skills. Responsibilities: Build and operate highly available, scalable, and resilient platforms. Work closely with Platform Engineering and DevSecOps to drive reliability across the technology stack. Improve observability and automate operational processes. Help ensure systems remain secure, performant, and easy to operate. Lead incident response activities. Champion a culture of continuous improvement. Collaborate with engineering teams to embed reliability into service design. Define and evolve reliability standards. Contribute to capacity planning and performance optimisation. Mentor fellow engineers. Help shape the tools, platforms, and practices that support reliable service delivery at scale. Technologies: AI, AWS, Azure, Cloud, Datadog, DevSecOps, DevOps, Grafana, Support, Kubernetes, Linux, Prometheus, Python, REST, Terraform. We are a world-leading cybersecurity technology business using AI to protect clients across the globe from advanced cyber threats. You will join a highly talented, diverse team in our Cambridge office twice a week, with the flexibility of working from home the rest of the time. 
We offer a great team atmosphere, free lunches, problem-solving sessions, and a competitive package including bonus, pension, private medical insurance, life assurance, enhanced parental leave, employee assistance, 23 days' holiday plus your birthday off, charity giving schemes, and personal training and development budgets. Last updated: week 25 of 2026.
21/06/2026
Full time
Salary: £80,000 - 80,000 per year Requirements Strong experience in SRE, DevOps, or infrastructure engineering Strong programming or scripting skills in at least one language such as Go, Python, or similar In-depth experience with cloud platforms AWS and/or Azure Experience with observability tools such as Prometheus, Grafana, or Datadog Experience leading incident response and driving reliability improvements Proficiency with container orchestration such as Kubernetes and Infrastructure-as-Code such as Terraform, Pulumi, or similar Good understanding of networking, Linux OS, and distributed systems Collaborative mindset with strong communication skills Responsibilities Build and operate highly available, scalable, and resilient platforms Work closely with Platform Engineering and DevSecOps to drive reliability across the technology stack Improve observability and automate operational processes Help ensure systems remain secure, performant, and easy to operate Lead incident response activities Champion a culture of continuous improvement Collaborate with engineering teams to embed reliability into service design Define and evolve reliability standards Contribute to capacity planning and performance optimisation Mentor fellow engineers Help shape the tools, platforms, and practices that support reliable service delivery at scale Technologies AI AWS Azure Cloud Datadog DevSecOps DevOps Grafana Support Kubernetes Linux Prometheus Python REST Terraform More We are a world-leading cybersecurity technology business using AI to protect clients across the globe from advanced cyber threats. You will join a highly talented, diverse team in our Cambridge office twice a week, with the flexibility of working from home the rest of the time. 
Music is Universal
It's the passionate and dedicated team at Universal Music who help make us the world's leading music company. From A&R to finance, legal to digital, sales to marketing, Universal Music is the place to grow and develop your career within a truly commercial and innovative business that leads in everything it does.
Everyone is welcome to apply for our roles, and we are determined to ensure that no applicant or employee receives less favourable treatment because of gender, race, disability, sexual orientation, religion, belief, age, marital status, background, pregnancy, or caring responsibilities. We also recognise the importance of diversity of thought within our teams and are fully committed to embracing the talents of people with autism, dyslexia, ADHD, and other forms of neurocognitive variation.
We will always seek to make appropriate adjustments to recruitment, workplaces, and work processes to be fully inclusive of people with different needs and working styles. If you need us to make any reasonable adjustments for you from application onwards, including alternatives to the online form, or if you wish to disclose a neurocognitive condition, please email .
Job Summary
We are UMG, the Universal Music Group, the world's leading music company. In everything we do, we are committed to artistry, innovation, and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute, and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.
As a Senior Observability Engineer, you will be a driving force for technical excellence and strategic vision within our global team. You will be instrumental in architecting, building, and leading our comprehensive observability strategy to ensure the reliability, performance, and scalability of our critical IT systems. This senior role demands a passion for data-driven strategy, a commitment to automation, and the ability to mentor and lead. You will not only solve complex technical challenges but also influence the direction of observability practices across UMG globally, ensuring our technology landscape is as world-class as our music.
Job Functions
+ Architecture & Strategy: Lead the architectural design and strategic roadmap for our observability stack. Drive the vision for world-class monitoring, logging, tracing, and alerting solutions across our hybrid and cloud-native environments.
+ Innovate & Automate: Spearhead the evaluation, selection, and implementation of cutting-edge observability tools and platforms (e.g., Dynatrace, OpenTelemetry, Prometheus, Grafana). Architect and build robust, automated observability pipelines. Take an active part in documenting and defining processes and best practices.
+ Optimize & Analyze: Conduct deep-dive analysis of telemetry data to proactively identify performance bottlenecks, optimize resource utilization, and guide capacity planning.
+ Lead & Mentor: Act as a technical leader and mentor for the observability team and wider engineering groups. Champion and enforce best practices, fostering a culture of proactive, data-informed decision-making.
+ Drive Incident & Problem Management: Work with Operations teams on high-priority incident resolution, using deep analysis of telemetry data for swift root-cause identification. Drive post-incident reviews and implement long-term solutions to enhance system resilience.
+ Collaborate & Influence: Partner with Development, SRE, and Infrastructure leaders to embed observability into the entire technology lifecycle. Influence and drive the adoption of observability best practices across the global organization, and champion the use of observability throughout the UMG environment.
+ Make UMG the place to be: Mentor, manage, and genuinely lead the Observability team in a way that attracts and retains the best talent. UMG is a place where everyone can bring themselves fully to work and thrive; as a leader, you are a key part of this.
Job Requirements
Essential Qualifications
+ Experience: 5-7+ years of hands-on experience in an Observability, Site Reliability Engineering (SRE), or DevOps role, with a proven track record of leading complex projects.
+ Technical Leadership: Demonstrated experience in architecting and designing large-scale monitoring and observability solutions.
+ Expert-Level Tooling: Deep expertise with modern observability platforms (e.g., Dynatrace, AWS CloudWatch, Prometheus, Grafana, ELK Stack, Splunk, OpenTelemetry).
+ Cloud & Infrastructure: Advanced knowledge of major cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible).
+ Programming & Automation: Strong programming and scripting skills (e.g., Python, Go, Shell), with a focus on creating scalable automation and custom tooling.
+ Problem-Solving: Exceptional analytical and strategic problem-solving skills, with the ability to lead through complex technical challenges.
+ Data Analysis: Expertise in analysing telemetry data and visualising it as meaningful information that drives action.
+ Hands-on: Demonstrable hands-on engineering and coding experience, and the ability to deep-dive into existing and emerging technologies to identify opportunities and solutions.
+ Containerization and Orchestration: Understanding of container technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes) to monitor and manage containerized applications.
+ Networking Knowledge: Understanding of networking principles and protocols to effectively monitor and troubleshoot network-related issues.
+ Security Awareness: Awareness of security best practices and the ability to integrate security monitoring into observability processes.
+ Communication & Influence: Excellent communication and interpersonal skills, capable of articulating a technical vision to diverse audiences and influencing senior stakeholders. Ability to collaborate with cross-functional teams, convey findings, and discuss improvements with developers and operations teams.
+ Continuous Learning: Given the dynamic nature of technology, a commitment to continuous learning and staying up to date with the latest trends in observability and monitoring.
+ Self-motivated, with a high degree of initiative and excellent follow-up skills, along with strong analytical and problem-solving skills.
+ Travel may be required but is not part of the regular work schedule.
+ A bachelor's degree in a technology-related field, as well as 5+ years of relevant experience within the observability field.
Desired Qualifications
+ Advanced Concepts: Proven experience with Chaos Engineering, AI-driven analytics, defining SLOs/SLIs, and advanced deployment strategies (Canary/Blue-Green).
+ Software Engineering Foundation: Strong background in software engineering principles, database administration, and distributed systems architecture.
+ Certifications: Relevant senior-level industry certifications (e.g., AWS Certified DevOps Engineer - Professional, Certified Kubernetes Administrator).
Just So You Know
The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change, and the jobholder's specific responsibilities and activities will vary and develop. Therefore, the job description should be seen as indicative and not as a permanent, definitive, and exhaustive statement.
Job Category: Universal Music Group
19/06/2026
Full time
Site Reliability Engineer
Location: Cambridge Office, United Kingdom
Time type: Full time
Posted: Today
Job requisition ID: JR101920
Darktrace is a global leader in AI for cybersecurity that keeps organizations ahead of the changing threat landscape every day. Founded in 2013, Darktrace provides the essential cybersecurity platform protecting nearly 10,000 organizations from unknown threats using its proprietary AI. The Darktrace Active AI Security Platform™ delivers a proactive approach to cyber resilience, securing the business across the entire digital estate, from network to cloud to email. Breakthrough innovations from our R&D teams have resulted in over 200 patent applications filed. Darktrace's platform and services are supported by over 2,400 employees around the world. To learn more, visit .
Job Description
About the Role
We're looking for a Site Reliability Engineer (SRE) to bring deep expertise in a key reliability domain and help shape the future of our platform reliability strategy.
SRE sits at the heart of our operational trifecta alongside Platform Engineering and DevSecOps. In this role, you'll act as the go-to authority in your area of specialism, working across teams to embed best practices, solve complex reliability challenges, and improve system resilience at scale.
Unlike a generalist SRE, this role focuses on a core domain of expertise (such as observability, performance engineering, data infrastructure reliability, security-focused SRE, or network reliability) while influencing reliability standards across the wider engineering organisation.
Key Responsibilities
Domain Expertise & Strategy
+ Act as the subject matter expert in your chosen reliability domain
+ Define and implement standards, frameworks, and best practices across SRE, Platform Engineering, and DevSecOps
+ Stay current with industry trends and bring innovative ideas into the organisation
Engineering & Delivery
+ Design and implement solutions to complex, cross-cutting reliability challenges
+ Build tooling, automation, and frameworks to improve system resilience and scalability
+ Lead deep-dive investigations into systemic issues and drive long-term fixes
Collaboration & Platform Integration
+ Partner with Platform Engineering to ensure your domain is embedded within the internal developer platform
+ Collaborate with DevSecOps to integrate security, compliance, and resilience practices
+ Contribute to cross-team initiatives that improve reliability across the stack
Incident & Operational Excellence
+ Play a key role in incident response, particularly within your specialism
+ Contribute to on-call rotations and the continuous improvement of operational processes
+ Develop runbooks, documentation, and training materials to support teams
What You'll Bring
Essential
+ Proven experience in Site Reliability Engineering, DevOps, or infrastructure engineering
+ Deep expertise in at least one of the following areas:
  + Observability & monitoring (metrics, logging, distributed tracing)
  + Performance engineering & capacity planning
  + Data infrastructure reliability (databases, streaming, pipelines)
  + Security-focused SRE (hardening, compliance automation, secrets management)
  + Network reliability & traffic management
+ Strong programming skills (e.g. Go, Python, or similar)
+ Experience with cloud platforms (AWS, GCP, Azure) and Kubernetes
+ Strong communication skills, with the ability to explain complex technical concepts clearly
+ Self-driven, with the ability to identify and prioritise high-impact work independently
Desirable
+ Experience building internal developer platforms or tooling
+ Contributions to open source, technical blogs, or public speaking
+ Experience working in regulated environments
+ Familiarity with SLO frameworks and error budget management
+ Relevant certifications in your specialist domain
Success Measures
+ Improved reliability and performance within your domain of specialism
+ Adoption of best practices across SRE, Platform Engineering, and DevSecOps
+ Fewer incidents and faster resolution times
+ Scalable, well-integrated solutions within the internal platform
+ Strong collaboration across teams and measurable improvements in operational maturity
Why Join Us?
+ Shape reliability strategy in a modern, cloud-native engineering environment
+ Work on complex, high-impact systems at scale
+ Collaborate with expert teams across Platform Engineering and DevSecOps
+ Take ownership of a domain and drive meaningful, organisation-wide impact
Benefits
+ 23 days' holiday plus all public holidays, rising to 25 days after 2 years of service
+ An additional day off for your birthday
+ Private medical insurance that covers you, your cohabiting partner, and your children
+ Life insurance of 4 times your base salary
+ Salary sacrifice pension scheme
+ Enhanced family leave
+ Confidential Employee Assistance Program
+ Cycle to work scheme
Darktrace is an Equal Opportunity Employer. We consider all qualified applicants for employment without regard to race, color, religion, sex (including pregnancy, childbirth, and related medical conditions), sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, veteran or military status, or any other characteristic protected by applicable federal, state, or local law. Darktrace is committed to providing reasonable accommodations to qualified individuals with disabilities in accordance with applicable laws. If you require a reasonable accommodation to participate in the application or interview process, please contact your Talent Partner.
19/06/2026
Full time
Background At Motion Applied - Connected Intelligence (CI), we create cutting edge wireless connectivity solutions that are transforming customer experience across the transport industry. We create solutions that drive efficiency and cost-effectiveness for customers, delivering unrivalled internet connectivity services. Purpose of the Role This is an opportunity to join our Software Engineering community as a Site Reliability Engineer (SRE) leading initiatives that create and improve the CI Platform as a Service (Paas) offering to the product delivery teams. Enabling product development teams to deliver software products on immutable infrastructure. Developing and facilitating production and development infrastructure and associated tooling. Integrating third party managed services used for delivery and development lifecycle. Lead on implementation of the CI cloud governance policy and be a key participant in maintaining and updating the policy in accordance with technology shifts, customer feedback and product development needs. Be an open minded technologist who values a collaborative work environment and is willing to learn and explore as the fast paced industry evolves and changes. Key Responsibilities Key contributor to the Roadmap for the CI Platform. Development and maintenance of the CI Platform. Collaborate and consult with software engineers and data scientists to help design and implement robust and scalable software products. Knowledge sharing and education of team members to enable our DevOps culture. Proactively monitor costs and security posture of the CI Platform and products running on it. Define and implement tooling to continually improve our software development, release and maintenance processes. Develop product features with product delivery teams, building upon the CI Platform offering. 
Supporting our live systems, including identifying and implementing improvements to products, tools and processes to improve the on going reliability of our solutions. Design and implement infrastructure for multi region and multi tenant products and platform. Design and implement monitoring infrastructure for real time data streams. Design network and access to allow software engineers and data scientists to access services in AWS while keeping the services and data safe and secure. Enable and collaborate with teams to automate the entire delivery of a product. From a single web application to the configuration of a cloud account. Design and implement security and access management so that users and roles have access only to resources they need within the AWS account and attaching IoT devices. Identify root cause of live issues, to help both recover any immediate situation and design/implement improvements for future reliability. Experience We are Looking For Working with delivery teams deploying software on the cloud. Strategic technical leadership, in particular related to Site Reliability Engineering. Evidence of tailored and contextual communication in all directions to realise value as feedback and enquiry. Hands on experience in delivering production quality services. Experience in supporting live production systems. Required We are looking for an applicant who has: a Bachelor's degree in computer science, similar technical field of study, or equivalent practical experience. 5 + years of hands on experience with large cloud providers such as AWS, Azure, GCP, or OCI. Leading technical initiatives including roadmap input, cross team collaboration and mentoring. Proficiency in Infrastructure as Code (IaC) tools such as Terraform. 3 + years of hands on experience with containerization and orchestration tools like Docker and Kubernetes. Experience working in and developing for a Linux environment. 
Excellent programming, debugging, and optimisation skills and at least one strong purpose programming language (Go or Python preferred). Solid understanding of DevOps practices, CI/CD pipelines and version control e.g. Git. Knowledge of observability and monitoring tools e.g. Prometheus, Grafana, ELK etc. Experience in troubleshooting incidents and live environments. Ability to write and speak in English fluently. Desirable & Development Areas Familiarity with AWS Well Architected Framework. Designing, building and maintaining multi tenant and multi region cloud infrastructure. Exposure to configuration management tools like Ansible or SaltStack. Knowledge in cyber security, including but not limited to threat intelligence, IAM, key management systems, data security, application security, applied cryptography, certificate management. Cost optimisation experience at a platform or organisation level. Experience of IoT systems integration, protocols, and services, including gRPC and MQTT. Networking concepts and network performance modelling / optimisation. Experience with Hashicorp tools like Vault or Consul. AWS Certified Solutions Architect - Professional or AWS Certified DevOps Engineer - Professional. Location This role is based at our head office in central Woking. The role is included in our hybrid working policy with the expectation of a minimum of two days a week in the office, which should be co ordinated with other members of the Engineering team. What We Can Offer You Annual leave (25 days + bank holidays, pro rated for part time colleagues). Enhanced Company Maternity, Paternity and Adoption leave and pay. Flexible working policies, including Hybrid Working. Life assurance to the value of 4 times base salary. Opportunity to join the Motion Applied Pension Plan. Company funded individual private healthcare with the opportunity to extend to partner or spouse and/or dependents at a discounted rate. 
Electric car scheme: the opportunity to drive a brand-new car more affordably through a salary sacrifice scheme. Employees are eligible to join the scheme after successfully completing their probationary period. Motion Applied are committed to Diversity, Equality and Inclusion (DEI) and promote DEI in all we do. Motion Applied are also members of the UK Government Disability Confident Scheme.
17/06/2026
Full time
Summary: Yelp's engineering culture is driven by our values. We're a cooperative team that values individual authenticity and encourages creative solutions to problems. All new engineers deploy working code in their first week, and we strive to broaden individual impact with support from managers, mentors, and teams. At the end of the day, we're all about helping our users, growing as engineers, and having fun in a collaborative environment. Do you want to build and manage scalable, self-healing, globally distributed systems? Our Site Reliability Engineers keep Yelp fast, available, and growing, connecting users to great local businesses. No matter how many times we get searched, scraped, scanned, spammed, pinged, paged, or queried, we keep our cool - and keep the site running smoothly. We work in both the dev and systems worlds, implementing key parts of the core architecture and supporting developers as they do the same. We get to tackle interesting challenges that you can only find at a scale that serves over 100 million users per month. You'll work to empower product teams and developers: at Yelp, spinning up infrastructure should always be just a git commit and a code review away, and automation and self-service are at the core of what we do. This opportunity requires you to be located in the United Kingdom. We'd love to have you apply, even if you don't feel you meet every single requirement in this posting. At Yelp, we're looking for great people, not just those who check off all the boxes. What you'll do: Work closely with engineers to support new features and services. Build tools to monitor site stability and performance. Help ensure the reliability and scalability of our infrastructure while maintaining platform SLOs. Troubleshoot site issues using industry-leading tools like Splunk, Prometheus and OpenTelemetry. Automate everything with Python, Puppet, Git, Jenkins, and Terraform, leveraging an extensive suite of AI tooling. 
Develop custom tools when off-the-shelf solutions don't work at our scale, and contribute upstream to open source projects. Participate in light on-call rotations: we have geographically distributed SRE teams for follow-the-sun support, which means nobody needs to be on call 24 hours a day! What it takes to succeed: Familiarity with Linux and an enthusiasm for learning more (we use Ubuntu, but any distro is fine). Command of your favorite modern programming language: Python, Ruby, Go, Java, C++, etc. Hands-on experience with public cloud platforms (we use AWS, but Azure/GCP are fine). Practical background in Infrastructure as Code and Configuration Management (e.g., Terraform, Puppet, Ansible, or similar). Relevant experience with Linux containerization and orchestration (e.g., Docker and Kubernetes). A solid understanding of fundamental technologies such as TCP/IP, HTTP (transport, serving and load balancing) and DNS, as well as containerization and workload orchestration, or datastores and streaming technologies. Exposure to monitoring concepts and best practices around performance, reliability and security. Excellent communication and documentation skills. An eagerness to ask questions, take initiative and learn every day. What you'll get: Full responsibility for projects from day one, a collaborative team, and a dynamic work environment. Competitive salary, a pension scheme, and an optional employee stock purchase plan. 25 days of paid holiday (rising to 29 with service), plus one floating holiday. £150 monthly reimbursement to help cover remote working expenses. £75 caregiver reimbursement to support dependent care for families. Private health insurance, including dental and vision. Flexible working hours and meeting-free Wednesdays. Regular 3-day hackathons, bi-weekly learning groups, and productivity spending to support and encourage your career growth. Opportunities to participate in digital events and conferences. £75 per month to use toward qualifying wellness expenses. 
Quarterly team offsites. Closing: Yelp values diversity. We're proud to be an equal opportunity employer and consider qualified applicants without regard to race, color, religion, sex, national origin, ancestry, age, genetic information, sexual orientation, gender identity, marital or family status, veteran status, medical condition, disability, or any other protected status. Notice to Northern Ireland Applicants: A basic criminal background check via AccessNI is required for employment. Yelp complies with the AccessNI Code of Practice. Having a criminal record will not necessarily prevent a candidate from working with Yelp. Yelp will consider the nature of the position together with the circumstances and background of the candidate's offences or other information contained on a disclosure certificate. AccessNI's Privacy Policy is available online. Yelp's Criminal Background Check Policy is available upon request.
17/06/2026
Full time
Miro is a fast-growing engineering organization building a business-critical collaboration platform used by companies around the world. As our product and infrastructure scale, we are looking for a Senior Network Site Reliability Engineer to help strengthen the reliability, availability, and scalability of our production environment. In this role, you will focus on cloud automation, Infrastructure as Code, network reliability, and operational excellence across our AWS infrastructure. You will help improve how we deploy, operate, observe, and govern cloud infrastructure at scale. With millions of users, a distributed team, and the support of outstanding VCs and advisors, it's an incredible time to join Miro. Bring your skills, experience, and vision, so that together we can transform how the world collaborates. About the Role: The Foundations & Network team is growing to support Miro's cloud infrastructure at scale. The team focuses on building stable, reliable, secure, and highly available solutions in AWS. We own all Infrastructure-as-Code aspects and provide workflows to orchestrate automated solutions. We are also responsible for applying governance policies and administering our infrastructure resources on AWS. What you'll do: Work with our AWS infrastructure to continuously improve how we operate. Build and lead all aspects of our CI/CD pipelines (GitLab/GitHub) and provide automated solutions for IaC deployment. Implement automation and observability across infrastructure resources. Bring your own ideas for improving our infrastructure stack and implement them. Take part in our operational rotation to react to and resolve incidents in production. Maintain and develop the building blocks of our Terraform IaC codebase. Build, maintain and optimize AWS load balancers, ensuring efficient traffic distribution and high availability for applications. 
Configure and maintain Amazon CloudFront distributions to accelerate content delivery and improve the end-user experience. Develop and build governance policies for our cloud infrastructure. Maintain and develop our AWS Cloud WAN infrastructure. What you'll need: 8+ years of professional experience in infrastructure, reliability, networking, or software engineering. 6+ years of experience as an SRE, DevOps Engineer, Network Engineer, Software Engineer, or similar. Hands-on experience with AWS infrastructure, including EC2, VPC, ALB, S3, Route 53, and CloudFront. Confident networking knowledge, including TCP/IP, HTTP/S, DNS, BGP and IPv6, and hands-on experience with AWS Cloud WAN. Strong experience with Infrastructure as Code, especially Terraform. Deep understanding of DevOps practices, CI/CD, automation, and production operations. Strong Linux skills and proficiency in Python, Bash, or another scripting/programming language. Experience driving cross-functional projects from concept to delivery in Agile environments. Strong problem-solving skills, with the ability to troubleshoot complex distributed systems and network issues. Clear communication skills and the ability to collaborate effectively across engineering teams. Nice to have: Experience with containers and Kubernetes. Familiarity with AWS governance and security controls, including SCPs and IAM policies. Experience improving reliability, observability, performance, or incident response processes. Exposure to large-scale CDN, edge, or traffic-routing environments. Experience working in globally distributed infrastructure or platform teams. Basic experience using GCP and Azure. What's in it for you: We want you to feel supported, connected, and ready to grow. Our global benefits package generally includes equity, a wellbeing benefit, a WFH equipment allowance, and an annual Learning & Development stipend. Join a diverse team where you can do your best work. Full benefits may differ by location. 
If you would like to learn more about location-specific benefits, please refer to our Global Miro benefits board. About Miro: Miro is a visual workspace for innovation that enables distributed teams of any size to build the next big thing. The platform's infinite canvas enables teams to lead engaging workshops and meetings, design products, brainstorm ideas, and more. Miro, co-headquartered in San Francisco and Amsterdam, serves more than 100M users and 250,000 companies that collaborate in the Innovation Workspace. Miro was founded in 2011 and currently has more than 1,600 employees in 13 hubs around the world. We are a team of dreamers. We look for individuals who dream big, work hard, and above all stay humble. Collaboration is at the heart of what we do, and through our work together we hope to create a supportive, welcoming, and innovative environment. We strive to play as a team to win the world and create a better version of ourselves every day. If this sounds like something that excites you, we want to hear from you! At Miro, we strive to create and foster an environment of belonging and collaboration across cultural differences. Miro's mission, "Empower teams to create the next big thing", is how we think about our product, people, and culture. We believe that creating big things requires diverse and inclusive teams. Diversity invites all talent with different demographics, identities and styles to step in, and inclusion invites them to step closer together. Every day, we are working to build a more diverse Miro, cultivate a sense of belonging for future and current Mironeers around the world, and foster an environment where everyone can collaborate and embrace differences.
15/06/2026
Full time
Front Office Product Support. Locations: London. Time type: Full time. Posted: Yesterday. Job requisition id: R5215. The TP ICAP Group is a world-leading provider of market infrastructure. Our purpose is to provide clients with access to global financial and commodities markets, improving price discovery, liquidity, and distribution of data, through responsible and innovative solutions. Through our people and technology, we connect clients to superior liquidity and data solutions. The Group is home to a stable of premium brands. Collectively, TP ICAP is the largest interdealer broker in the world by revenue, the number one Energy & Commodities broker in the world, the world's leading provider of OTC data, and an award-winning all-to-all trading platform. The Group operates from more than 60 offices in 27 countries. We are 5,300 people strong. We work as one to achieve our vision of being the world's most trusted, innovative liquidity and data solutions specialist. Role Overview: Liquidnet is looking for an Application Support Engineer to work within the EMEA Front Office Support team. The team requires a motivated self-starter with the technical skills to support a growing number of buy-side members, using their FIX, Linux, Windows Server, DevOps, database and networking skills. This will be a varied role, working with a multitude of cross-platform, market-leading technologies to support the running of our bespoke trading platform. The successful candidate will be responsible for all aspects of support covering both proprietary and third-party applications from the front to the back office, with a particular focus on Transaction & Regulatory Reporting. Liquidnet champions automation, and you will be expected to identify and help streamline manual or repetitive tasks. 
You will have the opportunity to contribute to and run with projects, new feature implementations and client migrations, and to help Liquidnet migrate to cloud-based technologies. Additionally, the role will involve member user administration and support via phone and email, OMS integration support, and handling trade lifecycle issues for both the MTF platform and the trading desk. The successful candidate should possess a positive 'can-do' attitude and an instinctively high standard of customer service. This should be complemented by strong FIX, database (SQL, Sybase or Oracle), Linux, Windows and networking troubleshooting skills, as well as an understanding of cloud-based technologies. Role Responsibilities: Contribute towards a 'follow the sun' support model, working closely with global teams in APAC and the US to ensure pre-market health checks are performed for each region. Perform regional start-of-day health checks to ensure all members are connected to the platform. Using proprietary tools, provide daily application support and troubleshooting for platform members and internal users, escalating to Development teams appropriately. Provide application support focused on back-office flows, particularly Regulatory and Transaction Reporting. Interact daily with all internal stakeholders regarding support issues. Efficiently create and track issues within an incident-management system to help identify trends and patterns. Create and monitor internal reports and usage queries. Assist with product testing and project work. Identify and escalate possible platform improvements. Experience / Competences: Essential: Hands-on support experience within a financial institution (buy-side, sell-side, venue/platform provider). Solid application support experience within a Linux environment. Excellent working knowledge of the FIX protocol. Good understanding of European equity market structure, mechanics and flows. Ability to convey the expected behaviour of industry-standard algorithms (VWAP, TWAP, 
IS, POV, etc.). Automation and scripting experience. Proven experience of MSSQL, Oracle and Sybase database environments, including complex query-writing. Proven experience of supporting Windows Server environments. Experience in troubleshooting network problems, e.g. firewall and routing problems. Motivated self-starter who takes ownership of responsibilities and can work autonomously. Ability to communicate confidently at all stakeholder levels (technical, client, trader, executive team, etc.). Excellent organisational skills. Analytical and disciplined approach to problem-solving. Must be a team player with the ability and interest to participate in new projects and help other departments within the company. Desired: Client/venue technical FIX onboarding exposure. Proven experience in managing cloud-based infrastructure and services, including AWS, Azure, or Google Cloud Platform. Strong understanding of DevOps principles and practices, including CI/CD pipelines, infrastructure as code (IaC), and automated testing. Hands-on experience with containerization technologies like Docker and orchestration platforms like Kubernetes. Exposure to supporting message-based architecture. Working knowledge of at least one buy-side or sell-side Order Management System. Experience with industry-standard monitoring tools (ITRS or similar). Experience with Site Reliability Engineering (SRE) practices, including monitoring, incident response, and post-mortem analysis. Job Band & Level: Professional / 5. Company Statement: We know that the best innovation happens when diverse people with different perspectives and skills work together in an inclusive atmosphere. That's why we're building a culture where everyone plays a part in making people feel welcome, ready and willing to contribute. TP ICAP Accord, our Employee Network, is central to this. 
As well as representing specific groups, TP ICAP Accord helps increase awareness and collaboration, shares best practice, and holds our firm to account for driving continuous cultural improvement.
Location: UK - 135 Bishopsgate - London
Connecting clients, communities and colleagues for sustainable growth
TP ICAP connects people, platforms, ideas and insight across the world's financial, energy and commodities markets. As a global leader in market infrastructure and data-led solutions, we enhance market access, increase efficiencies and unlock possibilities.
Work with us
Joining TP ICAP puts you at the heart of markets that matter. You'll have the freedom to innovate and act on your own initiative. We'll train you and build your abilities in your specialist area so that you can become an expert in your field, all within a connected network that's there to set you up for success.
TP ICAP Group is a collection of premium brands, each with a distinct, client-focused offering. Underpinning and connecting these client-facing brands are the financial security, operational strength and know-how we have as a Group.
Connections are at the heart of what we do. We combine our people's know-how with the latest technology to improve price discovery, trade execution and liquidity flow.
Connections create strength. Through them, we help our clients to manage risk, realise investment strategies and expand the scope for growth.
And connections act as a catalyst, sparking richer solutions for our clients to break new ground, modernising markets for future performance, and creating dynamic careers for our people. Our capacity to connect builds trust, supports communities and gives us the power to anticipate and respond to change, whatever direction the world takes. It's what makes TP ICAP a mainstay in the global markets, now and in the future.
TP ICAP. We connect.
15/06/2026
Full time
Front Office Product Support
Locations: London
Time type: Full time
Posted on: Yesterday
Job requisition id: R5215
The TP ICAP Group is a world-leading provider of market infrastructure.
Our purpose is to provide clients with access to global financial and commodities markets, improving price discovery, liquidity and distribution of data through responsible and innovative solutions.
Through our people and technology, we connect clients to superior liquidity and data solutions.
The Group is home to a stable of premium brands. Collectively, TP ICAP is the largest interdealer broker in the world by revenue, the number one Energy & Commodities broker in the world, the world's leading provider of OTC data, and an award-winning all-to-all trading platform.
The Group operates from more than 60 offices in 27 countries. We are 5,300 people strong. We work as one to achieve our vision of being the world's most trusted, innovative liquidity and data solutions specialist.
Role Overview
Liquidnet is looking for an Application Support Engineer to work within the EMEA Front Office Support team. The team requires a motivated self-starter with the FIX, Linux, Windows Server, DevOps, database and networking skills to support a growing number of buy-side members.
This will be a varied role, working across a multitude of cross-platform, market-leading technologies to support the running of our bespoke trading platform. The successful candidate will be responsible for all aspects of support, covering both proprietary and third-party applications from front to back office, with a particular focus on Transaction & Regulatory Reporting. Liquidnet champions automation, and you will be expected to identify and help streamline manual or repetitive tasks.