A World-Changing Company

Palantir builds the world's leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role

Substrate is the team responsible for Palantir's core production infrastructure - 100s of K8s clusters - from on-prem to the major cloud hyperscalers, whether they are internet-connected or air-gapped, small hardware footprint or large. As a Senior Software Engineer on Substrate, you will design and build Palantir's managed Kubernetes product offerings across all these environments. You and your team will be responsible for bootstrapping and operating the entire fleet of K8s clusters with zero manual steps by building industry-leading tooling and contributing to core CNCF components. You will also be responsible for ensuring scale, stability and security across a matrix of compliance regimes and hosting infrastructure types.

Your team culture emphasizes engineering rigor and operational excellence at scale. This means issues in production should be pre-empted and deeply root-caused, and investments in automation and self-healing systems are key. If you're excited about infrastructure at scale and working with Kubernetes, this is the right role for you.

Core Responsibilities

Deliver a container runtime to challenging new environment types - new clouds, on premise, edge devices
Build automation and establish standards for operating K8s securely at scale with zero manual ops overhead
Drive innovation through adoption of novel K8s features and CNCF tools, making upstream contributions as needed
Design the next generation of Palantir's infrastructure through a deep understanding of internal systems and CNCF standards

What We Value

Systems programming experience with strong proficiency in Go, C/C++ or equivalent
Working knowledge or hands-on experience of infrastructure automation tools such as Terraform, Ansible, Puppet or K8s operators, and competent coding in Go, Java, or equivalent for the purposes of automation or scripting
Deep familiarity with hardware and OS configurations, diagnostic tooling, networking nuts and bolts
Deep familiarity with containers (Docker) and orchestration (Kubernetes) at scale
Experience working with a cloud provider (AWS/Azure/GCE), or sysadmin/SRE experience in data centers
Experience designing, building, and operating high-scale observability or infrastructure systems
Working knowledge of networking fundamentals, experience with CNIs or cloud networking infrastructure preferred

What We Require

4+ years of professional software development experience on core infrastructure with emphasis on operational excellence
2+ years of experience contributing to the system design or architecture (architecture, design patterns, reliability and scaling) of new and existing systems
Bachelor's degree in Computer Science or equivalent

Life at Palantir

We want every Palantirian to achieve their best outcomes. That's why we celebrate individuals' strengths, skills, and interests, from your first interview to your long-term growth, rather than rely on traditional career ladders. Paying attention to the needs of our community enables us to optimize our opportunities to grow and helps ensure many pathways to success at Palantir. Promoting health and well-being across all areas of Palantirians' lives is just one of the ways we're investing in our community. Learn more at Life at Palantir and note that our offerings may vary by region.

In keeping with Palantir's values and culture, we believe employees are "better together" and in-person work affords the opportunity for more creative outcomes. Therefore, we encourage employees to work from our offices to foster connectivity and innovation. Many teams do offer hybrid options (WFH a day or two a week), allowing our employees to strike the right trade-off for their personal productivity. Based on business need, there are a few roles that allow for "Remote" work on an exceptional basis. If you are applying for one of these roles, you must work from the city and/or country in which you are employed. If the posting is specified as Onsite, you are required to work from an office.

If you want to empower the world's most important institutions, you belong here. Palantir values excellence regardless of background. We are committed to making the application and hiring process accessible to everyone and will provide a reasonable accommodation for those living with a disability. If you need an accommodation for the application or hiring process, please reach out and let us know how we can help.
27/06/2026
Full time
Senior Software Engineer / Reliability Engineering - Real-time Data
Location: London
Business Area: Engineering and CTO
Ref #:

Description & Requirements

Our department is responsible for efficiently distributing financial data from its source to interested users all around the world. This includes (for example) stock prices or foreign exchange rates. Data can either be served in response to a request or streamed in real time.

The group owns:
The distribution software and infrastructure
A range of different sources of data
Supporting services to administer and manage the system, including permissioning and metering

The team is also responsible for the Enterprise endpoint ("B-PIPE"), which allows end-users to programmatically consume data via our SDK. Data is also available through the Bloomberg Terminal and Microsoft Excel.

The main challenge faced by the group is one of scale. Data is sourced from more than 370 global exchanges, with a combined volume in excess of 60 billion messages each day. We deliver this data to hundreds of thousands of terminals and thousands of B-PIPEs. Handling this volume requires significant infrastructure: we manage multiple clusters in our main data centres, as well as a network of many thousands of servers around the world.

Group Overview

The RD Reliability Engineering group comprises three sub-teams located in Tokyo, London, and New York, providing follow-the-sun support. Our mission is to ensure systems are reliable, scalable, and observable through software engineering, while continuously improving how systems behave under load and failure conditions. We work in an outcome-driven model, focusing on measurable improvements in availability, latency, capacity, and recovery. Our goal is to ensure systems meet defined service level objectives while minimising manual operational effort through automation and software solutions. The systems we support must behave predictably under extreme load, recover quickly from failures, and continue to evolve without compromising stability - these are the core challenges we solve.

London Team Focus - Availability & Resiliency

The London team plays a key role in ensuring the availability and resiliency of RD infrastructure globally. We focus on:
Detecting and preventing failures across large-scale distributed systems
Ensuring infrastructure demonstrates sufficient capacity and failover capability during site-loss scenarios
Reducing time to detect, diagnose, and recover from incidents
Ensuring systems behave predictably under both normal and adverse conditions

This role provides the opportunity to influence how reliability is engineered across the platform, working closely with teams globally to improve system behaviour and design.

What You'll Do

Build and maintain production-grade software supporting Bloomberg's global distribution infrastructure
Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation
Analyse system behaviour under real-world and failure scenarios to validate that capacity, failover, and recovery meet resilience objectives
Identify bottlenecks, scaling limits, and reliability risks across distributed systems
Improve detection, diagnosis, and prevention of production issues
Build tools and frameworks to increase system visibility and reduce time to detect and resolve incidents
Automate operational workflows to reduce manual effort and improve system reliability
Partner with application and infrastructure teams to improve system design, resilience, and performance
Contribute to design discussions, incident reviews, and reliability improvements across the platform

Systems You'll Work With

Configuration systems serving thousands of servers across the global network
Service discovery and clustering systems for distributed infrastructure
Monitoring and observability frameworks for large-scale server estates
Tooling for diagnosing data quality and distribution issues

Ownership of systems may evolve over time as the team focuses on areas of highest impact.

What Success Looks Like

Systems consistently meet defined reliability, latency, and capacity objectives
Issues are detected and mitigated before significant customer impact
Systems are demonstrably resilient, with proven failover capability and sufficient capacity under failure conditions
Operational processes are automated and scalable
Reliability is achieved through engineering improvements rather than manual intervention

What We're Looking For

We're not a traditional SRE team. We engineer reliability through software, building solutions that automate operations and improve system resilience by design.

Experience with an object-oriented programming language (preferably Python or C++)
Strong focus on building reliable, observable distributed systems
Experience working with SLOs, SLIs, and production reliability metrics
Proven ability to triage and resolve live production problems
A mindset focused on automation and reducing operational toil
A strength in collaborating within an inclusive team environment
The ability to work across departments and build strong relationships with both technical and non-technical partners

Why Join Us

You'll work on systems that sit at the core of Bloomberg's real-time data platform, operating at global scale and under demanding performance and reliability requirements. This is an opportunity to:
Solve complex distributed systems problems with real-world impact
Influence how reliability is engineered across a critical platform
Work with teams across multiple regions and technical domains
Build systems that are resilient by design and operate at massive scale

If indicated, please note that years of experience are a guide; we will consider applications from all candidates who can demonstrate the skills necessary for the role.

Discover what makes Bloomberg unique - watch our for an inside look at our culture, values, and the people behind our success.

Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.

Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email
26/06/2026
Full time
Platform Engineer
Location: Gloucester
Type: Permanent, full time
Work arrangement: Hybrid (min 3 days a week on site)
Security clearance: Must be willing to obtain SC and eDV clearance.

Responsibilities

Deploy applications and software to cloud or on-prem environments for various business areas.
Build and set up development tools and infrastructure.
Understand project stakeholder needs.
Automate and improve development and release processes.
Ensure systems are safe and secure against cyber security threats.
Identify technical problems and develop software updates and solutions.
Collaborate with other engineers to ensure development follows established processes and works as intended.

Required experience

Experience working in an Agile/SCRUM/DevOps delivery model.
Proficiency with cloud technologies (AWS or Azure).
Experience with infrastructure-as-code tools (e.g., Terraform, Puppet, Chef, Ansible).
Experience building and deploying large-scale applications in Continuous Integration/Delivery pipelines.
Experience with container platforms and orchestration systems (ECS, AKS, Kubernetes, Helm, Docker).
Experience with automation and integration tools such as Jenkins, Concourse CI, or cloud equivalents.
Experience with scripting languages and source control.
26/06/2026
Full time
Viridien is an advanced technology, digital and Earth data company that pushes the boundaries of science for a more prosperous and sustainable future. With our ingenuity, drive and deep curiosity we discover new insights, innovations, and solutions that efficiently and responsibly resolve complex natural resource, digital, energy transition and infrastructure challenges.

Job Details

Viridien is seeking a Senior Cloud Engineer to lead the implementation and operation of secure, scalable cloud platforms across both Viridien and client environments.

This role is responsible for translating cloud architectures into production-ready solutions, supporting the deployment of Data Hub's data transformation and analytics platform across multiple cloud environments. You will work closely with software, data, and infrastructure teams to deliver reliable, secure, and repeatable deployments while ensuring operational excellence and adherence to governance requirements.

About The Team

You will join the Data Hub team, a multidisciplinary group of scientists, engineers, and developers focused on solving complex data transformation and analytics challenges across industries including geothermal, environmental, hydrocarbon, and mineral exploration.

The team works in a collaborative environment alongside data engineers, machine learning engineers, and software developers. This role is based in either North Wales or Crawley with a hybrid working arrangement.

Key Responsibilities

Cloud Platform Engineering
Implement cloud architectures within Viridien and client environments.
Build and automate cloud foundations including networking, identity, governance, and security controls.
Deliver and operate Kubernetes-based platforms, primarily using Azure Kubernetes Service (AKS).

Infrastructure Automation & Deployment
Develop and maintain Infrastructure as Code using Terraform, Bicep, or similar tools.
Build and manage CI/CD pipelines, deployment processes, and release controls.
Create reusable cloud modules and deployment patterns to improve consistency and scalability.

Security & Governance
Implement security controls including IAM/RBAC, network segmentation, encryption, secrets management, and policy enforcement.
Ensure cloud platforms meet security, compliance, and governance requirements.
Support vulnerability management and security monitoring practices.

Operations & Reliability
Implement monitoring, alerting, backup, and disaster recovery solutions.
Troubleshoot and resolve complex issues across cloud, Kubernetes, and database platforms.
Develop operational documentation, runbooks, and knowledge transfer materials.

Stakeholder Collaboration
Work closely with software teams, client stakeholders, and operations teams to deliver cloud solutions.
Support technical planning, risk management, and deployment activities.
Contribute to engineering standards, best practices, and continuous improvement initiatives.

Qualifications

Required
Proven experience delivering production cloud environments, particularly within Microsoft Azure.
Strong experience implementing and operating Kubernetes platforms.
Experience deploying and supporting database platforms using both PaaS and VM-based solutions.
Strong experience with Infrastructure as Code tools such as Terraform or Bicep.
Experience building and maintaining CI/CD pipelines.
Strong understanding of cloud security principles, including IAM, networking, encryption, and secrets management.
Experience troubleshooting networking, distributed systems, and cloud platform issues.
Ability to work independently and take ownership of technical delivery.

Preferred
Experience with AWS and/or OpenStack environments.
Experience implementing landing zones and cloud governance frameworks.
Scripting and automation experience using PowerShell, Python, or Bash.
Experience with policy-as-code and cloud security tooling.
Experience supporting hybrid or private cloud environments.
Experience working within highly regulated or security-sensitive environments.

Additional Information

Hybrid working available from North Wales or Crawley.
Opportunity to work across both Viridien and client cloud environments.
Collaboration with software, data, and machine learning teams on large-scale data and analytics platforms.

Why work with us?

Competitive salary commensurate with experience
Highly attractive bonus scheme
Hybrid model and flexible working with up to 2 days at home
Initial 22 days annual leave with future increases, complemented by a flexible buying and selling holiday program
Company pension with generous employer contribution
Wellbeing Unmind app - puts you in control of your mental health
A flexible benefits platform with numerous discount schemes - gym membership, restaurants, cinema tickets, and much more!
Regular social club events, spontaneous reward events throughout the year
Cycle purchase scheme
Flexible Private Medical & Dental care programmes
Sponsorship of visas/comprehensive relocation packages
Bank Holiday Swap - our holiday swap program allows you to exchange a bank holiday for another day of your choice!
Relaxed dress code policy

Learning and Development

At Viridien, we foster a culture of continuous learning and provide tailored training programs through our Learning Hub, designed to enhance technical, commercial, and personal growth.

We Care About The Environment

We encourage and actively support a strong sense of community, through volunteering and various company initiatives, as well as a strong company commitment to protecting our environment through sustainable solutions, energy saving and waste reduction enterprises.

Hiring Process

At Viridien, we are committed to delivering a respectful, inclusive, and transparent recruitment experience.

Due to the high volume of applications we receive, we may not be able to provide individual feedback to every applicant. Only candidates whose qualifications closely match the role criteria will be contacted for an interview. We do, however, aim to share personalized feedback with those who progress to the first round of interviews and beyond.

We are also dedicated to ensuring that our hiring process is accessible to all. If you require any reasonable adjustments to fully participate in the application or interview stages, please don't hesitate to contact your recruiter directly.

We see things differently. Diversity fuels our innovation, we value the unique ways in which we differ, and we are committed to equal employment opportunities for all professionals.
26/06/2026
Full time
Senior Software Engineer, Site Reliability Engineering, Cloud IRT
Google
London, UK

Minimum qualifications:
Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
5 years of experience with software development in one or more programming languages.
3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems.
2 years of experience leading projects and providing technical leadership.
Experience troubleshooting production incidents as part of an on-call rotation.

Preferred qualifications:
Master's degree in Computer Science or Engineering.
Experience in telemetry systems, incident and risk management.
Ability to work across organizational boundaries.
Excellent systematic problem-solving approach, coupled with effective communication skills and a sense of drive.

About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation.
On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Responsibilities
Engage in and improve the whole lifecycle of service from inception and design, through to deployment, operation, and refinement.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Build systems and tooling to support the Cloud IRT team; improve visibility into the state of Cloud, detection of large-scale issues, and communications to customers, stakeholders and customer-facing teams.
Participate in an on-call rotation supporting critical incident response for Google Cloud Platform (GCP).

Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.
Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.
Equity is granted exclusively and discretionarily by Alphabet Inc. on the basis of an agreement concluded between you and Alphabet Inc. Alphabet Inc. is your sole contractual partner with respect to equity grants. GSU grants are not guaranteed, are discretionary, are subject to approval by the Alphabet Inc. board of directors or its delegate, the terms of the relevant Alphabet Inc. stock plan, and your grant agreement. They have no impact on statutory payments. Current or past grants do not confer an acquired right.
26/06/2026
Full time
Site Reliability Engineer III
Locations: Belfast - Millennium House
Time type: Full time
Posted on: Posted Today
Job requisition id: 33993

Site Reliability Engineer III/SRE III (Tue - Sat)
CME Group is seeking a Site Reliability Engineer III (Tue - Sat) to take a key role in building, operating, and scaling systems in our Markets portfolio. As an SRE III, you will apply your experience to the complex challenges of the CME Globex trading platform, where our systems deliver an exceptional combination of low-latency performance and rock-solid reliability. You will work with senior engineers on complex projects, take ownership of key reliability initiatives, and act as a mentor to junior colleagues, helping to shape the team's technical direction.

Key Responsibilities
Own Observability: Design, build, and refine monitoring, alerting, and observability solutions. Drive the continuous improvement of our SLIs & SLOs to enable faster issue detection and resolution.
Drive Reliability Projects: Take ownership of reliability-focused projects from design to implementation, collaborating with product teams to ensure new features are scalable, resilient, and safe.
Lead Technical Solutions: Lead technical discussions for your work, presenting solution options and proposals with clear trade-offs.
Automate Intelligently: Proactively identify and eliminate toil through robust automation, improving both system reliability and team velocity.
Manage Incidents: Take a leading role in incident response, owning the resolution of significant incidents, ensuring rapid system recovery, and driving meaningful action from blameless post-mortems.
Mentor & Coach: Act as a technical mentor and point of escalation for L1 and L2 SREs, fostering their growth through code reviews and paired work.
Architect for the Future: Contribute your own ideas to the product backlog and play an active role in the architectural design for the migration to Google Cloud Platform (GCP).

What We're Looking For
3-5+ years of professional experience in a Site Reliability, DevOps, Software, or Systems Engineering role.
Strong, hands-on experience administering and troubleshooting Linux-based production systems.
Proficient programming skills in a language like Python or Go, with a track record of automating complex operational tasks.
Proven ability to lead technical initiatives and solve complex problems with a high degree of autonomy.
Excellent communication skills, with the ability to articulate complex technical concepts to diverse audiences.
A proactive and ownership-oriented mindset.

Desirable Skills
Cloud Platforms: Deep experience with Google Cloud Platform (GCP), especially GCE, GKE, and cloud networking.
Monitoring Tools: Expertise in designing and managing monitoring stacks (e.g., Prometheus, Grafana, OpenTelemetry).
Distributed Systems: Strong practical knowledge of building and maintaining large-scale distributed systems.
Containerisation: Advanced experience with Kubernetes and Docker in a production environment.
Networking: Solid understanding of networking protocols (HTTP, TCP/UDP, IP) and network architecture.
Domain Knowledge: Experience in financial markets, low-latency systems, or with message-oriented middleware.

Why CME Group
Be part of a global leader in financial services technology.
Work on cutting-edge technology in a collaborative and innovative culture.
Receive a competitive compensation and benefits package.
Grow your career in SRE with an organisation committed to this modern approach.
Join CME Group and play a crucial role in ensuring the stability and performance of our global trading applications. Apply now to be a part of our dynamic SRE team!

Company Benefits:
Bonus Programme
Generous shift allowance
Equity Programme
Employee Stock Purchase Plan (ESPP)
Private Medical and Dental coverage
Mental Health Benefit Programme
Group Pension Plan
Income Protection
Life Assurance
Cycle To Work
EV Car Benefit Scheme
Gym Membership
Family Leave
Education Assistance - MBA/Advanced Degree/Bachelor Degree
Ongoing Employee Development Training/Certification
Hybrid Working
26/06/2026
Full time
GCP Platform Engineer
Location: Hybrid / Manchester
Salary: Up to £80,000 plus benefits
Type: Permanent

We're partnering with an innovative platform business that is investing heavily in its cloud infrastructure and engineering capabilities. They are looking for an experienced GCP Platform Engineer to help build, automate and scale a modern cloud platform used across multiple products and teams. This is an opportunity to join a collaborative engineering environment where infrastructure is treated as code, automation is a priority, and engineers are empowered to drive technical decisions.

The Role
As a GCP Platform Engineer, you will be responsible for designing, building and maintaining the company's cloud platform on Google Cloud Platform (GCP). You'll work closely with software engineers and DevOps teams to create scalable, secure and highly available infrastructure.

Key responsibilities
Designing and implementing cloud infrastructure on GCP
Building and maintaining Infrastructure as Code using Terraform
Automating infrastructure provisioning and deployment pipelines
Managing Kubernetes and containerised workloads
Implementing monitoring, logging and observability solutions
Driving platform reliability, security and best practices
Collaborating with engineering teams to improve developer experience

Skills & Experience
Essential:
Strong commercial experience with Google Cloud Platform (GCP)
Extensive experience with Terraform and Infrastructure as Code
Experience building CI/CD pipelines
Knowledge of Kubernetes / GKE and container technologies
Experience with Linux and scripting (Bash, Python or Go)
Understanding of networking, IAM and cloud security principles
Experience with monitoring and observability tooling
Desirable:
Experience with GitOps practices
Knowledge of Prometheus, Grafana or similar tools
Experience in a platform engineering or SRE environment
Certifications in GCP are advantageous

What's On Offer
Salary up to £80,000
Flexible hybrid
Generous holiday allowance
Pension scheme
Training and certification budget
Opportunity to shape a growing cloud platform and influence technical direction

Eligo Recruitment is acting as an Employment Business in relation to this vacancy. Eligo is proud to be an equal opportunity employer dedicated to fostering diversity and creating an inclusive and equitable environment for employees and applicants. We actively celebrate and embrace differences, including but not limited to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran status, and disability. We encourage applications from individuals of all backgrounds and experiences and all will be considered for employment without discrimination. At Eligo Recruitment diversity, equity and inclusion is integral to achieving our mission to ensure every workplace reflects the richness of human diversity.
26/06/2026
Full time
Join us as a Software Engineer, to support the delivery and maintenance of reliable technology solutions within our engineering teams, contributing to the effective operation of business systems. This role sits within a team responsible for releasing changes to a widely used internal platform supporting around 40,000 colleagues, enabling critical day-to-day operations. You will work across the full software development lifecycle, with a focus on supporting and improving application releases in a structured and controlled way. You will also contribute to enhancing how applications are monitored and supported, helping to identify issues early and improve visibility of system performance. This is an opportunity to work on large-scale implementations while collaborating with technical, business and product teams to ensure clear communication and alignment. To be successful as a Software Engineer, you should have: Experience working across an end-to-end software development lifecycle, with an understanding of how changes are delivered and supported in production environments. The ability to apply problem solving and judgement when considering different options before releasing live changes. Familiarity with tools such as Service First, JIRA and Confluence to organise work, track progress and support effective team collaboration. The ability to work with a range of stakeholders across technical, business and product teams, helping to communicate status, explain changes and support shared understanding. Some other highly valued skills may include: Awareness of Site Reliability Engineering (SRE) concepts and an interest in developing skills in areas such as automation, observability and system reliability. Awareness of software development using Java or JavaScript technologies such as MVC, React or Angular. Experience working with tools such as SQL or GitLab to support development and release activities. 
You may be assessed on the key critical skills relevant for success in this role, such as risk and controls, change and transformation, business acumen, strategic thinking and digital and technology, as well as job-specific technical skills. This role will be based in Knutsford. Purpose of the role To design, develop and improve software, utilising various engineering methodologies, that provides business, platform, and technology capabilities for our customers and colleagues. Accountabilities Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and tools. Ensuring that code is scalable, maintainable, and optimized for performance. Cross-functional collaboration with product managers, designers, and other engineers to define software requirements, devise solution strategies, and ensure seamless integration and alignment with business objectives. Collaboration with peers, participate in code reviews, and promote a culture of code quality and knowledge sharing. Stay informed of industry technology trends and innovations and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth. Adherence to secure coding practices to mitigate vulnerabilities, protect sensitive data, and ensure secure software solutions. Implementation of effective unit testing practices to ensure proper code design, readability, and reliability. Analyst Expectations To perform prescribed activities in a timely manner and to a high standard consistently driving continuous improvement. Requires in-depth technical knowledge and experience in their assigned area of expertise. Thorough understanding of the underlying principles and concepts within the area of expertise. They lead and supervise a team, guiding and supporting professional development, allocating work requirements and coordinating team resources. 
If the position has leadership responsibilities, People Leaders are expected to demonstrate a clear set of leadership behaviours to create an environment for colleagues to thrive and deliver to a consistently excellent standard. The four LEAD behaviours are: Listen and be authentic, Energise and inspire, Align across the enterprise, Develop others. OR for an individual contributor, they develop technical expertise in work area, acting as an advisor where appropriate. Will have impact on the work of related teams within the area. Partner with other functions and business areas. Takes responsibility for end results of a team's operational processing and activities. Escalate breaches of policies / procedures appropriately. Take responsibility for embedding new policies / procedures adopted due to risk mitigation. Advise and influence decision making within own area of expertise. Take ownership for managing risk and strengthening controls in relation to the work you own or contribute to. Deliver your work and areas of responsibility in line with relevant rules, regulation and codes of conduct. Maintain and continually build an understanding of how own sub-function integrates with function, alongside knowledge of the organisation's products, services and processes within the function. Demonstrate understanding of how areas coordinate and contribute to the achievement of the objectives of the organisation sub-function. Make evaluative judgements based on the analysis of factual information, paying attention to detail. Resolve problems by identifying and selecting solutions through the application of acquired technical experience and will be guided by precedents. Guide and persuade team members and communicate complex / sensitive information. Act as contact point for stakeholders outside of the immediate function, while building a network of contacts outside team and external to the organisation. 
All colleagues will be expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship - our moral compass, helping us do what we believe is right. They will also be expected to demonstrate the Barclays Mindset - to Empower, Challenge and Drive - the operating manual for how we behave.
24/06/2026
Full time
What's Exciting About The Role This is a chance to join Verifone and be a part of the team building the tooling, automation and infrastructure that power secure, high volume payment services used globally. You'll work hands on with technologies like Kubernetes, ELK, Prometheus, Python/Go and modern CI/CD frameworks to shape how software is delivered at scale. Your impact will be visible: from improving observability and reliability across our payment platforms to driving an automate first culture that removes manual effort and accelerates delivery for engineering teams. Because this role sits at the intersection of DevOps, Platform Engineering and SRE principles, you'll have the opportunity to influence architecture, introduce best practices, and mentor teams across regions. It's a role for someone who enjoys solving complex problems, building resilient systems, and helping an organisation evolve into a truly modern engineering environment. If you're excited by ownership, technical depth, and the chance to make a meaningful difference in a global payments ecosystem, this role offers the scale, challenge and growth you're looking for. 
Key Responsibilities: Support the design and deployment of applications and tooling to support Verifone's evolution from traditional Application Support and Development teams to scalable and modernized Platform and Software Engineering functions Manage CI/CD pipelines with Verifone's Software Engineering Teams and support the streamlining of software development lifecycle processes Demonstrate an 'automate-first' mindset, leading initiatives to automate manual maintenance and repeatable tasks and critical business requests in addition to environment deployment Recommend, design and execute optimizations to improve the observability, performance and overall reliability of Verifone's payment platforms Develop and maintain monitoring and logging solutions to ensure visibility into application performance and security Advocate DevOps culture, mentoring peers within Application Support Teams Create and maintain technical documentation to support the deployment of DevOps fundamentals driven throughout our regions Collaborate with cross-functional teams (DevOps, Development, Operations) to ensure seamless deployment of new features and services Skills and Experience We're Looking For 5+ years' experience with DevOps / Platform Engineering within production environments Expert knowledge of operating, configuring and optimizing enterprise monitoring systems such as ELK, New Relic, Grafana, Prometheus etc. 
Strong knowledge of Kubernetes Strong programming/scripting skills in a modern language like Python, Go, or Rust Strong, hands on knowledge of DevOps tooling, concepts, build/release process Strong understanding of Linux and Windows systems administration Strong written and verbal communication skills Proven track record of successfully implementing large scale CI/CD pipelines and automated testing frameworks Strong experience providing on call support Working knowledge of Infrastructure as Code (IAC), Configuration Management (CM) Strong awareness of release and deployment best practices, SDLC and security best practices Knowledge of private and public cloud networking fundamentals Strong problem solving and analytical skills Nice to Have Experience working within financial services industry, particularly exposure to payment solutions Knowledge of PCI DSS standards and regulations Experience in application Development Knowledge of PKI infrastructure and certificates Experience of Hardware Security Modules (HSMs) Our Commitment Verifone is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. Verifone is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
24/06/2026
Full time
Senior Software Engineer, Site Reliability Engineering, Distributed Cloud Google London, UK Bachelor's degree in Computer Science, a related field, or equivalent practical experience. 5 years of experience with software development in one or more programming languages. 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems. 2 years of experience leading projects and providing technical leadership. Preferred qualifications Master's degree in Computer Science or Engineering. About the job Site Reliability Engineering (SRE) combines software and systems engineering to build and run large scale, massively distributed, fault tolerant systems. SRE ensures that Google Cloud's services-both our internally critical and our externally visible systems-have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SRE will keep an ever watchful eye on our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex issues of scale unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Responsibilities Engage in and improve the whole life cycle of services, from inception and design, through to deployment, operation and refinement. Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews. Maintain services once they are live by measuring and monitoring availability, latency and overall system health. Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. 
Practice sustainable incident response and blameless post mortems. Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity or expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, including breastfeeding, expecting or parents to be, criminal histories consistent with legal requirements, or any other basis protected by law.
24/06/2026
Full time
At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law. About us Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems. Our vision is to create autonomy that propels the world forward. Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving. In our fast-paced environment big problems ignite us: we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future. At Wayve, your contributions matter. We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact. Make Wayve the experience that defines your career! The role As a Cloud Site Reliability Engineer at Wayve, you will build and scale the reliability foundations of our AI cloud platform. This includes our Model Development Platform (powering end-to-end model development from raw data to on-road experimentation) and our GPU Compute platform (large-scale, multi-tenant GPU fleets and scheduling systems driving model training and inference at scale). This is a founding Cloud SRE role. You won't inherit a mature SRE function; you'll help create it. 
You will define the frameworks, automation, and operational standards that ensure our model development infrastructure, distributed systems, and large compute clusters operate predictably, efficiently, and at scale. This role sits at the intersection of AI research, large-scale cloud infrastructure, and production operations. Your work will directly enable faster model training, reliable experimentation, and scalable AI deployment by ensuring our cloud infrastructure is resilient and performant. Key responsibilities Reliability & Platform Ownership Own the reliability, availability, and performance of the Model Dev Platform and GPU Compute environments. Define and operationalise SLOs, SLIs, and error budgets across platform services. Improve capacity planning, scaling strategies, and resource efficiency across large GPU-backed clusters. Partner with ML, platform, and software teams to establish clear production readiness standards. Incident Response & On-Call Participate in a 24/7 on-call rotation as first-line response for cloud and cluster-related incidents. Lead incident triage, escalation, communications, and root cause analysis. Translate post-incident learning into durable architectural or automation improvements. Continuously reduce alert noise and recurring operational burden. Observability & Operational Excellence Design and operate monitoring, logging, tracing, and alerting systems that enable rapid detection and recovery. Build dashboards that reflect real user-centric platform health (not just infrastructure metrics). Improve deployment safety through better change management, validation, and rollback mechanisms. Automation & Tooling Build automation for cluster operations, training workflows, remediation, and scaling tasks. Implement self healing patterns and resilient recovery workflows. Harden CI/CD and release processes to improve deployment safety and velocity. 
Support infrastructure as code and policy driven guardrails to ensure secure, reliable cloud environments. About you In order to set you up for success as a Cloud Site Reliability Engineer at Wayve, we're looking for the following skills and experience. Essential skills Proven experience in an SRE, Production Engineer, or Cloud Reliability role supporting large-scale cloud systems. Strong Kubernetes experience, including operating production clusters. Hands on experience running production workloads in AWS, GCP, or Azure. Experience operating complex distributed systems in production, ideally including compute-heavy or high-performance workloads. Experience working with large compute clusters; exposure to AI/ML training or inference workloads strongly preferred. Strong Linux fundamentals and proficiency in at least one scripting or systems language (e.g., Python, Go, C++) with a bias toward automation. Deep troubleshooting skills across networking, storage, distributed systems, and performance at scale. Experience designing and operating observability stacks (e.g., Datadog, Prometheus, Grafana, OpenTelemetry). Clear communication skills, including leading incidents, writing post mortems, and influencing teams to prioritise reliability improvements. Desirable skills Experience operating GPU backed environments or large scale ML infrastructure. Experience running model training or inference pipelines in production (MLOps). Familiarity with infrastructure as code (e.g., Terraform) and secure cloud production environments. Experience defining and running SLOs/SLIs and building reliability programs across multiple teams. Experience as an early or founding SRE hire establishing processes from scratch. Interest in helping shape and grow a Cloud SRE function, with potential to take on leadership responsibilities over time. This is a full time role based in our office in London (2 days a week in the office). 
At Wayve we want the best of all worlds so we operate a hybrid working policy that combines time together in our offices and workshops to fuel innovation, culture, relationships and learning, and time spent working from home. Wayve is committed to creating an inclusive interview experience. If you require any accommodations or adjustments to participate fully in our interview process, please let us know.
24/06/2026
Full time
Site Reliability Engineer, iCloud London, England, United Kingdom Software and Services People at Apple don't just build products - they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many of these daily experiences possible. If you've used Apple products, you've likely interacted with us. Apple Services Site Reliability Engineering (SRE) teams are responsible for the systems and services that directly support those customers and their experiences. We are looking for an SRE with experience in building and supporting highly available customer-facing services. Description Apple Services' scale is BIG. Operating at our scale, across multiple geographies and servicing hundreds of millions of users presents unique challenges. As a Software Developer in SRE at Apple, you'll need to solve these problems using data, teamwork, and your own expertise. ASE Products Site Reliability teams are responsible for the reliability and performance of the server software stack that powers products like iCloud Photos, Mail, Drive, Backup and many more. We do that by focusing on reliability best practices from service inception to production, collaborating deeply with product development teams to deliver a superlative product and shared vision while leveraging data and automation as first principles. We run a mix of open source, vendor licensed, and internally developed tools to manage the end to end SDLC of our products. You'll learn these tools and have opportunities to improve them. Responsibilities Engage with our product teams to understand requirements, design and implement resilient and scalable infrastructure solutions. Operate, monitor, and triage all aspects of our production and non-production environments. 
Collaborate on code, infrastructure, design reviews, and process enhancements Evaluate and integrate new technologies to improve system reliability, security, and performance. Develop and implement automation to provision, configure, deploy, and monitor Apple services. Participate in an oncall rotation providing hands on technical expertise during service impacting events. Contribute to capacity planning, scale testing, and disaster recovery exercises Approach operational problems with a software engineering mindset. Minimum Qualifications Strong sense of ownership, customer service, and integrity proven through clear communication. BS in Computer Science or related field, or equivalent employment 5 + years experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment Strong experience with deploying, supporting and supervising new and existing services, platforms, and application stacks Experience with scale testing, disaster recovery, and capacity planning Experience with observability platforms with Splunk, Grafana, Prometheus. Demonstrable fluency in at least one of the following languages: Java, Python, or Go. Experience with Kubernetes, Nginx, Envoy, Prometheus, and/or Docker. Preferred Qualifications Understanding of standard networking protocols and components such as: HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI Model, Subnetting and Load Balancing strategies. Understanding of the Linux Operating System, including Kernel, Memory, Process, Threads, Static / Shared Libraries, IPC, Signals. Experience in developing iOS apps using Xcode and Swift. Experience in OpenTelemetry Standards / distributed tracing like jaeger At Apple, we believe in treating all applicants fairly and equally. Because to create products that serve everyone, we believe in including everyone. We are committed to treating all applicants fairly and equally. 
As a registered Disability Confident employer, we will work with applicants to make any reasonable accommodations. Apple will consider for employment all qualified applicants with criminal backgrounds in a manner consistent with applicable law. At Apple, we believe accessibility is a fundamental human right. You'll find that idea reflected in everything here - in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.
24/06/2026
Full time
GCP Platform Engineer Location: Hybrid / Manchester Salary: Up to 80,000 + Benefits Type: Permanent The Company We're partnering with an innovative platform business that is investing heavily in its cloud infrastructure and engineering capabilities. They are looking for an experienced GCP Platform Engineer to help build, automate and scale a modern cloud platform used across multiple products and teams. This is an opportunity to join a collaborative engineering environment where infrastructure is treated as code, automation is a priority, and engineers are empowered to drive technical decisions. The Role As a GCP Platform Engineer, you will be responsible for designing, building and maintaining the company's cloud platform on Google Cloud Platform (GCP). You'll work closely with software engineers and DevOps teams to create scalable, secure and highly available infrastructure. Key responsibilities include: Designing and implementing cloud infrastructure on GCP Building and maintaining Infrastructure as Code using Terraform Automating infrastructure provisioning and deployment pipelines Managing Kubernetes and containerised workloads Implementing monitoring, logging and observability solutions Driving platform reliability, security and best practices Collaborating with engineering teams to improve developer experience Skills & Experience Essential: Strong commercial experience with Google Cloud Platform (GCP) Extensive experience with Terraform and Infrastructure as Code Experience building CI/CD pipelines Knowledge of Kubernetes / GKE and container technologies Experience with Linux and scripting (Bash, Python or Go) Understanding of networking, IAM and cloud security principles Experience with monitoring and observability tooling Desirable: Experience with GitOps practices Knowledge of Prometheus, Grafana or similar tools Experience in a platform engineering or SRE environment Certifications in GCP are advantageous What's On Offer Salary up to 80,000 Flexible hybrid 
Generous holiday allowance Pension scheme Training and certification budget Opportunity to shape a growing cloud platform and influence technical direction If you're passionate about cloud infrastructure, automation and platform engineering, and want to work with modern GCP technologies at scale, we'd love to hear from you. Eligo Recruitment is acting as an Employment Business in relation to this vacancy. Eligo is proud to be an equal opportunity employer dedicated to fostering diversity and creating an inclusive and equitable environment for employees and applicants. We actively celebrate and embrace differences, including but not limited to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran status, and disability. We encourage applications from individuals of all backgrounds and experiences and all will be considered for employment without discrimination. At Eligo Recruitment diversity, equity and inclusion is integral to achieving our mission to ensure every workplace reflects the richness of human diversity.
24/06/2026
Full time
Please note this posting is to advertise potential job opportunities. This exact role may not be open today but could open in the near future. When you apply, a Cisco representative may contact you directly if a relevant position opens. Start Date: as soon as possible Location: Feltham, United Kingdom (Hybrid work approach, working from the Feltham office 1-2 days per week.) Meet the Team We at Cisco are looking for a Site Reliability Engineer, with a passion for technology and solid academic foundations in analytical disciplines. Cisco is a strong advocate of using its own enterprise networking, datacenter, collaboration products, and solutions internally; Cisco IT deploys all these technologies - the result being that Cisco IT accrues a great deal of experience in how to design, deploy, operate, and automate these solutions within a large global enterprise. In the Network Engineering Core Team, we are responsible for connecting our offices to our enterprise network across Cisco. We maintain and support the WAN and Core infrastructure, alongside several hardware and software remote access solutions with an Agile, SRE mindset and have lots of fun along the way. Your Impact As a Site Reliability Engineer, daily activities of the role involve working within a large global team of DevOps Network Engineers, Product Owners, and Product Managers to enable the efficient running of all Cisco offices and remote/hybrid working solutions. You'll also have the opportunity to work on a variety of different projects across our technology portfolio. Activities include but are not limited to: Use creative problem-solving to provide Cisco with advanced, essential business capabilities. Developing technical prototype environments and concepts. Supporting existing platforms and network solutions, including but not limited to WAN, LAN, and Core. Working alone or as part of a team, depending on the project, using the SAFe methodology. 
Identify and work on areas that can be automated to streamline processes within the team. There will be some on call work required as you become familiar with our network but this is limited to 1 week in every 6 which will cover the working day during the week and include the weekend. Minimum Qualifications We are looking for someone that can demonstrate the following: Including but not limited to a recent/upcoming graduate of a Bachelor's degree (or higher) or a certification program (e.g. a Bootcamp or Apprenticeship). Equivalent experience accepted in lieu of these. Demonstrate a keen interest in some of the following technologies: Networking (Routing, Switching, and WAN/SDWAN) Automation / Programming - e.g. Python, Ansible, REST, APIs are advantageous but not essential Virtualisation Technologies - VMware, OpenStack, Docker are advantageous Able to legally live and work in the country for which you're applying Preferred Qualifications Strong analytical mind-set Familiarity with design concepts Why Cisco? At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.
23/06/2026
Full time
Senior Site Reliability Engineer - iManage SRE is part of a global organization that leverages the latest technology to communicate with our colleagues across the globe. We organize ourselves into distributed teams - SRE teams are anchored to iManage offices across the globe. Tuesdays and Thursdays are dedicated to in office collaboration, rapid innovation, and developing a sense of belonging at iManage. Mondays and Fridays are reserved for focus time to get things done. Have the best of both work styles in a workplace that is intentional about belonging, collaboration, and accomplishment. Being a Senior Site Reliability Engineer at iManage means You are an engineer, a builder, and a systems thinker. You'll create middleware and platform guardrails that empower developers to innovate quickly and reliably. You combine deep technical judgment with empathy to eliminate customer pain, especially when working with enthusiastic teams stewarding the world's most privileged data. You uplift those around you, act as a subject matter expert, mentor others, and drive change. You chase contributing factors over root causes, value code over documentation, and documentation over process. You'll engage in and often lead architectural discussions, reduce toil, and deliver scalable, resilient platforms that support our customers and organization. As a Senior SRE, you'll help scale our cloud platform, collaborate across teams to promote standardization and resiliency, and participate in on call rotations. You'll become a key voice in observability, change management, and service scalability, providing guidance during complex technical decisions and high impact events. iManage is experiencing explosive growth in its flagship cloud product. We're seeking senior software and systems engineers specializing in reliability and platform services to join our transformative cloud journey. 
This requires rethinking technical decisions with a beginner's mindset and a focus on resilience and sustainability. If you write code, think in systems, embrace complexity and automation, and are passionate about service resilience and scalability - we want to talk to you. SRE Responsibilities Eliminate toil through automation and software development. Partner cross functionally with application teams and internal stakeholders. Create a modern, cloud native platform that is resilient, cost effective, and secure by default. Scale cloud infrastructure to support our Kubernetes based ecosystem. Maintain the freshness and utility of platform services. Improve the security posture of our products. Design automation, orchestration, observability, and disaster readiness into our products. Participate in production support and on call rotations, providing senior level guidance during critical events. Lead incident management and post incident retrospectives, coaching teams in these practices. Qualifications Experience writing design documents, postmortems, and refactoring application code. Built automation to reduce operational burden or developed internal SaaS tools. Ability to advocate for SRE principles (e.g., SLOs vs SLAs) and introduce them effectively. Experience in public cloud or hosted datacenter environments (Azure and AKS preferred). A passion for collaborative teamwork and influencing reliability best practices across teams. Bonus Points Hands on experience with Linux server stacks (Ubuntu/Debian preferred). Knowledge of cloud provisioning platforms (Terraform preferred). Exposure to configuration management tools (Chef preferred). Experience with containerization/clustering technologies (Docker preferred). Familiarity with observability and alerting tools (Prometheus/Grafana or ELK/EFK). Practical experience with CI/CD pipelines and rollout strategies. A bachelor's degree (or equivalent experience) in Computer Engineering or related field. 
Proficiency in one or more programming languages (e.g., Java, Python, Golang). Familiarity with scripting languages (e.g., PowerShell, Bash, Python, Ruby). Benefits Creating an inclusive environment where you're encouraged to help shape the culture. Market leading salary determined through a fair and consistent process, equitable for all employees. Annual performance based bonus. Enhanced parental leave (20 weeks for primary and 10 weeks for secondary caregiver at 100% pay). Matching pension contribution (up to 6%). Private medical insurance and cash plan. Group life cover, income protection, and critical illness protection. Flexible time off policy, 25 days of annual leave with additional flexibility. Wellness days each year to prioritize mental health and well being. Access to RethinkCare, a global behavioral health platform. We welcome those who come with a growth mindset and a hunger for learning; if you are excited about this role but your past experience doesn't align perfectly with every qualification, we encourage you to apply anyway. iManage is committed to providing an excellent candidate experience and will never ask you to engage in recruitment activity via text and exclusively communicate from emails using domain. If you have any concerns or questions about communications you have received, please send them to so our team members can review. iManage provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
23/06/2026
Full time
Your role at Dynatrace
Dynatrace is seeking a strategic, customer-facing Field CTO to serve as a senior technical advisor to executives, enterprise architects, and transformation leaders at our most important customers and prospects. This leader will connect business priorities to technical strategy, helping organizations use Dynatrace to improve resilience, accelerate innovation, strengthen security, and drive measurable business outcomes.
The Field CTO operates at the intersection of executive engagement, technical vision, sales strategy, and market influence. This role partners closely with Sales, Solution Engineering, Product Management, Customer Success, Alliances, and Marketing to shape large, strategic opportunities and elevate Dynatrace's role as a trusted transformation partner.
Act as the executive technical advisor for strategic accounts, engaging CIOs, CTOs, CISOs, VP Engineering, platform teams, and business stakeholders. Translate customer business goals into compelling transformation strategies powered by Dynatrace. Lead high-impact technical discovery and executive conversations around observability, cloud modernization, AI adoption, security, automation, and business outcomes. Shape account strategy with Sales and Solution Engineering teams for complex, multi-stakeholder deals. Develop board-level and executive-level narratives that connect platform capabilities to risk reduction, operational excellence, digital experience, and growth. Guide customers on modern observability and security operating models, including platform engineering, SRE, DevSecOps, and AI-assisted operations. Support large opportunities by validating architecture direction, differentiation, value realization, and long-term platform vision. Influence go-to-market strategy by bringing field insight back to Product, Marketing, and leadership teams. Represent Dynatrace externally through executive briefings, customer workshops, industry events, webinars, and thought leadership. Mentor field teams on executive engagement, storytelling, value selling, and strategic account planning. Help create reusable field assets, strategic points of view, and technical value frameworks for priority industries and use cases. Partner with Customer Success and Services to promote adoption strategies that expand platform value over time.
What will help you succeed
12+ years of experience in enterprise technology, including senior roles in architecture, engineering, observability, cloud, security, or technical go-to-market leadership. 5+ years in a customer-facing leadership role such as Field CTO, Enterprise Architect, CTO Advisor, Chief Architect, VP/Senior Director of Solution Engineering, or similar. Strong executive presence with the ability to communicate equally well with C-level leaders and deeply technical teams. Proven experience supporting complex enterprise sales cycles and strategic digital transformation programs. Deep knowledge of cloud platforms, modern application architectures, distributed systems, platform engineering, and enterprise IT operations. Strong understanding of observability, application performance, infrastructure, log analytics, digital experience, automation, and security. Experience in observability, AIOps, application security, cloud-native platforms, or enterprise analytics. Ability to connect technical transformation to business KPIs, value realization, and organizational change. Excellent communication, presentation, and workshop facilitation skills. Willingness to travel based on customer and business needs. Familiarity with executive value frameworks, business case development, and enterprise transformation methodology. Experience working with Fortune 500 or large global organizations. Background in SaaS or platform companies serving engineering, operations, and security teams. Public speaking and thought leadership experience, including conferences, customer events, or published content. Knowledge of AI/LLM adoption patterns and how AI can improve operational and business decision-making.
Why you will love being a Dynatracer
Dynatrace is a leader in unified observability and security. We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance. Our employees work with the largest cloud providers, including AWS, Microsoft, and Google Cloud, and other leading partners worldwide to create strategic alliances. The Dynatrace platform uses cutting-edge technologies, including our own Davis hypermodal AI, to help our customers modernize and automate cloud operations, deliver software faster and more securely, and enable flawless digital experiences. Over 50% of the Fortune 100 companies are current customers of Dynatrace.
Compensation and Rewards
Note to Recruiters and Agencies: Thank you for your interest in Dynatrace. Please note that we do not accept unsolicited agency resumes; do not forward them via our website or directly to Dynatrace employees. Dynatrace will not pay fees for unsolicited resumes, and any resumes received this way will be considered the property of Dynatrace.
Benefits and work-life perks
We offer best-in-class core rewards, including paid time off, financial security benefits, retirement savings plans, and health insurance. Beyond that, you'll get other benefits and work-life perks designed to make your ride with us even more rewarding.
Mental health support: Our Employee Assistance Program, powered by Telus Health, offers support for you and your family members.
Wellness Days: Four company-designated extra paid days off for you to recharge batteries.
Flexibility: Our hybrid working model and flexible working hours offer you the flexibility you need.
Employee Stock Purchase Plan: Purchase company stock (NYSE: DT) at a discounted price and become a shareholder.
Learn & develop: Company-wide learning perks, designated team's learning days, and more.
Volunteering day: A day of paid volunteer time to support a community or cause you care about.
Regular team events: We host Global Culture Parties, Family & Friends at Work Day, Global Breakfasts, Green Weeks, Pride Month, and beyond!
International vibe: Most of our offices and teams are proudly multicultural. English is our shared language, but we embrace and learn from each other's cultures.
Rewards vary depending on your employment type. Some benefits and perks also differ by location - explore your city to see what's available there.
About Dynatrace
Dynatrace (NYSE: DT) is the leading AI-powered observability and security platform. We're advancing observability for today's digital businesses, helping transform modern digital ecosystems' complexity into powerful business assets. Our AI-driven insights cut through the noise, allowing customers to focus on what truly matters by automating manual tasks and resolving issues with pinpoint accuracy. Dynatrace offers simplicity, clarity, and reliability at scale to ensure teams can make informed decisions, minimize downtime, and drive their business forward with confidence.
23/06/2026
Full time
Automata is transforming the way labs work with open, integrated automation. Our mission is to unlock the potential of labs and the potential of the people who work in them. At Automata, we're on a mission to transform how scientists work by making automation accessible to every lab in the world. We believe that by giving labs the power to automate, we can unlock discoveries that will shape the future of life sciences, from diagnostics and drug discovery to synthetic biology. Our LINQ platform combines hardware and software to streamline workflows, making lab automation fast, flexible, and affordable. This means our customers can focus on groundbreaking research, while we take care of the rest. Why Work at Automata? Impact: Your work will directly contribute to advancements in science and medicine, supporting labs around the globe as they push boundaries in research and innovation. Innovation: You'll be part of a team solving complex problems using cutting-edge technology. Growth: We invest in our people through hands-on experience, professional development, and collaborative projects. Community: Join a diverse, passionate team that values collaboration. We are looking for a Senior Platform Engineer to help build, scale, and operate the foundational infrastructure powering Automata's LINQ platform. You will play a critical role in designing and maintaining robust, secure, and compliant systems that support deployments across cloud and on-premise environments (including bare metal). In this role, you will be responsible for: Designing, building, and operating Kubernetes platforms across AWS and bare metal environments. Managing and optimizing PostgreSQL databases, ensuring performance, resilience, and data integrity. Developing and maintaining Infrastructure as Code (Terraform, Pulumi, Crossplane, or similar). Implementing and managing GitOps workflows (ArgoCD) for consistent and repeatable deployments. 
Supporting Windows OS provisioning and hybrid infrastructure environments. Building and improving observability systems (OpenTelemetry, metrics, tracing, logging). Contributing to network architecture and operations, including physical networking (Juniper/Mist). Supporting deployments in regulated environments (ISO27001, SOC2, GxP). Collaborating with field and customer teams on on-site deployments and troubleshooting. Developing automation and internal tooling using Python, Go, Rust, or .NET. Participating in building orchestration workflows using tools like Temporal. Helping define and maintain golden paths for developers to improve productivity and reliability. Contributing to documentation, onboarding materials, and operational runbooks. What it takes 5+ years of experience in platform engineering, SRE, or infrastructure roles. Strong experience with Kubernetes in production, including bare metal clusters. Solid understanding of cloud platforms (AWS) and hybrid deployments. Experience managing databases (PostgreSQL); DBA-level knowledge preferred. Familiarity with GitOps and modern IaC practices. Experience with observability tooling and distributed systems debugging. Understanding of networking fundamentals, including physical infrastructure. Experience working in regulated environments or with compliance frameworks. Proficiency in at least one programming language (Python, Go, Rust, or .NET). Strong communication skills and ability to work cross-functionally. Willingness to travel to customer sites when needed. Experience with automation/AI-assisted development workflows (e.g., Claude Code). Nice to haves Experience with Juniper/Mist networking. Exposure to Temporal or similar workflow orchestration systems. Background in supporting life sciences or GxP environments. Why You'll Want to Join Us You want to operate at the edge of your capability where expectations are high and impact is real. 
You're motivated by ownership and autonomy, not hierarchy or process. You want to build and scale something meaningful; our product directly saves lives. You're currently under-leveraged, moving faster than your environment allows. You want to work with people who are equally driven, pragmatic, and focused on outcomes. We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Discrimination of any kind based on race, colour, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status is strictly prohibited.
23/06/2026
Full time
Senior Software Engineer - TRAX Observability Location: London Business Area: Engineering and CTO Ref #: About TRAX TRade Automation and eXecution (TRAX) is part of Bloomberg Enterprise Products Engineering. We build trade automation solutions and multiple Execution Management Systems (EMSs) that enable clients to route orders, execute trades, and monitor outcomes across asset classes. Trading is the core action of financial markets. Once investment decisions are made, traders rely on our systems to execute and manage trades. Ensuring these systems are observable, scalable, resilient, and well managed from a technical risk perspective is critical - and that's where TRAX Observability comes in. Our work focuses on: Informing (or alerting) stakeholders to system performance and degradation Demonstrating client impact during deployments Identifying emerging client behaviors and future system needs We build and maintain data infrastructure using firm-supported monitoring tools. This includes a custom telemetry platform that combines multiple data sources for advanced analysis, and a distributed trace pipeline (Argo, Spark, Solr) that processes large-scale data for deep investigation. We also leverage tools such as Humio, Grafana, and MetricTank to support observability across the department. Learning & Technical Growth Work alongside experienced senior engineers with deep expertise in distributed systems, trading platforms, cloud infrastructure, and operations. You'll gain hands-on experience building high-throughput metrics and observability systems. Influence & Visibility Observability is central to system reliability and client experience. Your work will directly impact the stability of key Bloomberg systems and help prevent client-facing issues. Network & Stakeholder Exposure Collaborate with engineering and product teams across London, Frankfurt, Tel Aviv, and New York, as well as peer SRE teams focused on Scalability and Resilience. 
You'll develop strong stakeholder management and communication skills. We'll trust you to: Enhance and maintain systems that capture and present performance metrics Improve the reliability and accuracy of telemetry and analysis Understand and assess client experience risks within EMS platforms Communicate system health and performance to stakeholders Partner across teams to strengthen observability Support Scalability and Resilience initiatives with actionable data Assist in triaging major incidents and production issues You will need to have: Experience with a high-level language (Python preferred, but not required; Java, C++, etc. welcome) Knowledge of Unix/Linux fundamentals (or strong willingness to learn) Familiarity with observability concepts (e.g., distributed tracing, logging, metrics, tools such as Grafana or similar) Understanding of distributed systems concepts (replication, partitioning, scalability, messaging, state management) and eagerness to deepen that knowledge We would love to see: Exposure to cloud and data processing technologies (e.g., Argo, Spark, Solr) Experience communicating across IC and leadership levels Curiosity across the full software/hardware stack Strong written and verbal technical communication skills Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law. Bloomberg is a disability-inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. 
If you would prefer to discuss this confidentially, please email
22/06/2026
Full time
Senior Site Reliability Engineer (Private Cloud)
Locations: Leeds, Manchester
Time type: Full time
Posted on: Posted Today
Time left to apply: End Date: July 3, 2026 (14 days left to apply)
Job requisition id: 157451
End Date Thursday 02 July 2026 Salary Range £72,702 - £80,780 We support flexible working. Flexible Working Options: Hybrid Working, Job Share Job Description JOB TITLE: Senior Site Reliability Engineer (Private Cloud) SALARY: £72,702 - £80,780 LOCATION(S): HOURS: Full-time - 35 hours per week WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at our Bristol office sites. Colleagues with disabilities can be supported with workplace adjustments including hybrid working expectations in line with our Flexibility Works policy. About this opportunity Our Private Cloud SRE (Site Reliability Engineering) team is looking for a passionate and experienced engineer to help run and evolve one of the Group's most critical platforms. As a Private Cloud SRE, you'll be a key contributor to the stability, performance, and scalability of services that support the Bank's digital transformation and long-term technology vision. You'll work hands-on with container platforms, VMware infrastructure, and observability tooling to ensure our services are resilient and efficient. You'll lead and participate in post-mortems, drive automation, and continuously improve the platform through engineering-led solutions. This role also involves working in Agile environments, collaborating across multiple teams and disciplines to deliver high-quality outcomes at pace. What you'll be doing Support and enhance a wide range of platform technologies, including VMware infrastructure, container platforms and orchestration (e.g., Kubernetes, OpenShift), databases, and applications.
Use Infrastructure as Code to manage environments and support CI/CD pipelines. Improve observability using tools such as Dynatrace, ensuring proactive monitoring and alerting. Lead and contribute to post-mortems to identify and implement long-term fixes. Troubleshoot complex issues across the platform stack, including infrastructure, networking, storage, databases, and applications. Work in Agile teams, collaborating with engineers, architects, and product owners across the organisation. Identify and implement automation opportunities to reduce manual effort and improve operational efficiency. Why join us? We're on an exciting journey to transform our Group and the way we're shaping finance for good. We're focusing on the future, investing in our technologies, workplaces, and colleagues to make our Group a great place for everyone. Including you. What we're looking for? At least 5 years' experience of DevOps principles, including Infrastructure as Code and CI/CD. 5+ years of experience with container platforms and orchestration (e.g., Docker, Kubernetes, OpenShift). Hands-on experience with VMware technologies in a production environment. Familiarity with observability platforms, such as Dynatrace. Proven ability to solve problems across a broad range of platform technologies. Experience with either Linux or Windows operating systems. An attitude focused on continuous improvement and reducing manual steps through automation. And any experience of these would be great: Experience with automation tools and APIs for infrastructure management. Exposure to configuration management tools (e.g., Ansible, Puppet). Leadership or mentoring experience in technical teams. Certifications in VMware or any major cloud provider (e.g., Azure, GCP, AWS). Background in system administration or software engineering with a strong interest in learning cloud-native practices. We know that great talent comes from many backgrounds.
Whilst this job advert may reference specific years of experience, we recognise that skills are developed in many ways, so if you have relevant, transferable experience, we encourage you to apply. This is a place for you Our ambition is to be the leading UK business for diversity, equity and inclusion supporting our customers, colleagues and communities and we're committed to creating an environment in which everyone can thrive, learn and develop. We also offer a wide-ranging benefits package, which includes: A generous pension contribution of up to 15% An annual performance-related bonus Share schemes including free shares Benefits you can adapt to your lifestyle, such as discounted shopping 30 days' holiday, with bank holidays on top A range of wellbeing initiatives and generous parental leave policies Ready for a career where you'll learn and thrive? Apply today and find out more. At Lloyds Banking Group, we're driven by a clear purpose; to help Britain prosper. Across the Group, our colleagues are focused on making a difference to customers, businesses and communities. With us you'll have a key role to play in shaping the financial services of the future, whilst the scale and reach of our Group means you'll have many opportunities to learn, grow and develop. We keep your data safe. So, we'll only ever ask you to provide confidential or sensitive information once you have formally been invited along to an interview or accepted a verbal offer to join us which is when we run our background checks. We'll always explain what we need and why, with any request coming from a trusted Lloyds Banking Group person. We're focused on creating a values-led culture and are committed to building a workforce which reflects the diversity of the customers and communities we serve. Together we're building a truly inclusive workplace where all of our colleagues have the opportunity to make a real difference.
21/06/2026
Full time
Music is Universal It's the passionate and dedicated team at Universal Music who help make us the world's leading music company. From A&R to finance, legal to digital, sales to marketing, Universal Music is the place to grow and develop your career within a truly commercial and innovative business that leads in everything it does. Everyone is welcome to apply for our roles, and we are determined to ensure that no applicant or employee receives less favourable treatment because of gender, race, disability, sexual orientation, religion, belief, age, marital status, background, pregnancy, or caring responsibilities. We also recognise the importance of diversity of thought within our teams and are fully committed to embracing the talents of people with autism, dyslexia, ADHD, and other forms of neurocognitive variation. We will always seek to make appropriate adjustments to recruitment, workplaces, and work processes to be fully inclusive to people with different needs and working styles. If you need us to make any reasonable adjustments for you from application onwards, including alternatives to the online form or to disclose a neurocognitive condition, please email . Job Summary: We are UMG, the Universal Music Group. We are the world's leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world. As a Senior Observability Engineer, you will be a driving force for technical excellence and strategic vision within our global team.
You will be instrumental in architecting, building, and leading our comprehensive observability strategy to ensure the reliability, performance, and scalability of our critical IT systems. This senior role demands a passion for data-driven strategy, a commitment to automation, and the ability to mentor and lead. You will not only solve complex technical challenges but also influence the direction of observability practices across UMG globally, ensuring our technology landscape is as world-class as our music. Job Functions: Architecture & Strategy: Lead the architectural design and strategic roadmap for our observability stack. Drive the vision for world-class monitoring, logging, tracing, and alerting solutions across our hybrid and cloud-native environments. Innovate & Automate: Spearhead the evaluation, selection, and implementation of cutting-edge observability tools and platforms (e.g., Dynatrace, OpenTelemetry, Prometheus, Grafana). Architect and build robust, automated observability pipelines. Take an active part in documenting and defining processes and best practice. Optimize & Analyze: Conduct deep-dive analysis of telemetry data to proactively identify performance bottlenecks, optimize resource utilization, and guide capacity planning. Lead & Mentor: Act as a technical leader and mentor for the observability team and wider engineering groups. Champion and enforce best practices, fostering a culture of proactive and data-informed decision-making. Drive Incident & Problem Management: Working with Operations teams on high-priority incident resolution efforts, utilizing deep analysis of telemetry data for swift root cause identification. Drive post-incident reviews and implement long-term solutions to enhance system resilience. Collaborate & Influence: Partner with Development, SRE, and Infrastructure leaders to embed observability into the entire technology lifecycle. 
Influence and drive the adoption of observability best practices across the global organization. Champion the use of observability in the global UMG environment. Make UMG the place to be: Mentoring, managing and genuinely leading the Observability team in a way that attracts and retains the best talent. UMG is a place where everyone can bring themselves fully to work and thrive; as a Leader, you are a key part of this. Job Requirements: Essential Qualifications Experience: 5-7+ years of hands-on experience in an Observability, Site Reliability Engineering (SRE), or DevOps role, with a proven track record of leading complex projects. Technical Leadership: Demonstrated experience in architecting and designing large-scale monitoring and observability solutions. Expert-Level Tooling: Deep expertise with modern observability platforms (e.g., Dynatrace, AWS CloudWatch, Prometheus, Grafana, ELK Stack, Splunk, OpenTelemetry). Cloud & Infrastructure: Advanced knowledge of major cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible). Programming & Automation: Strong programming and scripting skills (e.g., Python, Go, Shell) with a focus on creating scalable automation and custom tooling. Problem-Solving: Exceptional analytical and strategic problem-solving skills, with the ability to lead through complex technical challenges. Data Analysis: Expertise in analysing and visualising telemetry data to turn it into meaningful information that drives action. Hands-on: Demonstrable hands-on engineering and coding experience, with the ability to deep-dive into existing and emerging technologies to identify opportunities and solutions. Containerization and Orchestration: Understanding of container technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes) to monitor and manage containerized applications.
Networking Knowledge: Understanding of networking principles and protocols to effectively monitor and troubleshoot network-related issues. Security Awareness: Awareness of security best practices and the ability to integrate security monitoring into observability processes. Communication & Influence: Excellent communication and interpersonal skills, capable of articulating a technical vision to diverse audiences and influencing senior stakeholders. Ability to collaborate with cross-functional teams, convey findings, and discuss improvements with developers and operations teams. Continuous Learning: Given the dynamic nature of technology, a commitment to continuous learning and staying updated on the latest trends in observability and monitoring. Self-motivated with a high degree of initiative and excellent follow-up skills, along with strong analytical and problem-solving skills. Travel may be required but is not part of the regular work schedule. Bachelor's degree in technology related field as well as 5+ years of relevant experience within the Observability field. Desired Qualifications Advanced Concepts: Proven experience with Chaos Engineering, AI-driven analytics, defining SLOs/SLIs, and advanced deployment strategies (Canary/Blue-Green). Software Engineering Foundation: Strong background in software engineering principles, database administration, and distributed systems architecture. Certifications: Relevant senior-level industry certifications (e.g., AWS Certified DevOps Engineer - Professional, Certified Kubernetes Administrator). Just So You Know The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change and the jobholder's specific responsibilities and activities will vary and develop. Therefore, the job description should be seen as indicative and not as a permanent, definitive, and exhaustive statement.
Job Category: Universal Music Group
19/06/2026
Full time