Sorry, that job is no longer available. Here are some results that may be similar to the job you were looking for.

231 jobs found

Current Search
software engineer platform sre
Solution Architect
Far Coder Gibraltar, Buckinghamshire
# Remote Solution Architect Job at Xapo Bank
Anywhere | 17 hours ago | Full Time | $25,000 - $40,000 USD
CI/CD, Python, AWS, GCP, architecture, cloud, security

"When applying, mention the word FarCoder to show you've read the job post completely. Employers can look for these words to identify genuine, thoughtful applicants and avoid spam."

Job Overview
Xapo Bank is hiring a remote candidate for Solution Architect. This is a full-time position. Work location: Anywhere. The role typically involves technologies such as CI/CD, Python, AWS, GCP, architecture and cloud.

Key Responsibilities
  • Own and facilitate the Architectural Advice Forum (AAF), driving decentralised decision-making that is captured as ADRs. The forum is yours and challenge is expected, but you are not the single approver of every architectural change; the goal is for decisions to be made as close as possible to the teams doing the work.
  • Personally build production-grade Proofs of Concept to de-risk new patterns, validate architectural ideas and demonstrate "what good looks like" ahead of broader adoption.
  • Partner with the CTO on Xapo's AI strategy: identify high-value opportunities, evaluate tooling, prototype solutions and help engineering teams adopt AI responsibly across the SDLC and our customer-facing products.
  • Own cross-cutting standardisation across backend, web and mobile engineering (patterns, paradigms, observability, security, testing and delivery) so that quality and velocity scale together.
  • Define and uphold the engineering quality bar across backend, web and mobile, covering testing, code review, observability, security, performance and production-readiness, and lead by example through your own POCs and code reviews.
  • Embed Security-First Architecture across everything we build. For a regulated crypto bank, security is not a checkbox or a separate workstream; it is the DNA of our products. You will set the threat-modelling, secure-by-default and defence-in-depth patterns that every team builds against, and partner with our Security function to evolve them as our risk surface grows.
  • Champion Domain-Driven Design and Event-Driven Architecture as foundations for how we model our domain and evolve our systems.
  • Work closely with the SRE team on reliability, capacity, incident response, observability and the evolution of our internal developer platform.
  • Coach and mentor engineers across SATs, platform and enabling teams, leading through inspiration and influence rather than authority, and helping the whole organisation raise its architectural bar.
  • Surface and unblock systemic technical risks, technical debt and cross-team dependencies before they hit production.
  • Maintain Xapo's engineering principles and Tech Radar, and contribute to the architecture-relevant aspects of regulatory, security and risk conversations.
  • Build a great place to work for talented and motivated people, and help develop innovative solutions with Bitcoin at their core.

Required Skills
  • Primary skills: CI/CD, Python, AWS, GCP
  • Secondary skills: architecture, cloud, security
Skills required for this role include CI/CD, Python, AWS and related tools for day-to-day development.

Job Details
  • Employment Type: Full Time
  • Location: Anywhere
  • Salary: $25,000 - $40,000 USD
  • Tech Stack: CI/CD, Python, AWS, GCP, architecture, cloud, security

Role details
Work from anywhere, impact everywhere
Diversity is at the heart of who we are at Xapo Bank. We're a fully distributed team of over 160 Xapiens who work remotely from 50+ countries around the world.

Our beginning: a world that enjoys economic freedom and wealth protection, no matter where you live or who is running your country. To achieve that, we search the world for the best people for the job. We work hard, think globally, and inspire each other to learn and grow. We are committed to changing the way things are done.

Although we are headquartered in Gibraltar, this is a full-time, 100% remote position. Work from anywhere!

Position overview
We're looking for a Solution Architect to join our engineering function, reporting directly to the CTO. At Xapo, we are building truly cross-functional teams with full ownership of design, architecture, building, testing, delivery, data and operations, structured as Stream Aligned Teams (SATs), Platform teams and Enabling teams as per Team Topologies. You will collaborate closely with engineering leaders and the SRE function, as well as with fellow members of the product, apps and design teams.

This is a deeply hands-on role. You will not sit in an ivory tower drawing boxes and arrows; you will write code, build production-grade Proofs of Concept and demonstrate what good looks like by example. You will own the Architectural Advice Forum (AAF), facilitating decentralised architectural decisions captured as ADRs without becoming a bottleneck or single point of approval.

You will be the CTO's partner on AI strategy, drive standardisation across backend, web and mobile engineering, and work closely with our SRE team on reliability, observability and platform concerns.

This is an opportunity to shape how a regulated crypto bank designs, builds and operates software, and to have a real impact on what the future of finance looks like.

Skills needed
  • 10+ years of software engineering experience, including 3+ years in an Architect, Staff or equivalent senior-IC role.
  • A strong track record as a hands-on engineer-turned-architect: you still read, write and review production code regularly, and you enjoy doing so. The role is language-agnostic; we care how you think, not which language you write, though familiarity with our primary stack is a plus.
  • Deep experience designing, building and operating distributed, event-driven systems in production, ideally at scale and under regulatory scrutiny.
  • Hands-on experience with Domain-Driven Design, both strategic (bounded contexts, context mapping) and tactical patterns.
  • Practical familiarity with decentralised architectural governance (Architectural Advice Forums, RFCs and ADRs), and a clear point of view on why architectural decisions should be made close to the teams doing the work.
  • Demonstrated ability to lead by influence rather than authority: you have inspired engineers to adopt good practices because they wanted to, not because they were told to.
  • Experience working in regulated fintech, payments or banking environments, with an appreciation for the constraints that come with serving customers' money.
  • Deep experience designing security-first architectures (threat modelling, secure-by-default patterns, defence in depth, identity, key and secret management) in environments where a breach has material business and regulatory consequences.
  • A strong perspective on modern AI/ML, LLMs, agents and applied AI in production, and on how to integrate them responsibly into engineering workflows and customer-facing products.
  • Excellent written and verbal communication: you can write a crisp ADR, run an effective architecture review, present to engineers and explain trade-offs to non-technical stakeholders.
  • Solid understanding of cloud-native architecture, microservices, container-based 12-factor apps and patterns for fault tolerance, security and resilience.
  • Strong CI/CD, automated testing (unit, service and end-to-end) and overall SDLC practices.

Nice to have
  • Hands-on experience with crypto, Web3, custody or blockchain-based systems.
  • Familiarity with Team Topologies and how Stream Aligned, Platform and Enabling teams interact in practice.
  • Multi-platform standardisation experience spanning backend, web and mobile.
  • A background contributing to SRE or platform engineering work (SLOs, error budgets, developer experience).
  • Open-source contributions or external thought leadership (talks, articles, OSS projects).
  • Familiarity with Python: a meaningful portion of our stack is in Python, so it helps you hit the ground running, but the role is language-agnostic.

Other requirements
  • AWS as our cloud platform
  • GCP for data warehousing
  • Containerised microservices
  • A multi-language stack
  • Based in or near a CET-compatible timezone
  • A dedicated workspace
  • A reliable internet connection with the fastest speed available in your area
  • Alignment with our values and the Xapo Values-Driven Leadership principles

Why work for Xapo?
Impact globally, work remotely.
  • Shape the future: improve lives through cutting-edge technology and work 100% remotely from anywhere in the world.
  • Great work-life balance: build amazing things with a balance of autonomy and collaborative teamwork. Set your own work schedule and make use of a flexible PTO plan when you need to recharge.
  • Expect excellence: collaborate, learn and grow with a high-performance team. Learn how you learn best; from books to conferences, you'll get a yearly budget for your individual learning and development goals.

At Xapo, we prioritize consumer protection and adhere to regulatory requirements by ensuring that all Xapiens are accountable for upholding principles of fair treatment, transparency and ethical conduct in their interactions with customers and stakeholders.
27/06/2026
Full time
Site Reliability Engineer III
CMETS CME Technology and Support Services Ltd. City, Belfast
Site Reliability Engineer III (Tue - Sat)
CME Group is seeking a Site Reliability Engineer III to take a key role in building, operating, and scaling systems in our Markets portfolio. As an SRE III, you will apply your experience to the complex challenges of the CME Globex trading platform, where our systems deliver an exceptional combination of low-latency performance and rock-solid reliability. You will work with senior engineers on complex projects, take ownership of key reliability initiatives, and mentor junior colleagues, helping to shape the team's technical direction.

Key Responsibilities
  • Own Observability: design, build, and refine monitoring, alerting, and observability solutions; drive continuous improvement of SLIs and SLOs to enable faster issue detection and resolution.
  • Drive Reliability Projects: take ownership of reliability-focused projects from design to implementation, collaborating with product teams to ensure new features are scalable, resilient, and safe.
  • Lead Technical Solutions: lead technical discussions for your work, presenting solution options and proposals with clear trade-offs.
  • Automate Intelligently: proactively identify and eliminate toil through robust automation, improving both system reliability and team velocity.
  • Manage Incidents: lead incident response, own the resolution of significant incidents, ensure rapid system recovery, and drive meaningful action from blameless post-mortems.
  • Mentor & Coach: act as a technical mentor and point of escalation for L1 and L2 SREs, fostering their growth through code reviews and paired work.
  • Architect for the Future: contribute ideas to the product backlog and play an active role in the architectural design for the migration to Google Cloud Platform.

What We're Looking For
  • 3-5+ years of professional experience in a Site Reliability, DevOps, Software, or Systems Engineering role.
  • Strong, hands-on experience administering and troubleshooting Linux-based production systems.
  • Proficient programming skills in a language such as Python or Go, with a track record of automating complex operational tasks.
  • Proven ability to lead technical initiatives and solve complex problems with a high degree of autonomy.
  • Excellent communication skills, with the ability to articulate complex technical concepts to diverse audiences.
  • A proactive, ownership-oriented mindset.

Desirable Skills
  • Cloud Platforms: deep experience with Google Cloud Platform, especially GCE, GKE, and cloud networking.
  • Monitoring Tools: expertise in designing and managing monitoring stacks such as Prometheus, Grafana, and OpenTelemetry.
  • Distributed Systems: strong practical knowledge of building and maintaining large-scale distributed systems.
  • Containerisation: advanced experience with Kubernetes and Docker in a production environment.
  • Networking: solid understanding of networking protocols (HTTP, TCP/UDP, IP) and network architecture.
  • Domain Knowledge: experience in financial markets, low-latency systems, or message-oriented middleware.

Company Benefits
  • Bonus Programme
  • Generous shift allowance
  • Equity Programme
  • Employee Stock Purchase Plan (ESPP)
  • Private Medical and Dental coverage
  • Mental Health Benefit Programme
  • Group Pension Plan
  • Income Protection
  • Life Assurance
  • Cycle to Work
  • EV Car Benefit Scheme
  • Gym Membership
  • Family Leave
  • Education Assistance (MBA, advanced degree, bachelor's degree)
  • Ongoing Employee Development
  • Hybrid Working

Equal Opportunity Employer
As an equal opportunity employer, we consider all potential employees without regard to any protected characteristic.
27/06/2026
Full time
Lead Infrastructure Engineer
96 Morgan Stanley UK Ltd
We're seeking a Lead Infrastructure Engineer to design, build and maintain robust, resilient and scalable infrastructure automation systems.

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities. This is a Lead Cloud and Infrastructure Engineering position at Vice President level, part of the job family responsible for managing and optimizing technical infrastructure and ensuring the seamless operation of IT systems to support business needs effectively.

What you'll do in the role:
  • Design, implement and support new infrastructure solutions that enable our algorithmic trading business to operate in an efficient, effective and compliant manner.
  • Collaborate closely with business-facing development groups, system administrators and enterprise infrastructure engineers to implement and maintain appropriate solutions that reduce costs and manage technology risk for our business.
  • Act as a subject matter expert for our Linux-based electronic trading infrastructure platform.
  • Mentor, coach and lead less experienced engineers.
  • Provide technical leadership and vision for a small business-aligned team of engineers and SREs.
  • Drive the continuous improvement of our infrastructure and working practices through regular reviews, blameless post-mortems and other SRE techniques.

What you'll bring to the role:
  • Advanced knowledge of the Python programming language and standard software engineering concepts such as common data structures, regular expressions, object-oriented programming and advanced algorithms.
  • Strong understanding of core Unix components: networking stack, daemon configuration, OS customisation.
  • Experience architecting large-scale, robust, resilient and scalable systems.
  • Domain expertise in infrastructure automation and integration.
  • The ability to describe algorithmic and architectural trade-offs in an algebraic or quantitative fashion.
  • Comfortable developing in a Linux and CLI-based environment.
  • Knowledge of standard Linux command-line debugging tools such as tcpdump and strace.
  • Familiarity with modern development tools and practices, including agentic workflows, Git, Jenkins, test-driven development and continuous integration.
  • Able to take full-lifecycle ownership of components within a project, from initial architecture design to ongoing support.
  • A proven track record of technical leadership, with the ability to work independently on both technical problems and customer interactions.
  • Familiarity with the Go programming language would also be useful.

Certified Persons Regulatory Requirements: if this role is deemed a Certified role, the role holder may be required to hold mandatory regulatory qualifications or the minimum qualifications to meet internal company benchmarks.

Flexible work statement: Morgan Stanley empowers employees to have greater freedom of choice through flexible working arrangements. Speak to our recruitment team to find out more.

Morgan Stanley is an equal opportunity employer committed to building and maintaining a workforce that is diverse in experience and background. Our recruiting efforts reflect our strong commitment to a culture of inclusion, where individuals are hired, developed, and advanced based on their skills and talents. Our workforce reflects a broad cross-section of the global communities in which we operate, bringing a variety of backgrounds, talents, perspectives, and experiences. For more information, please visit:
27/06/2026
Full time
ML/AI Platform Engineer
Monzo Cardiff, South Glamorgan
Machine Learning Platform Engineering We're on a mission to make money work for everyone, and our Machine Learning Platform team builds the systems that help teams across Monzo train, evaluate, deploy, and serve ML models and AI features safely and reliably. We work on backend services, Python libraries, model lifecycle tooling, evaluation workflows, and low-latency serving systems. Our users are internal ML engineers, scientists, and product teams building with ML and LLMs. The work matters because machine learning powers many important decisions and experiences at Monzo, from fraud checks and credit decisions to customer operations. Location & Compensation London, UK (remote within the UK available). Salary £85,000 - £110,000 plus incentive awards tied to performance. Benefits include relocation support, visa sponsorship, flexible working hours and a learning budget (see the full list of benefits). Responsibilities Develop backend services, platform APIs, and production systems using Go. Write Python libraries, workflows, and tooling used by our ML engineers and scientists. Implement feature platforms and data workflows with Chronon, Feast, and DataHub. Build model training pipelines and experiment tracking using Vertex AI and Comet. Maintain AI observability, evaluation, and tracing using Langfuse. Deploy and maintain real-time serving on AWS and batch compute on GCP, including BigQuery data warehousing. Qualifications Strong backend engineering background with experience in Go and Python. Experience with ML or AI platforms, including pipelines, feature stores, model serving, experiment tracking, or LLM tooling. Designed and operated distributed systems that handle scale, concurrency, and failure. Focus on developer experience and removing friction for internal teams. Comfortable with ambiguity and able to shape a platform as it grows. Experience with strongly typed languages and writing backend software. 
Curiosity about system behavior in production, including reliability, latency, quality, safety, and operational risk. This Might NOT Be the Right Fit If Your background is predominantly DevOps, SRE, or infrastructure operations. You are focused on data science or modelling rather than platform engineering. You have shipped AI product features but have not worked on the platform side (serving, evaluation, model lifecycle). Benefits Competitive salary £85,000 - £110,000 plus incentive awards. Relocation assistance to the UK and visa sponsorship. Flexible working hours and trust to work the hours that suit you. Annual learning budget of £1,000 for books, training courses, and conferences. Additional benefits available - see our full benefits list. Equal Opportunity Employer Diversity and inclusion are a priority for us. We are an equal opportunity employer and will consider all applicants without regard to age, ethnicity, religion, sex, sexual orientation, gender identity, family or parental status, national origin, veteran status, neurodiversity, or disability status.
27/06/2026
Full time
Lead Site Reliability Engineer
Mastercard Dunstable, Bedfordshire
Lead Site Reliability Engineer The Business Operations team is seeking a highly motivated and experienced Lead Site Reliability Engineer (SRE) to join our team. You will play a critical role in ensuring the reliability, scalability, and performance of our applications, supporting essential services that power Mastercard's global operations. As a thought leader in your field, you will bring technical expertise, a passion for automation, and the ability to mentor. Responsibilities Be a developing subject matter expert in the Site Reliability Engineering area, influencing stakeholders and applying advanced knowledge to drive achievement of area goals and initiatives by contributing to solution development and improvements for existing products, services, and/or processes. Implement and maintain high-availability system solutions, ensuring stability, performance, and operational continuity. Evaluate operational requirements to develop effective technical solutions within existing frameworks. Lead automation and scripting efforts to streamline operational processes and incident response workflows. Troubleshoot and resolve complex system issues, escalating as necessary to maintain system health and proactively address risks. Contribute to documentation, knowledge sharing, and best practices to improve team operational procedures. Conduct reviews and quality assurance activities to uphold organizational standards for system stability. Keep current with industry trends and emerging technologies relevant to system reliability and operational automation. Guide and mentor junior team members through on-the-job experiences, reviewing work and fostering a culture of continuous improvement to grow expertise around their discipline. 
Qualifications Observability - Ability to use scripting and tooling to implement observability solutions, enabling the collection, analysis, and visualization of metrics, logs, and traces to support incident detection, diagnosis, and continuous service improvement. Programming and Scripting - Ability to write and maintain code and scripts to automate tasks, build operational tools, and support monitoring, deployment, and incident response using languages such as Python, Go, Bash, or similar. Systems and Network Administration - Ability to configure, operate, and troubleshoot Linux/Unix systems and network components, applying knowledge of networking concepts, protocols, security, and system reliability. Cloud Computing and Infrastructure - Ability to design, deploy, and manage applications and infrastructure on cloud platforms (e.g., AWS, Azure, GCP), ensuring scalability, security, availability, and operational efficiency. Reliability and Scalability - Ability to design and operate systems for high availability, fault tolerance, and disaster recovery, while ensuring systems can scale to meet current and future demand. DevOps Practices - Ability to apply DevOps principles and practices, including CI/CD pipelines, containerization, and orchestration, to enable faster, more reliable software delivery and operations. Troubleshooting - Capability to systematically identify, diagnose, and resolve technical issues across systems, applications, and networks, using analytical methods and tools to restore functionality, minimize disruption, and ensure stable operations. Capacity Planning and Performance Optimization - Ability to monitor resource utilization, forecast future capacity needs, and optimize system performance to support growth, scalability, and efficient infrastructure usage. 
IT Service Management - Ability to apply IT service management principles to incident, problem, and change management, ensuring reliable service delivery, effective incident response, and continuous service improvement aligned to business needs. Proactive Monitoring and Improvement (SRE Applications) - The ability to use application reliability signals to anticipate issues, identify risks, and drive preventative improvements that enhance application performance and availability. Corporate Security Responsibility All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Each person working for, or on behalf of, Mastercard is responsible for information security and must: abide by Mastercard's security policies and practices; ensure the confidentiality and integrity of the information being accessed; report any suspected information security violation or breach; and complete all periodic mandatory security training in accordance with Mastercard's guidelines.
27/06/2026
Full time
Software Engineering Manager - SRE
Marks & Spencer Plc City Of Westminster, London
We're changing the way we do things, and putting industry-leading innovation at the heart of how we operate; we need a stellar engineering team to make it happen. You'll be joining one of the most iconic brands in the UK on its most exciting cycle yet. We're more integrated and product-led in our tech teams than ever before: learning, changing, and adapting constantly, with millions of people benefiting from your work every single day. You'll be joining the M&S Platform team as a Software Engineering Manager. Our mission is to streamline development at M&S for 1000+ engineers and 30+ applications - covering both our customer-facing and colleague-facing applications. You will manage the team building the SRE function. You will support the team and help direct the technical vision for how we will build reusable pipelines, tooling, build plugins and frameworks that our many apps will harness to boost the developer experience, while contributing code every step of the way. We want to make M&S one of the best places for software development, target the latest tooling and enable our engineers to focus on building the best user experience so that they don't get held up by common engineering and continuous integration problems. What You'll Do Be accountable for engineering excellence within your teams, from behaviours to operations, from technical direction to solution in production and from skills and growth to reputation. Cultivate self-management and accountability throughout the team through leadership, a clear sense of purpose and thoughtful talent management. Lead alignment with the overarching technical strategy and work with the wider Technology organisation to craft it. Act as a platform owner, applying accurate product thinking to what is being built so that it enables and empowers, as much as possible, the digital product(s) it supports through data and customer centricity, and drive the related partner management. 
Collaborate with the entire engineering leadership to make us think strategically, ensure maximum alignment and maintain a healthy ability to "think big" within their teams. Line manage supporting Staff and Senior Engineers, and drive recruitment and retention within the team. Act as custodian of OKRs within your hub. Support our engineering communities to drive bar raising and strategic alignment by creating the space and time within teams for the agenda of these communities to be progressed efficiently. When vendors are involved, the SEM is positioned as the key technical partner, owning the vendor's work as a vital part of their platform with the same level of reliability and satisfaction as the in-house capabilities they build and run. Who You Are Previous polyglot hands-on senior software engineer Excellent knowledge of SRE concepts and trade-offs Extensive background in software engineering with several years' experience in a variety of systems and technologies Experience building and leading teams of highly skilled, senior software engineers that deliver high-quality software. Excellent understanding of system design, software architecture, cloud, and software engineering standard methodologies. Promoter of DevOps: you build it, you run it. Strong understanding of testing strategies and reliability engineering Excellent people management, interpersonal, analytical, and problem-solving skills Ability to lead and line manage senior engineers and technical partners to a desired outcome, without prescribing it. Excellent communication skills, both written and spoken, and able to adapt to different audiences, including non-technical ones. A servant leadership mentality that is willing to take ownership of problems. 
Able to influence people at senior levels and from the highly technical to the non-technical Our Tech Stack SLOs, SLIs, error budgets, and reliability strategy Observability with metrics, logs, traces, and alerting Incident response, on-call operations, postmortems, and runbooks Cloud infrastructure, Kubernetes, containers, and distributed systems CI/CD, progressive delivery, rollback, and release safety Automation, toil reduction, and self-healing systems Capacity planning, performance engineering, and cost awareness Security, access control, compliance, and operational governance What's In It For You Working at M&S means being part of something bigger - helping to deliver quality, value and service to millions of customers every day. We're inclusive, fast-moving and always evolving, with a strong sense of purpose and a focus on doing the right thing. Here are just a few of the benefits that make working here even more rewarding: 20% colleague discount on all M&S products and many third-party brands for you and someone in your household, available once you've completed your probation Competitive holiday allowance with the option to buy more Discretionary bonus schemes linked to your performance and ours Strong pension and life assurance to help plan for the future Tailored induction and training to support your development from day one Exclusive perks and savings through our M&S Choices portal Market-leading family policies, including parental, adoption and neonatal leave 24/7 wellbeing support, including virtual GP access and mental health services One paid volunteer day a year to support a cause that matters to you Everyone's Welcome We are ambitious about the future of retail. We're disrupting, innovating and leading the industry into a more conscientious, inspiring digital era. We're transforming how we work together and offering our most exciting opportunities yet. 
Marks & Spencer strives to be an inclusive organisation, trusted and admired by our colleagues, customers and suppliers. Join us and make change happen. We are committed to building diverse and representative teams, where everyone can bring their whole selves to work and be at their best. We support each other and work together to win together. If you feel you'd benefit from any support or reasonable adjustments during any stage of the recruitment process, please don't hesitate to let us know when completing your application. This information will be picked up by our team, so we can try to put steps in place to help you be at your best through this process.
27/06/2026
Full time
Staff Site Reliability Engineer - Site Experience
Dangote Industries Limited
Reddit is a community of communities. It's built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the largest and most influential platforms on the internet. As Reddit continues to scale globally, reliability and performance are more critical than ever. The Site Experience SRE team sits at the intersection of infrastructure, product engineering, and user experience - ensuring that every interaction across web, mobile, APIs, feeds, media delivery, and real-time systems is fast, reliable, and resilient. We are looking for a Staff Site Reliability Engineer to lead reliability engineering initiatives for critical user-facing systems at internet scale. In this role, you will partner closely with product and infrastructure teams to improve availability, latency, scalability, and operational excellence across Reddit's most business-critical experiences. This is a highly technical leadership role for someone who thrives in large-scale distributed systems, enjoys solving complex reliability challenges, and can influence engineering culture across the organization. What you'll do: Lead Reliability Engineering for User Experience Drive reliability, scalability, and operational excellence for critical user-facing systems and services. Improve performance and resiliency across APIs, content delivery, feed generation, search, messaging, and real-time experiences. Architect for Scale Partner with product and infrastructure engineering teams to design systems that remain highly available and performant under massive global load. Guide architectural decisions around failover, redundancy, graceful degradation, traffic management, and capacity planning. 
Reduce Operational Risk Identify systemic risks and reliability bottlenecks across services, dependencies, deployments, and infrastructure. Build proactive mitigation strategies and drive engineering improvements that reduce incidents and improve service health. Drive Automation Eliminate repetitive operational work through automation and tooling. Build systems that improve deployment safety, incident response, remediation workflows, and reliability guardrails. Incident Management Lead complex incident response efforts across engineering teams. Drive blameless postmortems, identify root causes, and ensure sustainable long-term fixes are implemented. Influence Engineering Standards Define and champion best practices around reliability engineering, SLIs/SLOs, capacity management, release engineering, and operational maturity across the company. Mentor and Multiply Impact Provide technical leadership and mentorship to engineers across SRE and software engineering teams. Help shape reliability culture and raise the operational excellence bar across the organization. What We're Looking For 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems. Strong collaboration and communication skills with the ability to influence technical direction across teams. Strong experience supporting high-traffic, user-facing production environments. Deep understanding of one or more of: distributed systems, networking, Linux systems, cloud-native architectures. Experience designing highly available systems with strong operational and reliability practices. Strong programming skills in languages such as Go, Python, or similar. Strong understanding of observability systems including metrics, logging, tracing, and alerting. Experience improving reliability through SLOs, automation, incident management, and performance optimization. 
Demonstrated ability to troubleshoot complex issues across applications, infrastructure, networking, and services. Nice to Have Experience operating systems at internet-scale traffic volumes. Experience with Kubernetes, containers, cloud infrastructure, and modern deployment platforms. Familiarity with technologies such as Prometheus, Grafana, OpenTelemetry, Envoy, Kafka, ClickHouse, Cassandra, Redis, or similar distributed infrastructure technologies. Experience with CDN optimization, edge reliability, traffic engineering, or global infrastructure. Contributions to open source software or participation in technical communities. Experience leading large-scale incident response and operational transformation initiatives. Why Join Reddit? You'll help shape the reliability and performance of one of the internet's largest platforms, influencing experiences used by millions of people every day. This is an opportunity to solve deeply complex engineering problems at massive scale while helping define the future of reliability engineering for a modern consumer platform. Benefits Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support. Family Planning Support. Gender-Affirming Care. Mental Health & Coaching Benefits. Group Personal Pension Scheme with employer match. Private Medical and Dental Scheme. Income Replacement Programs. Bike to Work scheme. Flexible Vacation & Paid Volunteer Time Off. Generous Paid Parental Leave.
27/06/2026
Full time
Senior Platform Engineer
Wickes Watford, Hertfordshire
We are looking for a highly skilled Senior Platform Engineer to join our dynamic Platform Engineering & Technology team. In this role, you will be a key player in designing, building, and maintaining the foundational infrastructure that powers our internal platforms and enables software development teams to deliver applications more efficiently, securely, and reliably, all within an Agile and Kanban-driven environment. You will champion our 'platform as a product' philosophy by creating robust, scalable, and automated solutions. As a senior member of the team, you will provide technical leadership, mentor other engineers, and drive the adoption of best practices in infrastructure management and automation.

Wickes is a home improvement retailer offering a wide range of DIY and home improvement products, with a strong focus on serving both the local trade and the general public, including kitchen and bathroom installations. We are currently undergoing a technical transformation, replacing many of the key systems used to run and operate our business.

What you'll be doing

Infrastructure Automation
• Design, implement, and manage highly available, scalable, and secure cloud infrastructure on Amazon Web Services (AWS) using Infrastructure as Code (IaC) principles.
• Apply deep expertise in core AWS services such as EC2, ECS/EKS, Lambda, VPC, S3, RDS, DynamoDB, SQS, SNS, CloudWatch, CloudTrail, IAM, and KMS.
• Optimize AWS resource utilization and costs while maintaining performance and reliability.
• Implement and enforce security best practices across all AWS environments.
• Troubleshoot and resolve complex infrastructure and application issues in production and non-production environments.
• Contribute to architectural decisions and long-term cloud strategy.

CI/CD Pipeline Development & Management
• Architect, develop, and maintain robust, automated, and efficient CI/CD pipelines for various applications and services (e.g., Jenkins, GitLab CI/CD).
• Implement strategies for continuous integration, continuous delivery, and continuous deployment, including blue/green deployments, canary releases, and rollbacks.
• Integrate security scanning tools (SAST, DAST, SCA) and quality gates into CI/CD pipelines.
• Drive the adoption of best practices for version control (Git), branching strategies, and pull request workflows.
• Automate testing, build, deployment, and release processes to accelerate software delivery and improve reliability.

DevOps & Automation
• Champion a DevOps culture within the organization, promoting collaboration, shared responsibility, and continuous improvement.
• Develop and maintain automation scripts and tools (e.g., Python, Bash, Go) to streamline operational tasks and reduce manual effort.
• Implement robust monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK Stack, Datadog, CloudWatch) to ensure platform health and performance.

Collaboration & Mentorship
• Collaborate closely with development teams, product managers, and other stakeholders to understand requirements and deliver effective solutions.
• Provide technical guidance and mentorship to junior engineers, fostering their growth and development.
• Participate in code reviews, design discussions, and architectural decision-making processes.
• Stay up to date with emerging AWS services, cloud technologies, and industry best practices.

Security & Compliance
• Ensure that security and compliance are built into the product roadmap from the outset.
• Work with security teams to ensure any changes or new deployments meet data protection and compliance requirements.
• Add all security recommendations from our vendors and hosting providers to the backlog.

Governance & Continuous Improvement
• Define KPIs and success metrics to measure platform effectiveness.
• Create and maintain comprehensive documentation for our infrastructure, processes, and tools in Confluence.
• Continuously assess and refine existing products to support evolving business needs.

Does this sound like you?
• You'll have a Bachelor's degree in business, computer science, computer engineering, systems analysis, or a related field of study, or equivalent.
• You'll have significant experience (ideally 10+ years) in a DevOps, Site Reliability Engineering (SRE), or Platform Engineering role.
• We also expect you to have at least 7 years of hands-on experience designing, deploying, and managing complex infrastructure on AWS.
• You'll be able to demonstrate strong experience building and maintaining robust CI/CD pipelines from scratch and optimizing existing ones.
• Expert-level proficiency with AWS services and a deep understanding of cloud architecture principles.
• Extensive experience with Infrastructure as Code (IaC) tools, primarily Terraform.
• Strong expertise with Jenkins and GitLab CI/CD.
• Proficiency in at least one scripting language (e.g., Python, Bash, Go).
• Solid understanding of containerization technologies (Docker) and orchestration (Kubernetes/EKS/ECS).
• Experience with monitoring and logging tools (Datadog preferred).
• Strong understanding of networking concepts (TCP/IP, DNS, VPN, load balancing) and security best practices in the cloud.
• Familiarity with database technologies (relational and NoSQL).
• Experience with Git and collaborative development workflows.
• Familiarity with other Agile frameworks such as Scrum, in addition to Kanban.
• Poise and the ability to act calmly and competently in high-pressure, demanding situations.
• The ability to manage ambiguity and clarify requirements and objectives to ensure successful outcomes.
• A high degree of initiative, dependability, and ability to work with minimal supervision while being resilient to change.
• Motivation and drive to achieve long-term business outcomes.
• Ability to work effectively in a team environment and contribute to the overall engineering and company-wide technical direction.
• Excellent written and verbal communication and presentation skills, with the ability to articulate new ideas and concepts to technical and non-technical audiences.

What's in it for you
We'll also equip you with a benefits package that includes:
• Competitive package including an annual bonus
• 25 days holiday plus bank holidays
• Enhanced contributory pension and life assurance
• Flexible hybrid working (2-3 days in Watford)
• Save as you earn scheme
• Colleague discount
• Discount platform including savings and cash back at numerous retailers, savings on gym membership, and a cycle to work scheme
• Wellbeing strategy with Employee Assistance Programme, financial education & loans, and access to parental, menopause, and fertility support

We're a Disability Confident Employer and committed to building a diverse workforce that reflects the communities we serve. We welcome applications from disabled people and are committed to providing an accessible recruitment process and workplace for everyone. If you require any support or reasonable adjustments, please let us know.
27/06/2026
Full time
ML/AI Platform Engineer
Monzo
Machine Learning Platform Engineering
We're on a mission to make money work for everyone, and our Machine Learning Platform team builds the systems that help teams across Monzo train, evaluate, deploy, and serve ML models and AI features safely and reliably. We work on backend services, Python libraries, model lifecycle tooling, evaluation workflows, and low-latency serving systems. Our users are internal ML engineers, scientists, and product teams building with ML and LLMs. The work matters because machine learning powers many important decisions and experiences at Monzo, from fraud checks and credit decisions to customer operations.

Location & Compensation
London, UK (remote within the UK available). Salary £85,000 - £110,000 plus incentive awards tied to performance. Benefits include relocation support, visa sponsorship, flexible working hours, and a learning budget; see the full list of benefits.

Responsibilities
• Develop backend services, platform APIs, and production systems using Go.
• Write Python libraries, workflows, and tooling used by our ML engineers and scientists.
• Implement feature platforms and data workflows with Chronon, Feast, and DataHub.
• Build model training pipelines and experiment tracking using Vertex AI and Comet.
• Maintain AI observability, evaluation, and tracing using Langfuse.
• Deploy and maintain real-time serving on AWS and batch compute on GCP, including BigQuery data warehousing.

Qualifications
• Strong backend engineering background with experience in Go and Python.
• Experience with ML or AI platforms, including pipelines, feature stores, model serving, experiment tracking, or LLM tooling.
• Experience designing and operating distributed systems that handle scale, concurrency, and failure.
• A focus on developer experience and removing friction for internal teams.
• Comfort with ambiguity and the ability to shape a platform as it grows.
• Experience with strongly typed languages and writing backend software.
• Curiosity about system behavior in production, including reliability, latency, quality, safety, and operational risk.

This Might NOT Be the Right Fit If
• Your background is predominantly DevOps, SRE, or infrastructure operations.
• You are focused on data science or modelling rather than platform engineering.
• You have shipped AI product features but have not worked on the platform side (serving, evaluation, model lifecycle).

Benefits
• Competitive salary of £85,000 - £110,000 plus incentive awards.
• Relocation assistance to the UK and visa sponsorship.
• Flexible working hours and trust to work the hours that suit you.
• Annual learning budget of £1,000 for books, training courses, and conferences.
• Additional benefits available; see our full benefits list.

Equal Opportunity Employer
Diversity and inclusion are a priority for us. We are an equal opportunity employer and will consider all applicants without regard to age, ethnicity, religion, sex, sexual orientation, gender identity, family or parental status, national origin, veteran status, neurodiversity, or disability status.
27/06/2026
Full time
Platform Engineering Manager, SRE & Developer Experience
Marks & Spencer Plc City of Westminster, London
Marks & Spencer Plc is seeking a Software Engineering Manager to lead their engineering team. In this role, you will streamline development for over 1,000 engineers and manage the SRE function. Your mission will include enhancing the developer experience with reusable tooling and pipelines. The ideal candidate will have extensive software engineering experience and a proven track record of leading teams to deliver high-quality software. Join M&S as they innovate at the forefront of retail technology.
27/06/2026
Full time
Platform Engineer - Engine by Starling
Onyx-Conseil
Engine by Starling is on a mission to partner with leading banks worldwide to build rapid-growth businesses on our technology. Engine is Starling's SaaS business, the technology that powers Starling. Two years ago we split out as a separate business. Starling has seen exceptional growth and success thanks to our modern, in-house-built technology. The SaaS platform is now available to banks and financial institutions globally, enabling them to benefit from innovative digital features and efficient back-office processes.

Hybrid Working
We have a hybrid approach to working here at Engine. Our preference is that you're located within a commutable distance of one of our offices so we can interact and collaborate in person.

About Engineering at Engine by Starling
The Cross-Cutting Engineering team is the backbone of our innovation. We build and maintain the reliable, scalable, and maintainable infrastructure and tooling that powers our entire software delivery pipeline, from the first line of code to seamless production deployment and ongoing operations. We own the lifecycle of our features, tackling complex challenges with a first-principles approach and fostering a multidisciplinary environment where you're encouraged to explore and contribute across the platform.

Platform Engineer
As a Platform Engineer at Engine, you'll be at the forefront of building and scaling our cutting-edge cloud-native banking platform across multiple global cloud providers and regions. We are looking for engineers with a strong SRE mindset who embrace ownership of the entire software delivery pipeline and are passionate about building internal tooling that empowers our technology teams.

What you'll get to do
• Building and Scaling Cloud Infrastructure: design, build, and maintain our cloud infrastructure across multiple providers (including GCP) and regions, ensuring scalability, reliability, and security.
• Building on Google Cloud: contribute to the build-out and optimisation of our core "Engine" on Google Cloud Platform using Java and Kubernetes.
• Scaling our SaaS Release Tooling: enhance and improve our multi-tenant, multi-region SaaS release and continuous deployment systems using Java, Golang, and Terraform.
• Empowering Developers: develop and maintain internal tooling using Java and Golang to improve developer experience and on-call efficiency.
• Automating Compliance and Security: build automation solutions in Golang to enforce compliance and security controls across our platform.
• Driving Efficiency: optimise the performance and reliability of our cloud environment with a strong focus on cost-effectiveness.
• Embracing Automation: identify and implement automation opportunities to minimise manual processes across the platform lifecycle.
• Ensuring Security: implement and maintain robust security practices to protect our platform and customer data.
• Championing Best Practices: stay abreast of new technologies and industry changes, particularly in SRE practices and deployment automation, and share your knowledge with the team.
• Maintaining Compliance: contribute to ensuring our platform adheres to relevant industry standards such as ISO 27001, SOC 2, and PCI DSS.
• Collaborating and Learning: work closely with cross-functional teams, share your expertise, and contribute to our vibrant learning culture.
• Aiming for Greatness: strive for excellence in everything you do, maintaining a curious and inquisitive mindset.
• Documenting Solutions: design and document scalable internal tooling clearly and comprehensively.
• Taking Ownership: own features and improvements throughout their entire lifecycle.
• Participate in On-Call: the option to join our on-call rota (not mandatory!) to deal with interesting technical issues and gain deep insights into our platform's behaviour.

Your place within the team will depend on your individual strengths and interests.
27/06/2026
Full time
Linux Technical Support Engineer - Production Systems
Linuxcareers
Linux Technical Support Engineering Jobs
Technical support engineers (TSEs) provide expert assistance to customers of software and infrastructure products. At Linux-centric companies (from kernel-adjacent distributions to observability platforms), TSEs need genuine Linux expertise to diagnose complex production issues and guide customers to resolution. The role is customer-facing with deep technical content, and many TSEs progress into SRE, solutions architecture, or product engineering.
27/06/2026
Full time
Technical Support Engineering
Linuxcareers
Linux Technical Support Engineering Jobs
Technical support engineers (TSEs) provide expert assistance to customers of software and infrastructure products. At Linux-centric companies (from kernel-adjacent distributions to observability platforms), TSEs need genuine Linux expertise to diagnose complex production issues and guide customers to resolution. The role is customer-facing with deep technical content, and many TSEs progress into SRE, solutions architecture, or product engineering.
27/06/2026
Full time
Senior Software Engineer, Substrate
Palantir
A World-Changing Company
Palantir builds the world's leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.
The Role
Substrate is the team responsible for Palantir's core production infrastructure: hundreds of K8s clusters, from on-premises to the major cloud hyperscalers, whether internet-connected or air-gapped, with small hardware footprints or large. As a Senior Software Engineer on Substrate, you will design and build Palantir's managed Kubernetes product offerings across all these environments. You and your team will be responsible for bootstrapping and operating the entire fleet of K8s clusters with zero manual steps by building industry-leading tooling and contributing to core CNCF components. You will also be responsible for ensuring scale, stability and security across a matrix of compliance regimes and hosting infrastructure types. Your team culture emphasizes engineering rigor and operational excellence at scale. This means issues in production should be pre-empted and deeply root-caused, and investments in automation and self-healing systems are key. If you're excited about infrastructure at scale and working with Kubernetes, this is the right role for you.
Core Responsibilities
- Deliver a container runtime to challenging new environment types: new clouds, on-premises, edge devices
- Build automation and establish standards for operating K8s securely at scale with zero manual ops overhead
- Drive innovation through adoption of novel K8s features and CNCF tools, making upstream contributions as needed
- Design the next generation of Palantir's infrastructure through a deep understanding of internal systems and CNCF standards
What We Value
- Systems programming experience with strong proficiency in Go, C/C++ or equivalent
- Working knowledge of, or hands-on experience with, infrastructure automation tools such as Terraform, Ansible, Puppet or K8s operators, and competent coding in Go, Java, or equivalent for automation or scripting
- Deep familiarity with hardware and OS configurations, diagnostic tooling, and networking nuts and bolts
- Deep familiarity with containers (Docker) and orchestration (Kubernetes) at scale
- Experience working with a cloud provider (AWS/Azure/GCE), or sysadmin/SRE experience in data centers
- Experience designing, building, and operating high-scale observability or infrastructure systems
- Working knowledge of networking fundamentals; experience with CNIs or cloud networking infrastructure preferred
What We Require
- 4+ years of professional software development experience on core infrastructure with emphasis on operational excellence
- 2+ years of experience contributing to the system design or architecture (design patterns, reliability and scaling) of new and existing systems
- Bachelor's degree in Computer Science or equivalent
Life at Palantir
We want every Palantirian to achieve their best outcomes; that's why we celebrate individuals' strengths, skills, and interests, from your first interview to your long-term growth, rather than rely on traditional career ladders.
Paying attention to the needs of our community enables us to optimize our opportunities to grow and helps ensure many pathways to success at Palantir. Promoting health and well-being across all areas of Palantirians' lives is just one of the ways we're investing in our community. Learn more at Life at Palantir and note that our offerings may vary by region. Consistent with Palantir's values and culture, we believe employees are "better together" and in-person work affords the opportunity for more creative outcomes. Therefore, we encourage employees to work from our offices to foster connectivity and innovation. Many teams do offer hybrid options (WFH a day or two a week), allowing our employees to strike the right trade-off for their personal productivity. Based on business need, a few roles allow "Remote" work on an exceptional basis. If you are applying for one of these roles, you must work from the city and/or country in which you are employed. If the posting is specified as Onsite, you are required to work from an office. If you want to empower the world's most important institutions, you belong here. Palantir values excellence regardless of background. We are committed to making the application and hiring process accessible to everyone and will provide a reasonable accommodation for those living with a disability. If you need an accommodation for the application or hiring process, please reach out and let us know how we can help.
27/06/2026
Full time
Platform Engineer
SF Partners
Platform Engineer with key skills in Linux, AWS, Ansible and Kubernetes is sought by an AI & data software house based near Birmingham. Working within national infrastructure, this Platform Engineer will work with a close-knit technical team to monitor, manage and improve business-critical applications and infrastructure, with the aim of improving application deployment and scalability. This role would suit a mid-level (2-4 years' experience) Platform Engineer with a solid background in software engineering and infrastructure who can bring experience with the latest automation and configuration management tooling to create a fully automated deployment environment. In return, this Platform Engineer can expect a dynamic, engaging, R&D-driven culture with extensive progression opportunities and the chance to own the platform functionality of this high-growth business.
This Platform Engineer based near Birmingham should have most of the following key skills:
- IaC experience: Terraform, Ansible, Red Hat, etc.
- Both on-premises and cloud (AWS, Azure) exposure
- Strong Kubernetes experience
- Experience delivering new tooling for infrastructure
- Linux/GNU expertise would be a plus
- Database performance management (MongoDB, PostgreSQL, etc.)
- A background in software engineering would be extremely useful
- A personality that feels comfortable working in a fast-paced, dynamic environment
This Platform Engineer based near Birmingham will receive:
- Starting salary of up to £80,000
- Long-term remote working (one day a month on a Midlands-based site)
- Bi-annual salary reviews
- Clear progression pathway
- Generous private pension scheme
- 10% bonus scheme
- Private healthcare
- Training budget & time allocation
- 25 days' holiday plus bank holidays
- Choice of technology
So if you are a Platform Engineer and like the idea of joining a market-leading company that offers excellent project ownership within a collaborative, autonomous environment, please apply now to be considered.
Keywords: Platform Engineer, Birmingham, Linux, Java, Python, C++, MongoDB, SRE, automation, Node, infrastructure, Kubernetes, Ansible, AWS, Terraform
27/06/2026
Full time
Senior Software Engineer / Reliability Engineering - Real-time Data
BLOOMBERG L.P.
Senior Software Engineer / Reliability Engineering - Real-time Data
Location: London
Business Area: Engineering and CTO
Ref #:
Description & Requirements
Our department is responsible for efficiently distributing financial data, such as stock prices or foreign exchange rates, from its source to interested users all around the world. Data can either be served in response to a request or streamed in real time. The group owns:
- The distribution software and infrastructure
- A range of different sources of data
- Supporting services to administer and manage the system, including permissioning and metering
The team is also responsible for the Enterprise endpoint ("B-PIPE"), which allows end-users to programmatically consume data via our SDK. Data is also available through the Bloomberg Terminal and Microsoft Excel. The main challenge faced by the group is one of scale. Data is sourced from more than 370 global exchanges, with a combined volume in excess of 60 billion messages each day. We deliver this data to hundreds of thousands of terminals and thousands of B-PIPEs. Handling this volume requires significant infrastructure: we manage multiple clusters in our main data centres, as well as a network of many thousands of servers around the world.
Group Overview
The RD Reliability Engineering group comprises three sub-teams located in Tokyo, London, and New York, providing follow-the-sun support. Our mission is to ensure systems are reliable, scalable, and observable through software engineering, while continuously improving how systems behave under load and failure conditions. We work in an outcome-driven model, focusing on measurable improvements in availability, latency, capacity, and recovery. Our goal is to ensure systems meet defined service level objectives while minimising manual operational effort through automation and software solutions.
The systems we support must behave predictably under extreme load, recover quickly from failures, and continue to evolve without compromising stability; these are the core challenges we solve.
London Team Focus - Availability & Resiliency
The London team plays a key role in ensuring the availability and resiliency of RD infrastructure globally. We focus on:
- Detecting and preventing failures across large-scale distributed systems
- Ensuring infrastructure demonstrates sufficient capacity and failover capability during site-loss scenarios
- Reducing time to detect, diagnose, and recover from incidents
- Ensuring systems behave predictably under both normal and adverse conditions
This role provides the opportunity to influence how reliability is engineered across the platform, working closely with teams globally to improve system behaviour and design.
What You'll Do
- Build and maintain production-grade software supporting Bloomberg's global distribution infrastructure
- Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation
- Analyse system behaviour under real-world and failure scenarios to validate that capacity, failover, and recovery meet resilience objectives
- Identify bottlenecks, scaling limits, and reliability risks across distributed systems
- Improve detection, diagnosis, and prevention of production issues
- Build tools and frameworks to increase system visibility and reduce time to detect and resolve incidents
- Automate operational workflows to reduce manual effort and improve system reliability
- Partner with application and infrastructure teams to improve system design, resilience, and performance
- Contribute to design discussions, incident reviews, and reliability improvements across the platform
Systems You'll Work With
- Configuration systems serving thousands of servers across the global network
- Service discovery and clustering systems for distributed infrastructure
- Monitoring and observability frameworks for large-scale server estates
- Tooling for diagnosing data quality and distribution issues
Ownership of systems may evolve over time as the team focuses on areas of highest impact.
What Success Looks Like
- Systems consistently meet defined reliability, latency, and capacity objectives
- Issues are detected and mitigated before significant customer impact
- Systems are demonstrably resilient, with proven failover capability and sufficient capacity under failure conditions
- Operational processes are automated and scalable
- Reliability is achieved through engineering improvements rather than manual intervention
What We're Looking For
We're not a traditional SRE team. We engineer reliability through software, building solutions that automate operations and improve system resilience by design.
- Experience with an object-oriented programming language (preferably Python or C++)
- Strong focus on building reliable, observable distributed systems
- Experience working with SLOs, SLIs, and production reliability metrics
- Proven ability to triage and resolve live production problems
- A mindset focused on automation and reducing operational toil
- Strength in collaborating within an inclusive team environment
- The ability to work across departments and build strong relationships with both technical and non-technical partners
Why Join Us
You'll work on systems that sit at the core of Bloomberg's real-time data platform, operating at global scale and under demanding performance and reliability requirements. This is an opportunity to:
- Solve complex distributed systems problems with real-world impact
- Influence how reliability is engineered across a critical platform
- Work with teams across multiple regions and technical domains
- Build systems that are resilient by design and operate at massive scale
If indicated, please note that years of experience are a guide; we will consider applications from all candidates who can demonstrate the skills necessary for the role.
Discover what makes Bloomberg unique - watch our for an inside look at our culture, values, and the people behind our success. Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law. Bloomberg is a disability-inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email
26/06/2026
Full time
Support Engineer
Ordnance Survey Southampton, Hampshire
Support Engineer
Full time
Salary: £43,918.00 - £51,238.00 (dependent on experience)
Southampton, Hybrid Working
Ordnance Survey (OS) is the national mapping agency for Great Britain, and a world-leading geospatial data and technology organisation. As a reliable partner to government, business and citizens across Britain and the world, OS helps its customers in virtually all sectors improve quality of life. OS expertise and data support efficient public services and infrastructure, new technologies in transport and communications, national security and emergency services, and exploring the great outdoors. Having been at the forefront of geospatial capability for more than 230 years, we've built a reputation as the world's most inspiring and trusted geospatial partner.
About the role
You'll join one of our Technology Support Teams, which supports and maintains a range of internal ETL services that produce data for our customer delivery platforms. As a Support Engineer, you will help strengthen the resilience, reliability and quality of OS's ETL services by:
- Maintaining, managing and supporting software and infrastructure that underpin key business activities
- Designing and implementing improvements to service performance, including automating deployments, right-sizing systems, and extending monitoring and alerting capabilities
- Safeguarding critical services by continually assessing and improving observability, resilience and security
- Investigating and resolving root-cause issues: identifying why failures occur, and working with subject matter experts when necessary to fully resolve problems
- Applying DevSecOps and SRE principles to improve both services and team capability
- Proactively delivering service improvements, ensuring they align with development cycles and Agile workflows
You'll work closely with Engineers and wider stakeholders every day, collaborating within an Agile environment.
To thrive in this role, you'll bring: Strong software development and operational knowledge Experience supporting scalable cloud-based services, ideally in Azure Effective communication and teamwork skills, enabling smooth collaboration across multiple teams What we're looking for You will need to demonstrate your track record against the following essential criteria: Experience in coding and engineering practices, for example, creating infrastructure-as-code, software engineering, DevOps methodologies or test automation Genuine passion for continually improving the reliability and stability of operational services Supporting and maintaining complex, cloud-based services working within an Agile framework Solid understanding of Site Reliability Engineering (SRE) and software engineering best practices Cloud technologies and best practice - ideally in Azure Infrastructure-as-Code - ideally using Bicep A track record of continually identifying and implementing service improvements or observability Experience of coaching and mentoring other team members and providing consultancy to other teams Additionally, you will provide expert technical consultancy to enable the business to successfully use IT systems, supporting Project Teams by advising on the performance, configuration and functionality of existing systems. You will ensure that new or updated systems can be effectively supported from day one. You will also play a key role in developing the technical capability of the team - mentoring, teaching and strengthening shared knowledge, skills and engineering practice. Here is a snapshot of the technologies that we use Azure Cloud (AppServices, Function Apps, DataFactory, Batch, EntraID, AzureML, etc.) Azure DevOps (Pipelines, Bicep templates, CLI, PowerShell, Artefacts) Python/C# Azure Databricks Git, YAML PowerBI ESRI ArcGIS, FME, QGIS, etc. 
Key details Closing date: Thursday 2nd July at 23:59 Interview Information: Candidates are required to complete a SOVA personality assessment and a practical exercise in advance of the interview. These will form part of the interview discussion.
26/06/2026
Full time
Senior Lead Software Engineering - AI/ML Engineer
JPMorgan Chase & Co.
Join us to shape the future of AI/ML data platforms, where your expertise will help create resilient and market-leading solutions. You will have the opportunity to collaborate with innovators across our global network, driving strategic change and mentoring others. We value your skills in solving complex challenges and fostering a culture of reliability and growth. At JPMorganChase, your impact will reach far beyond your team, opening doors to career advancement and meaningful relationships.

As a Site Reliability Engineer in the AI/ML Data Platforms team, you will play a key role in building scalable and resilient data solutions. You will engage in root cause analysis, production changes, and operational improvements, while supporting budgetary and staffing decisions. You will mentor team members and partner with colleagues across the organization to drive strategic change. Your contributions will help shape a collaborative, innovative, and high-performing team culture.

Job Responsibilities
  • Demonstrate expertise in application development and support across technologies such as Databricks, Snowflake, AWS, and Kubernetes
  • Coordinate incident management coverage to ensure effective resolution of application issues
  • Collaborate with cross-functional teams to perform root cause analysis and implement production changes
  • Develop and support AI/ML solutions for troubleshooting and incident resolution
  • Mentor and guide team members to foster growth and drive strategic change
  • Build and maintain scalable, resilient, and market-leading data solutions
  • Support budgetary and staffing considerations to optimize team performance
  • Engage in operational stability and disaster recovery planning
  • Implement automation tools to reduce toil and improve efficiency
  • Ensure compliance with risk controls and company-wide standards
  • Build meaningful relationships across teams to achieve common goals

Required Qualifications, Capabilities, and Skills
  • Proficient in site reliability culture and principles, with experience implementing site reliability within applications or platforms
  • Skilled in running production incident calls and managing incident resolution
  • Experienced in observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Strong understanding of SLIs, SLOs, SLAs, and error budgets
  • Proficient in Python or PySpark for AI/ML modeling
  • Able to reduce toil by building automation tools for repeated tasks
  • Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery
  • Awareness of risk controls and compliance with departmental and company-wide standards
  • Collaborative team player with the ability to build meaningful relationships

Preferred Qualifications, Capabilities, and Skills
  • Experience in an SRE or production support role with AWS Cloud, Databricks, Snowflake, or similar technologies
  • AWS and Databricks certifications
  • Advanced knowledge of AI/ML troubleshooting and incident resolution
  • Familiarity with budgetary and staffing optimization
  • Experience mentoring and guiding team members
  • Strong communication and interpersonal skills
  • Demonstrated ability to drive strategic change across teams
26/06/2026
Full time
London Stock Exchange Group
Technical Product Manager - Digital Platforms
London Stock Exchange Group
Technical Product Manager - Digital Platforms
Locations: GBR-London-10 Paternoster Square
Time type: Full time
Posted on: Posted Today
Job requisition id: R

Role profile
We have an exciting opportunity for a Technical Product Manager to join our dynamic Digital team within the London Stock Exchange Group.

You will be joining Corporate Engineering, a team charged with building, releasing and continuously improving LSEG's Tier 1 corporate websites, turning designs and ideas into components that will exceed our customers' expectations. Corporate websites act as the public face of LSEG, and keeping them fresh, relevant, safe, performant and reliable is paramount to the firm.

We are a mature, Agile development team who implement and maintain our own systems. Our analysts, developers and testers work alongside each other and multi-functionally to deliver multiple high-quality, high-value, high-risk digital products and initiatives from conception to launch.

This position is ideal for a collaborative and creative Technical Product Manager who facilitates the design, development and maintenance of mid- to large-scale AEM and AWS implementations.

What you will do:
  • Deliver on Digital strategy and Capability projects, and become a domain expert and authority for the applications and projects you work on.
  • Conduct thorough business process analysis to understand existing workflows and systems; document current processes and systems, use the specification to create requirements for new processes, develop use cases, and lead requirement changes.
  • Lead the analysis, design, and implementation of AEM sites, and draft detailed business requirements, use cases and system interaction diagrams.
  • Work with key business and technology partners to define and conceptualise product strategy and requirements, and approve wireframes and mock-ups for solutions.
  • Relentlessly share a comprehensive view of the required functionality to provide context and meaning to the software we are delivering, and to ensure end-to-end precision.
  • Partner closely with Risk, Legal, Compliance, InfoSec, Architecture, and SRE to ensure platforms meet internal and external regulatory obligations.
  • Define and track product health metrics (availability, error rates, deployment risk, defect leakage, technical debt).
  • Create a solid foundation for SAFe Agile development by providing clear direction, meticulous understanding, and strong purpose as standard.
  • Maintain an appropriately prioritised backlog of development work, liaising with the development team and wider collaborators to ensure the expectations of all interested parties are correctly handled.
  • Lead requirement breakdown and estimation sessions with multiple development teams.
  • Own and drive quality assurance and software testing efforts, with a key focus on automated testing and DevOps enablement of the team.
  • Develop and review test plans, test cases and test reports to provide insight into the quality of in-development products and measure the effectiveness of current test strategies.
  • Propose and implement software testing strategies for digital transformation initiatives.
  • Own and drive software life cycle quality documentation.
  • Contribute to project discoveries, business cases and kickoffs, and prepare proposals and statements of work following company standards.

The type of person we would love to meet:
  • A Product Manager with a complete understanding of the Software Delivery Lifecycle and delivery methodologies such as Scrum, SAFe and Kanban.
  • A proven background in product management, delivering change to critical applications and/or websites that are regulatory in nature, and experience in high-pressure, dynamic environments.
  • Experience as a Product Manager or Product Owner in the digital domain, delivering large-scale websites and cloud projects.
  • Experience in leading people, mentoring Product Owners and Business Analysts, and improving delivery practices.
  • Experienced in managing senior stakeholders, managing conflicts, aligning priorities, and making clear, defensible decisions.
  • Authoritative knowledge and proven experience of cloud implementations. Knowledge of AWS is a plus.
  • Knowledge of Adobe Experience Manager (AEM) is a plus.
  • Proficient understanding of all website components and features, such as Digital Asset Management, workflows, site search, overlaying components for customisation, integrations and analytics.
  • Proficient understanding and working experience of creating and maintaining functional specifications for new website templates and components, including authoring widgets, custom widgets and workflow customisation/creation.
  • Exposure to working with content and authors.
  • Experience in integrating websites with backend systems and data sources.
  • Proficient understanding of cross-browser compatibility needs.
  • Experience with Test Driven Development.
  • Experience working with onshore and offshore teams.
  • Champions usage of the Atlassian suite (Jira, Confluence, Bitbucket) or Asana.
  • Proactive, assertive and pragmatic in a demanding and dynamic environment.
  • A servant leader who puts the team first.

Nice to have:
  • Ability to perform some development and maintenance tasks related to AEM platform code.
  • Experience in the installation and configuration of AEM, Groups and Permissions, Access Control Lists, replication agents, service packs and dispatcher configuration.
  • Experience in Java development, design, and coding (JavaScript, HTML, CSS, jQuery, React.js and web technologies).
  • Experience in fixing AEM environment issues.
  • UX/UI knowledge.

Career Stage: Manager

London Stock Exchange Group (LSEG) Information:
Join us and be part of a team that values innovation, quality, and continuous improvement. If you're ready to take your career to the next level and make a significant impact, we'd love to hear from you.

LSEG is a leading global financial markets infrastructure and data provider. Our purpose is driving financial stability, empowering economies and enabling customers to create sustainable growth.

Our purpose is the foundation on which our culture is built. Our values of Integrity, Partnership, Excellence and Change underpin our purpose and set the standard for everything we do, every day. They go to the heart of who we are and guide our decision making and everyday actions.

Working with us means that you will be part of a dynamic organisation of 25,000 people across 65 countries. However, we will value your individuality and enable you to bring your true self to work so you can help enrich our diverse workforce.

We are proud to be an equal opportunities employer. This means that we do not discriminate on the basis of anyone's race, religion, colour, national origin, gender, sexual orientation, gender identity, gender expression, age, marital status, veteran status, pregnancy or disability, or any other basis protected under applicable law. Conforming with applicable law, we can reasonably accommodate applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs.

You will be part of a collaborative and creative culture where we encourage new ideas. We are committed to sustainability across our global business and we are proud to partner with our customers to help them meet their sustainability objectives. Our charity, the LSEG Foundation, provides charitable grants to community groups that help people access economic opportunities and build a secure future with financial independence. Colleagues can get involved through fundraising and volunteering.

LSEG offers a range of tailored benefits and support, including healthcare, retirement planning, paid volunteering days and wellbeing initiatives.

Please take a moment to read this carefully, as it describes what personal information London Stock Exchange Group (LSEG) (we) may hold about
26/06/2026
Full time
Platform Engineer / DevOps Engineer / Site Reliability Engineer (SRE) / Cloud Engineer
hackajob Gloucester, Gloucestershire
Platform Engineer
Location: Gloucester
Type: Permanent, full time
Work arrangement: Hybrid (min 3 days a week on site)
Security clearance: Must be willing to obtain SC and eDV clearance.

Responsibilities
  • Deploy applications and software to cloud or on-prem environments for various business areas.
  • Build and set up development tools and infrastructure.
  • Understand project stakeholder needs.
  • Automate and improve development and release processes.
  • Ensure systems are safe and secure against cyber security threats.
  • Identify technical problems and develop software updates and solutions.
  • Collaborate with other engineers to ensure development follows established processes and works as intended.

Required experience
  • Experience working in an Agile/SCRUM/DevOps delivery model.
  • Proficiency with cloud technologies (AWS or Azure).
  • Experience with infrastructure-as-code tools (e.g., Terraform, Puppet, Chef, Ansible).
  • Experience building and deploying large-scale applications in Continuous Integration/Delivery pipelines.
  • Experience with container platforms and orchestration systems (ECS, AKS, Kubernetes, Helm, Docker).
  • Experience with automation and integration tools such as Jenkins, Concourse CI, or cloud equivalents.
  • Experience with scripting languages and source control.
26/06/2026
Full time


  • Home
  • Contact
  • About Us
  • FAQs
  • Terms & Conditions
  • Privacy
  • Employer
  • Post a Job
  • Search Resumes
  • Sign in
  • Job Seeker
  • Find Jobs
  • Create Resume
  • Sign in
  • IT blog
  • Facebook
  • Twitter
  • LinkedIn
  • Youtube
© 2008-2026 IT Job Board