Sorry, that job is no longer available. Here are some results that may be similar to the job you were looking for.

56 jobs found

Current search: staff infrastructure engineer cluster infrastructure
Nutanix Platform Engineer
Onyx-Conseil, Stoke-on-Trent, Staffordshire
Job Details
Job title: Nutanix Platform Engineer
Location: Stoke-on-Trent, permanent, onsite
Salary: £85,000 to £90,000 (DOE) + benefits

Key Responsibilities:
  • Build, configure, and deploy Nutanix HCI clusters, including compute, storage, and virtualisation components
  • Install, configure, and manage Nutanix AHV, AOS, Prism, and associated management tools
  • Deliver infrastructure builds in line with design documentation, architectural standards, and security policies
  • Perform system configuration, tuning, and optimisation to ensure platform performance and resilience
  • Support infrastructure projects, including new platform deployments, upgrades, expansions, and migrations
  • Produce and maintain build documentation, configuration standards, and operational runbooks
  • Collaborate with network, storage, cloud, and security teams to integrate Nutanix platforms into the wider environment
  • Assist with automation and repeatable builds using scripting or infrastructure-as-code approaches where applicable
  • Provide technical support and escalation for infrastructure-related incidents and issues
  • Contribute to continuous improvement by identifying opportunities to enhance build processes, tooling, and standardisation

Education and Experience Requirements
Essential:
  • Degree or equivalent experience in Information Technology, Computer Science, Engineering, or a related discipline
  • Hands-on experience building and configuring Nutanix HCI environments
  • Strong knowledge of virtualisation, compute, storage, and infrastructure platforms
  • Experience with cluster deployments, upgrades, and lifecycle management
  • Solid understanding of infrastructure security, resilience, and availability principles
  • Ability to work from technical designs and implement them accurately
  • Strong documentation and communication skills
Desirable:
  • Experience with automation tools (e.g. PowerShell, Ansible, Terraform, Calm)
  • Exposure to on-prem/private cloud environments
  • Knowledge of enterprise networking and security concepts
  • Nutanix certifications (NCA, NCP, NCM or equivalent)

We are committed to fostering an inclusive, equitable and accessible workplace where everyone feels valued and supported. We welcome applications from all individuals, regardless of background or identity, and we encourage candidates who may not meet every listed requirement to apply. If you require any adjustments or support during the recruitment process, please let us know and we will work with you to ensure a fair and accessible experience.
25/06/2026
Full time
Nutanix Platform Engineer - HCI, Automation & Resilience
Onyx-Conseil, Stoke-on-Trent, Staffordshire
Onyx-Conseil is looking for a Nutanix Platform Engineer to work onsite in Stoke-on-Trent. The role involves building, configuring, and deploying Nutanix HCI clusters, managing AHV and associated tools, and supporting infrastructure projects. The ideal candidate will have a degree in a relevant field (or equivalent experience), strong knowledge of virtualisation and infrastructure platforms, and hands-on experience with Nutanix. The salary ranges from £85,000 to £90,000, plus benefits.
25/06/2026
Full time
Senior Cloud Platform Engineer - AI Infra & Automation
Cerebras, Bristol, Gloucestershire
About Graphcore
At Graphcore, we're building the future of AI compute. We're a team of semiconductor, software and AI experts with deep experience in creating the complete AI compute stack, from silicon and software to infrastructure at datacentre scale. As part of the SoftBank Group, and backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.

Job Summary
We are looking for a Senior Staff Engineer to join our Cloud Platform Team and help develop and deploy cloud services. Working closely with our colleagues in the Software Platform, Datacentre Operations and Product Development teams, you will deploy services on our fleet of cutting-edge AI systems. As part of our Software Platform organisation, you will be involved in the cloud integration, validation, performance benchmarking, optimisation and development of our high-performance AI solutions, including in-house AI systems and off-the-shelf high-performance servers, switches and storage solutions.

This is a hands-on technical role requiring a solid background in cloud infrastructure, deployment using Infrastructure as Code, observability, and high-performance networking and storage systems. You may have worked in an IT organisation, a datacentre or a cloud provider, or as a developer of orchestration or cloud services.

The Software Platform Team
We build Graphcore products into large-scale AI solutions for our customers. The Cloud Platform Team is responsible for providing these systems both to internal users via private clouds and to customers via our own public clouds. Internal systems will often be using and developing pre-release hardware and software, so it's vital that you are comfortable with unproven components.

Responsibilities and Duties
  • Operate and extend existing OpenStack-based cloud services and contribute to the deployment and development of new ones.
  • Develop and operate end-user services on our clouds and support internal users in using them.
  • Turn end-user and product requirements into deployed services.
  • Help build automation to collect and analyse metrics and other observability data from the cloud services, so that any issues can be clearly identified and reported.
  • Work with users to relay information on any product-related issues to the Engineering and QA departments.
  • Work with our Datacentre Operations Engineers to maintain and operate the fleet of AI systems at peak performance in our private clouds.
  • Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure as Code in internal and external datacentres.
  • Drive corrective actions for systems that are not operating correctly, working with DC Operations and Graphcore Engineering as required.
  • Work with external vendors of off-the-shelf switches, servers and storage solutions to specify, benchmark and integrate third-party products into our Cloud Reference Design.

Skills and Experience (all required)
  • Bachelor's degree or equivalent practical experience in a relevant subject.
  • Solid infrastructure or IT experience, with a proven track record of delivering technical output as an individual contributor.
  • Experience managing or operating on-premises or private cloud environments.
  • Experience specifying, scoping, estimating and detailing work plans in an Agile/Scrum framework, including priorities, risks, issues, impacts and constraints.
  • Strong, proven Linux scripting ability (Bash and Python required).
  • Strong, proven Linux system administration skills (Ubuntu, RHEL and variants).
  • Experience with a version control system (preferably Git) and using it to manage system configuration or automation.
  • Experience with Continuous Integration or testing pipelines using GitLab, GitHub or similar.
  • Hands-on experience deploying services into public or private clouds using Infrastructure as Code.
  • A solid understanding of the technologies underpinning cloud services (APIs, virtualisation of CPUs, IO and systems), virtual networks, block storage, resource management and monitoring.
  • Experience with IaC automation tools (e.g. Terraform/OpenTofu, Ansible, Packer).
  • Experience with container deployment and management tools (e.g. Docker, Podman, Apptainer).
  • Experience with monitoring and observability solutions (e.g. Grafana, Prometheus, OpenSearch/Elasticsearch, Loki, Mimir, OpenTelemetry, Fluentd, Kafka).
  • Good communication and presentation skills, and experience dealing with end users of IT or cloud services.
  • The ability to work independently on critical infrastructure without oversight, with a focus on end-user availability.

Desirable but not required
  • Experience with OpenStack deployments or the technologies they rely on (e.g. Ceph, Open vSwitch, KVM, QEMU).
  • Experience with High Performance Computing (HPC) environments using SLURM or similar batch workload solutions.
  • A strong skill set and experience in end-to-end deployment automation and CI of containerised services, including complete automation of pipelines for build, test, deploy, manage, alert, destroy and rebuild.
  • Experience managing production Kubernetes clusters and workloads.
  • Experience with workload queue management systems (SLURM, LSF, Kueue).
  • Experience with managed switch configuration (e.g. EOS, SONiC, DNOS).
  • Programming experience with Python 3, using classes and inheritance.
  • Programming experience with Go.

Benefits
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and a health cash plan, a dental plan, a pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!

We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can take a flexible approach to interviews, and we encourage you to talk to us if you require any reasonable adjustments.

Sponsorship
Applicants for this position must hold the right to work in the UK. Unfortunately, we are currently unable to provide visa sponsorship or support for visa applications.
25/06/2026
Full time
Staff Cloud Engineer
Cerebras, Bristol, Gloucestershire
About Graphcore
At Graphcore, we're building the future of AI compute. We're a team of semiconductor, software and AI experts with deep experience in creating the complete AI compute stack, from silicon and software to infrastructure at datacentre scale. As part of the SoftBank Group, and backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.

Job Summary
We are looking for a Senior Staff Engineer to join our Cloud Platform Team and help develop and deploy cloud services. Working closely with our colleagues in the Software Platform, Datacentre Operations and Product Development teams, you will deploy services on our fleet of cutting-edge AI systems. As part of our Software Platform organisation, you will be involved in the cloud integration, validation, performance benchmarking, optimisation and development of our high-performance AI solutions, including in-house AI systems and off-the-shelf high-performance servers, switches and storage solutions.

This is a hands-on technical role requiring a solid background in cloud infrastructure, deployment using Infrastructure as Code, observability, and high-performance networking and storage systems. You may have worked in an IT organisation, a datacentre or a cloud provider, or as a developer of orchestration or cloud services.

The Software Platform Team
We build Graphcore products into large-scale AI solutions for our customers. The Cloud Platform Team is responsible for providing these systems both to internal users via private clouds and to customers via our own public clouds. Internal systems will often be using and developing pre-release hardware and software, so it's vital that you are comfortable with unproven components.

Responsibilities and Duties
  • Operate and extend existing OpenStack-based cloud services and contribute to the deployment and development of new ones.
  • Develop and operate end-user services on our clouds and support internal users in using them.
  • Turn end-user and product requirements into deployed services.
  • Help build automation to collect and analyse metrics and other observability data from the cloud services, so that any issues can be clearly identified and reported.
  • Work with users to relay information on any product-related issues to the Engineering and QA departments.
  • Work with our Datacentre Operations Engineers to maintain and operate the fleet of AI systems at peak performance in our private clouds.
  • Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure as Code in internal and external datacentres.
  • Drive corrective actions for systems that are not operating correctly, working with DC Operations and Graphcore Engineering as required.
  • Work with external vendors of off-the-shelf switches, servers and storage solutions to specify, benchmark and integrate third-party products into our Cloud Reference Design.

Skills and Experience (all required)
  • Bachelor's degree or equivalent practical experience in a relevant subject.
  • Solid infrastructure or IT experience, with a proven track record of delivering technical output as an individual contributor.
  • Experience managing or operating on-premises or private cloud environments.
  • Experience specifying, scoping, estimating and detailing work plans in an Agile/Scrum framework, including priorities, risks, issues, impacts and constraints.
  • Strong, proven Linux scripting ability (Bash and Python required).
  • Strong, proven Linux system administration skills (Ubuntu, RHEL and variants).
  • Experience with a version control system (preferably Git) and using it to manage system configuration or automation.
  • Experience with Continuous Integration or testing pipelines using GitLab, GitHub or similar.
  • Hands-on experience deploying services into public or private clouds using Infrastructure as Code.
  • A solid understanding of the technologies underpinning cloud services (APIs, virtualisation of CPUs, IO and systems), virtual networks, block storage, resource management and monitoring.
  • Experience with IaC automation tools (e.g. Terraform/OpenTofu, Ansible, Packer).
  • Experience with container deployment and management tools (e.g. Docker, Podman, Apptainer).
  • Experience with monitoring and observability solutions (e.g. Grafana, Prometheus, OpenSearch/Elasticsearch, Loki, Mimir, OpenTelemetry, Fluentd, Kafka).
  • Good communication and presentation skills, and experience dealing with end users of IT or cloud services.
  • The ability to work independently on critical infrastructure without oversight, with a focus on end-user availability.

Desirable but not required
  • Experience with OpenStack deployments or the technologies they rely on (e.g. Ceph, Open vSwitch, KVM, QEMU).
  • Experience with High Performance Computing (HPC) environments using SLURM or similar batch workload solutions.
  • A strong skill set and experience in end-to-end deployment automation and CI of containerised services, including complete automation of pipelines for build, test, deploy, manage, alert, destroy and rebuild.
  • Experience managing production Kubernetes clusters and workloads.
  • Experience with workload queue management systems (SLURM, LSF, Kueue).
  • Experience with managed switch configuration (e.g. EOS, SONiC, DNOS).
  • Programming experience with Python 3, using classes and inheritance.
  • Programming experience with Go.

Benefits
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and a health cash plan, a dental plan, a pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!

We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can take a flexible approach to interviews, and we encourage you to talk to us if you require any reasonable adjustments.

Sponsorship
Applicants for this position must hold the right to work in the UK. Unfortunately, we are currently unable to provide visa sponsorship or support for visa applications.
24/06/2026
Full time
About Graphcore At Graphcore, we're building the future of AI compute. We're a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence. Job Summary We are looking for a Senior Staff Engineer to join our Cloud Platform Team and help develop and deploy cloud services. Working closely with our colleagues in Software Platform, Datacentre Operations and Product Development teams, you will deploy services on our fleet of cutting edge AI systems. As part of our Software Platform organisation, you will be involved in the cloud integration, validation, performance benchmarking, optimisation, and development of our high performance AI solutions, including in house AI systems and off the shelf high performance servers, switches and storage solutions. This is a hand on technical role requiring a solid background in the use of cloud infrastructure, deployment using Infrastructure as Code, observability, high performance networking and storage systems. You may have been working in an IT organisation, a datacentre, a cloud provider or as a developer of orchestration or cloud services. The Software Platform team We build Graphcore products into large scale AI solutions for our customers. The Cloud Platform Team is responsible for providing such systems to both internal users via private clouds and customers via our own public clouds. 
Often the internal systems will be using and developing pre release hardware and software, so it's vital you are comfortable with unproven components. Responsibilities and Duties Operate and extend existing OpenStack based cloud services and contribute to the deployment and development of new ones. Develop and operate end user services on our clouds and support internal users in their use. Turn end user and product requirements into deployed services. Help build automation to collect and analyse metrics and other observability data from the cloud services to support clear identification and reporting of any issues. Work with users to provide information on any product related issues to Engineering and QA departments. Work with our Datacentre Operations Engineers to maintain and operate the fleet of AI systems at peak performance in our private clouds. Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure as Code in internal and external datacentres. Drive corrective actions for systems that are not operating correctly, working with DC operations and Graphcore Engineering as required. Work with external vendors of off the shelf switches, servers and storage solutions to specify, benchmark and integrate 3rd party products into our Cloud Reference Design. Skills and Experience ALL REQUIRED Bachelor's degree or equivalent practical experience in a relevant subject. Solid infrastructure or IT experience with a proven track record of delivering technical output as an individual contributor. Experience managing or operating on premises or private cloud environments. Experience specifying, scoping, estimating and detailing work plans in an AGILE and SCRUM framework, including priorities, risks, issues, impacts and constraints. Strong proven Linux scripting ability (bash and python required). Strong proven Linux system administration (Ubuntu, RHEL and variants). 
Experience with a version control system (preferably Git) and using it to manage system configuration or automation. Experience with Continuous Integration or testing pipelines using GitLab, GitHub or similar. Hands on experience deploying services into public or private clouds using Infrastructure as Code. A solid understanding of the technologies underpinning cloud services (APIs, virtualisation of CPUs, IO, systems), virtual networks, block storage, resource management and monitoring. Experience with IAC automation tools (e.g. Terraform/OpenTofu, Ansible, Packer). Experience with container deployment and management tools (e.g. Docker, Podman, Apptainer). Experience with solutions for monitoring and observability (Grafana, Prometheus, OpenSearch/ElasticSearch, Loki, Mimir, OpenTelemetry, Fluentd, Kafka). Good communication and presentation skills, and experience dealing with end users of IT or cloud services. An ability to work independently on critical infrastructure without oversight, and with a focus on end user availability. Desirable but not required Experience with OpenStack deployments or the technologies they rely on (e.g. Ceph, Open vSwitch, KVM, QEMU). Experience with High Performance Computing (HPC) environments using SLURM or similar batch workload solutions. Strong skillset and experience in end to end deployment automation and CI of containerised services. Complete automation of pipelines for build, test, deploy, manage, alert, destroy, rebuild. Experience with managing production Kubernetes clusters and workloads. Experience with workload queue management systems (SLURM, LSF, Kueue). Experience with managed switch configuration (e.g. EOS, SONiC, DNOS). Programming experience with Python3 utilising classes and inheritance. Programming experience with Go. 
Benefits: In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and a health cash plan, a dental plan, a pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar! We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can take a flexible approach to interviews and encourage you to chat to us if you require any reasonable adjustments. Sponsorship: Applicants for this position must hold the right to work in the UK. Unfortunately, at this time we are unable to provide visa sponsorship or support for visa applications.
Cluster Manager
Options Resourcing Ltd
Job title: Cluster Manager. Location: Piccadilly, Central London. Terms: Monday to Friday, 08:00-17:00. Salary: £64,000-£65,000, depending on qualifications and experience. Requirements: Up-to-date technical knowledge of ACOPs, fire and environmental control measures, building control requirements and the Health and Safety at Work Act. Managerial experience at Contract Manager or Senior Supervisor level within the hard services business. Practical experience must include recruitment and line management/supervisory experience. About the company: A well-established maintenance provider, renowned for its prestigious contracts, is currently recruiting a Cluster Manager for a blue-chip building in Piccadilly, Central London. This reputable company is a big believer in promoting staff internally and is looking to add a strong Cluster Manager to its team. Responsibilities: Responsible for the H&S of the sites and engineers. Work closely with Contract Support to ensure that service levels are maintained to a consistently high standard. Provide leadership and ensure the planned development of the contract, so that contractual commitments are met and exceeded. Support the Helpdesk in achieving high levels of customer satisfaction. Keep client satisfaction consistently high, leading to development of the contract and an increase in portfolio/contract responsibilities. Meet with clients to establish steady lines of communication and attend monthly client meetings where required. Ensure working conditions on the contract meet health and safety requirements. Ensure business policies and processes are effectively communicated and implemented within the contract. Provide weekly flash reports for each contract to the appropriate client and internal managers, where appropriate. Work with Senior Management to ensure the collaborative development of the business, effective team working and support to colleagues.
Responsible for statutory and code compliance of sites. Oversee PPM planning schedules for sites. Ensure PPM is carried out in accordance with manufacturers' guidelines and HVCA SFG20. Responsible for the return of PPM and work-related documentation. Risk management. Man management/team development. Ensure the contract is staffed by fully competent teams, taking direct responsibility for the appointment of engineers, ensuring post holders are fully competent and that effective succession planning arrangements are in place. Financial management: full ownership of P&L, debt and WIP. Disciplinary and conflict management. Ensure all sites have accurate asset registers and that assets are labelled accordingly. Responsible for the fast and effective procurement of materials and services. Produce dilapidation reports. Provide operational reports monthly and as requested. Ensure additional services and projects are added, and that contracts are re-won on re-tender. Proactively source additional works and raise quotations. Provide technical support to engineers, the helpdesk and clients where required. Investigate and report on major operational incidents. Ensure engineers are fully equipped to carry out daily tasks, and carry out tool inspections. Check the calibration of testing equipment. Ensure appropriate contract review, audit and control systems are in place so that statutory, policy and contractual commitments are met. Ensure uniforms are worn and kept in good condition. Carry out monthly site reviews. Carry out at least two site audits per contract per annum. Conduct engineers' appraisals. Ensure an effective escalation procedure is in place. Ensure all callouts are attended, in conjunction with the helpdesk. Ensure toolbox talks are conducted monthly. Promote an H&S culture across the whole team. Prepare quotations by supplying administrators with a labour summary and supplier quotations for materials, so that quotations can be raised effectively. Attend operational meetings as required.
Ensure regular communication with the engineering team. Carry out monthly audits of both PPM and reactive works and provide the manager with a report. Audit the logbook: ensure it is being used correctly by both employees and subcontractors. Ensure customer service levels are maintained. Ensure all third-party contracts have been carried out. Share initiatives to enhance our service provision and make recommendations for system infrastructure development. Reporting to the GPE Operations Manager. Working collaboratively as part of a team across all divisions. This post carries an element of budgetary responsibility. Direct line manager for the site engineering team. Candidate requirements: The ideal candidate is client-facing (i.e. comfortable with job activities that involve direct interaction or contact with a client or customer). The ideal candidate has plenty of experience leading a team and working within the facilities maintenance industry. A good general education is essential, ideally to degree standard but possibly to HND level. Excellent verbal and written communication skills; numerate and computer literate. Good technical knowledge. Contact us to apply. If this role sounds of interest, please don't hesitate to give me a call on - or alternatively drop me an email on - sonny.clarke
24/06/2026
Full time
Lead Staff Systems Reliability Engineer (Linux & Distributed Systems)
The Trade Desk, Inc.
Overview: The Trade Desk is a global technology company and the world's leading independent platform for digital advertising, with nearly 4,000 employees across more than 30 offices. Our technology helps advertisers reach the right audiences across the open internet - from streaming TV and podcasts to mobile apps, news, and more. Advertising powers the content people love. By making it more transparent, effective, and responsible, we help support trusted journalism, quality entertainment, and creators worldwide. The world's brands and agencies rely on us to reach their customers and grow their businesses responsibly. The scale of our platform brings unique technical challenges - from processing massive datasets in real time to building systems that operate reliably on a global scale. When you work here, your impact is worldwide. We welcome diverse perspectives, encourage curiosity, and build teams that learn from one another. If you're driven to solve meaningful challenges, we'd love to meet you. What we do: We are looking to hire a Lead Systems Reliability Engineer to join our engineering team and continue building and maintaining our data-driven platform. We leverage technologies like Aerospike, MongoDB, and Kafka to perform many real-time activities with a p99 latency under 1 millisecond on the back end! Do you enjoy tuning, performance testing, troubleshooting, automation, and operating at scale? Does testing next-gen hardware, evaluating data access patterns, and designing automation around distributed systems excite you? What makes this role different: First in the Industry: The Trade Desk is the first company to run over 5 million QPS to NVMe in Aerospike on a single node, forcing core software redesigns to achieve this scale. Work on Cutting-Edge Hardware: Design clusters with nodes featuring 300TB of NVMe, 3TB of RAM, and 512 cores, delivering a global throughput of 2,500GB/s directly from flash.
Shape the Future of Infrastructure: Spec your own systems and collaborate directly with AMD and NoSQL vendors to run PoCs and optimize bleeding-edge technology for internet-scale workloads. Deep Performance Engineering: Dive into kernel, hardware, and system interactions, leveraging tools like flamegraphs, NUMA counters, BIOS tuning, and synthetic testing to achieve world-class performance. Push Hardware Endurance Limits: Build clusters engineered for over 1 zettabyte of write endurance. What you'll do: Lead a team to influence, manage, and plan work streams, systems, and data structures at scale within a global ecosystem spanning multiple infrastructure providers (cloud and traditional datacenters). Encourage, improve, and build infrastructure automation that works with stateful systems at scale. Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB. Serve as a point of contact to review new use cases and answer questions, and participate in the on-call rotation. Learn to be a NoSQL SME; you do not need prior experience to apply, as we will train you. Benchmark and analyze next-generation hardware offerings. Who you are: Skills and Experience: Linux operating system; leadership experience and the ability to mentor; troubleshooting (techniques for isolation, the scientific method); identifying bottlenecks (is it CPU? IO?). Nice-to-have experience: physical (on-prem) hardware internals, management, and operation; testing and tuning; databases (relational or NoSQL); Ansible/PyInfra/Chef; Prometheus; Kubernetes; Python/Ruby/Rust/Bash/Golang/C#. An Empathetic, Objective, Critical Thinker: You think beyond the task at hand to deeply understand the 'why' behind an objective. You welcome ideas and seek to understand perspectives that are different from your own, and you are interested in seeking and building from common ground.
You are a creative thinker, not bound by "the way things have always been done", who thinks of the questions nobody has thought of and that are "yet to be asked". What you know is less important than how well you learn, innovate, collaborate, and adapt. As part of a global team from many diverse backgrounds, experiences, and perspectives, you value and seek out ways to foster diversity. The Trade Desk is an equal opportunity employer. All aspects of employment will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law. As an Equal Opportunity Employer, The Trade Desk is committed to creating an inclusive hiring experience where everyone has the opportunity to thrive.
24/06/2026
Full time
Staff Machine Learning Software Engineer, Research (London, United Kingdom)
PhysicsX Ltd
About us: PhysicsX is a deep-tech company with roots in numerical physics and Formula One, dedicated to accelerating hardware innovation at the speed of software. We are building an AI-driven simulation software stack for engineering and manufacturing across advanced industries. By enabling high-fidelity, multi-physics simulation through AI inference across the entire engineering lifecycle, PhysicsX unlocks new levels of optimization and automation in design, manufacturing, and operations - empowering engineers to push the boundaries of possibility. Our customers include leading innovators in Aerospace & Defense, Materials, Energy, Semiconductors, and Automotive. Note: We are currently recruiting for multiple positions across different levels; however, please apply only for the role that best aligns with your skillset and career goals. What you will do: Shape Research group strategy and culture in a significant way, especially in your domains of expertise. Be opinionated and formulate strategy on engineering topics relevant to our Research priorities, especially scaled engineering, securing compute, and the infrastructure stack. Define the profiles needed to execute this strategy. Promote effective working patterns and proactively flag issues with team dynamics to foster a productive environment. Nurture less experienced colleagues as they grow their skillset, and guide their professional development. Own Research work-streams at a high level to deliver outcomes. Align priorities with problem stakeholders, internal and external. Set the technical direction for the stream and apply judgement and taste to drive progress. Plan roadmaps with clear milestones for key decisions and outcomes. Organise and guide the more junior members of the team to execute and deliver effectively against this roadmap. Communicate purpose and key outcomes to raise awareness across the company and create opportunities for use and deployment.
In particular: Work closely with our research scientists and simulation engineers to build and deliver models that address real-world physics and engineering problems. Design, build and optimise machine learning models with a focus on scalability and efficiency in our application domain. Transform prototype model implementations into robust and optimised implementations. Implement distributed training architectures (e.g., data parallelism, parameter server, etc.) for multi-node/multi-GPU training, and explore federated learning capabilities using cloud (e.g., AWS, Azure, GCP) and on-premise services. Work with research scientists to design, build and scale foundation models for science and engineering, helping to scale and optimise model training to large data and multi-GPU cloud compute. Identify the best libraries, frameworks and tools for our modelling efforts to set us up for success. Discuss the results and implications of your work with colleagues and customers, especially how these results can address real-world problems. Work at the intersection of data science and software engineering to translate the results of our Research into reusable libraries, tooling and products. Foster a nurturing environment in which colleagues with less ML or engineering experience can grow and you can mentor them. What you bring to the table: Enthusiasm for developing machine learning solutions, especially deep learning and/or probabilistic methods, and associated supporting software solutions for science and engineering. Ability to work autonomously and to scope and effectively deliver projects across a variety of domains. Strong problem-solving skills and the ability to analyse issues, identify causes, and recommend solutions quickly. Excellent collaboration and communication skills - with teams and customers alike.
MSc or PhD in computer science, machine learning, applied statistics, mathematics, physics, engineering, software engineering, or a related field, with a record of experience in any of the following: scientific computing; high-performance computing (CPU / GPU clusters); parallelised / distributed training for large / foundation models. 4 years of experience in a professional industry setting, where you have been instrumental in most of the following: scaling and optimising ML models; training and serving foundation models at scale (federated learning a bonus); employing distributed computing frameworks (e.g., Spark, Dask) and high-performance computing frameworks (MPI, OpenMP, CUDA, Triton); employing cloud computing (on hyperscaler platforms, e.g., AWS, Azure, GCP); building machine learning models and pipelines in Python, using common libraries and frameworks (e.g., NumPy, SciPy, Pandas, PyTorch, JAX), especially for deep learning applications; building or using C/C++ for computer vision, geometry processing, or scientific computing; following and promoting software engineering concepts and best practices (e.g., versioning, testing, CI/CD, API design, MLOps); containerising and orchestrating compute tasks (Docker, Kubernetes, Slurm); writing pipelines and experiment environments, including running experiments in pipelines in a systematic way. What we offer: Build what actually matters: Help shape an AI-native engineering company at a formative stage, tackling problems that genuinely matter for industry and society. This is work with real-world impact - and something you can be proud to stand behind. Learn alongside exceptional people: Work with a high-calibre, collaborative team of engineers, scientists, and operators who care deeply about doing great work, and about helping each other get better. We come from diverse backgrounds, but we share a commitment to operating at the highest level and addressing some of the most complex challenges out there.
If you're ambitious, thoughtful, and driven by impact, you'll feel at home. Influence over hierarchy: We operate with a flat structure, where good ideas win wherever they come from. Questioning assumptions and challenging the status quo isn't just welcomed, it's expected. Building meaningful technology is a marathon, not a sprint. We believe in balancing focused, ambitious work with a life beyond it. Our hybrid model blends time together in our Shoreditch office with work-from-home days, giving you the flexibility to work sustainably while staying connected in person. And it doesn't stop there: Equity options - share meaningfully in the company you're helping to build. 10% employer pension contribution - because investing in the future matters. Free office lunches - to keep you energised and focused. Enhanced parental leave - 3 months' full-pay paternity leave and 6 months' full-pay maternity leave, to provide extra flexibility during the moments that matter most. YellowNest nursery scheme - to help working parents manage childcare costs. 25 days of annual leave (+ public holidays) - because taking time to rest matters. Private medical insurance - 100% employee cover, giving you complete peace of mind. Wellhub subscription - gain access to thousands of gyms, classes and wellness apps, supporting both physical and mental wellbeing. Eye tests - because good work depends on good health. Personal development - dedicated support for learning, development, and levelling up over time. Employee Assistance Programme (EAP) - confidential wellbeing support, available whenever you need it. Bike2Work scheme and season ticket loan - to make getting to work easier and greener. Octopus EV salary sacrifice - for a simpler, more sustainable way to drive electric. We value diversity and are committed to equal employment opportunity regardless of sex, race, religion, ethnicity, nationality, disability, age, sexual orientation or gender identity.
We strongly encourage individuals from groups traditionally underrepresented in tech to apply. To help make a change, we sponsor bright women from disadvantaged backgrounds through their university degrees in science and mathematics. We collect diversity and inclusion data solely for the purpose of monitoring the effectiveness of our equal opportunities policies and ensuring compliance with UK employment and equality legislation. This information is confidential, used only in aggregate form, and will not influence the outcome of your application.
24/06/2026
Full time
Staff Machine Learning Software Engineer, Research London, United Kingdom About us PhysicsX is a deep-tech company with roots in numerical physics and Formula One, dedicated to accelerating hardware innovation at the speed of software. We are building an AI-driven simulation software stack for engineering and manufacturing across advanced industries. By enabling high-fidelity, multi-physics simulation through AI inference across the entire engineering lifecycle, PhysicsX unlocks new levels of optimization and automation in design, manufacturing, and operations - empowering engineers to push the boundaries of possibility. Our customers include leading innovators in Aerospace & Defense, Materials, Energy, Semiconductors, and Automotive. Note: We are currently recruiting for multiple positions across different levels, however please only apply for the role that best aligns with your skillset and career goals. What you will do Shape Research group strategy and culture in a significant way, especially in domains of expertise. Be opinionated and formulate strategy on engineering topics relevant to our Research priorities, especially on: scaled engineering, securing compute, infrastructure stack. Define necessary profiles to execute this strategy. Promote effective working patterns and proactively flag issues with team dynamics to foster a productive environment. Nurture younger colleagues to grow their skillset and guide their professional development. Own Research work-streams at a high-level to deliver outcomes. Align priorities with problem stakeholders, internal and external. Set the technical direction for the stream and apply judgement and taste to drive progress. Plan roadmaps with clear milestones for key decisions and outcomes. Organise and guide the more junior members of the team to effectively execute and deliver against this roadmap. Communicate purpose and key outcomes to raise awareness across the company and create opportunities for use and deployment. 
The below activities in particular. Work closely with our research scientists and simulation engineers to build and deliver models that address real-world physics and engineering problems. Design, build and optimise machine learning models with a focus on scalability and efficiency in our application domain. Transform prototype model implementations to robust and optimised implementations. Implement distributed training architectures (e.g., data parallelism, parameter server, etc.) for multi-node/multi-GPU training and explore federated learning capacity using cloud (e.g., AWS, Azure, GCP) and on-premise services. Work with research scientists to design, build and scale foundation models for science and engineering; helping to scale and optimise model training to large data and multi-GPU cloud compute. Identify the best libraries, frameworks and tools for our modelling efforts to set us up for success. Discuss the results and implications of your work with colleagues and customers, especially how these results can address real-world problems. Work at the intersection of data science and software engineering to translate the results of our Research into re usable libraries, tooling and products. Foster a nurturing environment for colleagues with less experience in ML / Engineering for them to grow and you to mentor. What you bring to the table Enthusiasm about developing machine learning solutions, especially deep learning and/or probabilistic methods, and associated supporting software solutions for science and engineering. Ability to work autonomously and scope and effectively deliver projects across a variety of domains. Strong problem-solving skills and the ability to analyse issues, identify causes, and recommend solutions quickly. Excellent collaboration and communication skills - with teams and customers alike. 
MSc or PhD in computer science, machine learning, applied statistics, mathematics, physics, engineering, software engineering, or a related field, with a record of experience in any of the following: scientific computing; high-performance computing (CPU / GPU clusters); parallelised / distributed training for large / foundation models. 4 years of experience in a professional industry setting, where you have been instrumental in most of the following: scaling and optimising ML models; training and serving foundation models at scale (federated learning a bonus); employing distributed computing frameworks (e.g., Spark, Dask) and high-performance computing frameworks (MPI, OpenMP, CUDA, Triton); employing cloud computing (on hyperscaler platforms, e.g., AWS, Azure, GCP); building machine learning models and pipelines in Python, using common libraries and frameworks (e.g., NumPy, SciPy, Pandas, PyTorch, JAX), especially including deep learning applications; building or using C/C++ for computer vision, geometry processing, or scientific computing; following and promoting software engineering concepts and best practices (e.g., versioning, testing, CI/CD, API design, MLOps); containerising and orchestrating compute tasks (Docker, Kubernetes, Slurm); writing pipelines and experiment environments, including running experiments in pipelines in a systematic way. What we offer: Build what actually matters. Help shape an AI-native engineering company at a formative stage, tackling problems that genuinely matter for industry and society. This is work with real-world impact - and something you can be proud to stand behind. Learn alongside exceptional people. Work with a high-calibre, collaborative team of engineers, scientists, and operators who care deeply about doing great work, and about helping each other get better. We come from diverse backgrounds, but we share a commitment to operating at the highest level and addressing some of the most complex challenges out there.
If you're ambitious, thoughtful, and driven by impact, you'll feel at home. Influence over hierarchy. We operate with a flat structure: good ideas win, wherever they come from. Questioning assumptions and challenging the status quo isn't just welcomed, it's expected. Building meaningful technology is a marathon, not a sprint. We believe in balancing focused, ambitious work with a life beyond it. Our hybrid model blends time together in our Shoreditch office with work-from-home days, giving you the flexibility to work sustainably while staying connected in person. And it doesn't stop there: Equity options - share meaningfully in the company you're helping to build. 10% employer pension contribution - because investing in your future matters. Free office lunches - to keep you energised and focused. Enhanced parental leave - 3 months' full-pay paternity leave and 6 months' full-pay maternity leave, to provide extra flexibility during the moments that matter most. YellowNest nursery scheme - to help working parents manage childcare costs. 25 days of annual leave (+ public holidays) - because taking time to rest matters. Private medical insurance - 100% employee cover, giving you complete peace of mind. Wellhub subscription - gain access to thousands of gyms, classes and wellness apps, supporting both physical and mental wellbeing. Eye tests - because good work depends on good health. Personal development - dedicated support for learning, development, and levelling up over time. Employee Assistance Programme (EAP) - confidential wellbeing support, available whenever you need it. Bike2Work scheme and season ticket loan - to make getting to work easier and greener. Octopus EV salary sacrifice - for a simpler, more sustainable way to drive electric. We value diversity and are committed to equal employment opportunity regardless of sex, race, religion, ethnicity, nationality, disability, age, sexual orientation or gender identity.
We strongly encourage individuals from groups traditionally underrepresented in tech to apply. To help make a change, we sponsor bright women from disadvantaged backgrounds through their university degrees in science and mathematics. We collect diversity and inclusion data solely for the purpose of monitoring the effectiveness of our equal opportunities policies and ensuring compliance with UK employment and equality legislation. This information is confidential, used only in aggregate form, and will not influence the outcome of your application.
Senior Platform Automation Engineer
Securecloudplus Stoke-on-trent, Staffordshire
The Role: The Senior Platform Automation Engineer at Secure Cloud+ is responsible for designing, delivering, and maintaining Infrastructure as Code (IaC) that underpins the secure, repeatable, and compliant delivery of Collaborative Working Environments (CWEs) for defence customers and their supply chain partners. The purpose of this role is to reduce manual infrastructure build and configuration, improve consistency and reliability, and enable rapid, auditable deployment of environments that meet defence-grade security and assurance standards. The role plays a critical part in ensuring CWEs can be provisioned, evolved, and operated safely through code-driven automation. Working within secure, predominantly on-premise and private cloud platforms, the Senior Platform Automation Engineer contributes hands-on engineering expertise while promoting automation best practices, supporting platform resilience, and enabling delivery teams to consume infrastructure safely and efficiently. Strategic & Principal Responsibilities: Contribute to the strategic direction for on-premise platform automation. Define automation standards, patterns, and reference architectures for secure on-premise infrastructure. Shape the long-term roadmap for Infrastructure as Code adoption, modernising traditional infrastructure practices safely within defence environments. Act as a technical authority and decision-maker for platform automation approaches, tools, and designs. Influence senior stakeholders, architects, security teams, and delivery leads on platform strategy, risk, and investment priorities. Infrastructure as Code & Automation Delivery: Design, build, and maintain Infrastructure as Code (IaC) for the automated provisioning and lifecycle management of on-premise CWEs. Develop reusable, version-controlled automation artefacts to standardise infrastructure builds across environments. Lead the transition from manual and bespoke builds to repeatable, code-driven infrastructure delivery.
Ensure all infrastructure changes are auditable, reproducible, and aligned with secure delivery practices. Security, Assurance & Compliance: Embed defence-grade security controls, identity integration, and configuration hardening into all automation. Design infrastructure automation to support accreditation, assurance, and audit requirements. Work closely with security and compliance teams to ensure platforms meet customer and regulatory expectations. CI/CD & Environment Lifecycle Management: Integrate Infrastructure as Code into CI/CD pipelines to support controlled, repeatable platform changes. Define and implement safe promotion models across development, test, and production environments. Automate environment provisioning, modification, and decommissioning to support customer lifecycle needs. Operational Reliability & Continuous Improvement: Improve reliability and reduce operational risk by identifying automation gaps and replacing manual processes. Support incident analysis and drive preventative improvement through enhanced automation. Actively manage technical debt within platform automation codebases. Continuously refine automation to improve efficiency, quality, and resilience. Leadership, Coaching & Enablement: Provide senior technical leadership and mentoring to junior and graduate engineers. Set expectations for code quality, automation practices, and operational ownership. Enable delivery and operations teams to safely consume platform automation through clear guidance and tooling. Contribute to engineering culture, promoting automation-first, code-driven delivery. Education and Experience Requirements: Essential Experience: Significant, hands-on experience designing, building, and operating on-premise infrastructure platforms within enterprise or highly regulated environments. Proven experience delivering Infrastructure as Code (IaC) to automate the provisioning and lifecycle management of infrastructure.
Demonstrable experience working in security-constrained or regulated environments, such as defence, government, or critical national infrastructure. Experience contributing to or establishing technical standards, automation patterns, and strategic direction for platform engineering. Technical Experience: Strong experience with Nutanix hyperconverged infrastructure, including cluster design, operation, and automation. Experience supporting or automating Citrix platforms (e.g. Virtual Apps and Desktops) within secure enterprise environments. Solid understanding of traditional data centre technologies including compute, storage, networking, and identity services. Hands-on experience building, configuring, and automating SQL Server platforms. Experience supporting multi-tier enterprise applications hosted on on-premise infrastructure. Understanding of how collaboration platforms are underpinned by identity, database, and application services. Automation & Infrastructure as Code: Strong practical experience with Infrastructure as Code tools (e.g. Terraform, Ansible, PowerShell DSC, or equivalent). Proven ability to create reusable, version-controlled automation artefacts. Experience integrating infrastructure automation into CI/CD pipelines. Experience embedding security controls, identity integration, access models, and configuration hardening into automated builds. Familiarity with accreditation, assurance, and audit processes in secure environments. Experience supporting live platforms, responding to incidents, and driving automation-led improvements. Leadership & Professional Experience: Experience operating as a senior technical engineer, providing guidance and technical leadership to peers. Proven ability to influence technical direction and challenge legacy approaches with pragmatic, automation-led solutions. Comfortable acting as a trusted technical advisor to architects, security teams, and delivery stakeholders.
22/06/2026
Full time
HPC Operations Lead
The Francis Crick Institute Limited
Reports to: Head of Research Computing Platforms. Working pattern: Monday - Friday. This is a full-time, permanent, hybrid role (at least 3 days a week in the office after probation) on Crick terms and conditions of employment. Salary: From £73,000 - £82,000 with benefits, subject to skills and experience. Application closing date: 3rd of July at 11.59pm. About us: The Francis Crick Institute is Europe's largest biomedical research institute under one roof. Our world-class scientists and staff collaborate on vital research to help prevent, diagnose and treat illnesses such as cancer, heart disease, infectious diseases and neurodegenerative conditions. The Crick is a place for collaboration, innovation and exploration across many disciplines: a space where the brightest minds can pursue big and bold ideas and discover answers to crucial scientific questions. We support them in a dynamic environment which fosters excellence with state-of-the-art infrastructure, cutting-edge facilities, and a creative and curious culture. We've removed traditional boundaries of departments, divisions and disciplines and instead have an open approach that supports every researcher. This gives us the freedom to take risks and carry out high-quality, pioneering research. Creating a space for discovery without boundaries helps us to turn our science into benefits for human health and the economy. About the role: We are looking for a collaborative HPC Operations Lead to play a pivotal role in shaping the future of research computing at the Crick. As HPC Operations Lead, you will join our Research Computing Platforms/HPC team, reporting directly to the Head of Research Computing Platforms. This is a highly collaborative position where you'll work closely with scientists across the Institute, other Science Technology Platforms, and the wider Information Technology Office to ensure our platforms and services meet the evolving needs of the scientific community.
You will take ownership of the operational effectiveness of the team, driving the smooth running and continual improvement of services, overseeing the HPC service desk to ensure timely resolution of incidents, and designing and delivering training courses. You will also deputise for the Head of Research Computing Platforms, taking on wider managerial responsibilities as required. You will be expected to bring prior leadership experience and the ability to communicate effectively with stakeholders across the organisation. You must be able to translate technical language into clear, accessible terms, ensuring that complex information is understood. This is an exciting opportunity for someone with strong technical expertise, a collaborative mindset, and the confidence to lead both people and platforms in a progressive research environment. What you will be doing. You will: Understand the scientific and research requirements of the Crick's scientific programmes to advise on and deliver platforms and services appropriate to their needs. Act as technical lead on the design, implementation and operation of research data storage services for access by researchers and instruments inside the Crick and for external collaborations. Work with the Head of Research Computing Platforms and the wider Scientific Computing function to define a technology vision and roadmap for storage systems. Ensure Research Computing Platforms is a user-facing service through the delivery of an engaged and supportive HPC service desk. Serve as incident manager for Research Computing Platforms in response to unplanned service outages. Work collaboratively across ITO teams including Architecture, Security and Helpdesk in the delivery and operational management of research platforms and services. Experience of leading on the design, maintenance and optimisation of petabyte-scale high-performance storage systems. Experience of leading on the operation and management of high-performance compute clusters.
Ability to manage complex services and projects effectively and efficiently with minimal supervision, a finite pool of resources, and against deadlines. Excellent interpersonal and communication skills, and a demonstrable ability to work collaboratively and flexibly as part of a deeply technical engineering team, while still working directly with stakeholders to focus on research/business outcomes. Previous experience of working in a biomedical research environment (Desirable). Additional domain technology expertise such as automation and data centre networking (Desirable). About Working at the Crick. Our values: Everyone who works at the Crick has a valuable role to play in advancing the Crick's mission and shaping our culture. We are bold. We make space for creative, dynamic and imaginative ideas and approaches. We're not afraid to do things differently. We are open. We're highly collaborative and interactive, and make sure our activities are visible to the outside world. We are collegial. We show respect for one another, work cooperatively and support the wider community. At the Francis Crick Institute, we believe that diversity and inclusion are essential to driving innovation and scientific discovery. We are committed to creating a workplace where everyone feels valued, respected, and empowered to succeed, regardless of their background, identity, or personal circumstances. We actively encourage applications from individuals of all genders, ethnicities, abilities, and experiences. We are a Disability Confident Committed employer and want to ensure that everyone can apply and take part in our recruitment process, so we will make reasonable adjustments if you need them - just let us know when you apply. If you need assistance with applying (e.g., would like to apply by phone or post) please email: .
At the Francis Crick Institute, we value our team members and are proud to offer an extensive range of benefits to support their wellbeing and development: Visas: Applicants for this role will be eligible for sponsorship to work in the UK. Generous Leave: 28 days of annual leave, plus three additional days over Christmas and bank holidays. Pension Scheme: Defined contribution pension with employer contributions of up to 16%. 24/7 GP consultation services. Occupational health services and mental health support programmes. Eye care vouchers and discounted healthcare plans. Work-Life Balance: Childcare support allowance. Annual leave purchase options. Crick Networks offering support, community and inclusive social events for diverse groups. Discounted gym memberships, bike to work scheme, and shopping discounts. Subsidised on-site restaurant and social spaces for team interaction. Please note you must meet the essential criteria listed within the Role Profile to have your application reviewed. We reserve the right to withdraw this advert at any time due to the number of applications received.
21/06/2026
Full time
Software Engineering Specialist
Sivara GmbH Birmingham, Staffordshire
Salary: £15,000 - £16,500 per year. Requirements: We are looking for a candidate with the following essential skills: Experience with containerisation technology and orchestration platforms, specifically Kubernetes and Docker. Hands-on experience in installing, configuring, operating, and monitoring CI/CD pipeline tools. Proficiency in Python, JavaScript, and Golang. Familiarity with GitLab CI or GitHub Actions. Experience with monitoring tools such as Grafana and ELK. Understanding of Agile software development methodologies and experience using JIRA. Knowledge of IT, network services, and security. Ability to collaborate effectively with others to drive key security objectives forward. Strong communication skills, including the ability to present to and write documentation for both technical and business audiences. A proven aptitude for autonomous learning to meet the demands of the business. Demonstrated problem-solving abilities. Assertiveness and the capacity to drive through change. Excellent teamworking skills, including working effectively within a geographically dispersed team. Responsibilities: In this role, you will take on technical leadership within a high-performing team of engineers dedicated to delivering state-of-the-art security tools. Your key responsibilities will include: Managing Kubernetes clusters and container orchestration as part of a Kubernetes DevOps/SysOps engineering team, automating the deployment, scaling, and management of containerised applications. Implementing best practices for Kubernetes configuration and security. Collaborating with cross-functional teams (development, operations, and QA) to streamline software delivery and automate deployment pipelines using CI/CD tools. Troubleshooting issues along the CI/CD pipeline. Acting as a product owner by breaking down top-level requirements into product backlogs as part of quarterly and sprint planning.
Interfacing with program and project managers to ensure appropriate engagement with security architecture as necessary. Providing coaching and mentoring on technology both within and outside the team. Maintaining a growth mindset with a desire to learn, teach, and continuously improve skills. Previous experience owning mission critical shared infrastructure is highly valued. Technologies CI/CD DevOps Docker ELK GitHub GitLab Golang Grafana JIRA JavaScript Kubernetes Network Product Owner Python Security Ansible ArgoCD Cloud Jenkins Kafka Linux OpenStack Terraform Windows microservices More This position is based in Birmingham with hybrid working arrangements. We offer a competitive rate of £550 per day (inside IR35, umbrella only) for an engagement of 8 months. We will review all profiles against the required skills and experience. Due to the high volume of applications, we will only respond to successful applicants in the initial stage of our selection process. Thank you for your interest and for taking the time to apply! last updated 25 week of 2026
21/06/2026
Full time
Nutanix Platform Engineer
Securecloudplus Stoke-on-trent, Staffordshire
The Role
The Infrastructure Engineer (Nutanix) is responsible for the build, configuration, and implementation of Nutanix-based hyper-converged infrastructure platforms, delivering robust, scalable, and secure compute, storage, and virtualisation services. The role focuses on translating design requirements into fully operational infrastructure through hands-on engineering, standardised builds, and repeatable configurations. The engineer will lead the deployment and configuration of Nutanix clusters, AHV, storage services, and management tooling, ensuring platforms are built in line with architectural standards, security requirements, and operational best practices. Working closely with architecture, cloud, network, and security teams, the role supports the delivery of private and hybrid cloud solutions while driving consistency, automation, and quality across the infrastructure build lifecycle.

Key Responsibilities
• Build, configure, and deploy Nutanix HCI clusters, including compute, storage, and virtualisation components
• Install, configure, and manage Nutanix AHV, AOS, Prism, and associated management tools
• Deliver infrastructure builds in line with design documentation, architectural standards, and security policies
• Perform system configuration, tuning, and optimisation to ensure platform performance and resilience
• Support infrastructure projects including new platform deployments, upgrades, expansions, and migrations
• Produce and maintain build documentation, configuration standards, and operational runbooks
• Collaborate with network, storage, cloud, and security teams to integrate Nutanix platforms into the wider environment
• Assist with automation and repeatable builds using scripting or infrastructure-as-code approaches where applicable
• Provide technical support and escalation for infrastructure-related incidents and issues
• Contribute to continuous improvement by identifying opportunities to enhance build processes, tooling, and standardisation

Education and Experience Requirements
Essential:
• Degree or equivalent experience in Information Technology, Computer Science, Engineering, or a related discipline
• Hands-on experience building and configuring Nutanix HCI environments
• Strong knowledge of virtualisation, compute, storage, and infrastructure platforms
• Experience with cluster deployments, upgrades, and lifecycle management
• Solid understanding of infrastructure security, resilience, and availability principles
• Ability to work from technical designs and implement them accurately
• Strong documentation and communication skills
Desirable:
• Experience with automation tools (e.g. PowerShell, Ansible, Terraform, Calm)
• Exposure to on-prem/private cloud environments
• Knowledge of enterprise networking and security concepts
• Nutanix certifications (NCA, NCP, NCM or equivalent)
21/06/2026
Full time
IT Infrastructure & Security Engineer
Sivara GmbH Chippenham, Wiltshire
Salary: £31,000 - 71,000 per year

Requirements
• Proven experience in an Infrastructure Engineer, Security Engineer, or advanced 2nd/3rd line IT role
• Strong knowledge of Microsoft environments including M365, Active Directory, Entra ID, and Group Policy
• Proficient with endpoint management using Intune MDM and Autopilot deployments
• Experience supporting cloud environments, with Azure essential and AWS basics advantageous
• Experience working with RMM tools for monitoring, patching, and automation
• Good understanding of networking fundamentals including TCP/IP, VPNs, firewalls, DNS, and routing basics
• Experience supporting firewall platforms such as Palo Alto or similar
• Familiarity with Cloudflare services including DNS, WAF, CDN, and Zero Trust concepts
• Experience supporting endpoint and identity security including MFA, Defender, and conditional access
• Hands-on experience with Windows Server and virtualisation platforms such as Hyper-V or VMware
• Experience investigating issues using logs and monitoring tools, with a security-first mindset
• Comfortable using PowerShell for scripting and task automation
• Experience with patching, system maintenance, and vulnerability remediation
• Understanding of backup and disaster recovery concepts
• Familiarity with ITIL environments and structured change processes
• Strong troubleshooting skills with the ability to work independently and escalate when needed
• Strong understanding of Windows clusters
• Strong understanding of Windows RDP server farms
• Clear communication skills and the ability to document work effectively
• A proactive attitude with a willingness to learn and develop
• Exposure to security monitoring tools or SOC-style workflows is desirable
• Experience with Azure security tooling such as Defender for Cloud and Sentinel is desirable
• Basic Linux administration knowledge is desirable
• Relevant certifications in Microsoft, CompTIA, AWS, or similar are desirable

Responsibilities
• Act as a senior escalation point for our Service Desk, supporting complex infrastructure and security issues
• Provide day-to-day support and maintenance of core IT infrastructure including Windows Server, Active Directory, virtual environments, and business-critical systems
• Support and maintain security controls across our environment including MFA, VPN, endpoint protection, and access control
• Investigate and respond to security alerts, incidents, and vulnerabilities, ensuring timely remediation
• Assist with infrastructure and security projects and support their delivery
• Monitor systems, logs, and alerts to identify issues, risks, and potential threats
• Perform regular infrastructure health checks, patching, and system updates to maintain stability and security
• Support firewall, network, and VPN configurations alongside senior engineers where required
• Assist with user access control, permission structures, and identity management within Microsoft 365, Entra ID, and Active Directory
• Support and manage endpoint environments using Intune MDM, Windows Autopilot, and RMM
• Provide support for cloud platforms including Azure and Microsoft 365, alongside basic AWS environments
• Assist with DNS, CDN, and security services such as Cloudflare
• Support RMM platforms for monitoring, automation, patching, and remote management of endpoints and servers
• Create and maintain technical documentation, knowledge base articles, and operational procedures
• Automate routine tasks and improvements using PowerShell or similar tools where possible
• Work closely with internal teams and international sites to support IT operations and minimise disruption

Technologies
AWS, Active Directory, Azure, Cloud, Firewall, Hyper-V, Support, ITIL, Linux, Microsoft 365, Network, PowerShell, Security, TCP/IP, VPN, VMware, WAF, Windows, Office 365

We are Executive Jet Support Ltd, a world-leading provider of commercial aircraft, engines, and airframe components to almost 300 operators in over 50 countries worldwide. We are a family-based company that values pride in our work, long-term customer relationships, trust, and loyalty. This is a full-time, permanent, on-site role based at Bumpers Farm, Chippenham, Wiltshire, working Monday to Friday. We offer a competitive salary dependent on experience, an annual bonus scheme, life assurance, private healthcare cover, incremental holiday entitlement, an extra paid day off for your birthday, company sick pay, an attendance bonus, an employee assistance programme, Perkbox rewards, free weekly fruit, free on-site parking, and staff events. We are looking for a capable and driven IT Infrastructure & Security Engineer to join our growing, dynamic, and professional aviation company.
21/06/2026
Full time
Nutanix HCI Platform Engineer - Build & Automate Cloud
Securecloudplus Stoke-on-trent, Staffordshire
Securecloudplus is seeking an Infrastructure Engineer (Nutanix) to build and manage Nutanix-based hyper-converged infrastructure platforms. This role involves deploying Nutanix clusters while ensuring adherence to architectural standards. The candidate should have strong experience in virtualisation and infrastructure platforms. Essential qualifications include a degree in IT or equivalent and hands-on experience with Nutanix HCI environments.
21/06/2026
Full time
Inference Engine Development - Member of Technical Staff
Callosum
About Us
Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems with capabilities unreachable by any single model or accelerator. Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026, showing order-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can match. We believe intelligence comes from the system, not the model. We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are passionate about and energised by the scale of the challenge, we'd love to hear from you.

About the Role
Callosum believes that order-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute, well beyond GPUs, as a unified, co-evolving system. Inference engines were designed for single-model inference on homogeneous GPU clusters; this role takes them beyond that. Working directly on systems like vLLM and SGLang, you will adapt and extend them for heterogeneous resources, making them hardware-aware, with deeper optimisation around scheduling, memory, and execution. The execution strategies you design (parallelism, disaggregation, caching) will define what heterogeneous inference looks like at production scale. Your work ensures that the capabilities exposed by the lower layers of the stack translate into real, measurable gains, setting the new standard for how inference runs on diverse hardware.

What You'll Build
• Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge
• Improve hardware-awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator
• Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware
• Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities

What You Bring
• Deep familiarity with the internals of SGLang, vLLM, or comparable inference serving frameworks: scheduler design, memory management, and execution pipelines
• Strong background in high-performance Python and C++/CUDA systems, particularly in the context of ML inference
• Experience designing or implementing parallelism strategies for large model serving
• Understanding of disaggregated serving architectures and the trade-offs involved in separating modules of a workflow
• Demonstrable record of working effectively in fast-moving open-source codebases with evolving APIs and design conventions
20/06/2026
Full time
Specialised AI Engineer, London, UK
Nscale Ltd.
Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As a Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do. About the Role Nscale is looking for Senior / Staff AI Engineers to join our core AI team and build the systems that power our GenAI cloud platform. This role sits at the heart of our AI services platform, designing and optimising distributed systems that power large-scale training, post-training, evaluation, and low-latency, high-throughput inference under strict performance and efficiency constraints. You may specialise deeply in areas such as inference optimisation, large-scale training, post-training (fine-tuning, alignment), or evaluation systems, or operate across multiple parts of the stack. In all cases, you'll work on hard systems problems at scale, where performance, efficiency, and developer experience are critical. This is a hands on role for engineers who want to push the boundaries of how AI systems are built, optimised, and consumed by other AI engineers. 
Responsibilities Design, build, and optimise scalable AI platform systems spanning (one or more): Drive inference performance and efficiency, including: KV cache management, continuous batching, speculative decoding Quantisation (INT8/4, FP8), sparsity, pruning, and model compression Build and improve post training services, including: Fine tuning (LoRA, QLoRA, adapters, full fine tuning) Alignment (RLHF, DPO, reward modelling) Agentic RL (tool calling, off policy training, parallel thinking, decoupled sampling and updating) Dataset curation and data processing workflows Develop evaluation and benchmarking systems to measure: Model quality, safety, and regression System performance (latency, throughput, cost) Real world behaviour and feedback loops Develop and optimise distributed systems for GPU/accelerator workloads, focusing on scalability, reliability, and efficiency Conduct performance analysis and bottleneck investigations across multiple components and stacks spanning training, post training, and inference Collaborate with research, infrastructure, and product teams to build the right platform components based on customer demand and industry direction Build developer facing APIs, SDKs, and tooling that enable other engineers to effectively use Nscale's AI services Requirements 5+ years of experience building production systems in machine learning, distributed systems, or high-performance infrastructure 4+ years of hands on experience in at least one core area, within large-scale, production AI environments (e.g., AI labs, hyperscalers), such as: Inference optimisation Large scale training / pre training systems Post training (fine tuning, alignment, distillation) Evaluation and benchmarking frameworks Strong hands on expertise in at least one of the above areas, with working knowledge across others Proven ability to design, optimise, and operate systems at scale, with a strong understanding of performance trade offs across latency, throughput, cost, and 
model quality Deep understanding of transformer architectures, LLMs, and/or multimodal models, including their behaviour in production systems Strong proficiency in Python and PyTorch, with a track record of building production grade ML systems Experience with distributed compute and training paradigms (e.g., data/model parallelism, sharding, scheduling) Experience working close to the hardware/software boundary, such as: GPU/accelerator optimisation (CUDA, ROCm, or similar) Memory management and system level performance tuning Experience building or operating production inference or training systems at scale Ability to design clean abstractions, APIs, and reusable systems for other engineers Strong engineering fundamentals, with a track record of writing maintainable, well tested, production quality code Preferred Experience developing large scale and high load production systems. Experience working in containerised, distributed environments (e.g., Kubernetes, large scale clusters) Experience contributing to or working with widely used/open source AI frameworks or systems is strongly preferred Hands on experience with advanced inference optimisation techniques, such as KVCache, MoE, adaptive batching, or gradient checkpointing. Knowledge of efficient training and inference evaluation strategies, with demonstrated success in improving model efficiency. At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know. 
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role. For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
20/06/2026
Full time
Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency in an environment where everyone is inspired to do their best work. Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.
About the Role
Nscale is looking for Senior / Staff AI Engineers to join our core AI team and build the systems that power our GenAI cloud platform. This role sits at the heart of our AI services platform, designing and optimising distributed systems that power large-scale training, post-training, evaluation, and low-latency, high-throughput inference under strict performance and efficiency constraints. You may specialise deeply in areas such as inference optimisation, large-scale training, post-training (fine-tuning, alignment), or evaluation systems, or operate across multiple parts of the stack. In all cases, you'll work on hard systems problems at scale, where performance, efficiency, and developer experience are critical. This is a hands-on role for engineers who want to push the boundaries of how AI systems are built, optimised, and consumed by other AI engineers.
Senior AWS Infra Architect - MultiRegion & Kubernetes
Yocto Project
Join Yocto Project as a Staff Infrastructure Engineer in Greater London. In this pivotal role, you'll lead the design and execution of our infrastructure and reliability architecture, leveraging AWS to support a global customer base. Your responsibilities will include architecting AWS-native solutions, enhancing observability tooling, and managing Kubernetes clusters that support enterprise-scale demands. We offer a competitive salary, flexible working arrangements, and unique benefits.
20/06/2026
Full time
Interim Chief Digital and Information Officer
NHS Nottingham, Nottinghamshire
Nottinghamshire Healthcare NHS Foundation Trust
Interim Chief Digital and Information Officer
The closing date is 30 June 2026.
A pivotal senior role at a critical time to shape the future of care through digital, data and technology
Are you ready to lead one of the most exciting NHS digital transformation opportunities? We are seeking an inspirational interim Chief Digital and Information Officer (CDIO) to join our Extended Executive Leadership Team and help deliver the ambitions of the NHS 10 Year Plan and our refreshed Trust Strategy, of which digital innovation is a key component. This is a rare opportunity to shape how digital, data and technology transform care, leading our organisation in moving decisively from analogue to digital.
We are recruiting to this post initially on an interim basis, with a view to permanent recruitment later this year. The successful candidate will need to be available to take up post from September this year.
Why this role matters
This is more than a technology leadership role; it is a transformation role. You will lead our digital, data and information agenda, unlocking innovation to improve patient care, outcomes and staff experience. From maximising our Electronic Patient Record (EPR) systems to embedding AI safely and strengthening cyber resilience, your leadership will be central to our future as a modern NHS organisation.
Main duties of the job
What makes this role special: you will be joining an organisation with a strong sense of ambition, developing partnerships, and a clear commitment to innovation and inclusion. In this role, you will have the opportunity to shape a Trust-wide digital vision that transforms care, influence collaboration and innovation across the wider system, and lead a talented and dedicated workforce. Above all, you will play a key role in delivering meaningful, lasting impact for patients and the communities we serve.
If you are passionate about the power of digital to transform healthcare and are ready to lead at scale, we would love to hear from you.
What you will lead
  • Transforming care through digital - setting a clear direction for digitally enabled service transformation
  • Maximising EPR value - realising the full benefits of our EPR investment
  • Enhancing patient experience - improving digital access and communication
  • Leading innovation safely - supporting responsible adoption of AI technologies
  • Driving efficiency - reshaping teams and services to deliver value in a constrained environment
  • Protecting our infrastructure - ensuring robust information governance and cyber security
  • Strengthening partnerships - working with the IBC Cluster and system partners
About us
Nottinghamshire Healthcare NHS Foundation Trust comprises over 11,000 dedicated colleagues. We deliver intellectual disability, mental health, community health, forensic, and offender healthcare services across Nottinghamshire, Leicestershire, Lincolnshire, and South Yorkshire. Our care is provided from over 200 sites, spanning community locations, acute settings, and secure environments. We are one of the largest mental health and community Trusts in the East Midlands and one of Nottinghamshire's biggest employers. We also host national and regional services, such as the National High Secure Deaf Service and the Nottingham Centre for Transgender Health.
We offer a variety of employee-led staff networks, including Equality, Diversity, and Inclusion (EDI) groups, the Green Champions network, the Freedom to Speak Up network, the Health and Wellbeing Champions network, and the Menopause Champions network. These networks play a vital role in supporting our diverse workforce and promoting a culture of inclusivity. The health and wellbeing of our colleagues is a top priority. We invest significantly in this through our in-house occupational health and staff counselling services, supported by a dedicated Health and Wellbeing team.
The Trust is committed to reducing its carbon emissions, with a specialised Energy and Environmental team working to ensure compliance with environmental legislation, enhance our environmental performance, and achieve our net zero commitment.
Job responsibilities
  • Maximising the benefit of the investment in the Trust's Electronic Patient Record (EPR) systems
  • Improving patient digital communications
  • Supporting the safe implementation of AI technologies
  • Overseeing the move of the Trust's data centre and the re-procurement of our EPR(s)
  • Delivering efficiencies, including restructuring the teams to maximise delivery
  • Setting a clear direction for how digital can support the transformation of care services
  • Automating and improving our data analytical capabilities
  • Maintaining the Trust's information and cyber security
  • Meeting the varying demands of a large, complex organisation with a mix of inpatient and community services spread across a wide geography and with different user requirements
  • Managing a large supplier base, ensuring value for money
  • Working collaboratively with the new IBC Cluster, identifying opportunities for working together
  • Acting as a key leader within the Trust's Extended Executive Leadership Team
  • Meeting ambitious targets to reduce corporate costs
Who we are looking for
We are seeking a credible, values-driven leader with very senior experience in digital, data, or technology within a complex organisation. You will bring a strong track record of delivering large-scale transformation, alongside the ability to balance innovation with safety and operational delivery. The successful candidate will demonstrate excellent influencing skills at senior organisational and system level, supported by sound commercial acumen and proven programme leadership experience. Above all, you will have a deep and authentic commitment to improving patient care while supporting and valuing our workforce.
Please note applicants will be required to pay for their DBS check.
The cost is deducted from salary following appointment: the DBS application costs £26.40 (standard) or £54.40 (enhanced), and this is deducted from your salary over the first two months of employment. You are encouraged to enrol in the DBS Update Service, for which an annual fee of £16 applies.
Person Specification
Qualifications - Academic / Craft / Professional
  • Educated to Master's degree level, or equivalent experience and knowledge in a relevant field.
Training
  • Information Governance training.
  • Data usage training.
  • Project management training.
  • Leadership qualification or training.
Experience
  • Significant experience of development, implementation and maintenance of IT infrastructures, applications, projects and procurement.
  • Significant senior management experience in a complex environment.
  • Experience of managing large-scale projects.
  • Experience of service redesign and business engineering.
  • Experience of leading multidisciplinary teams.
Contractual
  • A full UK driving licence and vehicle for business use is required for this post; reasonable adjustments will be made for disabled individuals in line with the Equality Act 2010.
  • Ability to travel to various Trust sites.
Trust Values
  • All colleagues are expected to demonstrate at interview that they act in line with Nottinghamshire Healthcare NHS Foundation Trust Values: Trust, Honesty, Respect, Compassion, Teamwork.
  • All colleagues are expected to demonstrate an understanding of and commitment to Equality, Diversity and Inclusion (EDI) and how it applies to their role. The Trust's expectations are set out in our EDI Policy and the associated EDI and Human Rights legislation.
Knowledge
  • Expertise in national and European data protection laws and practices, including an in-depth understanding of GDPR.
  • Network and Information Systems (NIS) Regulations 2018.
  • Knowledge of the NHS & Nottinghamshire Healthcare NHS Foundation Trust.
  • Best practice relating to technology and information systems, such as cyber security, information security, data warehousing, information management and records management.
Skills
  • Strong verbal and written communication skills, with the ability to communicate with a wide variety of stakeholders.
  • Development of strategic plans, with strong analytical skills.
  • Excellent leadership and management skills, able to inspire and instil confidence whilst maximising performance.
Disclosure and Barring Service Check
This post is subject to the Rehabilitation of Offenders Act 1974 (Exceptions) Order 1975, and as such it will be necessary for a submission for Disclosure to be made to the Disclosure and Barring Service (formerly known as CRB) to check for any previous criminal convictions.
Nottinghamshire Healthcare NHS Foundation Trust £112,782 to £129,783 per annum (pro rata for part time)
20/06/2026
Full time
Support Engineer - Bristol
XTX Markets London, England, United Kingdom
XTX Markets is a leading algorithmic trading firm which uses state-of-the-art machine learning technology to produce price forecasts for over 50,000 financial instruments across equities, fixed income, currencies, commodities and crypto. It uses those forecasts to trade on exchanges and alternative trading venues, and to offer differentiated liquidity directly to clients worldwide. The firm trades over $250bn a day across 35 countries and has over 250 employees based in London, Singapore, New York, Paris, Bristol, Mumbai, Yerevan and Kajaani. We leverage the talent of the people who work here, modern computational techniques and state-of-the-art research infrastructure to analyse large data sets across markets quickly and efficiently, to maximise the effectiveness of our proprietary trading algorithms. We are actively seeking new methods and ideas. The models that drive our trading strategies have evolved considerably over the last 10 years, from the econometric methods that gave our company its name, to trees, to neural networks, to modern deep learning architectures. XTX Markets has an unrivalled level of computational resources in the trading industry, with a growing research cluster currently containing over 25,000 GPUs with 650 petabytes of usable storage. Teams across the firm include world-class researchers, developers and technologists with backgrounds in pure maths, programming, physics, computer science and machine learning. The firm is also constructing a large-scale data centre in Finland to future-proof its significant computational capabilities. At XTX Markets, technology is our business, and we are a diverse organisation which attracts outstanding talent from across all industry backgrounds. We are focused on teamwork, and our people collaborate on all aspects of the business, working openly and with respect for each other, our clients and the market. Our culture is non-hierarchical and one where everyone is valued.
We strive for excellence in everything we do.
The Role
Support Engineer - End User Computing. The End User Computing team looks after all the technology used directly by people in our organisation, such as the Windows and Linux desktops and Windows server infrastructure (including Active Directory, application deployment, clustered enterprise database systems, monitoring and automated build systems), audio-visual deployments, LAN, internet and Wi-Fi network environments, and a wide range of enterprise applications (including email, third-party productivity tools and finance-specific applications). We have particular expertise in automation (predominantly in PowerShell, but also in Python and other languages). We currently work 4 days a week in the office and 1 from home.
Responsibilities
  • Support our employees in their day-to-day use of the computer systems in our Bristol office, and offer remote support to other European offices as required.
  • Perform day-to-day operations, support and maintenance of our IT platform and cloud applications, including (but not limited to) Microsoft 365, Slack, Active Directory, Jira, the VC estate and other IT-related systems.
  • Co-own and continuously improve the onboarding processes for both users and hardware in the Bristol office.
  • Co-manage IT asset tracking and lifecycle processes (procurement, deployment, refresh and disposal).
  • Contribute to the global Windows platform, supporting standardisation, optimisation and continuous improvement.
  • Diagnose and resolve issues across:
    • Windows desktops and servers.
    • Active Directory and Entra ID.
    • Hardware and peripherals.
  • Develop and maintain automation solutions to improve efficiency and scalability, including:
    • Automated builds and provisioning.
    • Configuration management.
    • Patch management and application deployment.
    • Monitoring, observability and performance tooling.
Once you are fully integrated into the business, you will join an on-call rota that provides IT coverage. This includes occasional weekend support/maintenance tasks, approximately once every 6 weeks, with time off in lieu the following week.
Qualifications
We estimate that you will have at least 5 years' experience in a relevant technical role and many of the following technical skills:
  • Active Directory.
  • Windows 11, Windows Server (2019 / 2022 / 2025).
  • Solid PowerShell scripting skills for automation.
  • Experience with endpoint management tools:
    • SCCM.
    • Microsoft Intune (OS and application deployment/management).
  • Familiarity with Linux command-line environments.
  • Working knowledge of version control systems (e.g., Git).
  • Basic understanding of networking concepts and Cisco CLI commands.
  • High proficiency with Microsoft Office applications.
Due to the nature of the role, you will occasionally be required to move, lift, rack and install equipment (PCs, switches, servers).
Personality and Soft Skills
As this role involves regular interaction with, and support for, our employees, we are seeking an individual who is:
  • Personable and enthusiastic, with a genuine desire to enable staff to take full advantage of their IT systems.
  • Able to explain technical concepts and events to non-technical staff members.
  • Articulate, numerate and capable of negotiating high-value technology equipment and services.
  • Able to reason logically and consistently deliver under pressure.
  • Able to spot new opportunities and follow through with them.
Onboarding Note
To set our new hire up for success, we envisage that you would need to spend significant chunks of time at our London HQ to onboard with our existing team and learn how we approach end user computing. This could be a few months at the beginning, reducing thereafter to less regular trips. Alternatively, this could be a few weeks at the start and regular periods (2 weeks in every 4) thereafter. XTX would cover all associated costs.
19/06/2026
Full time
Staff Infrastructure Engineer (fully remote)
Yocto Project
Overview
We are looking for a Staff Infrastructure Engineer to lead the technical direction and execution of balenaCloud's infrastructure and reliability architecture. As our customer base and device fleets expand globally, we need a dedicated technical lead to drive our transition to multi-region hosting and single-tenant dedicated instances, natively within Amazon Web Services (AWS).

At balena, we don't have traditional managers or hierarchy; we rely on high levels of trust, autonomy, and alignment. You will be joining at Staff Level (Tactical scope / Domain Leader). Given the company strategy (the Why), you define the Tactics and the What, design the How, and heavily participate in the Do.

This role carries a dual leadership mandate. You will operate across both Infrastructure Engineering (planning for immense scale, multi-region hosting, and deep AWS automation) and Reliability Engineering (designing observability tooling, defining operational procedures, and scaling the team's ability to debug and improve the system). Our infrastructure is deeply rooted in AWS, and we need an engineer who can drop in and be highly effective within this ecosystem immediately.

Your Impact (Responsibilities)
As a Staff Level engineer, you are one of the most experienced team members in your domain. You are not a "ticket solver"; you gain significant autonomy but own responsibility for your architectural decisions.
  • AWS-Native Architecture: Architect, automate, and optimize deeply integrated AWS environments. You will leverage the right AWS services to build a system that hosts balenaCloud reliably, delivering maximum performance and deep cost/resource optimization on a per-device basis.
  • Infrastructure & Reliability: Bridge the gap between building for scale and running for stability. You will not only design the infrastructure but also lead the reliability practices for our growing systems, driving continuous improvement, robust feedback loops, and incident resilience.
  • Architect for Massive B2B Scale: Design infrastructure capable of handling enterprise-level loads: billions of requests per week (>30 million/hour) and terabytes of data per day. Your mental model should align with massive B2B platforms rather than B2C media streaming.
  • Multi-Region & Single-Tenant Hosting: Own the technical tactics and execution to deploy single-tenant, single-region balenaCloud instances (e.g., dedicated instances in the EU, Australia, US, or Japan) to satisfy strict customer data sovereignty requirements.
  • Kubernetes at Scale: Architect and manage multiple balenaCloud stacks simultaneously, overseeing the deployment and orchestration of many independent Kubernetes clusters for various customers.
  • Decade-Long Reliability: We are responsible for physical devices in the real world that will stay deployed for decades. Short-term, fragile infrastructure solutions are unacceptable, as they risk leaving devices lost in the field. Your designs and implementations must meet our >10-year durability bar.
  • Team Enablement & Async Collaboration: You will scale your knowledge across an overwhelmed engineering team. You will document, articulate, and demonstrate decision proposals based on objective facts and empirical evidence, minimizing the need for synchronous calls.

Essential Qualifications
  • Experience: A minimum of 6 years of highly relevant professional experience in infrastructure and reliability engineering.
  • Deep AWS Expertise: Proven, hands-on mastery of the AWS ecosystem. You must be able to navigate, architect, and optimize AWS services with immediate effectiveness.
  • Observability & Reliability: Deep understanding of Site Reliability Engineering principles. You have proven experience building highly usable observability tooling, metrics, and monitoring systems from the ground up to support high availability.
  • Exceptional Documentation Skills: Strong, hands-on ability to write clear, actionable, and maintainable technical documentation, scaling plans, and onboarding materials for the team.
  • Distributed Systems: Proven experience in multi-geolocation hosting with distributed data and processing, specifically in multi-tenant SaaS environments.
  • Core Stack & Automation: Deep expertise with Kubernetes deployments at scale, managing massive PostgreSQL / RDS databases, and proven mastery of Infrastructure as Code and infrastructure automation.
  • Scale Testing: Extensive experience in load and scale testing, specifically at magnitudes of 10k-100k simultaneous connections.
  • Remote & Async Communication: Fluent English. Intrinsic motivation to prioritize open, text-based communication in a public knowledge base. You actively work to reduce synchronous call time to respect scarce overlapping hours across global time zones.
  • Abstract Thinking: Ability to identify, research, and advocate for solutions to complicated problems with minimal technical guidance, working from a defined company strategy.

Preferred Skills (Nice to Have)
  • Compliance: Experience deploying solutions into special compliance environments (e.g., federal services, FedRAMP, GovCloud).
  • AWS Certifications: High-level AWS certifications (e.g., AWS Certified Solutions Architect - Professional) are a strong bonus.

What "Staff Level" Means for You
To succeed in this role, you should fit the following profile, based on our internal leveling guide (Tactical level / Domain Leader).
  • Given: The company strategy and environment (e.g., "We need to scale our AWS infrastructure to support dedicated regional hosting to satisfy global data sovereignty laws, while improving overall fleet reliability").
  • You: Define the What and the How (researching AWS networking options, advocating for a specific EKS cluster architecture, writing the scaling plans, observability specs, and IaC), and heavily participate in the Do (hands-on coding and infrastructure provisioning).
  • Enable: You elevate the entire company. You remove systemic friction, prevent architectural dead ends by identifying doomed approaches early, and mentor Domain Contributors. You back up your decision-making with solid reasoning. You execute within architectural decisions that hold up over a 10+ year horizon, and raise flags early when tactics or designs threaten that durability.

Benefits
  • Competitive salary
  • Autonomous vacation allowance
  • 12 weeks of paid parental leave for new parents
  • Equipment of your choice and hardware for side projects
  • Books of your choice to help you in your work
  • Annual company gathering in an international location (Balena Summit 2024)
  • Working with a talented and globally distributed team
  • Flexible schedules by default
18/06/2026
Full time
Member of Technical Staff (AI Infrastructure Engineer)
Pantera Capital
Location: London
Employment Type: Full time
Location Type: Hybrid
Department: AI

We are looking for an AI Infrastructure Engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, and PyTorch, primarily on AWS. As an AI Infrastructure Engineer, you will partner closely with our Inference and Research teams to build, deploy, and optimize our large-scale AI training and inference clusters.

Responsibilities
  • Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training workloads
  • Manage and optimize Slurm-based HPC environments for distributed training of large language models
  • Develop robust APIs and orchestration systems for both training pipelines and inference services
  • Implement resource scheduling and job management systems across heterogeneous compute environments
  • Benchmark system performance, diagnose bottlenecks, and implement improvements across both training and inference infrastructure
  • Build monitoring, alerting, and observability solutions tailored to ML workloads running on Kubernetes and Slurm
  • Respond swiftly to system outages and collaborate across teams to maintain high uptime for critical training runs and inference services
  • Optimize cluster utilization and implement autoscaling strategies for dynamic workload demands

Qualifications
  • Strong expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management
  • Hands-on experience with Slurm workload management, including job scheduling, resource allocation, and cluster optimization
  • Experience deploying and managing distributed training systems at scale
  • Deep understanding of container orchestration and distributed systems architecture
  • High-level familiarity with LLM architecture and training processes (Multi-Head Attention, Multi-Query/Grouped-Query Attention, distributed training strategies)
  • Experience managing GPU clusters and optimizing compute resource utilization

Required Skills
  • Expert-level Kubernetes administration and YAML configuration management
  • Proficiency with Slurm job scheduling, resource management, and cluster configuration
  • Python and C++ programming with a focus on systems and infrastructure automation
  • Hands-on experience with ML frameworks such as PyTorch in distributed training contexts
  • Strong understanding of networking, storage, and compute resource management for ML workloads
  • Experience developing APIs and managing distributed systems for both batch and real-time workloads
  • Solid debugging and monitoring skills, with expertise in observability tools for containerized environments

Preferred Skills
  • Experience with Kubernetes operators and custom controllers for ML workloads
  • Advanced Slurm administration, including multi-cluster federation and advanced scheduling policies
  • Familiarity with GPU cluster management and CUDA optimization
  • Experience with other ML frameworks such as TensorFlow, or with distributed training libraries
  • Background in HPC environments, parallel computing, and high-performance networking
  • Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices
  • Experience with container registries, image optimization, and multi-stage builds for ML workloads

Required Experience
  • Demonstrated experience managing large-scale Kubernetes deployments in production environments
  • Proven track record with Slurm cluster administration and HPC workload management
  • Previous roles in SRE, DevOps, or Platform Engineering with a focus on ML infrastructure
  • Experience supporting both long-running training jobs and high-availability inference services
  • Ideally, 3-5 years of relevant experience in ML systems deployment, with a specific focus on cluster orchestration and resource management
16/06/2026
Full time

© 2008-2026 IT Job Board