Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility. At Nscale, our Support and Operations team plays a critical role in maintaining service availability, driving service reliability, and responding rapidly to customer tickets.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role (Job Purpose)
We're looking for an Engineer with strong people, leadership, and technical skills: a technical expert responsible for ensuring the efficiency, reliability, and scalability of data centre infrastructure.
You're comfortable solving problems and making decisions on complex topics with high levels of ambiguity in a results-driven environment.
You're comfortable influencing without authority and exceptional at building relationships with senior stakeholders across the business to get things done.
You have the understanding and skillset to grasp technical concepts and problems quickly.
You have strong analytical skills.
You're a doer who is extremely organised and diligent.
You're a self-starter, curious, and quick to learn, knowing what questions to ask to get up to speed quickly.

What You'll be Doing (Responsibilities)
You'll join the Support duty rotation and, as a Senior, will collaborate with Engineering on incidents and changes.
Proactively improve dashboards, alerts, and runbooks to prevent repeat incidents.
Contribute to knowledge sharing across Operations and Engineering, including training content, workshops, and PR reviews.
Drive to upskill - better the team and yourself.
Accurately record, update, manage, and resolve tickets using the call tracking system whilst keeping all parties (internal or external) informed of the ticket's progression via phone and email.
Demonstrate a solid understanding of the underlying platform to our customers and help them leverage the service and products.
Respond to incoming monitoring alerts, resolving or escalating as required in accordance with priorities and agreed service levels.
Take decisive actions, and calculated risks, on technically complex incidents and tasks to ensure business speed and efficiency.
Lead by earning trust, speaking candidly, and benchmarking against the best to identify where we can improve.
Disagree when appropriate and challenge the status quo. Commit wholly to decisions and plans once in motion.
Be a technical expert, and drive the team to make the best decisions.
Deliver project tasks, improvements, and technical assessments to the right quality and in a timely fashion.
Handle escalated customer support issues, providing solutions aligned with business SLA requirements.
Design and implement automation scripts and tools to optimise processes.
Conduct root cause analysis for major incidents and recommend long-term fixes.
Collaborate with cross-functional teams for service improvements.
Respond to critical incidents outside business hours, and be on call as required.

About You (Skills / Qualifications)
Ability to adapt to customer-driven demands, such as providing specialist support after core business hours, with availability to travel to Nscale or customer locations to provide onsite technical expertise and guidance.
Disciplined, organised, and self-motivated.
Able to motivate, support, and mentor other team members.
Strong leadership principles, with a bias for taking decisive action, working independently, and driving the team and wider organisation to improve.
Understanding of how data centres operate and the core data centre technologies: servers, networks, storage, and virtualisation, ideally gained through an operational support background.
Good organisational and time management skills, with strong interpersonal skills, able to deal effectively with people at all levels whilst also having good written and verbal communication skills.
Linux systems engineering at scale. Strong command of modern Linux distributions, kernel modules, systemd, the networking stack, and filesystem tooling. Proven troubleshooting across compute, storage, and network layers in production.
Kubernetes. Operate and troubleshoot K8s clusters, and understand how physical resources are abstracted up the stack to K8s.
GPU platforms (NVIDIA and AMD). Practical experience with GPU drivers and GPU log investigation tools, e.g. nvidia-smi. Performance diagnostics using NCCL on large-scale clusters.
Observability and incident response. Build and use alerting stacks and dashboards, interpret metrics and alerts, and drive runbooks to resolution; contribute to SLOs and post-incident reviews.
Strong networking fundamentals. Solid grasp of L2/L3, routing, BGP, VLANs, VXLAN, firewalls, and load balancing. Understanding of high-performance fabrics (RDMA/NVLink basics) for cluster-to-cluster traffic.
SRE-style operations. Write and maintain runbooks, automate diagnostics, and reduce human intervention using scripts or small tools.
Cloud infrastructure administration and troubleshooting. Strong familiarity with virtualisation technologies, and with investigating the issues that arise through deep-dive root cause analysis. OpenStack operations experience preferred.

Nice to Have
Automated network configuration. Experience automating network deployment configurations and making safe, repeatable changes in business-critical environments.
Strong GPU HPC concepts, beyond standard troubleshooting. Familiarity and prior experience with RDMA/InfiniBand and performance tuning for distributed workloads. Support scheduling for large multi-GPU jobs, containers via Pyxis/Enroot, and MPI. Diagnose queue, topology, and job failures.
GitOps tooling and cluster/app automation pipelines. Build and maintain CI/CD pipelines. Re-architect old scripts to use GitHub Actions.

What We Can Offer You
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
Highly competitive package (base + equity) with reviews every 12 months.
Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
03/02/2026
Full time
A tech company is seeking an Infrastructure Support Manager to lead global datacenter operations. The role includes managing a team for monitoring and troubleshooting critical GPU and storage systems. Ideal candidates will have experience in datacenter infrastructure support and excellent leadership skills. The position promotes a collaborative work environment with opportunities for personal and professional growth, flexible work arrangements, and competitive compensation.
03/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility. At Nscale, our Support and Operations team plays a critical role in maintaining service availability, driving service reliability, and responding rapidly to customer tickets globally.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role (Job Purpose)
Nscale is seeking an Infrastructure Support Manager to lead the daily operations and support of our global datacenter infrastructure. This role will manage a team of engineers providing monitoring, troubleshooting, and incident response for mission-critical GPU, networking, and storage systems across multiple datacenters. You will ensure that incidents are resolved quickly, infrastructure health is continuously monitored, and support processes are followed consistently. This leadership role is key to guaranteeing operational excellence and reliability across Nscale's datacenter footprint.

What You'll be Doing (Responsibilities)
Lead and manage a team of infrastructure support engineers across global datacenter sites.
Oversee daily monitoring and support of GPU, networking, and storage systems.
Ensure rapid and effective incident response, escalation, and resolution.
Develop and maintain support processes, runbooks, and escalation procedures.
Collaborate with engineering, buildout, and operations teams to improve reliability and reduce recurring issues.
Conduct root cause analysis and implement preventative measures for critical incidents.
Track and report on support metrics (SLAs, uptime, MTTR, incident volume) to leadership.
Drive adoption of monitoring, observability, and automation tools across the team.
Mentor and develop team members, fostering a culture of operational excellence.
Participate in the on-call rotation and ensure adequate coverage across regions.

About You (Skills / Qualifications)
Proven experience in datacenter infrastructure support or operations management.
Strong technical knowledge of servers, GPUs, networking, and storage systems.
Solid understanding of monitoring and observability practices and tools (e.g., Prometheus, Grafana, Datadog).
Experience leading support teams in mission-critical 24/7 environments.
Excellent troubleshooting and problem-solving skills with a focus on root cause analysis.
Familiarity with ITIL or other support frameworks for incident, problem, and change management.
Strong leadership, communication, and coaching skills with the ability to manage global teams.
Ability to collaborate across engineering, operations, and vendor partners.

Nice to have:
Experience in AI/ML or high-performance computing infrastructure support.
Knowledge of GPU orchestration and containerized environments (e.g., Kubernetes).
Familiarity with automation and Infrastructure as Code (Terraform, Ansible, Pulumi).
Exposure to sustainability and datacenter energy efficiency practices.

What We Can Offer You
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
Highly competitive package (base + equity) with reviews every 12 months.
Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
03/02/2026
Full time
A leading AI GPU cloud company is seeking an Engineer to ensure the efficiency and scalability of their data centre infrastructure. The role involves collaborating with cross-functional teams, managing support tickets, and responding to critical incidents. The ideal candidate will have strong technical skills in Linux, Kubernetes, networking, and GPU platforms, along with excellent leadership and organizational abilities. This position offers competitive compensation, a dynamic work environment, and opportunities for professional growth.
03/02/2026
Full time
Nscale is taking on the hyperscalers by building a vertically integrated cloud built for AI. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.

About the Role
As a Network Deployment Engineer, you will be on the ground in datacentres, installing, commissioning, and validating the networks that power Nscale's GPU clusters. You'll work directly with network gear, cabling, and racking, ensuring new deployments are delivered to specification and fully operational. This is a hands-on role, working with switches, routers, fibre/ethernet cabling, and test equipment. You'll bring up links, verify connectivity, and hand off to network engineering teams once platforms are live. This role requires travel to DC locations 50%-70% of the time.

What you'll be doing
Install, rack, cable, and patch switches, routers, and optics for new network deployments.
Perform on-site validation and acceptance testing of deployed hardware (link tests, optics checks, interface bring-up).
Execute deployment runbooks for new datacentre builds and expansions.
Label, document, and maintain structured cabling standards across sites.
Work with network engineers to configure devices using provided templates or scripts.
Assist with fabric validation (Ethernet, InfiniBand, RDMA) for GPU cluster connectivity.
Maintain accurate records of hardware installs, cabling maps, and deployment checklists.
Coordinate with vendors and logistics teams for hardware deliveries and replacements.
Support capacity expansion projects and scaling activities across global datacentre sites.

About You
2-4 years of experience in datacentre network deployment, field engineering, or infrastructure installation.
Comfortable with racking and stacking network gear, structured cabling, and on-site troubleshooting.
Familiar with network hardware (Cisco, Arista, Juniper, Mellanox/NVIDIA, etc.).
Understanding of fibre/ethernet cabling types, optics, and patching standards.
Basic knowledge of IP networking (VLANs, trunks, link aggregation).
Experience following deployment runbooks and MOPs (Methods of Procedure).
Strong focus on safety, organisation, and documentation.
Able to work independently in datacentres and travel internationally when required.

Nice to Have
Prior experience with HPC or GPU-dense datacentre deployments.
Exposure to automation (Python, Ansible) for device staging/config.
Knowledge of datacentre standards (TIA/EIA cabling, hot/cold aisle, structured labelling).

In all we do, our core values guide us:

Relentless Innovation
At Nscale, we constantly push the boundaries of innovation, embracing creative risks to shape the future. Our aim is to deliver products that not only meet but exceed today's expectations, setting new standards for tomorrow.

Ownership and Accountability
Every Nscaler is fully accountable for their work, driving it with excellence and urgency. We set high standards, ensuring that our contributions are not just good but exceptional.

Openness and Transparency
We believe trust and transparency are key to our success. We maintain open communication within our teams and with stakeholders, sharing both successes and challenges. Our open source approach allows customers to explore our technology, building trust and ensuring our solutions are innovative, secure, and reliable.

Customer-Centric Focus
Our customers are central to our mission, and we are committed to delivering impactful solutions that drive real-world success. We focus on deeply understanding their needs and challenges, striving to exceed expectations in both product quality and service.

Sustainability
We are dedicated to considering the long-term environmental and societal impacts of our technologies. By integrating sustainability into our operations and product development, we ensure our innovations are both effective and responsible, contributing positively to the world around us.

Collaboration at Nscale is fast, efficient, and respectful. We work together seamlessly, with clear communication and mutual respect, ensuring our shared goals are met with high standards and impactful outcomes.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
03/02/2026
Full time
About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

Business Operations Team

At Nscale, our Business Operations team builds the systems, processes, and insights that drive execution and decision-making across the company. We create clarity, remove blockers, and ensure teams have the tools, data, and support they need to move quickly and confidently.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role

We're seeking a Business Systems Analyst to bridge process design with the systems our IT team manages. You'll translate business requirements into configurations, integrations, and automations that make work simpler and data cleaner. Partnering with BizOps, IT, and domain teams, you'll map data flows, define source-of-truth rules, and ensure our Lead to Load (L2L) lifecycle is reflected in our tooling. If you enjoy connecting dots across apps, APIs, and users, and you care about reliability, data integrity, and UX, this role gives you scope to shape Nscale's operating backbone.

What You'll be Doing

Requirements & Process Design
Facilitate discovery to capture business needs and edge cases
Convert requirements into specs, user stories, and acceptance criteria
Ensure workflows align to L2L process design and controls

Systems Configuration & Integrations
Configure SaaS platforms; manage fields, objects, and permissions
Design and maintain integrations (native, iPaaS, API-based)
Implement automations to reduce manual steps and errors

Data Governance & Reliability
Define data contracts, definitions, and lineage across systems
Establish validation rules, dedupe logic, and audit trails
Monitor sync health; triage breakages and coordinate fixes

Change Management & Enablement
Partner with IT on release planning, testing, and deployment
Maintain user guides and run training for new capabilities
Capture feedback and iterate for usability and adoption

About You

Obsessed with process and a systems problem solver
Fluent in data models, integrations, and permission concepts
Comfortable writing user stories and acceptance tests
Strong communicator who can talk to both engineers and operators
Quality-minded; you design for reliability and maintainability
Pragmatic; you know when to buy, build, or simplify
Curious and proactive; you chase root causes, not symptoms
Thrive in fast-changing environments with evolving requirements

Qualifications & Experience

3-5 years in business systems, solutions analysis, or similar
Hands-on experience with SaaS configuration and iPaaS (e.g., Zapier/Make)
Familiarity with APIs, JSON, webhooks, and basic scripting/SQL
Proven track record delivering integrations and data quality improvements
Experience partnering with IT/InfoSec on access and change control

What We Can Offer You

A collaborative, supportive, and innovative environment where your contributions will make a real impact.
A competitive compensation package (base + equity) with reviews every 12 months.
Work at one of the fastest-growing tech startups, backed by top PE/VC firms.
A clear progression plan. We want you to keep growing. That means trying new things, leading others, challenging the status quo, and owning your impact. Always with our complete support.
Flexibility: We see you as individuals first, employees second. This approach includes all the expected perks but goes beyond that to offer true flexibility. We're proud to be a workplace that trusts our Nscalers to excel in their roles while giving you the freedom to shape your day.
Remote-first: Join our remote-first team and enjoy the flexibility of remote work, allowing you to create a productive and balanced work-life setup while staying connected with your global team.

At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.
03/02/2026
Full time
A leading GPU cloud provider is seeking a Senior Cloud Native Platform Engineer to build and evolve their Kubernetes-native control plane managing GPU-backed infrastructure. This role focuses on extending Kubernetes through custom APIs and controllers, requiring production-grade Go programming and strong Linux fundamentals. The ideal candidate thrives in a collaborative and remote-first environment, contributing to cutting-edge AI technology. Opportunities for dynamic progression and a highly competitive package await the right individual.
03/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

At Nscale, our Engineering team plays a critical role in designing, building, and operating the platforms that power our GPU cloud.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role

The Senior Cloud Native Platform Engineer plays a key role in building and evolving Nscale's Kubernetes-native control plane that manages GPU-backed infrastructure for AI workloads at scale. This is a deeply technical, hands-on role focused on extending Kubernetes through custom APIs and controllers, rather than configuring off-the-shelf components. You will work on production-grade control plane software that encodes infrastructure, scheduling, and policy into Kubernetes-native abstractions, and exposes APIs for both internal and external consumers. The role sits between mid-level and principal engineers, combining strong individual technical delivery with growing architectural ownership and influence.

What You'll be Doing (Responsibilities)

Design and implement Kubernetes-native APIs using Custom Resource Definitions (CRDs) that model real infrastructure and platform concepts.
Build and operate custom controllers that reconcile desired state safely and reliably in production environments.
Work deeply within the Kubernetes control loop, including: informers, caches, and leader election; ownership, finalizers, and garbage collection; failure handling, retries, and idempotency.
Write Go as your primary development language, producing maintainable, testable, and production-ready code.
Debug complex issues that span multiple layers, including: Linux kernel, cgroups, and device access; networking, scheduling, and resource isolation.
Integrate with external infrastructure management systems and APIs.
Build systems that are observable, supportable, and safe to operate at scale.
Contribute to design discussions, code reviews, and technical standards within the platform engineering team.
Act as a technical mentor to mid-level engineers, supporting skill development and best practices.

About You (Skills / Qualifications)

Strong, hands-on experience extending Kubernetes through custom controllers and CRDs in production environments.
Proven experience writing production-grade Go for backend or control plane systems.
Solid Linux fundamentals, including processes, memory management, filesystems, and cgroups.
Good understanding of networking fundamentals, including TCP/IP, DNS, routing, and overlay networks.
Experience building, operating, or debugging distributed systems in production.
Comfortable working across system boundaries and reasoning about failure modes and operational behaviour.
Able to collaborate effectively across teams and influence technical outcomes through clear communication and quality engineering.

What We Can Offer You

At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.

Highly competitive package (base + equity) with reviews every 12 months.
Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
03/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

At Nscale, our Deployment team plays a critical role in driving the delivery of our GPU infrastructure into our DCs.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role

We're looking for a Datacenter Deployment Engineer. This role will be responsible for planning and organising physical deployments; you will be engaged in the design and layout of large-scale GPU infrastructure projects. This will range from ensuring BOMs are complete and compatible, through to laying out the physical hardware in racks, cabling, and everything in between.

What You'll be Doing

Layout of IT equipment inside the datacenters
Taking BOMs and ensuring all hardware fits into available space, applying best practice to ensure day-two maintainability
Ensuring datacenter environment maximums are adhered to: power, cooling, and weight limits
Reviewing BOMs for interoperability
Ensuring optics/AOC/DAC are fit for purpose
Ensuring BOMs meet customer specifications
Defining and ensuring deployment standards
Cable pathing, length estimation, and ordering
Working with engineering teams to ensure consistent deployments
Configuring firewalls/routers for initial use
Working on initial switch base configurations

About You (Skills / Qualifications / Experience)

Advanced knowledge of structured cabling
Advanced knowledge of fibre cabling
Strong working knowledge of network/server hardware
Experience creating/reviewing BOMs
Knowledge of CMDB tooling
Strong attention to detail
Strong communication skills, both written and verbal
Self-starter with a "see a problem, fix a problem" mentality
Experience of designing and implementing process

Nice to Have

Python
Ansible
Working knowledge and experience of using InfiniBand fabrics
Working knowledge of fat-tree or rail-optimised designs for AI workloads
Ability to perform performance-level diagnostics on AI fabrics

What We Can Offer You

At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.

Highly competitive package (base + equity) with reviews every 12 months.
Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
Human-First Flexibility: We treat you as humans first. Our flexible workplace is built on trust, giving you the autonomy to shape your day around life's moments.
Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
03/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility. At Nscale, our Deployment team plays a critical role in driving the delivery of our GPU infrastructure into our DCs.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role
We're looking for a Datacenter Deployment Engineer. This role is responsible for planning and organising physical deployments; you will be engaged in the design and layout of large-scale GPU infrastructure projects. The work ranges from ensuring BOMs are complete and compatible through to laying out the physical hardware in racks, cabling, and everything in between.

What You'll be Doing
Layout of IT equipment inside the datacenters
Taking BOMs and ensuring all hardware fits into available space, applying best practice to ensure day-two maintainability
Ensuring datacenter environment maximums are adhered to: power, cooling, weight limits
Reviewing BOMs for interoperability
Ensuring optics/AOC/DAC are fit for purpose
Ensuring BOMs meet customer specifications
Defining and ensuring deployment standards
Cable pathing, length estimation and ordering
Working with engineering teams to ensure consistent deployments
Configuring firewalls/routers for initial use
Working on initial switch base configurations

About You (Skills / Qualifications / Experience)
Advanced knowledge of structured cabling
Advanced knowledge of fibre cabling
Strong working knowledge of network/server hardware
Experience creating/reviewing BOMs
Knowledge of CMDB tooling
Strong attention to detail
Strong communication skills, both written and verbal
Self-starter with a "see a problem, fix a problem" mentality
Experience of designing and implementing processes

Nice to Have
Python
Ansible
Working knowledge and experience of using InfiniBand fabrics
Working knowledge of fat-tree or rail-optimised designs for AI workloads
Ability to perform performance-level diagnostics on AI fabrics

What we can Offer You
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
Highly competitive package (base + equity) with reviews every 12 months.
Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
Human First Flexibility: We treat you as humans first. We provide flexible workplace trust, giving you the autonomy to shape your day around life's moments.
Join our thriving remote-first team. Geography is no barrier to impact or connection.
We build seamless virtual collaboration, empowering you, wherever you work. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know. The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role. For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
A leading GPU cloud service provider in the United Kingdom is seeking a Datacenter Deployment Engineer to plan and organize physical deployments of GPU infrastructure. This role involves collaborating with tech teams to design layouts, manage BOMs, and ensure installations meet specifications. Ideal candidates should have advanced knowledge of cabling, strong hardware expertise, and a proactive problem-solving approach. This role offers a competitive package in a remote-first working culture.
03/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We offer high-performance, cost-efficient infrastructure designed for modern AI workloads, blending the power of bespoke supercomputers with the flexibility of cloud services. Our vertically integrated platform spans GPU-dense, energy-efficient data centres through Kubernetes and Slurm orchestration to AI-ready services.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About the Role (Job Purpose)
As an Observability Platform Engineer, you will design, build, and manage the systems that surface deep visibility into Nscale's infrastructure and AI workloads. You'll treat observability as a product, partnering with engineering and SRE teams to ensure our monitoring, logging, tracing, and alerting platforms are robust, scalable, and easy to use. This role requires hands-on engineering experience combined with empathy for how other teams consume observability data. You'll ensure infrastructure health, reliability, and performance by enabling proactive insights and reducing operational friction.

What You'll Do
Design, build, and support scalable observability infrastructure (metrics, logs, traces, alerts).
Collaborate with internal teams to embed observability as a seamless product across GPU clusters, Kubernetes, Slurm, and AI services.
Implement and refine monitoring and alerting patterns to strengthen system reliability and a culture of reliability.
Maintain production and pre-production observability clusters and help others adopt best practices.
Automate observability pipelines using IaC tools and scripting for repeatability and consistency.
Troubleshoot observability platform issues and support incident remediation efforts.
Serve as an advocate for observability best practices, training teams on effective usage and instrumentation.

About You (Skills / Experience)
2-5 years of experience in Software Engineering, SRE, DevOps, or observability-related roles.
Proficiency in at least one scripting or programming language (Python, Go, Bash).
Experience with Kubernetes or containerised environments.
Familiarity with on-call responsibilities, triaging, and escalating live production issues.
Comfortable with observability tooling: Grafana, Prometheus, Loki, OpenTelemetry, ClickHouse, Elastic, Thanos, VictoriaMetrics, etc.
Strong communication and collaboration skills, able to empathise with users of observability systems and translate needs into solutions.

Preferred
Hands-on experience operating observability infrastructure at scale.
Knowledge of Infrastructure as Code (e.g. Terraform) to automate deployments.
Exposure to streaming systems or pipelines for observability data.

In all we do, our core values guide us:

Relentless Innovation
At Nscale, we constantly push the boundaries of innovation, embracing creative risks to shape the future. Our aim is to deliver products that not only meet but exceed today's expectations, setting new standards for tomorrow.

Ownership and Accountability
Every Nscaler is fully accountable for their work, driving it with excellence and urgency. We set high standards, ensuring that our contributions are not just good but exceptional.

Openness and Transparency
We believe trust and transparency are key to our success. We maintain open communication within our teams and with stakeholders, sharing both successes and challenges. Our open-source approach allows customers to explore our technology, building trust and ensuring our solutions are innovative, secure, and reliable.

Customer-Centric Focus
Our customers are central to our mission, and we are committed to delivering impactful solutions that drive real-world success. We focus on deeply understanding their needs and challenges, striving to exceed expectations in both product quality and service.

Sustainability
We are dedicated to considering the long-term environmental and societal impacts of our technologies. By integrating sustainability into our operations and product development, we ensure that our innovations are both effective and responsible, contributing positively to the world around us.

Collaboration
Collaboration at Nscale is fast, efficient, and respectful. We work together seamlessly, with clear communication and mutual respect, ensuring our shared goals are met with high standards and impactful outcomes.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know. The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
02/02/2026
Full time
Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform that spans from sustainable data centres to advanced AI infrastructure and enterprise applications. We're shaping the next generation of AI-native computing - secure, efficient, and transparent.

Our culture is built on relentless innovation, accountability, and excellence. As an Nscaler, you'll join a team that values open collaboration, speed, and respect. We encourage bold thinking and trust every individual to take ownership and deliver impact - together.

About the Role
Nscale is seeking a Principal AI Engineer to lead the design, implementation, and strategic direction of AI systems powering our GenAI cloud. This is a pivotal role that combines deep hands-on expertise with visionary, system-level leadership. You will be responsible for steering Nscale's AI strategy, identifying new ways to harness AI for efficient, effective, and collaborative execution, and aligning technical progress with business priorities and available resources. You'll operate at the intersection of engineering, strategy, and execution - ensuring that innovation translates into scalable, secure, and valuable outcomes.

Responsibilities
Lead and shape the long-term AI engineering vision, defining architecture, frameworks, and standards for how AI is built, deployed, and governed at Nscale.
See the big picture - apply system-level thinking to design coherent AI ecosystems that connect infrastructure, data, and product layers.
Translate strategy into action by identifying key milestones, dependencies, and capability development paths aligned with business priorities and resourcing realities.
Steer AI direction and create new methodologies to harness AI for automation, intelligent decision-making, and cross-functional collaboration.
Architect and oversee large-scale, distributed systems for model training, fine-tuning, inference, and integration across multimodal and LLM-based architectures.
Champion security, IAM, and data governance, embedding compliance and trust into every layer of the AI stack.
Collaborate across disciplines - partnering with Data, DevOps, and Security teams to ensure observability, scalability, and operational resilience.
Drive operational excellence through automation, telemetry, and continuous improvement, fostering a DevOps mindset and data-driven culture.
Mentor and guide senior engineers and cross-functional teams, promoting engineering rigor, design thinking, and a culture of excellence.
Evaluate and integrate models and AI providers (OpenAI, Anthropic, open-source frameworks, etc.) while optimising for performance, reliability, and cost.
Influence company-wide strategy, helping executives and technical leaders make informed trade-offs in capability development and delivery sequencing.

Requirements
15+ years of experience in software, data, or AI engineering, including extensive hands-on experience designing and deploying production-scale AI systems.
Proven success leading and delivering AI initiatives that bridge innovation and pragmatic business impact.
Deep understanding of transformer architectures, LLMs, and AI agent frameworks, and practical experience orchestrating them in enterprise-grade systems.
Proficiency in Python, PyTorch, and modern MLOps/AIOps ecosystems (e.g., LangChain, Ray, Kubeflow, MLflow, Hugging Face).
Strong foundation in data management, IAM, governance, and security, with experience embedding these principles into AI lifecycle workflows.
Expertise in distributed systems, Kubernetes, and GPU orchestration at scale.
Ability to connect engineering initiatives to business outcomes, translating complex AI concepts into actionable strategic roadmaps.
Demonstrated systems-level thinking - designing architectures that are scalable, interoperable, and measurable.
Commitment to observability, metrics, and data-driven decision-making to guide prioritisation and continuous improvement.
Excellent communicator and collaborator, able to influence across technical and executive teams.

Preferred Qualifications
Experience as a principal, architect, or head of AI in a complex or fast-scaling environment.
Hands-on work with multi-agent orchestration, RAG pipelines, or enterprise-scale AI automation frameworks.
Contributions to open-source AI projects or thought leadership in AI system design and governance.
Deep familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) and ML observability frameworks.
Track record of designing AI capability roadmaps that balance innovation, security, and sustainability.
Expertise in AI governance, trust and risk frameworks, or policy-aligned AI deployment.

Why Nscale
Lead a world-class AI engineering team tackling the hardest problems in modern AI infrastructure.
Shape how enterprises securely and efficiently adopt and scale generative AI.
Influence the direction of Nscale's AI ecosystem - from vision to capability development to delivery.
Collaborate with some of the brightest minds across AI infrastructure, systems, and applied research.
Competitive compensation, equity, and a culture of autonomy, trust, and excellence.

At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know. The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position.
The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
02/02/2026
Full time
A leading GPU cloud provider in the United Kingdom seeks a Senior Project Manager to lead complex global projects in AI-focused data center infrastructure. The role requires extensive experience in data center operations and project management to ensure timely execution of large-scale initiatives while maintaining high-quality standards. Applicants should possess strong organizational skills and the ability to coordinate across multiple teams. Join an innovative workplace that values diverse backgrounds and drives technological advancements.
02/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

Nscale is seeking a Senior Project Manager to lead complex global projects across datacenter infrastructure buildouts and operations. You will be responsible for planning, execution, and delivery of large-scale infrastructure initiatives, coordinating across multiple teams, and ensuring projects are completed on time, within scope, and aligned to business priorities. This role is critical to ensuring that Nscale's global expansion and AI datacenter delivery remain on track with strategic objectives. The ideal candidate will have hands-on experience with data center deployments, excellent organizational skills, and the ability to coordinate cross-functional teams to ensure on-time delivery of high-quality infrastructure projects.

What You'll be Doing
Execute and manage individual GPU data center deployment projects from initiation to completion
Create detailed project plans, schedules, and resource allocations for assigned projects
Coordinate the activities of engineering teams, contractors, and vendors during deployments
Track project progress, identify potential issues, and implement corrective actions as needed
Manage the physical installation of racks, power distribution units, servers, network equipment, and GPU clusters
Coordinate structured cabling installations and verify proper implementation
Ensure all equipment is installed according to Nscale standards and vendor specifications
Manage project documentation, including installation guides, as-built records, and handover documentation
Coordinate site access, delivery logistics, and on-site resources
Conduct regular project status meetings and provide updates to the Program Manager
Monitor project budgets and track expenses against forecasts
Implement standardized deployment processes and contribute to process improvements
Verify that installations meet power, cooling, and weight requirements of the data center facilities
Conduct quality assurance checks throughout the deployment process
Facilitate the handover of completed deployments to operations teams

Key Objectives
Successfully deliver assigned GPU infrastructure deployment projects on time and within budget
Implement standardized deployment processes to ensure consistency and quality
Reduce deployment cycle times and improve efficiency
Ensure all installed infrastructure meets technical specifications and performance requirements
Maintain accurate and complete project documentation
Build strong working relationships with vendors, contractors, and internal teams
Identify and implement process improvements based on project experiences
Support the overall program goals established by the Program Manager

Key Performance Indicators (KPIs)
On-time completion rate of assigned projects
Adherence to project budgets
Speed of deployment (racks/GPUs deployed per week)
Quality metrics
  Number of post-installation issues identified
  Adherence to installation standards and procedures
Process metrics
  Documentation quality and completeness
  Time from equipment delivery to operational status
  Efficiency of resource utilization
Internal team feedback
Vendor/contractor relationship management
Effective communication and reporting

About You (Skills / Qualifications / Experience)
3+ years of experience in data center operations, IT infrastructure deployment, or related fields
2+ years of project management experience
Demonstrated experience with data center hardware installations and structured cabling
Knowledge of power distribution, cooling systems, and rack configurations in data centers
Understanding of server, network, and GPU hardware components and configurations
Experience coordinating the activities of technical teams and contractors
Familiarity with data center infrastructure management tools
Strong documentation and technical writing skills
Excellent organizational and time management abilities
Strong problem-solving skills and ability to adapt to changing requirements
Detail-oriented with a focus on quality and precision
Proficiency with project management and collaboration tools

Nice to Have
Experience specifically with GPU clusters or high-density computing environments
Knowledge of InfiniBand and high-performance network fabrics
Familiarity with NetBox or similar CMDB tools
Experience creating and reviewing Bills of Materials (BOMs)
Understanding of network configurations for AI workloads
Senior-level experience

Mid-Senior level

At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities.
We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
02/02/2026
Full time
A leading AI cloud platform provider in Greater London is seeking a Principal AI Engineer to spearhead the design and implementation of advanced AI systems. This pivotal role focuses on developing a strategic AI vision, creating coherent ecosystems, and mentoring senior engineers. Candidates should possess extensive experience in AI engineering, with expertise in transformer architectures and a strong command of Python. The ideal applicant thrives in a collaborative environment that fosters innovation and excellence.
02/02/2026
Full time
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility. At Nscale, our software engineers form the backbone of our product offering. We build state-of-the-art AI products that allow our clients to move quickly in an increasingly competitive digital landscape. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

About The Role (Job Purpose)
We're looking for an elite Senior Software Engineer to join our Product Engineering team and help grow the core infrastructure that powers Nscale's AI cloud platform. You'll be working on systems that enable customers to train, fine-tune, and deploy AI models at scale. This role spans platform and product engineering, contributing to customer-facing features as well as the underlying systems that support them. You'll collaborate closely with cross-functional teams, help shape technical direction, and write high-quality, well-tested code in a fast-moving, high-growth environment, within a team that supports individuality.

What You'll be Doing
- Design and develop scalable backend services, primarily using Go running on Kubernetes.
- Build AI services alongside cloud services to support the needs of our clients.
- Maintain and build upon existing service code.
- Implement event-driven architectures using messaging systems (NATS preferred) to enable loosely coupled, resilient services.
- Write clean, well-documented code with comprehensive test coverage.
- Participate in code reviews, architectural discussions, and technical design sessions.
- Contribute to CI/CD pipelines and deployment automation.

About You (Skills / Qualifications)
You're passionate about building scalable distributed systems and thrive in a fast-paced environment. You have professional experience with Go and building services on Kubernetes, with a solid understanding of cloud-native architectures. You're comfortable designing and implementing event-driven systems. You take ownership of your work, writing clean, well-tested code and taking features from design through to production. You're collaborative, communicating effectively with cross-functional teams and contributing to architectural decisions. You have good experience working alongside coding agents and know how to best leverage this technology.
- 5+ years of professional software engineering experience.
- Demonstrable skill in leveraging agentic AI throughout the SDLC to build complex systems (using e.g. Claude Code or Cursor alongside business context and requirements).
- Experience building for and operating services on Kubernetes in production environments.
- Experience with event-driven architectures and messaging systems (Kafka, NATS, RabbitMQ, or similar).
- Solid understanding of RESTful API design and implementation.
- Experience with relational databases (PostgreSQL preferred).
- Familiarity with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code tools (Terraform, Helm).
- Strong understanding of distributed systems concepts: consistency, availability, partitioning, and fault tolerance.
- Experience with containerisation (Docker) and container orchestration.
- Excellent problem-solving skills and attention to detail.
- Strong communication skills, with the ability to explain complex technical concepts clearly.
- Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience.

Nice to Have:
- Experience with NATS (Core NATS, JetStream) for messaging and event streaming.
- Python experience for scripting and tooling.
- Experience with billing/metering systems or usage-based pricing models.
- Familiarity with AI/ML infrastructure, GPU computing, or model serving (e.g., vLLM, Triton, Ray).
- Experience with gRPC and Protocol Buffers.
- Experience in high-growth startup or hyperscale cloud environments.
- Contributions to open source projects.
- 3 years working in Go (Golang).

What We Can Offer You
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
- Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around your life.
- Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there's anything we can do to accommodate your specific situation, please let us know. For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
02/02/2026
Full time
A leading AI infrastructure company in Greater London is seeking a Senior Software Engineer to join their Product Engineering team. You will design and develop scalable backend services using Go and Kubernetes, ensuring high-quality code and collaboration with cross-functional teams. Candidates should possess experience in building event-driven systems and demonstrate strong problem-solving skills. This role provides a competitive package and promotes a flexible, inclusive workplace.
02/02/2026
Full time
Senior Information Security Manager, London

About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.

Role Overview
We are seeking a Senior Information Security Manager to work closely with the Head of Information Security in building and managing Nscale's end-to-end security framework across physical, technical, and organisational domains. You'll be hands-on, execution-focused, and comfortable working in a complex environment that spans hyperscale GPU clusters, critical infrastructure, and compliance programmes (SOC 2 Type II, ISO 27001/27017/27018, Cyber Essentials Plus, ISO 22301, and ISO 22237). This role will directly support ongoing certification, audit readiness, and incident response initiatives, while driving operational maturity across all Nscale sites and systems. This role requires UK government security clearance up to DV.

What you'll do
- Support ongoing delivery of ISO 27001, ISO 27017/27018, SOC 2 Type II, Cyber Essentials Plus, and ISO 22301 frameworks.
- Maintain the Information Security Management System (ISMS), risk register, and control evidence for internal and external audits.
- Support third-party risk management (TPRM), ensuring supplier compliance and onboarding reviews.
- Develop and track KPIs/KRIs for security operations and compliance health.

Operational Security
- Oversee vulnerability management, EDR posture, and security incident workflows in partnership with our MSSPs.
- Support incident detection, triage, investigation, and root cause analysis.
- Own operational runbooks for containment, eradication, and recovery procedures.
- Review access control lists, privileged user logs, and infrastructure security baselines.
- Maintain asset inventory, patch cadence, and configuration compliance (servers, workstations, and Kubernetes workloads).

Physical & Data Centre Security
- Support the physical security programme at all Nscale data centres, ensuring alignment with ISO 27001 Annex A.11 and ISO 22237.
- Maintain visitor management and access audit trails, assisting with incident reviews and compliance documentation.

Awareness & Culture
- Support security awareness and phishing simulation programmes.
- Develop clear communications and training materials to reinforce security accountability across teams.

- Contribute to architecture reviews, change control boards, and project assessments.
- Identify and implement automation opportunities to reduce manual compliance and reporting overhead.
- Track and report on control effectiveness, audit findings, and remediation progress to senior leadership.

About you
- 5+ years in information or physical security management within a data centre, cloud, or MSP environment.
- Deep familiarity with ISO 27001, SOC 2, NIST CSF, and Cyber Essentials Plus frameworks.
- Experience leading or supporting audits and external assessments.
- Strong understanding of incident response, vulnerability management, and access control processes.
- Excellent documentation, communication, and stakeholder management skills.
- Hands-on with GRC tooling.
- Experience with GPU/HPC or cloud infrastructure security.
- Familiarity with ISO 22237 (data centre design & operations).
- Knowledge of Kubernetes, container security, and hybrid cloud architectures.
- Familiarity with Darktrace, Tenable, Check Point Harmony, and Exabeam SIEM.
- Security certifications (CISSP, CISM, ISO 27001 LA/LI, CompTIA Sec+, or similar).

What we can offer you
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
- Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
- Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.
02/02/2026
Full time