Jobs at Doubleword | IT Job Board

Staff Engineer, Batched Inference Platform

Doubleword

A technology startup in Greater London seeks a Member of Technical Staff to develop and scale its batched inference platform. This role involves tackling complex distributed systems challenges, with a focus on reliability and customer needs. The ideal candidate will have significant experience in Rust, PostgreSQL, and Kubernetes, and will thrive in a fast-paced, collaborative environment. The position offers a chance to tackle hard problems while fostering a culture of autonomy and innovation.

13/06/2026

Full time

Member of Technical Staff: LLM Inference Systems

Doubleword

About the Role

We're seeking a Senior Research Engineer to join our mission of solving the hardest inference challenges in generative AI. You'll be responsible for developing cutting-edge inference technology at all levels of the inference stack. This could involve writing custom kernels for inference, designing compute clusters for unique inference needs, or contributing to state-of-the-art open source inference engines.

What You'll Do

Examples of projects you might work on:

Building and optimizing infrastructure for batch inference workloads: focusing on high-throughput, cost-efficient processing.
Running inference on fine-tuned models at scale: using tools like multi-LoRA and multi-PEFT inference engines.
Optimizing open source inference engines for offloading-based inference: implementing inference optimizations for severely memory-constrained environments.

What We're Looking For

Note: A good candidate will have 80% of the following qualities. Please apply even if the following doesn't describe you perfectly.

Core Technical Skills

Understanding of GPU architectures and their performance characteristics
Deep understanding of LLM inference workloads, performance characteristics, and optimization techniques
Familiarity with inference tooling and deep learning libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM)

Research Mindset

Curiosity about emerging hardware trends and ML optimization techniques
Ability to understand complex research requirements and translate them into infrastructure needs
Comfort with ambiguity and rapidly evolving technical landscapes
Experience supporting research workflows and experimental systems

About Us

We're dedicated to making large language models faster, cheaper, and more accessible. Our infrastructure team is focused on LLM inference optimization, pushing the boundaries of performance and cost efficiency while maintaining the reliability needed to serve these models at scale. We provide competitive compensation, comprehensive benefits, and opportunities for professional growth in one of the most exciting fields in technology.

13/06/2026

Full time

Senior LLM Inference Systems Engineer

Doubleword

A leading AI technology company in Greater London is seeking a Senior Research Engineer to develop cutting-edge LLM inference technology. Candidates will work on optimizing infrastructure for batch inference workloads and enhancing inference engines in memory-constrained environments. Ideal candidates will possess a deep understanding of inference workloads and GPU architectures, along with familiarity with tools such as PyTorch and TensorRT. The role offers competitive compensation and is aimed at solving challenging AI problems.

13/06/2026

Full time

Member of Technical Staff - Batched Inference Server

Doubleword

About Doubleword

Doubleword is a well-funded, VC-backed startup building an inference platform that provides the cheapest tokens on the market for high-volume batch workloads.

The technical challenge is substantial. We orchestrate thousands of concurrent batch jobs while maintaining sub-second query latency, all within a system where reliability is non-negotiable.

We work directly with our users to shape our technical roadmap. Our focus is clear: provide the cheapest and most scalable tokens on the market while maintaining exceptional reliability and developer experience.

The Role

We are looking for someone who elevates the people around them. Someone genuinely excited by hard problems, who loves discussing technical ideas and makes others better through clarity and energy. Someone who cares deeply about both the craft and the people they practice it with.

You will join a small, high-trust team with real autonomy. You will take ownership of complex problems and influence how we design, build, test, and ship software.

What You'll Be Doing

You will build and scale our batched inference platform, a distributed system that handles thousands of concurrent batch jobs across multiple LLM deployments.

Rust for core services
PostgreSQL for persistent state
Kubernetes for deployment and orchestration

Core areas of work

Database optimization under high load and concurrent access patterns
Distributed job scheduling and retry logic
Real-time observability and monitoring
Designing for failure from the start to build reliability into the system

What We're Looking For

Requirements

Technically exceptional - Your skills span domains and technologies. You solve genuinely hard problems and have consistently demonstrated this.
Distributed systems experience - You have delivered distributed systems in production. You understand high-throughput, highly parallel architectures and can point to concrete examples of excellent work.
Pragmatic shipper - You move fast while maintaining stability for a large user base.
Humble - You lead by example. You take accountability quickly and say "I don't know" when appropriate.
Customer focused - You start from real user problems and deliver technical solutions. You are a problem solver, not a technology purist.

Nice to have

Experience with our stack: Rust, TypeScript, PostgreSQL, Kubernetes
Experience with LLM inference systems or batch processing infrastructure

Our Engineering Principles

We are technically ambitious. Hard problems energize us.
We move fast. Priorities shift and requirements evolve. You should be excited by rapid iteration.
We choose pragmatic solutions over clever ones. The right answer beats the interesting answer.
We operate in ambiguity. Decisions are made with incomplete information and revised when evidence changes.

Interview Process

Technical Culture Interview - 30-minute video call with an engineer. We discuss your experience and alignment with our engineering culture.
Wider Culture Interview - 30-minute video call with someone outside the tech team. This focuses on company values and how you work with others.
Technical Design Interview - 1-hour video call with members of the engineering team. We present a challenge and collaboratively design a system.
Paid Day Work Trial - Spend a day working on a real problem from our Batched Inference Server. This gives you a genuine sense of how we operate, and gives us insight into how you approach real-world problems. Compensation: $1,000.
Offer - If there is strong mutual alignment, we make an offer and you join us on the journey.

Apply

Email your CV and a short note explaining why this role interests you to

13/06/2026

Full time

Doubleword

4 job(s) at Doubleword
