Member of Technical Staff: LLM Inference Systems

Doubleword
13/06/2026

Full time Information Technology Telecommunications

Job Description

About the Role

We're seeking a Senior Research Engineer to join our mission of solving the hardest inference challenges in generative AI. You'll be responsible for developing cutting-edge inference technology at all levels of the inference stack. This could involve writing custom kernels for inference, designing compute clusters for unique inference needs, or contributing to state-of-the-art open-source inference engines.

What You'll Do

Examples of projects you might work on:

Building and optimizing infrastructure for batch inference workloads: focusing on high-throughput, cost-efficient processing.
Serving fine-tuned models at scale: using tools like multi-LoRA and multi-PEFT inference engines.
Optimizing open-source inference engines for offloading-based inference: implementing inference optimizations for severely memory-constrained environments.

What We're Looking For

Note: A good candidate will have 80% of the following qualities. Please apply even if the list below doesn't describe you perfectly.

Core Technical Skills

Understanding of GPU architectures and their performance characteristics
Deep understanding of LLM inference workloads, performance characteristics, and optimization techniques
Familiarity with inference tooling and deep learning libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM)

Research Mindset

Curiosity about emerging hardware trends and ML optimization techniques
Ability to understand complex research requirements and translate them into infrastructure needs
Comfort with ambiguity and rapidly evolving technical landscapes
Experience supporting research workflows and experimental systems

About Us

We're dedicated to making large language models faster, cheaper, and more accessible. Our infrastructure team is laser-focused on LLM inference optimization, pushing the boundaries of what's possible in terms of performance and cost efficiency while maintaining the reliability needed to serve these models at scale. We provide competitive compensation, comprehensive benefits, and opportunities for professional growth in one of the most exciting fields in technology.
