AI Inference Engineer for Real-Time LLMs (GPU, PyTorch)

Pantera Capital
27/05/2026

Full time | Information Technology | Telecommunications

Job Description

Pantera Capital is seeking an AI Inference Engineer to join its team in London. In this role, you will develop APIs for AI inference, benchmark the inference stack, improve system reliability, and explore novel research on LLM optimization. Ideal candidates have experience with ML systems and deep learning frameworks such as PyTorch, familiarity with LLM architectures, and an understanding of GPU programming with CUDA. The role offers a competitive compensation package, including equity options.
