AI Inference Engineer for Real-Time LLMs (GPU, PyTorch)

  • Pantera Capital
  • 27/05/2026
Full time | Information Technology, Telecommunications

Job Description

Pantera Capital is seeking an AI Inference Engineer to join its team in London. In this role, you will develop APIs for AI inference, benchmark the inference stack, improve system reliability, and explore novel research on LLM optimization. Ideal candidates have experience with ML systems and deep learning frameworks such as PyTorch, familiarity with LLM architectures, and an understanding of GPU programming with CUDA. The role offers a competitive compensation package that includes equity options.