ML / Backend

  • Sqwish
  • Cambridge, Cambridgeshire
  • 13/06/2026
Full time, Information Technology, Telecommunications

Job Description

£40k-£90k depending on location and experience

About Sqwish Labs

At Sqwish Labs we build the infrastructure that lets AI products learn from real production outcomes, so signals like customer resolution, conversion, trust, time saved, safety and cost can shape what the system does next.

The work is hard in the best way. Part of it is research: learning from messy, delayed, real-world feedback instead of clean benchmarks. Part of it is engineering: building reliable infrastructure that can sit inside live AI systems and make good decisions request by request.

We're a small, ambitious team with deep Cambridge roots, building with flexibility as we grow. We care deeply about what we build and how we build it. We move quickly, ask sharp questions, learn out loud, and try to stay honest about what reality is teaching us. No one has all the answers here, and that's part of what makes it exciting.

This role focuses on building the product infrastructure behind closed-loop AI: the APIs, services, data models, and operational systems that let Sqwish learn from real outcomes. You will work across low-latency serving paths, feedback and metrics pipelines, background workers, reliability, and observability. The work combines strong backend engineering with enough ML infrastructure context to build systems that research can trust and production can depend on.

The right person will be excited by complex, high-leverage engineering and ML problems where correctness, latency, reliability, and product constraints all matter at once. You enjoy building systems that are clean enough to reason about but pragmatic enough to ship. You care about tests, observability, and operational safety, but you do not hide behind process.

Problems you'll tackle
  • Building low-latency optimisation APIs that sit on the critical path of AI products
  • Capturing decisions, model outputs, feedback, cost, latency, and outcome signals reliably
  • Designing backend systems that support continuous learning from production data
  • Building Rust and Python services across serving, workers, training workflows, and internal tooling
  • Working with Postgres, Redis, queues/streams, migrations, and event-driven workflows
  • Making reliability, observability, and deployment safety part of the product from the beginning

Core responsibilities
  • Write production-grade Rust and Python services
  • Design clean domain boundaries around requests, functions, variants, metrics, rewards, and feedback
  • Build APIs and data flows that keep serving, reporting, and learning paths consistent
  • Own database schemas, migrations, background jobs, and operational guardrails
  • Instrument systems with structured logs, traces, metrics, dashboards, and useful alerts
  • Improve local development, CI, Docker, E2E tests, and release workflows
  • Work closely with research, product, and design to turn ambiguous system problems into shippable infrastructure

We don't expect mastery of every bullet. Strength in some, plus the drive to learn the rest, beats a perfect checklist.

Nice to have, but teachable on the job
  • Some experience or interest in ML
  • Experience with Rust, Axum, Tokio, SQLx, or PyO3
  • Experience with FastAPI, Pydantic, SQLAlchemy, Alembic, pytest, mypy, or Ruff
  • Familiarity with Postgres, Redis, event-driven systems, queues, or streaming architectures
  • Experience with OpenTelemetry, Prometheus, Grafana, Loki, Tempo, or structured logging
  • Comfort with Docker, Kubernetes, Helm, Terraform, GitHub Actions, or release automation
  • Exposure to LLM infrastructure, model routing, embeddings, GPU workers, or ML platform systems
  • Experience with SOC 2 / ISO 27001 readiness, audit trails, secrets handling, or production compliance

What to expect:
£40k-£90k (+equity) based on location and experience

Location: in person in Cambridge (preferred), but remote is available too. We can sponsor UK visas.