Jobs at Callosum | IT Job Board

Heterogeneous AI Inference Performance Engineer

Callosum

Callosum is hiring for a role focused on bridging internal engineering with real-world applications. You will run experiments and optimize performance on cloud or on-prem infrastructure, while contributing to system health and benchmarking. Your expertise in large model deployment and effective communication will enable you to translate performance data into actionable insights, ultimately guiding stack optimizations for enhanced efficiency.

24/06/2026

Full time

Senior Systems Tooling Engineer - Observability & CLI

Callosum

Callosum in Greater London is seeking a software engineer to shape the developer experience of our innovative AI stack. This role focuses on creating debugging tools and integrations that enhance how engineers interact with complex systems. The ideal candidate will have strong software engineering fundamentals, experience with profiling tools, and familiarity with observability stacks like Prometheus and Grafana. Join a team dedicated to solving the impossible!

24/06/2026

Full time

Inference Engine Development - Member of Technical Staff

Callosum

About Us

Artificial intelligence scaled on a bet - that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders of magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can provide. We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems, and are passionate and energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders of magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system, evolved beyond GPUs.

Inference engines were designed for single-model inference on homogeneous GPU clusters - this role builds them beyond that. Working directly on systems like vLLM and SGLang, you will adapt and extend them for heterogeneous resources, making them hardware-aware, with deeper optimisation around scheduling, memory, and execution. The execution strategies you design - parallelism, disaggregation, caching - will define what heterogeneous inference looks like at production scale. Your work ensures that the capabilities exposed by the lower layers of the stack translate into real, measurable gains, setting the new standard for how inference runs on diverse hardware.

What You'll Build

Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge
Improve hardware-awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator
Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware
Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities

What You Bring

Deep familiarity with the internals of SGLang, vLLM, or comparable inference serving frameworks - scheduler design, memory management, and execution pipelines
Strong background in high-performance Python and C++/CUDA systems, particularly in the context of ML inference
Experience designing or implementing parallelism strategies for large model serving
Understanding of disaggregated serving architectures and the tradeoffs involved in separating modules of a workflow
Demonstrable record of working effectively in fast-moving open source codebases with evolving APIs and design conventions

20/06/2026

Full time

Inference Performance & Deployment - Member of Technical Staff

Callosum

About Us

Artificial intelligence scaled on a bet - that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders of magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can provide. We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems, and are passionate and energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders of magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system, evolved beyond GPUs.

This role owns the bridge between Callosum's internal engineering and the real world. You design the tooling and methodologies that ground our technology in real-world performance and behaviour, sitting at the integration point of every engineering function. You will be the first to run our heterogeneous infrastructure in production-equivalent conditions, systematically characterising performance, identifying bottlenecks, and driving decisions on production-readiness. Your work ensures that every layer of the stack is guided by empirical evidence rather than assumption.

What You'll Build

Run experiments self-hosting models on cloud instances or on-prem across providers and hardware configurations, systematically characterising performance envelopes
Develop and maintain deployment patterns that are reproducible, measurable, and optimised for latency, throughput, and cost
Work on the orchestration and routing software that sits above the inference engine to improve caching, request scheduling, batching, and resource allocation
Act as the integration point for the other roles: consume new accelerator support, engine features, and infrastructure upgrades, provide high-quality feedback on bottlenecks and essential capabilities, and guide stack optimisations
Build and maintain benchmarking harnesses, regression suites, and performance dashboards that give the team a shared view of system health and progress

What You Bring

Experience deploying and benchmarking large model inference in production or production-equivalent environments
Familiarity with multi-node GPU deployments and associated networking/communication stacks
Strong end-to-end performance characterisation skills: able to isolate whether a bottleneck is in the network, the runtime, the memory subsystem, or the model itself
Familiarity with serving frameworks like Dynamo, Triton Inference Server, or similar orchestration layers
Clear communication skills - able to translate performance data into actionable, prioritised feedback for the teams building the underlying systems
A demonstrably disciplined and systematic approach to deployment: reproducibility, measurement methodology, controlled comparisons, etc.

19/06/2026

Full time

Systems Tooling & Infrastructure - Member of Technical Staff

Callosum

About Us

Artificial intelligence scaled on a bet - that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders of magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can provide. We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems, and are passionate and energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders of magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system, evolved beyond GPUs.

This role owns the developer experience of Callosum's stack, turning complex, low-level systems into something observable, debuggable, and usable by the rest of the team. You'll build the profiling, tracing, and developer tooling that defines how engineers interact with heterogeneous systems, enabling fast experimentation with new accelerators and complex inference workflows. You will own the abstractions, CLIs, and instrumentation that the engineering organisation is built on - primitives that don't yet exist for the next generation of compute infrastructure. As multi-stage and multi-agent workflows grow in complexity, your work is what keeps execution paths visible and tractable, ensuring the organisation can scale without losing insight or control.

What You'll Build

Extend profiling and tracing tooling for new accelerators, including collection, compression, and visualisation of performance data
Develop CLI tools and automation wrappers that simplify common workflows - spinning up inference stacks, launching benchmarks, managing configurations
Convert prototypes of internal tooling into high-performance, scalable, accessible commands
Build tooling to support multi-agent serving workflows: request tracing across agent boundaries, pipeline visualisation, and debugging tools for complex inference DAGs
Create internal libraries and abstractions that let other teams move faster without reinventing shared infrastructure

What You Bring

Strong software engineering fundamentals: clean APIs, good error handling, sensible defaults, and clear documentation
Experience with profiling and tracing systems (perf, Nsight, Tracy, or similar) and a good sense of how to make trace data actionable rather than overwhelming
Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry, or equivalent) in varied infrastructure environments
Comfortable across the stack - from low-level trace collection to dashboards and developer-facing CLI tools

19/06/2026

Full time

Platform Engineer - Member of Technical Staff

Callosum

About Us

The last era of AI scaled on a single bet: bigger models, more identical chips, more data. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. Real-world problems are heterogeneous: no single model or chip can solve them alone. The next era of AI requires heterogeneity at the infrastructure level - diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability that move the Pareto frontier of what is possible. That's what we are building.

Callosum is the Intelligent Systems Company. We started from questioning what actually creates intelligence. We believe there is no single answer, but rather a system-level solution. We co-evolve models, workflows, and silicon together to show that intelligence does not come from a single component, but it emerges from the diversity of co-optimised mechanisms working together and aware of each other. Heterogeneity will define the next era of compute, and is a principle that holds in biological, neuronal, and economic systems alike. In early 2026 we launched with results showing orders of magnitude improvements in performance, and this is only the beginning.

Agentic AI is the future of how intelligence is deployed: multi-step, long-horizon, and operating in changing environments. These systems are inherently heterogeneous, and can only be as powerful as the infrastructure that runs them.

We are engineers and scientists based in London, working together across the full depth of the stack. We are curious, intellectually honest, and building what doesn't exist yet. If you thrive on uncharted territory and are energised by the scale of the challenge, we'd love to hear from you.

About the Role

You'll build the platform our customers integrate against, working to the architecture the Lead API Platform Architect sets and alongside the reliability engineer. Your first major project is the metering and billing layer the commercial model runs on. From there, you'll build out the core services that turn the platform's design into something that runs in production. This is a build-heavy, hands-on role. The architect decides how the platform is shaped. You build a large part of it.

What You'll Build

The metering and billing layer. Usage capture, attribution per workflow, and the systems behind consumption-based and outcome-based pricing.
Workflow deployment onto the platform. The machinery that registers a workflow, routes it, and gives it the scaling and isolation it needs (some workflows run in their own dedicated container), built to the architect's contract and consuming the execution endpoints each workflow resolves to.
Core platform services - the request path, rate limiting, quota enforcement, and the rest of the machinery beneath the API.
Client SDKs and libraries. The packages customers use to integrate, in the main languages they work in.
Internal platform tooling. The deployment, admin, and operational glue that lets a small team run the platform, shared with the reliability engineer.
On-call as the platform scales.

What You Bring

Strong backend or production-engineering background building and operating services at scale.
Strong distributed-systems depth: API design, concurrency, fault tolerance, and how high-throughput services behave operationally.
Experience with the infrastructure behind multi-tenant SaaS or API products, such as usage metering, billing, rate limiting, or authentication.
Fluency in a backend or systems language such as Python, Go, or Rust, and comfort running services in the cloud.
Strong ownership of reliability and correctness in production.

What Sets You Apart

Experience building usage metering or billing infrastructure for a consumption-priced product.
Experience designing and shipping client SDKs that external developers rely on.
LLM serving, inference APIs, or ML infrastructure experience.
Familiarity with observability tooling such as Prometheus, Grafana, or OpenTelemetry.
Early-stage company experience.

What We Offer

Competitive Salary, determined by skills and experience
Equity & Ownership
Private healthcare
We offer Visa sponsorship and relocation benefits to hire the best in the world
We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.

16/06/2026

Full time

API Platform Architect - Member of Technical Staff

Callosum

About the Role

You'll own the architecture and the technical bar for the API platform our customers integrate against. A customer sends a request for a workflow. Your platform authenticates it, routes it to the right execution backend, scales to meet the load, and keeps every customer's traffic and data fully separate from every other's. The model execution itself sits behind endpoints your platform consumes, so this is a role about the API and orchestration layer, not the serving internals.

This is the most senior engineering hire on the platform. It's a build role first and a leadership role second. You'll write the core systems yourself for some time before you mostly direct others. You'll set the foundation the rest of the engineering team builds on, and grow a small platform team around you, starting with a reliability engineer and a product engineer.

What You'll Build

The API surface customers integrate against, and the orchestration layer behind it: request lifecycle, routing, and state management.
DNS routing, load balancing, and autoscaling, so the platform meets demand without anyone touching it by hand.
Multi-tenancy and isolation.
Per-workflow scaling and deployment.
The technical bar. Design standards, code review, and the architectural decisions that are expensive to reverse later.
Performance and scalability as customer volume grows to millions of concurrent requests.
Production reliability, as a partnership. You set the architecture that makes reliability achievable. A dedicated reliability engineer owns day-to-day operations.

What You Bring

Deep experience designing, building, and operating production API platforms that customers depend on.
Strong distributed-systems fundamentals: stateless and stateful services, routing, queueing, autoscaling, multi-tenancy, fault tolerance.
A track record of owning architecture for a system at scale. The judgment to know which architectural decisions are reversible and which are not.
Hands-on, with a bias for action. You want to build, not just direct, and an early-stage environment where you set the foundations is what you're after.
Comfort with ambiguity and a strong sense of ownership. You'll make consequential calls without complete information.

What Sets You Apart

Experience with multi-tenant systems where data separation between customers is a hard requirement.
Familiarity with high-throughput inference or agent workloads, and a feel for how different workload types place different demands on the infrastructure around them.
Experience running production API infrastructure that meets high-assurance standards.
Open-source contributions to relevant infrastructure, or production systems whose scale and complexity you can speak to in detail.
Early-stage and AI-native company experience.

What We Offer

Competitive Salary, determined by skills and experience
Equity & Ownership
Private healthcare
We offer Visa sponsorship and relocation benefits to hire the best in the world
We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.

16/06/2026

Full time

Platform Site Reliability Engineer - Member of Technical Staff

Callosum

About Us

The last era of AI scaled on a single bet: bigger models, more identical chips, more data. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. Real-world problems are heterogeneous: no single model or chip can solve them alone. The next era of AI requires heterogeneity at the infrastructure level - diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability that move the Pareto frontier of what is possible. That's what we are building.

Callosum is the Intelligent Systems Company. We started from questioning what actually creates intelligence. We believe there is no single answer, but rather a system-level solution. We co-evolve models, workflows, and silicon together to show that intelligence does not come from a single component, but emerges from the diversity of co-optimised mechanisms working together and aware of each other. Heterogeneity will define the next era of compute, and is a principle that holds in biological, neuronal, and economic systems alike. In early 2026 we launched with results showing orders of magnitude improvements in performance, and this is only the beginning.

Agentic AI is the future of how intelligence is deployed: multi-step, long-horizon, and operating in changing environments. These systems are inherently heterogeneous, and can only be as powerful as the infrastructure that runs them.

We are engineers and scientists based in London, working together across the full depth of the stack. We are curious, intellectually honest, and building what doesn't exist yet. If you thrive on uncharted territory and are energised by the scale of the challenge, we'd love to hear from you.

About the Role

Our platform is the production system our customers route real traffic through. When it degrades, their product degrades. You will own its operational health end-to-end: SLOs, observability, incident response, deployment discipline, and capacity planning across heterogeneous compute backends.

As the platform scales to millions of concurrent requests across heterogeneous compute, the work shifts from building the operational foundation to defending it under conditions most teams never encounter. You'll define what "production-grade" means for a platform at the centre of a fast-growing company, and own it end to end.

The reliability practice is yours to build. You will work closely with the platform team, setting the technical direction. You will work closely with the hardware and orchestration teams to expose heterogeneous backends reliably through the platform.

What You'll Build

Service-level objectives, monitoring, alerting, and observability.

On-call and incident response: runbooks, escalation, blameless postmortems, and follow-through.

Capacity planning and the operational side of running across heterogeneous compute backends.

What You Bring

Strong SRE or production-engineering background running customer-facing systems at scale.

Fluency with modern operational tooling: observability stacks, container orchestration, infrastructure-as-code, CI/CD.

Experience owning incident response and driving reliability improvements.

Compliance execution experience.

What Sets You Apart

Open-source contributions to relevant infrastructure, or production systems whose scale and complexity you can speak to in detail.

Early-stage and AI-native company experience.

What We Offer

Competitive Salary, determined by skills and experience

Equity & Ownership

Private healthcare

We offer Visa sponsorship and relocation benefits to hire the best in the world

We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.

15/06/2026

Full time

Senior API Platform Architect: Scale & Multi-Tenant

Callosum

Callosum is looking for a senior engineering lead to own and develop the architecture for our API platform. This role focuses on building core systems and leading a small team to enhance our platform, ensuring reliability and scalability while integrating customer workflows. The ideal candidate will be hands-on and experienced in designing high-assurance production API systems, comfortable with architectural judgments, and ready to work in an innovative environment. The position is primarily on-site in London.

15/06/2026

Full time

Platform Engineer - Billing & Multi-Tenant SaaS

Callosum

Callosum in London is seeking a backend engineer to build a robust platform for customer integration. The engineer will develop the metering and billing layer, alongside core platform services that ensure reliable production operations. The ideal candidate will have a strong engineering background, experience in distributed systems, and fluency in languages like Python, Go, or Rust. This position offers a competitive salary, equity, and the opportunity for professional growth in an inclusive workplace.

15/06/2026

Full time

Platform SRE & Technical Staff: Reliability Lead

Callosum

Callosum is seeking a skilled platform engineer to own the operational health of its production system. You'll define reliability practices, monitor performance, and strengthen incident response at scale. The ideal candidate has a strong SRE or production-engineering background and fluency with modern operational tooling. Join us in our London office and contribute to advancing the next era of AI with your expertise.

15/06/2026

Full time

Forward Deployed Engineer - Production ML & Customer Impact

Callosum

Callosum is seeking a Forward Deployed Engineer to directly engage with customers, managing projects from workload scoping to production deployment. You will leverage your software engineering background to evaluate and optimise workflows and systems in collaboration with client engineering teams. As part of a flat organisational structure, you will have significant responsibility and can grow with the team, while directly influencing the research and platform direction at Callosum. Competitive salary and equity options are offered.

13/06/2026

Full time

Forward Deployed Engineer - Member of Technical Staff

Callosum

About the Role

Forward Deployed Engineers own Callosum's customer engagements end to end. You will work directly with customer engineering teams to understand their AI workloads, design and run rigorous evaluations against their production pipelines, integrate optimised workflows into their systems, and make sure the results hold in production.

At Callosum, FDEs are the technical bridge between what customers run and what we build. What you learn shapes the research and platform roadmap. The evaluations you run are how we prove the technology in production. As an early member of the team, you will help build the methodology, tooling, and engagement practices the function runs on.

The role is engineering-led: you will employ technical judgment to recognise what kind of system a problem needs and prototype it. You will work closely with the engineers designing new intelligent systems, with the platform team that serves them, and directly with senior technical stakeholders on the customer side.

We run a flat Member of Technical Staff structure. Responsibility and scope grow with what you demonstrate, including leadership of the FDE function as the team scales.

What You'll Build

Customer engagement from first workload scoping through evaluation, integration, and production deployment

Design and run rigorous evaluations on customers' production workloads: set success criteria, instrument the comparison, present results to their engineering teams

Develop deep technical understanding of customer systems (retrieval pipelines, agentic workflows, code intelligence stacks) and design optimised Callosum workflows

Integrate Callosum workflows into customer environments and support them through to stable production

Work with our research and platform teams to turn what you learn into general product capability

Build the team's shared methodology: evaluation harnesses, integration runbooks, and engagement standards

What You Bring

Strong software engineering background with experience in production ML/LLM systems, distributed systems, or large-scale infrastructure

Rigour in evaluation or benchmarking work

Experience working directly with external engineering teams, whether as a forward deployed engineer, solutions architect, technical consultant, or comparable customer-facing role

A bias for action, comfort with ambiguity and a strong sense of ownership.

What Sets You Apart

Experience with LLM serving infrastructure, agent orchestration, or retrieval systems

Familiarity with novel accelerator platforms

Experience at an early-stage company or in a founding engineer role

What We Offer

Competitive Salary, determined by skills and experience

Equity & Ownership

Private healthcare

We offer Visa sponsorship and relocation benefits to hire the best in the world

We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.

12/06/2026

Full time

Open Role - Systems Engineer

Callosum

Overview

We're always looking for exceptional engineers who want to work on the hardest problems in AI infrastructure - bringing novel accelerators to life, building the systems that orchestrate them, and charting the new territories of what heterogeneous compute can do. If you think you have unique skills to contribute, or have experience in any of the following, we'd love to hear from you:

Required experience

High-performance systems and low-level performance engineering

Inference infrastructure, orchestration, or distributed systems

Hardware bring-up, kernel development, or accelerator programming

Simulation, modelling, or workload design

Share your details and we'll reach out as the right opportunities emerge.

11/06/2026

Full time

AI Infrastructure & Accelerator Systems Engineer

Callosum

A technology company in the United Kingdom is seeking exceptional engineers to work on challenging AI infrastructure projects. Ideal candidates will have skills in high-performance systems, distributed systems, and hardware programming. You will help develop innovative solutions for performing AI tasks and help advance the field. Roles are matched to your particular skills, so if you're interested, please get in touch to discuss potential opportunities.

11/06/2026

Full time

Callosum

15 job(s) at Callosum
