We're looking for an experienced DevSecOps Engineer for an initial 3-month contract with strong scope for extension. The role will be primarily remote-based but will require occasional site visits; due to the nature of the work, consultants must be UK nationals willing to undergo Security Clearance (or already hold it).
What you'll need:
- Proven experience in a DevOps, Platform Engineering, or Infrastructure Engineering role supporting production environments
- Strong experience working with Linux-based systems, including shell scripting (Bash) and VM administration
- Hands-on experience running and supporting containerised services using Docker, with the ability to read, understand, and modify Docker configurations
- Ability to support and troubleshoot AI/LLM-powered applications in production or pre-production environments
- Experience operating multi-container environments supporting back-end services such as databases, vector stores, or inference engines
- Solid understanding of CI/CD pipelines, with practical experience using GitLab, including pipeline configuration and environment variables
- Strong working knowledge of Git and source code management best practices
- Experience configuring and maintaining NGINX (or equivalent) as a reverse proxy for back-end services and web applications
- Experience standing up and managing infrastructure using Infrastructure as Code tools (eg Terraform)
- Proficiency in Python, including supporting back-end services, scripts, or LLM application frameworks
- Experience working with relational databases (eg PostgreSQL), including deployment, configuration, and operational support
- Comfortable working across multiple projects at different stages of delivery, supporting shared platforms used by multiple teams
- Strong problem-solving skills with the ability to debug infrastructure, CI/CD, and application-level issues
- Ability to collaborate effectively with Data Science, AI, and Engineering teams, understanding boundaries between infrastructure, inference, and model training responsibilities
- Ability to support and scale an AI platform running multiple LLM-based projects across varying levels of maturity and technical stacks
What you'll be doing:
- Own and maintain shared infrastructure used by multiple AI/LLM projects, ensuring stability, performance, and efficient use of space and storage
- Provide day-to-day DevOps and platform support to an increasing number of teams onboarding onto the new environment
- Design, deploy, and manage Linux-based virtual machines running containerised services for AI workloads
- Operate and support Dockerised core services, including Qdrant, LLDAP, PostgreSQL, and vLLM, confidently reading and modifying Docker configurations as required
- Manage and configure NGINX reverse proxy services to expose project front ends and internal tools securely
- Support and contribute to CI/CD pipelines (GitLab), including pipeline configuration, environment variables, and repository management
- Collaborate closely with AI Lab teams to support project onboarding, troubleshooting, and platform evolution as tooling and architectural patterns mature
- Deploy and support open-source LLMs (eg Hugging Face models such as Intern Neural 7B) and associated inference tooling
- Support LLM application frameworks (LangChain, vLLM) and Python-based services without owning model training (handled by Data Science teams)
- Stand up and manage cloud and infrastructure-as-code components using tools such as Terraform
- Provide general back-end and infrastructure support, with some overlap into CI/CD and DevOps best practices
- Ensure platform reliability, security, and scalability as visibility and demand increase following successful AI initiatives