£750 - £800 per day
2 days in London
We're working with a global healthcare and life sciences leader that's pioneering the use of AI and Machine Learning to develop advanced therapies for both existing and emerging diseases. Their mission is to make personalised treatments faster, more effective, and more accessible, and this role is key to that vision.
The team is building cutting-edge data infrastructure to power scientific and ML applications, and they're seeking a Data Engineer with strong experience developing scalable, high-quality data pipelines in the cloud.
The Role
You'll join a team of data scientists, bioinformaticians, and engineers working at the intersection of healthcare and AI. Your focus will be on designing and maintaining the data pipelines that feed large-scale ML and research workflows.
Day-to-day responsibilities include:
Building and maintaining data pipelines using Python, SQL, Spark, and Google Cloud technologies (BigQuery, Cloud Storage).
Ensuring pipelines are robust, reliable, and optimised for AI/ML use cases.
Developing automated tests, documentation, and monitoring for production-grade data systems.
Collaborating with scientists and ML engineers to meet evolving data needs.
Participating in code reviews, introducing best practices, and continuously improving performance and quality.
Core Skills:
Strong experience with Python and SQL in production environments
Proven track record developing data pipelines using Spark, BigQuery, and cloud tools (preferably Google Cloud)
Familiarity with CI/CD and version control (git, GitHub, DevOps workflows)
Experience with unit testing (e.g., pytest) and automated quality checks
Understanding of agile software delivery and collaborative development
Nice to Have:
Experience with bioinformatics or large-scale biological data (e.g., genomics, proteomics)
Familiarity with orchestration tools such as Airflow or Google Workflows
Experience with containerisation (Docker)
Exposure to NLP, unstructured data processing, or vector databases
Knowledge of ML and AI-powered data products
Strong problem-solving skills and curiosity about scientific or AI-driven challenges
A focus on quality, scalability, and collaboration
The ability to work across cross-functional teams and translate complex requirements into robust data workflows