Senior Machine Learning Engineer - Speech / Voice AI (remote)

  • Robert Walters
  • Manchester, Lancashire
  • 28/10/2025
Contractor Information Technology Telecommunications

Job Description

Our client is a technology-enabled wellbeing platform that supports neurodiverse users and individuals with disabilities to thrive in education, work, and everyday life. They are looking to develop an in-house voice generation and audio delivery system to enhance accessibility and emotional engagement, and are searching for a remote ML Engineer to make it happen!

Senior Machine Learning Engineer - Speech / Voice AI (remote)
Contract length: 3 months
IR35 determination: Outside
Location: Fully remote

The client offers businesses a personal productivity app featuring tools for task breakdown, priority-setting, and structured support to manage anxiety, procrastination, and executive dysfunction. The platform combines tailored learning resources, assistive technology guidance, and mental health content in one accessible space. It serves both students and professionals, helping them build resilience, independence, and sustainable wellbeing through behaviour-change frameworks.

Our client is looking for someone to:
- Develop an in-house voice generation and audio delivery system to enhance accessibility and emotional engagement.
- Build a text-to-speech capability that produces natural, empathetic voices for guided exercises and wellbeing content.
- Implement multilingual functionality and customizable voice tones to support diverse user needs.
- Enable dynamic personalization so users receive content in voices and styles suited to their preferences.
- Integrate the audio system seamlessly with the existing app and backend for real-time playback and consistency across devices.
- Create an inclusive, emotionally intelligent audio experience that deepens user connection and supports lasting behavioural wellbeing.

Required skills:
- Strong background in Machine Learning / Deep Learning with hands-on experience in speech or audio processing.
- Experience fine-tuning or deploying modern TTS models (e.g. VITS, Bark, or FastSpeech2).
- Proficiency in PyTorch (or similar) and comfort with optimizing GPU inference.
- Experience deploying ML models to production and integrating via APIs.
- Familiarity with AWS, GCP, or Azure for scalable deployment.

Desirable:
- Understanding of speaker cloning or emotional prosody control.
- Experience with multilingual TTS or phoneme alignment.
- Interest in ethical AI and accessible, emotionally sensitive applications.

This is an exciting opportunity to help shape an inclusive AI experience that brings empathy and accessibility to users around the world.

Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.