SoundHound AI
Machine Learning Engineer II, Text-To-Speech
SoundHound AI, France, 7 days ago
Full-time, Remote Friendly, Engineering, Information Technology
Ready to be a part of something big? Join our team at SoundHound AI, where AI innovation and real-world impact come together. We unite voice AI, generative AI, and conversational AI to deliver powerful AI solutions that reimagine how people interact with the products and services they rely on. Whether it’s voice-enabling vehicles, streamlining patient journeys, or enhancing customer service, our multilingual, omnichannel AI technology touches the lives of hundreds of millions of people around the world.

The Machine Learning Engineer on our Text-To-Speech team plays a crucial role in building and refining the models that define our unique voice experiences. The position is actively involved in the entire development lifecycle, from processing data to training and deploying our core TTS systems. Working alongside senior researchers and engineers, the role helps create high-quality, natural-sounding voices to be integrated into a wide range of products. The position’s contributions directly impact our ability to deliver an engaging and seamless conversational AI experience to users worldwide.

What You'll Do

  • Implement, train, and evaluate state-of-the-art TTS models to generate high-quality, expressive speech targeted for our key products.
  • Collaborate with language specialists and data labelers to organize the collection and maintenance of essential speech data.
  • Contribute to the development of our core speech synthesis inference engine.
  • Optimize models for production runtime.
  • Work with the systems and infrastructure teams to assist in the integration and deployment of TTS models into our production environment.
  • Analyze model performance and work with product stakeholders to identify areas of improvement. Contribute to the iterative enhancement of our TTS technology.
  • Stay current with the latest research and advancements in the TTS field and apply new techniques to our systems.

What You'll Bring

  • 3+ years of professional experience in machine learning, with a strong focus or interest in speech-related topics like TTS or ASR.
  • Excellent programming skills in Python and strong experience with PyTorch. Proficiency in C++ is a big plus.
  • Strong knowledge of and experience implementing key machine learning concepts such as transformers, speech tokenizers, diffusion, flow matching, LoRA, and GANs.
  • Familiarity with cloud technologies such as Docker and Kubernetes.
  • Experience with TorchScript or ONNX is a plus.
  • A track record of working with an entire machine learning pipeline, from data preprocessing to model training and evaluation, in particular for TTS and ASR models.
  • A collaborative spirit and the ability to work effectively with cross-functional teams.
  • A drive to tackle complex technical challenges and an eagerness to learn and grow in the field of speech synthesis.

[Please note that if your application is advanced, the initial step will be an invitation to partake in a pre-assessment.]

  • We recognize that not every candidate will meet every listed requirement. If you believe your skills and experiences position you to contribute meaningfully in this role, we encourage you to apply. You may offer strengths and perspectives we have not yet considered.

This position is available for remote work throughout France. Employees within a 100-kilometer radius of the Paris office are expected to work from the office on three pre-scheduled, company-wide “core days” per month to encourage in-person cross-team collaboration.

Compensation includes salary, equity, comprehensive healthcare, paid time off, and other benefits. Our recruiting team will provide a specific salary range based on location and years of experience.

By working at SoundHound AI, you will join hundreds of employees across the globe who strive every day to create exceptional AI-powered experiences for customers, employees, and patients.

We are a values-driven company that is supportive of one another, open and honest, undaunted by challenges, nimble and focused, and determined to excel and win. Our mission is to build voice AI for the world and use our global, diverse perspectives to achieve real generational breakthroughs.

SoundHound ensures that individuals with disabilities are provided reasonable accommodations to participate in the interview process, perform essential job functions, and receive other employment benefits.

Learn more about our philosophy, benefits, and culture at https://www.soundhound.com/careers.

To view our job applicant privacy policy, please visit https://static.soundhound.com/corpus/ta/applicantprivacynotice.html.
