Senior Audio AI Engineer – Voice Conversion & Singing (Open-Source | Delivery-Based)
Freelance · Project-Based · Remote
(Open-Source / MIT Friendly)
About Giramille
Giramille is a small but ambitious Brazilian children’s entertainment studio focused on premium, family-safe content across animation, music, and digital platforms, with a growing international footprint.
We operate as a lean, founder-driven startup, with high creative and technical standards, and a very pragmatic approach to execution. As a Latin American company, we are fully transparent that budget efficiency matters to us, especially given the currency gap (USD vs BRL). In return, we offer clarity, honesty, and a serious, delivery-oriented collaboration.
As part of our internal R&D and product pipeline, we are developing an AI-based voice system capable of generating spoken and sung voices, across multiple languages, with production-ready quality, benchmarked against ElevenLabs (speech) and Suno AI (singing).
This is not a research grant, an academic engagement, or an exploratory experiment.
We are engaging a professional to deliver a complete, functional, and production-ready system, whether built with open-source components or otherwise, evaluated under real-world animation, music, and entertainment industry standards — not research benchmarks.
Project Overview
We are seeking a Senior Audio AI Engineer / Audio ML Engineer to design and deliver a fully functional, end-to-end AI voice platform, focused on:
• Same-language voice conversion
• Cross-language voice conversion
• Spoken voice generation
• Singing voice generation (critical)
• Multi-language support (up to 32 languages)
• A simple, intuitive front-end for non-technical users
• A fully automated back-end pipeline
• Complete technical documentation
This role is ideal for engineers who enjoy building real systems, not demos, proofs of concept, or research prototypes.
Important Positioning (Read Carefully)
We are not expecting you to recreate proprietary systems like ElevenLabs or Suno from scratch.
We are looking for the best possible production-ready result achievable through open-source tools, strong engineering judgment, and pragmatic trade-offs, evaluated honestly against those benchmarks.
If your profile is purely academic or research-oriented, this role will not be a good fit.
Commercial Model (Freelance / Project-Based)
• Engagement model: 100% delivery-based (risk on delivery)
• Total fixed compensation: USD 1,000
• Payment method: PayPal
• Payment timing: only after final delivery, live demonstration, and approval
• Upfront payments: None
• Milestones or partial payments: None
• If requirements are not fully met: no payment is due
We fully respect that this budget is modest by global standards. It is structured for engineers who are confident, pragmatic, and willing to bet on their execution rather than billing hours or research time.
Project Scope – Unified Delivery
The project is evaluated as one single, unified delivery.
The following do not qualify as delivery:
• Partial deliveries
• Prototypes
• Research experiments
• “Almost working” systems
Benchmark reference quality:
• ElevenLabs — spoken voice
• Suno AI — singing voice
Side-by-side comparisons may be performed.
Core Voice Capabilities (Mandatory)
• Same-language voice conversion
• Cross-language voice conversion (up to 32 languages)
• English delivered in two clearly native variants:
– American English
– British English
• Support for spoken and sung voice
• Output formats: WAV and MP3
• All outputs must be 100% production-ready
Audio & Singing Quality Expectations (Summary)
• No metallic, robotic, hollow, or synthetic sound
• No audible artifacts, warbling, buzzing, or distortion
• Singing must have stable pitch, natural phrasing, and musical usability
• Voices must sound native in each target language (no foreign accent)
• Full automation — no manual audio cleanup or post-production
Front-End & Usability
• Fully usable by non-technical users
• Simple workflow (similar to ElevenLabs)
• Voice upload and training
• Language selection
• Text review/edit before generation
• Multi-character support (persistent voices)
• No scripting, CLI, or manual configuration
Development Approach
You may:
• Use MIT-licensed or similarly permissive open-source components
• Build from scratch
• Use a hybrid approach
Regardless of approach, quality, automation, and delivery requirements remain mandatory.
Documentation
Delivery must include a full technical documentation package (“technical bible”), covering:
• Architecture
• Setup & deployment
• APIs
• Training and generation workflows
• Maintenance and scalability
Documentation must allow Giramille’s team to operate and extend the system independently.
Timeline
• Final delivery deadline: January 15th, 2026 (hard deadline)
• Partial deliveries, prototypes, or unfinished systems do not qualify as delivery.
• Missed deadline = non-acceptance
Ideal Candidate Profile
• Senior experience in Audio AI, TTS, voice conversion, or audio ML
• Experience with singing voice or music-related AI is a strong plus
• Comfortable owning end-to-end delivery
• Pragmatic, execution-focused mindset
• Honest about technical trade-offs and limitations

