Giramille
Artificial Intelligence Engineer
Ukraine · Contract · Remote Friendly
Engineering, Information Technology

Senior Audio AI Engineer – Voice Conversion & Singing (Open-Source | Delivery-Based)


Freelance · Project-Based · Remote

(Open-Source / MIT Friendly)


About Giramille


Giramille is a small but ambitious Brazilian children’s entertainment studio focused on premium, family-safe content across animation, music, and digital platforms, with a growing international footprint.


We operate as a lean, founder-driven startup with high creative and technical standards and a very pragmatic approach to execution. As a Latin American company, we are open about the fact that budget efficiency matters to us, especially given the currency gap (USD vs. BRL). In return, we offer clarity, honesty, and a serious, delivery-oriented collaboration.


As part of our internal R&D and product pipeline, we are developing an AI-based voice system capable of generating spoken and sung voices across multiple languages at production-ready quality, benchmarked against ElevenLabs (speech) and Suno AI (singing).


This is not a research grant, an academic engagement, or an exploratory experiment.

We are engaging a professional to deliver a complete, functional, production-ready system, whether built with open-source components or otherwise. It will be evaluated against real-world animation, music, and entertainment industry standards, not research benchmarks.


Project Overview


We are seeking a Senior Audio AI Engineer / Audio ML Engineer to design and deliver a fully functional, end-to-end AI voice platform, focused on:


• Same-language voice conversion

• Cross-language voice conversion

• Spoken voice generation

• Singing voice generation (critical)

• Multi-language support (up to 32 languages)

• A simple, intuitive front-end for non-technical users

• A fully automated back-end pipeline

• Complete technical documentation


This role is ideal for engineers who enjoy building real systems, not demos, proofs of concept, or research prototypes.



Important Positioning (Read Carefully)


We are not expecting you to recreate proprietary systems like ElevenLabs or Suno from scratch.


We are looking for the best possible production-ready result achievable through open-source tools, strong engineering judgment, and pragmatic trade-offs, evaluated honestly against those benchmarks.


If your profile is purely academic or research-oriented, this role will not be a good fit.



Commercial Model (Freelance / Project-Based)


• Engagement model: 100% delivery-based (the engineer carries the full delivery risk)

• Total fixed compensation: USD 1,000

• Payment method: PayPal

• Payment timing: only after final delivery, live demonstration, and approval

• Upfront payments: None

• Milestones or partial payments: None

• If requirements are not fully met: no payment is due


We fully acknowledge that this budget is modest by global standards. It is structured for engineers who are confident, pragmatic, and willing to bet on their execution rather than billing for hours or research time.



Project Scope – Unified Delivery


The project is evaluated as a single, unified delivery.


• Partial deliveries

• Prototypes

• Research experiments

• “Almost working” systems


do not qualify as delivery.


Benchmark references for quality:

• ElevenLabs — spoken voice

• Suno AI — singing voice


Side-by-side comparisons may be performed.



Core Voice Capabilities (Mandatory)


• Same-language voice conversion

• Cross-language voice conversion (up to 32 languages)

• English delivered in two clearly native variants:

– American English

– British English

• Support for spoken and sung voice

• Output formats: WAV and MP3

• All outputs must be 100% production-ready



Audio & Singing Quality Expectations (Summary)


• No metallic, robotic, hollow, or synthetic sound

• No audible artifacts, warbling, buzzing, or distortion

• Singing must have stable pitch, natural phrasing, and musical usability

• Voices must sound native in each target language (no foreign accent)

• Full automation — no manual audio cleanup or post-production



Front-End & Usability


• Fully usable by non-technical users

• Simple workflow (similar to ElevenLabs)

• Voice upload and training

• Language selection

• Text review/edit before generation

• Multi-character support (persistent voices)

• No scripting, CLI, or manual configuration



Development Approach


You may:


• Use MIT-licensed or similarly permissive open-source components

• Build from scratch

• Use a hybrid approach


Regardless of approach, quality, automation, and delivery requirements remain mandatory.



Documentation


Delivery must include a full technical documentation package (“technical bible”), covering:


• Architecture

• Setup & deployment

• APIs

• Training and generation workflows

• Maintenance and scalability


Documentation must allow Giramille’s team to operate and extend the system independently.



Timeline


• Final delivery deadline: January 15th, 2026 (hard deadline)

• Partial deliveries, prototypes, or unfinished systems do not qualify as delivery

• A missed deadline means the delivery will not be accepted


Ideal Candidate Profile


• Senior experience in Audio AI, TTS, voice conversion, or audio ML

• Experience with singing voice or music-related AI is a strong plus

• Comfortable owning end-to-end delivery

• Pragmatic, execution-focused mindset

• Honest about technical trade-offs and limitations

Key Skills

Ranked by relevance