ElevenLabs v3
Studio-grade voice cloning and text-to-speech with emotional nuance in 29 languages.
ElevenLabs v3 is ElevenLabs' flagship voice model. It was released in June 2025 and is widely used by professional AI audiobook and dubbing studios. It captures emotional nuance, breath, and prosody that older models miss. On Oakgen, voice cloning is included in the $19/month Pro plan; a standalone ElevenLabs subscription starts at $22/month.
Capabilities at a glance
- 30-second voice cloning with studio-grade fidelity
- 29 languages with preserved voice character
- Punctuation-driven pacing and emotion control
- 4–8 second latency per request
- Same engine used by professional AI dubbing studios
Specs
- Starting price: $0.04 per generation
- Generation time: 4–8 seconds
- Max output quality: 44.1 kHz stereo
- Inputs → outputs: text → audio
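The specs support a quick budgeting calculation. The sketch below is a hypothetical helper, not an Oakgen tool. It splits a script into requests of a caller-chosen size, multiplies by the $0.04 starting price (assuming a flat price per generation), and brackets sequential generation time with the 4–8 second range. This page does not state a per-request text limit, so `chars_per_request` is an assumption you supply. Because $0.04 is a starting price, the reported cost is a lower bound.

```python
# Figures from the spec list. $0.04 is the *starting* price, so costs here
# are lower bounds. ASSUMPTION: the price is flat per generation.
PRICE_CENTS_PER_GENERATION = 4       # $0.04 per generation
MIN_SECONDS, MAX_SECONDS = 4, 8      # 4–8 seconds per request


def estimate(script_chars, chars_per_request):
    """Estimate request count, minimum cost, and sequential generation time.

    chars_per_request is a HYPOTHETICAL chunk size picked by the caller;
    this page does not state a per-request text limit.
    """
    if script_chars < 0 or chars_per_request <= 0:
        raise ValueError("need script_chars >= 0 and chars_per_request > 0")
    # Integer ceiling division: a partial chunk still counts as one generation.
    requests = (script_chars + chars_per_request - 1) // chars_per_request
    return {
        "requests": requests,
        "min_cost_usd": requests * PRICE_CENTS_PER_GENERATION / 100,
        # Assumes requests are sent one after another, not in parallel.
        "time_range_s": (requests * MIN_SECONDS, requests * MAX_SECONDS),
    }
```

For example, `estimate(250_000, 1_000)` gives 250 requests, a minimum of $10.00, and 1,000–2,000 seconds if the requests are sent one at a time.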
How to use ElevenLabs v3
1. Upload a clean 30-second sample for cloning. It should have no background music, reverb, or multiple speakers. The sample can be in any supported language.
2. Use punctuation for pacing. Commas create short pauses, periods create longer pauses, and ellipses add dramatic beats. ALL CAPS emphasizes a word.
3. Put stage directions in asterisks. Tags such as *Whispered* or *urgent* shape delivery. They work best on dialogue-heavy scripts.
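Part of step 1 can be checked before upload. The sketch below assumes the sample is a WAV file and reads only its header with Python's standard `wave` module. Treating 30 seconds as a minimum is our reading of the step, and the function names are hypothetical. The check cannot detect background music, reverb, or a second speaker, so you still need to listen to the sample.

```python
import wave

# Step 1 asks for a 30-second sample. ASSUMPTION: that figure is a minimum.
MIN_SECONDS = 30.0


def sample_duration(path):
    """Return the length of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path):
    """Return a list of warnings about a WAV cloning sample.

    An empty list means the length check passed. Only the header is read,
    so background music, reverb, or extra speakers are NOT detected.
    """
    warnings = []
    duration = sample_duration(path)
    if duration < MIN_SECONDS:
        warnings.append(
            f"sample is {duration:.1f}s long; step 1 asks for about {MIN_SECONDS:.0f}s"
        )
    return warnings
```

For a 30-second file, `check_sample` returns an empty list. For a 10-second file, it returns one warning.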
API access
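The curl request below can also be issued from Python using only the standard library. This is a minimal sketch. The endpoint, headers, and body fields are copied from the curl example, while the function names and the 30-second timeout are our own choices. The page does not document the response format, so `send` returns the raw response bytes without interpreting them.

```python
import json
import urllib.request

# Endpoint and body fields copied from the curl example in this section.
ENDPOINT = "https://api.oakgen.ai/v1/generate/speech"


def build_request(api_key, voice_id, text):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "model": "elevenlabs-v3",
        "voice_id": voice_id,
        "text": text,
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(req, timeout=30.0):
    """Send the request and return the raw response body.

    ASSUMPTION: the response format is undocumented here, so the bytes are
    returned as-is. Generation takes 4–8 seconds per the specs, so the
    30 s default timeout (our choice) leaves headroom.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

A call would look like `send(build_request(os.environ["OAKGEN_API_KEY"], "your_cloned_voice_id", text))`. Inspect the returned bytes before deciding how to save them.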
curl -X POST https://api.oakgen.ai/v1/generate/speech \
  -H "Authorization: Bearer $OAKGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-v3",
    "voice_id": "your_cloned_voice_id",
    "text": "*Whispered* the secret was hidden in the chapter."
  }'
Compared to other models
ElevenLabs v3 captures subtle emotion, breath, and prosody that MiniMax Speech HD misses. Pick ElevenLabs for audiobooks, drama, and advertising. Pick MiniMax for bulk narration at lower cost.
License & commercial use
Licensed through ElevenLabs' commercial terms.
Commercial use is permitted on all paid Oakgen plans. Cloning requires the speaker's consent, and public-figure voices are blocked by default.