Audio Models
Audio AI models on LORY
9 models available — compare each audio model's strengths, including ElevenLabs Music and MiniMax Music 2.6, then try the best fit in your own story project.
Try LORY free — no subscription
Start with free welcome credits — we never ask for payment info during your trial. Pay only when you decide to top up after your credits run out.
ACE-Step – Text to Audio
Text-to-audio generation with optional structured lyrics and genre tags.
Eleven v3 – Text to Speech
ElevenLabs' most expressive TTS model — cinematic delivery, emotional range, and dramatic pacing. Ideal for trailers, narration, and character dialogue.
ElevenLabs Music
ElevenLabs Eleven Music (music_v1): full-track music generation, vocal or instrumental, with multilingual singing and 44.1 kHz studio-quality output.
Eleven Multilingual v2 – Text to Speech
High-quality multilingual text-to-speech by ElevenLabs with 21 preset voices, style control, and speed adjustment.
MiniMax Music 2.6
Full-track music generation with optional structured lyrics, vocal or instrumental output, and configurable audio settings.
Chatterbox – Speech to Speech
Voice conversion from a source clip with an optional target voice reference.
Chatterbox Turbo – Text to Speech
Turbo text-to-speech with preset voices and optional voice cloning from a 5-10 second reference clip.
Stable Audio 2.5
Text-to-audio generation of full-length music and sound effects (up to about 3 minutes).
Stable Audio 2.5 – Audio to Audio
Audio-to-audio transformation with prompt-driven restyling and a strength control that sets how much of the source is preserved or replaced.