minimax/speech-02-hd

A Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. It is optimized for high-fidelity applications such as voiceovers and audiobooks.

Input
Configure the inputs for the AI model.

Text to narrate (max 10,000 characters). Use markers like <#0.5#> to insert pauses in seconds.
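As an illustration of the pause markers, the sketch below joins sentences with a `<#x#>` marker and checks the 10,000-character limit before submission. The helper name `join_with_pauses` is hypothetical, and it assumes the markers themselves count toward the limit, which the description does not confirm.

```python
MAX_CHARS = 10_000  # limit stated in the field description


def join_with_pauses(sentences, pause_seconds=0.5):
    """Join sentences with a <#x#> pause marker between each pair.

    Hypothetical helper: it only builds the text, it does not call the model.
    """
    if pause_seconds <= 0:
        raise ValueError("pause_seconds must be positive")
    # ':g' drops trailing zeros, so 0.5 -> "0.5" and 1.0 -> "1"
    marker = f"<#{pause_seconds:g}#>"
    text = marker.join(s.strip() for s in sentences if s.strip())
    # Assumption: marker characters are counted against the limit
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    return text


script = join_with_pauses(["Welcome back.", "Today we cover chapter two."], 0.5)
print(script)  # Welcome back.<#0.5#>Today we cover chapter two.
```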

Semitone offset applied to the voice (−12 to +12).

Speech speed multiplier (0.5–2.0). Lower is slower, higher is faster.

Relative loudness (0–10). 1.0 is the default MiniMax gain.
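The three numeric settings above can be checked locally before a request is sent. This is a minimal sketch: the ranges come from the descriptions, but the key names `pitch`, `speed`, and `volume` are assumptions, since the page does not show the actual field names.

```python
# Ranges taken from the field descriptions; the key names are hypothetical.
RANGES = {
    "pitch": (-12, 12),     # semitone offset
    "speed": (0.5, 2.0),    # speed multiplier
    "volume": (0.0, 10.0),  # relative loudness, 1.0 is the default gain
}


def check_ranges(settings):
    """Return a list of human-readable problems; an empty list means all values fit."""
    problems = []
    for name, value in settings.items():
        if name not in RANGES:
            continue  # ignore settings this sketch does not know about
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


print(check_ranges({"pitch": -3, "speed": 2.5, "volume": 1.0}))
# ['speed=2.5 is outside 0.5 to 2.0']
```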

MP3 bitrate in bits per second. Only used when audio_format is mp3.

Number of audio channels: mono for 1 channel (default) or stereo for 2 channels.

Desired delivery style. Use auto to let MiniMax choose, or pick a specific emotion.

Voice to synthesize. Pick any MiniMax system voice or a voice_id returned by https://replicate.com/minimax/voice-cloning.

Audio sample rate in Hz.

File format for the generated audio. Choose mp3 for general use, wav/flac for lossless, or pcm for raw bytes.
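Because the bitrate only applies to mp3 output, a client might drop it for the other formats. The sketch below shows one way to do that. `audio_format` is the name used in the descriptions above; the `bitrate` key, the `audio_options` helper, and the value 128000 are assumptions for illustration.

```python
# Formats listed in the field description
FORMATS = {"mp3", "wav", "flac", "pcm"}


def audio_options(audio_format="mp3", bitrate=None):
    """Build the format-related options, keeping bitrate only for mp3.

    Hypothetical helper: the "bitrate" key name is an assumption.
    """
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    options = {"audio_format": audio_format}
    # The bitrate is only used when audio_format is mp3, so omit it otherwise
    if audio_format == "mp3" and bitrate is not None:
        options["bitrate"] = bitrate
    return options


print(audio_options("mp3", 128000))  # {'audio_format': 'mp3', 'bitrate': 128000}
print(audio_options("wav", 128000))  # {'audio_format': 'wav'}
```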

Optional language hint. Choose Automatic to let MiniMax detect the language, or pick a specific locale.

Return MiniMax subtitle metadata with sentence timestamps (non-streaming only).

Improve number/date reading for English text (adds a small amount of latency).

speech-02-hd - ikalos.ai