minimax/speech-02-hd

A Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual support. It is optimized for high-fidelity applications such as voiceovers and audiobooks.

Input

Configure the inputs for the AI model.

Text (required)

Text to narrate (max 10,000 characters). Use markers like <#0.5#> to insert pauses in seconds.
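As a minimal sketch of the pause syntax described above, the helpers below build a `<#seconds#>` marker and join text segments with it. The function names are hypothetical, and the sketch assumes that marker characters count toward the 10,000-character limit, which the page does not confirm.

```python
MAX_CHARS = 10_000  # limit stated in the Text field description


def pause(seconds: float) -> str:
    """Return a pause marker such as <#0.5#> for a duration in seconds."""
    if seconds <= 0:
        raise ValueError("pause duration must be positive")
    return f"<#{seconds:g}#>"


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with one pause marker between each pair."""
    text = pause(seconds).join(segments)
    # Assumption: the markers themselves count toward the character limit.
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    return text
```

For example, `join_with_pauses(["Chapter one.", "It was a quiet morning."], 0.5)` places a half-second pause between the two sentences.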

Pitch

Semitone offset applied to the voice (−12 to +12).

Speed

Speech speed multiplier (0.5–2.0). Lower is slower, higher is faster.

Volume

Relative loudness. 1.0 is the default MiniMax gain. Range 0–10.
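The documented ranges for pitch, speed, and volume can be checked on the client before a request is sent. The following is a rough sketch only: the lowercase key names and the `check_range` helper are assumptions for illustration, not part of the model's interface.

```python
# Ranges taken from the field descriptions above; key names are assumed.
RANGES = {
    "pitch": (-12, 12),   # semitone offset
    "speed": (0.5, 2.0),  # speed multiplier
    "volume": (0, 10),    # relative loudness, 1.0 is the default gain
}


def check_range(name: str, value: float) -> float:
    """Return value unchanged if it lies in the documented range, else raise."""
    low, high = RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the range {low} to {high}")
    return value
```

Raising an error rather than silently clamping keeps an out-of-range value visible to the caller; clamping would be an equally reasonable choice.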

bitrate

MP3 bitrate in bits per second. Only used when audio_format is mp3.

channel

mono for 1 channel (default), stereo for 2 channels.

emotion

Desired delivery style. Use auto to let MiniMax choose, or pick a specific emotion.

Voice Id

Voice to synthesize. Pick any MiniMax system voice or a voice_id returned by https://replicate.com/minimax/voice-cloning.

sample_rate

Audio sample rate in Hz.

audio_format

File format for the generated audio. Choose mp3 for general use, wav/flac for lossless, or pcm for raw bytes.
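Because bitrate applies only to mp3 output, one way to assemble the audio-related options is to drop it for other formats. This is a sketch under assumptions: the helper name is hypothetical, the keys mirror the field labels above, and valid values for sample_rate and bitrate are not listed on this page, so they are passed through unchecked.

```python
from typing import Optional

FORMATS = {"mp3", "wav", "flac", "pcm"}  # formats named in the description
CHANNELS = {"mono", "stereo"}            # mono is the documented default


def audio_settings(
    audio_format: str = "mp3",
    channel: str = "mono",
    sample_rate: Optional[int] = None,
    bitrate: Optional[int] = None,
) -> dict:
    """Build the audio options, keeping bitrate only for mp3 output."""
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    if channel not in CHANNELS:
        raise ValueError(f"unsupported channel: {channel!r}")
    settings = {"audio_format": audio_format, "channel": channel}
    if sample_rate is not None:
        settings["sample_rate"] = sample_rate  # Hz, not validated here
    if bitrate is not None and audio_format == "mp3":
        settings["bitrate"] = bitrate  # bits per second, mp3 only
    return settings
```

Omitting unset options leaves the model's own defaults in effect instead of guessing at them.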

language_boost

Optional language hint. Choose Automatic to let MiniMax detect the language, or pick a specific locale.

Subtitle Enable

Return MiniMax subtitle metadata with sentence timestamps (non-streaming only).

English Normalization

Improve number/date reading for English text (adds a small amount of latency).
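Putting the fields together, a request might be assembled roughly as follows. Treat this as a sketch, not the model's confirmed schema: the snake_case keys (`text`, `voice_id`, `subtitle_enable`, `english_normalization`) are inferred from the field labels, the `False` defaults are assumptions, and `build_input` and `synthesize` are hypothetical helpers. The call itself uses `replicate.run` from the Replicate Python client, which needs a `REPLICATE_API_TOKEN` in the environment.

```python
def build_input(
    text: str,
    voice_id: str,
    *,
    emotion: str = "auto",  # "auto" lets MiniMax choose the delivery style
    subtitle_enable: bool = False,        # assumed default
    english_normalization: bool = False,  # assumed default
    **extra,
) -> dict:
    """Collect the form fields into one input dictionary (key names assumed)."""
    if not text:
        raise ValueError("text is required")
    payload = {
        "text": text,
        "voice_id": voice_id,
        "emotion": emotion,
        "subtitle_enable": subtitle_enable,
        "english_normalization": english_normalization,
    }
    payload.update(extra)  # e.g. pitch, speed, volume, audio_format
    return payload


def synthesize(payload: dict):
    """Send the payload to the model; requires the third-party replicate package."""
    import replicate  # imported lazily so build_input works without it

    return replicate.run("minimax/speech-02-hd", input=payload)
```

Keeping payload construction separate from the network call makes it possible to inspect or log the inputs before any request is made.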
