turian/insanely-fast-whisper-with-video

whisper-large-v3, incredibly fast, with video transcription

Input

Url

Video URL for yt-dlp to download the audio from. Either this or audio must be provided.

Task

Task to perform: transcribe or translate (Whisper's translate task produces English text). Default: transcribe.

Audio

Audio file. Either this or url must be provided.
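
The url/audio rule can be checked before a request is submitted. The following is a minimal sketch, assuming the API parameter names are the lowercase forms url and audio (an assumption based on the field labels, not confirmed by this page). pick_source is a hypothetical helper, not part of the model.

```python
from typing import Optional


def pick_source(url: Optional[str] = None, audio: Optional[str] = None) -> dict:
    """Return the source part of the input payload.

    Parameter names `url` and `audio` are assumed from the field labels.
    Raises ValueError when neither a video URL nor an audio file is given.
    """
    payload = {}
    if url:
        payload["url"] = url
    if audio:
        payload["audio"] = audio
    if not payload:
        raise ValueError("Provide either url or audio")
    # What the model does when both are set is not documented here,
    # so both values are passed through unchanged.
    return payload
```

Calling pick_source(url="https://example.com/video") returns a dict containing only the url key, while calling it with no arguments raises ValueError.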

Hf Token

Provide a Hugging Face access token (created at hf.co/settings/token) so that Pyannote.audio can diarise the audio clips. You must first accept the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.

Language

Optional. Language spoken in the audio. Specify None to perform language detection.

Timestamp

Whisper supports both chunk-level and word-level timestamps. Default: chunk.

Batch Size

Number of parallel batches to compute. Reduce it if you hit out-of-memory (OOM) errors. Default: 64.
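
One way to act on the advice about OOMs is to retry with a smaller batch size. This is only a sketch under stated assumptions: run is a hypothetical callable that performs one transcription attempt with the given batch size and raises MemoryError when it runs out of memory. How the hosted model actually reports an OOM is not described here, so a real caller would need to map that error itself.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(
    run: Callable[[int], T],
    batch_size: int = 64,      # the documented default
    min_batch_size: int = 1,
) -> T:
    """Call run(batch_size), halving batch_size after each MemoryError.

    `run` is a hypothetical stand-in for one transcription attempt.
    """
    while True:
        try:
            return run(batch_size)
        except MemoryError:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; surface the error
            batch_size = max(min_batch_size, batch_size // 2)
```

Halving is an arbitrary choice for the sketch. Any decreasing schedule would follow the same pattern.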

Diarise Audio

Use Pyannote.audio to diarise the audio clips. You will also need to provide hf_token.
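
The dependency between diarisation and the token, together with the documented defaults, can be sketched as a small validation helper. The parameter names task, timestamp, batch_size, and diarise_audio are assumed from the field labels, and build_options is a hypothetical function, not part of the model or of any client library.

```python
from typing import Optional

TASKS = ("transcribe", "translate")
TIMESTAMPS = ("chunk", "word")


def build_options(
    *,
    task: str = "transcribe",
    timestamp: str = "chunk",
    batch_size: int = 64,
    diarise_audio: bool = False,
    hf_token: Optional[str] = None,
) -> dict:
    """Validate the non-source options and return them as a dict.

    Key names are assumptions derived from the field labels.
    """
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    if timestamp not in TIMESTAMPS:
        raise ValueError(f"timestamp must be one of {TIMESTAMPS}")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if diarise_audio and not hf_token:
        # Diarisation relies on Pyannote.audio, which needs the token.
        raise ValueError("diarise_audio requires hf_token")

    options = {
        "task": task,
        "timestamp": timestamp,
        "batch_size": batch_size,
        "diarise_audio": diarise_audio,
    }
    if hf_token:
        options["hf_token"] = hf_token
    return options
```

The returned dict is meant to be combined with the url or audio entry and passed as the input to whichever client is used to call the model.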
