Overview

Transcription converts the spoken audio in your video into timestamped captions in the source language. Powered by ElevenLabs Scribe v2, it produces word-level timing and speaker diarization — the essential foundation for translation and dubbing.

Starting a transcription

  1. Open your video from the dashboard
  2. Click Transcribe
  3. Select the spoken language, or choose Auto-detect
  4. Click Start

Auto-detect works well for most common languages. If you know the spoken language, select it explicitly: this improves accuracy, especially for less common languages such as Dutch or Chinese.

Supported languages

Neolli supports transcription in 10 languages (plus auto-detect):

Flag  Language    Code
🇺🇸    English     eng
🇪🇸    Spanish     spa
🇫🇷    French      fra
🇩🇪    German      deu
🇮🇹    Italian     ita
🇧🇷    Portuguese  por
🇯🇵    Japanese    jpn
🇰🇷    Korean      kor
🇨🇳    Chinese     zho
🇳🇱    Dutch       nld

For the full capabilities matrix across all features, see Supported Languages.
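
If you script around transcription jobs, it can help to validate a language choice before submitting. The sketch below copies the ten three-letter codes from the table above (they match ISO 639-3). The names `SUPPORTED_LANGUAGES` and `resolve_language` are hypothetical and are not part of any Neolli API; treating `None` as auto-detect is likewise an assumption made for illustration.

```python
from typing import Optional

# Codes and names copied from the supported-languages table.
# The dictionary and helper below are illustrative, not a Neolli API.
SUPPORTED_LANGUAGES = {
    "eng": "English",
    "spa": "Spanish",
    "fra": "French",
    "deu": "German",
    "ita": "Italian",
    "por": "Portuguese",
    "jpn": "Japanese",
    "kor": "Korean",
    "zho": "Chinese",
    "nld": "Dutch",
}


def resolve_language(code: Optional[str]) -> str:
    """Return a display name for a code; None stands for auto-detect (assumption)."""
    if code is None:
        return "Auto-detect"
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported transcription language: {code!r}")
    return SUPPORTED_LANGUAGES[key]
```

For example, `resolve_language("NLD")` returns `"Dutch"`, while an unlisted code such as `"rus"` raises `ValueError`.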

Features

  • Speaker diarization — Automatically identifies and labels different speakers
  • Word-level timing — Each word gets its own precise timestamp for accurate syncing
  • Auto-detect — Identifies the spoken language automatically for common languages
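
To illustrate why word-level timing and speaker labels matter together, here is a minimal sketch that groups timed words into caption cues. The `Word` and `Cue` shapes, the field names, and the `max_gap` threshold are all assumptions made for this example; they do not describe Neolli's actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the audio
    end: float
    speaker: str   # diarization label, e.g. "A" (hypothetical)


@dataclass
class Cue:
    speaker: str
    start: float
    end: float
    text: str


def group_into_cues(words: List[Word], max_gap: float = 0.8) -> List[Cue]:
    """Merge consecutive words into one cue until the speaker changes
    or the pause between words exceeds max_gap seconds."""
    cues: List[Cue] = []
    for w in words:
        last = cues[-1] if cues else None
        if last and last.speaker == w.speaker and w.start - last.end <= max_gap:
            # Same speaker, short pause: extend the current cue.
            last.text += " " + w.text
            last.end = w.end
        else:
            # New speaker or long pause: start a new cue.
            cues.append(Cue(w.speaker, w.start, w.end, w.text))
    return cues
```

Joining words with a space suits languages such as English or Dutch; Japanese and Chinese text would need a different joining rule.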

Processing time

Video length   Estimated time          Mode
Under 30 min   1–3 minutes             Synchronous
Over 30 min    Proportional to length  Asynchronous

You can close the browser while a job is running; it continues in the background. The dashboard shows job progress in real time.
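
The length threshold above can be expressed as a small helper, for example to decide whether to wait for a result or check back later. This is only a sketch: the function name is hypothetical, and because the table does not say how a video of exactly 30 minutes is handled, treating it as asynchronous is an assumption.

```python
def processing_mode(duration_minutes: float) -> str:
    """Return the expected processing mode for a video of the given length."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    # Assumption: exactly 30 minutes falls on the asynchronous side.
    return "synchronous" if duration_minutes < 30 else "asynchronous"
```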

File requirements

  • Max file size: 3 GB
  • Supported formats: MP4, MOV, MKV, AVI, WebM, and most common video/audio formats
  • Audio: Must contain a detectable audio track with speech

Transcription accuracy depends heavily on audio quality. Background noise, overlapping speakers, heavy music, and low recording quality will reduce accuracy. See Audio Quality Tips for guidance.
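
A quick local pre-check can catch oversized files before you upload. The sketch below is hypothetical and is not how Neolli validates uploads. It assumes that "3 GB" means 3 × 1024³ bytes, and it only knows the five formats named above, so an unlisted extension produces a warning, not a rejection.

```python
from pathlib import Path
from typing import List

MAX_BYTES = 3 * 1024**3  # assumption: "3 GB" interpreted as binary gigabytes
NAMED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("File exceeds the 3 GB limit.")
    if Path(filename).suffix.lower() not in NAMED_EXTENSIONS:
        # Other common video/audio formats may still be accepted.
        problems.append("Extension is not one of the explicitly listed formats.")
    return problems
```

This check cannot tell whether the file contains a detectable audio track with speech; that still has to be verified separately.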

After transcription

Once complete, you can:

Credit cost

Transcription is charged at 58 credits per minute of audio. See Credit Costs for a detailed breakdown.
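
As a rough budgeting aid, the per-minute rate can be turned into an estimate. The sketch below assumes partial minutes are prorated by the second and the total is rounded up to a whole credit. That rounding rule is an assumption and is not stated on this page, so treat Credit Costs as the authority.

```python
import math

CREDITS_PER_MINUTE = 58  # rate quoted above


def estimate_credits(duration_seconds: float) -> int:
    """Estimate transcription credits for a given audio duration.

    Assumption: cost is prorated per second and rounded up to a whole credit.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds * CREDITS_PER_MINUTE / 60)
```

Under these assumptions, a 10-minute video (600 seconds) comes to 580 credits and a 90-second clip to 87.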