Podcast Transcription

Commercial use OK, 380+ models, no watermark, no sign-up needed.

Upload a podcast episode and get a clean, speaker-labeled transcript with auto-detected chapter markers from silence gaps. Long-form files up to 2GB, 99 languages, Whisper-large-v3 accuracy. Export as SRT/VTT for your video podcast, plain TXT for show notes, or JSON for editing in Descript-style workflows.

Supported formats: MP3, WAV, M4A, OGG, MP4. Long-form episodes up to 2GB.

Options: language, engine, output format, speaker labels (host / guest; labels who's speaking when, +50% tokens), number of speakers, and auto chapter markers (silence gaps >2s).

Chapter markers are computed client-side from segment gaps and attached to the transcript. Paste them into YouTube or Spotify descriptions as-is.
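
The gap-based chapter logic described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the `Segment` type, the function names, and the generic "Chapter N" titles are all assumptions, and the first chapter is pinned to 0:00 because YouTube expects chapter lists to start there.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript segment (hypothetical shape): times in seconds."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapter_starts(segments: List[Segment], min_gap: float = 2.0) -> List[float]:
    """Start a new chapter wherever the silence between segments exceeds min_gap."""
    if not segments:
        return []
    starts = [0.0]  # assumption: the first chapter always begins at 0:00
    for prev, cur in zip(segments, segments[1:]):
        if cur.start - prev.end > min_gap:
            starts.append(cur.start)
    return starts


def chapters_block(segments: List[Segment], min_gap: float = 2.0) -> str:
    """Build a pasteable 'Chapters:' block with placeholder titles."""
    lines = ["Chapters:"]
    for i, start in enumerate(chapter_starts(segments, min_gap), 1):
        lines.append(f"{format_timestamp(start)} Chapter {i}")
    return "\n".join(lines)
```

Placeholder titles would normally be replaced by hand before pasting the block into a description.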

Built for podcasters + show editors

Show notes in one paste

Upload the episode, download the TXT. You get inline speaker labels and chapter timestamps ready for your Spotify/YouTube description, so the blog post takes 10 minutes instead of 4 hours.

Video podcast subtitles

Export SRT or WebVTT with speaker labels. Drop straight into Premiere, Final Cut, or DaVinci Resolve — or upload alongside your YouTube video for clean captions.

Text-based episode editing

JSON export gives you every word with start/end timestamps. Pipe into Descript, Reaper, or a custom workflow — edit by highlighting text instead of scrubbing.
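
As a rough illustration of text-based editing, the sketch below drops filler words from a word-level transcript and returns the audio ranges to keep. The JSON shape (a list of objects with `word`, `start`, and `end` keys) is an assumption for this example; the real export may be structured differently.

```python
import json
from typing import Iterable, List, Set, Tuple


def filler_indices(words: List[dict], fillers: Iterable[str] = ("um", "uh")) -> Set[int]:
    """Indices of words that match a filler list (case-insensitive)."""
    targets = {f.lower() for f in fillers}
    return {i for i, w in enumerate(words) if w["word"].lower() in targets}


def keep_ranges(words: List[dict], drop: Set[int]) -> List[Tuple[float, float]]:
    """Merge consecutive kept words into (start, end) ranges of audio to keep."""
    ranges: List[Tuple[float, float]] = []
    prev_kept = False
    for i, w in enumerate(words):
        if i in drop:
            prev_kept = False  # a cut ends the current range
            continue
        if prev_kept:
            ranges[-1] = (ranges[-1][0], w["end"])  # extend the open range
        else:
            ranges.append((w["start"], w["end"]))
        prev_kept = True
    return ranges


# Hypothetical export fragment, inlined so the example is self-contained.
SAMPLE = json.loads("""
[
  {"word": "So",      "start": 0.0,  "end": 0.3},
  {"word": "um",      "start": 0.4,  "end": 0.7},
  {"word": "welcome", "start": 0.8,  "end": 1.3},
  {"word": "back",    "start": 1.35, "end": 1.7}
]
""")
```

The resulting ranges could then be handed to whatever audio editor or cutting script the workflow uses.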

How podcast transcription works

1. Drag your episode onto the drop zone (MP3, WAV, M4A, MP4, up to 2GB).
2. Leave speaker labels and chapter markers on (they are the defaults). Pick your output format.
3. We check the duration and quote a price before you spend any tokens. Click Transcribe.
4. Download speaker-labeled TXT, SRT, VTT, or JSON. Chapter markers ship alongside, ready to paste.

Free.ai podcast transcription vs Descript, Riverside, Otter

Feature	Free.ai	Descript	Riverside	Otter.ai
Price	Pay-per-use ($0.003/min)	$15-30/mo	$19/mo	$16.99/mo
Max file size	2 GB	5 GB	Tied to record session	500 MB (varies)
Speaker diarization	Yes			
Auto chapter markers	Yes (silence-based)	Manual	—	Paid tier
SRT/VTT export	Yes			Paid
Languages	99	22	100+	English-focused
Public API	Yes	—	—	Limited

Competitor pricing reflects publicly-listed tiers in 2026. Check each provider for current plans.

Transcribe podcasts to text with AI for free. Speaker labels, chapter markers, SRT export.

How to Use Podcast Transcription

Enter your input

Upload your podcast episode (MP3, WAV, M4A, OGG, or MP4). No account needed.

Click generate

Our AI processes your request in seconds using the best open-source models.

Download & share

Download, copy, or share your result. Free for personal and commercial use.

Use this tool via API

Automate this tool from your own code. OpenAI-compatible REST endpoint, Bearer-token auth, no extra SDK required. Token costs match the web interface.

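# Assumes a multipart upload, as OpenAI-compatible transcription endpoints typically expect.
# curl's @file syntax only attaches a file with -F; inside a JSON -d body it is sent as a literal string.
curl -X POST https://api.free.ai/v1/stt/ \
  -H "Authorization: Bearer sk-free-..." \
  -F "file=@audio.mp3" \
  -F "language=auto"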
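
The same request can be prepared from Python using only the standard library. This is a sketch under stated assumptions: the endpoint URL, Bearer auth, and the `file` and `language` field names come from the example above, while the multipart/form-data encoding and the helper name `build_stt_request` are guesses made for illustration. The response format is not documented here, so the sketch stops at building the request.

```python
import uuid
from urllib.request import Request

API_URL = "https://api.free.ai/v1/stt/"


def build_stt_request(audio: bytes, filename: str, api_key: str,
                      language: str = "auto") -> Request:
    """Build (but do not send) a multipart POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        API_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

A caller would read the episode with `pathlib.Path("audio.mp3").read_bytes()` and pass the result to `urllib.request.urlopen`. Holding the whole file in memory is fine for a sketch but a poor fit for multi-gigabyte episodes, where a streaming upload would be preferable.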

Podcast Transcription — FAQ

How does podcast transcription differ from the generic tool?

The podcast tool defaults to speaker diarization and chapter markers (silence-gap detection >2s), and supports long-form files up to 2GB. Output formats include SRT and VTT for show-notes video clips, plain TXT for blog posts, and structured JSON with per-turn timestamps and speaker labels for editing in Descript-style workflows.

What is the longest podcast you can transcribe?

Up to 2GB per file, which is roughly 35 hours of audio at 128 kbps MP3 (or about 14 hours at 320 kbps). Long files are chunked server-side for resilience; you get a single merged transcript back.

Do you label speakers automatically?

Yes. Speaker diarization is on by default. We detect 2-10 distinct voices via ECAPA voice embeddings, label them Speaker 1, Speaker 2, and so on, and apply the labels to every segment. You can rename them in the result view.

What are chapter markers based on?

Silence gaps longer than 2 seconds, the natural breaks podcasters use between segments. Each chapter gets a timestamp you can paste straight into your show notes as a "Chapters:" block for YouTube and Spotify.

How does this compare to Descript?

Descript charges $15-$30 per month for 10 hours of transcription, tied to their editor. We charge per use at ~500 tokens/min on Whisper ($5 = 200K tokens = ~400 minutes), with no subscription and a plain export you can paste anywhere.

How does this compare to Riverside?

Riverside is a recording studio that transcribes your own sessions for free inside their app, but only after recording with them. We transcribe any MP3/WAV/MP4 regardless of where it was recorded.

How does this compare to Otter.ai?

Otter caps at 300 minutes/month on the free tier and is English-focused. We support 99 languages at the same Whisper-large-v3 accuracy with no monthly cap; you pay per minute transcribed.

Can I get SRT subtitles for my video podcast?

Yes. Pick SRT or WebVTT as the output format. Speaker labels are included inline (SRT) or as <v Speaker N> tags (VTT) that most modern players render correctly.

What accuracy should I expect on podcasts with music beds?

Whisper-large-v3 handles music beds and light reverb well (typical word-error rate 3-7%). Very loud music or heavy overlap degrades accuracy; consider running /music/vocal-remover/ first on a copy, or splitting your cold opens.

Does it recognize branded terms and guest names?

Whisper handles most common names, but highly brand-specific jargon may need a post-edit pass. A ~30-minute episode typically has 5-10 brand/name corrections to apply manually.

Can I process multiple episodes at once?

Upload them one at a time here, or use our /batch/ feature once signed in to queue up a season. The API at /api/ also accepts POST /v1/stt/ for programmatic batching.

Will my audio be stored after transcription?

No. Uploaded files are deleted after transcription completes. Your transcript sits in your /account/ history for download if signed in; anonymous users get a 24-hour share link.
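
The file-size and token figures quoted in this FAQ can be sanity-checked with simple arithmetic. The sketch below assumes that 2GB means 2 x 10^9 bytes, that the MP3 is constant bitrate, and that the +50% speaker-label surcharge multiplies the ~500 tokens/min rate; the function names are illustrative only.

```python
def audio_hours(size_bytes: float, bitrate_kbps: float) -> float:
    """Hours of constant-bitrate audio that fit in size_bytes."""
    bits = size_bytes * 8
    seconds = bits / (bitrate_kbps * 1000)
    return seconds / 3600


def minutes_for_budget(tokens: float, tokens_per_min: float = 500,
                       speaker_labels: bool = False) -> float:
    """Minutes of transcription a token budget covers.

    Assumes the speaker-label option costs 50% more tokens per minute.
    """
    rate = tokens_per_min * (1.5 if speaker_labels else 1.0)
    return tokens / rate


if __name__ == "__main__":
    print(f"2GB at 128 kbps: {audio_hours(2e9, 128):.1f} h")
    print(f"2GB at 320 kbps: {audio_hours(2e9, 320):.1f} h")
    print(f"200K tokens: {minutes_for_budget(200_000):.0f} min")
    print(f"200K tokens with labels: {minutes_for_budget(200_000, speaker_labels=True):.0f} min")
```

Variable-bitrate files and container overhead will shift the duration estimate somewhat, so treat the numbers as approximations.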
