Podcast Transcription

Commercial use OK · 380+ models · No watermark · No sign-up needed
Upload a podcast episode and get a clean, speaker-labeled transcript with auto-detected chapter markers from silence gaps. Long-form files up to 2GB, 99 languages, Whisper-large-v3 accuracy. Export as SRT/VTT for your video podcast, plain TXT for show notes, or JSON for editing in Descript-style workflows.

MP3, WAV, M4A, OGG, MP4 — long-form episodes up to 2GB

Chapter markers are computed client-side from segment gaps and attached to the transcript. Paste them into YouTube or Spotify descriptions as-is.

Built for podcasters + show editors

Show notes in one paste

Upload the episode, download the TXT. Speaker labels inline, chapter timestamps ready for your Spotify/YouTube description, blog post written in 10 minutes instead of 4 hours.

Video podcast subtitles

Export SRT or WebVTT with speaker labels. Drop straight into Premiere, Final Cut, or DaVinci Resolve — or upload alongside your YouTube video for clean captions.

Text-based episode editing

JSON export gives you every word with start/end timestamps. Pipe into Descript, Reaper, or a custom workflow — edit by highlighting text instead of scrubbing.
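As a rough illustration of text-based editing, the sketch below maps a highlighted run of words back to the audio span it covers. The field names (`words`, `word`, `start`, `end`) are assumptions made for this example; the page only states that each word carries start/end timestamps, so check the real export before relying on this shape.

```python
import json

# Hypothetical export shape: the keys below are assumed, not documented.
SAMPLE = """
{"words": [
  {"word": "So",      "start": 0.00, "end": 0.21},
  {"word": "um",      "start": 0.30, "end": 0.62},
  {"word": "welcome", "start": 0.70, "end": 1.15},
  {"word": "back",    "start": 1.20, "end": 1.48}
]}
"""

def cut_range(words, first, last):
    """Return the (start, end) audio span in seconds covered by words[first..last], inclusive."""
    if not 0 <= first <= last < len(words):
        raise IndexError("selection is outside the transcript")
    return words[first]["start"], words[last]["end"]

def remaining_text(words, first, last):
    """Return the transcript text with words[first..last] removed."""
    kept = words[:first] + words[last + 1:]
    return " ".join(w["word"] for w in kept)

words = json.loads(SAMPLE)["words"]
print(cut_range(words, 1, 1))       # span to remove from the audio
print(remaining_text(words, 1, 1))  # text after the cut
```

The returned span is what an audio editor would need to remove; the text helper shows the transcript after the same cut.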

How podcast transcription works

  1. Drag your episode onto the drop zone — MP3, WAV, M4A, MP4, up to 2GB.
  2. Leave speaker labels and chapter markers on (they are the defaults). Pick your output format.
  3. We check the duration and show the token cost before you spend any tokens. Click Transcribe.
  4. Download speaker-labeled TXT, SRT, VTT, or JSON. Chapter markers ship alongside, ready to paste.

Free.ai podcast transcription vs Descript, Riverside, Otter

Feature         Free.ai                    Descript    Riverside                Otter.ai
Price           Pay-per-use ($0.003/min)   $15-30/mo   $19/mo                   $16.99/mo
Max file size   2 GB                       5 GB        Tied to record session   500 MB (varies)
Languages       99                         22          100+                     English-focused

Speaker diarization
Auto chapter markers (silence-based): Manual, Paid tier
SRT/VTT export: Paid
Public API: Limited
Competitor pricing reflects publicly-listed tiers in 2026. Check each provider for current plans.
Transcribe podcasts to text with AI for free. Speaker labels, chapter markers, SRT export.

How to Use Podcast Transcription

  1. Enter your input. Type text, upload a file, or describe what you want. No account needed.
  2. Click generate. Our AI processes your request in seconds using the best open-source models.
  3. Download & share. Download, copy, or share your result. Free for personal and commercial use.

Use this tool via API

Automate this tool from your own code. OpenAI-compatible REST endpoint, Bearer-token auth, no extra SDK required. Token costs match the web interface.

curl -X POST https://api.free.ai/v1/stt/ \
  -H "Authorization: Bearer sk-free-..." \
  -F "file=@audio.mp3" \
  -F "language=auto"

Podcast Transcription — FAQ

The podcast tool defaults to speaker diarization and chapter markers (silence-gap detection >2s), and supports long-form files up to 2GB. Output formats include SRT + VTT for show-notes video clips, plain TXT for blog posts, and structured JSON with per-turn timestamps + speaker labels for editing in Descript-style workflows.

Up to 2GB per file, roughly a 35-hour audio podcast at 128 kbps MP3 (about 14 hours at 320 kbps). Long files are chunked server-side for resilience; you get a single merged transcript back.
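For constant-bitrate audio, how many hours fit under the size limit is plain arithmetic: bytes divided by bytes per second. A minimal sketch, assuming the 2GB limit means decimal gigabytes and ignoring container overhead and variable-bitrate encoding:

```python
def max_hours(size_bytes: int, bitrate_kbps: int) -> float:
    """Approximate hours of constant-bitrate audio that fit in size_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return size_bytes / bytes_per_second / 3600

TWO_GB = 2 * 1000**3  # assumption: decimal gigabytes

for kbps in (64, 128, 320):
    print(f"{kbps} kbps: about {max_hours(TWO_GB, kbps):.1f} hours")
```

At 128 kbps this comes to about 34.7 hours, and at 320 kbps about 13.9 hours. Video files (MP4) reach the limit much sooner because the video track dominates the file size.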

Yes. Speaker diarization is ON by default. We detect 2-10 distinct voices via ECAPA voice embeddings, label them Speaker 1 / 2 / ... and apply the labels to every segment. You can rename them in the result view.

Silence gaps longer than 2 seconds — the natural breaks podcasters use between segments. Each chapter gets a timestamp you can paste straight into your show-notes with a "Chapters:" block for YouTube + Spotify.
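The silence-gap rule can be sketched in a few lines. This is an illustration of the idea only, not the site's actual implementation: the `Segment` type and function names are hypothetical, and the first chapter is pinned to 00:00 because YouTube expects a chapter list to start there.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str

def detect_chapters(segments, min_gap=2.0):
    """Return chapter start times: 0.0 plus every segment that follows a gap longer than min_gap."""
    chapters = [0.0] if segments else []
    for prev, cur in zip(segments, segments[1:]):
        if cur.start - prev.end > min_gap:
            chapters.append(cur.start)
    return chapters

def fmt(seconds):
    """Format seconds as MM:SS, or H:MM:SS once past the first hour."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

segments = [
    Segment(0.0, 12.5, "Cold open"),
    Segment(13.0, 60.0, "Intro"),           # 0.5 s gap: same chapter
    Segment(63.5, 120.0, "Interview"),      # 3.5 s gap: new chapter
    Segment(120.8, 3700.0, "Interview, continued"),
    Segment(3703.0, 3800.0, "Outro"),       # 3.0 s gap: new chapter
]

print("Chapters:")
for t in detect_chapters(segments):
    print(fmt(t))
```

Chapter titles are left out here; the detected timestamps would still need a label each before pasting into a description.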

Descript charges $15-$30 per month for 10 hours of transcription, tied to their editor. We charge per-use at ~500 tokens/min on Whisper ($5 = 200K tokens = ~400 minutes), no subscription, plain export you can paste anywhere.
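The figures in this answer (about 500 tokens per minute, $5 for 200K tokens) are enough to estimate the cost of an episode before uploading. The sketch below uses only those two numbers; rounding up to a whole token is an assumption, and the real pre-transcription estimate may differ.

```python
import math

TOKENS_PER_MINUTE = 500        # approximate rate quoted for Whisper
USD_PER_TOKEN = 5 / 200_000    # from "$5 = 200K tokens"

def estimate(duration_seconds: float):
    """Return (tokens, usd) for an episode of the given length."""
    minutes = duration_seconds / 60
    tokens = math.ceil(minutes * TOKENS_PER_MINUTE)  # assumption: rounded up
    return tokens, tokens * USD_PER_TOKEN

tokens, usd = estimate(45 * 60)  # a 45-minute episode
print(f"{tokens} tokens, about ${usd:.2f}")
```

A 45-minute episode works out to 22,500 tokens, or roughly $0.56 at that rate.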

Riverside is a recording studio that transcribes your own sessions for free inside their app, but only after recording with them. We transcribe any MP3/WAV/MP4 regardless of where it was recorded.

Otter caps at 300 minutes/month on the free tier and is English-focused. We support 99 languages at the same Whisper-large-v3 accuracy with no monthly cap — you pay per minute transcribed.

Yes — pick SRT or WebVTT as the output format. Speaker labels are included inline (SRT) or as <v Speaker N> tags (VTT) that most modern players render correctly.
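The two subtitle formats differ mainly in the timestamp separator (SRT uses a comma before the milliseconds, WebVTT a period) and in how the speaker is marked. A minimal sketch of both, assuming each turn arrives as a (start, end, speaker, text) tuple; that tuple layout is made up for the example and is not the tool's export schema.

```python
def ts(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

def to_srt(turns) -> str:
    """Numbered SRT cues with the speaker label written inline."""
    cues = []
    for i, (start, end, speaker, text) in enumerate(turns, 1):
        cues.append(f"{i}\n{ts(start, ',')} --> {ts(end, ',')}\n{speaker}: {text}\n")
    return "\n".join(cues)

def to_vtt(turns) -> str:
    """WebVTT cues with the speaker carried in a <v ...> voice tag."""
    cues = ["WEBVTT\n"]
    for start, end, speaker, text in turns:
        cues.append(f"{ts(start, '.')} --> {ts(end, '.')}\n<v {speaker}>{text}\n")
    return "\n".join(cues)

turns = [
    (0.0, 2.5, "Speaker 1", "Welcome back to the show."),
    (2.5, 5.04, "Speaker 2", "Thanks for having me."),
]

print(to_srt(turns))
print(to_vtt(turns))
```

Players that ignore voice tags will simply show the cue text without the speaker name, which is why the SRT variant writes the label into the text itself.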

Whisper-large-v3 handles music beds and light reverb well (typical word-error rate 3-7%). Very loud music or heavy overlap degrades accuracy — consider running /music/vocal-remover/ first on a copy, or splitting your cold opens.

Whisper handles most common names, but highly brand-specific jargon may need a post-edit pass. A ~30-minute episode typically has 5-10 brand/name corrections to apply manually.

Upload them one at a time here, or use our /batch/ feature once signed in to queue up a season. The API at /api/ also accepts POST /v1/stt/ for programmatic batching.

No. Uploaded files are deleted after transcription completes. Your transcript sits in your /account/ history for download if signed in; anonymous users get a 24-hour share link.
