AI Caption Generator

Commercial use OK · 380+ models · No watermark · No sign-up needed
Burn viral TikTok-style captions into your video — big bold text with word-by-word highlight animation (ASS karaoke timing). Prefer plain SRT/VTT sidecar files? Use the subtitle tool instead — this one is style-first and always burns in.

MP4, MOV, WebM up to 200MB — 99 languages supported via Whisper

Where viral captions earn their keep

TikTok / Reels / Shorts

Short-form completion rate jumps 30-40% with word-by-word captions. The TikTok Neon preset is the one the top creators all use.

YouTube viral clips

MrBeast-style bold-text overlays on reaction / commentary footage. YouTube Lowerthird preset with a translucent box.

Podcast clips for social

Audio-first shows re-clipped for IG Reels / TikTok. Podcast preset keeps captions readable over the talking-head frame.

4-step how-to

  1. Upload your video. We pull audio, run Whisper STT, and read timing for every spoken segment.
  2. Pick a style preset — TikTok Neon is the safe viral choice. Font / highlight color / position override the preset defaults.
  3. We build an ASS subtitle file with word-by-word karaoke timing (the effect where words change color as they're spoken).
  4. ffmpeg burns the captions into the video. Processing takes 30-90 seconds — close the tab; we email you when it's done.
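Step 3 can be pictured as writing one ASS `Dialogue` event per spoken segment, with a `\kf` tag in front of each word. The following is a minimal sketch under stated assumptions, not the tool's actual code: the function names are hypothetical, the style name `Default` is a placeholder, and per-word durations are assumed to be already known in centiseconds.

```python
def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centisecond precision)."""
    total_cs = round(seconds * 100)
    cs = total_cs % 100
    total_s = total_cs // 100
    hours, rem = divmod(total_s, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def karaoke_dialogue(start: float, end: float,
                     words: list[tuple[str, int]],
                     style: str = "Default") -> str:
    """Build one ASS Dialogue event from (word, duration_in_centiseconds) pairs."""
    # Each word is prefixed with a karaoke fill tag carrying its own duration.
    text = " ".join(f"{{\\kf{cs}}}{word}" for word, cs in words)
    # Field order: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return (f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},"
            f"{style},,0,0,0,,{text}")


print(karaoke_dialogue(1.0, 2.5, [("Hello", 50), ("world", 100)]))
```

The highlight sweeps across each word for the duration given in its tag, which is what produces the word-by-word effect once the file is burned in.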

vs. CapCut, Submagic, Opus Clip, Captions.AI

CapCut's auto-captions are free and good, but you need the CapCut editor installed and you can't batch them. Submagic is $20/mo for unlimited. Opus Clip is $30/mo for long-form → short-form with auto-captions bundled. Captions.AI (App Store) is $10/mo. This tool runs Whisper large-v3 + an ffmpeg ASS karaoke burn-in — the same two primitives all the paid tools use — inside your token pool. For one-offs and batch social exports, it's the fastest path.

Captions vs subtitles — what's the difference?

Subtitles (see /video/subtitle/) are a utility: SRT/VTT sidecar files the viewer's player can toggle on/off, designed for accessibility and upload to YouTube Studio. Captions (this tool) are a style: big bold text burned into every frame with karaoke animation, designed to earn completion rate on TikTok / Reels / Shorts where 85% of viewers keep sound off. Use subtitle for YouTube CCs; use caption for viral short-form.

When NOT to caption

  • Videos that already have burned-in captions — the text will double up and look broken.
  • Long-form YouTube uploads — use the sidecar SRT from /video/subtitle/ instead so viewers can toggle CCs.
  • Videos with zero dialogue — there's nothing to caption. Music-only clips should add text overlays manually.

Burn viral TikTok-style captions into any video — word-by-word karaoke highlight, 7 style presets, 8 fonts, 99-language Whisper STT. Always burn-in.

How to Use AI Caption Generator

  1. Upload your video. No account needed.
  2. Click generate. The clip is transcribed and captioned, typically in 30-90 seconds.
  3. Download and share. The result is free for personal and commercial use.

Use this tool via API

Automate this tool from your own code. OpenAI-compatible REST endpoint, Bearer-token auth, no extra SDK required. Token costs match the web interface.

# clip.mp4 is a placeholder for your local video file
curl -X POST https://api.free.ai/v1/video/caption/ \
  -H "Authorization: Bearer sk-free-..." \
  -F "file=@clip.mp4" \
  -F "style=tiktok-neon"

AI Caption Generator — FAQ

What does AI Caption Generator do?

It burns viral-style captions into any video with word-by-word karaoke-timing animation. It differs from /video/subtitle/, which outputs SRT/VTT sidecar files that players toggle on and off. Caption is always burn-in, style-first, and tuned for TikTok / Reels / Shorts, where captions must be hardcoded into every frame.

How does it work?

Four steps: (1) extract mono 16 kHz audio from your video, (2) transcribe with Whisper large-v3 (99 languages) to get segment timestamps, (3) build an ASS subtitle file with word-level \kf karaoke timing tags derived from those segments, (4) burn the ASS into every frame with ffmpeg, which renders through libass for clean anti-aliased text.
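Steps 1 and 4 map onto two ffmpeg invocations. This sketch only builds the argument lists; the function names and file names are hypothetical, and the exact flags the service uses are not documented here, so treat these as one plausible way to do it locally.

```python
import shlex


def extract_audio_cmd(video: str, wav: str) -> list[str]:
    """ffmpeg arguments for step 1: mono 16 kHz audio for speech-to-text."""
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav]


def burn_in_cmd(video: str, ass: str, out: str) -> list[str]:
    """ffmpeg arguments for step 4: render the ASS file onto every frame."""
    # The subtitles filter draws the file through libass; the audio stream is copied as-is.
    # Assumes a simple file name: paths with ':' or quotes need filter-level escaping.
    return ["ffmpeg", "-y", "-i", video, "-vf", f"subtitles={ass}", "-c:a", "copy", out]


print(shlex.join(extract_audio_cmd("clip.mp4", "audio.wav")))
print(shlex.join(burn_in_cmd("clip.mp4", "captions.ass", "captioned.mp4")))
```

To run them, pass either list to `subprocess.run(cmd, check=True)` on a machine where ffmpeg was built with libass.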

How much does it cost?

50 tokens per second of video, with a 2,000-token minimum. A 30-second clip costs 2,000 tokens (the minimum applies), a 60-second clip 3,000, and a 3-minute clip 9,000. STT drives most of the cost; the burn-in adds about 25% on top.
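The pricing rule is simple enough to check by hand. A small sketch, assuming fractional seconds are rounded up (the page does not say how partial seconds are billed) and using a hypothetical function name:

```python
import math

TOKENS_PER_SECOND = 50
MINIMUM_TOKENS = 2_000


def estimate_tokens(duration_seconds: float) -> int:
    """Estimate the token cost of captioning a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # Assumption: partial seconds round up to the next whole token.
    metered = math.ceil(duration_seconds * TOKENS_PER_SECOND)
    return max(MINIMUM_TOKENS, metered)


for secs in (30, 60, 180):
    print(secs, estimate_tokens(secs))
```

Under this rule the minimum applies to any clip of 40 seconds or less, since 40 × 50 = 2,000.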

Which style presets are available?

Seven: TikTok Neon (yellow Montserrat, word-by-word highlight), YouTube Lowerthird (white Roboto in a translucent box), Meme (white Impact with a black outline), Podcast (Poppins in a dark rounded box), Keynote (Arial Black at the top), Cinematic (italic Oswald at the bottom), and TED (left-aligned Roboto).

Can I customize the font, highlight color, and position?

Yes. The Font dropdown overrides the preset with Impact, Montserrat, Bebas Neue, Arial Black, Oswald, Poppins, Anton, or Roboto. The Highlight Color picker controls the word-by-word highlight (any hex). Position lets you override top / center / bottom regardless of preset.
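One detail matters if you later edit the downloaded .ass file by hand: ASS override tags write colors in blue-green-red order, not the red-green-blue order of a web hex value. A minimal conversion sketch, assuming a `#RRGGBB` input; the function name is hypothetical and this is not necessarily how the tool converts the picker value.

```python
import string


def hex_to_ass_color(hex_color: str) -> str:
    """Convert a web color like '#RRGGBB' to the ASS override form '&HBBGGRR&'."""
    value = hex_color.lstrip("#")
    if len(value) != 6 or any(c not in string.hexdigits for c in value):
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    red, green, blue = value[0:2], value[2:4], value[4:6]
    # ASS stores the channels in reverse order: blue, green, red.
    return f"&H{blue}{green}{red}&".upper()


print(hex_to_ass_color("#FFD400"))
```

A yellow such as `#FFD400` therefore appears in the file as `&H00D4FF&`, which is easy to misread as a blue if you expect RGB order.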

Does it support languages other than English?

Yes, 99 languages via Whisper. Auto-detect works on 99% of clips. You can force a language if Whisper mis-detects, which is common on clips under 5 seconds or with mixed-language audio.

Can it caption a video with no speech?

No. If there is no spoken dialogue, Whisper returns no segments and we surface a clear "No speech detected" error so you do not burn tokens on an impossible job.

How does it compare to CapCut?

CapCut is free, works offline after install, and has great auto-captions. If CapCut is already your editor, use it there. AI Caption Generator skips the install and gives you batch-friendly browser access. The underlying Whisper + libass chain is the same primitive.

How does it compare to Submagic, Opus Clip, and Captions.AI?

Submagic is $20/mo for unlimited captioning with dozens of style packs. Opus Clip is $30/mo with viral long-form-to-short AI clipping bundled. Captions.AI is $10/mo on mobile. All three use Whisper underneath; their real value is the style library and clip detection. For one-offs and smaller volumes, AI Caption Generator is free inside your token pool.

How accurate is the word-by-word timing?

Whisper gives segment-level timestamps, so we evenly distribute each segment's duration across its words to derive per-word timing. On fast-spoken segments the estimate can drift by ~0.1 seconds. For frame-accurate timing, download the .ass file and edit it in Aegisub.
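The even split described above can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's implementation: the function name is hypothetical, words are split on whitespace, and durations are expressed in centiseconds because that is the unit `\kf` tags use.

```python
def even_word_durations(start: float, end: float, text: str) -> list[tuple[str, int]]:
    """Split a segment's duration equally across its words, in centiseconds."""
    words = text.split()
    if not words:
        return []
    total_cs = round((end - start) * 100)
    base, extra = divmod(total_cs, len(words))
    # Hand the leftover centiseconds to the first words so the durations
    # add up to exactly the segment length.
    return [(word, base + (1 if i < extra else 0)) for i, word in enumerate(words)]


print(even_word_durations(0.0, 1.0, "one two three"))
```

Because every word gets the same share regardless of its length, a short word next to a long one is highlighted slightly too long, which is one source of the drift on fast speech.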

Can I download and edit the caption file?

Yes. After export, both the captioned MP4 and the raw .ass file are downloadable. Edit the .ass in Aegisub if you want pixel-perfect word timing, then re-burn locally with ffmpeg -vf subtitles=file.ass.

Is there an API?

Yes. POST multipart to /v1/video/caption/ with `file`, `style` (one of tiktok-neon / youtube-lower / meme / podcast / keynote / cinematic / ted), and optional `font`, `highlight_color` (hex), `position`, and `language`. For a pre-flight cost quote, GET /v1/video/caption-quote/?duration=SECS. Snippets are at /api/.
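A client might validate its inputs before spending a request. The sketch below builds the quote URL and the non-file form fields; it assumes the API lives on https://api.free.ai (the host used in the curl example), and the helper names are hypothetical. It does not send anything.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.free.ai"  # assumption: same host as the curl example
STYLES = {"tiktok-neon", "youtube-lower", "meme", "podcast",
          "keynote", "cinematic", "ted"}


def quote_url(duration_secs: int) -> str:
    """URL for the pre-flight cost quote."""
    return f"{API_BASE}/v1/video/caption-quote/?{urlencode({'duration': duration_secs})}"


def caption_fields(style: str,
                   font: Optional[str] = None,
                   highlight_color: Optional[str] = None,
                   position: Optional[str] = None,
                   language: Optional[str] = None) -> dict[str, str]:
    """Form fields to send alongside the `file` part; unset options are omitted."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    optional = {"font": font, "highlight_color": highlight_color,
                "position": position, "language": language}
    fields = {"style": style}
    fields.update({key: value for key, value in optional.items() if value is not None})
    return fields


print(quote_url(45))
print(caption_fields("tiktok-neon", highlight_color="#FFD400"))
```

The resulting fields can be posted as multipart form data with any HTTP client, together with the video under the `file` key and the Bearer token header.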
