AI Lip Sync

Commercial use OK · 380+ models · No watermark · No sign-up needed
Upload a talking-head video and either an audio track or a script — we'll re-render the mouth frame-by-frame to match the new audio. Powered by Sync Lipsync v2. Ideal for redubbing, ADR, voice-over replacement, or making a silent clip talk.

Where AI lip-sync earns its keep

ADR / redub

Re-record a line in the booth, drop it in, and the mouth re-renders to match. No more reshoots over a mispronounced word.

Voice-over swap

Shoot with any actor, dub with your preferred voice-over artist (or a TTS voice) — the lips follow, not lead.

Talking avatars

Give a silent portrait or AI-generated character a voice. Chain with /image-to-video/ to animate a still portrait first, then make it speak.

How AI lip sync works

Step 1

Upload video

Clear forward-facing face works best. Multi-speaker, profile view, or rapid head turns reduce quality.

Step 2

Provide audio

Upload MP3 / WAV / M4A OR type a script and we'll TTS it with Kokoro (174 voices across 37 languages).

Step 3

Length-check

We warn if video and audio differ by more than 0.5 s. Auto-trim to the shorter length is checked by default.

Step 4

Render

Sync Lipsync v2 re-renders every mouth frame to phonetically match the new audio. A typical 30-second clip takes ~1–2 minutes.
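Step 3's length check is simple to reason about. Here is a minimal sketch of the warn-and-trim logic (a hypothetical helper for illustration, not the service's actual code):

```python
def length_check(video_s: float, audio_s: float, auto_trim: bool = True):
    """Warn when video and audio differ by more than 0.5 s; optionally trim to the shorter."""
    mismatch = abs(video_s - audio_s) > 0.5
    target_s = min(video_s, audio_s) if auto_trim else None
    return mismatch, target_s

# A 31 s audio track against a 30 s video triggers the warning and trims to 30 s.
print(length_check(30.0, 31.0))  # (True, 30.0)
```

With auto-trim unchecked, the helper returns no target length, matching the behavior where only the overlapping window is rendered.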

Tips for the best lip-sync output

  • Single forward-facing speaker. Multi-speaker shots confuse the face detector.
  • Well-lit face. Heavy shadows on half the face hurt mouth tracking.
  • Audio at -6 dB to -3 dB peak. Clipped or whisper-quiet audio syncs worse.
  • 30-second chunks render fastest. For 10+ minute videos, split into scenes.
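The audio-level and chunking tips can be pre-checked locally before uploading. A rough sketch, assuming audio samples normalized to [-1, 1] (both helpers are hypothetical, for illustration only):

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples normalized to [-1, 1]."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def chunk_spans(total_s, chunk_s=30.0):
    """Split a long video into ~30 s (start, end) spans for faster per-scene renders."""
    spans, t = [], 0.0
    while t < total_s:
        spans.append((t, min(t + chunk_s, total_s)))
        t += chunk_s
    return spans

# A 0.5 amplitude peak sits at ~-6 dBFS, inside the recommended -6 to -3 dB window.
print(round(peak_dbfs([0.0, 0.5, -0.3]), 1))  # -6.0
print(chunk_spans(75.0))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```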

❤️ Love Free.ai? Tell your friends!

Sign up to get a referral link and earn 25,000 tokens per friend.

Want more? Sign up free for 5K tokens/day + 10K bonus
Sign Up Free




Use this tool via API

Automate this tool from your own code. OpenAI-compatible REST endpoint, Bearer-token auth, no extra SDK required. Token costs match the web interface.

curl -X POST https://api.free.ai/v1/video/lip-sync/ \
  -H "Authorization: Bearer sk-free-..." \
  -F "video=@talking_head.mp4" \
  -F "audio_file=@voiceover.wav"
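The same upload can be made from Python with only the standard library. This is a sketch that assumes the multipart endpoint and field names ("video", "audio_file") described in the FAQ; it builds the request without sending it, and leaves out response handling:

```python
import urllib.request
import uuid

def multipart_body(fields, files):
    """Hand-rolled multipart/form-data encoder (stdlib only)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode()]
    for name, (filename, data) in files.items():
        parts += [f"--{boundary}".encode(),
                  (f'Content-Disposition: form-data; name="{name}"; '
                   f'filename="{filename}"').encode(),
                  b"Content-Type: application/octet-stream", b"", data]
    parts.append(f"--{boundary}--".encode())
    return b"\r\n".join(parts) + b"\r\n", f"multipart/form-data; boundary={boundary}"

def lip_sync_request(api_key, video_bytes, audio_bytes):
    """Build (but don't send) the POST to the assumed /v1/video/lip-sync/ endpoint."""
    body, content_type = multipart_body(
        {}, {"video": ("input.mp4", video_bytes), "audio_file": ("track.wav", audio_bytes)})
    return urllib.request.Request(
        "https://api.free.ai/v1/video/lip-sync/",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST")
```

To actually send it: urllib.request.urlopen(lip_sync_request(key, video_bytes, audio_bytes)).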

AI Lip Sync — FAQ

What does AI lip sync do?

Upload a talking-head video plus an audio track (or type a script for TTS) and the AI re-renders the mouth frame by frame to phonetically match the new audio. Ideal for ADR, voice-over replacement, redubbing, or giving a silent portrait a voice.

How does the model work?

Sync Lipsync v2 is a state-of-the-art mouth-rendering model. It analyses each phoneme in the audio, detects the face in each frame, and regenerates the lip region to match. The rest of the face, background, and body are untouched.

How much does it cost?

Lip-sync uses paid tokens (~10,000 minimum, scales with duration). Sign-up bonus tokens can be used once you're signed in.

Which video formats are supported?

MP4, MOV, WebM up to 100MB. Clips under 30 seconds work fastest. A single forward-facing speaker gives the cleanest lip-sync; multi-speaker shots or rapid head turns reduce quality.

Which audio formats are supported?

MP3, WAV, M4A up to 50MB. Alternatively, type a script and pick from Kokoro's 174 voices across 37 languages — we'll TTS it and use that as the driving audio.

What happens if the durations don't match?

We warn you when durations differ by more than 0.5 seconds. The "auto-trim to shorter" toggle (on by default) cuts the longer of the two; otherwise the output covers only the overlapping window.

What footage works best?

Best results: one clear forward-facing face, well-lit, mostly steady camera. Poor results: profile view, occluded face (sunglasses, masks), multiple competing faces, extreme close-ups with partial mouth in frame.

How is lip-sync different from dubbing?

Dubbing (/video/dubbing/) is a full pipeline: STT → translate → TTS → lip-sync. Lip-sync is just the last step — you provide the audio yourself. Use lip-sync when you already have the voice-over track ready; use dubbing when you want to translate and re-voice from scratch.

How long does a render take?

Typically, a 30-second clip renders in 1–2 minutes. The banner shows a wait estimate once you submit, and the result lands in your dashboard — you can close the tab.

Can it handle multiple speakers?

Not in one pass — the model locks onto one face. For multi-speaker scenes, cut into single-speaker clips, lip-sync each, then stitch back together in a video editor.

Do you keep my files?

No. Input files are deleted within minutes of the render. The output is kept on our CDN for 24 hours (7 days for paid users) at the share link.

Is there an API?

Yes — POST a multipart video + audio_file (or video + text + voice) to /v1/video/lip-sync/. See /api/ for docs.

Sign up free for 10,000 tokens

Create Free Account

No credit card required

