AI Video Translator

Repost your YouTube videos in Spanish, Hindi, or French, with your face actually saying the words. Upload a video, pick a target language and voice, and we transcribe, translate, re-voice, and mix the new audio back into the original clip. Toggle lip-sync to also re-render the mouth.

MP4, MOV, WebM up to 100MB · clips under 5 minutes work fastest

~1,500 tokens per minute of video (audio path) · ~3,000/min with lip-sync.

Where the Video Translator pays off

Repost YouTube globally

One English upload becomes Spanish, Hindi, French, Portuguese, and Indonesian re-uploads, each voiced in its target language. Five new audiences from one production.

Localize courses

Online educators dub a 50-video course into 10 languages overnight. The lip-sync option makes the talking-head segments feel native, not subtitled.

Localize ads

Run the same product testimonial across 10 country-specific ad accounts. Same actor, same shot, native audio per market — without a reshoot.

How the Video Translator works

Step 1

Extract audio

Server-side ffmpeg pulls a clean MP3 track from your upload. Fast — runs on the API VPS.

Step 2

Transcribe

Whisper turns the audio into text with auto language detection. 99-language support out of the box.

Step 3

Translate + re-voice

MadLAD translates into the target language; Kokoro speaks it back in the voice you chose from 174 options.

Step 4

Mix back + (optional) lip-sync

ffmpeg replaces the original audio with the new track. If lip-sync is on, Sync Lipsync v2 re-renders the mouth to match.
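
The two ffmpeg stages (Step 1 and Step 4) can be sketched as plain command builders. This is a minimal illustration, not the service's actual invocation: the function names, file paths, MP3 quality setting, and AAC re-encode are assumptions. Only the ffmpeg flags themselves (`-vn`, `-map`, `-c:v copy`, `-codec:a`, `-q:a`) are standard.

```python
from typing import List


def extract_audio_cmd(video_path: str, mp3_path: str) -> List[str]:
    """Build an ffmpeg command that writes a video's audio track as MP3."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vn",                      # ignore the video stream
        "-codec:a", "libmp3lame",   # encode the audio as MP3
        "-q:a", "2",                # VBR quality level (assumed value)
        mp3_path,
    ]


def replace_audio_cmd(video_path: str, audio_path: str, out_path: str) -> List[str]:
    """Build an ffmpeg command that swaps a video's audio for a new track."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,           # input 0: original video
        "-i", audio_path,           # input 1: translated speech
        "-map", "0:v:0",            # keep video from input 0
        "-map", "1:a:0",            # take audio from input 1 only
        "-c:v", "copy",             # do not re-encode the picture
        "-c:a", "aac",              # encode the new audio (assumed codec)
        out_path,
    ]
```

To run either command locally, pass the list to `subprocess.run(cmd, check=True)` with ffmpeg on your PATH. The sketch does not handle the case where the translated speech is longer or shorter than the original clip.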

Tips for the cleanest translated video

  • Single forward-facing speaker. Multi-speaker scenes confuse both the transcription and the lip-sync pass.
  • Clear audio at -6 dB. Background music or noise reduces transcription accuracy.
  • Clips under 5 minutes render in 2-5 minutes (audio path) or 5-15 minutes (with lip-sync).
  • Spanish, French, German, Portuguese, Italian have the most natural Kokoro voice options. Hindi, Arabic, Japanese, Korean, Chinese also work.
Repost your YouTube videos in Spanish, Hindi, or French, voiced in the target language. Whisper transcribes, MadLAD translates, Kokoro re-voices, and ffmpeg mixes the audio back. Optionally, Sync Lipsync v2 re-renders the mouth.

How to use AI Video Translator

1
Fill in your details

Type text, upload a file, or describe what you want. No account required.

2
Click Generate

Our AI processes your request within seconds using top open-source models.

3
Download and share

Download, copy, or share your results. Free for personal and commercial use.

Use this tool via API

Automate this tool from your own code. OpenAI-compatible REST endpoint, Bearer-token auth, no extra SDK required. Token costs match the web interface.

curl -X POST https://api.free.ai/v1/video/generate/ \
  -H "Authorization: Bearer sk-free-..." \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat playing piano", "duration": 4}'

AI Video Translator — FAQ

Upload any video (a YouTube clip, a course lecture, a product testimonial) and we translate it into the target language, re-voicing the speech with the voice you pick. Optionally, lip-sync re-renders the mouth so it matches the new audio. The pitch: repost your YouTube videos in Spanish, Hindi, or French, with your face actually saying the words.

The frontend orchestrates five steps in sequence, each a server-side API call: (1) ffmpeg extracts a clean MP3 from the video, (2) Whisper transcribes it (auto-detects 99 languages), (3) MadLAD translates the text, (4) Kokoro speaks the translation in the voice you picked, (5) ffmpeg mixes the new audio back into the original video, replacing the original track. With lip-sync on, a sixth step routes the result through Sync Lipsync v2 to re-render the mouth.

Video Dubbing always runs the full lip-sync pipeline — it's the premium, identity-preserving option. Video Translator makes lip-sync opt-in: skip it for a 3× faster, ~50% cheaper run that's perfect for voice-over-style content where mouth-perfect sync isn't critical (podcasts, screen recordings, cooking videos, anything where the speaker isn't in close-up the whole time).

The audio-only path runs almost entirely on self-hosted models (ffmpeg + Whisper + MadLAD + Kokoro), so it fits inside your daily token allowance for short clips. Lip-sync uses the premium Sync Lipsync v2 model and requires purchased tokens. Estimate: ~1,500 tokens per minute of video for the audio path, ~3,000/min with lip-sync.
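
As a rough budgeting aid, the per-minute estimates above can be turned into a small calculator. This is a sketch under assumptions: the function name is hypothetical, and pro-rating by the second and rounding up are guesses about billing rather than documented behavior. Only the 1,500 and 3,000 tokens-per-minute figures come from this page.

```python
import math

AUDIO_TOKENS_PER_MIN = 1500     # audio-only path, from the estimate above
LIPSYNC_TOKENS_PER_MIN = 3000   # with lip-sync, from the estimate above


def estimate_tokens(duration_seconds: float, lip_sync: bool = False) -> int:
    """Approximate token cost for a clip (assumes per-second pro-rating)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    rate = LIPSYNC_TOKENS_PER_MIN if lip_sync else AUDIO_TOKENS_PER_MIN
    # Round up so the estimate errs on the high side.
    return math.ceil(duration_seconds * rate / 60)


print(estimate_tokens(90))                   # 2250
print(estimate_tokens(300, lip_sync=True))   # 15000
```

Treat the result as a ballpark figure; the actual charge shown in the web interface is authoritative.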

The dropdowns expose 20 high-demand languages (Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Arabic, Hebrew, Hindi, Chinese, Japanese, Korean, Vietnamese, Indonesian, Thai, Swedish, English). MadLAD technically supports 450+; ping us if you need others.

Yes — the voice picker pulls all 174 Kokoro voices across 37 languages. Each voice is tagged with a sample so you can audition before committing. Match the voice gender to the speaker for the most natural result.

No — Kokoro picks from 174 stock voices, not a clone of the original speaker. For identity-preserving voice cloning, run our /voice/clone/ tool first to capture the speaker's voice, then use a custom workflow. Voice cloning + auto translation in one pipeline is on the roadmap.

Clips up to 100MB process well. Clips under 5 minutes render in 2-5 minutes for the audio path, or 5-15 minutes with lip-sync. For longer videos, split into scenes, translate each, and recombine in /video/editor/.
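
For planning the split, a simple helper can compute cut points that keep every piece at or under five minutes. Note the simplification: it cuts at fixed intervals rather than at real scene boundaries, so in practice you would nudge each cut to a pause in speech. The function name and the 300-second default are illustrative assumptions.

```python
from typing import List, Tuple


def split_into_segments(
    total_seconds: float, max_segment_seconds: float = 300.0
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs covering the clip in fixed-length pieces."""
    total = float(total_seconds)
    if total <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    segments = []
    start = 0.0
    while start < total:
        end = min(start + max_segment_seconds, total)
        segments.append((start, end))
        start = end
    return segments


# A 12-minute video becomes two 5-minute pieces and one 2-minute piece.
print(split_into_segments(720))
# [(0.0, 300.0), (300.0, 600.0), (600.0, 720.0)]
```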

MP4, MOV, WebM, MKV up to 100MB. Single forward-facing speaker gives the cleanest transcript and (if you enable it) the best lip-sync. Background music or multiple speakers reduce transcription accuracy.
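
If you are scripting uploads, a pre-flight check along these lines can catch rejected files early. It is a sketch only: the function name is hypothetical, it trusts the file extension rather than inspecting the container, and it assumes "100MB" means 100 × 1024 × 1024 bytes.

```python
import os
from typing import List

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv"}  # formats listed above
MAX_BYTES = 100 * 1024 * 1024  # "100MB", interpreted as MiB (assumption)


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 100MB")
    return problems


print(check_upload("talk.MP4", 50 * 1024 * 1024))   # []
```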

Not by this tool — it replaces the audio, not the visuals. To burn translated subtitles into the video, run our /video/caption/ tool on the result. To export an SRT/VTT subtitle file, use /transcribe/ on the original video.

No. Uploads are deleted within minutes of the render. The output sits on our CDN for 24 hours (7 days for paid users) at the share link, then is removed.

Not as a single endpoint — chain the existing live ones: POST /v1/video/to-audio/ → /v1/stt/ → /v1/translate/ → /v1/tts/ → /v1/video/add-audio/, then optionally /v1/video/lip-sync/. Full curl recipes for each at /api/. The frontend orchestrates this exact chain.
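
The call order described above can be captured in a few lines. This sketch only builds the ordered URL list and the auth headers. It assumes the endpoints live on the same host as the curl example on this page, and it deliberately leaves out request bodies, since their fields are not documented here (see /api/ for those). The helper names are hypothetical.

```python
from typing import Dict, List

API_BASE = "https://api.free.ai"  # host from the curl example (assumed shared)

PIPELINE_PATHS = [
    "/v1/video/to-audio/",   # extract audio
    "/v1/stt/",              # transcribe
    "/v1/translate/",        # translate the transcript
    "/v1/tts/",              # speak the translation
    "/v1/video/add-audio/",  # mix the new track into the video
]
LIP_SYNC_PATH = "/v1/video/lip-sync/"  # optional final step


def pipeline_urls(lip_sync: bool = False) -> List[str]:
    """Return the endpoint URLs in the order they should be called."""
    paths = list(PIPELINE_PATHS)
    if lip_sync:
        paths.append(LIP_SYNC_PATH)
    return [API_BASE + path for path in paths]


def auth_headers(api_key: str) -> Dict[str, str]:
    """Bearer-token headers, matching the curl example on this page."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Each step's output feeds the next step's input, so a real client would loop over `pipeline_urls()` and POST with `auth_headers(...)`, passing along whatever identifier or file URL the previous response returned.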
