AI Lip Sync
AI lip sync aligns a video's mouth movements to a separate audio track — a workflow that used to require professional ADR. Oakgen's lip sync handles dubs into new languages, voice replacements, and animations of still portraits, producing natural mouth motion that matches phonemes instead of just shapes.
Key fact
Oakgen's lip sync matches phonemes, not just open/closed mouth shapes — English-to-Japanese dubs look native, not pasted over.
Why AI Lip Sync
Video or still image
Works on video clips up to 30 seconds, or animates a single still portrait to match an audio track.
Language-agnostic
Original English video, Japanese dub — the lips match the new audio, not the old.
60–90 second processing
A 10-second clip typically renders in about 60 seconds. Longer clips scale linearly.
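Given the figures above (about 60 seconds to render a 10-second clip, scaling linearly), a rough estimate is easy to compute. A minimal sketch; the 6x multiplier is inferred from that example, not a documented constant:

```python
def estimate_render_seconds(clip_seconds: float) -> float:
    """Rough render-time estimate: ~60 s for a 10 s clip, scaling linearly.

    The 6x factor is derived from the published example; actual times
    vary (60-90 s is typical for short clips).
    """
    return clip_seconds * 6.0
```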
How it works
1. Upload the video or photo — a video up to 30 seconds, or a single frontal portrait. The face should be clearly visible.
2. Upload the audio track — MP3, WAV, or M4A. The audio can be longer or shorter than the video; the output matches the audio length.
3. Generate — preview and download the lip-synced output as MP4 at the original resolution.
Frequently asked questions
Can I use lip sync for dubbing foreign films?
Yes — this is one of the most common use cases. Upload the original footage, upload the new-language audio, and Oakgen re-renders the mouth region to match.
Does the rest of the face change?
No. Only the mouth region (and minor jaw motion) is re-rendered. Eye contact, expressions, and head motion are preserved from the source video.
How much does lip sync cost?
About 150 credits (~$0.60) per 10-second clip. A 30-second clip costs ~450 credits (~$1.80).
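Since pricing is linear in clip length, cost is simple to estimate up front. A minimal sketch; the 15-credits-per-second rate and $0.004-per-credit conversion are derived from the figures quoted here (150 credits, about $0.60, per 10-second clip), not an official rate card:

```python
CREDITS_PER_SECOND = 15      # 150 credits per 10-second clip
DOLLARS_PER_CREDIT = 0.004   # ~$0.60 for 150 credits

def estimate_cost(clip_seconds: float) -> tuple[int, float]:
    """Return (credits, dollars) for a clip, assuming linear pricing."""
    credits = round(clip_seconds * CREDITS_PER_SECOND)
    return credits, round(credits * DOLLARS_PER_CREDIT, 2)
```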