Video Agent is Oakgen's autonomous video generator. You give it a single prompt or a short brief, and it handles the rest — writing the script, breaking it into shots, picking the right video and voice models for each shot, generating voiceover and music, and assembling a finished cut. One input, one output. It is built for creators who want a watchable video out the door in minutes, not a multi-stage project to manage. If you want hands-on control over every shot, every prompt, and every revision, you want Cinema Studio instead. Video Agent is the opposite posture: describe the video, walk away, come back to something you can publish or hand to an editor.
What is Video Agent?
Video Agent is a planning-and-execution layer sitting on top of Oakgen's existing video, voice, and music models. Where the standard AI Video Generator gives you one model producing one clip from one prompt, Video Agent takes a higher-level brief and plans an entire multi-shot sequence around it. It decides how many shots the video needs, what each shot should depict, which model should generate each shot, who should narrate, and what the music bed should feel like. Then it runs the plan end-to-end.
The output is a multi-scene talking video — typically 30 to 120 seconds — with a continuous voiceover, a music track, and visual shots cut to the pacing of the narration. You can regenerate individual pieces after the fact, but the first pass is designed to be good enough to publish as-is for social, marketing, or explainer use cases.
Video Agent is currently in beta on Oakgen. Expect the feature set to expand — particularly around shot editing, avatar options, and fine-grained control over music and voice selection. The core "one prompt to finished video" loop is production-ready for short-form content.
What Video Agent does autonomously
The whole pitch of Video Agent is that it handles the steps a human would otherwise do manually. Here is what the agent actually performs between your prompt and the finished MP4.
1. Script writing from a brief
Video Agent starts by turning your brief into a voiceover script. If you give it a one-line prompt like "60-second explainer on how compound interest works, for beginners", it will expand that into a full script with an intro hook, 3 to 5 main beats, and a call-to-action close. If you give it a longer brief with bullet points, it respects your structure and only fills in the connective tissue. The script is written to be spoken, not read — short sentences, natural phrasing, timed to your target length.
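"Timed to your target length" comes down to a word budget. The sketch below assumes a narration pace of about 150 words per minute. That is a common rule of thumb, not a figure Oakgen publishes. It also uses a hypothetical 15% hook / 15% CTA split to show how a 60-second brief turns into a hook, 3 to 5 main beats, and a close.

```python
# Assumption: narration at roughly 150 words per minute. Oakgen does not publish
# the pace its script writer targets; adjust WORDS_PER_MINUTE to taste.
WORDS_PER_MINUTE = 150


def word_budget(target_seconds: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate number of spoken words that fit in target_seconds."""
    return round(target_seconds * wpm / 60)


def beat_budgets(target_seconds: float, main_beats: int = 4) -> dict:
    """Split the word budget into an intro hook, 3-5 main beats and a CTA close.

    The 15% hook / 15% CTA split is a hypothetical rule of thumb, not Oakgen's.
    """
    if not 3 <= main_beats <= 5:
        raise ValueError("main_beats should be between 3 and 5")
    total = word_budget(target_seconds)
    hook = total * 15 // 100
    cta = total * 15 // 100
    per_beat = (total - hook - cta) // main_beats  # leftover words are dropped
    return {"total": total, "hook": hook, "beats": [per_beat] * main_beats, "cta": cta}


print(word_budget(60))
print(beat_budgets(60))
```

If a brief asks for more points than the budget allows, the beats get very short. Trimming key points usually works better than asking for a longer video.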
2. Shot list generation
Once the script exists, the agent breaks it into shots. Each sentence or clause is matched to a visual shot description — what should be on screen while that line is spoken. A 60-second video typically produces 8 to 14 shots. The shot list includes the visual description, the approximate duration, and any continuity notes (e.g., "same character as shot 3, different angle").
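The shot list described above maps naturally onto a small record type. This is a hypothetical schema, not Oakgen's internal format. It carries the three things the agent is said to track: visual description, approximate duration, and continuity notes. It also includes a check that the shots add up to the target length. At 8 to 14 shots per minute, the average shot runs roughly 4.3 to 7.5 seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    """One entry in a shot list (hypothetical schema, not Oakgen's internal format)."""
    index: int
    narration: str                    # the sentence or clause spoken over this shot
    visual: str                       # what should be on screen while it is spoken
    duration_s: float                 # approximate duration in seconds
    continuity: Optional[str] = None  # e.g. "same character as shot 3, different angle"


def total_duration(shots: List[Shot]) -> float:
    return sum(s.duration_s for s in shots)


def fits_target(shots: List[Shot], target_s: float, tolerance: float = 0.1) -> bool:
    """True if the summed shot durations land within +/- tolerance of the target length."""
    return abs(total_duration(shots) - target_s) <= target_s * tolerance


shots = [
    Shot(1, "Ever heard of interest on your interest?", "presenter at a desk, direct to camera", 5.0),
    Shot(2, "That's compound interest.", "animated graph curving upward", 4.5),
    Shot(3, "Here's why time beats amount.", "presenter at a desk, wider angle", 6.0,
         continuity="same presenter as shot 1, different angle"),
]
print(total_duration(shots), fits_target(shots, 15))
```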
3. Model selection per shot
This is the step that would take you the longest to do by hand. For each shot, Video Agent picks the best-suited model from Oakgen's catalog — Veo 3 for cinematic motion and native audio, faster models for simple b-roll, talking-avatar models for direct-to-camera narration shots. The agent also chooses aspect ratio, duration, and style parameters per shot rather than applying one global setting.
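Oakgen does not document how the agent actually chooses, and it likely weighs more than a fixed rule. Still, the idea of per-shot settings can be sketched as a lookup from shot type to model. The shot kinds mirror the three cases named above. The model identifiers are hypothetical placeholders, not Oakgen catalog IDs.

```python
from enum import Enum


class ShotKind(Enum):
    CINEMATIC = "cinematic"  # motion-heavy shots that benefit from native audio
    BROLL = "broll"          # simple cutaways
    PRESENTER = "presenter"  # direct-to-camera narration


# Hypothetical model identifiers; Oakgen's real catalog IDs may differ.
MODEL_FOR_KIND = {
    ShotKind.CINEMATIC: "veo-3",
    ShotKind.BROLL: "fast-broll-model",
    ShotKind.PRESENTER: "talking-avatar-model",
}


def plan_shot(kind: ShotKind, duration_s: float, aspect: str = "16:9", style: str = "natural") -> dict:
    """Settings chosen per shot, rather than one global setting for the whole video."""
    return {
        "model": MODEL_FOR_KIND[kind],
        "duration_s": duration_s,
        "aspect": aspect,
        "style": style,
    }


plan = [
    plan_shot(ShotKind.PRESENTER, 5.0, aspect="9:16"),
    plan_shot(ShotKind.BROLL, 3.5, aspect="9:16", style="warm"),
    plan_shot(ShotKind.CINEMATIC, 6.0, aspect="9:16", style="cinematic"),
]
for p in plan:
    print(p)
```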
4. Voiceover generation
Video Agent generates the voiceover as a continuous track using Oakgen's TTS pipeline. You can pick a preset voice or — if you have one saved — your cloned voice from the voice library. Pacing is matched to the shot list so each line lands on the right visual.
5. Music selection
A music bed is generated or selected to match the tone of the script. Upbeat briefs get upbeat music, somber briefs get restrained music. The track is mixed under the voiceover at broadcast-friendly levels — the agent handles the ducking automatically so narration stays intelligible.
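Ducking means lowering the music whenever the narrator is speaking. The agent's exact levels are not published. The sketch below assumes a -12 dB duck and a simple one-pole smoother, and it works on per-block voice levels (for example, RMS over 10 ms blocks). It is a textbook ducker, not Oakgen's mixer.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def duck_envelope(voice_rms, threshold=0.02, duck_db=-12.0, smoothing=0.3):
    """Return one music-gain value per audio block (e.g. per 10 ms).

    voice_rms: RMS level of the voiceover in each block.
    While the voice is above `threshold`, the music target drops to `duck_db`
    (an assumed value); otherwise it returns to unity gain. A one-pole smoother
    moves the gain a fraction of the way toward the target each block, so the
    music fades rather than jumping.
    """
    ducked = db_to_gain(duck_db)
    gain = 1.0
    envelope = []
    for level in voice_rms:
        target = ducked if level > threshold else 1.0
        gain += smoothing * (target - gain)
        envelope.append(gain)
    return envelope


# 5 silent blocks, 50 blocks of narration, then 50 silent blocks.
voice = [0.0] * 5 + [0.1] * 50 + [0.0] * 50
env = duck_envelope(voice)
print(round(min(env), 3))
```

To mix, multiply each block of the music bed by its envelope value, then add the voiceover.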
6. Final assembly
The last step is the edit. Video Agent stitches the generated shots in script order, aligns them to the voiceover timeline, trims any shot that runs long, and delivers a single MP4. No timeline, no manual cuts. You get a finished video you can either publish directly or drop into a proper NLE for polish.
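The alignment and trimming described above can be sketched as a cut list. Each shot starts where its narration line starts on the voiceover track and is trimmed to that line's length. How Video Agent handles a clip that comes out shorter than its line is not documented, so this sketch only flags the shortfall. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cut:
    shot: int
    timeline_start: float  # where the shot starts on the voiceover timeline (s)
    source_out: float      # how much of the generated clip is used (s)
    short_by: float        # > 0 if the clip is shorter than its narration slot (s)


def build_cut_list(line_timings: List[Tuple[float, float]], clip_durations: List[float]) -> List[Cut]:
    """Place each shot under its narration line, trimming clips that run long.

    line_timings: (start, end) in seconds of each line on the voiceover track, in script order.
    clip_durations: length in seconds of the generated clip for the same shot.
    """
    if len(line_timings) != len(clip_durations):
        raise ValueError("expected exactly one clip per narration line")
    cuts = []
    for i, ((start, end), clip_len) in enumerate(zip(line_timings, clip_durations), start=1):
        slot = end - start
        used = min(clip_len, slot)  # trim any shot that runs long
        cuts.append(Cut(i, start, used, max(0.0, slot - clip_len)))
    return cuts


cuts = build_cut_list([(0.0, 4.0), (4.0, 9.0), (9.0, 12.0)], [5.0, 5.0, 2.0])
for c in cuts:
    print(c)
```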
Video Agent vs Cinema Studio
Oakgen now has two fundamentally different ways to make a multi-shot video. They are not competing products — they solve different problems. Here is how to choose.
| Dimension | Video Agent | Cinema Studio |
|---|---|---|
| Input posture | Single prompt or short brief | Scene-by-scene direction |
| Time to first cut | 5–15 minutes | 30–90 minutes |
| Control level | Low (the agent decides) | High (you decide) |
| Script handling | Auto-written from brief | You write it or refine it |
| Shot list | Auto-generated | Manual or guided |
| Model selection | Automatic per shot | Manual per shot |
| Voiceover | Generated end-to-end | Per-scene, editable |
| Best for | Explainers, promos, socials | Narrative, branded films, complex edits |
| Revision loop | Re-roll individual shots; regenerate the whole video to change the script | Regenerate individual scenes |
| Skill required | None beyond a brief | Basic video literacy helps |
The short version: use Video Agent when the idea matters more than the execution. Use Cinema Studio when the execution is the idea. If you want a deeper walkthrough of the hands-on option, the Cinema Studio guide covers that workflow end-to-end.
Three walk-through examples
To show what Video Agent actually produces, here are three representative briefs and what the agent does with each.
Example 1 — 30-second product promo from a text brief
Brief: "30-second promo for a new noise-cancelling travel pillow called 'Cloud9'. Target audience is frequent business travellers. Tone: calm, premium, confident. End with the product name and website."
What Video Agent does: writes a 6-line script that opens on the traveller's problem (crowded flights, bad sleep), introduces Cloud9 mid-clip, and closes with the brand and URL. Breaks it into 7 shots: airport hallway, cramped airplane cabin, product close-up, traveller putting it on, traveller asleep, wide shot of the plane landing, logo card. Picks a warm-but-professional voice. Generates understated piano-and-pad music.
Approximate output quality: publishable to paid social directly. Some shots may benefit from a re-roll if a specific detail matters (e.g., exact product colour).
Credit cost: ~3,500–5,000 credits depending on which video models the agent selects.
Example 2 — 60-second explainer from a bullet list
Brief:
Topic: How compound interest works
Audience: 20-somethings new to investing
Tone: friendly, clear, slightly playful
Length: 60 seconds
Key points:
- It's interest on your interest
- $100/mo for 30 years at 8% = ~$150k
- Time matters more than amount
- Start now, even small
What Video Agent does: writes a conversational script that hits all four bullets in order. Generates 11 shots including animated graphs, money-motion b-roll, and two presenter cut-ins. Picks a younger-sounding voice. Uses upbeat, bouncy music that ducks cleanly under narration.
Approximate output quality: excellent for YouTube Shorts, Instagram Reels, or TikTok. Graphs are readable but generic — if you need exact numbers on screen, plan to overlay those in post.
Credit cost: ~6,000–8,500 credits.
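Because the agent reproduces whatever statistics you put in the brief, check them before you run it, especially if you plan to overlay exact numbers in post. The ~$150k figure in Example 2 holds if you assume a $100 deposit at the end of every month with 8% annual interest compounded monthly. The standard future-value-of-an-annuity formula gives roughly $149,000 under those assumptions.

```python
def future_value_monthly(deposit: float, annual_rate: float, years: int) -> float:
    """Future value of a fixed deposit made at the end of each month,
    compounded monthly (ordinary annuity): FV = P * ((1 + r)^n - 1) / r."""
    r = annual_rate / 12
    n = years * 12
    return deposit * ((1 + r) ** n - 1) / r


fv = future_value_monthly(100, 0.08, 30)
print(f"${fv:,.0f}")
```

The compounding assumption matters. Depositing $1,200 once a year and compounding annually lands closer to $136,000, so state the assumption in the brief if the number will appear on screen.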
Example 3 — 90-second branded story from a rough idea
Brief: "A 90-second brand story for a small-batch coffee roaster called 'North Slope Coffee'. They roast in Alaska and source from Ethiopian farms. The story should feel handmade, grounded, and a little cinematic — not a hard sell. End with an invitation to try their starter pack."
What Video Agent does: writes a quieter, more atmospheric script that earns its length. Generates 13 shots spanning the two geographies — Ethiopian farms, coffee cherries, the roaster in Alaska, finished beans, a customer at home with a cup — and uses the cinematic video models for most of them. Selects a lower-register narration voice. Music is slow, acoustic, and doesn't fight the voiceover.
Approximate output quality: good enough to publish as a homepage hero video or to cut down into 15-second platform-specific variants. Expect to re-roll 2–3 shots to get the exact framing you want.
Credit cost: ~10,000–14,000 credits.
When Video Agent works best
Video Agent is at its strongest when the video's job is to communicate an idea quickly with generic-but-competent visuals. Specifically:
- Explainer videos — concepts, products, features, how-it-works, FAQ answers.
- Product launch shorts — 30–60 second promos introducing a new product with a clear hook and CTA.
- Simple narratives — single-character or product-focused stories under 90 seconds.
- AI-generated b-roll compilations — travel vibes, mood pieces, scene-setting cutaways with voiceover.
- Social-first content — hooks in the first 2 seconds, vertical aspect, caption-friendly narration.
In all of these, the brief is the hard part and the execution is fairly formulaic. Video Agent closes that execution gap in a single pass.
When to use something else
Video Agent is a planner. It is not a cinematographer. There are videos it cannot do well, and you should reach for a different tool in those cases:
- Documentary-feel work where the specific shots matter — archival inserts, real interview clips, b-roll with a strong point of view.
- Multi-speaker dialogue — two characters actually talking to each other, with back-and-forth cuts.
- Character-heavy narratives where the same character needs to appear consistently across many shots with continuous identity.
- Long-form — anything over 2–3 minutes; the agent will plan it, but pacing degrades.
- Music videos and tightly-cut editorial where beat-matching and frame-level edit decisions drive the piece.
For those, use Cinema Studio scene-by-scene, or pair the AI Video Generator with manual editing.
How to write a prompt Video Agent can act on
The agent reads briefs, not wishes. The more structure in your input, the more predictable the output. Aim for five things: goal, audience, tone, length, key points. You do not have to label them, but they should be recoverable from what you wrote.
Template 1 — the one-line version
[Length] [format] about [topic] for [audience], in a [tone] tone. End with [CTA].
Example: "30-second TikTok about why sourdough tastes better than store-bought bread for home-baking beginners, in a warm and nerdy tone. End with a link to our starter culture kit."
Template 2 — the structured brief
Goal: [what this video is for]
Audience: [who it's for]
Tone: [how it should feel]
Length: [target seconds]
Key points:
- [beat 1]
- [beat 2]
- [beat 3]
CTA: [what you want the viewer to do]
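If you produce briefs in volume, for example one per client or product, it can help to keep them as structured data and render Template 2 from it. The sketch below does that and flags which of the five core elements (goal, audience, tone, length, key points) are empty. It also adds the optional "Avoid:" line recommended below. It only builds the text you paste into Video Agent. No programmatic Oakgen API is implied, and the class name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Brief:
    """Fields of the structured brief (Template 2), plus an optional Avoid line."""
    goal: str = ""
    audience: str = ""
    tone: str = ""
    length_s: int = 0
    key_points: List[str] = field(default_factory=list)
    cta: str = ""
    avoid: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Which of the five core elements are empty."""
        checks = {
            "goal": self.goal, "audience": self.audience, "tone": self.tone,
            "length": self.length_s, "key points": self.key_points,
        }
        return [name for name, value in checks.items() if not value]

    def render(self) -> str:
        """Plain-text brief in the Template 2 layout."""
        lines = [
            f"Goal: {self.goal}",
            f"Audience: {self.audience}",
            f"Tone: {self.tone}",
            f"Length: {self.length_s} seconds",
            "Key points:",
            *(f"- {point}" for point in self.key_points),
        ]
        if self.cta:
            lines.append(f"CTA: {self.cta}")
        if self.avoid:
            lines.append("Avoid: " + "; ".join(self.avoid))
        return "\n".join(lines)


brief = Brief(
    goal="Explain how compound interest works",
    audience="20-somethings new to investing",
    tone="friendly, clear, slightly playful",
    length_s=60,
    key_points=["It's interest on your interest", "Time matters more than amount"],
    cta="Start now, even small",
    avoid=["stock-footage vibes"],
)
print(brief.missing())
print(brief.render())
```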
Template 3 — the reference-led brief
Make a [length] video in the style of [reference — a brand, a YouTube channel, a format].
Topic: [what it's about]
Audience: [who watches]
Must include: [specific detail, product name, URL, statistic]
Avoid: [things Video Agent often gets wrong for this brief]
If your first output has a recurring issue — wrong aspect ratio, too corporate, stock-footage vibes — add a one-line "Avoid:" block to the brief and regenerate. The agent's planner respects explicit constraints much better than vague style requests.
Pricing
Video Agent uses the same credit pool as the rest of Oakgen. Because the agent orchestrates multiple model calls — video generation per shot, TTS for voiceover, music generation, and the assembly step — the per-video cost is the sum of those parts, not a flat fee. Expect the following ranges in practice:
- 30-second video: 3,500 – 5,500 credits
- 60-second video: 6,000 – 9,000 credits
- 90-second video: 9,000 – 14,000 credits
- Each regenerated shot: 400 – 1,200 credits depending on the model
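To budget a run, add any expected re-rolls to the base range. The sketch below hard-codes the ranges listed above. Actual costs depend on which models the agent picks, so treat the output as a planning range, not a quote. The function name is hypothetical.

```python
from typing import Tuple

# Ranges as listed above (credits). Real costs vary with the models the agent selects.
BASE_RANGES = {30: (3_500, 5_500), 60: (6_000, 9_000), 90: (9_000, 14_000)}
REROLL_RANGE = (400, 1_200)


def estimate_credits(length_s: int, rerolls: int = 0) -> Tuple[int, int]:
    """Low and high credit estimate for one video plus `rerolls` regenerated shots."""
    if length_s not in BASE_RANGES:
        raise ValueError("published ranges cover 30, 60 and 90 second videos only")
    lo, hi = BASE_RANGES[length_s]
    return lo + rerolls * REROLL_RANGE[0], hi + rerolls * REROLL_RANGE[1]


# Example 3 style: a 90-second video where you expect to re-roll 3 shots.
print(estimate_credits(90, rerolls=3))
```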
Which plan makes sense depends on volume. The Ultimate plan gives most creators enough headroom for 10–15 Video Agent runs per month plus normal image and music generation. The Creator plan is built for agencies and teams producing multiple client videos per week — the credit ceiling is high enough that you won't hit it with typical Video Agent use, and the plan unlocks priority queue handling for faster agent runs during peak hours. See full breakdowns on the pricing page.
If you are running Oakgen for clients, the affiliate program pays recurring commission on every plan you refer — a natural fit if you are already producing videos for other businesses on Oakgen.
FAQ
Can I edit the script Video Agent writes before it generates video? Not in the current beta — the agent runs end-to-end in a single pass. The roadmap includes a "review script" checkpoint between script and shot generation. For now, if the script isn't right, adjust the brief and regenerate.
Can I use my own voice? Yes, if you have a cloned voice saved on Oakgen. The agent will use it as the narrator. If you don't, pick from the preset voice library — there are voices optimised for explainer, narrative, promo, and conversational use.
How long does one Video Agent run take? Typically 5 to 15 minutes for a 30–60 second video, depending on which video models the agent selects and current queue depth. Longer videos and cinematic-model shots take proportionally more time.
What if one shot comes out wrong? You can regenerate individual shots after the first pass. The rest of the video — voiceover, music, other shots — stays put, and the new shot is slotted in. No need to re-run the whole agent.
Does Video Agent handle vertical (9:16) and square (1:1) formats? Yes. Specify the aspect ratio in the brief or pick it in the UI before running. The agent plans shots natively for the chosen aspect — vertical briefs get vertical-first shot framings, not cropped widescreen.
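Native framing matters because cropping throws away most of the picture. A centre crop of a 16:9 frame to 9:16 keeps full height but only 81/256 (about 32%) of the area. A crop to 1:1 keeps 56.25%. The helper below is a generic geometry sketch, unrelated to any Oakgen API.

```python
def kept_fraction(src_w: int, src_h: int, dst_aspect_w: int, dst_aspect_h: int) -> float:
    """Fraction of the source frame area that survives a centre crop to the target aspect."""
    src_ratio = src_w / src_h
    dst_ratio = dst_aspect_w / dst_aspect_h
    if dst_ratio < src_ratio:
        # Target is narrower: keep full height, crop the sides.
        return dst_ratio / src_ratio
    # Target is wider (or equal): keep full width, crop top and bottom.
    return src_ratio / dst_ratio


print(kept_fraction(1920, 1080, 9, 16))
print(kept_fraction(1920, 1080, 1, 1))
```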
Is the output watermarked? No. Video Agent output is yours to publish, cut, and monetise under Oakgen's standard commercial terms for paid plans.