Cinema Studio is Oakgen's purpose-built cinematic video generator. Unlike the generic AI video generator — which is optimized for quick clips from a single prompt — Cinema Studio takes a full script or scene description, breaks it into shots, and stacks video generation with matching voiceover, music, sound design, and a final color grade. You end up with a polished cinematic clip, not a raw model output.
This guide walks you through the complete workflow, the five-stage pipeline, how to pick the right underlying model for each shot, three real examples with credit cost estimates, the mistakes most first-time users make, and how pricing actually plays out on each plan. If you have 30 minutes and a script, you can export something today.
What is Cinema Studio?
Cinema Studio is a director-in-a-browser. You bring the idea (a product ad, an explainer, a short film, a trailer). Cinema Studio handles the shot breakdown, the camera grammar, the model routing, the audio layering, and the grade. It wraps Oakgen's underlying video models — Veo 3, Kling 2.5, Runway Gen-4, Sora 2, Wan 2.2 — behind a single interface with cinematic presets: camera bodies (Modular 8K Digital, 70mm Grand Format, Classic 16mm Film), lenses (Anamorphic, Swirl Bokeh Portrait, Halation Diffusion), focal lengths (8mm ultra-wide to 85mm portrait), aperture (f/1.4 through f/11), and motion (Dolly In, Crane Up, Orbit, Whip Pan, Tracking).
The underlying mental model: generic video generators ask "what should happen?" Cinema Studio asks "how should this be shot?" That single shift is what separates a clip that looks AI-made from one that looks directed.
Cinema Studio is aimed at solo creators making ads, short-form trailers, music videos, and branded content; agencies testing concepts before a live shoot; and product teams building launch videos without a production budget. If you would normally hire a DP, editor, colorist, and composer for a 60-second clip, Cinema Studio compresses that team into one tool.
The 5-Stage Workflow
Every Cinema Studio project flows through five stages. Skipping stages produces worse output — treat this like a pre-production checklist.
Stage 1 — Brief and Script
Start with a brief. One paragraph is enough. State the goal (sell, explain, entertain), the tone (documentary, dreamy, kinetic, noir), the runtime target (30s / 60s / 90s), and the delivery aspect ratio (16:9 for YouTube, 9:16 for Reels/TikTok, 21:9 for cinematic widescreen).
Then write the script. For ads and trailers, write VO lines verbatim — do not paraphrase. For explainers, write beats. For cinematic pieces, write scenes in screenplay format (INT./EXT., subject, action, mood).
Stage 2 — Shot Breakdown
Cinema Studio parses the script into shots. You can edit the breakdown: merge shots, split them, reorder, adjust durations. A good rule: one idea per shot. If a shot is trying to say two things, split it.
Typical shot counts:
- 30s trailer: 8–12 shots
- 60s ad: 10–16 shots
- 90s explainer: 14–22 shots
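To see what these counts mean for pacing, the short sketch below converts each range into an average shot length. The names (`TYPICAL_SHOT_COUNTS`, `shot_length_range`) are illustrative and not part of Cinema Studio; the numbers come straight from the list above.

```python
# Typical shot counts from the list above: runtime in seconds -> (fewest, most) shots.
TYPICAL_SHOT_COUNTS = {
    30: (8, 12),   # 30s trailer
    60: (10, 16),  # 60s ad
    90: (14, 22),  # 90s explainer
}


def average_shot_length(runtime_s: int, shot_count: int) -> float:
    """Average seconds per shot for a given runtime and shot count."""
    if shot_count <= 0:
        raise ValueError("shot_count must be positive")
    return runtime_s / shot_count


def shot_length_range(runtime_s: int) -> tuple[float, float]:
    """(shortest, longest) average shot length implied by the typical counts."""
    fewest, most = TYPICAL_SHOT_COUNTS[runtime_s]
    return (average_shot_length(runtime_s, most),
            average_shot_length(runtime_s, fewest))


for runtime in TYPICAL_SHOT_COUNTS:
    low, high = shot_length_range(runtime)
    print(f"{runtime}s piece: {low:.2f}s to {high:.2f}s per shot on average")
```

A 30s trailer at 8–12 shots works out to 2.5–3.75 seconds per shot, and a 60s ad at 10–16 shots to 3.75–6 seconds per shot.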
Stage 3 — Scene Generation (Model Selection)
This is where Cinema Studio shines. For each shot you choose:
- Video model — Veo 3, Kling 2.5, Runway Gen-4, Sora 2, or Wan 2.2
- Camera body + lens + focal length + aperture
- Camera motion (Static, Dolly In/Out, Handheld, Crane, Orbit, Whip Pan, Tracking)
- Aspect ratio + resolution + duration (5s / 10s / 15s, 480p / 720p / 1080p)
- Optional reference image for subject/style consistency
Mixing models across shots is expected. Use Veo 3 for dialogue shots, Sora 2 for wide establishing shots, Kling for dynamic motion, and Runway for stylized looks.
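If you plan shots in a spreadsheet or script before opening the tool, it can help to write each shot down as a small record with the same fields. The sketch below is a planning aid only: `ShotSettings` and its field names are hypothetical, not a Cinema Studio API. The allowed values mirror the options listed above.

```python
from dataclasses import dataclass
from typing import Optional

# Options taken from the Stage 3 list; the structure itself is hypothetical.
MODELS = {"Veo 3", "Kling 2.5", "Runway Gen-4", "Sora 2", "Wan 2.2"}
DURATIONS_S = {5, 10, 15}
RESOLUTIONS = {"480p", "720p", "1080p"}


@dataclass
class ShotSettings:
    description: str                       # one subject + one action + one environment
    model: str
    motion: str = "Static"
    duration_s: int = 5
    resolution: str = "1080p"
    focal_length_mm: Optional[int] = None
    aperture: Optional[str] = None         # e.g. "f/1.4"
    reference_image: Optional[str] = None  # optional path for subject/style consistency

    def __post_init__(self) -> None:
        # Reject values that are not among the listed generation options.
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.duration_s not in DURATIONS_S:
            raise ValueError(f"duration must be one of {sorted(DURATIONS_S)}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")


wide = ShotSettings(
    description="Wide, sunlit bathroom",
    model="Sora 2",
    focal_length_mm=35,
    aperture="f/4",
)
print(wide)
```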
Stage 4 — Audio Layer
Cinema Studio stitches three audio tracks:
- Voiceover — synthesized via ElevenLabs (synchronous, returns audio immediately). Pick a voice, paste your VO script, render.
- Music — generated from a brief ("slow ambient piano, melancholic, 80 BPM, no drums"). The music auto-ducks under dialogue.
- Sound design — ambience and key SFX (footsteps, impacts, whooshes) pulled from a curated library or generated.
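Ducking means lowering the music level whenever voiceover is present. The sketch below illustrates the general idea only; it is not Cinema Studio's implementation, and the duck depth (`duck_db`) and padding (`pad_s`) are assumed values chosen for the example.

```python
def music_gain_db(
    t: float,
    vo_segments: list[tuple[float, float]],
    duck_db: float = -12.0,  # assumed duck depth, not a Cinema Studio default
    pad_s: float = 0.25,     # assumed padding so the duck starts slightly early
) -> float:
    """Gain to apply to the music bed at time t (seconds on the timeline)."""
    for start, end in vo_segments:
        if start - pad_s <= t <= end + pad_s:
            return duck_db   # voiceover is active: pull the music down
    return 0.0               # no voiceover: leave the music at full level


# Hypothetical VO lines at 4-8s and 12-15s on the timeline.
segments = [(4.0, 8.0), (12.0, 15.0)]
for t in (1.0, 3.8, 6.0, 10.0, 13.0):
    print(t, music_gain_db(t, segments))
```

A production mixer would ramp the gain in and out instead of switching it instantly; the step function here keeps the logic easy to read.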
Stage 5 — Color Grade and Export
Apply a grade preset — Teal & Orange, Cross-Process, Bleach Bypass, Warm Film, Clinical Clean, or a neutral Rec.709. Cinema Studio applies the grade per-shot for consistency across mixed-model output, then exports MP4 (H.264) or ProRes at up to 1080p.
Choosing the Right Video Model for Your Shot
Cinema Studio does not pick the model for you — and it shouldn't. Each model has a distinct signature:
| Model | Best Use | Strength | Avoid For |
|---|---|---|---|
| Veo 3 | Dialogue, ambient realism | Native audio, prompt adherence | Fast whip pans |
| Kling 2.5 | Action, dynamic motion | Physics, motion coherence | Talking heads |
| Runway Gen-4 | Stylized, editorial | Look consistency via references | Photoreal product |
| Sora 2 | World-building, wide shots | Scene complexity | Tight close-ups |
| Wan 2.2 | Budget shots, b-roll | Cost efficiency | Hero moments |
For a deeper head-to-head, read Veo 3 vs Kling 2.5 vs Wan 2.2. For why Veo 3 in particular changed cinematic AI video, see Cinematic AI Video with Veo 3.
Decision guide:
- Shot has dialogue or ambient sound → Veo 3
- Shot is pure motion (car chase, parkour, fight) → Kling 2.5
- Shot must match a specific visual style or reference → Runway Gen-4
- Shot is a wide establishing or world reveal → Sora 2
- Shot is b-roll, fill, or transition → Wan 2.2
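The decision guide can be written as a small lookup, which is handy when you are tagging shots in a planning script. This is a sketch, not a Cinema Studio feature: the trait names are invented for the example, and checking the rules in the order listed is an assumption, since the guide does not say which rule wins when several apply.

```python
from typing import Optional

# (trait, suggested model) pairs in the order the decision guide lists them.
# Trait names are hypothetical labels for this sketch.
DECISION_GUIDE = [
    ("dialogue_or_ambient_sound", "Veo 3"),
    ("pure_motion", "Kling 2.5"),
    ("style_or_reference_match", "Runway Gen-4"),
    ("wide_establishing", "Sora 2"),
    ("b_roll_or_transition", "Wan 2.2"),
]


def suggest_model(traits: set[str]) -> Optional[str]:
    """Return the first model whose trait applies, or None if no rule matches."""
    for trait, model in DECISION_GUIDE:
        if trait in traits:
            return model
    return None  # no rule applies: choose manually


print(suggest_model({"pure_motion"}))                                     # Kling 2.5
print(suggest_model({"wide_establishing", "dialogue_or_ambient_sound"}))  # Veo 3
print(suggest_model(set()))                                               # None
```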
3 Walk-Through Examples
Example A — 60-second Product Ad (DTC Skincare)
"60-second hero ad for a vitamin C serum. Morning routine vibe. Sunlit bathroom, glass textures, droplet hero shots. Voice: warm female, 30s. Music: slow piano, uplifting. End on product + tagline."
Shot plan (11 shots):
- Wide — sunlit bathroom — Sora 2, Full-Frame Cine Digital, 35mm, f/4, Static, 5s
- CU hands under water — Veo 3, Premium Modern Prime, 50mm, f/1.4, Handheld, 5s
- Macro — serum droplet on fingertip — Kling 2.5, Extreme Macro, f/1.4, Slow Dolly In, 5s
- MS subject applying serum — Veo 3, 50mm, f/1.4, Orbit, 10s
- Insert — bottle on marble — Kling 2.5, 85mm, f/1.4, Slow Dolly In, 5s
- Action — pipette dispense — Kling 2.5, Extreme Macro, Static, 5s
- CU skin texture hero — Veo 3, Swirl Bokeh Portrait, 85mm, f/1.4, Static, 5s
- MS smile — Veo 3, Warm Cinema Prime, 50mm, f/1.4, Handheld, 5s
- Wide — window light glow — Sora 2, 24mm, f/4, Crane Up, 5s
- Product beauty shot — Runway Gen-4, 85mm, f/1.4, Slow Dolly Out, 5s
- Endframe — logo + tagline — Runway Gen-4, Static, 5s
Audio: ElevenLabs VO (warm female, 22s total over shots 2–9). Generated piano bed (80 BPM, C major). SFX: water, droplet, pipette click.
Grade: Warm Film.
Credit cost estimate: ~8,500 credits (roughly $33 retail value at 260 credits/USD). Veo 3 shots drive most of the cost. Swapping the two Sora 2 wides (shots 1 and 9) for Wan 2.2 would save roughly 1,000 credits, or about 12%, at the per-shot rates listed under Pricing.
Example B — 90-second SaaS Explainer
"90-second explainer for a team task manager. Friendly, kinetic. Show the problem (chaos), the solution (calm), the outcome (shipped on time). Voice: neutral male. Music: upbeat indie."
Shot plan (17 shots): Open on montage of sticky notes and overflowing inboxes (Wan 2.2 b-roll, 3 shots), cut to protagonist frustrated at laptop (Veo 3, 2 shots), whip pan to product UI close-ups (Kling 2.5, 4 shots), outcome montage — team celebrating, charts up, inbox zero (mix of Veo 3 and Runway, 5 shots), endframe (Runway, 3 shots).
Audio: 82s VO, upbeat indie music, UI SFX (clicks, whooshes, confirms).
Grade: Clinical Clean.
Credit cost estimate: ~9,800 credits (~$38). Explainers are cheaper per-second than ads because you can lean on Wan 2.2 for filler shots.
Example C — 30-second Cinematic Trailer (Indie Film)
"30-second teaser for a psychological thriller. Night, rain, neon. One protagonist. No dialogue. Music-led. End on title card."
Shot plan (9 shots):
- Wide — rainy neon street — Sora 2, 70mm Grand Format, 24mm, f/1.4, Slow Dolly In, 5s
- CU — eye reflection of neon — Kling 2.5, Extreme Macro, f/1.4, Static, 5s
- MS — protagonist walking — Veo 3, Classic Anamorphic, 35mm, f/1.4, Tracking, 5s
- Insert — hand clenching — Kling 2.5, 85mm, f/1.4, Static, 3s
- Whip pan — phone ringing — Kling 2.5, 50mm, Whip Pan, 2s
- CU — face lit by screen — Veo 3, Halation Diffusion, 85mm, f/1.4, Static, 5s
- Wide — running through alley — Kling 2.5, 24mm, f/4, Handheld, 5s
- Black beat — 1s
- Title card — Runway Gen-4, Static, 4s
Audio: No VO. Tense cinematic score (generated, 60 BPM, minor key, rising). SFX: rain, neon hum, phone vibration, footsteps.
Grade: Teal & Orange.
Credit cost estimate: ~6,200 credits (~$24). Short runtime + music-led = cheapest format per second of emotional impact.
For twelve more Cinema Studio format ideas, see 12 Types of Videos to Make in Cinema Studio.
Common Mistakes
- Over-describing the shot. New users write three sentences per shot. The model drowns. Keep each shot prompt to one subject + one action + one environment. Let the camera body, lens, focal length, and motion do the styling — that's why those controls exist.
- Picking the aspect ratio after generation. Choose your delivery aspect ratio (9:16 for Reels, 16:9 for YouTube, 21:9 for cinematic) before you generate. Cropping a 16:9 shot to 9:16 in post throws away roughly two thirds of the frame (about 68% of the area) and breaks composition.
- Music drowning dialogue. Generated music defaults to full mix volume. Confirm auto-duck is enabled (it is on by default), or manually lower the music bed to around -18 LUFS while VO is playing. If your audience can't hear the VO on laptop speakers, the ad failed.
- Too many cuts. A 30s piece with 20 shots feels frantic, not cinematic. Cinema reads best at roughly one shot per 3–5 seconds for normal pacing, one shot per 1–2 seconds for montages only. Default to fewer, longer shots.
- Ignoring shot consistency. If your protagonist is in shots 3, 4, and 7, use the reference image control on each shot — or generate shot 3 first and feed that frame in as a reference for 4 and 7. Without references, the protagonist's face will drift across shots and the clip will feel like three different people.
Do not generate every shot with the same model just because it's familiar. A 60-second clip made entirely with Veo 3 costs more and looks flatter than one that routes action to Kling, wides to Sora, and style to Runway. Cinema Studio's multi-model routing is the feature — use it.
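The crop figure in the aspect-ratio mistake above is easy to verify. The sketch below computes how much of a frame survives the largest possible crop to a new aspect ratio; the function name is illustrative.

```python
from fractions import Fraction


def retained_fraction(source: tuple[int, int], target: tuple[int, int]) -> Fraction:
    """Fraction of the source frame area kept by the largest crop to the target ratio."""
    src = Fraction(*source)  # width / height of the generated shot
    dst = Fraction(*target)  # width / height of the delivery format
    if dst <= src:
        # Target is narrower: keep the full height, lose width.
        return dst / src
    # Target is wider: keep the full width, lose height.
    return src / dst


kept = retained_fraction((16, 9), (9, 16))
print(kept)                    # 81/256
print(f"{float(kept):.1%}")    # 31.6% of the frame kept
print(f"{float(1 - kept):.1%}")  # 68.4% thrown away
```

Cropping 16:9 to 9:16 keeps 81/256 of the frame, so just over two thirds of the pixels are discarded.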
Pricing
Cinema Studio runs on Oakgen credits. 1 USD = 260 credits. Credit cost per shot depends on the model, resolution, and duration.
| Shot Type | Model | Duration | Resolution | Approx Credits |
|---|---|---|---|---|
| Budget b-roll | Wan 2.2 | 5s | 720p | ~180 |
| Standard shot | Kling 2.5 | 5s | 1080p | ~450 |
| Dialogue shot (audio incl.) | Veo 3 | 5s | 1080p | ~900 |
| Stylized shot | Runway Gen-4 | 5s | 1080p | ~600 |
| Wide establisher | Sora 2 | 10s | 1080p | ~1,400 |
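As a rough check, the table can be turned into a per-project estimate. The sketch below prices the 11-shot plan from Example A. It rests on two assumptions that the table does not state: that credits scale linearly with duration (so a 5s Sora 2 shot costs half of the 10s figure), and that audio and grading add no separate charge. Treat the result as a ballpark, not a quote.

```python
# Approximate rates from the table above: model -> (credits, reference duration in seconds).
RATE_TABLE = {
    "Wan 2.2": (180, 5),        # 720p
    "Kling 2.5": (450, 5),      # 1080p
    "Veo 3": (900, 5),          # 1080p, audio included
    "Runway Gen-4": (600, 5),   # 1080p
    "Sora 2": (1400, 10),       # 1080p
}


def shot_credits(model: str, duration_s: int) -> float:
    """Estimated credits for one shot, ASSUMING cost scales linearly with duration."""
    credits, reference_s = RATE_TABLE[model]
    return credits * duration_s / reference_s


def project_credits(shots: list[tuple[str, int]]) -> float:
    """Sum of per-shot estimates; audio and grade are not costed here."""
    return sum(shot_credits(model, duration_s) for model, duration_s in shots)


# Example A shot plan as (model, duration in seconds), in shot order.
EXAMPLE_A = [
    ("Sora 2", 5), ("Veo 3", 5), ("Kling 2.5", 5), ("Veo 3", 10),
    ("Kling 2.5", 5), ("Kling 2.5", 5), ("Veo 3", 5), ("Veo 3", 5),
    ("Sora 2", 5), ("Runway Gen-4", 5), ("Runway Gen-4", 5),
]

total = project_credits(EXAMPLE_A)
veo = project_credits([s for s in EXAMPLE_A if s[0] == "Veo 3"])
print(total)                 # 8450.0
print(f"{veo / total:.0%}")  # 53% of the total comes from Veo 3 shots
```

Under these assumptions the plan comes to 8,450 credits, in line with the ~8,500 estimate given for Example A.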
Rough end-to-end project cost:
- 30s trailer (9 shots, mixed models, no VO): ~6,000–7,000 credits
- 60s ad (11 shots, full audio): ~8,000–9,500 credits
- 90s explainer (17 shots, b-roll heavy): ~9,500–11,000 credits
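To translate these ranges into dollars, divide by the 260 credits/USD rate stated above. A minimal sketch (names are illustrative):

```python
CREDITS_PER_USD = 260  # rate stated in the Pricing section


def credits_to_usd(credits: float) -> float:
    """Retail value of a credit amount in US dollars."""
    return credits / CREDITS_PER_USD


# Project ranges from the list above: label -> (low, high) credits.
PROJECT_RANGES = {
    "30s trailer": (6_000, 7_000),
    "60s ad": (8_000, 9_500),
    "90s explainer": (9_500, 11_000),
}

for label, (low, high) in PROJECT_RANGES.items():
    print(f"{label}: ${credits_to_usd(low):.2f} to ${credits_to_usd(high):.2f}")
```

This gives roughly $23–27 for a 30s trailer, $31–37 for a 60s ad, and $37–42 for a 90s explainer.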
Free tier: 50 starter credits + 7-day trial. That is enough to generate one or two individual Cinema Studio shots, not a finished project. Use it to test the interface, then upgrade.
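Ultimate ($29/mo): ~7,500 credits/month, enough for about one 30s trailer per month (~6,000–7,000 credits) with a little room to regenerate shots. A full 60s ad (~8,000–9,500 credits) exceeds the monthly allowance at the rates above unless you substitute cheaper models such as Wan 2.2 on filler shots.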
Creator ($99/mo): ~26,000 credits/month — roughly three full 60s ads per month, or two ads plus a 90s explainer, with meaningful iteration headroom. This is the plan for creators actually shipping Cinema Studio work professionally.
See the full breakdown on pricing. If you run an agency or content operation shipping cinematic work for clients, our affiliate program pays 25% recurring for 6 months on every customer you refer — most serious Cinema Studio users cover their own plan within the first two referrals.
FAQ
Can I use Cinema Studio output commercially? Yes. All output from Cinema Studio on paid Oakgen plans is licensed for commercial use — ads, client work, monetized content, merchandise. Free tier is personal/evaluation only. Check pricing for license details per plan.
What resolution does Cinema Studio export? Up to 1080p MP4 (H.264) or ProRes 422. Individual shots can be rendered at 480p / 720p / 1080p depending on the model and your credit budget. 4K is on the roadmap — for now, a 1080p Cinema Studio export upscaled through a dedicated upscaler is the standard path to 4K delivery.
Does Cinema Studio support voice cloning? Yes — via the ElevenLabs voiceover layer. Upload a 1–3 minute clean voice sample, name the voice, and it becomes selectable inside Cinema Studio's VO step. You can also use any ElevenLabs stock voice if you don't want to clone.
Can I edit after generation? Yes, non-destructively. You can regenerate any individual shot, swap the model on a shot, adjust the camera controls, change the grade, or re-cut the timeline — all without re-rendering the whole project. Each shot is a discrete asset until you export.
How long does a full Cinema Studio project take to render? A 60-second, 11-shot project typically finishes all generation in 4–8 minutes of wall time (shots render in parallel). Audio and grade apply in under 30 seconds. You're looking at under 10 minutes from "submit" to "download" for most projects.
Does Cinema Studio work for vertical (9:16) Reels and TikTok? Yes. Pick 9:16 at the brief stage and every shot will be composed vertically by the underlying model. Do not generate 16:9 and crop — always pick 9:16 from the start. For a worked vertical example, see our Sora-style cinematic Instagram Reels guide.
Open Cinema Studio, paste a one-paragraph brief, and let the shot breakdown render. Thirty minutes from now you can have something a production company would have charged $5,000 for a year ago.