
The Complete Kling 3 Prompting Guide (2026)

Oakgen Team · 8 min read

Kling 3 is the model to reach for when you want controlled motion, stable characters, and genuine cinematic feel in a short clip. It is strongest on deliberate camera moves (dolly, crane, tracking), subject consistency across a 5-10 second window, dynamic physical action (sports, cloth, hair, water), and lighting that actually behaves like lighting. It is weaker on complex dialogue lipsync, clips longer than 10 seconds, tiny hand detail, and rendering small on-screen text legibly.

The single biggest mistake with Kling 3 is writing prompts the way you write for an image model. Kling rewards prompts that describe a shot, not a picture — meaning it wants a subject doing something, a camera doing something, and a timeline. This guide gives you the anatomy, the motion vocabulary, the consistency tricks, and 20 copy-paste templates. Open it in one tab and Oakgen's AI Video Generator in the other.

The prompt anatomy

Every Kling 3 prompt that reliably works has six elements. Miss any one and the model fills the gap with a default, which is where the "AI look" creeps in.

  1. Subject — Who or what is in frame, described with concrete visual nouns. Not "a person" but "a woman in her late 20s, dark curly hair, charcoal wool coat."
  2. Action — What the subject is doing, as a verb phrase tied to time. Not "dancing" but "turns toward the camera over two seconds, then smiles."
  3. Environment — Where the shot lives. Interior or exterior, time of day, weather, depth cues. "Rain-slick Tokyo side street, neon signs reflecting on wet asphalt, 9pm."
  4. Camera — Lens choice, framing, and movement. "35mm wide, medium shot, slow dolly-in from shoulder height."
  5. Motion and timing — Speed, direction, choreography. "Subject walks left-to-right at natural pace, camera follows at 0.6x speed."
  6. Style — Look and finish. "Shot on ARRI Alexa, shallow depth of field, muted teal-orange grade, 24fps cinematic motion blur."

Drop each of these in as a clause separated by commas or semicolons. Kling 3 parses clause-by-clause, so order matters less than presence.
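The six elements above can be sketched as a small Python helper. This is only an illustration: the `ShotPrompt` class and its `render` method are hypothetical names, not part of any Kling or Oakgen API, and the example values are the ones quoted in the list. It joins one clause per element with semicolons and refuses to render if any element is blank.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """The six prompt elements from this guide (class name is illustrative)."""
    subject: str
    action: str
    environment: str
    camera: str
    motion: str
    style: str

    def render(self) -> str:
        parts = []
        for f in fields(self):
            value = getattr(self, f.name).strip().rstrip(".")
            # A blank element is exactly the gap the model would fill
            # with its own default, so fail loudly instead.
            if not value:
                raise ValueError(f"missing prompt element: {f.name}")
            parts.append(value)
        # One clause per element, separated by semicolons.
        return "; ".join(parts) + "."


shot = ShotPrompt(
    subject="a woman in her late 20s, dark curly hair, charcoal wool coat",
    action="turns toward the camera over two seconds, then smiles",
    environment="rain-slick Tokyo side street, neon signs reflecting on wet asphalt, 9pm",
    camera="35mm wide, medium shot, slow dolly-in from shoulder height",
    motion="subject walks left-to-right at natural pace, camera follows at 0.6x speed",
    style="shot on ARRI Alexa, shallow depth of field, muted teal-orange grade, 24fps cinematic motion blur",
)
print(shot.render())
```

Keeping the elements as separate fields makes it obvious at a glance which one you forgot.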

The 80-word sweet spot

Kling 3 prompts perform best between 50 and 90 words. Under 40 words the model invents too much. Over 120 words it starts dropping clauses. If you need more detail, split the shot across two generations and stitch in Cinema Studio.
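If you want to check length before spending credits, a rough word counter is enough. The sketch below assumes a "word" is anything separated by whitespace (so "left-to-right" counts as one), which may not match how the model tokenizes; the function name is made up for this example and the thresholds are simply the ones quoted above.

```python
def check_prompt_length(prompt: str) -> str:
    """Classify a prompt against the word-count ranges quoted in this guide."""
    words = len(prompt.split())  # whitespace-separated words (an assumption)
    if words < 40:
        return f"{words} words: too short, the model will invent details"
    if words > 120:
        return f"{words} words: too long, clauses may be dropped"
    if 50 <= words <= 90:
        return f"{words} words: in the sweet spot"
    return f"{words} words: usable, but outside the 50-90 sweet spot"


print(check_prompt_length("Medium shot of a chef plating a dish, dolly-in at 0.5x speed"))
```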

Motion control — Kling 3's headline feature

Motion control is what separates Kling 3 from most competitors. You can specify camera speed, subject speed, direction vectors, and even relative motion between the two. The vocabulary below is what the model actually obeys.

Camera movement verbs that work reliably:

  • dolly-in / dolly-out — push toward or pull away from the subject along the lens axis
  • truck left / truck right — lateral slide parallel to the subject
  • pedestal up / pedestal down — vertical rise or fall, camera stays level
  • tilt up / tilt down — camera body stays, lens pivots vertically
  • pan left / pan right — camera body stays, lens pivots horizontally
  • crane up / crane down — large arcing vertical move
  • orbit / arc around subject — circle the subject at fixed radius
  • handheld follow — tracked motion with subtle shake
  • static lock — no camera movement at all

Speed modifiers: append "at 0.3x speed", "at 0.5x speed", "at natural pace", "quickly", or "snap-cut style" to the camera move. Kling 3 interprets 0.5x as roughly half the default cinematic pace, which is often what you want for drama.

Subject motion that composes well with the camera:

Tell the model what the subject is doing relative to the camera move. A dolly-in on a static subject reads differently from a dolly-in while the subject walks toward camera. Example:

Medium shot of a chef plating a dish, dolly-in at 0.5x speed from waist height, chef glances up at the camera at the 3 second mark, shallow depth of field, 35mm, warm kitchen lighting.

That prompt has four compositional decisions stacked: camera move, camera speed, subject action, and a timing marker. Kling 3 handles all four.

Direction and pacing example:

Dolly-in at 0.5x speed, subject walks left-to-right at natural pace, camera follows at shoulder height, background traffic moves right-to-left creating parallax, 24fps.

The parallax cue is the trick — it tells Kling 3 to move background elements independently, which is what sells a shot as real.
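The motion vocabulary in this section can be sketched as a small clause builder. This is a hedged illustration, not an official interface: `motion_clause` and the `CAMERA_MOVES` set are names invented here, and the set only mirrors the camera verbs listed in this guide. It combines a camera move, an optional speed multiplier, a subject action, and an optional timing marker.

```python
from typing import Optional

# Camera verbs copied from the list in this guide.
CAMERA_MOVES = {
    "dolly-in", "dolly-out", "truck left", "truck right",
    "pedestal up", "pedestal down", "tilt up", "tilt down",
    "pan left", "pan right", "crane up", "crane down",
    "orbit", "orbit around subject", "arc around subject",
    "handheld follow", "static lock",
}


def motion_clause(
    move: str,
    speed: Optional[float] = None,
    subject_action: Optional[str] = None,
    at_second: Optional[int] = None,
) -> str:
    """Build a camera-plus-subject motion clause from the guide's vocabulary."""
    if move not in CAMERA_MOVES:
        raise ValueError(f"camera move not in the guide's list: {move!r}")
    if move == "static lock" and speed is not None:
        raise ValueError("a static lock has no camera speed")
    clause = move if speed is None else f"{move} at {speed:g}x speed"
    if subject_action:
        clause += f", {subject_action}"
        if at_second is not None:
            # Timing marker in the same wording the guide uses.
            clause += f" at the {at_second} second mark"
    return clause


print(motion_clause("dolly-in", 0.5, "chef glances up at the camera", 3))
```

The printed clause matches the camera and timing portion of the chef example; framing, lens, and lighting still need their own clauses.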

Character consistency

Kling 3 is one of the better models for holding a character together inside a single clip. Across multiple clips, you need to do some work. Two patterns cover most cases.

Pattern 1: Reference image seeding. Generate your character as a still first (Oakgen's image generator is a good starting point, or use a real photo), then use that image as the starting frame in image-to-video mode. Kling 3 inherits the face, wardrobe, and build. For the next clip, generate a new still using the last frame of the previous clip as a reference, then feed that into Kling 3. You are essentially doing a frame chain.
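The frame chain can be sketched as a loop. Everything the loop calls is a placeholder: `generate_clip`, `last_frame`, and `make_still` are hypothetical callables you would wire to whatever tools you actually use, and files are represented as plain strings. The stubs at the bottom exist only so the sketch runs end to end.

```python
from typing import Callable, List, Sequence


def frame_chain(
    first_still: str,
    prompts: Sequence[str],
    generate_clip: Callable[[str, str], str],  # (prompt, start image) -> clip
    last_frame: Callable[[str], str],          # clip -> its final frame
    make_still: Callable[[str], str],          # reference frame -> new still
) -> List[str]:
    """Chain clips so each one starts from a still derived from the previous clip."""
    clips = []
    start = first_still
    for i, prompt in enumerate(prompts):
        clip = generate_clip(prompt, start)  # image-to-video, seeded by the still
        clips.append(clip)
        if i < len(prompts) - 1:
            # Only build the next seed if another clip is coming.
            start = make_still(last_frame(clip))
    return clips


# Stub implementations, for illustration only.
clips = frame_chain(
    "hero.png",
    ["shot A", "shot B"],
    generate_clip=lambda prompt, still: f"clip({prompt}|{still})",
    last_frame=lambda clip: f"last[{clip}]",
    make_still=lambda frame: f"still<{frame}>",
)
print(clips)
```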

Pattern 2: Locked-phrase description. When you are doing pure text-to-video and cannot use a reference image, write a character block of roughly 20 words and copy-paste it identically into every prompt. The repetition locks the model's sampling.

Example locked phrase:

a woman, late 20s, shoulder-length dark curly hair, warm olive skin, charcoal wool coat over a white tee, silver hoop earrings

Use that exact string at the start of every prompt across the clip series. Combine it with consistent lighting and environment cues and you will hold a character across 4-6 shots reliably.

Avoid: changing lens, lighting temperature, or wardrobe clause mid-series. Each of those shifts nudges the sampling, even if your character phrase is identical.
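A minimal sketch of the locked-phrase pattern, assuming you assemble prompts in Python: keep the character block in one constant and prefix it to every shot, so it cannot drift between prompts. The `build_series` helper and the two shot descriptions are made up for this example; note that both shots keep the same lens.

```python
from typing import List

# The locked phrase from this guide, stored once so every prompt reuses it verbatim.
CHARACTER = (
    "a woman, late 20s, shoulder-length dark curly hair, warm olive skin, "
    "charcoal wool coat over a white tee, silver hoop earrings"
)


def build_series(character: str, shots: List[str]) -> List[str]:
    """Prefix every shot description with the identical character block."""
    return [f"{character}, {shot}" for shot in shots]


prompts = build_series(CHARACTER, [
    "walks toward camera on a rain-slick street, slow dolly-out at 0.5x speed, 35mm",
    "pauses under a neon sign and looks up, static lock, 35mm",
])
for p in prompts:
    print(p)
```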

Camera directives Kling 3 obeys

These are the specific phrases we have tested and confirmed Kling 3 renders correctly. Use them verbatim.

  • Framing: extreme wide, wide shot, medium shot, medium close-up, close-up, extreme close-up, over-the-shoulder, low angle, high angle, Dutch tilt
  • Lens: 14mm ultra wide, 24mm, 35mm, 50mm, 85mm portrait, 100mm macro, anamorphic
  • Depth: shallow depth of field, rack focus from foreground to background, deep focus, bokeh highlights
  • Film stock look: shot on ARRI Alexa, Kodak 500T look, 16mm grain, digital clean
  • Frame rate cues: 24fps cinematic motion blur, 60fps crisp slow-mo, 120fps super slow-mo

Combine sparingly. A prompt with three directive clauses outperforms a prompt with seven.

20 copy-paste templates

Replace the {bracketed} tokens. Each template is tuned for a 5-10 second clip.

Cinematic

1. Wide establishing:

Extreme wide shot of {location} at {time of day}, static lock, {atmospheric element — fog, rain, dust}, 24mm lens, deep focus, shot on ARRI Alexa, muted {color} grade, 24fps cinematic motion blur.

2. Character close-up:

Close-up of {character description}, 85mm portrait lens, shallow depth of field, slow dolly-in at 0.4x speed, {character} {small action — blinks, exhales, turns head}, soft window light from camera left, shot on ARRI Alexa.

3. Over-shoulder dialog:

Over-the-shoulder medium shot, {character A} in foreground (blurred), {character B} in focus across the table, warm restaurant lighting with practical bulbs in background bokeh, 50mm lens, static lock, {character B} glances down then back up, 24fps.

4. Action reveal:

Low angle medium shot of {subject}, crane up at 0.5x speed revealing {environment reveal}, {subject} holds their position, golden hour backlight, 35mm lens, volumetric light rays, shot on ARRI Alexa.

Product

5. Rotating hero:

Center-framed hero shot of {product} on a {surface}, orbit around subject at 0.6x speed, seamless studio backdrop in {color}, three-point lighting with a strong rim light, 100mm macro, shallow depth of field, 60fps crisp.

6. Lifestyle:

Medium shot, {person description} using {product} in a sunlit {location}, handheld follow at natural pace, natural window light, candid framing, 35mm lens, film grain, Kodak 500T look.

7. In-use:

Close-up of hands {action with product}, static lock with slight tilt down at the 2 second mark, soft overhead softbox light, dark contrast background, 50mm lens, shallow depth of field, 24fps.

8. Macro:

Extreme close-up of {product surface detail — texture, stitching, engraving}, slow dolly-in at 0.3x speed, 100mm macro lens, razor-thin depth of field, dramatic side light with soft fill, studio clean background.

Social

9. Instagram Reel:

9:16 vertical, medium shot of {subject} {action}, handheld follow at natural pace, punchy saturated color grade, bright daylight, 35mm lens, 30fps, upbeat energetic framing.

10. TikTok opener:

9:16 vertical, snap zoom-in on {subject} making direct eye contact, high energy, bright studio lighting, 50mm lens, 30fps, clean white background, subject {gesture — points, waves, shrugs} at the 1 second mark.

11. YouTube Short:

9:16 vertical, medium close-up of {creator} in their studio, static lock, soft RGB accent lighting in background, 35mm lens, shallow depth of field, natural skin tones, 24fps, {creator} {small expressive beat}.

12. Story:

9:16 vertical, POV shot from {perspective}, handheld slow walk through {environment}, natural ambient lighting, 24mm lens, deep focus, 30fps, documentary feel.

B-roll

13. City:

Wide shot of {city} skyline at {time of day}, slow truck right at 0.4x speed revealing {landmark}, atmospheric haze, 24mm lens, deep focus, teal-orange grade, 24fps.

14. Nature:

Medium shot of {natural element — leaves, water, grass} moving in the wind, static lock, golden hour backlight, 85mm lens, shallow depth of field, 60fps crisp slow-mo.

15. Studio:

Close-up of {object} on a textured surface, slow dolly-in at 0.3x speed, single hard light from camera right, deep shadows, 50mm lens, 24fps cinematic motion blur.

16. Texture:

Extreme close-up of {texture — fabric, liquid, stone}, orbit at 0.5x speed, dramatic raking side light, 100mm macro lens, razor-thin depth of field, clean black background.

Explainer

17. Process:

Top-down overhead shot of {process — hands assembling, ingredients combining}, static lock with slight pedestal down at the 3 second mark, soft overhead softbox, clean white surface, 50mm lens, 30fps.

18. Data viz:

Medium wide shot of an animated {chart or diagram} floating in a dark studio space, slow dolly-in at 0.4x speed, soft blue accent light, 35mm lens, shallow depth of field, clean minimal aesthetic, 24fps.

19. Metaphor:

Medium shot of {metaphor subject — a key turning in a lock, a seed sprouting, a puzzle piece fitting}, slow dolly-in at 0.3x speed, dramatic directional light, 85mm lens, shallow depth of field, muted color grade, 24fps.

20. Transition:

Whip pan from {scene A — a closing laptop} to {scene B — a coffee cup steaming}, handheld energy, matched warm lighting across both, 35mm lens, 30fps, snappy cut-worthy rhythm.
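If you fill the templates above in bulk, a small substitution helper saves retyping. The sketch below is one possible approach, not part of any Oakgen tooling: `fill_template` is a hypothetical name, and it assumes each token's name is the text before the dash-separated hint list (so the atmospheric token is keyed as "atmospheric element"). The example values are invented.

```python
import re

TOKEN = re.compile(r"\{([^{}]+)\}")
HINT_DASH = "\u2014"  # the dash the templates use between a token name and its hints


def fill_template(template: str, values: dict) -> str:
    """Replace each {token} with values[name], where name is the text before any hints."""
    def substitute(match):
        name = match.group(1).split(HINT_DASH)[0].strip()
        if name not in values:
            # Fail instead of sending an unfilled token to the model.
            raise KeyError(f"no value supplied for token: {name}")
        return values[name]
    return TOKEN.sub(substitute, template)


# Template 1 (wide establishing) from this guide.
template = (
    "Extreme wide shot of {location} at {time of day}, static lock, "
    "{atmospheric element \u2014 fog, rain, dust}, 24mm lens, deep focus, "
    "shot on ARRI Alexa, muted {color} grade, 24fps cinematic motion blur."
)
filled = fill_template(template, {
    "location": "a salt flat",
    "time of day": "dawn",
    "atmospheric element": "low fog",
    "color": "blue",
})
print(filled)
```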

Kling 3 vs Seedance 2 vs Veo 3.1

Short version: pick based on what you need.

Need                             | Kling 3   | Seedance 2 | Veo 3.1
Controlled camera moves          | Best      | Good       | Good
Character consistency (5-10s)    | Best      | Good       | Very good
Native audio / lipsync           | Limited   | No         | Best
Physical realism (cloth, water)  | Best      | Very good  | Good
Long duration (15s+)             | Weak      | Good       | Very good
Prompt following                 | Very good | Best       | Very good
Speed on Oakgen                  | Medium    | Fast       | Medium

Use Kling 3 when the shot needs motion integrity — a smooth dolly, a stable orbit, real physical action. Use Seedance 2 when you need fast iteration and strong prompt adherence. Use Veo 3.1 when audio, dialogue, or longer shots matter. All three live inside Oakgen's AI Video Generator — you can swap models without rewriting your prompt. For deeper breakdowns see our Kling 3 launch notes, the Veo vs Kling vs Wan roundup, the Seedance prompting guide, and the Veo 3 prompting guide.
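The comparison can also be expressed as a lookup. This sketch copies the ratings from this guide's comparison; the numeric `RANK` ordering is our own assumption about how the rating words compare, the speed row is left out because it uses a different scale, and `pick_model` is a hypothetical helper.

```python
# Ratings copied from this guide's comparison.
RATINGS = {
    "controlled camera moves": {"Kling 3": "Best", "Seedance 2": "Good", "Veo 3.1": "Good"},
    "character consistency (5-10s)": {"Kling 3": "Best", "Seedance 2": "Good", "Veo 3.1": "Very good"},
    "native audio / lipsync": {"Kling 3": "Limited", "Seedance 2": "No", "Veo 3.1": "Best"},
    "physical realism (cloth, water)": {"Kling 3": "Best", "Seedance 2": "Very good", "Veo 3.1": "Good"},
    "long duration (15s+)": {"Kling 3": "Weak", "Seedance 2": "Good", "Veo 3.1": "Very good"},
    "prompt following": {"Kling 3": "Very good", "Seedance 2": "Best", "Veo 3.1": "Very good"},
}

# Assumed ordering of the rating words, highest first.
RANK = {"Best": 4, "Very good": 3, "Good": 2, "Limited": 1, "Weak": 1, "No": 0}


def pick_model(need: str) -> str:
    """Return the model with the highest rating for the given need."""
    row = RATINGS[need]
    return max(row, key=lambda model: RANK[row[model]])


print(pick_model("native audio / lipsync"))
```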

Where Kling 3 struggles

Being honest about the failure modes saves you credits.

  • Complex speech and lipsync — Kling 3 can generate talking heads, but sustained dialogue with perfect phoneme-accurate lip movement is still Veo 3.1 territory.
  • Clips over 10 seconds — Quality degrades past the 10s mark. Generate short and stitch in Cinema Studio instead.
  • Fine anatomy — Hands holding small props, tight fingerwork, subtle facial micro-expressions can drift. Keep those actions brief and in medium-or-wider framing.
  • Readable on-screen text — Signage, screens, and subtitles often come out garbled. Render real text in post.
  • Extreme multi-character scenes — 4+ named characters in frame stretch consistency. Two to three is the reliable ceiling.

Credit tip

Kling 3 clips cost more than Seedance or Flux-based generators. Draft your concept on a cheaper model first, lock the composition, then re-run on Kling 3 for the final take. See current credit rates on the pricing page.

FAQ

How long can a Kling 3 clip be? Up to 10 seconds reliably. You can push to 15 but expect motion drift and consistency degradation.

Does Kling 3 support image-to-video? Yes, and it is the recommended workflow for character consistency. Seed with a strong still frame.

What aspect ratios work? 16:9, 9:16, 1:1, and 4:3 are all well supported. Vertical 9:16 is as stable as horizontal.

Can I control exact camera speed in seconds? Not in seconds directly, but speed modifiers like "0.3x", "0.5x", and "natural pace", plus timing cues like "at the 3 second mark", get you close.

How do I try Kling 3 on Oakgen? Open the AI Video Generator, pick Kling 3 from the model dropdown, paste a template from this guide, and hit generate. If a friend sent you, they may have a referral link worth extra credits.
