Veo 3.1 Video Prompting Guide

Veo 3.1 pairs crisp 720p/1080p video with synced audio, so your prompt needs to read like a mini shot list. Use the patterns below to steer visuals, sound, and timing with confidence.
Coming from Wan 2.2? Start here, then compare with the Wan 2.2 Video Generation Guide for Wan-specific motion cues.

What Veo 3.1 Can Do 🎛️

  • Resolution and length: 720p or 1080p clips at 4, 6, or 8 seconds; 16:9 or 9:16 aspect.
  • Audio-native: Generates dialogue, SFX, and ambience directly from your text cues.
  • Complex scenes: Handles multi-character blocking, camera choreography, and style references.
  • Image-to-video: Animate a source frame with stronger prompt adherence and audio.
  • Ingredients to video: Feed reference images for characters, props, or locations to lock consistency.
  • First and last frame: Blend a start and end image into one transition with audio.
  • Add/remove object: Insert or delete elements (runs on Veo 2 today and skips audio).
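If you script your generations, it can help to check clip settings against the limits listed above before you send anything. The sketch below is a minimal, hypothetical helper: `ClipSettings` and `validate_settings` are illustrative names, not part of any Veo SDK, and the allowed values are copied from the list above.

```python
from dataclasses import dataclass

# Limits taken from the capability list in this guide.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_DURATIONS = {4, 6, 8}  # seconds
ALLOWED_ASPECTS = {"16:9", "9:16"}


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for the settings of one generation."""
    resolution: str = "720p"
    duration_seconds: int = 8
    aspect_ratio: str = "16:9"


def validate_settings(settings: ClipSettings) -> list[str]:
    """Return a list of human-readable problems; empty means the settings fit."""
    problems = []
    if settings.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if settings.duration_seconds not in ALLOWED_DURATIONS:
        problems.append(f"duration must be one of {sorted(ALLOWED_DURATIONS)} seconds")
    if settings.aspect_ratio not in ALLOWED_ASPECTS:
        problems.append(f"aspect ratio must be one of {sorted(ALLOWED_ASPECTS)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets you show every issue at once.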

The Five-Part Prompt Formula 🧭

[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
  • Cinematography: Shot type and camera move (crane shot rising, slow pan, POV).
  • Subject: Who or what we see.
  • Action: The beat happening right now.
  • Context: The space, background, and props.
  • Style & ambiance: Lighting, mood, color, medium, and film vibes.
Example prompt
Medium shot, a tired corporate worker rubs his temples in front of a bulky 1980s computer in a cluttered late-night office. Harsh fluorescent overheads plus the green glow of the monitor. Retro color film, slight grain, moody.
Start with this scaffold, then tune any dial (camera, action, audio, style) without rewriting the whole thing.
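One way to keep the five parts separate while you iterate is to store them as fields and assemble the prompt at the end. This is a minimal sketch, assuming a simple sentence layout; `build_prompt` is a hypothetical helper, not something Veo provides.

```python
def build_prompt(cinematography: str, subject: str, action: str,
                 context: str, style: str) -> str:
    """Assemble a prompt from the five parts of the formula.

    Layout (an assumption of this sketch):
    "<cinematography>, <subject> <action> <context>. <style>"
    """
    fields = {
        "cinematography": cinematography,
        "subject": subject,
        "action": action,
        "context": context,
        "style": style,
    }
    # Refuse to build a prompt with an empty part.
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing prompt parts: {', '.join(missing)}")
    c, s, a, ctx, st = (value.strip() for value in fields.values())
    return f"{c}, {s} {a} {ctx}. {st}"


prompt = build_prompt(
    cinematography="Medium shot",
    subject="a tired corporate worker",
    action="rubs his temples in front of a bulky 1980s computer",
    context="in a cluttered late-night office",
    style=("Harsh fluorescent overheads plus the green glow of the monitor. "
           "Retro color film, slight grain, moody."),
)
```

With these inputs, `prompt` reproduces the example prompt above; changing one argument swaps one dial without touching the rest.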

Essential Controls 🎚️

Cinematography language
  • Movement: Dolly, tracking, crane, aerial, slow pan, POV.
  • Composition: Wide, close-up, extreme close-up, low angle, two-shot.
  • Lens & focus: Shallow depth of field, wide-angle, soft focus, macro, deep focus.
    Example: Crane shot starts low on a lone hiker, then rises to reveal a mist-filled canyon at sunrise, soft morning light, epic fantasy tone.
Audio direction
  • Use quotation marks for dialogue: "We have to leave now."
  • Prefix sound effects: SFX: thunder cracks in the distance.
  • Call out ambience: Ambient noise: quiet starship bridge hum.
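The three audio conventions above are easy to apply consistently with small formatting helpers. The functions below are hypothetical names for illustration; they only build text and make no calls to Veo.

```python
def dialogue(line: str) -> str:
    """Wrap a spoken line in double quotes."""
    return '"' + line.strip().strip('"') + '"'


def sfx(description: str) -> str:
    """Prefix a sound effect with the SFX: tag."""
    return "SFX: " + description.strip().rstrip(".") + "."


def ambience(description: str) -> str:
    """Prefix background sound with the Ambient noise: tag."""
    return "Ambient noise: " + description.strip().rstrip(".") + "."


def with_audio(visual: str, *cues: str) -> str:
    """Append formatted audio cues to a visual description."""
    return " ".join([visual.strip(), *cues])


prompt = with_audio(
    "Close-up on the pilot.",
    dialogue("We have to leave now."),
    sfx("thunder cracks in the distance"),
    ambience("quiet starship bridge hum"),
)
```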
Negative prompts
State what to exclude in concrete terms: Desolate landscape with no buildings or roads. Avoid vague phrasing such as "no structures."
Gemini assist
If a prompt feels thin, ask Gemini to expand it with richer cinematography and sensory cues before sending to Veo.

Quick Starter Prompts 🚦

Text-to-video skeleton
Wide shot, [subject] [action], in [context]. [Camera move], [lighting], [style]. Dialogue/SFX if needed.
Image-to-video focus
Tracking shot that [motion], keeping the subject centered. Ambient: [sound]. Style: [look].
Use when your source frame already defines subject and scene.
Audio-first
Close-up on [subject] as they say "[dialogue]." Background: [context]. SFX: [list]. Mood: [tone].
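The bracketed slots in these skeletons can be filled programmatically. Here is a minimal sketch using a regular expression; `fill` and the `AUDIO_FIRST` constant are hypothetical names, and slot keys are matched in lowercase by assumption.

```python
import re

# The audio-first skeleton from this guide, with [slot] placeholders.
AUDIO_FIRST = ('Close-up on [subject] as they say "[dialogue]." '
               'Background: [context]. SFX: [list]. Mood: [tone].')


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each [slot] with values[slot.lower()]; fail on unknown slots."""
    def replace(match: re.Match) -> str:
        key = match.group(1).lower()
        if key not in values:
            raise KeyError(f"no value supplied for slot [{match.group(1)}]")
        return values[key]

    return re.sub(r"\[([^\[\]]+)\]", replace, template)


prompt = fill(AUDIO_FIRST, {
    "subject": "a lighthouse keeper",
    "dialogue": "The storm is early",
    "context": "a rain-streaked lantern room",
    "list": "wind, creaking glass",
    "tone": "uneasy",
})
```

Raising on a missing slot keeps a stray `[context]` from reaching the model as literal text.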

Advanced Workflows 🎬

1) Dynamic transition with First and Last Frame
  1. Generate a starting still.
  2. Generate the ending still from another POV or moment.
  3. In Veo: upload both images and prompt the bridge, including audio.
    Prompt: The camera arcs 180 degrees from the singer's front to the POV from behind on stage. She sings "when you look me in the eyes, I can see a million stars." Crowd roars, stage lights flare.
2) Dialogue scene with Ingredients to Video
  1. Make reference images for each character and the setting.
  2. Load them into Ingredients to Video.
  3. Prompt each shot so faces, outfits, and set stay consistent.
    Prompt: Using the provided detective, woman, and office images, medium shot of the detective behind his desk. He looks up and says in a weary voice, "Of all the offices in this town, you had to walk into mine."
3) Timestamp prompting for multi-shot pacing
Direct several beats in one generation by assigning times:
[00:00-00:02] Medium shot from behind a young explorer pushing aside a jungle vine to reveal a hidden path.
[00:02-00:04] Reverse shot on her freckled face, awe at moss-covered ruins. SFX: rustling leaves, distant birds.
[00:04-00:06] Tracking shot as she steps into the clearing, hand on carved stone. Emotion: wonder.
[00:06-00:08] Wide, high-angle crane shot showing the vast temple complex. SFX: gentle orchestral swell.
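Timestamp blocks like these can be generated from a list of beats, which makes it easier to catch gaps or overruns. The sketch below is hypothetical: `Beat` and `build_timestamp_prompt` are illustrative names, and the rule that beats must be contiguous and end exactly at the clip length is an assumption of this example, not a documented Veo requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    start: int        # seconds from clip start
    end: int          # seconds from clip start
    description: str  # shot description, including any SFX cues


def _stamp(seconds: int) -> str:
    """Format whole seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def build_timestamp_prompt(beats: list[Beat], clip_seconds: int = 8) -> str:
    """Render beats as [MM:SS-MM:SS] lines, checking they tile the clip."""
    if clip_seconds not in (4, 6, 8):  # clip lengths listed in this guide
        raise ValueError("clip_seconds must be 4, 6, or 8")
    expected_start = 0
    lines = []
    for beat in beats:
        if beat.start != expected_start:
            raise ValueError(f"beat starting at {beat.start}s leaves a gap or overlap")
        if beat.end <= beat.start:
            raise ValueError("each beat must end after it starts")
        lines.append(f"[{_stamp(beat.start)}-{_stamp(beat.end)}] {beat.description.strip()}")
        expected_start = beat.end
    if expected_start != clip_seconds:
        raise ValueError(f"beats end at {expected_start}s, expected {clip_seconds}s")
    return "\n".join(lines)
```

For example, `build_timestamp_prompt([Beat(0, 2, "Medium shot from behind."), Beat(2, 4, "Reverse shot on her face.")], clip_seconds=4)` yields two lines stamped `[00:00-00:02]` and `[00:02-00:04]`.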

Final Checks ✅

  • Keep the five-part formula handy; upgrade one part at a time.
  • Call out camera moves and lenses to set tone quickly.
  • Write audio like a script: quotes for dialogue, tags for SFX and ambience.
  • Lock continuity with reference images or start/end frames when needed.
  • Iterate: shorten, lengthen, or swap moves until the motion feels right.