Veo 3.1 Video Prompting Guide
Veo 3.1 pairs crisp 720p/1080p video with synced audio, so your prompt needs to read like a mini shot list. Use the patterns below to steer visuals, sound, and timing with confidence.
Coming from Wan 2.2? Start here, then compare with the Wan 2.2 Video Generation Guide for Wan-specific motion cues.
What Veo 3.1 Can Do
- Resolution and length: 720p or 1080p clips at 4, 6, or 8 seconds; 16:9 or 9:16 aspect.
- Audio-native: Generates dialogue, SFX, and ambience directly from your text cues.
- Complex scenes: Handles multi-character blocking, camera choreography, and style references.
- Image-to-video: Animate a source frame with stronger prompt adherence and audio.
- Ingredients to video: Feed reference images for characters, props, or locations to lock consistency.
- First and last frame: Blend a start and end image into one transition with audio.
- Add/remove object: Insert or delete elements (runs on Veo 2 today and skips audio).
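The resolution, length, and aspect limits above can be checked before a request is sent. This is a minimal sketch: `check_clip_settings` and the constant names are hypothetical helpers for illustration, not part of any Veo SDK.

```python
# Hypothetical pre-flight check; the allowed values come from the list above.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_DURATIONS_S = {4, 6, 8}
ALLOWED_ASPECTS = {"16:9", "9:16"}


def check_clip_settings(resolution: str, duration_s: int, aspect: str) -> list:
    """Return a list of problems; an empty list means the settings fit the guide."""
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution {resolution!r} is not one of {sorted(ALLOWED_RESOLUTIONS)}")
    if duration_s not in ALLOWED_DURATIONS_S:
        problems.append(f"duration {duration_s}s is not one of {sorted(ALLOWED_DURATIONS_S)}")
    if aspect not in ALLOWED_ASPECTS:
        problems.append(f"aspect {aspect!r} is not one of {sorted(ALLOWED_ASPECTS)}")
    return problems


print(check_clip_settings("1080p", 8, "16:9"))  # no problems reported
print(check_clip_settings("4k", 10, "1:1"))     # three problems reported
```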
The Five-Part Prompt Formula
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
- Cinematography: Shot type and camera move (crane shot rising, slow pan, POV).
- Subject: Who or what we see.
- Action: The beat happening right now.
- Context: The space, background, and props.
- Style & ambiance: Lighting, mood, color, medium, and film vibes.
Medium shot, a tired corporate worker rubs his temples in front of a bulky 1980s computer in a cluttered late-night office. Harsh fluorescent overheads plus the green glow of the monitor. Retro color film, slight grain, moody.
Start with this scaffold, then tune any dial (camera, action, audio, style) without rewriting the whole thing.
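If you assemble prompts in code, the scaffold maps onto a small data structure. The sketch below is only an illustration: `ShotPrompt` and its field names are hypothetical, and the joining punctuation is one possible choice. It reproduces the example prompt above, then swaps a single part.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShotPrompt:
    """Hypothetical container for the five parts of the formula."""
    cinematography: str
    subject: str
    action: str
    context: str
    style: str

    def render(self) -> str:
        # Camera first, then subject + action + context as one sentence, then style.
        opening = f"{self.cinematography}, {self.subject} {self.action} {self.context}."
        return f"{opening} {self.style}"


base = ShotPrompt(
    cinematography="Medium shot",
    subject="a tired corporate worker",
    action="rubs his temples",
    context="in front of a bulky 1980s computer in a cluttered late-night office",
    style=(
        "Harsh fluorescent overheads plus the green glow of the monitor. "
        "Retro color film, slight grain, moody."
    ),
)
print(base.render())

# Tune one dial (the camera) without rewriting the rest.
closer = replace(base, cinematography="Slow dolly-in")
print(closer.render())
```

Keeping the parts separate makes it easy to run several variants of one shot and compare which change actually moved the result.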
Essential Controls
Cinematography language
- Movement: Dolly, tracking, crane, aerial, slow pan, POV.
- Composition: Wide, close-up, extreme close-up, low angle, two-shot.
- Lens & focus: Shallow depth of field, wide-angle, soft focus, macro, deep focus.
Example: Crane shot starts low on a lone hiker, then rises to reveal a mist-filled canyon at sunrise, soft morning light, epic fantasy tone.
Audio direction
- Use quotation marks for dialogue: "We have to leave now."
- Prefix sound effects: SFX: thunder cracks in the distance.
- Call out ambience: Ambient noise: quiet starship bridge hum.
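The three audio conventions can be applied consistently with a small formatter. This is a sketch under assumptions: `audio_cues` is a hypothetical helper, and it only builds text in the quoted-dialogue, `SFX:`, and `Ambient noise:` forms shown above.

```python
from typing import Optional


def audio_cues(
    dialogue: Optional[str] = None,
    sfx: Optional[str] = None,
    ambience: Optional[str] = None,
) -> str:
    """Format audio directions: quotes for dialogue, tags for SFX and ambience."""
    parts = []
    if dialogue:
        parts.append(f'"{dialogue}"')
    if sfx:
        parts.append(f"SFX: {sfx}.")
    if ambience:
        parts.append(f"Ambient noise: {ambience}.")
    return " ".join(parts)


print(audio_cues(
    dialogue="We have to leave now.",
    sfx="thunder cracks in the distance",
    ambience="quiet starship bridge hum",
))
```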
Negative prompts
State what to exclude with detail:
Desolate landscape with no buildings or roads. Avoid vague phrasing such as "no structures".
Gemini assist
If a prompt feels thin, ask Gemini to expand it with richer cinematography and sensory cues before sending to Veo.
Quick Starter Prompts
Text-to-video skeleton
Wide shot, [subject] [action], in [context]. [Camera move], [lighting], [style]. Dialogue/SFX if needed.
Image-to-video focus
Tracking shot that [motion], keeping the subject centered. Ambient: [sound]. Style: [look].
Use when your source frame already defines subject and scene.
Audio-first
Close-up on [subject] as they say "[dialogue]." Background: [context]. SFX: [list]. Mood: [tone].
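The bracketed slots in these starters can be filled programmatically. The sketch below is an illustration only: `fill_template` is a hypothetical helper, and the sample values are made up. It raises an error when a slot has no value, so a half-filled prompt is never sent.

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace each [placeholder] with its value; fail loudly if one is missing."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder [{key}]")
        return values[key]

    # Placeholders are lowercase words (spaces allowed) inside square brackets.
    return re.sub(r"\[([a-z ]+)\]", substitute, template)


AUDIO_FIRST = 'Close-up on [subject] as they say "[dialogue]." Background: [context]. SFX: [list]. Mood: [tone].'

print(fill_template(AUDIO_FIRST, {
    "subject": "a ship captain",
    "dialogue": "We have to leave now",
    "context": "a dim starship bridge",
    "list": "alarm pulses, distant thunder",
    "tone": "tense",
}))
```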
Advanced Workflows
1) Dynamic transition with First and Last Frame
- Generate a starting still.
- Generate the ending still from another POV or moment.
- In Veo: upload both images and prompt the bridge, including audio.
Prompt: The camera arcs 180 degrees from the singer's front to the POV from behind on stage. She sings "when you look me in the eyes, I can see a million stars." Crowd roars, stage lights flare.
2) Consistent characters with Ingredients to Video
- Make reference images for each character and the setting.
- Load them into Ingredients to Video.
- Prompt each shot so faces, outfits, and set stay consistent.
Prompt: Using the provided detective, woman, and office images, medium shot of the detective behind his desk. He looks up and says in a weary voice, "Of all the offices in this town, you had to walk into mine."
3) Timed beats
Direct several beats in one generation by assigning a time range to each.
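One way to lay out timed beats is to prefix each with its time range. The sketch below is an assumption, not a confirmed Veo syntax: the `[mm:ss-mm:ss]` notation, the `timed_prompt` helper, and the sample beats are all illustrative. It also rejects overlapping ranges and anything past the 8-second maximum clip length.

```python
def mmss(seconds: int) -> str:
    """Format whole seconds as mm:ss."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def timed_prompt(beats, max_seconds: int = 8) -> str:
    """Render (start, end, description) beats, one per line, in time order."""
    lines = []
    previous_end = 0
    for start, end, description in beats:
        # Each beat must start at or after the previous one ends and fit the clip.
        if start < previous_end or end <= start or end > max_seconds:
            raise ValueError(f"bad time range {start}-{end}")
        lines.append(f"[{mmss(start)}-{mmss(end)}] {description}")
        previous_end = end
    return "\n".join(lines)


print(timed_prompt([
    (0, 3, "Wide shot, a lone hiker stands at the canyon rim at sunrise."),
    (3, 6, "Crane shot rises to reveal the mist-filled canyon. SFX: wind gusts."),
    (6, 8, 'Close-up, the hiker whispers, "Finally."'),
]))
```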
Final Checks
- Keep the five-part formula handy; upgrade one part at a time.
- Call out camera moves and lenses to set tone quickly.
- Write audio like a script: quotes for dialogue, tags for SFX and ambience.
- Lock continuity with reference images or start/end frames when needed.
- Iterate: shorten, lengthen, or swap moves until the motion feels right.