Mastering “Extend” in Veo 3.1 — Step-by-Step Implementation, Design Patterns, and the Latest API-Ready Guide
TL;DR
- What Extend really is: Veo 3.1 can append to an existing Veo-generated video in 7-second chunks up to 20 times, then export it as one continuous clip (maximum output ≈148 seconds). It’s officially available via the Gemini API, and you can also operate it from Flow and Vertex AI UIs/SDKs.
- Supported channels: Documentation for Gemini API (Veo 3.1/3.1 Fast) and Google Cloud Vertex AI includes Extend procedures. Flow also ships “Scene Extension”–type features and can handle audio.
- Requirements & limits: Only Veo-generated videos can be extended. Input constraints include 9:16 or 16:9 / 720p, etc.
- Who benefits: Ads/SNS ops teams / Newsrooms / UGC platforms / Education & internal training. You can grow short clips into longer ones in a steady 7-second cadence, automated via API.
- Goal of this article: Provide the shortest path to production today (environment → base generation → Extend → QA → ops templates) for both UI and API, with copy-pasteable samples.
Who This Helps Most (Concrete Personas)
- Ads / Social Ops Teams: Take a great 8-second shot and “extend the breath” in 7-second steps to a natural 30–60s cut, then run CTR / retention A/B tests.
- News / Media Desks: Turn a short lead clip into a single finished piece by appending follow-ups and supers.
- UGC / App Dev: Auto-concatenate Veo-generated user clips into templates with a consistent tone.
- Education / Corporate Training: Build Intro → Steps → Summary via staged Extend, making edits/substitutions easy.
We prioritize accuracy here and exclude uncertain info. Specs/limits/channels are confirmed in public docs.
1. The Facts You Need First About Veo 3.1 “Extend” (as of Oct 2025)
- Granularity: 7 seconds per extend, up to 20 extends. Final output is one concatenated video up to ≈148 seconds.
- Input prerequisites: Only videos generated by Veo are eligible. Aspect ratio 9:16 or 16:9; resolution 720p; input ≤141 seconds. Via the Gemini API, Extend requires Veo output as input.
- Where it works: Gemini API (Veo 3.1 / 3.1 Fast). Vertex AI offers both Console and API guidance. Flow provides “Scene Extension,” and audio can be extended as well.
- Related upgrades: Veo 3.1 improves continuity control, including up to 3 reference images and first/last frame interpolation. Combined with Extend, you can scale length while preserving scene consistency.
In short, Extend lets you keep pushing a strong short shot forward—cleanly and in sequence.
2. End-to-End Workflow Design (Covering UI → API)
2-1. Core Flow (Big Picture)
- Lock the concept: Decide the subject, lens feel, color temperature—the spine of continuity.
- Create the short “core”: Generate ~8s of the most emblematic shot (leverage reference images and last-frame).
- Extend in 7-second steps: Split narrative prompt vs. constraint prompt to minimize drift.
- Quality check: Verify visual seams and audio continuity; re-Extend where needed.
- Export / publish: Finalize captions, logo, aspect for the target channel.
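The five steps above can be sketched as one orchestration loop. This is an illustrative skeleton, not an official API: `generate_base`, `extend`, and `passes_qa` are injected placeholders you would wire to the Gemini or Vertex calls shown later, and the retry count is an assumption.

```python
from typing import Callable, List

def produce_clip(
    generate_base: Callable[[str], object],
    extend: Callable[[object, str], object],
    passes_qa: Callable[[object], bool],
    base_prompt: str,
    extend_prompts: List[str],
    max_retries: int = 2,
) -> object:
    """Orchestrate the core flow: base generation -> 7s extends with a QA gate.

    Each hop is re-extended up to max_retries times if the QA check fails.
    """
    clip = generate_base(base_prompt)
    for p in extend_prompts:
        for _attempt in range(max_retries + 1):
            candidate = extend(clip, p)
            if passes_qa(candidate):
                clip = candidate
                break
        else:
            raise RuntimeError(f"QA kept failing for prompt: {p!r}")
    return clip
```

Because the three callables are injected, the same loop runs against Flow exports, the Gemini API, or Vertex AI without changes.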
2-2. Which channel to use?
- Fast prototyping: Flow (Scene Extension) → Gemini API preview for fine-tuning.
- Production: Use Gemini API or Vertex AI via SDK/REST. If you care about cloud ops (IAM/audit), center on Vertex AI.
3. Learn Extend from the UI (Flow / Vertex AI)
3-1. Flow (Scene Extension) Basics
- Generate a base with Veo 3.1 (use reference images and first/last frame to enforce consistency).
- Choose Extend / Scene Extension, enter your 7-second continuation prompt. Also specify ambience/BGM in the same world.
- Repeat up to 20 times and export as one continuous clip.
3-2. Vertex AI Console Extend
- In Media Studio → Video, select Veo and run AI actions → Extend video on a generated clip. Add a prompt and generate. The API mirrors this flow.
4. Implement Extend with the Gemini API (with Code)
From here we’ll use Gemini API (Veo 3.1) to illustrate the canonical 8-second core → 7-second Extend × N pattern. Limits/prereqs follow the docs.
4-1. Prereqs
- Models: `veo-3.1-generate-preview` or `veo-3.1-fast-generate-preview`
- Eligible input: Veo-generated video (9:16/16:9, 720p, ≤141s)
- Extend unit: 7s, up to 20 times (output is one continuous clip, max ≈148s)
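The constants below restate these published limits; the validation helper itself is an illustrative pattern (not part of the SDK) for failing fast before spending quota on an ineligible request.

```python
# Published Extend constraints (as of Oct 2025); the helper itself is illustrative.
STEP_SECONDS = 7
MAX_EXTENDS = 20
MAX_INPUT_SECONDS = 141
ALLOWED_ASPECTS = {"9:16", "16:9"}

def validate_extend_input(duration_s: float, aspect_ratio: str,
                          resolution: str, planned_extends: int) -> None:
    """Raise ValueError before submitting an Extend request that cannot succeed."""
    if aspect_ratio not in ALLOWED_ASPECTS:
        raise ValueError(f"aspect ratio must be one of {ALLOWED_ASPECTS}")
    if resolution != "720p":
        raise ValueError("Extend assumes 720p input")
    if duration_s > MAX_INPUT_SECONDS:
        raise ValueError(f"input must be <= {MAX_INPUT_SECONDS}s")
    if not 1 <= planned_extends <= MAX_EXTENDS:
        raise ValueError(f"extend count must be between 1 and {MAX_EXTENDS}")
```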
4-2. Generate the Base Video (Python)
```python
# pip install google-genai
import time

from google import genai
from google.genai import types

client = genai.Client()

prompt = (
    "A cinematic medium shot of a female reporter in a modern newsroom. "
    "Cool daylight tone, 35mm look, shallow depth of field. "
    "She turns to camera and starts walking forward. Include natural ambience."
)

op = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt=prompt,
    # Tip: you can also pass reference_images (up to 3) and first/last frame
    # config=types.GenerateVideosConfig(reference_images=[...], last_frame=...),
)

# Video generation is a long-running operation; poll until it completes.
while not op.done:
    time.sleep(10)
    op = client.operations.get(op)

video_asset = op.response.generated_videos[0].video
client.files.download(file=video_asset)
video_asset.save("base_8s.mp4")
```
Tip: Using up to 3 reference images plus first/last frame makes Extend continuity much easier.
4-3. Run a Single 7-Second Extend (Python)
```python
import time

from google import genai
from google.genai import types

client = genai.Client()

extend_prompt = (
    "Continue the scene: she stops near the anchor desk, points to the right screen. "
    "Keep the same outfit, lighting, and ambience. Camera slowly pushes in."
)

op = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    video=video_asset,  # must be a preceding Veo output
    prompt=extend_prompt,
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        resolution="720p",  # Extend assumes 720p per spec
    ),
)

while not op.done:
    time.sleep(10)
    op = client.operations.get(op)

extended = op.response.generated_videos[0].video
client.files.download(file=extended)
extended.save("extended_15s.mp4")
```
Key spec: 7-second steps, up to 20. Input must be Veo output, 720p, 9:16 or 16:9, etc.
4-4. Practical “Chain Extend” Template (Python)
```python
import time

from google import genai
from google.genai import types

client = genai.Client()

def extend_chain(base_video, prompts, max_hops=5):
    """
    base_video: handle to a Veo-generated video (client.files.get(...) etc.)
    prompts: list of 7-second continuation prompts
    max_hops: up to 20 (per spec)
    """
    assert len(prompts) <= max_hops <= 20
    current = base_video
    results = []
    for p in prompts:
        op = client.models.generate_videos(
            model="veo-3.1-generate-preview",
            video=current,
            prompt=p,
            config=types.GenerateVideosConfig(
                number_of_videos=1,
                resolution="720p",
            ),
        )
        while not op.done:
            time.sleep(10)
            op = client.operations.get(op)
        current = op.response.generated_videos[0].video
        results.append(current)
    return current, results  # current is the continuous clip
```
4-5. JavaScript (Node.js) Skeleton
import { GoogleAI } from "@google/generative-ai"; // equivalent gemini-js SDK
const ai = new GoogleAI();
const base = await ai.models.generateVideos({
model: "veo-3.1-generate-preview",
prompt: "A cinematic shot in a modern newsroom. Cool daylight, 35mm.",
});
let video = base.response.generatedVideos[0].video;
const prompts = [
"Continue: she approaches the desk; maintain lighting, ambience.",
"Continue: over-shoulder view of the wall monitor; match color tone.",
// ...
];
for (const p of prompts) {
const op = await ai.models.generateVideos({
model: "veo-3.1-generate-preview",
video,
prompt: p,
config: { numberOfVideos: 1, resolution: "720p" }
});
video = op.response.generatedVideos[0].video;
}
// download video.bytes → save
Model IDs/params should follow the official tables (e.g., `veo-3.1-generate-preview` / `veo-3.1-fast-generate-preview`, `number_of_videos`, `resolution`, etc.).
5. Implement Extend on Vertex AI (SDK / REST)
If you value IAM, audit, and cost visibility, Vertex AI is robust. Both Console and API support Extend.
5-1. Python (GenAI SDK on Vertex AI)
```bash
pip install --upgrade google-genai
export GOOGLE_CLOUD_PROJECT=your_project
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
```

```python
import time

from google import genai
from google.genai.types import GenerateVideosConfig, Video

client = genai.Client()

# Example: assume a base short was generated (omitted). Now extend 7s on an existing Veo clip.
op = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # Verify current rollout in docs
    prompt="Continue: camera push-in; keep ambience and outfit.",
    video=Video(uri="gs://your-bucket/base_8s.mp4", mime_type="video/mp4"),
    config=GenerateVideosConfig(
        aspect_ratio="16:9",
        # In Vertex, writing to GCS is a practical pattern
        output_gcs_uri="gs://your-bucket/output/",
        # resolution and other params: see current model docs
    ),
)

while not op.done:
    time.sleep(15)
    op = client.operations.get(op)

print(op.result.generated_videos[0].video.uri)
```
The Vertex guides cover UI/REST/SDK Extend steps, plus auth/IAM and GCS outputs.
6. Prompt Design: How to Avoid Breaking Continuity
Recommended split
- Context: World, lensing, color temp, ambience.
- Directive: Concrete subject actions / camera moves.
- Constraints: Fix wardrobe/props/lights; specify no-gos ("NGs": hand artifacts, smeared text, etc.).
Extend sample (every 7s)
[Context]
modern newsroom, cool daylight, 35mm look, shallow depth of field; natural ambience.
[Directive]
continue from previous shot; reporter slows down and glances at the wall monitor; camera pushes in slightly.
[Constraints]
keep same outfit and mic color; keep shadow direction consistent; avoid exaggerated hand deformation.
Using reference images: Provide three references (person/wardrobe/props) to stabilize identity. Combine with first/last frame interpolation for smoother motion joins.
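The Context / Directive / Constraints split is easy to enforce in code: keep Context and Constraints fixed across hops and vary only the Directive. The helper and constants below are an illustrative sketch (the function name and example strings are ours, not part of any SDK).

```python
def build_extend_prompt(context: str, directive: str, constraints: str) -> str:
    """Assemble one Extend prompt from the Context / Directive / Constraints split."""
    return (
        f"[Context] {context}\n"
        f"[Directive] {directive}\n"
        f"[Constraints] {constraints}"
    )

# Lock these two across every 7-second hop; only the Directive changes.
NEWSROOM_CONTEXT = ("modern newsroom, cool daylight, 35mm look, "
                    "shallow depth of field; natural ambience.")
NEWSROOM_CONSTRAINTS = ("keep same outfit and mic color; keep shadow direction "
                        "consistent; avoid exaggerated hand deformation.")
```

Feeding `build_extend_prompt(NEWSROOM_CONTEXT, directive, NEWSROOM_CONSTRAINTS)` for each hop guarantees the continuity vocabulary never drifts between extends.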
7. Quality Management: Catch Failures Early, Fix Cheaply
- Inspect boundary frames: At each join, verify pose/orientation/shadows consistency.
- Audio continuity: Lock ambience/BGM keys in Context. Fade in post for polish.
- Automate NG detection: Use CLIP similarity or SSIM to score adjacent cut consistency; if under a threshold, re-Extend with the same prompt, and fix seed to reduce jitter.
- Production loop: Iterate “short → Extend → review → re-Extend only where needed” in 7-second units to cut cost/time/failure rate.
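The NG-detection idea can be sketched with a dependency-free boundary check. The mean-absolute-difference metric below is a deliberately simple stand-in for SSIM or CLIP similarity (use `skimage.metrics.structural_similarity` or CLIP embeddings in production); the function names and the 0.85 threshold are illustrative.

```python
def boundary_similarity(frame_a, frame_b) -> float:
    """Similarity in [0, 1] between two equal-size grayscale frames
    (nested lists of 0-255 ints), via mean absolute pixel difference.
    A dependency-free stand-in for SSIM or CLIP-embedding similarity."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return 1.0 - (total / count) / 255.0

def needs_reextend(frame_a, frame_b, threshold: float = 0.85) -> bool:
    """Flag a join for re-Extend when its boundary frames diverge too much."""
    return boundary_similarity(frame_a, frame_b) < threshold
```

Score the last frame of hop N against the first frame of hop N+1; anything under the threshold goes back through Extend with the same prompt.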
Veo 3.1 raises audio integration and Flow editing quality. Since lighting/shadows are more realistic, tightening constraint vocabulary often fixes subtle uncanny moments.
8. Common Questions (API & Operations)
Q1. Can I Extend via API?
→ Yes. Gemini API exposes Veo 3.1/3.1 Fast, and the Extend spec (7s × up to 20) is official. Vertex AI docs also include Extend steps.
Q2. How long can I go?
→ Given a Veo input of up to 141s, append in 7-second steps; the output is a single clip of up to ≈148s.
Q3. Can I Extend vertical 9:16 videos?
→ Yes. 9:16 or 16:9 are supported (Extend assumes 720p).
Q4. If I want to try the UI first?
→ Flow with Scene Extension is quickest. You can extend with audio, and new editing features are available.
Q5. Pricing/quality?
→ Depends on model, resolution, and length. Veo 3.1 offers 1080p (8s) and 720p long-form, plus a Fast variant. Check the latest pricing/limits in the model docs/announcements.
9. Production Patterns (Gemini / Vertex)
9-1. Driver Swap (Abstraction → Implementations)
```python
from typing import Protocol

class ExtendDriver(Protocol):
    def generate(self, prompt: str, **cfg) -> "Video": ...
    def extend(self, video: "Video", prompt: str, **cfg) -> "Video": ...

class GeminiVeoDriver:
    ...  # wraps the Gemini API calls from section 4

class VertexVeoDriver:
    ...  # wraps the Vertex AI calls from section 5

def extend_to_target(video, prompts, driver: ExtendDriver, limit=20):
    assert len(prompts) <= limit
    current = video
    for p in prompts:
        current = driver.extend(current, p, resolution="720p")
    return current
```
- Fix an abstract interface so you can switch Gemini ↔ Vertex or future models easily.
- Externalize queueing (RDB/Redis/SQS) and audit logs, and make 7-second retries trivial.
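The "make 7-second retries trivial" point can be captured in one wrapper around any `ExtendDriver`. This is an illustrative sketch: the function name, retry counts, and backoff base are assumptions, and `sleep` is injectable so queue workers (and tests) can control the waits.

```python
import time

def extend_with_retry(driver, video, prompt, retries=3,
                      base_delay=2.0, sleep=time.sleep, **cfg):
    """Call driver.extend with exponential backoff on transient failures.

    Raises the last error once `retries` attempts have been exhausted.
    """
    for attempt in range(retries + 1):
        try:
            return driver.extend(video, prompt, **cfg)
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))  # waits 2s, 4s, 8s, ...
```

A queue worker would call `extend_with_retry(driver, current, prompt, resolution="720p")` per hop and log each attempt to the audit store.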
9-2. Template Prompts & a “Continuity Lexicon”
- Keep a Context lexicon (lighting/lens/color temp/ambience).
- Share Constraints (wardrobe/props/logo exposure/negative) in a team standard.
- Codify when to use match cut vs. light dissolve as transition rules.
10. Sample Use Case: Extending a News Opener to 60 Seconds Naturally
Setup: 8-second anchor entrance → walk shot. Unified cool daylight / 35mm look.
Extend 7s × 7 (≈57s) prompts
- #1 (0–7s): “Anchor approaches the desk; gentle dolly-in. Ambience is the newsroom murmur.”
- #2 (7–14s): “Today’s topics faintly reflect on the monitor. Keep wardrobe, hairstyle, and shadow direction.”
- #3 (14–21s): “Picks up a pen and gestures to the right; do not change lighting.”
- #4 (21–28s): “A staff silhouette passes behind a glass panel; keep overall volume steady.”
- #5 (28–35s): “Leave space on the left for supers; plan a match cut.”
- #6 (35–42s): “A small nod and eye shift; preserve facial shape and mouth texture.”
- #7 (42–49s): “Glances down to documents; gently dip ambience a touch.”
Always-on NGs: “No severe finger deformation / no excessive blur or blown highlights / logo exposure under 2s / preserve hair length and color.”
Finishing: Adjust subtitles and BGM fades in post, then crop/re-layout for SNS (9:16) and YouTube (16:9).
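Wiring this use case into the chain pattern from section 4-4 is mostly list-building: append the always-on NGs to every per-hop directive, then hand the list to `extend_chain`. The helper name and the exact string layout below are illustrative choices.

```python
NG_SUFFIX = (
    "No severe finger deformation; no excessive blur or blown highlights; "
    "logo exposure under 2 seconds; preserve hair length and color."
)

def with_always_on_ngs(directives, ng_suffix=NG_SUFFIX):
    """Append the shared constraint block to every per-hop directive."""
    return [f"{d} [Constraints] {ng_suffix}" for d in directives]

hop_directives = [
    "Anchor approaches the desk; gentle dolly-in. Ambience is the newsroom murmur.",
    "Today's topics faintly reflect on the monitor. Keep wardrobe, hairstyle, and shadow direction.",
    "Picks up a pen and gestures to the right; do not change lighting.",
    "A staff silhouette passes behind a glass panel; keep overall volume steady.",
    "Leave space on the left for supers; plan a match cut.",
    "A small nod and eye shift; preserve facial shape and mouth texture.",
    "Glances down to documents; gently dip ambience a touch.",
]

prompts = with_always_on_ngs(hop_directives)
# then: extend_chain(base_video, prompts, max_hops=7)
```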
11. Troubleshooting & Workarounds
- Subject looks like a different person at the join
→ Fix identity with reference images (up to 3); explicitly lock hair/wardrobe/props in Constraints.
- Exposure or color temp drifts
→ Fix Context with color temp and light direction vocabulary; tweak lighting/shadows in Flow if needed.
- Audio feels off
→ Lock ambience/BGM keys in prompts; keep volume changes short and smooth.
- Hitting length limits
→ Back-plan total runtime to pick extend count. If needed, split into two parts and join in editing.
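Back-planning the extend count is simple arithmetic: take the gap between base and target runtime and divide by the 7-second step. The helper below sketches it; note that an 8-second base plus the full 20 extends lands exactly at the ≈148s ceiling.

```python
import math

def extends_needed(base_s: float, target_s: float,
                   step_s: int = 7, max_extends: int = 20) -> int:
    """Back-plan the number of 7-second extends to reach a target runtime."""
    if target_s <= base_s:
        return 0
    n = math.ceil((target_s - base_s) / step_s)
    if n > max_extends:
        raise ValueError("target exceeds the single-clip ceiling; "
                         "split into parts and join in editing")
    return n
```

For the 60-second news opener in section 10: `extends_needed(8, 57)` gives the 7 hops used there.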
12. Wrap-Up — Weaving a Story in “7-Second Units”
- Veo 3.1 Extend delivers 7s × up to 20 continuous extensions from Gemini API / Flow / Vertex AI. Start by nailing the input constraints (720p, 9:16 or 16:9, Veo-generated input) and output ceiling (≈148s).
- The craft is to define an 8-second core, then lock Context / Directive / Constraints as a continuity lexicon. With reference images and first/last frames, you’ll get longer, cleaner cuts.
- The operational play is ideate in UI → automate via API. Pair queues, audit logs, and retries, and extend fast, cheap, and reliably in 7-second units.
I think of Extend as “passing the baton of breath.” Keep a good shot from losing steam and hand it to the next 7 seconds—again and again. That’s how you lift retention and immersion. With Veo 3.1 Extend, you can encode that craft into code and design.
References (checked: 2025-10-29)
- Generate videos with Veo 3.1 in the Gemini API (Extend = 7s × up to 20, input constraints, model IDs, reference images, first/last frame)
- Extend Veo on Vertex AI-generated videos (Console / API / SDK steps, auth and GCS output flow)
- Introducing Veo 3.1 (Availability in Gemini API / Vertex AI / Flow; creative feature expansion)
- Veo updates in Flow (Editing features incl. audio; Extend)
- The Verge: Coverage of Flow × Veo 3.1 editing, audio, and Scene Extension
