
Mastering “Extend” in Veo 3.1 — Step-by-Step Implementation, Design Patterns, and the Latest API-Ready Guide

TL;DR

  • What Extend really is: Veo 3.1 can append to an existing Veo-generated video in 7-second chunks up to 20 times, then export it as one continuous clip (maximum output ≈148 seconds). It’s officially available via the Gemini API, and you can also operate it from Flow and Vertex AI UIs/SDKs.
  • Supported channels: Documentation for Gemini API (Veo 3.1/3.1 Fast) and Google Cloud Vertex AI includes Extend procedures. Flow also ships “Scene Extension”–type features and can handle audio.
  • Requirements & limits: Only Veo-generated videos can be extended. Input constraints include 9:16 or 16:9 / 720p, etc.
  • Who benefits: Ads/SNS ops teams / Newsrooms / UGC platforms / Education & internal training. You can grow short clips into longer ones in a steady 7-second cadence, automated via API.
  • Goal of this article: Provide the shortest path to production today (environment → base generation → Extend → QA → ops templates) for both UI and API, with copy-pasteable samples.

Who This Helps Most (Concrete Personas)

  • Ads / Social Ops Teams: Take a great 8-second shot and extend it in natural 7-second steps into a 30–60s cut, then run CTR / retention A/B tests.
  • News / Media Desks: Turn a short lead clip into a single finished piece by appending follow-ups and supers.
  • UGC / App Dev: Auto-concatenate Veo-generated user clips into templates with a consistent tone.
  • Education / Corporate Training: Build Intro → Steps → Summary via staged Extend, making edits/substitutions easy.

We prioritize accuracy here and exclude uncertain information; the specs, limits, and channels cited are confirmed in public documentation.


1. The Facts You Need First About Veo 3.1 “Extend” (as of Oct 2025)

  • Granularity: 7 seconds per extend, up to 20 extends. Final output is one concatenated video up to ≈148 seconds.
  • Input prerequisites: Only videos generated by Veo are eligible. Aspect ratio 9:16 or 16:9; resolution 720p; input ≤141 seconds. Via the Gemini API, Extend requires Veo output as input.
  • Where it works: Gemini API (Veo 3.1 / 3.1 Fast). Vertex AI offers both Console and API guidance. Flow provides “Scene Extension,” and audio can be extended as well.
  • Related upgrades: Veo 3.1 improves continuity control, including up to 3 reference images and first/last frame interpolation. Combined with Extend, you can scale length while preserving scene consistency.

In short, Extend lets you keep pushing a strong short shot forward—cleanly and in sequence.
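
The arithmetic behind the ≈148-second ceiling is simple; a quick sanity check in Python (the 8-second base length is a typical value for illustration, not a hard spec):

BASE_SECONDS = 8     # typical base clip length (illustrative)
STEP_SECONDS = 7     # fixed per-extend increment (spec)
MAX_EXTENDS = 20     # spec ceiling

max_runtime = BASE_SECONDS + STEP_SECONDS * MAX_EXTENDS
print(max_runtime)  # 148 -> the ≈148s output ceiling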


2. End-to-End Workflow Design (Covering UI → API)

2-1. Core Flow (Big Picture)

  1. Lock the concept: Decide the subject, lens feel, color temperature—the spine of continuity.
  2. Create the short “core”: Generate ~8s of the most emblematic shot (leverage reference images and last-frame).
  3. Extend in 7-second steps: Split narrative prompt vs. constraint prompt to minimize drift.
  4. Quality check: Verify visual seams and audio continuity; re-Extend where needed.
  5. Export / publish: Finalize captions, logo, aspect for the target channel.

2-2. Which channel to use?

  • Fast prototyping: Flow (Scene Extension) first, then fine-tune via the Gemini API preview.
  • Production: Use Gemini API or Vertex AI via SDK/REST. If you care about cloud ops (IAM/audit), center on Vertex AI.

3. Learn Extend from the UI (Flow / Vertex AI)

3-1. Flow (Scene Extension) Basics

  1. Generate a base with Veo 3.1 (use reference images and first/last frame to enforce consistency).
  2. Choose Extend / Scene Extension, enter your 7-second continuation prompt. Also specify ambience/BGM in the same world.
  3. Repeat up to 20 times and export as one continuous clip.

3-2. Vertex AI Console Extend

  • In Media Studio → Video, select Veo and run AI actions → Extend video on a generated clip. Add a prompt and generate. The API mirrors this flow.

4. Implement Extend with the Gemini API (with Code)

From here we’ll use Gemini API (Veo 3.1) to illustrate the canonical 8-second core → 7-second Extend × N pattern. Limits/prereqs follow the docs.

4-1. Prereqs

  • Models: veo-3.1-generate-preview or veo-3.1-fast-generate-preview
  • Eligible input: Veo-generated video (9:16/16:9, 720p, ≤141s)
  • Extend unit: 7s, up to 20 times (output is one continuous clip, max ≈148s)

4-2. Generate the Base Video (Python)

# pip install google-genai
import time
from google import genai
from google.genai import types

client = genai.Client()

prompt = (
  "A cinematic medium shot of a female reporter in a modern newsroom. "
  "Cool daylight tone, 35mm look, shallow depth of field. "
  "She turns to camera and starts walking forward. Include natural ambience."
)

op = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt=prompt,
    # Tip: you can also pass reference_images (up to 3) and first/last frame
    # config=types.GenerateVideosConfig(reference_images=[...], last_frame=...),
)
while not op.done:
    time.sleep(10)
    op = client.operations.get(op)

video_asset = op.response.generated_videos[0].video
client.files.download(file=video_asset)
video_asset.save("base_8s.mp4")

Tip: Using up to 3 reference images plus first/last frame makes Extend continuity much easier.

4-3. Run a Single 7-Second Extend (Python)

import time
from google import genai
from google.genai import types

client = genai.Client()

extend_prompt = (
  "Continue the scene: she stops near the anchor desk, points to the right screen. "
  "Keep the same outfit, lighting, and ambience. Camera slowly pushes in."
)

op = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    video=video_asset,  # must be a preceding Veo output
    prompt=extend_prompt,
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        resolution="720p",   # Extend assumes 720p per spec
    ),
)
while not op.done:
    time.sleep(10)
    op = client.operations.get(op)

extended = op.response.generated_videos[0].video
client.files.download(file=extended)
extended.save("extended_15s.mp4")

Key spec: 7-second steps, up to 20. Input must be Veo output, 720p, 9:16 or 16:9, etc.

4-4. Practical “Chain Extend” Template (Python)

def extend_chain(base_video, prompts, max_hops=5):
    """
    base_video: handle to a Veo-generated video (client.files.get(...) etc.)
    prompts: list of 7-second continuation prompts
    max_hops: up to 20 (per spec)
    """
    assert len(prompts) <= max_hops <= 20
    current = base_video
    results = []
    for i, p in enumerate(prompts, 1):
        op = client.models.generate_videos(
            model="veo-3.1-generate-preview",
            video=current,
            prompt=p,
            config=types.GenerateVideosConfig(
                number_of_videos=1,
                resolution="720p",
            ),
        )
        while not op.done:
            time.sleep(10)
            op = client.operations.get(op)
        current = op.response.generated_videos[0].video
        results.append(current)
    return current, results  # current is the continuous clip
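
A minimal usage sketch for the template above, reusing `client` and the `video_asset` handle from section 4-2 (the prompt text is illustrative):

prompts = [
    "Continue: she stops near the anchor desk; keep lighting and outfit.",
    "Continue: slow push-in toward the wall monitor; match color tone.",
]

final_clip, hops = extend_chain(video_asset, prompts, max_hops=5)
client.files.download(file=final_clip)
final_clip.save("extended_final.mp4")  # 8s base + 2 x 7s = ~22s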

4-5. JavaScript (Node.js) Skeleton

// npm install @google/genai
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});

// Poll a long-running video operation until it completes.
async function poll(operation) {
  while (!operation.done) {
    await new Promise((r) => setTimeout(r, 10_000));
    operation = await ai.operations.getVideosOperation({ operation });
  }
  return operation;
}

let op = await ai.models.generateVideos({
  model: "veo-3.1-generate-preview",
  prompt: "A cinematic shot in a modern newsroom. Cool daylight, 35mm.",
});
op = await poll(op);
let video = op.response.generatedVideos[0].video;

const prompts = [
  "Continue: she approaches the desk; maintain lighting, ambience.",
  "Continue: over-shoulder view of the wall monitor; match color tone.",
  // ...
];

for (const p of prompts) {
  op = await ai.models.generateVideos({
    model: "veo-3.1-generate-preview",
    video,
    prompt: p,
    config: { numberOfVideos: 1, resolution: "720p" },
  });
  op = await poll(op);
  video = op.response.generatedVideos[0].video;
}

// Download the final clip (Files API):
// await ai.files.download({ file: video, downloadPath: "extended.mp4" });

Model IDs/params should follow the official tables (e.g., veo-3.1(-fast)-generate-preview, number_of_videos, resolution, etc.).


5. Implement Extend on Vertex AI (SDK / REST)

If you value IAM, audit, and cost visibility, Vertex AI is robust. Both Console and API support Extend.

5-1. Python (GenAI SDK on Vertex AI)

pip install --upgrade google-genai
export GOOGLE_CLOUD_PROJECT=your_project
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

Then, in Python:

import time
from google import genai
from google.genai.types import GenerateVideosConfig, Video

client = genai.Client()

# Example: assume a base short was generated (omitted). Now extend 7s on an existing Veo clip.
op = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # Verify current rollout in docs
    prompt="Continue: camera push-in; keep ambience and outfit.",
    video=Video(uri="gs://your-bucket/base_8s.mp4", mime_type="video/mp4"),
    config=GenerateVideosConfig(
        aspect_ratio="16:9",
        # In Vertex, writing to GCS is a practical pattern
        output_gcs_uri="gs://your-bucket/output/",
        # resolution and other params: see current model docs
    ),
)
while not op.done:
    time.sleep(15)
    op = client.operations.get(op)

print(op.result.generated_videos[0].video.uri)

The Vertex guides cover UI/REST/SDK Extend steps, plus auth/IAM and GCS outputs.
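
If you write outputs to GCS as above, pulling the finished clip down from Python is straightforward with the google-cloud-storage client; a minimal sketch (the URI parsing assumes the gs://bucket/object pattern shown above):

# pip install google-cloud-storage
from google.cloud import storage

gcs = storage.Client()
uri = op.result.generated_videos[0].video.uri  # e.g. gs://your-bucket/output/...
bucket_name, _, object_name = uri.removeprefix("gs://").partition("/")
gcs.bucket(bucket_name).blob(object_name).download_to_filename("vertex_extended.mp4")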


6. Prompt Design: How to Avoid Breaking Continuity

Recommended split

  • Context: World, lensing, color temp, ambience.
  • Directive: Concrete subject actions / camera moves.
  • Constraints: Fix wardrobe/props/lights, and list explicit NGs, i.e. no-gos (hand artifacts, smeared text, etc.).

Extend sample (every 7s)

[Context]
modern newsroom, cool daylight, 35mm look, shallow depth of field; natural ambience.

[Directive]
continue from previous shot; reporter slows down and glances at the wall monitor; camera pushes in slightly.

[Constraints]
keep same outfit and mic color; keep shadow direction consistent; avoid exaggerated hand deformation.

Using reference images: Provide three references (person/wardrobe/props) to stabilize identity. Combine with first/last frame interpolation for smoother motion joins.
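
To keep the three layers stable across hops, it helps to template them in code; a minimal sketch (the helper and prompt text are illustrative, not part of any SDK):

def build_prompt(context: str, directive: str, constraints: str) -> str:
    """Assemble a 7-second continuation prompt from the three layers."""
    return f"[Context] {context}\n[Directive] {directive}\n[Constraints] {constraints}"

CONTEXT = ("modern newsroom, cool daylight, 35mm look, shallow depth of field; "
           "natural ambience.")
CONSTRAINTS = ("keep same outfit and mic color; keep shadow direction consistent; "
               "avoid exaggerated hand deformation.")

hop_2 = build_prompt(
    CONTEXT,
    "continue from previous shot; reporter slows down and glances at the wall monitor.",
    CONSTRAINTS,
)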


7. Quality Management: Catch Failures Early, Fix Cheaply

  • Inspect boundary frames: At each join, verify pose/orientation/shadows consistency.
  • Audio continuity: Lock ambience/BGM keys in Context. Fade in post for polish.
  • Automate NG detection: Use CLIP similarity or SSIM to score consistency across each join; if a join scores under a threshold, re-Extend with the same prompt and a fixed seed to reduce jitter (see the sketch after this list).
  • Production loop: Iterate “short → Extend → review → re-Extend only where needed” in 7-second units to cut cost/time/failure rate.
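
A minimal SSIM-based seam check, assuming you know the join timestamp (requires opencv-python and scikit-image; the 0.5 threshold is a placeholder to tune empirically):

# pip install opencv-python scikit-image
import cv2
from skimage.metrics import structural_similarity as ssim

def seam_ssim(video_path: str, join_sec: float, eps: float = 0.15) -> float:
    """SSIM between the frames just before and just after a join."""
    def frame_at(t: float):
        cap = cv2.VideoCapture(video_path)
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise ValueError(f"no frame at {t}s")
        return cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    return ssim(frame_at(join_sec - eps), frame_at(join_sec + eps))

# Example: an 8s base puts the first join at t=8.
# if seam_ssim("extended_15s.mp4", 8.0) < 0.5: re-Extend with the same prompt.

Adjacent frames in continuous footage score high, so even a rough threshold flags the worst joins.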

Veo 3.1 improves audio integration and editing quality in Flow. Since lighting/shadows are more realistic, tightening the constraint vocabulary often fixes subtle uncanny moments.


8. Common Questions (API & Operations)

Q1. Can I Extend via API?
Yes. Gemini API exposes Veo 3.1/3.1 Fast, and the Extend spec (7s × up to 20) is official. Vertex AI docs also include Extend steps.

Q2. How long can I go?
For Veo input ≤141 seconds, append in 7-second steps, and output a single clip up to ≈148 seconds.

Q3. Can I Extend vertical 9:16 videos?
Yes. 9:16 or 16:9 are supported (Extend assumes 720p).

Q4. If I want to try the UI first?
Flow with Scene Extension is quickest. You can extend with audio, and new editing features are available.

Q5. Pricing/quality?
Depends on model, resolution, and length. Veo 3.1 offers 1080p (8s) and 720p long-form, plus a Fast variant. Check the latest pricing/limits in the model docs/announcements.


9. Production Patterns (Gemini / Vertex)

9-1. Driver Swap (Abstraction → Implementations)

from typing import Protocol

class ExtendDriver(Protocol):
    def generate(self, prompt: str, **cfg) -> "Video": ...
    def extend(self, video: "Video", prompt: str, **cfg) -> "Video": ...

class GeminiVeoDriver(ExtendDriver):
    ...  # wraps client.models.generate_videos on the Gemini API

class VertexVeoDriver(ExtendDriver):
    ...  # wraps the GenAI SDK on Vertex AI (GCS I/O, IAM, audit)

def extend_to_target(video, prompts, driver: ExtendDriver, limit=20):
    # Chain 7-second hops; the driver hides channel differences.
    assert len(prompts) <= limit
    current = video
    for p in prompts:
        current = driver.extend(current, p, resolution="720p")
    return current

  • Fix an abstract interface so you can switch Gemini ↔ Vertex or future models easily.
  • Externalize queueing (RDB/Redis/SQS) and audit logs, and make per-hop 7-second retries trivial (see the retry sketch below).
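
For the retry side, a minimal sketch against the ExtendDriver abstraction above (the attempt count, backoff, and broad exception handling are placeholders to adapt to your SDK's transient error types):

import time

def extend_with_retry(driver: ExtendDriver, video, prompt: str,
                      attempts: int = 3, backoff_sec: int = 30):
    """Retry one 7-second hop; a failed hop is cheap to redo in isolation."""
    for i in range(attempts):
        try:
            return driver.extend(video, prompt, resolution="720p")
        except Exception:  # narrow to your SDK's transient error types
            if i == attempts - 1:
                raise
            time.sleep(backoff_sec * (i + 1))  # linear backoff between tries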

9-2. Template Prompts & a “Continuity Lexicon”

  • Keep a Context lexicon (lighting/lens/color temp/ambience).
  • Share Constraints (wardrobe/props/logo exposure/negative) in a team standard.
  • Codify when to use match cut vs. light dissolve as transition rules.

10. Sample Use Case: Extending a News Opener to 60 Seconds Naturally

Setup: 8-second anchor entrance → walk shot. Unified cool daylight / 35mm look.

Extend 7s × 7 (≈57s) prompts

  1. #1 (0–7s): “Anchor approaches the desk; gentle dolly-in. Ambience is the newsroom murmur.”
  2. #2 (7–14s): “Today’s topics faintly reflect on the monitor. Keep wardrobe, hairstyle, and shadow direction.”
  3. #3 (14–21s): “Picks up a pen and gestures to the right; do not change lighting.”
  4. #4 (21–28s): “A staff silhouette passes behind a glass panel; keep overall volume steady.”
  5. #5 (28–35s): “Leave space on the left for supers; plan a match cut.”
  6. #6 (35–42s): “A small nod and eye shift; preserve facial shape and mouth texture.”
  7. #7 (42–49s): “Glances down to documents; gently dip ambience a touch.”

Always-on NGs (negative constraints): “No severe finger deformation / no excessive blur or blown highlights / logo exposure under 2s / preserve hair length and color.”

Finishing: Adjust subtitles and BGM fades in post, then crop/re-layout for SNS (9:16) and YouTube (16:9).
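
Feeding the seven prompts above into the extend_chain template from section 4-4 is a few lines (prompts abbreviated; the base handle from section 4-2 is assumed):

news_prompts = [
    "Anchor approaches the desk; gentle dolly-in; newsroom murmur ambience.",
    "Today's topics faintly reflect on the monitor; keep wardrobe, hair, shadows.",
    "Picks up a pen and gestures to the right; do not change lighting.",
    "A staff silhouette passes behind a glass panel; keep volume steady.",
    "Leave space on the left for supers; plan a match cut.",
    "A small nod and eye shift; preserve facial shape and mouth texture.",
    "Glances down to documents; gently dip ambience a touch.",
]

opener_60s, _ = extend_chain(video_asset, news_prompts, max_hops=7)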


11. Troubleshooting & Workarounds

  • Subject looks like a different person at the join
    → Fix identity with reference images (up to 3); explicitly lock hair/wardrobe/props in Constraints.
  • Exposure or color temp drifts
    → Fix Context with color temp and light direction vocabulary; tweak lighting/shadows in Flow if needed.
  • Audio feels off
    → Lock ambience/BGM keys in prompts; keep volume changes short and smooth.
  • Hitting length limits
    → Back-plan the total runtime to pick the extend count (see the sketch below). If needed, split into two parts and join in editing.
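
Back-planning is simple arithmetic; a sketch (the 8-second base is illustrative):

import math

def plan_hops(base_sec: float, target_sec: float, max_hops: int = 20) -> int:
    """Number of 7-second extends needed to reach target_sec."""
    hops = math.ceil(max(0.0, target_sec - base_sec) / 7)
    if hops > max_hops:
        raise ValueError("target exceeds the ~148s ceiling; split into parts")
    return hops

print(plan_hops(8, 60))  # 8 hops -> 8 + 56 = 64s runtime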

12. Wrap-Up — Weaving a Story in “7-Second Units”

  • Veo 3.1 Extend delivers 7s × up to 20 continuous extensions from Gemini API / Flow / Vertex AI. Start by nailing the input constraints (720p, 9:16 or 16:9, Veo-generated input) and output ceiling (≈148s).
  • The craft is to define an 8-second core, then lock Context / Directive / Constraints as a continuity lexicon. With reference images and first/last frames, you’ll get longer, cleaner cuts.
  • The operational play is ideate in UI → automate via API. Pair queues, audit logs, and retries, and extend fast, cheap, and reliably in 7-second units.

I think of Extend as “passing the baton” of a shot’s breath. Keep a good shot from losing steam and hand it to the next 7 seconds, again and again. That’s how you lift retention and immersion. With Veo 3.1 Extend, you can encode that craft into code and design.


