Claude Code skill

veo-video-gen

Animate a still into a cinematic clip with Veo 3.1 — exact cost quoted before every run.

View on GitHub

Add the marketplace once, then install this skill:

claude plugin marketplace add johnkueh/claude-skills

claude plugin install veo-video-gen@johnkueh-skills

Or grab the whole collection: claude plugin install claude-skills@johnkueh-skills

See it

The same frame, moving — Veo 3.1.

The glp3.wiki hero: a GPT Image 2 still animated by Veo 3.1, played once and frozen on the last frame. 80¢.

Why it exists

Veo 3.1 turns a still into a cinematic clip, so it pairs with the GPT Image 2 skill: generate the hero frame there, animate it here. It bills a flat per-second rate by tier and resolution, so unlike a per-image API there's no guessing — the skill quotes the exact dollar cost before every run and logs it. It defaults to the play-once hero pattern (plays through, freezes on the last frame) over fighting loop seams, and bakes in the lessons that actually cost me time: the safety filter rejects shirtless/physique subjects (a clothed one passes) and Lite is stricter than Fast, so it warns on risky wording, keeps 'allow adult' on, and treats Fast as the floor for any figure. I built it shipping the glp3.wiki hero — a still, animated for eighty cents. A safety-filtered or timed-out run is $0.

In practice

Quote before spending

Input

veo-gen generate -i hero.png -p '…' --dry-run

Output

Veo 3.1 fast · 8s · 720p → $0.80 (charged $0 if safety-filtered or timed out). Prints the exact cost and exits without calling the API.

Animate a still → web files

Input

veo-gen generate -i hero.png --web -p 'a person by a window, even daylight, no glow'

Output

Feeds the GPT Image still in as the first frame, renders the clip, then ffmpeg encodes hero.web.mp4 + .web.webm + .poster.jpg ready for a <video>. $0.80, ~60s.

Watch the meter

Input

veo-gen cost

Output

Total: $2.60 (estimated) across 7 calls — 5 delivered, 2 RAI-filtered ($0). Cost is deterministic (seconds × rate), so the log is exact.

skills/veo-video-gen/SKILL.mdRaw

---
name: veo-video-gen
description: Generate cinematic videos with Google's Veo 3.1 (Gemini API) — text→video and image→video, pairing with the gpt-image-gen-2 still as the first frame. Quotes the exact $ cost before every call (flat per-second), supports true-loop (first=last frame) and web-ready MP4/WebM/poster output, and bakes in the play-once hero pattern and RAI-wording guidance. Triggers on "generate a video", "veo", "animate this image", "image to video", "cinematic hero video", "make a background video", "loop this image", "video from a still", or any motion/clip request.
---

# veo-video-gen

Turn a still (or a text brief) into a cinematic Veo 3.1 clip, quote the cost up
front, and optionally produce web-ready files. The natural pairing for
`gpt-image-gen-2`: generate a hero **still** there, animate it here. See
`PROMPTING.md` for the full prompt guide and the hard-won RAI rules.

**Setup:** TypeScript CLI (`cli.ts`) — run `pnpm install` in this directory once
(Node ≥ 18). Auth: `GEMINI_API_KEY` in `~/.config/veo-gen/env` as
`export GEMINI_API_KEY=…`, or exported in your shell. `ffmpeg` is optional
(only for `--web`). Usage log at `~/.config/veo-gen/usage.jsonl`.

## Your job

1. **Classify** the request: `text→video` (no source image) or `image→video`
   (animate a still — the default for heroes; use the gpt-image-gen-2 PNG).
2. **Decide loop vs play-once.** Default to **play-once** (let the clip play and
   freeze on the last frame in the page — drop the HTML `loop` attr). Only use a
   real loop (`--loop`) for genuinely ambient/cyclical motion.
3. **Assemble the prompt** per `PROMPTING.md` (Subject · Action · Camera ·
   Composition · Focus · Ambiance · Style). For a background hero: ONE subtle
   beat, and explicitly negate the "AI glow" (no god-rays/beams/glow/dust).
4. **Check the wording for RAI risk** (body/physique words + a person = likely
   rejection). The CLI warns; reword to neutral before spending.
5. **Quote the cost** (`--dry-run`) and show the user, then generate.
6. **Deliver:** with `--web`, hand back the MP4 + WebM + poster and the
   `<video>` wiring; otherwise the raw MP4.

If the user already gave a complete brief, skip the interview.

## Cost — quoted before every call

Pricing is **flat per-second × resolution** (audio included), so the cost is
**exact**, not a guess. The CLI prints it before every call and on `--dry-run`.

| Tier | 720p | 1080p | 4k |
|---|---|---|---|
| **Fast** (default) | **$0.10/s** | $0.12/s | $0.30/s |
| Standard | $0.40/s | $0.40/s | $0.60/s |
| Lite | $0.05/s | $0.08/s | — |

`cost = durationSeconds × rate`. An 8s/720p Fast clip = **$0.80**. A
**safety-filtered or timed-out** generation is **$0** — the quote is a ceiling.
The Gemini API returns no `$` in its response, so the CLI computes and logs the
cost itself (`cost_estimated: true`).

## Commands

Run from this skill's base directory.

### Image → video (the hero pairing) + web output

```bash
pnpm exec tsx cli.ts generate \
  -i ./hero.png -p "<motion prompt>" \
  --model fast --duration 8 --resolution 720p --web --out ./hero.mp4
```

`--web` writes `hero.web.mp4` (H.264, faststart), `hero.web.webm` (VP9), and
`hero.poster.jpg`. Drop those into a `<video autoplay muted playsinline>` (no
`loop` for play-once; add `poster="hero.poster.jpg"`).

### True forward loop (first frame == last frame)

```bash
pnpm exec tsx cli.ts generate -i ./hero.png --loop \
  -p "<ambient motion that returns to the start pose>" \
  --web --crossfade 0.5 --out ./loop.mp4
```

`--loop` sets the last frame equal to the first so motion departs and returns;
`--crossfade 0.5` (with `--web`) adds a seamless tail→head fade if the seam is
loose.

### Text → video

```bash
pnpm exec tsx cli.ts generate -p "<full scene + motion prompt>" --duration 6
```

### Dry run (quote only, no spend)

```bash
pnpm exec tsx cli.ts generate -i ./hero.png -p "..." --dry-run
```

### Cost log

```bash
pnpm exec tsx cli.ts cost            # total + delivered/rejected counts
pnpm exec tsx cli.ts cost --tail 10  # last 10 calls
```

## Options

- `-i, --image <path>` — first-frame image (image→video).
- `--last-frame <path>` — explicit last frame (first+last interpolation).
- `--loop` — true loop: last frame = first frame (needs `--image`).
- `-m, --model` — `fast` (default) · `standard` · `lite`.
- `-d, --duration` — `4` · `6` · `8` (default 8). No 10s on any tier.
- `-r, --resolution` — `720p` (default) · `1080p` · `4k` (no 4k on Lite).
- `-a, --aspect` — `16:9` (default) · `9:16`.
- `--negative <text>` — negative prompt.
- `--no-allow-adult` — disable `personGeneration: allow_adult` (ON by default).
- `--web` / `--crossfade <sec>` — ffmpeg web encode (+ optional crossfade loop).
- `-o, --out` · `--dry-run` · `--no-open`.

## The play-once hero pattern (recommended)

For a directional/gestural clip, **play once and hold the last frame** — like
superpower.com. No loop seam to engineer. Wire it as:

```html
<video class="hero" autoplay muted playsinline preload="metadata"
       poster="hero.poster.jpg">
  <source src="hero.web.webm" type="video/webm">
  <source src="hero.web.mp4"  type="video/mp4">
</video>
```

```css
.hero{position:absolute;inset:0;width:100%;height:100%;object-fit:cover;}
@media (prefers-reduced-motion: reduce){.hero{display:none;}}  /* poster shows */
@media (max-width: 768px){.hero{display:none;}}                /* mobile: poster */
```

End the generated clip on a good resting frame; set `poster` to the first frame.

## Gotchas (learned shipping the glp3.wiki hero)

- **RAI filter is the main cost sink.** Shirtless/physique subjects get rejected;
  neutralize the wording, keep `allow_adult`, prefer **Fast over Lite** for any
  figure (Lite rejects what Fast accepts). Rejections are $0 but cost a slow
  round-trip — get wording right first.
- **Kill the AI glow.** Negate god-rays/light-beams/glow/floating-dust explicitly.
- **API quirks** (handled by the CLI, noted for debugging): image field is
  `bytesBase64Encoded` not `inlineData`; `durationSeconds` is a number;
  `generateAudio` is rejected by the preview models.
- **Veo can be slow** (40s–10min) and occasionally times out ($0). The CLI polls
  up to ~12min and prints the operation name so you can retry.

See `PROMPTING.md` for the prompt structure, worked examples, and the full
RAI-wording list.