John Kueh

Claude Code skill

gpt-image-gen-2

Image generation via OpenAI GPT Image 2 with cost logging.

Install all skills in one command:

claude mcp add-plugin johnkueh/claude-skills

Why it exists

GPT Image 2 prompts need a specific structure (Scene, Subject, Details, Composition, Constraints) or output drifts between iterations. Cost is also opaque: at roughly $0.13 per high-quality 1024×1024 generation, iterating adds up fast. This skill turns a one-line brief into a cookbook-aligned prompt, supports moodboards for style transfer, and logs every call's actual dollar cost to a JSONL file.

In practice

Logo brief
Input: logo for a coffee app called Steam
Output: Asks one clarifying question for missing fields, assembles a Scene → Subject → Details → Composition → Constraints prompt, shows the estimated cost, calls the API, and logs the actual cost ($0.131) to ~/.config/image-gen/usage.jsonl.

Edit iteration
Input: change only the background to navy, keep everything else identical
Output: Uses the cookbook's anti-drift rule: edit mode plus an explicit preserve list ('keep: subject, lighting, composition, typography').
skills/gpt-image-gen-2/SKILL.md
---
name: gpt-image-gen-2
description: Generate images, illustrations, logos, infographics, photoreal shots, UI mockups, and ads with OpenAI's GPT Image 2. Translates the user's loose request into a cookbook-aligned prompt, supports reference images / moodboards for style transfer, and logs token usage + actual $ cost per call. Triggers on "make me a logo", "generate an image of…", "create an illustration", "design a poster", "gpt-image-gen", "gpt image", "image generation", "moodboard", "style transfer from this image", or any visual asset request.
---

# gpt-image-gen-2

Turn the user's loose visual brief into a well-engineered GPT Image 2 prompt, generate the asset, and log cost. Built around OpenAI's official prompting guide — see `PROMPTING.md` in this directory for the full distilled cookbook.

**Setup:** API key in `~/.config/image-gen/env` as `export OPENAI_API_KEY=sk-…`, or exported in shell. Usage log at `~/.config/image-gen/usage.jsonl`.

## Your job

1. **Classify** the request into one of these categories:
   `logo | illustration | photoreal | infographic | ui-mockup | ad | story-panel | style-transfer | edit`
2. **Interview** the user for anything missing. Ask in one short message, not a questionnaire. The critical fields by category are listed below.
3. **Assemble** the prompt using the structure in `PROMPTING.md`: Scene → Subject → Details → Composition → Constraints. Quote literal text. Spell tricky words letter-by-letter.
4. **Show the user the final prompt + estimated cost** (`--dry-run` first if you're unsure).
5. **Call** `cli.py generate` or `cli.py edit` and report the actual cost.
6. **Iterate small.** Single-change edits — "change only X, keep everything else the same" — and repeat the preserve list each turn (per the cookbook's anti-drift rule).

If the user already provided a complete brief, skip step 2.
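
As a rough illustration of step 3, the sketch below joins the five sections in the fixed order. The `Brief` class, its field names, and the labeled-line layout are assumptions made for this example; `PROMPTING.md` may format the sections differently.

```python
from dataclasses import dataclass

# Section order comes from the skill text; every name below is hypothetical.
SECTION_ORDER = ("scene", "subject", "details", "composition", "constraints")


@dataclass
class Brief:
    scene: str
    subject: str
    details: str = ""
    composition: str = ""
    constraints: str = ""


def assemble_prompt(brief: Brief) -> str:
    """Join the non-empty sections in the fixed order, one labeled line each."""
    lines = []
    for name in SECTION_ORDER:
        text = getattr(brief, name).strip()
        if text:  # skip sections the user left empty
            lines.append(f"{name.capitalize()}: {text}")
    return "\n".join(lines)


brief = Brief(
    scene="Plain off-white background, flat studio look.",
    # Literal text is quoted and spelled out, as step 3 asks.
    subject='Logo mark for a coffee app; the wordmark reads "Steam" (S-T-E-A-M).',
    composition="Single centered mark with generous padding.",
    constraints="Flat design, no gradients, no extra text.",
)
print(assemble_prompt(brief))
```

Keeping the order in one tuple means a skipped section never reorders the rest, which is the property the anti-drift structure relies on.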

## Critical fields by category

- **logo**: brand name, what it does, vibe (warm/sharp/playful/serious), whether literal wordmark or symbol-only
- **illustration**: subject, style ref (Ghibli/flat/watercolor/3D), palette, framing
- **photoreal**: subject, action, lens/lighting cues, location, mood — and the word "photorealistic" goes in the prompt
- **infographic**: topic, audience, required components (list them explicitly), label/no-label preference
- **ui-mockup**: product/app, screen purpose, real interface elements (not concept art language)
- **ad**: brand, audience, concept, exact tagline (in quotes), placement
- **story-panel**: narrative beat for this panel, characters' actions
- **style-transfer / edit**: which reference is style vs. content, what must change, what must NOT change
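
One way to picture the interview step against this list is a lookup table plus a check for what is missing. This is a minimal sketch: the field keys are shorthand invented here, not identifiers from the skill.

```python
# Required fields per category, abbreviated from the list above.
# Key names are hypothetical shorthand chosen for this example.
CRITICAL_FIELDS = {
    "logo": ["brand_name", "what_it_does", "vibe", "wordmark_or_symbol"],
    "illustration": ["subject", "style_ref", "palette", "framing"],
    "photoreal": ["subject", "action", "lens_lighting", "location", "mood"],
    "infographic": ["topic", "audience", "components", "label_preference"],
    "ui-mockup": ["product", "screen_purpose", "interface_elements"],
    "ad": ["brand", "audience", "concept", "tagline", "placement"],
    "story-panel": ["narrative_beat", "character_actions"],
    "style-transfer": ["style_vs_content", "must_change", "must_not_change"],
    "edit": ["style_vs_content", "must_change", "must_not_change"],
}


def missing_fields(category, brief):
    """Return the critical fields the brief has not filled in, in list order."""
    return [f for f in CRITICAL_FIELDS[category] if not brief.get(f)]


def clarifying_question(category, brief):
    """Build one short message covering every gap, or None if nothing is missing."""
    missing = missing_fields(category, brief)
    if not missing:
        return None
    wanted = ", ".join(name.replace("_", " ") for name in missing)
    return f"Before I generate, could you tell me: {wanted}?"


print(clarifying_question("logo", {"brand_name": "Steam", "what_it_does": "coffee app"}))
```

Returning `None` for a complete brief mirrors the rule that step 2 is skipped when nothing is missing.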

## Commands

Run from this skill's base directory.

### Generate (text → image)

```bash
uv run python cli.py generate \
  -p "Original logo for Field & Flour, a local bakery. Warm, simple, timeless. Clean vector-like shapes, strong silhouette, balanced negative space. Flat design, minimal strokes, no gradients. Single centered mark with generous padding, plain background." \
  --size 1024x1024 --quality high --format png --out ./field-and-flour.png
```

### Generate (dry run — see prompt + cost estimate without spending)

```bash
uv run python cli.py generate -p "..." --quality high --dry-run
```

### Generate transparent (sticker / icon / empty-state art)

`gpt-image-2` dropped native transparent backgrounds: its `background` enum
now accepts only `auto` and `opaque` (the model was trained for scene
consistency, not isolated cut-outs). The `--transparent` flag works around this.
It appends a magenta-background instruction block to your prompt, forces opaque
output, then post-processes the saved PNG so the magenta pixels become
transparent. This is the standard chroma-key trick for sticker and cut-out assets.

```bash
uv run python cli.py generate \
  -p "Hand-illustrated watercolor still-life of a vintage red postbox with a single white envelope peeking out the slot. Soft warm lantern-yellow rim light. Centered single subject, ~70% of canvas. NO text or labels." \
  --size 1024x1024 --quality high --transparent --out ./postbox.png
```

The chroma-key is also exposed as a standalone command if you want to
strip a key color from an existing image:

```bash
uv run python cli.py chroma-key ./input.png -o ./output.png
uv run python cli.py chroma-key ./input.png --key-color FF00FF --tolerance 70
```

**Prompt the subject to avoid pure magenta.** Brand coral `#FF5A5F` is safe
(squared RGB distance from `#FF00FF` ≈ 33.7k, well above the default threshold
of 14,700). True hot-pink subjects will get partially keyed; recolor them or
lower the tolerance.
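
The distance check can be sketched in a few lines of plain Python. This is not the code in `cli.py`: it assumes a hard cutoff and a threshold of `3 * tolerance**2`, inferred only from the fact that a tolerance of 70 gives the 14,700 quoted above. The real tool may feather edges or weight channels differently.

```python
def hex_to_rgb(value):
    """Convert 'FF00FF' or '#FF00FF' to an (r, g, b) tuple."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def threshold(tolerance):
    # Assumption: 3 * tolerance**2, which yields 14,700 at tolerance 70.
    return 3 * tolerance ** 2


def chroma_key(pixels, key="FF00FF", tolerance=70):
    """Return RGBA pixels; anything within the threshold of the key gets alpha 0."""
    key_rgb = hex_to_rgb(key)
    limit = threshold(tolerance)
    out = []
    for r, g, b in pixels:
        alpha = 0 if dist2((r, g, b), key_rgb) <= limit else 255
        out.append((r, g, b, alpha))
    return out


magenta = hex_to_rgb("FF00FF")
print(dist2(hex_to_rgb("#FF5A5F"), magenta))  # brand coral: 33700, kept
print(dist2(hex_to_rgb("#FF1493"), magenta))  # a deep pink: 12064, keyed at 70
```

Under these assumptions the deep pink `#FF1493` falls inside the default threshold but survives at a tolerance of 50 (threshold 7,500).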

### Edit / style-transfer / moodboard (image(s) + prompt → image)

```bash
# Single ref
uv run python cli.py edit \
  -p "Remove the flower from the man's hand. Do not change anything else — preserve face, pose, lighting, background, camera angle." \
  --ref input.png --out ./edited.png

# Style transfer — reference by index in the prompt
uv run python cli.py edit \
  -p "Image 1 is a style reference; Image 2 is the subject. Apply the watercolor brushwork, muted palette, and paper texture of Image 1 to the scene in Image 2. Keep Image 2's composition and subject pose unchanged." \
  --ref style-ref.jpg --ref subject.png --out ./styled.png

# Moodboard (multiple refs for vibe, new content)
uv run python cli.py edit \
  -p "Use the mood, palette, and lighting from these reference images. Generate a new scene: <subject>. Do not copy any subjects from the references; only their style." \
  --ref mood1.jpg --ref mood2.jpg --ref mood3.jpg --out ./new.png
```

### Cost log

```bash
uv run python cli.py cost              # total + per-mode + per-day summary
uv run python cli.py cost --tail 10    # last 10 calls
uv run python cli.py cost --days 7     # last 7 days only
```
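
For a sense of what such a summary involves, here is a minimal sketch that totals a JSONL log overall, per mode, and per day. The record fields `ts`, `mode`, and `cost_usd` are assumptions for illustration; the actual schema of `usage.jsonl` is not documented here.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Total cost overall, per mode, and per calendar day from JSONL lines."""
    total = 0.0
    per_mode = defaultdict(float)
    per_day = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines in the log
            continue
        record = json.loads(line)
        cost = float(record["cost_usd"])      # hypothetical field name
        total += cost
        per_mode[record["mode"]] += cost      # hypothetical field name
        per_day[record["ts"][:10]] += cost    # assumes an ISO-8601 timestamp
    return total, dict(per_mode), dict(per_day)


# Made-up records, only to exercise the function.
sample = [
    '{"ts": "2025-01-02T09:15:00", "mode": "generate", "cost_usd": 0.131}',
    '{"ts": "2025-01-02T09:20:00", "mode": "edit", "cost_usd": 0.145}',
    '{"ts": "2025-01-03T14:00:00", "mode": "generate", "cost_usd": 0.008}',
]
total, per_mode, per_day = summarize(sample)
print(f"total ${total:.3f}")
```

To run it against a real file, pass an open file handle instead of the sample list, since iterating a file yields one line at a time.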

## Options reference

- `--size` — `auto` (default), `1024x1024`, `1024x1536` (portrait), `1536x1024` (landscape)
- `--quality` — `low` (drafts, $0.008/1024² img), `medium` ($0.032), `high` ($0.125, default), `auto`
- `--format` — `png` (default), `webp`, `jpeg`
- `--background` — `auto`, `opaque`. (`transparent` is documented by the API but rejected by gpt-image-2; use `--transparent` instead.)
- `--transparent` / `-t` — opaque magenta render + chroma-key post-process → transparent PNG. Sticker / icon / empty-state use cases.
- `--n` — number of variations (default 1)
- `--dry-run` — print prompt + cost estimate, don't call API
- `--no-open` — don't auto-open the result in Preview

## Pricing (logged automatically)

- Text input: $5/1M tokens (cached $1.25/1M)
- Image input (refs): $8/1M tokens (cached $2/1M)
- Image output: $30/1M tokens

Typical actual costs:
- 1024×1024 high quality generate: ~$0.13
- 1024×1024 low quality (draft): ~$0.01
- 1024×1536 high quality generate: ~$0.19
- Edit with 1 ref + high output: ~$0.14–0.15

Cost is **estimated pre-flight** and shown before each call; **actual cost** is computed from the API's `usage` response and logged to `~/.config/image-gen/usage.jsonl`.
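
The arithmetic behind the logged figure can be sketched from the rates above. The shape of the `usage` dict (`input_tokens_details.text_tokens`, `input_tokens_details.image_tokens`, `output_tokens`) is an assumption about the API response, the token counts are illustrative only, and cached-input rates are left out for brevity.

```python
# Dollars per one million tokens, copied from the pricing list above.
RATES_PER_MILLION = {"text_in": 5.00, "image_in": 8.00, "image_out": 30.00}


def actual_cost(usage):
    """Dollar cost of one call from a usage dict (field names are assumed)."""
    details = usage.get("input_tokens_details", {})
    text_in = details.get("text_tokens", 0)
    image_in = details.get("image_tokens", 0)
    image_out = usage.get("output_tokens", 0)
    dollars = (
        text_in * RATES_PER_MILLION["text_in"]
        + image_in * RATES_PER_MILLION["image_in"]
        + image_out * RATES_PER_MILLION["image_out"]
    ) / 1_000_000
    return round(dollars, 6)


# Illustrative counts: a short text prompt, no reference images.
generate_usage = {
    "input_tokens_details": {"text_tokens": 60, "image_tokens": 0},
    "output_tokens": 4160,
}
# Illustrative counts: same output size plus one reference image.
edit_usage = {
    "input_tokens_details": {"text_tokens": 80, "image_tokens": 1000},
    "output_tokens": 4160,
}
print(actual_cost(generate_usage))
print(actual_cost(edit_usage))
```

Output tokens dominate in both cases, which is why quality and size settings move the price far more than prompt length does.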

## Iteration rules (from the cookbook)

- **Don't overload one prompt.** Start with a clean base; refine with small single-change follow-ups ("warmer lighting", "remove the extra tree", "make the logo mark thicker").
- **Repeat the preserve list every iteration.** The model doesn't remember previous turns — say "keep face, lighting, background, camera angle" again each time.
- **Use "change only X / keep everything else the same"** for surgical edits.
- **Don't over-spec camera details.** Lens/aperture are interpreted loosely; use them for vibe, not exact simulation.
- **Stock-photo wording kills logos & UI work.** Write logos like "vector-like, balanced negative space, scalable, flat"; write UI like "shipped interface, real interface elements", not "design sketch of…".
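
The "change only X" and preserve-list rules combine naturally into a small helper. This is a sketch with a hypothetical function name and wording, not a template taken from the cookbook.

```python
def edit_prompt(change, preserve):
    """Build a single-change edit prompt that restates the preserve list."""
    if not preserve:
        # The preserve list must be repeated every turn, so refuse to omit it.
        raise ValueError("preserve list must not be empty")
    keep = ", ".join(preserve)
    return (
        f"Change only this: {change}. "
        f"Keep everything else the same. Preserve: {keep}."
    )


PRESERVE = ["subject", "lighting", "composition", "typography"]

# Each turn passes the same list again, since the model keeps no memory of earlier turns.
print(edit_prompt("make the background navy", PRESERVE))
print(edit_prompt("make the logo mark thicker", PRESERVE))
```

The resulting string would be passed as the `-p` argument to `cli.py edit`.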

See `PROMPTING.md` for category-by-category prompt templates and worked examples.