Claude Code skill

media-image-gen

GPT Image 2 — routed through your ChatGPT plan, with clean transparent cut-outs.

Add the marketplace once, then install this skill:

claude plugin marketplace add johnkueh/claude-skills

claude plugin install media-image-gen@johnkueh-skills

Or grab the whole collection: claude plugin install claude-skills@johnkueh-skills

Why it exists

GPT Image 2 is the best text-to-image model I've used, but the OpenAI API bills it per image — about $0.13 a shot, which adds up fast when you're iterating on a logo or a sheet of stickers. The find that reshaped this skill: if you pay for ChatGPT Plus or Pro, Codex already runs GPT Image 2 against that subscription, and you can borrow the same path. So the skill now talks to the local Codex login and bills generation against your plan quota instead of an API key — $0 per image. It still does the careful parts: turns a one-line brief into a structured Scene → Subject → Details → Composition → Constraints prompt, cuts stickers and icons out to clean transparent PNGs, and logs the cost of anything that does hit the paid API. It defaults to the free plan path when you're signed in and falls back to the key when you're not. For a whole set, a batch command fans a JSON manifest out to eight generations at once against one shared proxy — the way the glp3.wiki article heroes get regenerated in a single command.

In practice

Free, on your ChatGPT plan

Input

three watercolour location marks for journeys.im — don't charge my API

Output

Signed in to Codex, so it routes through the ChatGPT-plan quota: $0 billed, ~40s each. Pass --api to force the paid key path when you'd rather bill an API key.

A whole set in parallel

Input

regenerate every article hero for glp3.wiki from this manifest

Output

batch reads a JSON of prompt + output pairs and runs up to eight generations at once against one shared Codex proxy. The full glp3.wiki hero set regenerated 8-at-a-time instead of one-by-one — still $0 on the plan, with a summary of what landed and what to retry.

Transparent sticker

Input

a die-cut avocado mascot giving a thumbs up, sticker style

Output

GPT Image 2 has no native transparency, so it renders on a magenta key, then a soft-matte + despill pass cuts it out — a clean transparent PNG with no pink halo on the anti-aliased edges.

Edit without drift

Input

change only the background to navy, keep everything else identical

Output

Edit mode plus an explicit preserve list ('keep: subject, lighting, composition, typography') — the cookbook's anti-drift rule, so one tweak doesn't redraw the whole image.

skills/media-image-gen/SKILL.md

---
name: media-image-gen
description: Generate images, illustrations, logos, infographics, photoreal shots, UI mockups, and ads with OpenAI's GPT Image 2. Translates the user's loose request into a cookbook-aligned prompt, supports reference images / moodboards for style transfer, and logs token usage + actual $ cost per call. Triggers on "make me a logo", "generate an image of…", "create an illustration", "design a poster", "gpt-image-gen", "gpt image", "image generation", "moodboard", "style transfer from this image", or any visual asset request.
---

# media-image-gen
Turn the user's loose visual brief into a well-engineered GPT Image 2 prompt, generate the asset, and log cost. Built around OpenAI's official prompting guide — see `PROMPTING.md` in this directory for the full distilled cookbook.

**Setup:** TypeScript CLI (`cli.ts`) — run `pnpm install` in this directory once (needs Node.js ≥ 18). Auth either way:
- **API key** (fallback when not signed in, or forced with `--api`): in `~/.config/image-gen/env` as `export OPENAI_API_KEY=sk-…`, or exported in your shell.
- **ChatGPT plan** (default when signed in): no API key — bills your ChatGPT Plus/Pro quota. Run `pnpm exec tsx cli.ts setup` once to sign in; after that it's used automatically unless you pass `--api`. See the auth section below.

Usage log at `~/.config/image-gen/usage.jsonl`.

## Your job

1. **Classify** the request into one of these categories:
   `logo | illustration | photoreal | infographic | ui-mockup | ad | story-panel | style-transfer | edit`
2. **Interview** the user for anything missing. Ask one short message — no questionnaires. The critical fields by category are listed below.
3. **Assemble** the prompt using the structure in `PROMPTING.md`: Scene → Subject → Details → Composition → Constraints. Quote literal text. Spell tricky words letter-by-letter.
4. **Show the user the final prompt + estimated cost** (`--dry-run` first if you're unsure).
5. **Call** `cli.ts generate` or `cli.ts edit` and report the actual cost.
6. **Iterate small.** Single-change edits — "change only X, keep everything else the same" — and repeat the preserve list each turn (per the cookbook's anti-drift rule).

If the user already provided a complete brief, skip step 2.
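Step 3 can be sketched as a small pure function. This is only an illustration of the Scene → Subject → Details → Composition → Constraints ordering and the "quote literal text" rule; `Brief`, `sentence`, and `assemblePrompt` are hypothetical names and are not taken from `cli.ts` or `PROMPTING.md`.

```typescript
// Hypothetical sketch: these names are illustrative, not part of cli.ts.
interface Brief {
  scene: string;
  subject: string;
  details: string[];
  composition: string;
  constraints: string[];
  literalText?: string; // exact wording that must appear in the image
}

// Trim a fragment and make sure it ends with sentence punctuation.
function sentence(s: string): string {
  const t = s.trim();
  if (t === "") return "";
  return /[.!?]$/.test(t) ? t : t + ".";
}

// Join the sections in cookbook order; literal text is wrapped in double quotes.
function assemblePrompt(brief: Brief): string {
  const parts: string[] = [
    sentence(brief.scene),
    sentence(brief.subject),
    ...brief.details.map((d) => sentence(d)),
  ];
  if (brief.literalText !== undefined) {
    parts.push(`Text reads exactly "${brief.literalText}".`);
  }
  parts.push(sentence(brief.composition));
  parts.push(...brief.constraints.map((c) => sentence(c)));
  return parts.filter((p) => p !== "").join(" ");
}
```

Keeping the sections as separate fields makes single-change iteration easier: a follow-up turn can alter one field and reassemble, leaving the rest of the prompt byte-for-byte identical.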

## Critical fields by category

- **logo**: brand name, what it does, vibe (warm/sharp/playful/serious), whether literal wordmark or symbol-only
- **illustration**: subject, style ref (Ghibli/flat/watercolor/3D), palette, framing
- **photoreal**: subject, action, lens/lighting cues, location, mood — and the word "photorealistic" goes in the prompt
- **infographic**: topic, audience, required components (list them explicitly), label/no-label preference
- **ui-mockup**: product/app, screen purpose, real interface elements (not concept art language)
- **ad**: brand, audience, concept, exact tagline (in quotes), placement
- **story-panel**: narrative beat for this panel, characters' actions
- **style-transfer / edit**: which reference is style vs. content, what must change, what must NOT change

## Commands

Run from this skill's base directory.

### Generate (text → image)

```bash
pnpm exec tsx cli.ts generate \
  -p "Original logo for Field & Flour, a local bakery. Warm, simple, timeless. Clean vector-like shapes, strong silhouette, balanced negative space. Flat design, minimal strokes, no gradients. Single centered mark with generous padding, plain background." \
  --size 1024x1024 --quality high --format png --out ./field-and-flour.png
```

### Generate with reference image(s) — coherence / anti-drift

Pass `--ref` (repeatable) to condition a from-scratch generation on a style/subject
bible — the same fridge/character/palette across a set of stills, instead of drifting
into a different one each call. Routes through the image-edit path automatically (the
generations endpoint can't take input images), so it works on both auth paths.

```bash
# anchor a series to one reference so every beat is the SAME scene
pnpm exec tsx cli.ts generate \
  -p "Same fridge as the reference — identical interior, shelves and lighting. New angle: low, looking up at the bottom shelf." \
  --ref ./anchor.png --quality high --out ./beat-01.png
```

(For editing an existing image or multi-ref moodboard/style-transfer, `edit` is the
dedicated verb — see below. `generate --ref` is the convenience for "new scene, keep
the bible".)

### Generate (dry run — see prompt + cost estimate without spending)

```bash
pnpm exec tsx cli.ts generate -p "..." --quality high --dry-run
```

### Generate transparent (sticker / icon / empty-state art)

`gpt-image-2` dropped native transparent backgrounds — its `background` enum
only accepts `auto` and `opaque` now (the model was trained for scene
consistency, not isolated cut-outs). Confirmed for the ChatGPT-plan/Responses
path too: requesting `background: "transparent"` returns
`"Transparent background is not supported for this model."` — it's a model
limitation, not an API-surface one. So `--transparent` works around it on both
routes: auto-appends a magenta-bg instruction block to your prompt, forces
opaque output, then keys out the magenta.

The keyer is a proper **soft matte + decontamination + despill** (not a hard
threshold), so anti-aliased edges stay clean instead of leaving a pink halo:
edge pixels get partial alpha, then their true colour is recovered by un-mixing
the known magenta background (`fg = (observed − (1−α)·magenta) / α`), and any
residual magenta cast on opaque pixels is shaved off.
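The un-mixing formula above can be worked through per channel. The sketch below is a minimal illustration of that one step, assuming straight (non-premultiplied) 8-bit RGB and a pure `[255, 0, 255]` key; `unmix` and `clamp255` are hypothetical names, and the real keyer may differ in its details.

```typescript
// Hypothetical sketch of the decontamination step only; not the keyer from cli.ts.
type RGB = [number, number, number];

const MAGENTA: RGB = [255, 0, 255];

// Round and clamp a channel value into the 0..255 range.
function clamp255(v: number): number {
  return Math.min(255, Math.max(0, Math.round(v)));
}

// fg = (observed - (1 - alpha) * bg) / alpha, applied to each channel.
function unmix(observed: RGB, alpha: number, bg: RGB = MAGENTA): RGB {
  if (alpha <= 0) return [0, 0, 0]; // fully transparent: the colour is irrelevant
  const a = Math.min(1, alpha);
  return [
    clamp255((observed[0] - (1 - a) * bg[0]) / a),
    clamp255((observed[1] - (1 - a) * bg[1]) / a),
    clamp255((observed[2] - (1 - a) * bg[2]) / a),
  ];
}
```

For example, an orange `[200, 100, 50]` blended 50/50 with magenta is observed as roughly `[228, 50, 153]`. Calling `unmix([228, 50, 153], 0.5)` gives `[201, 100, 51]`, which is the original colour to within rounding, so the pink cast does not travel with the edge pixel.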

```bash
pnpm exec tsx cli.ts generate \
  -p "Hand-illustrated watercolor still-life of a vintage red postbox with a single white envelope peeking out the slot. Soft warm lantern-yellow rim light. Centered single subject, ~70% of canvas. NO text or labels." \
  --size 1024x1024 --quality high --transparent --out ./postbox.png
```

The chroma-key is also exposed as a standalone command if you want to
strip a key color from an existing image:

```bash
pnpm exec tsx cli.ts chroma-key ./input.png -o ./output.png
# tune: --lo (keep more, raise toward 0.3) / --hi (cut more, lower toward 0.45) / --despill 0-1
pnpm exec tsx cli.ts chroma-key ./input.png --lo 0.18 --hi 0.55 --despill 0.8
```

**Prompt the subject to avoid pure magenta.** The keyer flags a pixel as
background by its magenta coverage `m = (min(R,B) − G)/255`, so greens, browns,
yellows, and whites are safe; only genuinely magenta/hot-pink subject areas get
keyed. If a subject edge is being eaten, raise `--lo`; if magenta survives in
corners, lower `--hi`.
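To make the `--lo` / `--hi` tuning concrete, here is a small sketch of the coverage measure and one plausible way the two thresholds could map coverage to alpha. The coverage formula is the one quoted above; the linear ramp between `lo` and `hi` is an assumption made for illustration (the actual keyer may use a different curve), and both function names are hypothetical.

```typescript
// Magenta coverage as described above: m = (min(R, B) - G) / 255.
function magentaCoverage(r: number, g: number, b: number): number {
  return (Math.min(r, b) - g) / 255;
}

// ASSUMPTION: a linear ramp. Pixels at or below `lo` stay fully opaque,
// pixels at or above `hi` become fully transparent, and values in between
// get partial alpha. This is an illustration, not the keyer from cli.ts.
function alphaFromCoverage(m: number, lo: number, hi: number): number {
  if (hi <= lo) throw new Error("hi must be greater than lo");
  if (m <= lo) return 1;
  if (m >= hi) return 0;
  return 1 - (m - lo) / (hi - lo);
}
```

Under this assumption the tuning advice follows directly: raising `lo` widens the fully opaque band, so fewer subject edge pixels lose alpha, while lowering `hi` widens the fully transparent band, so faint magenta in the corners is removed. Pure magenta has coverage 1, while white, green, and yellow all sit at or below 0, which is why they are safe subject colours.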

### Edit / style-transfer / moodboard (image(s) + prompt → image)

```bash
# Single ref
pnpm exec tsx cli.ts edit \
  -p "Remove the flower from the man's hand. Do not change anything else — preserve face, pose, lighting, background, camera angle." \
  --ref input.png --out ./edited.png

# Style transfer — reference by index in the prompt
pnpm exec tsx cli.ts edit \
  -p "Image 1 is a style reference; Image 2 is the subject. Apply the watercolor brushwork, muted palette, and paper texture of Image 1 to the scene in Image 2. Keep Image 2's composition and subject pose unchanged." \
  --ref style-ref.jpg --ref subject.png --out ./styled.png

# Moodboard (multiple refs for vibe, new content)
pnpm exec tsx cli.ts edit \
  -p "Use the mood, palette, and lighting from these reference images. Generate a new scene: <subject>. Do not copy any subjects from the references; only their style." \
  --ref mood1.jpg --ref mood2.jpg --ref mood3.jpg --out ./new.png
```

### Cost log

```bash
pnpm exec tsx cli.ts cost              # total + per-mode + per-day summary
pnpm exec tsx cli.ts cost --tail 10    # last 10 calls
pnpm exec tsx cli.ts cost --days 7     # last 7 days only
```

### Batch — many images in parallel (ChatGPT plan)

Generate a whole set from a JSON manifest, running up to **8 at a time** against a
single shared `openai-oauth` proxy (no per-image proxy churn). This is the way to
do bulk generation on the free plan path.

```bash
pnpm exec tsx cli.ts batch --manifest images.json --concurrency 5
```

`images.json` is an array of items; each needs `prompt` + `out` (`size`/`quality`/`format` optional):

```json
[
  { "prompt": "Photorealistic editorial photograph: ...", "out": "public/images/a.webp" },
  { "prompt": "...", "out": "public/images/b.webp", "size": "1536x1024", "quality": "high" }
]
```

- `--concurrency <1-8>` — parallel generations (default 4; capped at 8).
- `--skip-existing` — skip items whose `out` already exists, so a re-run **resumes** and retries only failures.
- `--size` / `--quality` / `--format` — defaults for items that omit them; `--model` / `--reasoning` / `--oauth-port` as in `generate`.

Prints a JSON summary `{ ok, failed, failures[] }`. A failed item isn't written, so re-running with `--skip-existing` retries only the misses.
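The flag semantics above can be sketched as a planning step that runs before any generation starts. This is a hedged illustration only: `ManifestItem` mirrors the manifest fields shown earlier, but `clampConcurrency`, `planBatch`, and the `existing` set (standing in for a filesystem check) are hypothetical and are not the code in `cli.ts`.

```typescript
// Hypothetical sketch of batch planning; not the implementation in cli.ts.
interface ManifestItem {
  prompt: string;
  out: string;
  size?: string;
  quality?: string;
  format?: string;
}

interface ItemDefaults {
  size: string;
  quality: string;
  format: string;
}

interface ResolvedItem extends ItemDefaults {
  prompt: string;
  out: string;
}

// --concurrency <1-8>: default 4, capped at 8.
function clampConcurrency(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) return 4;
  return Math.min(8, Math.max(1, Math.floor(requested)));
}

// Drop items whose output already exists (when skipExisting is set) and fill
// in per-item options from the command-line defaults.
function planBatch(
  manifest: ManifestItem[],
  existing: Set<string>, // stand-in for "does this path exist on disk?"
  skipExisting: boolean,
  defaults: ItemDefaults,
): ResolvedItem[] {
  return manifest
    .filter((item) => {
      if (!item.prompt || !item.out) {
        throw new Error("each manifest item needs prompt + out");
      }
      return !(skipExisting && existing.has(item.out));
    })
    .map((item) => ({
      prompt: item.prompt,
      out: item.out,
      size: item.size ?? defaults.size,
      quality: item.quality ?? defaults.quality,
      format: item.format ?? defaults.format,
    }));
}
```

Because a failed item is never written, its `out` path is absent from `existing` on the next run, so the same planning step naturally selects only the misses.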

### Auth: ChatGPT plan by default, API key as fallback

The CLI **defaults to your ChatGPT plan** whenever `~/.codex/auth.json` exists
(no `$` charge — bills plan quota). If it's not signed in, it falls back to the
`OPENAI_API_KEY` path automatically. Override per call:
- `--chatgpt-auth` — force the ChatGPT-plan path.
- `--api` — force the API-key path even when ChatGPT auth is present.
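The selection rule above amounts to a short decision function. The sketch below is an illustration with hypothetical names (`AuthInputs`, `resolveAuthPath`); treating the two flags as mutually exclusive is an assumption, since the behaviour when both are passed is not described here.

```typescript
// Hypothetical sketch of the auth-path decision; not the code in cli.ts.
type AuthPath = "chatgpt-plan" | "api-key";

interface AuthInputs {
  codexAuthExists: boolean; // whether ~/.codex/auth.json is present
  forceChatgpt?: boolean;   // --chatgpt-auth
  forceApi?: boolean;       // --api
}

function resolveAuthPath(inputs: AuthInputs): AuthPath {
  // ASSUMPTION: passing both flags is treated as an error.
  if (inputs.forceApi && inputs.forceChatgpt) {
    throw new Error("--api and --chatgpt-auth cannot be combined");
  }
  if (inputs.forceApi) return "api-key";
  if (inputs.forceChatgpt) return "chatgpt-plan";
  // No override: prefer the plan when signed in, otherwise fall back to the key.
  return inputs.codexAuthExists ? "chatgpt-plan" : "api-key";
}
```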

The ChatGPT-plan path routes through the local
[`openai-oauth`](https://www.npmjs.com/package/openai-oauth) proxy and the
Responses API `image_generation` tool (gpt-image-2 inside the model's reasoning
loop), the same mechanism Codex itself uses.

**One-time setup** — sign in with your ChatGPT account (caches the token at `~/.codex/auth.json`) and verify:

```bash
pnpm exec tsx cli.ts setup     # runs `npx @openai/codex login` + doctor
pnpm exec tsx cli.ts doctor    # re-check anytime (npx, auth, proxy reachability)
```

If you skip `setup`, the first `--chatgpt-auth` call auto-runs the login itself. Then just add the flag — the `openai-oauth` proxy is auto-started:

```bash
pnpm exec tsx cli.ts generate -p "Flat vector logo for a bakery, warm and simple" --chatgpt-auth
pnpm exec tsx cli.ts edit -p "Make the sky a warm sunset, keep everything else" --ref photo.png --chatgpt-auth
```

- `--model` — `gpt-5.5` (default, strongest reasoning), `gpt-5.4`, `gpt-5.4-mini`. The model drives the `image_generation` tool's planning; higher tiers use more quota.
- `--reasoning` — effort for that planning: `none|low|medium|high|xhigh` (default `medium`).
- `--web-search` — off by default (keeps the prompt verbatim + faster); enable for real-person/factual accuracy.
- `--oauth-port` — proxy port (default `10531`).
- `--transparent` works (post-process chroma-key). `--mask` and `--n > 1` are **not** supported on this path.

**Trade-offs vs. the API-key path:**
- No `$` cost; usage is logged with `cost_usd: 0, plan_quota: true`.
- For **bulk** generation, use `batch` (see the Batch section above), which parallelizes the plan path across one shared proxy.
- The endpoint is **undocumented** and can change without notice. Personal use only.

**Unattended / background use (when Claude drives the skill).** After the one-time
`codex login`, the token auto-refreshes — no recurring login. The only remaining
gate is Claude Code's permission prompt when the agent spawns the `openai-oauth`
proxy. To run hands-off, add this once to your **own** `.claude/settings.json`
(a plugin can't grant itself shell permissions — you must opt in):

```jsonc
{
  // merge this key into the top-level object of your .claude/settings.json
  "permissions": {
    "allow": [
      "Bash(npx -y openai-oauth:*)",
      "Bash(npx openai-oauth:*)",
      "Bash(npx -y @openai/codex:*)",
      "Bash(npx @openai/codex:*)"
    ]
  }
}
```

With that in place, the login persists and the proxy auto-spawns silently, so
recurring background generation needs zero interaction. The only things that can
still stop it are plan-quota exhaustion or a change to the upstream endpoint.
(Running the CLI yourself in a plain terminal needs none of this; the permission
prompt is Claude-Code-only.)

## Options reference

- `--size` — `auto` (default), `1024x1024`, `1024x1536` (portrait), `1536x1024` (landscape)
- `--quality` — `low` (drafts, $0.008/1024² img), `medium` ($0.032), `high` ($0.125, default), `auto`
- `--format` — `png` (default), `webp`, `jpeg`
- `--background` — `auto`, `opaque`. (`transparent` is documented by the API but rejected by gpt-image-2; use `--transparent` instead.)
- `--transparent` / `-t` — opaque magenta render + soft-matte/despill keyer → clean transparent PNG. Sticker / icon / empty-state use cases. Works on both auth paths.
- `--ref <path>` (generate + edit) — reference image(s) to condition on (style/subject bible → coherence, anti-drift). Repeatable for multi-image input. On `generate` it routes through the image-edit path; works on both auth paths.
- `--n` — number of variations (default 1; ignored on the ChatGPT-plan path, which returns 1 per call)
- `--dry-run` — print prompt + cost estimate, don't call API
- `--no-open` — don't auto-open the result in Preview

**Auth (see auth section):**
- *(default)* — ChatGPT plan if `~/.codex/auth.json` exists, else API key
- `--chatgpt-auth` — force ChatGPT-plan path · `--api` — force API-key path
- `--model` (`gpt-5.5`|`gpt-5.4`|`gpt-5.4-mini`) · `--reasoning` (`none`..`xhigh`, default `medium`) · `--web-search` · `--oauth-port`

## Pricing (logged automatically)

- Text input: $5/1M tokens (cached $1.25/1M)
- Image input (refs): $8/1M tokens (cached $2/1M)
- Image output: $30/1M tokens

Typical actual costs:
- 1024×1024 high quality generate: ~$0.13
- 1024×1024 low quality (draft): ~$0.01
- 1024×1536 high quality generate: ~$0.19
- Edit with 1 ref + high output: ~$0.14-0.15

Cost is **estimated pre-flight** and shown before each call; **actual cost** is computed from the API's `usage` response and logged to `~/.config/image-gen/usage.jsonl`.
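The per-token rates above turn into a dollar figure with simple arithmetic. The sketch below shows that calculation under stated assumptions: the `Usage` field names are hypothetical (the real `usage` payload may be shaped differently), and cached tokens are assumed to be reported separately from uncached ones rather than as a subset.

```typescript
// Hypothetical usage shape; field names are illustrative, not the API's.
interface Usage {
  textInputTokens: number;
  cachedTextInputTokens?: number;
  imageInputTokens?: number;
  cachedImageInputTokens?: number;
  imageOutputTokens: number;
}

// USD per 1M tokens, taken from the pricing list above.
const RATES = {
  textInput: 5,
  textInputCached: 1.25,
  imageInput: 8,
  imageInputCached: 2,
  imageOutput: 30,
};

function costUsd(u: Usage): number {
  const perMillion =
    u.textInputTokens * RATES.textInput +
    (u.cachedTextInputTokens ?? 0) * RATES.textInputCached +
    (u.imageInputTokens ?? 0) * RATES.imageInput +
    (u.cachedImageInputTokens ?? 0) * RATES.imageInputCached +
    u.imageOutputTokens * RATES.imageOutput;
  return perMillion / 1e6;
}
```

As an illustration with made-up token counts, a call with 100 text input tokens and 4,000 image output tokens comes to (100 × 5 + 4,000 × 30) / 1,000,000 = $0.1205. Output tokens dominate the total, which is why quality and size settings move the price far more than prompt length does.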

## Iteration rules (from the cookbook)

- **Don't overload one prompt.** Start with a clean base; refine with small single-change follow-ups ("warmer lighting", "remove the extra tree", "make the logo mark thicker").
- **Repeat the preserve list every iteration.** The model doesn't remember previous turns — say "keep face, lighting, background, camera angle" again each time.
- **Use "change only X / keep everything else the same"** for surgical edits.
- **Don't over-spec camera details.** Lens/aperture are interpreted loosely; use them for vibe, not exact simulation.
- **Stock-photo wording kills logos & UI work.** Write logos like "vector-like, balanced negative space, scalable, flat"; write UI like "shipped interface, real interface elements", not "design sketch of…".
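The "change only X" and "repeat the preserve list" rules combine into a prompt that can be rebuilt on every turn. A minimal sketch follows; `buildEditPrompt` is a hypothetical helper and the exact wording is just one phrasing consistent with the rules above.

```typescript
// Hypothetical helper: restates the full preserve list on every edit turn.
function buildEditPrompt(change: string, keep: string[]): string {
  if (keep.length === 0) {
    throw new Error("preserve list must not be empty");
  }
  return `Change only ${change}. Keep everything else the same: preserve ${keep.join(", ")}.`;
}
```

Calling `buildEditPrompt("the background to navy", ["subject", "lighting", "composition", "typography"])` produces a self-contained instruction, so nothing depends on the model recalling an earlier turn.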

See `PROMPTING.md` for category-by-category prompt templates and worked examples.