Claude Code skill
media-image-gen
GPT Image 2 — routed through your ChatGPT plan, with clean transparent cut-outs.
Add the marketplace once, then install this skill:
claude plugin marketplace add johnkueh/claude-skills
claude plugin install media-image-gen@johnkueh-skills
Or grab the whole collection:
claude plugin install claude-skills@johnkueh-skills
Why it exists
GPT Image 2 is the best text-to-image model I've used, but the OpenAI API bills it by the token, which works out to about $0.13 per high-quality square image and adds up fast when you're iterating on a logo or a sheet of stickers. The discovery that reshaped this skill: if you pay for ChatGPT Plus or Pro, Codex already runs GPT Image 2 against that subscription, and you can borrow the same path. So the skill now talks to the local Codex login and bills generation against your plan quota instead of an API key, at $0 per image. It still does the careful parts: it turns a one-line brief into a structured Scene → Subject → Details → Composition → Constraints prompt, cuts stickers and icons out to clean transparent PNGs, and logs the cost of anything that does hit the paid API. It defaults to the plan path when you're signed in and falls back to the API key when you're not. For a whole set, a batch command fans a JSON manifest out to up to eight parallel generations against one shared proxy; that's how the glp3.wiki article heroes get regenerated in a single command.
In practice
---
name: media-image-gen
description: Generate images, illustrations, logos, infographics, photoreal shots, UI mockups, and ads with OpenAI's GPT Image 2. Translates the user's loose request into a cookbook-aligned prompt, supports reference images / moodboards for style transfer, and logs token usage + actual $ cost per call. Triggers on "make me a logo", "generate an image of…", "create an illustration", "design a poster", "gpt-image-gen", "gpt image", "image generation", "moodboard", "style transfer from this image", or any visual asset request.
---
# media-image-gen
Turn the user's loose visual brief into a well-engineered GPT Image 2 prompt, generate the asset, and log cost. Built around OpenAI's official prompting guide — see `PROMPTING.md` in this directory for the full distilled cookbook.
**Setup:** TypeScript CLI (`cli.ts`); run `pnpm install` in this directory once (needs Node.js ≥ 18). Two auth paths:
- **ChatGPT plan** (default when signed in): no API key; bills your ChatGPT Plus/Pro quota. Run `pnpm exec tsx cli.ts setup` once to sign in; after that it's used automatically unless you pass `--api`. See the auth section below.
- **API key** (fallback when not signed in): in `~/.config/image-gen/env` as `export OPENAI_API_KEY=sk-…`, or exported in your shell.
Usage log at `~/.config/image-gen/usage.jsonl`.
## Your job
1. **Classify** the request into one of these categories:
`logo | illustration | photoreal | infographic | ui-mockup | ad | story-panel | style-transfer | edit`
2. **Interview** the user for anything missing. Ask in one short message; no questionnaires. The critical fields by category are listed below.
3. **Assemble** the prompt using the structure in `PROMPTING.md`: Scene → Subject → Details → Composition → Constraints. Quote literal text. Spell tricky words letter-by-letter.
4. **Show the user the final prompt + estimated cost** (`--dry-run` first if you're unsure).
5. **Call** `cli.ts generate` or `cli.ts edit` and report the actual cost.
6. **Iterate small.** Single-change edits — "change only X, keep everything else the same" — and repeat the preserve list each turn (per the cookbook's anti-drift rule).
If the user already provided a complete brief, skip step 2.
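Step 3 is mechanical enough to sketch. Below is a minimal TypeScript helper, assuming the brief has already been split into the five cookbook sections; the `PromptParts` field names and the `assemblePrompt` / `spellOut` helpers are hypothetical, not part of `cli.ts`.

```typescript
// Hypothetical shape for a brief split into the cookbook's five sections.
interface PromptParts {
  scene: string;
  subject: string;
  details?: string;
  composition?: string;
  constraints?: string;
  literalText?: string[]; // exact strings that must appear, quoted in the prompt
  tricky?: string[]; // words to spell out letter-by-letter
}

// "Flour" -> "F-L-O-U-R"
function spellOut(word: string): string {
  return word.toUpperCase().split("").join("-");
}

// Scene -> Subject -> Details -> Composition -> Constraints, one section per
// line, skipping sections the brief left empty.
function assemblePrompt(p: PromptParts): string {
  const sections = [p.scene, p.subject, p.details, p.composition].filter(
    (s): s is string => Boolean(s && s.trim()),
  );
  const constraints: string[] = p.constraints ? [p.constraints] : [];
  for (const text of p.literalText ?? []) {
    constraints.push(`Render the text "${text}" exactly as written.`);
  }
  for (const word of p.tricky ?? []) {
    constraints.push(`"${word}" is spelled ${spellOut(word)}.`);
  }
  if (constraints.length > 0) sections.push(constraints.join(" "));
  return sections.join("\n");
}

console.log(
  assemblePrompt({
    scene: "Plain warm-white background.",
    subject: "Original logo for Field & Flour, a local bakery.",
    details: "Clean vector-like shapes, flat design, no gradients.",
    composition: "Single centered mark with generous padding.",
    literalText: ["Field & Flour"],
    tricky: ["Flour"],
  }),
);
```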
## Critical fields by category
- **logo**: brand name, what it does, vibe (warm/sharp/playful/serious), whether literal wordmark or symbol-only
- **illustration**: subject, style ref (Ghibli/flat/watercolor/3D), palette, framing
- **photoreal**: subject, action, lens/lighting cues, location, mood — and the word "photorealistic" goes in the prompt
- **infographic**: topic, audience, required components (list them explicitly), label/no-label preference
- **ui-mockup**: product/app, screen purpose, real interface elements (not concept art language)
- **ad**: brand, audience, concept, exact tagline (in quotes), placement
- **story-panel**: narrative beat for this panel, characters' actions
- **style-transfer / edit**: which reference is style vs. content, what must change, what must NOT change
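The interview step can be driven off this table. A minimal sketch, assuming the brief has been parsed into key/value pairs; the camelCase keys are hypothetical paraphrases of the list above, not names the skill defines.

```typescript
type Category =
  | "logo" | "illustration" | "photoreal" | "infographic" | "ui-mockup"
  | "ad" | "story-panel" | "style-transfer" | "edit";

// Hypothetical keys paraphrasing the critical-fields list.
const CRITICAL_FIELDS: Record<Category, readonly string[]> = {
  logo: ["brandName", "whatItDoes", "vibe", "wordmarkOrSymbol"],
  illustration: ["subject", "styleRef", "palette", "framing"],
  photoreal: ["subject", "action", "lensAndLighting", "location", "mood"],
  infographic: ["topic", "audience", "components", "labelPreference"],
  "ui-mockup": ["product", "screenPurpose", "interfaceElements"],
  ad: ["brand", "audience", "concept", "tagline", "placement"],
  "story-panel": ["narrativeBeat", "characterActions"],
  "style-transfer": ["styleVsContent", "mustChange", "mustNotChange"],
  edit: ["styleVsContent", "mustChange", "mustNotChange"],
};

// Fields still empty (or whitespace-only) after reading the brief.
// An empty result means step 2 can be skipped.
function missingFields(category: Category, brief: Partial<Record<string, string>>): string[] {
  return CRITICAL_FIELDS[category].filter((key) => !brief[key]?.trim());
}

// -> ["whatItDoes", "wordmarkOrSymbol"]
console.log(missingFields("logo", { brandName: "Field & Flour", vibe: "warm" }));
```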
## Commands
Run from this skill's base directory.
### Generate (text → image)
```bash
pnpm exec tsx cli.ts generate \
  -p "Original logo for Field & Flour, a local bakery. Warm, simple, timeless. Clean vector-like shapes, strong silhouette, balanced negative space. Flat design, minimal strokes, no gradients. Single centered mark with generous padding, plain background." \
  --size 1024x1024 --quality high --format png --out ./field-and-flour.png
```
### Generate (dry run — see prompt + cost estimate without spending)
```bash
pnpm exec tsx cli.ts generate -p "..." --quality high --dry-run
```
### Generate transparent (sticker / icon / empty-state art)
`gpt-image-2` dropped native transparent backgrounds — its `background` enum
only accepts `auto` and `opaque` now (the model was trained for scene
consistency, not isolated cut-outs). Confirmed for the ChatGPT-plan/Responses
path too: requesting `background: "transparent"` returns
`"Transparent background is not supported for this model."` — it's a model
limitation, not an API-surface one. So `--transparent` works around it on both
routes: auto-appends a magenta-bg instruction block to your prompt, forces
opaque output, then keys out the magenta.
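The request side of that workaround might look like the sketch below. The wording of `MAGENTA_BG_BLOCK` is a hypothetical stand-in (this page doesn't show the block `cli.ts` appends), and forcing PNG output is an assumption made so the keyer sees exact, uncompressed magenta; `background` and `output_format` are the Images API's own parameter names.

```typescript
// Hypothetical wording; the block cli.ts actually appends is not shown here.
const MAGENTA_BG_BLOCK = [
  "Background: a perfectly flat, uniform pure magenta (#FF00FF) fill from edge to edge.",
  "No shadows, gradients, vignetting, or texture on the background.",
  "Do not use magenta or hot pink anywhere in the subject.",
].join(" ");

interface TransparentRequest {
  prompt: string;
  background: "opaque"; // gpt-image-2 rejects "transparent"
  output_format: "png"; // assumption: lossless, so the magenta stays exact for keying
}

// Append the magenta instruction block and force an opaque render;
// the returned image is then passed through the chroma keyer.
function buildTransparentRequest(userPrompt: string): TransparentRequest {
  return {
    prompt: `${userPrompt.trim()}\n\n${MAGENTA_BG_BLOCK}`,
    background: "opaque",
    output_format: "png",
  };
}
```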
The keyer is a proper **soft matte + decontamination + despill** (not a hard
threshold), so anti-aliased edges stay clean instead of leaving a pink halo:
edge pixels get partial alpha, then their true colour is recovered by un-mixing
the known magenta background (`fg = (observed − (1−α)·magenta) / α`), and any
residual magenta cast on opaque pixels is shaved off.
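The un-mixing step follows directly from the compositing equation `observed = α·fg + (1−α)·magenta`. A per-pixel sketch, assuming 8-bit RGB and an alpha already produced by the matte; the despill rule here (pull R and B down by a fraction of the `min(R,B) − G` excess) is an assumed approach, not necessarily what `cli.ts` does.

```typescript
type RGB = [number, number, number];

const MAGENTA: RGB = [255, 0, 255];

const clamp255 = (v: number): number => Math.min(255, Math.max(0, Math.round(v)));

// Invert observed = α·fg + (1−α)·magenta  =>  fg = (observed − (1−α)·magenta) / α.
function decontaminate([r, g, b]: RGB, alpha: number): RGB {
  if (alpha <= 0) return [0, 0, 0]; // fully keyed out; colour no longer matters
  const k = 1 - alpha;
  const [mr, mg, mb] = MAGENTA;
  return [
    clamp255((r - k * mr) / alpha),
    clamp255((g - k * mg) / alpha),
    clamp255((b - k * mb) / alpha),
  ];
}

// Assumed despill: treat min(R,B) − G > 0 as a magenta cast and pull R and B
// down by `strength` (0..1) of that excess; green is left untouched.
function despill([r, g, b]: RGB, strength: number): RGB {
  const spill = Math.min(r, b) - g;
  if (spill <= 0) return [r, g, b];
  const cut = spill * strength;
  return [clamp255(r - cut), g, clamp255(b - cut)];
}
```

For example, a 50/50 blend of white and magenta, observed as (255, 128, 255) at α = 0.5, decontaminates back to white (255, 255, 255) after clamping.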
```bash
pnpm exec tsx cli.ts generate \
  -p "Hand-illustrated watercolor still-life of a vintage red postbox with a single white envelope peeking out the slot. Soft warm lantern-yellow rim light. Centered single subject, ~70% of canvas. NO text or labels." \
  --size 1024x1024 --quality high --transparent --out ./postbox.png
```
The chroma-key is also exposed as a standalone command if you want to
strip a key color from an existing image:
```bash
pnpm exec tsx cli.ts chroma-key ./input.png -o ./output.png
# tune: --lo (keep more, raise toward 0.3) / --hi (cut more, lower toward 0.45) / --despill 0-1
pnpm exec tsx cli.ts chroma-key ./input.png --lo 0.18 --hi 0.55 --despill 0.8
```
**Prompt the subject to avoid pure magenta.** The keyer flags a pixel as
background by its magenta coverage `m = (min(R,B) − G)/255`, so greens, browns,
yellows, and whites are safe; only genuinely magenta/hot-pink subject areas get
keyed. If a subject edge is being eaten, raise `--lo`; if magenta survives in
corners, lower `--hi`.
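A sketch of the coverage test and the soft matte, assuming a linear ramp between `--lo` and `--hi` (the real keyer's curve may differ). It shows why whites, yellows, and browns come out with zero or negative coverage while hot pink is only partly keyed.

```typescript
// Magenta coverage, as defined above: m = (min(R,B) − G) / 255.
function magentaCoverage(r: number, g: number, b: number): number {
  return (Math.min(r, b) - g) / 255;
}

// Assumed linear soft matte: m ≤ lo stays fully opaque, m ≥ hi is keyed out,
// and coverage in between gets proportionally less alpha.
function matteAlpha(m: number, lo: number, hi: number): number {
  if (m <= lo) return 1;
  if (m >= hi) return 0;
  return 1 - (m - lo) / (hi - lo);
}

const samples: Record<string, [number, number, number]> = {
  white: [255, 255, 255],
  yellow: [255, 230, 0],
  brown: [139, 69, 19],
  hotPink: [255, 105, 180],
  magenta: [255, 0, 255],
};
for (const [name, [r, g, b]] of Object.entries(samples)) {
  const m = magentaCoverage(r, g, b);
  console.log(name, m.toFixed(2), matteAlpha(m, 0.18, 0.55).toFixed(2));
}
```

Under this sketch's ramp with `--lo 0.18 --hi 0.55`, hot pink (255, 105, 180) has m ≈ 0.29 and comes out about 69% opaque; raising `--lo` to 0.3 keeps it fully opaque, which is the "subject edge is being eaten" fix described above.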
### Edit / style-transfer / moodboard (image(s) + prompt → image)
```bash
# Single ref
pnpm exec tsx cli.ts edit \
  -p "Remove the flower from the man's hand. Do not change anything else; preserve face, pose, lighting, background, camera angle." \
  --ref input.png --out ./edited.png
# Style transfer: reference by index in the prompt
pnpm exec tsx cli.ts edit \
  -p "Image 1 is a style reference; Image 2 is the subject. Apply the watercolor brushwork, muted palette, and paper texture of Image 1 to the scene in Image 2. Keep Image 2's composition and subject pose unchanged." \
  --ref style-ref.jpg --ref subject.png --out ./styled.png
# Moodboard (multiple refs for vibe, new content)
pnpm exec tsx cli.ts edit \
  -p "Use the mood, palette, and lighting from these reference images. Generate a new scene: <subject>. Do not copy any subjects from the references; only their style." \
  --ref mood1.jpg --ref mood2.jpg --ref mood3.jpg --out ./new.png
```
### Cost log
```bash
pnpm exec tsx cli.ts cost # total + per-mode + per-day summary
pnpm exec tsx cli.ts cost --tail 10 # last 10 calls
pnpm exec tsx cli.ts cost --days 7 # last 7 days only
```
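The `cost` summaries are simple folds over the JSONL log. A minimal sketch: only `cost_usd` and `plan_quota` are field names this page confirms, so `ts` (an ISO-8601 timestamp) and `mode` are assumed names, and `summarize` / `summarizeLog` are hypothetical helpers.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Assumed record shape; `ts` and `mode` are hypothetical field names.
interface UsageRecord {
  ts: string;
  mode: string;
  cost_usd: number;
  plan_quota?: boolean;
}

interface Summary {
  total: number;
  byMode: Record<string, number>;
  byDay: Record<string, number>;
}

// Total + per-mode + per-day, optionally limited to the last `days` days.
function summarize(jsonl: string, days?: number, now: Date = new Date()): Summary {
  const cutoff = days === undefined ? -Infinity : now.getTime() - days * 86_400_000;
  const summary: Summary = { total: 0, byMode: {}, byDay: {} };
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    const rec = JSON.parse(line) as UsageRecord;
    if (Date.parse(rec.ts) < cutoff) continue;
    const day = rec.ts.slice(0, 10); // YYYY-MM-DD (UTC when ts ends in "Z")
    summary.total += rec.cost_usd;
    summary.byMode[rec.mode] = (summary.byMode[rec.mode] ?? 0) + rec.cost_usd;
    summary.byDay[day] = (summary.byDay[day] ?? 0) + rec.cost_usd;
  }
  return summary;
}

// Read the log at the path this page documents.
function summarizeLog(days?: number): Summary {
  const path = join(homedir(), ".config", "image-gen", "usage.jsonl");
  return summarize(readFileSync(path, "utf8"), days);
}
```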
### Batch — many images in parallel (ChatGPT plan)
Generate a whole set from a JSON manifest, running up to **8 at a time** against a
single shared `openai-oauth` proxy (no per-image proxy churn). This is the way to
do bulk generation on the free plan path.
```bash
pnpm exec tsx cli.ts batch --manifest images.json --concurrency 5
```
`images.json` is an array of items; each needs `prompt` + `out` (`size`/`quality`/`format` optional):
```json
[
  { "prompt": "Photorealistic editorial photograph: ...", "out": "public/images/a.webp" },
  { "prompt": "...", "out": "public/images/b.webp", "size": "1536x1024", "quality": "high" }
]
```
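Before fanning out, it's worth rejecting a malformed manifest up front. A minimal validator sketch, assuming the item shape above; `parseManifest` is hypothetical and passes the optional `size` / `quality` / `format` values through without checking them.

```typescript
interface ManifestItem {
  prompt: string;
  out: string;
  size?: "auto" | "1024x1024" | "1024x1536" | "1536x1024";
  quality?: "low" | "medium" | "high" | "auto";
  format?: "png" | "webp" | "jpeg";
}

// Parse the manifest and require a non-empty `prompt` and `out` on every item.
function parseManifest(json: string): ManifestItem[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("manifest must be a JSON array");
  return data.map((item: unknown, i: number) => {
    const { prompt, out } = (item ?? {}) as { prompt?: unknown; out?: unknown };
    if (typeof prompt !== "string" || prompt.trim() === "") {
      throw new Error(`item ${i}: "prompt" must be a non-empty string`);
    }
    if (typeof out !== "string" || out.trim() === "") {
      throw new Error(`item ${i}: "out" must be a non-empty string`);
    }
    return item as ManifestItem;
  });
}
```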
- `--concurrency <1-8>` — parallel generations (default 4; capped at 8).
- `--skip-existing` — skip items whose `out` already exists, so a re-run **resumes** and retries only failures.
- `--size` / `--quality` / `--format` — defaults for items that omit them; `--model` / `--reasoning` / `--oauth-port` as in `generate`.
Prints a JSON summary `{ ok, failed, failures[] }`. A failed item isn't written, so re-running with `--skip-existing` retries only the misses.
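The concurrency cap and the `{ ok, failed, failures[] }` summary map onto a small worker pool. A sketch under stated assumptions: `runBatch` and `pending` are hypothetical names, the `task` callback stands in for one plan-path generation, and the shape of each `failures[]` entry is assumed.

```typescript
import { existsSync } from "node:fs";

interface BatchSummary {
  ok: number;
  failed: number;
  failures: { out: string; error: string }[];
}

// --skip-existing: drop items whose output file is already on disk.
function pending<T extends { out: string }>(items: T[], skipExisting: boolean): T[] {
  return skipExisting ? items.filter((it) => !existsSync(it.out)) : items;
}

// Run `task` over items with at most `concurrency` in flight (clamped to 1..8).
// Failures are collected rather than thrown, so one bad item doesn't stop the rest.
async function runBatch<T extends { out: string }>(
  items: T[],
  task: (item: T) => Promise<void>,
  concurrency = 4,
): Promise<BatchSummary> {
  const limit = Math.min(8, Math.max(1, Math.floor(concurrency)));
  const summary: BatchSummary = { ok: 0, failed: 0, failures: [] };
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const item = items[next++];
      if (item === undefined) break;
      try {
        await task(item);
        summary.ok++;
      } catch (err) {
        summary.failed++;
        summary.failures.push({ out: item.out, error: String(err) });
      }
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return summary;
}
```

Because the workers share one JavaScript thread and only yield at `await`, `next++` never hands the same item to two workers.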
### Auth: ChatGPT plan by default, API key as fallback
The CLI **defaults to your ChatGPT plan** whenever `~/.codex/auth.json` exists
(no `$` charge; it bills plan quota). If you're not signed in, it falls back to
the `OPENAI_API_KEY` path automatically. Override per call:
- `--chatgpt-auth` — force the ChatGPT-plan path.
- `--api` — force the API-key path even when ChatGPT auth is present.
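The selection rule reduces to a few lines. A sketch, assuming "signed in" means only that `~/.codex/auth.json` exists, as described above; `resolveAuth` and `isSignedIn` are hypothetical, and rejecting both flags at once is an assumption since this page doesn't say what happens then.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

type AuthPath = "chatgpt" | "api";

interface AuthFlags {
  chatgptAuth?: boolean; // --chatgpt-auth
  api?: boolean; // --api
}

function isSignedIn(): boolean {
  return existsSync(join(homedir(), ".codex", "auth.json"));
}

function resolveAuth(flags: AuthFlags, signedIn: boolean): AuthPath {
  if (flags.chatgptAuth && flags.api) {
    // Assumption: conflicting overrides are rejected.
    throw new Error("pass at most one of --chatgpt-auth / --api");
  }
  if (flags.chatgptAuth) return "chatgpt"; // login can be triggered later if needed
  if (flags.api) return "api";
  return signedIn ? "chatgpt" : "api"; // default: plan when signed in, else API key
}
```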
The ChatGPT-plan path routes through the local
[`openai-oauth`](https://www.npmjs.com/package/openai-oauth) proxy and the
Responses API `image_generation` tool (gpt-image-2 inside the model's reasoning
loop), the same mechanism Codex itself uses.
**One-time setup** — sign in with your ChatGPT account (caches the token at `~/.codex/auth.json`) and verify:
```bash
pnpm exec tsx cli.ts setup # runs `npx @openai/codex login` + doctor
pnpm exec tsx cli.ts doctor # re-check anytime (npx, auth, proxy reachability)
```
If you skip `setup`, the first `--chatgpt-auth` call runs the login itself. Once you're signed in, the plan path is the default, so the flag below only forces it; either way the `openai-oauth` proxy is auto-started:
```bash
pnpm exec tsx cli.ts generate -p "Flat vector logo for a bakery, warm and simple" --chatgpt-auth
pnpm exec tsx cli.ts edit -p "Make the sky a warm sunset, keep everything else" --ref photo.png --chatgpt-auth
```
- `--model` — `gpt-5.5` (default, strongest reasoning), `gpt-5.4`, `gpt-5.4-mini`. The model drives the `image_generation` tool's planning; higher tiers use more quota.
- `--reasoning` — effort for that planning: `none|low|medium|high|xhigh` (default `medium`).
- `--web-search` — off by default (keeps the prompt verbatim + faster); enable for real-person/factual accuracy.
- `--oauth-port` — proxy port (default `10531`).
- `--transparent` works (post-process chroma-key). `--mask` and `--n > 1` are **not** supported on this path.
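Put together, a plan-path call is a Responses API request with the `image_generation` tool enabled, and the image comes back base64-encoded in the `result` field of an `image_generation_call` output item. A sketch, assuming the proxy exposes an OpenAI-compatible `/v1/responses` route on `--oauth-port` (this page gives only the port) and that `--web-search` maps to a `web_search` tool entry; `buildResponsesBody`, `extractImages`, and `generateViaPlan` are hypothetical names. It relies on Node 18's global `fetch` and `Buffer`.

```typescript
type Effort = "none" | "low" | "medium" | "high" | "xhigh";

interface PlanOptions {
  prompt: string;
  model?: "gpt-5.5" | "gpt-5.4" | "gpt-5.4-mini";
  reasoning?: Effort;
  webSearch?: boolean;
  size?: string;
  quality?: string;
}

interface ResponsesResult {
  output?: { type: string; result?: string | null }[];
}

function buildResponsesBody(o: PlanOptions): Record<string, unknown> {
  const imageTool: Record<string, unknown> = { type: "image_generation" };
  if (o.size) imageTool["size"] = o.size;
  if (o.quality) imageTool["quality"] = o.quality;
  const tools: Record<string, unknown>[] = [imageTool];
  if (o.webSearch) tools.push({ type: "web_search" }); // assumed mapping for --web-search
  return {
    model: o.model ?? "gpt-5.5",
    reasoning: { effort: o.reasoning ?? "medium" },
    input: o.prompt,
    tools,
  };
}

// Decode every image_generation_call result (base64) in the response output.
function extractImages(res: ResponsesResult): Buffer[] {
  return (res.output ?? [])
    .filter((item) => item.type === "image_generation_call" && typeof item.result === "string")
    .map((item) => Buffer.from(item.result as string, "base64"));
}

// Assumed route on the local proxy; only the port comes from this page.
async function generateViaPlan(o: PlanOptions, port = 10531): Promise<Buffer[]> {
  const res = await fetch(`http://127.0.0.1:${port}/v1/responses`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildResponsesBody(o)),
  });
  if (!res.ok) throw new Error(`proxy returned ${res.status}: ${await res.text()}`);
  return extractImages((await res.json()) as ResponsesResult);
}
```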
**Trade-offs vs. the API-key path:**
- No `$` cost; usage is logged with `cost_usd: 0, plan_quota: true`.
- For **bulk** generation, use `batch`; it parallelizes the plan path across one shared proxy.
- The endpoint is **undocumented** and can change without notice. Personal use only.
**Unattended / background use (when Claude drives the skill).** After the one-time
`codex login`, the token auto-refreshes — no recurring login. The only remaining
gate is Claude Code's permission prompt when the agent spawns the `openai-oauth`
proxy. To run hands-off, add this once to your **own** `.claude/settings.json`
(a plugin can't grant itself shell permissions — you must opt in):
```jsonc
{
  // merge into your existing .claude/settings.json
  "permissions": {
    "allow": [
      "Bash(npx -y openai-oauth:*)",
      "Bash(npx openai-oauth:*)",
      "Bash(npx -y @openai/codex:*)",
      "Bash(npx @openai/codex:*)"
    ]
  }
}
```
With that in place, the login persists and the proxy auto-spawns silently, so
recurring background generation needs zero interaction. The only things that can
still stop an unattended run are plan-quota exhaustion or a change to the upstream
endpoint. (Running the CLI yourself in a plain terminal needs none of this; the
permission prompt only exists inside Claude Code.)
## Options reference
- `--size` — `auto` (default), `1024x1024`, `1024x1536` (portrait), `1536x1024` (landscape)
- `--quality` — `low` (drafts, $0.008/1024² img), `medium` ($0.032), `high` ($0.125, default), `auto`
- `--format` — `png` (default), `webp`, `jpeg`
- `--background` — `auto`, `opaque`. (`transparent` is documented by the API but rejected by gpt-image-2; use `--transparent` instead.)
- `--transparent` / `-t` — opaque magenta render + soft-matte/despill keyer → clean transparent PNG. Sticker / icon / empty-state use cases. Works on both auth paths.
- `--n` — number of variations (default 1; ignored on the ChatGPT-plan path, which returns 1 per call)
- `--dry-run` — print prompt + cost estimate, don't call API
- `--no-open` — don't auto-open the result in Preview
**Auth (see auth section):**
- *(default)* — ChatGPT plan if `~/.codex/auth.json` exists, else API key
- `--chatgpt-auth` — force ChatGPT-plan path · `--api` — force API-key path
- `--model` (`gpt-5.5`|`gpt-5.4`|`gpt-5.4-mini`) · `--reasoning` (`none`..`xhigh`, default `medium`) · `--web-search` · `--oauth-port`
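The `--dry-run` estimate can be approximated from the per-image prices in the `--quality` list. A sketch, assuming price scales with pixel area (which reproduces the ~$0.19 figure for 1024×1536 high quality in the pricing section) and budgeting `auto` quality and size at the worst case; `estimateCost` is hypothetical.

```typescript
type Quality = "low" | "medium" | "high" | "auto";
type Size = "auto" | "1024x1024" | "1024x1536" | "1536x1024";

// Per-image USD for a 1024×1024 render, from the --quality option list.
const PRICE_1024_SQ: Record<Exclude<Quality, "auto">, number> = {
  low: 0.008,
  medium: 0.032,
  high: 0.125,
};

function estimateCost(quality: Quality, size: Size, n = 1, planPath = false): number {
  if (planPath) return 0; // plan quota, no $ charge (and n is ignored there)
  const q = quality === "auto" ? "high" : quality; // assumption: budget auto as high
  const s = size === "auto" ? "1536x1024" : size; // assumption: budget auto as a large size
  const [w = 1024, h = 1024] = s.split("x").map(Number);
  // Assumption: cost scales linearly with pixel area relative to 1024×1024.
  return PRICE_1024_SQ[q] * ((w * h) / (1024 * 1024)) * n;
}
```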
## Pricing (logged automatically)
- Text input: $5/1M tokens (cached $1.25/1M)
- Image input (refs): $8/1M tokens (cached $2/1M)
- Image output: $30/1M tokens
Typical actual costs:
- 1024×1024 high quality generate: ~$0.13
- 1024×1024 low quality (draft): ~$0.01
- 1024×1536 high quality generate: ~$0.19
- Edit with 1 ref + high output: ~$0.14-0.15
Cost is **estimated pre-flight** and shown before each call; **actual cost** is computed from the API's `usage` response and logged to `~/.config/image-gen/usage.jsonl`.
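Actual cost is a weighted sum of token counts at the rates above. A sketch, assuming the API's `usage` object has already been mapped onto the flat `TokenUsage` shape (its field names are hypothetical, not the API's) and that cached counts are reported separately from uncached ones.

```typescript
// Flat, hypothetical shape; map the API's `usage` object onto it first.
interface TokenUsage {
  textInput: number;
  textInputCached?: number;
  imageInput: number;
  imageInputCached?: number;
  imageOutput: number;
}

// USD per 1M tokens, from the pricing list above.
const RATES_PER_M = {
  textInput: 5,
  textInputCached: 1.25,
  imageInput: 8,
  imageInputCached: 2,
  imageOutput: 30,
} as const;

// Assumes cached counts are separate from (not included in) the uncached counts.
function actualCost(u: TokenUsage): number {
  const weighted =
    u.textInput * RATES_PER_M.textInput +
    (u.textInputCached ?? 0) * RATES_PER_M.textInputCached +
    u.imageInput * RATES_PER_M.imageInput +
    (u.imageInputCached ?? 0) * RATES_PER_M.imageInputCached +
    u.imageOutput * RATES_PER_M.imageOutput;
  return weighted / 1_000_000;
}
```

With an illustrative 50 text-input tokens and 4,160 image-output tokens, `actualCost` returns about $0.125, in line with the ~$0.13 typical figure.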
## Iteration rules (from the cookbook)
- **Don't overload one prompt.** Start with a clean base; refine with small single-change follow-ups ("warmer lighting", "remove the extra tree", "make the logo mark thicker").
- **Repeat the preserve list every iteration.** The model doesn't remember previous turns — say "keep face, lighting, background, camera angle" again each time.
- **Use "change only X / keep everything else the same"** for surgical edits.
- **Don't over-spec camera details.** Lens/aperture are interpreted loosely; use them for vibe, not exact simulation.
- **Stock-photo wording kills logos & UI work.** Write logos like "vector-like, balanced negative space, scalable, flat"; write UI like "shipped interface, real interface elements", not "design sketch of…".
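The first three rules can be baked into a helper so the preserve list is never forgotten between turns. A minimal sketch; `followUpPrompt` is hypothetical, and its output is just one reasonable phrasing of the "change only X / keep everything else the same" pattern.

```typescript
// Hypothetical helper: one change per turn, with the preserve list restated every time.
function followUpPrompt(change: string, preserve: readonly string[]): string {
  const what = change.trim().replace(/\.$/, "");
  const keep = preserve.length > 0 ? ` Preserve ${preserve.join(", ")}.` : "";
  return `Change only this: ${what}. Keep everything else the same.${keep}`;
}

// Reuse the same preserve list on every turn.
const preserveList = ["face", "lighting", "background", "camera angle"];
console.log(followUpPrompt("remove the extra tree", preserveList));
```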
See `PROMPTING.md` for category-by-category prompt templates and worked examples.