Skip to content
Simmer

Simmer

Iterative artifact refinement for Claude Code — hone any document, prompt, or spec over multiple rounds with criteria-driven scoring.

GitHub ← All Products

Simmer takes something you’ve written — a pitch email, a design spec, a system prompt, an API contract — and makes it better through repeated, structured passes. You define 2-3 criteria for “better,” and Simmer runs a generate-judge-reflect loop until the artifact stops improving or you tell it to stop.

Install

/plugin install simmer@2389-research

What it does

Each iteration has three steps:

Generate produces an improved version of the artifact, working from the judge’s feedback. The generator never sees scores — just the previous best candidate and a single directive about what to fix next.

Judge scores the new candidate 1-10 on each of your criteria and picks the single most important thing to fix next. This focused feedback is called ASI (Actionable Side Information). The judge always has the original seed and its scores as a calibration anchor, so scoring stays consistent across iterations.

Reflect records the trajectory — a table of scores across all iterations — and tracks the best candidate so far. If an iteration regresses, the next generator gets the best version, not the worse one.

You get a running score table like this:

IterCriterion ACriterion BCriterion CCompositeKey Change
04534.0seed
17545.3specific problem statement
27666.3low-friction CTA
37787.3peer-sharing tone

After each batch of iterations, Simmer asks if you want to keep going.

How it works

The four subskills (setup, generator, judge, reflect) are deliberately isolated from each other’s context:

  • Generator doesn’t see scores — prevents optimizing for numbers instead of quality
  • Judge doesn’t see previous iterations’ scores — prevents anchoring bias
  • Reflect is the only role with the full picture — it decides what gets passed forward

This isolation is the main design bet. Scattered “fix everything” feedback tends to produce lateral moves. One focused fix per round compounds into real improvement.

Works on anything Claude can read and produce: documents, emails, prompts, specs, creative writing, API designs. You pick the criteria that matter for your artifact type.

Requirements

Claude Code with the plugin system enabled. Part of the 2389 plugin marketplace — installable independently from the test-kitchen family.

30 products · 11 skills · 15 tools · 3 platforms · 5 building · hugo 0.148.2 · 9326ca1 · built Mar 14 00:14
2389 Radio
2389 RADIO Select a station