Simmer takes something you’ve written — a pitch email, a design spec, a system prompt, an API contract — and makes it better through repeated, structured passes. You define 2-3 criteria for “better,” and Simmer runs a generate-judge-reflect loop until the artifact stops improving or you tell it to stop.
Install
/plugin install simmer@2389-research
What it does
Each iteration has three steps:
Generate produces an improved version of the artifact, working from the judge’s feedback. The generator never sees scores — just the previous best candidate and a single directive about what to fix next.
Judge scores the new candidate 1-10 on each of your criteria and picks the single most important thing to fix next. This focused feedback is called ASI (Actionable Side Information). The judge always has the original seed and its scores as a calibration anchor, so scoring stays consistent across iterations.
Reflect records the trajectory — a table of scores across all iterations — and tracks the best candidate so far. If an iteration regresses, the next generator gets the best version, not the worse one.
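The three steps above can be sketched as a small loop. This is an illustrative sketch only, not Simmer's actual implementation: `simmer_loop`, `Verdict`, and the `generate` and `judge` callables are hypothetical names. It also makes two assumptions: the composite is the plain mean of the criterion scores, and after a regression the next directive comes from the best candidate's verdict.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    scores: dict[str, int]  # criterion name -> score from 1 to 10
    asi: str                # the single most important thing to fix next


def composite(scores: dict[str, int]) -> float:
    # Assumption: the composite is the plain mean of the criterion scores.
    return sum(scores.values()) / len(scores)


def simmer_loop(
    seed: str,
    generate: Callable[[str, str], str],
    judge: Callable[[str, str, Optional[dict]], Verdict],
    iterations: int = 3,
):
    # Iteration 0: score the seed so it can serve as the calibration anchor.
    seed_verdict = judge(seed, seed, None)
    best, best_verdict = seed, seed_verdict
    trajectory = [(0, seed_verdict.scores, "seed")]

    for i in range(1, iterations + 1):
        # Assumption: the directive always comes from the best candidate's verdict.
        directive = best_verdict.asi
        # Generate: sees the best candidate and one directive, never any scores.
        candidate = generate(best, directive)
        # Judge: sees the new candidate plus the seed and the seed's scores.
        verdict = judge(candidate, seed, seed_verdict.scores)
        # Reflect: record the trajectory and keep the best candidate so far.
        trajectory.append((i, verdict.scores, directive))
        if composite(verdict.scores) > composite(best_verdict.scores):
            best, best_verdict = candidate, verdict

    return best, trajectory
```

In this sketch a regressed candidate is recorded in the trajectory but never replaces `best`, so the next call to `generate` starts from the strongest version seen so far.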
You get a running score table like this:
| Iter | Criterion A | Criterion B | Criterion C | Composite | Key Change |
|---|---|---|---|---|---|
| 0 | 4 | 5 | 3 | 4.0 | seed |
| 1 | 7 | 5 | 4 | 5.3 | specific problem statement |
| 2 | 7 | 6 | 6 | 6.3 | low-friction CTA |
| 3 | 7 | 7 | 8 | 7.3 | peer-sharing tone |
After each batch of iterations, Simmer asks if you want to keep going.
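The Composite column in the table above is consistent with a plain mean of the three criterion scores, rounded to one decimal place. That is an inference from the sample numbers, not a documented formula. A minimal sketch that reproduces the column under that assumption (the `rows` data is copied from the table; `composite` is a hypothetical helper):

```python
# Rows copied from the sample table: (iteration, (A, B, C), key change).
rows = [
    (0, (4, 5, 3), "seed"),
    (1, (7, 5, 4), "specific problem statement"),
    (2, (7, 6, 6), "low-friction CTA"),
    (3, (7, 7, 8), "peer-sharing tone"),
]


def composite(scores):
    # Assumption: composite = mean of the criterion scores, one decimal place.
    return round(sum(scores) / len(scores), 1)


composites = [composite(scores) for _, scores, _ in rows]
print(composites)  # [4.0, 5.3, 6.3, 7.3]

# The best candidate so far is the iteration with the highest composite.
best_iteration = max(rows, key=lambda row: composite(row[1]))[0]
print(best_iteration)  # 3
```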
How it works
The four subskills (setup, generator, judge, reflect) are deliberately isolated from each other’s context:
- Generator doesn’t see scores: this prevents optimizing for numbers instead of quality
- Judge doesn’t see scores from intermediate iterations, only the seed and its scores as a calibration anchor: this limits anchoring bias
- Reflect is the only role with the full picture: it decides what gets passed forward
This isolation is the main design bet, and it pairs with the single-directive rule: scattered “fix everything” feedback tends to produce lateral moves, while one focused fix per round compounds into real improvement.
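One way to picture the isolation is as a set of narrow input types, where each role receives only the fields it is allowed to see. The sketch below is hypothetical and is not how Simmer is implemented; all names (`GeneratorInput`, `JudgeInput`, `ReflectState`, `for_generator`, `for_judge`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class GeneratorInput:
    best_candidate: str  # best version so far, not necessarily the latest
    directive: str       # the single fix to make next; no scores included


@dataclass(frozen=True)
class JudgeInput:
    candidate: str
    criteria: tuple
    seed: str            # original artifact, kept as a calibration anchor
    seed_scores: dict    # only the seed's scores, no later iterations


@dataclass
class ReflectState:
    # Reflect holds the full picture and decides what gets passed forward.
    criteria: tuple
    seed: str
    seed_scores: dict
    best_candidate: str
    next_directive: str
    trajectory: list = field(default_factory=list)

    def for_generator(self) -> GeneratorInput:
        return GeneratorInput(self.best_candidate, self.next_directive)

    def for_judge(self, candidate: str) -> JudgeInput:
        # Copy the seed scores so the judge cannot mutate Reflect's state.
        return JudgeInput(candidate, self.criteria, self.seed, dict(self.seed_scores))
```

Because the generator and judge inputs have no trajectory field, neither role can read the running score table even by accident; only `ReflectState` carries it.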
Works on anything Claude can read and produce: documents, emails, prompts, specs, creative writing, API designs. You pick the criteria that matter for your artifact type.
Requirements
Claude Code with the plugin system enabled. Part of the 2389 plugin marketplace — installable independently from the test-kitchen family.
