Most code is pattern-following, not reasoning. Speed-run splits the work accordingly: Cerebras generates the predictable parts at ~2000 tokens/second, and Claude handles architecture decisions and surgical fixes for the rest. The result is code of the same quality with ~60% fewer Claude tokens and 20x faster generation per file.
Install
/plugin marketplace add 2389-research/claude-plugins
/plugin install speed-run
You’ll need a free Cerebras API key from cloud.cerebras.ai. Add it to ~/.claude/settings.json:
{
  "env": {
    "CEREBRAS_API_KEY": "your-key-here"
  }
}
Restart Claude Code after setting the key.
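If settings.json already holds other settings, editing it by hand risks clobbering them. The following is a minimal sketch of a merge that adds only the key and leaves everything else in place. The helper name set_cerebras_key is hypothetical and not part of the plugin.

```python
import json
from pathlib import Path


def set_cerebras_key(settings_path: Path, api_key: str) -> dict:
    """Merge CEREBRAS_API_KEY into the "env" object, keeping other settings.

    Hypothetical helper for illustration; the plugin does not ship it.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    # Create the "env" object if it is missing, then set only our key.
    env = settings.setdefault("env", {})
    env["CEREBRAS_API_KEY"] = api_key

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling set_cerebras_key(Path.home() / ".claude" / "settings.json", "your-key-here") would produce the file shown above when no other settings exist.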
What it does
Speed-run has three modes, each built for a different situation.
Turbo is the simplest. One task, one generation. You write a contract prompt — data types, API shape, algorithm steps, constraints — and Cerebras generates the code in under a second. Claude runs the tests and makes surgical fixes if anything’s off. Use it for algorithmic code, boilerplate, data transformations, or multi-file scaffolding.
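To make the four parts of a contract prompt concrete, here is a rough sketch of how one might be assembled. The ContractPrompt class, its section titles, and the layout of the rendered text are all assumptions for illustration; the plugin does not prescribe this format.

```python
from dataclasses import dataclass, field


@dataclass
class ContractPrompt:
    """Hypothetical container for the four parts of a contract prompt."""

    data_types: list[str] = field(default_factory=list)
    api_shape: list[str] = field(default_factory=list)
    algorithm_steps: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # (title, items, numbered): only algorithm steps are ordered.
        sections = [
            ("DATA TYPES", self.data_types, False),
            ("API SHAPE", self.api_shape, False),
            ("ALGORITHM STEPS", self.algorithm_steps, True),
            ("CONSTRAINTS", self.constraints, False),
        ]
        parts = []
        for title, items, numbered in sections:
            if not items:
                continue  # skip sections the author left empty
            lines = [title]
            for i, item in enumerate(items, start=1):
                prefix = f"{i}." if numbered else "-"
                lines.append(f"{prefix} {item}")
            parts.append("\n".join(lines))
        return "\n\n".join(parts)
```

The more exact the signatures and steps in the prompt, the less the generator has to guess, which is what makes a single fast generation viable.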
Showdown runs multiple agents in parallel against the same design doc. Each agent reads the spec, creates its own implementation plan, generates code via Cerebras, and runs tests independently. A judge skill scores all variants across five criteria (fitness, complexity, readability, robustness, maintainability) and picks the winner. The key: agents create their own plans from the shared design, so you get genuine variation.
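The judging step can be pictured as a small scoring table. The sketch below assumes each criterion gets an integer score and that all five are weighted equally; the source names the criteria but not the scale or weights, so treat both as placeholders.

```python
CRITERIA = ("fitness", "complexity", "readability", "robustness", "maintainability")


def total_score(scores: dict[str, int]) -> int:
    """Sum the five criterion scores (equal weights are an assumption)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return sum(scores[c] for c in CRITERIA)


def pick_winner(variants: dict[str, dict[str, int]]) -> str:
    """Return the name of the highest-scoring variant.

    On a tie, max() keeps the first variant in insertion order.
    """
    if not variants:
        raise ValueError("no variants to judge")
    return max(variants, key=lambda name: total_score(variants[name]))
```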
Any-Percent is for when you don’t know which approach is right. Instead of the same spec implemented differently, each variant tries a fundamentally different strategy — event-driven vs polling, SQLite vs Postgres, whatever the architectural fork is. Same scoring framework picks the winner, but now you’re comparing tradeoffs across designs rather than implementations.
All parallel modes dispatch every agent in a single message so that the agents run concurrently. Dispatching them one at a time would serialize the work and defeat the purpose.
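As an analogy only, the difference resembles launching coroutines together with asyncio.gather instead of awaiting each in turn. This sketch is not how Claude Code dispatches agents internally; run_agent and its sleep are stand-ins for a full plan, generate, and test cycle.

```python
import asyncio


async def run_agent(name: str, log: list[str]) -> str:
    """Stand-in for one agent's work; records when it starts and finishes."""
    log.append(f"start:{name}")
    await asyncio.sleep(0.01)  # placeholder for plan + generate + test
    log.append(f"done:{name}")
    return name


async def dispatch_all(names: list[str]) -> list[str]:
    """Launch every agent at once and return the event log."""
    log: list[str] = []
    await asyncio.gather(*(run_agent(n, log) for n in names))
    return log
```

In the resulting log every agent has started before any has finished, so total wall time tracks the slowest agent rather than the sum of all of them.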
How it works
The pipeline for any mode:
- Claude writes the spec (contract prompt or design doc)
- Cerebras generates first-pass code (~0.5s per file, 80-95% correct)
- Claude runs tests against the output
- If tests fail, Claude fixes surgically — no regeneration from scratch
- For parallel modes: judge scores all variants, winner gets promoted, losers get cleaned up
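The single-variant part of the pipeline above can be sketched as a short control loop. The callables generate, run_tests, and fix are hypothetical stand-ins for Cerebras generation, the test run, and Claude's surgical fix. The max_fixes cap is also an assumption, since the source does not say how many fix rounds are attempted.

```python
from typing import Callable


def run_pipeline(
    spec: str,
    generate: Callable[[str], str],
    run_tests: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
    max_fixes: int = 3,  # assumed cap, not from the source
) -> str:
    """Generate once, then patch the same code until tests pass."""
    code = generate(spec)  # first pass; never called again
    for _ in range(max_fixes):
        failures = run_tests(code)
        if not failures:
            return code
        # Patch the existing code instead of regenerating from scratch.
        code = fix(code, failures)
    if run_tests(code):
        raise RuntimeError(f"tests still failing after {max_fixes} fixes")
    return code
```

Note that generate is called exactly once: every later change goes through fix, which mirrors the "no regeneration from scratch" step.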
The bundled MCP server (@2389/speed-run-mcp) handles Cerebras communication. It parses file blocks from generated output and writes them to disk. The server auto-registers when the plugin is installed.
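To illustrate what "parses file blocks and writes them to disk" could involve, here is a sketch under a made-up format. The "=== FILE: path ===" header is purely an assumption; the delimiter that @2389/speed-run-mcp actually expects is not documented here, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed header format, e.g. "=== FILE: src/app.py ===" (not the real one).
HEADER = re.compile(r"^=== FILE: (?P<path>.+?) ===$")


def parse_file_blocks(output: str) -> dict[str, str]:
    """Split generated output into {relative_path: content}."""
    files: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group("path").strip()
            files[current] = []
        elif current is not None:
            files[current].append(line)
        # Lines before the first header are ignored.
    return {p: "\n".join(body).strip("\n") + "\n" for p, body in files.items()}


def write_files(files: dict[str, str], root: Path) -> list[Path]:
    """Write parsed files under root, refusing paths that escape it."""
    root = root.resolve()
    written = []
    for rel, content in files.items():
        target = (root / rel).resolve()
        if root not in target.parents:
            raise ValueError(f"path escapes project root: {rel}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

The path check matters in any design like this, because generated output is untrusted input and a path such as ../config should not be written outside the project.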
Requirements
- Cerebras API key — free tier includes ~1M tokens/day
- Claude Code with plugin support
- Works best alongside the superpowers and fresh-eyes-review plugins (falls back gracefully without them)
