<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blog on 2389 Research, Inc</title><link>https://2389.ai/blog/</link><description>Recent content in Blog on 2389 Research, Inc</description><generator>Hugo</generator><language>en-US</language><atom:link href="https://2389.ai/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Why We Built a Language for AI Pipelines</title><link>https://2389.ai/posts/why-we-built-a-language-for-ai-pipelines/</link><pubDate>Fri, 03 Apr 2026 10:00:00 -0500</pubDate><guid>https://2389.ai/posts/why-we-built-a-language-for-ai-pipelines/</guid><description>&lt;p>Last March, one of our engineers spent forty minutes debugging a broken pipeline. The fix: a missing backslash in a DOT file. One character, buried inside a string that looked like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>tool_command=&amp;#34;set -eu\nmkdir -p .ai .ai/drafts .ai/sprints\nif [ ! -f
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>.ai/ledger.tsv ]; then\n now=$(date -u +%Y-%m-%dT%H:%M:%SZ)\n printf
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;#39;sprint_id\\ttitle\\tstatus\\tcreated_at\\tupdated_at\\n001\\tBootstrap
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>sprint\\tplanned\\t%s\\t%s\\n&amp;#39; \&amp;#34;$now\&amp;#34; \&amp;#34;$now\&amp;#34; &amp;gt; .ai/ledger.tsv\nfi\n
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>printf &amp;#39;ledger-ready&amp;#39;&amp;#34;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>That&amp;rsquo;s a shell script. Six lines of bash: create a directory, write a TSV header if it doesn&amp;rsquo;t exist, print a status message. Nothing exotic. But inside a DOT attribute, every newline becomes &lt;code>\n&lt;/code>, every tab becomes &lt;code>\\t&lt;/code>, every quote becomes &lt;code>\&amp;quot;&lt;/code>. The script is there, but you can&amp;rsquo;t read it. You can&amp;rsquo;t edit it with confidence.&lt;/p></description></item><item><title>Word Compiler, A Context Compiler for Long-Form Fiction</title><link>https://2389.ai/posts/word-compiler/</link><pubDate>Wed, 01 Apr 2026 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/word-compiler/</guid><description>&lt;h2 id="the-problem">The problem&lt;/h2>
&lt;p>Writing a novel with an LLM is an exercise in frustration. You become a prompt engineer. You hand-craft system messages, copy-paste context, juggle character details across sessions, lose track of what the model &amp;ldquo;knows,&amp;rdquo; and watch prose degrade as the story outgrows the context window. Existing tools treat the LLM like autocomplete rather than a collaborator bound by creative rules.&lt;/p>
&lt;p>The author&amp;rsquo;s real contributions (voice, world, narrative intent) scatter across ad hoc prompts, vanish between sessions, and teach nothing to the next generation pass.&lt;/p></description></item><item><title>We Turned a 3D Printer Into an AI Portrait Artist</title><link>https://2389.ai/posts/we-turned-a-3d-printer-into-an-ai-portrait-artist/</link><pubDate>Fri, 20 Mar 2026 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/we-turned-a-3d-printer-into-an-ai-portrait-artist/</guid><description>&lt;p>What happens when you strap a pen to an old 3D printer and ask AI to channel Picasso? You get a photo booth that draws your portrait while you wait. We call it Micasso.&lt;/p>
&lt;h2 id="a-printer-gathering-dust">A printer gathering dust&lt;/h2>
&lt;p>We had a 3D printer sitting around the office doing nothing. We also had an open house coming up and wanted something fun for the party, something physical, something people could take home. The idea was simple: what if we could take someone&amp;rsquo;s photo, run it through AI to get a line drawing, and have the printer draw it with a pen?&lt;/p></description></item><item><title>Simmer: A Self Honing Skill</title><link>https://2389.ai/posts/simmer-skill/</link><pubDate>Fri, 13 Mar 2026 14:00:00 -0500</pubDate><guid>https://2389.ai/posts/simmer-skill/</guid><description>&lt;p>&lt;a href="https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-optimize-anything/">Berkeley researchers&lt;/a> showed that you can apply RL-style feedback loops to any text task as long as you can evaluate the output and give prioritized, actionable feedback. They call it Actionable Side Information (ASI). The goal is feedback focused on what to improve next. For an API that might be &amp;ldquo;the POST endpoint has no error responses.&amp;rdquo; For a story, &amp;ldquo;the pacing drops in paragraph two.&amp;rdquo; Focused enough that the generator can act on it without scattering.&lt;/p></description></item><item><title>Cookoff: Same Spec, Different Code</title><link>https://2389.ai/posts/cookoff-same-spec-different-code/</link><pubDate>Thu, 12 Mar 2026 10:00:00 -0500</pubDate><guid>https://2389.ai/posts/cookoff-same-spec-different-code/</guid><description>&lt;p>No plan survives contact with the enemy. Everyone has a plan until they get punched in the face. Pick your favorite version&amp;hellip; the point is the same. Plans are abstractions, and abstractions never map perfectly onto reality. By definition, they admit multiple valid implementations.&lt;/p>
&lt;p>But AI makes this harder to ignore, because now that gap between &amp;ldquo;clear spec&amp;rdquo; and &amp;ldquo;correct implementation&amp;rdquo; can produce genuinely different implementations in the time it used to take to produce one version. If you only ever look at the first &amp;ldquo;correct&amp;rdquo; implementation, you may be leaving useful information on the table.&lt;/p></description></item><item><title>Omakase: Show Me</title><link>https://2389.ai/posts/omakase-show-me/</link><pubDate>Thu, 12 Mar 2026 09:30:00 -0500</pubDate><guid>https://2389.ai/posts/omakase-show-me/</guid><description>&lt;p>Sometimes I do not know what I want until I have something concrete to react to. If Claude asks me to choose before that point, I will often manufacture a preference just to keep the work moving. Sometimes that works. Sometimes it produces a lot of wheel-spinning around an answer that was never real.&lt;/p>
&lt;p>I built &lt;a href="https://2389.ai/products/test-kitchen/">omakase&lt;/a> for that. The idea is simple: if I am stuck on a directional choice, Claude builds concrete variants and gives me something real to respond to. Not a bullet list of trade-offs. Not a tidy paragraph about pros and cons. An actual implementation I can look at, use, and judge.&lt;/p></description></item><item><title>Deliberation: Perspectives, Not Answers</title><link>https://2389.ai/posts/deliberation-perspectives-not-answers/</link><pubDate>Thu, 12 Mar 2026 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/deliberation-perspectives-not-answers/</guid><description>&lt;p>Some decisions get worse when the tooling asks me to choose too early.&lt;/p>
&lt;p>That sounds backward. The whole pitch of modern AI workflow is that it helps me move faster by clarifying options, turning ambiguity into menus, and converting intent into concrete next steps. Usually that is exactly what I want.&lt;/p>
&lt;p>But some decisions are not ready for a menu.&lt;/p>
&lt;p>The decisions that tend to matter most to me are often half-formed at first. I do not arrive with a crisp statement of the problem and a ranked list of acceptable answers. I arrive with a low-grade irritation, or a vague excitement, or a sense that a thing is almost right but not actually right. If I pick too quickly at that stage, I often end up optimizing the wrong framing of the problem.&lt;/p></description></item><item><title>The Dark Factory Is a .dot file</title><link>https://2389.ai/posts/the-dark-factory-is-a-dot-file/</link><pubDate>Mon, 09 Mar 2026 12:00:00 -0500</pubDate><guid>https://2389.ai/posts/the-dark-factory-is-a-dot-file/</guid><description>&lt;p>So StrongDM published a natural language spec for building a coding agent pipeline runner. Dan Shapiro built one. We built three. All of them — independently, in two languages, by different people with different goals — landed on the same three-layer architecture.&lt;/p>
&lt;p>I keep coming back to that. Not the code. The convergence. That&amp;rsquo;s the weird part.&lt;/p>
&lt;h2 id="the-attractor-pattern">The attractor pattern&lt;/h2>
&lt;p>In February, StrongDM open-sourced &lt;a href="https://github.com/strongdm/attractor">attractor&lt;/a>: three natural language specs describing a unified LLM client, a coding agent loop, and a DOT-based pipeline engine. The specs aren&amp;rsquo;t code. They&amp;rsquo;re prose. About 5,700 lines of it. Detailed enough that you can hand them to a coding agent and say &amp;ldquo;build this.&amp;rdquo; And it will.&lt;/p></description></item><item><title>Week 0 Nvidia DGX Spark Experiments</title><link>https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/</link><pubDate>Tue, 28 Oct 2025 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/</guid><description>&lt;p>One day I came into work and &lt;a href="https://harper.blog">Harper&lt;/a> (our
&lt;a href="https://2389.ai/team/harper-reed/">CEO&lt;/a>) asked me something along the lines of “What can we do
with this NVIDIA Spark box?” I had no idea what it was since it hadn’t been
released yet. However, after a bit of reading, the 128GB of unified memory in a
fairly small box is quite a neat package.

&lt;figure class="article-figure">
&lt;picture>
 &lt;source
 type="image/avif"
 srcset="https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_77976fe0c567402b.jpeg 600w, https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_a62f87daa7a80e3a.jpeg 900w, https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_68a368fc643b6b0e.jpeg 1200w"
 sizes="min(100vw, 800px)" />
 &lt;source
 type="image/webp"
 srcset="https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_3ac7c5c363c55215.webp 600w, https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_d58faa2f16acc28f.webp 900w, https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_2c80262f559baf81.webp 1200w"
 sizes="min(100vw, 800px)" />
 &lt;img
 src="https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_e478a3aa960231b.jpeg"
 srcset="https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_63524e73f49568bf.jpeg 600w, https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_1831a9f181b53cca.jpeg 900w, https://2389.ai/posts/week-0-nvidia-dgx-spark-experiments/IMG_4509_hu_e478a3aa960231b.jpeg 1200w"
 sizes="min(100vw, 800px)"
 alt="It is very very gold and shiny"
 width="1200"
 height="675" class="article-figure__image "
 loading="lazy"
 decoding="async" />
&lt;/picture>
&lt;figcaption class="article-figure__caption">
 It is very very gold and shiny
 &lt;/figcaption>
 &lt;/figure>&lt;/p></description></item><item><title>We Gave AI Agents Twitter and They Actually Got More Done</title><link>https://2389.ai/posts/ai-agents-doomscrolling-for-productivity/</link><pubDate>Tue, 30 Sep 2025 10:00:00 -0500</pubDate><guid>https://2389.ai/posts/ai-agents-doomscrolling-for-productivity/</guid><description>&lt;h1 id="what-we-found-when-we-gave-ai-agents-social-media">What We Found When We Gave AI Agents Social Media&lt;/h1>
&lt;p>In our &lt;a href="https://2389.ai/posts/agents-discover-subtweeting-solve-problems-faster/">first post&lt;/a>, we saw agents posting on social media and solving problems more efficiently, all while using fewer API calls, completing tasks faster, and reducing costs.&lt;/p>
&lt;p>Here&amp;rsquo;s how we tested whether these improvements were real.&lt;/p>
&lt;h2 id="our-methodology">Our Methodology&lt;/h2>
&lt;p>For this research, we benchmarked two Claude Code models (Sonnet 3.7 and Sonnet 4) across the &lt;a href="https://aider.chat/2024/12/21/polyglot.html#the-polyglot-benchmark">34 Aider Polyglot&lt;/a> Python challenges, a third-party benchmark derived from &lt;a href="https://exercism.org/">Exercism&lt;/a>&amp;rsquo;s hardest problems. We specifically chose a third-party benchmark to ensure that our results are comparable to other research in the field.&lt;/p></description></item><item><title>We Built Social Media for Agents and They Won't Stop Posting</title><link>https://2389.ai/posts/agents-discover-subtweeting-solve-problems-faster/</link><pubDate>Tue, 30 Sep 2025 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/agents-discover-subtweeting-solve-problems-faster/</guid><description>&lt;p>&lt;strong>At 2389, we&amp;rsquo;re building agents that collaborate with humans.&lt;/strong> Part of this involves investigating how agents collaborate with each other, humans, and our shared tools.&lt;/p>
&lt;p>We asked ourselves a few simple questions: Would our agents like to post to social media, or use blogs? Would it be fun to watch them blog? What would they blog about?&lt;/p>
&lt;p>To find out, we built two lightweight MCP servers for our agents: Social Media and Journals.&lt;/p></description></item><item><title>Brain Dump to Blog Post</title><link>https://2389.ai/posts/brain-dump-to-blog-post/</link><pubDate>Wed, 12 Mar 2025 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/brain-dump-to-blog-post/</guid><description>&lt;h2 id="how-to-leverage-llms-to-document-what-you-learn">How to Leverage LLMs to Document What You Learn&lt;/h2>
&lt;p>In today&amp;rsquo;s fast-paced software development landscape, innovative solutions and best practices often remain buried in scattered notes, hasty commits, and ad-hoc troubleshooting sessions. Like many developers, I&amp;rsquo;ve struggled to capture the full breadth of my problem-solving process—from initial brainstorming to final solution. But I&amp;rsquo;ve discovered something transformative: by leveraging Large Language Models (LLMs) throughout development, I can not only build robust systems but also turn my raw ideas into clear, comprehensive documentation.&lt;/p></description></item><item><title>Experimenting with GraphRAG: Adding Knowledge Graphs to RAG Pipelines</title><link>https://2389.ai/posts/experimenting-with-rag/</link><pubDate>Thu, 06 Mar 2025 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/experimenting-with-rag/</guid><description>&lt;p>Recently, my team and I have been experimenting with implementing aspects of
Microsoft&amp;rsquo;s GraphRAG and LazyGraphRAG pipelines. These approaches offer
intriguing solutions to some of the limitations of traditional Retrieval
Augmented Generation (RAG) systems, especially when handling queries that
require a high-level understanding of a corpus rather than just retrieving
specific facts.&lt;/p>
&lt;h2 id="the-rag-problem-space">The RAG Problem Space&lt;/h2>
&lt;p>A colleague of mine framed it well:
LLMs are strong at general knowledge and language generation tasks, but they
lack contextual knowledge about specific domains. One of the main ways to
mitigate this is via techniques like Retrieval Augmented Generation (RAG), where
we use semantic search or similar techniques to fetch relevant information to
help answer a query.&lt;/p></description></item><item><title>Self-Learning LLM Agents: A Fractal Approach to Domain-Specific Knowledge</title><link>https://2389.ai/posts/self-learning-llms/</link><pubDate>Wed, 08 Jan 2025 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/self-learning-llms/</guid><description>&lt;p>In training, LLMs gain a strong understanding of language, but they&amp;rsquo;re limited
by the fact that they only ever see knowledge up to a fixed cutoff point.
They&amp;rsquo;re also only optimized for general performance across domains, making them
broad, but not always deep. Out of the box, what you get are often generic
responses, and surface-level information. Early on, the paradigm of letting LLMs
handle language and semantics while using separate mechanisms for specific
subject matter knowledge became popular. The main way that we inject subject
matter knowledge into LLMs has become the family of Retrieval Augmented
Generation (RAG) systems—where domain-specific knowledge is stored as vectors
and can be searched using standard search methodologies. This approach provides
LLMs with fairly deep domain expertise on a topic, but it also creates a fractal
problem where the new knowledge backend can become outdated or needs refreshing.&lt;/p></description></item><item><title>Team Spirit Matters: How Collaborative Context Boosts Multi-Agent LLM Performance</title><link>https://2389.ai/posts/team-spirit-matters/</link><pubDate>Sun, 05 Jan 2025 09:00:00 -0500</pubDate><guid>https://2389.ai/posts/team-spirit-matters/</guid><description>&lt;p>Agents are increasingly becoming an integral part of our daily lives, solving
tasks big and small. One of our core hypotheses is that we&amp;rsquo;ll soon shift from
using single, monolithic agents toward systems involving hundreds or even
thousands of specialized agents. Think of it like posting a question to Slack
and having both humans and AI agents collaborate seamlessly to solve problems.
The fundamental idea here is that agents should adapt to &lt;em>us&lt;/em>, utilizing
communication paradigms we&amp;rsquo;re comfortable with—group chats, Slack threads,
Discord channels. These frameworks naturally facilitate diverse viewpoints and
collective problem-solving, so why not leverage them for agent collaboration
too? But how do we test this hypothesis? Let&amp;rsquo;s get down to business.&lt;/p></description></item></channel></rss>