In training, LLMs gain a strong understanding of language, but they only ever see knowledge up to a fixed cutoff date. They are also optimized for general performance across domains, which makes them broad, but not always deep.
Out of the box, what you get is often a generic response built on surface-level information. Early on, a paradigm became popular: let the LLM handle language and semantics, and use separate mechanisms for specific subject matter knowledge. The main way we now inject subject matter knowledge into LLMs is the family of Retrieval Augmented Generation (RAG) systems, where domain-specific knowledge is stored as vectors and searched using standard semantic search methods.
This approach provides LLMs with fairly deep domain expertise on a topic, but it also creates a fractal problem: the new knowledge backend can itself become outdated and need refreshing. Hence, we've seen search tools added to LLMs, allowing them to invoke Google search or similar services for more up-to-date information. Yet this doesn't fully eliminate the need for RAG backends, as there is still a benefit to having deeper, specific subject matter expertise beyond what a handful of Google searches might yield.
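As a concrete illustration of the vector-search idea behind a RAG backend, here is a minimal sketch. A toy bag-of-words hashing scheme stands in for a real embedding model, and the documents and function names are purely illustrative:

```python
import zlib
import numpy as np

# Toy "embedding": hash words into a fixed-size bag-of-words vector.
# A real RAG backend would use a learned embedding model instead.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Domain documents stored as vectors -- the "knowledge backend".
docs = [
    "Camembert is a soft cheese from Normandy",
    "Calvados is an apple brandy from Normandy",
    "Cassoulet is a slow-cooked bean stew from Occitania",
]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = doc_vecs @ embed(query)        # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

# The retrieved passages would be prepended to the LLM prompt as context.
print(retrieve("apple brandy"))
```

The only moving parts are an embedding function and a nearest-neighbor search; everything else in a production RAG stack (chunking, indexing, reranking) is elaboration on this core.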
At 2389, we believe the future is multi-agent. Rather than relying on a single monolithic agent, you’ll interact easily with hundreds or thousands of agents. In a conversation, a travel agent handles flights, a hotel agent manages bookings, and a dining agent curates local spots based on your preferences.
Can Agents Learn?
One of the major constraints is scale: how do we allow thousands of agents to build specific domain expertise? Traditionally, building knowledge bases meant custom-building and maintaining each one by hand, which clearly doesn't scale.
One of my personal areas of interest is exploring how to let an agent learn. Is it feasible to give an agent free rein to develop its own subject matter expertise over time, potentially adapting to a user’s interests? Even if the same agent is broadly available, a user-specific version could be tailored not only in its style and responses, but also in its underlying knowledge.
This post explores some recent experiments around self-learning agents. I’ve been playing with two projects:
- Autonomous Knowledge Base Creation: Can an agent build its own knowledge base around a given topic from minimal original input? This explores how we might get to thousands of agents without having to set up each one manually.
- Interactive Knowledge Augmentation: Using a similar pipeline, can an agent reflect on recent interactions, identify gaps in its own knowledge or areas of user interest, and then gather information to augment its knowledge base in response?
Fractal Nature of LLM Agents
Much of my work for this post is loosely inspired by projects like Google’s Co-Scientist and the Agent Laboratory paper, where agents make multiple passes over an activity to hone an idea or delve deeper into topics in a fractal design pattern. For instance, Google Co-Scientist uses iterative check cycles—an idea is generated, scored by other agents, and refined through multiple iterations. In the Agent Laboratory paper, agents cycle between roles (Postdoc, PhD student, software engineer, and machine learning engineer) to iteratively test and refine potential paper ideas.
As an e-commerce industry data scientist, much of my work has focused on backend semantic search algorithms. When implementing agents with RAG backends, it often comes down to fractal semantic search—whether we’re doing RAG, graphRAG, or a similar implementation. In a similar vein, the self-learning agent concept involves letting an agent examine an initial idea or conversation, extract items of interest, jargon, or concepts, and then research related ideas. Repeating this process at various depths yields increasingly detailed research into a topic.
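The fractal research pattern described above can be sketched as a short recursive loop. The `search` and `extract_concepts` functions here are stubs standing in for a real search tool and an LLM extraction call; only the control flow is the point:

```python
# Fractal research loop: search a topic, extract concepts from the result,
# and recurse on each concept down to a fixed depth.

def search(query: str) -> str:
    # Stub: a real implementation would call a web search tool here.
    return f"article about {query}"

def extract_concepts(document: str) -> list[str]:
    # Stub: a real implementation would ask an LLM to pull out
    # concepts, jargon, and related ideas from the document.
    return [f"{document} / subtopic {i}" for i in range(2)]

def research(topic: str, depth: int, corpus: list[str]) -> None:
    if depth == 0:
        return
    document = search(topic)
    corpus.append(document)
    for concept in extract_concepts(document):
        research(concept, depth - 1, corpus)   # drill one level deeper

corpus: list[str] = []
research("French cuisine", depth=2, corpus=corpus)
# With depth 2 and a branching factor of 2, this collects 1 + 2 = 3 documents.
print(len(corpus))
```

Depth and branching factor are the two knobs: each additional level multiplies the number of searches, which is why the later rounds in the experiments below are capped at a few dozen queries.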
From Zero to Not Quite a Hero
I won’t say that the agents become absolute domain experts after going through this pipeline, but they do learn interesting details and generate more compelling output than their off-the-shelf counterparts. The key is letting the agent decide how to use a search tool, performing multiple rounds of search in a fractal manner with cycles of introspection.
The setup works as follows:
- An agent receives a query or topic and decomposes it into a set of high-level queries. The goal of this first pass is to gather a superset of high-level information—timelines, design principles, and overarching themes.
- The agent then retrieves sets of search results for this initial superset. These websites are parsed into a markdown-like format, and from each, concepts, ideas, and topics are extracted. This enables further rounds of search based on the newly discovered information.
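The parsing step in the second bullet, flattening a fetched page into plain text before concept extraction, can be sketched with the standard library alone. Real pipelines typically use a dedicated extraction library; this stdlib-only version is a minimal stand-in:

```python
from html.parser import HTMLParser

# Minimal HTML-to-text flattener: keeps visible text, drops scripts and styles.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks: list[str] = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

html = "<html><body><h1>Normandy</h1><p>Known for cider and Camembert.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print("\n".join(parser.chunks))
```

The extracted text chunks then feed the concept-extraction prompt that drives the next round of searches.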
Example: French Cuisine
For instance, in one test, I had the agent learn about French cuisine with the following starting prompt:
“French Cuisine, regional specialties, ingredients, cooking principles, dishes, recipes”
From this, the bot generated a list of high-level topics to research:
1. Topic: Regions of France and their culinary characteristics
Goal: Identify and categorize the distinct regions of France to explore their unique culinary traditions and specialties.
2. Topic: Common ingredients used in French cuisine
Goal: Compile a comprehensive list of ingredients fundamental to French cooking, then analyze regional differences and traditional recipes.
3. Topic: Fundamental cooking techniques in French cuisine
Goal: Gather information on essential cooking methods used in French cuisine to provide insight into the principles behind dish preparation.
4. Topic: Traditional French dishes by region
Goal: Create a list of iconic dishes specific to each French region to understand their cultural significance.
5. Topic: Recipes for classic French dishes
Goal: Compile a collection of recipes for well-known French dishes to serve as practical examples of the identified techniques and ingredients.
Based on these topics, the agent then created more specific queries, such as:
Searching (1/34): List of regions in France with descriptions
Searching (2/34): Famous regional dishes in France by region
Searching (3/34): Local ingredients specific to French regions
Searching (4/34): Traditional French cooking methods by region
Searching (5/34): List of notable chefs and culinary schools in France by region
Searching (6/34): Traditional recipes from each region of France
Searching (7/34): Common ingredients in French cuisine
Searching (8/34): French cooking techniques
Searching (9/34): Regional specialties of French cuisine
In the first round, the agent collects a superset of information about French cuisine. It then performs additional rounds of search using context from the initial articles, drilling down into specifics of the regions and their culinary nuances, as the snippet below shows. Once this is done, all of the retrieved articles are added to the agent's backend RAG database.
Expanded search (1/40): List of culinary specialties by region in France
Expanded search (2/40): Overview of administrative regions of France
Expanded search (3/40): Historical recipes by French region
Expanded search (4/40): Culinary characteristics of French overseas regions
Expanded search (5/40): Culinary specialties of Provence-Alpes-Côte d'Azur
Expanded search (6/40): Culinary specialties of Brittany
Expanded search (7/40): Culinary specialties of Hauts-de-France
Expanded search (8/40): Culinary specialties of Normandy
Expanded search (9/40): Culinary specialties of Île-de-France
Expanded search (10/40): Culinary specialties of Grand Est
Expanded search (11/40): Culinary specialties of Occitania
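Before the retrieved articles land in the RAG database, they are typically split into overlapping chunks so that each embedding covers a focused span of text. A simple word-window chunker illustrates the idea; the window and overlap sizes here are illustrative, not the values used in the experiment:

```python
# Split a document into overlapping word windows for embedding.
def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    step = size - overlap          # advance by size minus overlap each window
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + size]
        if piece:
            chunks.append(" ".join(piece))
        if start + size >= len(words):
            break                  # last window already covers the tail
    return chunks

article = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(article)
print([len(c.split()) for c in chunks])   # a 120-word article yields [50, 50, 40]
```

The overlap keeps facts that straddle a window boundary retrievable from at least one chunk.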
Testing the Knowledge Base
As a loose sense check, I ran a sample query through both GPT-4o-mini via a web client (with search enabled) and my custom agent (GPT-4o-mini with access to the new French cuisine backend).
Sample Query: “What are the regional specialties of Normandy?”
Generic GPT-4o-mini Output:
Normandy has a ton of unique culinary treats that capture its history and local ingredients. Highlights include cheeses like Camembert de Normandie, Pont-l’Évêque, and Neufchâtel; apple-based treats from its orchards, including cider and Calvados; fresh seafood due to its extensive coastline; rich dairy products; and hearty traditional dishes like Tripes à la mode de Caen.
This response was reasonable but generic. Now, let’s see what the new agent does.
New Agent Output Flow:
Query Decomposition:
The agent breaks down the user query into refined search queries:
- Regional specialties of Normandy
- Normandy food traditions
- Traditional dishes from Normandy
Backend Search & Aggregation:
The agent searches its knowledge base for each refined query, extracts context, and then generates a comprehensive answer based on the aggregated results.
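The decomposition-and-aggregation flow above can be sketched as: decompose the user query into sub-queries, retrieve context for each, deduplicate, and assemble a single prompt for the final generation call. The `decompose` stub and the tiny in-memory knowledge base are hypothetical stand-ins for the LLM call and the RAG backend:

```python
def decompose(query: str) -> list[str]:
    # Stub: a real agent would ask the LLM for refined sub-queries.
    return [query, f"{query} traditions", f"traditional {query}"]

# Stand-in for the RAG backend: sub-query -> retrieved passages.
knowledge_base = {
    "regional specialties of Normandy": ["Camembert", "Calvados"],
    "regional specialties of Normandy traditions": ["cider", "Camembert"],
}

def retrieve(sub_query: str) -> list[str]:
    return knowledge_base.get(sub_query, [])

def build_prompt(query: str) -> str:
    context: list[str] = []
    for sub in decompose(query):
        for passage in retrieve(sub):
            if passage not in context:     # deduplicate the aggregated context
                context.append(passage)
    return f"Context: {'; '.join(context)}\n\nAnswer: {query}"

print(build_prompt("regional specialties of Normandy"))
```

The deduplication step matters in practice: overlapping sub-queries pull back many of the same passages, and repeating them wastes prompt budget.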
Final Response Example:
Normandy is renowned for its rich culinary traditions and regional specialties, characterized by a blend of local ingredients, historical influences, and distinctive dishes. Here are some key specialties:
1. Cheeses
- Camembert de Normandie: A soft cheese with a bloomy rind and creamy interior.
- Livarot: Recognized by its cylindrical shape and plaited straw circles, this soft cheese features a washed rind.
- Pont-l’Évêque: Known for its square shape and distinctive orange rind.
- Neufchâtel: A soft, bloomy rind cheese, often heart-shaped.
2. Seafood
- Mussels with Normandy Cream: Often served with local crème fraîche or cheese sauces.
- Shellfish: Including scallops, oysters, and prawns; dishes like Marmite Dieppoise (a fish stew) and Moules à la crème (mussels in a cream sauce) are popular.
3. Meat Dishes
- Pré-salé Lamb: Sourced from salt-marsh grazing near Mont-Saint-Michel.
- Tripe à la mode de Caen: A slow-cooked tripe dish enhanced by local cider.
- Andouille de Vire: A seasoned sausage made from pork tripe.
4. Ciders and Calvados
- Cider: A staple in Normandy, available in a range of styles from dry to sweet.
- Calvados: An apple brandy used both as a beverage and in various sauces.
5. Pastries and Desserts
- Tarte Normande: An apple tart reflecting the region’s apple-growing heritage.
- Galettes and Crêpes: Savory buckwheat pancakes and sweet crepes, often paired with cider.
Conclusion:
Normandy’s culinary offerings reflect its rich heritage—emphasizing the “Four C’s”: Cider, Calvados, Cream, and Camembert. Whether through its exquisite cheeses, fresh seafood, or hearty meat dishes, Normandy delivers a culinary experience steeped in tradition.
One of our team members, whose family is from Normandy, gave this experiment a thumbs up—so I’ll count that as a win for this initial test.
Letting Agents Learn Over Time
The second aspect of this experiment is exploring whether an agent can learn in response to user queries or conversations—such that a given user’s version of the agent may become specialized in certain ways compared to a broadly available base agent.
The process is similar to researching from zero, but with some additional experimentation:
- Query Generation: The agent uses an LLM to generate candidate queries based on current context and gaps in its knowledge. These queries aim to cover areas where updated or new information might be beneficial.
- Query Scoring: Each generated query is evaluated and scored (e.g., on a scale of 0–100) based on its relevance, novelty, and alignment with the user’s context.
- Select Top N Queries: The system filters the queries by selecting the top N that score above a predetermined threshold. If there aren’t enough high-scoring queries, the generation step is repeated (with a cutoff to avoid infinite loops).
- Execute Searches: For each selected query, the agent performs searches (semantic or vector-based) against a data source to retrieve the top K relevant results.
- Parse & Ingest Results: The retrieved search results are parsed to extract and clean useful information, which is then incorporated into the agent’s evolving knowledge base.
- Update Knowledge Base & Indexes: The new data is integrated into the knowledge base, and any semantic search indexes are updated, ensuring the repository remains enriched and current.
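The generate-score-select portion of this loop can be sketched compactly. The random score is a placeholder for an LLM judge, and the threshold, batch size, and retry cutoff are illustrative values, not the ones used in the experiments:

```python
import random

random.seed(0)  # reproducible for this sketch

def generate_queries(context: str) -> list[str]:
    # Stub: a real agent would ask the LLM for candidate queries
    # covering gaps in its knowledge.
    return [f"{context} detail {i}" for i in range(5)]

def score(query: str) -> int:
    # Stub: a real agent would have an LLM judge rate relevance,
    # novelty, and alignment with the user's context (0-100).
    return random.randint(0, 100)

def select_queries(context: str, n: int = 3, threshold: int = 70,
                   max_rounds: int = 5) -> list[str]:
    selected: list[str] = []
    for _ in range(max_rounds):              # cutoff avoids infinite loops
        scored = [(q, score(q)) for q in generate_queries(context)]
        selected += [q for q, s in scored
                     if s >= threshold and q not in selected]
        if len(selected) >= n:
            return selected[:n]
    return selected[:n]

queries = select_queries("Smyth and Loyalist chefs")
print(queries)
```

Each selected query then flows into the execute-parse-ingest steps above, exactly as in the from-zero pipeline.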
Example: Updating Restaurant Knowledge
Consider this sample conversation summary:
"Asked about restaurants in West Loop Chicago, showed an interest in Italian, French, and Japanese cuisine. Also, some interest in the chefs of Smyth and Loyalist."
In this scenario, the agent’s existing knowledge base covers a few hundred restaurants, but lacks detailed, up-to-date information on specifics like chefs, current menus, or unique culinary approaches.
Query Generation Example:
Selected top 3 queries:
1. What is the culinary background and philosophy of the chefs at Smyth and Loyalist? (Score: 90)
2. Can you provide information on the signature dishes and unique ingredients used in the French restaurants of the West Loop? (Score: 88)
3. Are there any special dining experiences or events hosted by Smyth or Loyalist that attract diners? (Score: 85)
Here, the agent identifies that its knowledge of restaurants in West Loop Chicago is decent but that it lacks specific details about Smyth and Loyalist. Two of the three queries focus on that gap, while the third covers broader French cuisine ingredients.
Once these queries are executed, the agent pulls down relevant web pages, parses them into markdown, and integrates the information into its backend RAG database. For instance, for the query "Chef John Shields" (executive chef at Smyth and Loyalist), the top two results are:
- Result 1:
Title: Smyth | Smyth and The Loyalist
URL: smythandtheloyalist.com/smyth/
Content:
Three-Michelin-starred Smyth, one of only thirteen restaurants in the country with such an honor, creatively journeys to elevate North American farmers, fishermen, and makers at the peak of their season. Husband-and-wife chefs John Shields and Karen Urie Shields leverage 30 years of culinary experience from places like Alinea, Tru, and Charlie Trotter’s.
- Result 2:
Title: John’s Food and Wine
Content:
A New American spot offering a seasonal menu with dishes like beef tartare and whole butterflied mackerel in a casual setting.
Even if some of the later results are less relevant, the agent can synthesize enough useful context to answer questions about Chef John Shields.
Closing Thoughts
These experiments are small steps toward creating agents that can learn and adapt from their interactions with users—a move toward building thousands of agents without needing to custom-build each one. There are, however, many future directions and challenges to tackle:
- Verifying the Quality: How do we verify the quality of information added to the knowledge base?
- Computational Efficiency: Is running hundreds of searches in production feasible?
- Information Sharing: How should agents in similar domains share information or learnings? At what level does this sharing occur? Does information learned by one agent propagate back to the overarching knowledge base for all agents?
While these examples are basic, they illustrate the potential for agents that learn and evolve in response to user interactions. The journey from a generic language model to a specialized, self-improving agent is just beginning—and the possibilities are exciting and vast.