For me, 2024 was the year of RAG, and 2025 was the year of Agents. 2026 will be the year of Agentic RAG. This blog post explains how Agentic RAG differs from the 2024 version of RAG. After some theory, you follow along as I create an application that lets you chat with several documents using Agentic Search, all made possible by the new Embabel implementation of RAG.
What did the 2024 RAG look like? #
LLMs (Large Language Models) do not have detailed knowledge of everything. They have a knowledge cut-off from when they were trained. Smaller models generally contain less factual knowledge; they focus on understanding language, not on knowing all the facts. If you want an LLM to know about something recent or personal, you have to provide it in the prompt you pass to the LLM with your request.
Thus RAG (Retrieval Augmented Generation) was born. The user request is sent to a retrieval system, often a vector database, to find chunks similar to the user's request. A good example is a dataset of frequently asked questions: when a user asks a question, you can find a similar one and add its answer to the prompt to let the LLM answer the user's question.
It is essential to have chunks with a scope similar to the user's question. If you are looking for a specific part of a book, a vector of the entire book is not sufficient; you need vectors of paragraphs or sections. To find matching chunks, you need the right chunking strategy.
Once you have identified the right chunks, the retrieval strategy is just as important. You can go up in the hierarchy: if you find a matching paragraph, you can return the whole section. Or if you found the matching question, retrieve its answer.
Visit one of my previous blogs if you want to read more about this topic.
Getting the proper context for RAG comes down to choosing the right chunking and retrieval strategies.
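To make the classic flow concrete, here is a minimal sketch of a 2024-style RAG pipeline. The VectorStore and ChatModel interfaces are hypothetical stand-ins for whatever retrieval system and LLM client you use; this is an illustration, not the Embabel API.

import java.util.List;

// Hypothetical interfaces standing in for your retrieval system and LLM client.
interface VectorStore { List<String> findSimilar(String query, int topK); }
interface ChatModel { String generate(String prompt); }

String answerWithRag(String question, VectorStore store, ChatModel llm) {
    // 1. Retrieve the chunks that are semantically closest to the question.
    List<String> chunks = store.findSimilar(question, 5);
    // 2. Add the retrieved chunks to the prompt as context.
    var prompt = """
            Answer the question using only the context below.

            Context:
            %s

            Question: %s
            """.formatted(String.join("\n---\n", chunks), question);
    // 3. Let the LLM generate the answer from the augmented prompt.
    return llm.generate(prompt);
}

The retrieval happens exactly once, before the LLM call; the model never gets a second chance to search. That limitation is precisely what Agentic RAG addresses.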
Agentic retrieval in 2025 #
Agents automated parts of working with an LLM; they introduced long-running processes that can work for you in the background. Agents brought an execution environment for tools, and of course, MCP. With MCP, more tools became available, and vector databases became first-class citizens for agents to retrieve information. Boy, how often have we been disappointed? It was hard to find semantically similar text that delivered real value to the agent. The reason is often the mismatch, discussed in the previous section, between the vectorised chunks and the input from the user or agent.
There are hundreds of agent frameworks, and I have written about several of them on this blog. One of them is Embabel.
Building Agents with Embabel: A Hands-On Introduction
The 0.3.1 release of Embabel introduces Agentic RAG. The implementation is very close to what I consider the right way of doing RAG, and it helps agents get the most out of it. Of course, this still means you have to explain to Embabel what content you have and, most of all, choose the right strategy for handling it. You must choose the datastore wisely: an inverted index, a vector store, a graph database, or a smart combination. That flexibility is exactly what Embabel now offers. In the next section, you follow along while I create a sample application using the Embabel RAG implementation.
Embabel Agentic RAG #
With Embabel Agentic RAG, the LLM has full control over the retrieval process. It decides when to search, which queries to run, and which items to retrieve. The LLM can use the results to fetch more data, expand the scope, or even modify the query.
Content access is provided by stores. Each store can provide one or multiple SearchOperations, and these search operations are exposed as tools. Known search operations are VectorSearch, TextSearch, ResultExpander, and RegexSearchOperations. The Lucene store exposes all of them, whereas the Spring AI VectorStore exposes only the vector search tools.
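As you will see in the application logs later in this post, these search operations surface to the LLM as named tools, prefixed with the store's name. Roughly, the tool calls look like this (names and parameters taken from the logs shown further down):

sources_vectorSearch({"query": "...", "topK": 10, "threshold": 0.2})
sources_textSearch({"query": "\"...\" AND (... OR ...)", "topK": 20, "threshold": 0.2})

The ResultExpander tools are registered in the same way, as the startup logs will show.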
The source code #
You can find the sample application at GitHub.
GitHub - jettro/embabel-agent-rag-sample: A project demonstrating the embabel RAG implementation
The repository contains the agent, a Java Embabel project. It also has a frontend project built with Vite, React, and Chakra UI. The data folder contains Markdown files for 10 of my blog posts. The extract folder contains a Python script that extracts markdown files from blogs.
Setting up the project #
Dependencies for Embabel and Spring Boot are managed in the parent project’s POM. The project’s actual dependencies are configured in the agent module’s pom.
The project uses OpenAI models for LLM and embeddings. Therefore, we use the OpenAI starter.
<dependency>
    <groupId>com.embabel.agent</groupId>
    <artifactId>embabel-agent-starter-openai</artifactId>
    <version>${embabel-agent.version}</version>
</dependency>

For RAG, you need to add a store. The project uses the Lucene store. Therefore, the project needs the following dependency.
<dependency>
    <groupId>com.embabel.agent</groupId>
    <artifactId>embabel-agent-rag-lucene</artifactId>
    <version>${embabel-agent.version}</version>
</dependency>

On the JVM, Apache Tika is a well-known library for parsing a wide range of content. Embabel has a wrapper around Tika to make the integration effortless.
<dependency>
    <groupId>com.embabel.agent</groupId>
    <artifactId>embabel-agent-rag-tika</artifactId>
    <version>${embabel-agent.version}</version>
</dependency>

Embabel ChatBot #
All Embabel agents support RAG integration. For this example, the ChatBot agent is chosen; I leave the inner workings of the ChatBot agent for another blog. The agent centres around a Conversation object that holds the UserMessage and AssistantMessage objects and provides them to the LLM as conversation history.
For this blog, we focus on the RAG integration.
Configure the LuceneSearchOperations #
We begin by initialising the Lucene-based search operations. The following code block configures the Spring Bean for the LuceneSearchOperations.
@Bean
LuceneSearchOperations luceneSearchOperations(ModelProvider modelProvider) {
    var embeddingService = modelProvider.getEmbeddingService(DefaultModelSelectionCriteria.INSTANCE);
    return LuceneSearchOperations
            .withName("sources")
            .withEmbeddingService(embeddingService)
            .withChunkerConfig(new ContentChunker.DefaultConfig(800, 100, false))
            .withIndexPath(Path.of("./.lucene-index"))
            .buildAndLoadChunks();
}

The model provider is an Embabel bean configured in the Embabel autoconfigure classes. In addition to the embedding service, you need to provide the chunking strategy. This example configures the default ContentChunker. This chunker operates on a Root that contains containers; each container can contain other containers or a leaf, which is the end of the nested path. Also notice the path where the Lucene index is stored.
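As an illustration, a Markdown blog post could map onto that hierarchy roughly like this; the exact mapping depends on the content reader you use:

Root (the document)
├── Container (section)
│   ├── Leaf (paragraph chunk)
│   └── Leaf (paragraph chunk)
└── Container (section)
    └── Container (subsection)
        └── Leaf (paragraph chunk)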
Wrap SearchOperations in the ToolishRag object #
The ChatBot agent does not need a goal. It contains actions for handling user requests. First, create the ChatActions object.
@EmbabelComponent
public class ChatActions {
    private static final Logger logger = LoggerFactory.getLogger(ChatActions.class);

    private final ToolishRag toolishRag;

    public ChatActions(SearchOperations searchOperations) {
        this.toolishRag = new ToolishRag(
                "sources",
                "sources for answering user questions",
                searchOperations);
    }
}

The injected SearchOperations is the LuceneSearchOperations bean from the previous code sample. Notice the creation of the ToolishRag instance with a name and a short description. Next, we need an action that calls the LLM with the provided Conversation and ActionContext.
@Action(canRerun = true, trigger = UserMessage.class)
public void respond(Conversation conversation, ActionContext context) {
    var lastUserMessage = conversation.lastMessageIfBeFromUser();
    if (lastUserMessage != null) {
        logger.info("Received user message as last message: {}", lastUserMessage.getContent());
        var assistantMessage = context.ai()
                .withLlmByRole(CHEAPEST.name())
                .withReferences(toolishRag)
                .withSystemPrompt("You are a helpful assistant. Answer questions concisely.")
                .respond(conversation.getMessages());
        context.sendMessage(conversation.addMessage(assistantMessage));
    } else {
        logger.info("Received non-user message");
    }
}

Notice the canRerun field, which specifies whether the action can be rerun. The trigger is a special mechanism that ensures the action only handles a specific UserMessage the first time it is added to the conversation. Note that we read the user message from the conversation with the method lastMessageIfBeFromUser. Also note that for the LLM call, the model to use is configured via a role; this role is specified in the application.yml.
embabel:
  models:
    default-llm: gpt-5-mini
    default-embedding-model: text-embedding-3-small
    llms:
      CHEAPEST: gpt-5-mini
      standard: gpt-5-mini
      best: gpt-5
    embedding-services:
      fast: text-embedding-3-small
      accurate: text-embedding-3-large
  agent:
    logging:
      personality: montypython

To make the RAG tools available to the agent, you just configure withReferences. That is it.
Ingesting documents into the Lucene store #
Before you can retrieve documents from Lucene, you have to ingest them. Embabel makes this easy: it reuses the SearchOperations to expose the store, and in this case, Tika extracts chunks in a hierarchical manner. The following code block reads all files in the data folder, creates chunks, and generates embeddings for those chunks.
@PostMapping("/ingest")
public String ingestData() {
    var dataPath = Path.of("./data");
    int count = 0;
    try (var stream = Files.list(dataPath)) {
        var files = stream.filter(Files::isRegularFile).toList();
        for (Path file : files) {
            var fileUri = file.toAbsolutePath().toUri().toString();
            // The never-refresh policy only ingests the URI if it is not in the store yet.
            var ingested = NeverRefreshExistingDocumentContentPolicy.INSTANCE.ingestUriIfNeeded(
                    searchOperations,
                    new TikaHierarchicalContentReader(),
                    fileUri
            );
            if (ingested != null) {
                count++;
            }
        }
    } catch (java.io.IOException e) {
        logger.error("Error reading data directory", e);
        return "Error reading data directory: " + e.getMessage();
    }
    return "Successfully ingested " + count + " files";
}

If you are curious, the extract folder contains a Python script that converts my blog posts to Markdown files. It uses the trafilatura library to do so. I leave it to you to review the source code.
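With the application running, you can trigger the ingestion with a plain HTTP POST. Assuming the default Spring Boot port and no additional path prefix on the controller, the call looks like this:

curl -X POST http://localhost:8080/ingest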
Calling the RAG enhanced agent #

GUI showcasing Embabel Agentic RAG at work
To demonstrate that it works, let's look at some application logs. I used the Monty Python personality for the logs; I hope it does not bother you.
These lines show that the LuceneSearchOperations are initialised. Can you find the tools that ToolishRag exposes?
00:45:28.321 [main] INFO LuceneSearchOperations - Using disk-based Lucene index at: ./.lucene-index
00:45:28.321 [main] INFO LuceneSearchOperations - Manually triggering chunk loading from disk...
00:45:28.321 [main] INFO LuceneSearchOperations - Triggering lazy loading of existing chunks...
00:45:28.321 [main] INFO LuceneSearchOperations - Starting to load existing chunks from disk index...
00:45:28.322 [main] INFO LuceneSearchOperations - Opening DirectoryReader to read from disk
00:45:28.339 [main] INFO LuceneSearchOperations - Successfully opened reader. Index has 449 documents (maxDoc: 449)
00:45:28.387 [main] INFO LuceneSearchOperations - ✅ Loaded 449 existing elements from disk index ./.lucene-index
00:45:28.389 [main] INFO ToolishRag - Adding VectorSearchTools to ToolishRag tools sources
00:45:28.389 [main] INFO ToolishRag - Adding TextSearchTools to ToolishRag tools sources
00:45:28.389 [main] INFO ToolishRag - Adding ResultExpanderTools to ToolishRag tools sources

Below is the system message generated using the SearchOperations. Note how the Lucene search operations are exposed to the LLM.
SYSTEM <Current date: 2026-01-08
----
Reference: sources
Description: sources for answering user questions
Tool prefix: sources
Notes: Lucene search syntax support: Full support
Hints:
Search acceptance criteria:
Continue search until the question is answered, or you have to give up.
Be creative, try different types of queries.
Be thorough and try different approaches.
If nothing works, report that you could not find the answer.
----
You are a helpful assistant. Answer questions concisely.
----
Knowledge cutoff: 2024-05
>

These logs show the first query performed by the tools, as requested by the agent.
00:46:11.084 [tomcat-handler-10] INFO FlyingCircus - [vibrant_shirley] (respond) calling tool sources_vectorSearch({"query":"Embabel blog Embabel blog \"Embabel\"","topK":10,"threshold":0.2})
00:46:11.086 [tomcat-handler-10] INFO VectorSearchTools - Performing vector search with query='Embabel blog Embabel blog "Embabel"', topK=10, threshold=0.2
00:46:13.876 [tomcat-handler-10] INFO LuceneSearchOperations - Vector search for query 'Embabel blog Embabel blog "Embabel"' found 20 results
00:46:13.881 [tomcat-handler-10] INFO FlyingCircus - [vibrant_shirley] (respond) tool sources_vectorSearch returned 20 results:
chunkId: 068577f8-cbee-4080-b8f8-4b69559cc24b 0.78 - This is my blog about Embabel
This is a sample blog post about the Embabel project.
... A lot more here ...

Next in the logs, you see the agent fine-tuning its queries. This time it performs a text search only, using AND and OR operators to better select the right content.
00:46:17.602 [tomcat-handler-10] INFO FlyingCircus - [vibrant_shirley] (respond) calling tool sources_textSearch({"query":"\"Embabel\" AND (blog OR \"blog post\" OR tutorial OR \"Building Agents with Embabel\" OR jettro.dev OR \"jettro\" OR \"Medium\")","topK":20,"threshold":0.2})
00:46:17.603 [tomcat-handler-10] INFO TextSearchTools - Performing text search with query='"Embabel" AND (blog OR "blog post" OR tutorial OR "Building Agents with Embabel" OR jettro.dev OR "jettro" OR "Medium")', topK=20, threshold=0.2
00:46:17.622 [tomcat-handler-10] INFO LuceneSearchOperations - Text search for query '"Embabel" AND (blog OR "blog post" OR tutorial OR "Building Agents with Embabel" OR jettro.dev OR "jettro" OR "Medium")' found 11 results
00:46:17.623 [tomcat-handler-10] INFO FlyingCircus - [vibrant_shirley] (respond) tool sources_textSearch returned 11 results:

This happens a few more times: the agent performs a different text search and a different vector search. To be honest, the question is a bit broad, so it needs a few passes to make sure it can be answered. But in the end, we do get back a decent answer.
00:46:44.234 [tomcat-handler-10] INFO FlyingCircus - [vibrant_shirley] (dev.jettro.knowledge.ChatActions.respond-java.lang.String) received LLM response of type String from ByRoleModelSelectionCriteria(role=CHEAPEST) in 36 seconds
00:46:44.235 [tomcat-handler-10] INFO ControllerOutputChannel - Response MessageOutputChannelEvent: Yes — a few good blog posts and tutorials about Embabel I can point you to (mostly from Jettro Coenradie, who has written practical guides and examples):
- Building Agents with Embabel: A Hands-On Introduction — Jettro Coenradie
https://jettro.dev/building-agents-with-embabel-a-hands-on-introduction-4f96d2edeac0
(Hands‑on walkthrough for creating an Embabel agent on the JVM; also mirrored on Medium: [https://medium.com/@jettro.coenradie/building-agents-with-embabel-a-hands-on-introduction-4f96d2edeac0)](https://medium.com/@jettro.coenradie/building-agents-with-embabel-a-hands-on-introduction-4f96d2edeac0))
... more content here ...

I hope you are as enthusiastic as I am about this new Embabel feature. In the next blog, I take a deep dive into the ChatBot agent used in this example.