    Active Retrieval-Augmented Generation – For Even Better Responses

    Oren Ezra

    CMO, K2view

    Active retrieval-augmented generation improves passive RAG by fine-tuning the retriever based on feedback from the generator during multiple interactions.    

    Passive vs Active RAG 

    Text generation is the process of creating natural language text from inputs like prompts and queries. Text generation is usually based on Large Language Models (LLMs) trained on massive amounts of publicly available information (usually via the Internet). The problem is that LLMs don’t have access to all the relevant information or data needed to generate accurate, personalized responses.

    To address this issue, researchers came up with the idea of augmenting LLMs with an enterprise’s own internal knowledge bases and systems – called Retrieval-Augmented Generation (RAG).

    Simply explained, RAG consists of 2 components: a retriever and a generator. The retriever identifies and chooses the most relevant data from the internal sources based on the prompt or query. The generator integrates the retrieved data into the LLM, which then produces a more informed answer.  

    Depending on how the retriever and the generator interact with each other, RAG-LLM integration can be passive or active. The next 2 sections compare the two approaches.

    Passive Retrieval-Augmented Generation 

    Passive RAG is characterized by one-way interaction between retriever and generator. Basically, the retriever chooses the most relevant internal data and passes it on to the generator. The generator, in turn, produces the response without any additional help from the retriever.

    So, the retriever can’t adjust its selection based on partial text created by the generator, and the generator can’t request additional information from the retriever during the generation process.
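    The one-way flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `DOCS` list, the word-overlap `score`, and the `generate` stand-in (which only builds the augmented prompt instead of calling an LLM) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would query an index.
DOCS = [
    "Every employee earns 1.5 vacation days for every month worked.",
    "Expense reports must be submitted within 30 days.",
    "The office is closed on public holidays.",
]

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,?!").lower() for w in text.split()]

def score(query, doc):
    """Toy relevance score: number of words the query and doc share."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(doc))).values())

def retrieve(query, docs, k=1):
    """One-shot retrieval: rank the docs once and return the top k."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

def generate(query, context):
    """Stand-in for an LLM call: only builds the augmented prompt."""
    return "Context: " + " ".join(context) + "\nQuestion: " + query

def passive_rag(query, docs):
    context = retrieve(query, docs)   # step 1: retrieve once
    return generate(query, context)   # step 2: generate once, with no feedback

print(passive_rag("How many vacation days do I earn?", DOCS))
```

    Note that `retrieve` runs exactly once and never sees anything the generator produces, which is what makes the flow passive.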

    Although passive RAG is relatively simple – with one-step retrieval and one-step generation – it has its limitations, such as the:  

    • Retriever not being able to choose the most relevant data, because it lacks context in terms of both the question and the questioner

    • Generator not being able to use the retrieved data in the best way, due to incompleteness, irrelevance, or redundancy

    • Framework coming up with contradictory or inconsistent answers, since it can’t verify and/or update the retrieved data 

    Passive RAG examples: 

    Method | Retriever basis | Generator basis
    RAG (Lewis…, 2020) | Bi-encoder | Pre-trained language model
    REALM (Guu…, 2020) | Masked language modeling, trained with the generator |
    Kaleido-BERT (Zhu…, 2020) | Graph neural network objective, with a transformer encoder |

    Active Retrieval-Augmented Generation 

    Active RAG is characterized by two-way interaction between retriever and generator, in which both communicate with each other during the generation process to update the retrieved data and the generated text on the fly.

    With active RAG, the retriever can choose different data elements for different generation steps, based on partial text produced by the generator. The generator, in turn, can request additional data from the retriever during the generation process, based on its uncertainty.
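    A minimal sketch of that two-way loop follows. Everything in it is an assumption made for illustration: the `DOCS` list, the word-overlap retriever, and especially `toy_generate_step`, whose hard-coded rule stands in for whatever uncertainty signal a real generator would expose. It does not reproduce any of the published methods listed in this article.

```python
# Hypothetical knowledge base for the example.
DOCS = [
    "Parker works in the Boston office.",
    "The Boston office closes every Friday.",
    "Expense reports are due within 30 days.",
]

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query, exclude=()):
    """Return the unseen doc sharing the most words with the query, or None."""
    candidates = [d for d in DOCS if d not in exclude]
    best = max(candidates,
               key=lambda d: len(tokenize(query) & tokenize(d)),
               default=None)
    if best is None or not (tokenize(query) & tokenize(best)):
        return None
    return best

def toy_generate_step(question, context):
    """Stand-in for an LLM step: returns (draft, follow_up_query_or_None)."""
    draft = " ".join(context)
    # Toy uncertainty rule (assumption): an office is named, its schedule is not.
    if "office" in draft and "closes" not in draft:
        return draft, "Boston office opening days"
    return draft, None

def active_rag(question, retrieve, generate_step, max_steps=3):
    context, query, draft = [], question, ""
    for _ in range(max_steps):
        doc = retrieve(query, exclude=context)   # retriever answers the current query
        if doc is not None:
            context.append(doc)
        draft, follow_up = generate_step(question, context)
        if follow_up is None:                    # generator is confident: stop
            break
        query = follow_up                        # generator asks for more data
    return draft, context

draft, context = active_rag("Can I meet Parker in person this Friday?",
                            retrieve, toy_generate_step)
print(draft)
```

    The second retrieval is driven by the generator's follow-up query rather than by the user's original question, which is the behavior that separates this loop from a single retrieve-then-generate pass. The `max_steps` cap keeps the extra cost bounded.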

    Although active RAG is more complex – because it entails multiple retrieval and generation steps – it’s also more flexible and may address some of the limitations of passive RAG, such as the:

    • Retriever choosing the most relevant data, since it can put the query and the questioner into better context

    • Generator using the retrieved data more effectively, because it can be matched to the specific generation

    • Framework producing more accurate and relevant answers, due to its ability to verify and/or update the retrieved data 

    Active RAG examples: 

    Method | Retriever basis | Generator basis
    ReCoSa (Zhang …, 2019) | Transformer encoder, with a self-attention mechanism | Hierarchical recurrent neural network
    ARAG (Zhao …, 2020) | Bi-encoder, with a reinforcement learning algorithm | Pre-trained language model
    FiD (Izacard & Grave, 2020) | Bi-encoder, with a fusion-in-decoder mechanism |

    RAG Summary 

    Retrieval-augmented generation has the potential to address the shortcomings of traditional text generation models. By incorporating internal knowledge in the content generation phase, RAG is setting a new standard for relevance and accuracy.

    RAG GenAI frameworks are beneficial to enterprises because they can: 

    1. Integrate trusted, internal data with the external data your LLM has been trained on.

    2. Sync with your source systems, ensuring your data and LLM are current.

    3. Enhance user trust, by providing data accuracy, freshness, and relevance. 

    With one-step retrieval and generation, passive RAG is relatively simple but also limited by the quality and relevance of the existing data.

    With multiple retrieval and generation steps, active RAG is more complex but also more flexible in addressing the limitations of passive RAG.

    In its 2024 report, “How to Supplement Large Language Models with Internal Data”, Gartner explains how enterprises can prepare for RAG AI deployments. Those recommendations are summarized as Gartner RAG tips in a separate article.

    Don’t Take RAG Too Personally 

    So far, RAG of any kind (passive or active) has concerned itself with docs retrieved from internal knowledge bases, meaning that the information gleaned from the vector files is generalized rather than personalized.

    For example, if you wanted to check your vacation day status in your organization, your RAG chatbot might provide a generalized answer based on a company policy doc retrieved from your knowledge base. Let’s face it, a response like, “Every employee earns 1.5 vacation days for every month worked” isn’t very personal, is it?

    Now imagine a framework that could go right to the source, accessing all the data related to you, as an employee, within your organization. Wouldn’t an answer like, “Hey Parker, you’ve got 13 vacation days at your disposal this year – 10 credited from last year + 3 for the 2 months you worked this year” be more helpful? That’s where DAG comes in.
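    The arithmetic behind that personalized answer is simple once the employee's own record is available. The sketch below assumes a hypothetical `Employee` record with a carried-over balance and a month count; the 1.5 days per month rate comes from the policy quoted above.

```python
from dataclasses import dataclass

ACCRUAL_PER_MONTH = 1.5  # vacation days earned per month, per the quoted policy

@dataclass
class Employee:
    """Hypothetical record shape, assumed for this example."""
    name: str
    carried_over_days: float
    months_worked_this_year: int

def vacation_balance(emp):
    """Carried-over days plus days accrued so far this year."""
    return emp.carried_over_days + ACCRUAL_PER_MONTH * emp.months_worked_this_year

def personalized_answer(emp):
    total = vacation_balance(emp)
    return f"Hey {emp.name}, you've got {total:g} vacation days at your disposal this year."

parker = Employee(name="Parker", carried_over_days=10, months_worked_this_year=2)
print(personalized_answer(parker))  # 10 + 1.5 * 2 = 13
```

    A policy document can only supply the rate. The carried-over balance and the months worked have to come from the employee's own data.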

    Data-Augmented Generation to the Rescue 

    Data-Augmented Generation (DAG) accesses enterprise data from structured data sources for customer-facing scenarios, where personalized, accurate responses are required in real time. Use cases include human customer care agents answering customer service calls and chatbots providing self-service technical support to authenticated users on a company’s portal.

    For this to happen, you need to organize your enterprise data by data products – according to the business entities that are relevant for your use case, be they customers, employees, suppliers, products, loans, etc. 

    DAG manages the data for each business entity in its own, high-performance Micro-Database™ – for real-time data retrieval.  

    Designed to continually sync with underlying sources, the Micro-Database can be modeled to capture any business entity, and comes with built-in data access controls and dynamic data masking capabilities. And its data is compressed by up to 90%, enabling billions of Micro-Databases to be managed concurrently on commodity hardware.
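    To make the idea of organizing data by business entity concrete, here is a small sketch that groups rows from several source systems under each entity's key and masks sensitive fields on read. It only illustrates the concept and says nothing about how the Micro-Database is actually implemented; the source names, row shapes, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from two source systems, both keyed by customer_id.
CRM_ROWS = [
    {"customer_id": 1, "name": "Parker"},
    {"customer_id": 2, "name": "Dana"},
]
BILLING_ROWS = [
    {"customer_id": 1, "invoice": "INV-7", "amount": 40.0},
    {"customer_id": 1, "invoice": "INV-9", "amount": 15.5},
    {"customer_id": 2, "invoice": "INV-8", "amount": 99.0},
]

def build_entity_views(sources, key):
    """Group every source's rows under the entity they belong to."""
    views = defaultdict(lambda: {name: [] for name in sources})
    for source_name, rows in sources.items():
        for row in rows:
            views[row[key]][source_name].append(row)
    return dict(views)

def mask_fields(record, fields):
    """Return a copy of record with the named fields replaced by '***'."""
    return {k: ("***" if k in fields else v) for k, v in record.items()}

views = build_entity_views({"crm": CRM_ROWS, "billing": BILLING_ROWS}, "customer_id")
print(views[1]["billing"])                       # only customer 1's invoices
print(mask_fields(views[1]["crm"][0], {"name"}))  # name hidden from this reader
```

    Answering a question about one customer then means reading one small, pre-grouped view instead of querying every source system at request time.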
