What is Retrieval-Augmented Generation?

The Complete Guide

Retrieval-augmented generation is a framework for improving the accuracy and reliability of large language models using relevant data from internal sources




Providing the right answer: Not as simple as it sounds

Auto-generating reliable responses to user queries – based on an organization’s internal information and data – remains an elusive goal for enterprises looking to generate value from their GenAI apps. 

Sure, technologies like machine translation and abstractive summarization can break down language barriers and lead to some satisfying interactions, but, overall, generating an accurate and reliable response is still a significant challenge.

Traditional text generation models, typically based on encoder/decoder architectures, can translate languages, respond in different styles, and answer simple questions. But because these models rely on the statistical patterns found in their training data, they sometimes provide incorrect or irrelevant information, known as hallucinations.

GenAI leverages Large Language Models (LLMs), which are trained on massive amounts of publicly available (Internet) information. The few LLM vendors (such as OpenAI, Google, and Meta) don’t retrain their models often, due to the lengthy time and high cost involved. Since LLMs ingest public data only up to a certain cutoff date, they’re never current – and have no access to the highly valuable, private data stored in an organization.

Retrieval-Augmented Generation (RAG) is an emerging technology that addresses these limitations.

RAG incorporates fresh, trusted data retrieved from a company’s own internal sources – docs stored in document databases and/or data stored in enterprise systems – directly into the generation process.

So, instead of relying solely on its public, static, and dated knowledge base, the Large Language Model (LLM) actively ingests relevant data from a company’s own sources to generate better-informed and relevant outputs.

The RAG model essentially “grounds” the LLM with an organization’s most current information and data, resulting in more accurate, reliable, and relevant responses.

However, RAG also has its challenges:

  • Companies must maintain up-to-date and accurate information and data.
  • The information and data stored in internal knowledge bases and enterprise systems must be readily accessible and searchable.
  • Generating the most effective and accurate prompts based on the retrieved data requires sophisticated prompt engineering and machine learning to optimize.

Despite these challenges, RAG still represents a great leap forward in GenAI. Its ability to leverage up-to-date internal data addresses the limitations of traditional models by improving user experience with more personalized and reliable exchanges of information.

RAG is already delivering value in several GenAI domains, including customer service, IT service management, sales and marketing, and legal and compliance.

A data product approach to RAG can deliver and translate real-time, context-aware, complete, and compliant business data into intelligent prompts to reduce GenAI hallucinations – and elevate the effectiveness and trust of GenAI apps.



What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is a design pattern that augments a Large Language Model (LLM) with fresh, trusted data retrieved from authoritative internal knowledge bases and enterprise systems, to generate more informed and reliable responses.

The acronym “RAG” is attributed to the 2020 publication, “Retrieval-Augmented Generation for Knowledge-Intensive Tasks”, submitted by Facebook AI Research (now Meta AI). The paper describes RAG as “a general-purpose fine-tuning recipe” because it’s meant to connect any LLM with any internal data source.

As its name suggests, retrieval-augmented generation inserts a data retrieval component into the response generation process to enhance the relevance and reliability of the answers.

The retrieval model accesses, selects, and prioritizes the most relevant information and data based on the user’s query, transforms it into an enriched, contextual prompt, and invokes the LLM via its API. The LLM responds with an accurate and coherent response to the user.


Inspired by Gartner, the following steps illustrate the retrieval-augmented generation framework:

  1. The user enters a prompt, which triggers the retrieval model.

  2. The retrieval model queries the company’s internal sources (knowledge bases and enterprise systems) for the relevant docs and data.

  3. The retrieval model crafts an enriched prompt – which augments the user’s original prompt with additional contextual information – and passes it on as input to the generation model (LLM API).

  4. The LLM uses the augmented prompt to generate a more accurate and relevant response, which is then sent to the user.

All this takes place in less than a second.
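The four steps above can be sketched in code. This is a minimal illustration, not a real implementation: the keyword retriever and the `llm` callable are hypothetical stand-ins for a production search component and an LLM API client.

```python
# Minimal sketch of the four-step RAG flow described above.
# The retriever and LLM client are hypothetical stand-ins.

def retrieve(query: str, knowledge_base: dict[str, str]) -> list[str]:
    """Step 2: naive keyword retrieval over an in-memory knowledge base."""
    terms = set(query.lower().split())
    return [doc for doc in knowledge_base.values()
            if terms & set(doc.lower().split())]

def build_augmented_prompt(query: str, context_docs: list[str]) -> str:
    """Step 3: enrich the user's original prompt with retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def rag_answer(query: str, knowledge_base: dict[str, str], llm) -> str:
    """Steps 1-4: user prompt in, retrieval, augmentation, generation."""
    docs = retrieve(query, knowledge_base)        # step 2
    prompt = build_augmented_prompt(query, docs)  # step 3
    return llm(prompt)                            # step 4: LLM API call

# Example with a fake LLM that simply echoes the prompt it received
kb = {"refunds": "Refunds are processed within 5 business days."}
answer = rag_answer("How long do refunds take?", kb, llm=lambda p: p)
```

In production, `retrieve` would typically be a vector-similarity search and `llm` a call to a hosted model, but the control flow stays the same.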

A common RAG analogy is a courtroom, where the judge generally relies on a library of law books. But to make an informed decision today, the judge asks an expert consultant for current advice on a particularly narrow topic. In this example, the judge’s knowledge and library are the LLM and the expert consultant is the retrieval model.

This analogy is also apt because it shows the relative stature and position of each actor. The LLM (the judge and library) dwarfs the retrieval model (the consultant) in terms of scope – imagine massive haystacks of information contained in the LLM, compared to the valuable needles that RAG “unburies”. And in terms of role, while RAG may be consulted, the LLM remains the ultimate authority and decision-maker.

Learn more about the RAG solution for enterprise systems.


RAG Use Cases

RAG use cases for the enterprise can span multiple domains, as shown below, along with the types of RAG data each one draws on:

Customer service

Personalize the chatbot experience to each customer to respond more effectively.

Types of RAG data:

  • Order/payment history
  • Customer feedback and/or rating
  • Contract status
  • Network alerts
  • Next or related questions
  • Workflow for call routing
  • Current phone prompts
  • Performance metrics (e.g., first-call resolution, customer satisfaction)

Sales and marketing

Engage with potential customers on a website or via chatbot to describe products and offer recommendations.

Types of RAG data:

  • Market data
  • Economic information
  • Stock market indexes
  • Mergers and acquisitions
  • Product documentation
  • Customer profiles and demographics
  • Targeted personas and behaviors

Respond to Data Subject Access Requests (DSARs) from customers.

Types of RAG data:

  • Industry standards and state regulations
  • Personal customer data stored in enterprise systems
  • Internal approval procedures

Identify fraudulent customer activity.

Types of RAG data:

  • Fraudulent activities and related data, previously detected by the company
  • Real-time customer transaction and activity data


Gartner on Retrieval-Augmented Generation

In its 12/2023 report, “Emerging Tech Impact Radar: Conversational Artificial Intelligence”, Gartner estimates that widespread enterprise adoption of RAG will take a few years because of the complexities involved in:

  • Applying generative AI (GenAI) to self-service customer support mechanisms, like chatbots
  • Keeping sensitive data hidden from people who aren’t authorized to see it
  • Combining insight engines with knowledge bases, to run the search retrieval function
  • Indexing, embedding, pre-processing, and/or graphing enterprise data and documents
  • Building and integrating retrieval pipelines into applications

All of the above are challenging for enterprises due to skill set gaps, data sprawl, ownership issues, and technical limitations.

Also, as vendors start offering tools and workflows for data onboarding, knowledge base activation, and components for RAG application design (including conversational AI chatbots), enterprises will more actively support the grounding of GenAI apps for content consumption.

In the Gartner RAG report of 1/2024, “Quick Answer: How to Supplement Large Language Models with Internal Data”, the analysts advise enterprises preparing for RAG to:

1. Select a pilot use case, in which business value can be clearly measured.

2. Classify your use case data, as structured, semi-structured, or unstructured, to decide on the best ways of handling the data and mitigating risk.

3. Get all the metadata you can, because it provides the context for your RAG deployment and the basis for selecting your enabling technologies.


The RAG-LLM Relationship

The RAG lifecycle – from data sourcing to the final output – is based on a Large Language Model or LLM.

An LLM is a foundational Machine Learning (ML) model that employs deep learning algorithms to process and generate natural language. It’s trained on massive amounts of text data to learn complex language patterns and relationships, and perform related tasks, such as text generation, summarization, translation, and, of course, the answering of questions.

These models are pre-trained on large and diverse datasets to learn the intricacies of language, and can be fine-tuned for specific applications or tasks. The term “large” is even a bit of an understatement, because these models can contain billions of parameters. For example, GPT-4 is reported to have over a trillion.

RAGs leverage LLMs to execute their retrieval and generative capabilities by:

1. Sourcing

The starting point of any RAG system is data sourcing, typically from internal text documents and enterprise systems. The source data is essentially your company’s knowledge base that the retrieval model searches through to identify and collect relevant information. To ensure accurate, diverse, and trusted data sourcing, you must also manage and minimize data redundancy.

2. Unifying data for retrieval

Enterprises should organize their data and metadata in such a way that RAG can access it instantly. For example, your customer 360 data, including master data, transactional data, and interaction data, should be unified for real-time retrieval. Depending on the use case, you may need to arrange your data by other business entities, such as employees, products, suppliers, or anything else that’s relevant for your use case.
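As a rough illustration of this unification step, the sketch below assembles master, transactional, and interaction data into one retrieval-ready record per customer. The source dictionaries and field names are illustrative assumptions; a real deployment would sync continuously from live systems.

```python
# Illustrative sketch: unifying customer 360 data from three source
# systems into one record keyed by customer ID. All names are hypothetical.

master = {"c1": {"name": "Francis", "tier": "platinum"}}
transactions = {"c1": [{"order": "A-17", "amount": 120.0}]}
interactions = {"c1": [{"channel": "chat", "topic": "upgrade"}]}

def customer_360(customer_id: str) -> dict:
    """Assemble a unified, retrieval-ready view of one customer entity."""
    return {
        "id": customer_id,
        **master.get(customer_id, {}),
        "transactions": transactions.get(customer_id, []),
        "interactions": interactions.get(customer_id, []),
    }

record = customer_360("c1")
```

The same pattern applies to any business entity – employees, products, suppliers – by swapping the sources and the key.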

3. Chunking documents

Before the retrieval model can work effectively on unstructured documents, it’s advisable to divide the data up into more manageable chunks. Effective chunking can improve retrieval performance and accuracy. For example, a document may be a chunk on its own, but it could also be chunked down further into sections, paragraphs, sentences, or even words.
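One common chunking strategy is a fixed-size window with overlap, so that sentences spanning a chunk boundary aren’t lost. The chunk size and overlap below are illustrative defaults; tuning them affects retrieval accuracy, as noted above.

```python
# A simple fixed-size, word-based chunker with overlap. Chunk size and
# overlap are illustrative; real systems often chunk by sections,
# paragraphs, or sentences instead.

def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into word-based chunks that overlap to preserve context."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"word{i}" for i in range(50))
chunks = chunk_text(doc)  # 3 overlapping chunks of up to 20 words each
```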

4. Embedding (converting text to vector formats)

Textual data in documents must be converted into a format that RAG can use for search and retrieval. This might mean transforming the text into vectors that are stored in a vector database by a process called “embedding”. The embeddings are then linked back to the source data, enabling the creation of more accurate and meaningful responses.
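The idea can be shown with a toy stand-in: here a bag-of-words vector plays the role of the embedding, a plain list plays the role of the vector database, and cosine similarity finds the chunk closest to the query. A real system would use a trained embedding model and a purpose-built vector store.

```python
# Toy illustration of embedding and vector search. The bag-of-words
# "embedding" and list-based "vector store" are simplifications.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# "Vector store": each embedding stays linked back to its source chunk
chunks = ["refunds are processed in five business days",
          "shipping takes two weeks"]
store = [(embed(c), c) for c in chunks]

query_vec = embed("how long do refunds take")
best = max(store, key=lambda pair: cosine(query_vec, pair[0]))[1]
```

Keeping the link from each embedding back to its source chunk is what lets the generation step quote, and cite, the original text.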

5. Protecting sensitive data

The sensitive data retrieved by RAG must never be seen by unauthorized users – such as credit card information by salespeople, or Social Security Numbers by service agents. The RAG solution must employ dynamic data masking and role-based access controls to achieve this.
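A role-based masking pass can be sketched as a filter applied to each retrieved record before it reaches the prompt. The roles and field policies below are illustrative assumptions, not a real policy model.

```python
# Sketch of role-based dynamic data masking applied to retrieved data
# before prompt assembly. Roles and field policies are illustrative.

MASK_POLICY = {
    "sales": {"credit_card"},   # salespeople never see card numbers
    "service": {"ssn"},         # service agents never see SSNs
}

def mask_for_role(record: dict, role: str) -> dict:
    """Return a copy of the record with fields masked for this role."""
    hidden = MASK_POLICY.get(role, set())
    return {k: ("****" if k in hidden else v) for k, v in record.items()}

customer = {"name": "Francis",
            "credit_card": "4111-1111-1111-1111",
            "ssn": "123-45-6789"}
for_sales = mask_for_role(customer, "sales")
```

Because masking happens at retrieval time, the same underlying record can safely serve different roles with different views.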

6. Engineering the prompt  

The RAG solution must automatically generate an enriched prompt by building a “story” out of the retrieved 360-degree data. There needs to be an ongoing tuning process for prompt engineering, ideally aided by Machine Learning (ML) models.
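A minimal version of this “story” building might look like the sketch below. The template, field names, and sample data are all illustrative; in practice the template itself is what gets tuned over time, ideally with ML in the loop.

```python
# Sketch of turning retrieved 360-degree data into an enriched prompt
# "story". The template and field names are hypothetical.

def build_prompt(question: str, customer: dict, docs: list[str]) -> str:
    """Weave customer data and retrieved docs into one contextual prompt."""
    story = (
        f"Customer {customer['name']} has contract status "
        f"'{customer['contract']}' and a satisfaction rating of "
        f"{customer['rating']}/5."
    )
    context = "\n".join(f"- {d}" for d in docs)
    return (
        f"{story}\n"
        f"Relevant company information:\n{context}\n"
        f"Answer the customer's question using only the facts above.\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "Why was my last invoice higher than usual?",
    {"name": "Dana", "contract": "active", "rating": 4},
    ["Rates increased 3% on June 1 for all active contracts."],
)
```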



Retrieval-Augmented Generation Benefits

By deploying RAG, enterprises benefit from:

1. Quicker time to value, at lower cost

Training an LLM takes a long time and is very costly. By offering a more rapid and affordable way to introduce new data to the LLM, RAG makes GenAI accessible and reliable for customer-facing operations.

2. Personalization of user interactions

By integrating specific customer 360 data with the extensive general knowledge of the LLM, RAG personalizes user interactions via chatbots, and marketing insights like cross-sell and up-sell recommendations by human customer service agents.

3. Improved user trust

RAG-powered LLMs deliver reliable information through a combination of data accuracy, freshness, and relevance – personalized for a specific user. User trust protects and even elevates the reputation of your brand.


Adding Enterprise Data to the Mix

The need for RAG to access enterprise data from structured data sources is critical in customer-facing scenarios, where personalized, accurate responses are required in real time. Examples include human customer care agents answering customer service calls, or a conversational AI bot providing self-service technical support to authenticated users on a company’s portal.

For this to happen, you need to organize your enterprise data by whichever business entities are relevant for your use case, be they customers, employees, suppliers, products, loans, etc. 

K2view manages the data for each business entity in its own, high-performance Micro-Database™ – for real-time data retrieval by the RAG framework.

Designed to continually sync with underlying sources, the Micro-Database can be modeled to capture any business entity, and comes with built-in data access controls and dynamic data masking capabilities. And its data is compressed by up to 90%, enabling billions of Micro-Databases to be managed on commodity hardware.

To summarize, Micro-Database technology enables:

  • Real-time data access, for any entity
  • 360-degree view of the data, for any entity
  • Fresh data all the time, in sync with underlying systems
  • Dynamic data masking, to protect sensitive data
  • Role-based access controls, to safeguard data privacy
  • Reduced TCO, via a small footprint running on commodity hardware


RAG Chatbot: The Natural Starting Point

When you want a quick answer to a question, a company’s RAG chatbot can be extremely helpful. The problem is that most bots are trained on a limited number of intents (question/answer combinations) and lack context, in the sense that they give the same answer to different users. As a result, their responses are often ineffective – making their usefulness questionable.

RAG can make conventional bots a lot smarter by empowering the LLM to provide answers to questions that aren’t on the intent list – and that are contextual to the user.

For example, an airline chatbot responding to a platinum frequent flyer asking about an upgrade on a particular flight based on accumulated miles won’t be very helpful if it winds up answering, “Please contact frequent flyer support."

But once RAG augments the airline’s LLM with that particular user’s dataset, a much more contextual response could be generated, such as, “Francis, you have 929,100 miles at your disposal. For Flight EG17 from New York to Madrid, departing at 7 pm on November 5, 2024, you could upgrade to business class for 30,000 miles, or to first class for 90,000 miles. How would you like to proceed?”

What a difference a RAG makes.
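The upgrade scenario above can be sketched as an intent-based bot with a RAG fallback: canned answers for known intents, and a user-context-augmented LLM call for everything else. The intent table, user context string, and echoing `llm` stand-in are all illustrative.

```python
# Sketch of a chatbot that answers from a fixed intent list when it can,
# and falls back to a RAG-augmented LLM call when it can't.

INTENTS = {
    "baggage allowance": "Economy tickets include one 23 kg checked bag.",
}

def answer(question: str, user_context: str, llm) -> str:
    for intent, reply in INTENTS.items():
        if intent in question.lower():
            return reply  # conventional bot path: canned intent answer
    # RAG path: augment the question with this user's own data
    prompt = f"User context: {user_context}\nQuestion: {question}"
    return llm(prompt)

reply = answer(
    "Can I upgrade flight EG17 with my miles?",
    "Francis, platinum tier, 929,100 miles",
    llm=lambda p: p,  # fake LLM that echoes the augmented prompt
)
```

The fallback is where RAG earns its keep: the off-list question reaches the LLM already carrying the user’s context, instead of dead-ending at “Please contact support.”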

The Q&A nature of chat interactions makes the RAG chatbot an ideal pilot use case, because understanding the context of a question – and of the user – leads to a more accurate, relevant, and satisfying response.

In fact, chatbots are the natural entry point for RAG and other GenAI apps.


Future of Retrieval-Augmented Generation

New RAG GenAI use cases are emerging all over the place. For example:

  • A stock investor wishing to see the commissions she was charged over the last quarter
  • A hospital patient wanting to compare the drugs he received to the payments he made
  • A telco subscriber choosing a new Internet-TV-phone plan because his current one is about to expire
  • A car owner requesting a 3-year insurance claim history to reduce her annual premium

Today, RAG is mainly used to provide accurate, contextual, and timely answers to questions – via chatbots, email, texting, and other conversational AI applications.

In the future, RAG GenAI might be used to suggest appropriate actions in response to contextual information and user prompts.

For example, if today RAG GenAI is used to inform army veterans about reimbursement policies for higher education, in the future it might list nearby colleges, and even recommend programs based on the applicant’s previous experience and military training. It may even be able to generate the reimbursement request itself.


Retrieval-Augmented Generation via Data Products

When all’s said and done, it’s difficult to deliver strategic value from GenAI today because your LLMs lack business context, cost too much to retrain, and hallucinate too frequently.

Data products – reusable data assets that combine data with everything needed to make them independently accessible by authorized users – power RAG use cases via context derived from an organization’s internal information and data.

Data products can:

  1. Feed real-time data about a particular customer or any other business entity
  2. Dynamically mask PII (Personally Identifiable Information) or other sensitive data
  3. Be reused for handling data service access requests, or for suggesting cross-sell recommendations
  4. Access enterprise systems via API, CDC, messaging, streaming – in any combination – to unify data from multiple source systems

Empowering RAG with real-time data products is useful for many use cases, such as:

  • Issue resolution
  • Hyper-personalized marketing campaigns
  • Personalized cross-/up-sell recommendations for call center agents
  • Fraud detection, by identifying suspicious activity in a user account
Learn more about the Data Product Platform that powers RAG


Retrieval-Augmented Generation FAQ

1. What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a Generative AI (GenAI) framework that augments a Large Language Model (LLM) with fresh, trusted data retrieved from authoritative internal knowledge bases and enterprise systems to generate more informed and reliable responses to user queries from both inside and outside the organization.

2. What’s the relationship between GenAI, LLMs, and RAG?

GenAI is any type of AI that creates new content. LLMs are an excellent example of GenAI: they can generate text, answer questions, translate languages, and even write creative content. RAG is a technique that enhances LLM capabilities by augmenting the LLM’s external knowledge with fresh, trusted internal information and data.

3. What types of information and data does RAG make use of?

  • Unstructured data found in knowledge bases such as articles, books, conversations, documents, and web pages
  • Structured data found in databases, knowledge graphs, and ontologies
  • Domain-specific knowledge, metadata, and user context

4. Can RAG provide sources for the information it retrieves?

Yes. If the knowledge bases accessed by RAG have references to the retrieved information, sources can be cited. Also, if an error is found in a particular source, it can be easily corrected or deleted, so that future questions won’t be answered with incorrect information.

5. How does RAG differ from traditional generative models?

Compared to traditional models that generate responses based on input context only, retrieval-augmented generation retrieves relevant information from internal sources before generating an output. This process leads to more accurate and contextually rich responses, and, therefore, more positive and satisfying user experiences.

6. What is RAG's main component?

As its name suggests, RAG inserts a data retrieval component into the generation process, aimed at enhancing the relevance and reliability of the generated responses.

7. How does the retrieval model work?

The retrieval model collects and prioritizes the most relevant information and data based on the user query, transforms it into an enriched, contextual prompt, and invokes the LLM via its API. The LLM responds with an accurate and coherent response to the user.

8. What are the advantages of using RAG?

  • Quicker time to value at lower cost

  • Personalization of user interactions

  • Improved user trust

9. What challenges are associated with RAG?

  • Accessing all the information and data stored in internal knowledge bases and enterprise systems in real time

  • Generating the most effective and accurate prompts for the RAG framework

  • Keeping sensitive data hidden from people who aren’t authorized to see it

  • Building and integrating retrieval pipelines into applications

10. When is RAG most helpful?

Retrieval-augmented generation has various applications such as conversational agents, customer support, content creation, and question answering systems. It proves particularly useful in scenarios where access to internal information and data enhances the accuracy and relevance of the generated responses.
