What is Retrieval-Augmented Generation?

The Complete Guide

Retrieval-augmented generation is a framework for improving the accuracy and reliability of large language models using relevant data from internal sources




Providing the right answer: Not as simple as it sounds

Auto-generating reliable responses to user queries – based on an organization’s internal information and data – remains an elusive goal for enterprises looking to generate value from their GenAI apps. 

Sure, technologies like machine translation and abstractive summarization can break down language barriers and lead to some satisfying interactions, but, overall, generating an accurate and reliable response is still a significant challenge.

Traditional text generation models, typically based on encoder/decoder architectures, can translate languages, respond in different styles, and answer simple questions. But because these models rely on the statistical patterns found in their training data, they sometimes provide incorrect or irrelevant information, known as hallucinations.

GenAI leverages Large Language Models (LLMs), which are trained on massive amounts of publicly available (Internet) information. The few LLM vendors (such as OpenAI, Google, and Meta) don’t retrain their models often, due to the lengthy time and high cost involved. Since LLMs ingest public data only up to a certain cutoff date, they’re never current – and have no access to the highly valuable, private data stored in an organization.

Retrieval-Augmented Generation (RAG) is an emerging technology that addresses these limitations.

RAG incorporates fresh, trusted data retrieved from a company’s own internal sources – docs stored in document databases and/or data stored in enterprise systems – directly into the generation process.

So, instead of relying solely on its public, static, and dated knowledge base, the Large Language Model (LLM) actively ingests relevant data from a company’s own sources to generate better-informed and relevant outputs.

The RAG model essentially “grounds” the LLM with an organization’s most current information and data, resulting in more accurate, reliable, and relevant responses.

However, RAG also has its challenges:

  • Companies must maintain up-to-date and accurate information and data.
  • The information and data stored in internal knowledge bases and enterprise systems must be readily accessible and searchable.
  • Generating the most effective and accurate prompts based on the retrieved data requires sophisticated prompt engineering and machine learning to optimize.

Despite these challenges, RAG still represents a great leap forward in GenAI. Its ability to leverage up-to-date internal data addresses the limitations of traditional models by improving user experience with more personalized and reliable exchanges of information.

RAG is already delivering value in several GenAI domains, including customer service, IT service management, sales and marketing, and legal and compliance.

A data product approach to RAG can deliver and translate real-time, context-aware, complete, and compliant business data into intelligent prompts to reduce GenAI hallucinations – and elevate the effectiveness and trust of GenAI apps.



What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is a design pattern that augments a Large Language Model (LLM) with fresh, trusted data retrieved from authoritative internal knowledge bases and enterprise systems, to generate more informed and reliable responses.

The acronym “RAG” is attributed to the 2020 publication, “Retrieval-Augmented Generation for Knowledge-Intensive Tasks”, submitted by Facebook AI Research (now Meta AI). The paper describes RAG as “a general-purpose fine-tuning recipe” because it’s meant to connect any LLM with any internal data source.

As its name suggests, retrieval-augmented generation inserts a data retrieval component into the response generation process to enhance the relevance and reliability of the answers.

The retrieval model accesses, selects, and prioritizes the most relevant information and data based on the user’s query, transforms it into an enriched, contextual prompt, and invokes the LLM via its API. The LLM responds with an accurate and coherent response to the user.


Inspired by Gartner, the following steps illustrate the retrieval-augmented generation framework:

  1. The user enters a prompt, which triggers the retrieval model.

  2. The retrieval model queries the company’s internal sources (knowledge bases and enterprise systems) for the relevant docs and data.

  3. The retrieval model crafts an enriched prompt – which augments the user’s original prompt with additional contextual information – and passes it on as input to the generation model (LLM API).

  4. The LLM uses the augmented prompt to generate a more accurate and relevant response, which is then sent to the user.

All this takes place in less than a second.
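The four steps above can be sketched in code. This is a minimal illustration, not a real implementation: the keyword retriever and the `llm` callable are hypothetical stand-ins for a production search component and an LLM API client.

```python
# Minimal sketch of the four-step RAG flow described above.
# The retriever and LLM client are hypothetical stand-ins.

def retrieve(query: str, knowledge_base: dict[str, str]) -> list[str]:
    """Step 2: naive keyword retrieval over an in-memory knowledge base."""
    terms = set(query.lower().split())
    return [doc for doc in knowledge_base.values()
            if terms & set(doc.lower().split())]

def build_augmented_prompt(query: str, context_docs: list[str]) -> str:
    """Step 3: enrich the user's original prompt with retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def rag_answer(query: str, knowledge_base: dict[str, str], llm) -> str:
    """Steps 1-4: user prompt in, retrieval, augmentation, generation."""
    docs = retrieve(query, knowledge_base)        # step 2
    prompt = build_augmented_prompt(query, docs)  # step 3
    return llm(prompt)                            # step 4: LLM API call

# Example with a fake LLM that simply echoes the prompt it received
kb = {"refunds": "Refunds are processed within 5 business days."}
answer = rag_answer("How long do refunds take?", kb, llm=lambda p: p)
```

In production, `retrieve` would typically be a vector-similarity search and `llm` a call to a hosted model, but the control flow stays the same.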

A common RAG analogy is a courtroom, where the judge generally relies on a library of law books. But to make an informed decision today, the judge asks an expert consultant for current advice on a particularly narrow topic. In this example, the judge’s knowledge and library are the LLM and the expert consultant is the retrieval model.

This analogy is also apt because it shows the relative stature and position of each actor. The LLM (the judge and library) dwarfs the retrieval model (the consultant) in terms of scope – imagine massive haystacks of information contained in the LLM, compared to the valuable needles that RAG “unburies”. And in terms of role, while RAG may be consulted, the LLM remains the ultimate authority and decision-maker.

Learn more about the RAG solution for enterprise systems.


RAG Use Cases

RAG use cases for the enterprise can span multiple domains, as shown below, along with the types of RAG data each one draws on:

Customer service

Personalize the chatbot experience to each customer to respond more effectively.

Types of RAG data:

  • Order/payment history
  • Customer feedback and/or rating
  • Contract status
  • Network alerts
  • Next or related questions
  • Workflow for call routing
  • Current phone prompts
  • Performance metrics (e.g., first-call resolution, customer satisfaction)

Sales and marketing

Engage with potential customers on a website or via chatbot to describe products and offer recommendations.

Types of RAG data:

  • Market data
  • Economic information
  • Stock market indexes
  • Mergers and acquisitions
  • Product documentation
  • Customer profiles and demographics
  • Targeted personas and behaviors

Respond to Data Subject Access Requests (DSARs) from customers.

Types of RAG data:

  • Industry standards and state regulations
  • Personal customer data stored in enterprise systems
  • Internal approval procedures

Identify fraudulent customer activity.

Types of RAG data:

  • Fraudulent activities and related data, previously detected by the company
  • Real-time customer transaction and activity data


Gartner on Retrieval-Augmented Generation

In its 12/2023 report, “Emerging Tech Impact Radar: Conversational Artificial Intelligence”, Gartner estimates that widespread enterprise adoption of RAG will take a few years because of the complexities involved in:

  • Applying generative AI (GenAI) to self-service customer support mechanisms, like chatbots
  • Keeping sensitive data hidden from people who aren’t authorized to see it
  • Combining insight engines with knowledge bases, to run the search retrieval function
  • Indexing, embedding, pre-processing, and/or graphing enterprise data and documents
  • Building and integrating retrieval pipelines into applications

All of the above are challenging for enterprises due to skill set gaps, data sprawl, ownership issues, and technical limitations.

Also, as vendors start offering tools and workflows for data onboarding, knowledge base activation, and components for RAG application design (including conversational AI chatbots), enterprises will more actively support the grounding of GenAI apps for content consumption.

In the Gartner RAG report of 1/2024, “Quick Answer: How to Supplement Large Language Models with Internal Data”, the analysts advise enterprises preparing for RAG to:

1. Select a pilot use case, in which business value can be clearly measured.

2. Classify your use case data, as structured, semi-structured, or unstructured, to decide on the best ways of handling the data and mitigating risk.

3. Get all the metadata you can, because it provides the context for your RAG deployment and the basis for selecting your enabling technologies.


The RAG-LLM Relationship

The RAG lifecycle – from data sourcing to the final output – is based on a Large Language Model or LLM.

An LLM is a foundational Machine Learning (ML) model that employs deep learning algorithms to process and generate natural language. It’s trained on massive amounts of text data to learn complex language patterns and relationships, and perform related tasks, such as text generation, summarization, translation, and, of course, the answering of questions.

These models are pre-trained on large and diverse datasets to learn the intricacies of language, and can be fine-tuned for specific applications or tasks. The term “large” is even a bit of an understatement, because these models can contain billions of parameters. For example, GPT-4 is reported to have over a trillion.

RAGs leverage LLMs to execute their retrieval and generative capabilities by:

1. Sourcing

The starting point of any RAG system is data sourcing, typically from internal text documents and enterprise systems. The source data is essentially your company’s knowledge base that the retrieval model searches through to identify and collect relevant information. To ensure accurate, diverse, and trusted data sourcing, you must also manage and minimize data redundancy.

2. Unifying data for retrieval

Enterprises should organize their data and metadata in such a way that RAG can access it instantly. For example, your customer 360 data, including master data, transactional data, and interaction data, should be unified for real-time retrieval. Depending on the use case, you may need to arrange your data by other business entities, such as employees, products, suppliers, or anything else that’s relevant for your use case.
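As a rough illustration of this unification step, the sketch below assembles master, transactional, and interaction data into one retrieval-ready record per customer. The source dictionaries and field names are illustrative assumptions; a real deployment would sync continuously from live systems.

```python
# Illustrative sketch: unifying customer 360 data from three source
# systems into one record keyed by customer ID. All names are hypothetical.

master = {"c1": {"name": "Francis", "tier": "platinum"}}
transactions = {"c1": [{"order": "A-17", "amount": 120.0}]}
interactions = {"c1": [{"channel": "chat", "topic": "upgrade"}]}

def customer_360(customer_id: str) -> dict:
    """Assemble a unified, retrieval-ready view of one customer entity."""
    return {
        "id": customer_id,
        **master.get(customer_id, {}),
        "transactions": transactions.get(customer_id, []),
        "interactions": interactions.get(customer_id, []),
    }

record = customer_360("c1")
```

The same pattern applies to any business entity – employees, products, suppliers – by swapping the sources and the key.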

3. Chunking documents

Before the retrieval model can work effectively on unstructured documents, it’s advisable to divide the data up into more manageable chunks. Effective chunking can improve retrieval performance and accuracy. For example, a document may be a chunk on its own, but it could also be chunked down further into sections, paragraphs, sentences, or even words.
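One common chunking strategy is a fixed-size window with overlap, so that sentences spanning a chunk boundary aren’t lost. The chunk size and overlap below are illustrative defaults; tuning them affects retrieval accuracy, as noted above.

```python
# A simple fixed-size, word-based chunker with overlap. Chunk size and
# overlap are illustrative; real systems often chunk by sections,
# paragraphs, or sentences instead.

def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into word-based chunks that overlap to preserve context."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"word{i}" for i in range(50))
chunks = chunk_text(doc)  # 3 overlapping chunks of up to 20 words each
```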

4. Embedding (converting text to vector formats)

Textual data in documents must be converted into a format that RAG can use for search and retrieval. This might mean transforming the text into vectors that are stored in a vector database by a process called “embedding”. The embeddings are then linked back to the source data, enabling the creation of more accurate and meaningful responses.
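The idea can be shown with a toy stand-in: here a bag-of-words vector plays the role of the embedding, a plain list plays the role of the vector database, and cosine similarity finds the chunk closest to the query. A real system would use a trained embedding model and a purpose-built vector store.

```python
# Toy illustration of embedding and vector search. The bag-of-words
# "embedding" and list-based "vector store" are simplifications.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# "Vector store": each embedding stays linked back to its source chunk
chunks = ["refunds are processed in five business days",
          "shipping takes two weeks"]
store = [(embed(c), c) for c in chunks]

query_vec = embed("how long do refunds take")
best = max(store, key=lambda pair: cosine(query_vec, pair[0]))[1]
```

Keeping the link from each embedding back to its source chunk is what lets the generation step quote, and cite, the original text.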

5. Protecting sensitive data

The sensitive data retrieved by RAG must never be seen by unauthorized users – such as credit card information by salespeople, or Social Security Numbers by service agents. The RAG solution must employ dynamic data masking and role-based access controls to achieve this.
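A role-based masking pass can be sketched as a filter applied to each retrieved record before it reaches the prompt. The roles and field policies below are illustrative assumptions, not a real policy model.

```python
# Sketch of role-based dynamic data masking applied to retrieved data
# before prompt assembly. Roles and field policies are illustrative.

MASK_POLICY = {
    "sales": {"credit_card"},   # salespeople never see card numbers
    "service": {"ssn"},         # service agents never see SSNs
}

def mask_for_role(record: dict, role: str) -> dict:
    """Return a copy of the record with fields masked for this role."""
    hidden = MASK_POLICY.get(role, set())
    return {k: ("****" if k in hidden else v) for k, v in record.items()}

customer = {"name": "Francis",
            "credit_card": "4111-1111-1111-1111",
            "ssn": "123-45-6789"}
for_sales = mask_for_role(customer, "sales")
```

Because masking happens at retrieval time, the same underlying record can safely serve different roles with different views.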

6. Engineering the prompt  

The RAG solution must automatically generate an enriched prompt by building a “story” out of the retrieved 360-degree data. There needs to be an ongoing tuning process for prompt engineering, ideally aided by Machine Learning (ML) models.
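A minimal version of this “story” building might look like the sketch below. The template, field names, and sample data are all illustrative; in practice the template itself is what gets tuned over time, ideally with ML in the loop.

```python
# Sketch of turning retrieved 360-degree data into an enriched prompt
# "story". The template and field names are hypothetical.

def build_prompt(question: str, customer: dict, docs: list[str]) -> str:
    """Weave customer data and retrieved docs into one contextual prompt."""
    story = (
        f"Customer {customer['name']} has contract status "
        f"'{customer['contract']}' and a satisfaction rating of "
        f"{customer['rating']}/5."
    )
    context = "\n".join(f"- {d}" for d in docs)
    return (
        f"{story}\n"
        f"Relevant company information:\n{context}\n"
        f"Answer the customer's question using only the facts above.\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "Why was my last invoice higher than usual?",
    {"name": "Dana", "contract": "active", "rating": 4},
    ["Rates increased 3% on June 1 for all active contracts."],
)
```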



Retrieval-Augmented Generation Benefits

By deploying RAG, enterprises benefit from:

1. Quicker time to value, at lower cost

Training an LLM takes a long time and is very costly. By offering a more rapid and affordable way to introduce new data to the LLM, RAG makes GenAI accessible and reliable for customer-facing operations.

2. Personalization of user interactions

By integrating specific customer 360 data with the extensive general knowledge of the LLM, RAG personalizes user interactions via chatbots, and marketing insights like cross-sell and up-sell recommendations by human customer service agents.

3. Improved user trust

RAG-powered LLMs deliver reliable information through a combination of data accuracy, freshness, and relevance – personalized for a specific user. User trust protects and even elevates the reputation of your brand.


Adding Enterprise Data to the Mix

The need for RAG to access enterprise data from structured data sources is critical in customer-facing scenarios, where personalized, accurate responses are required in real time. Examples include human customer care agents answering customer service calls, or a conversational AI bot providing self-service technical support to authenticated users on a company’s portal.

For this to happen, you need to organize your enterprise data by whichever business entities are relevant for your use case, be they customers, employees, suppliers, products, loans, etc. 

K2view manages the data for each business entity in its own, high-performance Micro-Database™ – for real-time data retrieval by the RAG framework.

Designed to continually sync with underlying sources, the Micro-Database can be modeled to capture any business entity, and comes with built-in data access controls and dynamic data masking capabilities. And its data is compressed by up to 90%, enabling billions of Micro-Databases to be managed on commodity hardware.

To summarize, Micro-Database technology enables:

  • Real-time data access, for any entity
  • 360-degree view of the data, for any entity
  • Fresh data all the time, in sync with underlying systems
  • Dynamic data masking, to protect sensitive data
  • Role-based access controls, to safeguard data privacy
  • Reduced TCO, via a small footprint running on commodity hardware


RAG Chatbot: The Natural Starting Point

When you want a quick answer to a question, a company’s RAG chatbot can be extremely helpful. The problem is that most bots are trained on a limited number of intents (question/answer combinations) and lack context, in the sense that they give the same answer to different users. As a result, their responses are often ineffective – making their usefulness questionable.

RAG can make conventional bots a lot smarter by empowering the LLM to provide answers to questions that aren’t on the intent list – and that are contextual to the user.

For example, an airline chatbot responding to a platinum frequent flyer asking about an upgrade on a particular flight based on accumulated miles won’t be very helpful if it winds up answering, “Please contact frequent flyer support."

But once RAG augments the airline’s LLM with that particular user’s dataset, a much more contextual response could be generated, such as, “Francis, you have 929,100 miles at your disposal. For Flight EG17 from New York to Madrid, departing at 7 pm on November 5, 2024, you could upgrade to business class for 30,000 miles, or to first class for 90,000 miles. How would you like to proceed?”

What a difference a RAG makes.
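The upgrade scenario above can be sketched as an intent-based bot with a RAG fallback: canned answers for known intents, and a user-context-augmented LLM call for everything else. The intent table, user context string, and echoing `llm` stand-in are all illustrative.

```python
# Sketch of a chatbot that answers from a fixed intent list when it can,
# and falls back to a RAG-augmented LLM call when it can't.

INTENTS = {
    "baggage allowance": "Economy tickets include one 23 kg checked bag.",
}

def answer(question: str, user_context: str, llm) -> str:
    for intent, reply in INTENTS.items():
        if intent in question.lower():
            return reply  # conventional bot path: canned intent answer
    # RAG path: augment the question with this user's own data
    prompt = f"User context: {user_context}\nQuestion: {question}"
    return llm(prompt)

reply = answer(
    "Can I upgrade flight EG17 with my miles?",
    "Francis, platinum tier, 929,100 miles",
    llm=lambda p: p,  # fake LLM that echoes the augmented prompt
)
```

The fallback is where RAG earns its keep: the off-list question reaches the LLM already carrying the user’s context, instead of dead-ending at “Please contact support.”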

The Q&A nature of chat interactions makes the RAG chatbot an ideal pilot use case, because understanding the context of a question – and of the user – leads to a more accurate, relevant, and satisfying response.

In fact, chatbots are the natural entry point for RAG and other GenAI apps.


Future of Retrieval-Augmented Generation

New RAG GenAI use cases are emerging all over the place. For example:

  • A stock investor wishing to see the commissions she was charged over the last quarter
  • A hospital patient wanting to compare the drugs he received to the payments he made
  • A telco subscriber choosing a new Internet-TV-phone plan because his current one is about to expire
  • A car owner requesting a 3-year insurance claim history to reduce her annual premium

Today, RAG is mainly used to provide accurate, contextual, and timely answers to questions – via chatbots, email, texting, and other conversational AI applications.

In the future, RAG GenAI might be used to suggest appropriate actions in response to contextual information and user prompts.

For example, if today RAG GenAI is used to inform army veterans about reimbursement policies for higher education, in the future it might list nearby colleges, and even recommend programs based on the applicant’s previous experience and military training. It may even be able to generate the reimbursement request itself.


Retrieval-Augmented Generation via Data Products

When all’s said and done, it’s difficult to deliver strategic value from GenAI today because your LLMs lack business context, cost too much to retrain, and hallucinate too frequently.

Data products – reusable data assets that combine data with everything needed to make them independently accessible by authorized users – power RAG use cases via context derived from an organization’s internal information and data.

Data products can:

  1. Feed real-time data about a particular customer or any other business entity
  2. Dynamically mask PII (Personally Identifiable Information) or other sensitive data
  3. Be reused for handling data service access requests, or for suggesting cross-sell recommendations
  4. Access enterprise systems via API, CDC, messaging, streaming – in any combination – to unify data from multiple source systems

Empowering RAG with real-time data products is useful for many use cases, such as:

  • Issue resolution
  • Hyper-personalized marketing campaigns
  • Personalized cross-/up-sell recommendations for call center agents
  • Fraud detection, by identifying suspicious activity in a user account
Learn more about the Data Product Platform that powers RAG


Retrieval-Augmented Generation FAQ

1. What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a Generative AI (GenAI) framework that augments a Large Language Model (LLM) with fresh, trusted data retrieved from authoritative internal knowledge bases and enterprise systems to generate more informed and reliable responses to user queries from both inside and outside the organization.

2. What’s the relationship between GenAI, LLMs, and RAG?

GenAI is any type of AI that creates new content. LLMs are an excellent example of GenAI: they can generate text, answer questions, translate languages, and even write creative content. RAG is a technique that enhances LLM capabilities by augmenting the LLM’s external knowledge with fresh, trusted internal information and data.

3. What types of information and data does RAG make use of?

  • Unstructured data found in knowledge bases such as articles, books, conversations, documents, and web pages
  • Structured data found in databases, knowledge graphs, and ontologies
  • Domain-specific knowledge, metadata, and user context

4. Can RAG provide sources for the information it retrieves?

Yes. If the knowledge bases accessed by RAG have references to the retrieved information, sources can be cited. Also, if an error is found in a particular source, it can be easily corrected or deleted, so that future questions won’t be answered with incorrect information.

5. How does RAG differ from traditional generative models?

Compared to traditional models that generate responses based on input context only, retrieval-augmented generation retrieves relevant information from internal sources before generating an output. This process leads to more accurate and contextually rich responses, and, therefore, more positive and satisfying user experiences.

6. What is RAG's main component?

As its name suggests, RAG inserts a data retrieval component into the generation process, aimed at enhancing the relevance and reliability of the generated responses.

7. How does the retrieval model work?

The retrieval model collects and prioritizes the most relevant information and data based on the user query, transforms it into an enriched, contextual prompt, and invokes the LLM via its API. The LLM responds with an accurate and coherent response to the user.

8. What are the advantages of using RAG?

  • Quicker time to value at lower cost

  • Personalization of user interactions

  • Improved user trust

9. What challenges are associated with RAG?

  • Accessing all the information and data stored in internal knowledge bases and enterprise systems in real time

  • Generating the most effective and accurate prompts for the RAG framework

  • Keeping sensitive data hidden from people who aren’t authorized to see it

  • Building and integrating retrieval pipelines into applications

10. When is RAG most helpful?

Retrieval-augmented generation has various applications such as conversational agents, customer support, content creation, and question answering systems. It proves particularly useful in scenarios where access to internal information and data enhances the accuracy and relevance of the generated responses.
