
Retrieval Augmented Generation (RAG): A Beginner's Guide

Generative AI has taken the world by storm. Tools like ChatGPT can write essays, answer questions, and even generate code. But sometimes these models "hallucinate": they make up facts or give outdated answers because they rely only on the data they were trained on.

This is where Retrieval Augmented Generation (RAG) comes in. RAG combines the power of search with the creativity of AI generation, making AI responses more accurate, up-to-date, and trustworthy.

Infographic explaining RAG failure cases

What is RAG?

Retrieval Augmented Generation (RAG) is a technique where an AI model doesn't just generate answers from its training data. Instead, it first retrieves relevant information from an external knowledge base (like documents, PDFs, or a database) and then uses a generator (a large language model) to turn that information into a natural, human-like answer.

Think of it like this:


Why is RAG Used?

RAG is used to make AI responses more accurate, up-to-date, and trustworthy.


How RAG Works: Retriever + Generator

RAG has two main parts:

  1. Retriever – Finds relevant information from a knowledge base
  2. Generator – Uses an LLM to create a fluent answer based on the retrieved info

Simple Example

Suppose you ask:

What is the capital of France, and give me a fun fact about it?

Retriever: Looks into a knowledge base, finds documents mentioning "France → Capital → Paris" and "Paris facts → Eiffel Tower, culture."

Generator: Reads those snippets and responds:

The capital of France is Paris. A fun fact is that the Eiffel Tower was originally meant to be a temporary structure!

Here, the retriever ensures the model has the right facts, and the generator makes the response smooth and human-like.
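The two steps can be sketched in a few lines of Python. This is only a toy illustration, not how a production system does it: the knowledge base, the function names (`retrieve`, `build_prompt`), and the word-overlap scoring are all assumptions made for this example, and the final call to a language model is left out. The prompt it prints is what would be handed to the generator.

```python
import re

# Hypothetical in-memory knowledge base; a real system would use documents or a database.
KNOWLEDGE_BASE = [
    "The capital of France is Paris.",
    "Fun fact about France: the Eiffel Tower in Paris was originally meant to be a temporary structure.",
    "The capital of Japan is Tokyo.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Retriever: rank snippets by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, snippets: list[str]) -> str:
    """Generator input: the retrieved snippets are placed in front of the question."""
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "What is the capital of France, and give me a fun fact about it?"
snippets = retrieve(question)
prompt = build_prompt(question, snippets)
print(prompt)
```

Counting shared words is a crude stand-in for a real retriever, but it shows the shape of the pipeline: find the right snippets first, then let the model write the answer from them.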


What is Indexing?

To search documents quickly, we need an index—like the index in a book.

Instead of flipping through every page, the index lets us jump straight to the relevant section.

In RAG, indexing means organizing documents in a way that the retriever can quickly find the most relevant pieces of text.
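To make the book-index analogy concrete, here is a minimal sketch of a keyword index that maps each word to the documents containing it. The document IDs, sample texts, and the `build_index` name are made up for illustration. RAG systems commonly index embeddings rather than raw words, so treat this as the simplest possible picture of the idea.

```python
import re
from collections import defaultdict

# Hypothetical documents, keyed by an ID we made up for this example.
documents = {
    "doc1": "Employees must log in daily.",
    "doc2": "Passwords should be at least 12 characters long.",
    "doc3": "Employees should change passwords regularly.",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


index = build_index(documents)

# Look up a word directly instead of scanning every document.
print(sorted(index["passwords"]))
```

The index is built once, ahead of time. After that, each lookup goes straight to the matching documents, just like jumping to a page number from a book's index.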


Why do we perform Vectorization?

Computers don't understand text the way humans do. They need numbers.

Vectorization means converting text into number-based representations called embeddings.

Example

This helps the retriever know what information is related to your question.
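Here is a small sketch of the idea, under a big simplifying assumption: instead of a trained embedding model, it uses plain word counts as the "embedding". The `embed` and `cosine_similarity` names and the sample sentences are invented for this example. Cosine similarity is a standard way to measure how close two vectors are.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real embeddings are dense vectors from a trained model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Return a score between 0 (nothing in common) and 1 (same direction)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


question = embed("How long should a password be?")
related = embed("A password should be at least 12 characters long.")
unrelated = embed("The cafeteria opens at noon.")

print("related:  ", cosine_similarity(question, related))
print("unrelated:", cosine_similarity(question, unrelated))
```

The related sentence scores higher than the unrelated one because its vector points in a similar direction. Word counts can only notice shared words, while trained embeddings can also place differently worded sentences with similar meaning close together.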


Why do RAGs Exist?

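RAGs exist because LLMs can't know everything: they rely only on the data they were trained on, so their answers can be outdated or miss information that was never in that data.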

RAG solves this by connecting the model to fresh, specific, and external knowledge sources.


Why do we perform Chunking?

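Most documents are long. If we dump an entire 200-page PDF into the model, it won't work well: the text may not fit in the model's context window, and the part that actually answers the question gets buried in unrelated material.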

Chunking means breaking documents into smaller pieces (e.g., 500–1000 characters each).

Example

A 10-page company policy is split into smaller paragraphs so the retriever can pull out only the relevant section.
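A minimal sketch of fixed-size chunking looks like this. The `chunk_text` name, the 500-character size, and the repeated sample sentence are assumptions for illustration. Many real pipelines split on paragraph or sentence boundaries instead of raw character counts.

```python
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Stand-in for a long document: one sentence repeated many times.
policy = "Employees must log in daily. " * 50

chunks = chunk_text(policy, chunk_size=500)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Every chunk except possibly the last has exactly `chunk_size` characters, and joining the chunks back together gives the original text.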


Why is Overlapping used in Chunking?

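Sometimes, important context sits on the boundary of two chunks. Overlapping repeats the last part of one chunk at the start of the next, so a sentence cut at the boundary still appears whole in at least one chunk.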

Example

Chunk 1: "...employees must log in daily. Passwords should be..."

Chunk 2 (overlap): "...Passwords should be at least 12 characters long..."

This way, the retriever doesn't miss context when searching.
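As a rough sketch, overlap can be added by moving the chunk window forward by less than its full size. The `chunk_with_overlap` name and the sizes used here (40 characters with 15 of overlap) are chosen only to keep the example small.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks; each chunk starts with the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


text = "Employees must log in daily. Passwords should be at least 12 characters long."
chunks = chunk_with_overlap(text, chunk_size=40, overlap=15)
for chunk in chunks:
    print(repr(chunk))
```

More overlap means fewer missed boundaries but more repeated text to store and search, so the overlap is usually kept to a small fraction of the chunk size.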


Final Thoughts

Retrieval Augmented Generation (RAG) is like giving AI a library card: it can look up the right information before answering. By combining retrieval (search) and generation (language models), RAG helps build AI systems that are more accurate, up-to-date, and trustworthy.

As GenAI keeps evolving, RAG will be one of the core techniques powering enterprise chatbots, research assistants, and intelligent knowledge systems.
