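RAG (Retrieval-Augmented Generation) is a technique, or software architecture, in artificial intelligence (AI) designed to improve the output of a large language model (LLM) by supplying it with relevant external information at query time.
In essence, RAG combines two mechanisms: retrieval, which finds relevant information in an external data source, and generation, in which the LLM writes an answer based on that information.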
The goal of RAG is to provide the LLM with accurate, up-to-date, and specific context, helping the model overcome the limitations of static training data.
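LLMs used on their own often face three major problems that RAG can address: answers that are inaccurate or made up, knowledge that is out of date, and a lack of specific (for example, private or domain-level) data.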
A RAG system handles a question in three steps:
| Step | Name | Action Description |
|---|---|---|
| 1 | Retrieval | Searches the data repository (usually a vector database) for the text segments most relevant to the question. |
| 2 | Augmentation | Combines the user's question with the retrieved text into a complete prompt. |
| 3 | Generation | Sends that prompt to the LLM, which synthesizes and writes the final answer for the user. |
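
The three steps in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the corpus `DOCUMENTS`, the bag-of-words `embed` function (standing in for a learned embedding model and a vector database), and the placeholder `echo_llm` (standing in for a real model call) are all hypothetical names introduced for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the data repository.
DOCUMENTS = [
    "RAG combines a retrieval step with a generation step.",
    "A vector database stores embeddings of text segments.",
    "Bananas are rich in potassium.",
]


def embed(text):
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Step 1: return the k text segments most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def augment(question, passages):
    """Step 2: combine the question and the retrieved text into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def generate(prompt, llm):
    """Step 3: send the prompt to the model and return its answer."""
    return llm(prompt)


def echo_llm(prompt):
    """Placeholder model (assumption): replace with a call to a real LLM."""
    return f"[model output for a {len(prompt)}-character prompt]"


def answer(question, documents, llm, k=2):
    """Run retrieval, augmentation, and generation in order."""
    passages = retrieve(question, documents, k)
    prompt = augment(question, passages)
    return generate(prompt, llm)


if __name__ == "__main__":
    print(answer("What does a vector database store?", DOCUMENTS, echo_llm))
```

Keeping each step as its own function mirrors the table: the retriever, the prompt template, and the model can each be swapped out without touching the other two.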
