
What are Generative AI models
In this post, I aim to present the basics of Generative AI models and RAG.
💬 Gen AI vs. Normal AI
Early AI models were typically trained on labeled datasets for narrow, task-specific applications (like classifying emails or translating text), which meant every new problem required a lot of labeled data from its specific domain. Generative AI models, especially large language models (LLMs), are instead trained on vast amounts of unstructured text. These models learn general language patterns and can generate new content rather than just performing fixed classifications. The model transfers the knowledge learned from terabytes of data (that's why it's called a large language model) and applies it to predict the next most likely token (word or symbol) based on the context.
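To make the "predict the next token" idea concrete, here is a toy, purely illustrative sketch: a counts-based bigram predictor over a tiny corpus. Real LLMs learn these statistics with neural networks over huge vocabularies, but the core loop (look at the context, pick the most likely continuation) is the same idea.

```python
# Toy next-token predictor: counts how often each word follows another in a
# tiny corpus, then picks the most frequent continuation. Purely illustrative;
# real LLMs learn these statistics with neural networks over massive corpora.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count word -> next-word occurrences (a bigram table)
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most likely next word seen after `word` in the corpus."""
    candidates = next_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "cat" (seen twice after "the")
print(predict_next("sat"))  # -> "on"
```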
🕹️ Tuning
By providing small amounts of additional context (often dynamically, without retraining), we can guide these models to perform domain-specific tasks more effectively. This technique allows one model to generalize across tasks like math, chemistry, and customer support.
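As a concrete illustration, here is a minimal sketch of this kind of in-context guidance (often called few-shot prompting). The prompt wording and the commented-out client call are assumptions for illustration, not a specific provider's API.

```python
# A minimal sketch of guiding a general model with a few examples in the prompt
# (in-context / few-shot prompting). The model call itself is left abstract,
# since the exact API depends on the provider you use.
few_shot_prompt = """You are a support assistant. Classify each ticket as BILLING or TECHNICAL.

Ticket: "I was charged twice this month."
Label: BILLING

Ticket: "The app crashes when I upload a photo."
Label: TECHNICAL

Ticket: "My invoice shows the wrong plan."
Label:"""

# response = some_llm_client.complete(few_shot_prompt)  # hypothetical call
```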
🥳 Advantages
Some of the advantages of this approach are:
- Very good performance out of the box with minimal data for the task
- High productivity, since the same model can be applied to different task domains
😓 Disadvantages
Everything is about tradeoffs. In the case of foundational models:
- They only have their embedded (training-time) knowledge to draw on
- New knowledge means new training
- It's hard to trust them, because they might hallucinate and generate facts to fit the task domain
- They are very computationally expensive to train
That's where RAG comes into play.
🤖 Retrieval Augmented Generation - RAG
RAG is a framework that lets the model use external knowledge sources to improve its output. In this approach, we add a knowledge base containing up-to-date information and retrieve the relevant pieces, which are then added to the prompt. The steps can be represented like this:

Enumerating and explaining:
- User types the prompt
- We encode the user prompt with an embedding model, transforming it into a vector
- We query the vector database (e.g., Chroma, FAISS) for similar documents
- We add that knowledge as context to the user prompt
- The LLM receives that and generates its answer
That would be a very simplistic overview!
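To make those steps concrete, here is a minimal, self-contained sketch of the retrieve-and-augment part. The embed() function, the toy documents, and the prompt wording are assumptions for illustration; in practice you would use a real embedding model and a vector database such as Chroma or FAISS.

```python
# Toy end-to-end sketch of the retrieval + augmentation steps above.
# embed() is a stand-in (bag-of-words via a hashing trick); a real system
# would use an embedding model and a vector database (e.g., Chroma, FAISS).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash each token into a fixed-size count vector."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. A tiny "knowledge store": documents with precomputed embeddings
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, from 9am until 5pm.",
]
doc_vectors = [embed(d) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    """2-3. Encode the query and return the k most similar documents."""
    q = embed(query)
    ranked = sorted(zip(docs, doc_vectors), key=lambda dv: cosine(q, dv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 4. Add the retrieved knowledge as context in the prompt
question = "What is the refund policy for returns?"
context = "\n".join(retrieve(question))
augmented_prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(augmented_prompt)
# 5. The augmented prompt would then be sent to the LLM to generate the answer.
```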
🧐 The good, the bad, and the ugly
😍 The good
- Instead of retraining the model, now we just need to update its knowledge store
- Fewer hallucinations
😠 The bad
- If the retriever is not good enough, the model might skip a question it could actually answer
😣 The ugly
- Since LLMs are not naturally calibrated to express uncertainty, developers must instruct them—through prompt engineering or guardrails—to avoid guessing when the answer isn’t found in context.
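As an illustration, here is one way such a "don't guess" instruction could look. The refusal wording and the commented-out model call are assumptions, not a specific library's API.

```python
# Sketch of a "don't guess" guardrail added via the system / instruction prompt.
# The refusal phrase and the commented-out call are illustrative assumptions.
system_prompt = (
    "Answer strictly from the provided context. "
    "If the context does not contain the answer, reply exactly: "
    "'I can't find that in the available documents.'"
)
user_prompt = (
    "Context:\n"
    "Our refund policy allows returns within 30 days of purchase.\n\n"
    "Question: Do you ship to Canada?"
)
# answer = llm_client.chat(system=system_prompt, user=user_prompt)  # hypothetical client
# Expected behaviour: the model refuses to guess, since shipping is not in the context.
```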
Recap
In this post we dissected the differences between normal AI and Gen AI. The latter is more self-sufficient, but it requires a lot of training and is at risk of becoming obsolete and having to be retrained. That's where RAG plays a substantial role: it allows us to reuse a pre-trained model and give it extra tools and up-to-date information, which drastically improves the quality of the answers you get.