
What are Generative AI models
In this post, I aim to present the basics of Generative AI models and RAG.
💬 Gen AI vs. Normal AI
Early AI models were typically trained on labeled datasets for narrow, task-specific applications (like classifying emails or translating text), which meant every new problem required a lot of labeled data from its specific domain. Generative AI models, especially large language models (LLMs), are instead trained on vast amounts of unstructured text. These models learn general language patterns and can generate new content rather than just performing fixed classifications. The model transfers the knowledge learned from terabytes of data (that's why it's called a large language model) and applies it to predict the next most likely token (word or symbol) based on the context.
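To make the "predict the next token" idea concrete, here is a toy, purely illustrative sketch: a counts-based bigram predictor over a tiny corpus. Real LLMs learn these statistics with neural networks over huge vocabularies, but the core loop (look at the context, pick the most likely continuation) is the same idea.

```python
# Toy next-token predictor: counts how often each word follows another in a
# tiny corpus, then picks the most frequent continuation. Purely illustrative;
# real LLMs learn these statistics with neural networks over massive corpora.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count word -> next-word occurrences (a bigram table)
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most likely next word seen after `word` in the corpus."""
    candidates = next_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "cat" (seen twice after "the")
print(predict_next("sat"))  # -> "on"
```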
🕹️ Tuning
By providing small amounts of additional context (often dynamically, without retraining), we can guide these models to perform domain-specific tasks more effectively. This technique allows one model to generalize across tasks like math, chemistry, and customer support.
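As a concrete illustration, here is a minimal sketch of this kind of in-context guidance (often called few-shot prompting). The prompt wording and the commented-out client call are assumptions for illustration, not a specific provider's API.

```python
# A minimal sketch of guiding a general model with a few examples in the prompt
# (in-context / few-shot prompting). The model call itself is left abstract,
# since the exact API depends on the provider you use.
few_shot_prompt = """You are a support assistant. Classify each ticket as BILLING or TECHNICAL.

Ticket: "I was charged twice this month."
Label: BILLING

Ticket: "The app crashes when I upload a photo."
Label: TECHNICAL

Ticket: "My invoice shows the wrong plan."
Label:"""

# response = some_llm_client.complete(few_shot_prompt)  # hypothetical call
```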
🥳 Advantages
Some of the advantages of this approach are:
- Very good performance out of the box with minimal data for the task
- High productivity, since the same model can be applied to different task domains
😓 Disadvantages
Everything is about tradeoffs. In the case of foundational models:
- They only have their embedded (training-time) knowledge to draw on
- New knowledge means new training
- It's hard to trust them, because they might hallucinate and generate facts to fit the task domain
- They are very computationally expensive to train
That's where RAG comes into play.
🤖 Retrieval Augmented Generation - RAG
RAG is a framework that lets the model use external knowledge sources to improve its output. In this approach, we add a knowledge base containing up-to-date information and retrieve the relevant pieces, which are then added to the prompt. The steps can be represented like this:

Enumerating and explaining:
- User types the prompt
- We encode the user prompt with an embedding model, transforming it into a vector
- We query the vector database (e.g., Chroma, FAISS) for similar documents
- We add that knowledge as context to the user prompt
- The LLM receives that and generates its answer
That would be a very simplistic overview!
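To make those steps concrete, here is a minimal, self-contained sketch of the retrieve-and-augment part. The embed() function, the toy documents, and the prompt wording are assumptions for illustration; in practice you would use a real embedding model and a vector database such as Chroma or FAISS.

```python
# Toy end-to-end sketch of the retrieval + augmentation steps above.
# embed() is a stand-in (bag-of-words via a hashing trick); a real system
# would use an embedding model and a vector database (e.g., Chroma, FAISS).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash each token into a fixed-size count vector."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. A tiny "knowledge store": documents with precomputed embeddings
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, from 9am until 5pm.",
]
doc_vectors = [embed(d) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    """2-3. Encode the query and return the k most similar documents."""
    q = embed(query)
    ranked = sorted(zip(docs, doc_vectors), key=lambda dv: cosine(q, dv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 4. Add the retrieved knowledge as context in the prompt
question = "What is the refund policy for returns?"
context = "\n".join(retrieve(question))
augmented_prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(augmented_prompt)
# 5. The augmented prompt would then be sent to the LLM to generate the answer.
```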
🧐 The good, the bad, and the ugly
😍 The good
- Instead of retraining the model, now we just need to update its knowledge store
- Fewer hallucinations
😠 The bad
- If the retriever is not good enough, the model might skip a question it could actually answer
😣 The ugly
- Since LLMs are not naturally calibrated to express uncertainty, developers must instruct them—through prompt engineering or guardrails—to avoid guessing when the answer isn’t found in context.
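As an illustration, here is one way such a "don't guess" instruction could look. The refusal wording and the commented-out model call are assumptions, not a specific library's API.

```python
# Sketch of a "don't guess" guardrail added via the system / instruction prompt.
# The refusal phrase and the commented-out call are illustrative assumptions.
system_prompt = (
    "Answer strictly from the provided context. "
    "If the context does not contain the answer, reply exactly: "
    "'I can't find that in the available documents.'"
)
user_prompt = (
    "Context:\n"
    "Our refund policy allows returns within 30 days of purchase.\n\n"
    "Question: Do you ship to Canada?"
)
# answer = llm_client.chat(system=system_prompt, user=user_prompt)  # hypothetical client
# Expected behaviour: the model refuses to guess, since shipping is not in the context.
```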
Recap
In this post we dissected the differences between normal AI and Gen AI. The latter is more self-sufficient, but it requires a lot of training and is at risk of becoming obsolete and having to be retrained. That's where RAG plays a substantial role: it allows us to reuse a pre-trained model and give it extra tools and up-to-date information, which drastically improves the quality of the answers you get.