Teach Your PDFs to Chat: Build an AI-Powered Assistant That Delivers Answers
by Gabriel Vergara
Introduction
Turning static PDF documents into an AI-powered, interactive knowledge assistant might sound complex, but with the right tools, it’s surprisingly achievable. In this hands-on guide, we’ll dive straight into the nuts and bolts of building a Retrieval-Augmented Generation (RAG) pipeline—a system that blends vector-based information retrieval with the power of generative AI.
I will walk you through every step of the process, focusing on the code and tools that make it possible. From setting up a vector store using FAISS (Facebook AI Similarity Search) to integrating Langchain for orchestration and querying advanced language models via Ollama, this article is all about building, experimenting, and creating.
This isn’t just theory—it’s a practical, end-to-end implementation that transforms your PDFs into a queryable AI-driven resource. Along the way, you’ll see Python scripts in action and understand how these tools come together to create a flexible, powerful proof-of-concept documentation assistant.
If you want a more in-depth explanation of the theory, don’t hesitate to check out this article: From Messy Files to Magic Answers: How RAG Makes AI Smarter (and Life Easier)
Ready to roll up your sleeves and bring your documentation to life? Let’s dive in!
Prerequisites
Before diving into the examples, ensure that your development environment is set up with the necessary tools and dependencies. Here’s what you’ll need:
- Ollama: A local instance of Ollama is required for embedding and querying operations. If you don’t already have Ollama installed, you can download it from here. This guide assumes that Ollama is installed and running on your machine.
- Models: Once Ollama is set up, pull the required models.
- Mistral model: Used for querying during the retrieval-augmented generation process (check here).
- nomic-embed-text embedding model: Used to create embeddings for document chunks (check here).
- Python Environment:
- Python version: This script has been tested with Python 3.10. Ensure you have a compatible Python version installed.
- Installing Dependencies: Use a Python environment management tool like pipenv to set up the required libraries. Execute the following command in your terminal:
pipenv install langchain langchain-community langchain-ollama faiss-cpu pypdf
- Sample documents: a set of roughly ten PDF documents. I recommend files with easily extractable text (PDFs made of scanned images will not work as you expect… I’ve warned you!).
With these prerequisites in place, you’ll be ready to proceed to the next steps in setting up the vector store and querying it.
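Before moving on, it can help to confirm that Ollama is reachable and that both models have been pulled (ollama pull mistral and ollama pull nomic-embed-text). Here is a minimal, optional sanity check in Python, assuming a default local Ollama instance:
# Optional sanity check: confirm Ollama and both models respond locally.
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_ollama.chat_models import ChatOllama
# Embed a short test string with the embedding model...
embeddings = OllamaEmbeddings(model='nomic-embed-text')
vector = embeddings.embed_query('hello world')
print(f'Embedding dimensions: {len(vector)}')
# ...and ask the chat model for a short reply.
llm = ChatOllama(model='mistral')
print(llm.invoke('Reply with a single word: OK').content)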
Feeding the Vector Store Index
In this section, we’ll explore how to populate a vector store with embeddings derived from PDF documents. This process involves configuring the setup, processing the documents, splitting them into manageable chunks, generating embeddings, and saving the vector store for efficient retrieval later (full code at the end of this article).
Configuration and Setup
The script begins by defining paths and configurations, including the embedding model to be used and directories for input documents and the vector store.
embedding_model = 'nomic-embed-text'
document_input_path = './document_tray/'
vectorstore_path = f'vectorstore_{embedding_model}'
- embedding_model: Specifies the model used to generate embeddings. In this example, we use nomic-embed-text.
- document_input_path: Directory containing the PDF documents to be processed.
- vectorstore_path: Directory where the vector store will be saved.
Vector Store Initialization
Before creating the vector store, the script removes any existing store with the same name to ensure a fresh start.
# Remove previous vectorstore
if os.path.isdir(vectorstore_path):
    shutil.rmtree(vectorstore_path)
Purpose: Ensures that outdated or conflicting data does not interfere with the new vector store.
Loading and Splitting PDF Documents
The script processes all PDF files in the specified input directory, loading their content and splitting them into smaller chunks using a character-based text splitter.
document_list = []
# Setup text splitter
text_splitter = CharacterTextSplitter(
    chunk_size=1000, chunk_overlap=30, separator="\n"
)
# Process files
for file in os.listdir(document_input_path):
    if file.endswith('.pdf'):
        pdf_file = os.path.join(document_input_path, file)
        loader = PyPDFLoader(file_path=pdf_file)
        docs_chunks = text_splitter.split_documents(loader.load())
        document_list.extend(docs_chunks)
- Text Splitting: Chunks are capped at 1000 characters with a 30-character overlap to preserve context across chunk boundaries (see the short illustration after this list).
- PyPDFLoader: Reads the content of each PDF file and prepares it for splitting.
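To get a feel for how the splitter behaves, here is a tiny, self-contained sketch. The sample_text string is made up purely for illustration, and the chunk_size is deliberately smaller than the 1000 characters used in the article so the split is visible at a glance:
from langchain_text_splitters import CharacterTextSplitter
# Made-up text: twenty short "paragraphs" separated by newlines.
sample_text = "\n".join(f"Paragraph {i}: " + "lorem ipsum " * 10 for i in range(20))
# Same splitter type as in the script, with a smaller chunk_size for the demo.
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=30, separator="\n")
chunks = splitter.split_text(sample_text)
print(f"{len(chunks)} chunks created")
print(chunks[0][:120])  # peek at the start of the first chunk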
Creating and Saving the Vector Store
After processing the documents, the script generates embeddings for each chunk and saves them in a FAISS vector store.
vectorstore_embeddings = OllamaEmbeddings(model=embedding_model)
vectorstore_index = FAISS.from_documents(document_list, vectorstore_embeddings)
vectorstore_index.save_local(vectorstore_path)
- OllamaEmbeddings: Converts document chunks into high-dimensional vectors based on the specified embedding model.
- FAISS: Builds the vector store using the embeddings and the processed document chunks.
- Saving: The vector store is saved locally for later retrieval.
This script effectively automates the process of turning static PDFs into a structured and queryable vector store, forming the foundation for retrieval-augmented generation. With the vector store prepared, the next step is to query it for relevant context during LLM interactions.
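Before moving on to querying through the LLM, you can optionally smoke-test the freshly built index with a direct similarity search. A short sketch, reusing the vectorstore_index created above (the query string is just an example):
# Optional: retrieve the two chunks closest to a test query, straight from FAISS.
results = vectorstore_index.similarity_search("What is a chatbot?", k=2)
for doc in results:
    print(doc.metadata.get('source'), '->', doc.page_content[:100])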
Querying the Vector Store
This script demonstrates how to load an existing vector store, enhance the LLM’s prompt using retrieval-augmented generation (RAG), and query the model for answers based on the stored context (full code at the end of this article). Here’s a step-by-step breakdown:
Configuration and Setup
The first step is to define the configuration, including the LLM model and embedding model, as well as the path to the pre-created vector store.
llm_model = 'mistral'
embedding_model = 'nomic-embed-text'
vectorstore_path = f'vectorstore_{embedding_model}'
- llm_model: Specifies the LLM used for generating responses. In this case, the Mistral model is used.
- embedding_model: Defines the embedding model used to generate vector representations.
- vectorstore_path: Points to the directory where the vector store index is saved.
Loading the Vector Store
The vector store is loaded from the saved directory, enabling efficient retrieval of relevant chunks during queries.
vectorstore_embeddings = OllamaEmbeddings(model=embedding_model)
vectorstore_index = FAISS.load_local(
    vectorstore_path, vectorstore_embeddings, allow_dangerous_deserialization=True
)
- OllamaEmbeddings: Re-creates the embedding structure required for interpreting the stored vectors.
- FAISS.load_local: Loads the FAISS vector store index and associates it with the embedding model for retrieval (a small pre-load guard is sketched after this list).
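One practical note: FAISS.load_local raises an error if the directory does not exist, which usually means the feeder script was run with a different embedding_model, since the path embeds the model name. A small, optional guard you could place before loading:
import os
# Fail early with a clear message if the vector store has not been built yet.
if not os.path.isdir(vectorstore_path):
    raise SystemExit(f'Vector store not found: {vectorstore_path} (run vectorstore_feed.py first)')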
Creating the Prompt Template and Retrieval Chain
A custom prompt template is defined to guide the LLM in generating answers solely based on the retrieved context.
retrieval_qa_chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer any user questions based solely on the context below:\n\n<context>\n{context}\n</context>"),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
])
combine_docs_chain = create_stuff_documents_chain(
    ChatOllama(model=llm_model), retrieval_qa_chat_prompt
)
retrieval_chain = create_retrieval_chain(
    vectorstore_index.as_retriever(), combine_docs_chain
)
- Prompt Template: Provides instructions to the LLM, including placeholders for context ({context}), chat history ({chat_history}), and user input ({input}).
- Combining Documents: Uses create_stuff_documents_chain to integrate the retrieved context and user input for the LLM.
- Retrieval Chain: Combines the vector store retriever with the document chain to form a seamless query pipeline (a small retriever variation follows this list).
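By default, the retriever returned by as_retriever() fetches the few most similar chunks (four in current LangChain versions). If you want to stuff more or less context into the prompt, you can pass search_kwargs when building the retriever. A minimal variation of the last lines above:
# Same chain as before, but ask the retriever for six chunks instead of the default.
retriever = vectorstore_index.as_retriever(search_kwargs={'k': 6})
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)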
Querying the LLM with Questions
The script defines a list of sample queries and processes each query using the retrieval chain.
query_list = [
    "What can you tell me about Google Cloud Vision AI?",
    "What kind of service is provided by Sightengine?",
    "What is a chatbot?",
    "Can you explain what is the 'chain of thought' process?"
]
for query in query_list:
    print(f'query> {query}')
    res = retrieval_chain.invoke({'input': query})
    print(f"{llm_model}> {res['answer']}")
- Query List: A set of example questions to demonstrate the model’s ability to retrieve and answer contextually.
- Retrieval Chain Invocation: Each query is passed to the retrieval_chain, which retrieves relevant context and feeds it to the LLM for a final answer.
Output
The model processes each query and generates a response grounded in the retrieved chunks, keeping answers tied to the content of your documents.
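The dictionary returned by the retrieval chain also exposes the retrieved chunks under the context key, so you can show which files each answer came from. A small, optional addition inside the query loop, right after printing the answer:
# Optional: list the source documents behind each answer.
for doc in res['context']:
    print(f"  source: {doc.metadata.get('source')} (page {doc.metadata.get('page')})")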
Full code
This is the full code for both the vector store feeder and the vector store query script.
vectorstore_feed.py
import os
import shutil
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter
if __name__ == '__main__':
    # ---- Configuration ------------------------------------------------------
    embedding_model = 'nomic-embed-text'
    document_input_path = './document_tray/'
    vectorstore_path = f'vectorstore_{embedding_model}'
    # ---- Vector Store setup -------------------------------------------------
    print('-' * 80)
    print(f'> Creating vectorstore: {vectorstore_path} ')
    # Remove previous vectorstore
    if os.path.isdir(vectorstore_path):
        print(f' > Removing existing vectorstore... ')
        shutil.rmtree(vectorstore_path)
    # ---- Grab each PDF file as a document for the vector store
    print(f' > Processing PDF files in: {document_input_path} ')
    # All documents list
    document_list = []
    # Setup text splitter
    text_splitter = CharacterTextSplitter(
        chunk_size=1000, chunk_overlap=30, separator="\n"
    )
    # Process files
    for file in os.listdir(document_input_path):
        if file.endswith('.pdf'):
            # load the PDF document
            pdf_file = os.path.join(document_input_path, file)
            print(f' > Processing file: {pdf_file} ')
            loader = PyPDFLoader(file_path=pdf_file)
            docs_chunks = text_splitter.split_documents(loader.load())
            document_list.extend(docs_chunks)
    # ---- Create vector store index
    print(f' > Saving vectorstore: {vectorstore_path}')
    vectorstore_embeddings = OllamaEmbeddings(model=embedding_model)
    vectorstore_index = FAISS.from_documents(document_list, vectorstore_embeddings)
    vectorstore_index.save_local(vectorstore_path)
    print('> Vectorstore created!')
vectorstore_query.py
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
if __name__ == '__main__':
    # ---- Configuration ------------------------------------------------------
    llm_model = 'mistral'
    embedding_model = 'nomic-embed-text'
    vectorstore_path = f'vectorstore_{embedding_model}'
    # ---- Load vector store index
    print('-' * 80)
    print(f'> Loading vectorstore: {vectorstore_path}')
    vectorstore_embeddings = OllamaEmbeddings(model=embedding_model)
    vectorstore_index = FAISS.load_local(
        vectorstore_path, vectorstore_embeddings, allow_dangerous_deserialization=True
    )
    print('> Vectorstore loaded!')
    # ---- Improved Prompt ----------------------------------------------------
    retrieval_qa_chat_prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer any user questions based solely on the context below:\n\n<context>\n{context}\n</context>"),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ])
    combine_docs_chain = create_stuff_documents_chain(
        ChatOllama(model=llm_model), retrieval_qa_chat_prompt
    )
    retrieval_chain = create_retrieval_chain(
        vectorstore_index.as_retriever(), combine_docs_chain
    )
    # ---- Query the LLM model with a set of questions ------------------------
    query_list = [
        "What can you tell me about Google Cloud Vision AI?",
        "What kind of service is provided by Sightengine?",
        "What is a chatbot?",
        "Can you explain what is the 'chain of thought' process?"
    ]
    print(f'> Querying model: {llm_model}')
    for query in query_list:
        print('-' * 20)
        print(f'query> {query}')
        res = retrieval_chain.invoke({'input': query})
        print(f"{llm_model}> {res['answer']}")
Conclusion
In this article, we walked through building a RAG pipeline from the ground up, leveraging tools like FAISS, Langchain, and Ollama to create an AI-driven documentation assistant. We transformed static PDF files into an interactive, queryable knowledge resource, demonstrating how practical and powerful this approach can be.
One of the greatest strengths of this implementation is its flexibility. You can easily swap out the language models, embedding models, or even expand the pipeline to handle other document types, like Word files, HTML, or emails. With a few tweaks to the code, this system can evolve alongside advancements in AI and adapt to meet your unique needs.
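As a concrete example of that flexibility, the loading loop in vectorstore_feed.py could branch on file extension. A rough sketch that adds plain-text support via TextLoader (the extensions and loader choice here are assumptions; adapt them to your own documents):
from langchain_community.document_loaders import PyPDFLoader, TextLoader
# Sketch: pick a loader per file type instead of handling PDFs only.
for file in os.listdir(document_input_path):
    path = os.path.join(document_input_path, file)
    if file.endswith('.pdf'):
        loader = PyPDFLoader(file_path=path)
    elif file.endswith('.txt'):
        loader = TextLoader(file_path=path, encoding='utf-8')
    else:
        continue  # skip unsupported formats
    document_list.extend(text_splitter.split_documents(loader.load()))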
This is just the beginning. Use it as a sandbox to experiment—adjust parameters, test new embeddings, or refine the AI prompts to handle specific queries better. The modularity of the pipeline allows for endless customization, enabling you to tailor it to personal projects or professional applications.
Now it’s your turn. Dive into the code, explore the possibilities, and make it your own. By building on this foundation, you’ll not only simplify how you access information but also create tools that bring value, clarity, and efficiency to the way we interact with data.
Let the innovation begin!
About Me
I’m Gabriel, and I like computers. A lot.
For nearly 30 years, I’ve explored the many facets of technology—as a developer, researcher, sysadmin, security advisor, and now an AI enthusiast. Along the way, I’ve tackled challenges, broken a few things (and fixed them!), and discovered the joy of turning ideas into solutions. My journey has always been guided by curiosity, a love of learning, and a passion for solving problems in creative ways.
See ya around!