{"id":14969,"date":"2025-01-31T20:14:41","date_gmt":"2025-01-31T20:14:41","guid":{"rendered":"https:\/\/temperies.com\/?p=14969"},"modified":"2025-01-31T20:14:43","modified_gmt":"2025-01-31T20:14:43","slug":"see-it-search-it-ai-powered-image-cataloging-made-easy","status":"publish","type":"post","link":"https:\/\/temperies.com\/es\/2025\/01\/31\/see-it-search-it-ai-powered-image-cataloging-made-easy\/","title":{"rendered":"See It, Search It: AI-Powered Image Cataloging Made Easy"},"content":{"rendered":"<h1>See It, Search It: AI-Powered Image Cataloging Made Easy<\/h1>\n\n\n\n<p><em>by Gabriel Vergara<\/em><\/p>\n\n\n\n<h2>Introduction<\/h2>\n\n\n\n<p>Have you ever struggled to find an image in a massive collection, remembering only a vague description like <em>&#8220;sunset over a snowy mountain&#8221;<\/em> or <em>&#8220;a cat sitting next to a laptop&#8221;<\/em>? What if you could just type a natural-language description and instantly retrieve the most relevant images? That&#8217;s exactly what we&#8217;re going to build in this article using <strong>multimodal AI and a vector database<\/strong>.<\/p>\n\n\n\n<h3><strong>What&#8217;s a Multimodal Model, and How Is It Different from an LLM?<\/strong><\/h3>\n\n\n\n<p>When we talk about <strong>Large Language Models (LLMs)<\/strong>, we usually mean AI models trained to understand and generate text. They&#8217;re great at answering questions, summarizing documents, or generating creative text. However, they can&#8217;t &#8220;see&#8221; or interpret images. That&#8217;s where <strong>multimodal models<\/strong> come in.<\/p>\n\n\n\n<p>A <strong>multimodal model<\/strong> processes multiple types of input\u2014like text and images\u2014simultaneously. It can <strong>understand and describe images in natural language<\/strong>, combining vision and language capabilities in one model. 
One such model is <strong>LLaVA<\/strong> (Large Language and Vision Assistant), defined as <em>a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding<\/em>, which we&#8217;ll use in this project.<\/p>\n\n\n\n<h4><strong>What&#8217;s a Vision Encoder? (In Simple Terms)<\/strong><\/h4>\n\n\n\n<p>A <strong>vision encoder<\/strong> is the part of a multimodal model that processes images. It converts an image into a numerical representation (embeddings) that a language model can understand. Think of it as translating a picture into a format that an AI can &#8220;read&#8221; and describe in words.<\/p>\n\n\n\n<h3><strong>What Are We Building?<\/strong><\/h3>\n\n\n\n<p>In this guide, we&#8217;ll use <strong>LLaVA<\/strong> to <strong>automatically generate text descriptions for a repository of images<\/strong>. Instead of manually tagging each picture, we&#8217;ll let AI analyze and describe them for us. 
Then, we&#8217;ll store these descriptions in a <strong>vector database<\/strong> so we can perform <strong>similarity searches<\/strong>\u2014meaning you can find images based on their textual descriptions.<\/p>\n\n\n\n<h3><strong>How Are We Doing This?<\/strong><\/h3>\n\n\n\n<p>We&#8217;ll write <strong>Python code<\/strong> that integrates:<\/p>\n\n\n\n<ul><li><strong>LangChain<\/strong> for AI model orchestration<\/li><li><strong>Ollama<\/strong> to run the multimodal model (LLaVA) locally<\/li><li><strong>FAISS<\/strong> for storing and searching the text representations efficiently<\/li><li><strong>Pandas<\/strong> to structure the cataloged data<\/li><li><strong>Pillow<\/strong> to handle image processing<\/li><\/ul>\n\n\n\n<p>If you want a more in-depth theoretical explanation, do not hesitate to check this link: <a href=\"https:\/\/temperies.com\/es\/2025\/01\/16\/from-messy-files-to-magic-answers-how-rag-makes-ai-smarter-and-life-easier\/\" data-type=\"post\" data-id=\"14927\" target=\"_blank\" rel=\"noreferrer noopener\">From Messy Files to Magic Answers: How RAG Makes AI Smarter (and Life Easier)<\/a>. Also, if you want to delve deeper into RAG implementations, take a look at this: <a href=\"https:\/\/temperies.com\/es\/2025\/01\/17\/teach-your-pdfs-to-chat-build-an-ai-powered-assistant-that-delivers-answers\/\" data-type=\"post\" data-id=\"14933\" target=\"_blank\" rel=\"noreferrer noopener\">Teach Your PDFs to Chat: Build an AI-Powered Assistant That Delivers Answers<\/a>.<\/p>\n\n\n\n<p>By the end of this article, you&#8217;ll have a working system that can <strong>catalog images automatically and make them searchable by description<\/strong>. Let&#8217;s get started!<\/p>\n\n\n\n<h2>Prerequisites<\/h2>\n\n\n\n<p>Before diving into the examples, ensure that your development environment is set up with the necessary tools and dependencies. 
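<\/p>\n\n\n\n<p>As a quick sketch (assuming Ollama is already installed and running), the two models listed below can be pulled from the command line like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ollama pull llava:13b\nollama pull nomic-embed-text<\/code><\/pre>\n\n\n\n<p>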
Here&#8217;s what you&#8217;ll need:<\/p>\n\n\n\n<ol><li><strong>Ollama<\/strong>: A local instance of Ollama is required for embedding and querying operations. If you don&#8217;t already have Ollama installed, you can download it from <a href=\"https:\/\/ollama.com\/download\">here<\/a>. This guide assumes that Ollama is installed and running on your machine.<\/li><li><strong>Models<\/strong>: Once Ollama is set up, pull the required models.<ul><li><strong>llava:13b<\/strong>: Used to generate the image descriptions (<a href=\"https:\/\/ollama.com\/library\/llava:13b\">link<\/a>). If you are low on hardware, you can also try <strong>llava:7b<\/strong> (<a href=\"https:\/\/ollama.com\/library\/llava:7b\">link<\/a>), which is less accurate but lighter on processing requirements (and faster, too). You will need about 7GB of free storage space for the <strong>llava:13b<\/strong> model.<\/li><li><strong>nomic-embed-text embedding model<\/strong>: Used to create embeddings from the image descriptions (<a href=\"https:\/\/ollama.com\/library\/nomic-embed-text\">link<\/a>).<\/li><\/ul><\/li><li><strong>Python Environment<\/strong>:<ul><li><strong>Python version:<\/strong> This script has been tested with Python 3.10. Ensure you have a compatible Python version installed.<\/li><li><strong>Installing Dependencies: <\/strong>Use a Python environment management tool like <code>pipenv<\/code> to set up the required libraries. Execute the following command in your terminal:<\/li><\/ul><\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>pipenv install langchain langchain-community langchain-ollama faiss-cpu pillow tqdm pandas<\/code><\/pre>\n\n\n\n<ol start=\"4\"><li><strong>Sample images<\/strong>: a sample of ten to twenty PNG image files, used as an image repository. 
This can be used with other image types, but you will need to tweak the scripts a little.<\/li><\/ol>\n\n\n\n<p>With these prerequisites in place, you&#8217;ll be ready to proceed to the next steps: setting up the image catalog and querying it.<\/p>\n\n\n\n<h2><strong>Generating Image Descriptions and Storing Them for Search<\/strong><\/h2>\n\n\n\n<p>Before we dive into the code, here&#8217;s the plan: we&#8217;ll process a repository of images, use a <strong>multimodal AI model<\/strong> to generate text descriptions, and then store those descriptions in a <strong>vector database<\/strong> for easy retrieval.<\/p>\n\n\n\n<p>The full code will be available at the end of the article, but here, we&#8217;ll break it down into the key steps.<\/p>\n\n\n\n<h3><strong>Step 1: Setting Up the Environment<\/strong><\/h3>\n\n\n\n<p>We start by setting up our configurations:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>llm_model = 'llava:13b'\nembedding_model = 'nomic-embed-text'\nvectorstore_path = f'vectorstore_{embedding_model}'\nimages_repository_path = '.\/images_repository'\nimages_description_csv = '.\/images_descriptions.csv'\nbatch_size = 8<\/code><\/pre>\n\n\n\n<p>This defines the multimodal model, embedding model, and the paths for storing processed data. The <strong>batch_size<\/strong> parameter is used to process image descriptions in small chunks to feed the vector database (for this example, 8 was enough; consider using a bigger number for real scenarios).<\/p>\n\n\n\n<h3><strong>Step 2: Preparing Image Data<\/strong><\/h3>\n\n\n\n<p>LLaVA requires images in <strong>base64 format<\/strong>, so we need to convert them before feeding them into the model. 
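<\/p>\n\n\n\n<p>As a side note, if your files are already valid PNGs, a plain read-and-encode helper (a minimal sketch, with no Pillow involved, and a hypothetical name) would also do the job:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import base64\n\ndef get_file_as_base64(file_path):\n    # Read the raw bytes and base64-encode them, without re-encoding the image\n    with open(file_path, 'rb') as f:\n        return base64.b64encode(f.read()).decode('utf-8')<\/code><\/pre>\n\n\n\n<p>The Pillow round-trip used in this article re-encodes the image instead, which is what makes it easy to extend to other input formats. 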
The following function reads an image, encodes it, and returns its base64 representation:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def get_png_as_base64(file_path):\n    buffered = BytesIO()\n    Image.open(file_path).save(buffered, format=\"PNG\")\n    return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")<\/code><\/pre>\n\n\n\n<p>I encourage you to enhance this function to process other image types!<\/p>\n\n\n\n<h3><strong>Step 3: Extracting Descriptions from Images<\/strong><\/h3>\n\n\n\n<p>Now, we iterate over the images in the repository, send them to LLaVA, and capture the text descriptions:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>model = OllamaLLM(model=llm_model)\nprompt = 'Describe this image in two or less sentences:'\n\ninput_files = glob.glob(f'{images_repository_path}\/*.png')\ndataframe = pd.DataFrame(columns=&#91;'image_path', 'description'])\n\nfor image_path in input_files:\n    image_base64 = get_png_as_base64(image_path)\n    response = model.bind(images=&#91;image_base64]).invoke(input=prompt)\n    dataframe.loc&#91;len(dataframe)] = &#91;image_path, response]\n\ndataframe.to_csv(images_description_csv, index=False)<\/code><\/pre>\n\n\n\n<p>Here&#8217;s what happens:<\/p>\n\n\n\n<ol><li>The script grabs all <code>.png<\/code> files in the image repository.<\/li><li>Each image is converted to <strong>base64<\/strong> and passed to LLaVA.<\/li><li>The model generates a short <strong>text description<\/strong>, which is stored in a Pandas DataFrame.<\/li><li>The DataFrame is saved as a CSV file for later reference.<\/li><\/ol>\n\n\n\n<h3><strong>Step 4: Creating the Vector Database<\/strong><\/h3>\n\n\n\n<p>Once we have textual descriptions, we need to <strong>convert them into vectors<\/strong> and store them in FAISS for efficient retrieval.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>if os.path.isdir(vectorstore_path):\n    shutil.rmtree(vectorstore_path)\n\nvectorstore_embeddings = 
OllamaEmbeddings(model=embedding_model)\nvectorstore_index = None<\/code><\/pre>\n\n\n\n<p>This section initializes FAISS and ensures we start with a <strong>fresh<\/strong> vector database by removing any existing one.<\/p>\n\n\n\n<h3><strong>Step 5: Indexing Descriptions in FAISS<\/strong><\/h3>\n\n\n\n<p>Since we&#8217;re working with potentially <strong>large datasets<\/strong>, we process text descriptions in batches:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for i in range(0, len(dataframe), batch_size):\n    batch_df = dataframe.iloc&#91;i:i + batch_size]\n    documents = &#91;\n        Document(\n            page_content=row&#91;'description'],\n            metadata={\"image_path\": row&#91;\"image_path\"], \"description\": row&#91;\"description\"]}\n        )\n        for _, row in batch_df.iterrows()\n    ]\n\n    if vectorstore_index is None:\n        vectorstore_index = FAISS.from_documents(documents, vectorstore_embeddings)\n    else:\n        vectorstore_index.add_documents(documents)<\/code><\/pre>\n\n\n\n<p>This approach helps to:<\/p>\n\n\n\n<ul><li><strong>Reduce memory usage<\/strong> by processing only a few descriptions at a time.<\/li><li><strong>Ensure smooth indexing<\/strong> when dealing with thousands of images.<\/li><\/ul>\n\n\n\n<h3><strong>Step 6: Saving the Vector Store<\/strong><\/h3>\n\n\n\n<p>Finally, we store the indexed vector database for future queries:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vectorstore_index.save_local(vectorstore_path)<\/code><\/pre>\n\n\n\n<p>And that&#8217;s it! Now, we have a <strong>searchable<\/strong> catalog of image descriptions stored in FAISS, making it easy to find images based on textual queries.<\/p>\n\n\n\n<h2><strong>Searching Your Image Catalog Using Natural Language<\/strong><\/h2>\n\n\n\n<p>Now that we have a <strong>vectorized image catalog<\/strong>, it&#8217;s time to put it to use! 
In this section, we\u2019ll explore how to perform <strong>similarity searches<\/strong> against our FAISS vector database to retrieve the most relevant images based on a text query.<\/p>\n\n\n\n<p>The full code will be available at the end of the article, but here, we\u2019ll break it down into the most important parts.<\/p>\n\n\n\n<h3><strong>Step 1: How Similarity Search Works<\/strong><\/h3>\n\n\n\n<p>Instead of using traditional <strong>keyword-based<\/strong> searching, we\u2019ll leverage <strong>vector similarity search<\/strong> to find images based on meaning, even if the descriptions aren&#8217;t an exact match.<\/p>\n\n\n\n<p>FAISS allows us to use a function called <strong>similarity_search_with_score<\/strong>, which determines how closely a stored description matches a query. The <strong>lower the score, the better the match<\/strong>\u2014a score of <strong>0.0<\/strong> means a perfect match.<\/p>\n\n\n\n<h3><strong>Step 2: Defining the Search Function<\/strong><\/h3>\n\n\n\n<p>We define a function that takes a <strong>text query<\/strong> (e.g., <em>&#8220;a cat sitting next to a laptop&#8221;<\/em>) and finds the top matching images from the vector database:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def search_images(query, top_k=3):\n    results = vectorstore.similarity_search_with_score(query, k=top_k)\n    return &#91;\n        (doc.metadata&#91;\"image_path\"], doc.metadata&#91;\"description\"], score)\n        for doc, score in results\n    ]<\/code><\/pre>\n\n\n\n<p>Here\u2019s how it works:<\/p>\n\n\n\n<ol><li><strong>The query is compared<\/strong> against stored image descriptions.<\/li><li><strong>FAISS returns the best-matching images<\/strong>, along with their similarity <strong>scores<\/strong>.<\/li><li><strong>The results include<\/strong>:<ul><li>The <strong>image path<\/strong> (so you can locate the file).<\/li><li>The <strong>AI-generated description<\/strong>.<\/li><li>The <strong>similarity score<\/strong> (lower = 
better match).<\/li><\/ul><\/li><\/ol>\n\n\n\n<p>By default, the function returns <strong>three results<\/strong>, but you can adjust <code>top_k<\/code> to show more or fewer images.<\/p>\n\n\n\n<p><strong>IMPORTANT:<\/strong> keep in mind that the function always returns the (by default) three closest descriptions\u2026 if your set of images is very small, it will still return the three closest matches, even if they do not relate to your search query.<\/p>\n\n\n\n<h3><strong>Step 3: Loading the Vector Database<\/strong><\/h3>\n\n\n\n<p>Before running a search, we need to load the FAISS index and the text embedding model:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from langchain_ollama.embeddings import OllamaEmbeddings\nfrom langchain_community.vectorstores import FAISS\n\nembedding_model = 'nomic-embed-text'\nvectorstore_path = f'vectorstore_{embedding_model}'\n\nvectorstore_embeddings = OllamaEmbeddings(model=embedding_model)\nvectorstore = FAISS.load_local(vectorstore_path, vectorstore_embeddings, allow_dangerous_deserialization=True)<\/code><\/pre>\n\n\n\n<p>This ensures that:<\/p>\n\n\n\n<ul><li>We use the same <strong>embedding model<\/strong> that was used during indexing.<\/li><li>The <strong>vectorstore is loaded from disk<\/strong>, so we can perform searches instantly.<\/li><\/ul>\n\n\n\n<h3><strong>Step 4: Running the Search Loop<\/strong><\/h3>\n\n\n\n<p>To make the system interactive, we create a loop where users can enter <strong>text queries<\/strong>, get relevant image matches, and decide whether to search again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>search_another = True\nwhile search_another:\n    image_search_query = input(\"Enter an image description to search for: \")\n    image_results = search_images(image_search_query)\n\n    for image_path, description, score in image_results:\n        print(f\"- Image Path: {image_path}\\n\\t- Score: {score:.4f}\\n\\t- Description: {description}\\n\")\n\n    search_another = input(\"Search again? 
&#91;Y,n]: \").lower() in &#91;'y', '']<\/code><\/pre>\n\n\n\n<p>This part:<\/p>\n\n\n\n<ul><li><strong>Prompts the user<\/strong> to enter a natural-language description of the image they\u2019re looking for.<\/li><li><strong>Retrieves the closest-matching images<\/strong> using FAISS.<\/li><li><strong>Displays the image path, similarity score, and AI-generated description<\/strong>.<\/li><li><strong>Lets the user perform multiple searches<\/strong> without restarting the script.<\/li><\/ul>\n\n\n\n<h2>Output samples<\/h2>\n\n\n\n<p>This is a output sample, just for you to see the results:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Enter a image description to search for: a landscape\n\n---- Images related ----------------------\n\n- Image Path: .\/images_repository\/Second_Life_Landscape_champ_des_fleurs.png\n    - Score: 0.4404\n    - Description:  This image features a digital rendering of a natural landscape, possibly from a video game or a 3D model. The scene includes wildflowers, tall grasses, and trees in the background under a clear sky. There is a blank area on the bottom part of the image where there should be more detail but it appears to have been cut off or obscured.\n\n- Image Path: .\/images_repository\/Semmering_landscape.png\n    - Score: 0.6829\n    - Description:  This image captures a serene winter scene on a mountain slope. The landscape is covered in snow, and the forest at the base of the slope has snow-covered pine trees. 
There's a slight mist or fog that adds to the tranquility of the setting.<\/code><\/pre>\n\n\n\n<p>And these are the images:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img fetchpriority=\"high\" src=\"https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Second_Life_Landscape_champ_des_fleurs.png\" alt=\"Second_Life_Landscape_champ_des_fleurs.png\" class=\"wp-image-14970\" width=\"752\" height=\"384\" srcset=\"https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Second_Life_Landscape_champ_des_fleurs.png 1504w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Second_Life_Landscape_champ_des_fleurs-768x392.png 768w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Second_Life_Landscape_champ_des_fleurs-18x9.png 18w\" sizes=\"(max-width: 752px) 100vw, 752px\" \/><figcaption>Second_Life_Landscape_champ_des_fleurs.png<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img src=\"https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Semmering_landscape.png\" alt=\"Semmering_landscape.png\" class=\"wp-image-14972\" width=\"947\" height=\"648\" srcset=\"https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Semmering_landscape.png 3788w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Semmering_landscape-768x526.png 768w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Semmering_landscape-1536x1051.png 1536w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Semmering_landscape-2048x1401.png 2048w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/01\/Semmering_landscape-18x12.png 18w\" sizes=\"(max-width: 947px) 100vw, 947px\" \/><figcaption>Semmering_landscape.png<\/figcaption><\/figure>\n\n\n\n<h2>Full code<\/h2>\n\n\n\n<p>This is the full code of both the image catalog creation as well as the vector store queries.<\/p>\n\n\n\n<h3>index_images_description.py<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import base64\nimport glob\nimport os\nimport pandas as 
pd\nimport shutil\n\nfrom io import BytesIO\nfrom langchain.schema import Document\nfrom langchain_community.vectorstores import FAISS\nfrom langchain_ollama import OllamaLLM\nfrom langchain_ollama.embeddings import OllamaEmbeddings\nfrom PIL import Image\nfrom tqdm import tqdm\n\n\n# ---- Functions definition ---------------------------------------------------\ndef get_dataframe(file_path):\n    if os.path.isfile(file_path):\n        os.remove(file_path)\n    return pd.DataFrame(columns=&#91;'image_path', 'description'])\n\n\ndef get_png_as_base64(file_path):\n    buffered = BytesIO()\n    Image.open(file_path).save(buffered, format=\"PNG\")  # You can change the format if needed\n    return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n\n\n# ---- Entry point ------------------------------------------------------------\nif __name__ == '__main__':\n\n    # ---- Configuration ------------------------------------------------------\n    llm_model = 'llava:13b'\n    embedding_model = 'nomic-embed-text'\n    vectorstore_path = f'vectorstore_{embedding_model}'\n    images_repository_path = '.\/images_repository'\n    images_description_csv = '.\/images_descriptions.csv'\n    batch_size = 8\n\n    # ---- Model definition ---------------------------------------------------\n    model = OllamaLLM(model=llm_model)\n    prompt = 'Describe this image in two or less sentences:'\n\n    # ---- Grab images from repository ----------------------------------------\n    print('-' * 80)\n    print(f'&gt; Grabbing PNG images from repository: {images_repository_path}')\n    input_files = glob.glob(f'{images_repository_path}\/*.png')\n    print(f'&gt; Got {len(input_files)} file(s).')\n\n    # ---- Get the images description dataset ---------------------------------\n    print(f'&gt; Creating images description dataset: {images_description_csv}')\n    dataframe = get_dataframe(images_description_csv)\n\n    # ---- Image processing 
---------------------------------------------------\n    print('&gt; Updating images description dataset...')\n    for i in tqdm(range(0, len(input_files), 1), desc=\"Images description generation progress\"):\n        image_path = input_files&#91;i]\n        image_base64 = get_png_as_base64(image_path)\n        response = model.bind(images=&#91;image_base64]).invoke(input=prompt)\n        dataframe.loc&#91;len(dataframe)] = &#91;image_path, response]\n\n    dataframe.to_csv(images_description_csv, index=False)\n    print('&gt; ...done!')\n\n    # ---- Vector Store setup -------------------------------------------------\n    print('-' * 80)\n    print(f'&gt; Creating vectorstore: {vectorstore_path} ')\n    if os.path.isdir(vectorstore_path):\n        shutil.rmtree(vectorstore_path)\n\n    # Initialize FAISS index &amp; embedding model\n    vectorstore_embeddings = OllamaEmbeddings(model=embedding_model)\n    vectorstore_index = None  # Placeholder for FAISS index\n\n    # ---- Vector store indexing\n    print(f'&gt; Indexing {len(dataframe)} images descriptions...\\n')\n    for i in tqdm(range(0, len(dataframe), batch_size), desc=\"Indexing images description progress\"):\n        batch_df = dataframe.iloc&#91;i:i + batch_size]\n        documents = &#91;\n            Document(\n                page_content=row&#91;'description'],\n                metadata={\"image_path\": row&#91;\"image_path\"], \"description\": row&#91;\"description\"]}\n            )\n            for _, row in batch_df.iterrows()\n        ]\n        # Initialize or append to FAISS index\n        if vectorstore_index is None:\n            vectorstore_index = FAISS.from_documents(documents, vectorstore_embeddings)\n        else:\n            vectorstore_index.add_documents(documents)\n\n    # ---- Save vector store index\n    print(f'&gt; Saving vectorstore: {vectorstore_path}')\n    vectorstore_index.save_local(vectorstore_path)\n    print('&gt; Vectorstore 
created!')<\/code><\/pre>\n\n\n\n<h3>search_images_description.py<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from langchain_ollama.embeddings import OllamaEmbeddings\nfrom langchain_community.vectorstores import FAISS\n\n# Query function\ndef search_images(query, top_k=3):\n    results = vectorstore.similarity_search_with_score(query, k=top_k)\n    return &#91;\n        (doc.metadata&#91;\"image_path\"], doc.metadata&#91;\"description\"], score)\n        for doc, score in results\n    ]\n\nif __name__ == '__main__':\n    # ---- Configuration ------------------------------------------------------\n    embedding_model = 'nomic-embed-text'\n    vectorstore_path = f'vectorstore_{embedding_model}'\n\n    # ---- Load FAISS index ---------------------------------------------------\n    vectorstore_embeddings = OllamaEmbeddings(model=embedding_model)\n    vectorstore = FAISS.load_local(vectorstore_path, vectorstore_embeddings, allow_dangerous_deserialization=True)\n\n    # ---- Query vector store -------------------------------------------------\n    search_another = True\n    while search_another:\n        print('-' * 80)\n        image_search_query = input(\"Enter an image description to search for: \")\n        image_results = search_images(image_search_query)\n\n        print(\"\\n---- Images related ----------------------\\n\")\n        for image_path, description, score in image_results:\n            print(f\"- Image Path: {image_path}\\n\\t- Score: {score:.4f}\\n\\t- Description: {description}\\n\")\n\n        search_another = input(\"Search again? &#91;Y,n]: \").lower() in &#91;'y', '']<\/code><\/pre>\n\n\n\n<h2><strong>Wrapping Up: From Pixels to Searchable Insights<\/strong><\/h2>\n\n\n\n<p>In this article, we built a powerful <strong>AI-driven image search system<\/strong> using a <strong>multimodal model (LLaVA) and a vector database (FAISS)<\/strong>. 
Instead of manually tagging images, we let AI <strong>see, describe, and organize them<\/strong>\u2014turning an unstructured image repository into a fully searchable collection.<\/p>\n\n\n\n<h3><strong>What We Accomplished<\/strong><\/h3>\n\n\n\n<ul><li><strong>Generated image descriptions<\/strong> automatically using LLaVA.<\/li><li><strong>Converted those descriptions into vectors<\/strong> with nomic-embed-text.<\/li><li><strong>Stored them in FAISS<\/strong> for efficient similarity-based searches.<\/li><li><strong>Built an interactive search system<\/strong> that retrieves images based on natural language queries.<\/li><\/ul>\n\n\n\n<p>This workflow shows how <strong>multimodal AI and vector databases can transform the way we interact with visual data<\/strong>. Whether you&#8217;re cataloging a massive photo archive, building an AI-powered image search tool, or enhancing media management workflows\u2014this technique offers a scalable, efficient solution.<\/p>\n\n\n\n<h3><strong>Next Steps<\/strong><\/h3>\n\n\n\n<p>You can expand on this project by:<\/p>\n\n\n\n<ul><li><strong>Enhancing search accuracy<\/strong> with better prompts or fine-tuned models.<\/li><li><strong>Adding metadata filtering<\/strong> (e.g., searching by date or category).<\/li><li><strong>Building a web interface<\/strong> for a more user-friendly experience.<\/li><\/ul>\n\n\n\n<p>Now it\u2019s your turn\u2014how will you use AI to make image search smarter? \ud83d\ude80<\/p>\n\n\n\n<h2>About Me<\/h2>\n\n\n\n<p><em>I\u2019m Gabriel, and I like computers. A lot.<\/em><\/p>\n\n\n\n<p>For nearly 30 years, I\u2019ve explored the many facets of technology\u2014as a developer, researcher, sysadmin, security advisor, and now an AI enthusiast. Along the way, I\u2019ve tackled challenges, broken a few things (and fixed them!), and discovered the joy of turning ideas into solutions. 
My journey has always been guided by curiosity, a love of learning, and a passion for solving problems in creative ways.<\/p>\n\n\n\n<p>See ya around!<\/p>","protected":false},"excerpt":{"rendered":"<p>Have you ever struggled to find an image in a massive collection, remembering only a vague description like &#8220;sunset over a snowy mountain&#8221; or &#8220;a cat sitting next to a laptop&#8221;? What if you could just type a natural-language description and instantly retrieve the most relevant images? That&#8217;s exactly what we&#8217;re going to build in this article using multimodal AI and a vector database.<\/p>","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[54],"tags":[55,73,72,61,71,74,56,57,69,59,60,70],"_links":{"self":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/14969"}],"collection":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/comments?post=14969"}],"version-history":[{"count":5,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/14969\/revisions"}],"predecessor-version":[{"id":14976,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/14969\/revisions\/14976"}],"wp:attachment":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/media?parent=14969"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/categories?post=14969"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/tags?post=14969"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}