Building an AI Travel Assistant with Web Indexing, Qdrant, and Gemini
In this article, we will walk through building an AI Travel Assistant using Qdrant and Gemini. The approach is to index a set of URLs much as search engine bots do, capture the relevant data, and summarize the results in response to natural language queries, returning the source URLs alongside each answer.
For data storage, we will use the Qdrant vector database. This practical demonstration focuses on creating a search engine for the travel domain: an 'AI Tour Assistant' for top cities in Europe.
Understanding Web Indexing
During indexing, search engine bots collect, store, and organize public data from websites; that data is then made searchable to users. Adding Large Language Models (LLMs) to this process improves natural language understanding and generation, producing concise, informative text outputs. By combining LLMs with Qdrant vector search, a search engine can rank results by similarity score rather than by keyword matches alone.
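To make the idea concrete, here is a minimal sketch of a similarity-based lookup. The local Qdrant URL, the `travel_pages` collection name, and the use of Gemini's `embedding-001` model for the query vector are all assumptions for this demo:

```python
import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="YOUR_GEMINI_API_KEY")
client = QdrantClient(url="http://localhost:6333")

# Embed the natural language query, then retrieve by vector similarity
# rather than by keyword overlap.
query_vector = genai.embed_content(
    model="models/embedding-001",
    content="best museums to visit in Paris",
)["embedding"]

hits = client.search(
    collection_name="travel_pages",  # hypothetical collection name
    query_vector=query_vector,
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload.get("source"))
```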
Summarization Using LLM
Incorporating an LLM adds a fifth step to the four above (collect, store, organize, search): summarizing the results from the relevant pages and displaying the corresponding source URLs.
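A sketch of this summarization step with the Gemini API is below; it assumes the retrieved chunks arrive as dictionaries carrying `text` and `source` fields, which is an illustrative shape rather than a fixed schema:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-pro")

def summarize(query: str, chunks: list[dict]) -> str:
    """Summarize retrieved page chunks and cite their source URLs."""
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    prompt = (
        "Using only the context below, answer the travel question and "
        "list the source URLs you relied on.\n\n"
        f"Question: {query}\n\nContext:\n{context}"
    )
    return model.generate_content(prompt).text
```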
Programming Segments
The programming process is divided into two segments:
- Asynchronous web scraping/indexing: we extract data from a set of URLs and store it in a vector database as embeddings. An indexer script manages this process, continuously storing data outside of the search engine's query loop.
- LLM-powered search engine: natural language queries are embedded into vectors, Qdrant retrieves the relevant chunks, and the Gemini model summarizes them. LangChain, Qdrant, and Streamlit handle orchestration, storage, and the UI.
Implementing Web Indexing
We begin by scraping content from a collection of travel-domain URLs using LangChain. The extracted text is split into chunks and stored in the vector database as embeddings for later retrieval.
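A sketch of the indexer is below, assuming a local Qdrant instance, Gemini embeddings, and two example Wikivoyage URLs standing in for the full travel-domain list (package paths follow recent `langchain-community` releases, and `Html2TextTransformer` requires the `html2text` package):

```python
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain_community.vectorstores import Qdrant
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

urls = [
    "https://en.wikivoyage.org/wiki/Paris",  # example travel pages
    "https://en.wikivoyage.org/wiki/Rome",
]

# Fetch the pages concurrently and convert the HTML to plain text.
docs = AsyncHtmlLoader(urls).load()
docs = Html2TextTransformer().transform_documents(docs)

# Split into overlapping chunks sized for embedding.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed with Gemini and persist to a Qdrant collection.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
Qdrant.from_documents(
    chunks,
    embeddings,
    url="http://localhost:6333",
    collection_name="travel_pages",  # hypothetical collection name
)
```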
The data stored in the vector database then serves user search queries. Before running the search engine script, the necessary libraries and configuration, such as API keys and the Qdrant connection details, need to be in place.
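As a minimal configuration sketch, assuming the same local Qdrant instance and the hypothetical `travel_pages` collection from the indexer above (note that `langchain-google-genai` reads the Gemini key from the `GOOGLE_API_KEY` environment variable):

```python
import os

from qdrant_client import QdrantClient

# langchain-google-genai picks up the Gemini key from this variable.
os.environ.setdefault("GOOGLE_API_KEY", "YOUR_GEMINI_API_KEY")

# Verify the indexer has populated the collection before serving queries.
client = QdrantClient(url="http://localhost:6333")
print(client.count(collection_name="travel_pages"))
```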
Creating the Search Engine UI
The search engine functionality is implemented by connecting to the Qdrant collection, customizing the output parser, and building the UI with Streamlit. Users enter a query and receive a summarized answer along with the corresponding source URLs.
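One way to wire this together is sketched below, using LangChain's `RetrievalQAWithSourcesChain` to return an answer plus its source URLs; the collection name, model names, and connection details are assumptions carried over from the earlier sketches:

```python
import streamlit as st
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_community.vectorstores import Qdrant
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from qdrant_client import QdrantClient

st.title("AI Tour Assistant")

# Connect to the existing collection built by the indexer.
vectorstore = Qdrant(
    client=QdrantClient(url="http://localhost:6333"),
    collection_name="travel_pages",  # hypothetical collection name
    embeddings=GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
)

# Gemini summarizes the retrieved chunks; the chain tracks their sources.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=ChatGoogleGenerativeAI(model="gemini-pro"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
)

query = st.text_input("Ask about a European city:")
if query:
    result = chain.invoke({"question": query})
    st.write(result["answer"])
    st.caption(f"Sources: {result['sources']}")
```

Saved as `app.py`, this runs locally with `streamlit run app.py`.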
This process demonstrates how to build an LLM-powered search engine on publicly available information from the internet, and users can verify each response by visiting the provided source URLs. To scale further, a framework such as Ray can be used to handle concurrent user queries efficiently.