RAG-based Chatbot for document queries

Hien, diploma team project

Chatbot system pipelines

The chatbot is designed to help users quickly obtain the information they need while also letting them verify its sources. Our implementation is built on Retrieval-Augmented Generation (RAG), a technique that improves the accuracy and relevance of text generated by large language models (LLMs). This approach to natural language processing combines two powerful techniques: information retrieval and text generation. Below are the two main stages of RAG technology:

Document Processing Pipeline:

  1. Accepting Documents: The pipeline starts at the document repository, where stored text documents are accepted for processing.
  2. Text Division: The accepted documents are split into smaller text fragments for the subsequent processing and storage stages.
  3. Generating Embeddings: Each text fragment is transformed into a vector representation, called an embedding, using an advanced embedding model. These embeddings capture the semantic essence of the text.
  4. Metadata Extraction: At the same time, important metadata associated with each document is extracted, including its source (a link to the website it came from).
  5. Integration: The pipeline integrates embeddings with the corresponding text fragments, creating a paired dataset consisting of text and its vector representation.
  6. Storage: This paired dataset is then stored in a specialized vector database designed for performing high-performance vector operations, enabling efficient similarity searches and retrieval tasks.
  7. Vector Database: Serving as the system's data store, it holds the fully processed texts and embeddings, ready for access by the real-time component of the system.
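The steps above can be sketched in a few lines of Python. The embedding function here is a toy bag-of-words stand-in for a real embedding model, and the in-memory list stands in for the vector database; the document text and source URL are hypothetical examples, not from our deployment.

```python
import re

def chunk_text(text, max_words=50):
    """Step 2: split a document into word-bounded fragments."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text, vocab):
    """Step 3 stand-in: a bag-of-words count vector.
    A production system would call a real embedding model here instead."""
    tokens = re.findall(r"\w+", text.lower())
    return [tokens.count(word) for word in vocab]

# Steps 4-7: pair each fragment with its embedding and source metadata,
# then store the records; a plain list stands in for the vector database.
doc = "RAG combines retrieval and generation to ground model answers in stored documents"
source = "https://example.com/doc"  # hypothetical source link (metadata)
vocab = sorted(set(re.findall(r"\w+", doc.lower())))
store = [
    {"text": chunk, "embedding": embed(chunk, vocab), "source": source}
    for chunk in chunk_text(doc, max_words=8)
]
```

Each record in `store` carries the text fragment, its vector, and the source link, so later retrieval can surface both the answer context and where it came from.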

Real-time Processing Pipeline:

  1. User: Users interact with the system via a chat interface that accepts user queries.
  2. Processing Queries into Embeddings: Upon receiving a query, the system generates a query embedding using the same embedding model as the Document Processing Pipeline.
  3. Searching for Relevant Texts: The system then retrieves the most relevant text fragments from the vector database by comparing the query embedding with the embeddings of the stored text fragments. In our case, the most appropriate fragments are determined by their cosine similarity scores.
  4. Creating Prompts: An appropriate system prompt for ChatGPT is combined with the retrieved most relevant text fragments into a contextually rich prompt, which is passed to ChatGPT to generate a response.
  5. Generating Responses: ChatGPT generates responses by combining its pre-trained knowledge with the provided context. As a result, the responses are more informed, accurate, and relevant to the topic at hand.
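The retrieval and prompt-creation steps can be sketched as follows. This is a minimal illustration, not our production code: the fragment texts, embeddings, and source URLs are made up, and the assembled prompt string would be sent to the ChatGPT API in the real system.

```python
import math

def cosine(a, b):
    """Step 3: cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    """Return the k fragments whose embeddings best match the query."""
    return sorted(store, key=lambda rec: cosine(query_vec, rec["embedding"]),
                  reverse=True)[:k]

def build_prompt(question, fragments):
    """Step 4: combine a system instruction with the retrieved context."""
    context = "\n".join(f"- {f['text']} (source: {f['source']})" for f in fragments)
    return ("Answer using only the context below and cite the sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

# Toy store: in practice these vectors come from the same embedding model
# used in the Document Processing Pipeline.
store = [
    {"text": "RAG combines retrieval and generation.",
     "embedding": [1.0, 0.0, 0.5], "source": "https://example.com/a"},
    {"text": "Embeddings capture semantic meaning.",
     "embedding": [0.0, 1.0, 0.5], "source": "https://example.com/b"},
]
top = retrieve([1.0, 0.0, 0.4], store, k=1)
prompt = build_prompt("What is RAG?", top)
```

Including the source link in each context line is what lets the generated answer cite where its information came from, matching the verification goal described above.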

Check it out at:

https://iaff.azurewebsites.net/assistant/

© Van Hien Le