
2.6 RAG Systems: Techniques for Question Answering

Introduction

Retrieval Augmented Generation (RAG) systems have revolutionized the way we interact with large corpora of data, enabling the development of sophisticated chatbots and question-answering models. A critical stage in these systems involves passing retrieved documents, along with the original query, to a language model (LM) for generating answers. This chapter explores various strategies for optimizing this process, ensuring accurate and comprehensive responses.

Question Answering with Language Models

Once relevant documents are retrieved, they must be synthesized into a coherent answer. This involves integrating document content with the query context and leveraging the generative capabilities of LMs.

General Flow

  1. Query Reception: A user query is received.
  2. Document Retrieval: Relevant documents are sourced from the corpus.
  3. Answer Generation: Documents and the query are passed to an LM, which generates an answer.
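The following minimal, self-contained sketch ties these three stages together. Everything here is illustrative: the toy word-overlap retriever stands in for a real vector store, and the assembled prompt is returned instead of being sent to an actual LM.

from typing import List

def retrieve(query: str, corpus: List[str], top_k: int = 2) -> List[str]:
    # Toy retriever: rank chunks by word overlap with the query
    query_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda chunk: len(query_words & set(chunk.lower().split())), reverse=True)
    return ranked[:top_k]

def answer(query: str, corpus: List[str]) -> str:
    # Stage 1: query reception; Stage 2: retrieval; Stage 3: prompt assembly for the LM
    context = "\n\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# In a real system, the returned prompt would be passed to an LM
print(answer("Is probability a class topic?", ["The class covers probability.", "Office hours are on Tuesdays."]))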

Integration Methods

By default (LangChain's "stuff" approach), all retrieved chunks are passed into the LM's context window at once. This breaks down when the combined chunks exceed the window size, as the token-count sketch below illustrates. Strategies like MapReduce, Refine, and MapRerank offer solutions to this constraint.
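To see when the default approach runs out of room, you can estimate the token footprint of the stuffed prompt before sending it. The sketch below is illustrative: it assumes the tiktoken package (not used elsewhere in this chapter), placeholder chunk text, and a nominal 4,096-token window for gpt-3.5-turbo.

import tiktoken

# Tokenizer matching the target model; counts tokens the way the API will
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
CONTEXT_WINDOW = 4096  # nominal window for this model family (illustrative)

# Placeholder retrieved chunks; in practice these come from the vector store
retrieved_chunks = ["chunk one text ...", "chunk two text ..."]
prompt = "\n\n".join(retrieved_chunks) + "\n\nQuestion: Is probability a class topic?"

token_count = len(encoding.encode(prompt))
if token_count > CONTEXT_WINDOW:
    print(f"{token_count} tokens exceed the window; consider map_reduce or refine")
else:
    print(f"{token_count} tokens fit; the default 'stuff' approach works")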

Enhancing RAG Systems with Advanced Question Answering Techniques

Before diving into the specifics of question answering with LMs, ensure your development environment is configured correctly. This setup includes importing the necessary libraries, loading API keys from the environment, and pinning the LM version used throughout the chapter.

import os
import openai
from dotenv import load_dotenv

# Load environment variables and configure OpenAI API key
load_dotenv()
openai.api_key = os.environ['OPENAI_API_KEY']

# Pin the LLM version used throughout this chapter
llm_name = "gpt-3.5-turbo"
print(f"Using LLM Version: {llm_name}")

Document Retrieval with VectorDB

A crucial step in RAG systems is retrieving documents relevant to a user's query. This is achieved using a vector database (VectorDB) that stores document embeddings.

# Import necessary libraries for vector database and embeddings generation
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

# Specify the directory where the vector database will persist its data
documents_storage_directory = 'docs/chroma/'

# Initialize the embeddings generator using OpenAI's embeddings
embeddings_generator = OpenAIEmbeddings()

# Initialize the vector database with the specified storage directory and embedding function
vector_database = Chroma(persist_directory=documents_storage_directory, embedding_function=embeddings_generator)

# Display the current document count in the vector database to verify initialization
print(f"Document Count in VectorDB: {vector_database._collection.count()}")

Implementing Question Answering Chains

The RetrievalQA chain combines document retrieval with question answering, using an LM to generate responses grounded in the retrieved documents.

Initializing the Language Model

from langchain.chat_models import ChatOpenAI

# Initialize the chat model with the chosen LLM version
language_model = ChatOpenAI(model_name=llm_name, temperature=0)

Configuring the RetrievalQA Chain

# Importing necessary modules from the langchain library
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Creating a custom prompt template for the language model
# The template guides the model to use the provided context effectively to answer the question
custom_prompt_template = """To better assist with the inquiry, consider the details provided below as your reference...
{context}
Inquiry: {question}
Insightful Response:"""

# Initializing the RetrievalQA chain with the custom prompt template
question_answering_chain = RetrievalQA.from_chain_type(
    language_model,
    retriever=vector_database.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PromptTemplate.from_template(custom_prompt_template)}
)

Question Answering in Action

# Pose a query to the system
query = "Is probability a class topic?"
response = question_answering_chain({"query": query})
print("Answer:", response["result"])

Exploring Advanced QA Chain Types

MapReduce and Refine Techniques

MapReduce and Refine are advanced techniques designed to circumvent limitations posed by the LM's context window size, enabling the processing of numerous documents.
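Conceptually, the two chain types traverse the retrieved documents differently: MapReduce queries each document independently and then merges the partial answers, while Refine threads a single evolving answer through the documents one at a time. The plain-Python sketch below is illustrative pseudocode with a stubbed llm() helper, not LangChain's internal implementation.

def llm(prompt: str) -> str:
    # Stubbed model call so the sketch is self-contained
    return f"<answer derived from: {prompt[:40]}...>"

def map_reduce_answer(question: str, documents: list) -> str:
    # Map: answer the question against each document independently (parallelizable)
    partial_answers = [llm(f"Context: {doc}\nQuestion: {question}") for doc in documents]
    # Reduce: merge the partial answers into one final answer
    return llm("Combine these answers:\n" + "\n".join(partial_answers))

def refine_answer(question: str, documents: list) -> str:
    # Start from the first document, then refine sequentially with each new one
    current_answer = llm(f"Context: {documents[0]}\nQuestion: {question}")
    for doc in documents[1:]:
        current_answer = llm(f"Existing answer: {current_answer}\nNew context: {doc}\nRefine the answer.")
    return current_answer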

# Configuring the question answering chain to use the MapReduce technique
# This configuration enables the aggregation of responses from multiple documents
question_answering_chain_map_reduce = RetrievalQA.from_chain_type(
    language_model,
    retriever=vector_database.as_retriever(),
    chain_type="map_reduce"
)

# Executing the MapReduce technique with a user-provided query
response_map_reduce = question_answering_chain_map_reduce({"query": query})

# Printing the aggregated answer obtained through the MapReduce technique
print("MapReduce Answer:", response_map_reduce["result"])

# Configuring the question answering chain to use the Refine technique
# This approach allows for the sequential refinement of the answer based on the query
question_answering_chain_refine = RetrievalQA.from_chain_type(
    language_model,
    retriever=vector_database.as_retriever(),
    chain_type="refine"
)

# Executing the Refine technique with the same user-provided query
response_refine = question_answering_chain_refine({"query": query})

# Printing the refined answer, showcasing the iterative improvement process
print("Refine Answer:", response_refine["result"])

Practical Tips and Best Practices

  • Choosing Between MapReduce and Refine: The decision to use MapReduce or Refine depends on the specific requirements of your task. MapReduce is best suited for scenarios where the goal is to aggregate information from multiple sources quickly. Refine, however, is more appropriate for tasks requiring high accuracy and the iterative improvement of answers.

  • Optimizing Performance: When implementing these techniques, especially in distributed systems, pay attention to network latency and data serialization costs. Efficient data transfer and processing can significantly impact the overall performance.

  • Experimentation is Key: The effectiveness of MapReduce and Refine can vary based on the nature of the data and the specifics of the question answering task. It's essential to experiment with both techniques to determine which yields the best results for your particular application.

Addressing RetrievalQA Limitations

A notable limitation of RetrievalQA chains is their inability to preserve conversational history, impacting the flow of follow-up queries.

Demonstrating the Limitation

# Reuse the RetrievalQA chain configured earlier in the chapter
qa_chain = question_answering_chain

# Defining an initial query related to course content
initial_question_about_course_content = "Does the curriculum cover probability theory?"
# Generating a response to the initial query using the question answering chain
response_to_initial_question = qa_chain({"query": initial_question_about_course_content})

# Defining a follow-up query without explicitly preserving the conversational context
follow_up_question_about_prerequisites = "Why are those prerequisites important?"
# Generating a response to the follow-up query, again using the question answering chain
response_to_follow_up_question = qa_chain({"query": follow_up_question_about_prerequisites})

# Displaying the responses to both the initial and follow-up queries
print("Response to Initial Query:", response_to_initial_question["result"])
print("Response to Follow-Up Query:", response_to_follow_up_question["result"])

This limitation underscores the need for integrating conversational memory into RAG systems, a topic that will be explored in subsequent sections.
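As a preview of that discussion, one common remedy in LangChain is to pair a ConversationalRetrievalChain with a ConversationBufferMemory so that each turn sees the accumulated chat history. A minimal sketch, reusing the language model and vector store from earlier:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Memory object that accumulates the dialogue under the "chat_history" key
conversation_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

conversational_chain = ConversationalRetrievalChain.from_llm(
    language_model,
    retriever=vector_database.as_retriever(),
    memory=conversation_memory
)

# Follow-up questions now see the earlier exchange
first_turn = conversational_chain({"question": "Does the curriculum cover probability theory?"})
second_turn = conversational_chain({"question": "Why are those prerequisites important?"})
print(second_turn["answer"])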

Conclusion

Advanced question answering techniques in RAG systems offer a pathway to more dynamic and accurate responses, enhancing user interaction. Through the careful implementation of RetrievalQA chains, and by addressing their inherent limitations, developers can create sophisticated systems capable of engaging in meaningful dialogue with users.

Further Reading and Exploration

  • Explore the latest advancements in language model technologies and their implications for RAG systems.
  • Investigate additional strategies for integrating conversational memory into RAG frameworks.

This chapter provides a foundation for understanding and implementing advanced question answering techniques within RAG systems, setting the stage for further innovation in the field of AI-driven interaction.

Theory questions:

  1. What are the three main stages involved in the question answering process of a RAG system?
  2. Describe the limitations of passing all retrieved chunks into the LM's context window and mention at least two strategies to overcome this constraint.
  3. Explain the significance of using a vector database (VectorDB) in document retrieval for RAG systems.
  4. How does the RetrievalQA chain combine document retrieval with question answering in RAG systems?
  5. Compare and contrast the MapReduce and Refine techniques in the context of overcoming LM's context window size limitations.
  6. What practical considerations should be taken into account when implementing MapReduce or Refine techniques in a distributed system?
  7. Why is it crucial to experiment with both MapReduce and Refine techniques in a RAG system?
  8. Identify a major limitation of RetrievalQA chains concerning conversational history and its impact on follow-up queries.
  9. Discuss the importance of integrating conversational memory into RAG systems and how it could potentially enhance user interaction.
  10. What are the recommended areas for further reading and exploration to advance one's understanding of RAG systems and their capabilities?

Practice questions:

Based on the content of the chapter on advanced question answering techniques in RAG systems, here are some Python tasks that align with the key concepts and code examples presented:

  1. Vector Database Initialization: Implement a Python function that initializes a vector database for document retrieval. Use the Chroma class for the database and OpenAIEmbeddings for generating embeddings. The function should take a directory path as an input for where the vector database will store its data and print the current document count in the database.

  2. RetrievalQA Chain Setup: Create a Python function that sets up a RetrievalQA chain with a custom prompt template. The function should initialize a language model and a vector database retriever, then configure the RetrievalQA chain using these components. Use the custom prompt template provided in the chapter, and allow the function to accept a model name and a documents storage directory as parameters.

  3. Question Answering with MapReduce and Refine Techniques: Write a Python script that demonstrates the use of MapReduce and Refine techniques for question answering. Your script should include the initialization of language model and vector database components, setup for both MapReduce and Refine question answering chains, and execute these chains with a sample query. Print the results of both techniques.

  4. Handling Conversational Context: Implement a Python function that simulates the handling of a follow-up question in a conversational context. The function should accept two queries (an initial query and a follow-up query) and generate responses to both using a question answering chain. This task aims to illustrate the limitation mentioned in the chapter regarding the preservation of conversational history. Your implementation does not need to solve the limitation but should demonstrate how the system currently handles follow-up queries.