Answers 2.6
Theory
- Three stages of RAG‑QA: accept the query, retrieve relevant documents, and generate the answer.
- Context window constraints: because the LLM context is limited, you cannot pass every fragment. MapReduce and Refine help aggregate or iteratively refine information across multiple documents.
- Vector database: stores document embeddings and provides fast retrieval of the most relevant documents based on semantic similarity.
- RetrievalQA chain: combines retrieval and answer generation, improving relevance and accuracy of results.
- MapReduce and Refine: MapReduce answers over each retrieved document independently and then combines the partial results, so it scales to many documents; Refine passes a draft answer through the documents sequentially, improving it at each step, which is useful when precision is critical. Choose based on the task.
- Distributed setups: when the components of the chain run across separate services, account for network latency and serialization overhead.
- Experimentation: try MapReduce and Refine; effectiveness depends heavily on data types and question styles.
- RetrievalQA limitation: no built‑in dialogue memory, which makes maintaining context across follow‑ups difficult.
- Dialogue memory: needed to incorporate previous turns and provide contextual answers during longer conversations.
- Further study: new LLM approaches, their impact on RAG systems, and memory strategies in RAG chains.
Practical Tasks
1.
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

def initialize_vector_database(directory_path):
    # Initialize an embeddings generator (OpenAI) to create vector representations for text
    embeddings_generator = OpenAIEmbeddings()
    # Initialize a Chroma vector database pointing to a persistence directory
    # and the embedding function to use
    vector_database = Chroma(persist_directory=directory_path, embedding_function=embeddings_generator)
    # Display the current document count to verify initialization
    # (assumes Chroma exposes `_collection.count()`)
    document_count = vector_database._collection.count()
    print(f"Documents in VectorDB: {document_count}")
    return vector_database

# Example usage of initialize_vector_database:
documents_storage_directory = 'path/to/your/directory'
vector_db = initialize_vector_database(documents_storage_directory)
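Note that the function above assumes the persistence directory already holds an indexed collection. If it is empty, documents first have to be embedded and persisted; a minimal sketch of that step, assuming pre-split Document objects (the sample texts below are placeholders, not part of the original task):

from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.schema import Document

# Hypothetical pre-split documents; in practice these come from loaders and text splitters
sample_documents = [
    Document(page_content="Probability theory underpins most machine learning algorithms."),
    Document(page_content="Linear algebra is a prerequisite for understanding deep learning."),
]

# Embed the documents and persist them into the same directory the QA code reads from
Chroma.from_documents(
    documents=sample_documents,
    embedding=OpenAIEmbeddings(),
    persist_directory=documents_storage_directory,
)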
2.
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

def setup_retrieval_qa_chain(model_name, documents_storage_directory):
    # Initialize embeddings and the Chroma vector store
    embeddings_generator = OpenAIEmbeddings()
    vector_database = Chroma(persist_directory=documents_storage_directory, embedding_function=embeddings_generator)
    # Initialize the language model (LLM) used in the RetrievalQA chain
    language_model = ChatOpenAI(model=model_name, temperature=0)
    # Define a custom prompt template to format LLM inputs
    custom_prompt_template = """To better assist with the inquiry, consider the details provided below as your reference...
{context}
Inquiry: {question}
Insightful Response:"""
    # Create the RetrievalQA chain, passing the LLM, a retriever built from the vector DB,
    # requesting source documents, and using the custom prompt
    question_answering_chain = RetrievalQA.from_chain_type(
        language_model,
        retriever=vector_database.as_retriever(),
        return_source_documents=True,
        chain_type_kwargs={"prompt": PromptTemplate.from_template(custom_prompt_template)}
    )
    return question_answering_chain

# Example usage of setup_retrieval_qa_chain:
model_name = "gpt-4o-mini"
documents_storage_directory = 'path/to/your/documents'
qa_chain = setup_retrieval_qa_chain(model_name, documents_storage_directory)
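Because return_source_documents=True, the chain returns the retrieved chunks alongside the answer; a quick sanity check (the query text is a placeholder):

response = qa_chain({"query": "Does the curriculum cover probability theory?"})
print(response["result"])                    # generated answer
print(len(response["source_documents"]))     # retrieved source chunks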
3.
# Assume setup_retrieval_qa_chain has been defined in the same script or imported.
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Configure to demonstrate both techniques (MapReduce and Refine)
model_name = "gpt-3.5-turbo"
documents_storage_directory = 'path/to/your/documents'
qa_chain = setup_retrieval_qa_chain(model_name, documents_storage_directory)

# The LLM is re-created here because RetrievalQA does not expose it as an attribute;
# the retriever can be reused directly from the existing chain.
language_model = ChatOpenAI(model=model_name, temperature=0)

# Create QA chains: one for MapReduce, one for Refine
question_answering_chain_map_reduce = RetrievalQA.from_chain_type(
    language_model,
    retriever=qa_chain.retriever,
    chain_type="map_reduce"  # use the MapReduce document-combination strategy
)
question_answering_chain_refine = RetrievalQA.from_chain_type(
    language_model,
    retriever=qa_chain.retriever,
    chain_type="refine"  # use the Refine document-combination strategy
)

# Example query to test both techniques
query = "What is the importance of probability in machine learning?"

# Run MapReduce and print the answer
response_map_reduce = question_answering_chain_map_reduce({"query": query})
print("MapReduce answer:", response_map_reduce["result"])

# Run Refine and print the answer
response_refine = question_answering_chain_refine({"query": query})
print("Refine answer:", response_refine["result"])
4.
def handle_conversational_context(initial_query, follow_up_query, qa_chain):
    """
    Simulate handling a follow-up question in a longer conversation.

    Args:
    - initial_query (str): First user query.
    - follow_up_query (str): Follow-up query referring to prior context.
    - qa_chain (RetrievalQA): Initialized QA chain that can answer queries.

    Returns:
    - None: Prints both answers directly to the console.
    """
    # Generate the answer to the initial query
    initial_response = qa_chain({"query": initial_query})
    print("Answer to initial query:", initial_response["result"])
    # Generate the answer to the follow-up query (note: no dialogue memory)
    follow_up_response = qa_chain({"query": follow_up_query})
    print("Answer to follow-up query:", follow_up_response["result"])

# Example usage
a_initial = "Does the curriculum cover probability theory?"
a_follow_up = "Why are those prerequisites important?"
handle_conversational_context(a_initial, a_follow_up, qa_chain)
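The follow-up above usually fails because RetrievalQA keeps no dialogue memory. A minimal sketch of one way to add it, using LangChain's ConversationalRetrievalChain with ConversationBufferMemory; the path, model name, and questions are placeholders reused from the tasks above, not part of the original solution:

from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Rebuild the vector store and LLM with the same settings as in task 2
embeddings_generator = OpenAIEmbeddings()
vector_database = Chroma(persist_directory='path/to/your/documents', embedding_function=embeddings_generator)
language_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Buffer memory that stores previous turns under the key the chain expects
conversation_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Conversational chain: each follow-up is condensed into a standalone question
# using the chat history before retrieval and answer generation
conversational_chain = ConversationalRetrievalChain.from_llm(
    llm=language_model,
    retriever=vector_database.as_retriever(),
    memory=conversation_memory,
)

print(conversational_chain({"question": "Does the curriculum cover probability theory?"})["answer"])
print(conversational_chain({"question": "Why are those prerequisites important?"})["answer"])

With the buffer memory in place, the second question is rewritten against the first exchange, so references such as "those prerequisites" can be resolved.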