2.5 Semantic Search — Advanced Strategies
Delivering precisely relevant information from large corpora is key to intelligent applications such as chatbots and question-answering (QA) systems. Basic semantic search is a solid start, but there are edge cases where its quality and result diversity fall short. This chapter explores advanced retrieval techniques that improve both precision and variety.
Search based purely on semantic proximity doesn’t always yield the most informative and diverse set of results. Advanced methods add mechanisms to balance diversity and relevance — especially important for complex queries that require nuance.
Enter Maximum Marginal Relevance (MMR). MMR balances relevance and diversity: it selects documents that are close to the query while being dissimilar to each other. This reduces redundancy and helps cover different aspects of an answer.
The procedure looks like this: first, select “candidates” by semantic similarity; then choose a final set that simultaneously accounts for relevance to the query and dissimilarity to the documents already selected. The outcome is a broader, more useful result set.
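To make the selection step concrete, here is a minimal MMR sketch over precomputed embedding vectors. It is an illustration, not a library API: the `lambda_param` weight, the `cosine_sim` helper, and the toy vectors in the usage comment are all assumptions for this example.
# Minimal MMR sketch over precomputed embeddings (illustrative, not a library API)
import numpy as np
def cosine_sim(vector_a, vector_b):
    # Cosine similarity between two vectors
    return float(np.dot(vector_a, vector_b) / (np.linalg.norm(vector_a) * np.linalg.norm(vector_b)))
def mmr_select(query_vector, candidate_vectors, k=2, lambda_param=0.5):
    # Greedily pick k candidates, trading relevance to the query against
    # similarity to the candidates that have already been selected
    selected = []
    remaining = list(range(len(candidate_vectors)))
    while remaining and len(selected) < k:
        best_index, best_score = None, None
        for i in remaining:
            relevance = cosine_sim(query_vector, candidate_vectors[i])
            redundancy = max(
                (cosine_sim(candidate_vectors[i], candidate_vectors[j]) for j in selected),
                default=0.0,
            )
            score = lambda_param * relevance - (1 - lambda_param) * redundancy
            if best_score is None or score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected
# Example with toy vectors:
# mmr_select(np.array([1.0, 0.0]), [np.array([1.0, 0.1]), np.array([0.9, 0.2]), np.array([0.1, 1.0])], k=2)
Raising `lambda_param` favors relevance; lowering it favors diversity among the selected documents.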
Next comes Self‑Query Retrieval. This method suits queries that include both semantic content and metadata (e.g., “alien movies released in 1980”). It splits the request into a semantic component (for embedding search) and a metadata filter (e.g., “release year = 1980”).
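Conceptually, the retriever splits one natural-language request into a semantic query and a structured filter. The tiny sketch below shows that decomposition by hand for the movie example; the `year` field and the filter shape are hypothetical and only illustrate the idea, and the LangChain SelfQueryRetriever that automates this appears later in the chapter.
# Hand-written decomposition of "alien movies released in 1980" (hypothetical `year` metadata field)
user_request = "alien movies released in 1980"
semantic_query = "alien movies"        # goes to embedding-based search
metadata_filter = {"year": 1980}       # applied as a structured filter on stored metadata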
Finally, Contextual Compression extracts only the most relevant fragments from retrieved documents. This is useful when you don’t need an entire document. It requires an extra processing step (finding the most relevant parts) but significantly improves accuracy and specificity.
Turning to the practical side: retrieving relevant documents is a critical stage in RAG (Retrieval-Augmented Generation) systems such as chatbots and QA. The techniques below help handle the edge cases of basic search and increase both the diversity and the specificity of results.
Before you start, import the necessary libraries and configure access to external services (for example, OpenAI for embeddings).
# Import required libraries
import os
from openai import OpenAI
import sys
# Add the project root to sys.path for relative imports
sys.path.append('../..')
# Load environment variables from .env for safe API key management
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
# Initialize OpenAI using environment variables
client = OpenAI()
# Ensure required packages are installed, including `lark` for parsing if needed
# !pip install lark
Now configure a vector store to efficiently perform meaning‑based search (using embeddings mapped to high‑dimensional vectors).
# Import the Chroma vector store and OpenAI embeddings from LangChain
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
# Directory for the vector database to persist its data
persist_directory = 'vector_db/chroma/'
# Initialize the embedding function using an OpenAI model
embedding_function = OpenAIEmbeddings()
# Create a Chroma vector database with persistence and the embedding function
vector_database = Chroma(
persist_directory=persist_directory,
embedding_function=embedding_function
)
# Print the current record count to verify readiness
print(vector_database._collection.count())
Add a small demo set to showcase similarity search and MMR.
# A small set of texts to populate the database
texts = [
"The Death Cap mushroom has a notable large fruiting body, often found above ground.",
"Among mushrooms, the Death Cap stands out for its large fruiting body, sometimes appearing in all-white.",
"The Death Cap, known for its toxicity, is one of the most dangerous mushrooms.",
]
# Create a tiny demonstration vector database from the texts
demo_vector_database = Chroma.from_texts(texts, embedding=embedding_function)
# A sample query for the demo vector database
query_text = "Discuss mushrooms characterized by their significant white fruiting bodies"
# Similarity search: top‑2 most relevant
similar_texts = demo_vector_database.similarity_search(query_text, k=2)
print("Similarity search results:", similar_texts)
# MMR search: diverse yet relevant (fetch extra candidates)
diverse_texts = demo_vector_database.max_marginal_relevance_search(query_text, k=2, fetch_k=3)
print("Diverse search (MMR) results:", diverse_texts)
A common issue is overly similar results. MMR balances relevance and diversity, reducing repetition and widening coverage. A practical MMR example:
# An information‑seeking query
query_for_information = "what insights are available on data analysis tools?"
# Standard similarity search: top‑3 relevant documents
top_similar_documents = vector_database.similarity_search(query_for_information, k=3)
# Show the beginning of the first two documents for comparison
print(top_similar_documents[0].page_content[:100])
print(top_similar_documents[1].page_content[:100])
# Note potential overlap. Introduce diversity with MMR.
diverse_documents = vector_database.max_marginal_relevance_search(query_for_information, k=3)
# Show the beginning of the first two diverse documents to observe differences
print(diverse_documents[0].page_content[:100])
print(diverse_documents[1].page_content[:100])
This example shows the difference between a standard similarity search and MMR: the latter yields relevant but less repetitive results.
Improving Accuracy with Metadata
Metadata helps refine queries and filter results by attributes (source, date, and so on).
Metadata‑filtered search
# A query scoped to a specific context
specific_query = "what discussions were there about regression analysis in the third lecture?"
# Similarity search with a metadata filter to target a specific lecture
targeted_documents = vector_database.similarity_search(
specific_query,
k=3,
filter={"source": "documents/cs229_lectures/MachineLearning-Lecture03.pdf"}
)
# Inspect metadata to highlight the specificity of the search
for document in targeted_documents:
print(document.metadata)
Combining Metadata and Self‑Query Retrievers
The Self‑Query Retriever extracts both the semantic query and the metadata filters from a single user phrase — no manual filter specification required.
Initialization and metadata description
Before running metadata‑aware search, define the metadata attributes to use:
# Import required modules from LangChain
from langchain_openai import ChatOpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
# Define the metadata attributes with detailed descriptions
metadata_attributes = [
AttributeInfo(
name="source",
description="Specifies the lecture document, limited to files in `docs/cs229_lectures`.",
type="string",
),
AttributeInfo(
name="page",
description="Page number within the lecture document.",
type="integer",
),
]
# Note: using the chat model gpt-4o-mini via ChatOpenAI, since the previously used completion model is deprecated
document_content_description = "Detailed lecture notes"
language_model = ChatOpenAI(model='gpt-4o-mini', temperature=0)
Configure the Self‑Query Retriever
# Initialize the Self‑Query Retriever with the LLM, vector DB, and metadata attributes
self_query_retriever = SelfQueryRetriever.from_llm(
language_model,
vector_database,
document_content_description,
metadata_attributes,
verbose=True
)
Run a query with automatically inferred metadata
# A query that encodes context directly in the question
specific_query = "what insights are provided on regression analysis in the third lecture?"
# Note: the first run may emit a deprecation warning for `predict_and_parse`; you can ignore it.
# Retrieve documents relevant to the specific query using inferred metadata
relevant_documents = self_query_retriever.get_relevant_documents(specific_query)
# Display metadata to demonstrate specificity
for document in relevant_documents:
print(document.metadata)
Implementing Contextual Compression
Contextual compression works by extracting the segments of a document that are most relevant to a given query. This method not only reduces the computational load on LLMs but also improves answer quality by focusing on the most pertinent information.
Setting Up the Environment
Before diving into contextual compression specifics, make sure your environment is correctly configured with the necessary libraries:
# Import classes for contextual compression and document retrieval
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI
Initializing the Compression Tools
Next, initialize the compression mechanism with a pretrained language model that will identify and extract relevant parts of documents:
# Initialize the language model with deterministic settings
language_model = ChatOpenAI(temperature=0, model="gpt-4o-mini")
# Create a compressor that uses the LLM to extract relevant segments
document_compressor = LLMChainExtractor.from_llm(language_model)
Creating the Contextual Compression Retriever
With the compressor ready, configure a retriever that integrates contextual compression into the retrieval process:
# Combine the document compressor with the vector DB retriever
compression_retriever = ContextualCompressionRetriever(
base_compressor=document_compressor,
base_retriever=vector_database.as_retriever()
)
Run a query and see how the compression‑aware retriever returns a more focused set of documents:
# Define a query to look for relevant document segments
query_text = "what insights are offered on data analysis tools?"
# Retrieve documents relevant to the query, automatically compressed for relevance
compressed_documents = compression_retriever.get_relevant_documents(query_text)
# Helper to pretty‑print compressed document contents
def pretty_print_documents(documents):
    separator = f"\n{'-' * 100}\n"
    print(separator.join(
        f"Document {index + 1}:\n\n{document.page_content}"
        for index, document in enumerate(documents)
    ))
# Display the compressed documents
pretty_print_documents(compressed_documents)
Contextual compression aims to extract the essence of documents by focusing on the segments most relevant to the query. Combined with MMR, it balances relevance and diversity to provide a broader perspective on the topic. Configure the retriever with compression and MMR:
# Initialize a retriever that uses both contextual compression and MMR
compression_based_retriever = ContextualCompressionRetriever(
base_compressor=document_compressor,
base_retriever=vector_database.as_retriever(search_type="mmr")
)
# A query to test the combined approach
query_for_insights = "what insights are available on statistical analysis methods?"
# Retrieve compressed documents using the compression‑aware retriever
compressed_documents = compression_based_retriever.get_relevant_documents(query_for_insights)
# Pretty‑print the compressed documents
pretty_print_documents(compressed_documents)
This approach optimizes retrieval, ensuring results are not only relevant but also diverse, preventing redundancy and improving users’ understanding of the subject matter.
Beyond semantic search, there are other retrieval methods. TF‑IDF (Term Frequency‑Inverse Document Frequency) is a statistical measure of a word’s importance in a collection: it accounts for term frequency in a document and rarity across the corpus; high values indicate good descriptors and work well for exact‑match search. SVM (Support Vector Machine) can be used for document classification and indirectly improve retrieval by filtering or ranking documents by predefined categories.
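As a hedged illustration of the TF-IDF approach, the sketch below ranks a small corpus with scikit-learn's TfidfVectorizer and cosine similarity; the corpus and query are placeholders invented for this example.
# Small TF-IDF retrieval sketch using scikit-learn (illustrative corpus and query)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
corpus = [
    "Linear regression is a basic statistical method.",
    "Support vector machines separate classes with a maximal margin.",
    "TF-IDF weighs terms by frequency and rarity across documents.",
]
query = "how does tf-idf weigh terms?"
vectorizer = TfidfVectorizer()
document_matrix = vectorizer.fit_transform(corpus)            # term weights per document
query_vector = vectorizer.transform([query])                  # same vocabulary as the corpus
scores = cosine_similarity(query_vector, document_matrix)[0]  # one score per document
# Rank documents by descending TF-IDF similarity to the query
for index in scores.argsort()[::-1]:
    print(f"{scores[index]:.3f}  {corpus[index]}")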
Useful Links
- LangChain Self‑Query Retriever: https://python.langchain.com/docs/modules/data_connection/retrievers/self_query
- LangChain Maximal Marginal Relevance Retriever: https://python.langchain.com/docs/modules/data_connection/retrievers/how_to/mmr
- TF‑IDF Explained: https://www.youtube.com/watch?v=BtWcKEmM0g4
- SVM Explained: https://www.youtube.com/watch?v=efR1C6CvhmE
Theory Questions
- What advantages does Maximum Marginal Relevance (MMR) offer over standard similarity search for document retrieval?
- How do metadata improve precision and relevance in semantic search?
- Describe how the Self‑Query Retriever works and its key advantage.
- When is it sensible to use TF‑IDF and SVM in information retrieval, and how do they differ from embedding‑based methods?
Practical Tasks
- Implement MMR with different parameters: experiment with `k` and `fetch_k`, and analyze how they affect diversity and relevance.
- Extend metadata: add new types (e.g., author, publication date, keywords) and use them for filtered searches.
- Integrate Self‑Query Retriever: expand metadata attribute descriptions to include the new fields and verify it can automatically form complex, constrained queries.
- Compare methods: implement a simple TF‑IDF‑ or SVM‑based search over your collection and compare against semantic search, noting strengths and weaknesses in different scenarios.
Best Practices
Beyond using various retrieval techniques effectively, follow best practices to ensure maximum performance and reliability.
Choosing the Right Strategy
Choosing between MMR, Self‑Query Retriever, or plain similarity search depends on application requirements. When you need diverse results, MMR is optimal. If user queries contain explicit metadata, Self‑Query Retriever simplifies the process. Standard similarity search fits simpler queries.
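As a rough sketch of how this choice might be wired up using the objects built earlier in the chapter, the helper below returns a differently configured retriever per need; the `strategy` labels are hypothetical names introduced only for this example.
# Hypothetical helper that maps an application need to one of the retrievers built above
def build_retriever(strategy: str):
    if strategy == "diverse":
        # MMR retriever for broader, less redundant result sets
        return vector_database.as_retriever(search_type="mmr", search_kwargs={"k": 3})
    if strategy == "metadata":
        # Self-query retriever when user queries carry explicit metadata constraints
        return self_query_retriever
    # Plain similarity search for simpler queries
    return vector_database.as_retriever(search_kwargs={"k": 3})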
Performance Optimization
Vector‑database performance, especially at large scale, is crucial. Regular indexing, caching popular queries, and hardware optimization can significantly speed up retrieval. Distributed vector databases can also help with scaling.
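Caching popular queries can be as simple as an in-memory lookup placed in front of the retriever; the sketch below is an illustrative pattern, not a built-in LangChain feature.
# Illustrative in-memory cache in front of a retriever (not a built-in LangChain feature)
query_cache = {}
def cached_retrieve(retriever, query: str, k: int = 3):
    # Reuse previously computed results for repeated queries
    cache_key = (query.strip().lower(), k)
    if cache_key not in query_cache:
        query_cache[cache_key] = retriever.get_relevant_documents(query)[:k]
    return query_cache[cache_key]
# Example: cached_retrieve(vector_database.as_retriever(), "what insights are available on data analysis tools?")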
Metadata Management
Well‑structured and accurate metadata significantly improve search quality. Establish a thoughtful metadata schema and apply it consistently across the collection. Auto‑generating metadata with an LLM can help, but requires careful validation.
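Below is a hedged sketch of LLM-assisted metadata generation using the `client` initialized at the start of the chapter; the field names are assumptions, and the output should be validated before it is attached to documents.
# Illustrative LLM-assisted metadata generation; validate the result before storing it
import json
def generate_metadata(text: str) -> dict:
    # Ask the chat model for a small, assumed set of metadata fields as JSON
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": "Return a JSON object with keys 'title', 'topic', and 'keywords' describing this text:\n" + text,
        }],
    )
    return json.loads(response.choices[0].message.content)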
Monitoring and Iteration
Retrieval systems require continuous monitoring of performance and result quality. Collect user feedback, analyze relevance metrics, and A/B test retrieval strategies to iteratively improve the system.
Conclusion
This chapter surveyed advanced retrieval techniques designed to improve semantic‑search systems. By addressing limitations around diversity, specificity, and relevance, these methods provide a path toward more intelligent and effective retrieval. Through practical application of MMR, self‑query retrieval, contextual compression, and alternative document‑retrieval methods, developers can build systems that not only understand the semantic content of queries but also deliver rich, diverse, and targeted answers.
Following best practices ensures retrieval systems are both efficient and effective. As NLP continues to evolve, staying up‑to‑date with advances in retrieval technologies will be key to maintaining an edge in semantic‑search capabilities.
In sum, integrating advanced retrieval techniques into semantic‑search systems represents a significant step forward. With careful selection and optimization, developers can build solutions that substantially enhance user experience by delivering accurate, diverse, and contextually relevant information in response to complex queries.
Additional Theory Questions
- Explain the principle of Maximum Marginal Relevance (MMR) and its role in improving retrieval quality.
- How does self‑query retrieval handle queries that combine semantic content and metadata?
- Explain contextual compression in document retrieval and why it matters.
- List the environment‑setup steps for advanced retrieval using the OpenAI API and LangChain.
- How does initializing a vector database enable efficient semantic similarity search?
- Describe how to populate and use a vector database for similarity and diversified (MMR) search.
- In advanced document retrieval, what advantages does MMR bring for ensuring diversity?
- How can metadata be leveraged to increase specificity in document‑retrieval systems?
- Discuss the benefits and challenges of self‑query retrievers in semantic search.
- What role does contextual compression play in reducing compute load and improving answer quality?
- What best practices matter most when implementing advanced retrieval in semantic‑search systems?
- Compare the effectiveness of vector‑based retrieval with TF‑IDF and SVM in document retrieval.
- How does integrating advanced retrieval techniques improve performance and UX in semantic‑search systems?
- What impact might future NLP advances have on advanced retrieval for semantic search?
Additional Practical Tasks
- Implement a Python class `VectorDatabase` with methods:
  - `__init__(self, persist_directory: str)`: initialize the vector DB and its persistence directory.
  - `add_text(self, text: str)`: embed text into a high-dimensional vector using OpenAI embeddings and store it. Assume a function `openai_embedding(text: str) -> List[float]` that returns the embedding vector.
  - `similarity_search(self, query: str, k: int) -> List[str]`: perform similarity search and return the top `k` most similar texts. Use a simplified similarity function.
- Write a function `compress_document` that takes a list of strings (a document) and a query string, and returns a list of strings where each element is a compressed segment of the document relevant to the query. Assume an external utility `compress_segment(segment: str, query: str) -> str` that compresses individual segments for the query.
- Implement `max_marginal_relevance` that takes a list of document IDs, a query, and two parameters `lambda` and `k`, and returns a list of `k` IDs selected by the Maximum Marginal Relevance criterion. Assume a similarity function `similarity(doc_id: str, query: str) -> float` and a diversity function `diversity(doc_id1: str, doc_id2: str) -> float`.
- Write `initialize_vector_db` to demonstrate how to populate a vector DB with a list of predefined texts, then run similarity and diversified searches, printing both sets of results. Use the `VectorDatabase` class from task 1 as the backing store.