3.3 AI Quiz Generation Mechanism
This chapter assembles a working AI‑powered quiz generator end to end: we set up the environment and access to external services, prepare a compact dataset of subjects/categories/facts, design a prompt so questions strictly match the chosen category, and wire it all into a LangChain pipeline. We start with environment setup and keys; to keep output clean you can suppress non‑essential warnings.
# Use the warnings library to control warning messages
import warnings
# Ignore all warnings to ensure clean runtime output
warnings.filterwarnings('ignore')
# Load API keys for third‑party services used in the project
from utils import get_circle_ci_api_key, get_github_api_key, get_openai_api_key
# Obtain individual API keys for CircleCI, GitHub, and OpenAI
circle_ci_api_key = get_circle_ci_api_key()
github_api_key = get_github_api_key()
openai_api_key = get_openai_api_key()
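Depending on how the utils helpers are implemented, the OpenAI key may already be exported as an environment variable. If it is not, a minimal sketch for making it visible to LangChain's OpenAI client (assuming get_openai_api_key returns the raw key string) looks like this:
# Export the key so libraries that read OPENAI_API_KEY from the environment can find it
import os
os.environ["OPENAI_API_KEY"] = openai_api_key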
Next, we form the app’s backbone — a compact dataset from which questions will be composed: we fix subjects, categories, and facts that quizzes will be built from.
# Define a template for structuring quiz questions
quiz_question_template = "{question}"
# Initialize a quiz bank with subjects, categories, and facts
quiz_bank = """
Here are three new quiz questions following the given format:
1. Subject: A Historical Conflict
Categories: History, Politics
Facts:
- Began in 1914 and ended in 1918
- Involved two major alliances: the Allies and the Central Powers
- Known for extensive trench warfare on the Western Front
2. Subject: A Revolutionary Communication Technology
Categories: Technology, History
Facts:
- Invented by Alexander Graham Bell in 1876
- Revolutionized long‑distance communication
- The first words transmitted were "Mr. Watson, come here, I want to see you"
3. Subject: An Iconic American Landmark
Categories: Geography, History
Facts:
- Gifted to the United States by France in 1886
- Symbolizes freedom and democracy
- Located on Liberty Island in New York Harbor
"""
To ensure questions are relevant to the user’s selected category, we design a detailed prompt template: from category selection via the quiz bank to formulating questions in the prescribed format.
# Define a delimiter to separate different parts of the quiz prompt
section_delimiter = "####"
# Create a detailed prompt template guiding the AI to generate user‑customized quizzes
quiz_generation_prompt_template = f"""
Instructions for generating a customized quiz:
Each question is separated by four hashes, i.e. {section_delimiter}
The user chooses a category for the quiz. Ensure the questions are relevant to the chosen category.
Step 1:{section_delimiter} Identify the user‑selected category from the list below:
* Culture
* Science
* Art
Step 2:{section_delimiter} Choose up to two subjects that match the selected category from the quiz bank:
{quiz_bank}
Step 3:{section_delimiter} Create a quiz based on the selected subjects by formulating three questions per subject.
Quiz format:
Question 1:{section_delimiter} <Insert Question 1>
Question 2:{section_delimiter} <Insert Question 2>
Question 3:{section_delimiter} <Insert Question 3>
"""
With this template, we move to LangChain: form a ChatPrompt, select a model, and a parser to normalize the response into a readable form.
# Import required components from LangChain for prompt structuring and LLM interaction
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
# Convert the detailed quiz generation prompt into a structured format for the LLM
structured_chat_prompt = ChatPromptTemplate.from_messages([("user", quiz_generation_prompt_template)])
# Select the language model for quiz question generation
language_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Configure an output parser to convert the LLM response into a readable format
response_parser = StrOutputParser()
Now we connect everything using the LangChain Expression Language into a single pipeline for reproducible generation.
# Compose the structured prompt, language model, and output parser into a quiz generation pipeline
quiz_generation_pipeline = structured_chat_prompt | language_model | response_parser
# Execute the pipeline to generate a quiz (see the example invocation below)
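A minimal sketch of invoking the chain, assuming the prompt built above: the f-string already resolved the delimiter and quiz bank, so the prompt has no remaining template variables and an empty input dictionary is enough.
# Run the quiz generation pipeline and print the parsed string output
generated_quiz = quiz_generation_pipeline.invoke({})
print(generated_quiz)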
Next, encapsulate the setup and execution of quiz generation into a single reusable function. This increases modularity and simplifies maintenance: generate_quiz_assistant_pipeline bundles prompt creation, model selection, and parsing into one workflow, and it is flexible enough to plug in different templates and configurations (models/parsers). Function definition:
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
def generate_quiz_assistant_pipeline(
system_prompt_message,
user_question_template="{question}",
selected_language_model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
response_format_parser=StrOutputParser()):
"""
Assembles the components required to generate quizzes through an AI‑based process.
Parameters:
- system_prompt_message: A message containing instructions or context for quiz generation.
- user_question_template: A template for structuring user questions; defaults to a simple placeholder.
- selected_language_model: The AI model used to generate content; a default model is provided.
- response_format_parser: A mechanism for parsing the LLM response into the desired format.
Returns:
A LangChain pipeline that, when invoked, generates a quiz based on the provided system message and user template.
"""
# Create a structured chat prompt from the system and user messages
structured_chat_prompt = ChatPromptTemplate.from_messages([
("system", system_prompt_message),
("user", user_question_template),
])
# Compose the chat prompt, language model, and output parser into a single pipeline
quiz_generation_pipeline = structured_chat_prompt | selected_language_model | response_format_parser
return quiz_generation_pipeline
Practical usage. The function hides the complexity of composing components: simply call generate_quiz_assistant_pipeline with the required arguments to generate topic/category quizzes and integrate it easily into larger systems (a short usage sketch follows the tips below). A few practical tips:
- Configuration: use the parameters to flexibly tune the process.
- Model choice: experiment with models for quality/creativity trade-offs.
- Prompt design: plan user_question_template and system_prompt_message thoughtfully.
- Error handling: account for API limits and unexpected responses.
Including this function in your project simplifies creating AI‑driven quizzes, enabling innovative educational tools and interactive content.
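For example, a minimal usage sketch, assuming the quiz_generation_prompt_template defined earlier (the request wording here is illustrative):
# Build the pipeline with the detailed quiz prompt as the system message
quiz_pipeline = generate_quiz_assistant_pipeline(quiz_generation_prompt_template)
# Request a quiz for a specific category and print the parsed result
print(quiz_pipeline.invoke({"question": "Generate a quiz about science."}))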
To add quality checks, introduce evaluate_quiz_content: it verifies that the generated quiz contains the expected topic keywords, which is essential for relevance and correctness in learning scenarios. The function integrates with the generation pipeline: it accepts the system message (instructions/context), a specific request (e.g., a topic for the quiz), and a list of expected words/phrases that should appear in the result. Function definition:
def evaluate_quiz_content(
system_prompt_message,
quiz_request_question,
expected_keywords,
user_question_template="{question}",
selected_language_model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
response_format_parser=StrOutputParser()):
"""
Evaluates the generated quiz content to ensure it includes expected keywords or phrases.
Parameters:
- system_prompt_message: Instructions or context for quiz generation.
- quiz_request_question: The specific question or request that triggers quiz generation.
- expected_keywords: A list of words or phrases that must be present in the quiz content.
- user_question_template: A template for structuring user questions; defaults to a simple placeholder.
- selected_language_model: The AI model used to generate content; a default model is provided.
- response_format_parser: A mechanism for parsing the LLM response into the desired format.
Raises:
- AssertionError: If none of the expected keywords are found in the generated quiz content.
"""
# Use the helper to generate quiz content based on the provided request
generated_content = generate_quiz_assistant_pipeline(
system_prompt_message,
user_question_template,
selected_language_model,
response_format_parser).invoke({"question": quiz_request_question})
print(generated_content)
# Verify that the generated content includes at least one of the expected keywords
assert any(keyword.lower() in generated_content.lower() for keyword in expected_keywords), \
f"Expected the generated quiz to contain one of '{expected_keywords}', but none were found."
Consider an example: generate and evaluate a science quiz.
# Define the system message (or prompt template), the specific request, and the expected keywords
system_prompt_message = quiz_generation_prompt_template # Assumes this variable was defined earlier in your code
quiz_request_question = "Generate a quiz about science."
expected_keywords = ["renaissance innovator", "astronomical observation tools", "natural sciences"]
# Call the evaluation function with the test parameters
evaluate_quiz_content(
system_prompt_message,
quiz_request_question,
expected_keywords
)
This example shows how evaluate_quiz_content can confirm that a science quiz includes relevant themes (figures, instruments, concepts). Good practices:
- Keyword selection — make them specific enough but leave room for variation.
- Broad checks — use multiple keyword sets for different topics.
- Iterative approach — refine template/parameters/dataset based on evaluation results.
Structured testing helps maintain quality and uncover opportunities to improve relevance and engagement.
To handle out-of-scope requests, introduce evaluate_request_refusal, which tests proper refusal in inappropriate scenarios. This matters for trust and user experience (UX): the function simulates cases where the system should refuse (based on relevance/constraints) and verifies that the expected refusal message is returned. Function definition:
def evaluate_request_refusal(
system_prompt_message,
invalid_quiz_request_question,
expected_refusal_response,
user_question_template="{question}",
selected_language_model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
response_format_parser=StrOutputParser()):
"""
Evaluates the system’s response to ensure it correctly refuses invalid or out‑of‑scope requests.
Parameters:
- system_prompt_message: Instructions or context for quiz generation.
- invalid_quiz_request_question: A request that the system should decline.
- expected_refusal_response: The expected text indicating the system’s refusal to fulfill the request.
- user_question_template: A template for structuring user questions; defaults to a simple placeholder.
- selected_language_model: The AI model used to generate content; a default model is provided.
- response_format_parser: A mechanism for parsing the LLM response into the desired format.
Raises:
- AssertionError: If the system’s response does not contain the expected refusal message.
"""
# Align parameter order with what `generate_quiz_assistant_pipeline` expects
generated_response = generate_quiz_assistant_pipeline(
system_prompt_message,
user_question_template,
selected_language_model,
response_format_parser).invoke({"question": invalid_quiz_request_question})
print(generated_response)
# Check that the system’s response contains the expected refusal phrase
assert expected_refusal_response.lower() in generated_response.lower(), \
f"Expected a refusal message '{expected_refusal_response}', but got: {generated_response}"
To illustrate evaluate_request_refusal, consider a scenario where the quiz generator should refuse to create a quiz because the request is outside its scope or unsupported by the current configuration.
# Define the system message (or prompt template), an out‑of‑scope request, and the expected refusal message
system_prompt_message = quiz_generation_prompt_template # Assumes this variable was defined earlier in your code
invalid_quiz_request_question = "Generate a quiz about Rome."
expected_refusal_response = "I'm sorry, but I can't generate a quiz about Rome at this time."
# Run the refusal evaluation with the specified parameters
evaluate_request_refusal(
system_prompt_message,
invalid_quiz_request_question,
expected_refusal_response
)
This example demonstrates how to test the quiz generator’s response to a request that should be declined: by checking for the expected refusal message, we ensure the system behaves correctly when facing requests it cannot fulfill. Tips and suggestions:
- Clear refusal messages: make them informative so users understand why the request cannot be completed.
- Comprehensive testing: use diverse scenarios, including unsupported topics or formats, to thoroughly evaluate refusal logic (see the sketch after this list).
- Refinement and feedback: iterate on refusal logic and messaging to improve user understanding and satisfaction.
- Consider UX: where possible, offer alternatives or suggestions to maintain a positive interaction.
Implementing and testing refusal scenarios ensures the quiz generator can reliably handle a wide range of requests, maintaining robustness and user trust even when it cannot provide the requested content.
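Following the comprehensive-testing tip, a minimal sketch that loops over several out-of-scope requests; the request texts and the shared refusal fragment are assumptions and should be aligned with how your prompt actually instructs the model to refuse:
# Illustrative out-of-scope requests; adjust to the categories your prompt supports
out_of_scope_requests = [
    "Generate a quiz about Rome.",
    "Generate a quiz about cooking.",
]
for request_text in out_of_scope_requests:
    # Check each request against a generic refusal fragment (an assumption)
    evaluate_request_refusal(
        quiz_generation_prompt_template,
        request_text,
        "I'm sorry",
    )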
To adapt the provided template to a practical test scenario focused on a science-themed quiz, we add a test_science_quiz function. It evaluates whether the AI-generated quiz questions truly center on expected scientific topics or subjects: by calling evaluate_quiz_content with science-specific keywords, we check that the generated content aligns with the expected scientific themes. Function definition for testing a science quiz:
def test_science_quiz():
"""
Tests the quiz generator’s ability to create science‑related questions by checking for expected subjects.
"""
# Define the request to generate a quiz question
question_request = "Generate a quiz question."
# The list of expected keywords or subjects indicating scientific alignment
expected_science_subjects = ["physics", "chemistry", "biology", "astronomy"]
# The system message or prompt template configured for quiz generation
system_prompt_message = quiz_generation_prompt_template # This should be defined earlier in your code
# Invoke the evaluation with science‑specific parameters
evaluate_quiz_content(
system_prompt_message=system_prompt_message,
quiz_request_question=question_request,
expected_keywords=expected_science_subjects
)
This function encapsulates the validation logic: for a science request, the content must contain the expected science themes/keywords. Calling test_science_quiz simulates the request and checks for scientific themes, a key indicator of correct generation. Refine the keyword list for your domain and coverage, expand tests to other categories (history/geography/art), as sketched below, and analyze failures: compare expectations with results to improve the prompt logic and dataset.
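As one way to expand coverage, here is a hedged sketch of a history-oriented test modeled on test_science_quiz; the request wording and keyword list are illustrative (drawn from the quiz bank facts above) and should be aligned with the categories your prompt template actually supports:
def test_history_quiz():
    """
    Tests that a history-oriented request yields questions touching on expected historical subjects.
    """
    # Request and keywords are illustrative; tune them to your quiz bank and prompt categories
    question_request = "Generate a quiz about history."
    expected_history_subjects = ["1914", "trench warfare", "Central Powers", "Liberty Island"]
    evaluate_quiz_content(
        system_prompt_message=quiz_generation_prompt_template,
        quiz_request_question=question_request,
        expected_keywords=expected_history_subjects,
    )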
Lastly, a quick look at CI/CD: the .circleci/config.yml file in the repository root describes a YAML-based pipeline (build/test/deploy). Below is a sketch for a Python project with automated tests:
version: 2.1
orbs:
python: circleci/python@1.2.0 # Use the Python orb to simplify your config
jobs:
build-and-test:
docker:
- image: cimg/python:3.8 # Specify the Python version
steps:
- checkout # Check out the source code
- restore_cache: # Restore cache to save time on dependencies installation
keys:
- v1-dependencies-{{ checksum "requirements.txt" }}
- v1-dependencies-
- run:
name: Install Dependencies
command: pip install -r requirements.txt
- save_cache: # Cache dependencies to speed up future builds
paths:
- ./venv
key: v1-dependencies-{{ checksum "requirements.txt" }}
- run:
name: Run Tests
command: pytest # Or any other command to run your tests
workflows:
version: 2
build_and_test:
jobs:
- build-and-test
Key elements of the configuration:
- version: the config version (commonly 2.1).
- orbs: reusable configuration blocks; here the python orb helps with environment setup.
- jobs: the set of tasks; here a single build-and-test job.
- docker: the image the job runs in (e.g., cimg/python:3.8).
- steps: the sequence of actions (checkout, cache, dependency installation, tests).
- workflows: ties jobs into a process and triggers them according to your rules.
To customize the config: pick your Python version under docker, replace pytest with your own test command, and add extra steps (database setup, environment variables, etc.) as additional - run: blocks. After you commit .circleci/config.yml, CircleCI detects the configuration and runs the pipeline on each commit according to your rules.
Theoretical Questions
- What components are necessary to set up the environment for an AI‑based quiz generator?
- How do you structure a dataset for generating quiz questions? Include examples of categories and facts.
- How does prompt engineering influence customized quiz generation? Provide a sample prompt template.
- Explain LangChain’s role in structuring prompts for LLM processing.
- What constitutes the quiz generation pipeline when using the LangChain Expression Language?
- How can functions for evaluation ensure the relevance and accuracy of generated quiz content?
- Describe a method for testing the system’s ability to refuse quiz generation under certain conditions.
- How can you test LLM‑generated quiz questions for alignment with expected science topics or subjects?
- Describe the key components of a CircleCI configuration file for a Python project, including automated test execution.
- Discuss the importance of customizing the CircleCI config to match a project’s specific needs.
Practical Assignments
- Create a quiz dataset: Define a Python dictionary named quiz_bank representing a collection of quiz entries, each containing subjects, categories, and facts similar to the example. Ensure your dictionary supports easy access to subjects, categories, and facts.
- Generate quiz questions using prompts: Implement a function generate_quiz_questions(category) that accepts a category (e.g., "History", "Technology") as input and returns a list of generated quiz questions based on subjects and facts from quiz_bank. Use string operations or templates to construct the questions.
- Implement LangChain-style prompt structuring: Simulate using LangChain's capabilities by writing a function structure_quiz_prompt(quiz_questions) that accepts a list of quiz questions and returns a structured chat prompt in a format similar to the one described, without actually integrating LangChain.
- Quiz generation pipeline: Create a Python function generate_quiz_pipeline() that simulates creating and running a quiz generation pipeline using placeholders for LangChain components. The function should print a message emulating pipeline execution.
- Reusable quiz generation function: Implement a Python function generate_quiz_assistant_pipeline(system_prompt_message, user_question_template="{question}") that simulates assembling the components needed for quiz generation. Use string formatting to construct the detailed prompt from the inputs.
- Evaluate generated quiz content: Write a function evaluate_quiz_content(generated_content, expected_keywords) that accepts generated quiz content and a list of expected keywords, and checks whether the content contains any of the keywords. Raise an assertion error with a custom message if none are found.
- Handle invalid quiz requests: Develop a function evaluate_request_refusal(invalid_request, expected_response) that simulates evaluating the system's response to an invalid quiz request. The function should verify whether the refusal text matches the expected refusal response.
- Science quiz evaluation test: Develop a Python function test_science_quiz() that uses the evaluate_quiz_content function to test whether a generated science quiz includes questions related to expected scientific topics, such as "physics" or "chemistry".