3.3 AI Quiz Generation Mechanism

This section assembles a working AI-powered quiz generator end to end: we set up the environment and access to external services, prepare a compact dataset of subjects, categories, and facts, design a prompt so that questions strictly match the chosen category, and wire everything into a LangChain pipeline. We start with environment setup and API keys; to keep the output clean, non-essential warnings can be suppressed.

# Use the warnings library to control warning messages
import warnings

# Ignore all warnings to ensure clean runtime output
warnings.filterwarnings('ignore')

# Load API keys for third‑party services used in the project
from utils import get_circle_ci_api_key, get_github_api_key, get_openai_api_key

# Obtain individual API keys for CircleCI, GitHub, and OpenAI
circle_ci_api_key = get_circle_ci_api_key()
github_api_key = get_github_api_key()
openai_api_key = get_openai_api_key()
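
The utils module above ships with the course materials. If it is not available in your environment, a minimal stand-in could read the keys from environment variables; the sketch below is an assumption (OPENAI_API_KEY is the variable LangChain and the OpenAI SDK look for, while GITHUB_API_KEY and CIRCLE_CI_API_KEY are illustrative names), not the course's actual helper.

# A hypothetical stand-in for utils: read each key from an environment variable
import os

def get_openai_api_key():
    return os.environ["OPENAI_API_KEY"]

def get_github_api_key():
    return os.environ["GITHUB_API_KEY"]  # illustrative variable name

def get_circle_ci_api_key():
    return os.environ["CIRCLE_CI_API_KEY"]  # illustrative variable name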

Next, we form the app's backbone: a compact dataset of subjects, categories, and facts from which the quizzes will be built.

# Define a template for structuring quiz questions
quiz_question_template = "{question}"

# Initialize a quiz bank with subjects, categories, and facts
quiz_bank = """
Here are three quiz subjects, each with categories and facts:

1. Subject: A Historical Conflict  
   Categories: History, Politics  
   Facts:  
   - Began in 1914 and ended in 1918  
   - Involved two major alliances: the Allies and the Central Powers  
   - Known for extensive trench warfare on the Western Front  

2. Subject: A Revolutionary Communication Technology  
   Categories: Technology, History  
   Facts:  
   - Invented by Alexander Graham Bell in 1876  
   - Revolutionized long‑distance communication  
   - The first words transmitted were "Mr. Watson, come here, I want to see you"  

3. Subject: An Iconic American Landmark  
   Categories: Geography, History  
   Facts:  
   - Gifted to the United States by France in 1886  
   - Symbolizes freedom and democracy  
   - Located on Liberty Island in New York Harbor  
"""

To ensure that questions are relevant to the user's selected category, we design a detailed prompt template that walks the model through the whole process: identifying the selected category, choosing matching subjects from the quiz bank, and formulating questions in the prescribed format.

# Define a delimiter to separate different parts of the quiz prompt
section_delimiter = "####"

# Create a detailed prompt template guiding the AI to generate user‑customized quizzes
quiz_generation_prompt_template = f"""
Instructions for generating a customized quiz:
Each question is separated by four hashes, i.e. {section_delimiter}

The user chooses a category for the quiz. Ensure the questions are relevant to the chosen category.

Step 1:{section_delimiter} Identify the user‑selected category from the list below:
* Culture
* Science
* Art

Step 2:{section_delimiter} Choose up to two subjects that match the selected category from the quiz bank:

{quiz_bank}

Step 3:{section_delimiter} Create a quiz based on the selected subjects by formulating three questions per subject.

Quiz format:
Question 1:{section_delimiter} <Insert Question 1>
Question 2:{section_delimiter} <Insert Question 2>
Question 3:{section_delimiter} <Insert Question 3>
"""

With this template in place, we move to LangChain: we build a ChatPromptTemplate, select a model, and add an output parser that normalizes the response into readable text.

# Import required components from LangChain for prompt structuring and LLM interaction
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

# Convert the detailed quiz generation prompt into a structured format for the LLM
structured_chat_prompt = ChatPromptTemplate.from_messages([("user", quiz_generation_prompt_template)])

# Select the language model for quiz question generation
language_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Configure an output parser to convert the LLM response into a readable format
response_parser = StrOutputParser()

Now we connect everything into a single pipeline using the LangChain Expression Language (LCEL) for reproducible generation.

# Compose the structured prompt, language model, and output parser into a quiz generation pipeline
quiz_generation_pipeline = structured_chat_prompt | language_model | response_parser

# Execute the pipeline to generate a quiz; the prompt defines no input variables, so an empty dict is passed
generated_quiz = quiz_generation_pipeline.invoke({})
print(generated_quiz)
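
Because every question line carries the #### delimiter, the raw response is easy to post-process. A small sketch, assuming generated_quiz holds the pipeline output and the model followed the requested format:

# Keep only the lines that contain the delimiter; in the requested format these are the question lines
question_lines = [line for line in generated_quiz.splitlines() if section_delimiter in line]
for line in question_lines:
    print(line)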

Next, we encapsulate the setup and execution of quiz generation in a single reusable function. This improves modularity and simplifies maintenance: generate_quiz_assistant_pipeline bundles prompt creation, model selection, and response parsing into one workflow.

The function is flexible: different prompt templates, models, and parsers can be plugged in through its parameters. Function definition:

from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

def generate_quiz_assistant_pipeline(
    system_prompt_message,
    user_question_template="{question}",
    selected_language_model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    response_format_parser=StrOutputParser()):
    """
    Assembles the components required to generate quizzes through an AI‑based process.

    Parameters:
    - system_prompt_message: A message containing instructions or context for quiz generation.
    - user_question_template: A template for structuring user questions; defaults to a simple placeholder.
    - selected_language_model: The AI model used to generate content; a default model is provided.
    - response_format_parser: A mechanism for parsing the LLM response into the desired format.

    Returns:
    A LangChain pipeline that, when invoked, generates a quiz based on the provided system message and user template.
    """

    # Create a structured chat prompt from the system and user messages
    structured_chat_prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt_message),
        ("user", user_question_template),
    ])

    # Compose the chat prompt, language model, and output parser into a single pipeline
    quiz_generation_pipeline = structured_chat_prompt | selected_language_model | response_format_parser

    return quiz_generation_pipeline

Practical usage. The function hides the complexity of composing the components: simply call generate_quiz_assistant_pipeline with the required arguments to generate quizzes for a topic or category and integrate it into larger systems; a short usage sketch follows the tips below. A few practical tips:

  • Configuration: use parameters to flexibly tune the process.
  • Model choice: experiment with models for quality/creativity trade‑offs.
  • Prompt design: plan user_question_template and system_prompt_message thoughtfully.

  • Error handling: account for API limits and unexpected responses.
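
As promised above, a minimal usage sketch, assuming the quiz_generation_prompt_template defined earlier in this section is used as the system message:

# Build the pipeline once and reuse it for different requests
quiz_pipeline = generate_quiz_assistant_pipeline(quiz_generation_prompt_template)

# Ask for a quiz in one of the supported categories
print(quiz_pipeline.invoke({"question": "Generate a quiz about science."}))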

Including this function in your project simplifies creating AI‑driven quizzes, enabling innovative educational tools and interactive content.

To add quality checks, introduce evaluate_quiz_content: it verifies that the generated quiz contains the expected topic keywords — essential for relevance and correctness in learning scenarios.

The function integrates with the generation pipeline: it accepts the system message (instructions or context), a specific request (for example, a quiz topic), and a list of expected words or phrases that should appear in the result. Function definition:

def evaluate_quiz_content(
    system_prompt_message,
    quiz_request_question,
    expected_keywords,
    user_question_template="{question}",
    selected_language_model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    response_format_parser=StrOutputParser()):
    """
    Evaluates the generated quiz content to ensure it includes expected keywords or phrases.

    Parameters:
    - system_prompt_message: Instructions or context for quiz generation.
    - quiz_request_question: The specific question or request that triggers quiz generation.
    - expected_keywords: A list of words or phrases that must be present in the quiz content.
    - user_question_template: A template for structuring user questions; defaults to a simple placeholder.
    - selected_language_model: The AI model used to generate content; a default model is provided.
    - response_format_parser: A mechanism for parsing the LLM response into the desired format.

    Raises:
    - AssertionError: If none of the expected keywords are found in the generated quiz content.
    """

    # Use the helper to generate quiz content based on the provided request
    generated_content = generate_quiz_assistant_pipeline(
        system_prompt_message,
        user_question_template,
        selected_language_model,
        response_format_parser).invoke({"question": quiz_request_question})

    print(generated_content)

    # Verify that the generated content includes at least one of the expected keywords
    assert any(keyword.lower() in generated_content.lower() for keyword in expected_keywords), \
        f"Expected the generated quiz to contain one of '{expected_keywords}', but none were found."

Consider an example: generate and evaluate a science quiz.

# Define the system message (or prompt template), the specific request, and the expected keywords
system_prompt_message = quiz_generation_prompt_template  # Assumes this variable was defined earlier in your code
quiz_request_question = "Generate a quiz about science."
expected_keywords = ["renaissance innovator", "astronomical observation tools", "natural sciences"]

# Call the evaluation function with the test parameters
evaluate_quiz_content(
    system_prompt_message,
    quiz_request_question,
    expected_keywords
)

This example shows how evaluate_quiz_content can confirm that a science quiz includes relevant themes (figures, instruments, concepts). Good practices:

  • Keyword selection — make them specific enough but leave room for variation.
  • Broad checks — use multiple keyword sets for different topics.
  • Iterative approach — refine template/parameters/dataset based on evaluation results.

Structured testing helps maintain quality and uncover opportunities to improve relevance and engagement.

To handle out-of-scope requests, we introduce evaluate_request_refusal, which tests that the system refuses properly in inappropriate scenarios. This matters for trust and user experience: the function simulates cases where the system should refuse (because the request falls outside its relevance or constraints) and verifies that the expected refusal message is returned. Function definition:

def evaluate_request_refusal(
    system_prompt_message,
    invalid_quiz_request_question,
    expected_refusal_response,
    user_question_template="{question}",
    selected_language_model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    response_format_parser=StrOutputParser()):
    """
    Evaluates the system’s response to ensure it correctly refuses invalid or out‑of‑scope requests.

    Parameters:
    - system_prompt_message: Instructions or context for quiz generation.
    - invalid_quiz_request_question: A request that the system should decline.
    - expected_refusal_response: The expected text indicating the system’s refusal to fulfill the request.
    - user_question_template: A template for structuring user questions; defaults to a simple placeholder.
    - selected_language_model: The AI model used to generate content; a default model is provided.
    - response_format_parser: A mechanism for parsing the LLM response into the desired format.

    Raises:
    - AssertionError: If the system’s response does not contain the expected refusal message.
    """

    # Use the helper to generate a response for the out-of-scope request, passing arguments in the order the helper expects
    generated_response = generate_quiz_assistant_pipeline(
        system_prompt_message,
        user_question_template,
        selected_language_model,
        response_format_parser).invoke({"question": invalid_quiz_request_question})

    print(generated_response)

    # Check that the system’s response contains the expected refusal phrase
    assert expected_refusal_response.lower() in generated_response.lower(), \
        f"Expected a refusal message '{expected_refusal_response}', but got: {generated_response}"

To illustrate evaluate_request_refusal, consider a scenario where the quiz generator should refuse to create a quiz because the request is outside its scope or unsupported by the current configuration.

# Define the system message (or prompt template), an out‑of‑scope request, and the expected refusal message
system_prompt_message = quiz_generation_prompt_template  # Assumes this variable was defined earlier in your code
invalid_quiz_request_question = "Generate a quiz about Rome."
expected_refusal_response = "I'm sorry, but I can't generate a quiz about Rome at this time."

# Run the refusal evaluation with the specified parameters
evaluate_request_refusal(
    system_prompt_message,
    invalid_quiz_request_question,
    expected_refusal_response
)

This example demonstrates how to test the quiz generator’s response to a request that should be declined: by checking for the expected refusal message, we ensure the system behaves correctly when facing requests it cannot fulfill. Tips and suggestions:

  • Clear refusal messages: make them informative so users understand why the request cannot be completed.
  • Comprehensive testing: use diverse scenarios, including unsupported topics or formats, to thoroughly evaluate refusal logic.
  • Refinement and feedback: iterate on refusal logic and messaging to improve user understanding and satisfaction.
  • Consider UX: where possible, offer alternatives or suggestions to maintain a positive interaction.

Implementing and testing refusal scenarios ensures the quiz generator can reliably handle a wide range of requests, maintaining robustness and user trust even when it cannot provide the requested content.
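
Note that the assertion in evaluate_request_refusal looks for an exact phrase inside the model's reply, so it can only pass if the system prompt actually tells the model how to refuse. The quiz_generation_prompt_template shown earlier contains no such rule, so one would normally be added; the sketch below is an assumption about how that instruction might look, not part of the original template. In practice, keep the expected phrase passed to the test short and stable (for example, just the opening words of the refusal).

# A hypothetical refusal rule appended to the system prompt so that out-of-scope requests
# get a predictable answer the test can match
refusal_instruction = (
    "\nIf the requested topic is not covered by the allowed categories or the quiz bank, "
    "reply with a message that begins: \"I'm sorry, but I can't generate a quiz about\" "
    "followed by the requested topic."
)
system_prompt_with_refusal = quiz_generation_prompt_template + refusal_instruction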

To turn this evaluation into a concrete test case, we add a test_science_quiz function. It checks whether the AI-generated quiz questions really center on the expected scientific topics or subjects: by calling evaluate_quiz_content, it verifies that the quiz includes keywords or themes characteristic of the science category. Function definition for testing a science quiz:

def test_science_quiz():
    """
    Tests the quiz generator’s ability to create science‑related questions by checking for expected subjects.
    """
    # Define the request to generate a quiz question
    question_request = "Generate a quiz question."

    # The list of expected keywords or subjects indicating scientific alignment
    expected_science_subjects = ["physics", "chemistry", "biology", "astronomy"]

    # The system message or prompt template configured for quiz generation
    system_prompt_message = quiz_generation_prompt_template  # This should be defined earlier in your code

    # Invoke the evaluation with science‑specific parameters
    evaluate_quiz_content(
        system_prompt_message=system_prompt_message,
        quiz_request_question=question_request,
        expected_keywords=expected_science_subjects
    )

This function encapsulates the validation logic: for a science request, the generated content must contain the expected science themes or keywords. Calling test_science_quiz simulates the request and checks for those themes, a key indicator of correct generation. Refine the keyword list for your domain and coverage, expand the tests to other categories (history, geography, art), and analyze failures by comparing expectations with results to improve the prompt logic and dataset; a history variant is sketched below.
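
Since the quiz bank above already contains History subjects, a history variant of the test might look like the sketch below. The keyword list is illustrative, and History would also need to appear in the prompt's list of allowed categories for such a request to be in scope.

def test_history_quiz():
    """
    Tests the quiz generator's ability to create history-related questions by checking for expected subjects.
    """
    # Define the request to generate a quiz question
    question_request = "Generate a quiz about history."

    # Keywords drawn from the History facts in the quiz bank (illustrative)
    expected_history_subjects = ["1914", "trench warfare", "Central Powers", "Allies"]

    # Reuse the same system prompt as for the science test
    evaluate_quiz_content(
        system_prompt_message=quiz_generation_prompt_template,
        quiz_request_question=question_request,
        expected_keywords=expected_history_subjects
    )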

Lastly — a quick look at CI/CD: the .circleci/config.yml file in the repository root describes a YAML‑based pipeline (build/test/deploy). Below is a sketch for a Python project with automated tests:

version: 2.1

orbs:
  python: circleci/python@1.2.0  # Use the Python orb to simplify your config

jobs:
  build-and-test:
    docker:
      - image: cimg/python:3.8  # Specify the Python version
    steps:
      - checkout  # Check out the source code
      - restore_cache:  # Restore cache to save time on dependencies installation
          keys:
            - v1-dependencies-{{ checksum "requirements.txt" }}
            - v1-dependencies-
      - run:
          name: Install Dependencies
          # Install into a virtual environment so the cached ./venv path is actually reused
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
      - save_cache:  # Cache dependencies to speed up future builds
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum "requirements.txt" }}
      - run:
          name: Run Tests
          command: |
            . venv/bin/activate
            pytest  # Or any other command to run your tests

workflows:
  version: 2
  build_and_test:
    jobs:
      - build-and-test

Key elements:

  • version: the config schema version (commonly 2.1).
  • orbs: reusable configuration blocks; here the python orb helps with environment setup.
  • jobs: the set of tasks; here a single build-and-test job.
  • docker: the image the job runs in (e.g., cimg/python:3.8).
  • steps: the ordered sequence (checkout, cache, dependencies, tests).
  • workflows: ties jobs into a process and triggers them by rule.

To customize: pick your Python version under docker, replace pytest with your test command, and add extra steps (DB, env vars, etc.) as additional - run: blocks. After committing .circleci/config.yml, CircleCI detects the configuration and will run the pipeline on each commit per your rules.

Theoretical Questions

  1. What components are necessary to set up the environment for an AI‑based quiz generator?
  2. How do you structure a dataset for generating quiz questions? Include examples of categories and facts.
  3. How does prompt engineering influence customized quiz generation? Provide a sample prompt template.
  4. Explain LangChain’s role in structuring prompts for LLM processing.
  5. What constitutes the quiz generation pipeline when using the LangChain Expression Language?
  6. How can functions for evaluation ensure the relevance and accuracy of generated quiz content?
  7. Describe a method for testing the system’s ability to refuse quiz generation under certain conditions.
  8. How can you test LLM‑generated quiz questions for alignment with expected science topics or subjects?
  9. Describe the key components of a CircleCI configuration file for a Python project, including automated test execution.
  10. Discuss the importance of customizing the CircleCI config to match a project’s specific needs.

Practical Assignments

  1. Create a quiz dataset: Define a Python dictionary named quiz_bank representing a collection of quiz entries, each containing subjects, categories, and facts similar to the example. Ensure your dictionary supports easy access to subjects, categories, and facts.

  2. Generate quiz questions using prompts: Implement a function generate_quiz_questions(category) that accepts a category (e.g., "History", "Technology") as input and returns a list of generated quiz questions based on subjects and facts from quiz_bank. Use string operations or templates to construct the questions.

  3. Implement LangChain‑style prompt structuring: Simulate using LangChain’s capabilities by writing a function structure_quiz_prompt(quiz_questions) that accepts a list of quiz questions and returns a structured chat prompt in a format similar to the one described, without actually integrating LangChain.

  4. Quiz generation pipeline: Create a Python function generate_quiz_pipeline() that simulates creating and running a quiz generation pipeline using placeholders for LangChain components. The function should print a message emulating pipeline execution.

  5. Reusable quiz generation function: Implement a Python function generate_quiz_assistant_pipeline(system_prompt_message, user_question_template="{question}") that simulates assembling the components needed for quiz generation. Use string formatting to construct the detailed prompt from inputs.

  6. Evaluate generated quiz content: Write a function evaluate_quiz_content(generated_content, expected_keywords) that accepts generated quiz content and a list of expected keywords, and checks whether the content contains any of the keywords. Raise an assertion error with a custom message if none are found.

  7. Handle invalid quiz requests: Develop a function evaluate_request_refusal(invalid_request, expected_response) that simulates evaluating the system’s response to an invalid quiz request. The function should verify whether the refusal text matches the expected refusal response.

  8. Science quiz evaluation test: Develop a Python function test_science_quiz() that uses the evaluate_quiz_content function to test if a generated science quiz includes questions related to expected scientific topics, such as "physics" or "chemistry".