3.3 Implementing the AI Quiz Generation Mechanism

In this chapter, we build an AI-powered quiz generator, a sample application that demonstrates how to combine third-party APIs, dataset creation, and prompt engineering for AI models. The project illustrates a practical application of AI: generating educational content, specifically quizzes, based on user-selected categories.

Section 1: Preparing the Environment

The first step in building our AI-powered quiz generator is setting up the environment and confirming access to the necessary third-party APIs. This setup also silences warnings that would otherwise clutter our output, keeping the development process readable.

# Import the warnings library to control warning messages
import warnings

# Ignore all warning messages to ensure clean output during execution
warnings.filterwarnings('ignore')

# Load API tokens for third-party services used in the project
from utils import get_circle_ci_api_key, get_github_api_key, get_openai_api_key

# Retrieve individual API keys for CircleCI, GitHub, and OpenAI
circle_ci_api_key = get_circle_ci_api_key()
github_api_key = get_github_api_key()
openai_api_key = get_openai_api_key()
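
The utils module ships with the lesson materials. If you are recreating the project yourself, a minimal version might read the keys from environment variables; the sketch below assumes python-dotenv and the variable names shown, neither of which is confirmed by the lesson.

# A plausible sketch of utils.py, not the lesson's actual implementation.
# Assumes the keys live in a local .env file or in environment variables.
import os
from dotenv import load_dotenv, find_dotenv

def get_openai_api_key():
    load_dotenv(find_dotenv())
    return os.getenv("OPENAI_API_KEY")

def get_github_api_key():
    load_dotenv(find_dotenv())
    return os.getenv("GITHUB_API_KEY")

def get_circle_ci_api_key():
    load_dotenv(find_dotenv())
    return os.getenv("CIRCLE_CI_API_KEY")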

Section 2: Creating the Quiz Dataset

The core of our quiz generator is the dataset from which it will generate questions. This dataset includes subjects from various categories, each with unique facts that can be used to craft quiz questions.

# Define a template for the user's question; the {question} placeholder is filled when the chain is invoked
quiz_question_template = "{question}"

# Initialize the quiz bank with subjects, categories, and facts
quiz_bank = """
1. Subject: Historical Conflict  
   Categories: History, Politics  
   Facts:  
   - Began in 1914 and ended in 1918  
   - Involved two major alliances: the Allies and the Central Powers  
   - Known for the extensive use of trench warfare on the Western Front  

2. Subject: Revolutionary Communication Technology  
   Categories: Technology, Science, History  
   Facts:  
   - Invented by Alexander Graham Bell in 1876  
   - Revolutionized long-distance communication  
   - First words transmitted were "Mr. Watson, come here, I want to see you"  

3. Subject: Iconic American Landmark  
   Categories: Geography, History  
   Facts:  
   - Gifted to the United States by France in 1886  
   - Symbolizes freedom and democracy  
   - Located on Liberty Island in New York Harbor  
"""

Section 3: Engineering the Quiz Generation Prompt

To generate quizzes tailored to the user's preferred category, we craft a detailed prompt template that guides the AI through the process, from category selection to question generation.

# Define a delimiter to separate different parts of the quiz prompt
section_delimiter = "####"

# Craft a detailed prompt template guiding the AI to generate custom quizzes
quiz_generation_prompt_template = f"""
Instructions for Generating a Customized Quiz:
Each question is separated by four hashtags, i.e., {section_delimiter}

The user selects a category for their quiz. Ensure questions are relevant to the chosen category.

Step 1:{section_delimiter} Identify the user's chosen category from the following list:
* Culture
* Science
* Art

Step 2:{section_delimiter} Select up to two subjects that align with the chosen category from the quiz bank:

{quiz_bank}

Step 3:{section_delimiter} Create a quiz based on the selected subjects, formulating three questions per subject.

Quiz Format:
Question 1:{section_delimiter} <Insert Question 1>
Question 2:{section_delimiter} <Insert Question 2>
Question 3:{section_delimiter} <Insert Question 3>
"""

Section 4: Utilizing LangChain to Structure the Prompt

With the prompt template ready, we use LangChain to structure it for processing by an AI model. This includes setting up a chat prompt, choosing the AI model, and parsing its output.

# Import necessary components from LangChain for prompt structuring and model interaction
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

# Convert the detailed quiz generation prompt into a structured format for the AI
structured_chat_prompt = ChatPromptTemplate.from_messages([("user", quiz_generation_prompt_template)])

# Select the language model for generating quiz questions
language_model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Set up an output parser to convert the AI's response into a readable format
response_parser = StrOutputParser()
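
Before wiring the components together, it can help to inspect exactly what LangChain will send to the model. ChatPromptTemplate.format_messages renders the prompt into a list of chat messages; our template has no unfilled variables, so no arguments are needed:

# Render the structured prompt into the message list sent to the model
messages = structured_chat_prompt.format_messages()
print(messages[0].content[:200])  # preview the start of the user message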

Section 5: Executing the Quiz Generation Process

Finally, we combine all the components using the LangChain Expression Language (LCEL) to create a seamless quiz generation pipeline.

# Connect the structured prompt, language model, and output parser to form the quiz generation pipeline
quiz_generation_pipeline = structured_chat_prompt | language_model | response_parser

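Executing the pipeline is then a single call. Below is a minimal example of running it (assuming the OpenAI key loaded earlier is valid); the generated questions will vary from run to run:

# Run the pipeline; the prompt contains no unfilled variables, so we pass an empty dict
generated_quiz = quiz_generation_pipeline.invoke({})
print(generated_quiz)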

Section 6: A Reusable Quiz Generation Function

In this section, we encapsulate the entire process of setting up and executing the AI-powered quiz generation into a single, reusable function. This design pattern promotes modularity and reusability, allowing for easy adjustments and maintenance. The function generate_quiz_assistant_pipeline combines all the necessary components—prompt creation, model selection, and output parsing—into a coherent workflow that can be invoked with customized inputs.

Function Overview

The generate_quiz_assistant_pipeline function is designed to be versatile, accommodating different prompts and configurations for the quiz generation process. Its parameters allow for customization of the user's question template and the selection of specific language models and output parsers.

Function Definition

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

def generate_quiz_assistant_pipeline(
    system_prompt_message,
    user_question_template="{question}",
    selected_language_model=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    response_format_parser=StrOutputParser()):
    """
    Assembles the components required for generating quizzes through an AI-powered process.

    Parameters:
    - system_prompt_message: The message containing instructions or context for the quiz generation.
    - user_question_template: A template for structuring user questions, defaulting to a simple placeholder.
    - selected_language_model: The AI model used for generating content, with a default model specified.
    - response_format_parser: The mechanism for parsing the AI model's response into a desired format.

    Returns:
    A LangChain pipeline that, when executed, generates a quiz based on the provided system message and user template.
    """

    # Create a structured chat prompt from the system and user messages
    structured_chat_prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt_message),
        ("user", user_question_template),
    ])

    # Assemble the chat prompt, language model, and output parser into a single pipeline
    quiz_generation_pipeline = structured_chat_prompt | selected_language_model | response_format_parser

    return quiz_generation_pipeline

Practical Usage

This function abstracts away the complexity of setting up individual components for quiz generation. By calling generate_quiz_assistant_pipeline with appropriate arguments, users can easily generate quizzes on various subjects and categories. This abstraction not only simplifies the process for developers but also enhances the flexibility of the quiz generator, allowing for easy integration into larger systems or applications.
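
As a quick illustration, here is a sketch of calling the function with the prompt template defined earlier (the default model and parser are used):

# Build the pipeline with default components, then request a quiz
pipeline = generate_quiz_assistant_pipeline(
    system_prompt_message=quiz_generation_prompt_template
)
print(pipeline.invoke({"question": "Generate a quiz about science."}))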

Best Practices and Tips

  • Customization: Utilize the function's parameters to customize the quiz generation process according to different needs or contexts.
  • Model Selection: Experiment with different language models to find the one that best suits your accuracy and creativity requirements.
  • Template Design: Craft your user_question_template and system_prompt_message carefully to guide the AI in generating relevant and engaging quiz questions.
  • Error Handling: Implement error handling within the function to manage issues that may arise during the quiz generation process, such as API limitations or unexpected model responses (see the retry sketch below).
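
As a starting point for the error-handling suggestion above, here is a hedged sketch of a retry wrapper. The name generate_quiz_safely is ours, and production code should catch the specific exceptions raised by your SDK version rather than the blanket Exception used here:

import time

def generate_quiz_safely(system_prompt_message, question, max_attempts=3):
    """Invoke the quiz pipeline with simple retries and exponential backoff."""
    pipeline = generate_quiz_assistant_pipeline(system_prompt_message)
    for attempt in range(1, max_attempts + 1):
        try:
            return pipeline.invoke({"question": question})
        except Exception:  # narrow to your SDK's rate-limit/timeout exceptions in real code
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)  # back off before retrying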

Incorporating this function into your project simplifies the creation of AI-powered quizzes, enabling innovative educational tools and interactive content.

Section 7: Evaluating Generated Quiz Content

To enhance the AI-powered quiz generator with evaluation capabilities, we introduce the evaluate_quiz_content function. This function is designed to evaluate the generated quiz content, ensuring that it contains expected keywords related to a given topic. This kind of evaluation is crucial for verifying the relevance and accuracy of the generated quizzes, especially in educational or training contexts where content validity is paramount.

Function Overview

The evaluate_quiz_content function integrates with the previously established quiz generation pipeline. It takes a system message (which includes instructions or context for generating a quiz), a specific question (such as a request to generate a quiz on a particular topic), and a list of expected words or phrases that should appear in the quiz content to consider the generation successful.

Function Definition

def evaluate_quiz_content(
    system_prompt_message,
    quiz_request_question,
    expected_keywords,
    user_question_template="{question}",
    selected_language_model=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    response_format_parser=StrOutputParser()):
    """
    Evaluates the generated quiz content to ensure it includes expected keywords or phrases.

    Parameters:
    - system_prompt_message: Instructions or context for the quiz generation.
    - quiz_request_question: The specific question or request for generating a quiz.
    - expected_keywords: A list of words or phrases expected to be included in the quiz content.
    - user_question_template: A template for structuring user questions, with a default placeholder.
    - selected_language_model: The AI model used for content generation, with a default model specified.
    - response_format_parser: The mechanism for parsing the AI model's response into a desired format.

    Raises:
    - AssertionError: If none of the expected keywords are found in the generated quiz content.
    """

    # Use generate_quiz_assistant_pipeline to generate quiz content based on the provided question
    generated_content = generate_quiz_assistant_pipeline(
        system_prompt_message,
        user_question_template,
        selected_language_model,
        response_format_parser).invoke({"question": quiz_request_question})

    print(generated_content)

    # Verify that the generated content includes at least one of the expected keywords
    assert any(keyword.lower() in generated_content.lower() for keyword in expected_keywords), \
        f"Expected the generated quiz to include one of '{expected_keywords}', but it did not."

Practical Example: Generating and Evaluating a Science Quiz

To put this evaluation function into practice, let's consider a test case where we generate and evaluate a quiz about science.

# Define the system message (or prompt template), the specific question, and the expected keywords
system_prompt_message = quiz_generation_prompt_template  # Assume this variable is defined as before
quiz_request_question = "Generate a quiz about science."
expected_keywords = ["Alexander Graham Bell", "long-distance communication", "1876"]  # drawn from the quiz bank's science-category subject

# Call the evaluation function with the test case parameters
evaluate_quiz_content(
    system_prompt_message,
    quiz_request_question,
    expected_keywords
)

This example demonstrates how to use the evaluate_quiz_content function to ensure that the generated quiz about science includes relevant content, such as questions related to significant scientific figures, tools, or concepts.

Best Practices and Tips

  • Keyword Selection: Choose expected keywords or phrases that are specific enough to accurately assess the quiz content's relevance but also broad enough to allow for creative variations in the AI's responses.
  • Comprehensive Evaluation: Consider using multiple sets of expected keywords for diverse topics to thoroughly test the AI's ability to generate relevant quizzes across different subjects (a sketch follows this list).
  • Iterative Improvement: Use the evaluation results to iteratively refine the quiz generation process, including adjusting the prompt template, the language model's parameters, or the dataset used for generating quizzes.
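
The comprehensive-evaluation idea from the list above can be expressed as a simple loop. The topic/keyword pairs below are illustrative assumptions, not part of the lesson dataset:

# Evaluate several topics in one pass; each pair is (request, expected keywords)
evaluation_cases = [
    ("Generate a quiz about science.", ["Alexander Graham Bell", "communication", "1876"]),
    ("Generate a quiz about art.", ["painting", "artist", "museum"]),
]

for request, keywords in evaluation_cases:
    evaluate_quiz_content(
        system_prompt_message=quiz_generation_prompt_template,
        quiz_request_question=request,
        expected_keywords=keywords,
    )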

Section 8: Testing Refusal Behavior

To ensure that our AI-powered quiz generator can appropriately handle requests that fall outside its scope or capabilities, we introduce the evaluate_request_refusal function. This function is designed to test the system's ability to decline to answer under certain conditions, such as when the request is not applicable or beyond the system's current knowledge base. Handling such cases gracefully is essential for maintaining user trust and ensuring a positive user experience.

Function Overview

The evaluate_request_refusal function simulates scenarios where the system should refuse to generate a quiz, based on predefined criteria such as the relevance of the request or the system's limitations. It verifies that the system responds with a specified refusal message, indicating its inability to fulfill the request.

Function Definition

def evaluate_request_refusal(
    system_prompt_message,
    invalid_quiz_request_question,
    expected_refusal_response,
    user_question_template="{question}",
    selected_language_model=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    response_format_parser=StrOutputParser()):
    """
    Evaluates the system's response to ensure it appropriately declines to answer invalid or inapplicable requests.

    Parameters:
    - system_prompt_message: Instructions or context for the quiz generation.
    - invalid_quiz_request_question: A request that the system should decline to fulfill.
    - expected_refusal_response: The expected response indicating the system's refusal to answer the request.
    - user_question_template: A template for structuring user questions, with a default placeholder.
    - selected_language_model: The AI model used for content generation, with a default model specified.
    - response_format_parser: The mechanism for parsing the AI model's response into a desired format.

    Raises:
    - AssertionError: If the system's response does not include the expected refusal message.
    """

    # Generate a response using the shared quiz generation pipeline
    generated_response = generate_quiz_assistant_pipeline(
        system_prompt_message,
        user_question_template,
        selected_language_model,
        response_format_parser).invoke({"question": invalid_quiz_request_question})

    print(generated_response)

    # Verify that the system's response includes the expected refusal message
    assert expected_refusal_response.lower() in generated_response.lower(), \
        f"Expected the system to decline with '{expected_refusal_response}', but received: {generated_response}"

Practical Example: Testing for Appropriate Refusal

To illustrate how evaluate_request_refusal works, let's consider a scenario where the quiz generator should refuse to generate a quiz due to the request being out of scope or not supported by the current configuration.

# Define the system message (or prompt template), a request that should be declined, and the expected refusal response
system_prompt_message = quiz_generation_prompt_template  # Assume this variable is defined as before
invalid_quiz_request_question = "Generate a quiz about Rome."
expected_refusal_response = "I'm sorry"  # match a short refusal phrase; expecting an exact sentence is brittle with LLM output

# Execute the refusal evaluation function with the specified parameters
evaluate_request_refusal(
    system_prompt_message,
    invalid_quiz_request_question,
    expected_refusal_response
)

This example demonstrates the function's ability to test the quiz generator's response to a request that should be declined. By verifying the presence of the expected refusal message, we can ensure that the system behaves as intended when faced with requests it cannot fulfill.

Best Practices and Tips

  • Clear Refusal Messages: Design refusal messages to be clear and informative, helping users understand why their request cannot be fulfilled.
  • Comprehensive Testing: Use a variety of test cases, including requests for unsupported topics or formats, to thoroughly evaluate the system's refusal logic (see the pytest sketch after this list).
  • Refinement and Feedback: Based on testing outcomes, refine the refusal logic and messages to enhance user understanding and satisfaction.
  • Consider User Experience: While refusal is sometimes necessary, consider providing alternative suggestions or guidance to maintain a positive user interaction.
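
The comprehensive-testing suggestion above can be automated with pytest. A hedged sketch: the out-of-scope requests are illustrative, and the short refusal phrase keeps the match robust:

import pytest

@pytest.mark.parametrize("invalid_request", [
    "Generate a quiz about Rome.",
    "Generate a quiz about cooking.",
    "Write me a poem instead of a quiz.",
])
def test_refusals(invalid_request):
    # Each out-of-scope request should trigger a refusal containing "I'm sorry"
    evaluate_request_refusal(
        quiz_generation_prompt_template,
        invalid_request,
        expected_refusal_response="I'm sorry",
    )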

Implementing and testing refusal scenarios ensures that the quiz generator can handle a wide range of user requests gracefully, maintaining reliability and user trust even when it cannot provide the requested content.

Section 9: A Science Quiz Test Case

To turn the template into a practical test scenario focused on a science-themed quiz, we define the function test_science_quiz. This function evaluates whether the AI-generated quiz questions revolve around the expected science-related subjects. By building on the evaluate_quiz_content function, we can verify that the quiz includes keywords or themes indicative of the science category.

Revised Function for Science Quiz Evaluation

Below, we adapt the previously outlined evaluate_quiz_content function to this test scenario, making the expected outcomes and the evaluation process explicit. The function tests whether the AI-generated content aligns with the expected scientific themes.

Function Definition for Science Quiz Test

def test_science_quiz():
    """
    Tests the quiz generator's ability to produce questions related to science, verifying the inclusion of expected subjects.
    """
    # Define the request for generating a quiz question
    question_request = "Generate a quiz question."

    # Keywords drawn from the quiz bank's science-category subject; adjust these if the dataset changes
    expected_science_subjects = ["Alexander Graham Bell", "long-distance communication", "1876"]

    # System message or prompt template configured for quiz generation
    system_prompt_message = quiz_generation_prompt_template  # This should be defined earlier in your code

    # Invoke the evaluation function with the science-specific parameters
    evaluate_quiz_content(
        system_prompt_message=system_prompt_message,
        quiz_request_question=question_request,
        expected_keywords=expected_science_subjects
    )

This function encapsulates the evaluation logic to ensure that when a quiz question is requested, the generated content reflects science subjects accurately. It leverages the structure of invoking the quiz generation and subsequent evaluation to ascertain the presence of science-related keywords or subjects within the generated content.

Execution and Evaluation

Executing test_science_quiz effectively simulates the scenario of requesting a quiz question from the system and then scrutinizing the response to confirm the inclusion of science-related subjects. This test plays a crucial role in verifying the system's capability to understand the context of the request and generate relevant content accordingly.
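
During development, the test can be invoked directly; under pytest it would be collected automatically from a test module:

# Direct invocation while iterating locally; pytest would discover this function
# automatically in a file such as test_assistant.py (file name assumed).
test_science_quiz()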

Best Practices and Considerations

  • Adjust Expectations as Needed: Depending on the specificity of your quiz generator's domain or the breadth of the science category, you might need to refine the list of expected subjects or keywords to better match your application's scope and accuracy.
  • Comprehensive Testing: Beyond science, consider implementing similar test functions for other categories your quiz generator supports, such as history, geography, or arts, to ensure comprehensive coverage and functionality across diverse subjects.
  • Analyze Failures for Improvement: Should the test fail, analyze the discrepancies between expected subjects and generated content to identify potential areas for refinement in your quiz generation logic or dataset.

This structured approach to testing not only ensures that the quiz generator performs as expected but also highlights areas for improvement, driving enhancements in content relevance and user engagement.

Section 10: CircleCI Configuration File Overview

A CircleCI configuration file is named .circleci/config.yml and is placed at the root of your project's repository. This file defines the entire CI/CD pipeline in YAML syntax, specifying how your software should be built, tested, and deployed.

Here's an outline of what a basic CircleCI config file might look like for a Python project, including running tests automatically:

version: 2.1

orbs:
  python: circleci/python@1.2.0  # Use the Python orb to simplify your config

jobs:
  build-and-test:
    docker:
      - image: cimg/python:3.8  # Specify the Python version
    steps:
      - checkout  # Check out the source code
      - restore_cache:  # Restore cache to save time on dependencies installation
          keys:
            - v1-dependencies-{{ checksum "requirements.txt" }}
            - v1-dependencies-
      # Install into a virtual environment so the cached ./venv path below
      # actually contains the installed packages
      - run:
          name: Install Dependencies
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
      - save_cache:  # Cache the virtual environment to speed up future builds
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum "requirements.txt" }}
      - run:
          name: Run Tests
          command: |
            . venv/bin/activate
            pytest  # Or any other command to run your tests

workflows:
  version: 2
  build_and_test:
    jobs:
      - build-and-test

Key Components Explained

  • version: Specifies the CircleCI configuration version; 2.1 is the current standard and enables features such as orbs.
  • orbs: Orbs are reusable snippets of configuration. The python orb is declared here as an example; its pre-built commands can replace some of the manual steps below, such as dependency installation.
  • jobs: Defines the jobs that will be run. In this case, there's a single job called build-and-test.
  • docker: Specifies the Docker image to use for the job. Here, it's using CircleCI's Python 3.8 image.
  • steps: The steps to be run as part of the job, including checking out the code, restoring cache, installing dependencies, and running tests.
  • workflows: Defines the workflow to run the jobs. This configuration has a single workflow that runs the build-and-test job.

Customizing Your Configuration

  • Adjust the Python version in the docker section according to your project's needs.
  • Replace the pytest command with the specific command you use to run your tests, if different.
  • If your project has additional setup steps (like setting up databases, configuring environment variables, etc.), you can add them as additional - run: steps in the jobs section.

To set up your tests to run automatically in CircleCI, commit this .circleci/config.yml file to your repository. Once pushed, CircleCI, if integrated with your GitHub or Bitbucket account, will automatically detect the configuration file and run your defined pipeline on each commit according to the rules you've set up.

Theory questions:

  1. What are the necessary components for setting up the environment for an AI-powered quiz generator?
  2. Describe how to structure a dataset for generating quiz questions. Include examples of categories and facts.
  3. How does prompt engineering influence the generation of customized quizzes? Provide an example of a prompt template.
  4. Explain the role of LangChain in structuring prompts for processing by an AI model.
  5. What constitutes the quiz generation pipeline in the context of the LangChain Expression Language?
  6. How can one ensure the relevance and accuracy of generated quiz content through evaluation functions?
  7. Describe a method for testing the system's ability to decline generating a quiz under certain conditions.
  8. How can one test the AI-generated quiz questions for alignment with expected science topics or subjects?
  9. Outline the basic components of a CircleCI configuration file for a Python project, including automatic test running.
  10. Discuss the importance of customization in the CircleCI configuration file to match project-specific needs.

Practice questions:

  1. Create Quiz Dataset: Define a Python dictionary named quiz_bank that represents a collection of quiz questions, each with subjects, categories, and facts similar to the given example. Ensure your dictionary allows for easy access to subjects, categories, and facts.

  2. Generate Quiz Questions Using Prompts: Craft a function generate_quiz_questions(category) that takes a category (e.g., "History", "Technology") as input and returns a list of generated quiz questions based on the subjects and facts from the quiz_bank dictionary. Use string manipulation or template strings to construct your quiz questions.

  3. Implement LangChain Prompt Structuring: Simulate the process of using LangChain's capabilities by defining a function structure_quiz_prompt(quiz_questions) that takes a list of quiz questions and returns a structured chat prompt in a format similar to the one outlined, without actually integrating LangChain.

  4. Quiz Generation Pipeline: Create a Python function named generate_quiz_pipeline() that mimics the creation and execution of a quiz generation pipeline, using placeholders for LangChain components. The function should print a message simulating the execution of the pipeline.

  5. Reusable Quiz Generation Function: Implement a Python function generate_quiz_assistant_pipeline(system_prompt_message, user_question_template="{question}") that simulates assembling the components required for generating quizzes. Use string formatting to construct a detailed prompt based on the inputs.

  6. Evaluate Generated Quiz Content: Write a function evaluate_quiz_content(generated_content, expected_keywords) that takes generated quiz content and a list of expected keywords as inputs, and checks if the content contains any of the keywords. Raise an assertion error with a custom message if none of the keywords are found.

  7. Handle Invalid Quiz Requests: Design a function evaluate_request_refusal(invalid_request, expected_response) that simulates evaluating the system's response to an invalid quiz request. The function should assert whether the generated refusal response matches the expected refusal response.

  8. Science Quiz Evaluation Test: Develop a Python function test_science_quiz() that uses the evaluate_quiz_content function to test if a generated science quiz includes questions related to expected scientific topics, such as "physics" or "chemistry".