AI QA Bootcamp

AI QA Specialist Course

The course is divided into thematic weeks, each with two sessions that balance theory and hands-on practice. Below is a detailed breakdown of the content, hands-on activities, and expectations for each session.

This week, we’ll demystify what an AI model is and immediately jump into your first hands-on data investigation.

  • Day 1: AI Fundamentals & The QA Mindset
    • Lecture (1 hour): We’ll cover the basics: What is an AI model? How is it different from normal software? We’ll walk through the AI lifecycle (Data -> Model -> Deploy) and pinpoint exactly where QA adds the most value. We’ll also discuss the unique challenges, like testing a “black box.”
    • Practical Exercise (1 hour): You’ll be given a case study of a real-world AI application (e.g., a movie recommendation engine). Your task will be to create a mind map identifying at least 10 potential quality risks. For example, “What if the data is biased towards action movies?” or “How do we test if a ‘bad’ recommendation is a bug?” This exercise requires no code, only your critical QA thinking.
  • Day 2: Introduction to Data Profiling
    • Lecture (1 hour): We’ll introduce the “garbage in, garbage out” principle. You’ll learn what data profiling is and why it’s the first step in any AI testing project. We’ll cover common data issues like missing values, outliers, and incorrect data types.
    • Practical Exercise (1 hour): You’ll get your hands on your first dataset! Using Google Colab (a free, browser-based Python environment), you’ll load a CSV file into a Pandas DataFrame. You will run simple commands like .info(), .describe(), and .isnull().sum() to generate a basic Data Quality Profile Report (a minimal code sketch follows this list) that answers questions like:
      • Which columns have missing data?
      • What are the average and maximum values in the numerical columns?
      • Are there any obvious outliers (e.g., an age of 999)?
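
For reference, a minimal sketch of the Day 2 profiling steps, assuming a hypothetical users.csv file with an age column (any tabular dataset works the same way):

```python
import pandas as pd

# Load the dataset (users.csv is a hypothetical file name).
df = pd.read_csv("users.csv")

# Column names, data types, and non-null counts in one view.
df.info()

# Summary statistics (mean, min, max, quartiles) for numerical columns.
print(df.describe())

# Number of missing values per column.
print(df.isnull().sum())

# A quick sanity check for obvious outliers in an assumed 'age' column.
print(df[df["age"] > 120])
```

These few commands are usually enough to answer the three questions above.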

This week is all about writing simple tests to ensure the quality and integrity of the data being used to train the model.

  • Day 3: Finding and Quantifying Data Issues
    • Lecture (1 hour): We’ll go deeper into specific data problems. We’ll cover the impact of bad data labels (e.g., a picture of a cat labeled as a dog) and how to measure data quality with quantifiable checks.
    • Practical Exercise (1 hour): Building on last week’s exercise, you’ll write simple Python functions to find and flag data quality issues (see the first sketch after this list). For a given dataset of user profiles, you will:
      • Write a script to count the number of rows with missing email addresses.
      • Identify and print all rows where the ‘Registration_Date’ is before the ‘Birth_Date’.
      • Filter the dataset to find users with an age outside a reasonable range (e.g., < 18 or > 90).
  • Day 4: Writing Your First Data Tests
    • Lecture (1 hour): We’ll introduce the concept of an automated data validation pipeline. You’ll learn how to write formal “unit tests” for your data to ensure it always meets quality standards before it’s used for training.
    • Practical Exercise (1 hour): You will write your first set of programmatic data tests using Python’s assert statement (see the second sketch after this list). For a given dataset, you will write a script that checks for the following and fails if they are not met:
      • assert df['user_id'].is_unique, to check for duplicate users.
      • assert df['subscription_type'].isin(['free', 'premium', 'basic']).all(), to ensure only valid categories exist.
      • assert df['age'].isnull().sum() == 0, to ensure there are no missing ages.
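
For the Day 3 exercise, a minimal sketch of the three checks, assuming a user-profiles CSV with email, Registration_Date, Birth_Date, and age columns (file and column names are illustrative):

```python
import pandas as pd

# Hypothetical dataset of user profiles; parse the two date columns up front.
df = pd.read_csv("user_profiles.csv", parse_dates=["Registration_Date", "Birth_Date"])

# 1. Count rows with a missing email address.
missing_emails = df["email"].isnull().sum()
print(f"Rows with missing email: {missing_emails}")

# 2. Rows where the registration date comes before the birth date.
impossible_dates = df[df["Registration_Date"] < df["Birth_Date"]]
print(impossible_dates)

# 3. Users with an age outside a reasonable range.
suspicious_ages = df[(df["age"] < 18) | (df["age"] > 90)]
print(suspicious_ages)
```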
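
For the Day 4 exercise, the same idea expressed as assert statements, so the script fails loudly the moment a rule is violated:

```python
import pandas as pd

df = pd.read_csv("user_profiles.csv")  # same hypothetical dataset as above

# Fail if any user_id appears more than once.
assert df["user_id"].is_unique, "Duplicate user_id values found"

# Fail if an unexpected subscription category appears.
assert df["subscription_type"].isin(["free", "premium", "basic"]).all(), \
    "Unknown subscription_type values found"

# Fail if any age is missing.
assert df["age"].isnull().sum() == 0, "Missing values in the age column"

print("All data quality checks passed.")
```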

Now that we trust our data, we can test the model. This week, you’ll learn the essential metrics used to decide if a model is good enough to release.

  • Day 5: Understanding the Confusion Matrix
    • Lecture (1 hour): We’ll introduce classification models (e.g., spam or not spam?). You’ll learn how to read a Confusion Matrix, the most important tool for understanding a model’s errors. We’ll define Accuracy, Precision, and Recall in simple terms.
    • Practical Exercise (1 hour): This is a no-code, logic-based exercise. You’ll be given several pre-filled confusion matrices for different scenarios (e.g., medical diagnosis, fraud detection). You will manually calculate the Accuracy, Precision, and Recall for each and write a one-sentence summary explaining which metric is most important for that specific scenario and why (a worked example appears after this list).
  • Day 6: Calculating Metrics with Code
    • Lecture (1 hour): You’ll learn how to move from manual calculations to automated reporting using Python’s most popular machine learning library, Scikit-learn.
    • Practical Exercise (1 hour): You’ll be given two arrays: true_labels and model_predictions. You will write a Python script using Scikit-learn (see the second sketch after this list) to:
      • Generate the confusion matrix automatically.
      • Calculate the precision, recall, and F1-score.
      • Print a complete classification_report, a professional summary of the model’s performance.
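
To make the Day 5 arithmetic concrete, here is one worked example with made-up confusion-matrix counts for a fraud-detection scenario (TP = 80, FP = 10, FN = 20, TN = 890):

```python
# Made-up counts: 100 actual fraud cases out of 1,000 transactions.
TP, FP, FN, TN = 80, 10, 20, 890

accuracy = (TP + TN) / (TP + FP + FN + TN)   # 0.97
precision = TP / (TP + FP)                   # ~0.89
recall = TP / (TP + FN)                      # 0.80

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
```

Accuracy looks excellent at 97%, yet a recall of 80% means one in five fraud cases slips through, which is exactly the trade-off the exercise asks you to reason about.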
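
For the Day 6 exercise, a minimal Scikit-learn sketch; the two label arrays below are made up purely for illustration:

```python
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Made-up labels: 1 = spam, 0 = not spam.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_predictions = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Confusion matrix, generated automatically.
print(confusion_matrix(true_labels, model_predictions))

# Individual metrics.
print("Precision:", precision_score(true_labels, model_predictions))
print("Recall:   ", recall_score(true_labels, model_predictions))
print("F1-score: ", f1_score(true_labels, model_predictions))

# Full per-class summary of the model's performance.
print(classification_report(true_labels, model_predictions))
```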

A model can have good metrics but still behave strangely. This week, we’ll write tests for “common sense” behavior and see how the model handles stress.

  • Day 7: Behavioral Testing
    • Lecture (1 hour): We’ll cover Behavioral Testing, which is like unit testing for AI. We’ll discuss how to test for basic capabilities, invariance (e.g., changing a name shouldn’t change a loan decision), and directional expectations.
    • Practical Exercise (1 hour): You will create a Behavioral Test Plan for a pre-trained sentiment analysis model. In a simple document, you’ll define at least 5 test cases. For example:
      • Test Case 1 (Invariance): Input “I love this movie” vs. “I love this film.” Expect the same positive sentiment.
      • Test Case 2 (Directional Expectation): Input “The movie was good.” vs. “The movie was very good.” Expect the second score to be higher.
      • You’ll then run these inputs through a live model and record the results (see the first sketch after this list).
  • Day 8: Robustness and Stress Testing
    • Lecture (1 hour): We’ll discuss why models often fail in the real world: messy, unexpected data. You’ll learn about robustness testing and how to create “corrupted” data to see how the model copes.
    • Practical Exercise (1 hour): You’ll write a Python script that tests the robustness of a sentiment analyzer (see the second sketch after this list). The script will:
      • Take a positive sentence like “This is a fantastic product.”
      • Create “corrupted” versions: add typos, remove punctuation, convert to ALL CAPS.
      • Feed each version to the model and print the sentiment score to see if it degrades gracefully or fails completely.
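
A minimal sketch of how the Day 7 test cases could be run programmatically; the Hugging Face sentiment-analysis pipeline is an assumption here, and the course may supply a different live model:

```python
from transformers import pipeline

# Assumed model: the default Hugging Face sentiment-analysis pipeline.
sentiment = pipeline("sentiment-analysis")

# (test type, input A, input B, expectation)
behavioral_tests = [
    ("invariance", "I love this movie", "I love this film",
     "both should be POSITIVE"),
    ("directional", "The movie was good.", "The movie was very good.",
     "second score should be at least as high as the first"),
]

for test_type, text_a, text_b, expectation in behavioral_tests:
    result_a = sentiment(text_a)[0]
    result_b = sentiment(text_b)[0]
    print(f"[{test_type}] {text_a!r} -> {result_a}")
    print(f"[{test_type}] {text_b!r} -> {result_b}")
    print(f"  Expectation: {expectation}\n")
```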
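
And a sketch of the Day 8 robustness check against the same assumed model, with hand-made corrupted variants of one sentence:

```python
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # same assumed model as above

# Hand-made "corrupted" variants of the same positive sentence.
variants = {
    "original": "This is a fantastic product.",
    "typos": "Ths is a fantstic prodcut.",
    "no punctuation": "This is a fantastic product",
    "all caps": "THIS IS A FANTASTIC PRODUCT.",
}

# Compare how the score changes as the input degrades.
for name, text in variants.items():
    result = sentiment(text)[0]
    print(f"{name:>15} -> {result['label']} ({result['score']:.3f})")
```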

This week focuses on the crucial area of Responsible AI. You’ll learn how to find unfair bias in a model and peek inside the “black box.”

  • Day 9: Auditing for AI Bias
    • Lecture (1 hour): We’ll explore how AI models can accidentally discriminate against certain groups. We’ll define AI bias and discuss key fairness metrics like Demographic Parity.
    • Practical Exercise (1 hour): You’ll be given a dataset of a model’s loan application predictions, including demographic data (e.g., gender). You’ll write a Pandas script (see the first sketch after this list) to:
      • Group the results by gender.
      • Calculate the approval rate for each group.
      • Identify and report if there is a significant difference, which could indicate bias.
  • Day 10: Peeking Inside the Black Box (Explainable AI)
    • Lecture (1 hour): We’ll introduce Explainable AI (XAI) and why it’s critical for building trust. We’ll discuss tools like LIME and SHAP that can explain why a model made a specific decision.
    • Practical Exercise (1 hour): You’ll use a pre-configured tool (like the What-If Tool or a simple SHAP notebook) on a pre-trained model (see the second sketch after this list). Your task will be to:
      • Input a specific data point (e.g., a customer profile).
      • Generate a visual explanation.
      • Write a brief summary identifying the top 3 features that influenced the model’s decision (e.g., “The model denied the loan primarily because of low income and high debt.”).
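
A minimal sketch of the Day 9 bias audit, assuming a predictions file with gender and prediction (1 = approved, 0 = denied) columns and an arbitrary 10-percentage-point threshold:

```python
import pandas as pd

# Hypothetical file of loan predictions with demographic data.
df = pd.read_csv("loan_predictions.csv")

# Approval rate per group (the mean of a 0/1 column is the approval rate).
approval_rates = df.groupby("gender")["prediction"].mean()
print(approval_rates)

# Flag a gap larger than the chosen (arbitrary) threshold.
gap = approval_rates.max() - approval_rates.min()
if gap > 0.10:
    print(f"Possible bias: approval rates differ by {gap:.1%} between groups.")
else:
    print(f"Approval-rate gap is {gap:.1%}, within the chosen threshold.")
```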
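
For the Day 10 exercise, a sketch of the SHAP workflow; the loan files and the simple stand-in model are assumptions, since the course notebook provides its own pre-trained model:

```python
import pandas as pd
import shap
from sklearn.linear_model import LogisticRegression

# Hypothetical numeric loan features and 0/1 labels; a simple model stands in
# for whatever pre-trained model the notebook supplies.
X = pd.read_csv("loan_features.csv")
y = pd.read_csv("loan_labels.csv").squeeze()
model = LogisticRegression(max_iter=1000).fit(X, y)

# Explain one customer profile (the first row of the dataset).
explainer = shap.Explainer(model, X)
explanation = explainer(X.iloc[:1])

# Rank features by the size of their contribution to this single decision.
contributions = pd.Series(explanation.values[0], index=X.columns)
print(contributions.abs().sort_values(ascending=False).head(3))
```

The three features printed at the end are the raw material for the written summary the exercise asks for.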

In our final week, we’ll touch on security and then you’ll apply everything you’ve learned in a final capstone project.

  • Day 11: Introduction to AI Security Testing
    • Lecture (1 hour): We’ll cover the basics of AI security. You’ll learn about adversarial attacks, where tiny, invisible changes to an input can trick a model into making a huge mistake.
    • Practical Exercise (1 hour): This will be a guided demonstration. You will be walked through a pre-written script that loads a famous image recognition model. You will see how the model correctly identifies an image, then how the script adds a tiny amount of “noise” and runs the prediction again, showing how the model is now completely fooled (a simplified sketch of this attack appears after this list).
  • Day 12: Capstone: Your First AI QA Sign-Off
    • Lecture (30 mins): We’ll review the key stages of an AI QA process and introduce the final project.
    • Practical Project (1.5 hours): You’ll be given a new model and dataset. Your mission is to perform a full QA audit by completing a QA checklist (a sketch of the final report appears at the end of this list). You will:
      1. Run a data profiling script to check for quality issues.
      2. Run a script to calculate the model’s precision and recall.
      3. Run a bias audit script to check for fairness between two groups.
      4. Run a robustness script to test against 3 types of corrupted input.
      5. Fill out a Final QA Report with your findings and a “Go / No-Go” recommendation for deployment.
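
A simplified sketch of the Day 11 adversarial demo using the Fast Gradient Sign Method; the image file, the choice of ResNet-18, and the omitted input normalization are all simplifying assumptions rather than the course’s actual demo script:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Assumed pretrained ImageNet classifier and a local image file.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

image = preprocess(Image.open("panda.jpg")).unsqueeze(0)
image.requires_grad_(True)

# Original prediction.
output = model(image)
original_class = output.argmax(dim=1)
print("Original prediction:", original_class.item())

# Fast Gradient Sign Method: nudge every pixel slightly in the direction
# that increases the loss for the originally predicted class.
loss = torch.nn.functional.cross_entropy(output, original_class)
loss.backward()
adversarial = (image + 0.02 * image.grad.sign()).clamp(0, 1)

# The perturbation is nearly invisible, yet the prediction often flips.
print("Adversarial prediction:", model(adversarial).argmax(dim=1).item())
```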
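
Finally, a sketch of what the capstone’s Final QA Report could look like once the individual scripts have run; every value below is a placeholder, not a real result:

```python
# Placeholder results: in the real capstone, each entry comes from one of the
# scripts written earlier in the course.
qa_report = {
    "data profiling": {"passed": True, "notes": "no duplicates, 2% missing ages"},
    "precision / recall": {"passed": True, "notes": "precision 0.91, recall 0.84"},
    "bias audit": {"passed": False, "notes": "12-point approval-rate gap by gender"},
    "robustness": {"passed": True, "notes": "score drop < 0.05 on corrupted inputs"},
}

# Go / No-Go: recommend deployment only if every check passed.
decision = "Go" if all(item["passed"] for item in qa_report.values()) else "No-Go"

for check, result in qa_report.items():
    status = "PASS" if result["passed"] else "FAIL"
    print(f"{check:20} {status}  {result['notes']}")
print(f"\nRecommendation: {decision}")
```
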
Benefits

  • Interview Guide
  • Remote Work Option
  • Resume Preparation
  • High Paying Jobs
  • No Experience Required
  • LinkedIn Advice