# Train a model and debug it with Responsible AI dashboard
Demand is growing for machine learning models that are not only accurate but also adhere to ethical principles. The Responsible AI dashboard gives data scientists and AI developers tools to build machine learning models that prioritize societal well-being and inspire trust. The dashboard helps you address concerns such as discrimination, inclusiveness, transparency, and fairness in machine learning. Traditional model evaluation metrics frequently fall short in identifying responsible AI issues across fairness, inclusiveness, reliability and safety, privacy and security, accountability, and transparency. Practical tools like the Responsible AI dashboard help you understand the societal impact of your AI model and, most importantly, how to improve it to be less harmful.

In the lab later in this module, we'll use the UCI hospital diabetes dataset to train a classification model with the scikit-learn framework. The model predicts whether a diabetic patient will be readmitted to a hospital within 30 days of being discharged.


## What is a Responsible AI dashboard?
The Responsible AI dashboard is built on open-source tools developed by leading academic institutions and organizations, including Microsoft. These tools (ErrorAnalysis, InterpretML, Fairlearn, DiCE, and EconML) help data scientists and AI developers better understand model behavior, and discover and mitigate undesirable issues in AI models.

## Responsible AI dashboard components

The Responsible AI dashboard brings together various new and pre-existing tools. The dashboard integrates these tools with Azure Machine Learning CLI v2, Azure Machine Learning Python SDK v2, and Azure Machine Learning studio. The tools include:

| Tool                                      | Description                                                                                                                                                                      |
|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data analysis                             | To understand and explore your dataset distributions and statistics.                                                                                                             |
| Model overview and fairness assessment    | To evaluate the performance of your model and assess its group fairness issues (how your model's predictions affect diverse groups of people).                                   |
| Error analysis                            | To view and understand how errors are distributed in your dataset.                                                                                                                |
| Model interpretability                   | To understand your model's predictions and how those overall and individual predictions are made.                                                                                 |
| Counterfactual what-if                   | To observe how feature perturbations would affect your model predictions while providing the closest data points with opposing or different model predictions. For example: Taylor would have obtained a loan approval from the AI system if they earned $10,000 more in annual income and had two fewer credit cards open.  |
| Causal analysis                          | To estimate how a real-world outcome changes in the presence of an intervention. It also helps construct promising interventions by simulating feature responses to various interventions and creating rules to determine which population cohorts would benefit from a particular intervention. Collectively, these functionalities allow you to apply new policies and effect real-world change. For example, how would providing promotional values to certain customers affect revenue? |

Together, these tools will help you debug machine learning models, while informing your data-driven and model-driven business decisions. The following diagram shows how you can incorporate them into your AI lifecycle to improve your models and get solid data insights.

![Diagram showing how the Responsible AI dashboard tools fit into the AI lifecycle.](assets/dashboard.png)

## Model debugging
Assessing and debugging machine learning models is critical for model reliability, interpretability, fairness, and compliance. It helps determine how and why AI systems behave the way they do. You can then use this knowledge to improve model performance. Conceptually, model debugging consists of three stages:

1. Identify, to understand and recognize model errors and/or fairness issues by addressing the following questions:
    - "What kinds of errors does my model have?"
    - "In what areas are errors most prevalent?"

2. Diagnose, to explore the reasons behind the identified errors by addressing:
    - "What are the causes of these errors?"
    - "Where should I focus my resources to improve my model?"

3. Mitigate, to use the identification and diagnosis insights from previous stages to take targeted mitigation steps and address questions such as:
    - "How can I improve my model?"
    - "What social or technical solutions exist for these issues?"

![Diagram of the three model debugging stages: identify, diagnose, and mitigate.](assets/model-debugging.png)


## Reasons for using the Responsible AI dashboard
Although progress has been made on individual tools for specific areas of Responsible AI, data scientists often need to use various tools to holistically evaluate their models and data. For example, they might have to use model interpretability and fairness assessment together.

If data scientists discover a fairness issue with one tool, they then need to jump to a different tool to understand what data or model factors lie at the root of the issue before taking any steps on mitigation. The following factors further complicate this challenging process:

- There's no central location to discover and learn about the tools, extending the time it takes to research and learn new techniques.
- The different tools don't communicate with each other. Data scientists must wrangle the datasets, models, and other metadata as they pass them between the tools.
- The metrics and visualizations aren't easily comparable, and the results are hard to share.

The Responsible AI dashboard challenges this status quo. It's a comprehensive yet customizable tool that brings together fragmented experiences in one place. It enables you to seamlessly onboard to a single customizable framework for model debugging and data-driven decision-making.

By using the Responsible AI dashboard, you can create dataset cohorts, pass those cohorts to all of the supported components, and observe your model health for your identified cohorts. You can further compare insights from all supported components across various prebuilt cohorts to perform disaggregated analysis and find the blind spots of your model.

## Error analysis on a model
Traditional performance metrics for machine learning models focus on calculations based on correct vs incorrect predictions. The aggregated accuracy scores or average error loss show how good the model is, but don't reveal conditions causing model errors. While the overall performance metrics such as classification accuracy, precision, recall or Mean Absolute Error (MAE) scores are good proxies to help you build trust with your model, they're insufficient in locating where in the data the model has inaccuracies. Often, model errors aren't distributed uniformly in your underlying dataset. For instance, if your model is 89% accurate, does that mean it's 89% fair as well?

Model fairness and model accuracy aren't the same thing, and both must be considered. Unless you take a deep dive into the model's error distribution, it's challenging to discover the regions of your data where the model is failing 42% of the time (see the red region in the diagram below). Errors concentrated in certain data groups can lead to fairness or reliability issues. To illustrate, the data group with the high number of errors might contain sensitive features such as age, gender, disabilities, or ethnicity. Further analysis could reveal that the model has a higher error rate for individuals with disabilities than for those without. So, it's essential to understand where the model performs well and where it doesn't, because the data regions with a high number of inaccuracies might turn out to be an important demographic you can't afford to ignore.

![Diagram showing how model errors are distributed unevenly across regions of the data.](assets/error-distribution.png)
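To make the idea of an uneven error distribution concrete, here's a minimal sketch in plain Python. The cohort names and prediction records are invented for illustration and aren't taken from the diabetes dataset. The sketch only shows how a single aggregate error rate can hide a cohort where the model fails far more often; it isn't how the dashboard computes its error tree or heat map.

```python
from collections import defaultdict

# Hypothetical evaluation records: (cohort, true outcome, predicted outcome).
# Cohort names and values are made up for illustration only.
records = [
    ("age<30", 1, 1), ("age<30", 0, 0), ("age<30", 1, 1), ("age<30", 0, 0),
    ("age 30-60", 1, 1), ("age 30-60", 0, 0), ("age 30-60", 0, 0), ("age 30-60", 1, 0),
    ("age>60", 1, 0), ("age>60", 0, 1), ("age>60", 1, 1), ("age>60", 0, 0),
]


def error_rate_by_cohort(rows):
    """Return the fraction of incorrect predictions for each cohort."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for cohort, y_true, y_pred in rows:
        totals[cohort] += 1
        if y_true != y_pred:
            errors[cohort] += 1
    return {cohort: errors[cohort] / totals[cohort] for cohort in totals}


# The aggregate error rate over all records.
overall_error = sum(t != p for _, t, p in records) / len(records)
# The same errors, broken down by cohort.
cohort_errors = error_rate_by_cohort(records)

print(f"overall error rate: {overall_error:.2f}")
for cohort, rate in cohort_errors.items():
    print(f"{cohort}: {rate:.2f}")
```

In this toy data, the overall error rate is 25%, but the errors are concentrated in the oldest cohort, where half of the predictions are wrong.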

This is where the error analysis component of the Azure Machine Learning Responsible AI dashboard helps, by identifying a model’s error distribution across its test dataset. Throughout this module, we'll use the diabetes hospital readmission classification scenario to learn about and explain the Responsible AI dashboard. Later in the lab, you'll train a model and create your own dashboard using the same dataset.


# Find model performance inconsistencies
An effective approach to evaluating the performance of machine learning models is to get a holistic understanding of their behavior across different scenarios. One way to do this is to calculate and assess model performance metrics such as accuracy, recall, precision, root mean squared error (RMSE), mean absolute error (MAE), or R² scores. However, analyzing a single metric, or only the aggregated metrics for the overall model, is insufficient to debug a model and identify the root cause of errors or inaccuracies. In conjunction with measuring performance metrics, data scientists and AI developers need to conduct comparative analysis to aid their holistic decision making.

Comparative analysis shines a light on how a model performs for one subgroup of the dataset versus another. One advantage of the model overview component of the Responsible AI dashboard is that it doesn't rely only on high-level numeric calculations over the dataset; it drills down to the data features as well. This is especially important when one cohort has unique characteristics compared to another cohort. For example, discovering that the model makes more errors for a cohort that has sensitive features (for example, patient race, gender, or age) can help expose potential unfairness.

The model overview component provides a comprehensive set of performance and fairness metrics for evaluating your model, along with key performance disparity metrics across specified features and dataset cohorts. It helps you analyze disparities in model performance metrics across the data cohorts that you create.

## Finding disparities in model performance

Model fairness is quantified through disparity metrics during the analysis process.

The model overview component highlights issues in the following areas, while also using some of the traditional performance metrics:

- Disparities among performance metrics
    - Showing how the model is performing for a given cohort using metrics such as accuracy, precision, recall, MAE, and RMSE.
- Probability distribution
    - Showing the probability of a given cohort to fall in a model’s predicted outcome.
- Metric visualization
    - Showing performance scores for a given cohort.
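As a rough illustration of a performance disparity, the following sketch computes accuracy, precision, and recall for two hypothetical cohorts and then reports the gap between the best and worst cohort for each metric. The cohort data is invented, and taking the largest minus the smallest value is only one common way to define a disparity. The dashboard's own disparity calculations may differ.

```python
def cohort_metrics(pairs):
    """Compute accuracy, precision, and recall from (true, predicted) pairs."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Guard against division by zero when a cohort has no positives.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical (true outcome, predicted outcome) pairs for two cohorts.
cohorts = {
    "cohort_a": [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 0), (0, 1), (1, 1)],
    "cohort_b": [(1, 0), (1, 0), (0, 0), (0, 0), (1, 1), (0, 1), (0, 0), (1, 0)],
}

metrics = {name: cohort_metrics(pairs) for name, pairs in cohorts.items()}

# Disparity here is the difference between the largest and smallest
# value of each metric across cohorts (an assumed definition).
disparity = {
    metric: max(m[metric] for m in metrics.values())
    - min(m[metric] for m in metrics.values())
    for metric in ("accuracy", "precision", "recall")
}

for name, values in metrics.items():
    print(name, values)
print("disparity", disparity)
```

In this toy example, the recall gap between the two cohorts is twice as large as the accuracy gap, which is the kind of difference that a single aggregated score would hide.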



# Expose data biases
The traditional method of evaluating the trustworthiness of a model’s performance is to look at calculated metrics such as accuracy, recall, precision, root mean squared error (RMSE), mean absolute error (MAE), or R², depending on the type of use case you have (for example, classification or regression). Data scientists and AI developers can also measure confidence levels for areas the model correctly predicted, or the frequency of correct predictions. You can also split your test data into separate cohorts to observe and compare how the model performs with some groups versus others. However, all of these techniques ignore a major blind spot: the underlying data.

Data can be overrepresented in some cases and underrepresented in others. This might lead to data biases, causing the model to have fairness, inclusiveness, safety, and/or reliability issues.

The Responsible AI dashboard includes a data analysis component that enables users to explore and understand the dataset distributions and statistics. It provides an interactive user interface (UI) to enable users to visualize datasets based on the predicted and actual outcomes, error groups, and specific features. This is useful for ML professionals to be able to quickly debug and identify issues of data over- and under-representation and to see how data is clustered in the dataset. As a result, they can understand the root cause of errors and any fairness issues introduced via data imbalances or lack of representation of a particular data group.

In the data analysis component, the Table view pane shows your raw dataset with all of its features, along with the true and predicted outcomes. The Chart view pane shows aggregate and individual plots of data points. You can analyze data statistics along the x-axis and y-axis by using filters such as predicted outcome, dataset features, and error groups. This view helps you understand overrepresentation and underrepresentation in your dataset.
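The kind of check that the data analysis views support can be sketched in a few lines of plain Python. The rows, the `age_group` feature, and the `readmitted` label below are hypothetical stand-ins rather than the lab's real columns. The sketch reports each group's share of the dataset and its positive-outcome rate, which together hint at under-representation.

```python
from collections import Counter

# Hypothetical dataset: 8 patients under 60 and 2 patients over 60.
rows = (
    [{"age_group": "under_60", "readmitted": 1}] * 2
    + [{"age_group": "under_60", "readmitted": 0}] * 6
    + [{"age_group": "over_60", "readmitted": 1}] * 1
    + [{"age_group": "over_60", "readmitted": 0}] * 1
)


def representation(data, feature, label):
    """Return each feature value's share of the data and its positive rate."""
    counts = Counter(row[feature] for row in data)
    positives = Counter(row[feature] for row in data if row[label] == 1)
    total = len(data)
    return {
        value: {
            "share": count / total,
            "positive_rate": positives[value] / count,
        }
        for value, count in counts.items()
    }


summary = representation(rows, "age_group", "readmitted")
for value, stats in summary.items():
    print(value, stats)
```

Here the `over_60` group makes up only 20% of the rows even though its readmission rate is twice as high, so a model trained on this data sees few examples of the group where the outcome is most common.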

# Explain and interpret a model
Assessing a model isn't just about understanding how accurately it can make a prediction, but also why it made the prediction. Understanding a model’s behavior is a critical part of debugging and helps drive responsible outputs. By evaluating which data features are driving a model’s prediction, you can determine whether they're sensitive or nonsensitive features and whether they're acceptable to base a decision on. For instance, if a model is using race or gender to predict a diabetic patient’s time in the hospital, then that’s a red flag to investigate the model. In addition, being able to explain a model’s outcome provides shared understanding for data scientists, decision-makers, end users, and auditors. Some industries have compliance regulations that require organizations to explain how and why a model made the prediction it did. If an AI system is driving the decision-making, then data scientists need to specify the data features driving the model to make a prediction.

This is where the Responsible AI dashboard is beneficial. The feature importance component provides an interactive user interface (UI) that enables data scientists or AI developers to see the top features in their dataset that influence their model’s prediction. In addition, it provides both global explanations and local explanations. With global explanations, the dashboard displays the top features that affect the model’s overall predictions. For local explanations, it shows which features most influenced a prediction for an individual data point. In our diabetes hospital readmission use case, every patient is different, so what features drove the model to make a prediction for one patient might not be as important for another patient.

The feature importance component has built-in model explainability and interpretability capabilities to help users answer questions in scenarios such as:

- Model debugging: Why did my model make this mistake? How can I improve my model?
- Human-AI collaboration: How can I understand and trust the model’s decisions?
- Regulatory compliance: Does my model satisfy legal requirements?

By using the feature importance component, you can see which features were most important in your model’s predictions.
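To give a feel for what a global explanation measures, here's a minimal sketch of permutation importance, a common model-agnostic technique: shuffle one feature at a time and measure how much accuracy drops. This is a substitute illustration, not the dashboard's implementation, which may compute importances differently. The feature names, records, and the rule-based `predict` function are all hypothetical stand-ins for a trained model.

```python
import random

# Hypothetical feature names and patient records (illustration only).
FEATURES = ["prior_inpatient_visits", "num_medications", "patient_id_last_digit"]
rows = [
    [0, 5, 1], [0, 12, 2], [0, 7, 3], [0, 20, 4],
    [3, 6, 5], [4, 15, 6], [2, 9, 7], [5, 18, 8],
]
labels = [0, 0, 0, 1, 1, 1, 1, 1]


def predict(row):
    """Stand-in for a trained classifier; it ignores the third feature."""
    prior_visits, medications, _ = row
    return 1 if prior_visits >= 2 or medications >= 20 else 0


def accuracy(data, targets):
    return sum(predict(r) == t for r, t in zip(data, targets)) / len(targets)


def permutation_importance(data, targets, n_repeats=30, seed=0):
    """Average accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(data, targets)
    importances = {}
    for col, name in enumerate(FEATURES):
        drops = []
        for _ in range(n_repeats):
            column = [r[col] for r in data]
            rng.shuffle(column)
            # Rebuild the rows with only this one column shuffled.
            shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(data, column)]
            drops.append(baseline - accuracy(shuffled, targets))
        importances[name] = sum(drops) / n_repeats
    return importances


importances = permutation_importance(rows, labels)
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because the stand-in model never reads `patient_id_last_digit`, shuffling that column can't change any prediction, so its importance is exactly zero. A sensitive feature with a large score would be the kind of red flag described earlier in this unit.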
