### Explainable AI

This document previews the explanations we plan to provide for each question in the accompanying Google form.

#### 1. Are there any longitudinal or time dependent trends in the data?

Data often show seasonal patterns, for example sales of cough syrup, average rainfall in a particular area, or the height of ocean tides. We expect that a business user would want to compare the target variable during the seasonal period against the target variable during the non-seasonal period.
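
As a minimal sketch of this comparison, the snippet below splits monthly values of a target variable into a seasonal and a non-seasonal group and compares their means. The sales figures and the choice of seasonal months are hypothetical and only serve as an illustration.

```python
from statistics import mean

# Hypothetical monthly unit sales of a cough syrup (month number -> units sold).
monthly_sales = {1: 920, 2: 870, 3: 610, 4: 420, 5: 380, 6: 350,
                 7: 340, 8: 360, 9: 450, 10: 640, 11: 810, 12: 950}

# Assumption: the seasonal period is November through February.
SEASONAL_MONTHS = {11, 12, 1, 2}


def seasonal_comparison(sales, seasonal_months):
    """Compare the mean target value inside and outside the seasonal period."""
    in_season = [v for m, v in sales.items() if m in seasonal_months]
    off_season = [v for m, v in sales.items() if m not in seasonal_months]
    in_mean, off_mean = mean(in_season), mean(off_season)
    return {
        "seasonal_mean": in_mean,
        "non_seasonal_mean": off_mean,
        "ratio": in_mean / off_mean,
    }


summary = seasonal_comparison(monthly_sales, SEASONAL_MONTHS)
print(summary)
```

The ratio gives a single number a business user can read directly, for example "sales in season are about twice the off-season level" for this made-up data.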

#### 2. Show me some interesting behavior that I wouldn’t expect.

Although the question may seem naive, many customers ask it. Our approach is to use clustering to group the data into segments and then profile them. In the profiling step we label the segments, for example "high-volume, low-margin traders" or "loyal customers with low basket value". We aim to capture the customer's attention by showing them the latent structures in the data along with their profiles and attributes. What other information should we show them?
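
The clustering and profiling steps could look roughly like the sketch below. It uses a plain k-means loop as a stand-in for whichever clustering method is finally chosen, and labels each segment as high or low on every attribute relative to the overall mean. The trader data, the attribute names and the fixed starting centroids are all hypothetical.

```python
from statistics import mean


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; initial centroids are passed in so runs are repeatable."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(mean(dim) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return clusters, centroids


def profile(centroid, overall, names):
    """Label a segment as high/low on each attribute relative to the overall mean."""
    return ", ".join(
        ("high " if c > o else "low ") + name
        for c, o, name in zip(centroid, overall, names)
    )


# Hypothetical traders described by (monthly volume, margin in percent).
# Real data should be standardized first so no attribute dominates the distance.
traders = [(900, 2.0), (950, 3.0), (1000, 2.5), (100, 20.0), (120, 22.0), (80, 18.0)]
names = ["volume", "margin"]

clusters, centroids = kmeans(traders, centroids=[traders[0], traders[3]])
overall = [mean(dim) for dim in zip(*traders)]
labels = [profile(c, overall, names) for c in centroids]
print(labels)
```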

#### 3. What are the most important features?

This is a staple question for business users. To answer it, we will list the variables in decreasing order of importance and show how much the model's accuracy (or any other relevant metric) drops when each variable is excluded. Is there a more intuitive, easier-to-understand way to answer this question? Is the answer sufficient in terms of content?

<table>
<tr>
    <td> <img src="var_imp_plot.png" alt="Drawing" style="width: 1000px;"/> </td>
    <td> <img src="acc_drop.png" alt="Drawing" style="width: 1000px;"/> </td>
</tr>
</table>
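
The "accuracy drop on excluding a variable" idea can be sketched as follows. The snippet refits a model without each variable in turn and records how far accuracy falls. A nearest-centroid classifier is used here only as a lightweight stand-in for the real model, accuracy is measured on the training rows for brevity (a held-out set would be used in practice), and the data and variable names are hypothetical.

```python
def fit_centroids(X, y):
    """Mean feature vector per class (a nearest-centroid stand-in for the real model)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(X, y):
    """Fit on (X, y) and score on the same rows (simplification for the sketch)."""
    centroids = fit_centroids(X, y)
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)


def drop_column_importance(X, y, names):
    """Accuracy lost when each variable is excluded, largest drop first."""
    baseline = accuracy(X, y)
    drops = {}
    for j, name in enumerate(names):
        X_reduced = [[v for k, v in enumerate(row) if k != j] for row in X]
        drops[name] = baseline - accuracy(X_reduced, y)
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: var_a separates the classes, var_b helps a little, var_c is noise.
X = [[1, 4, 3], [2, 4, 7], [1, 5, 5], [2, 5, 5],
     [8, 5, 4], [9, 6, 6], [8, 6, 5], [9, 6, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ranking = drop_column_importance(X, y, ["var_a", "var_b", "var_c"])
print(ranking)
```

Note that a variable can show a zero drop here even if it carries some signal, because the remaining variables compensate for it. This is one reason the ranking and the accuracy-drop chart are shown side by side.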

#### 4. What model is being used for prediction?

Our response to this question will include the following: the type of model, a short description of it, the types of problems it is generally used for, and any caveats associated with it. Assuming you are a business stakeholder in a project with limited experience in data-based solutions, what else would you want to know?

<table>
<tr>
    <th>Model</th>
    <td>Logistic Regression</td>
</tr>
<tr>
    <th>Description</th>
    <td>Logistic regression estimates the probability<br>of an event occurring,<br>given previously observed data</td>
</tr>
<tr>
    <th>Interpretability</th>
    <td>High</td>
</tr>
<tr>
    <th>Used for</th>
    <td>Classification problems</td>
</tr>
<tr>
    <th>Caveats</th>
    <td>Linear model, doesn't capture non-linearity in data</td>
</tr>
<tr>
    <th>Use cases</th>
    <td>
        <ol>
            <li>Predict the chances of a customer defaulting</li>
            <li>Predict the propensity of a customer buying a product</li>
            <li>Predict if a customer will churn</li>
        </ol>
    </td>
</tr>
</table>

#### 5. How well is the model performing?

While there are many accuracy metrics we could report, we will display only a few of them, chosen for relevance. Each answer will have a format similar to this: the name of the metric, a short description, its value, and the range it should lie in. We can add a second pa What are your expectations? Does this response provide sufficient information?

<table>
<tr>
    <th>Metric</th>
    <th>Value</th>
    <th>Description</th>
</tr>
<tr>
    <th>AUC-ROC</th>
    <td>0.83</td>
    <td>
        Measures how well the model ranks positives above negatives<br>
        compared with random guessing. A random baseline scores 0.5<br>
        and a perfect classifier scores 1.<br>
    </td>
</tr>

<tr>
    <th>GINI</th>
    <td>0.66</td>
    <td>The Gini coefficient rescales AUC-ROC as 2 × AUC − 1, so it is 0 for<br>
    a random baseline and 1 for a perfect classifier. Higher values<br>
    indicate better prediction</td>
</tr>

<tr>
    <th>Accuracy</th>
    <td>0.88</td>
    <td>Accuracy measures the proportion of cases where the predicted<br>
    outcome matches the actual outcome</td>
</tr>

<tr>
    <th>Sensitivity</th>
    <td>0.78</td>
    <td>
        Sensitivity measures the proportion of positives that<br>
        are correctly identified as such (e.g. the percentage of<br>
        sick people who are correctly identified as having the condition).
    </td>
</tr>

<tr>
    <th>Specificity</th>
    <td>0.81</td>
    <td>
        Specificity measures the proportion of negatives that<br>
        are correctly identified as such (e.g. the percentage<br>
        of healthy people who are correctly identified as not having<br>
        the condition).
    </td>
</tr>

</table>
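
For reference, the metrics in the table can be computed as in the sketch below. The labels and scores are hypothetical, and the 0.5 decision threshold is an assumption; accuracy, sensitivity and specificity all depend on that threshold, while AUC-ROC and Gini do not.

```python
def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at a given decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of actual positives caught
        "specificity": tn / (tn + fp),  # share of actual negatives caught
    }


# Hypothetical labels and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = auc_roc(y_true, scores)
gini = 2 * auc - 1  # Gini is a rescaling of AUC-ROC
metrics = threshold_metrics(y_true, scores)
print(auc, gini, metrics)
```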


#### 6. What are the important features in a segment with respect to the target variable?

To provide deeper and more meaningful explanations, we will perform a few exercises in addition to the main modelling task; segmentation (clustering) is one of them. Each cluster has different properties, so variable importance may differ from segment to segment. The idea here is to show which variables would be most important if we built a separate model for each segment. Think of this as question 3 at the segment scale. Would this information suffice?

<table>
<tr>
    <td> <img src="seg_2.png" alt="Drawing" style="width: 7000px;"/> </td>
    <td> <img src="seg_4.png" alt="Drawing" style="width: 7000px;"/> </td>
</tr>
</table>
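
To illustrate how the ranking can change from segment to segment, the sketch below scores each variable separately within every segment. For brevity it uses the absolute Pearson correlation with the target as a simple proxy for importance rather than fitting a separate model per segment, which is what we actually propose. The rows and the variable names `price` and `visits` are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def importance_by_segment(rows, features, target="target", segment="segment"):
    """Rank features inside each segment by |correlation with the target|."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[segment]].append(row)
    result = {}
    for seg, members in by_segment.items():
        ys = [r[target] for r in members]
        scores = {f: abs(pearson([r[f] for r in members], ys)) for f in features}
        result[seg] = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return result


# Hypothetical rows: the target follows `price` in segment A and `visits` in segment B.
rows = [
    {"segment": "A", "price": 1, "visits": 5, "target": 10},
    {"segment": "A", "price": 2, "visits": 3, "target": 20},
    {"segment": "A", "price": 3, "visits": 5, "target": 30},
    {"segment": "A", "price": 4, "visits": 3, "target": 40},
    {"segment": "B", "price": 5, "visits": 1, "target": 10},
    {"segment": "B", "price": 3, "visits": 2, "target": 20},
    {"segment": "B", "price": 5, "visits": 3, "target": 30},
    {"segment": "B", "price": 3, "visits": 4, "target": 40},
]

ranking = importance_by_segment(rows, ["price", "visits"])
for seg, ranked in ranking.items():
    print(seg, ranked)
```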

#### 7. What are the differences between the segments? 

The difference from the previous question is the exclusion of the target variable. There, we looked at variable importance with respect to the target variable. Here, we will instead show how much the variables differ across segments through a graphical comparison of their summary statistics. We will also show how descriptive measures of the target variable (mean, median, mode, etc.) vary across clusters. Is this answer sufficient to broadly answer the question?

<table>
<tr>
    <td> <img src="cluster_comparison_means.png" alt="Drawing" style="width: 7000px;"/> </td>
    <td> <img src="cluster_comparison_medians.png" alt="Drawing" style="width: 7000px;"/> </td>
</tr>
</table>
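
The numbers behind such comparison charts could be produced along the lines of the sketch below, which computes the mean and median of every variable (including the target) for each segment. The records and the column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def segment_summary(records, segment_key="segment"):
    """Mean and median of every non-segment column, per segment."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        for column, value in record.items():
            if column != segment_key:
                grouped[record[segment_key]][column].append(value)
    return {
        seg: {col: {"mean": mean(vals), "median": median(vals)}
              for col, vals in columns.items()}
        for seg, columns in grouped.items()
    }


# Hypothetical customers with one variable and a binary target.
records = [
    {"segment": "A", "basket_value": 20, "target": 1},
    {"segment": "A", "basket_value": 30, "target": 1},
    {"segment": "A", "basket_value": 100, "target": 0},
    {"segment": "B", "basket_value": 200, "target": 0},
    {"segment": "B", "basket_value": 220, "target": 0},
    {"segment": "B", "basket_value": 240, "target": 1},
]

summary = segment_summary(records)
for seg, stats in summary.items():
    print(seg, stats)
```

Showing both statistics helps when a segment is skewed: in segment A of this made-up data, a single large basket pulls the mean well above the median.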

#### 8. What are the attributes that influence this decision?

This question aims to understand which variables influenced the prediction for a particular case (a single instance). The result may differ from the overall variable importance shown for question 3. To answer it, we will show the variable importance for that particular instance.
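
One simple way to obtain instance-level importance, assuming a logistic regression model like the one described in question 4, is to measure how much each variable moves the log-odds away from an "average" customer. The coefficients, intercept, feature means and the example customer below are all hypothetical; other model types would need a different attribution method.

```python
import math

# Hypothetical fitted logistic regression for default prediction.
coefficients = {"late_payments": 0.8, "income_k": -0.02, "tenure_years": -0.1}
intercept = -1.0
# Hypothetical feature means over the training data (the reference point).
feature_means = {"late_payments": 1.0, "income_k": 50.0, "tenure_years": 5.0}


def log_odds(x):
    """Linear score of the model for one instance."""
    return intercept + sum(coefficients[n] * x[n] for n in coefficients)


def predict_proba(x):
    """Predicted probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


def explain_instance(x):
    """Per-variable shift in log-odds relative to the average instance, largest first."""
    contributions = {
        n: coefficients[n] * (x[n] - feature_means[n]) for n in coefficients
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


customer = {"late_payments": 4.0, "income_k": 40.0, "tenure_years": 2.0}
ranked = explain_instance(customer)
print(round(predict_proba(customer), 3))
for name, shift in ranked:
    print(f"{name}: {shift:+.2f} log-odds")
```

The contributions add up exactly to the difference in log-odds between this customer and the average customer, so the explanation accounts for the whole prediction rather than a part of it.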