# <font color="orange">Confusion Matrix</font>

<p>A confusion matrix is a table that is often used to <strong>describe the performance of a classification model</strong> (or "classifier") on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand.</p>

<img src="../../img/confusion_matrix_simple2.png">

<p>What can we learn from this matrix?</p>
<ul>
<li>There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean the patient has the disease, and "no" would mean the patient does not.</li>
<li>The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).</li>
<li>Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.</li>
<li>In reality, 105 patients in the sample have the disease, and 60 patients do not.</li>
</ul>
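
<p>As a quick check of the totals above, here is a minimal sketch that sums the rows and columns of the matrix. It assumes the four cell counts are 50, 10, 5, and 100, which are the values used in the calculations later in this section; the <code>matrix</code> dictionary and its <code>(actual, predicted)</code> keys are illustrative names, not part of any library.</p>

```python
# Cell counts of the example matrix, keyed by (actual, predicted).
# The dictionary layout is an assumption made for illustration.
matrix = {
    ("no", "no"): 50,
    ("no", "yes"): 10,
    ("yes", "no"): 5,
    ("yes", "yes"): 100,
}

# Total number of predictions made by the classifier.
total = sum(matrix.values())

# Column totals: how often each class was predicted.
predicted_yes = sum(n for (actual, pred), n in matrix.items() if pred == "yes")
predicted_no = sum(n for (actual, pred), n in matrix.items() if pred == "no")

# Row totals: how often each class actually occurred.
actual_yes = sum(n for (actual, pred), n in matrix.items() if actual == "yes")
actual_no = sum(n for (actual, pred), n in matrix.items() if actual == "no")

print(total, predicted_yes, predicted_no, actual_yes, actual_no)
# 165 110 55 105 60
```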

<p>Let's now define the most basic terms, which are whole numbers (not rates):</p>
<ul>
<li><strong>true positives (TP):</strong> These are cases in which we predicted yes (they have the disease), and they do have the disease.</li>
<li><strong>true negatives (TN):</strong> We predicted no, and they don't have the disease.</li>
<li><strong>false positives (FP):</strong> We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")</li>
<li><strong>false negatives (FN):</strong> We predicted no, but they actually do have the disease. (Also known as a "Type II error.")</li>
</ul>
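
<p>The four definitions above can be sketched as a small counting function. This is one possible implementation in plain Python; <code>confusion_counts</code> is a hypothetical helper, and the label lists are rebuilt by hand so that they reproduce the 165-patient example.</p>

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (TP, TN, FP, FN) for two equal-length label sequences."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted yes, actually yes
        elif p == positive:
            fp += 1  # predicted yes, actually no (Type I error)
        elif a == positive:
            fn += 1  # predicted no, actually yes (Type II error)
        else:
            tn += 1  # predicted no, actually no
    return tp, tn, fp, fn


# Rebuild the example: 105 patients with the disease, 60 without.
actual = ["yes"] * 105 + ["no"] * 60
predicted = ["yes"] * 100 + ["no"] * 5 + ["no"] * 50 + ["yes"] * 10

print(confusion_counts(actual, predicted))
# (100, 50, 10, 5)
```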


<img src="../../img/confusion_matrix2.png">

<p>The following rates are often computed from these counts for a binary classifier:</p>

<ul>
<li><strong>Accuracy:</strong> Overall, how often is the classifier correct?
<ul>
<li>(TP+TN)/total = (100+50)/165 = 0.91</li>
</ul>
</li>
<li><strong>Misclassification Rate:</strong> Overall, how often is it wrong?
<ul>
<li>(FP+FN)/total = (10+5)/165 = 0.09</li>
<li>equivalent to 1 minus Accuracy</li>
<li>also known as "Error Rate"</li>
</ul>
</li>
<li><strong>True Positive Rate:</strong> When it's actually yes, how often does it predict yes?
<ul>
<li>TP/actual yes = 100/105 = 0.95</li>
<li>also known as "Sensitivity" or "Recall"</li>
</ul>
</li>
<li><strong>False Positive Rate:</strong> When it's actually no, how often does it predict yes?
<ul>
<li>FP/actual no = 10/60 = 0.17</li>
</ul>
</li>
<li><strong>True Negative Rate:</strong> When it's actually no, how often does it predict no?
<ul>
<li>TN/actual no = 50/60 = 0.83</li>
<li>equivalent to 1 minus False Positive Rate</li>
<li>also known as "Specificity"</li>
</ul>
</li>
<li><strong>Precision:</strong> When it predicts yes, how often is it correct?
<ul>
<li>TP/predicted yes = 100/110 = 0.91</li>
</ul>
</li>
<li><strong>Prevalence:</strong> How often does the yes condition actually occur in our sample?
<ul>
<li>actual yes/total = 105/165 = 0.64</li>
</ul>
</li>
</ul>
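
<p>The rates listed above can be reproduced with a few lines of arithmetic. The sketch below starts from the four counts of the example (TP=100, TN=50, FP=10, FN=5); the variable and dictionary key names are illustrative choices.</p>

```python
# Counts from the example confusion matrix.
TP, TN, FP, FN = 100, 50, 10, 5

total = TP + TN + FP + FN   # 165 predictions
actual_yes = TP + FN        # 105 patients with the disease
actual_no = TN + FP         # 60 patients without it
predicted_yes = TP + FP     # 110 "yes" predictions

rates = {
    "accuracy": (TP + TN) / total,
    "misclassification_rate": (FP + FN) / total,
    "true_positive_rate": TP / actual_yes,    # sensitivity, recall
    "false_positive_rate": FP / actual_no,
    "true_negative_rate": TN / actual_no,     # specificity
    "precision": TP / predicted_yes,
    "prevalence": actual_yes / total,
}

for name, value in rates.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.91
# misclassification_rate: 0.09
# true_positive_rate: 0.95
# false_positive_rate: 0.17
# true_negative_rate: 0.83
# precision: 0.91
# prevalence: 0.64
```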

<p>A few other terms are also worth mentioning:</p>

<ul>
<li><strong>Null Error Rate:</strong> This is how often you would be wrong if you always predicted the majority class. (In our example, the null error rate would be 60/165=0.36 because if you always predicted yes, you would only be wrong for the 60 "no" cases.) This can be a useful baseline metric to compare your classifier against. However, the best classifier for a particular application will sometimes have a higher error rate than the null error rate, as demonstrated by the <a href="http://en.wikipedia.org/wiki/Accuracy_paradox">Accuracy Paradox</a>.</li>
<li><strong>Cohen's Kappa:</strong> This is essentially a measure of how well the classifier performed as compared to how well it would have performed simply by chance. In other words, a model will have a high Kappa score if its observed accuracy is much higher than the accuracy expected from chance agreement between the predicted and actual labels. (<a href="http://en.wikipedia.org/wiki/Cohen's_kappa">More details about Cohen's Kappa.</a>)</li>
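<li><strong>F Score:</strong> This is the harmonic mean of the true positive rate (recall) and precision, in which case it is called the F1 score. The more general F-beta score weights recall more or less heavily than precision. (<a href="http://en.wikipedia.org/wiki/F1_score">More details about the F Score.</a>)</li>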
<li><strong>ROC Curve:</strong> This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class. (<a href="https://www.dataschool.io/roc-curves-and-auc-explained/">More details about ROC Curves.</a>)</li>
</ul>
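
<p>To make these terms concrete, here is a rough sketch that computes the null error rate, Cohen's Kappa, and the F1 score for the example counts, followed by a simple threshold sweep that produces the points of an ROC curve. The helper <code>roc_points</code> and the six scores passed to it are hypothetical and made up purely for illustration; they are not taken from the example above.</p>

```python
# Counts from the example confusion matrix.
TP, TN, FP, FN = 100, 50, 10, 5
total = TP + TN + FP + FN
actual_yes, actual_no = TP + FN, TN + FP
predicted_yes, predicted_no = TP + FP, TN + FN

# Null error rate: error made by always predicting the majority class.
null_error_rate = min(actual_yes, actual_no) / total

# Cohen's Kappa: observed accuracy compared with chance agreement.
observed = (TP + TN) / total
expected = (actual_yes * predicted_yes + actual_no * predicted_no) / total**2
kappa = (observed - expected) / (1 - expected)

# F1 score: harmonic mean of precision and recall.
precision = TP / predicted_yes
recall = TP / actual_yes
f1 = 2 * precision * recall / (precision + recall)

print(f"{null_error_rate:.2f} {kappa:.2f} {f1:.2f}")
# 0.36 0.80 0.93


def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold.

    labels are 1 (yes) or 0 (no); an observation is predicted "yes"
    when its score is greater than or equal to the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted yes
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels and predicted probabilities, for illustration only.
points = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
```

<p>Each pair in <code>points</code> is one (False Positive Rate, True Positive Rate) coordinate; plotting them in order traces the curve from (0, 0) to (1, 1).</p>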


<img src="../../img/1_BT3awaBdZHsit5s41LPb9A.png">
<img src="../../img/1_QRIZDkk_FffXKs_07ZlhZw.png">
<img src="../../img/1_98FaAKfPWo-EBTbjsxm4GA.png">

