##1

###Q. Confusion Matrix
Write a function that takes two 1-d numpy arrays $y$ and $yd$ as inputs and returns two outputs: the confusion matrix as a 2-d numpy array, and the order of classes in the confusion matrix as a 1-d numpy array.

<img src="https://drive.google.com/uc?id=1TfOYFOGMH91b_tewl6RzEgYYxaxThYLb" width = "400">

 <br>
For example, <br>
$y$ = ['cat', 'cat', 'dog', 'human', 'human', 'human'] <br>
$yd$ = ['cat', 'dog', 'dog', 'dog', 'cat', 'human']


`confusion_matrix(y, yd)` -> `([[1, 0, 1], [1, 1, 1], [0, 0, 1]], ['cat', 'dog', 'human'])` <br>

In the above example, the order of classes in the confusion matrix is cat, dog, human. Rows correspond to the desired class ($yd$) and columns to the model output ($y$). (Consider the list of unique classes in $yd$ as the list of all classes.)

In [None]:
import numpy as np
def confusion_matrix(y, yd):
  """
  Inputs:
    y : 1-d numpy array, model outputs
    yd: 1-d numpy array, desired outputs or ground truth
  Outputs:
    cm: 2-d numpy array of shape (N, N) where N is number of unique classes in yd
    classes: 1-d numpy array, unique classes in the order in which they occur in cm
  """
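  y = np.asarray(y)
  yd = np.asarray(yd)
  # Unique classes, in order of first appearance in yd
  classes = []
  for label in yd:
    if label not in classes:
      classes.append(label)
  classes = np.array(classes)
  n = len(classes)
  # Integer counts; rows index the desired class (yd), columns the model output (y)
  cm = np.zeros((n, n), dtype=int)
  for actual, predicted in zip(yd, y):
    j = np.flatnonzero(classes == actual)[0]
    m = np.flatnonzero(classes == predicted)
    # Model outputs that never occur in yd have no column and are skipped
    if len(m) > 0:
      cm[j, m[0]] += 1
  return cm, classes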
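
As a cross-check on the loop-based solution, the same matrix can be built with index arrays. This is only a sketch, not the required answer: the helper name `confusion_matrix_check` is hypothetical, and it assumes every label in `y` also occurs in `yd`.

```python
import numpy as np

def confusion_matrix_check(y, yd):
  # Hypothetical helper; assumes every label in y also appears in yd.
  y = np.asarray(y)
  yd = np.asarray(yd)
  # np.unique sorts the classes; first_idx marks where each one first occurs in yd.
  uniq, first_idx = np.unique(yd, return_index=True)
  # Reorder the classes by first appearance in yd.
  classes = uniq[np.argsort(first_idx)]
  # Map each label to its row/column position.
  index = {label: i for i, label in enumerate(classes.tolist())}
  rows = np.array([index[label] for label in yd.tolist()])  # desired class
  cols = np.array([index[label] for label in y.tolist()])   # model output
  cm = np.zeros((len(classes), len(classes)), dtype=int)
  # np.add.at accumulates correctly when the same (row, col) pair repeats.
  np.add.at(cm, (rows, cols), 1)
  return cm, classes

y = np.array(['cat', 'cat', 'dog', 'human', 'human', 'human'])
yd = np.array(['cat', 'dog', 'dog', 'dog', 'cat', 'human'])
cm, classes = confusion_matrix_check(y, yd)
print(cm)       # rows: desired class, columns: model output
print(classes)
```

On the example from the question, this should reproduce the matrix `[[1, 0, 1], [1, 1, 1], [0, 0, 1]]` with classes in the order cat, dog, human.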
  


##2

###Q. Max F1 score
Write a function that takes a confusion matrix as input and returns the index of the class with the maximum F1 score.
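
The question does not define the F1 score. Assuming the standard definitions, and assuming that rows of $cm$ index the desired class while columns index the model output, the precision $P_i$, recall $R_i$ and F1 score of class $i$ are:

$$
P_i = \frac{cm_{ii}}{\sum_{j} cm_{ji}}, \qquad
R_i = \frac{cm_{ii}}{\sum_{j} cm_{ij}}, \qquad
F1_i = \frac{2 P_i R_i}{P_i + R_i}
$$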

In [None]:
import numpy as np
def max_f1_score(cm):
  """
  Inputs:
    cm : confusion matrix, 2-d numpy array
  Outputs:
    integer, index of class with max f1 score
  """
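  cm = np.asarray(cm, dtype=float)
  f1_scores = []
  for i in range(len(cm)):
    true_pos = cm[i, i]
    total_pred_pos = cm[:, i].sum()  # column i: samples predicted as class i
    total_act_pos = cm[i, :].sum()   # row i: samples whose true class is i
    # Assumption: an undefined precision, recall or f1 (zero denominator) counts as 0
    precision = true_pos / total_pred_pos if total_pred_pos > 0 else 0.0
    recall = true_pos / total_act_pos if total_act_pos > 0 else 0.0
    if precision + recall > 0:
      f1 = 2 * precision * recall / (precision + recall)
    else:
      f1 = 0.0
    f1_scores.append(f1)
  # On ties, np.argmax returns the first index with the maximum score
  return int(np.argmax(f1_scores))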
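
To make the per-class arithmetic concrete, here is a small worked sketch. The 3-class matrix is made up for illustration, and the code assumes that no row or column sums to zero, so no division-by-zero guard is needed.

```python
import numpy as np

# Made-up confusion matrix (rows: desired class, columns: model output).
cm = np.array([[5, 1, 0],
               [2, 3, 1],
               [0, 1, 4]], dtype=float)

true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # column sums: samples predicted as each class
recall = true_pos / cm.sum(axis=1)     # row sums: samples belonging to each class
f1 = 2 * precision * recall / (precision + recall)

best = int(np.argmax(f1))
print(f1)    # 10/13, 6/11 and 4/5 for classes 0, 1 and 2
print(best)  # 2
```

Class 2 wins here even though class 0 has more true positives, because class 0 also collects two false positives from class 1.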


##3

###Q. Most confused class
Write a function that takes a confusion matrix as input and returns the index of the class that occurs most frequently as the result of a misclassification. <br>
For example, suppose $a$ is the correct class of two samples, and the model misclassifies both as $c$. <br>
Suppose also that $b$ is the correct class of another sample, and the model misclassifies it as $c$ as well. <br>
Then $c$ has occurred 3 times as the result of a misclassification. <br>


In [None]:
import numpy as np
def most_confused_class(cm):
  """
  Inputs:
    cm : confusion matrix, 2-d numpy array
  Outputs:
    integer, index of class which is most confused
  """
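  cm = np.asarray(cm)
  # Column i counts samples predicted as class i; dropping the diagonal
  # (correct predictions) leaves only the misclassifications into class i
  wrong_as_class = cm.sum(axis=0) - np.diag(cm)
  # On ties, np.argmax returns the first index with the maximum count
  return int(np.argmax(wrong_as_class))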
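
The $a$, $b$, $c$ scenario from the question can be written out as a matrix. In this sketch the diagonal counts and the single $c \to a$ error are made up; only the two $a \to c$ errors and the one $b \to c$ error come from the question. Rows are assumed to be the correct class and columns the model output.

```python
import numpy as np

# Classes a, b, c map to indices 0, 1, 2.
cm = np.array([[3, 0, 2],   # a: misclassified as c twice
               [0, 4, 1],   # b: misclassified as c once
               [1, 0, 5]])  # c: misclassified as a once (made up)

# Off-diagonal column sums: how often each class is the wrong answer.
wrong_as = cm.sum(axis=0) - np.diag(cm)
most_confused = int(np.argmax(wrong_as))
print(wrong_as)       # [1 0 3]
print(most_confused)  # 2, i.e. class c
```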



##4

###Q. MSE
Write a function that takes two 1-d numpy arrays ($y$ and $yd$) and returns the mean squared error, defined as 
$$
mse = \frac{1}{N}\sum_{i=1}^{N}{(yd_i - y_i)^2}
$$ 

In [None]:
import numpy as np
def mse(y, yd):
  """
  Inputs:
    y: 1-d Numpy array of floats
    yd: 1-d Numpy array of floats
  Outputs:
    mse: float, mean squared error
  """
  y = np.asarray(y, dtype=float)
  yd = np.asarray(yd, dtype=float)
  # Mean of the element-wise squared differences
  return float(np.mean((yd - y) ** 2))
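
A quick worked example of the formula, using made-up numbers, shows the intermediate squared errors before they are averaged.

```python
import numpy as np

# Made-up values for illustration.
y = np.array([1.0, 2.0, 3.0, 4.0])
yd = np.array([1.0, 2.5, 2.0, 6.0])

squared_errors = (yd - y) ** 2      # [0, 0.25, 1, 4]
mse_value = squared_errors.mean()   # (0 + 0.25 + 1 + 4) / 4
print(squared_errors)
print(mse_value)                    # 1.3125
```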


##5

###Q. Fraction of MSE
Write a function that takes two 1-d numpy arrays ($y$ and $yd$) and does the following:
- It first computes the $\text{total squared error}$ ($\text{total squared error} = mse \cdot N$).
- It computes $\text{tot_20}$, the total squared error of the 20% of samples that contribute most to the $\text{total squared error}$.
- It returns the fraction $\frac{\text{tot_20}}{\text{total squared error}}$

In [None]:
import numpy as np
def fraction_mse_20(y, yd):
  """
  Inputs:
    y: 1-d Numpy array of floats
    yd: 1-d Numpy array of floats
  Outputs:
    float, tot_20/total_squared_error
  """
  y = np.asarray(y, dtype=float)
  yd = np.asarray(yd, dtype=float)
  sq_err = (yd - y) ** 2
  total_squared_error = sq_err.sum()
  n = len(sq_err)
  # Number of samples in the top 20%, rounded down
  k = n // 5
  # Sum of the k largest squared errors (the last k entries after sorting)
  tot_20 = np.sort(sq_err)[n - k:].sum()
  return tot_20 / total_squared_error
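
The three steps can be traced on a small made-up input. This sketch assumes that "20% of samples" is rounded down to a whole number of samples, which the question does not specify.

```python
import numpy as np

# Made-up data: errors of 1, 2, ..., 10, so squared errors of 1, 4, ..., 100.
y = np.zeros(10)
yd = np.arange(1.0, 11.0)

sq_err = (yd - y) ** 2
total_squared_error = sq_err.sum()   # 385

k = len(sq_err) // 5                 # 2 samples (assumption: round down)
top = np.sort(sq_err)[::-1][:k]      # largest first: [100, 81]
fraction = top.sum() / total_squared_error
print(top)
print(fraction)                      # 181 / 385
```

Here the two largest errors account for nearly half of the total squared error.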

