<h1 align='center'> DSCC 465 HW 02 </h1>
    <h3 align='right'> Uzair Tahamid Siam </h3>

---

\begin{enumerate}
\item
Let the hypothesis be that the car is behind door 1, the door initially chosen. We want the probability that the car is behind door 1 given that the door the host opened has a goat behind it. We use ! to denote "not". So,
$$C = \text{The car is behind door 1}$$ $$G = \text{The goat is behind the door opened}$$



\begin{align*}
    P(C|G) &= \frac{P(G|C) P(C)}{P(G)}
\end{align*}

We can write the marginal probability of the host showing a goat as,
\begin{align*}
    P(G) = P(G|C)P(C) + P(G|!C) P(!C)
\end{align*}
This gives us all the possibilities of the host showing the goat regardless of whether door 1 has a car or not. 

\begin{align*}
    P(G|C) = P(G \cap C)/P(C) = (1/3)/(1/3) = 1
\end{align*}
The probability that there is a goat behind the opened door, given that the car is behind door 1, is 1, because the host never opens the door with the car.

\begin{align*}
    P(G|!C) = P(G \cap !C)/P(!C) = (2/3)/(2/3) = 1
\end{align*}
The probability that the host shows the goat, given that there is a goat behind door 1, is again always 1. The host always shows a door with a goat. 

From the problem itself we know that $P(C) = 1/3$ (so $P(!C)=1-P(C)=2/3$); this is our prior.

\begin{align*}
    \therefore P(G) = (1/3) + (2/3) = 1
\end{align*}
This again makes sense, because the host always opens a door that has a goat and never opens the door with the car.

So, now we can find,

\begin{align*}
    P(C|G) &= \frac{P(G|C) P(C)}{P(G)}\\
    &= \frac{(1) \cdot (1/3)}{1} = 1/3
\end{align*}

We also find that,
\begin{align*}
     P(!C|G) = 1 -  P(C|G) = 2/3
\end{align*}

This shows that the car is more likely not to be behind door 1. Since the opened door has a goat, the car must then be behind the only other unopened door, so switching wins with probability 2/3 and is the better choice.


\item For the two teams to reach the $7^{th}$ game, team A has to win exactly 3 of the first 6 games and lose the other 3 (i.e. team B wins those 3). Treating the games as independent, the number of wins follows a binomial distribution, since there are $n$ trials with two possible outcomes each. Let $P(A) = 0.55$ be the probability that A wins a game and $P(B) = 0.45$ be the probability that A loses (i.e. B wins). There are ${6 \choose 3} = 20$ possible orderings of 3 wins and 3 losses, e.g. WWWLLL, WWLLWL, LLLWWW, and so on. So we can compute the probability as,

\begin{align*}
    P_{6}(3) &= {6 \choose 3} (P(A))^3 (P(B))^3\\
    &= {6 \choose 3} (0.55)^3 (0.45)^3 \\
    &\approx 0.3032
\end{align*}

\item See below
\item See below
\item See below
\end{enumerate}
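
As a quick sanity check on the first two answers, here is a minimal sketch that simulates the Monty Hall game and evaluates the binomial expression directly. The function names `monty_hall_win_rates` and `prob_game_seven` are made up for this illustration, and the simulation assumes the contestant always picks door 1 and the host chooses at random when two goat doors are available.

```python
import random
from math import comb

def monty_hall_win_rates(trials=100_000, seed=0):
    """Simulate the game and return (stay win rate, switch win rate)."""
    rng = random.Random(seed)
    stay_wins = 0
    switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car (0, 1 or 2)
        pick = 0                 # assumption: contestant always picks door 1 (index 0)
        # the host opens a door that is neither the pick nor the car
        host = rng.choice([d for d in range(3) if d != pick and d != car])
        # the door the contestant would switch to
        switched = next(d for d in range(3) if d != pick and d != host)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

def prob_game_seven(p_a=0.55):
    """Probability that the series is tied 3-3 after six independent games."""
    return comb(6, 3) * p_a**3 * (1 - p_a)**3

stay, switch = monty_hall_win_rates()
print(f"stay: {stay:.3f}, switch: {switch:.3f}")
print(f"P(series reaches game 7) = {prob_game_seven():.4f}")
```

The simulated rates should land near 1/3 for staying and 2/3 for switching, and the second function reproduces the value of about 0.3032 computed above.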


## Importing

In [22]:
import numpy as np
import sklearn.datasets
import pandas as pd
from numpy.typing import ArrayLike
import matplotlib.pyplot as plt

## Data Preprocessing

In [23]:
# importing california housing data from sklearn 

dataset = sklearn.datasets.fetch_california_housing(as_frame=True).frame

In [24]:
x = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

In [25]:
# Dividing into training and test sets

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=265)

In [26]:
# Standardizing the features (zero mean, unit variance), i.e. feature scaling

from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)

## Q3
---

Gradient: 
$$\nabla_{\theta}J = (2/m) X^{T}(X\theta - \vec{y})$$

$X = $ feature matrix

$\theta = $ weights

$m = $ number of data objects

$X\theta = $ prediction

$\vec{y} = $ targets

MSE function: 
$$J(\vec{\theta}) = (1/m)(X\theta - \vec{y})^{T}(X\theta - \vec{y})$$
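
For reference, the gradient can be obtained from the cost in one step. This sketch assumes the cost is normalized by $1/m$, which is the normalization that produces the $2/m$ factor in the gradient:

\begin{align*}
    J(\theta) &= \frac{1}{m}(X\theta - \vec{y})^{T}(X\theta - \vec{y}) \\
    \nabla_{\theta}J &= \frac{1}{m} \cdot 2X^{T}(X\theta - \vec{y}) = \frac{2}{m}X^{T}(X\theta - \vec{y})
\end{align*}

With the alternative $1/(2m)$ normalization the factor of 2 cancels and the gradient becomes $(1/m)X^{T}(X\theta - \vec{y})$. The two conventions differ only by a constant factor, so they share the same minimizer, but the effective step size differs by a factor of 2.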


In [27]:
def cost(theta: ArrayLike, X: ArrayLike, y: ArrayLike) -> float:
    '''
    Parameters
    ----------
    theta: numpy.typing.ArrayLike
        The weights that are to be optimized
    
    X: numpy.typing.ArrayLike
        Numpy Array of features
    
    y: numpy.typing.ArrayLike
        Numpy Array of targets
    
    Returns
    ----------
    mse: float
        Returns the mean-square-error between the predicted and actual target values
    '''
    m = len(y) # number of data
    err = np.dot(X, theta) - y # error/residue term
    mse = float(np.sum(err**2)/m) # mean of the squared errors (matches the 2/m gradient)

    return mse

def gradient(theta:ArrayLike, X:ArrayLike, y:ArrayLike)->ArrayLike:
    '''
    Parameters
    ----------
    theta: numpy.typing.ArrayLike
        The current weights
    
    X: numpy.typing.ArrayLike
        Numpy Array of features
    
    y: numpy.typing.ArrayLike
        Numpy Array of targets
    
    Returns
    ----------
    grad: numpy.typing.ArrayLike
        Gradient of the mean-square-error with respect to theta
    '''
    m = len(y) # number of data
    err = np.dot(X, theta) - y # error/residue term
    grad = 2/m * np.dot(X.T, err) # finding the gradient using the vector form
    
    return grad
 
def gradient_descent(X: ArrayLike, y:ArrayLike, number_of_steps: int = 1000, learning_rate: float = 0.01, plot:bool = False) -> dict:
    '''
    Parameters
    ----------
    X: numpy.typing.ArrayLike
        Numpy Array of features
    
    y: numpy.typing.ArrayLike
        Numpy Array of targets
        
    number_of_steps: int
        Number of iterations for the gradient descent algorithm
        
    learning_rate: float
        Often called alpha. Used as a multiplier for the step-size
    
    plot: bool
        If True, plot the cost after each iteration
    
    Returns
    ----------
    result: dict
        Dictionary with the optimized 'intercept' and feature 'weights'
    '''
    
    X = np.asarray(X)
    y_reshaped = np.reshape(y, (len(y), 1)) # reshaping for calculation
    new_X = np.c_[np.ones((len(X), 1)), X] # prepending a column of 1 to X for intercept
    theta = np.random.randn(new_X.shape[1], 1) # randomly initializing the intercept and the weights
    cost_lst = [] # saving cost for plotting

    for _ in range(number_of_steps): # running the iterations
        gradients = gradient(theta, new_X, y_reshaped) # finding the gradient at the current theta
        theta = theta - learning_rate * gradients # finding the new weights
        if plot:
            cost_lst.append(cost(theta, new_X, y_reshaped)) # one cost value per iteration
    if plot:
        plt.plot(np.arange(1, number_of_steps + 1), cost_lst, color = 'red')
        plt.title('MSE Plot')
        plt.xlabel('Number of iterations')
        plt.ylabel('Cost')
        
    intercept_ = theta[0][0]   
    
    return {'intercept':intercept_, 'weights':theta[1:].flatten()}


    

In [28]:
# If you wish to plot the cost please set plot as True in the argument

gradient_descent(x_train, y_train.values, number_of_steps= 1000, learning_rate= 0.01)

{'intercept': 2.0660607482611555,
 'weights': array([ 0.95173487,  0.17742949, -0.40748083,  0.40257876,  0.01243974,
        -0.04845525, -0.35121777, -0.32949271])}
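
One way to check that a gradient descent routine has converged is to compare it with the closed-form least-squares solution. The sketch below does this on small synthetic data; it is not the notebook's dataset, and `batch_gradient_descent`, the synthetic coefficients, and the step settings are all assumptions made for illustration.

```python
import numpy as np

def batch_gradient_descent(X, y, steps=2000, lr=0.05):
    """Full-batch gradient descent for least squares with an intercept."""
    Xb = np.c_[np.ones(len(X)), X]           # prepend the intercept column
    theta = np.zeros(Xb.shape[1])            # start from all-zero weights
    m = len(y)
    for _ in range(steps):
        grad = (2.0 / m) * Xb.T @ (Xb @ theta - y)   # gradient of the mean squared error
        theta -= lr * grad
    return theta

# synthetic, roughly standardized features with known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_theta = np.array([2.0, 1.0, -0.5, 0.25])        # intercept followed by weights
y = true_theta[0] + X @ true_theta[1:] + 0.1 * rng.normal(size=500)

theta_gd = batch_gradient_descent(X, y)
theta_ls, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)

print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
print("max abs difference:", np.max(np.abs(theta_gd - theta_ls)))
```

When the two estimates differ noticeably, more iterations or a larger learning rate is usually needed. Strongly correlated features slow convergence in particular, because the cost surface is then much flatter in some directions than in others.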

## Interpretation:

The intercept represents the predicted median house value when every standardized feature has a value of 0, i.e. when every feature is at its training-set mean.

If we are to assume that the data follows rules of linearity, then from our analysis we can make the following claims:

> 1) Median income has the largest positive weight. This makes sense: if you make more money you are probably living in a high-income area, which also means higher housing prices.

> 2) Latitude and longitude both have strongly negative weights (only the average number of rooms is more negative). I am not sure what to make of this observation, given my lack of knowledge of both the housing industry and what living at different latitudes/longitudes means.

> 3) Population and average occupancy have the weights closest to zero. This is somewhat surprising, as you would expect some relationship between the population of a region and how expensive it is to get land or build housing in a highly populated region. Average occupancy at least shows the slightly negative impact one might expect, while the weight for population is slightly positive.

> 4) The higher the age of the house, the more expensive the house appears to be. This is truly surprising to me, as I would expect newer houses to be more expensive.

> 5) The more *BEDROOMS*, the higher the price, but the more *TOTAL* rooms, the lower the price. I suppose this kind of makes sense. More bedrooms suggest a bigger house, but if there are too many rooms that are not bedrooms, people might not want to buy the house, which lowers the price.

## Q4
---

In [29]:
# Using sklearn's Stochastic Gradient Descent Regression algorithm

from sklearn.linear_model import SGDRegressor

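# note: in SGDRegressor, alpha is the regularization strength (the default penalty is L2),
# not the learning rate; the initial learning rate is set by eta0
reg = SGDRegressor(loss='squared_error', random_state=265, max_iter=1000, alpha=0.01)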

# training the data
reg.fit(x_train, y_train)

# getting the optimized weights
weights = reg.coef_

In [30]:
print(f"weights:\n{weights}")
print(f"intercept:\n{reg.intercept_}")

weights:
[ 0.8415846   0.12309233 -0.2766104   0.28230879 -0.01295436 -0.03037704
 -0.80087347 -0.76047679]
intercept:
[2.06235155]


## Interpretation:

*The behavior is very comparable to that seen in our analysis in Q3, which is to be expected since both approaches fit a linear model by gradient-based minimization of the squared error. The weights do not match exactly (i.e. they are not 1:1), but the signs agree for every feature except Population (whose weight is close to zero in both fits), and the relative magnitudes are broadly similar. The discrepancies likely come from differences between the two optimizers: our own gradient descent function may not have fully converged after 1000 steps, while sklearn's regressor uses stochastic updates and, with the settings used here, an L2 penalty with alpha=0.01. Nonetheless, given enough steps in our own gradient descent function, the two get very close. (Feel free to run it with number_of_steps = 5000 and check for yourself.)*

## Q5
---

$$cov(X) = E[(X-E[X])(X-E[X])^{T}]$$

In [47]:
def cov_like_np(X: ArrayLike) -> ArrayLike:
    '''
    Parameters
    ----------
    X: numpy.typing.ArrayLike
        Numpy Array for covariance calculation
    
    Returns
    ----------
    cov: numpy.typing.ArrayLike
        Numpy Array representing the covariance of the input matrix X
    '''
    X = np.array(X)
    EX = np.mean(X, axis=1).reshape(len(X), 1) # finding mean of each row in X (np.cov treats each row as a variable) and reshaping for subtraction
    diff = np.subtract(X, EX) # finding X - EX
    prod = np.dot(diff, diff.T)
    cov = prod/(len(X[0])-1)
    return cov
    

In [48]:
def cov_like_pd(X: ArrayLike, asFrame:bool = True) -> ArrayLike:
    '''
    Parameters
    ----------
    X: numpy.typing.ArrayLike
        Numpy Array for covariance calculation
    
    asFrame: bool
        User input for return type as a dataframe or numpy array
    
    Returns
    ----------
    cov: numpy.typing.ArrayLike
        Numpy Array representing the covariance of the input matrix X
    '''
    X = np.asarray(X, dtype=float)
    EX = np.mean(X, axis=0) # mean of each column (feature); broadcasting avoids building an n x n matrix of ones
    diff = X - EX # finding X - EX
    cov = np.dot(diff.T, diff)/(X.shape[0]-1) # unbiased estimate: divide by n - 1
    return pd.DataFrame.from_records(cov) if asFrame else cov


### Pandas Like Covariance

In [49]:
cov_like_pd(x)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.609323 | -2.84614 | 1.536568 | -0.055858 | 10.40098 | 0.370289 | -0.32386 | -0.057765 |
| 1 | -2.84614 | 158.39626 | -4.772882 | -0.463718 | -4222.271 | 1.724298 | 0.300346 | -2.728244 |
| 2 | 1.536568 | -4.772882 | 6.121533 | 0.993868 | -202.3337 | -0.124689 | 0.562235 | -0.136518 |
| 3 | -0.055858 | -0.463718 | 0.993868 | 0.224592 | -35.52723 | -0.030424 | 0.070575 | 0.01267 |
| 4 | 10.400979 | -4222.270582 | -202.333712 | -35.527225 | 1282470.0 | 821.712002 | -263.137814 | 226.377839 |
| 5 | 0.370289 | 1.724298 | -0.124689 | -0.030424 | 821.712 | 107.870026 | 0.052492 | 0.051519 |
| 6 | -0.32386 | 0.300346 | 0.562235 | 0.070575 | -263.1378 | 0.052492 | 4.562293 | -3.957054 |
| 7 | -0.057765 | -2.728244 | -0.136518 | 0.01267 | 226.3778 | 0.051519 | -3.957054 | 4.014139 |


In [50]:
x.cov() # x is already a DataFrame, so no conversion is needed

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| MedInc | 3.609323 | -2.84614 | 1.536568 | -0.055858 | 10.40098 | 0.370289 | -0.32386 | -0.057765 |
| HouseAge | -2.84614 | 158.39626 | -4.772882 | -0.463718 | -4222.271 | 1.724298 | 0.300346 | -2.728244 |
| AveRooms | 1.536568 | -4.772882 | 6.121533 | 0.993868 | -202.3337 | -0.124689 | 0.562235 | -0.136518 |
| AveBedrms | -0.055858 | -0.463718 | 0.993868 | 0.224592 | -35.52723 | -0.030424 | 0.070575 | 0.01267 |
| Population | 10.400979 | -4222.270582 | -202.333712 | -35.527225 | 1282470.0 | 821.712002 | -263.137814 | 226.377839 |
| AveOccup | 0.370289 | 1.724298 | -0.124689 | -0.030424 | 821.712 | 107.870026 | 0.052492 | 0.051519 |
| Latitude | -0.32386 | 0.300346 | 0.562235 | 0.070575 | -263.1378 | 0.052492 | 4.562293 | -3.957054 |
| Longitude | -0.057765 | -2.728244 | -0.136518 | 0.01267 | 226.3778 | 0.051519 | -3.957054 | 4.014139 |


### Numpy Like Covariance

In [51]:
cov_like_np(x)

20640


array([[ 15828.51059651, 100411.06909593,  22911.52253488, ...,
         43698.7069652 ,  32878.34650356,  59154.88863585],
       [100411.06909593, 726902.76879646, 152324.56887559, ...,
        307726.22523876, 227636.95255885, 422082.4403021 ],
       [ 22911.52253488, 152324.56887559,  33722.95864622, ...,
         65602.13612302,  49048.98091119,  89243.70502233],
       ...,
       [ 43698.7069652 , 307726.22523876,  65602.13612302, ...,
        131028.86321045,  97274.06321906, 179228.804142  ],
       [ 32878.34650356, 227636.95255885,  49048.98091119, ...,
         97274.06321906,  72373.786035  , 132831.76856219],
       [ 59154.88863585, 422082.4403021 ,  89243.70502233, ...,
        179228.804142  , 132831.76856219, 245479.08714416]])

In [46]:
np.cov(x)

array([[ 15828.51059651, 100411.06909593,  22911.52253488, ...,
         43698.7069652 ,  32878.34650356,  59154.88863585],
       [100411.06909593, 726902.76879646, 152324.56887559, ...,
        307726.22523876, 227636.95255885, 422082.4403021 ],
       [ 22911.52253488, 152324.56887559,  33722.95864622, ...,
         65602.13612302,  49048.98091119,  89243.70502233],
       ...,
       [ 43698.7069652 , 307726.22523876,  65602.13612302, ...,
        131028.86321045,  97274.06321906, 179228.804142  ],
       [ 32878.34650356, 227636.95255885,  49048.98091119, ...,
         97274.06321906,  72373.786035  , 132831.76856219],
       [ 59154.88863585, 422082.4403021 ,  89243.70502233, ...,
        179228.804142  , 132831.76856219, 245479.08714416]])
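
The two outputs above differ in shape because `np.cov` by default treats each row as a variable and each column as an observation, while pandas computes the covariance between columns. A small sketch on synthetic data (the array sizes and the helper `feature_covariance` are assumptions made for illustration) shows how `rowvar=False` gives the column-wise result:

```python
import numpy as np

def feature_covariance(X):
    """Sample covariance of the columns of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)             # subtract each column's mean
    return centered.T @ centered / (X.shape[0] - 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # 100 observations, 4 features

by_columns = np.cov(X, rowvar=False)          # columns are the variables
by_rows = np.cov(X)                           # default rowvar=True: rows are the variables

print(by_columns.shape, by_rows.shape)
print(np.allclose(feature_covariance(X), by_columns))
```

For a data matrix with one row per sample, the column-wise version is the one that yields a features-by-features covariance matrix.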