# Exercises
## Step 1: Specify Prediction Target
Select the target variable, which corresponds to the sales price. Save this to a new variable called <b>y</b>. <br> 
You'll need to print a list of the columns to find the name of the column you need.
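
One way to list the column names is sketched below. It assumes a tiny stand-in DataFrame (the name `sample` is hypothetical, and only three columns are included) so that the snippet runs on its own. With the real data, the same attribute would be read from the DataFrame loaded from the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the housing data: three of the real column
# names, with values copied from the first rows shown in this notebook.
sample = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
    "SalePrice": [208500, 181500, 223500],
})

# .columns holds the column labels; tolist() turns them into a plain list
column_names = sample.columns.tolist()
print(column_names)

# Once the target's name has been found in the list, select that column
target = sample["SalePrice"]
```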

In [12]:
import pandas as pd 
import warnings
warnings.filterwarnings('ignore')

housing = pd.read_csv('housing_train.csv')

y = housing['SalePrice']

In [13]:
y

0       208500
1       181500
2       223500
3       140000
4       250000
         ...  
1455    175000
1456    210000
1457    266500
1458    142125
1459    147500
Name: SalePrice, Length: 1460, dtype: int64

## Step 2: Create X
Now you will create a DataFrame called X holding the predictive features.

Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.

You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):
- LotArea
- YearBuilt
- 1stFlrSF
- 2ndFlrSF
- FullBath
- BedroomAbvGr
- TotRmsAbvGrd

After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.

In [17]:
# Feature selection: choose the columns to use as predictors

feature_names = ['LotArea', 'YearBuilt','1stFlrSF','2ndFlrSF','FullBath','BedroomAbvGr', 'TotRmsAbvGrd']

X = housing[feature_names]

X

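| | LotArea | YearBuilt | 1stFlrSF | 2ndFlrSF | FullBath | BedroomAbvGr | TotRmsAbvGrd |
|---|---|---|---|---|---|---|---|
| 0 | 8450 | 2003 | 856 | 854 | 2 | 3 | 8 |
| 1 | 9600 | 1976 | 1262 | 0 | 2 | 3 | 6 |
| 2 | 11250 | 2001 | 920 | 866 | 2 | 3 | 6 |
| 3 | 9550 | 1915 | 961 | 756 | 1 | 3 | 7 |
| 4 | 14260 | 2000 | 1145 | 1053 | 2 | 4 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1455 | 7917 | 1999 | 953 | 694 | 2 | 3 | 7 |
| 1456 | 13175 | 1978 | 2073 | 0 | 2 | 3 | 7 |
| 1457 | 9042 | 1941 | 1188 | 1152 | 2 | 4 | 9 |
| 1458 | 9717 | 1950 | 1078 | 0 | 1 | 2 | 5 |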


<strong>Check:</strong> When you've updated the starter code, check() will tell you whether your code is correct. 
You need to update the code that creates variable <b>X</b>.

### Review Data
- Before building a model, take a quick look at X to verify that it looks sensible.
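
A quick review might look like the sketch below. It assumes a small stand-in DataFrame (the name `X_sample` is hypothetical) built from the first five rows of three columns displayed above, so the snippet runs without the CSV file. The same two calls would apply to the full X.

```python
import pandas as pd

# Hypothetical stand-in for X: first five rows of three feature columns
X_sample = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
    "FullBath": [2, 2, 2, 1, 2],
})

# Summary statistics per column: count, mean, std, min, quartiles, max
summary = X_sample.describe()
print(summary)

# Missing values per column; non-zero counts would need handling before fitting
missing = X_sample.isnull().sum()
print(missing)
```

Things worth checking in the summary include implausible minimum or maximum values (for example, a build year in the future) and columns whose count is lower than the number of rows.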

## Step 3: Specify and Fit Model

- Create a DecisionTreeRegressor and save it to a variable (<b>my_model</b> in the cell below). Make sure you've imported it from sklearn before running this command.

Then fit the model you just created using the data in X and y that you saved above.

<hr>
<strong>What is a Decision Tree Regressor?</strong>
<p>A decision tree regressor is a machine learning model that uses a tree-like structure to predict a continuous target variable. It works by recursively partitioning the data into smaller and smaller subsets based on the feature values, and then fitting a simple model (such as the mean or median of the target variable) to each subset.</p>
<img src= 'https://miro.medium.com/1*62Am0QdlxCq5Vmt5siR-7Q.png' />
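
The partitioning idea can be illustrated with a minimal sketch on toy data. The arrays and the names `X_toy`, `y_toy`, and `stump` are made up for illustration and are not part of the exercise. Limiting the tree to a single split makes it easy to see that each prediction is the mean target value of the leaf a sample falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (illustrative only): one feature, two clearly separated groups
X_toy = np.array([[1], [2], [3], [10], [11], [12]])
y_toy = np.array([10, 20, 30, 100, 110, 120])

# max_depth=1 allows a single split, so the tree has exactly two leaves
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_toy, y_toy)

# Each prediction is the mean target of the leaf the sample lands in:
# the left leaf averages 10, 20, 30 and the right leaf averages 100, 110, 120
preds = stump.predict(np.array([[2.5], [11.0]]))
print(preds)
```

A deeper tree keeps splitting these groups further. With no depth limit it can end up with one training row per leaf, which is why a fully grown tree can reproduce its training targets almost exactly.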

In [18]:
from sklearn.tree import DecisionTreeRegressor

# Define model
my_model = DecisionTreeRegressor()

# Fit model
my_model.fit(X, y)


## Step 4: Make Predictions
Make predictions with the model's predict command using X as the data. Save the results to a variable called predictions.

In [23]:
# Make predictions
predictions = my_model.predict(X)

# Show the predictions (NumPy abbreviates long arrays when displaying them)
predictions


array([208500., 181500., 223500., ..., 266500., 142125., 147500.],
      shape=(1460,))

In [8]:
housing

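| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 12 | 2008 | WD | Normal | 250000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1455 | 1456 | 60 | RL | 62.0 | 7917 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 8 | 2007 | WD | Normal | 175000 |
| 1456 | 1457 | 20 | RL | 85.0 | 13175 | Pave | | Reg | Lvl | AllPub | ... | 0 | | MnPrv | | 0 | 2 | 2010 | WD | Normal | 210000 |
| 1457 | 1458 | 70 | RL | 66.0 | 9042 | Pave | | Reg | Lvl | AllPub | ... | 0 | | GdPrv | Shed | 2500 | 5 | 2010 | WD | Normal | 266500 |
| 1458 | 1459 | 20 | RL | 68.0 | 9717 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 4 | 2010 | WD | Normal | 142125 |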


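## Step 5: Test the Accuracy of Predictions
Since the sale prices are continuous data, this is a regression problem, so the predictions are evaluated with a regression metric such as R² rather than with classification accuracy. Note that the score here is computed on the same rows the model was fitted on, so it measures how well the tree reproduces the training data, not how well it predicts unseen houses.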



In [29]:
from sklearn.metrics import r2_score

r2 = r2_score(y, predictions)
print("R² Score:", r2)


R² Score: 0.9999346278363472
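
For reference, R² is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand and compares the result with `r2_score`. The two arrays are illustrative values only, not taken from the housing data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values only (not from the housing data)
y_true = np.array([200.0, 150.0, 300.0, 250.0])
y_pred = np.array([210.0, 140.0, 290.0, 260.0])

# Residual sum of squares: squared error left over after the model's predictions
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: squared error of always predicting the mean of y_true
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))
```

A value near 1 means the predictions leave almost no residual error compared with always predicting the mean. A value near 0 means the model does little better than that baseline.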


<h1>Linear Regression </h1>

In [26]:
from sklearn.linear_model import LinearRegression

# Fit the linear regression model
l_model = LinearRegression()
l_model.fit(X, y)

# Predict
predict_l = l_model.predict(X)

# Evaluate: score() returns R-squared for regressors
r_squared = l_model.score(X, y)
print(f"The model explains {r_squared:.1%} of the variance in SalePrice")

The model explains 70.5% of the variance in SalePrice
