# Random Forest 

Random Forest is an ensemble method that combines many individual decision trees to make predictions. It improves on the predictive accuracy of a single decision tree by averaging the predictions of every tree in the forest. Random Forest generally outperforms a single decision tree, and it often works well even with default parameters. As you explore and build more models, you may come across other algorithms that perform even better, but many of them are sensitive to finding the right combination of parameters.
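To make the averaging step concrete, here is a minimal sketch. It uses synthetic data from `make_regression` as a stand-in (an assumption, not the Melbourne dataset), fits a small forest, and checks that the forest's prediction matches the mean of its individual trees' predictions. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption: not the Melbourne housing data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# A small forest of 25 trees
forest = RandomForestRegressor(n_estimators=25, random_state=1)
forest.fit(X, y)

# Predict the first five rows with every tree, one row of results per tree
sample = X[:5]
per_tree = np.stack([tree.predict(sample) for tree in forest.estimators_])

# Average the trees by hand and compare with the forest's own prediction
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(sample)

matches = np.allclose(manual_average, forest_prediction)
print("Forest prediction equals the mean of its trees:", matches)
```

Each tree is trained on a different bootstrap sample of the rows, so the trees disagree on individual predictions; averaging smooths out those disagreements.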

In [19]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
import pandas as pd
from pathlib import Path

# Use the current working directory as the base path
melbourne_file_path = Path.cwd()

# Build the path to the data file (expected in the working directory)
data_folder = melbourne_file_path / "melb_data.csv"

# Read the data and store it in a DataFrame named melbourne_data
melbourne_data = pd.read_csv(data_folder)

# dropna drops missing values (think of na as "not available")
melbourne_data = melbourne_data.dropna(axis=0)

# Assign the target variable 'Price' to y
y = melbourne_data.Price

# Define the list of features used for modeling
# ('Lattitude' and 'Longtitude' are spelled this way in the dataset's column names)
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']

# Create a DataFrame X with only the selected features
X = melbourne_data[melbourne_features]

In [20]:
# Split the data into training and validation sets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Define the Random Forest model
forest_model = RandomForestRegressor(random_state=1)

# Fit the model on the training data
forest_model.fit(train_X, train_y)

# Make predictions on the validation data
melb_preds = forest_model.predict(val_X)

# Calculate the mean absolute error between the true and predicted values
mae = mean_absolute_error(val_y, melb_preds)

# Print the mean absolute error
print(mae)

207190.6873773146


# Conclusion

There is likely room for further improvement, but the validation MAE of roughly 207,000 dollars is already a significant step up from the best decision tree error of 260,000 dollars. Random Forest models have parameters that can be tuned to improve performance, much like the maximum depth of a single decision tree. However, one of their key advantages is that they often work reasonably well even without extensive tuning.
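As a rough illustration of what tuning could look like, the sketch below compares a few values of `max_leaf_nodes` by validation MAE. It runs on synthetic data from `make_regression` rather than the Melbourne dataset, and the helper `get_forest_mae` and the candidate values are hypothetical choices made for this example, so the numbers it prints say nothing about the housing results above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the Melbourne housing data)
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def get_forest_mae(max_leaf_nodes):
    """Fit a forest with the given leaf limit and return its validation MAE."""
    model = RandomForestRegressor(
        n_estimators=50,
        max_leaf_nodes=max_leaf_nodes,
        random_state=1,
    )
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Try a few candidate values (hypothetical, chosen only for illustration)
results = {n: get_forest_mae(n) for n in (5, 50, 500)}
for n, mae in results.items():
    print(f"max_leaf_nodes={n:>3}  validation MAE={mae:.2f}")
```

The same loop could be pointed at `train_X`, `val_X`, `train_y`, and `val_y` from the notebook cells above to see whether any setting beats the default model.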