<a href="https://colab.research.google.com/github/yoboiwatsup/MachineLearning/blob/main/Week%203/Regression_Model_MaterialStrength.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This dataset consists of several features, and its last column, "target_feature", is the target we predict with regression models. We build regression models using a Decision Tree and k-NN, plus ordinary linear regression and linear regression with basis functions (here, a degree-2 polynomial basis).
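"Linear regression with basis functions" replaces each input vector x with a fixed set of basis functions φ(x) and then fits an ordinary linear model ŷ = w·φ(x). In this notebook the basis is the degree-2 polynomial expansion from scikit-learn's `PolynomialFeatures`. The sketch below uses toy data (the inputs `a`, `b` and the quadratic target are made up for illustration). It shows what the expansion produces, how many columns it creates for 15 inputs, and why a model that is linear in w can still fit a curved relationship.

```python
import numpy as np
from math import comb
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Degree-2 basis for two inputs (a, b): [1, a, b, a², a·b, b²]
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(np.array([[2.0, 3.0]])))  # [[1. 2. 3. 4. 6. 9.]]

# With 15 input columns (x1..x15), the expansion has C(15 + 2, 2) = 136 columns
n_out = PolynomialFeatures(degree=2).fit(np.zeros((1, 15))).n_output_features_
print(n_out, comb(15 + 2, 2))  # 136 136

# A target that is quadratic in the inputs is fitted (almost) exactly
# by a linear model trained on the expanded features
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + X[:, 0] ** 2 - 2.0 * X[:, 0] * X[:, 1]
X_basis = poly.fit_transform(X)
model = LinearRegression().fit(X_basis, y)
print(f"Training R²: {model.score(X_basis, y):.4f}")
```

The model stays linear in its weights, so the usual least-squares fit applies; only the feature space changes. The column count grows quickly with the degree, which is worth keeping in mind before trying degree 3 or higher on 15 features.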

In [7]:
import pandas as pd
# Load the new dataset for regression analysis
file_path_regression = 'MaterialStrength.csv'
data_regression = pd.read_csv(file_path_regression)

# Display the first few rows of the regression dataset to understand its structure
data_regression.head()


,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,target_feature
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,0.3,540.0,1.538462,1,0.350044,YEs,NO,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,0.3,540.0,1.560651,1,0.452416,yES,nOO,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,0.685714,475.0,1.569024,0,6.704743,yEs,NO,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,0.685714,475.0,1.569024,0,8.891596,yes,NOO,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,0.966767,331.0,1.185221,0,8.126411,YeS,no,44.3
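The `x14` and `x15` values above appear with inconsistent capitalization ("YEs", "yES", "yes", "NO", "nOO", "NOO", "no"). `LabelEncoder`, used later in this notebook, is case-sensitive, so each spelling becomes its own integer code. Below is a minimal sketch of normalizing the text before encoding, using only the five values visible in `head()`. The helper `normalize_flags` is hypothetical, not part of the original notebook. It only strips whitespace and lower-cases the values. It does not assume that "noo" is a typo for "no", so those two stay separate.

```python
import pandas as pd

# The five x14/x15 values visible in data_regression.head() above
sample = pd.DataFrame({
    'x14': ['YEs', 'yES', 'yEs', 'yes', 'YeS'],
    'x15': ['NO', 'nOO', 'NO', 'NOO', 'no'],
})

def normalize_flags(df, cols):
    """Hypothetical helper: return a copy of df with the given text columns stripped and lower-cased."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].str.strip().str.lower()
    return out

clean = normalize_flags(sample, ['x14', 'x15'])

# Number of distinct categories per column before and after normalization
before = {col: int(sample[col].nunique()) for col in ['x14', 'x15']}
after = {col: int(clean[col].nunique()) for col in ['x14', 'x15']}
print(before)  # {'x14': 5, 'x15': 4}
print(after)   # {'x14': 1, 'x15': 2}
```

On the full dataset, the same call on `data_regression` before the `LabelEncoder` step would collapse case variants into one code. Note that this would change the fitted models and therefore the metrics reported below.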


MSE (Mean Squared Error): the average of the squared differences between predicted and actual values. The smaller the MSE, the better the model's predictions.
RMSE (Root Mean Squared Error): the square root of the MSE. Because it is in the same units as the target, it gives a more intuitive sense of the model's typical error.
R² (coefficient of determination): the proportion of the variance in the target that the model explains. An R² close to 1 indicates a good fit.
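The three definitions above can be written out directly in NumPy. This sketch computes them from scratch on a few hypothetical values (not taken from MaterialStrength.csv) and checks them against the scikit-learn functions the notebook uses. The function name `regression_metrics` is illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R²) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                   # average squared error
    rmse = np.sqrt(mse)                             # same units as the target
    ss_res = np.sum(residuals ** 2)                 # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

# Hypothetical toy values
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]
mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} R²={r2:.4f}")  # MSE=0.6667 RMSE=0.8165 R²=0.7500
```

Here the residuals are 1, 0 and -1, so MSE = 2/3. The target's total variation around its mean of 5 is 8, giving R² = 1 - 2/8 = 0.75.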
In the evaluation table produced by the model-training cell below, the Decision Tree performs best of the four models tested, with the highest R² (0.927) and the lowest MSE (19.68) and RMSE (4.44).
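This ranking comes from a single 70/30 train/test split (`random_state=42`), so each number is one estimate. k-fold cross-validation gives a mean and a spread across several splits, which makes comparisons between models more reliable. Below is a minimal sketch using scikit-learn's `KFold` and `cross_val_score` with the same four model types. In the notebook you would pass `X_reg` and `y_reg`. Here, synthetic data from `make_regression` stands in for the real file (an assumption) so the block runs on its own, and its scores say nothing about MaterialStrength.csv.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for (X_reg, y_reg)
X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=0)

models = {
    'Decision Tree': DecisionTreeRegressor(random_state=42),
    'k-NN': KNeighborsRegressor(n_neighbors=5),
    'Linear Regression': LinearRegression(),
    # The polynomial expansion sits inside the pipeline, so it is refit on each training fold
    'Polynomial Regression': make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

# 5 shuffled folds; each model is scored by R² on every held-out fold
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring='r2')
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean R² = {fold_scores.mean():.3f} (std {fold_scores.std():.3f})")
```

A model whose mean R² is high but whose standard deviation is large across folds is less trustworthy than the single-split table suggests.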

In [10]:
# Encode categorical columns, split the data, and fit the four regression models.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, LabelEncoder
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Encode the categorical columns x14 and x15 as integer codes.
# LabelEncoder is case-sensitive, so spellings such as "YEs" and "yes" receive different codes.
le = LabelEncoder()
data_regression['x14'] = le.fit_transform(data_regression['x14'])  # refits the encoder on x14
data_regression['x15'] = le.fit_transform(data_regression['x15'])  # refits the encoder on x15
data_regression['target_feature'] = data_regression['target_feature'].astype(float)

# Separate features (X) and target (y)
X_reg = data_regression.drop('target_feature', axis=1)
y_reg = data_regression['target_feature']

# Split the dataset into training and testing sets
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)

# Train a Decision Tree Regressor
dt_regressor = DecisionTreeRegressor(random_state=42)
dt_regressor.fit(X_train_reg, y_train_reg)

# Train a k-NN Regressor
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(X_train_reg, y_train_reg)

# Train a Linear Regression model
linear_regressor = LinearRegression()
linear_regressor.fit(X_train_reg, y_train_reg)

# Train a Polynomial Regression model (linear regression with polynomial features)
poly_features = PolynomialFeatures(degree=2)
X_poly_train = poly_features.fit_transform(X_train_reg)
X_poly_test = poly_features.transform(X_test_reg)

poly_regressor = LinearRegression()
poly_regressor.fit(X_poly_train, y_train_reg)

# Make predictions for each model
y_pred_dt = dt_regressor.predict(X_test_reg)
y_pred_knn = knn_regressor.predict(X_test_reg)
y_pred_linear = linear_regressor.predict(X_test_reg)
y_pred_poly = poly_regressor.predict(X_poly_test)

# Calculate MSE, RMSE and R² on the test set for each model
predictions = {
    'Decision Tree': y_pred_dt,
    'k-NN': y_pred_knn,
    'Linear Regression': y_pred_linear,
    'Polynomial Regression': y_pred_poly,
}
metrics = {}
for name, y_pred in predictions.items():
    mse = mean_squared_error(y_test_reg, y_pred)
    metrics[name] = {
        'MSE': mse,
        'RMSE': np.sqrt(mse),
        'R²': r2_score(y_test_reg, y_pred),
    }

# Create a DataFrame for the evaluation metrics
metrics_regression_df = pd.DataFrame(metrics).T

# Display the metrics
metrics_regression_df


Model,MSE,RMSE,R²
Decision Tree,19.681814,4.436419,0.927259
k-NN,91.618345,9.571747,0.661393
Linear Regression,87.772618,9.368704,0.675606
Polynomial Regression,41.543331,6.445412,0.846462
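k-NN has the lowest R² (0.661) in the table above. One plausible contributor, not tested in this notebook, is feature scale. k-NN ranks neighbors by Euclidean distance, and the raw columns differ widely in magnitude (in the `head()` output, x1 ranges from about 199 to 540 while x9 stays between 0.3 and 1). The largest-scale columns therefore dominate the distance. Below is a minimal sketch of the usual remedy: standardize features inside a `Pipeline` so the scaler is fit on the training split only. It uses synthetic data with one deliberately large-scale, uninformative column (an assumption standing in for the real file). Whether scaling helps on MaterialStrength.csv would need to be checked by rerunning with `X_train_reg`/`X_test_reg`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 5 informative features plus one uninformative column on a much larger scale
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
X = np.column_stack([X, rng.normal(scale=1000.0, size=len(X))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# k-NN on raw features: distances are dominated by the large-scale column
knn_raw = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)

# k-NN after standardizing every feature to zero mean and unit variance
knn_scaled = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)).fit(X_train, y_train)

r2_raw = r2_score(y_test, knn_raw.predict(X_test))
r2_scaled = r2_score(y_test, knn_scaled.predict(X_test))
print(f"k-NN R² without scaling: {r2_raw:.3f}")
print(f"k-NN R² with StandardScaler: {r2_scaled:.3f}")
```

The Decision Tree is unaffected by this issue, since its splits compare one feature at a time against a threshold. Linear regression's predictions are also scale-invariant. That is one reason scaling is usually discussed specifically for distance-based models such as k-NN.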
