**PRACTICUM 1: Bagging**

**Bagging with RandomForest**

In this exercise, we use RandomForest, a bagging-based method, to classify tumors. The model is trained on the Wisconsin Breast Cancer dataset from the UCI Machine Learning Repository, and the goal is to predict whether a tumor is malignant or benign.

Throughout the exercise, we compare the performance of the Decision Tree and RandomForest algorithms on this task.
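To make the idea behind bagging concrete before using the library class, here is a minimal hand-rolled sketch: train several trees on bootstrap samples and combine them by majority vote. It assumes scikit-learn's bundled copy of the Wisconsin data (`load_breast_cancer`) as a stand-in for `data/wbc.csv`; note that this copy encodes malignant as 0 and benign as 1, the opposite of the mapping used in this exercise. The names `n_trees` and `y_pred_bagged` are illustrative only. RandomForest goes one step further than this sketch by also considering a random subset of features at each split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for data/wbc.csv
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

rng = np.random.default_rng(1)
n_trees = 11  # odd number, so a binary majority vote cannot tie
all_preds = []
for _ in range(n_trees):
    # Bootstrap step: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# Aggregation step: majority vote over the 0/1 predictions of all trees
votes = np.mean(all_preds, axis=0)
y_pred_bagged = (votes > 0.5).astype(int)
bagged_accuracy = np.mean(y_pred_bagged == y_test)
print(f"Bagged test accuracy: {bagged_accuracy:.3f}")
```

Each tree sees a slightly different resampling of the training set, so their individual errors are partly uncorrelated and tend to cancel out in the vote.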

In [4]:
# Import the required libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [5]:
# Load the data from the CSV file
df = pd.read_csv('data/wbc.csv')

# Display the first 5 rows of the data
df.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [6]:
# Check whether the dataset contains null values
df.isnull().sum()

id                           0
diagnosis                    0
radius_mean                  0
texture_mean                 0
perimeter_mean               0
area_mean                    0
smoothness_mean              0
compactness_mean             0
concavity_mean               0
concave points_mean          0
symmetry_mean                0
fractal_dimension_mean       0
radius_se                    0
texture_se                   0
perimeter_se                 0
area_se                      0
smoothness_se                0
compactness_se               0
concavity_se                 0
concave points_se            0
symmetry_se                  0
fractal_dimension_se         0
radius_worst                 0
texture_worst                0
perimeter_worst              0
area_worst                   0
smoothness_worst             0
compactness_worst            0
concavity_worst              0
concave points_worst         0
symmetry_worst               0
fractal_dimension_worst      0
Unnamed:
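
The `df.head()` output shows a trailing `Unnamed: 32` column with no visible values, and the null-count listing is cut off at that column. A column like this typically appears when each CSV row ends with a trailing comma. As a hedged sketch, assuming the column is entirely empty, it can be detected and dropped as follows. The miniature frame `toy` is hypothetical and only mimics the layout of the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame mimicking the layout of the real file
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "diagnosis": ["M", "B", "B"],
    "radius_mean": [17.99, 13.54, 12.45],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})

# Count nulls per column, as in the cell above
null_counts = toy.isnull().sum()

# Drop only the columns in which every value is null
cleaned = toy.dropna(axis=1, how="all")

print(null_counts)
print(list(cleaned.columns))
```

Dropping the column explicitly makes later feature selection independent of where the empty column sits.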

In [7]:
# Select features and label
# Note: position 3 is 'texture_mean', so this slice covers 'texture_mean' through
# 'fractal_dimension_worst' (29 columns); 'radius_mean' at position 2 is not included
X = df.iloc[:, 3:-1]
y = df['diagnosis']  # Label
y = y.map({'M': 1, 'B': 0})  # Encode label 'M' as 1 and 'B' as 0

# Display the number of instances and features
X.shape

(569, 29)
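
Positional slicing with `iloc` depends on the exact column order. Here `iloc[:, 3:-1]` starts at `texture_mean`, which is why the shape shows 29 columns rather than the dataset's 30 features. A label-based slice with `loc` states the intended range by name. The sketch below contrasts the two on a hypothetical miniature frame (`toy`) that keeps only a few of the real column names.

```python
import pandas as pd

# Hypothetical miniature frame with the same leading and trailing columns
toy = pd.DataFrame({
    "id": [1, 2],
    "diagnosis": ["M", "B"],
    "radius_mean": [17.99, 13.54],
    "texture_mean": [10.38, 14.36],
    "fractal_dimension_worst": [0.1189, 0.07259],
    "Unnamed: 32": [float("nan"), float("nan")],
})

# Positional slice: starts at position 3, so 'radius_mean' is skipped
X_by_position = toy.iloc[:, 3:-1]

# Label slice: both endpoints are inclusive, so 'radius_mean' is kept
X_by_label = toy.loc[:, "radius_mean":"fractal_dimension_worst"]

print(list(X_by_position.columns))
print(list(X_by_label.columns))
```

Switching the exercise to the label-based slice would change the feature set, so the accuracies reported here would not necessarily be reproduced.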

In [8]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
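
The split above is purely random, so the share of malignant cases in the test set can drift from the share in the full data. One common option is the `stratify` parameter of `train_test_split`, sketched below. The labels are synthetic, built to match the documented class counts of this dataset (357 benign, 212 malignant), and the features are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the dataset's class counts: 357 benign (0), 212 malignant (1)
y_demo = np.array([0] * 357 + [1] * 212)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder features

# stratify=y_demo keeps the class proportions (almost) equal in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

print(f"Malignant share overall: {y_demo.mean():.3f}")
print(f"Malignant share in train: {y_tr.mean():.3f}")
print(f"Malignant share in test:  {y_te.mean():.3f}")
```

Using a stratified split in the exercise would change which rows land in the test set, so the reported accuracies could differ.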

In [9]:
# Create a Decision Tree Classifier with default parameters
# (no random_state is set, so the result can vary slightly between runs)
dt = DecisionTreeClassifier()

# Train the Decision Tree Classifier on the training data
dt.fit(X_train, y_train)

# Predict on the test data with the Decision Tree model
y_pred_dt = dt.predict(X_test)

# Compute the accuracy of the Decision Tree model
acc_dt = accuracy_score(y_test, y_pred_dt)
print("Test set accuracy: {:.2f}".format(acc_dt))
print(f"Test set accuracy: {acc_dt}")

Test set accuracy: 0.93
Test set accuracy: 0.9298245614035088
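
A single train/test split gives only one accuracy estimate, and it depends on which rows happen to fall in the test set. As a hedged sketch, k-fold cross-validation averages over several splits. The example assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) in place of `data/wbc.csv`, and it fixes `random_state` so the tree is reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for data/wbc.csv
X_cv, y_cv = load_breast_cancer(return_X_y=True)

# A fixed random_state makes the tree construction reproducible
dt_cv = DecisionTreeClassifier(random_state=1)

# 5-fold cross-validation; for classifiers the default score is accuracy
scores = cross_val_score(dt_cv, X_cv, y_cv, cv=5)
print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The spread across folds indicates how much a single-split accuracy can move by chance.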


In [10]:
# Create a Random Forest Classifier with 10 estimators and random seed 1
rf = RandomForestClassifier(n_estimators=10, random_state=1)

# Train the Random Forest Classifier on the training data
rf.fit(X_train, y_train)

# Predict on the test data with the Random Forest model
y_pred_rf = rf.predict(X_test)

# Compute the accuracy of the Random Forest model
acc_rf = accuracy_score(y_test, y_pred_rf)
print("Test set accuracy: {:.2f}".format(acc_rf))
print(f"Test set accuracy: {acc_rf}")

Test set accuracy: 0.96
Test set accuracy: 0.956140350877193
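
Because each tree in a random forest is trained on a bootstrap sample, the rows left out of that sample can serve as a built-in validation set. The sketch below reads this out-of-bag (OOB) estimate via `oob_score=True`. It assumes scikit-learn's bundled copy of the dataset in place of `data/wbc.csv`, and it uses 100 trees rather than the 10 used above so that every row is left out by at least some trees.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Assumption: the bundled dataset stands in for data/wbc.csv
X_oob, y_oob = load_breast_cancer(return_X_y=True)

# oob_score=True scores each row using only the trees that did not train on it
rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1)
rf_oob.fit(X_oob, y_oob)

print(f"Out-of-bag accuracy: {rf_oob.oob_score_:.3f}")
```

The OOB score offers an accuracy estimate without setting aside a separate test set, which can be useful on a dataset of only 569 rows.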
