# **Assignment 3**

Using the diabetes dataset, build a voting ensemble from the following algorithms:

1. Logistic Regression

2. SVM with a polynomial kernel

3. Decision Tree

You may explore further by tuning hyperparameters.

### **Import Libraries**

In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler

### **Data Preparation**

In [5]:
# Load the data
df = pd.read_csv('data/diabetes.csv')

# Show the first 5 rows
print(df.head())

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.627   50        1  
1                     0.351   31        0  
2                     0.672   32        1  
3                     0.167   21        0  
4                     2.288   33        1  


In [6]:
# Check for null values in each column
df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
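
The null check reports no missing values, yet the preview above shows zeros in `SkinThickness` and `Insulin`, which are hard to read as real measurements. A minimal sketch, assuming that a zero in these columns stands for an unrecorded value (an assumption, not something the file itself states), counts them per column. The small frame re-enters the five preview rows by hand so the block runs on its own, and `zero_as_missing` is an illustrative name.

```python
import pandas as pd

# The five preview rows shown above, re-entered by hand so the sketch is self-contained.
preview = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Assumption: a zero in these columns means "not measured", not a real reading.
zero_as_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Count zeros per column; isnull() cannot see them because they are stored as numbers.
zero_counts = (preview[zero_as_missing] == 0).sum()
print(zero_counts)
```

On the full data, the same expression can be applied to `df` instead of `preview`.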

### **Train/Test Split**

In [7]:
# Separate the features (X) from the label (y)
X = df.drop(columns=['Outcome'])
y = df['Outcome']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
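
The split above is purely random, so the share of positive cases can differ between the training and test sets. One option, sketched here on synthetic labels (the arrays `X_demo` and `y_demo` are stand-ins, not the diabetes data), is to pass `stratify` so both parts keep roughly the same class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with roughly a 65/35 class imbalance (illustrative only).
rng = np.random.default_rng(42)
y_demo = (rng.random(500) < 0.35).astype(int)
X_demo = rng.normal(size=(500, 8))

# stratify=y_demo keeps the class ratio nearly identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("positive rate, train:", y_tr.mean())
print("positive rate, test: ", y_te.mean())
```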

### **Feature Standardization**

In [8]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
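
Because the scaler is fitted once on the whole training set, every cross-validation fold in the grid searches that follow has already seen statistics from its own validation part. One way to avoid this, sketched here on synthetic data from `make_classification` (a stand-in, not the diabetes data), is to place the scaler inside a `Pipeline` so that it is refitted on each training fold. The step names `scaler` and `clf` are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes features (8 columns, binary target).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The scaler is a pipeline step, so GridSearchCV refits it on each training fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a step are addressed as "<step name>__<parameter>".
param_grid = {"clf__C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(f"Held-out accuracy: {search.score(X_te, y_te):.2f}")
```

The fitted `search.best_estimator_` is itself a pipeline, so it can be applied to unscaled test data directly.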

### **Hyperparameter Tuning for Logistic Regression**

In [9]:
log_reg = LogisticRegression(solver='liblinear')
param_grid_lr = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2']
}
grid_search_lr = GridSearchCV(log_reg, param_grid_lr, cv=5)
grid_search_lr.fit(X_train, y_train)
best_lr = grid_search_lr.best_estimator_
print("Best Logistic Regression Parameters:", grid_search_lr.best_params_)


Best Logistic Regression Parameters: {'C': 10, 'penalty': 'l2'}


### **Hyperparameter Tuning for SVM with a Polynomial Kernel**

In [11]:
svm = SVC(kernel='poly', probability=True)
param_grid_svm = {
    'C': [0.1, 1, 10],
    'degree': [2, 3, 4],
    'gamma': ['scale', 'auto']
}
grid_search_svm = GridSearchCV(svm, param_grid_svm, cv=5)
grid_search_svm.fit(X_train, y_train)
best_svm = grid_search_svm.best_estimator_
print("Best SVM Parameters:", grid_search_svm.best_params_)

Best SVM Parameters: {'C': 10, 'degree': 3, 'gamma': 'scale'}


### **Hyperparameter Tuning for Decision Tree**

In [10]:
dt = DecisionTreeClassifier()
param_grid_dt = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}
grid_search_dt = GridSearchCV(dt, param_grid_dt, cv=5)
grid_search_dt.fit(X_train, y_train)
best_dt = grid_search_dt.best_estimator_
print("Best Decision Tree Parameters:", grid_search_dt.best_params_)

Best Decision Tree Parameters: {'criterion': 'entropy', 'max_depth': 30, 'min_samples_split': 10}


### **Building the Voting Ensemble Classifier**

In [12]:
# Build the voting classifier from the tuned base models
voting_clf = VotingClassifier(estimators=[
    ('log_reg', best_lr),
    ('svm', best_svm),
    ('decision_tree', best_dt)
], voting='soft')  # Soft voting: average the predicted class probabilities

# Fit the voting classifier on the training data
voting_clf.fit(X_train, y_train)

# Predict on the test set
y_pred = voting_clf.predict(X_test)

# Compute accuracy and the per-class report
acc = accuracy_score(y_test, y_pred)
print(f"Voting Classifier Test set accuracy: {acc:.2f}")
print(classification_report(y_test, y_pred))


Voting Classifier Test set accuracy: 0.78
              precision    recall  f1-score   support

           0       0.82      0.84      0.83        99
           1       0.70      0.67      0.69        55

    accuracy                           0.78       154
   macro avg       0.76      0.76      0.76       154
weighted avg       0.78      0.78      0.78       154
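
Soft voting, as selected with `voting='soft'` above, averages the class probabilities of the base models and picks the class with the highest mean, whereas hard voting counts predicted labels. A small sketch with made-up probabilities (the numbers are illustrative, not outputs of the models above) shows that the two rules can disagree:

```python
import numpy as np

# Hypothetical [P(class 0), P(class 1)] for one sample from three models.
probas = np.array([
    [0.55, 0.45],  # model A leans slightly towards class 0
    [0.60, 0.40],  # model B leans slightly towards class 0
    [0.10, 0.90],  # model C is confident in class 1
])

# Hard voting: each model casts one vote for its most likely class.
hard_votes = probas.argmax(axis=1)
hard_winner = np.bincount(hard_votes, minlength=2).argmax()

# Soft voting: average the probabilities, then take the most likely class.
mean_proba = probas.mean(axis=0)
soft_winner = mean_proba.argmax()

print("hard:", hard_winner, "soft:", soft_winner, "mean:", mean_proba)
```

Here two lukewarm votes for class 0 win under hard voting, while the single confident model tips the averaged probabilities towards class 1 under soft voting.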

