### Support Vector Machines
#### The goal is to find the hyperplane that separates the two classes optimally.
##### The two classes are separated by a line chosen so that the margin between them is as large as possible; when such a separation is not possible, non-linear methods (the kernel trick) are used instead.
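
To make the linear versus non-linear distinction concrete, here is a minimal sketch on synthetic data. The dataset (`make_circles`) and every `*_demo` / `Xd_*` name are assumptions for illustration and are not part of this notebook's diabetes workflow. Two concentric rings cannot be split by a straight line, so a linear kernel should do poorly while an RBF kernel should separate them well.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (assumption for illustration): two concentric rings,
# which no straight line can separate.
X_demo, y_demo = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=0
)

# Same data, two kernels: a linear decision boundary vs. the RBF kernel trick.
linear_acc = SVC(kernel="linear").fit(Xd_train, yd_train).score(Xd_test, yd_test)
rbf_acc = SVC(kernel="rbf").fit(Xd_train, yd_train).score(Xd_test, yd_test)

print(f"linear kernel accuracy: {linear_acc:.3f}")
print(f"rbf kernel accuracy:    {rbf_acc:.3f}")
```

On real data such as the diabetes set used below, the gap between kernels is usually much smaller, which is why both are tried during tuning.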

In [7]:
import numpy as np
import pandas as pd 
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import confusion_matrix, accuracy_score, mean_squared_error, r2_score, roc_auc_score, roc_curve, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

In [8]:
import warnings
warnings.filterwarnings('ignore')

In [10]:
df = pd.read_csv("diabetes.csv")

In [13]:
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [14]:
y = df["Outcome"]
X = df.drop(["Outcome"], axis=1)

#### Model & Prediction

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

In [18]:
# SVC's default kernel is "rbf" (non-linear); a linear kernel is set explicitly here
svm_model = SVC(kernel="linear").fit(X_train, y_train)

In [19]:
y_pred = svm_model.predict(X_test)

In [20]:
accuracy_score(y_test, y_pred)

0.7445887445887446
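
Accuracy alone hides which kind of mistakes the classifier makes. The sketch below uses small hand-written label arrays (`y_true_demo` and `y_pred_demo` are hypothetical, not the notebook's predictions) to show how the accuracy score relates to the confusion matrix: it is the sum of the diagonal divided by the total count.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only.
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true_demo, y_pred_demo)

# Correct predictions sit on the diagonal.
acc_from_cm = np.trace(cm) / cm.sum()

print(cm)
print(acc_from_cm, accuracy_score(y_true_demo, y_pred_demo))
```

The same two calls could be applied to `y_test` and `y_pred` from the cells above to see how the errors split between false positives and false negatives.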

#### Model Tuning

In [24]:
svm = SVC()

In [25]:
# C = regularization (penalty) parameter
svm_params = {"C": np.arange(1,10),
              "kernel": ["linear", "rbf"]}

In [26]:
svm_cv_model = GridSearchCV(svm, svm_params, cv=5, n_jobs=-1, verbose=2).fit(X_train, y_train)

Fitting 5 folds for each of 18 candidates, totalling 90 fits
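
The counts in the log line above follow directly from the grid: 9 values of `C` times 2 kernels gives 18 candidates, and each is fitted once per fold. A small sketch of that arithmetic, using scikit-learn's `ParameterGrid` (the names `grid`, `n_candidates`, and `n_folds` are introduced here for illustration):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same shape of grid as the tuning cell: 9 values of C and 2 kernels.
grid = {"C": np.arange(1, 10), "kernel": ["linear", "rbf"]}

n_candidates = len(ParameterGrid(grid))  # every (C, kernel) combination
n_folds = 5                              # matches cv=5

print(n_candidates, n_candidates * n_folds)
```

With the default `refit=True`, `GridSearchCV` also trains the best candidate once more on the full training set after the cross-validation fits.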


In [27]:
svm_cv_model.best_score_

0.7839044652128765

In [28]:
svm_cv_model.best_params_

{'C': 2, 'kernel': 'linear'}

In [29]:
# Final Model

In [30]:
svm_tuned = SVC(C=2, kernel="linear").fit(X_train, y_train)
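
Refitting by hand with the best parameters, as in the cell above, is one option. Because `GridSearchCV` refits the best candidate by default, the fitted search object also exposes it as `best_estimator_`. The sketch below checks on synthetic data that both routes give the same predictions; the dataset and the names `Xs`, `ys`, `search`, and `manual` are assumptions for illustration, and the grid is smaller than the one used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), so the example runs without diabetes.csv.
Xs, ys = make_classification(n_samples=300, n_features=8, random_state=42)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    Xs, ys, test_size=0.30, random_state=42
)

# refit=True (the default) retrains the best candidate on all of Xs_train.
search = GridSearchCV(
    SVC(), {"C": [1, 2, 5], "kernel": ["linear", "rbf"]}, cv=5
).fit(Xs_train, ys_train)

# Route 1: rebuild the model by hand from best_params_.
manual = SVC(**search.best_params_).fit(Xs_train, ys_train)

# Route 2: use the estimator that the search already refitted.
same_predictions = np.array_equal(
    search.best_estimator_.predict(Xs_test), manual.predict(Xs_test)
)
print(search.best_params_, same_predictions)
```

Using `best_estimator_` avoids copying parameter values by hand, which matters once the grid or the data changes.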

In [31]:
y_pred = svm_tuned.predict(X_test)

In [32]:
accuracy_score(y_test, y_pred)

0.7445887445887446