## Breast Cancer Classification with Machine Learning

Statistics on breast cancer cases from the US state of Wisconsin are published as a dataset at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). The file used here, `breast-cancer-wisconsin.data`, holds 699 samples, each described by a set of cell measurements. The dataset contains the following attributes:

| Attribute | Domain |
| --- | --- |
| Sample code number | id number |
| Clump Thickness | 1 - 10 |
| Uniformity of Cell Size | 1 - 10 |
| Uniformity of Cell Shape | 1 - 10 |
| Marginal Adhesion | 1 - 10 |
| Single Epithelial Cell Size | 1 - 10 |
| Bare Nuclei | 1 - 10 |
| Bland Chromatin | 1 - 10 |
| Normal Nucleoli | 1 - 10 |
| Mitoses | 1 - 10 |
| Class | 2 for benign, 4 for malignant |


### Structure of the dataset

In [1]:
import pandas as pd

In [2]:
# The raw file has no header row, so the column names are assigned by hand.
dataset = pd.read_csv("../datasets/breast-cancer-wisconsin.data", header=None)
dataset.columns = ["id_number", "clump_thickness", "cell_size", "cell_shape",
                   "adhesion", "se_cell_size", "bare_nuclei", "bland_chromatin",
                   "normal_nucleoi", "mitoses", "class"]

In [3]:
dataset.head()

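| | id_number | clump_thickness | cell_size | cell_shape | adhesion | se_cell_size | bare_nuclei | bland_chromatin | normal_nucleoi | mitoses | class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1 | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2 | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3 | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4 | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |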


In [4]:
dataset.iloc[:,1:-1].head()

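| | clump_thickness | cell_size | cell_shape | adhesion | se_cell_size | bare_nuclei | bland_chromatin | normal_nucleoi | mitoses |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 |
| 1 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 |
| 2 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 |
| 3 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 |
| 4 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 |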


In [5]:
dataset["class"].head()

0    2
1    2
2    2
3    2
4    2
Name: class, dtype: int64

### Removing samples that contain missing values

In [6]:
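import numpy as np
# Missing entries in bare_nuclei are encoded as '?'; mark them as NaN.
dataset.replace({'bare_nuclei': {'?': np.nan}}, regex=False, inplace=True)
print(dataset.isnull().sum())
# Drop the incomplete rows. The column was read as strings because of the '?'
# entries, so convert it to integers to make its type explicit.
dataset = dataset.dropna()
dataset["bare_nuclei"] = dataset["bare_nuclei"].astype(int)
print(dataset.isnull().sum())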


id_number           0
clump_thickness     0
cell_size           0
cell_shape          0
adhesion            0
se_cell_size        0
bare_nuclei        16
bland_chromatin     0
normal_nucleoi      0
mitoses             0
class               0
dtype: int64
id_number          0
clump_thickness    0
cell_size          0
cell_shape         0
adhesion           0
se_cell_size       0
bare_nuclei        0
bland_chromatin    0
normal_nucleoi     0
mitoses            0
class              0
dtype: int64
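
The same cleanup can also be expressed with `pd.to_numeric`, which turns every non-numeric entry into NaN in one step. The sketch below is only an illustration under stated assumptions: the frame `toy` and its values are hypothetical and merely imitate the `?` placeholder found in `bare_nuclei`.

```python
import pandas as pd

# Hypothetical toy frame imitating the '?' placeholder in bare_nuclei.
toy = pd.DataFrame({
    "clump_thickness": [5, 3, 8, 1],
    "bare_nuclei": ["1", "?", "10", "2"],
    "class": [2, 2, 4, 2],
})

# Entries that cannot be parsed as numbers (here '?') become NaN.
toy["bare_nuclei"] = pd.to_numeric(toy["bare_nuclei"], errors="coerce")
n_missing = int(toy["bare_nuclei"].isnull().sum())

# Drop the incomplete rows, then store the column as integers again.
clean = toy.dropna().copy()
clean["bare_nuclei"] = clean["bare_nuclei"].astype(int)

print(n_missing, clean.shape)
```

Compared with replacing a single known placeholder, this variant would also catch any other unexpected non-numeric token, which may or may not be desirable depending on how strictly the input should be validated.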


### Splitting the dataset into training and test sets

In [7]:
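# Features: every column except id_number (first) and class (last).
X = dataset.iloc[:,1:-1]
# Target: 2 = benign, 4 = malignant.
y = dataset["class"]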

In [8]:
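from sklearn.model_selection import train_test_split
# With the defaults, 25% of the rows are held out for testing. The split is
# random, so the accuracy values below change from run to run.
X_train, X_test, y_train, y_test = train_test_split(X,y)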
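
Because the split above is random, a repeatable and class-balanced variant may be useful when comparing models. The following is a minimal sketch, assuming synthetic stand-in data: `X_demo`, `y_demo`, the 65/35 class ratio, and the seed `42` are all hypothetical and not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for X and y: 100 samples, 9 features scored 1-10,
# labels 2 (benign) and 4 (malignant) in a 65/35 ratio.
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 11, size=(100, 9))
y_demo = np.array([2] * 65 + [4] * 35)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.25,    # same proportion as the default
    stratify=y_demo,   # keep the class ratio similar in both subsets
    random_state=42,   # arbitrary seed so the split is repeatable
)

print(X_tr.shape, X_te.shape)
print("malignant share in test set:", (y_te == 4).mean())
```

Stratifying matters most when one class is noticeably rarer than the other, since a purely random split can then leave the test set with too few examples of the rare class.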

## Classification with KNN

In [56]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X_train,y_train)
knn_predictions = knn.predict(X_test)
print(metrics.accuracy_score(y_test,knn_predictions))

0.9883040935672515
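
The value `n_neighbors=15` is fixed by hand in the cell above. One common way to choose it is to compare a few candidates with cross-validation. The sketch below only illustrates the idea: it runs on synthetic data from `make_classification`, and the candidate list `candidate_ks` is an arbitrary assumption, so its scores say nothing about the breast cancer dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with 9 features (not the breast cancer dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)

candidate_ks = [1, 5, 15, 25]  # hypothetical candidates for n_neighbors
mean_scores = {}
for k in candidate_ks:
    # Mean accuracy over 5 cross-validation folds for this value of k.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_demo, y_demo, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best k:", best_k)
```

In the notebook itself the search would be run on `X_train` and `y_train` only, so that `X_test` stays untouched for the final evaluation.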


## Classification with Decision Trees

In [57]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

dec_tree = DecisionTreeClassifier()
dec_tree.fit(X_train,y_train)
dt_predictions = dec_tree.predict(X_test)
print(metrics.accuracy_score(y_test,dt_predictions))

0.9532163742690059
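
A decision tree built with default settings grows until its leaves are pure and breaks ties between equally good splits randomly. The sketch below shows, on synthetic data, how a depth limit and a fixed seed could be set and how the fitted tree reports which features it relied on. The data, `max_depth=3`, and `random_state=0` are assumptions chosen for illustration, not settings from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 9 features (not the breast cancer dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)

# A depth limit keeps the tree small; the seed makes the fit repeatable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_demo, y_demo)

# feature_importances_ holds one non-negative weight per input column.
importances = tree.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, weight in ranked[:3]:
    print(f"feature {index}: {weight:.3f}")
```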


## Naive Bayes Classifier

In [58]:
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

nb = GaussianNB()
nb.fit(X_train,y_train)
nb_predictions = nb.predict(X_test)
print(metrics.accuracy_score(y_test,nb_predictions))

0.9824561403508771
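
Accuracy alone does not show which kind of error a model makes, and for this task a malignant sample predicted as benign is the costlier mistake. A confusion matrix separates the two error types. The sketch below uses short hypothetical label lists (`y_true`, `y_pred`) rather than the notebook's predictions.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 2 = benign, 4 = malignant.
y_true = [2, 2, 2, 4, 4, 2, 4, 2]
y_pred = [2, 2, 4, 4, 2, 2, 4, 2]

# Rows are actual classes, columns are predicted classes, in the order [2, 4].
cm = confusion_matrix(y_true, y_pred, labels=[2, 4])
tn, fp, fn, tp = cm.ravel()

# Share of actual malignant samples that were predicted as malignant.
malignant_recall = recall_score(y_true, y_pred, pos_label=4)

print(cm)
print("missed malignant cases:", fn)
print("malignant recall:", malignant_recall)
```

In the notebook, the same calls could be applied to `y_test` together with `knn_predictions`, `dt_predictions`, or `nb_predictions`.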


In [60]:
# Print the true label next to each model's prediction, one test sample per line.
for actual, knn_p, dt_p, nb_p in zip(y_test, knn_predictions, dt_predictions, nb_predictions):
    print("Actual: ", actual, "KNN: ", knn_p,
          "DT: ", dt_p, "NB: ", nb_p)

Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  4 NB:  2
Actual:  4 KNN:  4 DT:  4 NB:  4
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  4 KNN:  4 DT:  2 NB:  4
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  4 KNN:  4 DT:  2 NB:  4
Actual:  2 KNN:  2 DT:  4 NB:  4
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  4 KNN:  4 DT:  4 NB:  4
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  4 KNN:  4 DT:  4 NB:  4
Actual:  4 KNN:  4 DT:  4 NB:  4
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  4 KNN:  4 DT:  4 NB:  4
Actual:  2 KNN:  2 DT:  2 NB:  2
Actual:  2