In [2]:
import numpy as np
import pandas as pd


About Breast Cancer Wisconsin (Diagnostic) Data Set

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
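The field numbering described above can be reproduced with a short sketch. The `_mean` and `_worst` suffixes follow the column names in the notebook's data frame output; the `_se` suffix is an assumption, because the standard-error columns are elided from the displayed output.

```python
# Ten base features, in the order listed above (a-j).
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# One suffix per statistic; "_se" is assumed (not visible in the output).
suffixes = ["_mean", "_se", "_worst"]

# Fields 1 and 2 are the ID and the diagnosis, so features start at field 3.
field_names = {}
field = 3
for suffix in suffixes:
    for name in base_features:
        field_names[field] = name + suffix
        field += 1

print(len(field_names))
print(field_names[3], field_names[13], field_names[23])
```

Each statistic occupies a block of ten consecutive fields, which is why the three radius columns sit exactly ten fields apart.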

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
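The class counts give a useful reference point for any accuracy figure reported later. As a rough sketch, the accuracy of a hypothetical classifier that always predicts the majority class (benign) on the full dataset can be computed directly from the counts above:

```python
# Class counts taken from the dataset description above.
benign, malignant = 357, 212
total = benign + malignant

# Accuracy of always predicting the more frequent class.
majority_baseline = max(benign, malignant) / total

print(total)
print(round(majority_baseline, 4))
```

A model is only informative to the extent that it beats this baseline; the exact baseline on a particular test split may differ slightly, since the split here is not stratified.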

In [3]:
df=pd.read_csv('data.csv')
df.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [4]:
new=df.drop(['id','Unnamed: 32'], axis=1)

In [5]:
new.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


Shuffle the data frame

In [6]:
from sklearn.utils import shuffle
df = shuffle(new)

Splitting the data into Training and Testing

In [11]:
df_copy = df.copy()
train_set = df_copy.sample(frac=0.90, random_state=0)
test_set = df_copy.drop(train_set.index)
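The `sample` / `drop` pattern above relies on the row index being unique: `sample` picks 90% of the rows and `drop` removes exactly those index labels, leaving the rest as the test set. The sketch below checks this on a stand-in frame (`toy`, a hypothetical name) with the same number of rows as the dataset (357 + 212 = 569).

```python
import numpy as np
import pandas as pd

# Stand-in frame with as many rows as the dataset; the values are irrelevant.
toy = pd.DataFrame({"x": np.arange(569)})

# Same calls as in the cell above.
train = toy.sample(frac=0.90, random_state=0)
test = toy.drop(train.index)

print(len(train), len(test))
# The two index sets should not overlap.
print(train.index.intersection(test.index).empty)
```

Note that `random_state=0` only fixes which row positions are sampled. Because the earlier `shuffle(new)` call is unseeded, the rows that land in each set can still change between runs; passing a `random_state` to `shuffle` as well would make the split repeatable.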

In [12]:
#train_set.head()
test_set.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
25,M,17.14,16.4,116.0,912.7,0.1186,0.2276,0.2229,0.1401,0.304,...,22.25,21.4,152.4,1461.0,0.1545,0.3949,0.3853,0.255,0.4066,0.1059
106,B,11.64,18.33,75.17,412.5,0.1142,0.1017,0.0707,0.03485,0.1801,...,13.14,29.26,85.51,521.7,0.1688,0.266,0.2873,0.1218,0.2806,0.09097
457,B,13.21,25.25,84.1,537.9,0.08791,0.05205,0.02772,0.02068,0.1619,...,14.35,34.23,91.29,632.9,0.1289,0.1063,0.139,0.06005,0.2444,0.06788
244,M,19.4,23.5,129.1,1155.0,0.1027,0.1558,0.2049,0.08886,0.1978,...,21.65,30.53,144.9,1417.0,0.1463,0.2968,0.3458,0.1564,0.292,0.07614
514,M,15.05,19.07,97.26,701.9,0.09215,0.08597,0.07486,0.04335,0.1561,...,17.58,28.06,113.8,967.0,0.1246,0.2101,0.2866,0.112,0.2282,0.06954


Feature and target split

In [13]:
train_set_labels = train_set.pop('diagnosis')
test_set_labels = test_set.pop('diagnosis')

In [14]:
train_set

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
461,27.420,26.27,186.90,2501.0,0.10840,0.19880,0.363500,0.168900,0.2061,0.05623,...,36.040,31.37,251.20,4254.0,0.13570,0.42560,0.68330,0.26250,0.2641,0.07427
199,14.450,20.22,94.49,642.7,0.09872,0.12060,0.118000,0.059800,0.1950,0.06466,...,18.330,30.12,117.90,1044.0,0.15520,0.40560,0.49670,0.18380,0.4753,0.10130
559,11.510,23.93,74.52,403.5,0.09261,0.10210,0.111200,0.041050,0.1388,0.06570,...,12.480,37.16,82.28,474.2,0.12980,0.25170,0.36300,0.09653,0.2112,0.08732
524,9.847,15.68,63.00,293.2,0.09492,0.08419,0.023300,0.024160,0.1387,0.06891,...,11.240,22.99,74.32,376.5,0.14190,0.22430,0.08434,0.06528,0.2502,0.09209
556,10.160,19.59,64.73,311.7,0.10030,0.07504,0.005025,0.011160,0.1791,0.06331,...,10.650,22.88,67.88,347.3,0.12650,0.12000,0.01005,0.02232,0.2262,0.06742
142,11.430,17.31,73.66,398.0,0.10920,0.09486,0.020310,0.018610,0.1645,0.06562,...,12.780,26.76,82.66,503.0,0.14130,0.17920,0.07708,0.06402,0.2584,0.08096
485,12.450,16.41,82.85,476.7,0.09514,0.15110,0.154400,0.048460,0.2082,0.07325,...,13.780,21.03,97.82,580.6,0.11750,0.40610,0.48960,0.13420,0.3231,0.10340
234,9.567,15.91,60.21,279.6,0.08464,0.04087,0.016520,0.016670,0.1551,0.06403,...,10.510,19.16,65.74,335.9,0.15040,0.09515,0.07161,0.07222,0.2757,0.08178
100,13.610,24.98,88.05,582.7,0.09488,0.08511,0.086250,0.044890,0.1609,0.05871,...,16.990,35.27,108.60,906.5,0.12650,0.19430,0.31690,0.11840,0.2651,0.07397
137,11.430,15.39,73.06,399.8,0.09639,0.06889,0.035030,0.028750,0.1734,0.05865,...,12.320,22.02,79.93,462.0,0.11900,0.16480,0.13990,0.08476,0.2676,0.06765


In [15]:
#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(train_set)
X_test = sc.transform(test_set)
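`StandardScaler` learns a per-column mean and standard deviation from the training data (`fit_transform`) and reuses those same statistics on the test data (`transform`), so no information from the test set leaks into the scaling. A minimal NumPy sketch of the same arithmetic, on made-up numbers:

```python
import numpy as np

# Toy data: two features on very different scales (values are illustrative).
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# Statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the training mean and std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training columns have zero mean and unit variance; the test columns generally do not, which is expected.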

In [16]:
#one hot encoding on y
#from keras.utils import to_categorical
#Y_train = to_categorical(train_set_labels)
#Y_test = to_categorical(test_set_labels)
#print(Y_train)

# One-hot encode the diagnosis labels; get_dummies yields one column per class (B, M).
Y_train = pd.get_dummies(train_set_labels)
Y_test = pd.get_dummies(test_set_labels)
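To make the encoding concrete, here is a small sketch on four made-up labels. `pd.get_dummies` orders the columns alphabetically, so column `B` comes first and column `M` second, which matches the two softmax outputs of the network. The `reindex` step is an optional safeguard, not part of the original notebook, for the case where one class happens to be missing from a small split.

```python
import pandas as pd

# Four illustrative labels, not taken from the dataset.
labels = pd.Series(["M", "B", "B", "M"], name="diagnosis")

# One column per class; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(labels, dtype=int)

# Guarantee both columns exist, in a fixed order, even if a class is absent.
one_hot = one_hot.reindex(columns=["B", "M"], fill_value=0)

print(one_hot)
```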


In [24]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import maxnorm

In [31]:
model = Sequential()
model.add(Dropout(0.2, input_shape=(30,)))
model.add(Dense(30, activation='relu',kernel_initializer='normal', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(15, activation='relu',kernel_initializer='normal', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(2, activation='softmax',kernel_initializer='normal'))
# Compiling the ANN
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
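For a sense of the model's size, the number of trainable parameters can be worked out by hand: each `Dense` layer has one weight per input-output pair plus one bias per output, and `Dropout` layers add no parameters. A short sketch of that arithmetic for the 30-30-15-2 architecture above:

```python
# Input width followed by the width of each Dense layer.
layer_sizes = [30, 30, 15, 2]

# weights (n_in * n_out) + biases (n_out) for each Dense layer.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)
print(total_params)
```

This should agree with the total reported by `model.summary()`.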

In [32]:
from keras.callbacks import EarlyStopping, ModelCheckpoint
filepath='weights.best.hdf5'
keras_callbacks=[
    EarlyStopping(monitor='val_loss',patience=25),
    ModelCheckpoint(filepath,monitor='val_loss',save_best_only=True)
]
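The two callbacks work together: `EarlyStopping` halts training once `val_loss` has gone `patience` epochs without improving, and `ModelCheckpoint` with `save_best_only=True` keeps the weights from the best epoch seen so far. The following is a simplified pure-Python sketch of that stopping rule, not Keras's actual implementation; `early_stopping_epoch` is a hypothetical helper and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a list of losses."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it (this is where a checkpoint is saved).
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop.
    return len(val_losses), best_epoch


# Illustrative validation losses: best at epoch 3, then no improvement.
losses = [0.5, 0.4, 0.3, 0.35, 0.36, 0.37]
print(early_stopping_epoch(losses, patience=3))
```

Under this rule, training stops `patience` epochs after the best epoch, and the checkpoint file holds the weights from that best epoch rather than from the last one.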

In [33]:
model.fit(X_train, Y_train, epochs=300, validation_split=0.2, batch_size=50, callbacks=keras_callbacks)

Train on 409 samples, validate on 103 samples
Epoch 1/300
...
Epoch 101/300


<keras.callbacks.History at 0x7f7e939590f0>
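The "Train on 409 samples, validate on 103 samples" line follows from `validation_split=0.2` applied to the 512 training rows: Keras holds out the last 20% of the arrays it is given for validation. The sketch below reproduces the counts, assuming the split point is the integer part of `n_samples * (1 - validation_split)`, and also derives the number of batches per epoch for `batch_size=50`.

```python
import math

n_samples = 512          # rows in X_train (409 + 103 in the log above)
validation_split = 0.2
batch_size = 50

# Assumed rule: everything before split_at trains, the rest validates.
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at

# The last batch of an epoch is smaller when n_train is not a multiple of 50.
batches_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, batches_per_epoch)
```

Since training ended at epoch 101 of 300, the `EarlyStopping` callback presumably triggered; with `patience=25` that would place the lowest validation loss around epoch 76.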

In [34]:
from keras.models import load_model
saved_model = load_model('weights.best.hdf5')
score = saved_model.evaluate(X_test, Y_test, verbose=0)
print(score)

[0.24103429647195235, 0.9649122807017544]
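`evaluate` returns the loss followed by the metrics passed to `compile`, so the list above is [test loss, test accuracy]. With 569 rows in total and 512 used for training, the test set has 57 rows, and the accuracy can be converted back into a count of correct predictions:

```python
n_test = 569 - 512                 # rows left after the 90% training sample
accuracy = 0.9649122807017544      # second value printed above

# Accuracy is correct / total, so invert it to recover the count.
n_correct = round(accuracy * n_test)
n_wrong = n_test - n_correct

print(n_test, n_correct, n_wrong)
```

On a test set this small, each misclassified sample moves the accuracy by about 1.75 percentage points, so the figure is a coarse estimate and will vary with the (unseeded) shuffle.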
