In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets

# Get the data


In [None]:
cancer = datasets.load_breast_cancer()

In [None]:
print(cancer.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, f

## Selecting inputs and outputs
In this example we use all 30 features as inputs. The output is the diagnosis class stored in `cancer.target`.

In [None]:
cancer_X=cancer.data

In [None]:
cancer_X.shape

(569, 30)

In [None]:
cancer_y=cancer.target

In [None]:
cancer_y.shape

(569,)
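Before training, it can help to check how the 569 samples are distributed between the two classes, since that affects how the later metrics should be read. The following is a minimal sketch that assumes the label names in `cancer.target_names` are listed in the same order as the integer labels 0 and 1; the variable `counts` is introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Count how many samples carry each integer label (0 and 1).
counts = np.bincount(cancer.target)

# Pair each count with its human-readable class name.
for name, count in zip(cancer.target_names, counts):
    print(f"{name}: {count}")
```

The classes are not perfectly balanced, so the per-class precision and recall are worth reading alongside the overall accuracy.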

## Import the model

In [None]:
from sklearn.svm import SVC

## Set the model parameters

In [None]:
model = SVC()

## Split the data

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Hold out 30% of the samples for testing; random_state makes the split reproducible.
cancer_X_train, cancer_X_test, cancer_y_train, cancer_y_test = train_test_split(
    cancer_X, cancer_y, test_size=0.30, random_state=42
)

In [None]:
print(cancer_X_train.shape)
print(cancer_y_train.shape)

(398, 30)
(398,)


In [None]:
print(cancer_X_test.shape)
print(cancer_y_test.shape)

(171, 30)
(171,)


## Fit the model

In [None]:
model.fit(cancer_X_train,cancer_y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

## Predict results

In [None]:
cancer_y_pred=model.predict(cancer_X_test) 

The points can be plotted to see how the classes should separate **(dimensionality reduction)**.
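One possible way to draw such a plot is to project the 30 features onto two principal components. This is only a sketch: the choice of PCA, the standardization step, and the names `X_scaled` and `X_2d` are assumptions made for illustration, not part of the original notebook.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()

# Standardize first so that large-valued features (such as area)
# do not dominate the projection.
X_scaled = StandardScaler().fit_transform(cancer.data)

# Reduce the 30 features to 2 components for plotting.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# One scatter series per class, labeled with the class name.
fig, ax = plt.subplots()
for label, name in enumerate(cancer.target_names):
    mask = cancer.target == label
    ax.scatter(X_2d[mask, 0], X_2d[mask, 1], s=10, label=name)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
ax.legend()
```

In a notebook the figure is rendered at the end of the cell; in a plain script, call `plt.show()` afterwards.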

## Evaluate the model

In [None]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

Accuracy is commonly used as a classification metric, and it is a reasonable starting point here.

In [None]:
# accuracy_score expects (y_true, y_pred); the value is the same either way.
print(accuracy_score(cancer_y_test, cancer_y_pred))

0.935672514619883


In [None]:
print(confusion_matrix(cancer_y_test,cancer_y_pred))

[[ 52  11]
 [  0 108]]


In [None]:
print(classification_report(cancer_y_test,cancer_y_pred))

              precision    recall  f1-score   support

           0       1.00      0.83      0.90        63
           1       0.91      1.00      0.95       108

    accuracy                           0.94       171
   macro avg       0.95      0.91      0.93       171
weighted avg       0.94      0.94      0.93       171



After reviewing the confusion matrix and the classification report, we see that all the errors fall on one side: 11 of the 63 samples of class 0 are classified as class 1 (recall 0.83), while every sample of class 1 is classified correctly.
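The figures in the classification report can be recomputed by hand from the confusion matrix, which makes it easier to see where the errors are. The sketch below copies the matrix printed above; the helper `precision_recall` is a hypothetical name introduced for illustration.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes.
cm = [[52, 11],
      [0, 108]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total


def precision_recall(cm, k):
    """Return (precision, recall) for class k of a 2x2 confusion matrix."""
    true_pos = cm[k][k]
    predicted_k = cm[0][k] + cm[1][k]  # column sum: everything predicted as k
    actual_k = cm[k][0] + cm[k][1]     # row sum: everything that really is k
    return true_pos / predicted_k, true_pos / actual_k


for k in (0, 1):
    p, r = precision_recall(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

The 11 off-diagonal samples in the first row are what lower the recall of class 0 and the precision of class 1.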

## Tune the model

Every algorithm must be tuned to the specific problem it is meant to solve; it is highly unlikely that a machine learning algorithm will work perfectly without tuning.

This tuning process is iterative in most cases. To prepare the algorithm, we need to select suitable hyperparameters.

A hyperparameter is a value fixed before training that affects the performance of the algorithm.

For a support vector machine, the two main hyperparameters are **C** and **gamma**.
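To get a feel for what **gamma** does in the RBF kernel used by `SVC` by default, the kernel value can be computed directly as exp(-gamma * squared distance). The sketch below uses two hypothetical points that are 10 units apart; the function `rbf_kernel` and the points are illustrative assumptions, not data from the notebook.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical points whose squared distance is 6**2 + 8**2 = 100.
x, y = [0.0, 0.0], [6.0, 8.0]

for gamma in [1, 0.1, 0.01, 0.001, 0.0001]:
    print(f"gamma={gamma}: kernel={rbf_kernel(x, y, gamma):.3g}")
```

With a large gamma the kernel is practically zero for points that are far apart, so each training sample only influences its immediate neighborhood. On unscaled features with large distances, a small gamma is therefore more likely to give useful similarities.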


One of the simplest ways to find suitable hyperparameter values is a systematic search: evaluate every combination of the candidate values and keep the best one. This method is called grid search (an exhaustive search).

For a grid search, we build a dictionary with the candidate values for each hyperparameter.

In [None]:
param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001]} 
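To make "all combinations" concrete, the grid can be expanded by hand with `itertools.product`. This is only an illustration of the idea, not how `GridSearchCV` is implemented internally; the names `candidates` and `n_folds` are introduced here, and the value of 5 folds is an assumption about the cross-validation setting.

```python
from itertools import product

param_grid = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}

# Build one dictionary per combination of hyperparameter values.
names = list(param_grid)
candidates = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

n_folds = 5  # assumed number of cross-validation folds

print(len(candidates))            # number of hyperparameter combinations
print(len(candidates) * n_folds)  # number of model fits during the search
print(candidates[0])              # first combination tried
```

Five values of `C` times five values of `gamma` give 25 candidates, and each one is fitted once per fold.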

Grid search is used in much the same way as the machine learning models in **sklearn**.

### Import the model

In [None]:
from sklearn.model_selection import GridSearchCV

### Set the model parameters

In [None]:
grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)

### Fit the model

In [None]:
grid.fit(cancer_X_train,cancer_y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.625, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.625, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.625, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.633, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.620, total=   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] .................... C=0.1, gamma=0.1, score=0.625, total=   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] ..........

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV] .................. C=0.1, gamma=0.001, score=0.625, total=   0.0s
[CV] C=0.1, gamma=0.001 ..............................................
[CV] .................. C=0.1, gamma=0.001, score=0.625, total=   0.0s
[CV] C=0.1, gamma=0.001 ..............................................
[CV] .................. C=0.1, gamma=0.001, score=0.633, total=   0.0s
[CV] C=0.1, gamma=0.001 ..............................................
[CV] .................. C=0.1, gamma=0.001, score=0.620, total=   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] ................. C=0.1, gamma=0.0001, score=0.938, total=   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] ................. C=0.1, gamma=0.0001, score=0.887, total=   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] ................. C=0.1, gamma=0.0001, score=0.938, total=   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] .

[Parallel(n_jobs=1)]: Done 125 out of 125 | elapsed:    1.4s finished


GridSearchCV(cv=None, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=3)

In [None]:
grid.best_params_

{'C': 100, 'gamma': 0.0001}

In [None]:
grid.best_estimator_

SVC(C=100, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

### Predict results

In [None]:
cancer_grid_predictions = grid.predict(cancer_X_test)

### Evaluate the model

In [None]:
# accuracy_score expects (y_true, y_pred); the value is the same either way.
print(accuracy_score(cancer_y_test, cancer_grid_predictions))

0.9532163742690059


In [None]:
print(confusion_matrix(cancer_y_test,cancer_grid_predictions))

[[ 57   6]
 [  2 106]]


In [None]:
print(classification_report(cancer_y_test,cancer_grid_predictions))

             precision    recall  f1-score   support

          0       0.97      0.94      0.95        63
          1       0.96      0.98      0.97       108

avg / total       0.96      0.96      0.96       171

