## Hello World Program With Scikit-Learn

[repo](https://github.com/parthvadhadiya/hello-world-program-in-Scikit-Learn),
[article](https://medium.com/@parthvadhadiya424/hello-world-program-with-scikit-learn-a869beb55deb)

The following is the model-building pipeline in scikit-learn:

### Import -> Initialize -> Fit(Train) -> Predict
<br>
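
As a minimal sketch of those four steps, the example below runs the whole pipeline on a tiny hypothetical dataset. The toy values and the names `X_toy`, `y_toy`, and `model` are assumptions made for illustration; they are not part of the original notebook.

```python
# 1. Import
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature, two well-separated classes
X_toy = np.array([[0.0], [0.1], [0.9], [1.0]])
y_toy = np.array(['a', 'a', 'b', 'b'])

# 2. Initialize
model = KNeighborsClassifier(n_neighbors=1)

# 3. Fit (train)
model.fit(X_toy, y_toy)

# 4. Predict on two unseen points
pred_toy = model.predict(np.array([[0.05], [0.95]]))
print(pred_toy)
```

Every scikit-learn estimator used later in this notebook follows the same initialize, `fit`, `predict` (or `transform`) shape.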

In [1]:
import numpy as np

### Loading Data-Set

First we create a simple matrix of random values in NumPy:

In [7]:
# Create 10 samples with 5 features each
X = np.random.random((10, 5))

# Shape of X
print("shape:", X.shape)
print("\nrandom array:", X)

shape: (10, 5)

random array: [[0.73300596 0.95212403 0.9102191  0.95025522 0.4545641 ]
 [0.63314637 0.04661544 0.75931578 0.98142014 0.9008036 ]
 [0.46639179 0.95367503 0.84668171 0.46853314 0.41802841]
 [0.24934605 0.50697892 0.83727587 0.3417669  0.34389248]
 [0.78316128 0.6964057  0.3606834  0.81410236 0.8713395 ]
 [0.42016431 0.13957179 0.0089049  0.21873519 0.79670351]
 [0.82632631 0.87254799 0.41272359 0.39557873 0.59178659]
 [0.27427864 0.72429084 0.35459739 0.21412522 0.04451131]
 [0.02992162 0.82528768 0.16373909 0.80278509 0.59682682]
 [0.08112299 0.46600549 0.67246546 0.13758648 0.18061462]]


[Sparse Matrix Representation in Python](https://www.kdnuggets.com/2020/05/sparse-matrix-representation-python.html)

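Then we zero out every element below 0.7, which removes most of the values and makes the matrix sparse ("simplified"). Note that `X` is still an ordinary dense NumPy array; it is sparse only in the sense that most of its entries are zero: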

In [11]:
X[X < 0.7] = 0
X

array([[0.73300596, 0.95212403, 0.9102191 , 0.95025522, 0.        ],
       [0.        , 0.        , 0.75931578, 0.98142014, 0.9008036 ],
       [0.        , 0.95367503, 0.84668171, 0.        , 0.        ],
       [0.        , 0.        , 0.83727587, 0.        , 0.        ],
       [0.78316128, 0.        , 0.        , 0.81410236, 0.8713395 ],
       [0.        , 0.        , 0.        , 0.        , 0.79670351],
       [0.82632631, 0.87254799, 0.        , 0.        , 0.        ],
       [0.        , 0.72429084, 0.        , 0.        , 0.        ],
       [0.        , 0.82528768, 0.        , 0.80278509, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ]])
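
If the goal is an actual sparse data structure, one option is SciPy's compressed sparse row format, which stores only the non-zero entries. The sketch below is an assumption about how that could look and is not part of the original notebook; it regenerates its own random matrix with a fixed seed, so the values differ from the output above.

```python
import numpy as np
from scipy import sparse

# Rebuild a matrix like the one above (seeded, so values differ from the notebook)
rng = np.random.default_rng(0)
X_dense = rng.random((10, 5))
X_dense[X_dense < 0.7] = 0

# Convert the mostly-zero dense array into compressed sparse row (CSR) format
X_sparse = sparse.csr_matrix(X_dense)

print("shape:", X_sparse.shape)
print("stored non-zero values:", X_sparse.nnz)
print("density:", X_sparse.nnz / X_dense.size)
```

Many scikit-learn estimators, including `SVC` and `KNeighborsClassifier`, accept such a matrix in `fit` directly.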

In [13]:
# Create the corresponding labels (M, F), one per sample
y = np.array(['M','F','M','M','M','F','M','F','M','F'])

# Shape of y
print("shape:", y.shape)
y

shape: (10,)


array(['M', 'F', 'M', 'M', 'M', 'F', 'M', 'F', 'M', 'F'], dtype='<U1')

### Split Data-Set into Train Set And Test Set

In most ML or DL programs we split the dataset into a **training** set and a **test** set so that we can evaluate our model.

What is **evaluation**? Measuring how well our model performs after training, using data it has not seen.

So we...

1. **Split** our data set
2. Perform **training** with `training-data` set
3. Check **accuracy** of our model using `testing-data` set

For splitting, scikit-learn provides a handy function called `train_test_split()` in the `sklearn.model_selection` module.

In [14]:
from sklearn.model_selection import train_test_split

In [18]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print("X_train:", X_train.shape)
print("X_test:", X_test.shape)
print("y_train:", y_train.shape)
print("y_test:", y_test.shape)

print("\nX_train:", X_train)
print("\nX_test:", X_test)
print("\ny_train:", y_train)
print("\ny_test:", y_test)

X_train: (7, 5)
X_test: (3, 5)
y_train: (7,)
y_test: (3,)

X_train: [[0.         0.         0.         0.         0.        ]
 [0.         0.         0.75931578 0.98142014 0.9008036 ]
 [0.82632631 0.87254799 0.         0.         0.        ]
 [0.         0.72429084 0.         0.         0.        ]
 [0.         0.         0.83727587 0.         0.        ]
 [0.73300596 0.95212403 0.9102191  0.95025522 0.        ]
 [0.         0.         0.         0.         0.79670351]]

X_test: [[0.         0.95367503 0.84668171 0.         0.        ]
 [0.         0.82528768 0.         0.80278509 0.        ]
 [0.78316128 0.         0.         0.81410236 0.8713395 ]]

y_train: ['F' 'F' 'M' 'F' 'M' 'M' 'F']

y_test: ['M' 'M' 'M']
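
The split above happens to place only `'M'` labels in the test set, so the test set says nothing about how the model handles `'F'`. One possible remedy, sketched here under the assumption that a stratified split is acceptable, is the `stratify` parameter of `train_test_split()`, which keeps the class proportions roughly equal in both parts. The feature matrix is regenerated with a fixed seed and the `_s` suffixed names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in features (seeded) with the same labels as in the notebook
rng = np.random.default_rng(0)
X_demo = rng.random((10, 5))
y_demo = np.array(['M', 'F', 'M', 'M', 'M', 'F', 'M', 'F', 'M', 'F'])

# test_size=3 asks for exactly 3 test samples; stratify=y_demo preserves the M/F ratio
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_demo, y_demo, test_size=3, stratify=y_demo, random_state=0
)

print("train:", X_train_s.shape, "test:", X_test_s.shape)
print("classes in test set:", sorted(set(y_test_s)))
```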


### Initialize Models

In [20]:
# Support vector machines (svm)
from sklearn.svm import SVC
svc = SVC(kernel='linear')

In [21]:
# KNN
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=4)

In [22]:
# Naive Bayes 
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()

In [23]:
# Principal Component Analysis (PCA) 
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)

In [24]:
# K-Means 
from sklearn.cluster import KMeans
k_means = KMeans(n_clusters=4, random_state=0)

### Fit Data Into Models

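Now fit each of these algorithms to the training data. The classifiers (SVC, KNN, Naive Bayes) take both the features and the labels; K-Means and PCA are unsupervised and take only the features.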

In [25]:
svc.fit(X_train, y_train)

SVC(kernel='linear')

In [26]:
knn.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=4)

In [27]:
gnb.fit(X_train, y_train)

GaussianNB()

In [28]:
k_means.fit(X_train)


KMeans(n_clusters=4, random_state=0)

In [29]:
# fit_transform() fits PCA and returns the projected training data (not a model object)
pca_model = pca.fit_transform(X_train)
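
When `n_components` is a float between 0 and 1, as in `PCA(n_components=0.95)`, scikit-learn keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below illustrates this on a seeded random matrix, which is an assumption standing in for `X_train`; the names `X_demo_pca` and `X_reduced` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for X_train: 7 samples, 5 features (seeded random values)
rng = np.random.default_rng(0)
X_demo_pca = rng.random((7, 5))

# Keep as many components as needed to explain at least 95% of the variance
pca_demo = PCA(n_components=0.95)
X_reduced = pca_demo.fit_transform(X_demo_pca)

print("components kept:", pca_demo.n_components_)
print("cumulative explained variance:", pca_demo.explained_variance_ratio_.cumsum())
print("reduced shape:", X_reduced.shape)
```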


### Prediction

In [30]:
pred = svc.predict(X_test)
print(pred)

['M' 'F' 'F']


In [34]:
# Silence FutureWarning messages so they do not clutter the output
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

pred = knn.predict(X_test)
print(pred)

['M' 'F' 'F']


In [32]:
pred = gnb.predict(X_test)
print(pred)

['M' 'M' 'F']


In [35]:
pred = k_means.predict(X_test)
print(pred)

[1 1 0]
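
To complete the "check accuracy" step, the classifier predictions can be compared against `y_test`. The sketch below copies the printed outputs from this notebook into plain lists and scores them with `accuracy_score` from `sklearn.metrics`; the dictionary names are hypothetical. K-Means is left out because its output consists of cluster indices, not `'M'`/`'F'` labels, so it cannot be compared with `y_test` directly.

```python
from sklearn.metrics import accuracy_score

# True test labels and predictions, copied from the outputs shown in this notebook
y_true = ['M', 'M', 'M']
predictions = {
    'SVC': ['M', 'F', 'F'],
    'KNN': ['M', 'F', 'F'],
    'GaussianNB': ['M', 'M', 'F'],
}

# Accuracy = fraction of test samples whose predicted label matches the true label
scores = {name: accuracy_score(y_true, pred) for name, pred in predictions.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

With only three test samples drawn from random data, these scores (0.33, 0.33, and 0.67) are not meaningful as a comparison between the algorithms; they only show the mechanics of evaluation.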


# Dog vs cat classification

* 3 images of dogs
* 3 images of cats
* 2 pixel values (x, y)

See: Hello_World.py