# Breast Cancer Detection Using Machine Learning

# Libraries for quick reference
### Pandas : 
1. Used for data manipulation.
2. Some commonly used pandas functions: read_csv(), DataFrame(), rank(), sort_values(), etc.
3. Refer to the pandas cheat sheet: https://www.datacamp.com/cheat-sheet/pandas-cheat-sheet-for-data-science-in-python
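To make these pandas calls concrete, here is a minimal sketch on a small hand-made DataFrame. The column names (`diagnosis`, `radius`, `texture`) and the values are hypothetical stand-ins, not taken from breast_cancer_detection.csv. It contrasts position-based `iloc` with label-based `loc` and shows `sort_values()` and `rank()`.

```python
import pandas as pd

# Hypothetical mini-table standing in for the CSV: target first, then features.
df = pd.DataFrame(
    {
        "diagnosis": [1, 0, 1, 0],
        "radius": [17.99, 13.54, 20.57, 8.20],
        "texture": [10.38, 14.36, 17.77, 16.84],
    },
    index=["a", "b", "c", "d"],
)

# iloc selects by integer position: all rows, every column from position 1 onward.
X = df.iloc[:, 1:]
# loc selects by label: the same columns, named explicitly.
X_by_label = df.loc[:, ["radius", "texture"]]
print(X.equals(X_by_label))        # True

# iloc[:, 0] returns the first column (the target) as a Series.
y = df.iloc[:, 0]

# sort_values orders rows by a column; rank gives each value its position (1 = smallest).
by_radius = df.sort_values("radius")
radius_rank = df["radius"].rank()
print(list(by_radius.index))       # ['d', 'b', 'a', 'c']
print(radius_rank.tolist())        # [3.0, 2.0, 4.0, 1.0]
```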

### Scikit-learn :
1. Scikit-learn (sklearn): a useful and robust library for machine learning.
2. It provides a selection of machine learning algorithms for classification, regression, clustering and dimensionality reduction. To use one, you import its estimator class (for example, LogisticRegression) and call its built-in methods such as fit() and predict().
3. Refer to the scikit-learn user guide: https://scikit-learn.org/stable/user_guide.html
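Every scikit-learn estimator follows the same pattern: create it, `fit()` it on data, then `predict()` with it. The sketch below runs that pattern on a hypothetical one-feature toy dataset (not the notebook's data) and then shows what `accuracy_score` computes.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy data: one feature; the label switches from 0 to 1 as it grows.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()            # 1. create the estimator
model.fit(X, y)                         # 2. learn the coefficients from the data
preds = model.predict([[1.5], [8.5]])   # 3. predict labels for new rows
print(preds)

# accuracy_score = (number of matching labels) / (total number of labels).
acc = accuracy_score([0, 1, 1, 0], [0, 1, 0, 0])
print(acc)                              # 3 of 4 match -> 0.75
```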

### Pickle :
1. Pickle: serializes and deserializes Python object structures. Here it is used for deployment: the trained model is saved to a file so the web app can load it later.
2. It converts a Python object into a byte stream so it can be stored in a file or database, kept across program sessions, or sent over a network.
3. Refer to the Python documentation for the functions used: https://docs.python.org/3/library/pickle.html
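A minimal pickle round trip, assuming a plain dict in place of the trained model and a hypothetical file name `classifier.pkl` in a temporary folder. It uses `with` blocks so the files are closed automatically.

```python
import os
import pickle
import tempfile

# Any picklable object works; a dict stands in for the trained model here.
obj = {"model_name": "LogisticRegression", "max_iter": 800}

# In-memory round trip: object -> bytes -> object.
data = pickle.dumps(obj)
restored = pickle.loads(data)

# File round trip (hypothetical file name in a temporary directory).
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(path, "wb") as f:      # "wb": write, binary mode
    pickle.dump(obj, f)
with open(path, "rb") as f:      # "rb": read, binary mode
    from_file = pickle.load(f)

print(restored == obj, from_file == obj)   # True True
```

The Python documentation warns that unpickling can execute arbitrary code, so only load pickle files from a source you trust.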

In [2]:
# Importing the libraries
import pandas as pd                                          # Import the pandas library under its conventional alias pd
from sklearn.linear_model import LogisticRegression          # From scikit-learn's linear_model module, import the LogisticRegression class
                                                             # LogisticRegression is the ML algorithm: all the mathematics of fitting it is packaged in this class, so we only need to use it
from sklearn.metrics import accuracy_score                   # From scikit-learn's metrics module, import the accuracy_score function
import pickle                                                # Standard-library module used to save (serialize) the trained model

# Reading the data with pandas
df = pd.read_csv('breast_cancer_detection.csv')              # pandas has a reader per file type (read_csv for .csv, read_excel for .xlsx, etc.)
                                                             # The data is stored in a DataFrame, a table of rows and columns

X = df.iloc[:, 1:]                                           # iloc selects rows and columns from a DataFrame by integer position
                                                             # X holds the independent variables (features): all rows, every column from position 1 onward
                                                             # REMEMBER: iloc is position-based, loc is label-based; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

y = df.iloc[:, 0]                                            # Syntax: [rows, columns], i.e. all rows and the column at position 0
                                                             # y is the dependent/target variable (the value to be predicted)

model = LogisticRegression(max_iter=800)                     # Logistic regression is a binary classifier (it predicts 0 or 1 here)
                                                             # max_iter=800 raises the solver's iteration limit (default 100) so it has room to converge
                                                             # model is the classifier object; it is not trained yet

model.fit(X, y)                                              # Train the model: learn the coefficients that best map the features X to the target y

predictions = model.predict(X)                               # Predicted class (0 or 1) for every row of X
print(accuracy_score(y, predictions))                        # Fraction of rows where the prediction matches the actual label (between 0 and 1)
                                                             # NOTE: this is scored on the same rows used for training, so it overstates accuracy on new data


with open('classifier', mode='wb') as pickle_out:            # Create (or overwrite) the file 'classifier' in write-binary mode; `with` closes it automatically
    pickle.dump(model, pickle_out)                           # Serialize the trained model into a byte stream and write it to the file

0.8910369068541301
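The 0.891 above is training accuracy: the model is scored on the same rows it learned from, which tends to overstate how it will do on new patients. A common remedy is to hold out a test set with `train_test_split`. The sketch below assumes scikit-learn's bundled Wisconsin breast cancer dataset (`load_breast_cancer`, 30 features) as a stand-in, since breast_cancer_detection.csv is not available here; with the notebook's own `X` and `y`, the split and scoring lines would be the same. In this stand-in dataset label 0 means malignant and 1 means benign, which may not match the CSV's encoding.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset,
# not the notebook's breast_cancer_detection.csv.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Hold out 25% of the rows; stratify keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# max_iter is raised to give the solver room to converge on 30 unscaled features.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)                 # learn only from the training rows

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))   # rows the model never saw
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

The test accuracy is the more honest estimate of how the model would behave in the app.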


In [3]:
%%writefile app.py

import pickle
import streamlit as st

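# Load the trained model once and reuse it across reruns of the script
# (st.cache_resource is the caching decorator for models in Streamlit 1.18+)
@st.cache_resource
def load_classifier():
    with open('classifier', 'rb') as pickle_in:
        return pickle.load(pickle_in)

classifier = load_classifier()
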
# Define the function which will make the prediction using data
# inputs from users
def prediction(radius, texture, smoothness, concave_points, symmetry):
    
    # predict() expects a 2-D input (one row per sample) and returns an array of labels;
    # [0] takes the single label for this user
    label = classifier.predict(
        [[radius, texture, smoothness, concave_points, symmetry]])[0]
    
    if label == 0:
        pred = 'Congratulations! The model predicts you do NOT have breast cancer.'
    else:
        pred = 'PLEASE SEE A DOCTOR! The model predicts you are likely to have breast cancer.'
    return pred

# This is the main function in which we define our webpage
def main():
    
    # Create input fields
    radius = st.number_input("Radius of detected growth",
                             min_value=0.000,
                             max_value=5.000,
                             value=0.000,
                             step=0.005)

    texture = st.number_input("Measure of growth texture",
                              min_value=10.00,
                              max_value=60.00,
                              value=10.00,
                              step=5.00)

    smoothness = st.number_input("Measure of growth smoothness",
                                 min_value=0.050,
                                 max_value=1.000,
                                 value=0.050,
                                 step=0.002)

    concave_points = st.number_input("Number of concave indentations observed along the border of the growth",
                                     min_value=0.000,
                                     max_value=0.400,
                                     value=0.000,
                                     step=0.001)

    symmetry = st.number_input("Measure of how symmetric the growth is",
                               min_value=0.000,
                               max_value=2.000,
                               value=0.000,
                               step=0.001)

    result = ""
    
    # When 'Predict' is clicked, make the prediction and store it
    if st.button("Predict"):
        result = prediction(radius, texture, smoothness, concave_points, symmetry)
        st.success(result)
        
if __name__=='__main__':
    main()
    

Overwriting app.py
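The prediction step in app.py can be exercised without Streamlit. The sketch below is a hypothetical stand-in: the classifier is trained on random numbers (not the notebook's model) and round-tripped through pickle the way app.py loads it, and `to_message` is a hypothetical helper. It shows that `predict()` takes a 2-D input with one row per user and returns an array, so the single label is read with `[0]`.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in classifier: random data with the same five inputs the app
# collects (radius, texture, smoothness, concave_points, symmetry).
rng = np.random.default_rng(0)
X_fake = rng.random((40, 5))
y_fake = (X_fake[:, 0] > 0.5).astype(int)      # label depends on the first feature only
classifier = pickle.loads(pickle.dumps(LogisticRegression().fit(X_fake, y_fake)))


def to_message(label: int) -> str:
    """Hypothetical helper: map a predicted class label to a user-facing message."""
    if label == 0:
        return "Model predicts: no breast cancer"
    return "Model predicts: likely breast cancer, please see a doctor"


# One user = one row, so the input is a list containing one list (shape 1 x 5).
row = [[0.9, 0.2, 0.3, 0.1, 0.4]]
pred = classifier.predict(row)      # a NumPy array holding a single label
print(pred.shape, to_message(int(pred[0])))
```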


In [4]:
!streamlit run app.py

