A web page that predicts diabetes, built using a Support Vector Machine (SVM) classifier and Streamlit.
Check it out here
- Supervised machine learning algorithm (SVM)
- Numpy
- Pandas
- Pickle
- Streamlit
The dataset is from Kaggle.
- Loaded the .csv file using pandas
- Checked the Outcome column to see how many rows are 0 and how many are 1
  - 0 -> Non Diabetic
  - 1 -> Diabetic
- Grouped the data by Outcome and compared the mean value of every other column for each class
- Split the data into 2 variables
  - x -> Columns except Outcome
  - y -> Outcome column only
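The loading and exploration steps above can be sketched roughly as follows. The project reads the Kaggle .csv file with pandas; since the file name is not given here, this sketch uses a tiny inline CSV with made-up rows and only three feature columns. The column names other than Outcome are assumptions.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle file. In the project this would be
# something like pd.read_csv("diabetes.csv") (the file name is an assumption).
csv_text = """Glucose,BMI,Age,Outcome
148,33.6,50,1
85,26.6,31,0
183,23.3,32,1
89,28.1,21,0
137,43.1,33,1
116,25.6,30,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count of each class: 0 -> Non Diabetic, 1 -> Diabetic
outcome_counts = df["Outcome"].value_counts()

# Mean of every feature column for each class
class_means = df.groupby("Outcome").mean()

# x -> all columns except Outcome, y -> Outcome column only
x = df.drop(columns="Outcome")
y = df["Outcome"]

print(outcome_counts)
print(class_means)
```

Comparing the per-class means is a quick way to see which features differ most between the two groups before any model is trained.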
- Standardised the data so that each feature has a mean of 0 and a standard deviation of 1
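from sklearn.preprocessing import StandardScaler
scalar = StandardScaler()
scalar.fit(x)  # learn the mean and standard deviation of each column
standardized_data = scalar.transform(x)
x = standardized_data  # train on the standardised features from here on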
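As a quick sanity check of what standardisation does, the sketch below applies StandardScaler to a small made-up feature matrix and confirms that each column ends up with a mean of about 0 and a standard deviation of about 1. The numbers and the `_demo` names are illustrative only and are not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix (5 rows, 3 features), used only for illustration
x_demo = np.array([
    [148.0, 33.6, 50.0],
    [85.0, 26.6, 31.0],
    [183.0, 23.3, 32.0],
    [89.0, 28.1, 21.0],
    [137.0, 43.1, 33.0],
])

scaler_demo = StandardScaler()
standardized_demo = scaler_demo.fit_transform(x_demo)  # fit and transform in one call

# Each column should now have mean ~0 and (population) standard deviation ~1
col_means = standardized_demo.mean(axis=0)
col_stds = standardized_demo.std(axis=0)
print(np.round(col_means, 6), np.round(col_stds, 6))
```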
- Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=2)
test_size=0.2 keeps 20% of the data for testing
stratify=y ensures that the train and test sets both keep the same proportion of each class as the provided "y" array
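To illustrate the effect of stratify, here is a small sketch on made-up labels (70 zeros and 30 ones). The arrays and their names are hypothetical; only the train_test_split arguments mirror the call above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 70 zeros and 30 ones
y_demo = np.array([0] * 70 + [1] * 30)
x_demo = np.arange(100).reshape(-1, 1)  # placeholder feature column

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# With stratify, the share of class 1 is the same in both parts
print(len(y_te), y_tr.mean(), y_te.mean())
```

Without stratify, a random split of a small or imbalanced dataset can leave one class under-represented in the test set, which makes the accuracy figure less trustworthy.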
- Training the SVM model
classifier = svm.SVC(kernel='linear')
classifier.fit(x_train,y_train)
- Finding the accuracy score of the model
- Booyah got 72.2% accuracy 🥳
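The accuracy step is not shown in code here, so the following is a minimal sketch of one way it could be done with scikit-learn's accuracy_score. It runs on synthetic data from make_classification rather than the Kaggle dataset, so the numbers it prints will not match the figure above.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 8 features, like the 8 inputs used for prediction
x_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=2)
x_syn = StandardScaler().fit_transform(x_syn)

x_train, x_test, y_train, y_test = train_test_split(
    x_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=2
)

classifier = svm.SVC(kernel="linear")
classifier.fit(x_train, y_train)

# Accuracy on the data the model was trained on
train_accuracy = accuracy_score(y_train, classifier.predict(x_train))
# Accuracy on held-out data, the better indicator of real-world performance
test_accuracy = accuracy_score(y_test, classifier.predict(x_test))

print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

Reporting both numbers helps spot overfitting: a training accuracy far above the test accuracy suggests the model has memorised the training set.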
- Doing the ultimate part: the prediction
input_data = (4,170,92,30,40,37.6,0.592,30)
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
std_data = scalar.transform(input_data_reshaped)
print(std_data)
prediction = classifier.predict(std_data)
print(prediction)
if prediction[0] == 0:
    print('Not diabetic')
else:
    print('Diabetic!')
- And for user convenience, turned it into a web page using Streamlit
  - Saved the trained model into a .sav file
  - Took the inputs from Streamlit input fields
  - Finally, the prediction is made from the data the user entered
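The save-and-serve steps above might look roughly like the sketch below. It is not the project's actual app: the file name `trained_model.sav`, the helper names `predict_label` and `run_app`, the field labels, and the random stand-in training data are all assumptions. It also assumes the scaler is saved alongside the model, since user input has to be standardised the same way as the training data.

```python
import pickle

import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

# Stand-in training data (random); the project would use the real x and y
rng = np.random.default_rng(2)
x_demo = rng.normal(size=(40, 8))
y_demo = (x_demo[:, 1] > 0).astype(int)

scaler = StandardScaler().fit(x_demo)
classifier = svm.SVC(kernel="linear").fit(scaler.transform(x_demo), y_demo)

# Save the model and the scaler together (file name is hypothetical)
with open("trained_model.sav", "wb") as f:
    pickle.dump({"model": classifier, "scaler": scaler}, f)


def predict_label(values, path="trained_model.sav"):
    """Load the saved objects and return a label for one row of 8 inputs."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    row = np.asarray(values, dtype=float).reshape(1, -1)
    prediction = saved["model"].predict(saved["scaler"].transform(row))
    return "Not diabetic" if prediction[0] == 0 else "Diabetic!"


def run_app():
    """Streamlit page; call this from a script started with `streamlit run`."""
    import streamlit as st  # imported here so the rest works without Streamlit

    st.title("Diabetes Prediction")
    # Field labels are assumptions based on the usual columns of this dataset
    labels = ["Pregnancies", "Glucose", "Blood Pressure", "Skin Thickness",
              "Insulin", "BMI", "Diabetes Pedigree Function", "Age"]
    values = [st.number_input(label, min_value=0.0) for label in labels]
    if st.button("Predict"):
        st.success(predict_label(values))
```

To try the page, the block could be saved as a script that ends with a call to `run_app()` and launched with `streamlit run`. Keeping `predict_label` separate from the UI code means the prediction logic can be checked without starting Streamlit.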