Diabetes Prediction

A web app that predicts diabetes with a Support Vector Machine (SVM) classifier, built using Streamlit.

Check it out here

Things that I have used

  • Supervised machine learning algorithm (SVM)
  • Numpy
  • Pandas
  • Pickle
  • Streamlit


How did I do it?

The dataset used is from Kaggle.

  1. Loaded the .csv file using pandas
  2. Checked the Outcome column to see how many rows there are for 0 and 1
    • 0 -> Non Diabetic
    • 1 -> Diabetic
  3. Grouped the data by the Outcome column and looked at the mean value of every other column for each class
  4. Took 2 variables
    • x -> Columns except Outcome
    • y -> Outcome column only
  5. Standardised the data so that each column has a mean of 0 and a standard deviation of 1
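from sklearn.preprocessing import StandardScaler
scalar = StandardScaler()
standardized_data = scalar.fit_transform(x)  # the scaler has to be fitted before it can transform
x = standardized_data  # train on the standardized features, since predictions are scaled too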
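Steps 1 to 5 can be sketched end to end on a tiny table. This is only an illustration: the rows below are made up, the table has fewer columns than the Kaggle file, and with the real data the DataFrame would come from something like pd.read_csv("diabetes.csv"), where the file name is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows standing in for the Kaggle CSV (illustrative values only)
df = pd.DataFrame({
    "Glucose": [85, 168, 89, 137, 116, 183],
    "BMI": [26.6, 43.1, 28.1, 33.6, 25.6, 23.3],
    "Age": [31, 33, 21, 50, 30, 32],
    "Outcome": [0, 1, 0, 1, 0, 1],
})

# Step 2: number of rows per class (0 = non diabetic, 1 = diabetic)
class_counts = df["Outcome"].value_counts()

# Step 3: mean of every feature column, per class
class_means = df.groupby("Outcome").mean()

# Step 4: features and label
x = df.drop(columns="Outcome")
y = df["Outcome"]

# Step 5: fit the scaler on x and transform it in one call
scalar = StandardScaler()
standardized_data = scalar.fit_transform(x)

print(class_counts)
print(class_means)
print(standardized_data.mean(axis=0))  # each column mean is approximately 0
print(standardized_data.std(axis=0))   # each column standard deviation is approximately 1
```

The last two prints are a quick way to confirm that the scaling did what step 5 describes.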
  6. Split the data into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=2)

test_size=0.2 keeps 20% of the data for testing.

stratify=y ensures that both the train and test sets have the same proportion of examples in each class as the provided "y" array.
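A small check of what stratify does, on made-up labels rather than the real dataset: 65 non diabetic and 35 diabetic rows, with a dummy feature column. The class share should come out the same in the train and test parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 65 zeros (non diabetic) and 35 ones (diabetic)
y = np.array([0] * 65 + [1] * 35)
x = np.arange(len(y)).reshape(-1, 1)  # dummy single-column features

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=2
)

# Sizes of the two parts, then the share of diabetic rows in each
print(len(y_train), len(y_test))
print(y_train.mean(), y_test.mean())
```

Without stratify=y the two shares can drift apart, which matters more when one class is much rarer than the other.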

  7. Trained the SVM model
classifier = svm.SVC(kernel='linear')
classifier.fit(x_train, y_train)
  8. Found the accuracy score of the model
    • Booyah, got 72.2% accuracy 🥳
  9. Did the ultimate part: the prediction
input_data = (4,170,92,30,40,37.6,0.592,30)
input_data_as_numpy_array = np.asarray(input_data)

# reshape to one row, since we are predicting for a single person
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

# scale the input with the same scaler used on the training data
std_data = scalar.transform(input_data_reshaped)

prediction = classifier.predict(std_data)

if prediction[0] == 0:
  print('Not diabetic')
else:
  print('Diabetic')
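The training, accuracy and prediction steps can be run together as a rough sketch. It uses synthetic data, not the Kaggle dataset, so the accuracy it prints has nothing to do with the 72.2% above; accuracy_score from sklearn.metrics is one common way to get the score and is an assumption about how it was computed here.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 8 features, label decided by the second column
rng = np.random.default_rng(0)
raw_x = rng.normal(size=(200, 8))
y = (raw_x[:, 1] > 0).astype(int)

# Standardise, then split with the same settings as above
scalar = StandardScaler()
x = scalar.fit_transform(raw_x)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=2
)

# Train a linear SVM
classifier = svm.SVC(kernel="linear")
classifier.fit(x_train, y_train)

# Accuracy on the data the model has seen and on the held-out 20%
train_accuracy = accuracy_score(y_train, classifier.predict(x_train))
test_accuracy = accuracy_score(y_test, classifier.predict(x_test))
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")

# Predict for one new row: reshape to (1, 8) and scale with the fitted scaler
sample = np.zeros(8)
sample[1] = 2.0
prediction = classifier.predict(scalar.transform(sample.reshape(1, -1)))
print("Not diabetic" if prediction[0] == 0 else "Diabetic")
```

Comparing the train and test accuracy is a quick way to spot overfitting: a large gap means the model memorised the training rows.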
  10. And for user convenience, turned it into a webpage using Streamlit
    • Saved the trained model into a .sav file using pickle
    • Then took the inputs from Streamlit input fields
    • Finally, the prediction is made from the data entered by the user
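A minimal sketch of the save and load part, assuming the model and the scaler are pickled together so that user input can be scaled the same way as the training data. The file name trained_model.sav, the dictionary layout and the diabetes_prediction helper are all hypothetical, and the model is trained on synthetic data only to keep the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

# Throwaway model on synthetic data (8 features, label decided by the second column)
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))
y = (x[:, 1] > 0).astype(int)
scalar = StandardScaler().fit(x)
classifier = svm.SVC(kernel="linear").fit(scalar.transform(x), y)

# Save the model and the scaler together (hypothetical file name, temp directory)
model_path = os.path.join(tempfile.gettempdir(), "trained_model.sav")
with open(model_path, "wb") as f:
    pickle.dump({"model": classifier, "scaler": scalar}, f)

# Later, for example at the top of the web app script, load them back
with open(model_path, "rb") as f:
    saved = pickle.load(f)


def diabetes_prediction(values):
    """Return a label for one row of 8 feature values (hypothetical helper)."""
    row = np.asarray(values, dtype=float).reshape(1, -1)
    std_row = saved["scaler"].transform(row)
    return "Not diabetic" if saved["model"].predict(std_row)[0] == 0 else "Diabetic"


print(diabetes_prediction([0, 2.0, 0, 0, 0, 0, 0, 0]))
```

In a Streamlit script, the list passed to diabetes_prediction would be filled from input widgets such as st.number_input, and the returned label could be shown with st.success once a st.button is pressed. That wiring is an assumption about how the page could be put together, not a copy of the actual app.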