








# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objectives

   
  At the end of the experiment, you will be able to:
    
  * implement Logistic Regression using the sklearn library


### Dataset Description

The "Heart Disease Dataset" comes from a study conducted in 1988 and originates from the UCI Machine Learning Repository. The task is to build the best predictor of whether a patient has heart disease.

The dataset contains records for 303 individuals. It has 14 columns, which are described below.


**1. Age:** The person’s age in years

**2. Sex:** The person’s sex (1 = male, 0 = female)

**3. cp - chest pain type:** The type of chest pain experienced by the individual person



*  0: typical angina

*  1: atypical angina

*  2: non-anginal pain

*  3: asymptomatic


**4. trestbps - Resting Blood Pressure:** The person’s resting blood pressure (mm Hg on admission to the hospital)

**5. chol - Serum Cholesterol:** The person’s cholesterol measurement in mg/dl

**6. fbs - Fasting Blood Sugar:** The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)

**7. restecg - Resting ECG:** resting electrocardiographic results

*   0: normal
*   1: having ST-T wave abnormality
*   2: left ventricular hypertrophy


**8. thalach - Max heart rate achieved:** The person’s maximum heart rate achieved

**9. exang - Exercise induced angina:** Exercise induced angina (1 = yes; 0 = no)

**10. oldpeak - ST depression:** ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)

**11. slope - Peak exercise ST segment:** The slope of the peak exercise ST segment


*  0: downsloping
*  1: flat
*  2: upsloping


**12. ca - Number of major vessels (0–3) colored by fluoroscopy:** The number of major vessels (0–3)

**13. thal:** A blood disorder called thalassemia


*   0: NULL (dropped from the dataset)
*   1: fixed defect (no blood flow in some part of the heart)
*   2: normal blood flow
*   3: reversible defect (a blood flow is observed but it is not normal)

**14. target:** Heart disease (1 = no, 0 = yes)





**Problem description:**

The goal is to predict the binary class Heart Disease (target), which represents whether or not a patient has heart disease, using the class encoding listed for column 14 above.

### Setup Steps:

In [10]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2302815" #@param {type:"string"}

In [11]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "+6592721549" #@param {type:"string"}

In [12]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "U1W3_10_Logistic_Regression_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
    from IPython.display import HTML, display
    ipython.magic("sx wget https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Heart_Disease.csv")
    display(HTML('<script src="https://staging.dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getWalkthrough() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook, "feedback_walkthrough":Walkthrough ,
              "feedback_experiments_input" : Comments,
              "feedback_inclass_mentor": Mentor_support}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aiml-iiith.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else:
      return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


def getWalkthrough():
  try:
    if not Walkthrough:
      raise NameError
    else:
      return Walkthrough
  except NameError:
    print ("Please answer Walkthrough Question")
    return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Importing the required packages

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

### Load the data

In [5]:
df = pd.read_csv('Heart_Disease.csv')
df.head()

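|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |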


In [6]:
# Check for the shape of the dataset
df.shape

(303, 14)

In [7]:
df.dtypes

age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal          int64
target        int64
dtype: object

In [8]:
# Check for missing values
df.isna().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

There are no missing values in the dataset.

In [9]:
# Check for the target values
df['target'].value_counts()

target
1    165
0    138
Name: count, dtype: int64
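
The counts above show that the two classes are fairly balanced. As a rough sketch in plain Python, with the counts copied from the output above, the class proportions also give the accuracy that a trivial "always predict the most frequent class" rule would reach on the full dataset, which is a useful baseline to compare a trained model against:

```python
# Class counts copied from the value_counts() output above
counts = {1: 165, 0: 138}

total = sum(counts.values())  # 303 rows in total
proportions = {label: n / total for label, n in counts.items()}

# A rule that always predicts the most frequent class would be
# right this often on the full dataset.
majority_label = max(counts, key=counts.get)
baseline_accuracy = proportions[majority_label]

for label, share in proportions.items():
    print(f"class {label}: {share:.1%}")
print(f"majority-class baseline: {baseline_accuracy:.1%}")
```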

### Store the features and labels

In [13]:
X = df.drop('target', axis = 1) # Features
y = df['target'] # Label

In [14]:
X.head()

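|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 |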


### Split the data into train and test sets

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
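
With `test_size=0.2`, one fifth of the rows are held out for testing. A small sketch of the resulting set sizes follows; it assumes that scikit-learn rounds the test fraction up to a whole number of rows and gives the remainder to the training set, so treat the rounding rule as an assumption rather than a guarantee:

```python
import math

n_samples = 303   # rows in the dataset (from df.shape)
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is the fraction rounded up,
# and the training set receives the remaining rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Setting `random_state=0` fixes the shuffle, so the same rows land in each set on every run.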

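### Standardization of the data

In [16]:
# Define the standard scaler (zero mean, unit variance per feature)
scaler = StandardScaler()
# Fit the scaler on the training set only, then apply the same transform to the test set
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)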
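
`StandardScaler` rescales each column to zero mean and unit variance, using statistics learned from the training data only. A minimal one-column sketch in plain Python is given here; the age values come from the first rows shown earlier, and dividing them into "train" and "test" lists is purely illustrative:

```python
from statistics import fmean, pstdev

train_ages = [63, 37, 41, 56]  # illustrative "training" values
test_ages = [57]               # illustrative "test" value

# Learn the statistics from the training values only
# (population standard deviation, i.e. dividing by n).
mu = fmean(train_ages)
sigma = pstdev(train_ages)

def standardize(values):
    """Apply z = (x - mean) / std using the training statistics."""
    return [(x - mu) / sigma for x in values]

train_scaled = standardize(train_ages)
test_scaled = standardize(test_ages)  # reuses mu and sigma; no refitting

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics on the test values mirrors calling `fit_transform` on the training set and plain `transform` on the test set, so no information from the test set leaks into the scaling.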

### Apply Logistic Regression from sklearn

Refer to the following [link](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) for Logistic Regression from sklearn.

In [17]:
# Create an instance for logistic regression
log_reg = LogisticRegression()

# Train the model
log_reg.fit(X_train_scaled, y_train)

# Get the predictions on the test set
y_pred = log_reg.predict(X_test_scaled)
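
Conceptually, logistic regression computes a weighted sum of the scaled features and passes it through the sigmoid function, which yields a value between 0 and 1 that is read as the probability of class 1; the predicted class is 1 when that probability exceeds 0.5. A minimal sketch for a single sample follows, where the weights, bias, and feature values are made-up numbers, not the ones learned by `log_reg`:

```python
import math

def sigmoid(z):
    """Map any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for illustration only
weights = [0.8, -0.5, 0.3]
bias = 0.1
features = [1.2, 0.4, -0.7]  # one standardized sample

# Linear score: w . x + b
score = sum(w * x for w, x in zip(weights, features)) + bias

probability = sigmoid(score)              # estimated P(class 1)
predicted_class = int(probability > 0.5)  # threshold at 0.5

print(round(probability, 3), predicted_class)
```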

In [18]:
# Calculate the accuracy
print(accuracy_score(y_test,y_pred))

0.8524590163934426


In [19]:
# Store the actual labels and the predictions side by side to compare them
result = pd.DataFrame({'Actual': y_test.to_numpy(),
                       'Prediction': y_pred})

In [20]:
result.head()

|   | Actual | Prediction |
|---|--------|------------|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 0 | 1 |
| 3 | 0 | 0 |
| 4 | 1 | 0 |
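
Accuracy is simply the fraction of rows in which the two columns agree. A minimal sketch using only the five rows shown above, rather than the full test set:

```python
# First five rows of the result table above
actual_head = [0, 1, 0, 0, 1]
prediction_head = [0, 1, 1, 0, 0]

# Count the positions where the prediction matches the true label
matches = sum(a == p for a, p in zip(actual_head, prediction_head))
accuracy_head = matches / len(actual_head)

print(matches, accuracy_head)
```

On these five rows alone the model gets 3 of 5 right; the accuracy reported earlier is the same calculation carried out over the whole test set.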


### Please answer the questions below to complete the experiment:




In [29]:
#@title The output of logistic regression is? { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "greater than 0 and smaller than 1" #@param ["","smaller than 0", "greater than 1","smaller than 0 and 1","greater than 0 and smaller than 1"]

In [30]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good, But Not Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [31]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "na" #@param {type:"string"}


In [32]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [33]:
#@title  Experiment walkthrough video? { run: "auto", vertical-output: true, display-mode: "form" }
Walkthrough = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [34]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [35]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [36]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 4379
Date of submission:  24 May 2024
Time of submission:  13:02:29
View your submissions: https://aiml-iiith.talentsprint.com/notebook_submissions
