# Acquired Immunodeficiency Syndrome (AIDS)

- AIDS (Acquired Immunodeficiency Syndrome) is the most advanced stage of infection with HIV (Human Immunodeficiency Virus). The virus weakens the immune system, leaving the body vulnerable to infections.

- HIV spreads through infected blood, semen, vaginal fluids, and breast milk. It can be transmitted through unprotected sexual contact, by sharing contaminated needles, or from an infected mother to her baby during pregnancy, childbirth, or breastfeeding.

- In the early stages, HIV may not show noticeable symptoms. However, over time, it weakens the immune system, leading to symptoms like fatigue, weight loss, fever, and frequent infections.

- There is no permanent cure for AIDS, but Antiretroviral Therapy (ART) helps control HIV and slow its progression. Prevention includes practicing safe sex, using clean needles, and getting tested for HIV regularly.

## Question: How does HIV become AIDS?

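HIV attacks the immune system, gradually destroying CD4 cells, a type of white blood cell.
Without treatment, the infection progresses over several years and the immune system grows weaker.
AIDS is diagnosed when the CD4 count drops below 200 cells/mm³ or when certain severe opportunistic infections appear.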
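
The CD4 cutoff of 200 described above can be expressed as a simple check. This is a minimal sketch for illustration only: the function name `below_aids_threshold` and the constant name are hypothetical, and a single CD4 reading is not a clinical diagnosis.

```python
# Illustrative only: the cutoff of 200 comes from the text above;
# the names used here are hypothetical.
AIDS_CD4_THRESHOLD = 200  # CD4 cells per cubic millimetre


def below_aids_threshold(cd4_count: float) -> bool:
    """Return True when a CD4 count falls below the 200 cutoff."""
    if cd4_count < 0:
        raise ValueError("CD4 count cannot be negative")
    return cd4_count < AIDS_CD4_THRESHOLD


# Try a few example counts
for count in (500, 322, 168):
    print(count, below_aids_threshold(count))
```

A count of exactly 200 is not flagged, because the rule is "below 200".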

# About Dataset

*The AIDS_Classification_50000.csv dataset is a comprehensive resource specifically compiled for researchers and healthcare professionals focusing on the statistical analysis of AIDS (Acquired Immunodeficiency Syndrome). Composed of 50,000 instances, this dataset encapsulates a broad spectrum of clinical and demographic variables related to AIDS patients. Each record in the dataset holds data across 23 columns, indicating various patient attributes including treatment details, demographic information, clinical test results, and disease progression indicators.*

# Importing Libraries

In [4]:
# Silence library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("Libraries imported successfully! Ready to proceed")

Libraries imported successfully! Ready to proceed


In [5]:
# Load the data
PATH = "AIDS_Classification_50000.csv"
df = pd.read_csv(PATH)
print("Data loaded successfully!")

Data loaded successfully!
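
One way to guard against loading the wrong file is to check the column count immediately after reading. The following is a minimal sketch, not part of the original notebook: `load_and_check` is a hypothetical helper, and a tiny in-memory CSV stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def load_and_check(source, expected_columns):
    """Read a CSV and fail early if the column count is unexpected."""
    frame = pd.read_csv(source)
    if frame.shape[1] != expected_columns:
        raise ValueError(
            f"expected {expected_columns} columns, found {frame.shape[1]}"
        )
    return frame


# Tiny in-memory stand-in for a CSV file (three columns, two rows)
sample_csv = io.StringIO("time,trt,age\n1073,1,37\n324,0,33\n")
sample = load_and_check(sample_csv, expected_columns=3)
print(sample.shape)
```

For the dataset used here, the equivalent call would be `load_and_check(PATH, expected_columns=23)`, assuming the 23 columns stated in the dataset description.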


In [6]:
# Printing the first 5 rows of the dataset
df.head()

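| | time | trt | age | wtkg | hemo | homo | drugs | karnof | oprior | z30 | ... | str2 | strat | symptom | treat | offtrt | cd40 | cd420 | cd80 | cd820 | infected |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1073.0 | 1.0 | 37.0 | 79.46339 | 0.0 | 1.0 | 0.0 | 100.0 | 0.0 | 1.0 | ... | 1.0 | 2.0 | 0.0 | 1.0 | 0.0 | 322.0 | 469.0 | 882.0 | 754.0 | 1.0 |
| 1 | 324.0 | 0.0 | 33.0 | 73.02314 | 0.0 | 1.0 | 0.0 | 90.0 | 0.0 | 1.0 | ... | 1.0 | 3.0 | 1.0 | 1.0 | 1.0 | 168.0 | 575.0 | 1035.0 | 1525.0 | 1.0 |
| 2 | 495.0 | 1.0 | 43.0 | 69.47793 | 0.0 | 1.0 | 0.0 | 100.0 | 0.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 377.0 | 333.0 | 1147.0 | 1088.0 | 1.0 |
| 3 | 1201.0 | 3.0 | 42.0 | 89.15934 | 0.0 | 1.0 | 0.0 | 100.0 | 1.0 | 1.0 | ... | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 | 238.0 | 324.0 | 775.0 | 1019.0 | 1.0 |
| 4 | 934.0 | 0.0 | 37.0 | 137.46581 | 0.0 | 1.0 | 0.0 | 100.0 | 0.0 | 0.0 | ... | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 500.0 | 443.0 | 1601.0 | 849.0 | 0.0 |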


In [7]:
# Printing the last 5 rows of the dataset
df.tail()

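| | time | trt | age | wtkg | hemo | homo | drugs | karnof | oprior | z30 | ... | str2 | strat | symptom | treat | offtrt | cd40 | cd420 | cd80 | cd820 | infected |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49995 | 953.0 | 3.0 | 46.0 | 61.28204 | 0.0 | 0.0 | 0.0 | 90.0 | 0.0 | 1.0 | ... | 1.0 | 3.0 | 0.0 | 1.0 | 1.0 | 234.0 | 402.0 | 481.0 | 1014.0 | 0.0 |
| 49996 | 1036.0 | 0.0 | 42.0 | 73.36768 | 0.0 | 1.0 | 0.0 | 100.0 | 0.0 | 1.0 | ... | 1.0 | 3.0 | 0.0 | 0.0 | 1.0 | 369.0 | 575.0 | 514.0 | 657.0 | 0.0 |
| 49997 | 1157.0 | 0.0 | 40.0 | 78.75824 | 0.0 | 1.0 | 0.0 | 100.0 | 0.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 308.0 | 663.0 | 1581.0 | 863.0 | 0.0 |
| 49998 | 596.0 | 0.0 | 31.0 | 52.20371 | 0.0 | 0.0 | 0.0 | 100.0 | 0.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 349.0 | 440.0 | 470.0 | 865.0 | 1.0 |
| 49999 | 612.0 | 2.0 | 41.0 | 77.121 | 0.0 | 1.0 | 0.0 | 90.0 | 0.0 | 1.0 | ... | 1.0 | 3.0 | 0.0 | 1.0 | 0.0 | 428.0 | 396.0 | 1002.0 | 696.0 | 0.0 |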


In [8]:
# Printing the shape of the dataset
print("The shape of the dataset is:", df.shape)

The shape of the dataset is: (50000, 23)


In [10]:
# Descriptive statistics
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| time | 50000.0 | 877.36978 | 307.288688 | 66.0 | 542.0 | 1045.0 | 1136.0 | 1231.0 |
| trt | 50000.0 | 1.3848 | 1.233272 | 0.0 | 0.0 | 1.0 | 3.0 | 3.0 |
| age | 50000.0 | 34.16402 | 7.091152 | 12.0 | 29.0 | 34.0 | 39.0 | 68.0 |
| wtkg | 50000.0 | 75.861991 | 12.02873 | 42.36162 | 68.253682 | 74.054115 | 81.142185 | 149.83087 |
| hemo | 50000.0 | 0.03348 | 0.179888 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| homo | 50000.0 | 0.65354 | 0.475847 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| drugs | 50000.0 | 0.13222 | 0.338733 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| karnof | 50000.0 | 96.83156 | 5.091788 | 76.0 | 90.0 | 100.0 | 100.0 | 100.0 |
| oprior | 50000.0 | 0.0423 | 0.201275 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| z30 | 50000.0 | 0.64088 | 0.479747 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
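
For columns that contain only 0 and 1, the mean in the summary above can be read as the share of records with value 1. Assuming `homo` is strictly binary, its mean of 0.65354 corresponds to roughly 65.4% of records. The sketch below illustrates the idea on a made-up toy column, not on the dataset itself.

```python
import pandas as pd

# Toy 0/1 column (made-up values, not taken from the dataset)
flags = pd.Series([1, 0, 1, 1, 0], name="flag")

share_of_ones = (flags == 1).mean()  # fraction of entries equal to 1
column_mean = flags.mean()           # arithmetic mean of the column

# For a strictly binary column, the two numbers agree
print(share_of_ones, column_mean)
```

The equivalence breaks as soon as a column holds any value other than 0 or 1, so it is worth confirming the set of distinct values first, for example with `Series.unique()`.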


Attribute Description:

- time: Time since the baseline measurement, in days.
- trt: Treatment code (0, 1, 2, 3), where each number signifies a different treatment regimen.
- age: Age of the patient in years. {in the range of [12,68]}
- wtkg: Weight of the patient in kilograms. {in the range of [42, 149]}
- hemo: Presence of Hemophilia (0 = No, 1 = Yes).
- homo: Homosexual behavior (0 = No, 1 = Yes).
- drugs: Drug use (0 = No, 1 = Yes).
- karnof: Karnofsky performance score, a measure of the patient's functional status (scale from 0 to 100; higher means less impairment). {observed range [76, 100]}
- oprior: Non-zidovudine antiretroviral therapy before the study (0 = No, 1 = Yes).
- z30: Zidovudine use in the 30 days before the study (0 = No, 1 = Yes).
- preanti: Days of antiretroviral therapy received before the study.
- race: Race (0 = Non-white, 1 = White).
- gender: Gender (0 = Female, 1 = Male).
- str2: Stratification variable 2.
- strat: Overall stratification.
- symptom: Presence of specific AIDS-related symptoms (0 = No, 1 = Yes).
- treat: Treatment response (0 = No, 1 = Yes).
- offtrt: Off treatment (0 = No, 1 = Yes).
- cd40: CD4 count at baseline. {stated range [236, 930], though the preview rows above include 168}
- cd420: CD4 count at 20 weeks. {stated range [327, 1119], though the preview rows above include 324}
- cd80: CD8 count at baseline. {stated range [885, 4656], though the preview rows above include values as low as 470}
- cd820: CD8 count at 20 weeks. {in the range [649, 3585]}
- infected: HIV infection status (0 = Negative, 1 = Positive).
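
The ranges quoted in the attribute list can be checked against the data rather than taken on trust. The sketch below is a minimal illustration, not part of the original notebook: `out_of_range_counts` is a hypothetical helper, and the small frame holds three columns copied from the five rows printed by `df.head()`.

```python
import pandas as pd

# Three columns from the five preview rows printed by df.head()
preview = pd.DataFrame({
    "age": [37.0, 33.0, 43.0, 42.0, 37.0],
    "karnof": [100.0, 90.0, 100.0, 100.0, 100.0],
    "cd40": [322.0, 168.0, 377.0, 238.0, 500.0],
})

# Ranges as quoted in the attribute list (inclusive bounds)
documented_ranges = {
    "age": (12, 68),
    "karnof": (76, 100),
    "cd40": (236, 930),
}


def out_of_range_counts(df, ranges):
    """Count, per column, the values falling outside the inclusive bounds."""
    counts = {}
    for column, (low, high) in ranges.items():
        outside = ~df[column].between(low, high)
        counts[column] = int(outside.sum())
    return counts


print(out_of_range_counts(preview, documented_ranges))
```

On the full dataset the same helper could be called with `df` in place of `preview`. Any non-zero count indicates that a quoted range does not cover all of the data.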
