# Summary of the ipynb File
- DataFrame
- Series
- Reading a CSV File
- Reading a JSON File
- Label Scaling / Label Encoding

In [1]:
import pandas as pd

## DataFrame

A **DataFrame** is like a table, similar to an Excel sheet.

- It has **rows** and **columns**.
- Each **row** is one data entry (like one person, product, etc).
- Each **column** is a feature or category (like name, age, price).

In pandas, we use DataFrames to store and work with data easily.  
It's one of the most important tools in data analysis and machine learning.



In [2]:
student = {
    "student id": ["202818", "201918", "192830"],
    "study hr": ["3", "4", "1"],
    "grade": ["3.55", "3.77", "3.22"]
}

student_df = pd.DataFrame(student)
print(student_df)

  student id study hr grade
0     202818        3  3.55
1     201918        4  3.77
2     192830        1  3.22
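
Every value in the dictionary above is a quoted string, so the columns hold text rather than numbers, and a calculation such as an average grade would not work as intended. A minimal sketch of converting them, assuming `study hr` should be an integer and `grade` a float (the name `student_numeric` is ours, for illustration only):

```python
import pandas as pd

student = {
    "student id": ["202818", "201918", "192830"],
    "study hr": ["3", "4", "1"],
    "grade": ["3.55", "3.77", "3.22"],
}
student_df = pd.DataFrame(student)

# astype with a dict converts only the listed columns;
# "student id" is left as text because it is an identifier, not a quantity
student_numeric = student_df.astype({"study hr": "int64", "grade": "float64"})

print(student_numeric.dtypes)
print(student_numeric["grade"].mean())
```

Writing the numbers without quotes in the dictionary would have the same effect without a conversion step.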


## 📌 What is a Series?

A **Series** is like a single column from a table.

- It's a **one-dimensional** data structure in pandas.
- It has **values** and an **index** (like row numbers).
- You can think of it like a list in Python, but with labels.

#### Example output:

| label | value  |
|-------|--------|
| 0     | female |
| 1     | female |
| 2     | male   |

dtype: object

In [3]:
gender = ["female", "female", "male"]
gender_series = pd.Series(gender)
print(gender_series)

0    female
1    female
2      male
dtype: object


In [4]:
# changing the label 
gender_with_custom_label = pd.Series(gender, index = ["202818", "201918", "192830"])
print(gender_with_custom_label)

202818    female
201918    female
192830      male
dtype: object


In [5]:
# series with dictionary 
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

day1    420
day2    380
day3    390
dtype: int64
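
Because a Series carries labels, a value can be looked up either by its label or by its position. A short sketch using the `calories` data from the cell above (the names `by_label`, `by_position`, and `subset` are ours):

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)

by_label = myvar["day2"]          # look up by index label
by_position = myvar.iloc[0]       # look up by integer position
subset = myvar[["day1", "day3"]]  # a list of labels returns a new Series

print(by_label, by_position)
print(subset)
```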


### Adding a Series to a DataFrame

In [6]:
# my dataframe 
print(student_df)
print("---------------------")

# my series
print(gender_series)
print("---------------------")

# adding gender in the df 
student_df["gender"] = gender_series
print(student_df)

  student id study hr grade
0     202818        3  3.55
1     201918        4  3.77
2     192830        1  3.22
---------------------
0    female
1    female
2      male
dtype: object
---------------------
  student id study hr grade  gender
0     202818        3  3.55  female
1     201918        4  3.77  female
2     192830        1  3.22    male
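
The assignment above works because pandas matches a Series to a DataFrame by index label, not by position, and both use the default labels 0, 1, 2. The sketch below shows what may happen when the labels differ, as they would with the custom-labelled Series from earlier (the column names `gender_mismatched` and `gender_by_position` are ours):

```python
import pandas as pd

student_df = pd.DataFrame({"student id": ["202818", "201918", "192830"]})
gender = ["female", "female", "male"]
custom_labels = ["202818", "201918", "192830"]

# Default index 0, 1, 2 matches the DataFrame's index, so the values line up
student_df["gender"] = pd.Series(gender)

# These labels do not match 0, 1, 2, so every row ends up as NaN
student_df["gender_mismatched"] = pd.Series(gender, index=custom_labels)

# Converting to a plain list assigns by position and ignores the labels
student_df["gender_by_position"] = pd.Series(gender, index=custom_labels).tolist()

print(student_df)
```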


### 📄 What is a CSV File?

A **CSV (Comma-Separated Values)** file is a simple text file used to store **tabular data** (like rows and columns). Each line represents a row, and each value is separated by a comma.
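
To make the layout concrete, here is a small sketch that reads CSV text from a string instead of a file. The contents of `csv_text` are made up for illustration, and `io.StringIO` is used only so the example runs without a file on disk:

```python
import io
import pandas as pd

# Hypothetical CSV content: the first line holds the column names,
# and each following line is one row
csv_text = "Name,Age,Gender,GPA\nAlice,22,Female,3.6\nBob,25,Male,3.2\n"

# StringIO lets read_csv treat the string like an open file
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df)
```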


#### Reading a CSV file into a DataFrame

In [7]:
df = pd.read_csv("student_data.csv")
print(df)

      Name  Age  Gender  GPA
0    Alice   22  Female  3.6
1      Bob   25    Male  3.2
2  Charlie   23    Male  3.4
3    Diana   24  Female  3.8
4    Ethan   26    Male  2.9
5    Fiona   21  Female  3.7
6   George   27    Male  3.1
7   Hannah   22  Female  3.9
8      Ian   24    Male  3.3
9    Jenny   25  Female  3.5


### Analysing Data

In [8]:
# preview the first n rows (here n = 2)
print(df.head(2))  # first 2 rows
print("--------------")
print("--------------")
print()

print(df.tail(2)) # last n (here 2)
print("--------------")
print("--------------")
print()

print(df.info())  # info() prints the summary itself and returns None, which is why "None" appears at the end of the output

    Name  Age  Gender  GPA
0  Alice   22  Female  3.6
1    Bob   25    Male  3.2
--------------
--------------

    Name  Age  Gender  GPA
8    Ian   24    Male  3.3
9  Jenny   25  Female  3.5
--------------
--------------

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    10 non-null     object 
 1   Age     10 non-null     int64  
 2   Gender  10 non-null     object 
 3   GPA     10 non-null     float64
dtypes: float64(1), int64(1), object(2)
memory usage: 452.0+ bytes
None
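
Beyond `head`, `tail`, and `info`, a few other calls are often useful for a first look at a dataset. The sketch below rebuilds the first four rows shown in the CSV output inline so that it runs on its own. The choice of `describe` and `groupby` is ours and is not part of the original notebook:

```python
import pandas as pd

# First four rows copied from the CSV output above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [22, 25, 23, 24],
    "Gender": ["Female", "Male", "Male", "Female"],
    "GPA": [3.6, 3.2, 3.4, 3.8],
})

print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns

# Mean GPA for each gender
gpa_by_gender = df.groupby("Gender")["GPA"].mean()
print(gpa_by_gender)
```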


### Reading a JSON File

In [9]:
df = pd.read_json("student_data.json")
print(df)

      name  age  gender  gpa
0    Alice   22  Female  3.6
1      Bob   24    Male  3.2
2  Charlie   23    Male  3.4
3    Diana   21  Female  3.9


## Label Scaling (or Label Encoding)

Label scaling, also known as label encoding, is the process of converting categorical values (such as Male, Female, Rich, Poor) into numerical values (like 0, 1, 2). Most machine learning algorithms operate on numbers rather than strings, so each number stands for one class. Here, 0 represents `Female` and 1 represents `Male`.

In [10]:
df["gender"] = df["gender"].map({"Female": 0, "Male": 1})
print(df)

      name  age  gender  gpa
0    Alice   22       0  3.6
1      Bob   24       1  3.2
2  Charlie   23       1  3.4
3    Diana   21       0  3.9
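
One thing to watch with `map` and a dictionary is that any value missing from the dictionary becomes `NaN`. The sketch below shows this with a made-up `"Other"` value, and then uses `pd.factorize` as one possible alternative that builds the codes automatically. This is an illustration, not the approach used in the notebook:

```python
import pandas as pd

gender = pd.Series(["Female", "Male", "Male", "Other"])

# map() with a dict: "Other" is not in the dict, so it becomes NaN
mapped = gender.map({"Female": 0, "Male": 1})
print(mapped)

# factorize() assigns codes in order of first appearance, so no dict is needed
codes, categories = pd.factorize(gender)
print(codes)
print(list(categories))
```

With `factorize`, the code for a class depends on the order in which values appear, so the returned `categories` should be kept to translate codes back to labels.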
