> [pandas] is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. (Wikipedia)

## What can pandas do?
- Calculate statistics and answer questions about the data, such as:
    - What is the average, median, max, or min of each column?
    - Does column A correlate with column B?
    - What does the distribution of data in column C look like?
- Clean the data, for example by removing missing values or filtering rows and columns by some criteria.
- Visualize the data with help from Matplotlib: plot bars, lines, histograms, bubbles, and more.
- Store the cleaned, transformed data back into a CSV file, another file format, or a database.
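
The capabilities listed above can be sketched in a few lines. This is a minimal illustration, not a recipe: the column names `A`, `B`, `C` and their values are made up, plotting is left out so the sketch does not depend on Matplotlib, and an in-memory buffer stands in for a real output file.

```python
import io

import pandas as pd

# Hypothetical sample data; names and values are invented for this sketch.
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],
        "C": [5.0, None, 7.0, 9.0],
    }
)

# Statistics: per-column mean and median (missing values are skipped).
means = df.mean()
medians = df.median()

# Correlation: Pearson correlation between columns A and B.
corr_ab = df["A"].corr(df["B"])

# Cleaning: drop rows with missing values, then keep rows where A > 1.
cleaned = df.dropna()
filtered = cleaned[cleaned["A"] > 1]

# Storing: write the result as CSV text into an in-memory buffer.
buffer = io.StringIO()
filtered.to_csv(buffer, index=False)
print(buffer.getvalue())
```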

## How to Install pandas

In [53]:
!pip install --upgrade pandas



## Start working

In [54]:
import pandas as pd
import numpy as np
import sys

In [55]:
print("Python: " + sys.version.split("|")[0])
print("Pandas: " + pd.__version__)

Python: 3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)]
Pandas: 2.0.0


In [56]:
my_list = [1, 2.33, "asdsd", True, False, None]  # non-homogeneous data

df = pd.DataFrame(my_list)
df

Unnamed: 0,0
0,1
1,2.33
2,asdsd
3,True
4,False
5,


In [57]:
my_list = [99, 88, 77, 66, 44, 22]  # homogeneous data

my_df = pd.DataFrame(my_list)
my_df

Unnamed: 0,0
0,99
1,88
2,77
3,66
4,44
5,22
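
The two DataFrames above look alike when displayed, but pandas stores them differently. The sketch below inspects the column dtypes; the variable names `mixed` and `uniform` are chosen only for this example.

```python
import pandas as pd

mixed = pd.DataFrame([1, 2.33, "asdsd", True, False, None])
uniform = pd.DataFrame([99, 88, 77, 66, 44, 22])

# Mixed values fall back to the generic "object" dtype.
print(mixed.dtypes)

# Values of one numeric type get a compact integer dtype (typically int64).
print(uniform.dtypes)
```

An `object` column holds arbitrary Python objects, so numeric operations on it are slower and may fail or behave unexpectedly.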


In [58]:
print(type(my_df))

<class 'pandas.core.frame.DataFrame'>


In [59]:
my_df

Unnamed: 0,0
0,99
1,88
2,77
3,66
4,44
5,22


In [60]:
my_df = pd.DataFrame(data=my_list)
my_df

Unnamed: 0,0
0,99
1,88
2,77
3,66
4,44
5,22


In [61]:
my_df = pd.DataFrame(data=my_list, columns=("ages",))
my_df

Unnamed: 0,ages
0,99
1,88
2,77
3,66
4,44
5,22


In [62]:
my_df = pd.DataFrame({"ages": [99, 88, 77, 66, 44, 22]})
my_df

Unnamed: 0,ages
0,99
1,88
2,77
3,66
4,44
5,22
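
Besides a dict of columns, a DataFrame can also be built row by row from a list of dicts. The sketch below assumes nothing beyond the `ages` column used above and checks that both constructions give the same result.

```python
import pandas as pd

# Column-oriented: one key per column.
from_columns = pd.DataFrame({"ages": [99, 88, 77]})

# Row-oriented: one dict per row.
from_rows = pd.DataFrame([{"ages": 99}, {"ages": 88}, {"ages": 77}])

print(from_rows.equals(from_columns))
```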


In [63]:
my_df = pd.DataFrame(
    {
        "ages": [99, 88, 66, 22],
        "names": ("Ramesh", "suresh", "Ganesh", "Mahesh"),
        "randomData": (23 / 2, None, np.nan, True),
    }
)  # values must be of same length
my_df

Unnamed: 0,ages,names,randomData
0,99,Ramesh,11.5
1,88,suresh,
2,66,Ganesh,
3,22,Mahesh,True
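
The comment "values must be of same length" can be checked directly. In this sketch the second column is deliberately one item short, and pandas is expected to reject the dict with a `ValueError`.

```python
import pandas as pd

error = None
try:
    # "ages" has three values but "names" has only two.
    pd.DataFrame({"ages": [99, 88, 66], "names": ["Ramesh", "suresh"]})
except ValueError as ex:
    error = ex
    print(repr(ex))
```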


In [64]:
my_df.to_json("output_datasets/persons.json")  # forward slash avoids the invalid "\p" escape sequence

In [82]:
! type "output_datasets\persons.json"

{"ages":{"0":99,"1":88,"2":66,"3":22},"names":{"0":"Ramesh","1":"suresh","2":"Ganesh","3":"Mahesh"},"randomData":{"0":11.5,"1":null,"2":null,"3":true}}
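
The output above is keyed by column first and index second, which is the default layout for a DataFrame. If one JSON object per row is preferred, the `orient` parameter may help. The sketch below uses a smaller, made-up frame and returns the JSON as text instead of writing a file.

```python
import json

import pandas as pd

df = pd.DataFrame({"ages": [99, 88], "names": ["Ramesh", "suresh"]})

# Without a path, to_json returns the JSON text.
by_column = df.to_json()                 # default layout: column -> index -> value
by_row = df.to_json(orient="records")    # list with one object per row, index dropped

records = json.loads(by_row)
print(records)
```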


In [66]:
my_df.to_csv("output_datasets/persons.csv")

In [83]:
! type "output_datasets\persons.csv"

,ages,names,randomData
0,99,Ramesh,11.5
1,88,suresh,
2,66,Ganesh,
3,22,Mahesh,True


In [85]:
my_df.to_csv("output_datasets/persons1.csv", index=False)

In [87]:
! type "output_datasets\persons1.csv"

ages,names,randomData
99,Ramesh,11.5
88,suresh,
66,Ganesh,
22,Mahesh,True


In [70]:
my_df.to_csv("output_datasets/persons2.csv", index=False, header=False)

In [89]:
! type "output_datasets\persons2.csv"

99,Ramesh,11.5
88,suresh,
66,Ganesh,
22,Mahesh,True
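
The `to_csv` calls above assume that the `output_datasets` folder already exists; pandas does not create missing folders. A minimal sketch of creating it first follows. The temporary base directory is an assumption made so the example can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"ages": [99, 88], "names": ["Ramesh", "suresh"]})

# A temporary directory stands in for the notebook's working directory.
base_dir = Path(tempfile.mkdtemp())
out_dir = base_dir / "output_datasets"
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

csv_path = out_dir / "persons1.csv"
df.to_csv(csv_path, index=False)

print(csv_path.read_text())
```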


### Reading Data

In [72]:
new_df = pd.read_csv("output_datasets/persons.csv")
new_df

Unnamed: 0.1,Unnamed: 0,ages,names,randomData
0,0,99,Ramesh,11.5
1,1,88,suresh,
2,2,66,Ganesh,
3,3,22,Mahesh,True
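
The extra `Unnamed: 0` column appears because `persons.csv` was written with its index, and `read_csv` treats that first column as ordinary data. One way to restore it as the index is `index_col=0`. The sketch below reads the same CSV text from an in-memory string rather than from disk.

```python
import io

import pandas as pd

# Same content as persons.csv above.
csv_text = """,ages,names,randomData
0,99,Ramesh,11.5
1,88,suresh,
2,66,Ganesh,
3,22,Mahesh,True
"""

# Use the first column as the row index instead of a data column.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```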


In [73]:
new_df = pd.read_csv("output_datasets/persons.csv", header=None)
new_df

Unnamed: 0,0,1,2,3
0,,ages,names,randomData
1,0.0,99,Ramesh,11.5
2,1.0,88,suresh,
3,2.0,66,Ganesh,
4,3.0,22,Mahesh,True


In [74]:
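# Pitfall: the file has 4 columns plus a header row, but only 3 names are given.
# pandas therefore uses the first column as the row index, shifts the names one
# column to the right, and reads the original header row as data.
new_df = pd.read_csv("output_datasets/persons.csv", names=["index", "age", "name"])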
new_df

Unnamed: 0,index,age,name
,ages,names,randomData
0.0,99,Ramesh,11.5
1.0,88,suresh,
2.0,66,Ganesh,
3.0,22,Mahesh,True


In [75]:
new_df.dtypes

index    object
age      object
name     object
dtype: object

In [76]:
new_df.age.dtypes

dtype('O')

In [77]:
new_df.name.dtypes

dtype('O')
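
Every column is `object` here because the header row was read as data and the names landed on the wrong columns. A sketch of one possible fix: pass `header=0` so the original header row is replaced, and give one name per column. The name `random_data` is an assumption for the fourth column, and the CSV text is inlined so the example is self-contained.

```python
import io

import pandas as pd

csv_text = """,ages,names,randomData
0,99,Ramesh,11.5
1,88,suresh,
2,66,Ganesh,
3,22,Mahesh,True
"""

people = pd.read_csv(
    io.StringIO(csv_text),
    header=0,  # the first line is a header; replace it with the names below
    names=["index", "age", "name", "random_data"],
    index_col="index",
)
print(people.dtypes)
```

With the header handled, `age` is parsed as an integer column instead of `object`.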

#### Analyzing Data

In [78]:
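# Method 1: sort in descending order and take the first row.
# Caution: "age" holds strings here (see the dtypes above), so the sort is alphabetical.
sorted_df = new_df.sort_values(["age"], ascending=False)
sorted_df.head(1)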

Unnamed: 0,index,age,name
1.0,88,suresh,


In [79]:
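# Method 2: take the maximum of the column directly.
# Caution: on a string column, max() returns the alphabetically last value, not the highest age.
new_df["age"].max()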

'suresh'
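
The result `'suresh'` is a string comparison, not an age: lowercase letters sort after uppercase ones, so `'suresh'` beats `'Ramesh'`. The sketch below shows both methods on a frame whose `age` column is numeric. The data is re-created inline and the variable names are chosen for this example only.

```python
import pandas as pd

people = pd.DataFrame(
    {
        "age": [99, 88, 66, 22],
        "name": ["Ramesh", "suresh", "Ganesh", "Mahesh"],
    }
)

# Method 1: sort by age, highest first, and keep the top row.
oldest_by_sort = people.sort_values("age", ascending=False).head(1)

# Method 2: take the maximum directly, or look up the whole row with idxmax.
oldest_age = people["age"].max()
oldest_row = people.loc[people["age"].idxmax()]

# Why the string column gave 'suresh': plain string comparison.
string_max = max(["names", "Ramesh", "suresh", "Ganesh", "Mahesh"])

print(oldest_age, oldest_row["name"], string_max)
```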

In [80]:
try:
    matches_df = pd.read_csv("matches.csv")
except FileNotFoundError as ex:
    print(repr(ex))

FileNotFoundError(2, 'No such file or directory')


In [81]:
# FileNotFoundError is a subclass of OSError, so this broader handler catches it too.
try:
    matches_df = pd.read_csv("matches.csv")
except OSError as ex:
    print(repr(ex))

FileNotFoundError(2, 'No such file or directory')
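
Both handlers print the same exception because `FileNotFoundError` inherits from `OSError`. The pattern can be wrapped in a small helper, sketched below. `read_csv_or_none` and the file name are hypothetical, not part of pandas.

```python
import pandas as pd


def read_csv_or_none(path):
    """Return a DataFrame, or None if the file cannot be opened (hypothetical helper)."""
    try:
        return pd.read_csv(path)
    except OSError as ex:  # also covers FileNotFoundError and PermissionError
        print(f"Could not read {path}: {ex!r}")
        return None


# Assumed not to exist, so the helper should return None.
matches_df = read_csv_or_none("this_file_does_not_exist_12345.csv")
print(matches_df is None)
```

Catching `OSError` is the broader choice; catching `FileNotFoundError` alone lets other I/O problems, such as permission errors, surface as exceptions.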
