```[pandas] is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. - Wikipedia```

## What can pandas do?
- Calculate statistics and answer questions about the data, such as:
    - What is the average, median, max, or min of each column?
    - Does column A correlate with column B?
    - What does the distribution of data in column C look like?
- Clean the data by doing things like removing missing values and filtering rows or columns by some criteria
- Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles, and more.
- Store the cleaned, transformed data back into a CSV file, another file format, or a database.
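
The tasks listed above can be sketched on a tiny, made-up DataFrame. The column names `a`, `b`, `c` and all values are purely illustrative assumptions, and plotting is left out so that the sketch needs only pandas.

```python
import pandas as pd

# Hypothetical data; the None in column "c" marks a missing value.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.0],
        "c": [10.0, None, 30.0, 40.0],
    }
)

# Per-column statistics
mean_a = df["a"].mean()
median_a = df["a"].median()

# Pearson correlation between columns "a" and "b"
corr_ab = df["a"].corr(df["b"])

# Cleaning: drop rows with missing values, then filter rows by a criterion
clean = df.dropna()
filtered = clean[clean["a"] > 1]

# Store the result as CSV text (pass a file path instead to write to disk)
csv_text = filtered.to_csv(index=False)

print(mean_a, median_a, corr_ab)
print(csv_text)
```

`df.describe()` gives most of the per-column statistics in a single call, which is often a convenient first look at a new data set.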

## How to Install pandas

pip install --upgrade pandas

## Start working

In [1]:
import pandas as pd
import sys

In [2]:
print("Python: " + sys.version.split("|")[0])
print("Pandas: " + pd.__version__)

Python: 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
Pandas: 0.25.3


In [3]:
my_list = [99, 88, 77, 66, 44, 22]
print(type(my_list), my_list)

<class 'list'> [99, 88, 77, 66, 44, 22]


In [4]:
my_df = pd.DataFrame(my_list)
print(type(my_df), my_df)

<class 'pandas.core.frame.DataFrame'>     0
0  99
1  88
2  77
3  66
4  44
5  22


In [5]:
my_df

Unnamed: 0,0
0,99
1,88
2,77
3,66
4,44
5,22


In [6]:
my_df = pd.DataFrame(data=my_list)
my_df

Unnamed: 0,0
0,99
1,88
2,77
3,66
4,44
5,22


In [8]:
my_df = pd.DataFrame(data=my_list, columns=("ages",))
my_df

Unnamed: 0,ages
0,99
1,88
2,77
3,66
4,44
5,22


In [9]:
my_df = pd.DataFrame({"ages": [99, 88, 66, 22]})
my_df

Unnamed: 0,ages
0,99
1,88
2,66
3,22


In [11]:
my_df = pd.DataFrame(
    {"ages": [99, 88, 66, 22], "names": ("Ramesh", "suresh", "Ganesh", "Mahesh")}
)  # values must be of same length
my_df

Unnamed: 0,ages,names
0,99,Ramesh
1,88,suresh
2,66,Ganesh
3,22,Mahesh
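
The "values must be of same length" comment in the cell above can be checked with a small sketch. The mismatched lists here are invented for illustration. Wrapping each column in a `pd.Series` is one way to allow unequal lengths, since pandas then aligns on the index and fills the gaps with missing values.

```python
import pandas as pd

# Hypothetical mismatch: three ages but only two names.
try:
    pd.DataFrame({"ages": [99, 88, 66], "names": ("Ramesh", "suresh")})
    error_message = None
except ValueError as exc:
    # pandas refuses plain sequences of different lengths
    error_message = str(exc)

print(error_message)

# Series are aligned on their index, so the shorter column is padded with NaN.
padded = pd.DataFrame(
    {"ages": pd.Series([99, 88, 66]), "names": pd.Series(["Ramesh", "suresh"])}
)
print(padded)
```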


In [12]:
my_df.to_json("persons.json")

In [13]:
import os

os.listdir()

['.ipynb_checkpoints',
 '01_pandas_csv.py',
 '02_pandas_csv.py',
 'additional_references.txt',
 'Pandas DataFrame Notes.pdf',
 'PandasPythonForDataScience.pdf',
 'Pandas_Cheat_Sheet.pdf',
 'pandas_ex1.py',
 'pandas_ex2.py',
 'pandas_material.ipynb',
 'persons.json',
 'Python_Pandas_Cheat_Sheet_2.pdf',
 'Scikit_Learn_Cheat_Sheet_Python.pdf',
 'TODO']

In [14]:
! type persons.json

{"ages":{"0":99,"1":88,"2":66,"3":22},"names":{"0":"Ramesh","1":"suresh","2":"Ganesh","3":"Mahesh"}}


In [15]:
my_df.to_csv("persons.csv")

In [16]:
! type persons.csv

,ages,names
0,99,Ramesh
1,88,suresh
2,66,Ganesh
3,22,Mahesh


In [17]:
my_df.to_csv("persons1.csv", index=False)

In [18]:
! type persons1.csv

ages,names
99,Ramesh
88,suresh
66,Ganesh
22,Mahesh


In [19]:
my_df.to_csv("persons2.csv", index=False, header=False)

In [20]:
! type persons2.csv

99,Ramesh
88,suresh
66,Ganesh
22,Mahesh


### Reading Data

In [21]:
new_df = pd.read_csv("persons.csv")
new_df

Unnamed: 0.1,Unnamed: 0,ages,names
0,0,99,Ramesh
1,1,88,suresh
2,2,66,Ganesh
3,3,22,Mahesh


In [22]:
new_df = pd.read_csv("persons.csv", header=None)
new_df

Unnamed: 0,0,1,2
0,,ages,names
1,0.0,99,Ramesh
2,1.0,88,suresh
3,2.0,66,Ganesh
4,3.0,22,Mahesh


In [24]:
new_df = pd.read_csv("persons.csv", names=["index", "age", "name"])
new_df

Unnamed: 0,index,age,name
0,,ages,names
1,0.0,99,Ramesh
2,1.0,88,suresh
3,2.0,66,Ganesh
4,3.0,22,Mahesh


In [25]:
new_df.dtypes

index    float64
age       object
name      object
dtype: object

In [27]:
new_df.age.dtypes

dtype('O')

In [30]:
new_df.name.dtypes

dtype('O')
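
The `object` dtypes above arise because `names=` on its own makes `read_csv` treat the file's header line as a data row. A minimal sketch of two ways around this follows, using an in-memory copy of the `persons.csv` text so that it is self-contained. The variable names `renamed` and `indexed` are illustrative.

```python
import io

import pandas as pd

# Same text as persons.csv above, held in memory so the sketch needs no file.
csv_text = ",ages,names\n0,99,Ramesh\n1,88,suresh\n2,66,Ganesh\n3,22,Mahesh\n"

# header=0 marks the first line as a header row, which `names` then replaces.
renamed = pd.read_csv(
    io.StringIO(csv_text), names=["index", "age", "name"], header=0
)

# Alternatively, restore the saved index column as the DataFrame index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(renamed.dtypes)
print(indexed)
```

With the header row no longer parsed as data, the `age` column is read as integers, so numeric operations such as sorting and `max()` behave as expected.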

#### Analyzing Data

In [32]:
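# Method 1: sort by age in descending order and take the first row.
# Caution: "age" has dtype object here because the header row was read as data,
# so the sort is lexicographic and the stray header row ("ages") comes first.
sorted_df = new_df.sort_values(["age"], ascending=False)
sorted_df.head(1)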

Unnamed: 0,index,age,name
0,,ages,names


In [33]:
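# Method 2:
# The DataFrame is named `new_df`; calling `df` here raises a NameError.
# "age" still holds strings (including the stray header row), so convert it to numbers first.
pd.to_numeric(new_df["age"], errors="coerce").max()

99.0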
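
Assuming the age column holds numeric values, the two methods above can be compared side by side, together with `idxmax` and `nlargest` as further options. The DataFrame `people` below is a hypothetical stand-in built in memory; it is not the `new_df` from the earlier cells.

```python
import pandas as pd

# Hypothetical stand-in with a properly numeric "age" column.
people = pd.DataFrame(
    {"age": [99, 88, 66, 22], "name": ["Ramesh", "suresh", "Ganesh", "Mahesh"]}
)

# Method 1: sort in descending order and take the first row.
sorted_people = people.sort_values("age", ascending=False)
oldest_by_sort = sorted_people.head(1)

# Method 2: take the maximum of the column directly.
max_age = people["age"].max()

# idxmax returns the index label of the maximum, which gives access to the whole row.
oldest_row = people.loc[people["age"].idxmax()]

# nlargest returns the top-n rows without sorting the entire frame.
top_two = people.nlargest(2, "age")

print(max_age)
print(oldest_row["name"])
print(top_two)
```

`max()` returns only the value, whereas sorting, `idxmax`, and `nlargest` keep the other columns, which matters when the question is who is oldest rather than what the highest age is.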