# Pandas


* Pandas is a Python library used for working with data sets.
* Contains methods for analyzing, cleaning, exploring, and manipulating data.


## Read csv

A simple way to store big data sets is to use CSV (comma-separated values) files.

CSV files contain plain text and are a well-known format that almost any tool can read, including Pandas.

```python
# syntax
import pandas as pd

my_df = pd.read_csv('data.csv')
```

```python
import pandas as pd

my_df = pd.read_csv('unique_keywords.csv')

# to_string() prints every row instead of a truncated preview
print(my_df.to_string())
```

```
                                               index  count
0                                     CLIMATE CHANGE     54
1                                        STORM SURGE     54
2                                           DOMINODE     47
3                                               CRIS     41
4                                      DRONE IMAGERY     33
5                            ASSET MANAGEMENT SYSTEM     28
6                                           EXPOSURE     18
7                                  BRIDGE RISK INDEX     17
8                                HYDRAULIC CROSSINGS     17
9                                     AERIAL IMAGERY     14
10                                            HAZARD      9
11                           DIGITAL ELEVATION MODEL      8
12                                 SATELLITE IMAGERY      8
13                                             WATER      5
14                                             INDEX      4
15
```
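Because `unique_keywords.csv` is not bundled with these notes, the minimal sketch below feeds `read_csv` an in-memory string through `io.StringIO` instead of a file path. The header name `keyword` and the three rows are illustrative assumptions, not the real file's layout. `read_csv` accepts any file-like object, so the call is otherwise the same as reading from disk.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file on disk
csv_text = """keyword,count
CLIMATE CHANGE,54
STORM SURGE,54
DOMINODE,47
"""

# read_csv accepts a path or any file-like object such as StringIO
keywords_df = pd.read_csv(io.StringIO(csv_text))

print(keywords_df.shape)    # (number of rows, number of columns)
print(keywords_df.dtypes)   # column types inferred from the text
print(keywords_df.head(2))  # first two rows
```

The first line of the text becomes the column names, and pandas infers that `count` holds integers.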

## DataFrame

A DataFrame is a two-dimensional data structure, like a table with rows and columns.
```python
# syntax
import pandas as pd

my_df = pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
```

```python
# import the pandas library and use pd as an alias to reference pandas and its methods
import pandas as pd

# dataset with mixed types (strings and integers)
dataset1 = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
# dataset where every column holds integers
dataset2 = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
# 3-column dataset
dataset3 = {
    'X': [78, 85, 96, 80, 86],
    'Y': [84, 94, 89, 83, 86],
    'Z': [86, 97, 96, 72, 83]
}

# create a DataFrame for each dataset
df1 = pd.DataFrame(dataset1)
df2 = pd.DataFrame(dataset2)
df3 = pd.DataFrame(dataset3)

print(df1)
print(df2)
print(df3)
```

```
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
   calories  duration
0       420        50
1       380        40
2       390        45
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
```
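The examples above rely on the default row labels 0, 1, 2. The sketch below uses the `index` and `columns` parameters from the syntax line instead; the row labels `r1` to `r3` are made-up values for illustration.

```python
import pandas as pd

dataset1 = {
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3, 7, 2],
}

# index= supplies row labels in place of the default 0, 1, 2
# columns= selects (and orders) which dictionary keys become columns
df_labeled = pd.DataFrame(dataset1, index=["r1", "r2", "r3"], columns=["passings", "cars"])
print(df_labeled)

# Rows can now be looked up by label with .loc
print(df_labeled.loc["r2"])
```

With a dictionary of plain lists, `index` must have the same length as each list; `columns` also controls the order in which the columns are displayed.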


## Series

A Pandas Series is like a column in a table. It is a one-dimensional array holding data of any type.

```python
# syntax
import pandas as pd

my_series = pd.Series(data=None, index=None, dtype=None, name=None, copy=None)
```


```python
import pandas as pd

dataset1 = [2, 4, 6, 8, 10]
dataset2 = {"day 1": 420, "day 2": 380, "day 3": 390}

s1 = pd.Series(dataset1)
s2 = pd.Series(dataset2)

print(s1)
print(s2)
```

```
0     2
1     4
2     6
3     8
4    10
dtype: int64
day 1    420
day 2    380
day 3    390
dtype: int64
```
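The dictionary keys in `s2` became the Series labels. The sketch below sets the same labels explicitly with `index=`, and also shows that selecting a single column of a DataFrame gives back a Series, which is what "like a column in a table" means in practice. The `name` value is an illustrative choice.

```python
import pandas as pd

# index= gives each value a label; name= labels the Series itself
calories = pd.Series([420, 380, 390], index=["day 1", "day 2", "day 3"], name="calories")
print(calories["day 2"])  # look up a value by its label

# Selecting one column of a DataFrame also returns a Series
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})
duration = df["duration"]
print(type(duration))
print(duration.mean())
```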


## Activity 8

Write a Pandas program to create a DataFrame from the following dictionary and display it.
```python
chipotle = {
    "order_id": [1, 1, 2, 3],
    "quantity": [1, 1, 2, 1],
    "item_name": ["Chips and Fresh Tomato Salsa", "Nantucket Nectar", "Chicken Bowl", "Chicken Bowl"],
    "choice_description": ["", "Apple", "Tomatillo-Red Chili Salsa (Hot)", "Fresh Tomato Salsa (Mild)"],
    "item_price": ["$2.39", "$3.39", "$16.98", "$10.98"]
}
```
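One possible solution sketch for Activity 8: pass the dictionary straight to `pd.DataFrame`, exactly as in the DataFrame section. Note that `item_price` holds strings such as `"$2.39"`, so pandas keeps that column as text rather than numbers.

```python
import pandas as pd

chipotle = {
    "order_id": [1, 1, 2, 3],
    "quantity": [1, 1, 2, 1],
    "item_name": ["Chips and Fresh Tomato Salsa", "Nantucket Nectar", "Chicken Bowl", "Chicken Bowl"],
    "choice_description": ["", "Apple", "Tomatillo-Red Chili Salsa (Hot)", "Fresh Tomato Salsa (Mild)"],
    "item_price": ["$2.39", "$3.39", "$16.98", "$10.98"],
}

# Each key becomes a column; the lists supply the row values
chipotle_df = pd.DataFrame(chipotle)
print(chipotle_df)
```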


## Activity 9


* `DataFrame.head(n)`: returns the first `n` rows of the object, based on position.
* `DataFrame.info()`: prints a concise summary of a DataFrame.

Write a Pandas program to 
* create a DataFrame from the following dictionary, using the specified labels list.
* display a summary of the basic information about the created DataFrame and its data.
* display the first 6 rows of the DataFrame

```python
exam_data  = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
    'score': [12.5, 9, 16.5, 0, 9, 20, 14.5, 0, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```
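One possible solution sketch for Activity 9, using the `index` parameter of `pd.DataFrame` to attach the `labels` list as row labels. `info()` prints its summary directly and returns `None`, so it is called on its own rather than inside `print`.

```python
import pandas as pd

exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
    'score': [12.5, 9, 16.5, 0, 9, 20, 14.5, 0, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# index=labels replaces the default 0..9 row labels with 'a'..'j'
exam_df = pd.DataFrame(exam_data, index=labels)

# info() prints the index summary, column names, non-null counts, dtypes and memory usage
exam_df.info()

# head(6) returns the first six rows by position
print(exam_df.head(6))
```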

## Activity 10

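`DataFrame.agg(func=None, axis=0, *args, **kwargs)`
* Aggregates using one or more operations over the specified axis.
* `func`: a function, a function name, or a list of functions and/or function names, e.g. `[np.sum, 'mean']`.
* `axis`: `0` applies the function to each column, `1` to each row.
* Example: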
```python
import pandas as pd

data = {
    "x": [50, 40, 30],
    "y": [300, 1112, 42]
}

df = pd.DataFrame(data)

x = df.agg(["sum"])
print(x)
# OUTPUT
#        x     y
# sum  120  1454
```
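The example above passes a single function. The sketch below passes several function names at once and also uses `axis=1` to aggregate across rows. String names are used here; passing NumPy callables such as `np.sum` also works, though recent pandas versions may emit a `FutureWarning` for them.

```python
import pandas as pd

data = {
    "x": [50, 40, 30],
    "y": [300, 1112, 42],
}
df = pd.DataFrame(data)

# Several functions at once: one result row per function name
summary = df.agg(["sum", "mean", "max"])
print(summary)

# axis=1 applies the function across each row instead of down each column
row_totals = df.agg("sum", axis=1)
print(row_totals)
```

Because `mean` produces floats, the combined `summary` table may show every value as a float, e.g. `120.0` for the sum of `x`.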
Write a Pandas program to split the following DataFrame by school code and get the mean, min, and max value of weight for each school.

```python
school_data = {
    'school_code': ['s001','s002','s003','s001','s002','s004'],
    'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
    'date_Of_Birth ': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': [12, 12, 13, 13, 14, 12],
    'height': [173, 192, 186, 167, 151, 159],
    'weight': [35, 32, 33, 30, 31, 32],
    'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']
}
labels = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6']
```
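One possible solution sketch for this activity. It uses `DataFrame.groupby`, which these notes have not introduced yet: `groupby("school_code")` splits the rows into one group per school, and `agg` (described above) then applies each named function to the `weight` values in every group.

```python
import pandas as pd

school_data = {
    'school_code': ['s001','s002','s003','s001','s002','s004'],
    'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
    'date_Of_Birth ': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': [12, 12, 13, 13, 14, 12],
    'height': [173, 192, 186, 167, 151, 159],
    'weight': [35, 32, 33, 30, 31, 32],
    'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']
}
labels = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6']

school_df = pd.DataFrame(school_data, index=labels)

# groupby splits the rows by school code; agg then applies
# each function to the 'weight' values within every group
weight_stats = school_df.groupby("school_code")["weight"].agg(["mean", "min", "max"])
print(weight_stats)
```

The result has one row per school code and one column per function name.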