# Pandas

## Pandas DataFrames
Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on top of the NumPy package, and its key data structure is called the DataFrame. DataFrames let you store and manipulate tabular data, with observations in rows and variables in columns.

There are several ways to create a DataFrame.

### Create a DataFrame from a dictionary

```python
import pandas as pd

# Each key becomes a column; area is in million km², population in millions
data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

brics = pd.DataFrame(data)
print(brics)
print('------------------------------------------------------------------\n')

# Replace the default integer index with your own row labels
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)
```

In the output below, the columns appear in alphabetical order because older pandas versions sorted dictionary keys. Since pandas 0.23 (with Python 3.6+), the columns keep the dictionary's insertion order instead.

```
     area    capital       country  population
0   8.516   Brasilia        Brazil      200.40
1  17.100     Moscow        Russia      143.50
2   3.286  New Delhi         India     1252.00
3   9.597    Beijing         China     1357.00
4   1.221   Pretoria  South Africa       52.98
------------------------------------------------------------------

      area    capital       country  population
BR   8.516   Brasilia        Brazil      200.40
RU  17.100     Moscow        Russia      143.50
IN   3.286  New Delhi         India     1252.00
CH   9.597    Beijing         China     1357.00
SA   1.221   Pretoria  South Africa       52.98
```
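The same DataFrame can also be built from other inputs. Below is a minimal sketch that reuses the BRICS values above and shows two alternatives: a list of rows with explicit `columns=`, and a list of dictionaries with one dictionary per row. Passing `index=` to the constructor sets the row labels directly, so you do not need to assign `brics.index` afterwards. The names `rows`, `records`, `brics_rows` and `brics_records` are illustrative only.

```python
import pandas as pd

# Same BRICS values as above, given row by row
rows = [
    ["Brazil", "Brasilia", 8.516, 200.4],
    ["Russia", "Moscow", 17.10, 143.5],
    ["India", "New Delhi", 3.286, 1252],
    ["China", "Beijing", 9.597, 1357],
    ["South Africa", "Pretoria", 1.221, 52.98],
]
columns = ["country", "capital", "area", "population"]
labels = ["BR", "RU", "IN", "CH", "SA"]

# List of rows + column names; index= sets the row labels at construction time
brics_rows = pd.DataFrame(rows, columns=columns, index=labels)

# List of dicts (one per row); the keys become the column names
records = [dict(zip(columns, r)) for r in rows]
brics_records = pd.DataFrame(records, index=labels)

print(brics_rows)
print(brics_rows.equals(brics_records))  # True
```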


### Create a DataFrame by importing a file
The first example reads from a CSV file:

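```python
# Import pandas as pd
import pandas as pd

# Import the cars.csv data: cars
cars = pd.read_csv('cars.csv')

# Print out cars
print(cars)
```

```
  Unnamed: 0  cars_per_cap        country  drives_right
0         US           809  United States          True
1        AUS           731      Australia         False
2        JAP           588          Japan         False
3         IN            18          India         False
4         RU           200         Russia          True
5        MOR            70        Morocco          True
6         EG            45          Egypt          True
```

No `index_col` is given, so pandas reads the row labels in the file's first (unnamed) column as an ordinary column called `Unnamed: 0` and adds a default integer index.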
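The sketch below shows where the `Unnamed: 0` column comes from. It parses an in-memory CSV with `io.StringIO` instead of reading a file from disk. It assumes that `cars.csv` has an empty first header cell above the row labels, which is what the output above suggests. `csv_text` is a hypothetical stand-in for the real file. Passing `index_col=0` turns that first column into the index.

```python
import io
import pandas as pd

# Hypothetical stand-in for cars.csv: the first header cell is empty
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
IN,18,India,False
RU,200,Russia,True
MOR,70,Morocco,True
EG,45,Egypt,True
"""

# Default: the label column is read as data and named "Unnamed: 0"
cars_default = pd.read_csv(io.StringIO(csv_text))
print(cars_default.columns.tolist())

# index_col=0: the first column becomes the row labels instead
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars.index.tolist())
```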


It is also easy to read from an Excel file:

```python
# Import pandas as pd
import pandas as pd

# Import the cars.xlsx data: cars_2
# (index_col=0 uses the first column as row labels; reading .xlsx files
# requires an Excel engine such as openpyxl to be installed)
cars_2 = pd.read_excel("cars.xlsx", index_col=0)

# Print out cars_2
print(cars_2)
```

```
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
```


## Indexing DataFrames

There are several ways to index a Pandas DataFrame. One of the easiest ways to do this is by using square bracket notation.


### Pandas Series vs Pandas DataFrame 
In the example below, square brackets select one column of a DataFrame. You can use either single or double brackets. **_Single brackets_** return a **_Pandas Series_** (one-dimensional, with a single data type). **_Double brackets_** return a **_Pandas DataFrame_** (a two-dimensional table whose columns can have different types).

```python
# Import pandas and cars.csv
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out the cars_per_cap column as a Pandas Series
print("cars_per_cap column as Pandas Series:\n")
ser = cars['cars_per_cap']
print(ser)
print(type(ser))
print('------------------------------------------------------------------\n')

# Print out the cars_per_cap column as a Pandas DataFrame
print("cars_per_cap column as Pandas DataFrame:\n")
dt_frm = cars[['cars_per_cap']]
print(dt_frm)
print(type(dt_frm))
print('------------------------------------------------------------------\n')

# Print out a DataFrame with the cars_per_cap and country columns
print("DataFrame with cars_per_cap and country columns:\n")
dt_frm_2 = cars[['cars_per_cap', 'country']]
print(dt_frm_2)
print(type(dt_frm_2))
```

```
cars_per_cap column as Pandas Series:

US     809
AUS    731
JAP    588
IN      18
RU     200
MOR     70
EG      45
Name: cars_per_cap, dtype: int64
<class 'pandas.core.series.Series'>
------------------------------------------------------------------

cars_per_cap column as Pandas DataFrame:

     cars_per_cap
US            809
AUS           731
JAP           588
IN             18
RU            200
MOR            70
EG             45
<class 'pandas.core.frame.DataFrame'>
------------------------------------------------------------------

DataFrame with cars_per_cap and country columns:

     cars_per_cap        country
US            809  United States
AUS           731      Australia
JAP           588          Japan
IN             18          India
RU            200         Russia
MOR            70        Morocco
EG             45          Egypt
<class 'pandas.core.frame.DataFrame'>
```
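Below is a minimal sketch of the difference in dimensionality. It uses a small hypothetical frame with two of the `cars.csv` columns. Single brackets pass one label and give a one-dimensional Series. Double brackets pass a list of labels and give a two-dimensional DataFrame, even when the list holds only one name.

```python
import pandas as pd

# Hypothetical three-row frame with two of the cars.csv columns
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588],
     "country": ["United States", "Australia", "Japan"]},
    index=["US", "AUS", "JAP"],
)

ser = cars["cars_per_cap"]    # one label -> Series
frm = cars[["cars_per_cap"]]  # list with one label -> DataFrame

print(ser.ndim, ser.shape)  # 1 (3,)
print(frm.ndim, frm.shape)  # 2 (3, 1)
```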


### Column access 
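A column can be accessed with square brackets or with attribute (dot) notation, e.g. `brics.country`. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.
A single column can be returned either as a Pandas Series or as a Pandas DataFrame.

#### Access one column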

```python
# Import brics data
import pandas as pd
brics = pd.read_csv('brics.csv', index_col=0)

# Print out all countries using brackets and quotes
brics_1 = brics["country"]
print(brics_1)
print(type(brics_1))
print('------------------------------------------------------------------\n')

# Print out all countries using attribute (dot) notation
brics_2 = brics.country
print(brics_2)
print(type(brics_2))
```

```
BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object
<class 'pandas.core.series.Series'>
------------------------------------------------------------------

BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object
<class 'pandas.core.series.Series'>
```
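Dot notation has limits that brackets do not. The sketch below uses a hypothetical frame with two problem columns. One is named `size`, which clashes with the built-in `DataFrame.size` attribute (the total number of elements). The other has a name containing a space, which is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical frame whose column names expose the pitfalls of dot notation
df = pd.DataFrame({
    "country": ["Brazil", "Russia"],
    "size": [1, 2],
    "capital city": ["Brasilia", "Moscow"],
})

# A plain identifier works both ways
print(df.country.equals(df["country"]))  # True

# df.size is DataFrame.size (rows * columns), not the "size" column
print(df.size)               # 6
print(df["size"].tolist())   # [1, 2]

# A name with a space can only be reached with brackets
print(df["capital city"].tolist())  # ['Brasilia', 'Moscow']
```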


#### Access several columns
Selecting several columns returns a Pandas DataFrame.

```python
# Import brics data
import pandas as pd
brics = pd.read_csv('brics.csv', index_col=0)

# Print out 2 columns using bracket notation with a list of column labels
brics_1 = brics[["country", "area"]]
print('brics[["country","area"]]')
print(brics_1)
print(type(brics_1))
print('------------------------------------------------------------------\n')

# Print out 2 columns using iloc with integer column positions (end position excluded)
print('brics.iloc[:,0:2]')
brics_2 = brics.iloc[:, 0:2]
print(brics_2)
print(type(brics_2))
print('------------------------------------------------------------------\n')

# Print out 2 columns using loc with a list of column labels
print('brics.loc[:,["country","area"]]')
brics_3 = brics.loc[:, ["country", "area"]]
print(brics_3)
print(type(brics_3))
print('------------------------------------------------------------------\n')

# Print out a range of columns using loc with a label slice (end label included)
print('brics.loc[:,"country":"area"]')
brics_4 = brics.loc[:, "country":"area"]
print(brics_4)
print(type(brics_4))
print('------------------------------------------------------------------\n')

# Combine 2 different column ranges, selected by position
print('brics_5=brics[brics.columns[0:1].tolist() + brics.columns[2:4].tolist()]')
brics_5 = brics[brics.columns[0:1].tolist() + brics.columns[2:4].tolist()]
print(brics_5)
```

```
brics[["country","area"]]
         country      area
BR        Brazil   8515767
RU        Russia  17098242
IN         India   3287590
CH         China   9596961
SA  South Africa   1221037
<class 'pandas.core.frame.DataFrame'>
------------------------------------------------------------------

brics.iloc[:,0:2]
         country  population
BR        Brazil         200
RU        Russia         144
IN         India        1252
CH         China        1357
SA  South Africa          55
<class 'pandas.core.frame.DataFrame'>
------------------------------------------------------------------

brics.loc[:,["country","area"]]
         country      area
BR        Brazil   8515767
RU        Russia  17098242
IN         India   3287590
CH         China   9596961
SA  South Africa   1221037
<class 'pandas.core.frame.DataFrame'>
------------------------------------------------------------------

brics.loc[:,"country":"area"]
         country  population      area
BR        Brazil         200   8515767
RU        Russia         144  17098242
IN         India        1252   3287590
CH         China        1357   9596961
SA  South Africa          55   1221037
<class 'pandas.core.frame.DataFrame'>
------------------------------------------------------------------

brics_5=brics[brics.columns[0:1].tolist() + brics.columns[2:4].tolist()]
         country      area    capital
BR        Brazil   8515767   Brasilia
RU        Russia  17098242     Moscow
IN         India   3287590  New Delhi
CH         China   9596961    Beijing
SA  South Africa   1221037   Pretoria
```
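Note that `brics.loc[:, "country":"area"]` includes `area`, while `brics.iloc[:, 0:2]` stops before position 2. Below is a minimal sketch of that difference on a two-row hypothetical copy of `brics` with the same column order.

```python
import pandas as pd

# Hypothetical two-row copy of brics.csv (same column order)
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "population": [200, 144],
     "area": [8515767, 17098242], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

# loc label slices include the end label
by_label = brics.loc[:, "country":"area"]
print(by_label.columns.tolist())  # ['country', 'population', 'area']

# iloc position slices exclude the end position, like Python lists
by_position = brics.iloc[:, 0:2]
print(by_position.columns.tolist())  # ['country', 'population']
```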

### Access observation (row) with brackets
Square brackets can also be used to access observations (rows) from a DataFrame. For example:

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out the first 4 observations as a Pandas DataFrame
obser = cars[0:4]
print(obser)
print(type(obser))
print('------------------------------------------------------------------\n')

# Print out the fifth and sixth observations as a Pandas DataFrame
# (the stop position 6 is excluded, as in Python list slicing)
obser = cars[4:6]
print(obser)
print(type(obser))
```

```
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
<class 'pandas.core.frame.DataFrame'>
------------------------------------------------------------------

     cars_per_cap  country  drives_right
RU            200   Russia          True
MOR            70  Morocco          True
<class 'pandas.core.frame.DataFrame'>
```
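The stop value of a positional slice is excluded, so getting the fifth, sixth and seventh rows takes `cars[4:7]`. Below is a sketch on a hypothetical one-column stand-in for `cars.csv` (read with `index_col=0`). It also shows that `iloc` and `head()` return the same rows as the bracket slices.

```python
import pandas as pd

# Hypothetical one-column stand-in for cars.csv read with index_col=0
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# Positions 4, 5 and 6: the fifth, sixth and seventh rows
print(cars[4:7].index.tolist())  # ['RU', 'MOR', 'EG']

# iloc makes the positional intent explicit and gives the same rows
print(cars.iloc[4:7].equals(cars[4:7]))  # True

# head(n) is shorthand for the first n rows
print(cars.head(4).equals(cars[0:4]))  # True
```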


### Access observation (row) with methods
You can also use **_loc_** and **_iloc_** to perform just about any data selection operation. `loc` is label-based: you specify rows and columns by their row and column labels. `iloc` is integer-position-based: you specify rows and columns by their integer positions, starting at 0.

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out the observation for Japan (position 2) as a Pandas Series;
# use cars.iloc[[2]] to get a Pandas DataFrame instead
x = cars.iloc[2]
print(x)
print(type(x))
print('------------------------------------------------------------------\n')

# Print out the observations for Australia and Egypt as a Pandas DataFrame
y = cars.loc[['AUS', 'EG']]
print(y)
print(type(y))
```

```
cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
<class 'pandas.core.series.Series'>
------------------------------------------------------------------

     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True
<class 'pandas.core.frame.DataFrame'>
```
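`loc` and `iloc` also accept a row and a column at the same time, which returns a single value. Below is a minimal sketch on a hypothetical three-row subset of `cars`. The integer pair `2, 1` is the position of Japan's `country` cell.

```python
import pandas as pd

# Hypothetical three-row subset of cars.csv read with index_col=0
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588],
     "country": ["United States", "Australia", "Japan"],
     "drives_right": [True, False, False]},
    index=["US", "AUS", "JAP"],
)

# The same cell addressed by label and by position
print(cars.loc["JAP", "country"])  # Japan
print(cars.iloc[2, 1])             # Japan

# A scalar position gives a Series, a list of positions keeps a DataFrame
print(type(cars.iloc[2]).__name__)    # Series
print(type(cars.iloc[[2]]).__name__)  # DataFrame
```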


### Access a submatrix
The following examples select rows and columns at the same time. They cover label-based selection with lists of labels, label-based ranges, position-based ranges, and a combination of several column ranges.
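The last example in the cell below concatenates `tolist()` results to combine column ranges. An alternative is NumPy's `np.r_` index helper, which builds one array of column positions from several slices, and `iloc` accepts that array directly. This is a swapped-in technique, not the approach used in the original notebook. The small frame is a hypothetical copy of `brics`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row copy of brics.csv (same column order)
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "population": [200, 144],
     "area": [8515767, 17098242], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

# np.r_ joins the slices 0:1 and 2:4 into one array of positions
cols = np.r_[0:1, 2:4]
print(cols)  # [0 2 3]

subset = brics.iloc[:, cols]
print(subset.columns.tolist())  # ['country', 'area', 'capital']

# Same selection as concatenating the two column-label lists
same = brics[brics.columns[0:1].tolist() + brics.columns[2:4].tolist()]
print(subset.equals(same))  # True
```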

```python
import pandas as pd
brics = pd.read_csv('brics.csv', index_col=0)
print(brics)
print('------------------------------------------------------------------\n')

# Label-based selection with lists of row and column labels
brics_1 = brics.loc[["BR", "RU"], ["country", "population", "capital"]]
print(brics_1)
print('------------------------------------------------------------------\n')

# Label-based ranges (both end labels included)
brics_2 = brics.loc["BR":"IN", "country":"area"]
print(brics_2)
print('------------------------------------------------------------------\n')

# Position-based ranges (end positions excluded)
print("index based column ranges slicing ")
brics_3 = brics.iloc[0:3, 0:3]
print(brics_3)
print('------------------------------------------------------------------\n')

# Two different column ranges combined, selected by position
brics_4 = brics[brics.columns[0:1].tolist() + brics.columns[2:4].tolist()]
print(brics_4)
```

         country  population      area    capital
BR        Brazil         200   8515767   Brasilia
RU        Russia         144  17098242     Moscow
IN         India        1252   3287590  New Delhi
CH         China        1357   9596961    Beijing
SA  South Africa          55   1221037   Pretoria
------------------------------------------------------------------

   country  population   capital
BR  Brazil         200  Brasilia
RU  Russia         144    Moscow
------------------------------------------------------------------

   country  population      area
BR  Brazil         200   8515767
RU  Russia         144  17098242
IN   India        1252   3287590
------------------------------------------------------------------

index based column ranges slicing 
   country  population      area
BR  Brazil         200   8515767
RU  Russia         144  17098242
IN   India        1252   3287590
------------------------------------------------------------------

         country      area    cap