# Pandas

In [1]:
import pandas as pd

The `DataFrame` is one of Pandas' most important data structures: a way to store tabular data with labeled rows and columns. One way to build a DataFrame is from a Python dictionary, so let's start by building that dictionary.

In [2]:
data = {
    "country": ["Brazil", "Russia", "India", "China"],
    "capital": ["Barsila", "Moscow", "New Delhi", "Beijing"],
    "area": [8.516, 17.10, 3.286, 9.597],        # millions of km²
    "population": [200.4, 143.5, 1252, 1357],    # millions of people
}

In a DataFrame, rows are observations and columns are variables. When building one from a dictionary:<br>

Each dictionary key = column label <br>
Each value (a list) = that column's elements, one per row

In [3]:
# build the DataFrame from the dictionary
brics = pd.DataFrame(data)
brics

|   | country | capital | area | population |
|---|---|---|---|---|
| 0 | Brazil | Barsila | 8.516 | 200.4 |
| 1 | Russia | Moscow | 17.1 | 143.5 |
| 2 | India | New Delhi | 3.286 | 1252.0 |
| 3 | China | Beijing | 9.597 | 1357.0 |
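The mapping from dictionary to DataFrame can be checked directly. This sketch uses a two-column subset of the dictionary above and confirms that keys become column labels in order and that each list becomes one column, which is also why pandas rejects lists of different lengths with a `ValueError`.

```python
import pandas as pd

# A two-column subset of the BRICS dictionary.
data = {
    "country": ["Brazil", "Russia", "India", "China"],
    "area": [8.516, 17.10, 3.286, 9.597],
}
df = pd.DataFrame(data)

# Keys became column labels (in insertion order); each list became one column.
print(list(df.columns))  # ['country', 'area']
print(df.shape)          # (4, 2): 4 rows (observations) x 2 columns (variables)

# Each list is a column, so all lists must have the same length.
length_error = None
try:
    pd.DataFrame({"country": ["Brazil", "Russia"], "area": [8.516]})
except ValueError as err:
    length_error = err
print(type(length_error).__name__)  # ValueError
```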


In [4]:
# replace the default integer row labels with country codes
brics.index = ["BR", "RU", "IN", "CH"]
brics

|   | country | capital | area | population |
|---|---|---|---|---|
| BR | Brazil | Barsila | 8.516 | 200.4 |
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |
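Row labels can also be supplied while the DataFrame is being built, through the `index` parameter of `pd.DataFrame`. A minimal sketch comparing the two approaches, assuming the same BRICS data (only two of its columns are used here):

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China"],
    "area": [8.516, 17.10, 3.286, 9.597],
}
codes = ["BR", "RU", "IN", "CH"]

# Option 1: label the rows while building the DataFrame.
labeled_at_build = pd.DataFrame(data, index=codes)

# Option 2: build first, then assign the labels afterwards.
labeled_after = pd.DataFrame(data)
labeled_after.index = codes

print(labeled_at_build.equals(labeled_after))  # True
print(labeled_at_build.index.tolist())         # ['BR', 'RU', 'IN', 'CH']
```

Either way the list of labels must have one entry per row.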


Putting data in a dictionary and then building a DataFrame works, but it doesn't scale: nobody wants to type out millions of observations by hand. In practice, large data sets usually come as files with a regular structure. One common format is CSV, short for "comma-separated values".<br>

To import CSV data into Python as a Pandas DataFrame, we can use `pd.read_csv()`:
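`read_csv()` accepts a file path or any file-like object, so a small self-contained sketch can use `io.StringIO` in place of a real file. The CSV text below is a hypothetical stand-in, not the actual contents of bricks.csv; note that pandas infers the numeric column types from the text.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """country,area,population
Brazil,8.516,200.4
Russia,17.10,143.5
India,3.286,1252
China,9.597,1357
"""

# read_csv accepts a path or a file-like object such as io.StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                # (4, 3)
print(df["area"].dtype)        # float64 (inferred from the text)
print(df["population"].dtype)  # float64
```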

In [5]:
new_bricks = pd.read_csv("bricks.csv")
new_bricks

|   | Unnamed: 0 | country | capital | area | population |
|---|---|---|---|---|---|
| 0 | BR | Brazil | Barsila | 8.516 | 200.4 |
| 1 | RU | Russia | Moscow | 17.1 | 143.5 |
| 2 | IN | India | New Delhi | 3.286 | 1252.0 |
| 3 | CH | China | Beijing | 9.597 | 1357.0 |


Our `read_csv()` call didn't raise an error, but the output isn't quite what we wanted: the row labels were imported as an ordinary column named `Unnamed: 0` (the CSV header for that column is empty), and pandas assigned default integer labels 0 to 3 instead.
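One plausible way a file like bricks.csv ends up with an unnamed label column (an assumption; the notebook doesn't say how the file was created) is writing a labeled DataFrame with `to_csv()`, which stores the index as a first column with an empty header. This round-trip sketch reproduces the `Unnamed: 0` column in memory:

```python
import io

import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "area": [8.516, 17.10, 3.286, 9.597]},
    index=["BR", "RU", "IN", "CH"],
)

# to_csv writes the row labels as the first column, with an empty header.
buffer = io.StringIO()
brics.to_csv(buffer)
print(buffer.getvalue().splitlines()[0])  # ,country,area

# Reading it back without index_col turns the labels into a regular column
# named "Unnamed: 0" and gives the rows a fresh 0, 1, 2, ... index.
buffer.seek(0)
reread = pd.read_csv(buffer)
print(list(reread.columns))  # ['Unnamed: 0', 'country', 'area']
```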

To fix this, `read_csv()` has an `index_col` argument that specifies which column of the CSV file should be used as the row labels. The labels are in the first column, so we pass `index_col=0`:
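A sketch of `index_col` on in-memory CSV text (again a hypothetical stand-in shaped like bricks.csv). Besides a column position, `index_col` also accepts a column name; the `code` header below is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text: row labels in an unnamed first column.
csv_text = """,country,area
BR,Brazil,8.516
RU,Russia,17.1
IN,India,3.286
CH,China,9.597
"""

# index_col=0: use the first column (by position) as the row labels.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(by_position.index.tolist())  # ['BR', 'RU', 'IN', 'CH']
print(list(by_position.columns))   # ['country', 'area']

# index_col can also name a column; "code" is a made-up header for this demo.
named_csv = csv_text.replace(",country", "code,country", 1)
by_name = pd.read_csv(io.StringIO(named_csv), index_col="code")
print(by_name.index.name)          # code
```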

In [6]:
brics = pd.read_csv("bricks.csv", index_col=0)
brics

|   | country | capital | area | population |
|---|---|---|---|---|
| BR | Brazil | Barsila | 8.516 | 200.4 |
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |


In [7]:
# single square brackets return the column as a Series
brics["country"]

BR    Brazil
RU    Russia
IN     India
CH     China
Name: country, dtype: object

In [8]:
type(brics["country"])

pandas.core.series.Series

If we want to select the `country` column but keep the result as a DataFrame, we need double square brackets:

In [15]:
brics[["country"]]

|   | country |
|---|---|
| BR | Brazil |
| RU | Russia |
| IN | India |
| CH | China |


In [12]:
type(brics[["country"]])

pandas.core.frame.DataFrame
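The difference between single and double brackets is the dimensionality of the result, which can be checked with `ndim` and `shape`. A short sketch on a reduced version of `brics`:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "population": [200.4, 143.5, 1252, 1357]},
    index=["BR", "RU", "IN", "CH"],
)

as_series = brics["country"]    # a single label -> Series (1-D)
as_frame = brics[["country"]]   # a list with one label -> DataFrame (2-D)

print(as_series.ndim, as_frame.ndim)  # 1 2
print(as_series.shape)                # (4,)
print(as_frame.shape)                 # (4, 1)

# Same values, different container.
print(as_series.equals(as_frame["country"]))  # True
```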

Passing a list of several column labels selects multiple columns, and the result is a sub-DataFrame:

In [19]:
brics[["country", "population"]]

|   | country | population |
|---|---|---|
| BR | Brazil | 200.4 |
| RU | Russia | 143.5 |
| IN | India | 1252.0 |
| CH | China | 1357.0 |
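The columns of the sub-DataFrame follow the order of the list you pass, not their order in the original DataFrame, and asking for a label that doesn't exist raises a `KeyError`. A sketch, where `gdp` is a hypothetical column name that is deliberately absent:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "area": [8.516, 17.10, 3.286, 9.597],
     "population": [200.4, 143.5, 1252, 1357]},
    index=["BR", "RU", "IN", "CH"],
)

# Columns come back in the order of the list, not the original order.
subset = brics[["population", "country"]]
print(list(subset.columns))  # ['population', 'country']

# "gdp" is a hypothetical label that is not a column of brics.
missing_error = None
try:
    brics[["country", "gdp"]]
except KeyError as err:
    missing_error = err
print(type(missing_error).__name__)  # KeyError
```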


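To select rows with square brackets, we pass a slice. As with Python lists, positions start at 0 and the end is excluded, so `brics[1:4]` returns the rows at positions 1, 2, and 3: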

In [21]:
brics[1:4]

|   | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |
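Because an integer slice inside square brackets is positional, it follows Python's list-slicing rules, including open-ended and negative bounds. A sketch on a one-column version of `brics`:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"]},
    index=["BR", "RU", "IN", "CH"],
)

# Integer slices inside [] select rows by position, end excluded.
print(brics[1:4].index.tolist())  # ['RU', 'IN', 'CH']
print(brics[:2].index.tolist())   # ['BR', 'RU']  (start defaults to 0)
print(brics[2:].index.tolist())   # ['IN', 'CH']  (runs to the last row)
print(brics[-1:].index.tolist())  # ['CH']        (negative positions count from the end)
```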


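Square brackets can select columns (by label) or rows (by slice), but not a combination of the two in one step. For that, pandas provides the `loc` indexer, which selects by label. Passing a single row label returns that row as a Series:

In [26]: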
brics.loc["RU"]

country       Russia
capital       Moscow
area            17.1
population     143.5
Name: RU, dtype: object

In [50]:
# a list of labels returns a DataFrame instead of a Series
brics.loc[["RU"]]

|   | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
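`loc` returns different types depending on what you pass: a single row label gives a Series, a list of labels gives a DataFrame, and a row label plus a column label gives a single value. A sketch checking all three on a reduced `brics`:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"]},
    index=["BR", "RU", "IN", "CH"],
)

row = brics.loc["RU"]              # one row label -> Series
rows = brics.loc[["RU"]]           # list of row labels -> DataFrame
cell = brics.loc["RU", "capital"]  # row label + column label -> single value

print(type(row).__name__, type(rows).__name__)  # Series DataFrame
print(cell)                                     # Moscow
```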


In [51]:
brics.loc[["RU", "IN", "CH"]]

|   | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |


In [33]:
brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

|   | country | capital |
|---|---|---|
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |


In [34]:
brics.loc[:, ["country", "capital"]]

|   | country | capital |
|---|---|---|
| BR | Brazil | Barsila |
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |
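`loc` also accepts label slices for rows and columns. Unlike positional slices, a label slice includes both endpoints. A sketch:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"],
     "area": [8.516, 17.10, 3.286, 9.597],
     "population": [200.4, 143.5, 1252, 1357]},
    index=["BR", "RU", "IN", "CH"],
)

# Label slices in loc include both endpoints ("IN" and "area" are kept).
block = brics.loc["RU":"IN", "country":"area"]
print(block.index.tolist())  # ['RU', 'IN']
print(list(block.columns))   # ['country', 'capital', 'area']
```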


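`iloc` works like `loc`, but selects by integer position instead of by label. Position 1 is the second row, `RU`:

In [35]: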
brics.iloc[[1]]

|   | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |


In [44]:
brics.iloc[[1, 2, 3]]

|   | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |


In [48]:
brics.iloc[[1, 2, 3], [0, 1]]

|   | country | capital |
|---|---|---|
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |
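`iloc` selections can be translated into `loc` selections and vice versa. A quick sketch checking that the position-based call in the cell above selects the same data as the label-based `loc[["RU", "IN", "CH"], ["country", "capital"]]`, and using `Index.get_loc` to look up the position of a label:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"],
     "area": [8.516, 17.10, 3.286, 9.597],
     "population": [200.4, 143.5, 1252, 1357]},
    index=["BR", "RU", "IN", "CH"],
)

by_label = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
by_position = brics.iloc[[1, 2, 3], [0, 1]]
print(by_label.equals(by_position))  # True

# Translate labels into positions.
print(brics.index.get_loc("RU"))         # 1
print(brics.columns.get_loc("capital"))  # 1
```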


In [49]:
brics.iloc[:, [0, 1]]

|   | country | capital |
|---|---|---|
| BR | Brazil | Barsila |
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |
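`iloc` also takes slices. Since they are positional, they exclude the end like Python list slices (unlike the label slices of `loc`), and negative positions count from the end. A sketch:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"],
     "area": [8.516, 17.10, 3.286, 9.597],
     "population": [200.4, 143.5, 1252, 1357]},
    index=["BR", "RU", "IN", "CH"],
)

# Positional slices exclude the end: rows 1-2 and columns 0-1.
sub = brics.iloc[1:3, 0:2]
print(sub.index.tolist())  # ['RU', 'IN']
print(list(sub.columns))   # ['country', 'capital']

# Negative positions count from the end: -1 is the last row.
print(brics.iloc[-1]["country"])  # China
```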
