# Pandas package
    - Built on top of NumPy
    - Efficient data manipulation through the DataFrame structure (fast calculations and data wrangling)
    - Similar to data manipulation in R
    - Similar to SQL
    - Connections: SQL databases (e.g. PostgreSQL), distributed file systems (e.g. HDFS), cloud storage (e.g. S3), flat files (e.g. CSV)
    - It is NOT distributed (the data must fit into the computer's RAM)

## Importing data in Pandas
    - Read from local files
    - Read from Python package (pydataset, sklearn.datasets)
    - Read from web location
    - Manual creation
    - Reading line by line (useful for large files)
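
The options listed above can be sketched without any external file. This is a minimal illustration, assuming a small made-up CSV string (csv_text) in place of a real file or URL; reading in chunks via the chunksize parameter is shown as one way to process a file piece by piece.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a local file or a web location.
csv_text = (
    "country,beer_servings,continent\n"
    "Albania,89,Europe\n"
    "Algeria,25,Africa\n"
    "Angola,217,Africa\n"
)

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With chunksize, read_csv returns an iterator of DataFrames instead of one DataFrame.
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]

# Manual creation from a dictionary of columns.
manual = pd.DataFrame({"country": ["Serbia"], "beer_servings": [283]})

print(df.shape)     # (3, 3)
print(chunk_sizes)  # [2, 1]
```

Each chunk is an ordinary DataFrame, so the same selection and aggregation methods apply to it.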

In [3]:
import pandas as pd
#!pip install pydataset
from pydataset import data # library with preinstalled datasets
from sklearn import datasets # sklearn standard ML library for python has its own datasets

In [23]:
a = data() # Available datasets from pydataset.data

In [24]:
type(a)

pandas.core.frame.DataFrame

In [25]:
# Read datasets from pydataset.data
titanic=data('titanic') # load the data
housing=data('Housing')

In [4]:
# read_csv from web location or from hard drive
drinks=pd.read_csv('http://bit.ly/drinksbycountry')
type(drinks)

pandas.core.frame.DataFrame

In [29]:
drinks

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa
5,Antigua & Barbuda,102,128,45,4.9,North America
6,Argentina,193,25,221,8.3,South America
7,Armenia,21,179,11,3.8,Europe
8,Australia,261,72,212,10.4,Oceania
9,Austria,279,75,191,9.7,Europe


The head and tail methods return the first and last N records; N defaults to 5

In [30]:
drinks.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [31]:
drinks.tail(10)

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
183,Tanzania,36,6,1,5.7,Africa
184,USA,249,158,84,8.7,North America
185,Uruguay,115,35,220,6.6,South America
186,Uzbekistan,25,101,8,2.4,Asia
187,Vanuatu,21,18,11,0.9,Oceania
188,Venezuela,333,100,3,7.7,South America
189,Vietnam,111,2,1,2.0,Asia
190,Yemen,6,0,0,0.1,Asia
191,Zambia,32,19,4,2.5,Africa
192,Zimbabwe,64,18,4,4.7,Africa


## Selecting Data from pd.DataFrame (Slicing and Dicing)

A pandas DataFrame has a convenient slice operator that can be used for:
- selection of columns based on their names (column index labels), or
- selection of rows based on a row-number slice or a boolean mask

More consistent methods for slicing and dicing a pd.DataFrame are:
- iloc is based on integer positions (similar to ndarray selection)
- loc is based on index labels, but may also be used with a boolean array.

Detailed information on selecting and indexing data can be found at the following [link](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)

Rows or columns can also be selected by their labels with the [filter function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.filter.html), but this will not be covered in this tutorial

### Slicer examples

In [40]:
# First check Index of columns in a pd.DataFrame
drinks.columns

Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [42]:
# Column selection. Slicer accepts one index name or list of index names
drinks[['country', 'spirit_servings']]
# note that this example uses list (that is why double brackets are used)

Unnamed: 0,country,spirit_servings
0,Afghanistan,0
1,Albania,132
2,Algeria,0
3,Andorra,138
4,Angola,57
5,Antigua & Barbuda,128
6,Argentina,25
7,Armenia,179
8,Australia,72
9,Austria,75


In [46]:
# Row selection based on row position (slice notation start:stop:step)
drinks[:5]

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [63]:
drinks[['beer_servings']].columns

Index(['beer_servings'], dtype='object')

In [50]:
# Row selection based on boolean array
sr_records = (drinks['country'] == 'Serbia')
drinks[sr_records]

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
151,Serbia,283,131,127,9.6,Europe


In [53]:
# Another example
drinks[drinks.spirit_servings>300]
# Note that columns may be accessed as df.column_name

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
15,Belarus,142,373,42,14.4,Europe
68,Grenada,199,438,28,11.9,North America
72,Guyana,93,302,1,7.1,South America
73,Haiti,1,326,1,5.9,North America
141,Russian Federation,247,326,73,11.5,Asia
144,St. Lucia,171,315,71,10.1,North America


Unfortunately, the slice operator does not allow dicing (selection on both rows and columns at once).
This can be achieved with .loc and .iloc

In [65]:
drinks[0:2, ['beer_servings', 'wine_servings']]

TypeError: unhashable type: 'slice'
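
The same selection of rows and columns works once it goes through .loc or .iloc. A minimal sketch, assuming a three-row stand-in DataFrame (df) built by hand rather than the full drinks data:

```python
import pandas as pd

# Hypothetical three-row stand-in for the drinks DataFrame.
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "beer_servings": [0, 89, 25],
    "wine_servings": [0, 54, 14],
})

# Label-based: the row slice 0:1 includes BOTH end labels.
by_label = df.loc[0:1, ["beer_servings", "wine_servings"]]

# Position-based: the row slice 0:2 excludes the stop position, as in ndarrays.
by_position = df.iloc[0:2, [1, 2]]

print(by_label.equals(by_position))  # True
```

Note the different slice ends (0:1 versus 0:2) that produce the same two rows.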

### Indices and index labels

A pandas DataFrame is indexed on both rows and columns. The Index is a special object that allows consistent data manipulation (select, update, merge, join etc.)

The index for rows (row numbers) is generated automatically when a DataFrame is imported.
If a custom index for rows is created, its values become the row labels; rows can still be accessed by their integer position (e.g. with .iloc)

The index for columns is in most cases defined by the attribute names. If attribute names are not provided, pd.DataFrame creates default integer indices
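
A short sketch of these index behaviours, assuming a two-row made-up CSV string (csv_text) instead of the drinks file. It shows the default integer column labels produced by header=None, and a round trip through set_index and reset_index.

```python
import io
import pandas as pd

# Hypothetical CSV content used only for illustration.
csv_text = "country,beer_servings\nAlbania,89\nAlgeria,25\n"

# header=None: the first line is treated as data and columns get integer labels.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.shape)  # (3, 2)

# Default behaviour: the first line supplies the column labels.
df = pd.read_csv(io.StringIO(csv_text))

indexed = df.set_index("country")   # country values become the row labels
restored = indexed.reset_index()    # back to a default integer row index

print(list(indexed.index))  # ['Albania', 'Algeria']
print(restored.equals(df))  # True
```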

In [2]:
import pandas as pd
drinks=pd.read_csv('http://bit.ly/drinksbycountry')

In [40]:
# Automatically generated index for rows
drinks.index

RangeIndex(start=0, stop=193, step=1)

In [41]:
# pd.read_csv by default takes the first row of the file as the column index.
# If header=None is specified in pd.read_csv, the column index will be a range of integers
drinks.columns

Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [42]:
drinks[2:5]

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [1]:
# Custom index can be specified
import pandas as pd
drinks=pd.read_csv('http://bit.ly/drinksbycountry')
drinks = drinks.set_index('country')
drinks.head()
# Note that the RangeIndex has been replaced by the country labels

In [50]:
# dtypes no longer lists country, because it is now the index
drinks.dtypes

beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

In [49]:
# However, the DataFrame can still be sliced based on row numbers (positions)
drinks[2:5]

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa


###  .loc examples 

.loc allows slicing and dicing based on index labels. Unlike ordinary Python slices, label slices include both endpoints

In [4]:
drinks.head()

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa


In [12]:
drinks.loc['Afghanistan':'Algeria', 'spirit_servings':'wine_servings']

country,spirit_servings,wine_servings
Afghanistan,0,0
Albania,132,54
Algeria,0,14


In [13]:
drinks.loc[['Afghanistan', 'Angola'], ['wine_servings', 'beer_servings']]

country,wine_servings,beer_servings
Afghanistan,0,0
Angola,45,217


### Update DataFrame values with .loc
Parts of a DataFrame may be updated by assigning a matrix of values (e.g. a nested list) with the same shape as the selection.
! Note that the change is applied to the DataFrame itself (no need for assignment to a new variable)

In [15]:
drinks.loc[['Afghanistan', 'Angola'], ['wine_servings', 'beer_servings']] = [[2,3],[3,4]]

In [16]:
drinks.loc[['Afghanistan', 'Angola'], ['wine_servings', 'beer_servings']]

country,wine_servings,beer_servings
Afghanistan,2,3
Angola,3,4
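
Updates with .loc can also be driven by a condition instead of a list of labels. A minimal sketch, assuming a hand-made three-row DataFrame (df) and an arbitrary placeholder value of -1:

```python
import pandas as pd

# Hypothetical stand-in data indexed by country.
df = pd.DataFrame(
    {"beer_servings": [0, 89, 25], "wine_servings": [0, 54, 14]},
    index=["Afghanistan", "Albania", "Algeria"],
)

# Rows are chosen by a boolean mask, the column by its label.
# A scalar on the right-hand side is broadcast to every selected cell.
df.loc[df["beer_servings"] < 50, "wine_servings"] = -1

print(df["wine_servings"].tolist())  # [-1, 54, -1]
```

Doing the row and column selection in a single .loc call avoids chained indexing such as df[mask]['wine_servings'] = -1, which may assign to a temporary copy.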


### iloc examples 

iloc selects the data based on integer positions (row and column numbers), regardless of the index labels

In [6]:
drinks.iloc[2:5, 0:2]

country,beer_servings,spirit_servings
Algeria,25,0
Andorra,245,138
Angola,217,57


In [8]:
# Step operator works the same way as for ndarrays
drinks.iloc[2:5:2, :-1]

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
Algeria,25,0,14,0.7
Angola,217,57,45,5.9


In [9]:
drinks.iloc[2:5, ::-1]

country,continent,total_litres_of_pure_alcohol,wine_servings,spirit_servings,beer_servings
Algeria,Africa,0.7,14,0,25
Andorra,Europe,12.4,312,138,245
Angola,Africa,5.9,45,57,217


In [51]:
# similarly to .loc, iloc allows updates of values in the dataframe
drinks.iloc[0:2, 0:2] = [[5,6],[7,8]]

In [21]:
drinks.head()

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,5,6,2,0.0,Asia
Albania,7,8,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,4,57,3,5.9,Africa
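
The difference between labels and positions is easiest to see when the row labels are integers that do not start at 0. A small sketch, assuming a made-up DataFrame (labelled) with row labels 10, 20 and 30:

```python
import pandas as pd

# Hypothetical data whose integer row labels differ from the row positions.
labelled = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

first_by_label = labelled.loc[10, "value"]   # label 10, column 'value'
first_by_position = labelled.iloc[0, 0]      # first row, first column

print(first_by_label)     # a
print(first_by_position)  # a

# labelled.loc[0] would raise a KeyError here, because no row carries the label 0.
```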


## Mini Task 

1. Read drinks dataframe
2. Show statistics for all European countries
3. Show beer_servings and spirit_servings for Serbia
4. Set the country attribute as the index
5. How many beer servings are there in Serbia and Germany?

In [None]:
'''
drinks.columns[0]
(drinks['country'] == 'Serbia').dtype # This line returns the dtype of the boolean mask (bool)
# What are the statistics in my country?
my_country = 'Serbia'
drinks['country'] == my_country # This line returns a Series with dtype 'bool'
drinks[drinks['country'] == my_country] # A DataFrame may be filtered with a boolean Series
'''
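
One possible way to approach the tasks is sketched below. It assumes a four-row stand-in (drinks_small) with values copied from rows shown earlier instead of the full dataset. Germany does not appear in those excerpts, so Albania takes its place in the last step, and "how many" is read as a sum, which is only one interpretation.

```python
import pandas as pd

# Hypothetical four-row stand-in for the drinks DataFrame.
drinks_small = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Serbia"],
    "beer_servings": [89, 25, 245, 283],
    "spirit_servings": [132, 0, 138, 131],
    "continent": ["Europe", "Africa", "Europe", "Europe"],
})

# Task 2: statistics for European countries only.
europe_stats = drinks_small[drinks_small["continent"] == "Europe"].describe()

# Task 3: two attributes for one country (rows by mask, columns by label).
serbia = drinks_small.loc[drinks_small["country"] == "Serbia",
                          ["beer_servings", "spirit_servings"]]

# Task 4: use country as the row index.
by_country = drinks_small.set_index("country")

# Task 5: beer servings for two countries selected by label, summed.
beer_total = by_country.loc[["Serbia", "Albania"], "beer_servings"].sum()

print(europe_stats.loc["count", "beer_servings"])  # 3.0
print(beer_total)  # 372
```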


## Describe DataFrame

In [83]:
# read dataframe
drinks=pd.read_csv('http://bit.ly/drinksbycountry')

### len and shape
- len returns the number of rows (the same usage as for lists)
- shape returns the dimensions of the dataframe as (rows, columns), excluding indices.
- Note that len is a function and shape is a property of pd.DataFrame

In [84]:
print(len(drinks))
print(drinks.shape)

193
(193, 6)


### DataFrame.describe and selection by attribute type
- The describe method gives basic statistics about attributes
- If a DataFrame contains mixed datatypes, statistics are shown only for NUMERICAL features by default
- To inspect statistics of categorical features, select only the categorical features first

In [85]:
# Note that there is no description of country and continent attributes
drinks.describe()

Unnamed: 0,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
count,193.0,193.0,193.0,193.0
mean,106.160622,80.994819,49.450777,4.717098
std,101.143103,88.284312,79.697598,3.773298
min,0.0,0.0,0.0,0.0
25%,20.0,4.0,1.0,1.3
50%,76.0,56.0,8.0,4.2
75%,188.0,128.0,59.0,7.2
max,376.0,438.0,370.0,14.4


In [86]:
# DataFrame.describe returns a pd.DataFrame
type(drinks.describe())

pandas.core.frame.DataFrame

In [87]:
# Check types in DataFrame
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

In [88]:
# DataFrame.select_dtypes is a convenient method for selecting columns of a specific type
# data with specific dtypes may be included or excluded
drinks_categorical = drinks.select_dtypes(include = 'object')

In [89]:
drinks_categorical.describe()

Unnamed: 0,country,continent
count,193,193
unique,193,6
top,Togo,Africa
freq,1,53
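
The two-step approach above (select_dtypes, then describe) can also be written as a single call, because describe accepts include and exclude arguments itself. A minimal sketch, assuming a three-row stand-in DataFrame (df):

```python
import pandas as pd

# Hypothetical stand-in with one numerical and two categorical columns.
df = pd.DataFrame({
    "country": ["Albania", "Algeria", "Angola"],
    "beer_servings": [89, 25, 217],
    "continent": ["Europe", "Africa", "Africa"],
})

numeric_stats = df.describe()                          # numerical columns only
categorical_stats = df.describe(exclude="number")      # everything that is not numerical

print(list(numeric_stats.columns))                 # ['beer_servings']
print(categorical_stats.loc["top", "continent"])   # Africa
print(categorical_stats.loc["freq", "continent"])  # 2
```

exclude="number" is used here instead of include="object" so that the sketch does not depend on how a given pandas version stores strings.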


In [90]:
drinks_numerical = drinks.select_dtypes(exclude  = 'object')

### value_counts
- value_counts provides the number of instances for each attribute value
- useful with categorical and binary attributes (not with continuous ones)

In [70]:
drinks_categorical['continent'].value_counts()

Africa           53
Europe           45
Asia             44
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64
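
Relative frequencies are often easier to read than raw counts. A short sketch, assuming a made-up four-element Series (continents); normalize=True is a standard parameter of value_counts:

```python
import pandas as pd

# Hypothetical categorical data.
continents = pd.Series(["Africa", "Europe", "Africa", "Asia"], name="continent")

counts = continents.value_counts()                # absolute counts
shares = continents.value_counts(normalize=True)  # fractions that sum to 1

print(counts["Africa"])  # 2
print(shares["Africa"])  # 0.5
```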

### Other statistics on DataFrame

In [92]:
drinks_numerical.sum(axis = 0)

beer_servings                   20489.0
spirit_servings                 15632.0
wine_servings                    9544.0
total_litres_of_pure_alcohol      910.4
dtype: float64

In [94]:
drinks_numerical.sum(axis=1).head()

0      0.0
1    279.9
2     39.7
3    707.4
4    324.9
dtype: float64

In [95]:
drinks_numerical.mean()

beer_servings                   106.160622
spirit_servings                  80.994819
wine_servings                    49.450777
total_litres_of_pure_alcohol      4.717098
dtype: float64

In [96]:
drinks_numerical.median()

beer_servings                   76.0
spirit_servings                 56.0
wine_servings                    8.0
total_litres_of_pure_alcohol     4.2
dtype: float64

In [97]:
drinks_categorical.count()

country      193
continent    193
dtype: int64

## Manual Creation of Pandas dataframe and Appending Rows and Columns

pandas.DataFrame provides multiple ways for manual creation of dataframes:
- [from records](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_records.html#pandas.DataFrame.from_records)
- [from dictionary](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html#pandas.DataFrame.from_dict)
- [from items](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_items.html#pandas.DataFrame.from_items) (deprecated in pandas 0.23 and removed in pandas 1.0; from_dict covers this case)

Additionally, the pd.DataFrame constructor may be used with different input datatypes.
Examples may be found [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
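
The creation routes listed above can be compared on the same tiny dataset. This is a sketch with placeholder player names ("A", "B") and points, not data from the tutorial:

```python
import pandas as pd

# Constructor with a dictionary of columns (keys become column labels).
from_columns = pd.DataFrame({"Player": ["A", "B"], "Points": [6, 15]})

# from_dict with orient="index": the outer keys become row labels.
from_rows = pd.DataFrame.from_dict(
    {"A": {"Points": 6}, "B": {"Points": 15}}, orient="index"
)

# from_records with a list of tuples and explicit column names.
from_tuples = pd.DataFrame.from_records(
    [("A", 6), ("B", 15)], columns=["Player", "Points"]
)

print(from_columns.shape)               # (2, 2)
print(list(from_rows.index))            # ['A', 'B']
print(from_tuples["Points"].tolist())   # [6, 15]
```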


In [72]:
# Create pd.DataFrame from list of dictionaries
# Each dictionary represents one row

ld=[{"Player": "Milos Teodosic", 'Points':6, 'Assists':10},
    {"Player": "Bogdan Bogdanovic", 'Points':15},
    {"Player": "Nikola Jokic", 'Points':22},
   ]

In [73]:
players=pd.DataFrame(ld)

In [74]:
players

Unnamed: 0,Assists,Player,Points
0,10.0,Milos Teodosic,6
1,,Bogdan Bogdanovic,15
2,,Nikola Jokic,22


In [77]:
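# Append a new record given as a dictionary
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
players = pd.concat([players, pd.DataFrame([{'Name': 'Boban'}])], ignore_index=True)
# Note that the new attribute 'Name' is added, and values that are not provided are filled with NaN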

In [76]:
players

Unnamed: 0,Assists,Player,Points,Name
0,10.0,Milos Teodosic,6.0,
1,,Bogdan Bogdanovic,15.0,
2,,Nikola Jokic,22.0,
3,,,,Boban


## Renaming attributes
- Attributes may be renamed in different ways:
    - rename function
    - columns property

In [78]:
# The rename function accepts a dictionary (old name -> new name) as the columns parameter
players = players.rename(columns = {'Player':'Igrac', 'Assists':'Asistencije'})

In [79]:
players

Unnamed: 0,Asistencije,Igrac,Points,Name
0,10.0,Milos Teodosic,6.0,
1,,Bogdan Bogdanovic,15.0,
2,,Nikola Jokic,22.0,
3,,,,Boban
4,,,,Boban


In [80]:
# Attributes may be renamed by providing list of attribute names
new_atts = ['Asistencije', 'Igrac', 'Poeni', 'Ime']

In [81]:
players.columns = new_atts
players

Unnamed: 0,Asistencije,Igrac,Poeni,Ime
0,10.0,Milos Teodosic,6.0,
1,,Bogdan Bogdanovic,15.0,
2,,Nikola Jokic,22.0,
3,,,,Boban
4,,,,Boban


## Adding, removing columns

In [204]:
housing['lotsize']

1       5850
2       4000
3       3060
4       6650
5       6360
6       4160
7       3880
8       4160
9       4800
10      5500
11      7200
12      3000
13      1700
14      2880
15      3600
16      3185
17      3300
18      5200
19      3450
20      3986
21      4785
22      4510
23      4000
24      3934
25      4960
26      3000
27      3800
28      4960
29      3000
30      4500
31      3500
32      3500
33      4000
34      4500
35      6360
36      4500
37      4032
38      5170
39      5400
40      3150
41      3745
42      4520
43      4640
44      8580
45      2000
46      2160
47      3040
48      3090
49      4960
50      3350
51      5300
52      4100
53      9166
54      4040
55      3630
56      3620
57      2400
58      7260
59      4400
60      2400
61      4120
62      4750
63      4280
64      4820
65      5500
66      5500
67      5040
68      6000
69      2500
70      4095
71      4095
72      3150
73      1836
74      2475
75      3210
76      3180
77      1650

In [205]:
# A new column is added by assigning to a column name that does not exist yet
housing['lotsize_10']=housing.lotsize/10

In [206]:
housing.columns

Index(['price', 'lotsize', 'bedrooms', 'bathrms', 'stories', 'driveway',
       'recroom', 'fullbase', 'gashw', 'airco', 'garagepl', 'prefarea',
       'lotsize_10'],
      dtype='object')

In [207]:
housing.loc[:, ['lotsize', 'lotsize_10']].head()

Unnamed: 0,lotsize,lotsize_10
1,5850,585
2,4000,400
3,3060,306
4,6650,665
5,6360,636


In [217]:
# Drop the row labeled 4 (axis=0 refers to rows); inplace=True modifies housing directly
housing.drop(4, axis=0, inplace=True)

In [218]:
housing.head()

Unnamed: 0,price,bedrooms,bathrms,stories,driveway,recroom,fullbase,gashw,airco,garagepl,prefarea,lotsize_10
1,42000,3,1,2,yes,no,yes,no,no,1,no,585
2,38500,2,1,1,yes,no,no,no,no,0,no,400
3,49500,3,1,1,yes,no,no,no,no,0,no,306
5,61000,2,1,1,yes,no,no,no,no,0,no,636
6,66000,3,1,1,yes,yes,yes,no,yes,0,no,416


In [174]:
# A useful link on dropping rows and columns in pandas: https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/

In [212]:
housing.columns

Index(['price', 'bedrooms', 'bathrms', 'stories', 'driveway', 'recroom',
       'fullbase', 'gashw', 'airco', 'garagepl', 'prefarea', 'lotsize_10'],
      dtype='object')
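
Rows and columns can both be removed with drop. A minimal sketch, assuming a three-row stand-in (houses) with values taken from the first housing rows shown above; it uses the columns= and index= keywords instead of axis, and leaves the original DataFrame untouched by not passing inplace=True.

```python
import pandas as pd

# Hypothetical stand-in for the housing DataFrame.
houses = pd.DataFrame(
    {"price": [42000, 38500, 49500], "lotsize": [5850, 4000, 3060]},
    index=[1, 2, 3],
)
houses["lotsize_10"] = houses["lotsize"] / 10  # add a derived column

without_column = houses.drop(columns=["lotsize"])  # drop by column label
without_row = houses.drop(index=[2])               # drop by row label

print(list(without_column.columns))  # ['price', 'lotsize_10']
print(list(without_row.index))       # [1, 3]
print(list(houses.columns))          # ['price', 'lotsize', 'lotsize_10']
```

Because drop returns a new DataFrame by default, the result has to be assigned to a variable unless inplace=True is used.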

## Sort
    - documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
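
A short sketch of sort_values, assuming a four-row stand-in DataFrame (df) with values taken from the drinks rows shown earlier. It sorts by one column, and then by two columns with a different direction for each.

```python
import pandas as pd

# Hypothetical stand-in for the drinks DataFrame.
df = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [89, 25, 245, 217],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
})

# Single column, largest value first.
by_beer = df.sort_values("beer_servings", ascending=False)

# Two columns: continent ascending, then beer_servings descending within each continent.
by_two = df.sort_values(["continent", "beer_servings"], ascending=[True, False])

print(by_beer["country"].tolist())  # ['Andorra', 'Angola', 'Albania', 'Algeria']
print(by_two["country"].tolist())   # ['Angola', 'Algeria', 'Andorra', 'Albania']
```

sort_values returns a new DataFrame and keeps the original row labels, so the index of the result is no longer in ascending order.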

## Pandas Series
  - documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html 

In [219]:
type(titanic['age'])

pandas.core.series.Series

In [220]:
titanic.age

1       adults
2       adults
3       adults
4       adults
5       adults
6       adults
7       adults
8       adults
9       adults
10      adults
11      adults
12      adults
13      adults
14      adults
15      adults
16      adults
17      adults
18      adults
19      adults
20      adults
21      adults
22      adults
23      adults
24      adults
25      adults
26      adults
27      adults
28      adults
29      adults
30      adults
31      adults
32      adults
33      adults
34      adults
35      adults
36      adults
37      adults
38      adults
39      adults
40      adults
41      adults
42      adults
43      adults
44      adults
45      adults
46      adults
47      adults
48      adults
49      adults
50      adults
51      adults
52      adults
53      adults
54      adults
55      adults
56      adults
57      adults
58      adults
59      adults
60      adults
61      adults
62      adults
63      adults
64      adults
65      adults
66      adults
67      ad

In [21]:
titanic['age'].size

1316

In [22]:
print(type(titanic))
print(type(titanic['age']))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>


In [221]:
titanic.age.unique() # Take unique values - there is no distinct function

array(['adults', 'child'], dtype=object)

In [222]:
type(titanic.age.unique())

numpy.ndarray

In [12]:
import numpy as np

In [19]:
titanic.age.unique()

array(['adults', 'child'], dtype=object)

In [14]:
len(titanic.age.unique())

2

In [223]:
titanic.count()

klasa       1316
age         1316
gender      1316
survived    1316
dtype: int64

In [225]:
len(titanic)

1316

In [226]:
housing.count()

price         545
bedrooms      545
bathrms       545
stories       545
driveway      545
recroom       545
fullbase      545
gashw         545
airco         545
garagepl      545
prefarea      545
lotsize_10    545
dtype: int64

In [47]:
drinks.continent.head()

country
Afghanistan      Asia
Albania        Europe
Algeria        Africa
Andorra        Europe
Angola         Africa
Name: continent, dtype: object

In [241]:
# .value_counts returns series with index
drinks.continent.value_counts()

Africa           53
Europe           45
Asia             44
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64

In [242]:
# The index of the resulting Series holds the distinct values
drinks.continent.value_counts().index

Index(['Africa', 'Europe', 'Asia', 'North America', 'Oceania',
       'South America'],
      dtype='object')

In [128]:
drinks.continent.value_counts().values

array([53, 45, 44, 23, 16, 12])

In [50]:
drinks.continent.value_counts()['Africa']

53

In [129]:
drinks.continent.value_counts().sort_values()

South America    12
Oceania          16
North America    23
Asia             44
Europe           45
Africa           53
Name: continent, dtype: int64

In [130]:
drinks.continent.value_counts().sort_index()

Africa           53
Asia             44
Europe           45
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64

In [243]:
people=pd.Series([30000000, 85000], index=['Serbia', 'Belarus'], name='population')
people

Serbia     30000000
Belarus       85000
Name: population, dtype: int64

### Alignment of rows in dataframe based on index

In [57]:
### Total beer servings for each country (average per person multiplied by the number of people)

In [149]:
drinks.beer_servings.head()

0      0
1     89
2     25
3    245
4    217
Name: beer_servings, dtype: int64

In [244]:
drinks.beer_servings*people

Afghanistan                            NaN
Albania                                NaN
Algeria                                NaN
Andorra                                NaN
Angola                                 NaN
Antigua & Barbuda                      NaN
Argentina                              NaN
Armenia                                NaN
Australia                              NaN
Austria                                NaN
Azerbaijan                             NaN
Bahamas                                NaN
Bahrain                                NaN
Bangladesh                             NaN
Barbados                               NaN
Belarus                           12070000
Belgium                                NaN
Belize                                 NaN
Benin                                  NaN
Bhutan                                 NaN
Bolivia                                NaN
Bosnia-Herzegovina                     NaN
Botswana                               NaN
Brazil     

In [245]:
pd.concat([drinks, people], axis=1).head()

Unnamed: 0,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent,population
Afghanistan,0,0,0,0.0,Asia,
Albania,89,132,54,4.9,Europe,
Algeria,25,0,14,0.7,Africa,
Andorra,245,138,312,12.4,Europe,
Angola,217,57,45,5.9,Africa,
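
The alignment shown above can be reproduced on a small scale. This sketch assumes a three-element beer Series with values taken from rows shown earlier and reuses the placeholder population figures from the people Series; labels are matched by name, not by position, and labels without a partner produce NaN.

```python
import pandas as pd

# Hypothetical stand-in Series indexed by country.
beer = pd.Series([142, 283, 89], index=["Belarus", "Serbia", "Albania"], name="beer_servings")
# Placeholder population values copied from the people Series above.
people = pd.Series([30000000, 85000], index=["Serbia", "Belarus"], name="population")

total = beer * people            # aligned on index labels, order does not matter
matched = total.dropna().astype("int64")  # keep only countries present in both Series

print(matched["Belarus"])        # 12070000
print(total.isna()["Albania"])   # True
```

Dropping the NaN rows before converting back to integers is one option; which treatment of unmatched labels is appropriate depends on the analysis.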


## Join, Merge 