## Introduction

We'll learn a couple of techniques for combining data with pandas, so that situations like pulling data from multiple sources are easy to handle.

__Goal:__ analyze the 2015, 2016, and 2017 World Happiness Reports. Specifically, we'll look to answer the following question:
```
Did world happiness increase, decrease, or stay about the same from 2015 to 2017?
```
As a reminder, these reports assign each country a happiness score based on a poll question that asks respondents to rate their lives on a scale of 0 to 10, so "world happiness" here refers specifically to that definition.

Descriptions for some of the columns:
- `Country` - Name of the country
- `Region` - Name of the region the country belongs to
- `Happiness Rank` - The rank of the country, as determined by its happiness score
- `Happiness Score` - A score assigned to each country based on the answers to a poll question that asks respondents to rate their happiness on a scale of 0-10

In [1]:
import pandas as pd
import numpy as np

In [2]:
happiness2015 = pd.read_csv('data/World_Happiness_2015.csv')
happiness2015['Year'] = 2015
happiness2016 = pd.read_csv('data/World_Happiness_2016.csv')
happiness2016['Year'] = 2016
happiness2017 = pd.read_csv('data/World_Happiness_2017.csv')
happiness2017['Year'] = 2017

### Combining Dataframes with the Concat Function

The `concat()` function combines dataframes in one of two ways:
1. Stacked: `axis=0` (This is the default option.)
<img src='_images/Concat_Updated.svg' />
2. Side by side: `axis=1`
<img src='_images/Concat_Axis1.svg' />

Since `concat` is a function, not a method, we use the syntax below:
<img src='_images/Concat_syntax.svg' />

In [3]:
head_2015 = happiness2015[['Country', 'Happiness Score', 'Year']].head(3)
head_2016 = happiness2016[['Country', 'Happiness Score', 'Year']].head(3)
concat_axis0 = pd.concat([head_2015, head_2016], axis=0)
concat_axis1 = pd.concat([head_2015, head_2016], axis=1)
# Rows after stacking (axis=0): 3 + 3
question1 = concat_axis0.shape[0]  # 6
# Rows after placing side by side (axis=1): both inputs share the index 0, 1, 2
question2 = concat_axis1.shape[0]  # 3

We merely pushed the dataframes together vertically or horizontally - none of the values, column names, or indexes changed.
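To make this concrete, here is a minimal sketch using two small toy dataframes in place of `head_2015` and `head_2016`. The country labels and scores are made up for illustration and are not taken from the reports. It shows that stacking keeps each input's original index labels, and that placing the dataframes side by side repeats the column names.

```python
import pandas as pd

# Toy stand-ins for head_2015 and head_2016 (hypothetical values, not real report data)
left = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness Score": [7.5, 7.4, 7.3],
    "Year": 2015,
})
right = pd.DataFrame({
    "Country": ["B", "A", "D"],
    "Happiness Score": [7.6, 7.5, 7.2],
    "Year": 2016,
})

stacked = pd.concat([left, right], axis=0)       # one on top of the other
side_by_side = pd.concat([left, right], axis=1)  # next to each other

print(stacked.shape)                   # (6, 3)
print(stacked.index.tolist())          # [0, 1, 2, 0, 1, 2] - original labels are kept
print(side_by_side.shape)              # (3, 6)
print(side_by_side.columns.tolist())   # each column name appears twice
```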

When the dataframes have the same shape and index, the `concat()` function behaves as if it were "gluing" them together.

However, what happens if the dataframes have different shapes or columns?

In [5]:
head_2015 = happiness2015[['Year', 'Country', 'Happiness Score', 'Standard Error']].head(4)
head_2016 = happiness2016[['Country', 'Happiness Score', 'Year']].head(3)
concat_axis0 = pd.concat([head_2015, head_2016], axis=0, sort=False)
# Shape of concat_axis0: 4 + 3 rows, and one column per distinct column name
rows = 7
columns = 4

The analogy of "gluing" dataframes together doesn't fully describe what happens when concatenating dataframes of different shapes. Instead, the function combined the data according to the corresponding column names.

Note that because the `Standard Error` column didn't exist in `head_2016`, `NaN` values were created to signify those values are missing.
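The same behavior can be reproduced with toy data. The sketch below assumes two small made-up dataframes (the names `wide` and `narrow` and all values are hypothetical) whose columns are listed in different orders and where only one has a `Standard Error` column.

```python
import pandas as pd

# Hypothetical data: `wide` has an extra column and one more row than `narrow`
wide = pd.DataFrame({
    "Year": 2015,
    "Country": ["A", "B", "C", "D"],
    "Happiness Score": [7.5, 7.4, 7.3, 7.2],
    "Standard Error": [0.03, 0.04, 0.03, 0.05],
})
narrow = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness Score": [7.6, 7.5, 7.4],
    "Year": 2016,
})

# Values are matched by column name, not by column position
combined = pd.concat([wide, narrow], axis=0, sort=False)

print(combined.shape)                            # (7, 4)
print(combined["Standard Error"].isna().sum())   # 3 - one NaN per row that came from `narrow`
print(combined["Year"].tolist())                 # [2015, 2015, 2015, 2015, 2016, 2016, 2016]
```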

Also, notice again the indexes of the original dataframes didn't change. If the indexes aren't meaningful, it can be better to reset them.

### Combining Dataframes with Different Shapes Using the Concat Function

By default, the `concat` function keeps ALL of the data, even if that creates missing values.
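As a rough illustration of that default, the sketch below compares it with `join="inner"`, an option of `pd.concat` that is not discussed in the text above and is shown here only for contrast. The dataframes and their values are made up.

```python
import pandas as pd

# Hypothetical data: only `wide` has a "Standard Error" column
wide = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Happiness Score": [7.5, 7.4, 7.3, 7.2],
    "Standard Error": [0.03, 0.04, 0.03, 0.05],
})
narrow = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness Score": [7.6, 7.5, 7.4],
})

# Default (join="outer"): every column is kept, and gaps are filled with NaN
keep_all = pd.concat([wide, narrow], sort=False)

# join="inner": only the columns present in every input are kept
shared_only = pd.concat([wide, narrow], join="inner")

print(keep_all.shape)                             # (7, 3)
print(shared_only.shape)                          # (7, 2)
print("Standard Error" in shared_only.columns)    # False
```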

The `concat` function has a parameter, `ignore_index`, that can be used to clear the existing index and reset it in the result.
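A minimal sketch of the difference, again on made-up toy dataframes (the names `first` and `second` are hypothetical):

```python
import pandas as pd

# Hypothetical data with default integer indexes
first = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Happiness Score": [7.5, 7.4, 7.3, 7.2],
})
second = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness Score": [7.6, 7.5, 7.4],
})

kept = pd.concat([first, second])                       # original index labels survive
reset = pd.concat([first, second], ignore_index=True)   # index is rebuilt from 0

print(kept.index.tolist())    # [0, 1, 2, 3, 0, 1, 2]
print(reset.index.tolist())   # [0, 1, 2, 3, 4, 5, 6]
```

With the rebuilt index, each row has a unique label, so a label-based lookup such as `reset.loc[0]` returns a single row rather than one row from each input.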