# 1. Let's load needed libraries

In [None]:
# we import the pandas library and give it the "pd" nickname
import pandas as pd

# 2. Let's load the gapminder dataset

In [None]:
# we use pandas.read_csv() function to access the file "gapminder.tsv" stored in a remote location 

# the remote location is: https://raw.githubusercontent.com/thousandoaks/BEMM458/master/data/

# with the argument sep='\t' we indicate that the columns are separated by tabs rather than commas.

df = pd.read_csv('https://raw.githubusercontent.com/thousandoaks/BEMM458/master/data/gapminder.tsv', sep='\t')
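To see what the `sep='\t'` argument changes, here is a minimal sketch that reads a tiny tab-separated string from memory instead of a remote file. The sample data (`sample_tsv`, the country "Atlantis" and its values) is invented for illustration and is not part of the gapminder dataset.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample, invented for illustration only
sample_tsv = "country\tyear\tlifeExp\nAtlantis\t2000\t70.5\nAtlantis\t2005\t71.0\n"

# sep='\t' tells pandas to split each line on tab characters
sample = pd.read_csv(io.StringIO(sample_tsv), sep='\t')

# without sep, pandas splits on commas and finds a single wide column
wrong = pd.read_csv(io.StringIO(sample_tsv))

print(sample.shape)  # (2, 3)
print(wrong.shape)   # (2, 1)
```

If a freshly loaded dataset shows only one column, a wrong separator is a likely cause.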



### df is a DataFrame.
### DataFrames are core entities in data analytics

In [None]:
type(df)

# 3. Let's observe our data

In [None]:
# we show the first 5 rows
df.head()

In [None]:
# we show the size of our dataset
df.shape

In [None]:
# we get some more detailed info on our dataset
df.info()

## 3.1. Let's extract some columns from our data

In [None]:
# we can extract a column by its name
df['country']

In [None]:
# we can extract several columns at the same time
df[['country','lifeExp']]
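The number of brackets matters: single brackets with one name return a Series, while double brackets (a list of names) return a DataFrame. The sketch below uses a small made-up table (`toy`) rather than the gapminder data.

```python
import pandas as pd

# Hypothetical toy table, invented for illustration only
toy = pd.DataFrame({
    'country': ['A', 'B'],
    'lifeExp': [50.0, 60.0],
    'pop': [1000, 2000],
})

one = toy['country']                # single brackets -> Series
two = toy[['country', 'lifeExp']]   # list of names -> DataFrame
also_frame = toy[['country']]       # a one-item list still gives a DataFrame

print(type(one).__name__)         # Series
print(type(two).__name__)         # DataFrame
print(type(also_frame).__name__)  # DataFrame
```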

## 3.2. Let's extract some rows from our data

In [None]:
# let's extract the first row. Python starts counting from zero
df.iloc[0]

In [None]:
# let's extract the 100th row. Python starts counting from zero
df.iloc[99]

In [None]:
# we can even select multiple rows

df.iloc[[0,99,999]]
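`iloc` selects rows by position, so it also accepts negative positions and slices. A minimal sketch on a made-up four-row table (`toy` is hypothetical, not the gapminder data):

```python
import pandas as pd

# Hypothetical toy table, invented for illustration only
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'lifeExp': [50.0, 60.0, 70.0, 80.0],
})

first = toy.iloc[0]        # first row, returned as a Series
last = toy.iloc[-1]        # negative positions count from the end
subset = toy.iloc[[0, 2]]  # a list of positions returns a DataFrame
block = toy.iloc[1:3]      # slice: positions 1 and 2 (the end is excluded)

print(first['country'])          # A
print(last['country'])           # D
print(list(block['country']))    # ['B', 'C']
```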

# 4. Grouped and aggregated calculations

There are several initial questions that we can ask ourselves:
1. For each year in our data, what was the average life expectancy? What about the average population and GDP?
2. What if we stratify the data by continent and perform the same calculations?
3. How many countries are listed in each continent?



## 4.1. How did average life expectancy evolve over time?

In [None]:
df

In [None]:
# the following command groups data by the column "year", then extracts the column lifeExp and computes the mean
df.groupby('year')['lifeExp'].mean()

### the following figure provides a visual representation of the operation we have just performed

<img src="https://raw.githubusercontent.com/thousandoaks/BEMM458/master/sessions/images/groupby_means.png">
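The groupby operation can be read as three steps: split the rows by year, apply the mean to each piece, and combine the results. The sketch below performs those steps by hand on a made-up table (`toy` is hypothetical) and compares them with `groupby`. It only illustrates the idea and is not how pandas implements it internally.

```python
import pandas as pd

# Hypothetical toy table, invented for illustration only
toy = pd.DataFrame({
    'year': [1952, 1952, 1957, 1957],
    'lifeExp': [40.0, 60.0, 50.0, 70.0],
})

# the one-line version used in this notebook
grouped = toy.groupby('year')['lifeExp'].mean()

# the same result, step by step
manual = {}
for year in toy['year'].unique():
    rows = toy[toy['year'] == year]        # split: keep the rows of one year
    manual[year] = rows['lifeExp'].mean()  # apply: average that piece
manual = pd.Series(manual)                 # combine: one value per year

print(grouped.tolist())  # [50.0, 60.0]
print(manual.tolist())   # [50.0, 60.0]
```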

## 4.2. How did average life expectancy evolve over time AND by continent?

In [None]:
# the following command groups data by the columns "year" AND "continent", then extracts the column lifeExp and computes the mean
df.groupby(['year','continent'])['lifeExp'].mean()

## 4.3. How did average life expectancy AND GDP per capita evolve over time AND by continent?

In [None]:
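# we group by "year" AND "continent", then extract two columns (note the double brackets) and compute the mean of each
df.groupby(['year','continent'])[['lifeExp','gdpPercap']].mean()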

## 4.4. How many countries are there in each continent?

In [None]:
df.head()

In [None]:
#  we group by continent then extract the country column and count unique occurrences
df.groupby('continent')['country'].nunique()
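`nunique()` is used here instead of `count()` because each country appears once per year, so counting rows would overstate the number of countries. A minimal sketch on a made-up table (`toy` and its country names are hypothetical):

```python
import pandas as pd

# Hypothetical toy table, invented for illustration only:
# countries X and Z each appear in two different years
toy = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe'],
    'country': ['X', 'X', 'Y', 'Z', 'Z'],
    'year': [1952, 1957, 1952, 1952, 1957],
})

n_rows = toy.groupby('continent')['country'].count()         # counts every row
n_countries = toy.groupby('continent')['country'].nunique()  # counts distinct names

print(n_rows.to_dict())       # {'Asia': 3, 'Europe': 2}
print(n_countries.to_dict())  # {'Asia': 2, 'Europe': 1}
```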

# 5. Basic plotting

## 5.1. Evolution of life expectancy across time

In [None]:
df.groupby('year')['lifeExp'].mean()

In [None]:
# Let's save the previous operation as a new variable

lifeExpectancyEvolution=df.groupby('year')['lifeExp'].mean()

In [None]:
# let's plot the result
lifeExpectancyEvolution.plot()

## 5.2. Evolution of life expectancy across time and continent

In [None]:
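# we group by "continent" AND "year", compute the mean of lifeExp and save the result as a new variable
lifeExpectancyEvolutionContinent=df.groupby(['continent','year'])['lifeExp'].mean()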

In [None]:

# unstack(level=0) turns the continents into columns, so the plot draws one line per continent
lifeExpectancyEvolutionContinent.unstack(level=0).plot(kind='line', subplots=False, figsize=(10,10))
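To see what `unstack(level=0)` does before plotting, the sketch below applies it to a made-up table (`toy` is hypothetical). Grouping by two columns gives a result indexed by (continent, year) pairs. Unstacking level 0 moves the continents into columns and leaves the years as rows.

```python
import pandas as pd

# Hypothetical toy table, invented for illustration only
toy = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Europe', 'Europe'],
    'year': [1952, 1957, 1952, 1957],
    'lifeExp': [45.0, 50.0, 65.0, 67.0],
})

# one value per (continent, year) pair
means = toy.groupby(['continent', 'year'])['lifeExp'].mean()

# level 0 is "continent": it becomes the columns, "year" stays as the rows
wide = means.unstack(level=0)

print(list(wide.columns))  # ['Asia', 'Europe']
print(list(wide.index))    # [1952, 1957]
```

Calling `wide.plot()` on such a table draws one line per column, which is why the continents need to be columns.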

# 6. Challenge yourself!

## 6.1. How did the maximum life expectancy evolve across time AND country?

### Hint 1: use the following command to display all rows in a pandas DataFrame

pd.set_option('display.max_rows', None)

### Hint 2: check the pandas documentation to see how to compute the maximum after a groupby operation

### https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html