# Project - Play with DataFrames

## Goal of Project
- Master the pandas DataFrame

### Step 1: Import pandas
- Execute the cell below (SHIFT + ENTER)

In [1]:
import pandas as pd

### Step 2: Read the data
- Use ```pd.read_csv()``` to read the file `files/population.csv`
- NOTE: Remember to assign the result to a variable (e.g., ```data```)

In [3]:
data = pd.read_csv('files/population.csv')
data.head()

|   | Country | Year | Population |
|---|---------|------|------------|
| 0 | Denmark | 2000 | 5.3 |
| 1 | Denmark | 2010 | 5.5 |
| 2 | Denmark | 2020 | 5.8 |
| 3 | Sweden | 2000 | 8.8 |
| 4 | Sweden | 2010 | 9.3 |
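
If `files/population.csv` is not at hand, a minimal sketch is to pass `pd.read_csv()` an in-memory buffer instead of a path. The `io.StringIO` buffer and the name `csv_text` are illustrative assumptions, and the rows are only the five shown in the output above, not the full file.

```python
import io

import pandas as pd

# Hypothetical stand-in for files/population.csv, using only the rows shown above
csv_text = """Country,Year,Population
Denmark,2000,5.3
Denmark,2010,5.5
Denmark,2020,5.8
Sweden,2000,8.8
Sweden,2010,9.3
"""

# read_csv accepts any file-like object, not just a file path
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)  # (5, 3): five rows, three columns
print(data.head())
```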


### Step 3: Investigate the data types
- Use ```.dtypes``` to see the data type of each column

In [4]:
data.dtypes

Country        object
Year            int64
Population    float64
dtype: object
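
`read_csv()` infers a type for each column from its contents: whole numbers become `int64`, decimals become `float64`, and text is stored as `object` in the pandas version used here (newer releases may use a dedicated string dtype instead). As an illustration beyond the exercise, the sketch below builds a small frame from values in the output above and uses `select_dtypes()` to keep only the numeric columns.

```python
import pandas as pd

data = pd.DataFrame({
    'Country': ['Denmark', 'Denmark', 'Sweden'],
    'Year': [2000, 2010, 2000],
    'Population': [5.3, 5.5, 8.8],
})

# One dtype per column, indexed by column name
print(data.dtypes)

# Keep only the numeric columns (Year and Population here)
numeric = data.select_dtypes(include='number')
print(list(numeric.columns))  # ['Year', 'Population']
```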

### Step 4: Convert Year to Datetime
- ```pd.to_datetime(...)```: converts the values to datetimes
- ```format='%Y'```: the format of the input; `%Y` is a four-digit year, so each value becomes January 1 of that year

In [8]:
data['Year'] = pd.to_datetime(data['Year'], format='%Y')
data.head()

|   | Country | Year | Population |
|---|---------|------|------------|
| 0 | Denmark | 2000-01-01 | 5.3 |
| 1 | Denmark | 2010-01-01 | 5.5 |
| 2 | Denmark | 2020-01-01 | 5.8 |
| 3 | Sweden | 2000-01-01 | 8.8 |
| 4 | Sweden | 2010-01-01 | 9.3 |
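
Once `Year` is a datetime column, the `.dt` accessor exposes its parts. A minimal sketch, using sample years from the output above, that converts with `format='%Y'` and then recovers the year as an integer:

```python
import pandas as pd

years = pd.Series([2000, 2010, 2020])

# '%Y' parses a four-digit year; month and day default to January 1
dates = pd.to_datetime(years, format='%Y')
print(dates)

# The .dt accessor extracts components back out of the datetimes
print(dates.dt.year.tolist())  # [2000, 2010, 2020]
```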


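### Step 5: Scale Population from millions to people
- The values are in millions (e.g., 5.3 means 5.3 million), so multiply them by 1,000,000
- HINT: ```data['Population']*1000``` scales every value by 1000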

In [13]:
data['Population'] = data['Population'] * 1_000_000

In [14]:
data.head()

|   | Country | Year | Population |
|---|---------|------|------------|
| 0 | Denmark | 2000-01-01 | 5300000.0 |
| 1 | Denmark | 2010-01-01 | 5500000.0 |
| 2 | Denmark | 2020-01-01 | 5800000.0 |
| 3 | Sweden | 2000-01-01 | 8800000.0 |
| 4 | Sweden | 2010-01-01 | 9300000.0 |
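
Arithmetic on a column is vectorized: the multiplication is applied to every row at once, with no explicit loop. Because the scaled values count whole people, one option (an assumption, not a step in this project) is to round them and store them as integers. The sketch uses the Denmark figures from the output above.

```python
import pandas as pd

population_millions = pd.Series([5.3, 5.5, 5.8])

# Vectorized: every element is multiplied, no loop needed
population = population_millions * 1_000_000

# astype('int64') truncates, so round first to guard against floating-point error
population_int = population.round().astype('int64')
print(population_int.tolist())  # [5300000, 5500000, 5800000]
```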


### Step 6: Calculate mean population for each country
- HINT: ```data.groupby('Country')``` groups the data

In [15]:
data.groupby('Country')[['Population']].mean()

| Country | Population |
|---------|------------|
| Denmark | 5533333.0 |
| Sweden | 9433333.0 |
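
`groupby()` splits the rows by country, applies the aggregation to each group, and combines the results. Selecting `Population` before aggregating keeps other columns, such as the datetime `Year`, out of the calculation. The sketch uses only the five rows shown earlier, so its Sweden mean covers just 2000 and 2010 and differs from the value above; `.agg()` with several function names is an optional extension, not part of the exercise.

```python
import pandas as pd

data = pd.DataFrame({
    'Country': ['Denmark', 'Denmark', 'Denmark', 'Sweden', 'Sweden'],
    'Population': [5_300_000.0, 5_500_000.0, 5_800_000.0, 8_800_000.0, 9_300_000.0],
})

# Split by Country, then average Population within each group
means = data.groupby('Country')['Population'].mean()
print(means)

# Several statistics at once: one output column per function name
stats = data.groupby('Country')['Population'].agg(['mean', 'min', 'max'])
print(stats)
```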


### Step 7: Replace Denmark with DNK
- For a string column, the ```.str``` accessor gives access to pandas' string methods
    - These methods are applied to every value in the column at once
    - HINT: ```data['Country'].str.replace('Denmark', 'DNK')```

In [18]:
data['Country'] = data['Country'].str.replace('Denmark', 'DNK')

In [19]:
data.head()

|   | Country | Year | Population |
|---|---------|------|------------|
| 0 | DNK | 2000-01-01 | 5300000.0 |
| 1 | DNK | 2010-01-01 | 5500000.0 |
| 2 | DNK | 2020-01-01 | 5800000.0 |
| 3 | Sweden | 2000-01-01 | 8800000.0 |
| 4 | Sweden | 2010-01-01 | 9300000.0 |
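
`.str.replace()` substitutes substrings inside each value, whereas `Series.replace()` with a dict swaps whole values. Passing `regex=False` makes the literal match explicit, since the default for that argument changed in pandas 2.0. The `'SWE'` mapping is added only for illustration; the project itself replaces Denmark alone.

```python
import pandas as pd

countries = pd.Series(['Denmark', 'Denmark', 'Sweden'])

# Substring replacement inside each string; regex=False treats the pattern literally
short = countries.str.replace('Denmark', 'DNK', regex=False)
print(short.tolist())  # ['DNK', 'DNK', 'Sweden']

# Whole-value replacement for several countries at once ('SWE' is illustrative)
codes = countries.replace({'Denmark': 'DNK', 'Sweden': 'SWE'})
print(codes.tolist())  # ['DNK', 'DNK', 'SWE']
```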
