# Refreshing Python Essentials: Pandas

## Outline

* [Reading Data](#Reading-Data)
* [Indexing & Selecting](#Indexing-&-Selecting)
* [Calculating Basic Stats](#Calculating-Basic-Stats)
* [Working with Columns](#Working-with-Columns)
* [Using `str` and `dt` Functions](#Using-str-and-dt-Functions)
* [Mapping](#Mapping)
* [Grouping](#Grouping)
* [Challenge](#Challenge)

In [None]:
import pandas as pd

## Reading Data

In [None]:
baby_names = pd.read_csv('data/baby-names.csv')

In [None]:
baby_names.head()
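The cells above load the CSV from disk and preview it. The sketch below runs without the data file. It parses a few invented rows with `pd.read_csv` through `io.StringIO`, assuming the file has the columns `Id, Name, Year, Gender, Count`. The values are made up and are not from the real file.

```python
import io

import pandas as pd

# Invented sample rows in the layout assumed for data/baby-names.csv
# (Id, Name, Year, Gender, Count); these are not values from the real file.
csv_text = """Id,Name,Year,Gender,Count
1,Emma,2000,F,120
2,Olivia,2000,F,80
3,Noah,2000,M,130
4,Liam,2001,M,110
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # the type read_csv inferred for each column
print(sample.head(2))  # head(n) shows the first n rows (5 by default)
```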

---

## Indexing & Selecting

In [None]:
baby_names.Year

In [None]:
baby_names['Name']

In [None]:
baby_names.Gender.value_counts()

In [None]:
baby_names[baby_names.Gender == 'F']

In [None]:
baby_names[(baby_names.Gender == 'F') & (baby_names.Count > 90000)]
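The cells above select columns with attribute or bracket access and filter rows with boolean masks. The sketch below uses a small invented DataFrame. `.loc` filters rows and picks columns in one step, and `DataFrame.query` expresses the same filter as a string.

```python
import pandas as pd

# Small invented frame with the same column names as the baby-names data.
df = pd.DataFrame({
    'Name': ['Emma', 'Olivia', 'Noah', 'Liam'],
    'Year': [2000, 2000, 2000, 2001],
    'Gender': ['F', 'F', 'M', 'M'],
    'Count': [120, 95, 130, 110],
})

# Combine conditions with & or |, wrapping each one in parentheses,
# because & binds more tightly than == in Python.
mask = (df.Gender == 'F') & (df.Count > 100)

# .loc takes a row selector and a column selector at the same time.
subset = df.loc[mask, ['Name', 'Count']]

# query() writes the same filter as a string and gives the same rows here.
via_query = df.query("Gender == 'F' and Count > 100")[['Name', 'Count']]

print(subset)
```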

---

## Calculating Basic Stats

In [None]:
baby_names.describe()

In [None]:
baby_names.Count.mean()
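`describe()` summarizes only numeric columns by default, and `.mean()` returns a single statistic. The sketch below uses invented values to show two alternatives. `include='all'` extends `describe()` to text columns, and `Series.agg` returns several chosen statistics at once.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    'Name': ['Emma', 'Olivia', 'Noah', 'Liam'],
    'Gender': ['F', 'F', 'M', 'M'],
    'Count': [120, 95, 130, 110],
})

# include='all' adds count/unique/top/freq rows for the text columns.
full = df.describe(include='all')

# agg computes exactly the statistics requested for one column.
stats = df['Count'].agg(['mean', 'median', 'min', 'max'])
print(stats)
```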

---

## Working with Columns

In [None]:
baby_names['new_column'] = 1

In [None]:
baby_names.head()

In [None]:
# drop() returns a new DataFrame without these columns; baby_names is unchanged.
baby_names.drop(['Id', 'Count'], axis='columns')

In [None]:
baby_names.columns

In [None]:
# rename() also returns a new DataFrame; assign the result to keep the change.
baby_names.rename(columns={
    'Id': 'id'
})

In [None]:
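# One name per existing column, in order. The list must include new_column,
# which was added above, or pandas raises a length-mismatch error.
baby_names.columns = ['id', 'name', 'year', 'gender', 'count', 'new_column']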

In [None]:
baby_names.head()
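The `drop` and `rename` cells above only display their results, because both methods return new DataFrames. The sketch below uses an invented two-row frame. It shows that the original is untouched, that the result must be assigned back to keep it, and that every column label can be lower-cased through the index's `.str` accessor.

```python
import pandas as pd

# Invented two-row frame using the original column names.
df = pd.DataFrame({'Id': [1, 2], 'Name': ['Emma', 'Noah'], 'Count': [120, 130]})

# drop() and rename() return new DataFrames; df itself keeps all three columns.
dropped = df.drop(columns=['Id'])
renamed = df.rename(columns={'Id': 'id'})
print(df.columns.tolist())  # ['Id', 'Name', 'Count']

# Assigning the result back is how the change is kept.
df = df.drop(columns=['Id'])

# Column labels support the .str accessor too, so every name can be
# lower-cased without typing out the full list.
df.columns = df.columns.str.lower()
print(df.columns.tolist())  # ['name', 'count']
```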

---

## Using `str` and `dt` Functions

In [None]:
baby_names.name.str.upper()

In [None]:
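# Parse each year as a date (January 1 of that year). Converting to str first
# lets the strptime-style '%Y' format match text input.
baby_names['year'] = pd.to_datetime(baby_names['year'].astype(str), format='%Y')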

In [None]:
baby_names.head()
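The section title mentions `dt` functions, but the cells above stop at the conversion to datetimes. The sketch below uses invented rows to show both accessors. `.dt` pulls components such as the year out of a datetime column, and `.str` applies string methods element-wise.

```python
import pandas as pd

# Invented rows using the renamed lowercase columns.
df = pd.DataFrame({'name': ['Emma', 'Noah', 'Ella'], 'year': [2000, 2001, 2000]})

# Convert integer years to timestamps (January 1 of each year).
df['year'] = pd.to_datetime(df['year'].astype(str), format='%Y')

# The .dt accessor exposes datetime components, e.g. the year as an integer.
df['year_num'] = df['year'].dt.year

# The .str accessor applies vectorized string methods to every element.
df['starts_with_e'] = df['name'].str.startswith('E')
df['name_length'] = df['name'].str.len()
print(df)
```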

---

## Mapping

In [None]:
def convert(value):
    if value == 'F':
        return 'Female'
    else:
        return 'Male'

In [None]:
baby_names.gender.map(convert)
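The `convert` function above labels every value other than `'F'` as `'Male'`. `Series.map` also accepts a dict, which is shown below as an alternative. The values are invented, and the `'X'` code is a hypothetical unexpected entry. Any value missing from the dict becomes NaN instead of being silently labeled `'Male'`.

```python
import pandas as pd

# Invented gender codes; 'X' stands in for an unexpected value.
gender = pd.Series(['F', 'M', 'F', 'X'])

# Dict lookup: values with no key in the dict become NaN.
labels = gender.map({'F': 'Female', 'M': 'Male'})
print(labels)  # the unmapped 'X' shows up as NaN
```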

---

## Grouping

In [None]:
baby_names.head()

In [None]:
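# Select the numeric column before aggregating. Calling .mean() on the whole
# grouped frame raises a TypeError on text columns in pandas 2.0 and later.
baby_names.groupby('gender')['count'].mean()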
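Beyond a single mean per group, `groupby(...).agg` with named aggregation (available since pandas 0.25) returns several statistics per group as separately named columns. The sketch below uses invented rows with the renamed lowercase columns.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    'name': ['Emma', 'Olivia', 'Noah', 'Liam'],
    'gender': ['F', 'F', 'M', 'M'],
    'count': [120, 80, 130, 110],
})

# One statistic: select the column, then aggregate.
mean_count = df.groupby('gender')['count'].mean()

# Several statistics: each keyword becomes an output column,
# given as (source column, aggregation function).
summary = df.groupby('gender').agg(
    total=('count', 'sum'),
    average=('count', 'mean'),
    names=('name', 'nunique'),
)
print(summary)
```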

---

## Challenge

1. Which 10 names are the most popular in this dataset?
2. Which name was the most popular in 2000?

In [None]:
import pandas as pd

In [None]:
baby_names = pd.read_csv('data/baby-names.csv')

In [None]:
baby_names.groupby('Name')['Count'].sum().sort_values(ascending=False).head(10)

In [None]:
baby_names[baby_names.Year == 2000].sort_values(by='Count', ascending=False).iloc[0]
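The sketch below computes the same two answers on a tiny invented dataset, using different methods from the cells above. `Series.nlargest` replaces sorting plus slicing, and `Series.idxmax` replaces sorting plus `.iloc[0]`. Grouping by `Name` alone adds boy and girl counts for the same name together. Grouping by `['Name', 'Gender']` would keep them separate.

```python
import pandas as pd

# Invented rows in the original column layout.
df = pd.DataFrame({
    'Name': ['Emma', 'Emma', 'Noah', 'Noah', 'Liam'],
    'Year': [2000, 2001, 2000, 2001, 2000],
    'Gender': ['F', 'F', 'M', 'M', 'M'],
    'Count': [120, 100, 130, 60, 110],
})

# Question 1: total count per name, then the largest totals (top 2 here).
top_names = df.groupby('Name')['Count'].sum().nlargest(2)

# Question 2: idxmax gives the index label of the largest Count in 2000,
# and .loc fetches that whole row.
year_2000 = df[df['Year'] == 2000]
most_popular_2000 = year_2000.loc[year_2000['Count'].idxmax()]

print(top_names)
print(most_popular_2000['Name'])
```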