# US - Baby Names

### Introduction:

We are going to use a subset of [US Baby Names](https://www.kaggle.com/kaggle/us-baby-names) from Kaggle.  
The file contains names from 2004 through 2014.


### Step 1. Import the necessary libraries

In [50]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv). 

### Step 3. Assign it to a variable called baby_names.

In [51]:
url = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv"
baby_names = pd.read_csv(url)

### Step 4. See the first 10 entries

In [52]:
baby_names.head(10)

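| | Unnamed: 0 | Id | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|---|---|
| 0 | 11349 | 11350 | Emma | 2004 | F | AK | 62 |
| 1 | 11350 | 11351 | Madison | 2004 | F | AK | 48 |
| 2 | 11351 | 11352 | Hannah | 2004 | F | AK | 46 |
| 3 | 11352 | 11353 | Grace | 2004 | F | AK | 44 |
| 4 | 11353 | 11354 | Emily | 2004 | F | AK | 41 |
| 5 | 11354 | 11355 | Abigail | 2004 | F | AK | 37 |
| 6 | 11355 | 11356 | Olivia | 2004 | F | AK | 33 |
| 7 | 11356 | 11357 | Isabella | 2004 | F | AK | 30 |
| 8 | 11357 | 11358 | Alyssa | 2004 | F | AK | 29 |
| 9 | 11358 | 11359 | Sophia | 2004 | F | AK | 28 |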


### Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [53]:
baby_names.drop(["Unnamed: 0", "Id"], axis=1, inplace=True)
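As a side note, the same cleanup can be written without `inplace=True`. The sketch below uses a hypothetical two-row sample (not the real file) that mirrors the columns shown in Step 4, and assumes you prefer to get a new DataFrame back rather than mutate the original.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the columns shown in Step 4.
sample = pd.DataFrame({
    "Unnamed: 0": [11349, 11350],
    "Id": [11350, 11351],
    "Name": ["Emma", "Madison"],
    "Year": [2004, 2004],
    "Gender": ["F", "F"],
    "State": ["AK", "AK"],
    "Count": [62, 48],
})

# drop(columns=...) returns a new DataFrame and leaves `sample` untouched.
cleaned = sample.drop(columns=["Unnamed: 0", "Id"])
print(list(cleaned.columns))
```

Keeping the original frame intact makes it easier to re-run a cell without hitting a `KeyError` for columns that were already dropped.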

### Step 6. Are there more male or female names in the dataset?

In [54]:
baby_names.Gender.value_counts()

F    558846
M    457549
Name: Gender, dtype: int64
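Note that `value_counts()` counts rows, and each row is one name in one state and year, not one baby. If the question is read as "which gender has more recorded births", summing `Count` per gender may be closer to what is wanted. A minimal sketch on a hypothetical three-row sample:

```python
import pandas as pd

# Hypothetical sample: two female rows and one male row.
sample = pd.DataFrame({
    "Name": ["Emma", "Madison", "Jacob"],
    "Gender": ["F", "F", "M"],
    "Count": [62, 48, 150],
})

# Number of rows per gender (what Step 6 computes).
rows_per_gender = sample["Gender"].value_counts()

# Number of babies per gender (sum of the Count column).
babies_per_gender = sample.groupby("Gender")["Count"].sum()

print(rows_per_gender)
print(babies_per_gender)
```

In this toy sample the two views disagree: there are more female rows but more male babies.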

### Step 7. Group the dataset by name and assign to names

In [55]:
names = baby_names.groupby(by="Name")

### Step 8. How many different names exist in the dataset?

In [56]:
len(names)

17632
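The number of groups can also be read without building a groupby object. The sketch below, on a hypothetical sample, shows that `nunique()` on the column and `ngroups` on the groupby should agree with `len(...)`.

```python
import pandas as pd

# Hypothetical sample with one repeated name.
sample = pd.DataFrame({
    "Name": ["Emma", "Emma", "Jacob", "Olivia"],
    "Count": [62, 30, 100, 33],
})

grouped = sample.groupby("Name")

n_from_len = len(grouped)                  # same approach as Step 8
n_from_ngroups = grouped.ngroups           # attribute on the groupby object
n_from_nunique = sample["Name"].nunique()  # no grouping needed

print(n_from_len, n_from_ngroups, n_from_nunique)
```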

### Step 9. What is the name with the most occurrences?

In [57]:
# Select Count before summing so the text columns are not aggregated as well.
names.Count.sum().sort_values(ascending=False).head(1)

Name
Jacob    242874
Name: Count, dtype: int64
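If only the top name and its total are needed, sorting the whole Series is not required. A minimal sketch, assuming a hypothetical sample, using `idxmax()` and `max()` on the per-name totals:

```python
import pandas as pd

# Hypothetical sample: Jacob appears in two rows, Emma in two rows.
sample = pd.DataFrame({
    "Name": ["Jacob", "Emma", "Jacob", "Emma"],
    "Count": [100, 62, 50, 30],
})

totals = sample.groupby("Name")["Count"].sum()

top_name = totals.idxmax()   # label of the largest total
top_total = totals.max()     # the largest total itself

print(top_name, top_total)
```

Unlike `sort_values(...).head(1)`, `idxmax()` returns only the first label when several names tie for the maximum.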

### Step 10. How many different names have the least occurrences?

In [58]:
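totals = names.Count.sum()  # total occurrences per name
(totals == totals.min()).sum()  # count names tied at the minimum instead of hard-coding 5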

2578
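To see which names share the lowest total, rather than only how many there are, `Series.nsmallest` with `keep="all"` is one option. This is a sketch on hypothetical totals, not the real dataset:

```python
import pandas as pd

# Hypothetical sample: two names tie at the minimum total of 5.
sample = pd.DataFrame({
    "Name": ["Kroy", "Owyn", "Emma", "Emma"],
    "Count": [5, 5, 62, 30],
})

totals = sample.groupby("Name")["Count"].sum()

# keep="all" returns every name tied at the smallest value, not just one.
least_common = totals.nsmallest(1, keep="all")

print(least_common)
print(len(least_common))
```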

In [59]:
names.head()

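| | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|
| 0 | Emma | 2004 | F | AK | 62 |
| 1 | Madison | 2004 | F | AK | 48 |
| 2 | Hannah | 2004 | F | AK | 46 |
| 3 | Grace | 2004 | F | AK | 44 |
| 4 | Emily | 2004 | F | AK | 41 |
| ... | ... | ... | ... | ... | ... |
| 1004923 | Gryffin | 2014 | M | WI | 5 |
| 1004950 | Kroy | 2014 | M | WI | 5 |
| 1004973 | Owyn | 2014 | M | WI | 5 |
| 1005707 | Haylea | 2005 | F | WV | 5 |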


### Step 11. What is the standard deviation of Count for each name?

In [62]:
names.Count.std()

Name
Aaban       0.000000
Aadan       0.957427
Aadarsh          NaN
Aaden      21.154974
Aadhav           NaN
             ...    
Zyra        1.154701
Zyrah       0.707107
Zyren            NaN
Zyria       0.737865
Zyriah      1.666667
Name: Count, Length: 17632, dtype: float64
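The `NaN` entries appear for names with a single row: pandas computes the sample standard deviation (`ddof=1`) by default, which is undefined for one observation. The sketch below reproduces this on a hypothetical sample and shows that the population version (`ddof=0`) returns 0 instead.

```python
import pandas as pd

# Hypothetical sample: Aaban has two identical rows, Aadarsh has only one.
sample = pd.DataFrame({
    "Name": ["Aaban", "Aaban", "Aadarsh"],
    "Count": [6, 6, 5],
})

grouped = sample.groupby("Name")["Count"]

sample_std = grouped.std()            # default ddof=1, NaN for single-row groups
population_std = grouped.std(ddof=0)  # divides by n, so a single row gives 0.0

print(sample_std)
print(population_std)
```

Which version is appropriate depends on whether the rows are treated as a sample or as the full population; the notebook itself uses the default.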

### Step 12. Get a summary with the mean, min, max, std and quartiles.

In [63]:
names.describe().T

Unnamed: 0,Name,Aaban,Aadan,Aadarsh,Aaden,Aadhav,Aadhya,Aadi,Aadin,Aadit,Aaditya,...,Zymire,Zyon,Zyonna,Zyquan,Zyquavious,Zyra,Zyrah,Zyren,Zyria,Zyriah
Year,count,2.0,4.0,1.0,196.0,1.0,40.0,38.0,1.0,3.0,14.0,...,2.0,138.0,11.0,7.0,1.0,7.0,2.0,1.0,10.0,9.0
Year,mean,2013.5,2009.75,2009.0,2010.015306,2014.0,2012.875,2008.947368,2008.0,2009.666667,2008.285714,...,2012.0,2009.231884,2009.818182,2007.142857,2010.0,2012.142857,2012.0,2013.0,2008.9,2009.666667
Year,std,0.707107,2.872281,,2.044322,,1.488201,2.865898,,3.785939,2.757607,...,2.828427,2.8727,3.060006,1.772811,,2.115701,1.414214,,2.685351,2.915476
Year,min,2013.0,2008.0,2009.0,2005.0,2014.0,2007.0,2004.0,2008.0,2007.0,2005.0,...,2010.0,2004.0,2005.0,2005.0,2010.0,2008.0,2011.0,2013.0,2005.0,2006.0
Year,25%,2013.25,2008.0,2009.0,2008.0,2014.0,2012.0,2007.0,2008.0,2007.5,2006.25,...,2011.0,2007.0,2008.0,2006.0,2010.0,2011.5,2011.5,2013.0,2007.25,2007.0
Year,50%,2013.5,2008.5,2009.0,2010.0,2014.0,2013.0,2009.0,2008.0,2008.0,2008.0,...,2012.0,2009.0,2009.0,2007.0,2010.0,2013.0,2012.0,2013.0,2008.0,2009.0
Year,75%,2013.75,2010.25,2009.0,2011.0,2014.0,2014.0,2010.75,2008.0,2011.0,2009.0,...,2013.0,2012.0,2012.5,2008.0,2010.0,2013.5,2012.5,2013.0,2010.5,2012.0
Year,max,2014.0,2014.0,2009.0,2014.0,2014.0,2014.0,2014.0,2008.0,2014.0,2014.0,...,2014.0,2014.0,2014.0,2010.0,2010.0,2014.0,2013.0,2013.0,2014.0,2014.0
Count,count,2.0,4.0,1.0,196.0,1.0,40.0,38.0,1.0,3.0,14.0,...,2.0,138.0,11.0,7.0,1.0,7.0,2.0,1.0,10.0,9.0
Count,mean,6.0,5.75,5.0,17.479592,6.0,11.325,8.078947,5.0,6.0,6.928571,...,5.0,8.594203,5.545455,6.0,6.0,6.0,5.5,6.0,5.9,6.444444
