# Introduction
This assignment will test how well you're able to perform various data science-related tasks.

Each Problem Group below will center around a particular dataset that you have worked with before.

To ensure you receive full credit for a question, make sure you demonstrate the appropriate pandas, altair, or other commands as requested in the provided code blocks. 

You may find that some questions require multiple steps to fully answer. Others require some mental arithmetic in addition to pandas commands. Use your best judgment.

## Submission
Each problem group asks a series of questions. This assignment consists of two submissions:

1. After completing the questions below, open the Module 01 Assessment Quiz in Canvas and enter your answers to these questions there.

2. After completing and submitting the quiz, save this Colab notebook as a GitHub Gist by selecting `Save a copy as a GitHub Gist` from the `File` menu above. (You'll need to create a GitHub account for this.)

    In Canvas, open the Module 01 Assessment GitHub Gist assignment and paste the GitHub Gist URL for this notebook. Then submit that assignment.

## Problem Group 1

For the questions in this group, you'll work with the Netflix Movies Dataset found at this URL: [https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/netflix_titles.csv](https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/netflix_titles.csv)


### Question 1
Load the dataset into a Pandas data frame and determine what data type is used to store the `release_year` feature.

In [25]:
import pandas as pd
import numpy as np
url = "https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/netflix_titles.csv"
df = pd.read_csv(url)
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6234 entries, 0 to 6233
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       6234 non-null   int64 
 1   type          6234 non-null   object
 2   title         6234 non-null   object
 3   director      4265 non-null   object
 4   cast          5664 non-null   object
 5   country       5758 non-null   object
 6   date_added    6223 non-null   object
 7   release_year  6234 non-null   int64 
 8   rating        6224 non-null   object
 9   duration      6234 non-null   object
 10  listed_in     6234 non-null   object
 11  description   6234 non-null   object
dtypes: int64(2), object(10)
memory usage: 584.6+ KB


|   | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---------|------|-------|----------|------|---------|------------|--------------|--------|----------|-----------|-------------|
| 0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra... |
| 1 | 80117401 | Movie | Jandino: Whatever it Takes | NaN | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra... |
| 2 | 70234439 | TV Show | Transformers Prime | NaN | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob... |
| 3 | 80058654 | TV Show | Transformers: Robots in Disguise | NaN | Will Friedle, Darren Criss, Constance Zimmer, ... | United States | September 8, 2018 | 2016 | TV-Y7 | 1 Season | Kids' TV | When a prison ship crash unleashes hundreds of... |
| 4 | 80125979 | Movie | #realityhigh | Fernando Lebrija | Nesta Cooper, Kate Walsh, John Michael Higgins... | United States | September 8, 2017 | 2017 | TV-14 | 99 min | Comedies | When nerdy high schooler Dani finally attracts... |
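
`df.info()` lists the type of every column. To read off a single column's type directly, the `dtype` attribute can be queried instead. The following is a minimal sketch on a small hand-made frame; the `sample` frame is an assumption built from a few rows of the preview, not the full dataset.

```python
import pandas as pd

# Hypothetical three-row sample mirroring two columns of the Netflix data.
sample = pd.DataFrame({
    "title": ["Jandino: Whatever it Takes", "Transformers Prime", "#realityhigh"],
    "release_year": [2016, 2013, 2017],
})

# dtype of one column; whole-number years are stored as an integer type.
year_dtype = sample["release_year"].dtype
print(year_dtype)
```

On the real data, the same lookup would be `df["release_year"].dtype`.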


### Question 2
Filter your dataset so it contains only `TV Shows`. How many of those TV Shows were rated `TV-Y7`?

In [34]:
# Keep only the TV shows, then count how many carry each rating.
fil = df[df["type"] == "TV Show"]
fil["rating"].value_counts()
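
Calling `value_counts()` on a whole DataFrame counts distinct rows rather than ratings, so the count has to be taken on the `rating` column itself. The sketch below shows two equivalent ways to do that; the `sample` frame and its ratings are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; these rows are not taken from the real dataset.
sample = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "TV Show"],
    "rating": ["TV-Y7", "TV-MA", "TV-Y7", "TV-Y7"],
})

tv_shows = sample[sample["type"] == "TV Show"]

# Option 1: count every rating among TV shows, then look up one of them.
rating_counts = tv_shows["rating"].value_counts()
tv_y7_count = rating_counts.get("TV-Y7", 0)

# Option 2: sum a boolean mask (each True counts as 1).
tv_y7_mask_count = int((tv_shows["rating"] == "TV-Y7").sum())

print(tv_y7_count, tv_y7_mask_count)
```

Both options ignore the movie rated `TV-Y7`, because the type filter is applied first.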

### Question 3
Further filter your dataset so it only contains TV Shows released between the years 2000 and 2009 inclusive. How many of *those* shows were rated `TV-Y7`?

In [35]:
# between() is inclusive on both ends, so 2000 and 2009 are both kept.
filte = fil[fil["release_year"].between(2000, 2009)]
filte["rating"].value_counts()
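
"Between 2000 and 2009 inclusive" means both boundary years count. The sketch below contrasts an inclusive filter with strict comparisons on a hypothetical series of years chosen to sit on and around the boundaries.

```python
import pandas as pd

# Hypothetical release years, including both boundary values.
years = pd.Series([1999, 2000, 2005, 2009, 2010])

# between() includes both endpoints by default.
inclusive = years[years.between(2000, 2009)]

# Strict comparisons drop the boundary years 2000 and 2009.
exclusive = years[(years > 2000) & (years < 2009)]

print(inclusive.tolist())  # [2000, 2005, 2009]
print(exclusive.tolist())  # [2005]
```

An equivalent inclusive filter with explicit operators would use `>=` and `<=`.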

## Problem Group 2

For the questions in this group, you'll work with the Cereal Dataset found at this URL: [https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv](https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv)


### Question 4
After importing the dataset into a pandas data frame, determine the median amount of `protein` in cereal brands manufactured by Kellogg's (`mfr` code "K").

In [47]:
url = "https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv"
tro = pd.read_csv(url)
# Keep only the Kellogg's cereals (mfr code "K").
pas = tro[tro["mfr"] == "K"]
# Median protein across those cereals.
pas["protein"].median()

3.0
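
As an alternative to filtering one manufacturer at a time, a `groupby` can compute the median for every `mfr` code in one pass. This is a minimal sketch; the `sample` frame and its protein values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; these values are not taken from the real dataset.
sample = pd.DataFrame({
    "mfr": ["K", "K", "K", "G", "G"],
    "protein": [2, 3, 6, 1, 3],
})

# Median protein for every manufacturer code at once.
median_by_mfr = sample.groupby("mfr")["protein"].median()

# Look up a single manufacturer from the grouped result.
kelloggs_median = median_by_mfr["K"]
print(kelloggs_median)
```

The grouped result is a Series indexed by `mfr`, which makes it easy to compare manufacturers side by side.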

### Question 5
In order to comply with new government regulations, all cereals must now come with a "Healthiness" rating. This rating is calculated based on this formula:

    healthiness = (protein + fiber) / sugar

Create a new `healthiness` column populated with values based on the above formula.

Then, determine the median healthiness value for only General Mills cereals (`mfr` = "G"), rounded to two decimal places.

In [55]:
# Compute healthiness on the full frame so no filtered copy is modified.
tro["healthiness"] = (tro["protein"] + tro["fiber"]) / tro["sugars"]
# Filter to General Mills, then round only the final median.
sad = tro[tro["mfr"] == "G"]
round(sad["healthiness"].median(), 2)


0.47
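
One caveat with the healthiness formula is that a cereal with zero sugar causes a division by zero. pandas returns `inf` in that case rather than raising an error. The sketch below shows this on a hypothetical frame (all values are made up), and assigns the new column on the full frame before filtering.

```python
import pandas as pd

# Hypothetical cereals; the numbers are made up for illustration.
sample = pd.DataFrame({
    "mfr": ["G", "G", "G", "K"],
    "protein": [2, 3, 6, 4],
    "fiber": [1.0, 1.0, 2.0, 10.0],
    "sugars": [10, 8, 0, 5],
})

# Assign on the full frame, not on a filtered slice.
sample["healthiness"] = (sample["protein"] + sample["fiber"]) / sample["sugars"]

# The zero-sugar row yields inf, which sorts as the largest value.
g_only = sample[sample["mfr"] == "G"]
g_median = round(g_only["healthiness"].median(), 2)
print(g_median)
```

Whether an infinite score is acceptable depends on how the rating is meant to be used; replacing `inf` with a missing value before aggregating is one possible alternative.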

## Problem Group 3

For the questions in this group, you'll work with the Titanic Dataset found at this URL: [https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/titanic.csv](https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/titanic.csv)

### Question 6

After loading the dataset into a pandas DataFrame, create a new column called `NameGroup` that contains the first letter of the passenger's surname in lower case.

Note that in the dataset, passengers' names are provided in the `Name` column and are listed as:

    Surname, Given names

For example, if a passenger's `Name` is `Braund, Mr. Owen Harris`, the `NameGroup` column should contain the value `b`.

Then count how many passengers have a `NameGroup` value of `k`.

In [65]:
url = "https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/titanic.csv"
stat = pd.read_csv(url)
# The first character of Name is the first letter of the surname; lower-case it.
stat["NameGroup"] = stat["Name"].astype(str).str[0].str.lower()
# Count the passengers whose surname starts with "k".
(stat["NameGroup"] == "k").sum()
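
Because names are stored as `Surname, Given names`, the first character of `Name` is already the first letter of the surname. It only needs to be lower-cased before counting. The following is a minimal sketch on a hypothetical three-passenger sample in that format.

```python
import pandas as pd

# Hypothetical sample in the "Surname, Given names" format.
names = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Kimball, Mr. Edwin Nelson Jr",
    "Kent, Mr. Edward Austin",
]})

# First character of each name, converted to lower case.
names["NameGroup"] = names["Name"].str[0].str.lower()

# Sum a boolean mask to count the matching passengers.
k_count = int((names["NameGroup"] == "k").sum())

print(names["NameGroup"].tolist())  # ['b', 'k', 'k']
print(k_count)  # 2
```

Comparing against an upper-case `"K"` after lower-casing would match nothing, so the comparison value must be lower case too.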