# Task 1: Instructions

Import `pandas` then load the data.

- Read the notebook on the right before the instructions here on the left.

- Import `pandas` under the alias `pd`.

- Load the dataset's CSV files (`'datasets/super_bowls.csv'`, `'datasets/tv.csv'`, and `'datasets/halftime_musicians.csv'`) into DataFrames.
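
As a rough illustration of the loading step, the sketch below reads a CSV into a DataFrame with `pd.read_csv()`. So that it runs anywhere, it reads a tiny made-up CSV from an in-memory buffer instead of a file; the `toy_` names and the values are assumptions for illustration, not the project's data.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk (values are illustrative only).
toy_csv = io.StringIO(
    "super_bowl,combined_pts\n"
    "1,45\n"
    "2,47\n"
)

# pd.read_csv() accepts a file path or a file-like object.
toy_super_bowls = pd.read_csv(toy_csv)

# In the project, the same call would take a path instead, for example:
# super_bowls = pd.read_csv('datasets/super_bowls.csv')

print(toy_super_bowls.head())
```

The other two files can be loaded the same way, one `pd.read_csv()` call per file.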

## Good to know

This project gives you an opportunity to apply the skills from [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science). DataCamp projects are completed in Jupyter Notebooks. If you'd like more info on Jupyter Notebooks, check out this [introduction](https://www.datacamp.com/projects/33).

DataCamp projects are more open-ended than DataCamp courses. The **"Check Project" button** checks whether you have completed the tasks in the project, but it doesn't check for absolute code "correctness" because a task can have more than one correct solution. The Jupyter Notebook will still display an error message if your code raises an error. Consult the hint and the expected output image to see what one correct solution looks like.

The **hints** for this project consist of the solution code with minimal fill-in-the-blanks represented by underscores.

If you experience odd behavior, you can reset the project by clicking the circular arrow in the bottom-right corner of the screen. Resetting the project will discard all code you have written, so be sure to save it offline first.

Helpful links for this task:

- CSV to DataFrame [exercise](https://campus.datacamp.com/courses/intermediate-python-for-data-science/dictionaries-pandas?ex=12)

The output for *one* correct version of a solution looks like this:

![1](https://assets.datacamp.com/production/project_684/img/task_1_output.png)

# Task 2: Instructions

Display and inspect the summaries of the TV and halftime musician DataFrames for issues.

- Use the `.info()` method to inspect the DataFrame `tv`.

- Use the `.info()` method to inspect the DataFrame `halftime_musicians`.

The `.info()` method wasn't covered in Intermediate Python for Data Science, so if you're stuck, check out the hint for the full solution.

You don't need to use `display()` or `print()` with `.info()` in Jupyter Notebooks because it prints to the output by default. The `'\n'` prints a blank line in between the `.info()` summaries to make them more readable.
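
A minimal sketch of this pattern, using two small made-up DataFrames (the `toy_` names and values are assumptions, not the project's data). One value is deliberately missing so the summary has an issue to reveal.

```python
import pandas as pd

# Made-up rows; None marks a missing value so .info() has something to report.
toy_tv = pd.DataFrame({
    "super_bowl": [1, 2, 3],
    "avg_us_viewers": [100, None, 120],
})
toy_musicians = pd.DataFrame({
    "super_bowl": [1, 2],
    "musician": ["Artist A", "Artist B"],
})

# .info() prints its summary directly, so no print() or display() is needed.
toy_tv.info()

# Add vertical space between the two summaries.
print('\n')

toy_musicians.info()

# Count missing values in one column; .info() shows the complementary non-null count.
missing_viewers = toy_tv["avg_us_viewers"].isna().sum()
print(missing_viewers)
```

A non-null count lower than the number of entries is the kind of issue to look for in the summaries.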

Helpful links:

- `.info()` method [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html)
- Inspecting a DataFrame [exercise](https://campus.datacamp.com/courses/introduction-to-data-science-in-python/loading-data-in-pandas?ex=3) (in another course)

The output for *one* correct version of a solution looks like this:
    
![2](https://assets.datacamp.com/production/project_684/img/task_2_output.png)

# Task 3: Instructions

Plot a histogram of combined points, then display the rows with the most extreme combined point outcomes.

- From `matplotlib`, import the `pyplot` module under the alias `plt`.

- Create a histogram of the `combined_pts` column from the `super_bowls` DataFrame.

- Select the Super Bowl(s) where the combined score was less than 25.

`%matplotlib inline` is a magic Jupyter Notebook command that allows you to display your graphs without `plt.show()`. You only need to use `plt.show()` in this notebook if you want to display the plot before other outputs (which you do in this task).
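
The sketch below shows one way the plot-then-filter order might look, using a made-up DataFrame (the `toy_` name, the values, and the axis label text are assumptions). The `matplotlib.use("Agg")` line is only there so the snippet runs as a plain script and should be omitted in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import pandas as pd
from matplotlib import pyplot as plt

# Made-up values for illustration only.
toy_super_bowls = pd.DataFrame({
    "super_bowl": [1, 2, 3, 4],
    "combined_pts": [21, 45, 47, 75],
})

# Histogram of the combined points column.
plt.hist(toy_super_bowls["combined_pts"])
plt.xlabel("Combined Points")
plt.ylabel("Number of Super Bowls")

# Calling plt.show() here renders the plot before the filtered rows below.
plt.show()

# Boolean filtering: keep rows where the combined score is below 25.
low_scoring = toy_super_bowls[toy_super_bowls["combined_pts"] < 25]
print(low_scoring)
```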

Helpful links:

- Basic plots with `matplotlib` [lesson](https://campus.datacamp.com/courses/intermediate-python-for-data-science/matplotlib?ex=1)

- Histograms [lesson](https://campus.datacamp.com/courses/intermediate-python-for-data-science/matplotlib?ex=7)

- Filtering Pandas DataFrame [lesson](https://campus.datacamp.com/courses/intermediate-python-for-data-science/logic-control-flow-and-filtering?ex=14)

The output for *one* correct version of a solution looks like this:
    
![3](https://assets.datacamp.com/production/project_684/img/task_3_output.png)

# Task 4: Instructions

Modify and display the histogram of point differences, then display the rows with the most extreme point difference outcomes.

- Add a y-label with `'Number of Super Bowls'`.

- Display the plot with `plt.show()`.

- Select the Super Bowl(s) where the point difference was equal to 1.

- Select the Super Bowl(s) where the point difference was greater than or equal to 35.
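
A short sketch of the two comparisons on made-up data. The column name `difference_pts` is an assumption for illustration, as are the `toy_` name and the values; check the notebook for the actual column name. The `matplotlib.use("Agg")` line should be omitted in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import pandas as pd
from matplotlib import pyplot as plt

# Made-up values; `difference_pts` is an assumed column name.
toy_super_bowls = pd.DataFrame({
    "super_bowl": [1, 2, 3, 4],
    "difference_pts": [1, 10, 35, 45],
})

plt.hist(toy_super_bowls["difference_pts"])
plt.xlabel("Point Difference")
plt.ylabel("Number of Super Bowls")
plt.show()

# `==` tests for equality; `>=` includes the boundary value 35.
closest_games = toy_super_bowls[toy_super_bowls["difference_pts"] == 1]
biggest_blowouts = toy_super_bowls[toy_super_bowls["difference_pts"] >= 35]

print(closest_games)
print(biggest_blowouts)
```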

Helpful links:

- Labels [exercise](https://campus.datacamp.com/courses/intermediate-python-for-data-science/matplotlib?ex=14)

The output for *one* correct version of a solution looks like this:
    
![4](https://assets.datacamp.com/production/project_684/img/task_4_output.png)

# Task 5: Instructions

Import `seaborn` and plot household share vs. point difference.

- Import the `seaborn` module under the alias `sns`.

- Fill in the `x` argument of `sns.regplot()` with the point difference column.

- Fill in the `y` argument of `sns.regplot()` with the household share column.

Remember column names are represented as strings!

`seaborn`'s `regplot()` is like a scatter plot, except that it is more specialized for [visualizing linear relationships](https://seaborn.pydata.org/tutorial/regression.html#functions-to-draw-linear-regression-models). It draws a scatter plot, then fits a regression model and plots the resulting regression line and a 95% confidence interval for that regression.
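
A minimal sketch of a `regplot()` call on made-up data. The column names `difference_pts` and `share_household`, the `toy_` name, and the values are assumptions for illustration; the notebook's own DataFrame and column names may differ. The `matplotlib.use("Agg")` line should be omitted in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values; both column names are assumptions.
toy_games_tv = pd.DataFrame({
    "difference_pts": [1, 4, 10, 17, 22, 35],
    "share_household": [72, 70, 66, 65, 62, 61],
})

# Column names are passed as strings; `data` tells seaborn where to look them up.
ax = sns.regplot(x="difference_pts", y="share_household", data=toy_games_tv)
```

The call returns the matplotlib `Axes` it drew on, which can be used for further customization.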

Helpful links:

- Packages [lesson](https://campus.datacamp.com/courses/intro-to-python-for-data-science/chapter-3-functions-and-packages?ex=9)

The output for *one* correct version of a solution looks like this:
    
![5](https://assets.datacamp.com/production/project_684/img/task_5_output.png)

# Task 6: Instructions

Create three line plots using the `tv` DataFrame to compare viewers, rating, and ad cost.

- For the first plot, plot `super_bowl` on the x-axis, `avg_us_viewers` on the y-axis, and set the line color to `'#648FFF'`.

- For the second plot, plot `super_bowl` on the x-axis, `rating_household` on the y-axis, and set the line color to `'#DC267F'`.

- For the third plot, plot `super_bowl` on the x-axis, `ad_cost` on the y-axis, and set the line color to `'#FFB000'`.

The colors for the lines were based on a palette suggestion from [Coloring for Colorblindness](https://davidmathlogic.com/colorblind/).
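
One possible layout is three stacked subplots, sketched below on made-up data. The use of `plt.subplot()`, the titles, the `toy_` name, and the values are assumptions; only the column names and colors come from the instructions above. The `matplotlib.use("Agg")` line should be omitted in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import pandas as pd
from matplotlib import pyplot as plt

# Made-up values for illustration only.
toy_tv = pd.DataFrame({
    "super_bowl": [1, 2, 3, 4],
    "avg_us_viewers": [100, 110, 105, 120],
    "rating_household": [40.0, 41.5, 39.0, 42.0],
    "ad_cost": [10, 12, 15, 20],
})

fig = plt.figure()

# Three rows, one column; the third argument picks the active subplot.
plt.subplot(3, 1, 1)
plt.plot(toy_tv["super_bowl"], toy_tv["avg_us_viewers"], color="#648FFF")
plt.title("avg_us_viewers")

plt.subplot(3, 1, 2)
plt.plot(toy_tv["super_bowl"], toy_tv["rating_household"], color="#DC267F")
plt.title("rating_household")

plt.subplot(3, 1, 3)
plt.plot(toy_tv["super_bowl"], toy_tv["ad_cost"], color="#FFB000")
plt.title("ad_cost")
plt.xlabel("super_bowl")

# Prevent titles and tick labels from overlapping.
plt.tight_layout()
```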

Helpful links:

- Line plot [exercise](https://campus.datacamp.com/courses/intermediate-python-for-data-science/matplotlib?ex=2)

The output for *one* correct version of a solution looks like this:
    
![6](https://assets.datacamp.com/production/project_684/img/task_6_output.png)

# Task 7: Instructions

Filter and display the musicians for halftime shows up to and including Super Bowl XXVII.

- Using `halftime_musicians`, select the musicians that performed in halftime shows up to and including Super Bowl XXVII (27) (i.e. Michael Jackson's performance).

The last line of code in a Jupyter Notebook cell automatically has its output displayed, so you don't need to use `display()` here.
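
A small sketch of an "up to and including" filter on made-up rows (the `toy_` name and the performer names are assumptions, not the project's data):

```python
import pandas as pd

# Made-up rows for illustration only.
toy_musicians = pd.DataFrame({
    "super_bowl": [52, 27, 27, 1],
    "musician": ["Artist A", "Artist B", "Band C", "Band D"],
})

# `<=` keeps Super Bowl 27 itself; `<` would drop it.
early_shows = toy_musicians[toy_musicians["super_bowl"] <= 27]

# As the last expression in a notebook cell, this would be displayed automatically.
early_shows
```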

The output for *one* correct version of a solution looks like this:

![7](https://assets.datacamp.com/production/project_684/img/task_7_output.png)

# Task 8: Instructions

Select and display the musicians with more than one halftime show appearance.

- The new `halftime_appearances` DataFrame has two columns, `musician` and `super_bowl`, where `super_bowl` now contains the halftime show counts for each musician. Select the musicians that have appeared in more than one halftime show.

The `halftime_appearances` code is preloaded because it wasn't covered in the prerequisite for this project, [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science). Grouping and rearranging data are covered in [Manipulating DataFrames with pandas](https://www.datacamp.com/courses/manipulating-dataframes-with-pandas).
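
For orientation, here is a sketch of how a counts table like this could be built and filtered. The grouping code is an assumption about what the preloaded code might look like, not a copy of it, and the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
toy_musicians = pd.DataFrame({
    "super_bowl": [1, 5, 2, 3, 4, 6],
    "musician": ["Artist A", "Artist A", "Artist B",
                 "Artist C", "Artist C", "Artist C"],
})

# Count rows per musician; `super_bowl` then holds the number of appearances.
toy_appearances = toy_musicians.groupby("musician").count()["super_bowl"].reset_index()
toy_appearances = toy_appearances.sort_values("super_bowl", ascending=False)

# Keep only the musicians with more than one appearance.
repeat_performers = toy_appearances[toy_appearances["super_bowl"] > 1]
print(repeat_performers)
```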

The output for *one* correct version of a solution looks like this:

![8](https://assets.datacamp.com/production/project_684/img/task_8_output.png)

# Task 9: Instructions

Modify the histogram of number of songs performed for non-band musicians.

- In the `plt.hist()` function, set the number of bins argument equal to `most_songs` (the greatest number of songs performed in a halftime show by a single musician).
- Add an x-label with `'Number of Songs Per Halftime Show Performance'`.

You can't filter on the word "Band" alone because that would also remove Bruce Springsteen and the E Street Band, who performed at Super Bowl XLIII.

The `no_bands` code is preloaded because it wasn't covered in [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science). The `.str.contains()` method is covered in [Cleaning Data in Python](https://campus.datacamp.com/courses/cleaning-data-in-python/case-study-5?ex=9).
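
The sketch below illustrates the idea on made-up rows. The `num_songs` column name, the `"Marching"` pattern, the y-label, and the `toy_` names are assumptions for illustration, not the preloaded code. The `matplotlib.use("Agg")` line should be omitted in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import pandas as pd
from matplotlib import pyplot as plt

# Made-up rows; `num_songs` is an assumed column name.
toy_musicians = pd.DataFrame({
    "musician": ["Artist A", "State Marching Band",
                 "Artist B and the Example Band", "Artist C"],
    "num_songs": [3, 1, 11, 7],
})

# Match "Marching" rather than "Band" so acts whose names include "Band" are kept.
toy_no_bands = toy_musicians[~toy_musicians["musician"].str.contains("Marching")]

# The largest song count becomes the number of histogram bins.
most_songs = int(toy_no_bands["num_songs"].max())

counts, bin_edges, _ = plt.hist(toy_no_bands["num_songs"].dropna(), bins=most_songs)
plt.xlabel("Number of Songs Per Halftime Show Performance")
plt.ylabel("Number of Musicians")
plt.show()
```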

Helpful links:

- Build a histogram: bins [exercise](https://campus.datacamp.com/courses/intermediate-python-for-data-science/matplotlib?ex=9)

The output for *one* correct version of a solution looks like this:

![9](https://assets.datacamp.com/production/project_684/img/task_9_output.png)

# Task 10: Instructions

Who will win Super Bowl LIII?

- The `patriots` and `rams` are playing in Super Bowl LIII. Assign the variable of the team you think will win to the `super_bowl_LIII_winner` variable.

Congratulations on reaching the end of the project! You just applied your Python skills in a real-world data analysis. The structure of this project (where code is interspersed with narrative) is an excellent structure for blog posts to add to your data science portfolio.

To keep building your Python skills, continue to the next course in your track. If you're not enrolled in a track, pick a new course from DataCamp's Python [library](https://www.datacamp.com/courses/tech:python).