# 2022-2023 NHL Games

<hr>

## Task 3 - Exploratory Data Project
#### Tim Gormly
#### 2/18/2023

<hr>

<br>
In this notebook, we will look at statistics surrounding the games played in the 2022-2023 NHL season.  This dataset is made available by www.hockey-reference.com and can be found here: https://www.hockey-reference.com/leagues/NHL_2023_games.html.
<br>
<br>

<hr>



## 1. Load the Data

To begin, we will import pandas and use read_csv() to load our NHL game data into a DataFrame.

In [29]:
import pandas as pd

df = pd.read_csv('22-23_games.csv')
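
As a hedged aside, read_csv() can also parse the Date column while loading. The sketch below does not read the real file. It uses a small in-memory stand-in (`sample_csv`, a hypothetical name) built from a few rows of the game data, and it assumes the file's header matches the column labels shown in the output in the next section.

```python
import io
import pandas as pd

# Hypothetical stand-in for '22-23_games.csv': three rows copied from the
# game data, with the header assumed to match the displayed column labels.
sample_csv = io.StringIO(
    "Date,Visitor,G,Home,G.1,Decision,Att.,LOG,Notes\n"
    "2022-10-11,Vegas Golden Knights,4,Los Angeles Kings,3,,18230,2:31,\n"
    "2022-10-12,Seattle Kraken,4,Anaheim Ducks,5,OT,17530,2:28,\n"
    "2023-04-14,Buffalo Sabres,,Columbus Blue Jackets,,,,,\n"
)

# parse_dates converts the Date column to a datetime dtype while loading.
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])

print(sample_df.shape)
print(sample_df.dtypes)
```

Because the last row has no score yet, the goal columns load as floats with a missing value rather than as integers.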

<hr>

## 2. View the Data

Let's take a look at the data we're working with.

In [30]:
df.head()

Unnamed: 0,Date,Visitor,G,Home,G.1,Decision,Att.,LOG,Notes
0,2022-10-07,San Jose Sharks,1.0,Nashville Predators,4.0,,16648.0,2:43,at (Prague CZ)
1,2022-10-08,Nashville Predators,3.0,San Jose Sharks,2.0,,17023.0,2:33,at (Prague CZ)
2,2022-10-11,Vegas Golden Knights,4.0,Los Angeles Kings,3.0,,18230.0,2:31,
3,2022-10-11,Tampa Bay Lightning,1.0,New York Rangers,3.0,,18006.0,2:21,
4,2022-10-12,Seattle Kraken,4.0,Anaheim Ducks,5.0,OT,17530.0,2:28,


In [31]:
df.tail()

Unnamed: 0,Date,Visitor,G,Home,G.1,Decision,Att.,LOG,Notes
1307,2023-04-13,Vegas Golden Knights,,Seattle Kraken,,,,,
1308,2023-04-13,Detroit Red Wings,,Tampa Bay Lightning,,,,,
1309,2023-04-13,New Jersey Devils,,Washington Capitals,,,,,
1310,2023-04-14,Buffalo Sabres,,Columbus Blue Jackets,,,,,
1311,2023-04-14,Colorado Avalanche,,Nashville Predators,,,,,


Looking at the head of this DataFrame, we can see that we're only working with a few columns, and that some of their labels (G, G.1, Att., LOG) are terse.

Let's rename a few of these so that they are more descriptive and clear.



In [32]:
df.columns = ["Date", "Visiting_Team", "Visiting_Team_Goals", "Home_Team", "Home_Team_Goals", "Decision", "Attendance", "Game_Length", "Notes"]

df.head()

In [None]:
df.tail()
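
Assigning to df.columns relies on the new names being listed in exactly the same order as the existing columns. As an alternative sketch (not the approach used in this notebook), rename() with an explicit mapping ties each new name to the label it replaces. The two-row frame `raw` and the dictionary `new_names` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical two-row frame using the original column labels from head().
raw = pd.DataFrame(
    [
        ["2022-10-11", "Vegas Golden Knights", 4.0, "Los Angeles Kings", 3.0, None, 18230.0, "2:31", None],
        ["2022-10-12", "Seattle Kraken", 4.0, "Anaheim Ducks", 5.0, "OT", 17530.0, "2:28", None],
    ],
    columns=["Date", "Visitor", "G", "Home", "G.1", "Decision", "Att.", "LOG", "Notes"],
)

# Map each old label to its new name; labels not listed stay unchanged.
new_names = {
    "Visitor": "Visiting_Team",
    "G": "Visiting_Team_Goals",
    "Home": "Home_Team",
    "G.1": "Home_Team_Goals",
    "Att.": "Attendance",
    "LOG": "Game_Length",
}
renamed = raw.rename(columns=new_names)

print(list(renamed.columns))
```

Both approaches produce the same column names here; the mapping simply makes the old-to-new pairing visible.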

Now we can see that we have:

<ul>
    <li>Index: Game number for the season</li>
    <li>Date: Date of game, YYYY-MM-DD format</li>
    <li>Visiting_Team: The name of the visiting team</li>
    <li>Visiting_Team_Goals: Goals scored by the visiting team</li>
    <li>Home_Team: The name of the home team</li>
    <li>Home_Team_Goals: Goals scored by the home team</li>
    <li>Decision: Blank if the game was decided in regulation.  If the game went to overtime or a shootout, that is noted in this column</li>
    <li>Attendance: Number of people attending the game in person</li>
    <li>Game_Length: Length of game, H:MM format</li>
    <li>Notes: Notes specific to the game.  This is typically blank, but it records when a game was played in an atypical arena.  In the head() call, we can see that the games at index 0 and 1 were played in Prague, CZ.</li>
</ul>
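
Date and Game_Length are loaded as plain text, so they may need converting before we can do arithmetic with them. The following is a minimal sketch on a hypothetical three-row frame (`games`), assuming the YYYY-MM-DD and H:MM formats described above. The column Game_Length_Minutes is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample: two played games and one future game with no length.
games = pd.DataFrame({
    "Date": ["2022-10-07", "2022-10-12", "2023-04-14"],
    "Game_Length": ["2:43", "2:28", None],
})

# Convert the YYYY-MM-DD strings to datetime values.
games["Date"] = pd.to_datetime(games["Date"], format="%Y-%m-%d")

# Append ":00" so "H:MM" becomes "H:MM:SS", which to_timedelta can parse.
# Missing lengths stay missing (NaT, then NaN).
lengths = pd.to_timedelta(games["Game_Length"] + ":00")
games["Game_Length_Minutes"] = lengths.dt.total_seconds() / 60

print(games)
```

With the length in minutes, summary methods such as mean() or describe() can be applied directly.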

Note that in the tail, many fields are "NaN" ("Not a Number"), which is how pandas marks missing values.  These games are part of the 2022-2023 season, but they are scheduled for a future date, so no gameplay data exists for them yet.
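
One possible way to handle these rows, sketched on a hypothetical four-row frame (`schedule`), is to count the missing values per column and then keep only the games that have a recorded score. Whether to drop or keep unplayed games depends on the question being asked, so treat this as an illustration rather than a required step.

```python
import pandas as pd

# Hypothetical sample: two played games and two future games.
schedule = pd.DataFrame({
    "Date": ["2022-10-11", "2022-10-12", "2023-04-13", "2023-04-14"],
    "Visiting_Team_Goals": [4.0, 4.0, None, None],
    "Home_Team_Goals": [3.0, 5.0, None, None],
})

# Count missing values in each column.
print(schedule.isna().sum())

# Keep only the games where both goal totals are present.
played = schedule.dropna(subset=["Visiting_Team_Goals", "Home_Team_Goals"])
print(len(played), "of", len(schedule), "games have been played")
```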

<hr>

TODO:

-- Is there a home team advantage?
-- What percentage of games were decided in regulation?  Overtime?  Shootout? (filter)
-- How do visiting team goal totals compare to home team goal totals?
-- Describe data for total goals.
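
As a rough sketch of how these questions might be approached, the example below uses only the five played games from the head() output. Five games are far too few to draw conclusions, so the numbers show the mechanics only. The label "Regulation" for blank decisions is an assumption introduced here, as are the variable names.

```python
import pandas as pd

# The five played games shown in the head() output.
sample = pd.DataFrame({
    "Visiting_Team_Goals": [1.0, 3.0, 4.0, 1.0, 4.0],
    "Home_Team_Goals": [4.0, 2.0, 3.0, 3.0, 5.0],
    "Decision": [None, None, None, None, "OT"],
})

# Share of games won by the home team (NHL games cannot end in a tie).
home_win_share = (sample["Home_Team_Goals"] > sample["Visiting_Team_Goals"]).mean()

# Label blank decisions as regulation, then compute each outcome's share.
decision_share = sample["Decision"].fillna("Regulation").value_counts(normalize=True)

# Compare average goals for each side and summarize total goals per game.
goal_means = sample[["Visiting_Team_Goals", "Home_Team_Goals"]].mean()
total_goals = sample["Visiting_Team_Goals"] + sample["Home_Team_Goals"]

print(home_win_share)
print(decision_share)
print(goal_means)
print(total_goals.describe())
```

On the full dataset, the same expressions would need to run on played games only, since the future games have no scores.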