# Preparing Data for Database Loading
- Data acquisition using the ```nba_api``` Python wrapper
- Data cleaning
- Data organization/modeling for NoSQL database

## Retrieving All NBA Teams
- Load all NBA teams from the ```static``` module of ```nba_api```

In [1]:
from nba_api.stats.static import teams as tm

teams = tm.get_teams()
print(teams[0])

{'id': 1610612737, 'full_name': 'Atlanta Hawks', 'abbreviation': 'ATL', 'nickname': 'Hawks', 'city': 'Atlanta', 'state': 'Atlanta', 'year_founded': 1949}


### Fixing Errors in Data
- As seen above, the data from the ```nba_api``` static module contains incorrect values (e.g. ```'state': 'Atlanta'```)
- The affected records are corrected manually

In [2]:
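# Correct known errors in the team dictionaries.
# Indices are positions in the list returned by get_teams().
teams[0]['state'] = 'Georgia'          # Atlanta Hawks
teams[7]['city'] = 'San Francisco'     # Golden State Warriors
teams[13]['city'] = 'Minneapolis'      # Minnesota Timberwolves
teams[17]['city'] = 'Indianapolis'     # Indiana Pacers
teams[25]['city'] = 'Salt Lake City'   # Utah Jazz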
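
The positional indices above depend on the order of the list returned by ```get_teams()```. A minimal sketch of an alternative, assuming team abbreviations are stable keys, applies the same corrections by abbreviation. ```CORRECTIONS``` and ```apply_corrections``` are hypothetical names, and the sample records stand in for the real list (fields trimmed; the uncorrected values shown for GSW are assumptions).

```python
# Hypothetical corrections keyed by team abbreviation instead of list position.
CORRECTIONS = {
    'ATL': {'state': 'Georgia'},
    'GSW': {'city': 'San Francisco'},
    'MIN': {'city': 'Minneapolis'},
    'IND': {'city': 'Indianapolis'},
    'UTA': {'city': 'Salt Lake City'},
}


def apply_corrections(teams, corrections=CORRECTIONS):
    """Update matching team dicts in place; return abbreviations not found."""
    by_abbr = {t['abbreviation']: t for t in teams}
    missing = []
    for abbr, fixes in corrections.items():
        if abbr in by_abbr:
            by_abbr[abbr].update(fixes)
        else:
            missing.append(abbr)
    return missing


# Stand-in records (assumed, trimmed); real data would come from tm.get_teams().
sample_teams = [
    {'abbreviation': 'ATL', 'city': 'Atlanta', 'state': 'Atlanta'},
    {'abbreviation': 'GSW', 'city': 'Golden State', 'state': 'California'},
]
not_found = apply_corrections(sample_teams)
print(sample_teams[0]['state'], sample_teams[1]['city'], not_found)
```

Returning the abbreviations that were not found makes it visible when a correction silently fails to match, which a positional assignment cannot report.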

## NoSQL Schema (BigQuery)
Schema in ```nosql_schema.json```:
- ```id```
- ```full_name```
- ```abbreviation```
- ```nickname```
- ```city```
- ```state```
- ```games``` (nested and repeated field)
    - ```games.game_id```
    - ```games.date```
    - ```games.is_home```
    - ```games.opponent```
    - ```games.shots``` (nested and repeated field)
        - ```games.shots.shot_number```
        - ```games.shots.player_id```
        - ```games.shots.period```
        - ```games.shots.play_clock```
        - ```games.shots.time_elapsed```
        - ```games.shots.shot_type```
        - ```games.shots.range```
        - ```games.shots.is_make```
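
As a rough illustration of how the nested and repeated fields listed above could be written in BigQuery's JSON schema format, the sketch below builds the structure in Python. The contents of ```nosql_schema.json``` are not shown here, so the column types are assumptions inferred from the sample outputs in this notebook, and the ```field``` helper is a hypothetical convenience.

```python
import json


def field(name, type_, mode='NULLABLE', fields=None):
    """Build one schema entry; nested columns go under 'fields'."""
    entry = {'name': name, 'type': type_, 'mode': mode}
    if fields:
        entry['fields'] = fields
    return entry


# Column types below are assumptions, not taken from nosql_schema.json.
shot_fields = [
    field('shot_number', 'INTEGER'),
    field('player_id', 'INTEGER'),
    field('period', 'INTEGER'),
    field('play_clock', 'STRING'),
    field('time_elapsed', 'STRING'),
    field('shot_type', 'STRING'),
    field('range', 'INTEGER'),
    field('is_make', 'BOOLEAN'),
]
game_fields = [
    field('game_id', 'STRING'),
    field('date', 'DATE'),
    field('is_home', 'BOOLEAN'),
    field('opponent', 'STRING'),
    # A RECORD column in REPEATED mode is a nested and repeated field.
    field('shots', 'RECORD', 'REPEATED', shot_fields),
]
schema = [
    field('id', 'INTEGER'),
    field('full_name', 'STRING'),
    field('abbreviation', 'STRING'),
    field('nickname', 'STRING'),
    field('city', 'STRING'),
    field('state', 'STRING'),
    field('games', 'RECORD', 'REPEATED', game_fields),
]
print(json.dumps(schema, indent=2))
```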

### Retrieving and Organizing Data According to Schema
- Classes in ```models.py``` (```Shot```, ```Game```, ```Team```) were created to encapsulate the data
- Note: the script inserts the plays (shots) of all games into the ```Team``` object; the cells below walk through a single game
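
Since ```models.py``` is not reproduced here, the following is only a hypothetical sketch of how such classes could support the ```dict(...)``` calls used in the cells below. The class names carry a ```Sketch``` suffix to make clear they are not the real models, and the input row keys (```'Game_ID'```, ```'GAME_DATE'```, ```'MATCHUP'```) and their formats are assumptions about the ```TeamGameLog``` rows.

```python
from datetime import datetime


class GameSketch:
    """Hypothetical stand-in for models.Game; the real class is not shown."""

    def __init__(self, row):
        self.game_id = row['Game_ID']
        # Assumed date format, e.g. 'MAY 16, 2021'.
        self.date = datetime.strptime(row['GAME_DATE'], '%b %d, %Y').date()
        # Assumed matchup format: 'ATL vs. HOU' (home) or 'ATL @ HOU' (away).
        self.is_home = ' vs. ' in row['MATCHUP']
        self.opponent = row['MATCHUP'].split()[-1]
        self.shots = []

    def __iter__(self):
        # Yielding (key, value) pairs is what lets dict(game) work.
        yield from vars(self).items()


class TeamSketch:
    """Hypothetical stand-in for models.Team; keeps only the schema fields."""

    FIELDS = ('id', 'full_name', 'abbreviation', 'nickname', 'city', 'state')

    def __init__(self, info):
        for key in self.FIELDS:
            setattr(self, key, info[key])
        self.games = []

    def insert_game(self, game):
        self.games.append(game)

    def __iter__(self):
        yield from vars(self).items()


# Assumed input row, shaped to match the game shown later in this notebook.
row = {'Game_ID': '0022001066', 'GAME_DATE': 'MAY 16, 2021',
       'MATCHUP': 'ATL vs. HOU'}
demo_team = TeamSketch({'id': 1610612737, 'full_name': 'Atlanta Hawks',
                        'abbreviation': 'ATL', 'nickname': 'Hawks',
                        'city': 'Atlanta', 'state': 'Georgia',
                        'year_founded': 1949})
demo_team.insert_game(dict(GameSketch(row)))
print(dict(demo_team))
```

Defining ```__iter__``` to yield key/value pairs is one of several ways to make ```dict(obj)``` work; the real classes may use a different mechanism.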

In [3]:
from nba_api.stats.endpoints import teamgamelog as tgl
from nba_api.stats.endpoints import playbyplayv2 as pbp

from models import Shot, Game, Team

In [4]:
# Set team
team = Team(teams[0])

dict(team)

{'id': 1610612737,
 'full_name': 'Atlanta Hawks',
 'abbreviation': 'ATL',
 'nickname': 'Hawks',
 'city': 'Atlanta',
 'state': 'Georgia',
 'games': []}

In [5]:
# Retrieve all games for team in the season (most recent by default)
games = tgl.TeamGameLog(team.id).get_normalized_dict()['TeamGameLog']

In [6]:
game = Game(games[0])

dict(game)

{'game_id': '0022001066',
 'date': datetime.date(2021, 5, 16),
 'is_home': True,
 'opponent': 'HOU',
 'shots': []}

In [7]:
# Retrieve all plays from game
plays = pbp.PlayByPlayV2(game.game_id).get_normalized_dict()['PlayByPlay']

# Insert plays that are shots
game.insert_shots(plays)

dict(game)['shots'][0]

{'shot_number': 1,
 'player_id': 1629631,
 'period': 1,
 'play_clock': '11:33',
 'time_elapsed': '0:27',
 'shot_type': '2PT',
 'range': 21,
 'is_make': True}
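
The ```play_clock``` and ```time_elapsed``` values in the output above appear to be related by simple clock arithmetic: time elapsed is the length of all completed periods plus the time already played in the current one. A minimal sketch of that calculation follows, assuming 12-minute regulation periods and 5-minute overtime periods; the ```time_elapsed``` function here is a hypothetical helper, not the code in ```models.py```.

```python
def time_elapsed(period, play_clock, period_minutes=12, overtime_minutes=5):
    """Convert a period number and remaining clock ('M:SS') to elapsed game time."""
    minutes, seconds = map(int, play_clock.split(':'))
    remaining = minutes * 60 + seconds
    if period <= 4:
        # Regulation: all earlier periods are full-length.
        start = (period - 1) * period_minutes * 60
        length = period_minutes * 60
    else:
        # Overtime: four regulation periods plus any earlier overtime periods.
        start = 4 * period_minutes * 60 + (period - 5) * overtime_minutes * 60
        length = overtime_minutes * 60
    total = start + (length - remaining)
    return f'{total // 60}:{total % 60:02d}'


print(time_elapsed(1, '11:33'))  # 0:27
print(time_elapsed(4, '0:18'))   # 47:42
```

Both printed values match shots shown in this notebook: 27 seconds into period 1, and 3 × 720 + 702 = 2862 seconds (47:42) for a shot with 0:18 left in period 4.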

In [10]:
# Insert game into team object
team.insert_game(dict(game))

dict(team)['games'][0]['shots'][-1]

{'shot_number': 90,
 'player_id': 1630233,
 'period': 4,
 'play_clock': '0:18',
 'time_elapsed': '47:42',
 'shot_type': '2PT',
 'range': 1,
 'is_make': True}
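
BigQuery can load nested and repeated data from newline-delimited JSON, where each line is one complete record. The ```date``` values above are ```datetime.date``` objects, which the standard ```json``` module does not serialize by default, so a conversion step is needed. The sketch below shows one way to do this; ```to_ndjson_line``` is a hypothetical helper, and the record is a trimmed copy of the values shown in this notebook.

```python
import datetime
import json


def to_ndjson_line(record):
    """Serialize one record as a single JSON line; dates become ISO strings."""
    def encode(value):
        if isinstance(value, (datetime.date, datetime.datetime)):
            return value.isoformat()
        raise TypeError(f'Cannot serialize {type(value).__name__}')
    return json.dumps(record, default=encode)


# Trimmed record using values shown earlier in this notebook.
record = {
    'id': 1610612737,
    'abbreviation': 'ATL',
    'games': [{
        'game_id': '0022001066',
        'date': datetime.date(2021, 5, 16),
        'is_home': True,
        'opponent': 'HOU',
        'shots': [{'shot_number': 1, 'player_id': 1629631, 'is_make': True}],
    }],
}
line = to_ndjson_line(record)
print(line)
```

Writing one such line per team produces a file in which every row carries its own games and shots, matching the nested schema.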