# College Basketball Feature Store

A mock-up that connects several data sources to build a normalized view of college basketball features for analytics.

**Main things**

Find a way to get a normalized view of all stats related to college basketball, for both the regular season AND the postseason. The postseason is much harder because game outcomes have to be predicted weeks to a month in advance. The best approach is to take a set of features that summarize each team's performance over the current season and use them to approximate performance in the playoffs. Some stats (like Elo) can be used as an end-of-season rating, or we can take the last available value. Unfortunately, odds are not available that far out.

One additional piece is an “approximate” odds calculation for each game. The playoffs are simulated round by round, and because Elo is a closed system, each team's rating can be updated after every simulated round and carried into the next.
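As a minimal sketch of the approximate odds idea, the logistic Elo formula turns a rating gap into a win probability, and the reciprocal of that probability gives fair decimal odds. The function names and the `scale` parameter are hypothetical. The conventional divisor is 400, but the sample record loaded further down in this notebook (2332.57 vs 2276.79 on a neutral court, `home_elo_prob` of about 0.540) is reproduced by 800, so 800 is used as the default here. That value is inferred from a single row and is not confirmed by the elo-rating repository.

```python
def elo_win_prob(rating_a: float, rating_b: float, scale: float = 800.0) -> float:
    """Probability that team A beats team B under a logistic Elo model.

    scale=800 is an assumption inferred from one sample row; 400 is the
    conventional value.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def fair_decimal_odds(prob: float) -> float:
    """Decimal odds with no bookmaker margin (payout per unit staked)."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    return 1.0 / prob


# Pre-game ratings taken from the sample record in this notebook.
home_elo_pre, away_elo_pre = 2332.574918858023, 2276.7945212855925
p_home = elo_win_prob(home_elo_pre, away_elo_pre)
print(p_home, fair_decimal_odds(p_home))
```

Real bookmaker odds include a margin and react to news such as injuries, so these numbers are only a stand-in for the missing far-out odds.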

We need a simulator class to iterate through the playoffs. Once that exists, we can apply a random factor to each team's initialization and see how many iterations it takes (if it is possible at all) to perfectly predict the postseason.
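One possible shape for that simulator is sketched below. Everything here is an assumption made for illustration: the class name `BracketSimulator`, the `k` and `noise_sd` parameters, the plain Elo update without a margin-of-victory term, and the `scale=800` divisor. The bracket is a list of team names ordered so that adjacent entries meet in the first round.

```python
import random


def elo_win_prob(rating_a, rating_b, scale=800.0):
    """Probability that team A beats team B under a logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


class BracketSimulator:
    """Single-elimination simulator that carries updated Elo between rounds."""

    def __init__(self, ratings, k=20.0, noise_sd=0.0, seed=None):
        self.rng = random.Random(seed)
        self.k = k
        # Random factor: jitter each team's starting rating.
        self.ratings = {
            team: rating + self.rng.gauss(0.0, noise_sd)
            for team, rating in ratings.items()
        }

    def play(self, team_a, team_b):
        """Simulate one game, update both ratings, and return the winner."""
        p_a = elo_win_prob(self.ratings[team_a], self.ratings[team_b])
        a_won = self.rng.random() < p_a
        delta = self.k * ((1.0 if a_won else 0.0) - p_a)
        self.ratings[team_a] += delta
        self.ratings[team_b] -= delta
        return team_a if a_won else team_b

    def run(self, bracket):
        """Return the winners of each round; the last entry holds the champion."""
        field = list(bracket)
        if len(field) < 2 or len(field) & (len(field) - 1):
            raise ValueError("bracket size must be a power of two")
        rounds = []
        while len(field) > 1:
            field = [self.play(field[i], field[i + 1]) for i in range(0, len(field), 2)]
            rounds.append(field)
        return rounds


def iterations_until_match(ratings, bracket, actual_rounds, max_iter=10_000, **kwargs):
    """Count simulations until one reproduces actual_rounds, else return None."""
    for i in range(1, max_iter + 1):
        if BracketSimulator(ratings, seed=i, **kwargs).run(bracket) == actual_rounds:
            return i
    return None
```

For a full 64-team field the chance of reproducing every game is tiny, so `iterations_until_match` will usually hit `max_iter` and return `None`; comparing only the later rounds may be a more practical target.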

Start small. Pick one game:
- Get event info
- Pull in the Elo rating from theedgepredictor/elo-rating

Additional:
- Torvik (trank, net rank, barthag, ratings)
- odds (open, current)
- team stats (season, last 3, last 5, last 10)
- College team name mapper
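The season / last 3 / last 5 / last 10 team stats could be derived from a chronological list of per-game values, as in the rough sketch below. The function name `recent_form`, the output keys, and the sample scores are all made up for illustration; nothing here comes from a real data source.

```python
from statistics import mean


def recent_form(values, windows=(3, 5, 10)):
    """Average a per-game stat over the season and over the last N games.

    `values` must be in chronological order. A window longer than the number
    of games played falls back to all available games.
    """
    if not values:
        raise ValueError("need at least one game")
    features = {"season": mean(values)}
    for n in windows:
        features[f"last_{n}"] = mean(values[-n:])
    return features


points_scored = [70, 82, 65, 90, 77, 88]  # made-up scores, oldest first
form = recent_form(points_scored)
print(form)
```

When building features for a specific game, pass only the games played before it so that the result being predicted does not leak into its own features.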

In [6]:
import time

import requests


class ESPNBaseAPI:
    """
    ESPNBaseAPI class for making API requests to ESPN's sports data endpoints.

    Attributes:
        _base_url (str): The base URL for ESPN's public API.
        _core_url (str): The base URL for ESPN's core API.

    Methods:
        api_request(url: str, retry_count: int = 0) -> dict or None:
            Makes an API request to the specified URL.

            Args:
                url (str): The complete URL for the API request.
                retry_count (int): The number of retries already attempted. The request is retried up to 3 times. Default is 0.

            Returns:
                dict or None: The JSON response from the API, or None if the request was unsuccessful.
                If the response indicates a 404 status code or an error, None is returned.

            Raises:
                Exception: Raises an exception if the request encounters an error after multiple retries.
                This is typically used when the request limit is exceeded (error code 2502).
    """

    def __init__(self):
        """
        Initializes an instance of the ESPNBaseAPI class.

        Attributes:
            _base_url (str): The base URL for ESPN's public API.
            _core_url (str): The base URL for ESPN's core API.
        """
        self._base_url = 'https://site.api.espn.com/apis/site/v2/sports'
        self._core_url = 'https://sports.core.api.espn.com/v2/sports'

    def api_request(self, url: str, retry_count: int = 0) -> dict or None:
        """
        Makes an API request to the specified URL.

        Args:
            url (str): The complete URL for the API request.
            retry_count (int): The number of retries already attempted. The request is retried up to 3 times. Default is 0.

        Returns:
            dict or None: The JSON response from the API, or None if the request was unsuccessful.
            If the response indicates a 404 status code or an error, None is returned.

        Raises:
            Exception: Raises an exception if the request encounters an error after multiple retries.
            This is typically used when the request limit is exceeded (error code 2502).
        """
        try:
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
            }
            resp = requests.get(url=url, headers=headers)
            if resp.status_code == 404:
                return None
            res = resp.json()
            if 'error' in res:
                if res['error']['code'] == 404:  # No data
                    return None
            if 'code' in res:
                if res['code'] == 2502:
                    raise Exception('Flooded')  # Too many requests
                if res['code'] == 400:  # Data cant be found (wrong endpoint/wrong request)
                    return None
            return res
        except Exception as e:
            if retry_count >= 3:
                raise e
            print(f'URL error for {url}')
            time.sleep(5)
            # Return the retried result; without the return, a successful retry yields None
            return self.api_request(url, retry_count=retry_count + 1)


In [3]:
import pandas as pd

df = pd.read_parquet('https://github.com/theedgepredictor/elo-rating/raw/main/data/elo/basketball/mens-college-basketball/2023.parquet')

simple = df.loc[df.id == 401522202].to_dict(orient='records')[0]

In [4]:
simple

{'str_event_id': '20230404_uconn_sandiegostate',
 'season': 2023,
 'date': Timestamp('2023-04-04 00:00:00'),
 'neutral_site': 1,
 'home_team_id': 41,
 'home_team_score': 76,
 'away_team_id': 21,
 'away_team_score': 59,
 'home_elo_pre': 2332.574918858023,
 'away_elo_pre': 2276.7945212855925,
 'home_elo_prob': 0.5400512295532702,
 'away_elo_prob': 0.4599487704467299,
 'home_elo_post': 2371.4713973814605,
 'away_elo_post': 2237.898042762155,
 'id': 401522202,
 'home_team_name': 'uconn',
 'away_team_name': 'sandiegostate',
 'is_postseason': 1,
 'tournament_id': 22,
 'is_finished': 1,
 'datetime': Timestamp('2023-04-04 01:20:00+0000', tz='UTC')}

## Simple

Game: 20230404_uconn_sandiegostate
- Info: Last year's final
- Game_id: 401522202
- Home_id: 41
- Away_id: 21



In [51]:
sport_str = 'basketball/mens-college-basketball'
base_api = ESPNBaseAPI()
res = base_api.api_request(f"{base_api._base_url}/{sport_str}/scoreboard?dates=20230403&limit=10")
res['events'][0]['competitions'][0]['competitors'][0]['team'].keys()

dict_keys(['id', 'uid', 'location', 'name', 'abbreviation', 'displayName', 'shortDisplayName', 'color', 'alternateColor', 'isActive', 'venue', 'links', 'logo', 'conferenceId'])

## Access Pattern

### 1. Scoreboard

Get overall event info. The main fields are:
- id
- datetime
- status
- home_id
- away_id
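A small helper can flatten one scoreboard event into the fields listed above. The `competitions[0].competitors[...].team.id` path matches the call shown earlier in this notebook. The `date`, `status.type.name`, and `homeAway` keys are assumptions about the payload and should be checked against a real response. The sample event below is hand-built from the known game IDs and is not a captured API response.

```python
def parse_scoreboard_event(event: dict) -> dict:
    """Flatten one scoreboard event into id, datetime, status, home_id, away_id."""
    competition = event["competitions"][0]
    # Assumed: each competitor carries a 'homeAway' flag of 'home' or 'away'.
    sides = {c.get("homeAway"): c for c in competition["competitors"]}
    return {
        "id": int(event["id"]),
        "datetime": event.get("date"),  # assumed key for the tip-off time
        "status": event.get("status", {}).get("type", {}).get("name"),  # assumed path
        "home_id": int(sides["home"]["team"]["id"]),
        "away_id": int(sides["away"]["team"]["id"]),
    }


# Hand-built sample shaped like the assumed payload (not real API output).
sample_event = {
    "id": "401522202",
    "date": "2023-04-04T01:20Z",
    "status": {"type": {"name": "STATUS_FINAL"}},
    "competitions": [
        {
            "competitors": [
                {"homeAway": "home", "team": {"id": "41"}},
                {"homeAway": "away", "team": {"id": "21"}},
            ]
        }
    ],
}
parsed = parse_scoreboard_event(sample_event)
print(parsed)
```

With a live response, the same function would be applied to each item in `res['events']`.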

### 2. Team Stats
base: https://site.api.espn.com/apis/site/v2/sports/{SPORT}/seasons/2023/teams/41

Get team-based stats for each team.




In [59]:
res = base_api.api_request(f"{base_api._base_url}/{sport_str}/seasons/2023/teams/41/events")
res

{}

In [60]:
f"{base_api._base_url}/{sport_str}/seasons/2023/teams/41/events"

'https://site.api.espn.com/apis/site/v2/sports/basketball/mens-college-basketball/seasons/2023/teams/41/events'