# Scraping BestFightOdds.com

This is a Python script that scrapes event and odds information about UFC (Ultimate Fighting Championship) fights for data analysis or predictive model development. The odds come from "bestfightodds.com". The script uses Pandas for data manipulation, BeautifulSoup for webpage parsing, and Selenium for automating browser tasks.


PROBLEM with BestFightOdds: many fights do not appear on the card they are listed under (I think), i.e., the site's organization is unreliable.


Here's a high-level flow of what this script does:

1. It imports the necessary libraries and sets the working directory.

2. It then loads event data from an existing CSV file in the directory ("Final_Hand_Done_BFO_Urls_with UFC_Stats_Urls.csv").

3. It fetches event URLs from the newly loaded file and prepares a list of all unique URLs.

4. It checks these URLs against the completed UFC events listed on the UFC stats website. If there are new events not covered in the initial CSV, they are identified, and their URLs from bestfightodds.com are fetched by searching via 'Google search'.

5. If there are any issues with fetching these URLs, errors are flagged and depending on the issue, the script may require manual intervention.

6. For each of the events, it downloads their odds information by sending a GET request to the URL while providing user-agent headers to avoid the request being blocked. This data is processed and added to a Pandas DataFrame for further processing.

7. Once all the data for the URLs are downloaded, the script then attempts to download all the odds change data by simulating browser interaction with each of the URLs using Selenium WebDriver. 

8. Finally, the process of data collection is summarized, and all data is saved into CSV files for further analysis.

9. Outputs: (in data/final/...)
- 'events/All_Events_V1.csv'
- 'events/odds_per_event/' + event_name + '.csv'
- 'odds/All_Odds_by_Fighter_V1.csv'
- 'odds/All_Odds_by_Fighter_V2.csv'
- 'odds/odds_changes/' + name + '.csv'
- 'aggregates/All_Odds_Changes_V1.csv'
- 'odds/All_Odds_by_Fighter_WithChange.csv'




In [1]:
# Load Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.ticker as mtick
import seaborn as sns
from matplotlib.pyplot import figure
from bs4 import BeautifulSoup
import time
import requests     
import shutil      
import datetime
from scipy.stats import norm
from random import randint
import random
import os
os.chdir('/Users/travisroyce/Library/CloudStorage/OneDrive-Personal/Data Science/Personal_Projects/Sports/UFC_Prediction_V2')
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

## Excel-fixed data:

Initially, I matched up events in excel, so we will load that file. From here on out, I have a function that will scrape the data for me.

In [2]:
# get the current path
path = os.getcwd()
path

'/Users/travisroyce/Library/CloudStorage/OneDrive-Personal/Data Science/Personal_Projects/Sports/UFC_Prediction_V2'

In [3]:
# load excel-fixed data, which we will update with the scraped data
fight_odds_urls = pd.read_excel('data/final/aggregates/ufcstats_event_urls_v2.xlsx')

print(fight_odds_urls.shape)
fight_odds_urls.head(3)

(668, 7)


Unnamed: 0,event_title,event_url,date,location,event_id,BestFightOdds_Url,BFO_Second_Url
0,UFC Fight Night: Almeida vs. Lewis,http://ufcstats.com/event-details/7c4ec656d8fc...,2023-11-04,"Sao Paulo, Sao Paulo, Brazil",7c4ec656d8fcb867,https://www.bestfightodds.com/events/ufc-2988,
1,UFC 294: Makhachev vs. Volkanovski 2,http://ufcstats.com/event-details/13a0fb8fbdaf...,2023-10-21,"Abu Dhabi, Abu Dhabi, United Arab Emirates",13a0fb8fbdafb54f,https://www.bestfightodds.com/events/ufc-294-2908,
2,UFC Fight Night: Yusuff vs. Barboza,http://ufcstats.com/event-details/f3a078277b3b...,2023-10-14,"Las Vegas, Nevada, USA",f3a078277b3b8ff4,https://www.bestfightodds.com/events/ufc-fight...,https://www.bestfightodds.com/events/ufc-fight...


In [4]:
# get all event urls from excel-fixed data
event_urls = fight_odds_urls['event_url'].unique()
print(f' {len(event_urls)} events in excel-fixed data')

# split by /, keep the last segment (the ufcstats event id)
event_urls = [url.split('/')[-1] for url in event_urls]

 668 events in excel-fixed data
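
The last path segment of a URL is what ties events to IDs and file names throughout this notebook. Below is a minimal sketch of that step using only the standard library. `last_segment` is a hypothetical helper name (not part of the notebook), and the sample URL is one of the BestFightOdds links shown in the table above. Unlike a plain `split('/')[-1]`, it tolerates a trailing slash or a query string.

```python
from urllib.parse import urlparse


def last_segment(url: str) -> str:
    """Return the final non-empty path segment of a URL (hypothetical helper)."""
    # urlparse separates the path from any query string or fragment
    path = urlparse(url).path
    # drop a trailing slash so the last piece is never empty
    return path.rstrip('/').split('/')[-1]


print(last_segment('https://www.bestfightodds.com/events/ufc-294-2908'))
```

The same helper could stand in for `url.split('/')[4]` in the odds functions further down, provided the URLs keep the `/events/<name>` layout.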


In [5]:
# get list of all values in BestFightOdds_Url and BFO_Second_Url
bfo_urls = fight_odds_urls['BestFightOdds_Url'].tolist()
bfo_urls.extend(fight_odds_urls['BFO_Second_Url'].tolist())

# remove duplicates
bfo_urls = list(set(bfo_urls))
# to df
bfo_urls = pd.DataFrame(bfo_urls, columns=['BestFightOdds_Url'])
# add column for short urls
bfo_urls['short_url'] = bfo_urls['BestFightOdds_Url'].str.split('/').str[-1]
bfo_urls

Unnamed: 0,BestFightOdds_Url,short_url
0,https://www.bestfightodds.com/events/ufc-fight...,ufc-fight-night-77-belfort-vs-henderson-3-997
1,https://www.bestfightodds.com/events/ufc-fight...,ufc-fight-night-69-jedrzejczyk-vs-penne-964
2,https://www.bestfightodds.com/events/ufc-77-ho...,ufc-77-hostile-territory-10
3,https://www.bestfightodds.com/events/ufc-183-s...,ufc-183-silva-vs-diaz-856
4,https://www.bestfightodds.com/events/ufc-119-m...,ufc-119-mir-vs-cro-cop-296
...,...,...
533,https://www.bestfightodds.com/events/ufc-fight...,ufc-fight-night-83-cowboy-vs-cowboy-1057
534,https://www.bestfightodds.com/events/ufc-251-u...,ufc-251-usman-vs-masvidal-1890
535,https://www.bestfightodds.com/events/ufc-fight...,ufc-fight-night-41-munoz-vs-mousasi-799
536,https://www.bestfightodds.com/events/ufc-261-u...,ufc-261-usman-vs-masvidal-2-2087


In [6]:
# check downloaded files
downloaded = os.listdir('data/final/events/odds_per_event')
len(downloaded)
list_down = pd.DataFrame(downloaded, columns=['short_url'])
# drop .csv
list_down['short_url'] = list_down['short_url'].str.replace('.csv', '', regex=False)
list_down_list = list_down['short_url'].tolist()


In [7]:
# check for short_urls not in downloaded
bfo_urls['short_url'].isin(list_down_list).value_counts()

True     518
False     20
Name: short_url, dtype: int64
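
The counts above say how many short URLs have no downloaded file, but not which ones. A minimal sketch that lists them with a set difference follows. `missing_downloads` is a hypothetical helper, and the sample names are only illustrative. Non-string entries (such as the NaN that comes from an empty `BFO_Second_Url`) are skipped.

```python
from pathlib import Path


def missing_downloads(short_urls, downloaded_files):
    """Return the short URLs that have no '<short_url>.csv' file, sorted."""
    # file stems of the CSVs that are already on disk
    done = {Path(name).stem for name in downloaded_files if name.endswith('.csv')}
    # ignore NaN / None entries, keep only real short URLs
    wanted = {u for u in short_urls if isinstance(u, str)}
    return sorted(wanted - done)


wanted = ['ufc-100-137', 'ufc-294-2908', 'ufc-2988', float('nan')]
on_disk = ['ufc-100-137.csv', '.DS_Store']
print(missing_downloads(wanted, on_disk))
```

In the notebook this could be called as `missing_downloads(bfo_urls['short_url'], downloaded)`.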

In [8]:

def get_last_events_ufcstats():
    """
    Retrieves information about completed UFC events from the UFC Stats website.
    Returns a DataFrame with columns: event_name, event_date, event_url, event_id.
    """
    url = 'http://www.ufcstats.com/statistics/events/completed'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    
    # Extract event links and names
    events = soup.find_all('a', class_='b-link b-link_style_black')
    event_links = [link.get('href') for link in events]
    event_names = [link.text.strip() for link in events]

    # Extract event dates
    event_dates = soup.find_all('span', class_='b-statistics__date')
    event_dates = [date.text.strip() for date in event_dates][1:]

    # Create DataFrame
    df = pd.DataFrame({'event_name': event_names, 'event_date': event_dates, 'event_url': event_links})
    df['event_date'] = pd.to_datetime(df['event_date'])
    df = df.sort_values(by='event_date', ascending=False).reset_index(drop=True)
    df['event_id'] = df['event_url'].apply(lambda x: x.split('/')[-1])
    
    return df 

In [9]:
last_events = get_last_events_ufcstats()
last_events.head()

Unnamed: 0,event_name,event_date,event_url,event_id
0,UFC Fight Night: Almeida vs. Lewis,2023-11-04,http://www.ufcstats.com/event-details/7c4ec656...,7c4ec656d8fcb867
1,UFC 294: Makhachev vs. Volkanovski 2,2023-10-21,http://www.ufcstats.com/event-details/13a0fb8f...,13a0fb8fbdafb54f
2,UFC Fight Night: Yusuff vs. Barboza,2023-10-14,http://www.ufcstats.com/event-details/f3a07827...,f3a078277b3b8ff4
3,UFC Fight Night: Dawson vs. Green,2023-10-07,http://www.ufcstats.com/event-details/c8a49ff2...,c8a49ff2acb6f3c5
4,UFC Fight Night: Fiziev vs. Gamrot,2023-09-23,http://www.ufcstats.com/event-details/c945adc2...,c945adc22c2bfe8f


In [10]:
last_events_urls = last_events['event_id'].unique()

# get events  in last_events that are not in fight_odds_urls
new_events = [event for event in last_events_urls if event not in event_urls]
new_events

[]

### Get all UFC Events Stored at BFO

In [11]:
# use selenium to get the event urls
driver = webdriver.Chrome(path + '/chromedriver')

search_url= 'https://www.google.com/search?q=site%3Abestfightodds.com/events+UFC&sca_esv=576600514&ei=nno5Zbpwscv0A6y5pZAN&ved=0ahUKEwi6t_b5hJKCAxWxJX0KHaxcCdIQ4dUDCBA&uact=5&oq=site%3Abestfightodds.com%2Fevents+UFC'
driver.get(search_url)

# wait 5 seconds
time.sleep(5)

# click 'more results'

### Get ALL UfcStats Events

In [12]:
all_events_page = 'http://ufcstats.com/statistics/events/completed?page=all'
# scrape all_events_page
page = requests.get(all_events_page)
# find table
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find('table', {'class': 'b-statistics__table-events'})
# find all links
links = table.find_all('a')
# create list of links
event_links = []
for link in links:
    event_links.append(link.get('href'))

# create list of link texts
event_titles = []
for link in links:
    event_titles.append(link.text.strip())

# put together event titles and links in df
event_urls = pd.DataFrame({'event_title': event_titles, 'event_url': event_links})

# create list of dates
dates = []
# dates are in spans with the b-statistics__date class
date_tags = table.find_all('span', {'class': 'b-statistics__date'})
for date in date_tags:
    dates.append(date.text.strip())

# add dates to df
event_urls['date'] = dates

# get locations
locations = []
# locations are in td cells with the b-statistics__table-col_style_big-top-padding class
location_tags = table.find_all('td', {'class': 'b-statistics__table-col b-statistics__table-col_style_big-top-padding'})
for location in location_tags:
    locations.append(location.text.strip())

# add locations to df
event_urls['location'] = locations

# add event_id column
event_urls['event_id'] = event_urls['event_url'].apply(lambda x: x.split('/')[-1])



event_urls


Unnamed: 0,event_title,event_url,date,location,event_id
0,UFC 295: Prochazka vs. Pereira,http://ufcstats.com/event-details/5a558ba1ff5e...,"November 11, 2023","New York City, New York, USA",5a558ba1ff5e9121
1,UFC Fight Night: Almeida vs. Lewis,http://ufcstats.com/event-details/7c4ec656d8fc...,"November 04, 2023","Sao Paulo, Sao Paulo, Brazil",7c4ec656d8fcb867
2,UFC 294: Makhachev vs. Volkanovski 2,http://ufcstats.com/event-details/13a0fb8fbdaf...,"October 21, 2023","Abu Dhabi, Abu Dhabi, United Arab Emirates",13a0fb8fbdafb54f
3,UFC Fight Night: Yusuff vs. Barboza,http://ufcstats.com/event-details/f3a078277b3b...,"October 14, 2023","Las Vegas, Nevada, USA",f3a078277b3b8ff4
4,UFC Fight Night: Dawson vs. Green,http://ufcstats.com/event-details/c8a49ff2acb6...,"October 07, 2023","Las Vegas, Nevada, USA",c8a49ff2acb6f3c5
...,...,...,...,...,...
664,UFC 6: Clash of the Titans,http://ufcstats.com/event-details/1c3f5e85b59e...,"July 14, 1995","Casper, Wyoming, USA",1c3f5e85b59ec710
665,UFC 5: The Return of the Beast,http://ufcstats.com/event-details/dedc3bb440d0...,"April 07, 1995","Charlotte, North Carolina, USA",dedc3bb440d09554
666,UFC 4: Revenge of the Warriors,http://ufcstats.com/event-details/b60391da771d...,"December 16, 1994","Tulsa, Oklahoma, USA",b60391da771deefe
667,UFC 3: The American Dream,http://ufcstats.com/event-details/1a49e0670dfa...,"September 09, 1994","Charlotte, North Carolina, USA",1a49e0670dfaca31
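
The dates in this table are still strings such as "November 04, 2023", while the excel-fixed data uses ISO dates. A minimal sketch of the conversion with the standard library follows. `parse_event_date` is a hypothetical helper, and it assumes English month names (the default C locale).

```python
from datetime import datetime


def parse_event_date(text: str) -> str:
    """Convert 'November 04, 2023' to ISO 'YYYY-MM-DD' (hypothetical helper)."""
    # %B = full month name, %d = day of month, %Y = four-digit year
    return datetime.strptime(text.strip(), '%B %d, %Y').date().isoformat()


print(parse_event_date('November 04, 2023'))
```

On the DataFrame itself, `pd.to_datetime(event_urls['date'])` does the same job, as `get_last_events_ufcstats` already does for its own date column.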


In [13]:
# event_urls.to_csv('data/final/aggregates/ufcstats_event_urls.csv', index=False)

In [14]:
all_ufcstats_event_ids = event_urls['event_id'].unique()

Identify Events to Download

In [15]:
# copy the filtered rows so the assignment below does not modify a view of last_events
missing_events = last_events[last_events['event_id'].isin(new_events)].copy()
# strip event names
missing_events['event_name'] = missing_events['event_name'].str.strip()
missing_events

Unnamed: 0,event_name,event_date,event_url,event_id


Find BestFightOdds url for event

In [16]:
driver = webdriver.Chrome(path + '/chromedriver')

## NOTE: FIGHT ODDS NOT SCRAPING? (Sep21)

- This approach searches by event, but from experience BestFightOdds handles event-based search poorly and works better with a fighter-first search.

In [18]:
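# find the rows where event_odds_url is empty in the fight_odds_urls dataframe
# NOTE: the frame loaded in In [3] shows 'BestFightOdds_Url' rather than
# 'event_odds_url', so a KeyError here also prints the message below
try:
    missing = fight_odds_urls[fight_odds_urls['event_odds_url'].isnull()]
    missing
except KeyError:
    print('no missing events')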


no missing events


## MUST Scrape Individually from BFO

In [None]:
# # rename bfo_url to event_odds_url
# missing_events = missing_events.rename(columns={'bfo_url': 'event_odds_url'})
# # rename event_name to event_title
# missing_events = missing_events.rename(columns={'event_name': 'event_title'})
# # drop event_id
# missing_events = missing_events.drop(columns=['event_id'])

In [None]:
# # update fight_odds_urls with missing_events
# fight_odds_urls = pd.concat([fight_odds_urls, missing_events], axis=0)
# fight_odds_urls = fight_odds_urls.reset_index(drop=True)
# fight_odds_urls.head(3)

#### Scraping BFO via Selenium (formerly Requests & Headers)

In [None]:
# Must now scrape BFO via Selenium, then grab the tables. 

url = 'https://www.bestfightodds.com/events/ufc-282-2621'
# driver already open
driver.get(url)
# wait 4 seconds
time.sleep(4)
# get the tables
tables = pd.read_html(driver.page_source)
# print table 1
tables[0]





# Following is old, when we used fake headers
# header = {
#   "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
#   "X-Requested-With": "XMLHttpRequest"
# }
# r = requests.get(url, headers = header)
# print
#print(r)
# dfs = pd.read_html(r.text)
# dfs[0]

## Scraping BFO Data

In [None]:
# fight_odds_urls = pd.read_csv('data/final/events/Final_Hand_Done_BFO_Urls_with UFC_Stats_Urls.csv')

In [None]:
all_event_odds_urls = fight_odds_urls.event_odds_url.unique()
len(all_event_odds_urls)

In [None]:
len('https://www.bestfightodds.com/events/')

In [None]:
def get_odds(url):
    driver.get(url)
    # wait 4 seconds
    time.sleep(4)

    # get the tables
    tables = pd.read_html(driver.page_source)
    data = tables[1]
    data.rename(columns={'Unnamed: 0': 'fighter'}, inplace=True)
    data['event_odds_url'] = url
    event_name = url.split('/')[4]
    data['event_id'] = event_name
    data.to_csv('data/final/events/odds_per_event/' + event_name + '.csv', index=False)
    return data



In [None]:
get_odds(all_event_odds_urls[1]).head(2)

In [None]:
# drop nan
all_event_odds_urls = all_event_odds_urls[~pd.isnull(all_event_odds_urls)]
all_event_odds_urls

In [None]:
# Counter for the number of URLs attempted (already-downloaded events are not counted)
n = 0

# URLs that raised an error, and events that were already on disk
errors = []
complete = []

# Event identifiers already saved in 'data/final/events/odds_per_event/' (file name minus '.csv')
files = os.listdir('data/final/events/odds_per_event/')
file_list = [os.path.splitext(f)[0] for f in files]

# Loop through each URL in the list of URLs pointing to event odds pages
for url in all_event_odds_urls:
    # The last path segment of the URL doubles as the CSV file name
    url_file_name = url.split('/')[-1]

    # Skip events whose CSV already exists
    if url_file_name in file_list:
        print(f'{url_file_name} already exists')
        complete.append(url_file_name)
        continue

    try:
        # Download and save the odds table for this event
        get_odds(url)
        print(f'{n} / {len(all_event_odds_urls)}')
    except Exception:
        # Record the problematic URL and keep going
        print(f'ERROR! {n} / {len(all_event_odds_urls)}')
        errors.append(url)

    # Count the attempt whether it succeeded or failed
    n += 1

# Print the total number of errors encountered during the execution
print(f' total errors: {len(errors)}')
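
URLs that land in `errors` are only retried by rerunning the cell. A minimal sketch of retrying each download with a growing, jittered pause is below. `fetch_with_retry` is a hypothetical helper, `fetch` stands in for `get_odds`, and the default of three attempts with a one-second base delay is an arbitrary assumption rather than anything BestFightOdds documents.

```python
import random
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on failure wait a jittered, growing delay and retry.

    Hypothetical helper: re-raises the last error once all attempts are used.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts:
                raise
            # delay grows with each attempt, plus up to one second of jitter
            sleep(base_delay * attempt + random.uniform(0, 1))
```

Inside the loop, `get_odds(url)` could then become `fetch_with_retry(get_odds, url)`, so a URL only reaches `errors` after every attempt has failed.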


### Append Together

In [None]:
# Get list of all files in 'data/final/events/odds_per_event/' directory
files = os.listdir('data/final/events/odds_per_event/')

# Create an empty list to store all the event odds data frames
all_event_odds = []

# Loop through each file in the directory
for file in files:
    # Read the csv file and store it as a pandas data frame
    df = pd.read_csv('data/final/events/odds_per_event/' + file)
    # Append the data frame to the list of all event odds data frames
    all_event_odds.append(df)

# Concatenate all the data frames in the list into a single data frame
all_event_odds = pd.concat(all_event_odds)

# Print the shape (dimensions) of the final data frame and display the first few rows
print(all_event_odds.shape)
all_event_odds.head()

Add Event Names & URLs to Event_Odds df

In [None]:
def get_event_title(event_odds_url):
    # first, filter the fight_odds_urls dataframe to only include rows with a matching event_odds_url
    data = fight_odds_urls[fight_odds_urls['event_odds_url'] == event_odds_url]
    # next, extract the event_title value from the filtered dataframe
    event_name = data.event_title.values[0]
    # finally, return the event_title value
    return event_name

def get_ufcstats_url(event_odds_url):
    # similar to the previous function, filter the fight_odds_urls dataframe and extract the event_url value
    data = fight_odds_urls[fight_odds_urls['event_odds_url'] == event_odds_url]
    event_url = data.event_url.values[0]
    # return the event_url value
    return event_url

In [None]:
all_event_odds['event_name'] = all_event_odds.apply(lambda row: get_event_title(row['event_odds_url']), axis=1)

In [None]:
all_event_odds['event_ufcstats_url'] = all_event_odds.apply(lambda row: get_ufcstats_url(row['event_odds_url']), axis=1)

In [None]:
all_event_odds.head(3)

In [None]:
all_event_odds.to_csv('data/final/odds/All_Odds_by_Fighter_V1.csv', index=False)

In [None]:
all_event_odds['5D'] = all_event_odds['5D'].str[:-1]
all_event_odds['5D'] = all_event_odds['5D'].astype(float)
all_event_odds.head(3)

In [None]:
all_event_odds['Ref'] = all_event_odds['Ref'].str[:-1]
all_event_odds['Ref'] = all_event_odds['Ref'].astype(float)
all_event_odds.head(3)
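
The two cells above drop the last character of every '5D' and 'Ref' value before casting to float, which also removes a digit from any value that has no trailing marker. The notebook does not show what that last character is, so the sketch below assumes each cell is a signed number optionally followed by a one-character marker (the sample values are made up). `parse_odds` is a hypothetical helper that keeps only the leading number.

```python
import re

# optional sign, digits, optional decimal part
_ODDS = re.compile(r'[+-]?\d+(?:\.\d+)?')


def parse_odds(cell):
    """Return the leading signed number in a cell as float, or None if absent."""
    if not isinstance(cell, str):
        # NaN and other non-string cells carry no odds
        return None
    match = _ODDS.match(cell.strip())
    return float(match.group()) if match else None


print(parse_odds('-150▼'), parse_odds('+130'))
```

It could be applied with `all_event_odds['5D'].map(parse_odds)`; pandas stores the `None` results as NaN in a float column.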

In [None]:
# Choose columns to keep
cols = ['fighter', '5D', 'Ref', 'event_odds_url', 'event_ufcstats_url', 'event_id', 'event_name']
all_event_odds = all_event_odds[cols]
all_event_odds.head()

In [None]:
# get shape
all_event_odds.shape

In [None]:
all_event_odds.to_csv('data/final/odds/All_Odds_by_Fighter_V2.csv', index=False)


In [None]:
all_event_odds.to_csv('data/a_minimal/All_Fight_Odds.csv', index=False)

## Summary
- ALL ODDS by fighter are available at:
  - data/final/odds/All_Odds_by_Fighter_V2.csv (also saved to data/a_minimal/All_Fight_Odds.csv)
- You can easily combine this with the UFC Stats data by joining on 'event_ufcstats_url'

# Add All Events Df

Saved in final/events

In [None]:
all_event_odds = pd.read_csv('data/final/odds/All_Odds_by_Fighter_V2.csv')

In [None]:
# events are located in data/ufc_stats/events2

folder = 'data/ufc_stats/events2/'

# aggregate all events into one dataframe
all_events = []

for file in os.listdir(folder):
    df = pd.read_csv(folder + file)
    all_events.append(df)

all_events = pd.concat(all_events)

print(all_events.shape)
all_events.head()

In [None]:
all_events.to_csv('data/final/events/All_Events_V1.csv', index=False)

In [None]:
all_events = pd.read_csv('data/final/events/All_Events_V1.csv')

# Add Odds Changes to Odds

In [None]:
# create a function to extract odds data from a given URL
def get_odds(url):
    
    # add headers to avoid getting blocked while making requests
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    
    # make a GET request to the URL with the provided headers
    r = requests.get(url, headers=headers)
    
    # scrape the HTML tables present in the response content
    dfs = pd.read_html(r.text)
    
    # extract the relevant data from the second table
    data = dfs[1]
    
    # rename the first column as "fighter" for better readability
    data.rename(columns={'Unnamed: 0': 'fighter'}, inplace=True)
    
    # add the event odds URL as a new column in the dataframe
    data['event_odds_url'] = url
    
    # extract the event name from the URL and add it as a new column in the dataframe
    event_name = url.split('/')[4]
    data['event_id'] = event_name
    
    # uncomment the following line to save the extracted data as a CSV file
    # data.to_csv('data/ufc_BestFightOdds/odds_by_event/' + event_name + '.csv', index=False)
    
    # return the dataframe with the extracted odds data
    return data

In [None]:
def get_bfodds(event_url):
    # scrape the changes in odds by event url

    driver = webdriver.Chrome()
    try:
        driver.get(event_url)

        # click show all
        driver.find_element(By.CLASS_NAME, 'event-swing-expand').click()
        # wait 2 seconds
        time.sleep(2)

        # the first half of the text is the odds changes, the second half are the names. Let's match them up.
        # first split the text by new line
        lines = driver.find_element(By.ID, 'event-swing-container').text.split('\n')

        # get the names. You can identify them because they will be more than 5 characters long
        names = [x for x in lines if len(x) > 5]

        # get the odds changes
        odds_changes = [x for x in lines if '%' in x]

        # get the event name: it is the h1 inside the table header
        event_name = driver.find_element(By.CLASS_NAME, 'table-header')
        event = event_name.find_element(By.TAG_NAME, 'h1').text

        # get the url
        url = driver.current_url

        # create a dataframe
        df = pd.DataFrame({'names': names, 'odds_changes': odds_changes, 'event': event, 'event_url': url})
    finally:
        # close the browser, even if scraping failed
        driver.quit()

    return df
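
The name filter in `get_bfodds` keeps every line longer than 5 characters, so a change such as "+12.5%" (6 characters) would also be counted as a name and the two lists would no longer line up. A minimal sketch of a stricter split is below. `split_swing_text` is a hypothetical helper, and it assumes that every non-empty line without a '%' is a fighter name, which the notebook does not confirm. The sample text is made up.

```python
def split_swing_text(text):
    """Split the swing container text into (names, changes) lists.

    Assumption: lines containing '%' are odds changes and every other
    non-empty line is a fighter name.
    """
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    changes = [line for line in lines if '%' in line]
    names = [line for line in lines if '%' not in line]
    if len(names) != len(changes):
        # fail loudly instead of building a misaligned table
        raise ValueError(f'{len(names)} names but {len(changes)} changes')
    return names, changes


sample = '+12.5%\n-8%\nFighter A\nFighter B'
print(split_swing_text(sample))
```

Raising on a length mismatch lets the download loop below record the event as failed instead of pairing the wrong values.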

In [None]:
all_odds_eventlist = all_event_odds['event_odds_url'].unique().tolist()

In [None]:
# Initialize lists to keep track of completed, failed, and newly completed events
already_complete = []
failed = []
newly_complete = []

# Files already saved, listed once instead of on every iteration
done_files = set(os.listdir('data/final/odds/odds_changes/'))

# Iterate through all events in the all_odds_eventlist
for url in all_odds_eventlist:
    # Extract the event name from the URL
    event_name = url.split('/')[-1]

    # Check if the event has already been completed
    if event_name + '.csv' in done_files:
        print('already done with ' + event_name)
        already_complete.append(event_name)
        continue

    try:
        # Get the odds changes for the event
        df = get_bfodds(url)

        # Save the odds changes to a CSV file
        df.to_csv('data/final/odds/odds_changes/' + event_name + '.csv', index=False)
        newly_complete.append(event_name)
        print('done with ' + event_name)
    except Exception:
        # If there's an error, add the event to the failed list
        failed.append(event_name)
        print('failed on ' + event_name)

# Print the summary of completed, newly completed, and failed events
print("")
print('already done with ' + str(len(already_complete)) + ' events')
print('newly done with ' + str(len(newly_complete)) + ' events')
print('failed on ' + str(len(failed)) + ' events')

In [None]:
# aggregate all odds changes into one dataframe
all_odds_changes = []

for file in os.listdir('data/final/odds/odds_changes/'):
    df = pd.read_csv('data/final/odds/odds_changes/' + file)
    # append
    all_odds_changes.append(df)

all_odds_changes = pd.concat(all_odds_changes)

print(all_odds_changes.shape)
all_odds_changes.to_csv('data/final/aggregates/All_Odds_Changes_V1.csv', index=False)
all_odds_changes

### Edit Event_odds 

Get rid of alternate lines

In [None]:
all_event_odds = pd.read_csv('data/final/odds/All_Odds_by_Fighter_V2.csv')
all_event_odds.head(3)

In [None]:
all_options = all_event_odds['fighter'].value_counts()
all_options = pd.DataFrame(all_options)
all_options.reset_index(inplace=True)
delete_options = all_options.head(120)
delete_options

In [None]:
delete_options_list = delete_options['index'].tolist()

In [None]:
# delete rows when fighter is in delete_options_list
all_event_odds = all_event_odds[~all_event_odds['fighter'].isin(delete_options_list)]
all_event_odds.head(2)

In [None]:
# delete any rows where 'fighter' is a prop / alternate line rather than a name,
# identified by these keywords (case-sensitive)
prop_keywords = ['round', 'decision', 'submission', 'knockout', 'draw',
                 'scorecards', 'majority', 'Not', 'wins', 'losses']
all_event_odds = all_event_odds[~all_event_odds['fighter'].str.contains('|'.join(prop_keywords))]
all_event_odds


# Add Odds Change to Odds

In [None]:
def get_odds_change_from_db(event_odds_url, fighter):
    """
    Get the odds change from the all_odds_changes dataframe.
    """

    try:
        odds_change = all_odds_changes[(all_odds_changes['event_url'] == event_odds_url)
                                       & (all_odds_changes['names'] == fighter)]['odds_changes'].values[0]
        
        return odds_change
    
    except (IndexError, KeyError):
        return np.nan
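
`get_odds_change_from_db` filters the whole `all_odds_changes` frame once per row, which gets slow as the table grows. A minimal sketch of the same lookup through a dictionary that is built once is below. `build_change_index` is a hypothetical helper, and the '+5%' value is a made-up placeholder, not real data. The first match wins, mirroring `.values[0]`.

```python
def build_change_index(rows):
    """Map (event_url, fighter_name) -> odds change string; first match wins."""
    index = {}
    for row in rows:
        # setdefault keeps the first value seen for a given key
        index.setdefault((row['event_url'], row['names']), row['odds_changes'])
    return index


rows = [
    {'event_url': 'https://www.bestfightodds.com/events/ufc-100-137',
     'names': 'Brock Lesnar', 'odds_changes': '+5%'},  # placeholder value
]
index = build_change_index(rows)
print(index.get(('https://www.bestfightodds.com/events/ufc-100-137', 'Brock Lesnar')))
```

With the real data, the rows could come from `all_odds_changes.to_dict('records')`, and a missing key returns `None` from `index.get(...)` where the function above returns `np.nan`.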

In [None]:
# test
get_odds_change_from_db('https://www.bestfightodds.com/events/ufc-100-137', 'Brock Lesnar')

In [None]:
# Create odds_change column
all_event_odds["odds_change"] = all_event_odds.apply(
    lambda row: get_odds_change_from_db(row["event_odds_url"], row["fighter"]),
    axis=1
)

In [None]:
# get rid of %
all_event_odds['odds_change'] = all_event_odds['odds_change'].str.replace('%', '')
# turn into float
all_event_odds['odds_change'] = all_event_odds['odds_change'].astype(float)
all_event_odds.head(3)

In [None]:
# save to csv
all_event_odds.to_csv('data/final/odds/All_Odds_by_Fighter_WithChange.csv', index=False)

In [None]:
print(f' This was last run: {datetime.datetime.now()}')