# UFC Data Analysis

## Introduction:
### &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The UFC (Ultimate Fighting Championship) is a sports promotion company that organizes and hosts professional mixed martial arts (MMA) fights worldwide. Founded in 1993, the UFC is currently the largest MMA promotion company in the world by revenue and pay-per-view events. Its events are broadcast in over 165 countries, reaching over 1.1 billion households.

### &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; There have been over 500 UFC events held since the company's founding, and Kaggle user Rajeev Warrier painstakingly collected data from every single fight in those events, recording dates, fighters, referees, win type (knockout vs. decision), winning streaks, and many more statistics officially sourced from the UFC’s records. In this project we take Warrier's raw dataset and apply data cleaning and visualization techniques using Python libraries such as ____________. We will also perform statistical analysis and apply a machine learning algorithm based on gradient descent to predict winners given certain prerequisite statistics.

## Importing Data:

In [1]:
import pandas as pd
raw_data = pd.read_csv('raw_total_fight_data.csv',sep = ";")

## Cleaning Data: Separating the attempts landed and the total attempts.

In [2]:
## Significant strikes

red_significant_strikes = raw_data['R_SIG_STR.']
blue_significant_strikes = raw_data['B_SIG_STR.']

red_landed = []
red_attempted = []
blue_landed = []
blue_attempted = []

for i in red_significant_strikes:
    split = i.split()
    red_landed.append(int(split[0]))
    red_attempted.append(int(split[2]))

for i in blue_significant_strikes:
    split = i.split()
    blue_landed.append(int(split[0]))
    blue_attempted.append(int(split[2]))

raw_data['R_SIG_STR_landed'] = red_landed
raw_data['R_SIG_STR_att'] = red_attempted
raw_data['B_SIG_STR_landed'] = blue_landed
raw_data['B_SIG_STR_att'] = blue_attempted

## Total strikes

red_total_strikes = raw_data['R_TOTAL_STR.']
blue_total_strikes = raw_data['B_TOTAL_STR.']

red_landed = []
red_attempted = []
blue_landed = []
blue_attempted = []

for i in red_total_strikes:
    split = i.split()
    red_landed.append(int(split[0]))
    red_attempted.append(int(split[2]))

for i in blue_total_strikes:
    split = i.split()
    blue_landed.append(int(split[0]))
    blue_attempted.append(int(split[2]))

raw_data['R_TOTAL_STR_landed'] = red_landed
raw_data['R_TOTAL_STR_att'] = red_attempted
raw_data['B_TOTAL_STR_landed'] = blue_landed
raw_data['B_TOTAL_STR_att'] = blue_attempted

raw_data.drop(['R_SIG_STR.', 'B_SIG_STR.', 'R_SIG_STR_pct', 'B_SIG_STR_pct', 'R_TOTAL_STR.', 'B_TOTAL_STR.',
              'R_TD_pct', 'B_TD_pct'], 
              axis=1, inplace=True)


## Takedowns


red_takedowns = raw_data['R_TD']
blue_takedowns = raw_data['B_TD']

red_landed = []
red_attempted = []
blue_landed = []
blue_attempted = []

for i in red_takedowns:
    split = i.split()
    red_landed.append(int(split[0]))
    red_attempted.append(int(split[2]))

for i in blue_takedowns:
    split = i.split()
    blue_landed.append(int(split[0]))
    blue_attempted.append(int(split[2]))

raw_data['R_TD_landed'] = red_landed
raw_data['R_TD_att'] = red_attempted
raw_data['B_TD_landed'] = blue_landed
raw_data['B_TD_att'] = blue_attempted

raw_data.drop(['R_TD', 'B_TD'],axis=1, inplace=True)


## The rest of the needed columns

# The 12 remaining "landed/attempted" columns sit at positions 10-21
col_names = list(raw_data.columns)[10:22]

# Dropping rows with missing data
raw_data.dropna(how="any", axis = 0, inplace = True)

# Cleaning the rest of the columns with a loop
for col_name in col_names:
    attempted = []
    landed = []
    # Tokens 0 and 2 of each cell are the landed and attempted counts
    for value in raw_data[col_name]:
        split = value.split()
        landed.append(int(split[0]))
        attempted.append(int(split[2]))
    raw_data[col_name + "_landed"] = landed
    raw_data[col_name + "_att"] = attempted

raw_data.drop(col_names,axis=1, inplace=True)
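
The cleaning passes in this cell all repeat the same parse. As a minimal sketch, assuming every cell follows a "landed of attempted" pattern such as `"23 of 50"`, the split could be written once with pandas string methods. The helper name `split_landed_attempted` and the sample frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def split_landed_attempted(df, col_names):
    """Return a copy of df where each "X of Y" column in col_names is
    replaced by integer columns <name>_landed and <name>_att.

    Hypothetical helper: a trailing "." in a column name (as in
    "R_SIG_STR.") is stripped so the new names match the notebook's.
    """
    out = df.copy()
    for name in col_names:
        # Capture the integers on either side of the literal "of"
        parts = out[name].str.extract(r"^\s*(\d+)\s+of\s+(\d+)\s*$")
        base = name.rstrip(".")
        out[base + "_landed"] = parts[0].astype(int)
        out[base + "_att"] = parts[1].astype(int)
    return out.drop(columns=col_names)

# Illustrative values only, not taken from the real dataset
sample = pd.DataFrame({"R_SIG_STR.": ["23 of 50", "7 of 9"],
                       "R_TD": ["1 of 4", "0 of 0"]})
cleaned = split_landed_attempted(sample, ["R_SIG_STR.", "R_TD"])
print(cleaned)
```

Unlike the loops, this version fails loudly on a cell that does not match the pattern (the `astype(int)` call raises on the resulting missing value), so rows with missing data would still need to be dropped first.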

## Cleaning Data: Adding the gender of fighters for each fight.

In [3]:
fight_type = raw_data['Fight_type']

gender = []

for s in fight_type:
    if 'Women' in s:
        gender.append('F')
    else:
        gender.append('M')

raw_data['Gender'] = gender
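
The same labelling can be done without an explicit loop. The sketch below assumes, as the loop does, that women's bouts are exactly those whose `Fight_type` contains the substring `Women`. The sample fight types are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative fight types only; the real column has many more variants
fights = pd.DataFrame({"Fight_type": ["UFC Bantamweight Title Bout",
                                      "Women's Flyweight Bout",
                                      "Lightweight Bout"]})

# "F" where the fight type mentions "Women", otherwise "M"
is_womens = fights["Fight_type"].str.contains("Women", regex=False)
fights["Gender"] = np.where(is_womens, "F", "M")
print(fights)
```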

## Cleaning Data: Cleaning the Winner column.
### Red fighter winning is represented as 1
### Blue fighter winning is represented as 0

In [4]:
# Cleaning the Winner column (red fighter = 1; blue fighter = 0)
new_winners = []

# dropna() left gaps in the index, so reset it before looking rows up by number
# (the old labels are kept in a new 'index' column)
raw_data = raw_data.reset_index()

red_fighters = raw_data['R_fighter']
blue_fighters = raw_data['B_fighter']
winners = raw_data['Winner']

for i in range(len(winners)):
    if red_fighters[i] == winners[i]:
        new_winners.append(1)
    elif blue_fighters[i] == winners[i]:
        new_winners.append(0)

raw_data['Winner'] = new_winners
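
One caveat with the loop: a row whose `Winner` matches neither fighter appends nothing, so `new_winners` would end up shorter than the DataFrame and the final assignment would fail. The notebook output suggests this does not happen after the earlier `dropna`, but a vectorized sketch can make such rows explicit. The fighter names and the `-1` marker below are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows; the last winner matches neither corner
fights = pd.DataFrame({
    "R_fighter": ["Red A", "Red B", "Red C"],
    "B_fighter": ["Blue A", "Blue B", "Blue C"],
    "Winner": ["Red A", "Blue B", "Nobody"],
})

red_won = fights["Winner"] == fights["R_fighter"]
blue_won = fights["Winner"] == fights["B_fighter"]

# 1 = red won, 0 = blue won, -1 = neither name matched
fights["Winner_encoded"] = np.select([red_won, blue_won], [1, 0], default=-1)

# Rows to inspect (or drop) before modelling
unmatched = fights[fights["Winner_encoded"] == -1]
print(fights)
```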

## Displaying the Dataframe:

In [5]:
raw_data

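| | index | R_fighter | B_fighter | R_KD | B_KD | R_SUB_ATT | B_SUB_ATT | R_PASS | B_PASS | R_REV | ... | B_DISTANCE_att | R_CLINCH_landed | R_CLINCH_att | B_CLINCH_landed | B_CLINCH_att | R_GROUND_landed | R_GROUND_att | B_GROUND_landed | B_GROUND_att | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Henry Cejudo | Marlon Moraes | 0 | 0 | 1 | 0 | 1 | 0 | 0 | ... | 116 | 19 | 23 | 2 | 2 | 26 | 30 | 1 | 1 | M |
| 1 | 1 | Valentina Shevchenko | Jessica Eye | 1 | 0 | 1 | 0 | 3 | 0 | 0 | ... | 12 | 2 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | F |
| 2 | 2 | Tony Ferguson | Donald Cerrone | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 184 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | M |
| 3 | 3 | Jimmie Rivera | Petr Yan | 0 | 2 | 0 | 0 | 0 | 1 | 0 | ... | 167 | 9 | 15 | 10 | 12 | 4 | 4 | 4 | 10 | M |
| 4 | 4 | Tai Tuivasa | Blagoy Ivanov | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 111 | 14 | 18 | 5 | 6 | 0 | 0 | 6 | 6 | M |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5034 | 5139 | Gerard Gordeau | Kevin Rosier | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 6 | 9 | 0 | 0 | M |
| 5035 | 5140 | Ken Shamrock | Patrick Smith | 0 | 0 | 2 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 2 | 6 | M |
| 5036 | 5141 | Royce Gracie | Art Jimmerson | 0 | 0 | 0 | 0 | 2 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | M |
| 5037 | 5142 | Kevin Rosier | Zane Frazier | 2 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 7 | 4 | 9 | 10 | 19 | 7 | 8 | 2 | 2 | M |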


## Displaying the Column Names:

In [6]:
raw_data.columns

Index(['index', 'R_fighter', 'B_fighter', 'R_KD', 'B_KD', 'R_SUB_ATT',
       'B_SUB_ATT', 'R_PASS', 'B_PASS', 'R_REV', 'B_REV', 'win_by',
       'last_round', 'last_round_time', 'Format', 'Referee', 'date',
       'location', 'Fight_type', 'Winner', 'R_SIG_STR_landed', 'R_SIG_STR_att',
       'B_SIG_STR_landed', 'B_SIG_STR_att', 'R_TOTAL_STR_landed',
       'R_TOTAL_STR_att', 'B_TOTAL_STR_landed', 'B_TOTAL_STR_att',
       'R_TD_landed', 'R_TD_att', 'B_TD_landed', 'B_TD_att', 'R_HEAD_landed',
       'R_HEAD_att', 'B_HEAD_landed', 'B_HEAD_att', 'R_BODY_landed',
       'R_BODY_att', 'B_BODY_landed', 'B_BODY_att', 'R_LEG_landed',
       'R_LEG_att', 'B_LEG_landed', 'B_LEG_att', 'R_DISTANCE_landed',
       'R_DISTANCE_att', 'B_DISTANCE_landed', 'B_DISTANCE_att',
       'R_CLINCH_landed', 'R_CLINCH_att', 'B_CLINCH_landed', 'B_CLINCH_att',
       'R_GROUND_landed', 'R_GROUND_att', 'B_GROUND_landed', 'B_GROUND_att',
       'Gender'],
      dtype='object')

## Data Visualization: 

In [None]:
# Notes: planned plots

# Number of fights by weight class
# Number of fights by year

# Hard to explain, but similar to cancer cells
# x-axis is
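
The first two planned plots need only simple counts. The sketch below is one possible starting point, not the notebook's final approach. It assumes the `date` column holds strings like `"June 08, 2019"` and uses `Fight_type` as a rough stand-in for weight class (the raw values also carry words such as "Title" or "Bout", so they would need further cleaning). The sample rows are illustrative.

```python
import pandas as pd

# Illustrative rows only; the date format is an assumption about the dataset
fights = pd.DataFrame({
    "date": ["June 08, 2019", "June 08, 2019", "November 12, 1993"],
    "Fight_type": ["Women's Flyweight Bout", "Lightweight Bout",
                   "Lightweight Bout"],
})

# Number of fights by year, in chronological order
years = pd.to_datetime(fights["date"], format="%B %d, %Y").dt.year
fights_per_year = years.value_counts().sort_index()

# Number of fights by fight type (approximation of weight class)
fights_per_type = fights["Fight_type"].value_counts()

print(fights_per_year)
print(fights_per_type)
```

Either Series can then be passed to a bar chart in whichever plotting library the project settles on.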
