# Roger Federer Match Length History Pilot Analysis

Now this is exciting! We now move beyond generalized (i.e. non-player-specific) match length data and look at a specific player: the GOAT, Roger Federer. There are several reasons for this choice (many matches at the top level, a career spanning more than two decades, an evolving playstyle), but mainly Federer is my tennis hero.

The goal here is to obtain a set of weighted variables that would let us predict the length of a Roger Federer match against a given opponent, under a given set of match conditions.
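One way to read "weighted variables" is as the coefficients of a linear model, where predicted minutes are an intercept plus a weighted sum of match features. The sketch below only illustrates that reading: the feature choice (`best_of`, opponent rank, a clay indicator), the toy numbers and the `predict_minutes` helper are all hypothetical and not taken from the dataset.

```python
import numpy as np

# Hypothetical toy data: each row is one match described by
# [best_of, opponent rank, is_clay]. The values are made up for illustration.
X = np.array([
    [3, 45, 0],
    [3, 5, 0],
    [5, 10, 1],
    [5, 80, 0],
    [3, 63, 1],
    [5, 2, 1],
], dtype=float)
minutes = np.array([60.0, 113.0, 160.0, 120.0, 95.0, 190.0])

# Add a column of ones so the model can learn an intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: the weights that minimise the squared error.
weights, *_ = np.linalg.lstsq(X_design, minutes, rcond=None)

def predict_minutes(best_of, opp_rank, is_clay):
    """Predict match length as intercept + weighted sum of the features."""
    features = np.array([1.0, best_of, opp_rank, is_clay])
    return float(features @ weights)
```

With real data, `X` would be built from columns of the cleaned match table, and the fitted `weights` would be the quantities this analysis is after. Whether a plain linear model is adequate is an open question at this stage.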

In [22]:
### IMPORTS ###

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [45]:
### CLEAN FEDERER MATCHES TABLES ###

atp = pd.read_csv("atp_cat.csv")

fed_won = atp[atp["winner_name"] == "Roger Federer"]    # 1163 wins
fed_lost = atp[atp["loser_name"] == "Roger Federer"]    # 261 losses (82% win rate)

# Drop Federer's static attributes (name, hand, height, country); keep his age, rank and ranking points
fed_won = fed_won.drop(labels=["winner_name", "winner_hand", "winner_ht", "winner_ioc"], axis=1)
fed_won = fed_won.rename(columns={"winner_age": "fed_age", "winner_rank": "fed_rank", "winner_rank_points": "fed_rank_points",
                                  "loser_name":"opp_name", "loser_hand":"opp_hand", "loser_ht":"opp_ht", "loser_ioc":"opp_ioc", "loser_age":"opp_age",
                                  "loser_rank": "opp_rank", "loser_rank_points": "opp_rank_points"})
fed_won["fed_won"] = "1"

fed_lost = fed_lost.drop(labels=["loser_name", "loser_hand", "loser_ht", "loser_ioc"], axis=1)
fed_lost = fed_lost.rename(columns={"loser_age": "fed_age", "loser_rank": "fed_rank", "loser_rank_points": "fed_rank_points",
                                    "winner_name":"opp_name", "winner_hand":"opp_hand", "winner_ht":"opp_ht", "winner_ioc":"opp_ioc", "winner_age":"opp_age",
                                    "winner_rank": "opp_rank", "winner_rank_points": "opp_rank_points"})
fed_lost["fed_won"] = "0"

fed = pd.concat([fed_won, fed_lost], sort=True)    # sort=True keeps the alphabetical column order and silences the pandas FutureWarning
fed.head(5)    # Clean Table of all Roger Federer ATP matches

Unnamed: 0.1,Unnamed: 0,best_of,fed_age,fed_rank,fed_rank_points,fed_won,minutes,opp_age,opp_hand,opp_ht,opp_ioc,opp_name,opp_rank,opp_rank_points,round,score,surface,tourney_date,tourney_level,tourney_name
24932,3086,3,17.138946,878.0,9.0,1,60.0,28.618754,R,180.0,FRA,Guillaume Raoux,45.0,859.0,R32,6-2 6-2,Hard,19980928,A,Toulouse
24941,3095,3,17.138946,878.0,9.0,1,85.0,28.418891,R,196.0,AUS,Richard Fromberg,43.0,927.0,R16,6-1 7-6(5),Hard,19980928,A,Toulouse
25666,251,3,17.483915,243.0,173.0,1,113.0,22.431211,R,190.0,ESP,Carlos Moya,5.0,3178.0,R32,7-6(1) 3-6 6-3,Hard,19990201,A,Marseille
25682,267,3,17.483915,243.0,173.0,1,140.0,25.396304,L,188.0,FRA,Jerome Golmard,63.0,743.0,R16,6-7(6) 7-6(5) 7-6(5),Hard,19990201,A,Marseille
25849,479,3,17.522245,178.0,262.0,1,149.0,29.002053,R,180.0,FRA,Guillaume Raoux,71.0,691.0,R32,6-7(4) 7-5 7-6(3),Carpet,19990215,A,Rotterdam
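The cleaning cell above handles wins and losses with two nearly identical blocks. As a rough sketch, the same reshaping can be wrapped in one function that works for any player. The `player_view` helper, the `player_`/`opp_` column prefixes and the toy rows are assumptions made for illustration. Unlike the original cell, it keeps the player's own columns rather than dropping them, and it stores the win flag as an integer.

```python
import pandas as pd

def player_view(matches, player):
    """Return one row per match seen from `player`'s side (hypothetical helper)."""
    frames = []
    for own, other, won in (("winner", "loser", 1), ("loser", "winner", 0)):
        side = matches[matches[f"{own}_name"] == player].copy()
        # Rename "<own>_x" to "player_x" and "<other>_x" to "opp_x".
        side.columns = [
            col.replace(f"{own}_", "player_").replace(f"{other}_", "opp_")
            for col in side.columns
        ]
        side["player_won"] = won
        frames.append(side)
    return pd.concat(frames, sort=True)

# Made-up rows using the same column naming as atp_cat.csv.
toy = pd.DataFrame({
    "winner_name": ["Roger Federer", "Opponent B", "Roger Federer", "Opponent A"],
    "loser_name": ["Opponent A", "Roger Federer", "Opponent B", "Opponent B"],
    "winner_rank": [1, 2, 1, 3],
    "loser_rank": [3, 1, 2, 2],
    "minutes": [101, 150, 175, 90],
})

fed_toy = player_view(toy, "Roger Federer")
toy_win_rate = fed_toy["player_won"].mean()

# Sanity check on the counts quoted in the cleaning cell.
overall_win_rate = 1163 / (1163 + 261)  # about 0.817, i.e. the 82% quoted above
```

The last line reproduces the 82% win rate from the 1163 wins and 261 losses. The fourth toy match does not involve Federer and is filtered out.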


## Federer Generalities

### Surface

In [47]:
fed_surface = pd.pivot_table(fed, values="minutes", index=["best_of", "opp_hand"], columns=["surface"])  # mean match duration (minutes) by format and opponent hand, per surface
display(fed_surface)

best_of,opp_hand,Carpet,Clay,Grass,Hard
3,L,61.4,95.363636,72.9,82.361111
3,R,93.1875,88.39375,89.388235,89.526678
5,L,,161.625,131.315789,131.05
5,R,170.75,134.097561,124.031579,130.214634
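The table above reports only means, so it does not show how many matches stand behind each cell. Some groups, such as left-handed opponents on carpet, may contain very few. A possible extension, sketched here on made-up rows shaped like `fed`, is to pass a list of aggregations to `pd.pivot_table` so that counts appear next to the means. The toy values and the `summary` name are assumptions for illustration.

```python
import pandas as pd

# Made-up rows with the same columns the pivot uses (hypothetical values).
toy = pd.DataFrame({
    "best_of": [3, 3, 3, 5, 5, 3],
    "opp_hand": ["R", "R", "L", "R", "L", "R"],
    "surface": ["Hard", "Hard", "Clay", "Grass", "Clay", "Clay"],
    "minutes": [60.0, 90.0, 100.0, 120.0, 160.0, 80.0],
})

# Passing a list to aggfunc gives one block of columns per aggregation.
summary = pd.pivot_table(
    toy,
    values="minutes",
    index=["best_of", "opp_hand"],
    columns="surface",
    aggfunc=["mean", "count"],
)

mean_by_surface = summary["mean"]    # average minutes per cell
count_by_surface = summary["count"]  # number of matches behind each average
```

On the real data, replacing `toy` with `fed` would show which averages rest on only a handful of matches and should be read with caution.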
