## Overview of Data Cleaning and Wrangling

Goals of this first workbook in the ATP men's singles match-outcome prediction project (outcomes are predicted at the level of % points won per player):

* Import raw data across a range of years from the Association of Tennis Professionals website (via Jeff Sackmann's GitHub)
* Concatenate across years, clean and correct errors, fill in missing data in existing fields, and add key descriptive features (e.g., country of match, indoor vs. outdoor, unified chronological time base) to the overall raw data set
* Prepare the cleaned and expanded data set (currently years 2012-2019) for the feature development stage, then export it to that stage (see Workbook 2)
* The data inclusion date range is currently 2012-2019 (shaped by feedback from EDA and modeling). The following were filtered out and NOT included in feature generation: matches played on grass (sample size too low); Davis Cup and Olympics matches (for the same reason, as well as for their "odd" contexts); and matches where one player withdrew (usually because of injury) either before the match or early in it. Critically, matches filtered out beyond this point (see EDA and Modeling workbooks) WERE used for initial feature generation/accrual.
* UPDATE: For modeling clay court tennis specifically, where the sample is much smaller (by ~3-4x) than for hard court tennis over the same time frame, the sample has been expanded by 3 additional years (2009-2019).


### Imports

In [1]:
from urllib.request import urlretrieve
import pandas as pd
import numpy as np
#import matplotlib.pyplot as plt
#import seaborn as sns
import os
import warnings
warnings.filterwarnings('ignore')

### Load Data

In [2]:
#url ='https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_2006.csv'
#urlretrieve(url,'atp_matches_2006.csv')
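
The commented-out download above fetches a single season. A minimal sketch of the same idea for a range of seasons follows, assuming the upstream files keep the `atp_matches_<year>.csv` naming seen in that URL. The helper names `match_file_url` and `download_seasons` are hypothetical, and the downloaded files keep the upstream names rather than the `_jnr` suffix used by the local copies read later.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base path taken from the commented-out URL in the cell above.
BASE_URL = "https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master"


def match_file_url(year):
    """Return the raw GitHub URL for one season's ATP match file."""
    return f"{BASE_URL}/atp_matches_{year}.csv"


def download_seasons(first_year, last_year, dest_dir="../data", fetch=urlretrieve):
    """Download each season's CSV into dest_dir, skipping files already present.

    `fetch` is injectable (hypothetical parameter) so the loop can be
    exercised without network access.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for year in range(first_year, last_year + 1):
        target = dest / f"atp_matches_{year}.csv"
        if not target.exists():
            fetch(match_file_url(year), str(target))
        saved.append(target)
    return saved
```

Calling `download_seasons(2012, 2019)` would cover the current modeling range; skipping existing files keeps re-runs of the notebook from downloading the same data again.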

In [3]:
# Jeff Sackmann's raw data
# Several rounds of model building have shown empirically that a 5-year modeling window, with an additional 3 years of
# retrospective stats accrual on the front end, yields the best predictions (across a range of minimum-matches-played thresholds per player)

df_2019 = pd.read_csv('../data/atp_matches_2019_jnr.csv')
df_2018 = pd.read_csv('../data/atp_matches_2018_jnr.csv')
df_2017 = pd.read_csv('../data/atp_matches_2017_jnr.csv')
df_2016 = pd.read_csv('../data/atp_matches_2016_jnr.csv')
df_2015 = pd.read_csv('../data/atp_matches_2015_jnr.csv')
df_2014 = pd.read_csv('../data/atp_matches_2014_jnr.csv')
df_2013 = pd.read_csv('../data/atp_matches_2013_jnr.csv')
df_2012 = pd.read_csv('../data/atp_matches_2012_jnr.csv')
#df_2011 = pd.read_csv('../data/atp_matches_2011_jnr.csv')
#df_2010 = pd.read_csv('../data/atp_matches_2010_jnr.csv')
#df_2009 = pd.read_csv('../data/atp_matches_2009_jnr.csv')
#df_2008 = pd.read_csv('../data/atp_matches_2008_jnr.csv')
#df_2007 = pd.read_csv('../data/atp_matches_2007_jnr.csv')
#df_2006 = pd.read_csv('../data/atp_matches_2006_jnr.csv')
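
The per-year reads above, together with the concatenation in the following cell, could also be written as one loop. This is a sketch only: `load_seasons` is a hypothetical helper, and it assumes every season file follows the `atp_matches_<year>_jnr.csv` pattern used above.

```python
import pandas as pd


def load_seasons(first_year, last_year, data_dir="../data", suffix="_jnr"):
    """Read one CSV per season, newest first, and stack them into one frame."""
    frames = []
    for year in range(last_year, first_year - 1, -1):
        path = f"{data_dir}/atp_matches_{year}{suffix}.csv"
        frames.append(pd.read_csv(path))
    # ignore_index=True gives a fresh 0..n-1 index, as in the concat cell
    return pd.concat(frames, ignore_index=True)


# Example call for the current range:
# df = load_seasons(2012, 2019)
```

Changing the date range then means editing two arguments instead of commenting lines in and out.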

In [4]:
df = pd.concat([df_2019, df_2018, df_2017, df_2016, df_2015, df_2014, df_2013, df_2012], ignore_index=True)
df.head(20) 

Unnamed: 0,tourney_id,tourney_name,surface,draw_size,tourney_level,tourney_date,match_num,winner_id,winner_seed,winner_entry,...,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced,winner_rank,winner_rank_points,loser_rank,loser_rank_points
0,2019-6242,Winston-Salem,Hard,64,A,20190819,276,126207,10.0,,...,9.0,5.0,1.0,3.0,2.0,4.0,52.0,1015.0,74.0,815.0
1,2019-0301,Auckland,Hard,32,A,20190107,291,105807,4.0,,...,9.0,5.0,2.0,1.0,1.0,1.0,24.0,1705.0,124.0,460.0
2,2019-0410,Monte Carlo Masters,Clay,64,M,20190415,242,106065,11.0,,...,3.0,1.0,2.0,2.0,0.0,2.0,16.0,2021.0,54.0,930.0
3,2019-0337,Vienna,Hard,32,A,20191021,297,106233,1.0,,...,18.0,6.0,1.0,3.0,1.0,4.0,5.0,5085.0,34.0,1330.0
4,2019-0352,Paris Masters,Hard,64,M,20191028,251,133430,,,...,14.0,6.0,3.0,2.0,3.0,4.0,28.0,1460.0,53.0,990.0
5,2019-9164,Zhuhai,Hard,32,A,20190923,279,200175,,,...,18.0,10.0,1.0,4.0,3.0,5.0,50.0,1029.0,59.0,983.0
6,2019-6242,Winston-Salem,Hard,64,A,20190819,279,134770,12.0,,...,13.0,8.0,4.0,3.0,2.0,4.0,54.0,976.0,99.0,535.0
7,2019-6242,Winston-Salem,Hard,64,A,20190819,289,126207,10.0,,...,13.0,6.0,7.0,4.0,1.0,3.0,52.0,1015.0,51.0,1018.0
8,2019-M007,Miami Masters,Hard,128,M,20190318,191,105430,,Q,...,7.0,4.0,6.0,6.0,0.0,4.0,46.0,971.0,50.0,956.0
9,2019-0439,Umag,Clay,32,A,20190715,293,105882,,,...,21.0,11.0,3.0,5.0,0.0,3.0,105.0,543.0,9.0,2785.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23298 entries, 0 to 23297
Data columns (total 49 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   tourney_id          23298 non-null  object 
 1   tourney_name        23298 non-null  object 
 2   surface             23298 non-null  object 
 3   draw_size           23298 non-null  int64  
 4   tourney_level       23298 non-null  object 
 5   tourney_date        23298 non-null  int64  
 6   match_num           23298 non-null  int64  
 7   winner_id           23298 non-null  int64  
 8   winner_seed         10087 non-null  float64
 9   winner_entry        2947 non-null   object 
 10  winner_name         23298 non-null  object 
 11  winner_hand         23291 non-null  object 
 12  winner_ht           22245 non-null  float64
 13  winner_ioc          23298 non-null  object 
 14  winner_age          23297 non-null  float64
 15  loser_id            23298 non-null  int64  
 16  lose

In [6]:
df.tail()

Unnamed: 0,tourney_id,tourney_name,surface,draw_size,tourney_level,tourney_date,match_num,winner_id,winner_seed,winner_entry,...,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced,winner_rank,winner_rank_points,loser_rank,loser_rank_points
23293,2012-605,Tour Finals,Hard,8,F,20121105,515,104925,1.0,,...,57.0,35.0,19.0,12.0,3.0,7.0,1.0,11420.0,2.0,9465.0
23294,2012-M-DC-2012-WG-M-ESP-CZE-01,Davis Cup WG F: ESP vs CZE,Hard,4,D,20121116,1,103970,,,...,,,,,,,5.0,6430.0,37.0,1060.0
23295,2012-M-DC-2012-WG-M-ESP-CZE-01,Davis Cup WG F: ESP vs CZE,Hard,4,D,20121116,2,104607,,,...,,,,,,,6.0,4605.0,11.0,2515.0
23296,2012-M-DC-2012-WG-M-ESP-CZE-01,Davis Cup WG F: ESP vs CZE,Hard,4,D,20121116,4,103970,,,...,,,,,,,5.0,6430.0,6.0,4605.0
23297,2012-M-DC-2012-WG-M-ESP-CZE-01,Davis Cup WG F: ESP vs CZE,Hard,4,D,20121116,5,103285,,,...,,,,,,,37.0,1060.0,11.0,2515.0


### Initial DF Housekeeping

In [7]:
# Rename columns to shorter, consistently prefixed names (t_ = tournament, m_ = match, w_ = winner, l_ = loser)
df.rename(columns = {'round':'t_round','best_of':'m_best_of','match_num':'m_num','tourney_id':'t_id','tourney_name':'t_name','surface':'t_surf','draw_size':'t_draw_size','tourney_level':'t_lvl','tourney_date':'t_date','winner_id':'w_id','winner_seed':'w_sd','winner_entry':'w_ent','winner_name':'w_name','winner_ht':'w_ht','winner_hand':'w_hd','winner_ioc':'w_ioc','winner_age':'w_age', 'loser_id':'l_id','loser_seed':'l_sd','loser_entry':'l_ent','loser_name':'l_name','loser_hand':'l_hd','loser_ht':'l_ht','loser_ioc':'l_ioc','loser_age':'l_age','score':'m_score','minutes':'m_time(m)','winner_rank':'w_rank','winner_rank_points':'w_rank_pts','loser_rank':'l_rank','loser_rank_points':'l_rank_pts'}, inplace=True)

In [8]:
df = df.drop(['w_df','w_1stIn','l_df','l_1stIn','w_sd','l_sd'],axis=1)
#df = df.drop(['w_bpSaved','w_bpFaced','w_ace','w_df','w_1stIn','w_bpSaved','w_bpFaced','w_bpSaved','w_bpFaced','l_ace','l_df','l_1stIn','l_bpSaved','l_bpFaced','w_sd','l_sd'],axis=1)

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23298 entries, 0 to 23297
Data columns (total 43 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   t_id         23298 non-null  object 
 1   t_name       23298 non-null  object 
 2   t_surf       23298 non-null  object 
 3   t_draw_size  23298 non-null  int64  
 4   t_lvl        23298 non-null  object 
 5   t_date       23298 non-null  int64  
 6   m_num        23298 non-null  int64  
 7   w_id         23298 non-null  int64  
 8   w_ent        2947 non-null   object 
 9   w_name       23298 non-null  object 
 10  w_hd         23291 non-null  object 
 11  w_ht         22245 non-null  float64
 12  w_ioc        23298 non-null  object 
 13  w_age        23297 non-null  float64
 14  l_id         23298 non-null  int64  
 15  l_ent        4908 non-null   object 
 16  l_name       23298 non-null  object 
 17  l_hd         23261 non-null  object 
 18  l_ht         21082 non-null  float64
 19  l_io

In [10]:
# Remove Olympics, Davis Cup, and grass-court matches, as well as walkovers. Also restrict to matches where at least 12 games were completed.
df = df[~df['t_name'].str.contains("Olympics")]
df = df[~df['t_name'].str.contains("Davis Cup")]
df = df[~df['t_surf'].str.contains("Grass")]
df = df[~df['m_score'].str.contains("W/O")]
#df = df[~df['m_score'].str.contains("RET")]
df = df[(df['w_SvGms'] + df['l_SvGms'] >= 12)]
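
The filters in this cell can be combined into a single boolean mask. The sketch below is an illustration on the renamed columns, not the notebook's own code: `filter_matches` is a hypothetical helper. It passes `na=False` so that a missing name or score counts as "no match" instead of propagating `NaN` into the mask, and `regex=False` so that `W/O` is matched literally.

```python
import pandas as pd


def filter_matches(df, min_games=12):
    """Drop Olympics, Davis Cup, grass, and walkover rows; keep long-enough matches."""
    excluded_event = df["t_name"].str.contains("Olympics|Davis Cup", na=False)
    on_grass = df["t_surf"].str.contains("Grass", regex=False, na=False)
    walkover = df["m_score"].str.contains("W/O", regex=False, na=False)
    # Rows with missing service-game counts compare as False and are dropped,
    # which matches the behaviour of the original comparison.
    enough_games = (df["w_SvGms"] + df["l_SvGms"]) >= min_games
    return df[~excluded_event & ~on_grass & ~walkover & enough_games]
```

Building the mask first also makes it easy to count how many rows each condition removes before applying it.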

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18168 entries, 11 to 23293
Data columns (total 43 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   t_id         18168 non-null  object 
 1   t_name       18168 non-null  object 
 2   t_surf       18168 non-null  object 
 3   t_draw_size  18168 non-null  int64  
 4   t_lvl        18168 non-null  object 
 5   t_date       18168 non-null  int64  
 6   m_num        18168 non-null  int64  
 7   w_id         18168 non-null  int64  
 8   w_ent        2567 non-null   object 
 9   w_name       18168 non-null  object 
 10  w_hd         18168 non-null  object 
 11  w_ht         17759 non-null  float64
 12  w_ioc        18168 non-null  object 
 13  w_age        18168 non-null  float64
 14  l_id         18168 non-null  int64  
 15  l_ent        4327 non-null   object 
 16  l_name       18168 non-null  object 
 17  l_hd         18168 non-null  object 
 18  l_ht         17095 non-null  float64
 19  l_i

In [12]:
df["w_ent"].unique(), df["l_ent"].unique()

(array(['Q', nan, 'SE', 'WC', 'LL', 'PR', 'Alt', 'ALT'], dtype=object),
 array(['PR', 'Q', 'LL', nan, 'SE', 'WC', 'Alt', 'ALT', 'S'], dtype=object))

In [13]:
df["w_ent"]

11         Q
14       NaN
15       NaN
16         Q
18       NaN
        ... 
23289    NaN
23290    NaN
23291    NaN
23292    NaN
23293    NaN
Name: w_ent, Length: 18168, dtype: object

In [14]:
# Convert tourney entry type to a numeric hierarchy.
# EDA (scatterplot spread) backs up the intuition behind this ordering.
df["w_ent"] = df["w_ent"].fillna(5) #regular entry (ie, got in based on ranking)
df.loc[((df["w_ent"] == "PR") | (df["w_ent"] == "WC") | (df["w_ent"] == "SE") | (df["w_ent"] == "Alt") | (df["w_ent"] == "ALT")), "w_ent"] = 4 #all got direct entry, but for extenuating circumstances (ie, not based on current ranking)
df.loc[(df["w_ent"] == "Q"), "w_ent"] = 2.5 #qualifier (ie, had to play matches on site to win way in)
df.loc[(df["w_ent"] == "LL"), "w_ent"] = 2 #lucky loser (ie, also had to play matches on site, but lost final qualifying match)

df["l_ent"] = df["l_ent"].fillna(5) #regular entry (ie, got in based on ranking)
df.loc[((df["l_ent"] == "PR") | (df["l_ent"] == "WC") | (df["l_ent"] == "SE") | (df["l_ent"] == "S") | (df["l_ent"] == "Alt") | (df["l_ent"] == "ALT")), "l_ent"] = 4 #all got direct entry, but for extenuating circumstances (ie, not based on current ranking)
df.loc[(df["l_ent"] == "Q"), "l_ent"] = 2.5 #qualifier (ie, had to play matches on site to win way in)
df.loc[(df["l_ent"] == "LL"), "l_ent"] = 2 #lucky loser (ie, also had to play matches on site, but lost final qualifying match)
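
The same encoding can be sketched as a dictionary lookup. The scores below are copied from the cell above; `ENTRY_RANK`, `REGULAR_ENTRY`, and `encode_entry` are hypothetical names. Two differences from the cell above are deliberate and worth flagging: the result is a float column rather than an `object` column, and an entry code that is not in the table raises an error instead of being left in the column as a string.

```python
import pandas as pd

# Scores copied from the cell above; "S" appears only among loser entries there.
ENTRY_RANK = {
    "PR": 4.0, "WC": 4.0, "SE": 4.0, "S": 4.0, "Alt": 4.0, "ALT": 4.0,
    "Q": 2.5,   # qualifier
    "LL": 2.0,  # lucky loser
}
REGULAR_ENTRY = 5.0  # missing entry code = got in on ranking


def encode_entry(codes):
    """Map entry codes to the numeric hierarchy; missing codes mean regular entry."""
    unknown = set(codes.dropna().unique()) - set(ENTRY_RANK)
    if unknown:
        raise ValueError(f"unmapped entry codes: {sorted(unknown)}")
    return codes.map(ENTRY_RANK).fillna(REGULAR_ENTRY)


# Possible usage on the renamed columns:
# df["w_ent"] = encode_entry(df["w_ent"])
# df["l_ent"] = encode_entry(df["l_ent"])
```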


In [15]:
df["w_ent"].unique()

array([2.5, 5, 4, 2], dtype=object)

In [16]:
df["l_ent"].unique()

array([4, 2.5, 2, 5], dtype=object)

In [17]:
df.head()

Unnamed: 0,t_id,t_name,t_surf,t_draw_size,t_lvl,t_date,m_num,w_id,w_ent,w_name,...,l_svpt,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced,w_rank,w_rank_pts,l_rank,l_rank_pts
11,2019-6242,Winston-Salem,Hard,64,A,20190819,267,106216,2.5,Bjorn Fratangelo,...,38.0,8.0,2.0,6.0,5.0,11.0,119.0,467.0,,
14,2019-7485,Antwerp,Hard,32,A,20191014,282,105526,5.0,Jan Lennard Struff,...,56.0,16.0,9.0,8.0,2.0,7.0,41.0,1155.0,84.0,638.0
15,2019-0375,Montpellier,Hard,32,A,20190204,278,104871,5.0,Jeremy Chardy,...,41.0,11.0,6.0,7.0,2.0,7.0,35.0,1155.0,124.0,451.0
16,2019-7290,Estoril,Clay,32,A,20190429,289,200221,2.5,Alejandro Davidovich Fokina,...,46.0,15.0,2.0,8.0,4.0,10.0,167.0,326.0,42.0,1040.0
18,2019-M020,Brisbane,Hard,32,A,20181231,278,105683,5.0,Milos Raonic,...,43.0,16.0,2.0,7.0,4.0,8.0,18.0,1855.0,67.0,780.0


In [18]:
# Convert Tournament Level to a numeric hierarchy
df.loc[(df["t_lvl"] == "G"), "t_lvl"] = 4 #Grand Slams
df.loc[(df["t_lvl"] == "F"), "t_lvl"] = 3 #Tour Finals
df.loc[(df["t_lvl"] == "M"), "t_lvl"] = 2 #Masters Series
df.loc[(df["t_lvl"] == "A"), "t_lvl"] = 1 #Regular Tour Events
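
An ordered categorical is another way to express this hierarchy. The sketch below assumes only the four level codes handled in the cell above; `LEVEL_ORDER` and `encode_level` are hypothetical names. Any other code (for example a leftover `D` row) would come out as 0, which makes stray values easy to spot.

```python
import pandas as pd

# Lowest to highest, matching the 1-4 scores assigned in the cell above.
LEVEL_ORDER = ["A", "M", "F", "G"]


def encode_level(levels):
    """Return 1-4 for A/M/F/G; codes outside LEVEL_ORDER come out as 0."""
    cat = pd.Categorical(levels, categories=LEVEL_ORDER, ordered=True)
    # Categorical codes start at 0 and use -1 for values outside the categories.
    return pd.Series(cat.codes, index=levels.index).astype("int64") + 1
```

Unlike the in-place `.loc` assignments, this returns an integer column rather than one with `object` dtype.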

In [19]:
df["t_lvl"].unique()

array([1, 4, 2, 3], dtype=object)

In [20]:
# Ages reported with too much precision in the original file
df["w_age"] = df["w_age"].round(2)
df["l_age"] = df["l_age"].round(2)

In [21]:
# Jeff's match_num variable is not unique per match, so replace it with the row index.
# These values will NOT be accurate chronological markers of match sequence, but they can still be used to re-merge by match later in the workstream as needed (matches are first split into rows per-player/per-match below)
df["m_num"] = df.index
#df["m_num"]
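
To illustrate why a unique per-match key matters, the toy example below assigns the key the same way as the cell above, splits each match into one row per player, and then re-joins the two halves on `m_num`. The data values, the `p_id` and `won` columns, and the split itself are made up for illustration and are not the notebook's later code.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "t_id": ["2019-0301", "2019-0301", "2019-0410"],
        "m_num": [291, 292, 291],            # raw numbers repeat across tournaments
        "w_id": [105807, 104871, 106065],
        "l_id": [104871, 105526, 105683],
    },
    index=[11, 14, 15],                      # non-contiguous, as after filtering
)

toy["m_num"] = toy.index                     # same assignment as in the cell above
assert toy["m_num"].is_unique

# One row per player per match, tagged with the shared match key.
winners = toy[["m_num", "w_id"]].rename(columns={"w_id": "p_id"}).assign(won=1)
losers = toy[["m_num", "l_id"]].rename(columns={"l_id": "p_id"}).assign(won=0)
per_player = pd.concat([winners, losers], ignore_index=True)

# Re-join the two halves of each match on the key.
rejoined = per_player[per_player["won"] == 1].merge(
    per_player[per_player["won"] == 0], on="m_num", suffixes=("_w", "_l")
)
```

With the raw `m_num` values the merge would pair rows from different tournaments that happen to share a number; the index-based key avoids that.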

In [22]:
# Jeff's tourney id column is useful, but somewhat random in its assignments. For computing numerous features, it will be useful
# to have a 'tour_wk' column that is sequential through the entire sample. Let's build a version of this just for this small dataset here.
df["tour_wk"] = ""
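
The cell that follows assigns tour weeks one `.loc` statement at a time. A table-driven variant is sketched here, under the assumption that the first four characters of `t_id` are the season year (as in `2019-6242`). `TOUR_WEEKS`, `build_lookup`, and `assign_tour_week` are hypothetical names, and only a few entries are shown; a full table would list every event.

```python
import pandas as pd

# season -> week number -> tournament names (first three weeks only, for brevity)
TOUR_WEEKS = {
    2019: {1: ["Brisbane", "Doha", "Pune"], 2: ["Auckland", "Sydney"], 3: ["Australian Open"]},
    2018: {1: ["Brisbane", "Doha", "Pune"], 2: ["Auckland", "Sydney"], 3: ["Australian Open"]},
}


def build_lookup(tour_weeks):
    """Flatten the nested table into {(season, name): 'YYYY_WW'}."""
    lookup = {}
    for season, weeks in tour_weeks.items():
        for week, names in weeks.items():
            for name in names:
                lookup[(str(season), name)] = f"{season}_{week:02d}"
    return lookup


def assign_tour_week(df, tour_weeks):
    """Return the tour-week label per row; unmapped events get an empty string."""
    lookup = build_lookup(tour_weeks)
    season = df["t_id"].str.slice(0, 4)   # assumption: t_id starts with the season year
    keys = zip(season, df["t_name"])
    return pd.Series([lookup.get(key, "") for key in keys], index=df.index)


# Possible usage:
# df["tour_wk"] = assign_tour_week(df, TOUR_WEEKS)
# df.loc[df["tour_wk"] == "", ["t_id", "t_name"]].drop_duplicates()  # events still unmapped
```

Taking the season from the start of `t_id` rather than from `t_date` keeps events such as Brisbane 2019, whose `t_date` is 20181231, in the correct season, and it avoids matching a year that happens to appear later in the id string.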

In [23]:
# Create proper tour week for each tourney in the sample
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Pune")), "tour_wk"] = "2019_01"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2019_02"       
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2019_03"       
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Cordoba")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Sofia")), "tour_wk"] = "2019_04"     
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "New York")), "tour_wk"] = "2019_05" 
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Rio de Janeiro")), "tour_wk"] = "2019_06"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2019_07"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2019_08"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2019_09"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Marrakech")), "tour_wk"] = "2019_10"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2019_11"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Budapest")), "tour_wk"] = "2019_12"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich")), "tour_wk"] = "2019_13"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2019_14"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2019_15"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Geneva")|(df["t_name"] == "Lyon")), "tour_wk"] = "2019_16"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2019_17"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Bastad")|(df["t_name"] == "Umag")), "tour_wk"] = "2019_18"
df.loc[(df["t_id"].str.contains("2019")) & (((df["t_name"] == "Atlanta")|(df["t_name"] == "Gstaad")|(df["t_name"] == "Hamburg"))), "tour_wk"] = "2019_19"
df.loc[(df["t_id"].str.contains("2019")) & (((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Los Cabos")|(df["t_name"] == "Washington"))), "tour_wk"] = "2019_20"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2019_21"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2019_22"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2019_23"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2019_24"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2019_25"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Chengdu")|(df["t_name"] == "Zhuhai")), "tour_wk"] = "2019_26"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2019_27"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2019_28"
df.loc[(df["t_id"].str.contains("2019")) & (((df["t_name"] == "Antwerp")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2019_29"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Vienna")), "tour_wk"] = "2019_30"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2019_31"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2019_32"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_name"] == "NextGen Finals")), "tour_wk"] = "2019_33"

df.loc[(df["t_id"].str.contains("2018")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Pune"))), "tour_wk"] = "2018_01"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2018_02"       
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2018_03"       
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Quito")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Sofia")), "tour_wk"] = "2018_04"     
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "New York")), "tour_wk"] = "2018_05" 
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Rio de Janeiro")), "tour_wk"] = "2018_06"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2018_07"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2018_08"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2018_09"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Marrakech")), "tour_wk"] = "2018_10"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2018_11"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Budapest")), "tour_wk"] = "2018_12"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Istanbul")), "tour_wk"] = "2018_13"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2018_14"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2018_15"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Geneva")|(df["t_name"] == "Lyon")), "tour_wk"] = "2018_16"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2018_17"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Bastad")|(df["t_name"] == "Umag")), "tour_wk"] = "2018_18"
df.loc[(df["t_id"].str.contains("2018")) & (((df["t_name"] == "Atlanta")|(df["t_name"] == "Gstaad")|(df["t_name"] == "Hamburg"))), "tour_wk"] = "2018_19"
df.loc[(df["t_id"].str.contains("2018")) & (((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Los Cabos")|(df["t_name"] == "Washington"))), "tour_wk"] = "2018_20"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2018_21"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2018_22"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2018_23"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2018_24"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2018_25"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Chengdu")|(df["t_name"] == "Shenzhen")), "tour_wk"] = "2018_26"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2018_27"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2018_28"
df.loc[(df["t_id"].str.contains("2018")) & (((df["t_name"] == "Antwerp")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2018_29"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Vienna")), "tour_wk"] = "2018_30"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2018_31"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "NextGen Finals")), "tour_wk"] = "2018_32"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2018_33"

df.loc[(df["t_id"].str.contains("2017")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2017_01"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2017_02"       
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2017_03"       
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Quito")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Sofia")), "tour_wk"] = "2017_04"     
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Memphis")), "tour_wk"] = "2017_05" 
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Rio de Janeiro")), "tour_wk"] = "2017_06"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2017_07"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2017_08"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2017_09"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Marrakech")), "tour_wk"] = "2017_10"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2017_11"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Budapest")), "tour_wk"] = "2017_12"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Istanbul")), "tour_wk"] = "2017_13"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2017_14"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2017_15"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Geneva")|(df["t_name"] == "Lyon")), "tour_wk"] = "2017_16"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2017_17"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Bastad")|(df["t_name"] == "Umag")), "tour_wk"] = "2017_18"
df.loc[(df["t_id"].str.contains("2017")) & (((df["t_name"] == "Atlanta")|(df["t_name"] == "Gstaad")|(df["t_name"] == "Hamburg"))), "tour_wk"] = "2017_19"
df.loc[(df["t_id"].str.contains("2017")) & (((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Los Cabos")|(df["t_name"] == "Washington"))), "tour_wk"] = "2017_20"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2017_21"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2017_22"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2017_23"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2017_24"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2017_25"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Chengdu")|(df["t_name"] == "Shenzhen")), "tour_wk"] = "2017_26"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2017_27"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2017_28"
df.loc[(df["t_id"].str.contains("2017")) & (((df["t_name"] == "Antwerp")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2017_29"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Vienna")), "tour_wk"] = "2017_30"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2017_31"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2017_32"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_name"] == "NextGen Finals")), "tour_wk"] = "2017_33"

df.loc[(df["t_id"].str.contains("2016")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2016_01"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2016_02"       
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2016_03"       
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Quito")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Sofia")), "tour_wk"] = "2016_04"     
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Memphis")), "tour_wk"] = "2016_05" 
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Rio de Janeiro")), "tour_wk"] = "2016_06"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2016_07"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2016_08"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2016_09"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Marrakech")), "tour_wk"] = "2016_10"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2016_11"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2016_12"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Istanbul")), "tour_wk"] = "2016_13"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2016_14"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2016_15"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Geneva")|(df["t_name"] == "Nice")), "tour_wk"] = "2016_16"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2016_17"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Hamburg")), "tour_wk"] = "2016_18"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Umag")|(df["t_name"] == "Kitzbuhel")| (df["t_name"] == "Gstaad")|(df["t_name"] == "Washington")), "tour_wk"] = "2016_19"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2016_20"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Atlanta")), "tour_wk"] = "2016_21"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Los Cabos")), "tour_wk"] = "2016_22"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2016_23"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2016_24"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2016_25"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2016_26"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Chengdu")|(df["t_name"] == "Shenzhen")), "tour_wk"] = "2016_27"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2016_28"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2016_29"
df.loc[(df["t_id"].str.contains("2016")) & (((df["t_name"] == "Antwerp")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2016_30"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Vienna")), "tour_wk"] = "2016_31"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2016_32"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2016_33"

df.loc[(df["t_id"].str.contains("2015")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2015_01"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2015_02"       
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2015_03"       
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Quito")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2015_04"     
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Sao Paulo")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Memphis")), "tour_wk"] = "2015_05" 
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Rio de Janeiro")), "tour_wk"] = "2015_06"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Buenos Aires")), "tour_wk"] = "2015_07"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2015_08"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2015_09"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2015_10"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2015_11"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2015_12"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Istanbul")), "tour_wk"] = "2015_13"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2015_14"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2015_15"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Geneva")|(df["t_name"] == "Nice")), "tour_wk"] = "2015_16"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2015_17"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Umag")|(df["t_name"] == "Bogota")), "tour_wk"] = "2015_18"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Hamburg")|(df["t_name"] == "Gstaad")|(df["t_name"] == "Atlanta")), "tour_wk"] = "2015_19"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Washington")), "tour_wk"] = "2015_20"                                            
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2015_21"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2015_22"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2015_23"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2015_24"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2015_25"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Shenzhen")), "tour_wk"] = "2015_26"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2015_27"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2015_28"
df.loc[(df["t_id"].str.contains("2015")) & (((df["t_name"] == "Vienna")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2015_29"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2015_30"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2015_31"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2015_32"                                           

df.loc[(df["t_id"].str.contains("2014")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2014_01"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2014_02"       
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2014_03"       
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Vina del Mar")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2014_04"     
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Memphis")), "tour_wk"] = "2014_05" 
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Rio de Janeiro")), "tour_wk"] = "2014_06"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2014_07"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2014_08"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2014_09"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2014_10"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2014_11"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2014_12"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich")), "tour_wk"] = "2014_13"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2014_14"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2014_15"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Dusseldorf")|(df["t_name"] == "Nice")), "tour_wk"] = "2014_16"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2014_17"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart")), "tour_wk"] = "2014_18"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Hamburg")|(df["t_name"] == "Bogota")), "tour_wk"] = "2014_19"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Atlanta")|(df["t_name"] == "Gstaad") |(df["t_name"] == "Umag")), "tour_wk"] = "2014_20"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Washington")), "tour_wk"] = "2014_21"                                          
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2014_22"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2014_23"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2014_24"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2014_25"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Metz")), "tour_wk"] = "2014_26"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Shenzhen")), "tour_wk"] = "2014_27"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2014_28"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2014_29"
df.loc[(df["t_id"].str.contains("2014")) & (((df["t_name"] == "Vienna")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2014_30"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2014_31"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2014_32"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2014_33"  

df.loc[(df["t_id"].str.contains("2013")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2013_01"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2013_02"       
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2013_03"       
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Santiago")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2013_04"     
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "San Jose")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2013_05" 
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Marseille")|(df["t_name"] == "Memphis")), "tour_wk"] = "2013_06"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Delray Beach")), "tour_wk"] = "2013_07"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2013_08"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2013_09"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2013_10"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2013_11"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2013_12"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich")), "tour_wk"] = "2013_13"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2013_14"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2013_15"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Dusseldorf")|(df["t_name"] == "Nice")), "tour_wk"] = "2013_16"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2013_17"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart")), "tour_wk"] = "2013_18"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Hamburg")|(df["t_name"] == "Bogota")), "tour_wk"] = "2013_19"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Atlanta")|(df["t_name"] == "Gstaad") |(df["t_name"] == "Umag")), "tour_wk"] = "2013_20"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Washington")), "tour_wk"] = "2013_21"                                          
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2013_22"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2013_23"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2013_24"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2013_25"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2013_26"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Bangkok")), "tour_wk"] = "2013_27"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2013_28"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2013_29"
df.loc[(df["t_id"].str.contains("2013")) & (((df["t_name"] == "Vienna")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2013_30"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2013_31"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2013_32"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2013_33"  

df.loc[(df["t_id"].str.contains("2012")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2012_01"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2012_02"       
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2012_03"       
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Santiago")|(df["t_name"] == "Montpellier")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2012_04"     
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "San Jose")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Sao Paulo")), "tour_wk"] = "2012_05" 
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Marseille")|(df["t_name"] == "Memphis")), "tour_wk"] = "2012_06"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Delray Beach")), "tour_wk"] = "2012_07"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2012_08"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2012_09"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2012_10"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2012_11"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Barcelona")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2012_12"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Belgrade")), "tour_wk"] = "2012_13"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2012_14"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2012_15"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Dusseldorf")|(df["t_name"] == "Nice")), "tour_wk"] = "2012_16"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2012_17"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart") |(df["t_name"] == "Umag")), "tour_wk"] = "2012_18"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Atlanta")|(df["t_name"] == "Gstaad") |(df["t_name"] == "Hamburg")), "tour_wk"] = "2012_19"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Kitzbuhel")|(df["t_name"] == "Los Angeles")), "tour_wk"] = "2012_20"                                          
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Washington")), "tour_wk"] = "2012_21"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2012_22"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2012_23"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2012_24"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2012_25"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Metz")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2012_26"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Bangkok")), "tour_wk"] = "2012_27"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2012_28"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2012_29"
df.loc[(df["t_id"].str.contains("2012")) & (((df["t_name"] == "Vienna")|(df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm"))), "tour_wk"] = "2012_30"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2012_31"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2012_32"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2012_33"  

df.loc[(df["t_id"].str.contains("2011")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2011_01"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2011_02"       
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2011_03"       
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Santiago")|(df["t_name"] == "Johannesburg")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2011_04"     
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "San Jose")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Costa Do Sauipe")), "tour_wk"] = "2011_05" 
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Marseille")|(df["t_name"] == "Memphis")), "tour_wk"] = "2011_06"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Delray Beach")), "tour_wk"] = "2011_07"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2011_08"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2011_09"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2011_10"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2011_11"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Barcelona")), "tour_wk"] = "2011_12"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Belgrade")), "tour_wk"] = "2011_13"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2011_14"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2011_15"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Dusseldorf")|(df["t_name"] == "Nice")), "tour_wk"] = "2011_16"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2011_17"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart")), "tour_wk"] = "2011_18"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Atlanta")|(df["t_name"] == "Hamburg")), "tour_wk"] = "2011_19"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Gstaad")|(df["t_name"] == "Los Angeles") |(df["t_name"] == "Umag")), "tour_wk"] = "2011_20"                                          
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Washington") |(df["t_name"] == "Kitzbuhel")), "tour_wk"] = "2011_21"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2011_22"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2011_23"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Winston-Salem")), "tour_wk"] = "2011_24"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2011_25"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Metz")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2011_26"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Bangkok")), "tour_wk"] = "2011_27"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2011_28"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2011_29"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm")), "tour_wk"] = "2011_30"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Vienna")|(df["t_name"] == "St. Petersburg")), "tour_wk"] = "2011_31"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2011_32"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2011_33"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2011_34"  

df.loc[(df["t_id"].str.contains("2010")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2010_01"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2010_02"       
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2010_03"       
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Santiago")|(df["t_name"] == "Johannesburg")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2010_04"     
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "San Jose")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Costa Do Sauipe")), "tour_wk"] = "2010_05" 
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Marseille")|(df["t_name"] == "Memphis")), "tour_wk"] = "2010_06"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Delray Beach")), "tour_wk"] = "2010_07"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2010_08"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2010_09"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2010_10"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2010_11"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Barcelona")), "tour_wk"] = "2010_12"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2010_13"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Belgrade")), "tour_wk"] = "2010_14"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2010_15"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Dusseldorf")|(df["t_name"] == "Nice")), "tour_wk"] = "2010_16"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2010_17"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart")), "tour_wk"] = "2010_18"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Atlanta")|(df["t_name"] == "Hamburg")), "tour_wk"] = "2010_19"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Gstaad")|(df["t_name"] == "Los Angeles") |(df["t_name"] == "Umag")), "tour_wk"] = "2010_20"                                          
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Washington")), "tour_wk"] = "2010_21"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2010_22"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2010_23"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "New Haven")), "tour_wk"] = "2010_24"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2010_25"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Metz")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2010_26"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Bangkok")), "tour_wk"] = "2010_27"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2010_28"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2010_29"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm")), "tour_wk"] = "2010_30"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Vienna")|(df["t_name"] == "St. Petersburg") |(df["t_name"] == "Montpellier")), "tour_wk"] = "2010_31"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2010_32"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2010_33"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2010_34"

df.loc[(df["t_id"].str.contains("2009")) & (((df["t_name"] == "Brisbane")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2009_01"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2009_02"       
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2009_03"       
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Vina del Mar")|(df["t_name"] == "Johannesburg")|(df["t_name"] == "Zagreb")), "tour_wk"] = "2009_04"     
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "San Jose")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Costa Do Sauipe")), "tour_wk"] = "2009_05" 
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Buenos Aires")|(df["t_name"] == "Marseille")|(df["t_name"] == "Memphis")), "tour_wk"] = "2009_06"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Dubai")|(df["t_name"] == "Delray Beach")), "tour_wk"] = "2009_07"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2009_08"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2009_09"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Casablanca")), "tour_wk"] = "2009_10"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2009_11"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Barcelona")), "tour_wk"] = "2009_12"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2009_13"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Estoril")|(df["t_name"] == "Munich") |(df["t_name"] == "Belgrade")), "tour_wk"] = "2009_14"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Madrid Masters")), "tour_wk"] = "2009_15"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Kitzbuhel")), "tour_wk"] = "2009_16"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2009_17"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart")), "tour_wk"] = "2009_18"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Indianapolis")|(df["t_name"] == "Hamburg")), "tour_wk"] = "2009_19"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Gstaad")|(df["t_name"] == "Los Angeles") |(df["t_name"] == "Umag")), "tour_wk"] = "2009_20"                                          
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Washington")), "tour_wk"] = "2009_21"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2009_22"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2009_23"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "New Haven")), "tour_wk"] = "2009_24"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2009_25"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Metz")|(df["t_name"] == "Bucharest")), "tour_wk"] = "2009_26"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Kuala Lumpur")|(df["t_name"] == "Bangkok")), "tour_wk"] = "2009_27"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Beijing")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2009_28"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Shanghai Masters")), "tour_wk"] = "2009_29"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Moscow")|(df["t_name"] == "Stockholm")), "tour_wk"] = "2009_30"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Vienna")|(df["t_name"] == "St. Petersburg") |(df["t_name"] == "Lyon(old)")), "tour_wk"] = "2009_31"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Basel")|(df["t_name"] == "Valencia")), "tour_wk"] = "2009_32"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2009_33"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2009_34"

df.loc[(df["t_id"].str.contains("2008")) & (((df["t_name"] == "Adelaide")|(df["t_name"] == "Doha")|(df["t_name"] == "Chennai"))), "tour_wk"] = "2008_01"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Auckland")|(df["t_name"] == "Sydney")), "tour_wk"] = "2008_02"       
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Australian Open")), "tour_wk"] = "2008_03"       
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Vina del Mar")), "tour_wk"] = "2008_04"     
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Delray Beach")|(df["t_name"] == "Marseille")|(df["t_name"] == "Costa Do Sauipe")), "tour_wk"] = "2008_05" 
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "San Jose")|(df["t_name"] == "Rotterdam")|(df["t_name"] == "Buenos Aires")), "tour_wk"] = "2008_06"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Acapulco")|(df["t_name"] == "Zagreb")|(df["t_name"] == "Memphis")), "tour_wk"] = "2008_07"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Las Vegas")|(df["t_name"] == "Dubai")), "tour_wk"] = "2008_08"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Indian Wells Masters")), "tour_wk"] = "2008_09"  
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Miami Masters")), "tour_wk"] = "2008_10"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Houston")|(df["t_name"] == "Estoril") | (df["t_name"] == "Valencia(old)")), "tour_wk"] = "2008_11"  
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Monte Carlo Masters")), "tour_wk"] = "2008_12"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Barcelona") |(df["t_name"] == "Munich")), "tour_wk"] = "2008_13"  
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Rome Masters")), "tour_wk"] = "2008_14"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Hamburg Masters")), "tour_wk"] = "2008_15"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Casablanca")|(df["t_name"] == "Poertschach")), "tour_wk"] = "2008_16"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Roland Garros")), "tour_wk"] = "2008_17"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Warsaw")), "tour_wk"] = "2008_18"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Gstaad")|(df["t_name"] == "Bastad") |(df["t_name"] == "Stuttgart")), "tour_wk"] = "2008_19"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Indianapolis")|(df["t_name"] == "Kitzbuhel") |(df["t_name"] == "Umag") |(df["t_name"] == "Amersfoort")), "tour_wk"] = "2008_20"                                          
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Canada Masters")), "tour_wk"] = "2008_21"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Cincinnati Masters")), "tour_wk"] = "2008_22"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Los Angeles")), "tour_wk"] = "2008_23"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Washington")), "tour_wk"] = "2008_24"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "New Haven")), "tour_wk"] = "2008_25"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "US Open")), "tour_wk"] = "2008_26"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Bucharest")), "tour_wk"] = "2008_27"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Bangkok")|(df["t_name"] == "Beijing")), "tour_wk"] = "2008_28"  
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Metz")|(df["t_name"] == "Tokyo")), "tour_wk"] = "2008_29" 
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Vienna")|(df["t_name"] == "Stockholm")|(df["t_name"] == "Moscow")), "tour_wk"] = "2008_30"  
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Madrid Masters(old)")), "tour_wk"] = "2008_31"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Basel")|(df["t_name"] == "St. Petersburg") |(df["t_name"] == "Lyon(old)")), "tour_wk"] = "2008_32"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Paris Masters")), "tour_wk"] = "2008_33"
df.loc[(df["t_id"].str.contains("2008")) & ((df["t_name"] == "Tour Finals")), "tour_wk"] = "2008_34"

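The per-season assignments in the cell above all follow one pattern: select the season through `t_id`, select one or more tournaments through `t_name`, and write a `year_week` label into `tour_wk`. Below is a minimal sketch of a table-driven equivalent, assuming the same three columns. The names `WEEK_SCHEDULE` and `assign_tour_wk` are hypothetical, the `t_id` values in the demo frame are made up, and only the first three weeks of 2015 are filled in for illustration.

```python
import pandas as pd

# Hypothetical lookup: season -> ordered list of tournament groups,
# one group per tour week (excerpt only; values copied from the 2015 block).
WEEK_SCHEDULE = {
    "2015": [
        ["Brisbane", "Doha", "Chennai"],  # week 01
        ["Auckland", "Sydney"],           # week 02
        ["Australian Open"],              # week 03
    ],
}

def assign_tour_wk(frame, schedule):
    """Return a copy of `frame` with `tour_wk` filled in from `schedule`."""
    out = frame.copy()
    if "tour_wk" not in out.columns:
        out["tour_wk"] = ""
    for year, weeks in schedule.items():
        # Same season test as the cell above: the year appears in t_id.
        in_year = out["t_id"].str.contains(year, regex=False)
        for week_no, names in enumerate(weeks, start=1):
            mask = in_year & out["t_name"].isin(names)
            out.loc[mask, "tour_wk"] = f"{year}_{week_no:02d}"
    return out

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "t_id": ["2015-A", "2015-B", "2015-C", "2014-A"],
    "t_name": ["Brisbane", "Doha", "Australian Open", "Brisbane"],
})
demo = assign_tour_wk(demo, WEEK_SCHEDULE)

# Rows that no schedule entry matched keep an empty label.
unmatched = demo[demo["tour_wk"] == ""]
```

Listing the `unmatched` rows after the assignment is a cheap way to spot tournaments whose names changed between seasons or were left out of the schedule.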

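The next cell tags every tournament with a country code so that a home-court flag can be derived from it. The sketch below shows one way such a flag might be computed, following the definition in the cell's comment (the player is at home and the opponent is not). The column names `p_country` and `opp_country`, the `T_COUNTRY` lookup, and the demo rows are all assumptions made for illustration; the sketch also assumes player countries use the same three-letter codes as `t_country`.

```python
import pandas as pd

# Hypothetical excerpt of a tournament -> country lookup.
T_COUNTRY = {"Brisbane": "AUS", "Auckland": "NZL", "Doha": "QAT"}

# Made-up rows; p_country / opp_country are assumed column names.
demo = pd.DataFrame({
    "t_name": ["Brisbane", "Brisbane", "Auckland", "Doha"],
    "p_country": ["AUS", "AUS", "ESP", "FRA"],
    "opp_country": ["ESP", "AUS", "NZL", "SUI"],
})

# Tournaments missing from the lookup get an empty string,
# mirroring the default set in the notebook cell.
demo["t_country"] = demo["t_name"].map(T_COUNTRY).fillna("")

p_home = demo["p_country"] == demo["t_country"]
opp_home = demo["opp_country"] == demo["t_country"]

# Advantage only when exactly one of the two players is at home.
demo["p_home_adv"] = p_home & ~opp_home
demo["opp_home_adv"] = opp_home & ~p_home
```

When both players share the tournament's country, neither flag is set, which matches the "and his opponent isn't" condition.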
In [24]:
# Tournament country: needed for a "home court advantage" feature (a player competing in his home country while his opponent is not)
df["t_country"] = ""
df.loc[(df["t_name"] == "Brisbane") | (df["t_name"] == "Sydney") | (df["t_name"] == "Adelaide") | (df["t_name"] == "Australian Open"), "t_country"] = "AUS"  
df.loc[(df["t_name"] == "Auckland"), "t_country"] = "NZL"  
df.loc[(df["t_name"] == "Pune") | (df["t_name"] == "Chennai") | (df["t_name"] == "Mumbai"), "t_country"] = "IND" 
df.loc[(df["t_name"] == "Zagreb"), "t_country"] = "CRO"
df.loc[(df["t_name"] == "Vina del Mar")  | (df["t_name"] == "Santiago"), "t_country"] = "CHL"
df.loc[(df["t_name"] == "Doha"), "t_country"] = "QAT"
df.loc[(df["t_name"] == "Quito"), "t_country"] = "ECU"
df.loc[(df["t_name"] == "Johannesburg"), "t_country"] = "RSA"
df.loc[(df["t_name"] == "Bogota"), "t_country"] = "COL"
df.loc[(df["t_name"] == "Cordoba") | (df["t_name"] == "Madrid Masters") | (df["t_name"] == "Madrid Masters(old)") | (df["t_name"] == "Barcelona") | (df["t_name"] == "Valencia(old)") | (df["t_name"] == "Valencia"), "t_country"] = "ESP"
df.loc[(df["t_name"] == "Montpellier") | (df["t_name"] == "Marseille") | (df["t_name"] == "Metz") |(df["t_name"] == "Nice") | (df["t_name"] == "Lyon") | (df["t_name"] == "Lyon(old)") | (df["t_name"] == "Monte Carlo Masters") | (df["t_name"] == "Roland Garros") | (df["t_name"] == "Paris Masters"), "t_country"] = "FRA"
df.loc[(df["t_name"] == "Sofia"), "t_country"] = "BUL"
df.loc[(df["t_name"] == "Belgrade"), "t_country"] = "SRB"
df.loc[(df["t_name"] == "Buenos Aires"), "t_country"] = "ARG"
df.loc[(df["t_name"] == "New York") | (df["t_name"] == "US Open") | (df["t_name"] == "Los Angeles") | (df["t_name"] == "Las Vegas") | (df["t_name"] == "Indianapolis") | (df["t_name"] == "San Jose") | (df["t_name"] == "Winston-Salem") | (df["t_name"] == "New Haven") | (df["t_name"] == "Memphis") | (df["t_name"] == "Cincinnati Masters") | (df["t_name"] == "Washington") | (df["t_name"] == "Atlanta") | (df["t_name"] == "Houston") | (df["t_name"] == "Indian Wells Masters") | (df["t_name"] == "Miami Masters") | (df["t_name"] == "Delray Beach"), "t_country"] = "USA"
df.loc[(df["t_name"] == "Rotterdam"), "t_country"] = "NED"                                                                                                                        
df.loc[(df["t_name"] == "Rio de Janeiro") | (df["t_name"] == "Sao Paulo") | (df["t_name"] == "Costa Do Sauipe") , "t_country"] = "BRA" 
df.loc[(df["t_name"] == "Acapulco") | (df["t_name"] == "Los Cabos"), "t_country"] = "MEX" 
df.loc[(df["t_name"] == "Dubai"), "t_country"] = "UAE" 
df.loc[(df["t_name"] == "Marrakech") | (df["t_name"] == "Casablanca"), "t_country"] = "MOR"
df.loc[(df["t_name"] == "Istanbul"), "t_country"] = "TUR"
df.loc[(df["t_name"] == "Budapest"), "t_country"] = "HUN"
df.loc[(df["t_name"] == "Poertschach"), "t_country"] = "AUT"
df.loc[(df["t_name"] == "Warsaw") | (df["t_name"] == "Sopot"), "t_country"] = "POL"
df.loc[(df["t_name"] == "Bucharest"), "t_country"] = "ROU"
df.loc[(df["t_name"] == "Lisbon") | (df["t_name"] == "Estoril"), "t_country"] = "POR" 
df.loc[(df["t_name"] == "Munich") | (df["t_name"] == "Hamburg") | (df["t_name"] == "Hamburg Masters") | (df["t_name"] == "Dusseldorf") | (df["t_name"] == "Stuttgart"), "t_country"] = "GER" 
df.loc[(df["t_name"] == "Rome Masters"), "t_country"] = "ITA"
df.loc[(df["t_name"] == "Geneva") | (df["t_name"] == "Gstaad") | (df["t_name"] == "Basel"), "t_country"] = "SUI"
df.loc[(df["t_name"] == "Bastad") | (df["t_name"] == "Stockholm"), "t_country"] = "SWE"                                             
df.loc[(df["t_name"] == "Umag"), "t_country"] = "CRO"                                              
df.loc[(df["t_name"] == "Kitzbuhel") | (df["t_name"] == "Vienna"), "t_country"] = "AUT" 
df.loc[(df["t_name"] == "Amersfoort"), "t_country"] = "NED" 
df.loc[(df["t_name"] == "Canada Masters"), "t_country"] = "CAN"   
df.loc[(df["t_name"] == "St. Petersburg") | (df["t_name"] == "Moscow"), "t_country"] = "RUS"                                    
df.loc[(df["t_name"] == "Chengdu") | (df["t_name"] == "Zhuhai") | (df["t_name"] == "Shanghai Masters") | (df["t_name"] == "Beijing") | (df["t_name"] == "Shenzhen"), "t_country"] = "CHN"                                     
df.loc[(df["t_name"] == "Kuala Lumpur"), "t_country"] = "MYS"
df.loc[(df["t_name"] == "Bangkok"), "t_country"] = "THA"
df.loc[(df["t_name"] == "Tokyo"), "t_country"] = "JPN"                                     
df.loc[(df["t_name"] == "Antwerp"), "t_country"] = "BEL" 
df.loc[(df["t_name"] == "Tour Finals"), "t_country"] = "GBR" 
df.loc[(df["t_name"] == "Tour Finals") & (df["t_id"].str.contains("2008")), "t_country"] = "CHN"  #pre-2009
df.loc[(df["t_name"] == "Tour Finals") & (df["t_id"].str.contains("2007")), "t_country"] = "CHN"  #pre-2009
df.loc[(df["t_name"] == "Tour Finals") & (df["t_id"].str.contains("2006")), "t_country"] = "CHN"  #pre-2009
df.loc[(df["t_name"] == "NextGen Finals"), "t_country"] = "ITA" #2019-may not always be true
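
The one-line-per-country assignments above could also be written as a lookup table plus `Series.map`, with the home-court comparison done directly afterwards. This is only a minimal sketch on a toy frame: `COUNTRY_BY_TOURNEY` is abbreviated, and `w_home_adv`, `l_home_adv`, and "Unknown Open" are hypothetical names, not part of this workbook. It also assumes `t_country` uses the same three-letter scheme as `w_ioc` and `l_ioc`, so codes such as CHL, MYS, or MOR would need checking against the player codes in the csvs.

```python
import pandas as pd

# Hypothetical, abbreviated lookup; a full version would carry every tournament assigned above.
COUNTRY_BY_TOURNEY = {
    "Brisbane": "AUS",
    "Australian Open": "AUS",
    "Auckland": "NZL",
    "Doha": "QAT",
    "Roland Garros": "FRA",
    "US Open": "USA",
}

# Toy frame standing in for df ("Unknown Open" is a made-up, unmapped name)
demo = pd.DataFrame({
    "t_name": ["Brisbane", "Roland Garros", "US Open", "Unknown Open"],
    "w_ioc": ["AUS", "ESP", "USA", "SUI"],
    "l_ioc": ["USA", "FRA", "USA", "GER"],
})

# map() gives NaN for names missing from the lookup; fill with "" to match the default used above
demo["t_country"] = demo["t_name"].map(COUNTRY_BY_TOURNEY).fillna("")

# A player has the advantage only if he is at home and his opponent is not
w_home = demo["w_ioc"] == demo["t_country"]
l_home = demo["l_ioc"] == demo["t_country"]
demo["w_home_adv"] = (w_home & ~l_home).astype(int)
demo["l_home_adv"] = (l_home & ~w_home).astype(int)
```

Year-specific exceptions (such as the pre-2009 Tour Finals) would still need a `.loc` override after the map step.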

In [25]:
# Create column for indoor or outdoor tournament (slams with retractable roofs on stadium court are counted as outdoor)
# Indoor = 1; Outdoor = 0

df["t_indoor"] = ""
df.loc[(df["t_name"] == "Brisbane"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Adelaide"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Sydney"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Australian Open"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Auckland"), "t_indoor"] = 0       
df.loc[(df["t_name"] == "Pune"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Chennai"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Mumbai"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Zagreb"), "t_indoor"] = 1    
df.loc[(df["t_name"] == "Vina del Mar"), "t_indoor"] = 0    
df.loc[(df["t_name"] == "Santiago"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Costa Do Sauipe"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Johannesburg"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Doha"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Quito"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Bogota"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Cordoba"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Madrid Masters"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Madrid Masters(old)"), "t_indoor"] = 1 
df.loc[(df["t_name"] == "Barcelona"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Valencia"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Valencia(old)"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Montpellier"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Marseille"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Metz"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Nice"), "t_indoor"] = 0  # Nice was played on outdoor clay
df.loc[(df["t_name"] == "Lyon"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Lyon(old)"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Monte Carlo Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Roland Garros"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Paris Masters"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Hamburg Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Hamburg"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Poertschach"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Warsaw"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Sopot"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Sofia"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Belgrade"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Estoril"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Istanbul"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Buenos Aires"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Amersfoort"), "t_indoor"] = 0
df.loc[(df["t_name"] == "New York"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Los Angeles"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Las Vegas"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Indianapolis"), "t_indoor"] = 0
df.loc[(df["t_name"] == "San Jose"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Winston-Salem"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Memphis"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Cincinnati Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "US Open"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Washington"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Atlanta"), "t_indoor"] = 0
df.loc[(df["t_name"] == "New Haven"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Houston"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Indian Wells Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Miami Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Delray Beach"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Rotterdam"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Rio de Janeiro"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Sao Paulo"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Acapulco"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Los Cabos"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Dubai"), "t_indoor"] = 0                                                                                                                       
df.loc[(df["t_name"] == "Marrakech"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Casablanca"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Budapest"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Bucharest"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Munich"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Dusseldorf"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Stuttgart"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Rome Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Geneva"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Gstaad"), "t_indoor"] = 0 
df.loc[(df["t_name"] == "Basel"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Bastad"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Stockholm"), "t_indoor"] = 1  # Stockholm Open is played on indoor hard courts
df.loc[(df["t_name"] == "Umag"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Kitzbuhel"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Vienna"), "t_indoor"] = 1                                            
df.loc[(df["t_name"] == "Canada Masters"), "t_indoor"] = 0                                               
df.loc[(df["t_name"] == "St. Petersburg"), "t_indoor"] = 1  
df.loc[(df["t_name"] == "Moscow"), "t_indoor"] = 1
df.loc[(df["t_name"] == "Chengdu"), "t_indoor"] = 0  
df.loc[(df["t_name"] == "Zhuhai"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Shanghai Masters"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Beijing"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Shenzhen"), "t_indoor"] = 0
df.loc[(df["t_name"] == "Kuala Lumpur"), "t_indoor"] = 1 
df.loc[(df["t_name"] == "Bangkok"), "t_indoor"] = 1 
df.loc[(df["t_name"] == "Tokyo"), "t_indoor"] = 0                                      
df.loc[(df["t_name"] == "Antwerp"), "t_indoor"] = 1 
df.loc[(df["t_name"] == "Tour Finals"), "t_indoor"] = 1
df.loc[(df["t_name"] == "NextGen Finals"), "t_indoor"] = 1 
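
Because `t_indoor` and `t_country` both start out as `""`, any tournament name missing from the assignments above silently keeps that placeholder. A quick check along the following lines could surface such names. This is a sketch on a toy frame; `unmapped_tourneys` is a hypothetical helper and "Newport Beach" is a made-up unmapped name used only for illustration.

```python
import pandas as pd

# Toy frame standing in for df; the last row mimics a tournament that was never assigned
demo = pd.DataFrame({
    "t_name": ["Basel", "Basel", "Doha", "Newport Beach"],
    "t_indoor": [1, 1, 0, ""],
    "t_country": ["SUI", "SUI", "QAT", ""],
})

def unmapped_tourneys(frame, col):
    """Return sorted tournament names whose `col` still holds the "" placeholder."""
    mask = frame[col].astype(str) == ""
    return sorted(frame.loc[mask, "t_name"].unique())

missing_indoor = unmapped_tourneys(demo, "t_indoor")
missing_country = unmapped_tourneys(demo, "t_country")
print(missing_indoor, missing_country)
```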


In [26]:
# A small percentage of tournaments are played at exceptional altitude (over 3,000 ft). This clearly influences the importance of the serve (as seen in the ace rate at these tourneys), so we want a marker column for them.
df["t_alt"] = ""
df.loc[(df["t_name"] == "Bogota"), "t_alt"] = 1
df.loc[(df["t_name"] == "Quito"), "t_alt"] = 1
df.loc[(df["t_name"] == "Gstaad"), "t_alt"] = 1
df.loc[(df["t_alt"] != 1), "t_alt"] = 0
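
An equivalent way to build the altitude marker, shown here on a toy frame, is a single `isin` test cast to integer. The list name `HIGH_ALT_TOURNEYS` is an assumption for illustration. One difference from the cell above: this yields an integer column directly rather than an object column that began as `""`.

```python
import pandas as pd

# Same three tournaments flagged in the cell above
HIGH_ALT_TOURNEYS = ["Bogota", "Quito", "Gstaad"]

# Toy frame standing in for df
demo = pd.DataFrame({"t_name": ["Bogota", "Doha", "Gstaad", "Quito", "Basel"]})

# True/False membership test, then cast to 1/0
demo["t_alt"] = demo["t_name"].isin(HIGH_ALT_TOURNEYS).astype(int)
```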

In [27]:
df["t_alt"].value_counts()

0    17768
1      400
Name: t_alt, dtype: int64

In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18168 entries, 11 to 23293
Data columns (total 47 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   t_id         18168 non-null  object 
 1   t_name       18168 non-null  object 
 2   t_surf       18168 non-null  object 
 3   t_draw_size  18168 non-null  int64  
 4   t_lvl        18168 non-null  object 
 5   t_date       18168 non-null  int64  
 6   m_num        18168 non-null  int64  
 7   w_id         18168 non-null  int64  
 8   w_ent        18168 non-null  object 
 9   w_name       18168 non-null  object 
 10  w_hd         18168 non-null  object 
 11  w_ht         17759 non-null  float64
 12  w_ioc        18168 non-null  object 
 13  w_age        18168 non-null  float64
 14  l_id         18168 non-null  int64  
 15  l_ent        18168 non-null  object 
 16  l_name       18168 non-null  object 
 17  l_hd         18168 non-null  object 
 18  l_ht         17095 non-null  float64
 19  l_i

In [29]:
# We will need to do multi-layered sorting later on to do some windowed, rolling, backward-looking feature development. Tourney week will be critical for this, but we also want
# to convert 'round' to a numerical value that can be used for multi-level sorting ("round_num").
df["t_round_num"] = ""

In [30]:
# Creating appropriate "round_num" values.
# One note: I directly updated Jeff's yearly csvs for the Tour Finals & Next Gen Finals to specify where in the sequence the round robin stage matches fall.
# Also, more generally, some missing individual values across features (missing minutes, heights, etc.) were added from the ATP site to Jeff's original csvs prior to loading into this script.
# Look for the *_jnr versions of Jeff's csvs on my Github for the raw data versions loaded here.

df.loc[(df["t_round"] == "R128") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 1
df.loc[(df["t_round"] == "R64") & ((df["t_draw_size"] == 64) | (df["t_draw_size"] == 56) | (df["t_draw_size"] == 48)), "t_round_num"] = 1
df.loc[(df["t_round"] == "R32") & ((df["t_draw_size"] == 32) | (df["t_draw_size"] == 28)), "t_round_num"] = 1
df.loc[(df["t_round"] == "RR1") & (df["t_draw_size"] == 8), "t_round_num"] = 1
df.loc[(df["t_round"] == "RR1") & (df["t_draw_size"] == 16), "t_round_num"] = 1
df.loc[(df["t_round"] == "RR2") & (df["t_draw_size"] == 8), "t_round_num"] = 2
df.loc[(df["t_round"] == "RR2") & (df["t_draw_size"] == 16), "t_round_num"] = 2
df.loc[(df["t_round"] == "RR3") & (df["t_draw_size"] == 8), "t_round_num"] = 3
df.loc[(df["t_round"] == "RR3") & (df["t_draw_size"] == 16), "t_round_num"] = 3
        
df.loc[(df["t_round"] == "R64") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 2
df.loc[(df["t_round"] == "R32") & ((df["t_draw_size"] == 64) | (df["t_draw_size"] == 56) | (df["t_draw_size"] == 48)), "t_round_num"] = 2
df.loc[(df["t_round"] == "R16") & ((df["t_draw_size"] == 32) | (df["t_draw_size"] == 28)), "t_round_num"] = 2
df.loc[(df["t_round"] == "SF") & (df["t_draw_size"] == 8), "t_round_num"] = 4
df.loc[(df["t_round"] == "SF") & (df["t_draw_size"] == 16), "t_round_num"] = 4

df.loc[(df["t_round"] == "R32") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 3
df.loc[(df["t_round"] == "R16") & ((df["t_draw_size"] == 64) | (df["t_draw_size"] == 56) | (df["t_draw_size"] == 48)), "t_round_num"] = 3
df.loc[(df["t_round"] == "QF") & ((df["t_draw_size"] == 32) | (df["t_draw_size"] == 28)) , "t_round_num"] = 3
df.loc[(df["t_round"] == "F") & (df["t_draw_size"] == 8), "t_round_num"] = 5
df.loc[(df["t_round"] == "F") & (df["t_draw_size"] == 16), "t_round_num"] = 5
df.loc[(df["t_round"] == "BR") & (df["t_draw_size"] == 8), "t_round_num"] = 5
df.loc[(df["t_round"] == "BR") & (df["t_draw_size"] == 16), "t_round_num"] = 5
        
df.loc[(df["t_round"] == "R16") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 4
df.loc[(df["t_round"] == "QF") & ((df["t_draw_size"] == 64) | (df["t_draw_size"] == 56) | (df["t_draw_size"] == 48)), "t_round_num"] = 4
df.loc[(df["t_round"] == "SF") & ((df["t_draw_size"] == 32) | (df["t_draw_size"] == 28)), "t_round_num"] = 4

df.loc[(df["t_round"] == "QF") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 5
df.loc[(df["t_round"] == "SF") & ((df["t_draw_size"] == 64) | (df["t_draw_size"] == 56) | (df["t_draw_size"] == 48)), "t_round_num"] = 5
df.loc[(df["t_round"] == "F") & ((df["t_draw_size"] == 32) | (df["t_draw_size"] == 28)), "t_round_num"] = 5  
        
df.loc[(df["t_round"] == "SF") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 6
df.loc[(df["t_round"] == "F") & ((df["t_draw_size"] == 64) | (df["t_draw_size"] == 56) | (df["t_draw_size"] == 48)), "t_round_num"] = 6

df.loc[(df["t_round"] == "F") & ((df["t_draw_size"] == 128) | (df["t_draw_size"] == 96)), "t_round_num"] = 7
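
The round numbering above follows one rule for knockout draws: a round's number is its distance from the first round of that draw size, plus one. A compact sketch of that rule is below, with the round-robin events (draw size 8 or 16) handled by a separate lookup. The names `KO_ROUNDS`, `FIRST_ROUND`, `RR_ROUNDS`, and `round_num` are hypothetical, and unmatched combinations keep the `""` placeholder, as in the cell above.

```python
import pandas as pd

KO_ROUNDS = ["R128", "R64", "R32", "R16", "QF", "SF", "F"]
# First round actually played for each draw size handled above
FIRST_ROUND = {128: "R128", 96: "R128", 64: "R64", 56: "R64", 48: "R64", 32: "R32", 28: "R32"}
# Round-robin events (draw size 8 or 16): Tour Finals and NextGen Finals
RR_ROUNDS = {"RR1": 1, "RR2": 2, "RR3": 3, "SF": 4, "F": 5, "BR": 5}

def round_num(t_round, draw_size):
    """Return the numeric round, or "" when the combination is not covered."""
    if draw_size in (8, 16):
        return RR_ROUNDS.get(t_round, "")
    first = FIRST_ROUND.get(draw_size)
    if first is None or t_round not in KO_ROUNDS:
        return ""
    num = KO_ROUNDS.index(t_round) - KO_ROUNDS.index(first) + 1
    return num if num >= 1 else ""

# Toy frame standing in for df
demo = pd.DataFrame({
    "t_round": ["R128", "F", "QF", "F", "RR2", "BR", "R16"],
    "t_draw_size": [128, 128, 56, 28, 8, 16, 32],
})
demo["t_round_num"] = [round_num(r, d) for r, d in zip(demo["t_round"], demo["t_draw_size"])]
```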

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18168 entries, 11 to 23293
Data columns (total 48 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   t_id         18168 non-null  object 
 1   t_name       18168 non-null  object 
 2   t_surf       18168 non-null  object 
 3   t_draw_size  18168 non-null  int64  
 4   t_lvl        18168 non-null  object 
 5   t_date       18168 non-null  int64  
 6   m_num        18168 non-null  int64  
 7   w_id         18168 non-null  int64  
 8   w_ent        18168 non-null  object 
 9   w_name       18168 non-null  object 
 10  w_hd         18168 non-null  object 
 11  w_ht         17759 non-null  float64
 12  w_ioc        18168 non-null  object 
 13  w_age        18168 non-null  float64
 14  l_id         18168 non-null  int64  
 15  l_ent        18168 non-null  object 
 16  l_name       18168 non-null  object 
 17  l_hd         18168 non-null  object 
 18  l_ht         17095 non-null  float64
 19  l_i

In [32]:
df = df[['t_id','t_date','tour_wk','t_name','t_country','t_surf','t_indoor', 't_alt', 't_lvl','t_draw_size','m_num','t_round','t_round_num','m_best_of','m_score','m_time(m)','w_id','w_name','w_rank','w_rank_pts','w_ioc','w_ent','w_hd','w_ht', 'w_age','w_svpt','w_1stWon','w_2ndWon','w_SvGms','w_ace','w_bpSaved', 'w_bpFaced','l_id','l_name','l_rank','l_rank_pts','l_ioc','l_ent','l_hd','l_ht', 'l_age','l_svpt','l_1stWon','l_2ndWon','l_SvGms','l_ace','l_bpSaved','l_bpFaced']]

In [33]:
# no duplicated records
df.duplicated().value_counts()

False    18168
dtype: int64

In [34]:
# Sorting in this processing stream is done by each player's unique numerical ID. Careful visual inspection was used primarily to check for anomalies, but also to look for unusual ID-player name mappings here
df.value_counts(subset = ["w_id", "w_name"], ascending=True).head(60)

w_id    w_name                  
106288  Mathias Bourgue             1
104770  Martin Fischer              1
106234  Aslan Karatsev              1
104667  Philipp Oswald              1
104619  Jesse Huta Galung           1
106328  Christian Harrison          1
106390  Matteo Donati               1
106393  Frederico Ferreira Silva    1
106420  Gianluigi Quinzi            1
109054  Daniel Masur                1
109303  Stefano Napolitano          1
105302  Ante Pavic                  1
104516  Arnau Brugues Davi          1
104804  Amir Weintraub              1
104476  Louk Sorensen               1
105678  Axel Michon                 1
104371  Ivo Minar                   1
105677  Nikola Cacic                1
117353  Duck Hee Lee                1
117357  Omar Jasika                 1
117360  Marc Polmans                1
117361  Akira Santillan             1
105454  Brydan Klein                1
105472  Nils Langer                 1
104256  Clement Reix                1
104250  Prakash A

In [35]:
# Sorting in this processing stream is done by each player's unique numerical ID. Careful visual inspection was used primarily to check for anomalies, but also to look for unusual ID-player name mappings here
df.value_counts(subset = ["l_id", "l_name"], ascending=True).head(60)

l_id    l_name                      
208260  Zachary Svajda                  1
105738  Tsung Hua Yang                  1
104627  Boy Westerhof                   1
104644  Riccardo Ghedin                 1
105692  Alessandro Bega                 1
126237  Gian Marco Moroni               1
126208  Yusuke Takahashi                1
105688  Manuel Sanchez                  1
109640  Andrea Basso                    1
104677  Artem Sitak                     1
126204  Gerardo Lopez Villasenor        1
105678  Axel Michon                     1
126197  Orlando Luz                     1
105677  Nikola Cacic                    1
104729  Ivan Nedelko                    1
126185  Evgenii Tiurnev                 1
126152  Ryan Shane                      1
104778  Andrey Kumantsov                1
104779  Jamie Baker                     1
104783  Mikhail Ledovskikh              1
126128  Roman Safiullin                 1
126127  Benjamin Bonzi                  1
126627  Johan Nikles                   
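
The visual inspection of ID-name pairs could be backed up by a programmatic check that flags any player ID appearing with more than one name spelling across the winner and loser columns. The following is a sketch on toy data, not the check actually used in this workbook; the column names `p_id` and `p_name` are hypothetical, and the trailing-space variant is planted deliberately.

```python
import pandas as pd

# Toy frame standing in for df; the last loser name carries a trailing space on purpose
demo = pd.DataFrame({
    "w_id": [105678, 105677, 104779],
    "w_name": ["Axel Michon", "Nikola Cacic", "Jamie Baker"],
    "l_id": [105677, 104779, 105678],
    "l_name": ["Nikola Cacic", "Jamie Baker", "Axel Michon "],
})

# Stack winners and losers into one long (p_id, p_name) table
players = pd.concat([
    demo[["w_id", "w_name"]].rename(columns={"w_id": "p_id", "w_name": "p_name"}),
    demo[["l_id", "l_name"]].rename(columns={"l_id": "p_id", "l_name": "p_name"}),
], ignore_index=True)

# Count distinct name spellings per ID; more than one suggests a mapping problem
names_per_id = players.groupby("p_id")["p_name"].nunique()
suspect_ids = names_per_id[names_per_id > 1].index.tolist()
print(suspect_ids)
```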

In [36]:
# a bunch of players are missing height (in centimeters) data, at least in some records. Some will have to be sought out on the ATP website, 
# but some are available from later records for the same player. These players were visually identified outside of Python and filled in here.

df.loc[(df["w_name"] == "Adrian Andreev"), "w_ht"] = 180
df.loc[(df["l_name"] == "Adrian Andreev"), "l_ht"] = 180
df.loc[(df["w_name"] == "Adrian Menendez Maceiras"), "w_ht"] = 183
df.loc[(df["l_name"] == "Adrian Menendez Maceiras"), "l_ht"] = 183
df.loc[(df["w_name"] == "Agustin Velotti"), "w_ht"] = 174
df.loc[(df["l_name"] == "Agustin Velotti"), "l_ht"] = 174
df.loc[(df["w_name"] == "Akira Santillan"), "w_ht"] = 180
df.loc[(df["l_name"] == "Akira Santillan"), "l_ht"] = 180
df.loc[(df["w_name"] == "Alejandro Gomez "), "w_ht"] = 183 #his csv entries have a trailing space for some reason
df.loc[(df["l_name"] == "Alejandro Gomez "), "l_ht"] = 183
df.loc[(df["w_name"] == "Alejandro Gonzalez"), "w_ht"] = 191
df.loc[(df["l_name"] == "Alejandro Gonzalez"), "l_ht"] = 191
df.loc[(df["w_name"] == "Alen Avidzba"), "w_ht"] = 188
df.loc[(df["l_name"] == "Alen Avidzba"), "l_ht"] = 188
df.loc[(df["w_name"] == "Alessandro Bega"), "w_ht"] = 173
df.loc[(df["l_name"] == "Alessandro Bega"), "l_ht"] = 173
df.loc[(df["w_name"] == "Alex Bolt"), "w_ht"] = 183
df.loc[(df["l_name"] == "Alex Bolt"), "l_ht"] = 183
df.loc[(df["w_name"] == "Alexandar Lazarov"), "w_ht"] = 191
df.loc[(df["l_name"] == "Alexandar Lazarov"), "l_ht"] = 191
df.loc[(df["w_name"] == "Alexander Donski"), "w_ht"] = 188
df.loc[(df["l_name"] == "Alexander Donski"), "l_ht"] = 188
df.loc[(df["w_name"] == "Alexander Sarkissian"), "w_ht"] = 191
df.loc[(df["l_name"] == "Alexander Sarkissian"), "l_ht"] = 191
df.loc[(df["w_name"] == "Alexandre Muller"), "w_ht"] = 183
df.loc[(df["l_name"] == "Alexandre Muller"), "l_ht"] = 183
df.loc[(df["w_name"] == "Alexandre Sidorenko"), "w_ht"] = 185
df.loc[(df["l_name"] == "Alexandre Sidorenko"), "l_ht"] = 185
df.loc[(df["w_name"] == "Alexey Vatutin"), "w_ht"] = 178
df.loc[(df["l_name"] == "Alexey Vatutin"), "l_ht"] = 178
df.loc[(df["w_name"] == "Alexios Halebian"), "w_ht"] = 180
df.loc[(df["l_name"] == "Alexios Halebian"), "l_ht"] = 180
df.loc[(df["w_name"] == "Alibek Kachmazov"), "w_ht"] = 180
df.loc[(df["l_name"] == "Alibek Kachmazov"), "l_ht"] = 180
df.loc[(df["w_name"] == "Amine Ahouda"), "w_ht"] = 185
df.loc[(df["l_name"] == "Amine Ahouda"), "l_ht"] = 185
df.loc[(df["w_name"] == "Andrea Arnaboldi"), "w_ht"] = 180
df.loc[(df["l_name"] == "Andrea Arnaboldi"), "l_ht"] = 180
df.loc[(df["w_name"] == "Andrea Basso"), "w_ht"] = 180
df.loc[(df["l_name"] == "Andrea Basso"), "l_ht"] = 180
df.loc[(df["w_name"] == "Andres Artunedo Martinavarro"), "w_ht"] = 183
df.loc[(df["l_name"] == "Andres Artunedo Martinavarro"), "l_ht"] = 183
df.loc[(df["w_name"] == "Andres Molteni"), "w_ht"] = 180
df.loc[(df["l_name"] == "Andres Molteni"), "l_ht"] = 180
df.loc[(df["w_name"] == "Andrew Whittington"), "w_ht"] = 188
df.loc[(df["l_name"] == "Andrew Whittington"), "l_ht"] = 188
df.loc[(df["w_name"] == "Anil Yuksel"), "w_ht"] = 180
df.loc[(df["l_name"] == "Anil Yuksel"), "l_ht"] = 180
df.loc[(df["w_name"] == "Ante Pavic"), "w_ht"] = 196
df.loc[(df["l_name"] == "Ante Pavic"), "l_ht"] = 196
df.loc[(df["w_name"] == "Antoine Bellier"), "w_ht"] = 196
df.loc[(df["l_name"] == "Antoine Bellier"), "l_ht"] = 196
df.loc[(df["w_name"] == "Antoine Hoang"), "w_ht"] = 183
df.loc[(df["l_name"] == "Antoine Hoang"), "l_ht"] = 183
df.loc[(df["w_name"] == "Antonio Veic"), "w_ht"] = 180
df.loc[(df["l_name"] == "Antonio Veic"), "l_ht"] = 180
df.loc[(df["w_name"] == "Arjun Kadhe"), "w_ht"] = 185
df.loc[(df["l_name"] == "Arjun Kadhe"), "l_ht"] = 185
df.loc[(df["w_name"] == "Artem Dubrivnyy"), "w_ht"] = 183
df.loc[(df["l_name"] == "Artem Dubrivnyy"), "l_ht"] = 183
df.loc[(df["w_name"] == "Aleksandar Vukic"), "w_ht"] = 188
df.loc[(df["l_name"] == "Aleksandar Vukic"), "l_ht"] = 188
df.loc[(df["w_name"] == "Arthur De Greef"), "w_ht"] = 183
df.loc[(df["l_name"] == "Arthur De Greef"), "l_ht"] = 183
df.loc[(df["w_name"] == "Austin Krajicek"), "w_ht"] = 188
df.loc[(df["l_name"] == "Austin Krajicek"), "l_ht"] = 188
df.loc[(df["w_name"] == "Austin Smith"), "w_ht"] = 180
df.loc[(df["l_name"] == "Austin Smith"), "l_ht"] = 180
df.loc[(df["w_name"] == "Axel Michon"), "w_ht"] = 180
df.loc[(df["l_name"] == "Axel Michon"), "l_ht"] = 180
df.loc[(df["w_name"] == "Bastian Trinker"), "w_ht"] = 188
df.loc[(df["l_name"] == "Bastian Trinker"), "l_ht"] = 188
df.loc[(df["w_name"] == "Benjamin Bonzi"), "w_ht"] = 183
df.loc[(df["l_name"] == "Benjamin Bonzi"), "l_ht"] = 183
df.loc[(df["w_name"] == "Benjamin Mitchell"), "w_ht"] = 183
df.loc[(df["l_name"] == "Benjamin Mitchell"), "l_ht"] = 183
df.loc[(df["w_name"] == "Bernabe Zapata Miralles"), "w_ht"] = 183
df.loc[(df["l_name"] == "Bernabe Zapata Miralles"), "l_ht"] = 183
df.loc[(df["w_name"] == "Bjorn Fratangelo"), "w_ht"] = 183
df.loc[(df["l_name"] == "Bjorn Fratangelo"), "l_ht"] = 183
df.loc[(df["w_name"] == "Daniel Kosakowski"), "w_ht"] = 185
df.loc[(df["l_name"] == "Daniel Kosakowski"), "l_ht"] = 185
df.loc[(df["w_name"] == "Daniel Garza"), "w_ht"] = 180
df.loc[(df["l_name"] == "Daniel Garza"), "l_ht"] = 180
df.loc[(df["w_name"] == "Daniel Altmaier"), "w_ht"] = 188
df.loc[(df["l_name"] == "Daniel Altmaier"), "l_ht"] = 188
df.loc[(df["w_name"] == "Constant Lestienne"), "w_ht"] = 180
df.loc[(df["l_name"] == "Constant Lestienne"), "l_ht"] = 180
df.loc[(df["w_name"] == "Collin Altamirano"), "w_ht"] = 188
df.loc[(df["l_name"] == "Collin Altamirano"), "l_ht"] = 188
df.loc[(df["w_name"] == "Chun Hsin Tseng"), "w_ht"] = 175
df.loc[(df["l_name"] == "Chun Hsin Tseng"), "l_ht"] = 175
df.loc[(df["w_name"] == "Christopher Eubanks"), "w_ht"] = 201
df.loc[(df["l_name"] == "Christopher Eubanks"), "l_ht"] = 201
df.loc[(df["w_name"] == "Christian Harrison"), "w_ht"] = 180
df.loc[(df["l_name"] == "Christian Harrison"), "l_ht"] = 180
df.loc[(df["w_name"] == "Chase Buchanan"), "w_ht"] = 193
df.loc[(df["l_name"] == "Chase Buchanan"), "l_ht"] = 193
df.loc[(df["w_name"] == "Carlos Taberner"), "w_ht"] = 183
df.loc[(df["l_name"] == "Carlos Taberner"), "l_ht"] = 183
df.loc[(df["w_name"] == "Carl Soderlund"), "w_ht"] = 188
df.loc[(df["l_name"] == "Carl Soderlund"), "l_ht"] = 188
df.loc[(df["w_name"] == "Calvin Hemery"), "w_ht"] = 191
df.loc[(df["l_name"] == "Calvin Hemery"), "l_ht"] = 191
df.loc[(df["w_name"] == "Guillermo Olaso"), "w_ht"] = 175
df.loc[(df["l_name"] == "Guillermo Olaso"), "l_ht"] = 175
df.loc[(df["w_name"] == "Clement Reix"), "w_ht"] = 178
df.loc[(df["l_name"] == "Clement Reix"), "l_ht"] = 178
df.loc[(df["w_name"] == "Brayden Schnur"), "w_ht"] = 193
df.loc[(df["l_name"] == "Brayden Schnur"), "l_ht"] = 193
df.loc[(df["w_name"] == "Bowen Ouyang"), "w_ht"] = 185
df.loc[(df["l_name"] == "Bowen Ouyang"), "l_ht"] = 185
df.loc[(df["w_name"] == "Borna Gojo"), "w_ht"] = 196
df.loc[(df["l_name"] == "Borna Gojo"), "l_ht"] = 196
df.loc[(df["w_name"] == "Blake Mott"), "w_ht"] = 180
df.loc[(df["l_name"] == "Blake Mott"), "l_ht"] = 180
df.loc[(df["w_name"] == "Fabrice Martin"), "w_ht"] = 198
df.loc[(df["l_name"] == "Fabrice Martin"), "l_ht"] = 198
df.loc[(df["w_name"] == "Fabiano De Paula"), "w_ht"] = 178
df.loc[(df["l_name"] == "Fabiano De Paula"), "l_ht"] = 178
df.loc[(df["w_name"] == "Evgeny Karlovskiy"), "w_ht"] = 191
df.loc[(df["l_name"] == "Evgeny Karlovskiy"), "l_ht"] = 191
df.loc[(df["w_name"] == "Evgenii Tiurnev"), "w_ht"] = 191
df.loc[(df["l_name"] == "Evgenii Tiurnev"), "l_ht"] = 191
df.loc[(df["w_name"] == "Ernesto Escobedo"), "w_ht"] = 185
df.loc[(df["l_name"] == "Ernesto Escobedo"), "l_ht"] = 185
df.loc[(df["w_name"] == "Eric Quigley"), "w_ht"] = 185
df.loc[(df["l_name"] == "Eric Quigley"), "l_ht"] = 185
df.loc[(df["w_name"] == "Enrique Lopez Perez"), "w_ht"] = 180
df.loc[(df["l_name"] == "Enrique Lopez Perez"), "l_ht"] = 180
df.loc[(df["w_name"] == "Emilio Nava"), "w_ht"] = 185
df.loc[(df["l_name"] == "Emilio Nava"), "l_ht"] = 185
df.loc[(df["w_name"] == "Emilio Gomez"), "w_ht"] = 185
df.loc[(df["l_name"] == "Emilio Gomez"), "l_ht"] = 185
df.loc[(df["w_name"] == "Emil Reinberg"), "w_ht"] = 188
df.loc[(df["l_name"] == "Emil Reinberg"), "l_ht"] = 188
df.loc[(df["w_name"] == "Elliot Benchetrit"), "w_ht"] = 193
df.loc[(df["l_name"] == "Elliot Benchetrit"), "l_ht"] = 193
df.loc[(df["w_name"] == "Eduardo Struvay"), "w_ht"] = 180
df.loc[(df["l_name"] == "Eduardo Struvay"), "l_ht"] = 180
df.loc[(df["w_name"] == "Dennis Novikov"), "w_ht"] = 193
df.loc[(df["l_name"] == "Dennis Novikov"), "l_ht"] = 193
df.loc[(df["w_name"] == "David Vega Hernandez"), "w_ht"] = 188
df.loc[(df["l_name"] == "David Vega Hernandez"), "l_ht"] = 188
df.loc[(df["w_name"] == "Daniel Munoz de la Nava"), "w_ht"] = 175
df.loc[(df["l_name"] == "Daniel Munoz de la Nava"), "l_ht"] = 175
df.loc[(df["w_name"] == "Daniel Masur"), "w_ht"] = 183
df.loc[(df["l_name"] == "Daniel Masur"), "l_ht"] = 183
df.loc[(df["w_name"] == "Guido Andreozzi"), "w_ht"] = 183
df.loc[(df["l_name"] == "Guido Andreozzi"), "l_ht"] = 183
df.loc[(df["w_name"] == "Gregoire Burquier"), "w_ht"] = 178
df.loc[(df["l_name"] == "Gregoire Burquier"), "l_ht"] = 178
df.loc[(df["w_name"] == "Gleb Sakharov"), "w_ht"] = 185
df.loc[(df["l_name"] == "Gleb Sakharov"), "l_ht"] = 185
df.loc[(df["w_name"] == "Gianluigi Quinzi"), "w_ht"] = 171
df.loc[(df["l_name"] == "Gianluigi Quinzi"), "l_ht"] = 171
df.loc[(df["w_name"] == "Facundo Arguello"), "w_ht"] = 178
df.loc[(df["l_name"] == "Facundo Arguello"), "l_ht"] = 178
df.loc[(df["w_name"] == "Gian Marco Moroni"), "w_ht"] = 185
df.loc[(df["l_name"] == "Gian Marco Moroni"), "l_ht"] = 185
df.loc[(df["w_name"] == "Germain Gigounon"), "w_ht"] = 178
df.loc[(df["l_name"] == "Germain Gigounon"), "l_ht"] = 178
df.loc[(df["w_name"] == "Gerardo Lopez Villasenor"), "w_ht"] = 191
df.loc[(df["l_name"] == "Gerardo Lopez Villasenor"), "l_ht"] = 191
df.loc[(df["w_name"] == "Geoffrey Blancaneaux"), "w_ht"] = 180
df.loc[(df["l_name"] == "Geoffrey Blancaneaux"), "l_ht"] = 180
df.loc[(df["w_name"] == "Frederico Ferreira Silva"), "w_ht"] = 178
df.loc[(df["l_name"] == "Frederico Ferreira Silva"), "l_ht"] = 178
df.loc[(df["w_name"] == "Francisco Cerundolo"), "w_ht"] = 185
df.loc[(df["l_name"] == "Francisco Cerundolo"), "l_ht"] = 185
df.loc[(df["w_name"] == "Filippo Baldi"), "w_ht"] = 180
df.loc[(df["l_name"] == "Filippo Baldi"), "l_ht"] = 180
df.loc[(df["w_name"] == "Filip Horansky"), "w_ht"] = 191
df.loc[(df["l_name"] == "Filip Horansky"), "l_ht"] = 191
df.loc[(df["w_name"] == "Federico Gaio"), "w_ht"] = 178
df.loc[(df["l_name"] == "Federico Gaio"), "l_ht"] = 178
df.loc[(df["w_name"] == "Federico Coria"), "w_ht"] = 180
df.loc[(df["l_name"] == "Federico Coria"), "l_ht"] = 180
df.loc[(df["w_name"] == "Fred Simonsson"), "w_ht"] = 188
df.loc[(df["l_name"] == "Fred Simonsson"), "l_ht"] = 188
df.loc[(df["w_name"] == "John Patrick Smith"), "w_ht"] = 188
df.loc[(df["l_name"] == "John Patrick Smith"), "l_ht"] = 188
df.loc[(df["w_name"] == "Johan Nikles"), "w_ht"] = 173
df.loc[(df["l_name"] == "Johan Nikles"), "l_ht"] = 173
df.loc[(df["w_name"] == "Jenson Brooksby"), "w_ht"] = 193
df.loc[(df["l_name"] == "Jenson Brooksby"), "l_ht"] = 193
df.loc[(df["w_name"] == "Jeevan Nedunchezhiyan"), "w_ht"] = 173
df.loc[(df["l_name"] == "Jeevan Nedunchezhiyan"), "l_ht"] = 173
df.loc[(df["w_name"] == "Jc Aragone"), "w_ht"] = 178
df.loc[(df["l_name"] == "Jc Aragone"), "l_ht"] = 178
df.loc[(df["w_name"] == "Jason Kubler"), "w_ht"] = 178
df.loc[(df["l_name"] == "Jason Kubler"), "l_ht"] = 178
df.loc[(df["w_name"] == "Jason Jung"), "w_ht"] = 180
df.loc[(df["l_name"] == "Jason Jung"), "l_ht"] = 180
df.loc[(df["w_name"] == "Jared Donaldson"), "w_ht"] = 188
df.loc[(df["l_name"] == "Jared Donaldson"), "l_ht"] = 188
df.loc[(df["w_name"] == "Jan Satral"), "w_ht"] = 185
df.loc[(df["l_name"] == "Jan Satral"), "l_ht"] = 185
df.loc[(df["w_name"] == "Jack Mingjie Lin"), "w_ht"] = 178
df.loc[(df["l_name"] == "Jack Mingjie Lin"), "l_ht"] = 178
df.loc[(df["w_name"] == "Jabor Al Mutawa"), "w_ht"] = 178
df.loc[(df["l_name"] == "Jabor Al Mutawa"), "l_ht"] = 178
df.loc[(df["w_name"] == "Isak Arvidsson"), "w_ht"] = 188
df.loc[(df["l_name"] == "Isak Arvidsson"), "l_ht"] = 188
df.loc[(df["w_name"] == "Inigo Cervantes Huegun"), "w_ht"] = 183
df.loc[(df["l_name"] == "Inigo Cervantes Huegun"), "l_ht"] = 183
df.loc[(df["w_name"] == "Hugo Gaston"), "w_ht"] = 173
df.loc[(df["l_name"] == "Hugo Gaston"), "l_ht"] = 173
df.loc[(df["w_name"] == "Hicham Khaddari"), "w_ht"] = 180
df.loc[(df["l_name"] == "Hicham Khaddari"), "l_ht"] = 180
df.loc[(df["w_name"] == "Jordi Samper Montana"), "w_ht"] = 178
df.loc[(df["l_name"] == "Jordi Samper Montana"), "l_ht"] = 178
df.loc[(df["w_name"] == "Joris De Loore"), "w_ht"] = 191
df.loc[(df["l_name"] == "Joris De Loore"), "l_ht"] = 191
df.loc[(df["w_name"] == "Juan Ignacio Londero"), "w_ht"] = 180
df.loc[(df["l_name"] == "Juan Ignacio Londero"), "l_ht"] = 180
df.loc[(df["w_name"] == "Julian Lenz"), "w_ht"] = 188
df.loc[(df["l_name"] == "Julian Lenz"), "l_ht"] = 188
df.loc[(df["w_name"] == "Jurij Rodionov"), "w_ht"] = 191
df.loc[(df["l_name"] == "Jurij Rodionov"), "l_ht"] = 191
df.loc[(df["w_name"] == "Kamil Majchrzak"), "w_ht"] = 180
df.loc[(df["l_name"] == "Kamil Majchrzak"), "l_ht"] = 180
df.loc[(df["w_name"] == "Karim Hossam"), "w_ht"] = 178
df.loc[(df["l_name"] == "Karim Hossam"), "l_ht"] = 178
df.loc[(df["w_name"] == "Kento Takeuchi"), "w_ht"] = 180
df.loc[(df["l_name"] == "Kento Takeuchi"), "l_ht"] = 180
df.loc[(df["w_name"] == "Kevin King"), "w_ht"] = 191
df.loc[(df["l_name"] == "Kevin King"), "l_ht"] = 191
df.loc[(df["w_name"] == "Laurent Lokoli"), "w_ht"] = 188
df.loc[(df["l_name"] == "Laurent Lokoli"), "l_ht"] = 188
df.loc[(df["w_name"] == "Liam Broady"), "w_ht"] = 183
df.loc[(df["l_name"] == "Liam Broady"), "l_ht"] = 183
df.loc[(df["w_name"] == "Liam Caruana"), "w_ht"] = 178
df.loc[(df["l_name"] == "Liam Caruana"), "l_ht"] = 178
df.loc[(df["w_name"] == "Lorenzo Giustino"), "w_ht"] = 180
df.loc[(df["l_name"] == "Lorenzo Giustino"), "l_ht"] = 180
df.loc[(df["w_name"] == "Louis Wessels"), "w_ht"] = 198
df.loc[(df["l_name"] == "Louis Wessels"), "l_ht"] = 198
df.loc[(df["w_name"] == "Luca Vanni"), "w_ht"] = 198
df.loc[(df["l_name"] == "Luca Vanni"), "l_ht"] = 198
df.loc[(df["w_name"] == "Lucas Catarina"), "w_ht"] = 185
df.loc[(df["l_name"] == "Lucas Catarina"), "l_ht"] = 185
df.loc[(df["w_name"] == "Lucas Miedler"), "w_ht"] = 183
df.loc[(df["l_name"] == "Lucas Miedler"), "l_ht"] = 183
df.loc[(df["w_name"] == "Luis Patino"), "w_ht"] = 183
df.loc[(df["l_name"] == "Luis Patino"), "l_ht"] = 183
df.loc[(df["w_name"] == "Luke Saville"), "w_ht"] = 188
df.loc[(df["l_name"] == "Luke Saville"), "l_ht"] = 188
df.loc[(df["w_name"] == "Mackenzie Mcdonald"), "w_ht"] = 178
df.loc[(df["l_name"] == "Mackenzie Mcdonald"), "l_ht"] = 178
df.loc[(df["w_name"] == "Marc Andrea Huesler"), "w_ht"] = 196
df.loc[(df["l_name"] == "Marc Andrea Huesler"), "l_ht"] = 196
df.loc[(df["w_name"] == "Marc Polmans"), "w_ht"] = 188
df.loc[(df["l_name"] == "Marc Polmans"), "l_ht"] = 188
df.loc[(df["w_name"] == "Marcelo Demoliner"), "w_ht"] = 191
df.loc[(df["l_name"] == "Marcelo Demoliner"), "l_ht"] = 191
df.loc[(df["w_name"] == "Marco Trungelliti"), "w_ht"] = 178
df.loc[(df["l_name"] == "Marco Trungelliti"), "l_ht"] = 178
df.loc[(df["w_name"] == "Marko Tepavac"), "w_ht"] = 193
df.loc[(df["l_name"] == "Marko Tepavac"), "l_ht"] = 193
df.loc[(df["w_name"] == "Martin Alund"), "w_ht"] = 180
df.loc[(df["l_name"] == "Martin Alund"), "l_ht"] = 180
df.loc[(df["w_name"] == "Martin Vaisse"), "w_ht"] = 178
df.loc[(df["l_name"] == "Martin Vaisse"), "l_ht"] = 178
df.loc[(df["w_name"] == "Marvin Moeller"), "w_ht"] = 180
df.loc[(df["l_name"] == "Marvin Moeller"), "l_ht"] = 180
df.loc[(df["w_name"] == "Mate Valkusz"), "w_ht"] = 183
df.loc[(df["l_name"] == "Mate Valkusz"), "l_ht"] = 183
df.loc[(df["w_name"] == "Mathias Bourgue"), "w_ht"] = 188
df.loc[(df["l_name"] == "Mathias Bourgue"), "l_ht"] = 188
df.loc[(df["w_name"] == "Matteo Donati"), "w_ht"] = 185
df.loc[(df["l_name"] == "Matteo Donati"), "l_ht"] = 185
df.loc[(df["w_name"] == "Matteo Trevisan"), "w_ht"] = 183
df.loc[(df["l_name"] == "Matteo Trevisan"), "l_ht"] = 183
df.loc[(df["w_name"] == "Matteo Viola"), "w_ht"] = 185
df.loc[(df["l_name"] == "Matteo Viola"), "l_ht"] = 185
df.loc[(df["w_name"] == "Matthew Barton"), "w_ht"] = 191
df.loc[(df["l_name"] == "Matthew Barton"), "l_ht"] = 191
df.loc[(df["w_name"] == "Maxime Hamou"), "w_ht"] = 185
df.loc[(df["l_name"] == "Maxime Hamou"), "l_ht"] = 185
df.loc[(df["w_name"] == "Maxime Janvier"), "w_ht"] = 196
df.loc[(df["l_name"] == "Maxime Janvier"), "l_ht"] = 196
df.loc[(df["w_name"] == "Maxime Teixeira"), "w_ht"] = 188
df.loc[(df["l_name"] == "Maxime Teixeira"), "l_ht"] = 188
df.loc[(df["w_name"] == "Maximilian Marterer"), "w_ht"] = 188
df.loc[(df["l_name"] == "Maximilian Marterer"), "l_ht"] = 188
df.loc[(df["w_name"] == "Michael Linzer"), "w_ht"] = 180
df.loc[(df["l_name"] == "Michael Linzer"), "l_ht"] = 180
df.loc[(df["w_name"] == "Michael Mmoh"), "w_ht"] = 188
df.loc[(df["l_name"] == "Michael Mmoh"), "l_ht"] = 188
df.loc[(df["w_name"] == "Mikael Torpegaard"), "w_ht"] = 193
df.loc[(df["l_name"] == "Mikael Torpegaard"), "l_ht"] = 193
df.loc[(df["w_name"] == "Miljan Zekic"), "w_ht"] = 185
df.loc[(df["l_name"] == "Miljan Zekic"), "l_ht"] = 185
df.loc[(df["w_name"] == "Mitchell Krueger"), "w_ht"] = 188
df.loc[(df["l_name"] == "Mitchell Krueger"), "l_ht"] = 188
df.loc[(df["w_name"] == "Mousa Shanan Zayed"), "w_ht"] = 180
df.loc[(df["l_name"] == "Mousa Shanan Zayed"), "l_ht"] = 180
df.loc[(df["w_name"] == "Mubarak Shannan Zayid"), "w_ht"] = 178
df.loc[(df["l_name"] == "Mubarak Shannan Zayid"), "l_ht"] = 178
df.loc[(df["w_name"] == "N Vijay Sundar Prashanth"), "w_ht"] = 178
df.loc[(df["l_name"] == "N Vijay Sundar Prashanth"), "l_ht"] = 178
df.loc[(df["w_name"] == "Nathan Pasha"), "w_ht"] = 191
df.loc[(df["l_name"] == "Nathan Pasha"), "l_ht"] = 191
df.loc[(df["w_name"] == "Nicola Kuhn"), "w_ht"] = 185
df.loc[(df["l_name"] == "Nicola Kuhn"), "l_ht"] = 185
df.loc[(df["w_name"] == "Nicolas Kicker"), "w_ht"] = 178
df.loc[(df["l_name"] == "Nicolas Kicker"), "l_ht"] = 178
df.loc[(df["w_name"] == "Nikola Cacic"), "w_ht"] = 183
df.loc[(df["l_name"] == "Nikola Cacic"), "l_ht"] = 183
df.loc[(df["w_name"] == "Nino Serdarusic"), "w_ht"] = 191
df.loc[(df["l_name"] == "Nino Serdarusic"), "l_ht"] = 191
df.loc[(df["w_name"] == "Noah Rubin"), "w_ht"] = 175
df.loc[(df["l_name"] == "Noah Rubin"), "l_ht"] = 175
df.loc[(df["w_name"] == "Oliver Anderson"), "w_ht"] = 175
df.loc[(df["l_name"] == "Oliver Anderson"), "l_ht"] = 175
df.loc[(df["w_name"] == "Omar Awadhy"), "w_ht"] = 178
df.loc[(df["l_name"] == "Omar Awadhy"), "l_ht"] = 178
df.loc[(df["w_name"] == "Omar Jasika"), "w_ht"] = 183
df.loc[(df["l_name"] == "Omar Jasika"), "l_ht"] = 183
df.loc[(df["w_name"] == "Oriol Roca Batalla"), "w_ht"] = 173
df.loc[(df["l_name"] == "Oriol Roca Batalla"), "l_ht"] = 173
df.loc[(df["w_name"] == "Orlando Luz"), "w_ht"] = 180
df.loc[(df["l_name"] == "Orlando Luz"), "l_ht"] = 180
df.loc[(df["w_name"] == "Oscar Otte"), "w_ht"] = 193
df.loc[(df["l_name"] == "Oscar Otte"), "l_ht"] = 193
df.loc[(df["w_name"] == "Patrick Ciorcila"), "w_ht"] = 180
df.loc[(df["l_name"] == "Patrick Ciorcila"), "l_ht"] = 180
df.loc[(df["w_name"] == "Patrick Kypson"), "w_ht"] = 188
df.loc[(df["l_name"] == "Patrick Kypson"), "l_ht"] = 188
df.loc[(df["w_name"] == "Patrik Rosenholm"), "w_ht"] = 178
df.loc[(df["l_name"] == "Patrik Rosenholm"), "l_ht"] = 178
df.loc[(df["w_name"] == "Pedja Krstin"), "w_ht"] = 183
df.loc[(df["l_name"] == "Pedja Krstin"), "l_ht"] = 183
df.loc[(df["w_name"] == "Pedro Cachin"), "w_ht"] = 185
df.loc[(df["l_name"] == "Pedro Cachin"), "l_ht"] = 185
df.loc[(df["w_name"] == "Pedro Martinez"), "w_ht"] = 185
df.loc[(df["l_name"] == "Pedro Martinez"), "l_ht"] = 185
df.loc[(df["w_name"] == "Pedro Sakamoto"), "w_ht"] = 180
df.loc[(df["l_name"] == "Pedro Sakamoto"), "l_ht"] = 180
df.loc[(df["w_name"] == "Peter Torebko"), "w_ht"] = 185
df.loc[(df["l_name"] == "Peter Torebko"), "l_ht"] = 185
df.loc[(df["w_name"] == "Petros Chrysochos"), "w_ht"] = 185
df.loc[(df["l_name"] == "Petros Chrysochos"), "l_ht"] = 185
df.loc[(df["w_name"] == "Philip Davydenko"), "w_ht"] = 183
df.loc[(df["l_name"] == "Philip Davydenko"), "l_ht"] = 183
df.loc[(df["w_name"] == "Prajnesh Gunneswaran"), "w_ht"] = 188
df.loc[(df["l_name"] == "Prajnesh Gunneswaran"), "l_ht"] = 188
df.loc[(df["w_name"] == "Quentin Halys"), "w_ht"] = 191
df.loc[(df["l_name"] == "Quentin Halys"), "l_ht"] = 191
df.loc[(df["w_name"] == "Rayane Roumane"), "w_ht"] = 193
df.loc[(df["l_name"] == "Rayane Roumane"), "l_ht"] = 193
df.loc[(df["w_name"] == "Raymond Sarmiento"), "w_ht"] = 178
df.loc[(df["l_name"] == "Raymond Sarmiento"), "l_ht"] = 178
df.loc[(df["w_name"] == "Rhyne Williams"), "w_ht"] = 185
df.loc[(df["l_name"] == "Rhyne Williams"), "l_ht"] = 185
df.loc[(df["w_name"] == "Ricardo Ojeda Lara"), "w_ht"] = 178
df.loc[(df["l_name"] == "Ricardo Ojeda Lara"), "l_ht"] = 178
df.loc[(df["w_name"] == "Riccardo Bellotti"), "w_ht"] = 179
df.loc[(df["l_name"] == "Riccardo Bellotti"), "l_ht"] = 179
df.loc[(df["w_name"] == "Roberto Marcora"), "w_ht"] = 193
df.loc[(df["l_name"] == "Roberto Marcora"), "l_ht"] = 193
df.loc[(df["w_name"] == "Roberto Ortega Olmedo"), "w_ht"] = 168
df.loc[(df["l_name"] == "Roberto Ortega Olmedo"), "l_ht"] = 168
df.loc[(df["w_name"] == "Roman Safiullin"), "w_ht"] = 185
df.loc[(df["l_name"] == "Roman Safiullin"), "l_ht"] = 185
df.loc[(df["w_name"] == "Rudolf Molleker"), "w_ht"] = 185
df.loc[(df["l_name"] == "Rudolf Molleker"), "l_ht"] = 185
df.loc[(df["w_name"] == "Ryan Shane"), "w_ht"] = 193
df.loc[(df["l_name"] == "Ryan Shane"), "l_ht"] = 193
df.loc[(df["w_name"] == "Sebastian Ofner"), "w_ht"] = 191
df.loc[(df["l_name"] == "Sebastian Ofner"), "l_ht"] = 191
df.loc[(df["w_name"] == "Sekou Bangoura"), "w_ht"] = 183
df.loc[(df["l_name"] == "Sekou Bangoura"), "l_ht"] = 183
df.loc[(df["w_name"] == "Stefan Kozlov"), "w_ht"] = 183
df.loc[(df["l_name"] == "Stefan Kozlov"), "l_ht"] = 183
df.loc[(df["w_name"] == "Stefano Napolitano"), "w_ht"] = 196
df.loc[(df["l_name"] == "Stefano Napolitano"), "l_ht"] = 196
df.loc[(df["w_name"] == "Steven Diez"), "w_ht"] = 175
df.loc[(df["l_name"] == "Steven Diez"), "l_ht"] = 175
df.loc[(df["w_name"] == "Sumit Nagal"), "w_ht"] = 178
df.loc[(df["l_name"] == "Sumit Nagal"), "l_ht"] = 178
df.loc[(df["w_name"] == "Takanyi Garanganga"), "w_ht"] = 185
df.loc[(df["l_name"] == "Takanyi Garanganga"), "l_ht"] = 185
df.loc[(df["w_name"] == "Takuto Niki"), "w_ht"] = 183
df.loc[(df["l_name"] == "Takuto Niki"), "l_ht"] = 183
df.loc[(df["w_name"] == "Tallon Griekspoor"), "w_ht"] = 193
df.loc[(df["l_name"] == "Tallon Griekspoor"), "l_ht"] = 193
df.loc[(df["w_name"] == "Thai Son Kwiatkowski"), "w_ht"] = 188
df.loc[(df["l_name"] == "Thai Son Kwiatkowski"), "l_ht"] = 188
df.loc[(df["w_name"] == "Thiago Seyboth Wild"), "w_ht"] = 179
df.loc[(df["l_name"] == "Thiago Seyboth Wild"), "l_ht"] = 179
df.loc[(df["w_name"] == "Thomas Fabbiano"), "w_ht"] = 173
df.loc[(df["l_name"] == "Thomas Fabbiano"), "l_ht"] = 173
df.loc[(df["w_name"] == "Tim Puetz"), "w_ht"] = 185
df.loc[(df["l_name"] == "Tim Puetz"), "l_ht"] = 185
df.loc[(df["w_name"] == "Tim Van Rijthoven"), "w_ht"] = 188
df.loc[(df["l_name"] == "Tim Van Rijthoven"), "l_ht"] = 188
df.loc[(df["w_name"] == "Toni Androic"), "w_ht"] = 168
df.loc[(df["l_name"] == "Toni Androic"), "l_ht"] = 168
df.loc[(df["w_name"] == "Tristan Lamasine"), "w_ht"] = 183
df.loc[(df["l_name"] == "Tristan Lamasine"), "l_ht"] = 183
df.loc[(df["w_name"] == "Vaclav Safranek"), "w_ht"] = 183
df.loc[(df["l_name"] == "Vaclav Safranek"), "l_ht"] = 183
df.loc[(df["w_name"] == "Victor Baluda"), "w_ht"] = 179
df.loc[(df["l_name"] == "Victor Baluda"), "l_ht"] = 179
df.loc[(df["w_name"] == "Viktor Galovic"), "w_ht"] = 193
df.loc[(df["l_name"] == "Viktor Galovic"), "l_ht"] = 193
df.loc[(df["w_name"] == "Xin Gao"), "w_ht"] = 180
df.loc[(df["l_name"] == "Xin Gao"), "l_ht"] = 180
df.loc[(df["w_name"] == "Yan Bai"), "w_ht"] = 185
df.loc[(df["l_name"] == "Yan Bai"), "l_ht"] = 185
df.loc[(df["w_name"] == "Yaraslav Shyla"), "w_ht"] = 191
df.loc[(df["l_name"] == "Yaraslav Shyla"), "l_ht"] = 191
df.loc[(df["w_name"] == "Yassine Idmbarek"), "w_ht"] = 180
df.loc[(df["l_name"] == "Yassine Idmbarek"), "l_ht"] = 180
df.loc[(df["w_name"] == "Yibing Wu"), "w_ht"] = 183
df.loc[(df["l_name"] == "Yibing Wu"), "l_ht"] = 183
df.loc[(df["w_name"] == "Yosuke Watanuki"), "w_ht"] = 180
df.loc[(df["l_name"] == "Yosuke Watanuki"), "l_ht"] = 180
df.loc[(df["w_name"] == "Yusuke Takahashi"), "w_ht"] = 170
df.loc[(df["l_name"] == "Yusuke Takahashi"), "l_ht"] = 170
df.loc[(df["w_name"] == "Zachary Svajda"), "w_ht"] = 175
df.loc[(df["l_name"] == "Zachary Svajda"), "l_ht"] = 175
df.loc[(df["w_name"] == "Zdenek Kolar"), "w_ht"] = 185
df.loc[(df["l_name"] == "Zdenek Kolar"), "l_ht"] = 185
df.loc[(df["w_name"] == "Zsombor Piros"), "w_ht"] = 180
df.loc[(df["l_name"] == "Zsombor Piros"), "l_ht"] = 180
df.loc[(df["w_name"] == "Nils Langer"), "w_ht"] = 193
df.loc[(df["l_name"] == "Nils Langer"), "l_ht"] = 193
df.loc[(df["w_name"] == "Jaroslav Pospisil"), "w_ht"] = 178
df.loc[(df["l_name"] == "Jaroslav Pospisil"), "l_ht"] = 178
df.loc[(df["w_name"] == "Dimitar Kutrovsky"), "w_ht"] = 175
df.loc[(df["l_name"] == "Dimitar Kutrovsky"), "l_ht"] = 175
df.loc[(df["w_name"] == "Javier Marti"), "w_ht"] = 185
df.loc[(df["l_name"] == "Javier Marti"), "l_ht"] = 185
df.loc[(df["w_name"] == "Sergio Gutierrez Ferrol"), "w_ht"] = 178
df.loc[(df["l_name"] == "Sergio Gutierrez Ferrol"), "l_ht"] = 178
df.loc[(df["w_name"] == "Alexander Ward"), "w_ht"] = 185
df.loc[(df["l_name"] == "Alexander Ward"), "l_ht"] = 185
df.loc[(df["w_name"] == "Andrey Kumantsov"), "w_ht"] = 180
df.loc[(df["l_name"] == "Andrey Kumantsov"), "l_ht"] = 180
df.loc[(df["w_name"] == "Ariez Elyaas Deen Heshaam"), "w_ht"] = 178
df.loc[(df["l_name"] == "Ariez Elyaas Deen Heshaam"), "l_ht"] = 178
df.loc[(df["w_name"] == "Boy Westerhof"), "w_ht"] = 185
df.loc[(df["l_name"] == "Boy Westerhof"), "l_ht"] = 185
df.loc[(df["w_name"] == "Carlos Gomez Herrera"), "w_ht"] = 191
df.loc[(df["l_name"] == "Carlos Gomez Herrera"), "l_ht"] = 191
df.loc[(df["w_name"] == "Dennis Lajola"), "w_ht"] = 179
df.loc[(df["l_name"] == "Dennis Lajola"), "l_ht"] = 179
df.loc[(df["w_name"] == "Dino Marcan"), "w_ht"] = 178
df.loc[(df["l_name"] == "Dino Marcan"), "l_ht"] = 178
df.loc[(df["w_name"] == "Erik Chvojka"), "w_ht"] = 185
df.loc[(df["l_name"] == "Erik Chvojka"), "l_ht"] = 185
df.loc[(df["w_name"] == "Filip Veger"), "w_ht"] = 178
df.loc[(df["l_name"] == "Filip Veger"), "l_ht"] = 178
df.loc[(df["w_name"] == "Gerard Granollers"), "w_ht"] = 183
df.loc[(df["l_name"] == "Gerard Granollers"), "l_ht"] = 183
df.loc[(df["w_name"] == "Gianluca Naso"), "w_ht"] = 193
df.loc[(df["l_name"] == "Gianluca Naso"), "l_ht"] = 193
df.loc[(df["w_name"] == "Ivan Nedelko"), "w_ht"] = 185
df.loc[(df["l_name"] == "Ivan Nedelko"), "l_ht"] = 185
df.loc[(df["w_name"] == "Jonathan Dasnieres De Veigy"), "w_ht"] = 175
df.loc[(df["l_name"] == "Jonathan Dasnieres De Veigy"), "l_ht"] = 175
df.loc[(df["w_name"] == "Josko Topic"), "w_ht"] = 178
df.loc[(df["l_name"] == "Josko Topic"), "l_ht"] = 178
df.loc[(df["w_name"] == "Kristijan Mesaros"), "w_ht"] = 188
df.loc[(df["l_name"] == "Kristijan Mesaros"), "l_ht"] = 188
df.loc[(df["w_name"] == "Marko Djokovic"), "w_ht"] = 187
df.loc[(df["l_name"] == "Marko Djokovic"), "l_ht"] = 187
df.loc[(df["w_name"] == "Maxime Authom"), "w_ht"] = 180
df.loc[(df["l_name"] == "Maxime Authom"), "l_ht"] = 180
df.loc[(df["w_name"] == "Mehdi Ziadi"), "w_ht"] = 180
df.loc[(df["l_name"] == "Mehdi Ziadi"), "l_ht"] = 180
df.loc[(df["w_name"] == "Mikhail Biryukov"), "w_ht"] = 178
df.loc[(df["l_name"] == "Mikhail Biryukov"), "l_ht"] = 178
df.loc[(df["w_name"] == "Milos Sekulic"), "w_ht"] = 178
df.loc[(df["l_name"] == "Milos Sekulic"), "l_ht"] = 178
df.loc[(df["w_name"] == "Nicolas Meister"), "w_ht"] = 180
df.loc[(df["l_name"] == "Nicolas Meister"), "l_ht"] = 180
df.loc[(df["w_name"] == "Nikolai Fidirko"), "w_ht"] = 185
df.loc[(df["l_name"] == "Nikolai Fidirko"), "l_ht"] = 185
df.loc[(df["w_name"] == "Peerakiat Siriluethaiwattana"), "w_ht"] = 175
df.loc[(df["l_name"] == "Peerakiat Siriluethaiwattana"), "l_ht"] = 175
df.loc[(df["w_name"] == "Robin Kern"), "w_ht"] = 188
df.loc[(df["l_name"] == "Robin Kern"), "l_ht"] = 188
df.loc[(df["w_name"] == "Romain Bogaerts"), "w_ht"] = 188
df.loc[(df["l_name"] == "Romain Bogaerts"), "l_ht"] = 188
df.loc[(df["w_name"] == "Sergey Betov"), "w_ht"] = 185
df.loc[(df["l_name"] == "Sergey Betov"), "l_ht"] = 185
df.loc[(df["w_name"] == "Suk Young Jeong"), "w_ht"] = 177
df.loc[(df["l_name"] == "Suk Young Jeong"), "l_ht"] = 177
df.loc[(df["w_name"] == "Walter Trusendi"), "w_ht"] = 179
df.loc[(df["l_name"] == "Walter Trusendi"), "l_ht"] = 179
df.loc[(df["w_name"] == "Wishaya Trongcharoenchaikul"), "w_ht"] = 193
df.loc[(df["l_name"] == "Wishaya Trongcharoenchaikul"), "l_ht"] = 193
df.loc[(df["w_name"] == "Yannick Mertens"), "w_ht"] = 185
df.loc[(df["l_name"] == "Yannick Mertens"), "l_ht"] = 185
df.loc[(df["w_name"] == "Younes Rachidi"), "w_ht"] = 177
df.loc[(df["l_name"] == "Younes Rachidi"), "l_ht"] = 177
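
The paired `w_ht`/`l_ht` assignments above all follow one pattern, so they could also be driven from a single lookup table. The following is only a minimal sketch under that assumption: `HEIGHT_CM`, `fill_player_attr`, and the `demo` frame are hypothetical names introduced for illustration, and only three of the heights from the cell above are copied in.

```python
import pandas as pd

# Hypothetical lookup table; these three heights are copied from the cell above.
HEIGHT_CM = {
    "Younes Rachidi": 177,
    "Yannick Mertens": 185,
    "Gianluca Naso": 193,
}


def fill_player_attr(frame, lookup, attr):
    """Write lookup values into w_<attr> and l_<attr> for every listed player name."""
    for side in ("w", "l"):
        mapped = frame[f"{side}_name"].map(lookup)
        # Like the .loc assignments: listed players are overwritten,
        # everyone else keeps the value already in the column.
        frame[f"{side}_{attr}"] = mapped.where(mapped.notna(), frame[f"{side}_{attr}"])
    return frame


# Made-up rows, only to exercise the helper.
demo = pd.DataFrame({
    "w_name": ["Younes Rachidi", "Some Other Player"],
    "l_name": ["Gianluca Naso", "Yannick Mertens"],
    "w_ht": [float("nan"), 185.0],
    "l_ht": [float("nan"), float("nan")],
})
demo = fill_player_attr(demo, HEIGHT_CM, "ht")
```

With a table like this, each player is listed once, which avoids the winner and loser lines drifting apart when a value is edited.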

In [37]:
# Some players are missing handedness data; fill it in here from the ATP site.
df.loc[(df["w_name"] == "Takuto Niki"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Takuto Niki"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Eric Quigley"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Eric Quigley"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Jose Hernandez"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Jose Hernandez"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Jordi Samper Montana"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Jordi Samper Montana"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Alexandar Lazov"), "w_hd"] = "L"
df.loc[(df["l_name"] == "Alexandar Lazov"), "l_hd"] = "L"
df.loc[(df["w_name"] == "Roberto Ortega Olmedo"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Roberto Ortega Olmedo"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Alejandro Gomez "), "w_hd"] = "R"  # his name has a trailing space in the source CSVs
df.loc[(df["l_name"] == "Alejandro Gomez "), "l_hd"] = "R"
df.loc[(df["w_name"] == "Guillermo Olaso"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Guillermo Olaso"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Clement Reix"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Clement Reix"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Tigre Hank"), "w_hd"] = "L"
df.loc[(df["l_name"] == "Tigre Hank"), "l_hd"] = "L"
df.loc[(df["w_name"] == "Philip Davydenko"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Philip Davydenko"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Xin Gao"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Xin Gao"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Lucas Gomez"), "w_hd"] = "L"
df.loc[(df["l_name"] == "Lucas Gomez"), "l_hd"] = "L"
df.loc[(df["w_name"] == "Blake Mott"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Blake Mott"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Kevin King"), "w_hd"] = "L"
df.loc[(df["l_name"] == "Kevin King"), "l_hd"] = "L"
df.loc[(df["w_name"] == "Nathan Pasha"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Nathan Pasha"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Oliver Anderson"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Oliver Anderson"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Edan Leshem"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Edan Leshem"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Nino Serdarusic"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Nino Serdarusic"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Ryan Shane"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Ryan Shane"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Gerardo Lopez Villasenor"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Gerardo Lopez Villasenor"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Antoine Bellier"), "w_hd"] = "L"
df.loc[(df["l_name"] == "Antoine Bellier"), "l_hd"] = "L"
df.loc[(df["w_name"] == "Amine Ahouda"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Amine Ahouda"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Marvin Moeller"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Marvin Moeller"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Nils Langer"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Nils Langer"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Jaroslav Pospisil"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Jaroslav Pospisil"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Dimitar Kutrovsky"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Dimitar Kutrovsky"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Javier Marti"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Javier Marti"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Sergio Gutierrez Ferrol"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Sergio Gutierrez Ferrol"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Alexander Ward"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Alexander Ward"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Josko Topic"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Josko Topic"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Nicolas Meister"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Nicolas Meister"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Romain Bogaerts"), "w_hd"] = "L"
df.loc[(df["l_name"] == "Romain Bogaerts"), "l_hd"] = "L"
df.loc[(df["w_name"] == "Walter Trusendi"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Walter Trusendi"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Martin Vaisse"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Martin Vaisse"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Mousa Shanan Zayed"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Mousa Shanan Zayed"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Mubarak Shannan Zayid"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Mubarak Shannan Zayid"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Yassine Idmbarek"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Yassine Idmbarek"), "l_hd"] = "R"
df.loc[(df["w_name"] == "Austin Smith"), "w_hd"] = "R"
df.loc[(df["l_name"] == "Austin Smith"), "l_hd"] = "R"
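
The "Alejandro Gomez " entry above only matches because the trailing space is typed into the comparison string. As a hedged sketch (not part of the original workflow), names carrying stray whitespace could be listed up front so that such cases are found systematically. `names_with_stray_whitespace` and the `demo` frame are hypothetical and exist only for illustration.

```python
import pandas as pd


def names_with_stray_whitespace(frame, cols=("w_name", "l_name")):
    """Return the sorted, unique names that change when leading/trailing whitespace is stripped."""
    names = pd.concat([frame[c] for c in cols]).dropna().astype(str)
    return sorted(names[names != names.str.strip()].unique())


# Made-up rows reproducing the trailing-space case noted above.
demo = pd.DataFrame({
    "w_name": ["Alejandro Gomez ", "Kevin King"],
    "l_name": ["Kevin King", "Alejandro Gomez "],
})
stray = names_with_stray_whitespace(demo)
```

Running the same helper on `df` would show whether any other player names need the same treatment before exact-match fills like the ones above.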

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18168 entries, 11 to 23293
Data columns (total 48 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   t_id         18168 non-null  object 
 1   t_date       18168 non-null  int64  
 2   tour_wk      18168 non-null  object 
 3   t_name       18168 non-null  object 
 4   t_country    18168 non-null  object 
 5   t_surf       18168 non-null  object 
 6   t_indoor     18168 non-null  object 
 7   t_alt        18168 non-null  object 
 8   t_lvl        18168 non-null  object 
 9   t_draw_size  18168 non-null  int64  
 10  m_num        18168 non-null  int64  
 11  t_round      18168 non-null  object 
 12  t_round_num  18168 non-null  object 
 13  m_best_of    18168 non-null  int64  
 14  m_score      18168 non-null  object 
 15  m_time(m)    18168 non-null  float64
 16  w_id         18168 non-null  int64  
 17  w_name       18168 non-null  object 
 18  w_rank       18159 non-null  float64
 19  w_r
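
Because the `df.info()` listing covers all 48 columns, a narrower check on just the filled-in fields can be easier to read. This is a minimal sketch, assuming the height and handedness columns are named `w_ht`, `l_ht`, `w_hd`, and `l_hd` as in the cells above; `missing_report` and the `demo` frame are hypothetical.

```python
import pandas as pd


def missing_report(frame, cols):
    """Count missing values per column, skipping any column the frame does not have."""
    present = [c for c in cols if c in frame.columns]
    return frame[present].isna().sum()


# Made-up rows, only to exercise the helper.
demo = pd.DataFrame({
    "w_ht": [177.0, None],
    "l_ht": [None, None],
    "w_hd": ["R", "L"],
})
report = missing_report(demo, ["w_ht", "l_ht", "w_hd", "l_hd"])
```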

In [39]:
# Save current df for the next stage (feature creation); 2012-2019 is the hard court range
df.to_csv('../data/cleaned_data_for_FeatureDev_2012to2019.csv', index=False)

In [40]:
# Save current df for the next stage (feature creation); 2009-2019 is the clay court range
#df.to_csv('../data/cleaned_data_for_FeatureDev_2009to2019.csv', index=False)
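
The two save cells differ only in the year range embedded in the file name, and one of them has to be commented out by hand. One possible alternative, sketched here under the assumption that the file-name pattern stays as shown above, is to build the path from the range. `feature_dev_path` is a hypothetical helper, not something defined elsewhere in this workbook.

```python
from pathlib import Path


def feature_dev_path(start_year, end_year, data_dir="../data"):
    """Build the export path used by the save cells for a given inclusion range."""
    return Path(data_dir) / f"cleaned_data_for_FeatureDev_{start_year}to{end_year}.csv"


hard_path = feature_dev_path(2012, 2019)  # hard court range
clay_path = feature_dev_path(2009, 2019)  # clay court range

# Usage, with df being the cleaned frame from the cells above:
# df.to_csv(hard_path, index=False)
```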