# 1. Cleaning and Wrangling 

1st Workbook for the Association of Tennis Professionals (ATP) men's singles predictive modeling project:

* Import Main Tour-level raw match data (2009-2019) from the ATP website, via Jeff Sackmann's Github ('https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/')
* Import historical wagering data (2009-2019; European/decimal format) from Dan Westin (http://www.tennis-data.co.uk/alldata.php) and Oddsportal (https://www.oddsportal.com/)
* Concatenate match data across years, clean/error-correct, add missing match stats and player descriptive/demographic data to existing fields, and add match/tourney-level features for feature development use and raw data merges (eg, unified time base) 
* Add additional, descriptive match and player-level features (e.g., country of tourny/match, indoor vs outdoor, altitude) that will be useful for filtering and development of more complicated predictive features (see Workbook 2) 
* Concatenate historical wagering data across years, derive implied win probabilities from wagering lines, clean/format data, and merge into match dataframe 
    * The derived implied win probabilities (per player/per match) will be used both as predictive features (accrued implied win probability average over time from matches prior to a given match being predicted), as well as benchmarks for model performance (implied win probabilities derived from closing lines of a given match being predicted)
    * Implied win probabilities (vig-corrected) computed both from Averaged wagering lines across books (closing lines only), as well as from Pinnacle Sports lines (both opening and closing lines)
* Merged match stats and implied win probabilities dataframe next moves to expansive descriptive and quantitative feature generation (see Workbook 2). 

## Data Wranging 
### 1. Imports

In [1]:
import pandas as pd
import numpy as np
import datetime
import os
import warnings
warnings.filterwarnings('ignore')

### 2. Loading and Concatening Data

In [2]:
# Load historical match data (main draws)

df_2019 = pd.read_csv('../data/historical_match_data/atp_matches_2019_jnr.csv')
df_2018 = pd.read_csv('../data/historical_match_data/atp_matches_2018_jnr.csv')
df_2017 = pd.read_csv('../data/historical_match_data/atp_matches_2017_jnr.csv')
df_2016 = pd.read_csv('../data/historical_match_data/atp_matches_2016_jnr.csv')
df_2015 = pd.read_csv('../data/historical_match_data/atp_matches_2015_jnr.csv')
df_2014 = pd.read_csv('../data/historical_match_data/atp_matches_2014_jnr.csv')
df_2013 = pd.read_csv('../data/historical_match_data/atp_matches_2013_jnr.csv')
df_2012 = pd.read_csv('../data/historical_match_data/atp_matches_2012_jnr.csv')
df_2011 = pd.read_csv('../data/historical_match_data/atp_matches_2011_jnr.csv')
df_2010 = pd.read_csv('../data/historical_match_data/atp_matches_2010_jnr.csv')
df_2009 = pd.read_csv('../data/historical_match_data/atp_matches_2009_jnr.csv')

# Load historical match data (qualifying draws)

#df_2019_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2019_jnr.csv')
#df_2018_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2018_jnr.csv')
#df_2017_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2017_jnr.csv')
#df_2016_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2016_jnr.csv')
#df_2015_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2015_jnr.csv')
#df_2014_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2014_jnr.csv')
#df_2013_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2013_jnr.csv')
#df_2012_q = pd.read_csv('../data/historical_match_data/atp_matches_qual_2012_jnr.csv')

#Some missing individual values across features in Jeff's original files (missing minutes, heights etc.) have been added from the ATP site to Jeff's original csvs prior to loading into this script. 
#These *_jnr versions with data manually added are loaded here. Jeff's original files without my additions can be found in a separate data subfolder on my Github (no "_jnr" suffixes), as well as on Jeff's Github (see top of this cell for URL)

In [3]:
df = pd.concat([df_2019, df_2018, df_2017, df_2016, df_2015, df_2014, df_2013, df_2012, df_2011, df_2010, df_2009], ignore_index=True)
del df_2019, df_2018, df_2017, df_2016, df_2015, df_2014, df_2013, df_2012, df_2011, df_2010, df_2009
df.head(20) 

Unnamed: 0,tourney_id,tourney_name,surface,draw_size,tourney_level,tourney_date,match_num,winner_id,winner_seed,winner_entry,...,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced,winner_rank,winner_rank_points,loser_rank,loser_rank_points
0,2019-M020,Brisbane,Hard,32,A,20181231,278,105683,5.0,,...,30.0,16.0,2.0,7.0,4.0,8.0,18.0,1855.0,67.0,780.0
1,2019-M020,Brisbane,Hard,32,A,20181231,282,200282,7.0,,...,31.0,18.0,4.0,8.0,4.0,9.0,31.0,1298.0,147.0,367.0
2,2019-M020,Brisbane,Hard,32,A,20181231,279,200175,,Q,...,29.0,18.0,7.0,8.0,4.0,8.0,131.0,433.0,56.0,895.0
3,2019-M020,Brisbane,Hard,32,A,20181231,298,105453,2.0,,...,27.0,15.0,6.0,8.0,1.0,5.0,9.0,3590.0,40.0,1050.0
4,2019-M020,Brisbane,Hard,32,A,20181231,283,111442,,,...,33.0,17.0,12.0,7.0,8.0,13.0,72.0,715.0,154.0,352.0
5,2019-M020,Brisbane,Hard,32,A,20181231,286,105453,2.0,,...,36.0,26.0,8.0,10.0,4.0,7.0,9.0,3590.0,63.0,810.0
6,2019-M020,Brisbane,Hard,32,A,20181231,273,105777,6.0,,...,33.0,21.0,7.0,9.0,0.0,3.0,19.0,1835.0,75.0,701.0
7,2019-M020,Brisbane,Hard,32,A,20181231,292,200282,7.0,,...,39.0,30.0,3.0,9.0,3.0,6.0,31.0,1298.0,72.0,715.0
8,2019-M020,Brisbane,Hard,32,A,20181231,291,106421,4.0,,...,39.0,27.0,7.0,10.0,2.0,6.0,16.0,1977.0,240.0,200.0
9,2019-M020,Brisbane,Hard,32,A,20181231,299,106421,4.0,,...,52.0,36.0,7.0,10.0,10.0,13.0,16.0,1977.0,239.0,200.0


In [4]:
# Remove Challenger events (came in with Main Tour event Qualifying rounds)
#df.drop(df[df['tourney_level'] == 'C'].index, inplace = True)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32350 entries, 0 to 32349
Data columns (total 49 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   tourney_id          32350 non-null  object 
 1   tourney_name        32350 non-null  object 
 2   surface             32350 non-null  object 
 3   draw_size           32350 non-null  int64  
 4   tourney_level       32350 non-null  object 
 5   tourney_date        32350 non-null  int64  
 6   match_num           32350 non-null  int64  
 7   winner_id           32350 non-null  int64  
 8   winner_seed         14030 non-null  float64
 9   winner_entry        4040 non-null   object 
 10  winner_name         32350 non-null  object 
 11  winner_hand         32343 non-null  object 
 12  winner_ht           31057 non-null  float64
 13  winner_ioc          32350 non-null  object 
 14  winner_age          32349 non-null  float64
 15  loser_id            32350 non-null  int64  
 16  lose

In [6]:
df.tail()

Unnamed: 0,tourney_id,tourney_name,surface,draw_size,tourney_level,tourney_date,match_num,winner_id,winner_seed,winner_entry,...,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced,winner_rank,winner_rank_points,loser_rank,loser_rank_points
32345,2009-605,Tour Finals,Hard,8,F,20091122,15,103786,,,...,35.0,21.0,10.0,9.0,5.0,9.0,7.0,3630.0,2.0,9205.0
32346,2009-D015,Davis Cup WG F: ESP vs CZE,Clay,4,D,20091204,1,104745,,,...,,,,,,,2.0,9205.0,20.0,1655.0
32347,2009-D015,Davis Cup WG F: ESP vs CZE,Clay,4,D,20091204,2,103970,,,...,,,,,,,18.0,1795.0,12.0,2625.0
32348,2009-D015,Davis Cup WG F: ESP vs CZE,Clay,4,D,20091204,4,104745,,,...,,,,,,,2.0,9205.0,102.0,526.0
32349,2009-D015,Davis Cup WG F: ESP vs CZE,Clay,4,D,20091204,5,103970,,,...,,,,,,,18.0,1795.0,465.0,72.0


In [7]:
# Load historical wagering data
#Read historical wagering data files into a single DataFrame. Set missing values to NaN.

df_1 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2009_JNR_amended.xlsx', engine='openpyxl')
df_2009 = df_1.parse('2009')
df_2 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2010_JNR_amended.xlsx', engine='openpyxl')
df_2010 = df_2.parse('2010')
df_3 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2011_JNR_amended.xlsx', engine='openpyxl')
df_2011 = df_3.parse('2011')
df_4 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2012_JNR_amended.xlsx', engine='openpyxl')
df_2012 = df_4.parse('2012')
df_5 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2013_JNR_amended.xlsx', engine='openpyxl')
df_2013 = df_5.parse('2013')
df_6 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2014_JNR_amended.xlsx', engine='openpyxl')
df_2014 = df_6.parse('2014')
df_7 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2015_JNR_amended.xlsx', engine='openpyxl')
df_2015 = df_7.parse('2015')
df_8 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2016_JNR_amended.xlsx', engine='openpyxl')
df_2016 = df_8.parse('2016')
df_9 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2017_JNR_amended.xlsx', engine='openpyxl')
df_2017 = df_9.parse('2017')
df_10 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2018_JNR_amended.xlsx', engine='openpyxl')
df_2018 = df_10.parse('2018')
df_11 = pd.ExcelFile('../data/historical_wagering_data/JNR_amended_files/2019_JNR_amended.xlsx', engine='openpyxl')
df_2019 = df_11.parse('2019')

# A considerable amount of cross-checking to the core dataframe (ATP-validated data) and amending rankings at a given time has been manually acheived in these 'JNR-amended' data files. Other small items
# in Dan's data sets have also been amended manually (based on feedback from early rounds of merge attempts).

In [8]:
# Concatenate historical wagering data across years. Will be investigated and tidied following processing of core match data dataframe.
# w in dataframe name is for 'wagering'
df_w = pd.concat([df_2019, df_2018, df_2017, df_2016, df_2015, df_2014, df_2013, df_2012, df_2011, df_2010, df_2009], ignore_index=True)
del df_1, df_2, df_3, df_4, df_5, df_6, df_7, df_8, df_9, df_10, df_11, df_2019, df_2018, df_2017, df_2016, df_2015, df_2014, df_2013, df_2012, df_2011, df_2010, df_2009
df_w.head(20) 

Unnamed: 0,ATP,Location,Tournament,Date,Series,Court,Surface,Round,Best of,Winner,...,PSW_O,PSL_O,EXW,EXL,LBW,LBL,SJW,SJL,UBW,UBL
0,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Dimitrov G.,...,1.33,3.62,,,,,,,,
1,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Raonic M.,...,1.32,3.7,,,,,,,,
2,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Kecmanovic M.,...,1.81,2.09,,,,,,,,
3,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Millman J.,...,1.43,3.05,,,,,,,,
4,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Uchiyama Y.,...,2.57,1.56,,,,,,,,
5,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Kudla D.,...,2.59,1.56,,,,,,,,
6,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Chardy J.,...,2.1,1.82,,,,,,,,
7,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Murray A.,...,1.12,7.42,,,,,,,,
8,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Kyrgios N.,...,1.41,3.1,,,,,,,,
9,1,Brisbane,Brisbane International,2019-01-01,ATP250,Outdoor,Hard,1st Round,3.0,Tsonga J.W.,...,2.1,1.81,,,,,,,,


### 3. Investigating and Tidying Historical Match Data

In [9]:
# Rename useful columns with more clarity
df.rename(columns = {'round':'t_rd','best_of':'m_bestof','match_num':'m_num','tourney_id':'t_id','tourney_name':'t_nm','surface':'t_surf','draw_size':'t_draw_sz','tourney_level':'t_lvl','tourney_date':'t_date','winner_id':'w_id','winner_seed':'w_sd','winner_entry':'w_ent','winner_name':'w_nm','winner_ht':'w_ht','winner_hand':'w_hd','winner_ioc':'w_ioc','winner_age':'w_age', 'loser_id':'l_id','loser_seed':'l_sd','loser_entry':'l_ent','loser_name':'l_nm','loser_hand':'l_hd','loser_ht':'l_ht','loser_ioc':'l_ioc','loser_age':'l_age','score':'m_score','minutes':'m_t(m)','winner_rank':'w_rk','winner_rank_points':'w_rk_pts','loser_rank':'l_rk','loser_rank_points':'l_rk_pts', 'w_1stIn':'w_1st_sv_in', 'l_1stIn':'l_1st_sv_in', 'w_svpt': 'w_sv_pts', 'w_1stIn':'w_1st_sv_in', 'w_1stWon':'w_1st_sv_pts_won', 'l_1stWon':'l_1st_sv_pts_won', 'w_2ndWon':'w_2nd_sv_pts_won', 'l_2ndWon':'l_2nd_sv_pts_won', 'w_1stIn':'w_1st_sv_in', 'l_1stIn':'l_1st_sv_in', 'l_svpt':'l_sv_pts', 'w_SvGms':'w_sv_gms', 'l_SvGms':'l_sv_gms', 'w_bpFaced':'w_bp_faced', 'l_bpFaced':'l_bp_faced', 'w_bpSaved':'w_bp_saved', 'l_bpSaved':'l_bp_saved'}, inplace=True)

In [10]:
# Drop a few columns we won't need
df = df.drop(['w_sd','l_sd'],axis=1)

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32350 entries, 0 to 32349
Data columns (total 47 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   t_id              32350 non-null  object 
 1   t_nm              32350 non-null  object 
 2   t_surf            32350 non-null  object 
 3   t_draw_sz         32350 non-null  int64  
 4   t_lvl             32350 non-null  object 
 5   t_date            32350 non-null  int64  
 6   m_num             32350 non-null  int64  
 7   w_id              32350 non-null  int64  
 8   w_ent             4040 non-null   object 
 9   w_nm              32350 non-null  object 
 10  w_hd              32343 non-null  object 
 11  w_ht              31057 non-null  float64
 12  w_ioc             32350 non-null  object 
 13  w_age             32349 non-null  float64
 14  l_id              32350 non-null  int64  
 15  l_ent             6722 non-null   object 
 16  l_nm              32350 non-null  object

In [12]:
# Remove matches resulting in Walkover, and restricting to only matches where at least 12 games were completed.
# Also removed non-tour events (Olympics and Davis Cup)
df = df[~df['t_nm'].str.contains("Olympics")]
df = df[~df['t_nm'].str.contains("Davis Cup")]
df = df[~df['m_score'].str.contains("W/O")]
df = df[(df['w_sv_gms'] + df['l_sv_gms'] >= 12)]

In [13]:
df["w_ent"].unique(), df["l_ent"].unique()

(array([nan, 'Q', 'PR', 'WC', 'LL', 'SE', 'Alt', 'ALT'], dtype=object),
 array([nan, 'WC', 'PR', 'Q', 'LL', 'SE', 'Alt', 'ALT', 'S'], dtype=object))

In [14]:
df["w_ent"]

0        NaN
1        NaN
2          Q
3        NaN
4        NaN
        ... 
32341    NaN
32342    NaN
32343    NaN
32344    NaN
32345    NaN
Name: w_ent, Length: 28532, dtype: object

In [15]:
# Convert tourny entry type to hierarchy. 
df["w_ent"] = df["w_ent"].fillna(5) #regular entry (ie, got in based on ranking)
df.loc[((df["w_ent"] == "PR") | (df["w_ent"] == "WC") | (df["w_ent"] == "SE") | (df["w_ent"] == "Alt") | (df["w_ent"] == "ALT")), "w_ent"] = 4 #all got direct entry, but for extenuating circumstances (ie, not based on current ranking)
df.loc[(df["w_ent"] == "Q"), "w_ent"] = 2.5 #qualifier (ie, had to play matches on site to win way in)
df.loc[(df["w_ent"] == "LL"), "w_ent"] = 2 #lucky loser (ie, also had to play matches on site, but lost final qualifying match)

df["l_ent"] = df["l_ent"].fillna(5) #regular entry (ie, got in based on ranking)
df.loc[((df["l_ent"] == "PR") | (df["l_ent"] == "WC") | (df["l_ent"] == "SE") | (df["l_ent"] == "S") | (df["l_ent"] == "Alt") | (df["l_ent"] == "ALT")), "l_ent"] = 4 #all got direct entry, but for extenuating circumstances (ie, not based on current ranking)
df.loc[(df["l_ent"] == "Q"), "l_ent"] = 2.5 #qualifier (ie, had to play matches on site to win way in)
df.loc[(df["l_ent"] == "LL"), "l_ent"] = 2 #lucky loser (ie, also had to play matches on site, but lost final qualifying match)

In [16]:
df["w_ent"].unique()

array([5, 2.5, 4, 2], dtype=object)

In [17]:
df["l_ent"].unique()

array([5, 4, 2.5, 2], dtype=object)

In [18]:
df.head()

Unnamed: 0,t_id,t_nm,t_surf,t_draw_sz,t_lvl,t_date,m_num,w_id,w_ent,w_nm,...,l_1st_sv_in,l_1st_sv_pts_won,l_2nd_sv_pts_won,l_sv_gms,l_bp_saved,l_bp_faced,w_rk,w_rk_pts,l_rk,l_rk_pts
0,2019-M020,Brisbane,Hard,32,A,20181231,278,105683,5.0,Milos Raonic,...,30.0,16.0,2.0,7.0,4.0,8.0,18.0,1855.0,67.0,780.0
1,2019-M020,Brisbane,Hard,32,A,20181231,282,200282,5.0,Alex De Minaur,...,31.0,18.0,4.0,8.0,4.0,9.0,31.0,1298.0,147.0,367.0
2,2019-M020,Brisbane,Hard,32,A,20181231,279,200175,2.5,Miomir Kecmanovic,...,29.0,18.0,7.0,8.0,4.0,8.0,131.0,433.0,56.0,895.0
3,2019-M020,Brisbane,Hard,32,A,20181231,298,105453,5.0,Kei Nishikori,...,27.0,15.0,6.0,8.0,1.0,5.0,9.0,3590.0,40.0,1050.0
4,2019-M020,Brisbane,Hard,32,A,20181231,283,111442,5.0,Jordan Thompson,...,33.0,17.0,12.0,7.0,8.0,13.0,72.0,715.0,154.0,352.0


In [19]:
# Convert Tournament Level to a numeric hierarchy
df.loc[(df["t_lvl"] == "G"), "t_lvl"] = 4 #Grand Slams
df.loc[(df["t_lvl"] == "F"), "t_lvl"] = 3 #Tour Finals
df.loc[(df["t_lvl"] == "M"), "t_lvl"] = 2 #Masters Series
df.loc[(df["t_lvl"] == "A"), "t_lvl"] = 1 #Regular Tour Events

In [20]:
df["t_lvl"].unique()

array([1, 4, 2, 3], dtype=object)

In [21]:
# Convert surface type to numerics
df.loc[(df["t_surf"] == "Grass"), "t_surf"] = 3 
df.loc[(df["t_surf"] == "Hard"), "t_surf"] = 2 
df.loc[(df["t_surf"] == "Clay"), "t_surf"] = 1 

In [22]:
# Ages reported with unnecessary precision in Jeff's files
df["w_age"] = df["w_age"].round(2)
df["l_age"] = df["l_age"].round(2)

In [23]:
# Create 'tour week' feature for each tourney. 
# Because Jeff's data does not have match dates, doing this for both his match stats data and Dan Westin's/Oddsportal's historical wagering data is a way to correctly merge those two dataframes (and to bring match dates in from Dan's data in doing so)

df["tour_wk"] = ""

df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Pune")), "tour_wk"] = "2019_01"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2019_02"       
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2019_03"       
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Cordoba")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Sofia")), "tour_wk"] = "2019_04"     
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "New York")), "tour_wk"] = "2019_05" 
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Delray Beach")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Rio de Janeiro")), "tour_wk"] = "2019_06"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2019_07"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2019_08"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2019_09"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Marrakech")), "tour_wk"] = "2019_10"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2019_11"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Budapest")), "tour_wk"] = "2019_12"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich")), "tour_wk"] = "2019_13"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2019_14"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2019_15"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Geneva")|(df["t_nm"] == "Lyon")), "tour_wk"] = "2019_16"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2019_17"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2019_18"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2019_19"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Antalya")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2019_20"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2019_21"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Bastad")|(df["t_nm"] == "Umag")|(df["t_nm"] == "Newport")), "tour_wk"] = "2019_22"
df.loc[(df["t_id"].str.contains("2019")) & (((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Gstaad")|(df["t_nm"] == "Hamburg"))), "tour_wk"] = "2019_23"
df.loc[(df["t_id"].str.contains("2019")) & (((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Los Cabos")|(df["t_nm"] == "Washington"))), "tour_wk"] = "2019_24"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2019_25"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2019_26"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2019_27"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2019_28"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2019_29"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Chengdu")|(df["t_nm"] == "Zhuhai")), "tour_wk"] = "2019_30"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2019_31"  
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2019_32"
df.loc[(df["t_id"].str.contains("2019")) & (((df["t_nm"] == "Antwerp")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2019_33"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Vienna")), "tour_wk"] = "2019_34"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2019_35"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "NextGen Finals")), "tour_wk"] = "2019_36"
df.loc[(df["t_id"].str.contains("2019")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2019_37"

df.loc[(df["t_id"].str.contains("2018")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Pune"))), "tour_wk"] = "2018_01"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2018_02"       
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2018_03"       
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Quito")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Sofia")), "tour_wk"] = "2018_04"     
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "New York")), "tour_wk"] = "2018_05" 
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Delray Beach")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Rio de Janeiro")), "tour_wk"] = "2018_06"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2018_07"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2018_08"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2018_09"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Marrakech")), "tour_wk"] = "2018_10"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2018_11"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Budapest")), "tour_wk"] = "2018_12"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Istanbul")), "tour_wk"] = "2018_13"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2018_14"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2018_15"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Geneva")|(df["t_nm"] == "Lyon")), "tour_wk"] = "2018_16"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2018_17"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2018_18"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2018_19"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Antalya")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2018_20"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2018_21"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Bastad")|(df["t_nm"] == "Umag")|(df["t_nm"] == "Newport")), "tour_wk"] = "2018_22"
df.loc[(df["t_id"].str.contains("2018")) & (((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Gstaad")|(df["t_nm"] == "Hamburg"))), "tour_wk"] = "2018_23"
df.loc[(df["t_id"].str.contains("2018")) & (((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Los Cabos")|(df["t_nm"] == "Washington"))), "tour_wk"] = "2018_24"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2018_25"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2018_26"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2018_27"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2018_28"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2018_29"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Chengdu")|(df["t_nm"] == "Shenzhen")), "tour_wk"] = "2018_30"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2018_31"  
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2018_32"
df.loc[(df["t_id"].str.contains("2018")) & (((df["t_nm"] == "Antwerp")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2018_33"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Vienna")), "tour_wk"] = "2018_34"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2018_35"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "NextGen Finals")), "tour_wk"] = "2018_36"
df.loc[(df["t_id"].str.contains("2018")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2018_37"

df.loc[(df["t_id"].str.contains("2017")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2017_01"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2017_02"       
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2017_03"       
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Quito")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Sofia")), "tour_wk"] = "2017_04"     
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2017_05" 
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Delray Beach")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Rio de Janeiro")), "tour_wk"] = "2017_06"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2017_07"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2017_08"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2017_09"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Marrakech")), "tour_wk"] = "2017_10"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2017_11"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Budapest")), "tour_wk"] = "2017_12"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Istanbul")), "tour_wk"] = "2017_13"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2017_14"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2017_15"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Geneva")|(df["t_nm"] == "Lyon")), "tour_wk"] = "2017_16"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2017_17"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2017_18"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2017_19"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Antalya")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2017_20"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2017_21"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Bastad")|(df["t_nm"] == "Umag")|(df["t_nm"] == "Newport")), "tour_wk"] = "2017_22"
df.loc[(df["t_id"].str.contains("2017")) & (((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Gstaad")|(df["t_nm"] == "Hamburg"))), "tour_wk"] = "2017_23"
df.loc[(df["t_id"].str.contains("2017")) & (((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Los Cabos")|(df["t_nm"] == "Washington"))), "tour_wk"] = "2017_24"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2017_25"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2017_26"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2017_27"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2017_28"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2017_29"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Chengdu")|(df["t_nm"] == "Shenzhen")), "tour_wk"] = "2017_30"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2017_31"  
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2017_32"
df.loc[(df["t_id"].str.contains("2017")) & (((df["t_nm"] == "Antwerp")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2017_33"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Vienna")), "tour_wk"] = "2017_34"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2017_35"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "NextGen Finals")), "tour_wk"] = "2017_36"
df.loc[(df["t_id"].str.contains("2017")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2017_37"

df.loc[(df["t_id"].str.contains("2016")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2016_01"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2016_02"       
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2016_03"       
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Quito")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Sofia")), "tour_wk"] = "2016_04"     
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2016_05" 
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Delray Beach")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Rio de Janeiro")), "tour_wk"] = "2016_06"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2016_07"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2016_08"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2016_09"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Marrakech")), "tour_wk"] = "2016_10"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2016_11"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2016_12"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Istanbul")), "tour_wk"] = "2016_13"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2016_14"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2016_15"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Geneva")|(df["t_nm"] == "Nice")), "tour_wk"] = "2016_16"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2016_17"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2016_18"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2016_19"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Nottingham")), "tour_wk"] = "2016_20"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2016_21"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Hamburg")|(df["t_nm"] == "Newport")), "tour_wk"] = "2016_22"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Umag")|(df["t_nm"] == "Kitzbuhel")| (df["t_nm"] == "Gstaad")|(df["t_nm"] == "Washington")), "tour_wk"] = "2016_23"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2016_24"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Atlanta")), "tour_wk"] = "2016_25"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Los Cabos")), "tour_wk"] = "2016_26"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2016_27"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2016_28"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2016_29"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2016_30"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Chengdu")|(df["t_nm"] == "Shenzhen")), "tour_wk"] = "2016_31"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2016_32"  
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2016_33"
df.loc[(df["t_id"].str.contains("2016")) & (((df["t_nm"] == "Antwerp")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2016_34"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Vienna")), "tour_wk"] = "2016_35"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2016_36"
df.loc[(df["t_id"].str.contains("2016")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2016_37"

df.loc[(df["t_id"].str.contains("2015")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2015_01"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2015_02"       
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2015_03"       
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Quito")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2015_04"     
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Sao Paulo")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2015_05" 
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Delray Beach")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Rio de Janeiro")), "tour_wk"] = "2015_06"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Buenos Aires")), "tour_wk"] = "2015_07"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2015_08"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2015_09"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2015_10"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2015_11"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2015_12"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Istanbul")), "tour_wk"] = "2015_13"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2015_14"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2015_15"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Geneva")|(df["t_nm"] == "Nice")), "tour_wk"] = "2015_16"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2015_17"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2015_18"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2015_19"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Nottingham")), "tour_wk"] = "2015_20"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2015_21"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Newport")), "tour_wk"] = "2015_22"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Umag")|(df["t_nm"] == "Bogota")), "tour_wk"] = "2015_23"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Hamburg")|(df["t_nm"] == "Gstaad")|(df["t_nm"] == "Atlanta")), "tour_wk"] = "2015_24"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Washington")), "tour_wk"] = "2015_25"                                            
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2015_26"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2015_27"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2015_28"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2015_29"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2015_30"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Shenzhen")), "tour_wk"] = "2015_31"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2015_32"  
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2015_33"
df.loc[(df["t_id"].str.contains("2015")) & (((df["t_nm"] == "Vienna")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2015_34"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2015_35"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2015_36"
df.loc[(df["t_id"].str.contains("2015")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2015_37"                                           

df.loc[(df["t_id"].str.contains("2014")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2014_01"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2014_02"       
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2014_03"       
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Vina del Mar")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2014_04"     
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2014_05" 
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Delray Beach")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Rio de Janeiro")), "tour_wk"] = "2014_06"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2014_07"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2014_08"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2014_09"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2014_10"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2014_11"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2014_12"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich")), "tour_wk"] = "2014_13"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2014_14"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2014_15"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Dusseldorf")|(df["t_nm"] == "Nice")), "tour_wk"] = "2014_16"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2014_17"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2014_18"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2014_19"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2014_20"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Stuttgart")|(df["t_nm"] == "Newport")), "tour_wk"] = "2014_21"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Hamburg")|(df["t_nm"] == "Bogota")), "tour_wk"] = "2014_22"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Gstaad") |(df["t_nm"] == "Umag")), "tour_wk"] = "2014_23"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Washington")), "tour_wk"] = "2014_24"                                          
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2014_25"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2014_26"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2014_27"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2014_28"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Metz")), "tour_wk"] = "2014_29"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Shenzhen")), "tour_wk"] = "2014_30"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2014_31"  
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2014_32"
df.loc[(df["t_id"].str.contains("2014")) & (((df["t_nm"] == "Vienna")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2014_33"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2014_34"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2014_35"
df.loc[(df["t_id"].str.contains("2014")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2014_36"  

df.loc[(df["t_id"].str.contains("2013")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2013_01"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2013_02"       
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2013_03"       
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Santiago")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2013_04"     
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "San Jose")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2013_05" 
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2013_06"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Delray Beach")), "tour_wk"] = "2013_07"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2013_08"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2013_09"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2013_10"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2013_11"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2013_12"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich")), "tour_wk"] = "2013_13"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2013_14"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2013_15"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Dusseldorf")|(df["t_nm"] == "Nice")), "tour_wk"] = "2013_16"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2013_17"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2013_18"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2013_19"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2013_20"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Stuttgart")|(df["t_nm"] == "Newport")), "tour_wk"] = "2013_21"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Hamburg")|(df["t_nm"] == "Bogota")), "tour_wk"] = "2013_22"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Gstaad") |(df["t_nm"] == "Umag")), "tour_wk"] = "2013_23"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Washington")), "tour_wk"] = "2013_24"                                          
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2013_25"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2013_26"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2013_27"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2013_28"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2013_29"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Bangkok")), "tour_wk"] = "2013_30"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2013_31"  
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2013_32"
df.loc[(df["t_id"].str.contains("2013")) & (((df["t_nm"] == "Vienna")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2013_33"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2013_34"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2013_35"
df.loc[(df["t_id"].str.contains("2013")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2013_36"  

df.loc[(df["t_id"].str.contains("2012")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2012_01"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2012_02"       
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2012_03"       
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Santiago")|(df["t_nm"] == "Montpellier")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2012_04"     
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "San Jose")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Sao Paulo")), "tour_wk"] = "2012_05" 
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2012_06"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Delray Beach")), "tour_wk"] = "2012_07"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2012_08"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2012_09"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2012_10"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2012_11"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Barcelona")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2012_12"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Belgrade")), "tour_wk"] = "2012_13"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2012_14"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2012_15"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Dusseldorf")|(df["t_nm"] == "Nice")), "tour_wk"] = "2012_16"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2012_17"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2012_18"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2012_19"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2012_20"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Stuttgart") |(df["t_nm"] == "Umag")|(df["t_nm"] == "Newport")), "tour_wk"] = "2012_21"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Gstaad") |(df["t_nm"] == "Hamburg")), "tour_wk"] = "2012_22"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Kitzbuhel")|(df["t_nm"] == "Los Angeles")), "tour_wk"] = "2012_23"                                          
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Washington")), "tour_wk"] = "2012_24"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2012_25"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2012_26"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2012_27"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2012_28"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2012_29"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Bangkok")), "tour_wk"] = "2012_30"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2012_31"  
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2012_32"
df.loc[(df["t_id"].str.contains("2012")) & (((df["t_nm"] == "Vienna")|(df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm"))), "tour_wk"] = "2012_33"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2012_34"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2012_35"
df.loc[(df["t_id"].str.contains("2012")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2012_36"  

df.loc[(df["t_id"].str.contains("2011")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2011_01"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2011_02"       
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2011_03"       
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Santiago")|(df["t_nm"] == "Johannesburg")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2011_04"     
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "San Jose")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Costa Do Sauipe")), "tour_wk"] = "2011_05" 
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2011_06"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Delray Beach")), "tour_wk"] = "2011_07"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2011_08"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2011_09"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2011_10"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2011_11"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Barcelona")), "tour_wk"] = "2011_12"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Belgrade")), "tour_wk"] = "2011_13"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2011_14"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2011_15"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Dusseldorf")|(df["t_nm"] == "Nice")), "tour_wk"] = "2011_16"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2011_17"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2011_18"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2011_19"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2011_20"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Newport")), "tour_wk"] = "2011_21"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2011_18"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Hamburg")), "tour_wk"] = "2011_19"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Gstaad")|(df["t_nm"] == "Los Angeles") |(df["t_nm"] == "Umag")), "tour_wk"] = "2011_20"                                          
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Washington") |(df["t_nm"] == "Kitzbuhel")), "tour_wk"] = "2011_21"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2011_22"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2011_23"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Winston-Salem")), "tour_wk"] = "2011_24"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2011_25"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2011_26"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Bangkok")), "tour_wk"] = "2011_27"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2011_28"  
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2011_29"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm")), "tour_wk"] = "2011_30"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Vienna")|(df["t_nm"] == "St. Petersburg")), "tour_wk"] = "2011_31"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2011_32"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2011_33"
df.loc[(df["t_id"].str.contains("2011")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2011_34"  

df.loc[(df["t_id"].str.contains("2010")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2010_01"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2010_02"       
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2010_03"       
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Santiago")|(df["t_nm"] == "Johannesburg")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2010_04"     
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "San Jose")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Costa Do Sauipe")), "tour_wk"] = "2010_05" 
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2010_06"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Delray Beach")), "tour_wk"] = "2010_07"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2010_08"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2010_09"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2010_10"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2010_11"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Barcelona")), "tour_wk"] = "2010_12"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2010_13"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Belgrade")), "tour_wk"] = "2010_14"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2010_15"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Dusseldorf")|(df["t_nm"] == "Nice")), "tour_wk"] = "2010_16"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2010_17"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2010_18"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2010_19"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2010_20"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Newport")), "tour_wk"] = "2010_21"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2010_22"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Atlanta")|(df["t_nm"] == "Hamburg")), "tour_wk"] = "2010_23"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Gstaad")|(df["t_nm"] == "Los Angeles") |(df["t_nm"] == "Umag")), "tour_wk"] = "2010_24"                                          
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Washington")), "tour_wk"] = "2010_25"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2010_26"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2010_27"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "New Haven")), "tour_wk"] = "2010_28"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2010_29"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2010_30"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Bangkok")), "tour_wk"] = "2010_31"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2010_32"  
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2010_33"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm")), "tour_wk"] = "2010_34"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Vienna")|(df["t_nm"] == "St. Petersburg") |(df["t_nm"] == "Montpellier")), "tour_wk"] = "2010_35"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2010_36"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2010_37"
df.loc[(df["t_id"].str.contains("2010")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2010_38"

df.loc[(df["t_id"].str.contains("2009")) & (((df["t_nm"] == "Brisbane")|(df["t_nm"] == "Doha")|(df["t_nm"] == "Chennai"))), "tour_wk"] = "2009_01"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Auckland")|(df["t_nm"] == "Sydney")), "tour_wk"] = "2009_02"       
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Australian Open")), "tour_wk"] = "2009_03"       
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Vina del Mar")|(df["t_nm"] == "Johannesburg")|(df["t_nm"] == "Zagreb")), "tour_wk"] = "2009_04"     
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "San Jose")|(df["t_nm"] == "Rotterdam")|(df["t_nm"] == "Costa Do Sauipe")), "tour_wk"] = "2009_05" 
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Buenos Aires")|(df["t_nm"] == "Marseille")|(df["t_nm"] == "Memphis")), "tour_wk"] = "2009_06"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Acapulco")|(df["t_nm"] == "Dubai")|(df["t_nm"] == "Delray Beach")), "tour_wk"] = "2009_07"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Indian Wells Masters")), "tour_wk"] = "2009_08"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Miami Masters")), "tour_wk"] = "2009_09"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Houston")|(df["t_nm"] == "Casablanca")), "tour_wk"] = "2009_10"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Monte Carlo Masters")), "tour_wk"] = "2009_11"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Barcelona")), "tour_wk"] = "2009_12"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Rome Masters")), "tour_wk"] = "2009_13"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Estoril")|(df["t_nm"] == "Munich") |(df["t_nm"] == "Belgrade")), "tour_wk"] = "2009_14"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Madrid Masters")), "tour_wk"] = "2009_15"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Kitzbuhel")), "tour_wk"] = "2009_16"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Roland Garros")), "tour_wk"] = "2009_17"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Halle")|(df["t_nm"] == "Queen's Club")), "tour_wk"] = "2009_18"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "s Hertogenbosch")|(df["t_nm"] == "Eastbourne")), "tour_wk"] = "2009_19"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Wimbledon")), "tour_wk"] = "2009_20"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Newport")), "tour_wk"] = "2009_21"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Bastad") |(df["t_nm"] == "Stuttgart")), "tour_wk"] = "2009_22"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Indianapolis")|(df["t_nm"] == "Hamburg")), "tour_wk"] = "2009_23"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Gstaad")|(df["t_nm"] == "Los Angeles") |(df["t_nm"] == "Umag")), "tour_wk"] = "2009_24"                                          
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Washington")), "tour_wk"] = "2009_25"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Canada Masters")), "tour_wk"] = "2009_26"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Cincinnati Masters")), "tour_wk"] = "2009_27"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "New Haven")), "tour_wk"] = "2009_28"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "US Open")), "tour_wk"] = "2009_29"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Metz")|(df["t_nm"] == "Bucharest")), "tour_wk"] = "2009_30"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Kuala Lumpur")|(df["t_nm"] == "Bangkok")), "tour_wk"] = "2009_31"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Beijing")|(df["t_nm"] == "Tokyo")), "tour_wk"] = "2009_32"  
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Shanghai Masters")), "tour_wk"] = "2009_33"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Moscow")|(df["t_nm"] == "Stockholm")), "tour_wk"] = "2009_34"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Vienna")|(df["t_nm"] == "St. Petersburg") |(df["t_nm"] == "Lyon(old)")), "tour_wk"] = "2009_35"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Basel")|(df["t_nm"] == "Valencia")), "tour_wk"] = "2009_36"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Paris Masters")), "tour_wk"] = "2009_37"
df.loc[(df["t_id"].str.contains("2009")) & ((df["t_nm"] == "Tour Finals")), "tour_wk"] = "2009_38"


In [24]:
# Tournament Country - will be useful to look at potential advantage of player "Home Court Advantage"

df["t_co"] = ""

#ATP Tournaments and Grand Slams
df.loc[(df["t_nm"] == "Buenos Aires"), "t_co"] = "ARG"
df.loc[(df["t_nm"] == "Adelaide") | (df["t_nm"] == "Australian Open") | (df["t_nm"] == "Brisbane") | (df["t_nm"] == "Sydney"), "t_co"] = "AUS"  
df.loc[(df["t_nm"] == "Kitzbuhel") | (df["t_nm"] == "Poertschach") | (df["t_nm"] == "Vienna"), "t_co"] = "AUT" 
df.loc[(df["t_nm"] == "Antwerp"), "t_co"] = "BEL" 
df.loc[(df["t_nm"] == "Costa Do Sauipe") | (df["t_nm"] == "Rio de Janeiro") | (df["t_nm"] == "Sao Paulo") , "t_co"] = "BRA" 
df.loc[(df["t_nm"] == "Sofia"), "t_co"] = "BUL"
df.loc[(df["t_nm"] == "Canada Masters"), "t_co"] = "CAN" 
df.loc[(df["t_nm"] == "Santiago") | (df["t_nm"] == "Vina del Mar"), "t_co"] = "CHL"
df.loc[(df["t_nm"] == "Beijing") | (df["t_nm"] == "Chengdu") | (df["t_nm"] == "Shanghai Masters") | (df["t_nm"] == "Shenzhen") | (df["t_nm"] == "Zhuhai"), "t_co"] = "CHN"                                     
df.loc[(df["t_nm"] == "Bogota"), "t_co"] = "COL"
df.loc[(df["t_nm"] == "Umag") | (df["t_nm"] == "Zagreb"), "t_co"] = "CRO"
df.loc[(df["t_nm"] == "Quito"), "t_co"] = "ECU"
df.loc[(df["t_nm"] == "Barcelona") | (df["t_nm"] == "Cordoba") | (df["t_nm"] == "Madrid Masters") | (df["t_nm"] == "Madrid Masters(old)") | (df["t_nm"] == "Valencia") | (df["t_nm"] == "Valencia(old)"), "t_co"] = "ESP"
df.loc[(df["t_nm"] == "Lyon") | (df["t_nm"] == "Lyon(old)") | (df["t_nm"] == "Marseille") | (df["t_nm"] == "Metz") | (df["t_nm"] == "Monte Carlo Masters") | (df["t_nm"] == "Montpellier") | (df["t_nm"] == "Nice") | (df["t_nm"] == "Paris Masters") | (df["t_nm"] == "Roland Garros"), "t_co"] = "FRA"
df.loc[(df["t_nm"] == "Dusseldorf") | (df["t_nm"] == "Halle") | (df["t_nm"] == "Hamburg") | (df["t_nm"] == "Hamburg Masters") | (df["t_nm"] == "Munich") | (df["t_nm"] == "Stuttgart"), "t_co"] = "GER" 
df.loc[(df["t_nm"] == "Eastbourne") | (df["t_nm"] == "Nottingham") | (df["t_nm"] == "Queen's Club") | (df["t_nm"] == "Wimbledon"), "t_co"] = "GBR"
df.loc[(df["t_nm"] == "Budapest"), "t_co"] = "HUN"
df.loc[(df["t_nm"] == "Chennai") | (df["t_nm"] == "Mumbai") | (df["t_nm"] == "Pune"), "t_co"] = "IND"
df.loc[(df["t_nm"] == "Rome Masters"), "t_co"] = "ITA"
df.loc[(df["t_nm"] == "Tokyo"), "t_co"] = "JPN"  
df.loc[(df["t_nm"] == "Acapulco") | (df["t_nm"] == "Los Cabos"), "t_co"] = "MEX" 
df.loc[(df["t_nm"] == "Casablanca") | (df["t_nm"] == "Marrakech"), "t_co"] = "MOR"
df.loc[(df["t_nm"] == "Kuala Lumpur"), "t_co"] = "MYS"
df.loc[(df["t_nm"] == "Amersfoort") | (df["t_nm"] == "Rotterdam") | (df["t_nm"] == "s Hertogenbosch"), "t_co"] = "NED" 
df.loc[(df["t_nm"] == "Auckland"), "t_co"] = "NZL"
df.loc[(df["t_nm"] == "Sopot") | (df["t_nm"] == "Warsaw"), "t_co"] = "POL"
df.loc[(df["t_nm"] == "Estoril") | (df["t_nm"] == "Lisbon"), "t_co"] = "POR" 
df.loc[(df["t_nm"] == "Doha"), "t_co"] = "QAT"
df.loc[(df["t_nm"] == "Bucharest"), "t_co"] = "ROU"
df.loc[(df["t_nm"] == "Johannesburg"), "t_co"] = "RSA"
df.loc[(df["t_nm"] == "Moscow") | (df["t_nm"] == "St. Petersburg"), "t_co"] = "RUS" 
df.loc[(df["t_nm"] == "Belgrade"), "t_co"] = "SRB"
df.loc[(df["t_nm"] == "Basel") | (df["t_nm"] == "Geneva") | (df["t_nm"] == "Gstaad"), "t_co"] = "SUI"
df.loc[(df["t_nm"] == "Bastad") | (df["t_nm"] == "Stockholm"), "t_co"] = "SWE" 
df.loc[(df["t_nm"] == "Bangkok"), "t_co"] = "THA"
df.loc[(df["t_nm"] == "Antalya") | (df["t_nm"] == "Istanbul"), "t_co"] = "TUR"
df.loc[(df["t_nm"] == "Dubai"), "t_co"] = "UAE"
df.loc[(df["t_nm"] == "Atlanta") | (df["t_nm"] == "Cincinnati Masters") | (df["t_nm"] == "Delray Beach") | (df["t_nm"] == "Houston") | (df["t_nm"] == "Indian Wells Masters") | (df["t_nm"] == "Indianapolis") | (df["t_nm"] == "Las Vegas") | (df["t_nm"] == "Los Angeles") | (df["t_nm"] == "Memphis") | (df["t_nm"] == "Miami Masters") | (df["t_nm"] == "New Haven") | (df["t_nm"] == "New York") | (df["t_nm"] == "Newport") | (df["t_nm"] == "San Jose") | (df["t_nm"] == "US Open") | (df["t_nm"] == "Washington") | (df["t_nm"] == "Winston-Salem"), "t_co"] = "USA"

#Special End of Year Events
df.loc[(df["t_nm"] == "Tour Finals"), "t_co"] = "GBR" 
#df.loc[(df["t_nm"] == "Tour Finals") & (df["t_id"].str.contains("2008")), "t_co"] = "CHN"  #pre-2009
#df.loc[(df["t_nm"] == "Tour Finals") & (df["t_id"].str.contains("2007")), "t_co"] = "CHN"  #pre-2009
#df.loc[(df["t_nm"] == "Tour Finals") & (df["t_id"].str.contains("2006")), "t_co"] = "CHN"  #pre-2009
df.loc[(df["t_nm"] == "NextGen Finals"), "t_co"] = "ITA" #2019-may not always be true

In [25]:
# 't_GMT_diff'
# Will be useful for building features involving player travel burden. (https://www.timeanddate.com/time/zone/)
# Values per tournament are expressed as GMT + or - hours.
# Note that these are all US DST time zones. That's ok because what we will care about is the offset from one tourny to the next for a given player, not the actual time. 
# Daylight Saving Time began: March 12, 2023 02:00 local time. Clocks went forward one hour.
# Daylight Saving Time ends: November 5, 2023 02:00 local time. Clocks go back one hour.

df["t_GMT_diff"] = ""

#ATP Tournaments and Grand Slams
df.loc[(df["t_nm"] == "Indian Wells Masters") | (df["t_nm"] == "Los Cabos") | (df["t_nm"] == "Las Vegas") | (df["t_nm"] == "Los Angeles") | (df["t_nm"] == "San Jose"), "t_GMT_diff"] = -7
df.loc[(df["t_nm"] == "Acapulco"), "t_GMT_diff"] = -6
df.loc[(df["t_nm"] == "Bogota") | (df["t_nm"] == "Houston") | (df["t_nm"] == "Memphis") | (df["t_nm"] == "Quito"), "t_GMT_diff"] = -5
df.loc[(df["t_nm"] == "Atlanta") | (df["t_nm"] == "Canada Masters") | (df["t_nm"] == "Cincinnati Masters") | (df["t_nm"] == "Delray Beach") | (df["t_nm"] == "Indianapolis") | (df["t_nm"] == "Miami Masters") | (df["t_nm"] == "New Haven") | (df["t_nm"] == "New York") | (df["t_nm"] == "Newport") | (df["t_nm"] == "Santiago") | (df["t_nm"] == "US Open") | (df["t_nm"] == "Vina del Mar") | (df["t_nm"] == "Washington") | (df["t_nm"] == "Winston-Salem"), "t_GMT_diff"] = -4
df.loc[(df["t_nm"] == "Buenos Aires") | (df["t_nm"] == "Costa Do Sauipe") | (df["t_nm"] == "Rio de Janeiro") | (df["t_nm"] == "Sao Paulo"), "t_GMT_diff"] = -3
df.loc[(df["t_nm"] == "Casablanca") | (df["t_nm"] == "Eastbourne") | (df["t_nm"] == "Estoril") | (df["t_nm"] == "Lisbon") | (df["t_nm"] == "Marrakech") | (df["t_nm"] == "Nottingham") | (df["t_nm"] == "Queen's Club") | (df["t_nm"] == "Tour Finals") | (df["t_nm"] == "Wimbledon"), "t_GMT_diff"] = 1
df.loc[(df["t_nm"] == "Amersfoort") | (df["t_nm"] == "Antwerp") | (df["t_nm"] == "Barcelona") | (df["t_nm"] == "Basel") | (df["t_nm"] == "Bastad") | (df["t_nm"] == "Belgrade") | (df["t_nm"] == "Budapest") | (df["t_nm"] == "Cordoba") | (df["t_nm"] == "Dusseldorf") | (df["t_nm"] == "Geneva") | (df["t_nm"] == "Gstaad") | (df["t_nm"] == "Halle") | (df["t_nm"] == "Hamburg") | (df["t_nm"] == "Hamburg Masters") | (df["t_nm"] == "Johannesburg") | (df["t_nm"] == "Kitzbuhel") | (df["t_nm"] == "Lyon") | (df["t_nm"] == "Lyon(old)") | (df["t_nm"] == "Madrid Masters") | (df["t_nm"] == "Madrid Masters(old)") | (df["t_nm"] == "Marseille") | (df["t_nm"] == "Metz") | (df["t_nm"] == "Monte Carlo Masters") | (df["t_nm"] == "Montpellier") | (df["t_nm"] == "Munich") | (df["t_nm"] == "NextGen Finals") | \
       (df["t_nm"] == "Nice") | (df["t_nm"] == "Paris Masters") | (df["t_nm"] == "Poertschach") | (df["t_nm"] == "Roland Garros") | (df["t_nm"] == "Rome Masters") | (df["t_nm"] == "Rotterdam") | (df["t_nm"] == "s Hertogenbosch") | (df["t_nm"] == "Sopot") | (df["t_nm"] == "Stockholm") | (df["t_nm"] == "Stuttgart") | (df["t_nm"] == "Umag") | (df["t_nm"] == "Vienna") | (df["t_nm"] == "Warsaw") | (df["t_nm"] == "Valencia") | (df["t_nm"] == "Valencia(old)") | (df["t_nm"] == "Zagreb"), "t_GMT_diff"] = 2 #NextGen Finals in Turin
df.loc[(df["t_nm"] == "Antalya") | (df["t_nm"] == "Bucharest") | (df["t_nm"] == "Doha") | (df["t_nm"] == "Istanbul") | (df["t_nm"] == "Moscow") | (df["t_nm"] == "Sofia") | (df["t_nm"] == "St. Petersburg"), "t_GMT_diff"] = 3
df.loc[(df["t_nm"] == "Dubai"), "t_GMT_diff"] = 4
df.loc[(df["t_nm"] == "Chennai") | (df["t_nm"] == "Mumbai") | (df["t_nm"] == "Pune"), "t_GMT_diff"] = 5.5
df.loc[(df["t_nm"] == "Bangkok"), "t_GMT_diff"] = 7
df.loc[(df["t_nm"] == "Beijing") | (df["t_nm"] == "Chengdu") | (df["t_nm"] == "Kuala Lumpur") | (df["t_nm"] == "Shanghai Masters") | (df["t_nm"] == "Shenzhen") | (df["t_nm"] == "Zhuhai"), "t_GMT_diff"] = 8
df.loc[(df["t_nm"] == "Tokyo"), "t_GMT_diff"] = 9
df.loc[(df["t_nm"] == "Adelaide"), "t_GMT_diff"] = 9.5
df.loc[(df["t_nm"] == "Australian Open") | (df["t_nm"] == "Brisbane") | (df["t_nm"] == "Sydney"), "t_GMT_diff"] = 10     
df.loc[(df["t_nm"] == "Auckland"), "t_GMT_diff"] = 12       

In [26]:
# Create variable indicating indoor or outdoor tournament (grand slams with retractable roofs on stadium court are counted as outdoor)
# Outdoor = 0, Indoor = 1

df["t_ind"] = ""
df.loc[(df["t_nm"] == "Antwerp"), "t_ind"] = 1 
df.loc[(df["t_nm"] == "Bangkok"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Basel"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Johannesburg"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Kuala Lumpur"), "t_ind"] = 1 
df.loc[(df["t_nm"] == "Lyon(old)"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Madrid Masters(old)"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Marseille"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Memphis"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Metz"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Montpellier"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Moscow"), "t_ind"] = 1
df.loc[(df["t_nm"] == "New York"), "t_ind"] = 1
df.loc[(df["t_nm"] == "NextGen Finals"), "t_ind"] = 1 
df.loc[(df["t_nm"] == "Nice"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Paris Masters"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Rotterdam"), "t_ind"] = 1
df.loc[(df["t_nm"] == "San Jose"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Sao Paulo"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Sofia"), "t_ind"] = 1
df.loc[(df["t_nm"] == "St. Petersburg"), "t_ind"] = 1 
df.loc[(df["t_nm"] == "Tour Finals"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Valencia"), "t_ind"] = 1
df.loc[(df["t_nm"] == "Vienna"), "t_ind"] = 1 
df.loc[(df["t_nm"] == "Zagreb"), "t_ind"] = 1 

df.loc[(df["t_ind"] != 1), "t_ind"] = 0

In [27]:
df["t_ind"].value_counts()

0    23526
1     5006
Name: t_ind, dtype: int64

In [28]:
# For some features(see notebook 2) it will be useful to have a unique identifier for the same tournament year-to-year.
df["t_ident"] = ""

df.loc[(df["t_nm"] == "Acapulco"), "t_ident"] = 1
df.loc[(df["t_nm"] == "Adelaide"), "t_ident"] = 2
df.loc[(df["t_nm"] == "Amesfoort"), "t_ident"] = 3
df.loc[(df["t_nm"] == "Antalya"), "t_ident"] = 4
df.loc[(df["t_nm"] == "Antwerp"), "t_ident"] = 5
df.loc[(df["t_nm"] == "Atlanta"), "t_ident"] = 6
df.loc[(df["t_nm"] == "Auckland"), "t_ident"] = 7
df.loc[(df["t_nm"] == "Australian Open"), "t_ident"] = 8
df.loc[(df["t_nm"] == "Bangkok"), "t_ident"] = 9
df.loc[(df["t_nm"] == "Barcelona"), "t_ident"] = 10
df.loc[(df["t_nm"] == "Basel"), "t_ident"] = 11
df.loc[(df["t_nm"] == "Bastad"), "t_ident"] = 12
df.loc[(df["t_nm"] == "Beijing"), "t_ident"] = 13
df.loc[(df["t_nm"] == "Belgrade"), "t_ident"] = 14
df.loc[(df["t_nm"] == "Bogota"), "t_ident"] = 15
df.loc[(df["t_nm"] == "Brisbane"), "t_ident"] = 16
df.loc[(df["t_nm"] == "Bucharest"), "t_ident"] = 17
df.loc[(df["t_nm"] == "Budapest"), "t_ident"] = 18
df.loc[(df["t_nm"] == "Buenos Aires"), "t_ident"] = 19
df.loc[(df["t_nm"] == "Canada Masters"), "t_ident"] = 20
df.loc[(df["t_nm"] == "Casablanca"), "t_ident"] = 21
df.loc[(df["t_nm"] == "Chengdu"), "t_ident"] = 22
df.loc[(df["t_nm"] == "Chennai"), "t_ident"] = 23
df.loc[(df["t_nm"] == "Cincinnati Masters"), "t_ident"] = 24
df.loc[(df["t_nm"] == "Cordoba"), "t_ident"] = 25
df.loc[(df["t_nm"] == "Costa Do Sauipe"), "t_ident"] = 26
df.loc[(df["t_nm"] == "Delray Beach"), "t_ident"] = 27
df.loc[(df["t_nm"] == "Doha"), "t_ident"] = 28
df.loc[(df["t_nm"] == "Dubai"), "t_ident"] = 29
df.loc[(df["t_nm"] == "Dusseldorf"), "t_ident"] = 30
df.loc[(df["t_nm"] == "Eastbourne"), "t_ident"] = 31
df.loc[(df["t_nm"] == "Estoril"), "t_ident"] = 32
df.loc[(df["t_nm"] == "Geneva"), "t_ident"] = 33
df.loc[(df["t_nm"] == "Gstaad"), "t_ident"] = 34
df.loc[(df["t_nm"] == "Halle"), "t_ident"] = 35
df.loc[(df["t_nm"] == "Hamburg"), "t_ident"] = 36
df.loc[(df["t_nm"] == "Hamburg Masters"), "t_ident"] = 37
df.loc[(df["t_nm"] == "Houston"), "t_ident"] = 38
df.loc[(df["t_nm"] == "Indian Wells Masters"), "t_ident"] = 39
df.loc[(df["t_nm"] == "Indianapolis"), "t_ident"] = 40
df.loc[(df["t_nm"] == "Istanbul"), "t_ident"] = 41
df.loc[(df["t_nm"] == "Johannesburg"), "t_ident"] = 42
df.loc[(df["t_nm"] == "Kitzbuhel"), "t_ident"] = 43
df.loc[(df["t_nm"] == "Kuala Lumpur"), "t_ident"] = 44
df.loc[(df["t_nm"] == "Las Vegas"), "t_ident"] = 45
df.loc[(df["t_nm"] == "Lisbon"), "t_ident"] = 46
df.loc[(df["t_nm"] == "Los Angeles"), "t_ident"] = 47
df.loc[(df["t_nm"] == "Los Cabos"), "t_ident"] = 48
df.loc[(df["t_nm"] == "Lyon"), "t_ident"] = 49
df.loc[(df["t_nm"] == "Lyon(old)"), "t_ident"] = 50
df.loc[(df["t_nm"] == "Madrid Masters"), "t_ident"] = 51
df.loc[(df["t_nm"] == "Madrid Masters(old)"), "t_ident"] = 52
df.loc[(df["t_nm"] == "Marrakech"), "t_ident"] = 53
df.loc[(df["t_nm"] == "Marseille"), "t_ident"] = 54
df.loc[(df["t_nm"] == "Memphis"), "t_ident"] = 55
df.loc[(df["t_nm"] == "Metz"), "t_ident"] = 56
df.loc[(df["t_nm"] == "Miami Masters"), "t_ident"] = 57
df.loc[(df["t_nm"] == "Monte Carlo Masters"), "t_ident"] = 58
df.loc[(df["t_nm"] == "Montpellier"), "t_ident"] = 59
df.loc[(df["t_nm"] == "Moscow"), "t_ident"] = 60
df.loc[(df["t_nm"] == "Mumbai"), "t_ident"] = 61
df.loc[(df["t_nm"] == "Munich"), "t_ident"] = 62
df.loc[(df["t_nm"] == "New Haven"), "t_ident"] = 63
df.loc[(df["t_nm"] == "New York"), "t_ident"] = 64
df.loc[(df["t_nm"] == "Newport"), "t_ident"] = 65
df.loc[(df["t_nm"] == "NextGen Finals"), "t_ident"] = 66
df.loc[(df["t_nm"] == "Nice"), "t_ident"] = 67
df.loc[(df["t_nm"] == "Nottingham"), "t_ident"] = 68
df.loc[(df["t_nm"] == "Paris Masters"), "t_ident"] = 69
df.loc[(df["t_nm"] == "Poertschach"), "t_ident"] = 70
df.loc[(df["t_nm"] == "Pune"), "t_ident"] = 71
df.loc[(df["t_nm"] == "Queen's Club"), "t_ident"] = 72
df.loc[(df["t_nm"] == "Quito"), "t_ident"] = 73
df.loc[(df["t_nm"] == "Rio de Janeiro"), "t_ident"] = 74
df.loc[(df["t_nm"] == "Roland Garros"), "t_ident"] = 75
df.loc[(df["t_nm"] == "Rome Masters"), "t_ident"] = 76
df.loc[(df["t_nm"] == "Rotterdam"), "t_ident"] = 77
df.loc[(df["t_nm"] == "s Hertogenbosch"), "t_ident"] = 78
df.loc[(df["t_nm"] == "San Jose"), "t_ident"] = 79
df.loc[(df["t_nm"] == "Santiago"), "t_ident"] = 80
df.loc[(df["t_nm"] == "Sao Paulo"), "t_ident"] = 81
df.loc[(df["t_nm"] == "Shanghai Masters"), "t_ident"] = 82
df.loc[(df["t_nm"] == "Shenzhen"), "t_ident"] = 83
df.loc[(df["t_nm"] == "Sofia"), "t_ident"] = 84
df.loc[(df["t_nm"] == "Sopot"), "t_ident"] = 85
df.loc[(df["t_nm"] == "St. Petersburg"), "t_ident"] = 86
df.loc[(df["t_nm"] == "Stockholm"), "t_ident"] = 87
df.loc[(df["t_nm"] == "Stuttgart"), "t_ident"] = 88
df.loc[(df["t_nm"] == "Sydney"), "t_ident"] = 89
df.loc[(df["t_nm"] == "Tokyo"), "t_ident"] = 90
df.loc[(df["t_nm"] == "Tour Finals"), "t_ident"] = 91
df.loc[(df["t_nm"] == "Umag"), "t_ident"] = 92
df.loc[(df["t_nm"] == "US Open"), "t_ident"] = 93
df.loc[(df["t_nm"] == "Valencia"), "t_ident"] = 94
df.loc[(df["t_nm"] == "Valencia(old)"), "t_ident"] = 95
df.loc[(df["t_nm"] == "Vienna"), "t_ident"] = 96
df.loc[(df["t_nm"] == "Vina del Mar"), "t_ident"] = 97
df.loc[(df["t_nm"] == "Warsaw"), "t_ident"] = 98
df.loc[(df["t_nm"] == "Washington"), "t_ident"] = 99
df.loc[(df["t_nm"] == "Wimbledon"), "t_ident"] = 100
df.loc[(df["t_nm"] == "Winston-Salem"), "t_ident"] = 101
df.loc[(df["t_nm"] == "Zagreb"), "t_ident"] = 102
df.loc[(df["t_nm"] == "Zhuhai"), "t_ident"] = 103

In [29]:
# A few tournaments are played at exceptional altitude (>3,000 ft). This clearly influences the importance of serve (as seen in the ace rate at these tournies. We want a marker column for these.)
# <3,000 ft. = 0, >3,000 ft. = 1

df["t_alt"] = ""
df.loc[(df["t_nm"] == "Bogota"), "t_alt"] = 1
df.loc[(df["t_nm"] == "Quito"), "t_alt"] = 1
df.loc[(df["t_nm"] == "Gstaad"), "t_alt"] = 1

df.loc[(df["t_alt"] != 1), "t_alt"] = 0

In [30]:
df["t_alt"].value_counts()

0    28045
1      487
Name: t_alt, dtype: int64

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28532 entries, 0 to 32345
Data columns (total 53 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   t_id              28532 non-null  object 
 1   t_nm              28532 non-null  object 
 2   t_surf            28532 non-null  object 
 3   t_draw_sz         28532 non-null  int64  
 4   t_lvl             28532 non-null  object 
 5   t_date            28532 non-null  int64  
 6   m_num             28532 non-null  int64  
 7   w_id              28532 non-null  int64  
 8   w_ent             28532 non-null  object 
 9   w_nm              28532 non-null  object 
 10  w_hd              28532 non-null  object 
 11  w_ht              28034 non-null  float64
 12  w_ioc             28532 non-null  object 
 13  w_age             28532 non-null  float64
 14  l_id              28532 non-null  int64  
 15  l_ent             28532 non-null  object 
 16  l_nm              28532 non-null  object

In [32]:
# We will need to do multi-tiered sorting to compute rolling-window, retrospective feature development. Tourney week will be critical for this, but we also want
# to convert 'rd' to a numerical value that can be used for multi-level sorting ("rd_num").
df["t_rd_num"] = ""

#Creating appropriate "rd_num" values. 

df.loc[(df["t_rd"] == "R128") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 1
df.loc[(df["t_rd"] == "R64") & ((df["t_draw_sz"] == 64) | (df["t_draw_sz"] == 56) | (df["t_draw_sz"] == 48)), "t_rd_num"] = 1
df.loc[(df["t_rd"] == "R32") & ((df["t_draw_sz"] == 32) | (df["t_draw_sz"] == 28)), "t_rd_num"] = 1
df.loc[(df["t_rd"] == "RR1") & (df["t_draw_sz"] == 8), "t_rd_num"] = 1
df.loc[(df["t_rd"] == "RR1") & (df["t_draw_sz"] == 16), "t_rd_num"] = 1
df.loc[(df["t_rd"] == "RR2") & (df["t_draw_sz"] == 8), "t_rd_num"] = 2
df.loc[(df["t_rd"] == "RR2") & (df["t_draw_sz"] == 16), "t_rd_num"] = 2
df.loc[(df["t_rd"] == "RR3") & (df["t_draw_sz"] == 8), "t_rd_num"] = 3
df.loc[(df["t_rd"] == "RR3") & (df["t_draw_sz"] == 16), "t_rd_num"] = 3
        
df.loc[(df["t_rd"] == "R64") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 2
df.loc[(df["t_rd"] == "R32") & ((df["t_draw_sz"] == 64) | (df["t_draw_sz"] == 56) | (df["t_draw_sz"] == 48)), "t_rd_num"] = 2
df.loc[(df["t_rd"] == "R16") & ((df["t_draw_sz"] == 32) | (df["t_draw_sz"] == 28)), "t_rd_num"] = 2
df.loc[(df["t_rd"] == "SF") & (df["t_draw_sz"] == 8), "t_rd_num"] = 4
df.loc[(df["t_rd"] == "SF") & (df["t_draw_sz"] == 16), "t_rd_num"] = 4

df.loc[(df["t_rd"] == "R32") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 3
df.loc[(df["t_rd"] == "R16") & ((df["t_draw_sz"] == 64) | (df["t_draw_sz"] == 56) | (df["t_draw_sz"] == 48)), "t_rd_num"] = 3
df.loc[(df["t_rd"] == "QF") & ((df["t_draw_sz"] == 32) | (df["t_draw_sz"] == 28)) , "t_rd_num"] = 3
df.loc[(df["t_rd"] == "F") & (df["t_draw_sz"] == 8), "t_rd_num"] = 5
df.loc[(df["t_rd"] == "F") & (df["t_draw_sz"] == 16), "t_rd_num"] = 5
df.loc[(df["t_rd"] == "BR") & (df["t_draw_sz"] == 8), "t_rd_num"] = 5
df.loc[(df["t_rd"] == "BR") & (df["t_draw_sz"] == 16), "t_rd_num"] = 5
        
df.loc[(df["t_rd"] == "R16") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 4
df.loc[(df["t_rd"] == "QF") & ((df["t_draw_sz"] == 64) | (df["t_draw_sz"] == 56) | (df["t_draw_sz"] == 48)), "t_rd_num"] = 4
df.loc[(df["t_rd"] == "SF") & ((df["t_draw_sz"] == 32) | (df["t_draw_sz"] == 28)), "t_rd_num"] = 4

df.loc[(df["t_rd"] == "QF") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 5
df.loc[(df["t_rd"] == "SF") & ((df["t_draw_sz"] == 64) | (df["t_draw_sz"] == 56) | (df["t_draw_sz"] == 48)), "t_rd_num"] = 5
df.loc[(df["t_rd"] == "F") & ((df["t_draw_sz"] == 32) | (df["t_draw_sz"] == 28)), "t_rd_num"] = 5  
        
df.loc[(df["t_rd"] == "SF") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 6
df.loc[(df["t_rd"] == "F") & ((df["t_draw_sz"] == 64) | (df["t_draw_sz"] == 56) | (df["t_draw_sz"] == 48)), "t_rd_num"] = 6

df.loc[(df["t_rd"] == "F") & ((df["t_draw_sz"] == 128) | (df["t_draw_sz"] == 96)), "t_rd_num"] = 7

#Errata:
# I directly updated Jeff's yearly csvs for the Tour Finals & Next Gen Finals to specify where in the sequence rd robin stage matches are.
# Also, generally, some missing individual values across features (missing match times, heights etc.) have been added from the ATP site to Jeff's original csvs prior to loading into this script. 
#Look for *_jnr versions of Jeff's csvs on my Github for the raw data versions loaded here

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28532 entries, 0 to 32345
Data columns (total 54 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   t_id              28532 non-null  object 
 1   t_nm              28532 non-null  object 
 2   t_surf            28532 non-null  object 
 3   t_draw_sz         28532 non-null  int64  
 4   t_lvl             28532 non-null  object 
 5   t_date            28532 non-null  int64  
 6   m_num             28532 non-null  int64  
 7   w_id              28532 non-null  int64  
 8   w_ent             28532 non-null  object 
 9   w_nm              28532 non-null  object 
 10  w_hd              28532 non-null  object 
 11  w_ht              28034 non-null  float64
 12  w_ioc             28532 non-null  object 
 13  w_age             28532 non-null  float64
 14  l_id              28532 non-null  int64  
 15  l_ent             28532 non-null  object 
 16  l_nm              28532 non-null  object

In [34]:
# no duplicated records
df.duplicated().value_counts()

False    28532
dtype: int64

In [35]:
# Sorting in this processing stream done by player unique numerical ID. Careful visual inspection was used primarily to check for anomolies, but also looking for unusual ID-player name mappings here
# Winners
df.value_counts(subset = ["w_id", "w_nm"], ascending=True).head(60)

w_id    w_nm                       
105047  Tim Puetz                      1
106177  Oriol Roca Batalla             1
104256  Clement Reix                   1
106110  Filip Horansky                 1
106105  Guilherme Clezar               1
106053  Philip Davydenko               1
104311  Leonardo Tavares               1
104226  Bastian Knittel                1
106032  Facundo Arguello               1
105960  Evan King                      1
105948  Federico Coria                 1
104372  Alex Bogdanovic                1
105934  Chuhan Wang                    1
104397  Laurent Recouderc              1
105933  Roberto Quiroz                 1
106005  Constant Lestienne             1
106234  Aslan Karatsev                 1
104160  Lukas Dlouhy                   1
104136  Santiago Gonzalez              1
103862  Michail Elgin                  1
110536  Kevin King                     1
109303  Stefano Napolitano             1
109054  Daniel Masur                   1
106420  Gianluigi Qui

In [36]:
# Sorting in this processing stream done by player unique numerical ID. Careful visual inspection was used primarily to check for anomolies, but also looking for unusual ID-player name mappings here
# Losers
df.value_counts(subset = ["l_id", "l_nm"], ascending=True).head(60)

l_id    l_nm                    
208260  Zachary Svajda              1
105029  Sergey Betov                1
105026  Takuto Niki                 1
105004  Dimitar Kutrovsky           1
104993  Martin Vaisse               1
104986  Vishnu Vardhan              1
104975  Evgeny Kirillov             1
126646  Tim Van Rijthoven           1
104971  Blake Strode                1
104948  Petar Jelenic               1
126846  Aleksandar Vukic            1
127339  Borna Gojo                  1
104914  Ervin Eleskovic             1
104896  Bruno Agostinelli           1
104822  N Vijay Sundar Prashanth    1
104953  Nikolai Fidirko             1
126627  Johan Nikles                1
105059  Kento Takeuchi              1
105070  Hans Podlipnik Castillo     1
105269  Marcelo Demoliner           1
105262  Fernando Romboli            1
105245  Fabiano De Paula            1
126204  Gerardo Lopez Villasenor    1
105233  Jeevan Nedunchezhiyan       1
126208  Yusuke Takahashi            1
105180  Daniel Du

In [37]:
# Adding heights (cm), from ATP website and other web sources, for players with no height listed in Jeff's files

df.loc[(df["w_nm"] == "Adrian Andreev"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Adrian Andreev"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Adrian Menendez Maceiras"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Adrian Menendez Maceiras"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Agustin Velotti"), "w_ht"] = 174
df.loc[(df["l_nm"] == "Agustin Velotti"), "l_ht"] = 174
df.loc[(df["w_nm"] == "Akira Santillan"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Akira Santillan"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Alejandro Gomez "), "w_ht"] = 183 #his csv entries have a trailing space for some reason
df.loc[(df["l_nm"] == "Alejandro Gomez "), "l_ht"] = 183
df.loc[(df["w_nm"] == "Alejandro Gonzalez"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Alejandro Gonzalez"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Alen Avidzba"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Alen Avidzba"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Alessandro Bega"), "w_ht"] = 173
df.loc[(df["l_nm"] == "Alessandro Bega"), "l_ht"] = 173
df.loc[(df["w_nm"] == "Alex Bolt"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Alex Bolt"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Alexandar Lazarov"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Alexandar Lazarov"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Alexander Donski"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Alexander Donski"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Alexander Sarkissian"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Alexander Sarkissian"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Alexander Ward"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Alexander Ward"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Alexandre Muller"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Alexandre Muller"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Alexandre Sidorenko"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Alexandre Sidorenko"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Alexey Vatutin"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Alexey Vatutin"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Alexios Halebian"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Alexios Halebian"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Aleksandar Vukic"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Aleksandar Vukic"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Alibek Kachmazov"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Alibek Kachmazov"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Amine Ahouda"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Amine Ahouda"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Andrea Arnaboldi"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Andrea Arnaboldi"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Andrea Basso"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Andrea Basso"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Andres Artunedo Martinavarro"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Andres Artunedo Martinavarro"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Andres Molteni"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Andres Molteni"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Andrew Whittington"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Andrew Whittington"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Andrey Kumantsov"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Andrey Kumantsov"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Anil Yuksel"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Anil Yuksel"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Ante Pavic"), "w_ht"] = 196
df.loc[(df["l_nm"] == "Ante Pavic"), "l_ht"] = 196
df.loc[(df["w_nm"] == "Antoine Bellier"), "w_ht"] = 196
df.loc[(df["l_nm"] == "Antoine Bellier"), "l_ht"] = 196
df.loc[(df["w_nm"] == "Antoine Hoang"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Antoine Hoang"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Antonio Veic"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Antonio Veic"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Ariez Elyaas Deen Heshaam"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Ariez Elyaas Deen Heshaam"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Arjun Kadhe"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Arjun Kadhe"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Artem Dubrivnyy"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Artem Dubrivnyy"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Arthur De Greef"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Arthur De Greef"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Austin Krajicek"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Austin Krajicek"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Austin Smith"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Austin Smith"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Axel Michon"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Axel Michon"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Bastian Trinker"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Bastian Trinker"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Benjamin Bonzi"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Benjamin Bonzi"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Benjamin Mitchell"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Benjamin Mitchell"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Bernabe Zapata Miralles"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Bernabe Zapata Miralles"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Bjorn Fratangelo"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Bjorn Fratangelo"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Blake Mott"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Blake Mott"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Borna Gojo"), "w_ht"] = 196
df.loc[(df["l_nm"] == "Borna Gojo"), "l_ht"] = 196
df.loc[(df["w_nm"] == "Bowen Ouyang"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Bowen Ouyang"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Boy Westerhof"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Boy Westerhof"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Brayden Schnur"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Brayden Schnur"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Calvin Hemery"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Calvin Hemery"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Carl Soderlund"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Carl Soderlund"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Carlos Gomez Herrera"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Carlos Gomez Herrera"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Carlos Taberner"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Carlos Taberner"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Chase Buchanan"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Chase Buchanan"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Christopher Eubanks"), "w_ht"] = 201
df.loc[(df["l_nm"] == "Christopher Eubanks"), "l_ht"] = 201
df.loc[(df["w_nm"] == "Christian Harrison"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Christian Harrison"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Chun Hsin Tseng"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Chun Hsin Tseng"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Clement Reix"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Clement Reix"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Collin Altamirano"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Collin Altamirano"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Constant Lestienne"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Constant Lestienne"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Daniel Altmaier"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Daniel Altmaier"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Daniel Garza"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Daniel Garza"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Daniel Kosakowski"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Daniel Kosakowski"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Daniel Masur"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Daniel Masur"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Daniel Munoz de la Nava"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Daniel Munoz de la Nava"), "l_ht"] = 175
df.loc[(df["w_nm"] == "David Vega Hernandez"), "w_ht"] = 188
df.loc[(df["l_nm"] == "David Vega Hernandez"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Dennis Lajola"), "w_ht"] = 179
df.loc[(df["l_nm"] == "Dennis Lajola"), "l_ht"] = 179
df.loc[(df["w_nm"] == "Dennis Novikov"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Dennis Novikov"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Dimitar Kutrovsky"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Dimitar Kutrovsky"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Dino Marcan"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Dino Marcan"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Eduardo Struvay"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Eduardo Struvay"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Elliot Benchetrit"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Elliot Benchetrit"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Emil Reinberg"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Emil Reinberg"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Emilio Gomez"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Emilio Gomez"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Emilio Nava"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Emilio Nava"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Enrique Lopez Perez"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Enrique Lopez Perez"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Erik Chvojka"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Erik Chvojka"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Eric Quigley"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Eric Quigley"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Ernesto Escobedo"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Ernesto Escobedo"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Evgeny Karlovskiy"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Evgeny Karlovskiy"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Evgenii Tiurnev"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Evgenii Tiurnev"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Fabiano De Paula"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Fabiano De Paula"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Fabrice Martin"), "w_ht"] = 198
df.loc[(df["l_nm"] == "Fabrice Martin"), "l_ht"] = 198
df.loc[(df["w_nm"] == "Facundo Arguello"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Facundo Arguello"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Federico Coria"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Federico Coria"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Federico Gaio"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Federico Gaio"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Filip Horansky"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Filip Horansky"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Filip Veger"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Filip Veger"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Filippo Baldi"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Filippo Baldi"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Francisco Cerundolo"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Francisco Cerundolo"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Fred Simonsson"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Fred Simonsson"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Frederico Ferreira Silva"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Frederico Ferreira Silva"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Geoffrey Blancaneaux"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Geoffrey Blancaneaux"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Gerard Granollers"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Gerard Granollers"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Gerardo Lopez Villasenor"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Gerardo Lopez Villasenor"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Germain Gigounon"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Germain Gigounon"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Gian Marco Moroni"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Gian Marco Moroni"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Gianluca Naso"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Gianluca Naso"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Gianluigi Quinzi"), "w_ht"] = 171
df.loc[(df["l_nm"] == "Gianluigi Quinzi"), "l_ht"] = 171
df.loc[(df["w_nm"] == "Gleb Sakharov"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Gleb Sakharov"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Gregoire Burquier"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Gregoire Burquier"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Guido Andreozzi"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Guido Andreozzi"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Guillermo Olaso"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Guillermo Olaso"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Hicham Khaddari"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Hicham Khaddari"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Hugo Gaston"), "w_ht"] = 173
df.loc[(df["l_nm"] == "Hugo Gaston"), "l_ht"] = 173
df.loc[(df["w_nm"] == "Inigo Cervantes Huegun"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Inigo Cervantes Huegun"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Isak Arvidsson"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Isak Arvidsson"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Ivan Nedelko"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Ivan Nedelko"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Jabor Al Mutawa"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Jabor Al Mutawa"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Jack Mingjie Lin"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Jack Mingjie Lin"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Jan Satral"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Jan Satral"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Jared Donaldson"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Jared Donaldson"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Jaroslav Pospisil"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Jaroslav Pospisil"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Jason Jung"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Jason Jung"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Jason Kubler"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Jason Kubler"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Javier Marti"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Javier Marti"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Jc Aragone"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Jc Aragone"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Jeevan Nedunchezhiyan"), "w_ht"] = 173
df.loc[(df["l_nm"] == "Jeevan Nedunchezhiyan"), "l_ht"] = 173
df.loc[(df["w_nm"] == "Jenson Brooksby"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Jenson Brooksby"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Johan Nikles"), "w_ht"] = 173
df.loc[(df["l_nm"] == "Johan Nikles"), "l_ht"] = 173
df.loc[(df["w_nm"] == "John Patrick Smith"), "w_ht"] = 188
df.loc[(df["l_nm"] == "John Patrick Smith"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Jonathan Dasnieres De Veigy"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Jonathan Dasnieres De Veigy"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Jordi Samper Montana"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Jordi Samper Montana"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Joris De Loore"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Joris De Loore"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Josko Topic"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Josko Topic"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Juan Ignacio Londero"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Juan Ignacio Londero"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Julian Lenz"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Julian Lenz"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Jurij Rodionov"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Jurij Rodionov"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Kamil Majchrzak"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Kamil Majchrzak"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Karim Hossam"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Karim Hossam"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Kento Takeuchi"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Kento Takeuchi"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Kevin King"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Kevin King"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Kristijan Mesaros"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Kristijan Mesaros"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Laurent Lokoli"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Laurent Lokoli"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Liam Broady"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Liam Broady"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Liam Caruana"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Liam Caruana"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Lorenzo Giustino"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Lorenzo Giustino"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Louis Wessels"), "w_ht"] = 198
df.loc[(df["l_nm"] == "Louis Wessels"), "l_ht"] = 198
df.loc[(df["w_nm"] == "Luca Vanni"), "w_ht"] = 198
df.loc[(df["l_nm"] == "Luca Vanni"), "l_ht"] = 198
df.loc[(df["w_nm"] == "Lucas Catarina"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Lucas Catarina"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Lucas Miedler"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Lucas Miedler"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Luis Patino"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Luis Patino"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Luke Saville"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Luke Saville"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Mackenzie Mcdonald"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Mackenzie Mcdonald"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Marc Andrea Huesler"), "w_ht"] = 196
df.loc[(df["l_nm"] == "Marc Andrea Huesler"), "l_ht"] = 196
df.loc[(df["w_nm"] == "Marc Polmans"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Marc Polmans"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Marcelo Demoliner"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Marcelo Demoliner"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Marco Trungelliti"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Marco Trungelliti"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Marko Djokovic"), "w_ht"] = 187
df.loc[(df["l_nm"] == "Marko Djokovic"), "l_ht"] = 187
df.loc[(df["w_nm"] == "Marko Tepavac"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Marko Tepavac"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Martin Alund"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Martin Alund"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Martin Vaisse"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Martin Vaisse"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Marvin Moeller"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Marvin Moeller"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Mate Valkusz"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Mate Valkusz"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Mathias Bourgue"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Mathias Bourgue"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Matteo Donati"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Matteo Donati"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Matteo Trevisan"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Matteo Trevisan"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Matteo Viola"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Matteo Viola"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Matthew Barton"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Matthew Barton"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Maxime Authom"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Maxime Authom"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Maxime Hamou"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Maxime Hamou"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Maxime Janvier"), "w_ht"] = 196
df.loc[(df["l_nm"] == "Maxime Janvier"), "l_ht"] = 196
df.loc[(df["w_nm"] == "Maxime Teixeira"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Maxime Teixeira"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Maximilian Marterer"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Maximilian Marterer"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Mehdi Ziadi"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Mehdi Ziadi"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Michael Linzer"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Michael Linzer"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Michael Mmoh"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Michael Mmoh"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Mikael Torpegaard"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Mikael Torpegaard"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Mikhail Biryukov"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Mikhail Biryukov"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Miljan Zekic"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Miljan Zekic"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Milos Sekulic"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Milos Sekulic"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Mitchell Krueger"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Mitchell Krueger"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Mousa Shanan Zayed"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Mousa Shanan Zayed"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Mubarak Shannan Zayid"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Mubarak Shannan Zayid"), "l_ht"] = 178
df.loc[(df["w_nm"] == "N Vijay Sundar Prashanth"), "w_ht"] = 178
df.loc[(df["l_nm"] == "N Vijay Sundar Prashanth"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Nathan Pasha"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Nathan Pasha"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Nicolas Meister"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Nicolas Meister"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Nicola Kuhn"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Nicola Kuhn"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Nicolas Kicker"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Nicolas Kicker"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Nikola Cacic"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Nikola Cacic"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Nikolai Fidirko"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Nikolai Fidirko"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Nils Langer"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Nils Langer"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Nino Serdarusic"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Nino Serdarusic"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Noah Rubin"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Noah Rubin"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Oliver Anderson"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Oliver Anderson"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Omar Awadhy"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Omar Awadhy"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Omar Jasika"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Omar Jasika"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Oriol Roca Batalla"), "w_ht"] = 173
df.loc[(df["l_nm"] == "Oriol Roca Batalla"), "l_ht"] = 173
df.loc[(df["w_nm"] == "Orlando Luz"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Orlando Luz"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Oscar Otte"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Oscar Otte"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Patrick Ciorcila"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Patrick Ciorcila"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Patrick Kypson"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Patrick Kypson"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Patrik Rosenholm"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Patrik Rosenholm"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Pedja Krstin"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Pedja Krstin"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Pedro Cachin"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Pedro Cachin"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Pedro Martinez"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Pedro Martinez"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Pedro Sakamoto"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Pedro Sakamoto"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Peerakiat Siriluethaiwattana"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Peerakiat Siriluethaiwattana"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Peter Torebko"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Peter Torebko"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Petros Chrysochos"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Petros Chrysochos"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Philip Davydenko"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Philip Davydenko"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Prajnesh Gunneswaran"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Prajnesh Gunneswaran"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Quentin Halys"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Quentin Halys"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Rayane Roumane"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Rayane Roumane"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Raymond Sarmiento"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Raymond Sarmiento"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Rhyne Williams"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Rhyne Williams"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Ricardo Ojeda Lara"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Ricardo Ojeda Lara"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Riccardo Bellotti"), "w_ht"] = 179
df.loc[(df["l_nm"] == "Riccardo Bellotti"), "l_ht"] = 179
df.loc[(df["w_nm"] == "Roberto Marcora"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Roberto Marcora"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Roberto Ortega Olmedo"), "w_ht"] = 168
df.loc[(df["l_nm"] == "Roberto Ortega Olmedo"), "l_ht"] = 168
df.loc[(df["w_nm"] == "Robin Kern"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Robin Kern"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Romain Bogaerts"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Romain Bogaerts"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Roman Safiullin"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Roman Safiullin"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Rudolf Molleker"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Rudolf Molleker"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Ryan Shane"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Ryan Shane"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Sebastian Ofner"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Sebastian Ofner"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Sekou Bangoura"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Sekou Bangoura"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Sergey Betov"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Sergey Betov"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Sergio Gutierrez Ferrol"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Sergio Gutierrez Ferrol"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Stefan Kozlov"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Stefan Kozlov"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Stefano Napolitano"), "w_ht"] = 196
df.loc[(df["l_nm"] == "Stefano Napolitano"), "l_ht"] = 196
df.loc[(df["w_nm"] == "Steven Diez"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Steven Diez"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Suk Young Jeong"), "w_ht"] = 177
df.loc[(df["l_nm"] == "Suk Young Jeong"), "l_ht"] = 177
df.loc[(df["w_nm"] == "Sumit Nagal"), "w_ht"] = 178
df.loc[(df["l_nm"] == "Sumit Nagal"), "l_ht"] = 178
df.loc[(df["w_nm"] == "Takanyi Garanganga"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Takanyi Garanganga"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Takuto Niki"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Takuto Niki"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Tallon Griekspoor"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Tallon Griekspoor"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Thai Son Kwiatkowski"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Thai Son Kwiatkowski"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Thiago Seyboth Wild"), "w_ht"] = 179
df.loc[(df["l_nm"] == "Thiago Seyboth Wild"), "l_ht"] = 179
df.loc[(df["w_nm"] == "Thomas Fabbiano"), "w_ht"] = 173
df.loc[(df["l_nm"] == "Thomas Fabbiano"), "l_ht"] = 173
df.loc[(df["w_nm"] == "Tim Puetz"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Tim Puetz"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Tim Van Rijthoven"), "w_ht"] = 188
df.loc[(df["l_nm"] == "Tim Van Rijthoven"), "l_ht"] = 188
df.loc[(df["w_nm"] == "Toni Androic"), "w_ht"] = 168
df.loc[(df["l_nm"] == "Toni Androic"), "l_ht"] = 168
df.loc[(df["w_nm"] == "Tristan Lamasine"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Tristan Lamasine"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Vaclav Safranek"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Vaclav Safranek"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Victor Baluda"), "w_ht"] = 179
df.loc[(df["l_nm"] == "Victor Baluda"), "l_ht"] = 179
df.loc[(df["w_nm"] == "Viktor Galovic"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Viktor Galovic"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Walter Trusendi"), "w_ht"] = 179
df.loc[(df["l_nm"] == "Walter Trusendi"), "l_ht"] = 179
df.loc[(df["w_nm"] == "Wishaya Trongcharoenchaikul"), "w_ht"] = 193
df.loc[(df["l_nm"] == "Wishaya Trongcharoenchaikul"), "l_ht"] = 193
df.loc[(df["w_nm"] == "Xin Gao"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Xin Gao"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Yan Bai"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Yan Bai"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Yannick Mertens"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Yannick Mertens"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Yaraslav Shyla"), "w_ht"] = 191
df.loc[(df["l_nm"] == "Yaraslav Shyla"), "l_ht"] = 191
df.loc[(df["w_nm"] == "Yassine Idmbarek"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Yassine Idmbarek"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Yibing Wu"), "w_ht"] = 183
df.loc[(df["l_nm"] == "Yibing Wu"), "l_ht"] = 183
df.loc[(df["w_nm"] == "Yosuke Watanuki"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Yosuke Watanuki"), "l_ht"] = 180
df.loc[(df["w_nm"] == "Younes Rachidi"), "w_ht"] =  177
df.loc[(df["l_nm"] == "Younes Rachidi"), "l_ht"] = 177
df.loc[(df["w_nm"] == "Yusuke Takahashi"), "w_ht"] = 170
df.loc[(df["l_nm"] == "Yusuke Takahashi"), "l_ht"] = 170
df.loc[(df["w_nm"] == "Zachary Svajda"), "w_ht"] = 175
df.loc[(df["l_nm"] == "Zachary Svajda"), "l_ht"] = 175
df.loc[(df["w_nm"] == "Zdenek Kolar"), "w_ht"] = 185
df.loc[(df["l_nm"] == "Zdenek Kolar"), "l_ht"] = 185
df.loc[(df["w_nm"] == "Zsombor Piros"), "w_ht"] = 180
df.loc[(df["l_nm"] == "Zsombor Piros"), "l_ht"] = 180

In [38]:
# A bunch of players are missing handedness data. Filling in from ATP site here.

df.loc[(df["w_nm"] == "Alejandro Gomez "), "w_hd"] = "R" #he's got a trailing space in his csvs for some reason
df.loc[(df["l_nm"] == "Alejandro Gomez "), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Alexandar Lazov"), "w_hd"] = "L"
df.loc[(df["l_nm"] == "Alexandar Lazov"), "l_hd"] = "L"
df.loc[(df["w_nm"] == "Alexander Ward"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Alexander Ward"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Amine Ahouda"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Amine Ahouda"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Antoine Bellier"), "w_hd"] = "L"
df.loc[(df["l_nm"] == "Antoine Bellier"), "l_hd"] = "L"
df.loc[(df["w_nm"] == "Austin Smith"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Austin Smith"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Blake Mott"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Blake Mott"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Clement Reix"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Clement Reix"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Dimitar Kutrovsky"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Dimitar Kutrovsky"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Edan Leshem"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Edan Leshem"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Eric Quigley"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Eric Quigley"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Gerardo Lopez Villasenor"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Gerardo Lopez Villasenor"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Guillermo Olaso"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Guillermo Olaso"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Jaroslav Pospisil"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Jaroslav Pospisil"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Javier Marti"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Javier Marti"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Jordi Samper Montana"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Jordi Samper Montana"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Jose Hernandez"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Jose Hernandez"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Josko Topic"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Josko Topic"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Kevin King"), "w_hd"] = "L"
df.loc[(df["l_nm"] == "Kevin King"), "l_hd"] = "L"
df.loc[(df["w_nm"] == "Lucas Gomez"), "w_hd"] = "L"
df.loc[(df["l_nm"] == "Lucas Gomez"), "l_hd"] = "L"
df.loc[(df["w_nm"] == "Martin Vaisse"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Martin Vaisse"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Marvin Moeller"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Marvin Moeller"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Mousa Shanan Zayed"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Mousa Shanan Zayed"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Mubarak Shannan Zayid"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Mubarak Shannan Zayid"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Nathan Pasha"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Nathan Pasha"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Nicolas Meister"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Nicolas Meister"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Nils Langer"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Nils Langer"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Nino Serdarusic"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Nino Serdarusic"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Oliver Anderson"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Oliver Anderson"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Philip Davydenko"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Philip Davydenko"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Romain Bogaerts"), "w_hd"] = "L"
df.loc[(df["l_nm"] == "Romain Bogaerts"), "l_hd"] = "L"
df.loc[(df["w_nm"] == "Ryan Shane"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Ryan Shane"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Roberto Ortega Olmedo"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Roberto Ortega Olmedo"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Sergio Gutierrez Ferrol"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Sergio Gutierrez Ferrol"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Takuto Niki"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Takuto Niki"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Tigre Hank"), "w_hd"] = "L"
df.loc[(df["l_nm"] == "Tigre Hank"), "l_hd"] = "L"
df.loc[(df["w_nm"] == "Walter Trusendi"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Walter Trusendi"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Xin Gao"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Xin Gao"), "l_hd"] = "R"
df.loc[(df["w_nm"] == "Yassine Idmbarek"), "w_hd"] = "R"
df.loc[(df["l_nm"] == "Yassine Idmbarek"), "l_hd"] = "R"

In [39]:
# Convert handedness to numeric type
df.loc[(df["w_hd"] == "R"), "w_hd"] = 0 
df.loc[(df["w_hd"] == "L"), "w_hd"] = 1 
df.loc[(df["w_hd"] == "U"), "w_hd"] = 0 #assume the few unknowns are righties
df.loc[(df["l_hd"] == "R"), "l_hd"] = 0
df.loc[(df["l_hd"] == "L"), "l_hd"] = 1
df.loc[(df["l_hd"] == "U"), "l_hd"] = 0 #assume the few unknowns are righties

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28532 entries, 0 to 32345
Data columns (total 54 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   t_id              28532 non-null  object 
 1   t_nm              28532 non-null  object 
 2   t_surf            28532 non-null  object 
 3   t_draw_sz         28532 non-null  int64  
 4   t_lvl             28532 non-null  object 
 5   t_date            28532 non-null  int64  
 6   m_num             28532 non-null  int64  
 7   w_id              28532 non-null  int64  
 8   w_ent             28532 non-null  object 
 9   w_nm              28532 non-null  object 
 10  w_hd              28532 non-null  object 
 11  w_ht              28517 non-null  float64
 12  w_ioc             28532 non-null  object 
 13  w_age             28532 non-null  float64
 14  l_id              28532 non-null  int64  
 15  l_ent             28532 non-null  object 
 16  l_nm              28532 non-null  object

### 4. Investigating and Tidying Historical Wagering Data

In [41]:
# Strip df down to only what will be needed for the merge with match stats dataframe (second workbook). 
 
df2_w = df_w[['Location','Tournament','Date','Surface','Round','Winner','Loser', 'WRank', 'LRank', 'PSW', 'PSL', 'AvgW', 'AvgL', 'PSW_O', 'PSL_O', 'Comment']]

#We have no player id numbers from Dan's dataframe, and player name convention is different from the core dataframe too, but player rankings at time of match (a unique, albeit time-dependent, identifier), combined with a tour week identifier to be generated later in this workbook will be sufficient to get the wagering info in proper merge alignment with the match stats dataframe in the second workbook. 

In [42]:
# Renaming remaining columns per core dataframe
df2_w.rename(columns = {'Location':'t_loc','Tournament':'t_nm', 'Date': 'm_date', 'Surface':'t_surf','Round':'t_rd','Winner':'w_nm','Loser':'l_nm','WRank':'w_rk','LRank':'l_rk','PSW':'PSW_BL','PSL':'PSL_BL','AvgW':'AvgW_BL','AvgL':'AvgL_BL','PSW_O':'PSW_O_BL','PSL_O':'PSL_O_BL'}, inplace=True)
del df_w

In [43]:
df2_w.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29106 entries, 0 to 29105
Data columns (total 16 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   t_loc     29106 non-null  object        
 1   t_nm      29106 non-null  object        
 2   m_date    29106 non-null  datetime64[ns]
 3   t_surf    29106 non-null  object        
 4   t_rd      29106 non-null  object        
 5   w_nm      29106 non-null  object        
 6   l_nm      29106 non-null  object        
 7   w_rk      29095 non-null  float64       
 8   l_rk      29044 non-null  float64       
 9   PSW_BL    26227 non-null  float64       
 10  PSL_BL    26227 non-null  float64       
 11  AvgW_BL   25401 non-null  float64       
 12  AvgL_BL   25401 non-null  float64       
 13  PSW_O_BL  13971 non-null  float64       
 14  PSL_O_BL  13971 non-null  float64       
 15  Comment   29106 non-null  object        
dtypes: datetime64[ns](1), float64(8), object(7)
memory usage: 

In [44]:
# Filter out matches that didn't happen
df2_w = df2_w[~df2_w['Comment'].str.contains("Walkover")]

In [45]:
# Create additional column with datetime in Date converted to string to make useful for renaming indexing below
df2_w['m_date_str'] = df2_w['m_date'].dt.strftime('%Y-%m-%d') #we want datetime as a string to index into here

In [46]:
# Aligning tournmnt nms with those in the match stats dataframe will be key for merging implied win probabilities into match stats dataframe
# As such, need to make a few amendments/corrections in the wagering dataframe

df2_w.loc[(df2_w["t_nm"] == "French Open"), "t_loc"] = "Roland Garros" #avoid ambiguity with Paris Indoor
df2_w.loc[(df2_w["t_nm"] == "Wimbledon"), "t_loc"] = "Wimbledon" #avoid ambiguity with London event and align with Dan Westin files
df2_w.loc[(df2_w["t_nm"] == "US Open"), "t_loc"] = "Flushing Meadows" #avoid ambiguity with New York ATP event
df2_w.loc[(df2_w["t_loc"] == "'s-Hertogenbosch"), "t_loc"] = "s Hertogenbosch" #align with spelling in Dan Westin's files
df2_w.loc[(df2_w["t_loc"] == "Queens Club"), "t_loc"] = "Queen's Club" #align with spelling in Dan Westin's files
df2_w.loc[(df2_w["t_loc"] == "Dubai "), "t_loc"] = "Dubai" #extra space bug in Dan's files for Dubai
df2_w.loc[(df2_w["t_loc"] == "Johannesburg "), "t_loc"] = "Johannesburg" #extra space bug in Dan's files for Johannesburg
df2_w.loc[(df2_w["t_loc"] == "Shenzhen "), "t_loc"] = "Shenzhen" #same for Shenzen
df2_w.loc[(df2_w["t_loc"] == "Estoril "), "t_loc"] = "Estoril" #same for Estoril

In [47]:
# Creating tour week variable per tournament will be key for merging implied win probabilities into match stats dataframe (as match stats dataframe does NOT have match dates; they will come in with this dataframe)

df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Pune")), "tour_wk"] = "2019_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2019_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2019_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Cordoba")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Sofia")), "tour_wk"] = "2019_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "New York")), "tour_wk"] = "2019_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Delray Beach")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Rio de Janeiro")), "tour_wk"] = "2019_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2019_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2019_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2019_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Marrakech")), "tour_wk"] = "2019_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2019_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Budapest")), "tour_wk"] = "2019_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich")), "tour_wk"] = "2019_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2019_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2019_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Geneva")|(df2_w["t_loc"] == "Lyon")), "tour_wk"] = "2019_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2019_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2019_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2019_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Antalya")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2019_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2019_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Bastad")|(df2_w["t_loc"] == "Umag")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2019_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & (((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Hamburg"))), "tour_wk"] = "2019_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & (((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Los Cabos")|(df2_w["t_loc"] == "Washington"))), "tour_wk"] = "2019_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Montreal")), "tour_wk"] = "2019_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2019_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2019_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2019_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2019_29"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Chengdu")|(df2_w["t_loc"] == "Zhuhai")), "tour_wk"] = "2019_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2019_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2019_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & (((df2_w["t_loc"] == "Antwerp")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2019_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Vienna")), "tour_wk"] = "2019_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2019_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "Milan")), "tour_wk"] = "2019_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2019")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2019_37"

df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Pune"))), "tour_wk"] = "2018_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2018_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2018_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Quito")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Sofia")), "tour_wk"] = "2018_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "New York")), "tour_wk"] = "2018_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Delray Beach")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Rio de Janeiro")), "tour_wk"] = "2018_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2018_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2018_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2018_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Marrakech")), "tour_wk"] = "2018_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2018_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Budapest")), "tour_wk"] = "2018_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Istanbul")), "tour_wk"] = "2018_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2018_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2018_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Geneva")|(df2_w["t_loc"] == "Lyon")), "tour_wk"] = "2018_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2018_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2018_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2018_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Antalya")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2018_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2018_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Bastad")|(df2_w["t_loc"] == "Umag")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2018_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & (((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Hamburg"))), "tour_wk"] = "2018_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & (((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Los Cabos")|(df2_w["t_loc"] == "Washington"))), "tour_wk"] = "2018_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Toronto")), "tour_wk"] = "2018_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2018_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2018_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2018_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2018_29"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Chengdu")|(df2_w["t_loc"] == "Shenzhen")), "tour_wk"] = "2018_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2018_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2018_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & (((df2_w["t_loc"] == "Antwerp")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2018_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Vienna")), "tour_wk"] = "2018_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2018_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "Milan")), "tour_wk"] = "2018_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2018")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2018_37"

df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2017_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2017_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2017_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Quito")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Sofia")), "tour_wk"] = "2017_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2017_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Delray Beach")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Rio de Janeiro")), "tour_wk"] = "2017_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2017_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2017_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2017_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Marrakech")), "tour_wk"] = "2017_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2017_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Budapest")), "tour_wk"] = "2017_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Istanbul")), "tour_wk"] = "2017_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2017_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2017_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Geneva")|(df2_w["t_loc"] == "Lyon")), "tour_wk"] = "2017_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2017_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2017_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2017_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Antalya")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2017_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2017_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Bastad")|(df2_w["t_loc"] == "Umag")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2017_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & (((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Hamburg"))), "tour_wk"] = "2017_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & (((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Los Cabos")|(df2_w["t_loc"] == "Washington"))), "tour_wk"] = "2017_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Montreal")), "tour_wk"] = "2017_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2017_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2017_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2017_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2017_29"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Chengdu")|(df2_w["t_loc"] == "Shenzhen")), "tour_wk"] = "2017_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2017_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2017_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & (((df2_w["t_loc"] == "Antwerp")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2017_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Vienna")), "tour_wk"] = "2017_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2017_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "Milan")), "tour_wk"] = "2017_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2017")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2017_37"

df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2016_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2016_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2016_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Quito")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Sofia")), "tour_wk"] = "2016_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2016_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Delray Beach")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Rio de Janeiro")), "tour_wk"] = "2016_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2016_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2016_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2016_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Marrakech")), "tour_wk"] = "2016_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2016_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2016_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Istanbul")), "tour_wk"] = "2016_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2016_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2016_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Geneva")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2016_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2016_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2016_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2016_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Nottingham")), "tour_wk"] = "2016_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2016_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Hamburg")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2016_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Umag")|(df2_w["t_loc"] == "Kitzbuhel")| (df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Washington")), "tour_wk"] = "2016_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Toronto")), "tour_wk"] = "2016_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Atlanta")), "tour_wk"] = "2016_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Los Cabos")), "tour_wk"] = "2016_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2016_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2016_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2016_29"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2016_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Chengdu")|(df2_w["t_loc"] == "Shenzhen")), "tour_wk"] = "2016_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2016_32"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2016_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & (((df2_w["t_loc"] == "Antwerp")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2016_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Vienna")), "tour_wk"] = "2016_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2016_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2016")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2016_37"

df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2015_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2015_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2015_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Quito")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2015_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Sao Paulo")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2015_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Delray Beach")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Rio de Janeiro")), "tour_wk"] = "2015_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Buenos Aires")), "tour_wk"] = "2015_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2015_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2015_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2015_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2015_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2015_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Istanbul")), "tour_wk"] = "2015_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2015_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2015_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Geneva")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2015_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2015_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2015_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2015_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Nottingham")), "tour_wk"] = "2015_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2015_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Newport")), "tour_wk"] = "2015_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Umag")|(df2_w["t_loc"] == "Bogota")), "tour_wk"] = "2015_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Hamburg")|(df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Atlanta")), "tour_wk"] = "2015_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Washington")), "tour_wk"] = "2015_25"                                            
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Montreal")), "tour_wk"] = "2015_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2015_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2015_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2015_29"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2015_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Shenzhen")), "tour_wk"] = "2015_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2015_32"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2015_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & (((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2015_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2015_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2015_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2015")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2015_37"                                           

df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2014_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2014_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2014_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Vina del Mar")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2014_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2014_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Delray Beach")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Rio de Janeiro")), "tour_wk"] = "2014_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2014_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2014_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2014_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2014_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2014_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2014_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Oeiras")|(df2_w["t_loc"] == "Munich")), "tour_wk"] = "2014_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2014_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2014_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Dusseldorf")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2014_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2014_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2014_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2014_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2014_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Stuttgart")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2014_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Hamburg")|(df2_w["t_loc"] == "Bogota")), "tour_wk"] = "2014_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Gstaad") |(df2_w["t_loc"] == "Umag")), "tour_wk"] = "2014_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Washington")), "tour_wk"] = "2014_24"                                          
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Toronto")), "tour_wk"] = "2014_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2014_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2014_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2014_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Metz")), "tour_wk"] = "2014_29"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Shenzhen")), "tour_wk"] = "2014_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2014_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2014_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & (((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2014_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2014_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2014_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2014")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2014_36"  

df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2013_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2013_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2013_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Vina del Mar")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2013_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "San Jose")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2013_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2013_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Delray Beach")), "tour_wk"] = "2013_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2013_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2013_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2013_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2013_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2013_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Oeiras")|(df2_w["t_loc"] == "Munich")), "tour_wk"] = "2013_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2013_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2013_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Dusseldorf")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2013_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2013_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2013_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2013_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2013_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Stuttgart")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2013_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Hamburg")|(df2_w["t_loc"] == "Bogota")), "tour_wk"] = "2013_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Gstaad") |(df2_w["t_loc"] == "Umag")), "tour_wk"] = "2013_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Washington")), "tour_wk"] = "2013_24"                                          
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Montreal")), "tour_wk"] = "2013_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2013_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2013_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2013_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2013_29"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Bangkok")), "tour_wk"] = "2013_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2013_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2013_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & (((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2013_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2013_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2013_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2013")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2013_36"  

df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2012_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2012_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2012_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Vina del Mar")|(df2_w["t_loc"] == "Montpellier")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2012_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "San Jose")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Sao Paulo")), "tour_wk"] = "2012_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2012_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Delray Beach")), "tour_wk"] = "2012_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2012_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2012_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2012_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2012_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Barcelona")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2012_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Belgrade")), "tour_wk"] = "2012_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2012_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2012_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Dusseldorf")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2012_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2012_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2012_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2012_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2012_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Stuttgart") |(df2_w["t_loc"] == "Umag")|(df2_w["t_loc"] == "Newport")), "tour_wk"] = "2012_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Gstaad") |(df2_w["t_loc"] == "Hamburg")), "tour_wk"] = "2012_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Kitzbuhel")|(df2_w["t_loc"] == "Los Angeles")), "tour_wk"] = "2012_23"                                          
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Washington")), "tour_wk"] = "2012_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Toronto")), "tour_wk"] = "2012_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2012_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2012_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2012_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2012_29"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Bangkok")), "tour_wk"] = "2012_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2012_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2012_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & (((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm"))), "tour_wk"] = "2012_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2012_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2012_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2012")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2012_36"

df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2011_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2011_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2011_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Santiago")|(df2_w["t_loc"] == "Johannesburg")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2011_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "San Jose")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Costa Do Sauipe")), "tour_wk"] = "2011_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2011_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Delray Beach")), "tour_wk"] = "2011_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2011_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2011_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2011_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2011_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Barcelona")), "tour_wk"] = "2011_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Belgrade")), "tour_wk"] = "2011_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2011_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2011_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Dusseldorf")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2011_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2011_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2011_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2011_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2011_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Newport")), "tour_wk"] = "2011_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2011_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Hamburg")), "tour_wk"] = "2011_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Los Angeles") |(df2_w["t_loc"] == "Umag")), "tour_wk"] = "2011_20"                                          
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Washington") |(df2_w["t_loc"] == "Kitzbuhel")), "tour_wk"] = "2011_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Montreal")), "tour_wk"] = "2011_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2011_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Winston-Salem")), "tour_wk"] = "2011_24"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2011_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2011_26"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Bangkok")), "tour_wk"] = "2011_27"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2011_28"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2011_29"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm")), "tour_wk"] = "2011_30"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "St. Petersburg")), "tour_wk"] = "2011_31"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2011_32"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2011_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2011")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2011_34"  

df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2010_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2010_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2010_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Santiago")|(df2_w["t_loc"] == "Johannesburg")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2010_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "San Jose")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Costa Do Sauipe")), "tour_wk"] = "2010_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2010_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Delray Beach")), "tour_wk"] = "2010_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2010_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2010_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2010_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2010_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Barcelona")), "tour_wk"] = "2010_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2010_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Belgrade")), "tour_wk"] = "2010_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2010_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Dusseldorf")|(df2_w["t_loc"] == "Nice")), "tour_wk"] = "2010_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2010_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2010_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2010_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2010_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Newport")), "tour_wk"] = "2010_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2010_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Atlanta")|(df2_w["t_loc"] == "Hamburg")), "tour_wk"] = "2010_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Los Angeles") |(df2_w["t_loc"] == "Umag")), "tour_wk"] = "2010_24"                                          
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Washington")), "tour_wk"] = "2010_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Toronto")), "tour_wk"] = "2010_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2010_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "New Haven")), "tour_wk"] = "2010_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2010_29"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2010_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Bangkok")), "tour_wk"] = "2010_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2010_32"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2010_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm")), "tour_wk"] = "2010_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "St. Petersburg") |(df2_w["t_loc"] == "Montpellier")), "tour_wk"] = "2010_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2010_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2010_37"
df2_w.loc[(df2_w["m_date_str"].str.contains("2010")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2010_38"

df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & (((df2_w["t_loc"] == "Brisbane")|(df2_w["t_loc"] == "Doha")|(df2_w["t_loc"] == "Chennai"))), "tour_wk"] = "2009_01"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Auckland")|(df2_w["t_loc"] == "Sydney")), "tour_wk"] = "2009_02"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Melbourne")), "tour_wk"] = "2009_03"       
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Vina del Mar")|(df2_w["t_loc"] == "Johannesburg")|(df2_w["t_loc"] == "Zagreb")), "tour_wk"] = "2009_04"     
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "San Jose")|(df2_w["t_loc"] == "Rotterdam")|(df2_w["t_loc"] == "Costa Do Sauipe")), "tour_wk"] = "2009_05" 
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Buenos Aires")|(df2_w["t_loc"] == "Marseille")|(df2_w["t_loc"] == "Memphis")), "tour_wk"] = "2009_06"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Acapulco")|(df2_w["t_loc"] == "Dubai")|(df2_w["t_loc"] == "Delray Beach")), "tour_wk"] = "2009_07"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Indian Wells")), "tour_wk"] = "2009_08"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Miami")), "tour_wk"] = "2009_09"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Houston")|(df2_w["t_loc"] == "Casablanca")), "tour_wk"] = "2009_10"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Monte Carlo")), "tour_wk"] = "2009_11"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Barcelona")), "tour_wk"] = "2009_12"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Rome")), "tour_wk"] = "2009_13"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Estoril")|(df2_w["t_loc"] == "Munich") |(df2_w["t_loc"] == "Belgrade")), "tour_wk"] = "2009_14"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Madrid")), "tour_wk"] = "2009_15"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Kitzbuhel")), "tour_wk"] = "2009_16"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Roland Garros")), "tour_wk"] = "2009_17"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Halle")|(df2_w["t_loc"] == "Queen's Club")), "tour_wk"] = "2009_18"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "s Hertogenbosch")|(df2_w["t_loc"] == "Eastbourne")), "tour_wk"] = "2009_19"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Wimbledon")), "tour_wk"] = "2009_20"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Newport")), "tour_wk"] = "2009_21"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Bastad") |(df2_w["t_loc"] == "Stuttgart")), "tour_wk"] = "2009_22"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Indianapolis")|(df2_w["t_loc"] == "Hamburg")), "tour_wk"] = "2009_23"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Gstaad")|(df2_w["t_loc"] == "Los Angeles") |(df2_w["t_loc"] == "Umag")), "tour_wk"] = "2009_24"                                          
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Washington")), "tour_wk"] = "2009_25"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Montreal")), "tour_wk"] = "2009_26"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Cincinnati")), "tour_wk"] = "2009_27"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "New Haven")), "tour_wk"] = "2009_28"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Flushing Meadows")), "tour_wk"] = "2009_29"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Metz")|(df2_w["t_loc"] == "Bucharest")), "tour_wk"] = "2009_30"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Kuala Lumpur")|(df2_w["t_loc"] == "Bangkok")), "tour_wk"] = "2009_31"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Beijing")|(df2_w["t_loc"] == "Tokyo")), "tour_wk"] = "2009_32"  
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Shanghai")), "tour_wk"] = "2009_33"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Moscow")|(df2_w["t_loc"] == "Stockholm")), "tour_wk"] = "2009_34"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Vienna")|(df2_w["t_loc"] == "St. Petersburg") |(df2_w["t_loc"] == "Lyon")), "tour_wk"] = "2009_35"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Basel")|(df2_w["t_loc"] == "Valencia")), "tour_wk"] = "2009_36"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "Paris")), "tour_wk"] = "2009_37"
df2_w.loc[(df2_w["m_date_str"].str.contains("2009")) & ((df2_w["t_loc"] == "London")), "tour_wk"] = "2009_38"

In [48]:
# It will also be useful later to have a column in the main dataframe that is simply tournament year, in numeric format
df2_w['m_yr'] = [x[:4] for x in df2_w['m_date_str']]
df2_w["m_yr"] = df2_w["m_yr"].astype('int64') 

In [49]:
df2_w.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28927 entries, 0 to 29105
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   t_loc       28927 non-null  object        
 1   t_nm        28927 non-null  object        
 2   m_date      28927 non-null  datetime64[ns]
 3   t_surf      28927 non-null  object        
 4   t_rd        28927 non-null  object        
 5   w_nm        28927 non-null  object        
 6   l_nm        28927 non-null  object        
 7   w_rk        28916 non-null  float64       
 8   l_rk        28865 non-null  float64       
 9   PSW_BL      26068 non-null  float64       
 10  PSL_BL      26068 non-null  float64       
 11  AvgW_BL     25245 non-null  float64       
 12  AvgL_BL     25245 non-null  float64       
 13  PSW_O_BL    13892 non-null  float64       
 14  PSL_O_BL    13892 non-null  float64       
 15  Comment     28927 non-null  object        
 16  m_date_str  28927 non-

### 5. Implied Win Probability Calculation
Per player, per match. Implied win probability separately from opening and closing lines for Pinnacle Sports, and from closing lines only for average across a number of books (exact set of books varies from match-to-match and year-to-year)

In [50]:
# Convert decimal odds to American odds for Averaged closing Lines

df2_w["AvgW_AO"] = ""
df2_w["AvgL_AO"] = ""

df2_w.loc[(df2_w["AvgW_BL"] >= 2), "AvgW_AO"] = (df2_w["AvgW_BL"] - 1) * 100
df2_w.loc[(df2_w["AvgW_BL"] < 2), "AvgW_AO"] = (-100)/(df2_w["AvgW_BL"] - 1) 

df2_w.loc[(df2_w["AvgL_BL"] >= 2), "AvgL_AO"] = (df2_w["AvgL_BL"] - 1) * 100
df2_w.loc[(df2_w["AvgL_BL"] < 2), "AvgL_AO"] = (-100)/(df2_w["AvgL_BL"] - 1)

df2_w["AvgW_AO"] = pd.to_numeric(df2_w["AvgW_AO"])
df2_w["AvgL_AO"] = pd.to_numeric(df2_w["AvgL_AO"])

# Convert decimal odds to American odds for Pinnacle Sports closing lines
df2_w["PSW_AO"] = ""
df2_w["PSL_AO"] = ""

df2_w.loc[(df2_w["PSW_BL"] >= 2), "PSW_AO"] = (df2_w["PSW_BL"] - 1) * 100
df2_w.loc[(df2_w["PSW_BL"] < 2), "PSW_AO"] = (-100)/(df2_w["PSW_BL"] - 1) 

df2_w.loc[(df2_w["PSL_BL"] >= 2), "PSL_AO"] = (df2_w["PSL_BL"] - 1) * 100
df2_w.loc[(df2_w["PSL_BL"] < 2), "PSL_AO"] = (-100)/(df2_w["PSL_BL"] - 1)

df2_w["PSW_AO"] = pd.to_numeric(df2_w["PSW_AO"])
df2_w["PSL_AO"] = pd.to_numeric(df2_w["PSL_AO"])

# Convert decimal odds to American odds for Pinnacle Sports closing lines
df2_w["PSW_O_AO"] = ""
df2_w["PSL_O_AO"] = ""

df2_w.loc[(df2_w["PSW_O_BL"] >= 2), "PSW_O_AO"] = (df2_w["PSW_O_BL"] - 1) * 100
df2_w.loc[(df2_w["PSW_O_BL"] < 2), "PSW_O_AO"] = (-100)/(df2_w["PSW_O_BL"] - 1) 

df2_w.loc[(df2_w["PSL_O_BL"] >= 2), "PSL_O_AO"] = (df2_w["PSL_O_BL"] - 1) * 100
df2_w.loc[(df2_w["PSL_O_BL"] < 2), "PSL_O_AO"] = (-100)/(df2_w["PSL_O_BL"] - 1)

df2_w["PSW_O_AO"] = pd.to_numeric(df2_w["PSW_O_AO"])
df2_w["PSL_O_AO"] = pd.to_numeric(df2_w["PSL_O_AO"])

In [51]:
# Convert American Odds data to implied odds for Averaged closing lines
df2_w["AvgW_C_IP"] = ""
df2_w["AvgL_C_IP"] = ""

df2_w.loc[(df2_w["AvgW_AO"] < 0), "AvgW_C_IP"] = abs(df2_w["AvgW_AO"])/(abs(df2_w["AvgW_AO"]) + 100) *100  #favorite winners
df2_w.loc[(df2_w["AvgW_AO"] > 0), "AvgW_C_IP"] = 100/(abs(df2_w["AvgW_AO"]) + 100) *100  #underdog winners

df2_w.loc[(df2_w["AvgL_AO"] < 0), "AvgL_C_IP"] = abs(df2_w["AvgL_AO"])/(abs(df2_w["AvgL_AO"]) + 100) *100  #favorite losers
df2_w.loc[(df2_w["AvgL_AO"] > 0), "AvgL_C_IP"] = 100/(abs(df2_w["AvgL_AO"]) + 100) *100  #underdog losers

df2_w["AvgW_C_IP"] = pd.to_numeric(df2_w["AvgW_C_IP"])
df2_w["AvgL_C_IP"] = pd.to_numeric(df2_w["AvgL_C_IP"])

# Convert American Odds data to implied odds for Pinnacle closing lines
df2_w["PSW_C_IP"] = ""
df2_w["PSL_C_IP"] = ""

df2_w.loc[(df2_w["PSW_AO"] < 0), "PSW_C_IP"] = abs(df2_w["PSW_AO"])/(abs(df2_w["PSW_AO"]) + 100) *100  #favorite winners
df2_w.loc[(df2_w["PSW_AO"] > 0), "PSW_C_IP"] = 100/(abs(df2_w["PSW_AO"]) + 100) *100  #underdog winners

df2_w.loc[(df2_w["PSL_AO"] < 0), "PSL_C_IP"] = abs(df2_w["PSL_AO"])/(abs(df2_w["PSL_AO"]) + 100) *100  #favorite losers
df2_w.loc[(df2_w["PSL_AO"] > 0), "PSL_C_IP"] = 100/(abs(df2_w["PSL_AO"]) + 100) *100  #underdog losers

df2_w["PSW_C_IP"] = pd.to_numeric(df2_w["PSW_C_IP"])
df2_w["PSL_C_IP"] = pd.to_numeric(df2_w["PSL_C_IP"])

# Convert American Odds data to implied odds for Pinnacle opening lines
df2_w["PSW_O_IP"] = ""
df2_w["PSL_O_IP"] = ""

df2_w.loc[(df2_w["PSW_O_AO"] < 0), "PSW_O_IP"] = abs(df2_w["PSW_O_AO"])/(abs(df2_w["PSW_O_AO"]) + 100) *100  #favorite winners
df2_w.loc[(df2_w["PSW_O_AO"] > 0), "PSW_O_IP"] = 100/(abs(df2_w["PSW_O_AO"]) + 100) *100  #underdog winners

df2_w.loc[(df2_w["PSL_O_AO"] < 0), "PSL_O_IP"] = abs(df2_w["PSL_O_AO"])/(abs(df2_w["PSL_O_AO"]) + 100) *100  #favorite losers
df2_w.loc[(df2_w["PSL_O_AO"] > 0), "PSL_O_IP"] = 100/(abs(df2_w["PSL_O_AO"]) + 100) *100  #underdog losers

df2_w["PSW_O_IP"] = pd.to_numeric(df2_w["PSW_O_IP"])
df2_w["PSL_O_IP"] = pd.to_numeric(df2_w["PSL_O_IP"])

In [52]:
# Remove the vig from the implied probabilities for Averaged closing lines
df2_w["AvgW_C_IP_NV"] = ""
df2_w["AvgL_C_IP_NV"] = ""

df2_w["AvgW_C_IP_NV"] = df2_w["AvgW_C_IP"]/(df2_w["AvgW_C_IP"] + df2_w["AvgL_C_IP"]) * 100
df2_w["AvgL_C_IP_NV"] = df2_w["AvgL_C_IP"]/(df2_w["AvgW_C_IP"] + df2_w["AvgL_C_IP"]) * 100

df2_w["AvgW_C_IP_NV"] = round(pd.to_numeric(df2_w["AvgW_C_IP_NV"]), 2)
df2_w["AvgL_C_IP_NV"] = round(pd.to_numeric(df2_w["AvgL_C_IP_NV"]), 2)

# Remove the vig from the implied probabilities for Pinnacle Closing Lines
df2_w["PSW_C_IP_NV"] = ""
df2_w["PSL_C_IP_NV"] = ""

df2_w["PSW_C_IP_NV"] = df2_w["PSW_C_IP"]/(df2_w["PSW_C_IP"] + df2_w["PSL_C_IP"]) * 100
df2_w["PSL_C_IP_NV"] = df2_w["PSL_C_IP"]/(df2_w["PSW_C_IP"] + df2_w["PSL_C_IP"]) * 100

df2_w["PSW_C_IP_NV"] = round(pd.to_numeric(df2_w["PSW_C_IP_NV"]), 2)
df2_w["PSL_C_IP_NV"] = round(pd.to_numeric(df2_w["PSL_C_IP_NV"]), 2)

# Remove the vig from the implied probabilities for Pinnacle opening lines
df2_w["PSW_O_IP_NV"] = ""
df2_w["PSL_O_IP_NV"] = ""

df2_w["PSW_O_IP_NV"] = df2_w["PSW_O_IP"]/(df2_w["PSW_O_IP"] + df2_w["PSL_O_IP"]) * 100
df2_w["PSL_O_IP_NV"] = df2_w["PSL_O_IP"]/(df2_w["PSW_O_IP"] + df2_w["PSL_O_IP"]) * 100

df2_w["PSW_O_IP_NV"] = round(pd.to_numeric(df2_w["PSW_O_IP_NV"]), 2)
df2_w["PSL_O_IP_NV"] = round(pd.to_numeric(df2_w["PSL_O_IP_NV"]), 2)


In [53]:
df2_w.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28927 entries, 0 to 29105
Data columns (total 37 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   t_loc         28927 non-null  object        
 1   t_nm          28927 non-null  object        
 2   m_date        28927 non-null  datetime64[ns]
 3   t_surf        28927 non-null  object        
 4   t_rd          28927 non-null  object        
 5   w_nm          28927 non-null  object        
 6   l_nm          28927 non-null  object        
 7   w_rk          28916 non-null  float64       
 8   l_rk          28865 non-null  float64       
 9   PSW_BL        26068 non-null  float64       
 10  PSL_BL        26068 non-null  float64       
 11  AvgW_BL       25245 non-null  float64       
 12  AvgL_BL       25245 non-null  float64       
 13  PSW_O_BL      13892 non-null  float64       
 14  PSL_O_BL      13892 non-null  float64       
 15  Comment       28927 non-null  object

In [54]:
df3_w = df2_w[['m_date', 'm_yr', 'tour_wk', 'w_rk', 'l_rk', 'AvgW_C_IP_NV', 'AvgL_C_IP_NV', 'PSW_C_IP_NV', 'PSL_C_IP_NV', 'PSW_O_IP_NV', 'PSL_O_IP_NV', 'Comment']]

### 6. Merge Implied Win Probability Data Into Core Match Dataframe
The match date columns from the win probability dataframe will also be very useful in expansive feature development (Workbook 2)

In [55]:
# Join wagering data to main dataframe after stripping down to just the merge alignment and data columns we need
df2 = pd.merge(df, df3_w, how='left', on=['tour_wk','w_rk','l_rk'])
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28540 entries, 0 to 28539
Data columns (total 63 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   t_id              28540 non-null  object        
 1   t_nm              28540 non-null  object        
 2   t_surf            28540 non-null  object        
 3   t_draw_sz         28540 non-null  int64         
 4   t_lvl             28540 non-null  object        
 5   t_date            28540 non-null  int64         
 6   m_num             28540 non-null  int64         
 7   w_id              28540 non-null  int64         
 8   w_ent             28540 non-null  object        
 9   w_nm              28540 non-null  object        
 10  w_hd              28540 non-null  object        
 11  w_ht              28525 non-null  float64       
 12  w_ioc             28540 non-null  object        
 13  w_age             28540 non-null  float64       
 14  l_id              2854

In [56]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28540 entries, 0 to 28539
Data columns (total 63 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   t_id              28540 non-null  object        
 1   t_nm              28540 non-null  object        
 2   t_surf            28540 non-null  object        
 3   t_draw_sz         28540 non-null  int64         
 4   t_lvl             28540 non-null  object        
 5   t_date            28540 non-null  int64         
 6   m_num             28540 non-null  int64         
 7   w_id              28540 non-null  int64         
 8   w_ent             28540 non-null  object        
 9   w_nm              28540 non-null  object        
 10  w_hd              28540 non-null  object        
 11  w_ht              28525 non-null  float64       
 12  w_ioc             28540 non-null  object        
 13  w_age             28540 non-null  float64       
 14  l_id              2854

In [57]:
# There's one little complication stemming from the merge columns used- we get duplicated rows (one with the odds from the first match, one with the odds from the second match) in the very rare occurance (only in the NextGen and Tour Finals) where two players play twice in the same tourney AND the same player wins both times. 
#The easiest (though certainly not elegant) way to deal with this is simply to hard-code prune the duped rows eith the incorrect odds out before saving.
df2['t_rd_num'] = df2['t_rd_num'].astype(float)

df2 = df2.drop(df2[(df2["m_date"] == "11/20/2011") & (df2["t_rd_num"] == 5)].index) #Federer-Tsonga played twice in Tour Finals
df2 = df2.drop(df2[(df2["m_date"] == "11/27/2011") & (df2["t_rd_num"] == 1)].index) #Federer-Tsonga played twice in Tour Finals

df2 = df2.drop(df2[(df2["m_date"] == "11/15/2017") & (df2["t_rd_num"] == 5)].index) #Dimitrov-Goffin played twice in Tour Finals
df2 = df2.drop(df2[(df2["m_date"] == "11/19/2017") & (df2["t_rd_num"] == 2)].index) #Dimitrov-Goffin played twice in Tour Finals

df2 = df2.drop(df2[(df2["m_date"] == "11/8/2017") & (df2["t_rd_num"] == 5)].index) #Chung-Rublev played twice in NextGen Finals
df2 = df2.drop(df2[(df2["m_date"] == "11/11/2017") & (df2["t_rd_num"] == 2)].index) #Chung-Rublev played twice in NextGen Finals

In [58]:
#Creates 'm_num' feature that will be critical in feature generation (see workbook 2)
df3 = df2
df3 = df3.sort_values(by=['tour_wk', 't_id', 'm_date'], ascending = True)
df3 = df3.reset_index(drop=True)
df3["m_num"] = df3.index
df3["m_num"] = df3["m_num"] + 1 # corrects for zero indexing 

In [59]:
#Convert a bunch of features to integer from object (will be more useful in feature generation that way)
df3["t_ident"] = df3["t_ident"].astype('int64') 
df3["t_surf"] = df3["t_surf"].astype('int64') 
df3["t_ind"] = df3["t_ind"].astype('int64') 
df3["t_alt"] = df3["t_alt"].astype('int64') 
df3["t_GMT_diff"] = df3["t_GMT_diff"].astype('int64')
df3["t_lvl"] = df3["t_lvl"].astype('int64')
df3["w_ent"] = df3["w_ent"].astype('int64')
df3["l_ent"] = df3["l_ent"].astype('int64')
df3["w_hd"] = df3["w_hd"].astype('int64')
df3["l_hd"] = df3["l_hd"].astype('int64')

df3["t_id"] = df3["t_id"].str.replace('-','')
df3["t_id"] = df3["t_id"].str.replace('M','10') #some tourny IDs contained a letter (M)
df3["t_id"] = df3["t_id"].astype('int64') 

In [60]:
df3 = df3.drop(["t_date", "tour_wk", "t_rd"], axis = 1)

In [61]:
df3 = df3[["t_id", "t_ident", "t_nm", "t_co", "t_GMT_diff", "t_surf", "t_ind", "t_alt", "t_draw_sz", "t_lvl", "m_bestof", "m_num", "m_date", "m_yr", "t_rd_num", "m_t(m)", "w_id", "w_ent", "w_nm", "w_rk", "w_rk_pts", "w_hd", "w_ht", "w_ioc", "w_age", "w_sv_gms", "w_sv_pts", "w_1st_sv_in", "w_1st_sv_pts_won", "w_2nd_sv_pts_won", "w_ace", "w_df", "w_bp_saved", "w_bp_faced", "l_id", "l_ent", "l_nm", "l_rk", "l_rk_pts", "l_hd", "l_ht", "l_ioc", "l_age", "l_sv_gms", "l_sv_pts", "l_1st_sv_in", "l_1st_sv_pts_won", "l_1st_sv_pts_won", "l_2nd_sv_pts_won", "l_ace", "l_df", "l_bp_saved", "l_bp_faced", "AvgW_C_IP_NV", "AvgL_C_IP_NV", "PSW_C_IP_NV", "PSL_C_IP_NV", "PSW_O_IP_NV", "PSL_O_IP_NV", "Comment"]]

In [62]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28534 entries, 0 to 28533
Data columns (total 60 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   t_id              28534 non-null  int64         
 1   t_ident           28534 non-null  int64         
 2   t_nm              28534 non-null  object        
 3   t_co              28534 non-null  object        
 4   t_GMT_diff        28534 non-null  int64         
 5   t_surf            28534 non-null  int64         
 6   t_ind             28534 non-null  int64         
 7   t_alt             28534 non-null  int64         
 8   t_draw_sz         28534 non-null  int64         
 9   t_lvl             28534 non-null  int64         
 10  m_bestof          28534 non-null  int64         
 11  m_num             28534 non-null  int64         
 12  m_date            28534 non-null  datetime64[ns]
 13  m_yr              28534 non-null  int64         
 14  t_rd_num          2853

In [64]:
#Save current df for next stage (feature creation for 2009-2019; clay court range)
df3.to_csv('../data/cleaned_data_for_FeatureDev.csv', index=False)