The purpose of this notebook is to take a dataset scraped from Ballchasing.com and use it to programmatically look up user profiles on Rocketleague.tracker.network and gather player stats.
- Read the Ballchasing data into a DataFrame
- Iterate through each game and send the player's identifier to RL Tracker, looking the player up by Steam ID or Xbox/PSN name
- Grab stats from the profile: lifetime wins, goal/shot ratio, and 1v1 rating (MMR)

In [None]:
import pandas as pd
import numpy as np
import requests
import json
from bs4 import BeautifulSoup
from requests_html import HTMLSession

In [None]:
ballchasing = pd.read_excel('Ballchasing_data_Nov-01-2021.xlsx', index_col = 0, dtype = {'player2_steam_id':'str'})

Reference for the request headers used below: https://stackoverflow.com/questions/67209947/python-cant-scrape-data-from-my-targeted-site-anymore-using-re-requests-and

In [None]:
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}

Loop through the DataFrame, grab each p1 Steam ID, request the profile page, and save the data to the columns p1_wins, p1_gs_ratio, and p1_mmr.

In [None]:
base_url = 'https://api.tracker.gg/api/v2/rocket-league/standard/profile/'

In [None]:
ballchasing.head()

In [None]:
ballchasing['p1_profile'] = base_url + 'steam/' + ballchasing['player1_steam_id'].map(str)

In [None]:
ballchasing.head()

In [None]:
p1_url_list = ballchasing['p1_profile'].tolist()

Now iterate over that list, request each profile, and pull the stats out of the response:

In [None]:
p1_wins_list = []
p1_mmr = []
p1_gs_ratio = []
num_loops = 0

# for each player profile URL in the list:
for url in p1_url_list:

    # get the response for each player profile
    response = requests.get(url, headers=headers)

    # if the page doesn't exist (profile is private, deleted, etc.), append "null" for each stat and skip to the next iteration
    if response.status_code != 200:
        p1_wins_list.append('null')
        p1_mmr.append('null')
        p1_gs_ratio.append('null')
        continue

    # if it does exist, parse the response as JSON and append each stat to its own list
    data = response.json()
    p1_wins_list.append(data['data']['segments'][0]['stats']['wins']['value'])
    # the rating is read from segments[2]; fall back to "null" when that segment is missing
    try:
        p1_mmr.append(data['data']['segments'][2]['stats']['rating']['value'])
    except IndexError:
        p1_mmr.append('null')
    p1_gs_ratio.append(round(data['data']['segments'][0]['stats']['goalShotRatio']['value'], 2))

    # progress counter (counts successful fetches only)
    print(num_loops)
    num_loops += 1
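
The nested lookups above could be gathered into one helper so that a missing segment or stat never leaves the three lists with different lengths. A minimal sketch, assuming the payload has the shape the loop indexes into (`data`, then `segments`, then `stats`); `extract_stats` and the sample payload are hypothetical and made up for illustration, not part of the tracker API:

```python
def extract_stats(payload):
    """Return (wins, mmr, gs_ratio) from a profile payload, using 'null' for anything missing."""
    segments = payload.get('data', {}).get('segments', [])

    def stat(index, name):
        # Look up segments[index]['stats'][name]['value'], falling back to 'null'
        try:
            value = segments[index]['stats'][name]['value']
        except (IndexError, KeyError, TypeError):
            return 'null'
        return 'null' if value is None else value

    wins = stat(0, 'wins')
    mmr = stat(2, 'rating')
    ratio = stat(0, 'goalShotRatio')
    gs_ratio = round(ratio, 2) if ratio != 'null' else 'null'
    return wins, mmr, gs_ratio


# Made-up payload with only one segment, so the rating lookup falls back to 'null'
sample = {'data': {'segments': [{'stats': {'wins': {'value': 1200},
                                           'goalShotRatio': {'value': 0.4567}}}]}}
print(extract_stats(sample))
```

Inside the loop, the three `append` calls could then take their values from a single `extract_stats(response.json())` call.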

-------------------------

Now that I have p1 stats, I can append them to the DataFrame as columns and then go get p2 stats.

In [None]:
ballchasing['p1_wins'] = p1_wins_list
ballchasing['p1_mmr'] = p1_mmr
ballchasing['p1_gs_ratio'] = p1_gs_ratio
# won't work until the loop has run to completion, so that the list lengths match the number of rows.
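
Because missing stats are stored as the string 'null', the new columns end up holding a mix of numbers and strings. One possible follow-up, sketched here on a small made-up frame rather than the real data, is to pass each column through `pd.to_numeric(..., errors='coerce')` so that 'null' becomes `NaN` and the column becomes numeric:

```python
import pandas as pd

# Made-up values standing in for the scraped lists
demo = pd.DataFrame({
    'p1_wins': [1200, 'null', 87],
    'p1_mmr': ['null', 954, 1010],
    'p1_gs_ratio': [0.46, 0.51, 'null'],
})

# errors='coerce' turns anything non-numeric (here the 'null' strings) into NaN
for col in ['p1_wins', 'p1_mmr', 'p1_gs_ratio']:
    demo[col] = pd.to_numeric(demo[col], errors='coerce')

print(demo.dtypes)
print(demo.isna().sum())
```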

Get p2 stats. This is trickier, because I don't always have a Steam ID, so I have to construct a list of URLs from each player's platform and username.

In [None]:
ballchasing.head()

In [None]:
ballchasing['p2_profile'] = np.where(ballchasing['player2_platform'] == 'steam', base_url + 'steam/' + ballchasing['player2_steam_id'].map(str),'null')

ballchasing['p2_profile'] = np.where(ballchasing['player2_platform'] == 'xbox', base_url + 'xbox/' + ballchasing['player2_name'],ballchasing['p2_profile'])

ballchasing['p2_profile'] = np.where(ballchasing['player2_platform'] == 'epic', base_url + 'epic/' + ballchasing['player2_name'],ballchasing['p2_profile'])

ballchasing['p2_profile'] = np.where(ballchasing['player2_platform'] == 'ps4', base_url + 'ps4/' + ballchasing['player2_name'],ballchasing['p2_profile'])
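
Player names on xbox, epic, and ps4 can contain spaces or other characters that are not valid in a URL path, so it may be worth percent-encoding them before concatenation. A minimal sketch using the standard library's `urllib.parse.quote`; the helper name `build_profile_url` and the sample names are made up for illustration, and whether the tracker API expects exactly this encoding is an assumption:

```python
from urllib.parse import quote

base_url = 'https://api.tracker.gg/api/v2/rocket-league/standard/profile/'

def build_profile_url(platform, identifier):
    """Build a profile URL, or return 'null' for platforms the notebook does not handle."""
    if platform not in ('steam', 'xbox', 'epic', 'ps4'):
        return 'null'
    # safe='' also encodes '/', which would otherwise split the path
    return base_url + platform + '/' + quote(str(identifier), safe='')

print(build_profile_url('xbox', 'Some Player'))
print(build_profile_url('switch', 'anyone'))
```

The identifier would be the Steam ID for steam rows and the player name for the other platforms, matching the `np.where` calls above.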



In [None]:
ballchasing['p2_profile'].iloc[0]

In [None]:
p2_url_list = ballchasing['p2_profile'].tolist()

Repeat the code from above, but for p2:

In [None]:
p2_wins_list = []
p2_mmr = []
p2_gs_ratio = []
num_loops = 0

# for each player profile URL in the list:
for url in p2_url_list:

    # no URL could be built for this player's platform, so append "null" for each stat
    if url == 'null':
        p2_wins_list.append('null')
        p2_mmr.append('null')
        p2_gs_ratio.append('null')
        continue

    # get the response for each player profile
    response = requests.get(url, headers=headers)

    # if the page doesn't exist (profile is private, deleted, etc.), append "null" for each stat and skip to the next iteration
    if response.status_code != 200:
        p2_wins_list.append('null')
        p2_mmr.append('null')
        p2_gs_ratio.append('null')
        print(response.status_code)
        continue

    # if it does exist, parse the response as JSON and append each stat to its own list
    data = response.json()
    p2_wins_list.append(data['data']['segments'][0]['stats']['wins']['value'])
    # the rating is read from segments[2]; fall back to "null" when that segment is missing
    try:
        p2_mmr.append(data['data']['segments'][2]['stats']['rating']['value'])
    except IndexError:
        p2_mmr.append('null')
    p2_gs_ratio.append(round(data['data']['segments'][0]['stats']['goalShotRatio']['value'], 2))

    # progress counter (counts successful fetches only)
    print(num_loops)
    num_loops += 1
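
Since the p1 and p2 loops differ only in the list they read and the lists they fill, the shared logic could live in one function. The sketch below is one hypothetical way to do that; `collect_stats`, `fake_fetch`, and the `fetch` parameter are names introduced here, and `fetch` is injected so the loop can be exercised without network access. Against the real site it would wrap `requests.get(url, headers=headers)` and return the status code and parsed JSON.

```python
def collect_stats(urls, fetch):
    """Return one dict of stats per URL, so the result always matches the input length.

    fetch(url) must return a (status_code, payload_dict) pair.
    """
    missing = {'wins': 'null', 'mmr': 'null', 'gs_ratio': 'null'}
    rows = []
    for url in urls:
        # 'null' marks rows where no profile URL could be built
        if url == 'null':
            rows.append(dict(missing))
            continue
        status, payload = fetch(url)
        if status != 200:
            rows.append(dict(missing))
            continue
        segments = payload['data']['segments']
        overview = segments[0]['stats']
        row = {
            'wins': overview['wins']['value'],
            'gs_ratio': round(overview['goalShotRatio']['value'], 2),
        }
        # same fallback as the loops above when the third segment is absent
        try:
            row['mmr'] = segments[2]['stats']['rating']['value']
        except IndexError:
            row['mmr'] = 'null'
        rows.append(row)
    return rows


# Stand-in for the network call: one good profile, everything else is a 404
def fake_fetch(url):
    if url.endswith('/good'):
        stats = {'wins': {'value': 10}, 'goalShotRatio': {'value': 0.333}}
        return 200, {'data': {'segments': [{'stats': stats}]}}
    return 404, {}

rows = collect_stats(['x/good', 'null', 'x/missing'], fake_fetch)
print(rows)
```

Because every URL produces exactly one row, `pd.DataFrame(rows).add_prefix('p2_')` would give columns named like the ones assigned by hand in this notebook, with no risk of a length mismatch.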

Then take the lists and add them to the dataframe:

In [None]:
ballchasing['p2_wins'] = p2_wins_list
ballchasing['p2_mmr'] = p2_mmr
ballchasing['p2_gs_ratio'] = p2_gs_ratio
# won't work until the loop has run to completion, so that the list lengths match.