1. Write a approx 750-800 (3PAGE) word description of: 
    - EDA application (e.g., use-case, domain of operation).
    - Describe 3 interesting results from the EDA.
    - How you connected your work to lecture material 
    - Advanced techniques that you used
    - References for API or websites

If you are submitting a PDF, please mention that here. 

-----------------------------------------------------
This application will scrape the user reviews of a given Steam game. It will return a specified amount of reviews of a specified game. For each review, the application will grab the following information:
User name
User ID
User Country (If publically available to the steam api)
Date of the review
Rating of the review
Text content of the review

This information is then stored into a json file to be used later on for analyzing trends in user reviews.

THREE INTERESTING EXAMPLES

For most of the earlier lectures in the second term, I was already aware of web scraping techniques due to my prior experience in developing a tool for data gathering for Curve Games, however I did not know much in the way of presenting and vizualising data so the later weeks were quite helpful for me, especially the ones covering pandas databases and the matplot library.

An advanced technique I used was employing the use of Selenium. This was necessary for what I wanted to d as Steam's review page makes use of an infinite scroll set up. This means that in order to load more than ten reviews, I would need to have some form of interaction with the webpage. Since BeautifulSoup and requests wouldn't be enough for this, I decided to use Selenium and ChromeDriver.

Data is retrieved via a combination of webscraping and use of the Steam API. I scrape elements such as the user's display name and id. I can then use the id to retreive further data about the user through the Steam Api. I use this to access their country without the need to load their profile in ChromeDriver which would slow down the Application immensely. I also grab details aout the review via web scraping as this stuff would not be available through Steam's API.
https://steamcommunity.com/robots.txt
-----------------------------------------------------

------------
2. CODE
-----------

In [None]:
# PASTE THE CODE HERE
# Make this easy for user testing - Running this cell should load all code for EDA.

from bs4 import BeautifulSoup
from steam_web_api import Steam
from dotenv import load_dotenv
import os
import json
import requests
from dateutil import parser

def date_to_computer_readable(date):
    try:
        parsed_date = parser.parse(date)
    except:
        return date
    return parsed_date.strftime('%d-%m-%Y')

load_dotenv()
KEY = os.getenv('STEAM_API_KEY')
steam = Steam(KEY)

app_id =int(input('Enter the app id: '))
response = requests.get(f'https://steamcommunity.com/app/{app_id}/reviews/?browsefilter=mostrecent&snr=1_5_100010_&p=1')

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    review_boxes = soup.find_all('div', class_='apphub_Card')
    
    data = []

    for review_box in review_boxes:
        user = review_box.find('div', class_='apphub_CardContentAuthorName')
        user_ID = user.find('a').get('href').split('/')[-2]
        try:
            country = steam.users.get_user_details(user_ID)['player']['loccountrycode']
        except:
            country = 'Private'
        date = review_box.find('div', class_='date_posted').get_text()
        pos_rating = True if 'icon_thumbsUp' in review_box.find('div', class_='thumb').find('img').get('src') else False
        review = review_box.find('div', class_='apphub_CardTextContent').get_text().replace(date, '').replace('Product refunded', '').strip()
        date = date.replace('Posted: ', '')
        data.append({
            "user": user.get_text(),
            "user_ID": user_ID,
            "country": country,
            "date": date_to_computer_readable(date),
            "positive rating": pos_rating,
            "review": review
        })
    new_file = open(f'{app_id}.json', 'w')
    json.dump(data, new_file, indent=4)
    new_file.close()


https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1
https://community.fastly.steamstatic.com/public/shared/images/userreviews/icon_thumbsUp.png?v=1


In [5]:
#This is the version that employs the use of Selenium in order to scrape the steam reviews page while dealing with the 'infinite scroll' aspect.

from steam_web_api import Steam
from dotenv import load_dotenv
from selenium import webdriver
from selenium.webdriver.common.by import By
import os
import json
import time
import pandas as pd
from dateutil import parser

def date_to_computer_readable(date):
    try:
        parsed_date = parser.parse(date)
    except:
        return date
    return parsed_date.strftime('%d-%m-%Y')


load_dotenv()
KEY = os.getenv('STEAM_API_KEY')
steam = Steam(KEY)

app_id =int(input('Enter the app id: '))
max_reviews = int(input('Enter the maximum number of reviews to scrape: '))

driver = webdriver.Chrome()
driver.get(f'https://steamcommunity.com/app/{app_id}/reviews/?browsefilter=mostrecent&snr=1_5_100010_&p=1')

new_content = True
while new_content:
    try:
        driver.execute_script("javascript:CheckForMoreContent()")
        if driver.find_element(By.CLASS_NAME, 'apphub_NoMoreContentText2').is_displayed() or len(driver.find_elements(By.CLASS_NAME, 'apphub_Card')) >= max_reviews:
            new_content = False
    except:
        time.sleep(0.05)

review_boxes = driver.find_elements(By.CLASS_NAME, 'apphub_Card')
print(str(len(review_boxes)) + " reviews found!")

data = []

for review_box in review_boxes:
    user = review_box.find_element(By.CLASS_NAME, 'apphub_CardContentAuthorName')
    user_ID = user.find_element(By.TAG_NAME, 'a').get_dom_attribute('href').split('/')[-2]
    try:
        country = steam.users.get_user_details(user_ID)['player']['loccountrycode']
    except:
        country = 'Private'
    date = review_box.find_element(By.CLASS_NAME, 'date_posted').text
    print(review_box.find_element(By.CLASS_NAME, 'thumb').text)
    pos_rating = True if 'icon_thumbsUp' in review_box.find_element(By.CLASS_NAME, 'thumb').find_element(By.TAG_NAME, 'img').get_attribute('src') else False
    review = review_box.find_element(By.CLASS_NAME, 'apphub_CardTextContent').text.replace(date, '').replace('Product refunded', '').strip()
    date = date.replace('Posted: ', '')
    data.append({
        "user": user.text,
        "user_ID": user_ID,
        "country": country,
        "date": date_to_computer_readable(date),
        "positive rating": pos_rating,
        "review": review
    })
    new_file = open(f'{app_id}.json', 'w')
    json.dump(data, new_file, indent=4)
    new_file.close()





ValueError: invalid literal for int() with base 10: ''

In [None]:
df = pd.read_json(f'{app_id}.json')
# print negative entries from the UK
# print(df[(df['country'] == 'GB') & (df['positive rating'] == False)]['review'])

# print all entries from the UK that are not positive
print(df[(df['country'] == 'GB') & (df['positive rating'] == False)])


                 user            user_ID country       date  positive rating  \
58    PokemonElite499  76561199211558372      GB 2024-01-12             True   
69           giuseppe  76561198397165602      GB 2024-11-30             True   
75        Raccacoonie  76561198030236479      GB 2024-11-30             True   
134            Bernie  76561198282166680      GB 2024-11-29             True   
319           Dante71  76561198149206158      GB 2024-11-24             True   
...               ...                ...     ...        ...              ...   
4749             Gray  76561198857840295      GB 2023-07-25             True   
4762          xLolenx  76561199092445142      GB 2023-07-24             True   
4789    bidibidoodaba  76561199143963376      GB 2023-07-23             True   
4790  ChildOfTheSun32  76561197970232486      GB 2023-07-23             True   
4970            Katto  76561197976002174      GB 2023-07-13             True   

                                       

3. Reflect on project management (approx 250 words) 
    - Timeliness: Reflect on how consistently you made an effort meet deadlines
    - Organization: Reflect on how managed the code-base as the project size increased

-----------------------------------------------------

Here is my refglection etc...

-----------------------------------------------------


4. Process reflection (approx 200 words) - Discuss the week by week iterative development of your chatbot.
    - Describe each week of chatbot devleopment.
    - What was the feedback you received? How did you work on the feedback to improve EDA? What new features did you add? 

-----------------------------------------------------

Reflection goes here

-----------------------------------------------------
