# Project Overview

### Web Scraper:
 - We created two parallel lists for the companies we are focusing on: one with their stock tickers and one with their full names. A loop then builds each search URL from the ticker (Yahoo Finance) and the company name (Forbes). To keep the two lists in step, we use a counter 'x', which is set to '0' before the loop and incremented with 'x += 1' at the end of each iteration.
 - While looping through the lists, the scraper collects the relevant stock information from Yahoo Finance and the first article headline and description from Forbes.
 - During each iteration, the stock information is stored in a dictionary named 'stockDict', which is merged with the dictionary holding the headline, article description, and sentiment analysis at the end of the iteration.
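
As an aside, the ticker/name pairing described above can also be sketched without a manual counter. This is only an illustration, not the code the notebook runs: `build_urls`, `stock_list`, and `gc_list` are hypothetical names, the lists are shortened, and URL-encoding the company name with `quote_plus` is an added assumption (the notebook passes the name through unencoded).

```python
from urllib.parse import quote_plus

# Shortened copies of the two parallel lists (illustrative only)
stock_list = ['NTDOY', 'EA', 'UBSFY']
gc_list = ['Nintendo', 'Electronic Arts', 'Ubisoft']


def build_urls(ticker, company):
    """Return the (Yahoo Finance, Forbes) search URLs for one company."""
    yahoo_url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    # quote_plus turns spaces into '+', so multi-word names are safe in a query string
    forbes_url = f'https://www.forbes.com/search/?q={quote_plus(company)}'
    return yahoo_url, forbes_url


# zip() walks both lists in step, so no separate index variable is needed
urls = {ticker: build_urls(ticker, company)
        for ticker, company in zip(stock_list, gc_list)}

for ticker, (yahoo_url, forbes_url) in urls.items():
    print(ticker, yahoo_url, forbes_url)
```

Pairing with `zip()` stops at the shorter list, so a missing company name would silently drop a ticker rather than raise an `IndexError`; whether that is preferable depends on how the lists are maintained.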

### Sentiment Analysis:
 - Using the NLTK VADER SentimentIntensityAnalyzer, we run a sentiment analysis on both the scraped headline and the article description.
 - To keep things clean and easy to analyze, we store the full set of scores in separate dictionaries and copy only the 'compound' score from each into the main 'gcDict' dictionary.
 - After running the sentiment analysis, a simple conditional expression converts each compound score into a readable label: 'positive' above 0.3, 'negative' below -0.3, and 'neutral' otherwise.
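
The labeling step can be sketched as a small standalone function. The name `label_sentiment` and its `threshold` parameter are hypothetical; the 0.3 cutoff matches the conditional used in the notebook. The `example_scores` dictionary holds made-up values in the shape that `polarity_scores` returns, so the sketch runs without NLTK installed.

```python
def label_sentiment(compound, threshold=0.3):
    """Map a VADER compound score (-1.0 to 1.0) to a readable label."""
    if compound > threshold:
        return 'positive'
    if compound < -threshold:
        return 'negative'
    return 'neutral'


# Made-up scores, shaped like the dict returned by polarity_scores()
example_scores = {'neg': 0.0, 'neu': 0.6, 'pos': 0.4, 'compound': 0.55}

result = {
    'headline_sentiment_score': example_scores['compound'],
    'headline_sentiment': label_sentiment(example_scores['compound']),
}
print(result)
```

Scores exactly at 0.3 or -0.3 fall into 'neutral', because both comparisons are strict.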
 
### MongoDB:
 - During each iteration of the loop, we store the relevant information in its respective dictionary.
 - At the end of each iteration, we merge the two dictionaries and insert the merged dictionary into the 'gameCollection' collection in a database named 'gameDB'.
 - At the start of each iteration, the dictionaries are re-created empty so that no data carries over from the previous company.
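
A minimal sketch of the merge step, assuming placeholder values rather than real scraped data. `build_record` is a hypothetical helper; the notebook itself merges in place with `stockDict.update(gcDict)`. The database call is left as a comment so the sketch runs without a MongoDB server.

```python
def build_record(stock_info, article_info):
    """Merge the two per-company dicts into a brand-new dict."""
    # Unpacking into a new dict leaves both inputs untouched
    return {**stock_info, **article_info}


# Placeholder values, not real scraped data
stock_info = {'company': 'Nintendo', 'stock_ticker': 'NTDOY'}
article_info = {'headline': 'Example headline', 'headline_sentiment': 'neutral'}

record = build_record(stock_info, article_info)
print(record)

# With a live connection, the record would then be stored like this:
# db.gameCollection.insert_one(record)
```

PyMongo's `insert_one` adds an `_id` key to the dictionary it is given, so starting each iteration with fresh dictionaries also avoids re-inserting a document that already carries the previous `_id`.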

In [None]:
# IMPORT LIBRARIES
import pymongo
from splinter import Browser
from bs4 import BeautifulSoup as bs
import time
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

In [None]:
# BROWSER MAGIC
def init_browser():
    executable_path = {"executable_path": "chromedriver.exe"}
    return Browser("chrome", **executable_path, headless=False)

In [None]:
# CONNECT TO MONGODB AND ACCESS THE 'gameDB' DATABASE
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)
db = client.gameDB

In [None]:
# LIST OF STOCK TICKERS FOR EACH GAME COMPANY
stockList = ['NTDOY',
             'EA',
             'UBSFY',
             'ATVI',
             'SGAMY',
             'TTWO',
             'SQNXF',
             'NCBDF']

# CORRESPONDING LIST OF GAME COMPANIES
gcList = ['Nintendo',
          'Electronic Arts', 
          'Ubisoft',
          'Activision Blizzard',
          'Sega Sammy Holdings',
          'Take-Two Interactive',
          'Square Enix',
          'Bandai Namco']

x = 0

#  ------------------------------------------------------
#                      STOCK SCRAPER
#  ------------------------------------------------------

for y in stockList:
    # SLEEP TIMER
    time.sleep(2)
    
    # INITIALIZE BROWSER
    browser = init_browser()

    # CREATE EMPTY DICTIONARY TO STORE STOCK INFORMATION
    stockDict = {}  
    
    # URL CONFIG FOR YAHOO FINANCE
    url = f'https://finance.yahoo.com/quote/{y}?p={y}&.tsrc=fin-srch'
    browser.visit(url)

    # SLEEP TIMER - ALLOW THE HTML TO LOAD/POPULATE 
    time.sleep(3)

    # BROWSER/SOUP INFO
    html = browser.html
    soup = bs(html, "html.parser")
    
    # SCRAPE THE STOCK INFORMATION
    stockPrice = soup.find('span', class_='Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)').text.strip()
    marketCap = soup.find('td', {'data-test':"MARKET_CAP-value"}).span.text
    stockPrevious = soup.find('td', {'data-test':"PREV_CLOSE-value"}).span.text
    stockOpen = soup.find('td', {'data-test':"OPEN-value"}).span.text
    stockVolume = soup.find('td', {'data-test':"TD_VOLUME-value"}).span.text
    avg3moVolume = soup.find('td', {'data-test':"AVERAGE_VOLUME_3MONTH-value"}).span.text
    
    # UPDATE STOCK DICTIONARY WITH COMPANY NAME AND STOCK INFO
    stockDict.update( {'company' : gcList[x]} )
    stockDict.update( {'stock_ticker' : y} )
    stockDict.update( {'current_stock_price' : stockPrice} )
    stockDict.update( {'previous_close' : stockPrevious} )
    stockDict.update( {'stock_open' : stockOpen} )
    stockDict.update( {'stock_volume' : stockVolume} )
    stockDict.update( {'avg_3month_volume' : avg3moVolume} )
    stockDict.update( {'market_cap' : marketCap} )
    
    # CLOSE THE BROWSER WHEN COMPLETE
    browser.quit()
    
#  ------------------------------------------------------
#     HEADLINE/ARTICLE SCRAPER AND SENTIMENT ANALYSIS
#  ------------------------------------------------------
    
    # SLEEP TIMER
    time.sleep(2)
    
    # INITIALIZE BROWSER
    browser = init_browser()

    # CREATE EMPTY DICTIONARIES TO STORE INFORMATION
    gcDict = {}
    headlineAnalysis = {}
    bodyAnalysis = {}
    
    # URL CONFIG FOR FORBES
    url = f'https://www.forbes.com/search/?q={gcList[x]}'
    browser.visit(url)

    # SLEEP TIMER - ALLOW THE HTML TO LOAD/POPULATE
    time.sleep(3)

    # BROWSER/SOUP INFO
    html = browser.html
    soup = bs(html, "html.parser")

    # SCRAPE THE FIRST HEADLINE
    getHeadline = soup.find('a', class_='stream-item__title').text.strip()

    # SCRAPE THE CORRESPONDING ARTICLE DESCRIPTION
    getDesc = soup.find('div', class_='stream-item__description').text.strip()

    # SENTIMENT ANALYSIS
    sia = SIA() 
    headline_score = sia.polarity_scores(getHeadline)
    body_score = sia.polarity_scores(getDesc)
    gcDict.update({'headline' : getHeadline})
    headlineAnalysis.update(headline_score)
    gcDict['headline_sentiment_score'] = headlineAnalysis['compound']
    gcDict.update( {"headline_sentiment": 
        "positive" if headlineAnalysis['compound'] > 0.3 else 
        "negative" if headlineAnalysis['compound'] < -0.3 else 
        "neutral"
    })
    gcDict.update( {'article_description' : getDesc} )
    bodyAnalysis.update(body_score)
    gcDict['article_sentiment_score'] = bodyAnalysis['compound']
    gcDict.update( {"article_sentiment": 
        "positive" if bodyAnalysis['compound'] > 0.3 else 
        "negative" if bodyAnalysis['compound'] < -0.3 else 
        "neutral"
    })
        
    # CLOSE THE BROWSER WHEN COMPLETE
    browser.quit()
    
    # MERGE THE ARTICLE/SENTIMENT DICTIONARY INTO THE STOCK DICTIONARY
    stockDict.update(gcDict)
    
    # PUSH THE DICTIONARY TO THE DATABASE
    db.gameCollection.insert_one(stockDict)
    
    x += 1
    