# Project Overview

### Web Scraper:
 - I created a list of the companies we are focusing on, along with a parallel list of their stock tickers, and wrote a loop that appends each company name to the Forbes search URL.
 - On each pass through the loop, the scraper pulls the company's current stock price from Yahoo Finance, then scrapes the first article headline and its description from the Forbes search results and stores them in variables.
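A minimal sketch of the URL-building step, assuming the same two URL patterns the notebook code uses. It pairs each ticker with its company name using `zip` rather than a shared index counter, and URL-encodes the company name with `urllib.parse.quote_plus` so names containing spaces (e.g. 'Electronic Arts') form a clean query string. `build_search_urls` is a hypothetical helper, not part of the original code.

```python
from urllib.parse import quote_plus

# Hypothetical helper: pairs each ticker with its company name and builds both URLs.
def build_search_urls(tickers, companies):
    """Return (company, yahoo_url, forbes_url) tuples, one per ticker/company pair."""
    if len(tickers) != len(companies):
        # zip() silently stops at the shorter list, so fail loudly on a mismatch.
        raise ValueError("tickers and companies must be the same length")
    urls = []
    for ticker, company in zip(tickers, companies):
        yahoo_url = f"https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch"
        # quote_plus turns spaces into '+' and percent-escapes reserved characters.
        forbes_url = f"https://www.forbes.com/search/?q={quote_plus(company)}"
        urls.append((company, yahoo_url, forbes_url))
    return urls

for company, yahoo_url, forbes_url in build_search_urls(
        ["EA", "TTWO"], ["Electronic Arts", "Take-Two Interactive"]):
    print(company, yahoo_url, forbes_url)
```

The length check is the main design choice: with two hand-maintained parallel lists, a missing entry would otherwise shift or drop companies without any error.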

### Sentiment Analysis:
 - Using NLTK's VADER `SentimentIntensityAnalyzer`, I ran a sentiment analysis on both the headline and the article description.
 - To keep things clean and easy to analyze, I stored the full set of sentiment scores for each in a separate dictionary, and then copied only the 'compound' score for each into the main 'gcDict' dictionary.
 - I then used a simple conditional to convert each compound score (which VADER reports on a scale from -1 to 1) into a readable label: 'positive' if it is above 0.3, 'negative' if it is below -0.3, and 'neutral' otherwise.
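The labeling rule above can be factored into one function and reused for both the headline and the description. This is a sketch that hard-codes the ±0.3 cutoffs from the notebook; `label_sentiment` and the threshold constant names are hypothetical.

```python
# Cutoffs taken from the notebook's conditional: > 0.3 positive, < -0.3 negative.
POSITIVE_THRESHOLD = 0.3
NEGATIVE_THRESHOLD = -0.3

def label_sentiment(compound):
    """Map a VADER compound score (range -1 to 1) to a readable label."""
    if compound > POSITIVE_THRESHOLD:
        return "positive"
    if compound < NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"  # everything from -0.3 to 0.3 inclusive

print([label_sentiment(s) for s in (0.8, 0.3, 0.0, -0.3, -0.5)])
# → ['positive', 'neutral', 'neutral', 'neutral', 'negative']
```

In the notebook this would be called as `label_sentiment(headlineAnalysis['compound'])`. For comparison, the VADER project's README suggests ±0.05 as cutoffs, so ±0.3 is a stricter choice that labels more headlines as neutral.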
 
### MongoDB:
 - During each iteration of the loop, I stored the stock information in a 'stockDict' dictionary and the headline, description, and sentiment results in the 'gcDict' dictionary.
 - At the end of each iteration, I merged 'gcDict' into 'stockDict' and inserted the combined document into the 'gameCollection' collection in a database named 'gameDB'.
 - At the start of each iteration, both dictionaries are re-created empty, so no data carries over from the previous company and the code stays clean.
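A minimal sketch of assembling one MongoDB document per company, assuming the same field names the notebook uses. Building a new dict on every iteration matters with pymongo in particular: `insert_one` adds an `_id` field to the dict you pass it when one is missing, so re-inserting the same dict object would reuse that `_id` and fail with a duplicate-key error. `build_document` is a hypothetical helper, and the actual `db.gameCollection.insert_one(...)` call is left out so the example runs without a database.

```python
def build_document(company, ticker, price, article_fields):
    """Return a brand-new dict combining stock info with article/sentiment fields."""
    # A fresh dict per call, so no keys (including a pymongo-added '_id') leak between companies.
    document = {"company": company, "stock_ticker": ticker, "stock_price": price}
    # Article fields are merged last, mirroring stockDict.update(gcDict) in the notebook.
    document.update(article_fields)
    return document

# Placeholder values for illustration only, not real scraped data.
article = {"headline": "Example headline",
           "headline_sentiment_score": 0.0,
           "headline_sentiment": "neutral"}
doc_a = build_document("Nintendo", "NTDOY", "12.34", article)
doc_b = build_document("Square Enix", "SQNXF", "56.78", article)
print(doc_a is doc_b)  # False: each company gets its own dict
```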

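```python
# IMPORTS
import pymongo
import datetime
from splinter import Browser
from bs4 import BeautifulSoup as bs
import time
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

# VADER NEEDS ITS LEXICON; THIS DOWNLOADS IT ON THE FIRST RUN AND SKIPS IT IF ALREADY UP TO DATE
nltk.download('vader_lexicon', quiet=True)
```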

```python
# LAUNCH A VISIBLE CHROME WINDOW THROUGH SPLINTER
# (ASSUMES chromedriver.exe IS IN THE WORKING DIRECTORY)
def init_browser():
    executable_path = {"executable_path": "chromedriver.exe"}
    return Browser("chrome", **executable_path, headless=False)
```

```python
# CONNECT TO MONGODB AND ACCESS THE 'gameDB' DATABASE
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)
db = client.gameDB
```

```python
# LIST OF STOCK TICKERS FOR EACH COMPANY
stockList = ['NTDOY', 'EA', 'UBSFY', 'ATVI', 'SGAMY', 'TTWO', 'SQNXF', 'NCBDF']

# LIST OF GAME COMPANIES (SAME ORDER AS stockList)
gcList = ['Nintendo',
          'Electronic Arts',
          'Ubisoft',
          'Activision Blizzard',
          'Sega Sammy Holdings',
          'Take-Two Interactive',
          'Square Enix',
          'Bandai Namco']

# CREATE THE SENTIMENT ANALYZER ONCE AND REUSE IT FOR EVERY COMPANY
sia = SIA()

# PAIR EACH TICKER WITH ITS COMPANY NAME (REPLACES THE MANUAL INDEX COUNTER)
for ticker, company in zip(stockList, gcList):

    #  ------------------------------------------------------
    #                      STOCK SCRAPER
    #  ------------------------------------------------------

    # SLEEP TIMER
    time.sleep(1)

    # INITIALIZE BROWSER
    browser = init_browser()

    # CREATE EMPTY DICTIONARY TO STORE STOCK INFORMATION
    stockDict = {}

    # URL CONFIG FOR YAHOO FINANCE
    url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    browser.visit(url)

    # WAIT FOR THE PAGE TO LOAD
    time.sleep(2)

    # BROWSER/SOUP INFO
    html = browser.html
    soup = bs(html, "html.parser")

    # SCRAPE THE STOCK PRICE
    # (THE CLASS NAME IS SPECIFIC TO YAHOO'S MARKUP; IF IT CHANGES, find() RETURNS None
    #  AND .text RAISES AttributeError)
    stockPrice = soup.find('span', class_='Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)').text.strip()

    # UPDATE STOCK DICTIONARY WITH COMPANY NAME, STOCK TICKER, AND STOCK PRICE
    stockDict.update({'company': company, 'stock_ticker': ticker})
    stockDict.update({'stock_price': stockPrice})

    # CLOSE THE BROWSER WHEN COMPLETE
    browser.quit()

    #  ------------------------------------------------------
    #     HEADLINE/ARTICLE SCRAPER AND SENTIMENT ANALYSIS
    #  ------------------------------------------------------

    # SLEEP TIMER
    time.sleep(1)

    # INITIALIZE BROWSER
    browser = init_browser()

    # CREATE EMPTY DICTIONARIES TO STORE INFORMATION
    gcDict = {}
    headlineAnalysis = {}
    bodyAnalysis = {}

    # URL CONFIG FOR FORBES
    url = f'https://www.forbes.com/search/?q={company}'
    browser.visit(url)

    # WAIT FOR THE PAGE TO LOAD
    time.sleep(2)

    # BROWSER/SOUP INFO
    html = browser.html
    soup = bs(html, "html.parser")

    # SCRAPE THE FIRST HEADLINE
    getHeadline = soup.find('a', class_='stream-item__title').text.strip()

    # SCRAPE THE ARTICLE DESCRIPTION
    getDesc = soup.find('div', class_='stream-item__description').text.strip()

    # SENTIMENT ANALYSIS ON THE HEADLINE (KEEP ONLY THE COMPOUND SCORE IN gcDict)
    headline_score = sia.polarity_scores(getHeadline)
    headlineAnalysis.update(headline_score)
    gcDict.update({'headline': getHeadline})
    gcDict['headline_sentiment_score'] = headlineAnalysis['compound']
    gcDict['headline_sentiment'] = (
        "positive" if headlineAnalysis['compound'] > 0.3 else
        "negative" if headlineAnalysis['compound'] < -0.3 else
        "neutral"
    )

    # SENTIMENT ANALYSIS ON THE ARTICLE DESCRIPTION
    body_score = sia.polarity_scores(getDesc)
    bodyAnalysis.update(body_score)
    gcDict.update({'article_description': getDesc})
    gcDict['article_sentiment_score'] = bodyAnalysis['compound']
    gcDict['article_sentiment'] = (
        "positive" if bodyAnalysis['compound'] > 0.3 else
        "negative" if bodyAnalysis['compound'] < -0.3 else
        "neutral"
    )

    # CLOSE THE BROWSER WHEN COMPLETE
    browser.quit()

    # MERGE THE ARTICLE/SENTIMENT DICTIONARY INTO THE STOCK DICTIONARY
    stockDict.update(gcDict)

    # INSERT THE COMBINED DOCUMENT INTO THE 'gameCollection' COLLECTION
    db.gameCollection.insert_one(stockDict)
```