<h1>Real-time NBA News Analyzer</h1>
<b>First, we gather real-time news about National Basketball Association (NBA) players with a web crawler that scrapes the current player news from the Fox Sports website. For this, we use the BeautifulSoup Python package. The collected data is then reshaped into a form suitable for applying NLP techniques.</b>

In [50]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://www.foxsports.com/nba/player-news"

webpage = requests.get(URL)

news = BeautifulSoup(webpage.content, 'html.parser')

print("Response status code from website: ", webpage.status_code)

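# attrs must be a dict ({"class": ...}); {"class", ...} is a set and only matches by accident
newsArticles = news.find("div", {"class": "player-news-list"})

# maps player name -> {"Description": ..., "Impact": ...}
players = {}

for article in newsArticles.find_all("div", {"class": "player-news-article league"}):
  player_news = {}
  # extracting the player news description ([6:] skips the first six characters of the block's text)
  player_news["Description"] = article.find("div", {"class": "player-news-article-description"}).text[6:]
  
  # extracting the player impact ([8:] skips the first eight characters of the block's text)
  player_news["Impact"] = article.find("div", {"class": "player-news-article-impact"}).text[8:]
  # the article title is the player's name and serves as the key
  players[article.find("div", {"class": "player-news-article-title"}).text] = player_news

# converting to a dataframe with one row per player (the CSV export happens later)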
df = pd.DataFrame(players)
df = df.transpose()
print(df)

Response status code from website:  200
                                                               Description  \
Tim Hardaway Jr.         Hardaway (illness) is off the injury report fo...   
Jimmy Butler             Butler is probable for Tuesday's contest again...   
Devin Booker             Booker (groin) has been ruled out for Monday's...   
Ivica Zubac              Zubac (knee) is slated to practice with the te...   
Norman Powell            Head coach Ty Lue said Powell (groin) practice...   
Paul George              George (knee) will partake in practice Tuesday...   
Collin Sexton            Sexton (hamstring) will miss Tuesday's game ve...   
Klay Thompson            Thompson (knee) is probable for Tuesday's matc...   
Kristaps Porzingis       Porzingis didn't participate in Tuesday's shoo...   
Kelly Olynyk             Olynyk (ankle) has been ruled out for Monday's...   
Kyle Lowry               Lowry (knee) is questionable for Tuesday's mat...   
Jayson Tatum            
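
The fixed slices `[6:]` and `[8:]` in the cell above presumably drop a leading label from each text block. The sketch below shows a less brittle way to do that, assuming (the page markup is not shown here, so this is a guess) that the labels are `News:` and `Impact:`. The helper name `strip_label` and the sample strings are hypothetical and only illustrate the idea.

```python
def strip_label(text, label):
    """Return text without a leading label; leave it unchanged if the label is absent."""
    cleaned = text.strip()
    if cleaned.startswith(label):
        return cleaned[len(label):].strip()
    return cleaned

# Made-up raw strings, shaped like what .text might return for the two blocks.
raw_description = "News: Booker (groin) has been ruled out."
raw_impact = "Impact: He will miss at least one game."

description = strip_label(raw_description, "News:")  # assumed label
impact = strip_label(raw_impact, "Impact:")          # assumed label
print(description)
print(impact)
```

Unlike a fixed slice, this leaves the text untouched if the site ever changes or drops the label, instead of silently cutting off the first characters of the news.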

<br><b>Next, we apply sentiment analysis to the collected news, which lets us separate the positive news from the negative news. For this purpose, we use the transformers package, which provides a pre-built sentiment analysis pipeline. The pipeline takes the string to be classified as input and returns a label together with a score. With the default model used here (distilbert-base-uncased-finetuned-sst-2-english), the label is either 'POSITIVE' or 'NEGATIVE', and the score is the model's confidence in that label rather than the magnitude of the sentiment.</b>

In [51]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

sentiment = []
sentiment_value = []

for player in players:
    # run the pipeline once per player and reuse the result for both columns
    result = sentiment_pipeline(players[player]["Impact"])[0]
    sentiment.append(result['label'])
    sentiment_value.append(result['score'])

df['Sentiment'] = sentiment
df['Sentiment Score'] = sentiment_value
# df.to_csv("nba_news.csv")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceCla
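
Because the label and the score are stored in two separate columns, it can be handy to fold them into a single signed number for sorting or plotting. The following is a minimal sketch of that idea, not part of the original analysis. The function name `signed_score` and the sample values are made up, and it assumes the two-label `POSITIVE`/`NEGATIVE` output of the default model.

```python
def signed_score(result):
    """Map a {'label', 'score'} result to a value in [-1, 1]."""
    label = result["label"].upper()
    if label == "POSITIVE":
        return result["score"]
    if label == "NEGATIVE":
        return -result["score"]
    # Any other label would come from a different model than the one assumed here.
    raise ValueError(f"unexpected label: {result['label']}")

# Hand-written results in the shape the pipeline returns; the numbers are made up.
sample_results = [
    {"label": "POSITIVE", "score": 0.75},
    {"label": "NEGATIVE", "score": 0.5},
]
signed = [signed_score(r) for r in sample_results]
print(signed)  # [0.75, -0.5]
```

Note that the magnitude still reflects model confidence, so a value near -1 means "confidently negative", not "very bad news".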

<br><b>Then, we perform keyword extraction, which pulls out the most informative keywords and so highlights the main ideas of the news. Keyword extraction is useful in many ways, for example when one wishes to paraphrase the news or create hashtags for posts about it.</b>
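
As an illustration of the hashtag use case, the sketch below turns a comma-separated keyword string into hashtags. It is only one possible convention (CamelCase, punctuation dropped, duplicates removed). The helper `to_hashtags` and its `limit` parameter are hypothetical, and the sample keywords are taken from the kind of output this notebook produces.

```python
import re

def to_hashtags(keyword_string, limit=5):
    """Turn 'injury report, Hardaway' into ['#InjuryReport', '#Hardaway']."""
    tags = []
    for keyword in keyword_string.split(","):
        # Keep only runs of letters and digits, then join the words in CamelCase.
        words = re.findall(r"[A-Za-z0-9]+", keyword)
        if not words:
            continue
        tag = "#" + "".join(w[0].upper() + w[1:] for w in words)
        if tag not in tags:
            tags.append(tag)
    # Return at most `limit` tags, in the original keyword order.
    return tags[:limit]

hashtags = to_hashtags("injury report, Hardaway, illness, Timberwolves")
print(hashtags)  # ['#InjuryReport', '#Hardaway', '#Illness', '#Timberwolves']
```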

In [61]:
# YAKE (Yet Another Keyword Extractor) - for keyword extraction from the news text
import yake

# keywords extracted from the description paragraph
player_keywords_desc = []

# n: maximum n-gram size of a keyword; top: number of keywords to return
# dedupLim: deduplication threshold; lower values allow less overlap between the returned keywords
kw_extractor = yake.KeywordExtractor(n = 3, dedupLim = 0.5, top = 10)

for player in players:
    doc = players[player]['Description']
    keywords = kw_extractor.extract_keywords(doc)
    # join the keyword texts (first element of each pair) into one comma-separated string
    player_keywords_desc.append(", ".join(kw[0] for kw in keywords))

print("Description Keywords player wise: ")
print(player_keywords_desc)
df['Keywords-Description'] = player_keywords_desc

print()
# keywords extracted from the impact paragraph
player_keywords_impact = []

# same extractor settings as for the description keywords
kw_extractor = yake.KeywordExtractor(n = 3, dedupLim = 0.5, top = 10)

for player in players:
    doc = players[player]['Impact']
    keywords = kw_extractor.extract_keywords(doc)
    # join the keyword texts (first element of each pair) into one comma-separated string
    player_keywords_impact.append(", ".join(kw[0] for kw in keywords))

print("Impact Keywords player wise: ")
print(player_keywords_impact)
df['Keywords-Impact'] = player_keywords_impact

df.to_csv("nba_news.csv")

Description Keywords player wise: 
['report for Wednesday, Wednesday game, injury report, Hardaway, illness, Timberwolves, injury, report, game', 'knee injury management, probable for Tuesday, Tuesday contest, Butler is probable, knee injury, Tuesday, Bulls, Butler, management, probable', 'Arizona Republic reports, Duane Rankin, Monday matchup, Booker, groin, Lakers, Duane, Monday, Rankin, Arizona', 'contact session Tuesday, Tomer Azarly, Zubac, knee, Azarly of ClutchPoints.com, ClutchPoints.com reports, Tuesday, Tomer, slated to practice, Azarly', 'five-game road trip, upcoming five-game road, Lue said Powell, Ohm Youngmisuk, practiced Tuesday, coach Ty Lue, Youngmisuk of ESPN.com, Head coach, road trip, ESPN.com reports', 'Los Angeles Times, Angeles Times reports, Andrew Greif, practice Tuesday, George, knee, Tuesday, Andrew, partake in practice, Greif', 'miss Tuesday game, Tuesday game versus, versus the Pistons, Sexton, hamstring, Pistons, Tuesday, miss, game, versus', 'Thompson, k