<h1>Real-time NBA News Analyzer</h1>
<b>First, we gather real-time news about National Basketball Association (NBA) players with a web crawler that scrapes the current player news from the Fox Sports website. For this, we use the requests and BeautifulSoup Python packages. After collecting the data, we reshape it into a form suitable for applying NLP techniques.</b>

In [31]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://www.foxsports.com/nba/player-news"

webpage = requests.get(URL)

news = BeautifulSoup(webpage.content, 'html.parser')

print("Response status code from website: ", webpage.status_code)

# the container that holds all player news cards
# (attrs is passed as a dict; the original {"class", ...} literal was a set)
newsArticles = news.find("div", {"class": "player-news-list"})

players = {}

for article in newsArticles.find_all("div", {"class": "player-news-article league"}):
  player_news = {}
  # extracting the player news description (the slice drops the first 6 characters)
  player_news["Description"] = article.find("div", {"class": "player-news-article-description"}).text[6:]
  
  # extracting the player impact (the slice drops the first 8 characters)
  player_news["Impact"] = article.find("div", {"class": "player-news-article-impact"}).text[8:]

  # extracting the URL of the player's profile picture (PFP)
  player_news["PFP"] = article.find("a", {"class": "player-news-article-headshot flex-circle"}).img['src']

  # the article title is the player's name; use it as the dictionary key
  players[article.find("div", {"class": "player-news-article-title"}).text] = player_news

# converting to a dataframe with one row per player
df = pd.DataFrame(players)
df = df.transpose()
print(df)

Response status code from website:  200
                                                             Description  \
Tyler Herro            Herro (knee) said Thursday that he is good to ...   
Russell Westbrook      Westbrook (recently signed) is absent from the...   
Fred VanVleet          VanVleet (personal) has been ruled out of Thur...   
D'Angelo Russell       Russell (ankle) won't return to Thursday's con...   
LeBron James           James (foot) will play Thursday against the Wa...   
Anthony Davis          Davis (foot) will play in Thursday's matchup w...   
Andrew Wiggins         Wiggins (personal), who is listed as out for T...   
Tre Jones              Jones (foot)  will not play in Thursday's game...   
Ivica Zubac            Zubac will miss Friday's game against the King...   
Joel Embiid            Embiid is dealing with a non-COVID illness, bu...   
Kyle Lowry             Head coach Erik Spoelstra indicated Thursday t...   
Robert Williams III    Williams will start Thurs
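
<br><b>The slices text[6:] and text[8:] in the cell above drop a fixed number of leading characters, presumably a label in front of each field. Below is a minimal sketch of a more defensive variant that removes a label only when it is actually present. The labels "News: " and "Impact: " are assumptions (they happen to be 6 and 8 characters long, matching the slices), and the sample strings are made up; the real page text may differ.</b>

```python
def strip_label(text, label):
    """Remove `label` from the start of `text` only if it is actually there."""
    text = text.strip()
    if text.startswith(label):
        return text[len(label):]
    return text


# Hypothetical labels: the real page text may use different ones.
NEWS_LABEL = "News: "
IMPACT_LABEL = "Impact: "

# Made-up sample strings, not taken from the crawled page.
sample_description = "News: Player A (knee) will play on Friday."
sample_impact = "Impact: Player A should be available."

print(strip_label(sample_description, NEWS_LABEL))
print(strip_label(sample_impact, IMPACT_LABEL))
# Text without the label is returned unchanged instead of being cut short.
print(strip_label("Player A will start.", NEWS_LABEL))
```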

<br><b>Next, we apply sentiment analysis to the collected news, which lets us separate the positive news from the negative news. For this purpose, we use the transformers package, which provides a pre-built sentiment analysis pipeline. The pipeline takes the string to be classified as input and returns a label together with a score between 0 and 1 that indicates the model's confidence in that label. The default model used here (see the log below) is fine-tuned on SST-2 and predicts only 'POSITIVE' or 'NEGATIVE'; it has no 'Neutral' class.</b>

In [39]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

sentiment = []
sentiment_value = []

for player in players:
    # run the classifier once per text and reuse the result
    impact = sentiment_pipeline(players[player]["Impact"])[0]
    description = sentiment_pipeline(players[player]["Description"])[0]
    # keep the label and score of the more confident prediction
    # (on a tie the description prediction is used)
    best = impact if impact['score'] > description['score'] else description
    sentiment.append(best['label'])
    sentiment_value.append(best['score'])


df['Sentiment'] = sentiment
df['Sentiment Score'] = sentiment_value

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceCla
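
<br><b>Once every player has a label, separating the positive news from the negative news is a simple grouping step. The sketch below assumes the labels are the strings 'POSITIVE' and 'NEGATIVE' produced by the default model; the function name and the sample data are hypothetical.</b>

```python
def split_by_sentiment(labels_by_player):
    """Group player names into (positive, negative) lists by their label."""
    positive, negative = [], []
    for player, label in labels_by_player.items():
        if label.upper() == "POSITIVE":
            positive.append(player)
        else:
            negative.append(player)
    return positive, negative


# Made-up labels standing in for the 'Sentiment' column.
sample_labels = {
    "Player A": "POSITIVE",
    "Player B": "NEGATIVE",
    "Player C": "POSITIVE",
}

positive_players, negative_players = split_by_sentiment(sample_labels)
print("Positive:", positive_players)
print("Negative:", negative_players)
```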

<br><b>Then we perform keyword extraction, which pulls the most informative words and phrases out of each news item and so highlights its main ideas. The extracted keywords are useful, for example, when paraphrasing the news or creating hashtags for posts about it.</b>

In [33]:
# YAKE (Yet Another Keyword Extractor) - for keyword extraction from the description and impact of the news
import yake

# keywords extracted from the description paragraph
player_keywords_desc = []

# dedupLim is the deduplication threshold that controls how much word repetition is allowed among the keywords
kw_extractor = yake.KeywordExtractor(n = 3, dedupLim = 0.5, top = 10)

for player in players:
    doc = players[player]['Description']
    keywords = kw_extractor.extract_keywords(doc)
    # each result is a (keyword, score) tuple; keep only the keyword text
    player_keywords_desc.append(", ".join(kw[0] for kw in keywords))

print("Description Keywords player wise: ")
print(player_keywords_desc)
df['Keywords-Description'] = player_keywords_desc

print()

# keywords extracted from the impact paragraph
player_keywords_impact = []

# dedupLim is the deduplication threshold that controls how much word repetition is allowed among the keywords
kw_extractor = yake.KeywordExtractor(n = 3, dedupLim = 0.5, top = 10)

for player in players:
    doc = players[player]['Impact']
    keywords = kw_extractor.extract_keywords(doc)
    # each result is a (keyword, score) tuple; keep only the keyword text
    player_keywords_impact.append(", ".join(kw[0] for kw in keywords))

print("Impact Keywords player wise: ")
print(player_keywords_impact)
df['Keywords-Impact'] = player_keywords_impact

Description Keywords player wise: 
['South Florida Sun, Florida Sun Sentinel, Sun Sentinel reports, game versus Milwaukee, Ira Winderman, Friday game versus, Herro, knee, Milwaukee, Ira', 'Clippers debut Friday, Tomer Azarly, recently signed, make his Clippers, Azarly of ClutchPoints.com, slated to make, injury report, ClutchPoints.com reports, Westbrook, Kings', 'Kayla Grey, Grey of TSN, TSN reports, Thursday contest, VanVleet, personal, Pelicans, Kayla, Thursday, Grey', 'NBC Sports Bay, Sports Bay Area, Bay Area reports, Dalton Johnson, Johnson of NBC, Russell, ankle, return to Thursday, Thursday contest, Warriors', 'Marc J. Spears, James, foot, play Thursday, Spears of ESPN.com, ESPN.com reports, Warriors, Marc, Spears, reports', 'Marc J. Spears, Davis, foot, play in Thursday, Thursday matchup, Spears of ESPN.com, ESPN.com reports, Warriors, Marc, Thursday', 'Anthony Slater, Athletic reports, Thursday game, family matter, timeline to return, Wiggins, personal, Lakers, Anthony, Warri
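
<br><b>As one illustration of the hashtag use case, the comma-separated keyword strings can be turned into hashtags with a little string processing. This is only a sketch: the CamelCase convention and both function names are assumptions, and the sample string reuses a few keywords from the output above.</b>

```python
import re


def to_hashtag(keyword):
    """Turn a keyword phrase into a CamelCase hashtag, dropping punctuation."""
    words = re.findall(r"[A-Za-z0-9]+", keyword)
    if not words:
        return ""
    return "#" + "".join(word[0].upper() + word[1:] for word in words)


def keywords_to_hashtags(keywords_string):
    """Convert a ', '-separated keyword string into a list of unique hashtags."""
    hashtags = []
    for keyword in keywords_string.split(","):
        tag = to_hashtag(keyword.strip())
        # skip empty results and duplicates while keeping the original order
        if tag and tag not in hashtags:
            hashtags.append(tag)
    return hashtags


sample_keywords = "Clippers debut Friday, recently signed, Westbrook, Kings"
print(keywords_to_hashtags(sample_keywords))
```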

<br><b>Next, we generate a short summary of each player's impact text.</b>

In [34]:
# creating the summarization pipeline
summarizer = pipeline("summarization")

summaries = []

for player in players:
    # max_length caps the length of the generated summary (in tokens)
    summaries.append(summarizer(players[player]['Impact'], max_length=30)[0]['summary_text'])

df['Summary'] = summaries

No model was supplied, defaulted to t5-small and revision d769bba (https://huggingface.co/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


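<br><b>Next, we apply question answering: for each player we ask the question "Is he injured?" with the news description as context. The default model is extractive, so the answer is a span of text copied from the description rather than a yes/no verdict.</b>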

In [35]:
from transformers import pipeline
question_answerer = pipeline('question-answering')

status = []

for player in players:
    # pass the question and context as keyword arguments and keep only the answer text
    status.append(question_answerer(question='Is he injured?', context=players[player]['Description'])['answer'])

df['Status'] = status

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some layers from the model checkpoint at distilbert-base-cased-distilled-squad were not used when initializing TFDistilBertForQuestionAnswering: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForQuestionAnswering were not initialized from the model ch
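
<br><b>Besides 'answer', each question-answering result also carries a 'score' along with the 'start' and 'end' offsets of the answer in the context. A possible refinement, sketched below, is to keep the extracted answer only when the score clears a threshold. The helper name, the threshold of 0.3, the fallback text and the sample results are all hypothetical.</b>

```python
def answer_or_fallback(result, min_score=0.3, fallback="unknown"):
    """Keep the extracted answer only when the model is reasonably confident."""
    answer = result["answer"].strip()
    if result["score"] >= min_score and answer:
        return answer
    return fallback


# Made-up results in the shape returned by the question-answering pipeline.
confident_result = {"score": 0.82, "start": 7, "end": 11, "answer": "knee"}
unsure_result = {"score": 0.04, "start": 0, "end": 5, "answer": "Herro"}

print(answer_or_fallback(confident_result))
print(answer_or_fallback(unsure_result))
```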

<br><b>Finally, we export all the results as a CSV file for easy access and readability. The CSV format also allows the data to be used in other models and applications.</b>

In [37]:
# replace commas with hyphens in the text columns, then export everything to a csv file
text_columns = ["Description", "Impact", "Keywords-Description", "Keywords-Impact", "Status", "Summary"]
for column in text_columns:
    df[column] = df[column].str.replace(',', '-')
df.to_csv("nba_news.csv")
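
<br><b>Replacing the commas is a choice rather than a requirement of the format: CSV writers, including pandas to_csv with its default settings, wrap any field that contains a comma in double quotes, and a standard CSV reader restores it unchanged. The sketch below shows this round trip with Python's built-in csv module on a made-up row; the replacement is still useful if the file will be read by a tool that splits on commas without handling quotes.</b>

```python
import csv
import io

rows = [
    ["Player", "Keywords"],
    ["Player A", "knee, Milwaukee, Friday"],  # made-up field containing commas
]

# Write the rows to an in-memory CSV; the comma-containing field gets quoted.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original field, commas included.
read_back = list(csv.reader(io.StringIO(csv_text)))
print(read_back[1][1])
```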

In [38]:
import webbrowser

# raw strings keep the backslashes in the Windows paths from being read as escape sequences
webbrowser.get(r'"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s').open(r"C:\Users\yashs\Documents\D-Drive\Sem 6\NLP\display-output.html")

True