<h1>Real-time NBA News Analyzer</h1>
<b>First, we gather real-time news about National Basketball Association (NBA) players with a web crawler that scrapes the current player news from the Fox Sports website. For this, we use the requests and BeautifulSoup Python packages. The collected data is then reshaped into a form suitable for applying NLP techniques.</b>

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://www.foxsports.com/nba/player-news"

webpage = requests.get(URL)

news = BeautifulSoup(webpage.content, 'html.parser')

print("Response status code from website: ", webpage.status_code)

newsArticles = news.find("div", {"class": "player-news-list"})

players = {}

for article in newsArticles.find_all("div", {"class": "player-news-article league"}):
  player_news = {}
  # extracting the player news description (dropping the first 6 characters of the text)
  player_news["Description"] = article.find("div", {"class": "player-news-article-description"}).text[6:]

  # extracting the player impact (dropping the first 8 characters of the text)
  player_news["Impact"] = article.find("div", {"class": "player-news-article-impact"}).text[8:]

  # extracting the URL of the player's profile picture (PFP)
  player_news["PFP"] = article.find("a", {"class": "player-news-article-headshot flex-circle"}).img['src']

  # keying each record by the player's name
  players[article.find("div", {"class": "player-news-article-title"}).text] = player_news

# converting to a dataframe with one row per player
df = pd.DataFrame(players)
df = df.transpose()
print(df)

Response status code from website:  200
                                                               Description  \
Klay Thompson            Thompson (knee) is available for Tuesday's gam...   
Nikola Jokic             Jokic (knee) is in the starting lineup for Tue...   
Jamal Murray             Murray (knee) won't play in Tuesday's game aga...   
Domantas Sabonis         Sabonis recorded 28 points (12-19 FG, 2-2 3Pt,...   
Ivica Zubac              Zubac (knee) is slated to practice with the te...   
Norman Powell            Head coach Ty Lue said Powell (groin) practice...   
Paul George              George (knee) will take part in practice Tuesd...   
Michael Porter Jr.       Porter (heel) won't play in Tuesday's game ver...   
Russell Westbrook        Westbrook has been ruled out for Monday's game...   
Terry Rozier             Rozier (hip) is out for Monday's contest again...   
LeBron James             James (ankle) is probable for Wednesday agains...   
Jerami Grant            
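<br><b>The scraper above assumes that every article contains all of the expected elements; if one is missing, find returns None and the cell fails. The following is a minimal sketch of a more defensive parser, not the notebook's own method. The sample HTML, the helper names text_or_none and parse_articles, and the "News:" and "Impact:" label prefixes are assumptions made for illustration; the real page may be structured differently.</b>

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the class names from the scraper above.
SAMPLE_HTML = """
<div class="player-news-list">
  <div class="player-news-article league">
    <div class="player-news-article-title">Player A</div>
    <div class="player-news-article-description">News: Player A is available.</div>
    <div class="player-news-article-impact">Impact: Player A should start.</div>
  </div>
  <div class="player-news-article league">
    <div class="player-news-article-title">Player B</div>
    <div class="player-news-article-description">News: Player B is out.</div>
  </div>
</div>
"""


def text_or_none(parent, tag, css_class, prefix=""):
    """Return the stripped text of the first matching element, or None if absent."""
    node = parent.find(tag, class_=css_class)
    if node is None:
        return None
    text = node.get_text(strip=True)
    # Drop a leading label (assumed here) only when it is actually present.
    if prefix and text.startswith(prefix):
        text = text[len(prefix):].strip()
    return text


def parse_articles(html):
    """Build {player name: {"Description": ..., "Impact": ...}} from the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="player-news-list")
    if container is None:
        return {}
    players = {}
    for article in container.find_all("div", class_="player-news-article"):
        name = text_or_none(article, "div", "player-news-article-title")
        if name is None:
            continue  # skip articles without a player name
        players[name] = {
            "Description": text_or_none(
                article, "div", "player-news-article-description", "News:"),
            "Impact": text_or_none(
                article, "div", "player-news-article-impact", "Impact:"),
        }
    return players


parsed = parse_articles(SAMPLE_HTML)
```

<b>Missing fields come back as None, which pandas stores as missing values instead of raising an error during scraping.</b>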

<br><b>Next, we apply sentiment analysis to the collected news, which lets us separate positive news from negative news. For this purpose, we use the transformers package, which provides a pre-built sentiment analysis pipeline. The pipeline takes the string to be classified as input. With the default model used here, it returns a label of either 'POSITIVE' or 'NEGATIVE' together with a score between 0 and 1 that reflects the model's confidence in that label.</b>

In [3]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

sentiment = []
sentiment_value = []

for player in players:
    # run the model once per player and reuse the result for both columns
    result = sentiment_pipeline(players[player]["Impact"])[0]
    sentiment.append(result['label'])
    sentiment_value.append(result['score'])

df['Sentiment'] = sentiment
df['Sentiment Score'] = sentiment_value

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
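<br><b>Because the score is attached to whichever label was predicted, the two columns have to be read together. As a possible convenience, the sketch below folds them into one signed value that can be sorted directly. The helper name signed_sentiment and the example values are assumptions for illustration, not output from the model.</b>

```python
def signed_sentiment(result):
    """Map a result such as {'label': 'NEGATIVE', 'score': 0.98} to [-1, 1].

    Positive labels keep their score; negative labels get the score negated.
    """
    label = result["label"].upper()
    if label == "POSITIVE":
        return result["score"]
    if label == "NEGATIVE":
        return -result["score"]
    raise ValueError(f"unexpected label: {result['label']}")


# Illustrative results in the shape the pipeline returns (values are made up).
examples = [
    {"label": "POSITIVE", "score": 0.75},
    {"label": "NEGATIVE", "score": 0.5},
]
signed = [signed_sentiment(r) for r in examples]
```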


<br><b>Then, we perform keyword extraction, which pulls the most relevant words and phrases out of each news item and so highlights its main ideas. The extracted keywords are useful in several ways, for example when paraphrasing the news or creating hashtags for posts about it.</b>

In [4]:
# YAKE (Yet Another Keyword Extractor) - for keyword extraction from the description and impact of the news
import yake

# n: maximum number of words per keyword, top: number of keywords to return,
# dedupLim: similarity threshold above which near-duplicate keywords are dropped
kw_extractor = yake.KeywordExtractor(n = 3, dedupLim = 0.5, top = 10)

def extract_keywords(text):
    # extract_keywords returns (keyword, score) pairs; keep only the keyword text
    return ", ".join(kw for kw, _ in kw_extractor.extract_keywords(text))

# keywords extracted from the description paragraph
player_keywords_desc = [extract_keywords(players[player]['Description']) for player in players]

print("Description Keywords player wise: ")
print(player_keywords_desc)
df['Keywords-Description'] = player_keywords_desc

print()

# keywords extracted from the impact paragraph
player_keywords_impact = [extract_keywords(players[player]['Impact']) for player in players]

print("Impact Keywords player wise: ")
print(player_keywords_impact)
df['Keywords-Impact'] = player_keywords_impact

Description Keywords player wise: 
['Thompson, knee, Tuesday game, Knicks, game', 'lineup for Tuesday, Tuesday game, starting lineup, Jokic, knee, Grizzlies, lineup, game', 'Denver Post reports, Mike Singer, game against Memphis, Murray, knee, play in Tuesday, Tuesday game, Memphis, Mike, Denver', 'Monday 125-119 loss, minutes during Monday, Sabonis recorded, points, rebounds, Monday, Sabonis, recorded, 12-19, minutes', 'contact session Tuesday, Tomer Azarly, Zubac, knee, Azarly of ClutchPoints.com, ClutchPoints.com reports, Tuesday, Tomer, slated to practice, Azarly', 'five-game road trip, upcoming five-game road, Lue said Powell, Ohm Youngmisuk, practiced Tuesday, coach Ty Lue, Youngmisuk of ESPN.com, Head coach, road trip, ESPN.com reports', 'Los Angeles Times, Angeles Times reports, Andrew Greif, practice Tuesday, George, knee, Tuesday, Andrew, part in practice, Greif', 'Denver Post reports, game versus Memphis, Mike Singer, Tuesday game versus, Porter, heel, play in Tuesday, Memph
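<br><b>One of the uses mentioned for the keywords is creating hashtags. A minimal sketch of that step follows, assuming the input is a list of (keyword, score) pairs as returned by YAKE, where a lower score means a more relevant keyword. The function name to_hashtags and the scores in the sample are made up for illustration.</b>

```python
import re


def to_hashtags(keywords, limit=5):
    """Turn (keyword, score) pairs into at most `limit` unique hashtags."""
    hashtags = []
    for phrase, _score in keywords:
        if len(hashtags) >= limit:
            break
        # Keep only alphanumeric runs and capitalize the first letter of each.
        words = re.findall(r"[A-Za-z0-9]+", phrase)
        tag = "#" + "".join(w[0].upper() + w[1:] for w in words)
        if len(tag) > 1 and tag not in hashtags:
            hashtags.append(tag)
    return hashtags


# Phrases taken from the keyword output above; the scores are placeholders.
sample_keywords = [
    ("Tuesday game", 0.05),
    ("knee", 0.1),
    ("ESPN.com reports", 0.2),
    ("Tuesday game", 0.3),
]
tags = to_hashtags(sample_keywords)
```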

<br><b>Next, we generate a short summary of the impact text of each news item.</b>

In [8]:
# creating summarization model
summarizer = pipeline("summarization")

summaries = []

for player in players:
    # the pipeline returns a list with one dict per input, so take the first element
    summaries.append(summarizer(players[player]['Impact'], max_length=30)[0]['summary_text'])

df['Summary'] = summaries

No model was supplied, defaulted to t5-small and revision d769bba (https://huggingface.co/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
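<br><b>Some impact texts may already be shorter than the requested summary, in which case summarizing them adds little. The sketch below shows one possible guard: it only calls the model for longer texts. The helper name summarize_if_long and the 40-word threshold are assumptions chosen for illustration; the summarizer argument is expected to behave like the pipeline created above.</b>

```python
def summarize_if_long(text, summarizer, min_words=40, max_length=30):
    """Summarize `text` only if it has at least `min_words` words.

    `summarizer` is any callable that, like the transformers summarization
    pipeline, returns a list of dicts with a 'summary_text' key.
    """
    if len(text.split()) < min_words:
        return text  # short enough already; keep it unchanged
    return summarizer(text, max_length=max_length)[0]["summary_text"]


# Possible usage with the objects from the cell above (not executed here):
# df['Summary'] = [summarize_if_long(players[p]['Impact'], classifier)
#                  for p in players]
```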


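<br><b>Next, we apply question answering. The pipeline's default model is extractive: given a question and a context, it returns the span of the context that most likely answers the question. Here, we ask whether the player is injured and use the news description as the context.</b>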

In [None]:
from transformers import pipeline
question_answerer = pipeline('question-answering')

status = []

for player in players:
    status.append(question_answerer(question='Is he injured?', context=players[player]['Description'])['answer'])

df['Status'] = status

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some layers from the model checkpoint at distilbert-base-cased-distilled-squad were not used when initializing TFDistilBertForQuestionAnswering: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForQuestionAnswering were not initialized from the model ch
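<br><b>Besides the answer, the question answering pipeline also returns a confidence score for the extracted span. The sketch below shows one way to use it: answers below a threshold are replaced with a fallback value. The helper name answer_or_unknown, the 0.3 threshold and the "unknown" fallback are assumptions for illustration and would need tuning on real data.</b>

```python
def answer_or_unknown(qa, question, context, min_score=0.3, fallback="unknown"):
    """Return the extracted answer, or `fallback` when the model is unsure.

    `qa` is any callable that, like the transformers question-answering
    pipeline, returns a dict with 'answer' and 'score' keys.
    """
    result = qa(question=question, context=context)
    if result["score"] >= min_score:
        return result["answer"]
    return fallback


# Possible usage with the objects from the cell above (not executed here):
# status = [answer_or_unknown(question_answerer, 'Is he injured?',
#                             players[p]['Description']) for p in players]
```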

<br><b>Finally, we export all the results as a CSV file for easy access and readability. The CSV format also allows the data to be reused in other models and applications.</b>

In [None]:
# exporting the crawled data and the generated columns to a csv file
df.to_csv("nba_news.csv")
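
<br><b>Since the player names are stored in the dataframe index, another application has to know which CSV column holds them. The following sketch shows one way to make that explicit by naming the index column on export and using it again on import. The column label "Player" and the sample rows are assumptions for illustration; an in-memory buffer stands in for the file.</b>

```python
import io

import pandas as pd

# Placeholder rows in the same layout as the dataframe above (values made up).
sample = pd.DataFrame(
    {"Sentiment": ["POSITIVE", "NEGATIVE"], "Sentiment Score": [0.75, 0.5]},
    index=["Player A", "Player B"],
)

buffer = io.StringIO()
# index_label gives the index column a header instead of leaving it blank.
sample.to_csv(buffer, index_label="Player")

buffer.seek(0)
# index_col restores the player names as the index when reading back.
restored = pd.read_csv(buffer, index_col="Player")
```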