# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [418]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
load_dotenv()
from newsapi.newsapi_client import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
%matplotlib inline

In [419]:
# Read your api key environment variable
api_key = os.getenv('news_api')

In [420]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [421]:
# Fetch the Bitcoin news articles
btc_news = newsapi.get_everything(
    q="bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

In [422]:
# Fetch the Ethereum news articles
eth_news = newsapi.get_everything(
    q="ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

In [423]:
# Create the Bitcoin sentiment scores DataFrame
btc_sentiments = []

for article in btc_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        btc_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
        })

    except AttributeError:
        pass

In [424]:
# Create Bitcoin DataFrame
btc_df = pd.DataFrame(btc_sentiments)
btc_df.tail()

Unnamed: 0,text,date,compound,positive,negative,neutral
95,Ripples rough few days just got way worse. Aft...,2020-12-29,-0.5574,0.035,0.124,0.842
96,By Reuters Staff\r\nFILE PHOTO: A sign of Foxc...,2020-12-08,0.0,0.0,0.0,1.0
97,By Reuters Staff\r\n(Reuters) - Coinbase Globa...,2020-12-17,0.296,0.064,0.0,0.936
98,Hat tip to Rabobank for their description of y...,2020-12-02,0.743,0.178,0.0,0.822
99,By Reuters Staff\r\nFILE PHOTO: A trader weari...,2020-12-15,0.0,0.0,0.0,1.0


In [425]:
btc_df.head()

Unnamed: 0,text,date,compound,positive,negative,neutral
0,Visa has partnered with cryptocurrency startup...,2020-12-03,0.6369,0.162,0.0,0.838
1,After reaching a previous all-time high on Nov...,2020-12-16,0.6486,0.174,0.0,0.826
2,Its been almost three years to the day since t...,2020-12-16,0.4019,0.072,0.0,0.928
3,Everything is dumb until it works.\r\nAs 2020 ...,2020-12-17,0.2732,0.136,0.083,0.781
4,The government of India is considering an 18% ...,2020-12-29,-0.2924,0.0,0.059,0.941


In [426]:
# Reorder Bitcoin DataFrame columns
cols = ["compound", "negative", "neutral", "positive", "text"]
btc_df = btc_df[cols]
btc_df.head()

Unnamed: 0,compound,negative,neutral,positive,text
0,0.6369,0.0,0.838,0.162,Visa has partnered with cryptocurrency startup...
1,0.6486,0.0,0.826,0.174,After reaching a previous all-time high on Nov...
2,0.4019,0.0,0.928,0.072,Its been almost three years to the day since t...
3,0.2732,0.083,0.781,0.136,Everything is dumb until it works.\r\nAs 2020 ...
4,-0.2924,0.059,0.941,0.0,The government of India is considering an 18% ...


In [427]:
# Create the Ethereum sentiment scores DataFrame
eth_sentiments = []

for article in eth_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        eth_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
        })

    except AttributeError:
        pass
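
The Bitcoin and Ethereum loops above are identical apart from their input and output lists, so they could share one helper. The following is a minimal sketch under stated assumptions: `build_sentiment_rows` is a hypothetical name that does not appear in the notebook, and the scoring function is passed in as an argument (here it would be `analyzer.polarity_scores`) so the example runs without the VADER lexicon. The stand-in scorer returns made-up values purely for illustration.

```python
def build_sentiment_rows(articles, score_fn):
    """Score each article's content and return a list of row dicts."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles with missing or empty content
            continue
        scores = score_fn(text)
        rows.append({
            "text": text,
            "date": article["publishedAt"][:10],
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


# Stand-in scorer with made-up values; replace with analyzer.polarity_scores
def fake_scores(text):
    return {"compound": 0.0, "pos": 0.0, "neg": 0.0, "neu": 1.0}


sample_articles = [
    {"content": "Example headline text", "publishedAt": "2020-12-16T10:00:00Z"},
    {"content": None, "publishedAt": "2020-12-17T10:00:00Z"},
]
rows = build_sentiment_rows(sample_articles, fake_scores)
print(rows)
```

With the real client responses, `pd.DataFrame(build_sentiment_rows(btc_news["articles"], analyzer.polarity_scores))` would then build the Bitcoin frame, and the same call with `eth_news` the Ethereum one.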

In [428]:
# Create Ethereum DataFrame
eth_df = pd.DataFrame(eth_sentiments)
eth_df.head()

Unnamed: 0,text,date,compound,positive,negative,neutral
0,The Securities and Exchange Commission plans t...,2020-12-22,0.5267,0.136,0.0,0.864
1,Bitcoin was once derided by serious investors ...,2020-12-19,0.0772,0.066,0.085,0.849
2,FILE PHOTO: Representations of virtual currenc...,2020-12-16,0.0,0.0,0.0,1.0
3,FILE PHOTO: A representation of virtual curren...,2020-12-16,0.0,0.0,0.0,1.0
4,FILE PHOTO: Representations of virtual currenc...,2020-12-16,0.0,0.0,0.0,1.0


In [429]:
# Reorder Ethereum DataFrame columns
cols = ["compound", "negative", "neutral", "positive", "text"]
eth_df = eth_df[cols]
eth_df.head()

Unnamed: 0,compound,negative,neutral,positive,text
0,0.5267,0.0,0.864,0.136,The Securities and Exchange Commission plans t...
1,0.0772,0.085,0.849,0.066,Bitcoin was once derided by serious investors ...
2,0.0,0.0,1.0,0.0,FILE PHOTO: Representations of virtual currenc...
3,0.0,0.0,1.0,0.0,FILE PHOTO: A representation of virtual curren...
4,0.0,0.0,1.0,0.0,FILE PHOTO: Representations of virtual currenc...


In [430]:
# Describe the Bitcoin Sentiment
btc_df.describe()

Unnamed: 0,compound,negative,neutral,positive
count,100.0,100.0,100.0,100.0
mean,0.120697,0.02253,0.92776,0.04972
std,0.341145,0.051284,0.074283,0.058137
min,-0.9468,0.0,0.637,0.0
25%,0.0,0.0,0.8685,0.0
50%,0.0,0.0,0.9385,0.0365
75%,0.368875,0.0,1.0,0.08875
max,0.8016,0.363,1.0,0.209


In [431]:
# Describe the Ethereum Sentiment
eth_df.describe()

Unnamed: 0,compound,negative,neutral,positive
count,94.0,94.0,94.0,94.0
mean,0.233443,0.024734,0.894745,0.080521
std,0.341135,0.044595,0.088672,0.074175
min,-0.7792,0.0,0.691,0.0
25%,0.0,0.0,0.83775,0.0
50%,0.2838,0.0,0.912,0.071
75%,0.45715,0.04775,0.9905,0.131
max,0.8779,0.239,1.0,0.278
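
One way to answer the questions that follow is to line up the relevant rows of the two `describe()` outputs. The sketch below copies the mean and max values by hand from the tables above into a plain dictionary. The `summary` layout and the `leader` helper are illustrative names, not part of the notebook.

```python
# Values copied from the describe() outputs above (mean and max rows)
summary = {
    "Bitcoin": {"mean_positive": 0.04972, "max_compound": 0.8016,
                "max_negative": 0.363, "max_positive": 0.209},
    "Ethereum": {"mean_positive": 0.080521, "max_compound": 0.8779,
                 "max_negative": 0.239, "max_positive": 0.278},
}


def leader(summary, stat):
    """Return the coin with the largest value for the given statistic."""
    return max(summary, key=lambda coin: summary[coin][stat])


for stat in ("mean_positive", "max_compound", "max_negative", "max_positive"):
    coin = leader(summary, stat)
    print(f"{stat}: {coin} ({summary[coin][stat]})")
```

With the live DataFrames, the same values can be read from `btc_df.describe()` and `eth_df.describe()` instead of being typed in.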


### Questions:

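Q: Which coin had the highest mean positive score?

A: Ethereum, with a mean positive score of 0.0805 versus 0.0497 for Bitcoin.

Q: Which coin had the highest compound score?

A: Ethereum, with a maximum compound score of 0.8779 versus 0.8016 for Bitcoin.

Q: Which coin had the highest positive score?

A: Ethereum, with a maximum positive score of 0.278 versus 0.209 for Bitcoin.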

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords
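
The three steps above can be sketched as a single function. This is a minimal illustration rather than the notebook's final tokenizer: `tokenize` is a hypothetical name, and `sample_stop_words` is a tiny stand-in set used so the example runs without NLTK data. In the notebook, the set built from `stopwords.words("english")` would be passed instead.

```python
import re


def tokenize(text, stop_words):
    """Lowercase the text, strip non-letters, and drop stopwords."""
    # 1. Lowercase each word
    text = text.lower()
    # 2. Remove punctuation (and digits) by replacing non-letters with spaces
    text = re.sub(r"[^a-z]", " ", text)
    # 3. Remove stopwords
    return [word for word in text.split() if word not in stop_words]


# Tiny stand-in stopword set, for illustration only
sample_stop_words = {"has", "with", "a", "the", "to"}
tokens = tokenize(
    "Visa has partnered with cryptocurrency startup BlockFi.",
    sample_stop_words,
)
print(tokens)
```

Returning a list of words, rather than a joined string, keeps the result directly usable for n-gram and frequency counts later on.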

In [432]:
import re
from string import punctuation

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = set(stopwords.words("english"))
wn = WordNetLemmatizer()

In [433]:
# Download the NLTK stopwords corpus (needed by stopwords.words("english"); a no-op if already installed)
nltk.download("stopwords")

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\pvolc\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [434]:
# Tokenize the Bitcoin text: strip non-letters, lowercase, remove stop words, then stem and lemmatize
ps = PorterStemmer()
corpus_btc = []
for raw_text in btc_df["text"]:
    text = re.sub("[^a-zA-Z]", " ", raw_text).lower()
    words = [ps.stem(word) for word in text.split() if word not in stop_words]
    words = [wn.lemmatize(word) for word in words]
    corpus_btc.append(" ".join(words))

In [435]:
tokens = corpus_btc

In [436]:
# Print the processed token string for each Bitcoin article
print(tokens)

['visa partner cryptocurr startup blockfi offer first reward credit card pay bitcoin rather cash worth appli unless your extrem bullish char', 'reach previou time high novemb th decemb st bitcoin trade well surpass previou peak price bitcoin valu rapidli char', 'almost three year day sinc price bitcoin close break ceil came crash wednesday arbitrari mileston hodler dream final char', 'everyth dumb work come close cryptocurr world experienc anoth late year surg consum interest price climb valu bitcoin char', 'govern india consid tax bitcoin transact accord new report time india clear whether propos good servic tax gst would char', 'secur exchang commiss plan sue rippl feder civil court sell unregist secur accord news releas publish onlin cryptocurr compani late char', 'unlik convent cryptocurr central bank control digit yuan case peopl bank china move give countri power theori stabil freq char', 'imag copyrightgetti imag bitcoin hit new time high break volatil virtual currenc gain year 

In [437]:
# Create a new tokens column for Bitcoin
btc_df["tokens"] = tokens
btc_df.head()

Unnamed: 0,compound,negative,neutral,positive,text,tokens
0,0.6369,0.0,0.838,0.162,Visa has partnered with cryptocurrency startup...,visa partner cryptocurr startup blockfi offer ...
1,0.6486,0.0,0.826,0.174,After reaching a previous all-time high on Nov...,reach previou time high novemb th decemb st bi...
2,0.4019,0.0,0.928,0.072,Its been almost three years to the day since t...,almost three year day sinc price bitcoin close...
3,0.2732,0.083,0.781,0.136,Everything is dumb until it works.\r\nAs 2020 ...,everyth dumb work come close cryptocurr world ...
4,-0.2924,0.059,0.941,0.0,The government of India is considering an 18% ...,govern india consid tax bitcoin transact accor...


In [438]:
# Tokenize the Ethereum text: strip non-letters, lowercase, remove stop words, then stem and lemmatize
ps = PorterStemmer()
corpus_eth = []
for raw_text in eth_df["text"]:
    text = re.sub("[^a-zA-Z]", " ", raw_text).lower()
    words = [ps.stem(word) for word in text.split() if word not in stop_words]
    words = [wn.lemmatize(word) for word in words]
    corpus_eth.append(" ".join(words))

In [439]:
tokens_eth = corpus_eth

In [440]:
# Print the processed token string for each Ethereum article
print(tokens_eth)

['secur exchang commiss plan sue rippl feder civil court sell unregist secur accord news releas publish onlin cryptocurr compani late char', 'bitcoin derid seriou investor bubbl ponzi scheme year becom irresist invest mani wednesday bitcoin top fo char', 'file photo represent virtual currenc bitcoin seen pictur illustr taken taken march reuter dado ruvic illustr london reuter major u cryptocurr char', 'file photo represent virtual currenc bitcoin seen front stock graph illustr taken novemb reuter dado ruvic illustr london reuter major u char', 'file photo represent virtual currenc bitcoin seen pictur illustr taken taken march reuter dado ruvic illustr london reuter major u cryptocurr char', 'new york reuter total investor inflow cryptocurr fund product hit billion far year accord latest data asset manag coin char', 'new york reuter institut investor pump million cryptocurr fund product week end dec second highest record push sector asset manag char', 'new york reuter institut investor 

In [463]:
# Create a new tokens column for Ethereum
eth_df["tokens"] = tokens_eth
eth_df.head()

Unnamed: 0,compound,negative,neutral,positive,text,tokens
0,0.5267,0.0,0.864,0.136,The Securities and Exchange Commission plans t...,secur exchang commiss plan sue rippl feder civ...
1,0.0772,0.085,0.849,0.066,Bitcoin was once derided by serious investors ...,bitcoin derid seriou investor bubbl ponzi sche...
2,0.0,0.0,1.0,0.0,FILE PHOTO: Representations of virtual currenc...,file photo represent virtual currenc bitcoin s...
3,0.0,0.0,1.0,0.0,FILE PHOTO: A representation of virtual curren...,file photo represent virtual currenc bitcoin s...
4,0.0,0.0,1.0,0.0,FILE PHOTO: Representations of virtual currenc...,file photo represent virtual currenc bitcoin s...


---

# NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
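
Both steps can be sketched on a small sample before running them on the full corpora. In the sketch below, `count_bigrams` is a hypothetical helper name, and `sample_corpus` holds three token strings shortened from the printed output earlier in the notebook. Bigrams are counted per article so that no pair spans two different articles.

```python
from collections import Counter

from nltk import ngrams


def count_bigrams(corpus):
    """Count bigrams document by document so pairs never cross articles."""
    counts = Counter()
    for doc in corpus:
        counts.update(ngrams(doc.split(), 2))
    return counts


# Token strings shortened from the notebook's printed output
sample_corpus = [
    "file photo represent virtual currenc bitcoin",
    "new york reuter institut investor",
    "file photo represent virtual currenc bitcoin",
]

bigram_counts = count_bigrams(sample_corpus)
word_counts = Counter(word for doc in sample_corpus for word in doc.split())

print(bigram_counts.most_common(3))
print(word_counts.most_common(3))
```

The same two calls applied to `corpus_btc` and `corpus_eth` would give the per-coin bigram and word frequencies.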

In [464]:
from collections import Counter
from nltk import ngrams

In [468]:
# Flatten the per-article token strings into one list of words
processed = [word for doc in corpus_btc for word in doc.split()]
print(processed[:10])

In [465]:
# Generate the Bitcoin N-grams where N=2 (counted per article so pairs never span two articles)
bigram_counts = Counter(
    bigram for doc in corpus_btc for bigram in ngrams(doc.split(), n=2)
)
print(bigram_counts.most_common(10))

In [444]:
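# Generate the Ethereum N-grams where N=2 (counted per article so pairs never span two articles)
eth_bigram_counts = Counter(
    bigram for doc in corpus_eth for bigram in ngrams(doc.split(), n=2)
)
print(eth_bigram_counts.most_common(10))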

In [445]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [446]:
# Get the top 10 words for Bitcoin
token_count([word for doc in corpus_btc for word in doc.split()])

In [447]:
# Get the top 10 words for Ethereum
token_count([word for doc in corpus_eth for word in doc.split()])

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [448]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Note: matplotlib 3.6+ renamed this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [449]:
# Generate the Bitcoin word cloud
btc_cloud = WordCloud().generate(" ".join(corpus_btc))
plt.imshow(btc_cloud)
plt.axis("off")

In [450]:
# Generate the Ethereum word cloud
eth_cloud = WordCloud().generate(" ".join(corpus_eth))
plt.imshow(eth_cloud)
plt.axis("off")

# Named Entity Recognition

In this section, you will run named entity recognition on the text for both coins and visualize the tags using spaCy.

In [451]:
import spacy
from spacy import displacy

In [452]:
# Optional - download a language model for spaCy
# !python -m spacy download en_core_web_sm

In [453]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [454]:
# Concatenate all of the Bitcoin text together
btc_text = " ".join(btc_df["text"])

In [455]:
# Run the NER processor on all of the text
btc_doc = nlp(btc_text)

# Add a title to the document
btc_doc.user_data["title"] = "Bitcoin NER"

In [456]:
# Render the visualization
displacy.render(btc_doc, style="ent")

In [457]:
# List all Entities
for ent in btc_doc.ents:
    print(ent.text, ent.label_)

---

## Ethereum NER

In [458]:
# Concatenate all of the Ethereum text together
eth_text = " ".join(eth_df["text"])

In [459]:
# Run the NER processor on all of the text
eth_doc = nlp(eth_text)

# Add a title to the document
eth_doc.user_data["title"] = "Ethereum NER"

In [460]:
# Render the visualization
displacy.render(eth_doc, style="ent")

In [461]:
# List all Entities
for ent in eth_doc.ents:
    print(ent.text, ent.label_)