# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk as nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer() #this is for when creating sentiment score dataframes

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/tylergehbauer/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable'
#uses my .env file to read in my api key
load_dotenv()
news_api = os.getenv('news_api')

In [3]:
#print(news_api)

In [8]:
from newsapi import NewsApiClient 

In [10]:
# This creates the newsapi client
newsapi = NewsApiClient(api_key = news_api)

In [40]:
# Fetch the Bitcoin news articles
bitcoin_news = newsapi.get_everything(q = 'bitcoin', language = 'en', sort_by = 'relevancy')
#q are the keywords
#language is the language
#bitcoin_news is a dict of dicts

In [38]:
example_df = pd.DataFrame.from_dict(bitcoin_news["articles"])
example_df.head() #this shows each article from 'bitcoin_news'

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': None, 'name': 'Lifehacker.com'}",Jeff Somers,Is the Crypto Bubble Going to Burst?,Even if you aren’t paying attention to Bitcoin...,https://lifehacker.com/is-the-crypto-bubble-go...,https://i.kinja-img.com/gawker-media/image/upl...,2022-02-09T16:00:00Z,Even if you arent paying attention to Bitcoin ...
1,"{'id': 'the-verge', 'name': 'The Verge'}",Mitchell Clark,The International Monetary Fund tells El Salva...,The International Monetary Fund’s executive di...,https://www.theverge.com/2022/1/25/22901374/el...,https://cdn.vox-cdn.com/thumbor/altkKN7BnaLUpb...,2022-01-25T22:11:14Z,El Salvador introduced Bitcoin as a legal tend...
2,"{'id': 'the-verge', 'name': 'The Verge'}",Corin Faife,DeepDotWeb operator sentenced to eight years f...,"The operator of DeepDotWeb, a site that indexe...",https://www.theverge.com/2022/1/27/22904803/de...,https://cdn.vox-cdn.com/thumbor/mde_l3lUC4muDP...,2022-01-27T18:16:57Z,Israeli national Tal Prihar pled guilty to rou...
3,"{'id': 'engadget', 'name': 'Engadget'}",Kris Holt,Netflix is already making a docuseries about t...,Netflix\r\n is making a docuseries about one o...,https://www.engadget.com/netflix-billion-dolla...,https://s.yimg.com/os/creatr-uploaded-images/2...,2022-02-11T19:22:41Z,Netflix\r\n is making a docuseries about one o...
4,"{'id': 'wired', 'name': 'Wired'}",Gian M. Volpicelli,Gibraltar Could Launch the World’s First Crypt...,“The Rock” hopes a new stock exchange will att...,https://www.wired.com/story/gibraltar-crypto-e...,https://media.wired.com/photos/61f0b4bf03f9ae9...,2022-01-26T12:00:00Z,British entrepreneur and financier Richard ODe...


In [13]:
# Fetch the Ethereum news articles
ethereum_news = newsapi.get_everything(q = 'ethereum', language = 'en', sort_by = 'relevancy')
#q are the keywords
#language is the language

Sentiment Analysis is used to analyze the emotion of the text. Postive words are associated with words like 'love' and 'enjoy' and vice versa for negative words.

The compound score is the sum of positive, negative & neutral scores which is then normalized between -1(most extreme negative) and +1 (most extreme positive).

The positive, negative & neutral scores for each article should add up to 1. 

source: https://analyticsindiamag.com/sentiment-analysis-made-easy-using-vader/#:~:text=The%20compound%20score%20is%20the,%25%20Negative%2C%2050.8%25%20Neutral

In [19]:
# Create the Bitcoin sentiment scores DataFrame
sentiments = [] #creates empty list

#Going to use VADER’s SentimentIntensityAnalyzer() to see how many 'Neutral (neu)' , 'Positive (pos)', and 'Negatvie(neg)'
# words there are for each article. 
for articles in bitcoin_news["articles"]: #goes through each artilce in bitcoin_news and applies polai
    try:
        text = articles["content"] #accessing each articles 'content' which contains the article itself in 'bitcoin_news'
        
        results = analyzer.polarity_scoresolarity_scores(text) #this is how get compound, pos, neg, and nue polarity scores
        #during imports we set analyzer equal to SentimentIntensityAnalyzer() 
        #'results' uses this function to find polarity score for each article in bitcoin_news
        
        compound = results["compound"] #gets compound polarity score
        pos = results["pos"] #gets postive polarity score
        neu = results["neu"] #gets neutral polarity score
        neg = results["neg"] #gets negative polarity score

        sentiments.append({ #appends a dictionary inside the list-(sentiments) so a dataframe can be created
            "text": text, #this appends all polarity score we got above
            "Compound": compound,
            "Positive": pos,
            "Negative": neg,
            "Neutral": neu,
        })
    except AttributeError:
        pass #rinse and repeat for each article
    
bitcoin = pd.DataFrame(sentiments) #creates a dataframe using the list of dicts
bitcoin.head()

Unnamed: 0,text,Compound,Positive,Negative,Neutral
0,Even if you arent paying attention to Bitcoin ...,0.5859,0.124,0.0,0.876
1,El Salvador introduced Bitcoin as a legal tend...,0.3182,0.105,0.0,0.895
2,Israeli national Tal Prihar pled guilty to rou...,-0.3182,0.045,0.084,0.871
3,Netflix\r\n is making a docuseries about one o...,-0.7096,0.0,0.169,0.831
4,British entrepreneur and financier Richard ODe...,0.6808,0.185,0.0,0.815


In [20]:
# Create the Ethereum sentiment scores DataFrame
sentiments = []

for articles in ethereum_news["articles"]:
    try:
        text = articles["content"]
        results = analyzer.polarity_scores(text)
        compound = results["compound"]
        pos = results["pos"]
        neu = results["neu"]
        neg = results["neg"]

        sentiments.append({
            "text": text,
            "Compound": compound,
            "Positive": pos,
            "Negative": neg,
            "Neutral": neu,
        })
    except AttributeError:
        pass
    
ethereum = pd.DataFrame(sentiments)
ethereum.head()

Unnamed: 0,text,Compound,Positive,Negative,Neutral
0,If people who buy cryptocurrencies intended on...,-0.2023,0.039,0.062,0.899
1,"Technical analysis isnt a perfect tool, but it...",-0.2498,0.0,0.059,0.941
2,This enables an L1 platform to bootstrap its n...,0.0,0.0,0.0,1.0
3,"The means-and-ends moralists, or non-doers, al...",0.0,0.0,0.0,1.0
4,Coinbase reported that the share of trading vo...,0.6705,0.188,0.0,0.812


In [21]:
# Describe the Bitcoin Sentiment
bitcoin.describe()

Unnamed: 0,Compound,Positive,Negative,Neutral
count,20.0,20.0,20.0,20.0
mean,0.181095,0.0908,0.0416,0.8676
std,0.459473,0.059023,0.057329,0.061515
min,-0.7096,0.0,0.0,0.729
25%,-0.225725,0.04575,0.0,0.83475
50%,0.2957,0.0925,0.0,0.8735
75%,0.5859,0.14525,0.0855,0.9015
max,0.7783,0.185,0.169,1.0


In [22]:
# Describe the Ethereum Sentiment
ethereum.describe()

Unnamed: 0,Compound,Positive,Negative,Neutral
count,20.0,20.0,20.0,20.0
mean,0.100645,0.0591,0.0264,0.91455
std,0.351331,0.074079,0.048366,0.088084
min,-0.6808,0.0,0.0,0.766
25%,-0.025675,0.0,0.0,0.8225
50%,0.0,0.0425,0.0,0.9425
75%,0.232225,0.08275,0.04175,1.0
max,0.8341,0.234,0.174,1.0


### Questions:

Q: Which coin had the highest mean positive score?

A: 

Q: Which coin had the highest compound score?

A: 

Q. Which coin had the highest positive score?

A: 

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [65]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [96]:
# Instantiate the lemmatizer
wnl = WordNetLemmatizer() 

# Create a list of stopwords
stop = stopwords.words('english') # contains all stop words for english

In [97]:
# Expand the default stopwords list if necessary
#will append each word to the list 'stop' created in cell above.
stop.append("u")
stop.append("it'")
stop.append("'s")
stop.append("n't")
stop.append('…')
stop.append("\`")
stop.append('``')
stop.append('char')
stop.append("''")
stop.append('’')
stop.append('arent')
stop.append('Mr.')
stop.append(',')
#stop.append('cryptocurrency') ?
#
stop = set(stop)


In [98]:
#stop #to check if stop words appended

In [99]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Create a list of the words
    words = word_tokenize(text) #tokenizes each word in text

    # Convert the words to lowercase
    words = list(filter(lambda w: w.lower(), words))
    
    # Remove the punctuation
    words = list(filter(lambda t: t not in punctuation, words))
    
    # Remove the stopwords
    words = list(filter(lambda t: t.lower() not in stop, words)) #uses the stop list we created
    
    # Lemmatize Words into root words
    #makes Ran and Running into same word for exmaple. 
    tokens = [wnl.lemmatize(word) for word in words]
    
    return tokens

In [100]:
# Create a new tokens column for Bitcoin
bitcoin["tokens"] = bitcoin.text.apply(tokenizer)
bitcoin.head()

Unnamed: 0,text,Compound,Positive,Negative,Neutral,tokens
0,Even if you arent paying attention to Bitcoin ...,0.5859,0.124,0.0,0.876,"[Even, paying, attention, Bitcoin, cryptocurre..."
1,El Salvador introduced Bitcoin as a legal tend...,0.3182,0.105,0.0,0.895,"[El, Salvador, introduced, Bitcoin, legal, ten..."
2,Israeli national Tal Prihar pled guilty to rou...,-0.3182,0.045,0.084,0.871,"[Israeli, national, Tal, Prihar, pled, guilty,..."
3,Netflix\r\n is making a docuseries about one o...,-0.7096,0.0,0.169,0.831,"[Netflix, making, docuseries, one, worst, rapp..."
4,British entrepreneur and financier Richard ODe...,0.6808,0.185,0.0,0.815,"[British, entrepreneur, financier, Richard, OD..."


In [101]:
# Create a new tokens column for Ethereum
ethereum["tokens"] = ethereum.text.apply(tokenizer)
ethereum.head()

Unnamed: 0,text,Compound,Positive,Negative,Neutral,tokens
0,If people who buy cryptocurrencies intended on...,-0.2023,0.039,0.062,0.899,"[people, buy, cryptocurrencies, intended, hold..."
1,"Technical analysis isnt a perfect tool, but it...",-0.2498,0.0,0.059,0.941,"[Technical, analysis, isnt, perfect, tool, may..."
2,This enables an L1 platform to bootstrap its n...,0.0,0.0,0.0,1.0,"[enables, L1, platform, bootstrap, national, e..."
3,"The means-and-ends moralists, or non-doers, al...",0.0,0.0,0.0,1.0,"[means-and-ends, moralist, non-doers, always, ..."
4,Coinbase reported that the share of trading vo...,0.6705,0.188,0.0,0.812,"[Coinbase, reported, share, trading, volume, e..."


---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [102]:
from collections import Counter
from nltk import ngrams

In [103]:
# Generate the Bitcoin N-grams where N=2
N=2
grams = ngrams(tokenizer(bitcoin.text.str.cat()), N)
Counter(grams).most_common(20)

[(('El', 'Salvador'), 3),
 (('Illustration', 'Alex'), 3),
 (('Alex', 'Castro'), 3),
 (('Castro', 'Verge'), 3),
 (('char', 'Feb'), 3),
 (('Reuters', 'Bitcoin'), 3),
 (('alongside', 'US'), 2),
 (('US', 'dollar'), 2),
 (('International', 'Monetary'), 2),
 (('Monetary', 'Fund'), 2),
 (('Mr.', 'Lichtensteins'), 2),
 (('Lichtensteins', 'wallet'), 2),
 (('Even', 'paying'), 1),
 (('paying', 'attention'), 1),
 (('attention', 'Bitcoin'), 1),
 (('Bitcoin', 'cryptocurrencies'), 1),
 (('cryptocurrencies', 'might'), 1),
 (('might', 'noticed'), 1),
 (('noticed', 'value'), 1),
 (('value', 'plummeted'), 1)]

In [104]:
# Generate the Ethereum N-grams where N=2
N = 2
grams = ngrams(tokenizer(ethereum.text.str.cat()), N)
Counter(grams).most_common(20)

[(('char', 'Feb'), 4),
 (('324', 'million'), 4),
 (('char', 'version'), 3),
 (('version', 'article'), 3),
 (('article', 'published'), 3),
 (('published', 'TIME'), 3),
 (('TIME', 'newsletter'), 3),
 (('newsletter', 'Metaverse'), 3),
 (('Metaverse', 'Subscribe'), 3),
 (('Subscribe', 'weekly'), 3),
 (('weekly', 'guide'), 3),
 (('guide', 'future'), 3),
 (('find', 'past'), 3),
 (('past', 'issue'), 3),
 (('issue', 'newsletter'), 3),
 (('trading', 'volume'), 2),
 (('profile', 'picture'), 2),
 (('char', 'Online'), 2),
 (('Online', 'thief'), 2),
 (('thief', 'made'), 2)]

In [105]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [106]:
# Use token_count to get the top 10 words for Bitcoin
all_tokens = tokenizer(bitcoin.text.str.cat())
token_count(all_tokens, 10)

[('char', 20),
 ('Bitcoin', 14),
 ('Reuters', 5),
 ('El', 4),
 ('Salvador', 3),
 ('dollar', 3),
 ('Illustration', 3),
 ('Alex', 3),
 ('Castro', 3),
 ('Verge', 3)]

In [107]:
# Use token_count to get the top 10 words for Ethereum
all_tokens = tokenizer(ethereum.text.str.cat())
token_count(all_tokens, 10)

[('char', 19),
 ('newsletter', 6),
 ('million', 5),
 ('Bitcoin', 5),
 ('Ethereum', 4),
 ('token', 4),
 ('ethereum', 4),
 ('Feb', 4),
 ('Reuters', 4),
 ('324', 4)]

---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using SpaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---