# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [News API](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\16155\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv("news_api")
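
`os.getenv` returns `None` when the variable is missing, so a misnamed or absent key tends to surface only later, when a request is made. A minimal sketch of a fail-fast guard, assuming the variable is called `news_api` as in the cell above. `require_env` is a hypothetical helper for illustration; it is not part of `python-dotenv` or `newsapi`.

```python
import os

def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value

# Example with an explicit mapping so the sketch runs without a real key
demo_env = {"news_api": "dummy-key"}
api_key_demo = require_env("news_api", env=demo_env)
```

In the notebook itself this would be called as `require_env("news_api")` after `load_dotenv()`.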

In [3]:
# Create a newsapi client
from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
btc_articles = newsapi.get_everything(q='bitcoin', language='en')


In [5]:
# Fetch the Ethereum news articles
eth_articles = newsapi.get_everything(q='ethereum', language='en')

In [6]:
# Build a sentiment scores DataFrame from a News API response (used for both coins)
def create_df(news):
    articles = []
    for article in news["articles"]:
        try:
            title = article["title"]
            description = article["description"]
            content = article["content"]
            date = article["publishedAt"][:10]
            
            sentiment = analyzer.polarity_scores(content)
            compound = sentiment["compound"]
            pos = sentiment["pos"]
            neu = sentiment["neu"]
            neg = sentiment["neg"]

            articles.append({
                "title": title,
                "description": description,
                "date": date,
                "content": content,
                "compound": compound,
                "pos": pos,
                "neu": neu,
                "neg": neg
            })
        except AttributeError:
            pass

    return pd.DataFrame(articles)

btc_df = create_df(btc_articles)
btc_df.head()

Unnamed: 0,title,description,date,content,compound,pos,neu,neg
0,A fake press release claiming Kroger accepts c...,A crypto hoax claimed Kroger is accepting Bitc...,2021-11-05,A similar hoax earlier this year tied Walmart ...,-0.2732,0.0,0.937,0.063
1,"Who Bought $1.6B in Bitcoin Wednesday, and Why?",last week the cryptocurrency market persistent...,2021-10-10,"Specifically, why did someone make a massive p...",0.5461,0.121,0.879,0.0
2,Bitcoin Miners Are Gobbling Up U.S. Energy,There’s a big new presence slurping up power f...,2021-10-28,Theres a big new presence slurping up power fr...,0.3612,0.096,0.904,0.0
3,Mining Bitcoin Using Nuclear Power May Be Fine...,"Last week, the Wall Street Journal ran a piece...",2021-10-08,"Last week, the Wall Street Journal ran a piece...",0.34,0.099,0.901,0.0
4,Roughly One-Third of Bitcoin Is Controlled by ...,"For all the talk of democratizing finance, the...",2021-10-26,"For all the talk of democratizing finance, the...",0.0,0.0,1.0,0.0
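
The `compound` column is easier to read once it is bucketed into coarse labels. The sketch below uses the ±0.05 cutoffs commonly suggested for VADER; that threshold is a rule of thumb and an assumption here, not something this notebook applies. `label_compound` is a hypothetical helper name.

```python
def label_compound(score, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Compound scores copied from the five Bitcoin rows shown above
btc_head_compound = [-0.2732, 0.5461, 0.3612, 0.34, 0.0]
labels = [label_compound(s) for s in btc_head_compound]
print(labels)
```

With a DataFrame, the same function could be applied as `btc_df["compound"].apply(label_compound)`.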


In [7]:
# Create the Ethereum sentiment scores DataFrame
eth_df = create_df(eth_articles)
eth_df.head()

Unnamed: 0,title,description,date,content,compound,pos,neu,neg
0,Nervos launches cross-chain bridge to connect ...,A new cross-chain bridge is currently connecte...,2021-10-16,A new cross-chain bridge is currently connecte...,0.0,0.0,1.0,0.0
1,"Mark Cuban Heralds Ethereum, Bitcoin",Mark Cuban has some advice for people who are ...,2021-10-14,Mark Cuban has some advice for people who are ...,0.0,0.0,1.0,0.0
2,JPMorgan says ethereum is a better bet than bi...,Ethereum is the more resilient cryptocurrency ...,2021-11-05,Ethereum and bitcoin are the two biggest crypt...,0.4588,0.094,0.906,0.0
3,A meme coin named after Elon Musk rode the wav...,The surge in October pushed the ethereum-based...,2021-11-01,Elon Musk\r\npicture alliance / Getty Images\r...,0.5267,0.093,0.907,0.0
4,Obscure altcoin mana spikes 400% as Facebook's...,"The price of Decentraland, whose ticker is man...",2021-11-01,Cryptocurrency and business continuity line im...,0.4588,0.097,0.903,0.0


In [8]:
# Describe the Bitcoin Sentiment
btc_df.describe()

Unnamed: 0,compound,pos,neu,neg
count,20.0,20.0,20.0,20.0
mean,0.21373,0.05605,0.9408,0.00315
std,0.288324,0.067147,0.065844,0.014087
min,-0.2732,0.0,0.801,0.0
25%,0.0,0.0,0.89375,0.0
50%,0.0,0.0,0.9685,0.0
75%,0.485175,0.10625,1.0,0.0
max,0.7558,0.199,1.0,0.063


In [9]:
# Describe the Ethereum Sentiment
eth_df.describe()

Unnamed: 0,compound,pos,neu,neg
count,20.0,20.0,20.0,20.0
mean,0.22443,0.04795,0.94965,0.0024
std,0.295698,0.066084,0.070553,0.010733
min,0.0,0.0,0.792,0.0
25%,0.0,0.0,0.90525,0.0
50%,0.0,0.0,1.0,0.0
75%,0.475775,0.09475,1.0,0.0
max,0.8225,0.208,1.0,0.048


### Questions:

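Q: Which coin had the highest mean positive score?

A: Bitcoin (mean `pos` of 0.05605 versus 0.04795 for Ethereum)

Q: Which coin had the highest compound score?

A: Ethereum (maximum `compound` of 0.8225 versus 0.7558 for Bitcoin)

Q: Which coin had the highest positive score?

A: Ethereum (maximum `pos` of 0.208 versus 0.199 for Bitcoin)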
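
The answers can be checked mechanically against the two `describe()` outputs. The sketch below hard-codes the relevant figures from those tables rather than reading the DataFrames; the dictionary keys and the `leader` helper are hypothetical names chosen for illustration.

```python
# Summary statistics copied from the two describe() outputs above
stats = {
    "Bitcoin":  {"pos_mean": 0.05605, "pos_max": 0.199, "neg_max": 0.063, "compound_max": 0.7558},
    "Ethereum": {"pos_mean": 0.04795, "pos_max": 0.208, "neg_max": 0.048, "compound_max": 0.8225},
}

def leader(metric):
    """Return the coin with the largest value for the given metric."""
    return max(stats, key=lambda coin: stats[coin][metric])

for metric in ("pos_mean", "compound_max", "pos_max", "neg_max"):
    print(metric, "->", leader(metric))
```

With the DataFrames in memory, the same comparison could be made directly, for example `btc_df["pos"].mean() > eth_df["pos"].mean()`.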

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stop words.

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
# Expand the default stopwords list if necessary
sw = set(stopwords.words('english'))

In [12]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove punctuation and any other non-letter characters
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
    
    # Create a tokenized list of the words
    words = word_tokenize(re_clean)
    
    # Lemmatize words into root words
    lemmatized = [lemmatizer.lemmatize(word) for word in words]
   
    # Convert the words to lowercase and remove the stop words
    tokens = [word.lower() for word in lemmatized if word.lower() not in sw]
    
    return tokens

In [13]:
# Create a new tokens column for Bitcoin
btc_token = []

for i in btc_articles["articles"]:
    article = i["content"]
    btc_token.append(tokenizer(article))

In [14]:
btc_df["tokens"] = btc_token

In [15]:
print(btc_token)

[['similar', 'hoax', 'earlier', 'year', 'tied', 'walmart', 'litecoinif', 'buy', 'something', 'verge', 'link', 'vox', 'media', 'may', 'earn', 'commission', 'see', 'ethic', 'statementphoto', 'illustration', 'thiago', 'prudencios', 'char'], ['specifically', 'someone', 'make', 'massive', 'purchase', 'billion', 'worth', 'bitcoin', 'wednesday', 'couple', 'minuteswhile', 'many', 'see', 'huge', 'buy', 'signal', 'bullishness', 'may', 'char'], ['theres', 'big', 'new', 'presence', 'slurping', 'power', 'us', 'grid', 'growing', 'bitcoin', 'miner', 'new', 'research', 'show', 'us', 'ha', 'overtaken', 'china', 'top', 'global', 'destination', 'bitcoin', 'mining', 'char'], ['last', 'week', 'wall', 'street', 'journal', 'ran', 'piece', 'three', 'recent', 'nuclearbitcoin', 'deal', 'may', 'signal', 'growing', 'trend', 'industry', 'journal', 'piece', 'reflects', 'small', 'growing', 'sense', 'excitemen', 'char'], ['talk', 'democratizing', 'finance', 'vast', 'majority', 'bitcoin', 'continues', 'owned', 'relati
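
Tokens such as `litecoinif` and `minuteswhile` in the output above are most likely two words fused together: the tokenizer's regex deletes every non-letter character, including line breaks, without leaving a space behind. A minimal sketch of an alternative that substitutes a space instead, using only the standard library. The sample string is a hypothetical reconstruction shaped like the article text, and `split_on_non_letters` is an illustrative helper, not part of the notebook.

```python
import re

NON_LETTERS = re.compile(r"[^a-zA-Z]+")

def split_on_non_letters(text):
    """Replace each run of non-letter characters with one space, then lowercase and split."""
    return NON_LETTERS.sub(" ", text).lower().split()

# Hypothetical snippet: a line break separates two sentences
sample = "tied Walmart to Litecoin.\r\nIf you buy something"

# Same pattern as the tokenizer above: characters are deleted, so words fuse
fused = re.sub("[^a-zA-Z ]", "", sample).lower().split()
# Substituting a space keeps the word boundary
spaced = split_on_non_letters(sample)

print(fused)
print(spaced)
```

Swapping `''` for `' '` in the notebook's `regex.sub` call would have the same effect, at the cost of changing the token lists and every count derived from them.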

In [16]:
# Create a new tokens column for Ethereum
eth_token = []

for i in eth_articles["articles"]:
    article = i["content"]
    eth_token.append(tokenizer(article))

In [17]:
eth_df["tokens"] = eth_token

In [18]:
eth_df.head()

Unnamed: 0,title,description,date,content,compound,pos,neu,neg,tokens
0,Nervos launches cross-chain bridge to connect ...,A new cross-chain bridge is currently connecte...,2021-10-16,A new cross-chain bridge is currently connecte...,0.0,0.0,1.0,0.0,"[new, crosschain, bridge, currently, connected..."
1,"Mark Cuban Heralds Ethereum, Bitcoin",Mark Cuban has some advice for people who are ...,2021-10-14,Mark Cuban has some advice for people who are ...,0.0,0.0,1.0,0.0,"[mark, cuban, ha, advice, people, new, investi..."
2,JPMorgan says ethereum is a better bet than bi...,Ethereum is the more resilient cryptocurrency ...,2021-11-05,Ethereum and bitcoin are the two biggest crypt...,0.4588,0.094,0.906,0.0,"[ethereum, bitcoin, two, biggest, cryptocurren..."
3,A meme coin named after Elon Musk rode the wav...,The surge in October pushed the ethereum-based...,2021-11-01,Elon Musk\r\npicture alliance / Getty Images\r...,0.5267,0.093,0.907,0.0,"[elon, muskpicture, alliance, getty, imagesa, ..."
4,Obscure altcoin mana spikes 400% as Facebook's...,"The price of Decentraland, whose ticker is man...",2021-11-01,Cryptocurrency and business continuity line im...,0.4588,0.097,0.903,0.0,"[cryptocurrency, business, continuity, line, i..."


In [19]:
btc_df.head()

Unnamed: 0,title,description,date,content,compound,pos,neu,neg,tokens
0,A fake press release claiming Kroger accepts c...,A crypto hoax claimed Kroger is accepting Bitc...,2021-11-05,A similar hoax earlier this year tied Walmart ...,-0.2732,0.0,0.937,0.063,"[similar, hoax, earlier, year, tied, walmart, ..."
1,"Who Bought $1.6B in Bitcoin Wednesday, and Why?",last week the cryptocurrency market persistent...,2021-10-10,"Specifically, why did someone make a massive p...",0.5461,0.121,0.879,0.0,"[specifically, someone, make, massive, purchas..."
2,Bitcoin Miners Are Gobbling Up U.S. Energy,There’s a big new presence slurping up power f...,2021-10-28,Theres a big new presence slurping up power fr...,0.3612,0.096,0.904,0.0,"[theres, big, new, presence, slurping, power, ..."
3,Mining Bitcoin Using Nuclear Power May Be Fine...,"Last week, the Wall Street Journal ran a piece...",2021-10-08,"Last week, the Wall Street Journal ran a piece...",0.34,0.099,0.901,0.0,"[last, week, wall, street, journal, ran, piece..."
4,Roughly One-Third of Bitcoin Is Controlled by ...,"For all the talk of democratizing finance, the...",2021-10-26,"For all the talk of democratizing finance, the...",0.0,0.0,1.0,0.0,"[talk, democratizing, finance, vast, majority,..."


---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [20]:
from collections import Counter
from nltk import ngrams

In [21]:
# Generate the Bitcoin N-grams where N=2
btc_token = [x for i in btc_token for x in i]
btc_grams = Counter(ngrams(btc_token, n=2))
print(dict(btc_grams))

{('similar', 'hoax'): 1, ('hoax', 'earlier'): 1, ('earlier', 'year'): 1, ('year', 'tied'): 1, ('tied', 'walmart'): 1, ('walmart', 'litecoinif'): 1, ('litecoinif', 'buy'): 1, ('buy', 'something'): 1, ('something', 'verge'): 1, ('verge', 'link'): 1, ('link', 'vox'): 1, ('vox', 'media'): 1, ('media', 'may'): 1, ('may', 'earn'): 1, ('earn', 'commission'): 1, ('commission', 'see'): 1, ('see', 'ethic'): 1, ('ethic', 'statementphoto'): 1, ('statementphoto', 'illustration'): 1, ('illustration', 'thiago'): 1, ('thiago', 'prudencios'): 1, ('prudencios', 'char'): 1, ('char', 'specifically'): 1, ('specifically', 'someone'): 1, ('someone', 'make'): 1, ('make', 'massive'): 1, ('massive', 'purchase'): 1, ('purchase', 'billion'): 1, ('billion', 'worth'): 1, ('worth', 'bitcoin'): 1, ('bitcoin', 'wednesday'): 1, ('wednesday', 'couple'): 1, ('couple', 'minuteswhile'): 1, ('minuteswhile', 'many'): 1, ('many', 'see'): 1, ('see', 'huge'): 1, ('huge', 'buy'): 1, ('buy', 'signal'): 1, ('signal', 'bullishness'

In [22]:
# Generate the Ethereum N-grams where N=2
eth_token = [x for i in eth_token for x in i]
eth_grams = Counter(ngrams(eth_token, n=2))
print(dict(eth_grams))

{('new', 'crosschain'): 1, ('crosschain', 'bridge'): 2, ('bridge', 'currently'): 1, ('currently', 'connected'): 1, ('connected', 'ethereum'): 1, ('ethereum', 'crosschain'): 1, ('bridge', 'cardano'): 1, ('cardano', 'public'): 1, ('public', 'chain'): 1, ('chain', 'come'): 1, ('come', 'futurenervostoday'): 1, ('futurenervostoday', 'announced'): 1, ('announced', 'force'): 1, ('force', 'bridge'): 1, ('bridge', 'char'): 1, ('char', 'mark'): 1, ('mark', 'cuban'): 1, ('cuban', 'ha'): 1, ('ha', 'advice'): 1, ('advice', 'people'): 1, ('people', 'new'): 1, ('new', 'investing'): 1, ('investing', 'cryptocurrencyas'): 1, ('cryptocurrencyas', 'investment'): 1, ('investment', 'think'): 1, ('think', 'ethereum'): 1, ('ethereum', 'ha'): 1, ('ha', 'upside'): 1, ('upside', 'told'): 1, ('told', 'cnbc'): 1, ('cnbc', 'make'): 1, ('make', 'wednesday'): 1, ('wednesday', 'bitcoin'): 1, ('bitcoin', 'added'): 1, ('added', 'better'): 1, ('better', 'char'): 1, ('char', 'ethereum'): 1, ('ethereum', 'bitcoin'): 1, ('b
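
Because the token lists are flattened before the n-grams are built, some bigrams span two unrelated articles, for example `('char', 'specifically')` and `('char', 'mark')` in the outputs above. A sketch of counting bigrams article by article instead, using `zip` from the standard library in place of `nltk.ngrams`. The helper names are hypothetical and the token lists are shortened versions of the first two Bitcoin articles.

```python
from collections import Counter

def bigrams(tokens):
    """Return adjacent token pairs, equivalent to n-grams with n=2."""
    return zip(tokens, tokens[1:])

def count_bigrams_per_article(token_lists):
    """Count bigrams article by article so no pair crosses an article boundary."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(bigrams(tokens))
    return counts

# Shortened token lists based on the first two Bitcoin articles printed earlier
articles = [
    ["similar", "hoax", "earlier", "year", "char"],
    ["specifically", "someone", "make", "massive", "purchase"],
]

per_article = count_bigrams_per_article(articles)
flattened = Counter(bigrams([t for tokens in articles for t in tokens]))

print(("char", "specifically") in flattened)    # pair that spans two articles
print(("char", "specifically") in per_article)
```

In the notebook, the per-article lists are still available in `btc_df["tokens"]` and `eth_df["tokens"]` after `btc_token` and `eth_token` have been flattened.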

In [23]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [24]:
# Use token_count to get the top 10 bigrams for Bitcoin
# (btc_grams counts bigrams; pass btc_token instead to rank single words)
token_count(btc_grams)

[(('reuters', 'bitcoin'), 5),
 (('cryptocurrency', 'bitcoin'), 4),
 (('illustration', 'taken'), 4),
 (('oct', 'reuters'), 4),
 (('bitcoin', 'seen'), 4),
 (('char', 'representation'), 3),
 (('representation', 'virtual'), 3),
 (('virtual', 'cryptocurrency'), 3),
 (('seen', 'picture'), 3),
 (('picture', 'illustration'), 3)]

In [25]:
# Use token_count to get the top 10 bigrams for Ethereum
# (eth_grams counts bigrams; pass eth_token instead to rank single words)
token_count(eth_grams)

[(('illustration', 'taken'), 4),
 (('taken', 'june'), 4),
 (('bitcoin', 'ethereum'), 3),
 (('ethereum', 'dogecoin'), 3),
 (('crosschain', 'bridge'), 2),
 (('cryptocurrency', 'exchange'), 2),
 (('char', 'representations'), 2),
 (('representations', 'cryptocurrencies'), 2),
 (('cryptocurrencies', 'bitcoin'), 2),
 (('dogecoin', 'ripple'), 2)]
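
The two listings above rank bigrams because `token_count` was given the bigram counters. The same helper ranks single words when it receives a flat token list, since `Counter` accepts either an iterable of items or an existing mapping of counts. A small sketch with a toy token list (not taken from the articles):

```python
from collections import Counter

def token_count(tokens, N=10):
    """Same helper as in the notebook: top N items by frequency."""
    return Counter(tokens).most_common(N)

# Toy token list for illustration only
tokens = ["bitcoin", "etf", "bitcoin", "price", "bitcoin", "etf"]
pairs = Counter(zip(tokens, tokens[1:]))

top_words = token_count(tokens, N=2)    # iterable of words: counts each word
top_bigrams = token_count(pairs, N=1)   # existing Counter: counts are preserved
print(top_words)
print(top_bigrams)
```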

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [26]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Note: Matplotlib 3.6+ renames this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
btc_cloud = " ".join(btc_token)
btc_wc = WordCloud().generate(btc_cloud)
plt.imshow(btc_wc)

<matplotlib.image.AxesImage at 0x1d463d84730>

In [None]:
# Generate the Ethereum word cloud
eth_cloud = " ".join(eth_token)
eth_wc = WordCloud().generate(eth_cloud)
plt.imshow(eth_wc)

<matplotlib.image.AxesImage at 0x15d6b374070>

In [None]:
# NOTE: KERNEL KEEPS CRASHING TRYING TO GENERATE WORDCLOUD PLOTS

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [27]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [28]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [30]:
# Concatenate all of the Bitcoin text together
# (btc_token is the flat token list built in the n-gram step)
btc_text_together = list(btc_token)

In [31]:
btc_clean_text = " ".join(btc_text_together)

In [32]:
# Run the NER processor on all of the text
btc_ner = nlp(btc_clean_text)
# Add a title to the document
btc_ner.user_data['title'] = 'BTC NER Analysis'

In [34]:
# Render the visualization
displacy.render(btc_ner, style='ent')

In [35]:
# List all Entities
for ent in btc_ner.ents:
    print(ent.text, ent.label_)

earlier year DATE
walmart litecoinif LOC
vox media ORG
char ORG
billion CARDINAL
bitcoin GPE
wednesday DATE
s CARDINAL
us GPE
bitcoin GPE
china GPE
last week DATE
wall street journal ORG
three CARDINAL
june DATE
reutersdado PERSON
october DATE
sulondon ORG
reuters bitcoin cusp ORG
char securities exchange commission ORG
kellythe us securities exchange commission ORG
four CARDINAL
october DATE
october DATE
reutersedgar PERSON
bitcoin GPE
wednesday DATE
october DATE
reutersedgar PERSON
oct CARDINAL
reuters ORG
reuters ORG
reuters bitcoin ORG
tuesday DATE
us exchange tra ORG
getty imagesthe PERSON
bitcoinlinked exchangetraded fund ORG
tuesday DATE
bitcoin GPE
bitcoin conventionmarco PERSON
imagesthe securities exchange commission ORG
etf bitcoin FAC
etf bitcoins PERSON
bitcoin GPE
first ORDINAL
etf ORG
bitcoin GPE
displayrafael henriquesopa PERSON
bitcoin wa PERSON
monday DATE
morning TIME
mexican NORP
volaris GPE
juarez international airport FAC
mexico city GPE
mexico GPE
january DATE
ja
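
A long entity listing is easier to assess once the labels are tallied. The sketch below hard-codes the first ten `(text, label)` pairs from the output above; in the notebook the list could instead be built with `[(ent.text, ent.label_) for ent in btc_ner.ents]`. Only the standard library is used.

```python
from collections import Counter

# First ten (text, label) pairs copied from the entity listing above
entities = [
    ("earlier year", "DATE"),
    ("walmart litecoinif", "LOC"),
    ("vox media", "ORG"),
    ("char", "ORG"),
    ("billion", "CARDINAL"),
    ("bitcoin", "GPE"),
    ("wednesday", "DATE"),
    ("s", "CARDINAL"),
    ("us", "GPE"),
    ("bitcoin", "GPE"),
]

# How often each label occurs, and which labels were assigned to "bitcoin"
label_counts = Counter(label for _, label in entities)
labels_for_bitcoin = {label for text, label in entities if text == "bitcoin"}

print(label_counts.most_common())
print(labels_for_bitcoin)
```

Labels such as `GPE` for `bitcoin` may partly reflect that the model was run on lowercased, punctuation-stripped tokens; running it on the raw article text might give different tags, though that is an assumption this notebook does not test.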

---

### Ethereum NER

In [43]:
# Concatenate all of the Ethereum text together
# (eth_token is the flat token list built in the n-gram step)
eth_text_together = list(eth_token)

In [44]:
# Join the tokens into a single string
eth_clean_text = " ".join(eth_text_together)
# Run the NER processor on all of the text
eth_ner = nlp(eth_clean_text)
# Add a title to the document
eth_ner.user_data['title'] = 'ETH NER Analysis'

In [45]:
# Render the visualization
displacy.render(eth_ner, style='ent')

In [46]:
# List all Entities
for ent in eth_ner.ents:
    print(ent.text, ent.label_)

futurenervostoday DATE
mark cuban PERSON
cnbc ORG
wednesday DATE
bitcoin PERSON
better char ethereum ORG
two CARDINAL
mansfield GPE
getty PERSON
bitcoin GPE
jpmorgan PERSON
beca char ORG
muskpicture alliance getty imagesa cryptocurrency ORG
elon musk ORG
moon PERSON
october DATE
mars ORG
november DATE
past week DATE
facebooks PERSON
hong kong GPE
september DATE
bitcoin ethereumnurphoto PERSON
getty imagesif PERSON
success firstever bitcoinfutures exchangetraded ORG
bitcoin GPE
june DATE
reutersdado ORG
burger king ORG
bitcoin GPE
rai ORG
raicrypto hedge PERSON
rai ORG
bitcoin GPE
friday DATE
second ORDINAL
new york GPE
reuters ORG
bitcoin GPE
far year DATE
thursday DATE
pm TIME
new york GPE
process starte char PERSON
quentin GPE
coming week DATE
june DATE
bitcoin GPE
june DATE
reutersdado ORG
june DATE
reutersedgar suillustrationnew york ORG


---