# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk as nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\wanda\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv('news_api')

In [3]:
# Create a newsapi client
from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
btc_news = newsapi.get_everything(
    q='bitcoin',
    language='en',
    sort_by='relevancy'
)

In [5]:
# Fetch the Ethereum news articles
eth_news = newsapi.get_everything(
    q='ethereum',
    language='en',
    sort_by='relevancy'
)

In [11]:
# Create the Bitcoin sentiment scores DataFrame
sentiment = []

for articles in btc_news["articles"]:
    try:
        text = articles['content']
        results = analyzer.polarity_scores(text)
        compound = results['compound']
        pos = results['pos']
        neu = results['neu']
        neg = results['neg']
        
        sentiment.append({
            'Compound': compound,
            'Positive': pos,
            'Negative': neg,
            'Neutral': neu,
            'text': text
        })
    except AttributeError:
        pass
    
btc = pd.DataFrame(sentiment)
btc

Unnamed: 0,Compound,Positive,Negative,Neutral,text
0,-0.34,0.0,0.061,0.939,When my wife started a little garden in our ur...
1,0.6908,0.178,0.0,0.822,"Like Dogecoin devotees, the mayor of Reno, and..."
2,0.4019,0.08,0.0,0.92,Photo by Joe Raedle/Getty Images\r\n\n \n\n Tw...
3,-0.886,0.0,0.271,0.729,"By Joe TidyCyber reporter \r\n""Follow the mone..."
4,0.624,0.127,0.0,0.873,To get a roundup of TechCrunchs biggest and mo...
5,0.2732,0.097,0.0,0.903,"As longtime TechCrunch readers know well, Mich..."
6,0.5719,0.139,0.0,0.861,"After the bell today, Coinbase reported anothe..."
7,0.128,0.089,0.075,0.836,"SINGAPORE, July 28 (Reuters) - Bitcoin broke a..."
8,0.0,0.0,0.0,1.0,T-Mobile is grappling with yet another reporte...
9,0.0,0.0,0.0,1.0,Representations of cryptocurrency Bitcoin are ...


In [12]:
# Create the Ethereum sentiment scores DataFrame
sentiment = []

for articles in eth_news["articles"]:
    try:
        text = articles['content']
        results = analyzer.polarity_scores(text)
        compound = results['compound']
        pos = results['pos']
        neu = results['neu']
        neg = results['neg']
        
        sentiment.append({
            'Compound': compound,
            'Positive': pos,
            'Negative': neg,
            'Neutral': neu,
            'text': text
        })
    except AttributeError:
        pass
    
eth = pd.DataFrame(sentiment)
eth

Unnamed: 0,Compound,Positive,Negative,Neutral,text
0,0.3612,0.075,0.0,0.925,There are many blockchain platforms competing ...
1,-0.2411,0.0,0.061,0.939,Blockchain infrastructure startups are heating...
2,0.6956,0.19,0.0,0.81,Cent was founded in 2017 as an ad-free creator...
3,0.5719,0.139,0.0,0.861,"After the bell today, Coinbase reported anothe..."
4,0.0,0.0,0.0,1.0,Representation of the Ethereum virtual currenc...
5,0.0,0.0,0.0,1.0,"HONG KONG, Aug 5 (Reuters) - Ether held near t..."
6,0.0,0.0,0.0,1.0,Representations of cryptocurrencies Bitcoin an...
7,0.34,0.105,0.0,0.895,Cryptocurrencies spiked Monday after Amazon li...
8,-0.3612,0.05,0.094,0.855,"By Mary-Ann RussonBusiness reporter, BBC News\..."
9,0.6369,0.157,0.0,0.843,"""Anthony Di Iorio, a co-founder of the Ethereu..."


In [13]:
# Describe the Bitcoin Sentiment
btc.describe()

Unnamed: 0,Compound,Positive,Negative,Neutral
count,20.0,20.0,20.0,20.0
mean,0.063935,0.0487,0.0332,0.9181
std,0.410191,0.060552,0.066017,0.076228
min,-0.886,0.0,0.0,0.729
25%,-0.074,0.0,0.0,0.8685
50%,0.0,0.0,0.0,0.9185
75%,0.416125,0.091,0.0645,1.0
max,0.6908,0.178,0.271,1.0


In [14]:
# Describe the Ethereum Sentiment
eth.describe()

Unnamed: 0,Compound,Positive,Negative,Neutral
count,20.0,20.0,20.0,20.0
mean,0.1194,0.05235,0.0176,0.93
std,0.332101,0.064716,0.032365,0.064286
min,-0.4404,0.0,0.0,0.81
25%,0.0,0.0,0.0,0.879
50%,0.0,0.024,0.0,0.9335
75%,0.3453,0.0825,0.0115,1.0
max,0.6956,0.19,0.094,1.0


### Questions:

Q: Which coin had the highest mean positive score?

A: ethereum

Q: Which coin had the highest compound score?

A: ethereum

Q. Which coin had the highest positive score?

A: ethereum

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [15]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [29]:
# Instantiate the lemmatizer
wnl = WordNetLemmatizer()

# Create a list of stopwords
stop = stopwords.words('english')

# Expand the default stopwords list if necessary
stop.append("'s")
stop.append('it')
stop.append("n't")
stop = set(stop)

In [30]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
  
    # Create a tokenized list of the words
    words = word_tokenize(text)
   
    # Convert the words to lowercase
    words = list(filter(lambda w: w.lower(), words))
    
    # Remove the stop words
    words = list(filter(lambda t: t.lower() not in stop, words))
    
    # Remove the punctuation from text
    words = list(filter(lambda t: t not in punctuation, words))
                 
    # Lemmatize words into root words
    tokens = [wnl.lemmatize(word) for word in words]
    
    return tokens

In [31]:
# Create a new tokens column for Bitcoin
btc['tokens'] = btc.text.apply(tokenizer)
btc

Unnamed: 0,Compound,Positive,Negative,Neutral,text,tokens
0,-0.34,0.0,0.061,0.939,When my wife started a little garden in our ur...,"[wife, started, little, garden, urban, backyar..."
1,0.6908,0.178,0.0,0.822,"Like Dogecoin devotees, the mayor of Reno, and...","[Like, Dogecoin, devotee, mayor, Reno, leader,..."
2,0.4019,0.08,0.0,0.92,Photo by Joe Raedle/Getty Images\r\n\n \n\n Tw...,"[Photo, Joe, Raedle/Getty, Images, Twitter, Sq..."
3,-0.886,0.0,0.271,0.729,"By Joe TidyCyber reporter \r\n""Follow the mone...","[Joe, TidyCyber, reporter, '', Follow, money, ..."
4,0.624,0.127,0.0,0.873,To get a roundup of TechCrunchs biggest and mo...,"[get, roundup, TechCrunchs, biggest, important..."
5,0.2732,0.097,0.0,0.903,"As longtime TechCrunch readers know well, Mich...","[longtime, TechCrunch, reader, know, well, Mic..."
6,0.5719,0.139,0.0,0.861,"After the bell today, Coinbase reported anothe...","[bell, today, Coinbase, reported, another, per..."
7,0.128,0.089,0.075,0.836,"SINGAPORE, July 28 (Reuters) - Bitcoin broke a...","[SINGAPORE, July, 28, Reuters, Bitcoin, broke,..."
8,0.0,0.0,0.0,1.0,T-Mobile is grappling with yet another reporte...,"[T-Mobile, grappling, yet, another, reported, ..."
9,0.0,0.0,0.0,1.0,Representations of cryptocurrency Bitcoin are ...,"[Representations, cryptocurrency, Bitcoin, see..."


In [32]:
# Create a new tokens column for Ethereum
eth['tokens'] = eth.text.apply(tokenizer)
eth

Unnamed: 0,Compound,Positive,Negative,Neutral,text,tokens
0,0.3612,0.075,0.0,0.925,There are many blockchain platforms competing ...,"[many, blockchain, platform, competing, invest..."
1,-0.2411,0.0,0.061,0.939,Blockchain infrastructure startups are heating...,"[Blockchain, infrastructure, startup, heating,..."
2,0.6956,0.19,0.0,0.81,Cent was founded in 2017 as an ad-free creator...,"[Cent, founded, 2017, ad-free, creator, networ..."
3,0.5719,0.139,0.0,0.861,"After the bell today, Coinbase reported anothe...","[bell, today, Coinbase, reported, another, per..."
4,0.0,0.0,0.0,1.0,Representation of the Ethereum virtual currenc...,"[Representation, Ethereum, virtual, currency, ..."
5,0.0,0.0,0.0,1.0,"HONG KONG, Aug 5 (Reuters) - Ether held near t...","[HONG, KONG, Aug, 5, Reuters, Ether, held, nea..."
6,0.0,0.0,0.0,1.0,Representations of cryptocurrencies Bitcoin an...,"[Representations, cryptocurrencies, Bitcoin, E..."
7,0.34,0.105,0.0,0.895,Cryptocurrencies spiked Monday after Amazon li...,"[Cryptocurrencies, spiked, Monday, Amazon, lis..."
8,-0.3612,0.05,0.094,0.855,"By Mary-Ann RussonBusiness reporter, BBC News\...","[Mary-Ann, RussonBusiness, reporter, BBC, News..."
9,0.6369,0.157,0.0,0.843,"""Anthony Di Iorio, a co-founder of the Ethereu...","[``, Anthony, Di, Iorio, co-founder, Ethereum,..."


---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using SpaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---