# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
from datetime import datetime, timedelta
from pathlib import Path

import nltk
import pandas as pd
from dotenv import load_dotenv
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon and create the sentiment analyzer
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/tyesondemets/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv("NEWS_API_KEY")
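
`os.getenv` returns `None` when the variable is missing, and the problem may then only surface at the first API request. A small guard can make it visible earlier. This is a minimal sketch: `require_env` is a hypothetical helper and `EXAMPLE_API_KEY` is a placeholder variable name, neither of which is part of the original notebook.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder variable set in-process for illustration only
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))  # placeholder-value
```

In the notebook itself the call would presumably be `require_env("NEWS_API_KEY")` after `load_dotenv()`.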

In [3]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
btc_news = newsapi.get_everything(q='bitcoin', language='en')

In [5]:
#Display articles
btc_news

{'status': 'ok',
 'totalResults': 7876,
 'articles': [{'source': {'id': 'wired', 'name': 'Wired'},
   'author': 'Gian M. Volpicelli',
   'title': 'As Kazakhstan Descends into Chaos, Crypto Miners Are at a Loss',
   'description': 'The central Asian country became No. 2 in the world for Bitcoin mining. But political turmoil and power cuts have hit hard, and the future looks bleak.',
   'url': 'https://www.wired.com/story/kazakhstan-cryptocurrency-mining-unrest-energy/',
   'urlToImage': 'https://media.wired.com/photos/61de2d453e654a13e9a16ef0/191:100/w_1280,c_limit/Business_Kazakhstan-2HDE52K.jpg',
   'publishedAt': '2022-01-12T12:00:00Z',
   'content': 'When Denis Rusinovich set up cryptocurrency mining company Maveric Group in Kazakhstan in 2017, he thought he had hit the jackpot. Next door to China and Russia, the country had everything a Bitcoin … [+4140 chars]'},
  {'source': {'id': 'the-verge', 'name': 'The Verge'},
   'author': 'Mitchell Clark',
   'title': 'The International Mon

In [20]:
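# Fetch the Ethereum news articles
# Note: the Python expression `'ethereum' or 'eth'` evaluates to 'ethereum',
# so only that term was ever sent as the query
eth_news = newsapi.get_everything(q='ethereum', language='en')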
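
A note on Python's `or`: between two non-empty strings it returns the first one, so it cannot express "either term" in a search query. The sketch below illustrates this. `build_query` is a hypothetical helper, not part of the `newsapi` client. It joins terms with the `OR` keyword that News API describes for its `q` parameter.

```python
# Python's `or` returns its first truthy operand, so the second term is dropped.
query = 'ethereum' or 'eth'
print(query)  # ethereum


def build_query(*terms):
    """Hypothetical helper: join search terms with the OR keyword."""
    return ' OR '.join(terms)


combined = build_query('ethereum', 'eth')
print(combined)  # ethereum OR eth
```

Passing the combined string as `q` would broaden the search, so the article counts and results would differ from the output shown in this notebook.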

In [21]:
#Display articles
eth_news

{'status': 'ok',
 'totalResults': 3783,
 'articles': [{'source': {'id': 'the-verge', 'name': 'The Verge'},
   'author': 'Corin Faife',
   'title': 'Crypto.com admits over $30 million stolen by hackers',
   'description': 'Cryptocurrency exchange Crypto.com has said that $15 million in ethereum and $18 million in bitcoin were stolen by hackers in a security breach',
   'url': 'https://www.theverge.com/2022/1/20/22892958/crypto-com-exchange-hack-bitcoin-ethereum-security',
   'urlToImage': 'https://cdn.vox-cdn.com/thumbor/mde_l3lUC4muDPEFG7LYrUz0O3g=/0x146:2040x1214/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/8921023/acastro_bitcoin_2.jpg',
   'publishedAt': '2022-01-20T13:23:31Z',
   'content': 'In a new blog post the company said that 4,836 ETH and 443 bitcoin were taken\r\nIllustration by Alex Castro / The Verge\r\nIn a blog post published in the early hours of Thursday morning, cryptocurrency… [+2004 chars]'},
  {'source': {'id': None, 'name': 'Gizmodo.com'},
   'author

In [22]:
def create_df(news):
    articles = []
    for article in news:
        try:
            title = article["title"]
            description = article["description"]
            text = article["content"]
            date = article["publishedAt"]

            articles.append({
                "title": title,
                "description": description,
                "text": text,
                "date": date,
            })
        except KeyError:
            # Skip articles that are missing one of the expected fields
            continue

    return pd.DataFrame(articles)

In [23]:
btc_df = create_df(btc_news['articles'])
eth_df = create_df(eth_news['articles'])

In [24]:
btc_df.head()

| | title | description | text | date |
|---|---|---|---|---|
| 0 | As Kazakhstan Descends into Chaos, Crypto Mine... | The central Asian country became No. 2 in the ... | When Denis Rusinovich set up cryptocurrency mi... | 2022-01-12T12:00:00Z |
| 1 | The International Monetary Fund tells El Salva... | The International Monetary Fund’s executive di... | El Salvador introduced Bitcoin as a legal tend... | 2022-01-25T22:11:14Z |
| 2 | Jack Dorsey’s Block is working to make Bitcoin... | Block is working on building an “open Bitcoin ... | Were officially building an open Bitcoin minin... | 2022-01-14T13:46:28Z |
| 3 | DeepDotWeb operator sentenced to eight years f... | The operator of DeepDotWeb, a site that indexe... | Israeli national Tal Prihar pled guilty to rou... | 2022-01-27T18:16:57Z |
| 4 | Crypto.com admits over $30 million stolen by h... | Cryptocurrency exchange Crypto.com has said th... | In a new blog post the company said that 4,836... | 2022-01-20T13:23:31Z |


In [25]:
eth_df.head()

| | title | description | text | date |
|---|---|---|---|---|
| 0 | Crypto.com admits over $30 million stolen by h... | Cryptocurrency exchange Crypto.com has said th... | In a new blog post the company said that 4,836... | 2022-01-20T13:23:31Z |
| 1 | Hackers Launder $15 Million Stolen From Crypto... | Hackers who made off with roughly $15 million ... | Hackers who made off with roughly $15 million ... | 2022-01-19T12:00:00Z |
| 2 | Eric Adams Is Taking His First Paycheck in Crypto | Mr. Adams, who wants New York City to become t... | On some level, the new mayor is simply employi... | 2022-01-20T19:54:48Z |
| 3 | Robinhood opens cryptocurrency wallet to beta ... | Back in September\r\n, Robinhood announced pla... | Back in September\r\n, Robinhood announced pla... | 2022-01-21T22:57:21Z |
| 4 | Crypto.com Finally Acknowledges $34 Million St... | Trading platform Crypto.com lost about $34 mil... | Trading platform Crypto.com lost about $34 mil... | 2022-01-20T12:00:00Z |


In [13]:
eth_df.shape

(20, 4)

In [None]:
#Save to CSV
# file_path = Path("Resources/btc_headlines.csv")
# btc_df.to_csv(file_path, index=False, encoding='utf-8-sig')

In [None]:
#Save to CSV
# file_path = Path("Resources/eth_headlines.csv")
# eth_df.to_csv(file_path, index=False, encoding='utf-8-sig')

In [None]:
# Sentiment score function
# def get_sentiment(score):
#     """
#     Calculates the sentiment based on the compound score.
#     """
#     result = 0  # Neutral by default
#     if score >= 0.05:  # Positive
#         result = 1
#     elif score <= -0.05:  # Negative
#         result = -1

#     return result
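
The commented-out helper above maps a VADER compound score to a discrete label using the ±0.05 cutoffs. A runnable sketch of the same idea follows. The sample scores are illustrative values chosen to exercise each branch.

```python
def get_sentiment(score):
    """Map a compound score to -1 (negative), 0 (neutral), or 1 (positive)."""
    if score >= 0.05:
        return 1
    if score <= -0.05:
        return -1
    return 0


# Illustrative compound scores covering neutral, positive, and negative cases
compound_scores = [0.0, 0.3182, -0.4404, -0.3182]
labels = [get_sentiment(score) for score in compound_scores]
print(labels)  # [0, 1, -1, -1]
```

Applied to a DataFrame, the same function could be passed to `Series.apply` on the `compound` column.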

In [26]:
# Create the Bitcoin sentiment scores DataFrame
btc_sentiments = []

for article in btc_news['articles']:
    try:
        text = article["content"]
        date = article["publishedAt"]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btc_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
    except AttributeError:
        pass
# Create DataFrame
btc_sentiments_df = pd.DataFrame(btc_sentiments)

#Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
btc_sentiments_df = btc_sentiments_df[cols]

btc_sentiments_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-01-12T12:00:00Z | When Denis Rusinovich set up cryptocurrency mi... | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 2022-01-25T22:11:14Z | El Salvador introduced Bitcoin as a legal tend... | 0.3182 | 0.105 | 0.0 | 0.895 |
| 2 | 2022-01-14T13:46:28Z | Were officially building an open Bitcoin minin... | -0.4404 | 0.0 | 0.083 | 0.917 |
| 3 | 2022-01-27T18:16:57Z | Israeli national Tal Prihar pled guilty to rou... | -0.3182 | 0.045 | 0.084 | 0.871 |
| 4 | 2022-01-20T13:23:31Z | In a new blog post the company said that 4,836... | 0.0 | 0.0 | 0.0 | 1.0 |


In [28]:
# Create the Ethereum sentiment scores DataFrame
eth_sentiments = []

for article in eth_news['articles']:
    try:
        text = article["content"]
        date = article["publishedAt"]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
    except AttributeError:
        pass
# Create DataFrame
eth_sentiments_df = pd.DataFrame(eth_sentiments)

#Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
eth_sentiments_df = eth_sentiments_df[cols]

eth_sentiments_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-01-20T13:23:31Z | In a new blog post the company said that 4,836... | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 2022-01-19T12:00:00Z | Hackers who made off with roughly $15 million ... | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 2022-01-20T19:54:48Z | On some level, the new mayor is simply employi... | 0.1779 | 0.052 | 0.0 | 0.948 |
| 3 | 2022-01-21T22:57:21Z | Back in September\r\n, Robinhood announced pla... | 0.0772 | 0.038 | 0.0 | 0.962 |
| 4 | 2022-01-20T12:00:00Z | Trading platform Crypto.com lost about $34 mil... | -0.1027 | 0.056 | 0.067 | 0.877 |


In [29]:
# Describe the Bitcoin Sentiment
btc_sentiments_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.05056 | 0.06255 | 0.0379 | 0.89955 |
| std | 0.376771 | 0.061657 | 0.043444 | 0.063682 |
| min | -0.4404 | 0.0 | 0.0 | 0.765 |
| 25% | -0.33155 | 0.0 | 0.0 | 0.86625 |
| 50% | 0.0386 | 0.054 | 0.0 | 0.9145 |
| 75% | 0.32895 | 0.10675 | 0.08325 | 0.934 |
| max | 0.6808 | 0.185 | 0.101 | 1.0 |


In [30]:
# Describe the Ethereum Sentiment
eth_sentiments_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.155615 | 0.0652 | 0.019 | 0.9158 |
| std | 0.322773 | 0.063741 | 0.044849 | 0.079075 |
| min | -0.6808 | 0.0 | 0.0 | 0.775 |
| 25% | 0.0 | 0.0 | 0.0 | 0.8755 |
| 50% | 0.08995 | 0.0515 | 0.0 | 0.927 |
| 75% | 0.4068 | 0.106 | 0.0 | 1.0 |
| max | 0.7579 | 0.217 | 0.174 | 1.0 |


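### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum (mean positive score of 0.0652, versus 0.06255 for Bitcoin)

Q: Which coin had the highest compound score?

A: Ethereum (maximum compound score of 0.7579, versus 0.6808 for Bitcoin). Ethereum also had the highest negative score (0.174, versus 0.101 for Bitcoin).

Q: Which coin had the highest positive score?

A: Ethereum (maximum positive score of 0.217, versus 0.185 for Bitcoin)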

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.
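
The three steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the notebook's tokenizer: the tiny `STOPWORDS` set stands in for NLTK's much longer English list, `str.split` stands in for `word_tokenize`, and the sample sentence is made up.

```python
import re

# Tiny illustrative stopword set; NLTK's English list is far longer
STOPWORDS = {"the", "a", "an", "is", "in", "to", "and", "of", "has", "its"}


def simple_tokenize(text):
    """Lowercase, strip non-letter characters, split on whitespace, drop stopwords."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z ]", "", lowered)
    return [word for word in letters_only.split() if word not in STOPWORDS]


tokens = simple_tokenize("Bitcoin, the largest digital asset, has extended its rally!")
print(tokens)  # ['bitcoin', 'largest', 'digital', 'asset', 'extended', 'rally']
```

Deleting punctuation outright, rather than replacing it with a space, fuses hyphenated words: `meme-based` becomes `memebased`.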

In [31]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [33]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
sw = set(stopwords.words('english'))

# Expand the default stopwords list if necessary
sw

{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r

In [None]:
# # Complete the tokenizer function
# def tokenizer(text):
#     """Tokenizes text."""
    
#     # Remove the punctuation from text

   
#     # Create a tokenized list of the words
    
    
#     # Lemmatize words into root words

   
#     # Convert the words to lowercase
    
    
#     # Remove the stop words
    
    
#     return tokens

In [56]:
def clean_text(article):
    """Return lowercase word tokens with punctuation and stopwords removed."""
    # Combine the NLTK English stopwords with custom additions
    sw_addons = {'said', 'today', 'week'}
    sw_all = set(stopwords.words('english')).union(sw_addons)
    # Keep only letters and spaces
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', article)
    # Apply `word_tokenize` to the regex-scrubbed text
    re_words = word_tokenize(re_clean)
    # Lemmatization is left disabled here; `lemmatizer.lemmatize` expects a
    # single word, not the whole list
    # lemmatizer.lemmatize(re_words)
    # Create a list of lowercase words that are not in the stopword set
    output = [word.lower() for word in re_words if word.lower() not in sw_all]
    # Return the final list
    return output

In [57]:
btc_str = str(btc_sentiments_df['text'])
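
Calling `str()` on a Series returns its printed representation rather than the article text: index labels, rows truncated with `...`, and a `Name: text, dtype: object` footer. That is the likely source of tokens such as `dtype`, `object`, and `minin` in the cleaned Bitcoin set. One alternative is to join the full strings. In this sketch the two sample rows are made up for illustration.

```python
import pandas as pd

texts = pd.Series(
    [
        "Bitcoin is accepted at a growing number of shops around the world, according to the report.",
        "Ethereum developers announced another test network upgrade this week.",
    ],
    name="text",
)

as_repr = str(texts)       # printed representation of the Series
as_text = " ".join(texts)  # the actual article strings, joined by spaces

print("dtype" in as_repr)  # True
print("dtype" in as_text)  # False
```

Swapping `" ".join(btc_sentiments_df['text'])` into the notebook would change the token set shown in the next cell's output.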

In [58]:
btc_cleaned = clean_text(btc_str)
set(btc_cleaned)

{'accept',
 'accepted',
 'announced',
 'asset',
 'back',
 'ban',
 'bank',
 'become',
 'beginning',
 'bitcoin',
 'bitcointhemed',
 'block',
 'blog',
 'british',
 'building',
 'caption',
 'central',
 'ceo',
 'company',
 'convention',
 'cryptocom',
 'cryptocurrency',
 'denis',
 'digital',
 'dorsey',
 'dtype',
 'el',
 'employi',
 'entrepreneur',
 'european',
 'events',
 'everywhere',
 'extended',
 'feb',
 'financier',
 'founder',
 'guilty',
 'image',
 'inc',
 'introduced',
 'israeli',
 'jack',
 'largest',
 'legal',
 'level',
 'lost',
 'mayor',
 'memebased',
 'mi',
 'mil',
 'minin',
 'national',
 'new',
 'object',
 'ode',
 'officially',
 'one',
 'onstage',
 'open',
 'pla',
 'platform',
 'pled',
 'post',
 'prihar',
 'proposed',
 'reasname',
 'regulators',
 'reuters',
 'ric',
 'richard',
 'robinhood',
 'rou',
 'rusinovich',
 'russias',
 'salvador',
 'septemberrn',
 'set',
 'simply',
 'staffrnjan',
 'superspreader',
 'tal',
 'tend',
 'tesla',
 'text',
 'thursday',
 'top',
 'trading',
 'tslao',

In [None]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [43]:
str(btc_sentiments_df['text'])

"0     When Denis Rusinovich set up cryptocurrency mi...\n1     El Salvador introduced Bitcoin as a legal tend...\n2     Were officially building an open Bitcoin minin...\n3     Israeli national Tal Prihar pled guilty to rou...\n4     In a new blog post the company said that 4,836...\n5     Bitcoin, the largest digital asset, extended i...\n6     Block founder Jack Dorsey has announced on Twi...\n7     Tesla Inc (TSLA.O) will accept the meme-based ...\n8     On some level, the new mayor is simply employi...\n9     British entrepreneur and financier Richard ODe...\n10    Russia's central bank on Thursday proposed ban...\n11    Image caption, Bitcoin is accepted everywhere ...\n12    A cryptocurrency CEO has become one of the ric...\n13    By Reuters Staff\\r\\nJan 26 (Reuters) - The U.S...\n14    Jack Dorsey onstage at a bitcoin convention in...\n15    Between Bitcoin-themed superspreader events an...\n16    Back in September\\r\\n, Robinhood announced pla...\n17    Trading platform Cry

In [None]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### N-grams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2.
2. List the top 10 words for each coin.
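
Both steps reduce to counting. A minimal sketch using only the standard library follows. The token list is made up, and `zip(tokens, tokens[1:])` is used here as a stand-in that yields the same consecutive pairs as `nltk.ngrams(tokens, 2)`.

```python
from collections import Counter

# Made-up token list for illustration
tokens = ["bitcoin", "mining", "company", "bitcoin", "mining", "ban", "bitcoin"]

# Pair each token with its successor to form bigrams
bigrams = list(zip(tokens, tokens[1:]))
bigram_counts = Counter(bigrams)
word_counts = Counter(tokens)

print(bigram_counts.most_common(1))  # [(('bitcoin', 'mining'), 2)]
print(word_counts.most_common(2))    # [('bitcoin', 3), ('mining', 2)]
```

With real data, `tokens` would be the cleaned token list for one coin, and `most_common(10)` would give the top 10.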

In [None]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
from wordcloud import WordCloud
import matplotlib as mpl
import matplotlib.pyplot as plt
# Note: Matplotlib 3.6+ names this style 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will run named entity recognition on the Bitcoin and Ethereum text, then visualize the tags using spaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---