# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?
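Answering these questions comes down to comparing the `mean` and `max` rows of each coin's `describe()` output (in pandas, for example `bitcoin_df["positive"].mean()` or `bitcoin_df.describe().loc["max", "positive"]`). The sketch below makes the same comparison in plain Python on hypothetical score lists. It reports ties explicitly, because two coins can share the same maximum score.

```python
from statistics import mean

# Hypothetical per-article VADER "pos" scores for each coin (illustrative values only,
# not the scores fetched in this notebook)
positive_scores = {
    "bitcoin": [0.0, 0.113, 0.246, 0.0],
    "ethereum": [0.084, 0.066, 0.0, 0.246],
}

# Mean and maximum positive score per coin, mirroring the "mean" and "max"
# rows of DataFrame.describe()
summary = {
    coin: {"mean": mean(scores), "max": max(scores)}
    for coin, scores in positive_scores.items()
}

# Coin with the highest mean positive score
best_mean = max(summary, key=lambda coin: summary[coin]["mean"])

# All coins that share the highest single positive score (ties are possible)
top_max = max(s["max"] for s in summary.values())
best_max = [coin for coin, s in summary.items() if s["max"] == top_max]

print(summary)
print(best_mean, best_max)
```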

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from newsapi.newsapi_client import NewsApiClient

# Download the VADER lexicon (a no-op if it is already up to date) and create the analyzer
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\yasir\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [20]:
# Load environment variables from a .env file, then read the API key
load_dotenv()
api_key = os.getenv("NEWS_API_KEY")

In [23]:
type(api_key)


str

In [24]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [25]:
# Fetch the Bitcoin news articles
bitcoin_news_articles = newsapi.get_everything(q="bitcoin", language="en")
type(bitcoin_news_articles)

dict

In [26]:
# Print total articles
print(f"Total articles about bitcoin: {bitcoin_news_articles['totalResults']}")

# Show sample article (a bare expression in the middle of a cell is not displayed, so print it)
print(bitcoin_news_articles["articles"][2])

# Collect the non-empty article contents
bitcoin_contents = [
    article["content"]
    for article in bitcoin_news_articles["articles"]
    if article["content"]
]

print(bitcoin_contents)

Total articles about bitcoin: 7883
['One of the strictest crackdowns worldwide\r\nPhoto by Michele Doying / The Verge\r\nIndia is reportedly moving forward with a sweeping ban on cryptocurrencies. According to Reuters, the countrys legislat… [+1656 chars]', 'The hacker behind last years big Twitter hack\r\n has just been sentenced to hard time.\r\nGraham Ivan Clark, the teenage hacker who broke\r\n into Twitters systems, took over verified accounts, and used t… [+2552 chars]', 'Some things are best left a mystery at least as far as Coinbase is concerned.\xa0\r\nThe San Francisco-based cryptocurrency exchange has been preparing to go public since last year, and in a Thursday pros… [+1953 chars]', 'TL;DR: Enter the The Complete Bitcoin (BTC) Investment Giveaway for a chance to win over $12,000 in cryptocurrency-related prizes.\r\nThe Bitcoin Investment Giveaway includes everything you need to get… [+1641 chars]', 'A proposed law in India would make it a crime to mine, trade, or even hold

In [27]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_sentiments = []

for article in bitcoin_news_articles["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
        sentiment = analyzer.polarity_scores(text)

        bitcoin_sentiments.append({
            "text": text,
            "date": date,
            "compound": sentiment["compound"],
            "positive": sentiment["pos"],
            "negative": sentiment["neg"],
            "neutral": sentiment["neu"],
        })

    except AttributeError:
        # Skip articles whose content cannot be scored (e.g. content is None)
        pass

# Create DataFrame
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

# Reorder DataFrame columns (the "date" column is dropped here)
cols = ["compound", "negative", "neutral", "positive", "text"]
bitcoin_df = bitcoin_df[cols]

bitcoin_df.shape

(19, 5)
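The loop above also keeps articles whose `content` is an empty string. VADER scores those as 0.0 in every column (row 3 in the `head()` output below), which distorts the descriptive statistics; for example, it drags the neutral mean down. A minimal sketch of a row builder that skips both `None` and empty content, assuming a pluggable scorer that returns the same dict shape as `analyzer.polarity_scores`. The names `score_articles` and `dummy_score` are hypothetical, introduced only for this illustration.

```python
from typing import Callable, Dict, List, Optional

# A scorer maps text to a dict with 'compound', 'pos', 'neu' and 'neg' keys,
# the same shape SentimentIntensityAnalyzer.polarity_scores returns.
Scorer = Callable[[str], Dict[str, float]]


def score_articles(articles: List[dict], score: Scorer) -> List[dict]:
    """Build one row per article, skipping articles with missing or empty content."""
    rows = []
    for article in articles:
        text: Optional[str] = article.get("content")
        if not text:  # covers both None and ""
            continue
        s = score(text)
        rows.append({
            "text": text,
            "date": article.get("publishedAt", "")[:10],
            "compound": s["compound"],
            "positive": s["pos"],
            "negative": s["neg"],
            "neutral": s["neu"],
        })
    return rows


# Hypothetical stand-in scorer for demonstration only
def dummy_score(text: str) -> Dict[str, float]:
    return {"compound": 0.5, "pos": 0.2, "neu": 0.8, "neg": 0.0}


sample_articles = [
    {"content": "Bitcoin rallies", "publishedAt": "2021-03-16T18:30:28Z"},
    {"content": "", "publishedAt": "2021-03-16T10:00:00Z"},
    {"content": None, "publishedAt": "2021-03-15T09:00:00Z"},
]

rows = score_articles(sample_articles, dummy_score)
print(len(rows), rows[0]["date"])
```

In the notebook, the equivalent call would be `pd.DataFrame(score_articles(bitcoin_news_articles["articles"], analyzer.polarity_scores))`.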

In [29]:
bitcoin_df.head()

|   | compound | negative | neutral | positive | text |
|---|---|---|---|---|---|
| 0 | -0.5574 | 0.11 | 0.89 | 0.0 | One of the strictest crackdowns worldwide\r\nP... |
| 1 | -0.5106 | 0.142 | 0.858 | 0.0 | The hacker behind last years big Twitter hack\... |
| 2 | 0.6369 | 0.0 | 0.887 | 0.113 | Some things are best left a mystery at least a... |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | 0.8316 | 0.0 | 0.754 | 0.246 | TL;DR: Enter the The Complete Bitcoin (BTC) In... |


In [30]:
bitcoin_df.describe()

|   | compound | negative | neutral | positive |
|---|---|---|---|---|
| count | 19.0 | 19.0 | 19.0 | 19.0 |
| mean | 0.038342 | 0.043895 | 0.846684 | 0.056842 |
| std | 0.464811 | 0.087203 | 0.233882 | 0.077159 |
| min | -0.9062 | 0.0 | 0.0 | 0.0 |
| 25% | -0.125 | 0.0 | 0.8105 | 0.0 |
| 50% | 0.0 | 0.0 | 0.89 | 0.0 |
| 75% | 0.27155 | 0.0475 | 1.0 | 0.087 |
| max | 0.8316 | 0.326 | 1.0 | 0.246 |


In [31]:
# Fetch the Ethereum news articles
ethereum_news_articles = newsapi.get_everything(q="ethereum", language="en")
type(ethereum_news_articles)

dict

In [38]:
# Print total articles
print(f"Total articles about ethereum: {ethereum_news_articles['totalResults']}")

# Show sample article
ethereum_news_articles["articles"][2]

Total articles about ethereum: 2032


{'source': {'id': 'mashable', 'name': 'Mashable'},
 'author': "Danica D'Souza",
 'title': "A beginner's guide to NFTs, the crypto potentially worth millions",
 'description': "Here's everything you need to know about non-fungible tokens, the latest cryptocurrency craze.\nRead the full story here. (And learn even more about NFTs here.)\xa0 Read more...More about Mashable Video, Blockchain, Ethereum, Cryptocurrency, and Nft",
 'url': 'https://mashable.com/video/what-is-an-nft-explainer/',
 'urlToImage': 'https://mondrian.mashable.com/2021%252F03%252F16%252Fe1%252F03d4c959891a4bc4b6f9115746360a34.94b78.png%252F1200x630.png?signature=-pFRPpZyBXt2XETGtgHXvPcePz4=',
 'publishedAt': '2021-03-16T18:30:28Z',
 'content': "Here's everything you need to know about non-fungible tokens, the latest cryptocurrency craze.\r\nRead the full story here. (And learn even more about NFTs here.)"}

In [39]:
# Collect the non-empty Ethereum article contents
ethereum_contents = [
    article["content"]
    for article in ethereum_news_articles["articles"]
    if article["content"]
]

print(ethereum_contents)

In [40]:
# Create the Ethereum sentiment scores DataFrame
ethereum_sentiments = []

for article in ethereum_news_articles["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
        sentiment = analyzer.polarity_scores(text)

        ethereum_sentiments.append({
            "text": text,
            "date": date,
            "compound": sentiment["compound"],
            "positive": sentiment["pos"],
            "negative": sentiment["neg"],
            "neutral": sentiment["neu"],
        })

    except AttributeError:
        # Skip articles whose content cannot be scored (e.g. content is None)
        pass

# Create DataFrame
ethereum_df = pd.DataFrame(ethereum_sentiments)

# Reorder DataFrame columns (the "date" column is dropped here)
cols = ["compound", "negative", "neutral", "positive", "text"]
ethereum_df = ethereum_df[cols]

ethereum_df.shape

(20, 5)

In [41]:
ethereum_df.head()

|   | compound | negative | neutral | positive | text |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | Famed auction house Christies just sold its fi... |
| 1 | -0.5574 | 0.11 | 0.89 | 0.0 | One of the strictest crackdowns worldwide\r\nP... |
| 2 | -0.1531 | 0.062 | 0.938 | 0.0 | Here's everything you need to know about non-f... |
| 3 | 0.4767 | 0.0 | 0.916 | 0.084 | OpenSea has been one of a handful of NFT marke... |
| 4 | -0.4588 | 0.145 | 0.789 | 0.066 | NFTs are the latest cryptocurrency rage these ... |


In [42]:
# Describe the Ethereum Sentiment
ethereum_df.describe()

|   | compound | negative | neutral | positive |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.164405 | 0.03075 | 0.89905 | 0.07025 |
| std | 0.40755 | 0.046713 | 0.076916 | 0.071886 |
| min | -0.5574 | 0.0 | 0.754 | 0.0 |
| 25% | -0.038275 | 0.0 | 0.856 | 0.0 |
| 50% | 0.1609 | 0.0 | 0.9165 | 0.072 |
| 75% | 0.44915 | 0.0645 | 0.93825 | 0.09025 |
| max | 0.8316 | 0.145 | 1.0 | 0.246 |


### Questions:

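Q: Which coin had the highest mean positive score?

A: Ethereum, with a mean positive score of 0.07025 versus 0.056842 for Bitcoin.

Q: Which coin had the highest compound score?

A: It is a tie: both coins reach the same maximum compound score of 0.8316. Ethereum has the higher mean compound score (0.164405 versus 0.038342 for Bitcoin).

Q: Which coin had the highest positive score?

A: It is a tie: both coins reach the same maximum positive score of 0.246.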

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.

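One possible way to cover the three steps, sketched with only the standard library. It assumes a small hand-picked stopword set (`STOPWORDS`, hypothetical) standing in for NLTK's `stopwords.words("english")`, and it skips lemmatization, so it is a simplification of the tokenizer this section asks for rather than a replacement. `simple_tokenize` is likewise a hypothetical name.

```python
import string

# Hypothetical, deliberately tiny stopword set for illustration; the notebook
# itself would use nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for", "with"}

# Translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def simple_tokenize(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stopwords."""
    # 1. Lowercase each word
    text = text.lower()
    # 2. Remove punctuation
    text = text.translate(PUNCT_TABLE)
    # Split on any whitespace (including the \r\n found in NewsAPI content)
    words = text.split()
    # 3. Remove stopwords
    return [w for w in words if w not in STOPWORDS]


print(simple_tokenize("India is reportedly moving forward with a ban on cryptocurrencies."))
```

Note that NewsAPI truncation markers such as `[+1656 chars]` survive this process as the tokens `1656` and `chars`; adding such tokens to the stopword set keeps them out of the frequency counts.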
In [34]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Instantiate the lemmatizer
# YOUR CODE HERE!

# Create a list of stopwords
# YOUR CODE HERE!

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [12]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text

   
    # Create a tokenized list of the words
    
    
    # Lemmatize words into root words

   
    # Convert the words to lowercase
    
    
    # Remove the stop words
    
    
    return tokens

In [13]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [14]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

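A sketch of both steps on hypothetical token lists. Bigrams are built with `zip`, which yields the same tuples as `nltk.ngrams(tokens, 2)`; counting per article avoids creating pairs that span the end of one article and the start of the next. Word frequencies come from `Counter.most_common`, the same call the `token_count` helper wraps. The `bigrams` function and `articles_tokens` data are hypothetical.

```python
from collections import Counter

# Hypothetical token lists, standing in for the per-article output of a tokenizer
articles_tokens = [
    ["india", "crypto", "ban", "bitcoin", "price"],
    ["bitcoin", "price", "rise", "crypto", "ban"],
]


def bigrams(tokens):
    """Return consecutive token pairs, the same tuples nltk.ngrams(tokens, 2) yields."""
    return list(zip(tokens, tokens[1:]))


# Count bigrams article by article so no pair spans two different articles
bigram_counts = Counter()
for tokens in articles_tokens:
    bigram_counts.update(bigrams(tokens))

# Flatten all tokens to count single words
word_counts = Counter(token for tokens in articles_tokens for token in tokens)

print(bigram_counts.most_common(2))
print(word_counts.most_common(3))
```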
In [3]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Function token_count returns the N most common tokens (default: the top 10 words)
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count."""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

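The `wordcloud` package accepts either raw text (`WordCloud.generate`) or a precomputed word-to-count mapping (`WordCloud.generate_from_frequencies`). A sketch preparing both inputs from a hypothetical token list; the rendering calls are left as comments because they need `wordcloud` and `matplotlib` installed.

```python
from collections import Counter

# Hypothetical token list for one coin, e.g. all of its per-article tokens flattened
tokens = ["bitcoin", "price", "bitcoin", "india", "ban", "bitcoin", "price"]

# Option 1: a single space-joined string, which WordCloud.generate() tokenizes itself
text = " ".join(tokens)

# Option 2: a precomputed word -> count mapping for WordCloud.generate_from_frequencies(),
# so the cloud reflects exactly the tokens your own tokenizer produced
freqs = dict(Counter(tokens))

print(text)
print(freqs)

# With the wordcloud package installed, rendering might look like:
#   from wordcloud import WordCloud
#   import matplotlib.pyplot as plt
#   wc = WordCloud(background_color="white").generate_from_frequencies(freqs)
#   plt.imshow(wc, interpolation="bilinear")
#   plt.axis("off")
```

Option 2 keeps the cloud consistent with the top-10 word counts, since both are built from the same tokens.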
In [4]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will run spaCy's pretrained named entity recognizer over the Bitcoin and Ethereum text, then visualize the tags with displaCy.

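Once the text is processed with `doc = nlp(text)`, the recognized entities are available as `doc.ents`, each with `.text` and `.label_`. `displacy.render(doc, style="ent")` draws them, and displaCy picks up an optional heading from `doc.user_data["title"]`. Listing the entities is then plain Python. A sketch that groups hypothetical `(text, label)` pairs (illustrative only, not actual model output) by label:

```python
from collections import defaultdict

# Hypothetical (entity text, label) pairs, in the shape produced by
# [(ent.text, ent.label_) for ent in doc.ents] after doc = nlp(text)
entities = [
    ("India", "GPE"),
    ("Reuters", "ORG"),
    ("Coinbase", "ORG"),
    ("India", "GPE"),
    ("Thursday", "DATE"),
]

# Group unique entity strings under each label, preserving first-seen order
by_label = defaultdict(list)
for text, label in entities:
    if text not in by_label[label]:
        by_label[label].append(text)

for label, texts in by_label.items():
    print(f"{label}: {', '.join(texts)}")
```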
In [5]:
import spacy
from spacy import displacy

In [25]:
# Download the English language model for spaCy (only needed once)
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---