# Google Natural Language API Demo
#### Further Documentation:
https://cloud.google.com/natural-language 

## Introduction

The Natural Language API has several methods for performing analysis and annotation on your text. Each level of analysis provides valuable information for language understanding. These methods are listed below:

**Sentiment analysis** inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. This method returns the sentiment of the text as a whole as well as the sentiment of individual sentences within it. Sentiment analysis is performed through the analyzeSentiment method.

**Entity analysis** inspects the given text for known entities (proper nouns such as public figures and landmarks, and common nouns such as restaurant and stadium) and returns information about those entities. This includes a Wikipedia link (if applicable), the entity type, and the salience (a measure of the relevance of the entity to the entire text). Entity analysis is performed with the analyzeEntities method.

**Entity sentiment analysis** inspects the given text for known entities (proper nouns and common nouns), returns information about those entities, and identifies the prevailing emotional opinion of each entity within the text, especially to determine a writer's attitude toward the entity as positive, negative, or neutral. This is useful when a single sentence expresses different emotions about different entities; for example, "I liked the food but the service was terrible". Entity sentiment analysis is performed with the analyzeEntitySentiment method.

**Syntactic analysis** extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens. For each word in the text, the API tells you the word's part of speech (noun, verb, adjective, etc.) and how it relates to other words in the sentence. Syntactic Analysis is performed with the analyzeSyntax method.

**Content classification** analyzes text content and returns a content category for the content. Content classification is performed by using the classifyText method.

Each API call also detects and returns the language, if a language is not specified by the caller in the initial request. A full list of supported languages can be found here: https://cloud.google.com/natural-language/docs/languages

Additionally, if you wish to perform several natural language operations on given text using only one API call, the annotateText request can also be used to perform sentiment analysis and entity analysis.
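
As a rough sketch of what a combined request might look like, the snippet below builds a JSON body for the REST `documents:annotateText` endpoint using only the standard library. The helper name `build_annotate_request` is hypothetical, and the feature flag names reflect my reading of the REST reference, so check them against the current documentation before relying on them.

```python
import json


def build_annotate_request(text, language_code=None):
    """Build a request body for documents:annotateText (hypothetical helper)."""
    document = {"type": "PLAIN_TEXT", "content": text}
    if language_code:
        # Optional: name the language (e.g. "en") instead of relying on auto-detection
        document["language"] = language_code
    return {
        "document": document,
        # Each flag switches on one kind of analysis within the same call
        "features": {
            "extractDocumentSentiment": True,
            "extractEntities": True,
        },
        "encodingType": "UTF8",
    }


body = build_annotate_request("I liked the food but the service was terrible.")
print(json.dumps(body, indent=2))
```

Only the two analyses mentioned above are switched on here; further flags would go into the same `features` object.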


## The Natural Language API: Set Up And Examples

#### Steps to set up Google Natural Language API

**Enable the Cloud Natural Language API:**
Enable it from the Google Cloud Marketplace: https://console.cloud.google.com/marketplace/product/google/language.googleapis.com

**Set your project ID:**
export PROJECT_ID=$(gcloud config get-value core/project)

**Create a service account:**
gcloud iam service-accounts create my-natlang-sa --display-name "my natural language service account"
  
**Create a Google Application Credentials key file:**
gcloud iam service-accounts keys create ~/key.json --iam-account my-natlang-sa@${PROJECT_ID}.iam.gserviceaccount.com
  
**Set up your environment to use the service account key file:**
export GOOGLE_APPLICATION_CREDENTIALS="/home/USER/key.json"

**Install/upgrade Google Cloud Language:**
pip install --user --upgrade google-cloud-language
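
Before calling the API from a notebook, it can help to confirm that the key file from the steps above is actually visible to Python. The following is a minimal sketch using only the standard library; `check_credentials` is a hypothetical helper, not part of any Google package.

```python
import os


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the key file path if the variable is set and the file exists."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    return path


try:
    print("Using key file:", check_credentials())
except (RuntimeError, FileNotFoundError) as err:
    print("Credential problem:", err)
```

This only checks that the file is present; it does not verify that the service account has permission to call the API.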


In [1]:
pip install --user --upgrade google-cloud-language

Collecting google-cloud-language
  Downloading google_cloud_language-2.0.0-py2.py3-none-any.whl (149 kB)
Installing collected packages: google-cloud-language
Successfully installed google-cloud-language-2.0.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Import google-cloud-language
# Make sure that you have installed or upgraded to the latest google-cloud-language using pip
from google.cloud import language_v1 as language
import pandas as pd
# Show all rows and all columns when displaying a pandas DataFrame
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

#### Set up functions to call Google Natural Language API
Here are some examples of the API in action <br>
Sentiment Analysis:

In [3]:
# Code from Google at https://codelabs.developers.google.com/codelabs/cloud-natural-language-python3#7
# Probably would be better off changing all the functions to follow the Google standard ones from the codelab, and then making 
# small modifications to the rest of the code to make it all work together.

def analyze_text_sentiment(text):
    client = language.LanguageServiceClient()
    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)

    response = client.analyze_sentiment(document=document)

    sentiment = response.document_sentiment
    results = dict(
        text=text,
        score=f"{sentiment.score:.1%}",
        magnitude=f"{sentiment.magnitude:.1%}",
    )
    for k, v in results.items():
        print(f"{k:10}: {v}")
    
    # Get sentiment for all sentences in the document
    sentence_sentiment = []
    for sentence in response.sentences:
        item={}
        item["text"]=sentence.text.content
        item["sentiment score"]=sentence.sentiment.score
        item["sentiment magnitude"]=sentence.sentiment.magnitude
        sentence_sentiment.append(item)
    
    return sentence_sentiment

In [4]:
text = "Stocks are going down on the NASDAQ"
analyze_text_sentiment(text)

text      : Stocks are going down on the NASDAQ
score     : -70.0%
magnitude : 70.0%


[{'text': 'Stocks are going down on the NASDAQ',
  'sentiment score': -0.699999988079071,
  'sentiment magnitude': 0.699999988079071}]

Syntactic Analysis:

In [51]:
# Syntax Analysis
def gcp_analyze_syntax(text, debug=0):
    """
    Analyzing Syntax in a String

    Args:
      text: The text content to analyze.
      debug: If truthy, print details for each token.
    """

    client = language.LanguageServiceClient()
    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
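    # No encoding_type is passed, so the API returns begin_offset as -1;
    # pass encoding_type=language.EncodingType.UTF8 to get real offsets.
    response = client.analyze_syntax(document=document)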
    
    output = []   
    # Loop through tokens returned from the API
    for token in response.tokens:
        word = {}
        # Get the text content of this token. Usually a word or punctuation.
        text = token.text  

        # Get the part of speech information for this token.
        # Parts of speech are as defined in:
        # http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf
        part_of_speech = token.part_of_speech
        # Get the tag, e.g. NOUN, ADJ for Adjective, et al.
        
        # Get the dependency tree parse information for this token.
        # For more information on dependency labels:
        # http://www.aclweb.org/anthology/P13-2017
        dependency_edge = token.dependency_edge   
        
        word["word"]=text.content
        word["begin_offset"]=text.begin_offset        
        word["part_of_speech"]=language.PartOfSpeech.Tag(part_of_speech.tag).name
        
        # Get the voice, e.g. ACTIVE or PASSIVE
        word["Voice"]=language.PartOfSpeech.Voice(part_of_speech.voice).name
        word["Tense"]=language.PartOfSpeech.Tense(part_of_speech.tense).name
        
        # See API reference for additional Part of Speech information available
        # Get the lemma of the token. Wikipedia lemma description
        # https://en.wikipedia.org/wiki/Lemma_(morphology)        
        word["Lemma"]=token.lemma
        word["index"]=dependency_edge.head_token_index
        word["Label"]=language.DependencyEdge.Label(dependency_edge.label).name
        
        if debug:
            print(u"Token text: {}".format(text.content))
            print(
                u"Location of this token in overall document: {}".format(text.begin_offset)
            ) 
            print(
                u"Part of Speech tag: {}".format(
                    language.PartOfSpeech.Tag(part_of_speech.tag).name
                )
            )        

            print(u"Voice: {}".format(language.PartOfSpeech.Voice(part_of_speech.voice).name))
            # Get the tense, e.g. PAST, FUTURE, PRESENT, et al.
            print(u"Tense: {}".format(language.PartOfSpeech.Tense(part_of_speech.tense).name))

            print(u"Lemma: {}".format(token.lemma))

            print(u"Head token index: {}".format(dependency_edge.head_token_index))
            print(
                u"Label: {}".format(language.DependencyEdge.Label(dependency_edge.label).name)
            )
        
        output.append(word)
        

    # Get the language of the text, which will be the same as
    # the language specified in the request or, if not specified,
    # the automatically-detected language.
    if debug:
        print(u"Language of the text: {}".format(response.language))
    return (output)

In [97]:
gcp_analyze_syntax(text)

[{'word': 'Stocks',
  'begin_offset': -1,
  'part_of_speech': 'NOUN',
  'Voice': 'VOICE_UNKNOWN',
  'Tense': 'TENSE_UNKNOWN',
  'Lemma': 'stock',
  'index': 2,
  'Label': 'NSUBJ'},
 {'word': 'are',
  'begin_offset': -1,
  'part_of_speech': 'VERB',
  'Voice': 'VOICE_UNKNOWN',
  'Tense': 'PRESENT',
  'Lemma': 'be',
  'index': 2,
  'Label': 'AUX'},
 {'word': 'going',
  'begin_offset': -1,
  'part_of_speech': 'VERB',
  'Voice': 'VOICE_UNKNOWN',
  'Tense': 'PRESENT',
  'Lemma': 'go',
  'index': 2,
  'Label': 'ROOT'},
 {'word': 'down',
  'begin_offset': -1,
  'part_of_speech': 'ADV',
  'Voice': 'VOICE_UNKNOWN',
  'Tense': 'TENSE_UNKNOWN',
  'Lemma': 'down',
  'index': 2,
  'Label': 'ADVMOD'},
 {'word': 'on',
  'begin_offset': -1,
  'part_of_speech': 'ADP',
  'Voice': 'VOICE_UNKNOWN',
  'Tense': 'TENSE_UNKNOWN',
  'Lemma': 'on',
  'index': 2,
  'Label': 'PREP'},
 {'word': 'the',
  'begin_offset': -1,
  'part_of_speech': 'DET',
  'Voice': 'VOICE_UNKNOWN',
  'Tense': 'TENSE_UNKNOWN',
  'Lemma':

Entity Analysis:

In [53]:
# Entity Analysis
def gcp_analyze_entities(text, debug=0):
    """
    Analyzing Entities in a String

    Args:
      text: The text content to analyze.
      debug: If truthy, print details for each entity.
    """

    client = language.LanguageServiceClient()
    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
    response = client.analyze_entities(document=document)
    output = []   
    
    # Loop through entities returned from the API
    for entity in response.entities:
        item = {}
        item["name"]=entity.name
        item["type"]=language.Entity.Type(entity.type_).name
        item["Salience"]=entity.salience
        
        if debug:
            print(u"Representative name for the entity: {}".format(entity.name))

            # Get entity type, e.g. PERSON, LOCATION, ADDRESS, NUMBER, et al
            print(u"Entity type: {}".format(language.Entity.Type(entity.type_).name))

            # Get the salience score associated with the entity in the [0, 1.0] range
            print(u"Salience score: {}".format(entity.salience))

        # Loop over the metadata associated with entity. For many known entities,
        # the metadata is a Wikipedia URL (wikipedia_url) and Knowledge Graph MID (mid).
        # Some entity types may have additional metadata, e.g. ADDRESS entities
        # may have metadata for the address street_name, postal_code, et al.
        for metadata_name, metadata_value in entity.metadata.items():
            item[metadata_name]=metadata_value
            if debug:
                print(u"{}: {}".format(metadata_name, metadata_value))

        # Loop over the mentions of this entity in the input document.
        # The API currently supports proper noun mentions.
        if debug:
            for mention in entity.mentions:
                print(u"Mention text: {}".format(mention.text.content))
                # Get the mention type, e.g. PROPER for proper noun
                print(
                    u"Mention type: {}".format(language.EntityMention.Type(mention.type_).name)
                )
        output.append(item)
    
    # Get the language of the text, which will be the same as
    # the language specified in the request or, if not specified,
    # the automatically-detected language.
    if debug:
        print(u"Language of the text: {}".format(response.language))
    
    return(output)

In [98]:
gcp_analyze_entities(text)

[{'name': 'Stocks', 'type': 'OTHER', 'Salience': 0.8703455328941345},
 {'name': 'NASDAQ',
  'type': 'ORGANIZATION',
  'Salience': 0.12965445220470428,
  'wikipedia_url': 'https://en.wikipedia.org/wiki/Nasdaq',
  'mid': '/m/05dq_'}]
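
Since salience measures how relevant an entity is to the whole text, a common follow-up is to keep only the most salient entities. The sketch below works on the list of dicts returned by `gcp_analyze_entities`; the helper `top_entities` and its default threshold of 0.1 are illustrative assumptions, not part of the API.

```python
def top_entities(entities, min_salience=0.1, entity_type=None):
    """Filter entity dicts by salience (and optionally type), most salient first."""
    kept = [
        e for e in entities
        if e["Salience"] >= min_salience
        and (entity_type is None or e["type"] == entity_type)
    ]
    return sorted(kept, key=lambda e: e["Salience"], reverse=True)


# Sample rows copied from the output above
sample = [
    {"name": "Stocks", "type": "OTHER", "Salience": 0.8703455328941345},
    {"name": "NASDAQ", "type": "ORGANIZATION", "Salience": 0.12965445220470428,
     "wikipedia_url": "https://en.wikipedia.org/wiki/Nasdaq", "mid": "/m/05dq_"},
]
print([e["name"] for e in top_entities(sample, entity_type="ORGANIZATION")])
```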

Content Classification:

In [99]:
# Content Classification

def gcp_classify_text(text):
    client = language.LanguageServiceClient()
    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)

    response = client.classify_text(document=document)

    for category in response.categories:
        print("=" * 80)
        print(f"category  : {category.name}")
        print(f"confidence: {category.confidence:.0%}")

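Content classification needs a longer piece of text than the one-sentence example used so far; the documentation asks for at least 20 tokens.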

In [100]:
text="Although most people consider piranhas to be quite dangerous, they are, for the most part, entirely harmless. \n\
Piranhas rarely feed on large animals; they eat smaller fish and aquatic plants. When confronted with humans, piranhas’ \n\
first instinct is to flee, not attack. Their fear of humans makes sense. Far more piranhas are eaten by people than people \n\
are eaten by piranhas. If the fish are well-fed, they won’t bite humans."

gcp_classify_text(text)

category  : /Hobbies & Leisure
confidence: 81%
category  : /Pets & Animals/Wildlife
confidence: 66%
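
Category names are slash-separated paths, so a result such as `/Pets & Animals/Wildlife` can be split into its levels. The sketch below shows one way to reduce a set of results to distinct top-level categories; both helper names and the 0.5 confidence cutoff are assumptions made for illustration.

```python
def split_category(name):
    """Split a category path such as '/Pets & Animals/Wildlife' into its levels."""
    return [part for part in name.split("/") if part]


def top_level_categories(categories, min_confidence=0.5):
    """Return distinct top-level categories at or above a confidence threshold."""
    seen = []
    for name, confidence in categories:
        top = split_category(name)[0]
        if confidence >= min_confidence and top not in seen:
            seen.append(top)
    return seen


# (name, confidence) pairs taken from the output above
results = [("/Hobbies & Leisure", 0.81), ("/Pets & Animals/Wildlife", 0.66)]
print(top_level_categories(results))
```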


## Demo 1 - Process a single news article

In [80]:
text="Deutsche Bank to Move ‘Heart’ of IT Systems Into Google’s Cloud  \n \
Steven Arons and Nico Grant  \n \
December 4, 2020  \n \
(Bloomberg) -- Deutsche Bank AG expects to ultimately replace large parts of its core banking system with alternatives powered by Alphabet Inc.’s Google, as the German lender embarks on its biggest effort yet to modernize computer systems that have hampered it for years.\n \
The two companies on Friday finalized a cloud computing agreement under which the German lender plans to shift most of its data onto Google servers, technology head Bernd Leukert said in a phone interview. Both firms also agreed to jointly develop products including new lending offerings and retail apps.\n \
The deal will include “applications at the heart of our IT,” Leukert said in an interview, adding that only Deutsche Bank will have the key to decrypt data it transfers to the cloud.\n \
Cutting costs through automation and better technology are a centerpiece of the turnaround plan unveiled by Chief Executive Officer Christian Sewing last year. For Google, the deal is a step toward breaking into the growing cloud business with European banks. The business is currently dominated by Microsoft Corp. and Amazon.com Inc., a Bloomberg News analysis earlier this year showed.\n \
Exactly how much of its systems Deutsche Bank will move into Google’s cloud will depend on “legal, regulatory and data privacy considerations,” Leukert said.\n \
The two companies expect to sell some technology that they develop together to other financial services providers as white-label products and split the revenue, he said.\n \
The contract is set to last at least 10 years and Deutsche Bank expects to make a cumulative return on investment of 1 billion euros ($1.2 billion) through the alliance, Bloomberg News previously reported.\n \
Alphabet last month unveiled an expansion of Google Pay, partnering with banks and retailers in the U.S. to offer consumers new forms of bank accounts, cards and discounts. The upgraded payment system marks the tech giant’s deepest foray yet into the U.S. financial system.  \n \
For more articles like this, please visit us at bloomberg.com \n \
©2020 Bloomberg L.P.©2020 Bloomberg L.P."

#### Analyze Syntax
Syntactic Analysis breaks up the given text into a series of sentences and tokens and provides linguistic information about those tokens

In [58]:
text_syntax=gcp_analyze_syntax(text)
df_syntax = pd.DataFrame(text_syntax)
df_syntax

| | word | begin_offset | part_of_speech | Voice | Tense | Lemma | index | Label |
|---|---|---|---|---|---|---|---|---|
| 0 | Deutsche | -1 | NOUN | VOICE_UNKNOWN | TENSE_UNKNOWN | Deutsche | 1 | NN |
| 1 | Bank | -1 | NOUN | VOICE_UNKNOWN | TENSE_UNKNOWN | Bank | 1 | ROOT |
| 2 | to | -1 | PRT | VOICE_UNKNOWN | TENSE_UNKNOWN | to | 3 | AUX |
| 3 | Move | -1 | VERB | VOICE_UNKNOWN | TENSE_UNKNOWN | Move | 1 | VMOD |
| 4 | ‘ | -1 | PUNCT | VOICE_UNKNOWN | TENSE_UNKNOWN | ‘ | 5 | P |
| 5 | Heart | -1 | NOUN | VOICE_UNKNOWN | TENSE_UNKNOWN | Heart | 3 | DOBJ |
| 6 | ’ | -1 | PUNCT | VOICE_UNKNOWN | TENSE_UNKNOWN | ’ | 5 | P |
| 7 | of | -1 | ADP | VOICE_UNKNOWN | TENSE_UNKNOWN | of | 5 | PREP |
| 8 | IT | -1 | NOUN | VOICE_UNKNOWN | TENSE_UNKNOWN | IT | 9 | NN |
| 9 | Systems | -1 | NOUN | VOICE_UNKNOWN | TENSE_UNKNOWN | system | 7 | POBJ |
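
The `index` column holds each token's head token index, that is, the position of the word it depends on, and the root token points at itself. A small sketch of turning those indices into readable word pairs follows; `dependency_pairs` is a hypothetical helper, and it assumes the token list covers the whole document so that every index is in range.

```python
def dependency_pairs(tokens):
    """Pair each token with its dependency label and the word it depends on."""
    pairs = []
    for tok in tokens:
        head_word = tokens[tok["index"]]["word"]
        pairs.append((tok["word"], tok["Label"], head_word))
    return pairs


# First four rows of the syntax output for the article headline
tokens = [
    {"word": "Deutsche", "index": 1, "Label": "NN"},
    {"word": "Bank", "index": 1, "Label": "ROOT"},
    {"word": "to", "index": 3, "Label": "AUX"},
    {"word": "Move", "index": 1, "Label": "VMOD"},
]
for word, label, head in dependency_pairs(tokens):
    print(f"{word:10} --{label}--> {head}")
```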


#### Analyze Entities
Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities.

In [59]:
entities=gcp_analyze_entities(text)
df_entities = pd.DataFrame(entities)
df_entities

| | name | type | Salience | mid | wikipedia_url | month | day | year | currency | value |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Deutsche Bank | ORGANIZATION | 0.112376 | /m/02lc8s | https://en.wikipedia.org/wiki/Deutsche_Bank | | | | | |
| 1 | Google | ORGANIZATION | 0.111964 | /m/045c7b | https://en.wikipedia.org/wiki/Google | | | | | |
| 2 | lender | ORGANIZATION | 0.111934 | | | | | | | |
| 3 | core banking system | OTHER | 0.04961 | | | | | | | |
| 4 | IT Systems | OTHER | 0.04961 | | | | | | | |
| 5 | computer systems | OTHER | 0.047956 | | | | | | | |
| 6 | Heart | OTHER | 0.038004 | | | | | | | |
| 7 | parts | LOCATION | 0.03344 | | | | | | | |
| 8 | alternatives | OTHER | 0.03344 | | | | | | | |
| 9 | Bloomberg | ORGANIZATION | 0.028145 | /m/027sm6 | https://en.wikipedia.org/wiki/Bloomberg_L.P. | | | | | |


#### Classify Documents
The Google Natural Language API classifies documents into these major categories: <br>
Adult

Arts & Entertainment

Autos & Vehicles

Beauty & Fitness

Books & Literature

Business & Industrial

Computers & Electronics

Finance

Food & Drink

Games

Health

Hobbies & Leisure

Home & Garden

Internet & Telecom

Jobs & Education

Law & Government

News

Online Communities

People & Society

Pets & Animals

Real Estate

Reference

Science

Sensitive Subjects

Shopping

Sports

Travel

A full list of categories and subcategories can be found here: https://cloud.google.com/natural-language/docs/categories

#### Analyze Sentiment
Interpreting Google Sentiment Analysis Values:

Sentiment Score - a number from -1.0 to 1.0 indicating how positive or negative the statement is.

Sentiment Magnitude - a number ranging from 0 to infinity that represents the weight of sentiment expressed in the statement, regardless of being positive or negative. This value is often proportional to the length of the document.
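
Score and magnitude are easiest to read together: a score near zero with low magnitude suggests little emotion, while a score near zero with high magnitude suggests positive and negative statements cancelling out. The sketch below turns the pair into a coarse label; the function name and both cutoffs (0.25 and 1.0) are illustrative assumptions that should be tuned for the documents at hand.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, mixed_magnitude=1.0):
    """Map (score, magnitude) to a coarse label. Cutoffs are illustrative assumptions."""
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: low magnitude reads as neutral,
    # high magnitude reads as mixed (strong feelings in both directions).
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


print(describe_sentiment(-0.7, 0.7))
print(describe_sentiment(0.0, 4.0))
print(describe_sentiment(0.1, 0.0))
```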

In [81]:
# gcp_analyze_sentiment is not defined in this notebook, so use analyze_text_sentiment instead
sentence_sentiment = analyze_text_sentiment(text)

text      : Deutsche Bank to Move ‘Heart’ of IT Systems Into Google’s Cloud  
 Steven Arons and Nico Grant  
 December 4, 2020  
 (Bloomberg) -- Deutsche Bank AG expects to ultimately replace large parts of its core banking system with alternatives powered by Alphabet Inc.’s Google, as the German lender embarks on its biggest effort yet to modernize computer systems that have hampered it for years.
 The two companies on Friday finalized a cloud computing agreement under which the German lender plans to shift most of its data onto Google servers, technology head Bernd Leukert said in a phone interview. Both firms also agreed to jointly develop products including new lending offerings and retail apps.
 The deal will include “applications at the heart of our IT,” Leukert said in an interview, adding that only Deutsche Bank will have the key to decrypt data it transfers to the cloud.
 Cutting costs through automation and better technology are a centerpiece of the turnaround plan unveiled b

In [82]:
df_sentiment = pd.DataFrame(sentence_sentiment)
df_sentiment

| | text | sentiment score | sentiment magnitude |
|---|---|---|---|
| 0 | Deutsche Bank to Move ‘Heart’ of IT Systems In... | -0.1 | 0.1 |
| 1 | The two companies on Friday finalized a cloud ... | 0.0 | 0.0 |
| 2 | Both firms also agreed to jointly develop prod... | 0.3 | 0.3 |
| 3 | The deal will include “applications at the hea... | 0.0 | 0.0 |
| 4 | Cutting costs through automation and better te... | 0.3 | 0.3 |
| 5 | For Google, the deal is a step toward breaking... | -0.1 | 0.1 |
| 6 | The business is currently dominated by Microso... | 0.0 | 0.0 |
| 7 | Exactly how much of its systems Deutsche Bank ... | -0.1 | 0.1 |
| 8 | The two companies expect to sell some technolo... | 0.0 | 0.0 |
| 9 | The contract is set to last at least 10 years ... | -0.1 | 0.1 |


## Demo 2 - Process sample news articles from Refinitiv

In [30]:
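# Reading a gs:// path with pandas relies on the gcsfs package,
# and the active service account needs read access to the bucket.
from google.cloud import storage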

#news_sample="github/gcp/FinancialServicesHeadline100.csv" 
news_sample="gs://ml-core-shared-standard-bucket/data/FinancialServicesHeadline100.csv"
df = pd.read_csv(news_sample)
print(df.shape)
df.head()

OSError: Forbidden: https://www.googleapis.com/storage/v1/b/ml-core-shared-standard-bucket/o/
231657102360-compute@developer.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket.

In [None]:
text=df["headline"]
print("size of document:", text.shape)
text.head()

In [None]:
# Combine all news into one document
text_all= df["headline"].to_string(index=False)
#print(text_all)

#### Analyze Syntax
Syntactic Analysis breaks up the given text into a series of sentences and tokens and provides linguistic information about those tokens

In [None]:
# Process each headline as a separate document

# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
frames = [pd.DataFrame(gcp_analyze_syntax(text)) for text in df["headline"]]
df_text_syntax = pd.concat(frames)


In [None]:
print("size of output:", df_text_syntax.shape)
df_text_syntax.head(50)

#### Analyze Entities
Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities.

In [None]:
# Process each headline independently

# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
frames = [pd.DataFrame(gcp_analyze_entities(text)) for text in df["headline"]]
df_entities = pd.concat(frames, ignore_index=True)
# entities=gcp_analyze_entities(text_all)
# df_entities2 = pd.DataFrame(entities)

In [None]:
print("size of output:", df_entities.shape)
df_entities.head(50)
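
When looping over many headlines, a single failed API call would otherwise stop the whole run. One possible pattern, sketched below, is to collect results and failures separately. `analyze_each` is a hypothetical helper, and `fake_analyzer` is a stand-in used so the sketch runs without API access; in the notebook you would pass `gcp_analyze_entities` or `gcp_analyze_syntax` instead.

```python
def analyze_each(texts, analyze_fn):
    """Apply analyze_fn to each text, collecting rows and recording failures."""
    rows, failures = [], []
    for position, text in enumerate(texts):
        try:
            for item in analyze_fn(text):
                # Tag each row with the position of the document it came from
                rows.append({"doc": position, **item})
        except Exception as err:  # keep going; inspect failures afterwards
            failures.append((position, str(err)))
    return rows, failures


def fake_analyzer(text):
    """Stand-in for an API call: one row per word, fails on empty input."""
    if not text:
        raise ValueError("empty document")
    return [{"name": word} for word in text.split()]


rows, failures = analyze_each(["Deutsche Bank", "", "Google"], fake_analyzer)
print(len(rows), failures)
```

The resulting `rows` list can be passed straight to `pd.DataFrame`, and the `doc` column keeps track of which headline each row belongs to.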

#### Classify Documents
The Google Natural Language API classifies documents into the same major categories listed under Classify Documents in Demo 1. A full list of categories and subcategories can be found here:
https://cloud.google.com/natural-language/docs/categories

In [26]:
## Overall document classification
gcp_classify_text(text_all)

NameError: name 'text_all' is not defined

In [27]:
# Process each headline independently

# gcp_analyze_sentiment is not defined in this notebook, so call the API directly
# and keep the document-level score and magnitude for each headline.
client = language.LanguageServiceClient()
rows = []
for text in df["headline"]:
    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
    sentiment = client.analyze_sentiment(document=document).document_sentiment
    rows.append({"text": text,
                 "sentiment score": sentiment.score,
                 "sentiment magnitude": sentiment.magnitude})
df_sentiment = pd.DataFrame(rows, columns=["text", "sentiment score", "sentiment magnitude"])

NameError: name 'df' is not defined

In [None]:
df_sentiment.head(100)

In [None]:
# Plot Sentiment Scores
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import colors
from matplotlib.ticker import PercentFormatter
plt.rcParams.update({'figure.figsize':(16,8)})

x = df_sentiment["sentiment score"]
y =  df_sentiment["sentiment magnitude"]

# Plot score against magnitude (one point per headline)
sns.scatterplot(data=df_sentiment, x="sentiment score", y="sentiment magnitude")
                
n_bins=30

#plt.hist(x, bins=n_bins)
#plt.show()

fig, axs = plt.subplots(1, 2, sharey=True, tight_layout=True)
# We can set the number of bins with the `bins` kwarg
axs[0].set_xlabel("Sentiment Score")
axs[0].set_ylabel("Count")
axs[0].set_title('Histogram of Sentiment Score')
axs[1].set_xlabel("Sentiment Magnitude")
axs[1].set_title('Histogram of Sentiment Magnitude')

axs[0].hist(x, bins=n_bins)
axs[1].hist(y, bins=n_bins)
plt.show()


fig, ax = plt.subplots(tight_layout=True)
hist = ax.hist2d(x, y, norm=colors.LogNorm())
plt.title("Sentiment Score and Magnitude 2-D Distribution")
ax.set_xlabel("Sentiment Score")
ax.set_ylabel("Sentiment Magnitude")

plt.show()

In [None]:
# df_sentiment_all is never defined in this notebook; show df_sentiment instead
df_sentiment.head(10)