# Google Natural Language API test

## 主要測試任務
利用Google nlp API進行文本情感分析（`analyze_sentiment`）、實體辨識（`analyze_entities`）、句構分析（`analyze_syntax`）與文本分類（`classify_text`）測試 

## 測試前準備
1. 啟用Google Cloud Natural Language API （啟用教學可參考[此篇](https://segmentfault.com/a/1190000014216330)）
2. Python >=3.6
3. Google Cloud Natural Language API client Python library -> google-cloud-language

## 測試步驟
1. Set Up Authentication
2. Test APIs


## Reference 
皆參考[官方文件](https://cloud.google.com/natural-language/docs/reference/rpc/google.cloud.language.v1)與相關sample code，但官方文件可能許久未維護，有些範例程式並不可用，因此需稍加修改才能使用

---

## 1. Set Up Authentication
[reference](https://stackoverflow.com/questions/44328277/how-to-auth-to-google-cloud-using-service-account-in-python)

In [1]:
# !pip install --upgrade google-cloud
# !pip install --upgrade google-cloud-language

import os
from google.cloud import language_v1

# setting the credentials locally 
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'cred.json'

# Instantiates a client
client = language_v1.LanguageServiceClient()

## 2. Test APIs
> 官方的[試用api頁面](https://cloud.google.com/natural-language/)其實也不錯

In [2]:
# 測試用文字
text = "普通的版型沒有腰身，價錢便宜穿起來舒適！"

### 2.1 client.analyze_sentiment -> 做情感分析

In [3]:
# about sentiment score
# https://cloud.google.com/natural-language/docs/reference/rpc/google.cloud.language.v1#google.cloud.language.v1.Sentiment

def get_sentiment_score(content):

    document = language_v1.Document(
        content=content, type_=language_v1.Document.Type.PLAIN_TEXT
    )

    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    
    score = sentiment.score
    
    return score

sentiment = get_sentiment_score(text)

print("Text: {}".format(text))
print("Sentiment Score: {}".format(sentiment))

Text: 普通的版型沒有腰身，價錢便宜穿起來舒適！
Sentiment Score: 0.8999999761581421


### 2.2 client.analyze_entities -> 做實體辨識

In [4]:
# about entities
# https://cloud.google.com/natural-language/docs/reference/rpc/google.cloud.language.v1#google.cloud.language.v1.Entity

def get_entities(content):

    document = language_v1.Document(
        content=content, type_=language_v1.Document.Type.PLAIN_TEXT
    )

    # Detects the entities of the text
    response = client.analyze_entities(
        request={"document": document}
    )
    
    print("Text: {}".format(content))
    print()
    for entity in response.entities:
        print(u"Representative name for the entity: {}".format(entity.name))

        # Get entity type, e.g. PERSON, LOCATION, ADDRESS, NUMBER, et al
        print(u"Entity type: {}".format(language_v1.Entity.Type(entity.type_).name))

        # Get the salience score associated with the entity in the [0, 1.0] range
        # salience score 某字對文本的重要性
        print(u"Salience score: {}".format(entity.salience))

        # Loop over the metadata associated with entity. For many known entities,
        # the metadata is a Wikipedia URL (wikipedia_url) and Knowledge Graph MID (mid).
        # Some entity types may have additional metadata, e.g. ADDRESS entities
        # may have metadata for the address street_name, postal_code, et al.
        for metadata_name, metadata_value in entity.metadata.items():
            print(u"{}: {}".format(metadata_name, metadata_value))

        # Loop over the mentions of this entity in the input document.
        # The API currently supports proper noun mentions.
        for mention in entity.mentions:
            print(u"Mention text: {}".format(mention.text.content))

            # Get the mention type, e.g. PROPER for proper noun
            print(
                u"Mention type: {}".format(language_v1.EntityMention.Type(mention.type_).name)
            )
        print()
    return response

entities = get_entities(text)


Text: 普通的版型沒有腰身，價錢便宜穿起來舒適！

Representative name for the entity: 版型
Entity type: OTHER
Salience score: 0.38793444633483887
Mention text: 版型
Mention type: COMMON

Representative name for the entity: 價錢
Entity type: OTHER
Salience score: 0.3216562867164612
Mention text: 價錢
Mention type: COMMON

Representative name for the entity: 腰身
Entity type: OTHER
Salience score: 0.29040926694869995
Mention text: 腰身
Mention type: COMMON



### 2.3 client.analyze_syntax -> 做語句結構分析

In [5]:
# about syntax
# https://cloud.google.com/natural-language/docs/reference/rpc/google.cloud.language.v1#google.cloud.language.v1.Entity

def get_syntax(content):

    document = language_v1.Document(
        content=content, type_=language_v1.Document.Type.PLAIN_TEXT
    )

    # Detects the syntax of the text
    response = client.analyze_syntax(
        request={"document": document}
    )
    
    print("Text: {}".format(content))
    print()
    
    tmp = []
    for token in response.tokens:
        # Get the text content of this token. Usually a word or punctuation.
        text = token.text
        print(u"Token text: {}".format(text.content))
        print(
            u"Location of this token in overall document: {}".format(text.begin_offset)
        )
        # Get the part of speech information for this token.
        # Part of speech is defined in:
        # http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf
        part_of_speech = token.part_of_speech
        # Get the tag, e.g. NOUN, ADJ for Adjective, et al.
        print(
            u"Part of Speech tag: {}".format(
                language_v1.PartOfSpeech.Tag(part_of_speech.tag).name
            )
        )
        # Get the voice, e.g. ACTIVE or PASSIVE
        print(u"Voice: {}".format(language_v1.PartOfSpeech.Voice(part_of_speech.voice).name))
        # Get the tense, e.g. PAST, FUTURE, PRESENT, et al.
        print(u"Tense: {}".format(language_v1.PartOfSpeech.Tense(part_of_speech.tense).name))
        # See API reference for additional Part of Speech information available
        # Get the lemma of the token. Wikipedia lemma description
        # https://en.wikipedia.org/wiki/Lemma_(morphology)
        print(u"Lemma: {}".format(token.lemma))
        # Get the dependency tree parse information for this token.
        # For more information on dependency labels:
        # http://www.aclweb.org/anthology/P13-2017
        dependency_edge = token.dependency_edge
        print(u"Head token index: {}".format(dependency_edge.head_token_index))
        print(
            u"Label: {}".format(language_v1.DependencyEdge.Label(dependency_edge.label).name)
        )

        print()
        tmp.append([text.content, language_v1.PartOfSpeech.Tag(part_of_speech.tag).name,
                    dependency_edge.head_token_index,
                    language_v1.DependencyEdge.Label(dependency_edge.label).name])
    return tmp

syntax = get_syntax(text)

Text: 普通的版型沒有腰身，價錢便宜穿起來舒適！

Token text: 普通
Location of this token in overall document: -1
Part of Speech tag: ADJ
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: 普通
Head token index: 2
Label: AMOD

Token text: 的
Location of this token in overall document: -1
Part of Speech tag: PRT
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: 的
Head token index: 0
Label: RCMODREL

Token text: 版型
Location of this token in overall document: -1
Part of Speech tag: NOUN
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: 版型
Head token index: 3
Label: NSUBJ

Token text: 沒有
Location of this token in overall document: -1
Part of Speech tag: VERB
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: 沒有
Head token index: 8
Label: DEP

Token text: 腰身
Location of this token in overall document: -1
Part of Speech tag: NOUN
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: 腰身
Head token index: 3
Label: DOBJ

Token text: ，
Location of this token in overall document: -1
Part of Speech tag: PUNCT
Voice: VOICE_UNKNOWN
Tens

In [6]:
# !pip install --upgrade spacy
import spacy
from spacy import displacy

tokens = {
    "words": [
    ],
    "arcs": [
    ]
}
for i, item in enumerate(syntax):
    tokens["words"].append({"text": item[0], "tag": item[1]})
    
displacy.render(tokens, style='dep', jupyter=True,
                manual=True, options={'distance': 80})

### 2.4 client.classify_text -> 做文本分類
> 但不支援中文（The language zh-Hant is not supported for classify_text analysis.）

In [7]:
# about classify_text
# https://cloud.google.com/natural-language/docs/samples/language-classify-text-tutorial-classify

def classify_text(content, verbose=True):

    document = language_v1.Document(
        content=content, type_=language_v1.Document.Type.PLAIN_TEXT
    )

    response = client.classify_text(
        request={"document": document}
    )
    
    print("Text: {}".format(content))
    print()
    
    categories = response.categories

    result = {}

    for category in categories:
        # Turn the categories into a dictionary of the form:
        # {category.name: category.confidence}, so that they can
        # be treated as a sparse vector.
        result[category.name] = category.confidence

    if verbose:
        for category in categories:
            print(u"=" * 20)
            print(u"{:<16}: {}".format("category", category.name))
            print(u"{:<16}: {}".format("confidence", category.confidence))

    return result

text = "President Biden is giving a speech in Poland to wrap up a visit to Europe intended to bolster NATO’s unity over Russia’s invasion of Ukraine. While Mr. Biden was in Warsaw, an missile strike rocked the city of Lviv in western Ukraine, close to the Polish border."
classify_text = classify_text(text)

Text: President Biden is giving a speech in Poland to wrap up a visit to Europe intended to bolster NATO’s unity over Russia’s invasion of Ukraine. While Mr. Biden was in Warsaw, an missile strike rocked the city of Lviv in western Ukraine, close to the Polish border.

category        : /News/Politics
confidence      : 0.8899999856948853
category        : /Law & Government/Government
confidence      : 0.800000011920929
