In [1]:
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

#### Initialize the client

In [2]:
client = language.LanguageServiceClient()
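
The constructor is called without arguments, so the client has to find credentials in the environment. Below is a minimal sketch of a pre-flight check, assuming the credentials are supplied as a service-account key file through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The helper name credentials_path is hypothetical, and other credential sources (for example gcloud user credentials) are not covered.

```python
import os


def credentials_path(environ=os.environ):
    """Return the key-file path if the variable is set and the file exists, else None."""
    path = environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    if path and os.path.isfile(path):
        return path
    return None


# Assumption: a key file is the intended credential source for this notebook.
if credentials_path() is None:
    print('GOOGLE_APPLICATION_CREDENTIALS is unset or does not point to a file')
```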

#### Create text content
The sample text discusses aircraft and airlines.

In [24]:
text = 'The Airbus A380 is a remarkable piece of engineering but it may not \
be as relevant to the current aviation market as Airbus will have hoped. \
The preference of airlines to run multiple flights a day between major airports \
using smaller aircraft rather than a few flights using jumbo planes, and \
the prevalence of direct flights between smaller airports rather than a \
hub and spoke model means that there is a greater demand for medium-sized \
aircraft which are fuel-efficient and have a long range. This explains the \
success of the Boeing 787, and even Airbus has the A350 which competes with \
Boeing in this category.'

#### Create a document object
We need to create such an object in order to perform a classification operation. The content of the document is the plain text which we just created.

In [25]:
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

#### Retrieve the category labels for our text
The classify_text() method returns a set of category labels for the topics discussed in the text.

In [26]:
categories = client.classify_text(document).categories

#### Print out the categories along with the confidence scores
classify_text() returns each category along with a confidence score (between 0 and 1) indicating how well that category fits the text.

In [27]:
for category in categories:
    print('=' * 20)
    print('{:<16}: {}'.format('name', category.name))
    print('{:<16}: {}'.format('confidence', category.confidence)) 

name            : /Business & Industrial/Transportation & Logistics
confidence      : 0.8600000143051147
name            : /Travel/Air Travel
confidence      : 0.7400000095367432
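
The category names in the output above are slash-separated paths, and each comes with a confidence score. Below is a minimal sketch of two small helpers for working with such results. It uses a namedtuple as a stand-in for the API's category objects (only the name and confidence attributes are assumed). The helper names and the 0.8 threshold are hypothetical choices for illustration.

```python
from collections import namedtuple

# Stand-in for the API's category objects, which expose .name and .confidence.
Category = namedtuple('Category', ['name', 'confidence'])


def confident_categories(categories, threshold=0.8):
    """Keep only categories whose confidence is at least `threshold`."""
    return [c for c in categories if c.confidence >= threshold]


def category_levels(name):
    """Split '/A/B' into ['A', 'B'], from most general to most specific."""
    return [level for level in name.split('/') if level]


# Values copied (rounded) from the output shown above.
sample = [
    Category('/Business & Industrial/Transportation & Logistics', 0.86),
    Category('/Travel/Air Travel', 0.74),
]

for category in confident_categories(sample):
    print(category_levels(category.name), category.confidence)
```

The same helpers should work on the categories list returned by classify_text(), since they only read the two attributes.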


#### Extract the entities discussed in the text using the analyze_entities() function
Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities.

In [28]:
entities = client.analyze_entities(document).entities

#### View the entities for our text
A number of entities are returned, ordered by salience within the text (highest to lowest). We view the entities along with their salience scores and the associated metadata (if any). The metadata currently supports a link to the Wikipedia article for the topic pertaining to the entity.

In [29]:
for entity in entities:
    print('=' * 20)
    print('{:<16}: {}'.format('name', entity.name))
    print('{:<16}: {}'.format('salience', entity.salience))
    print('{:<16}: {}'.format('metadata', list(entity.metadata.items())))

name            : Airbus A380
salience        : 0.4124455153942108
metadata        : [('wikipedia_url', 'https://en.wikipedia.org/wiki/Airbus_A380'), ('mid', '/m/018rl2')]
name            : Airbus
salience        : 0.30752143263816833
metadata        : [('wikipedia_url', 'https://en.wikipedia.org/wiki/Airbus'), ('mid', '/m/015zfz')]
name            : engineering
salience        : 0.0787150114774704
metadata        : []
name            : aviation market
salience        : 0.030781183391809464
metadata        : []
name            : flights
salience        : 0.012734441086649895
metadata        : []
name            : flights
salience        : 0.012734441086649895
metadata        : []
name            : preference
salience        : 0.012731047347187996
metadata        : []
name            : aircraft
salience        : 0.011622513644397259
metadata        : []
name            : airports
salience        : 0.011481346562504768
metadata        : []
name            : airports
salience        : 0.0
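
Some names, such as flights and airports, appear more than once in the output above. Below is a minimal sketch of one way to collapse such repeats, assuming that summing the salience of same-named entities is acceptable for the task (taking the maximum would be another reasonable choice). The namedtuple is a stand-in for the API's entity objects, and merge_by_name is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the API's entity objects; only .name and .salience are used.
Entity = namedtuple('Entity', ['name', 'salience'])


def merge_by_name(entities):
    """Sum salience over entities sharing a name; return (name, total) pairs, highest first."""
    totals = {}
    for entity in entities:
        totals[entity.name] = totals.get(entity.name, 0.0) + entity.salience
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few values copied (rounded) from the output shown above.
sample = [
    Entity('Airbus A380', 0.4124),
    Entity('Airbus', 0.3075),
    Entity('flights', 0.0127),
    Entity('flights', 0.0127),
]

for name, total in merge_by_name(sample):
    print('{:<16}: {:.4f}'.format(name, total))
```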

#### The text can also be contained in a Cloud Storage bucket
We will not display the contents of the bucket here, just the categories.

In [30]:
document = types.Document(
    gcs_content_uri='gs://cloud-ml-api/text_file.txt',
    type=enums.Document.Type.PLAIN_TEXT)
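
The gcs_content_uri value above has the form gs://bucket/object. Below is a minimal sketch of a check that can be run before sending a request, to catch a malformed URI early. The helper name split_gcs_uri is hypothetical, and it only inspects the string; it does not verify that the bucket or object exists.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = 'gs://'
    if not uri.startswith(prefix):
        raise ValueError('expected a gs:// URI, got {!r}'.format(uri))
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket, _, object_name = uri[len(prefix):].partition('/')
    if not bucket or not object_name:
        raise ValueError('URI must name both a bucket and an object: {!r}'.format(uri))
    return bucket, object_name


bucket, object_name = split_gcs_uri('gs://cloud-ml-api/text_file.txt')
print(bucket, object_name)
```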

In [31]:
categories = client.classify_text(document).categories

for category in categories:
    print('=' * 20)
    print('{:<16}: {}'.format('name', category.name))
    print('{:<16}: {}'.format('confidence', category.confidence))

name            : /Pets & Animals/Pets/Dogs
confidence      : 0.9900000095367432
name            : /Hobbies & Leisure
confidence      : 0.9800000190734863
name            : /Arts & Entertainment
confidence      : 0.550000011920929


#### Text classification is not supported for many languages
German, for example, was not supported as of 14-Aug-2018. The request fails with the following error:
<i>InvalidArgument: 400 The language de is not supported for classify_text analysis.</i>

In [32]:
text_de = 'Zwei Wochen vor Beginn der neuen Bundesligasaison \
FC Bayern ein deutliches Ausrufezeichen gesetzt. Dank eines 5:0 \
Erfolgs im Finale um den DFL-Supercup bei Eintracht Frankfurt \
haben die Münchner gezeigt, dass der Kampf um die Deutsche Meisterschaft auch \
in der Spielzeit 2018/19 nur über den Rekordchampion führt.'

In [33]:
document_de = types.Document(
    content=text_de,
    type=enums.Document.Type.PLAIN_TEXT)

In [34]:
categories = client.classify_text(document_de).categories

InvalidArgument: 400 The language de is not supported for classify_text analysis.
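
When processing texts in mixed languages, it can be convenient to treat this error as "no categories" rather than letting it stop the notebook. Below is a minimal sketch, assuming the error is raised as google.api_core.exceptions.InvalidArgument and that its message contains the words "not supported", as in the output above. Matching on the message text is brittle and is only an illustration. The helper name classify_or_empty is hypothetical.

```python
try:
    from google.api_core.exceptions import InvalidArgument
except ImportError:
    # Stand-in so the sketch still runs without the client library installed.
    class InvalidArgument(Exception):
        pass


def classify_or_empty(client, document):
    """Return the document's categories, or [] if its language is unsupported."""
    try:
        return list(client.classify_text(document).categories)
    except InvalidArgument as error:
        # Assumption: the unsupported-language message contains 'not supported'.
        if 'not supported' in str(error):
            return []
        raise  # Any other invalid-argument problem is still surfaced.
```

With the objects defined earlier, a call such as classify_or_empty(client, document_de) would then return an empty list instead of raising.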