 <img style="float: right;" src="https://docs.expert.ai/logo.png" width="150px">
 
# My first Notebook with expert.ai Natural Language API 

 **expert.ai Natural Language API** (https://developer.expert.ai/) parses and "understands" large volumes of text.

In this notebook we'll install the expert.ai Natural Language API client for Python, try it out, and then introduce some concepts related to Natural Language Processing.

You can also download the source code of our Python SDK and this notebook from GitHub at https://github.com/therealexpertai/


## Installation and Setup
First, install the `expertai-nlapi` library using pip.
* https://pypi.org/project/expertai-nlapi/




In [None]:
!pip install -U expertai-nlapi==1.2.5

That's it, you're ready to go.

## Working with expert.ai in Python
First you have to set up your account credentials; if you don't have them, get them at https://developer.expert.ai/ui/login

In [None]:
import os
os.environ["EAI_USERNAME"] = 'techlab@expertsystem.com'
os.environ["EAI_PASSWORD"] = 'N3vershareyourpwd!'
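
Hardcoding credentials in a notebook makes them easy to leak, so it can help to verify that they are set in the environment instead. The snippet below is a minimal sketch of such a check; the helper name `missing_credentials` is ours for illustration and is not part of the SDK.

```python
import os

# Names of the environment variables set in the cell above.
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of expertai-nlapi.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Credentials found.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.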

Now let's play with Python and Natural Language Processing.

Currently the API supports five languages: English, French, Spanish, Italian and German. You have to define the text you want to process and the language model to use for the analysis.

In [None]:
from expertai.client import ExpertAiClient
client = ExpertAiClient()

In [None]:
text = 'Facebook is looking at buying an American startup for $6 million based in Springfield, IL .' 
language = 'en'

## Quick run
Let's start with the first API call, just sending the text. This is what it looks like.

In [None]:
document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'disambiguation'
})

The `disambiguation` analysis returns all the information generated by the Natural Language engine from the text. Let's look at the available metadata in detail.

## Tokenization & Lemmatization
Tokenization splits the text into tokens; each token carries `start` and `end` character offsets into the original text. Lemmatization looks beyond word reduction, and considers a language's full vocabulary to apply a *morphological analysis* to words. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.

In [None]:
print (f'{"TOKEN":{20}} {"LEMMA":{8}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{20}} {token.lemma:{8}}')
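
The loop above relies on two things: slicing the text with each token's character offsets, and padding columns with a nested format spec such as `{value:{20}}`. Here is a small offline sketch of both. The token objects are hand-written stand-ins built with `SimpleNamespace`, not real API output, and the sample sentence and offsets are ours.

```python
from types import SimpleNamespace

sample = "A cat was chasing mice"

# Hand-written stand-ins for API tokens (assumed shape: start, end, lemma).
tokens = [
    SimpleNamespace(start=0, end=1, lemma="a"),
    SimpleNamespace(start=2, end=5, lemma="cat"),
    SimpleNamespace(start=6, end=9, lemma="be"),
    SimpleNamespace(start=10, end=17, lemma="chase"),
    SimpleNamespace(start=18, end=22, lemma="mouse"),
]

# sample[start:end] recovers the surface form; {x:{10}} pads it to 10 columns.
rows = [f"{sample[t.start:t.end]:{10}} {t.lemma:{8}}" for t in tokens]

print(f'{"TOKEN":{10}} {"LEMMA":{8}}')
print("\n".join(rows))
```

Because `end` is exclusive, `text[token.start:token.end]` returns exactly the token's characters with no trailing space.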

##  Part of Speech 
We can also look at the part-of-speech information assigned to each token.

In [None]:
print (f'{"TOKEN":{18}} {"PoS":{6}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{18}} {token.pos.key:{6}} ' )

## Dependency Parsing information
The dependency parsing information is available for each token, together with the information about the connected tokens.

In [None]:
print (f'{"TOKEN":{18}} {"Dependency label":{8}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{18}} {token.dependency.label:{4}} ' )
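
To see how tokens connect, you can group each token under its head. The sketch below assumes that `token.dependency` exposes `id` and `head` fields next to `label`; only `label` appears in this notebook, so check the other two against your SDK version. The tokens are hand-written stand-ins, and the helper `dependents_by_head` is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def dependents_by_head(tokens, text):
    """Map each head's surface form to its (dependent, label) pairs.

    Assumes dependency.id / dependency.head exist (not shown in this notebook).
    Keys are surface forms, so repeated words would share one entry.
    """
    surface = {t.dependency.id: text[t.start:t.end] for t in tokens}
    tree = defaultdict(list)
    for t in tokens:
        dep = t.dependency
        # Skip the root: it points at itself or at an id outside the sentence.
        if dep.head in surface and dep.head != dep.id:
            tree[surface[dep.head]].append((surface[dep.id], dep.label))
    return dict(tree)


def make_token(start, end, id_, head, label):
    """Build a hand-written stand-in token for the demo."""
    dep = SimpleNamespace(id=id_, head=head, label=label)
    return SimpleNamespace(start=start, end=end, dependency=dep)


demo_text = "Facebook buys startups"
demo_tokens = [
    make_token(0, 8, 0, 1, "nsubj"),
    make_token(9, 13, 1, 1, "root"),
    make_token(14, 22, 2, 1, "obj"),
]

tree = dependents_by_head(demo_tokens, demo_text)
for head, dependents in tree.items():
    print(head, "->", dependents)
```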

## Named Entities
Going a step beyond tokens, *named entities* add another layer of context. Named entities are returned by the `entities` resource and are accessible through `document.entities`.

In [None]:
document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'entities'})


print (f'{"ENTITY":{20}} {"TYPE":{10}} {"TYPE_EXPLAINED":{10}}')
       
for entity in document.entities:
    print (f'{entity.lemma:{20}} {entity.type_.key:{10}} {entity.type_.description:{10}}')

Then you can get the open data connected with an entity, e.g. `Springfield, IL`:

In [None]:
print(document.entities[1].lemma)

In [None]:
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            print (f'{prop.type_:{12}} {prop.value:{30}}')
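
The cell above picks the entity by position (`entities[1]`), which breaks if the entity order changes. As a hedged alternative, this sketch looks the entity up by lemma and then matches knowledge entries on `syncon`, as above. The helper `knowledge_properties` is hypothetical. The stand-in document is hand-written: the syncon id and the property type name are placeholders, and only the Wikidata id Q28515 comes from this notebook.

```python
from types import SimpleNamespace


def knowledge_properties(document, lemma):
    """Return (type, value) pairs for the first entity with this lemma."""
    entity = next((e for e in document.entities if e.lemma == lemma), None)
    if entity is None:
        return []
    return [
        (prop.type_, prop.value)
        for entry in document.knowledge
        if entry.syncon == entity.syncon
        for prop in entry.properties
    ]


# Hand-written stand-in for an API response (placeholder syncon and type name).
doc_demo = SimpleNamespace(
    entities=[SimpleNamespace(lemma="Springfield", syncon=101)],
    knowledge=[
        SimpleNamespace(
            syncon=101,
            properties=[SimpleNamespace(type_="WikiDataId", value="Q28515")],
        )
    ],
)

for type_, value in knowledge_properties(doc_demo, "Springfield"):
    print(f"{type_:{12}} {value:{30}}")
```

An unknown lemma simply yields an empty list, so the caller does not need to handle an `IndexError`.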
    

Springfield has been recognized as [Q28515](https://www.wikidata.org/wiki/Q28515) on Wikidata, which is the Q-id for Springfield, IL (i.e. not for the Springfield in Vermont or the one in California).

## Key Elements
*Key elements* are identified from the document as main sentences, main keywords, main lemmas and relevant topics; let's focus on the main lemmas of the document.

In [None]:
document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'relevants'})


print (f'{"LEMMA":{20}} {"SCORE":{5}} ')
       
for mainlemma in document.main_lemmas:
    print (f'{mainlemma.value:{20}} {mainlemma.score:{5}}')

## Classification
Let's see how to classify documents according to the **IPTC Media Topics Taxonomy**. We'll use a longer text with more information, and then use Matplotlib to plot the classification result.

In [None]:
text = """Strategic acquisitions have been important to the growth of Facebook (FB). 
Mark Zuckerberg founded the company in 2004, and since then it has acquired scores of companies, 
ranging from tiny two-person start-ups to well-established businesses such as WhatsApp. For 2019, 
Facebook reported 2.5 billion monthly active users (MAU) and $70.69 billion in revenue."""

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

document = client.iptc_media_topics_classification(body={"document": {"text": text}}, params={'language': language})

categories = []
scores = []

print (f'{"CATEGORY":{27}} {"IPTC ID":{10}} {"FREQUENCY":{8}}')
for category in document.categories:
    categories.append(category.label)
    scores.append(category.frequency)
    print (f'{category.label:{27}} {category.id_:{10}} {category.frequency:{8}}')
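
If you want the bars drawn from the highest to the lowest frequency, the two lists have to be sorted together so that labels and values stay aligned. This is a minimal sketch of that step; the helper `sort_by_score` is hypothetical, and the labels and numbers are placeholders, not real classification output.

```python
def sort_by_score(labels, values):
    """Sort two parallel lists by value, highest first, keeping pairs aligned."""
    pairs = sorted(zip(labels, values), key=lambda pair: pair[1], reverse=True)
    if not pairs:
        return [], []
    sorted_labels, sorted_values = zip(*pairs)
    return list(sorted_labels), list(sorted_values)


# Placeholder data standing in for the `categories` and `scores` lists.
categories_demo = ["Category A", "Category B", "Category C"]
scores_demo = [12.5, 61.0, 26.5]

sorted_categories, sorted_scores = sort_by_score(categories_demo, scores_demo)
for label, score in zip(sorted_categories, sorted_scores):
    print(f"{label:{12}} {score:{8}}")
```

The sorted lists can then be passed to `plt.bar` in place of the unsorted ones.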
    
    

In [None]:
plt.bar(categories, scores, color='#17a2b8')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Media Topics Classification")

plt.show()


Good job! You're an expert in the expert.ai community! 

Check out other language SDKs available on our [GitHub page](https://github.com/therealexpertai).