 <img style="float: right;" src="https://docs.expert.ai/logo.png" width="150px">
 
# My first Notebook with expert.ai Natural Language API v2

 **expert.ai Natural Language API v2** (https://developer.expert.ai/) parses and "understands" large volumes of text.

In this notebook we'll install the expert.ai Natural Language API client for Python, try it out, and then introduce some concepts related to Natural Language Processing.

You can also download the source code of our Python SDK and this notebook from GitHub at https://github.com/therealexpertai/


## Installation and Setup
First, install the __expertai-nlapi__ library using pip:
* https://pypi.org/project/expertai-nlapi/




In [None]:
!pip install -U expertai-nlapi

That's it, you're ready to go.

## Working with NL API in Python
First you have to set up your account credentials; if you don't have them, get them at https://developer.expert.ai/ui/login

Set your environment variables with your NL API credentials. On Windows (Command Prompt):

```bat
SET EAI_USERNAME=YOUR_USER
SET EAI_PASSWORD=YOUR_PASSWORD
```
or, on Linux and macOS:

```bash
export EAI_USERNAME=YOUR_USER
export EAI_PASSWORD=YOUR_PASSWORD
```

As an alternative, you can set them from within the notebook with the following statements:

```python
import os
os.environ["EAI_USERNAME"] = 'YOUR_USER'
os.environ["EAI_PASSWORD"] = 'YOUR_PASSWORD'
```
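
Before creating the client, it can be worth confirming that both variables are actually visible to the Python process. The sketch below is a minimal check; `missing_credentials` and `REQUIRED_VARS` are hypothetical names written for this notebook, not part of the SDK.

```python
import os

# Variable names taken from the setup instructions above.
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")

def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Credentials found.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching your real environment.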

Now let's play with Python and Natural Language Processing.

Currently the API supports five languages: English, French, Spanish, Italian and German. You have to define the text you want to process and the language model to use for the analysis.

In [None]:
from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()

In [None]:
text = 'Facebook is looking at buying an American startup for $6 million based in Springfield, IL .' 
language = 'en'

## Quick run
Let's start with the first API call, just sending the text. This is what it looks like.

In [None]:
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'disambiguation'})

The `disambiguation` analysis returns all the information generated by the Natural Language engine from the text. Let's look at the available metadata in detail.
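
The same call is repeated later in this notebook with other resource names (`entities`, `relevants`), so it can be wrapped in a small helper. This is only a sketch: `analyze` and `RecordingClient` are hypothetical names, and the stand-in client records the request instead of contacting the API, so the cell runs without credentials.

```python
from types import SimpleNamespace

def analyze(client, text, language, resource):
    """Run one analysis resource, mirroring the call used in this notebook."""
    return client.specific_resource_analysis(
        body={"document": {"text": text}},
        params={"language": language, "resource": resource},
    )

class RecordingClient:
    """Hypothetical stand-in that records requests instead of sending them."""

    def __init__(self):
        self.calls = []

    def specific_resource_analysis(self, body, params):
        self.calls.append((body, params))
        # Empty placeholder response; a real response carries the analysis.
        return SimpleNamespace(tokens=[], entities=[])

stub = RecordingClient()
analyze(stub, "Hello world.", "en", "disambiguation")
print(stub.calls[0][1])
```

With a real `ExpertAiClient` instance, the same helper would be called as `analyze(client, text, language, 'disambiguation')`.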

## Tokenization & Lemmatization
Lemmatization looks beyond word reduction, and considers a language's full vocabulary to apply a *morphological analysis* to words. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.

In [None]:
print (f'{"TOKEN":{20}} {"LEMMA":{8}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{20}} {token.lemma:{8}}')
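
Tokens carry character offsets rather than the surface text, which is why the loop above slices `text[token.start:token.end]`. The sketch below shows the same slicing on hand-written stand-in tokens; the offsets and lemmas are illustrative values typed by hand, not output from the API, and `surface_and_lemma` is a hypothetical helper.

```python
from types import SimpleNamespace

sample_text = "The mice were meeting"

# Hand-written stand-ins for API tokens (illustrative values only).
sample_tokens = [
    SimpleNamespace(start=0, end=3, lemma="the"),
    SimpleNamespace(start=4, end=8, lemma="mouse"),
    SimpleNamespace(start=9, end=13, lemma="be"),
    SimpleNamespace(start=14, end=21, lemma="meet"),
]

def surface_and_lemma(text, tokens):
    """Pair each token's surface form (sliced from the text) with its lemma."""
    return [(text[t.start:t.end], t.lemma) for t in tokens]

pairs = surface_and_lemma(sample_text, sample_tokens)
for surface, lemma in pairs:
    print(f"{surface:{20}} {lemma:{8}}")
```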

##  Part of Speech 
We can also look at the part-of-speech information assigned to each token.

In [None]:
print (f'{"TOKEN":{18}} {"PoS":{6}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{18}} {token.pos:{6}} ' )
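
A quick way to summarize the tags is to count them. The sketch below uses `collections.Counter` on stand-in tokens that expose only a `pos` attribute; the tag values are illustrative placeholders, and `pos_counts` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-in tokens; the tag values are illustrative, not real API output.
sample_tags = ["PROPN", "AUX", "VERB", "ADP", "VERB", "DET", "ADJ", "NOUN"]
sample_tokens = [SimpleNamespace(pos=tag) for tag in sample_tags]

def pos_counts(tokens):
    """Count how many tokens carry each part-of-speech tag."""
    return Counter(token.pos for token in tokens)

counts = pos_counts(sample_tokens)
for tag, count in counts.most_common():
    print(f"{tag:{6}} {count}")
```

With a real response, `pos_counts(document.tokens)` would give the same kind of summary.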

## Dependency Parsing information
Dependency parsing information is available for each token, together with information about the tokens it is connected to.

In [None]:
print (f'{"TOKEN":{18}} {"Dependency label":{8}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{18}} {token.dependency.label:{4}} ' )
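
To find, say, every subject or object in a sentence, the tokens can be indexed by their dependency label. This is a minimal sketch on stand-in tokens: it relies only on the `token.dependency.label` attribute used above, and both the label values and the `tokens_by_label` helper are illustrative assumptions.

```python
from collections import defaultdict
from types import SimpleNamespace

sample_text = "Facebook buys startups"

# Stand-in tokens; the label values are illustrative, not real API output.
sample_tokens = [
    SimpleNamespace(start=0, end=8, dependency=SimpleNamespace(label="nsubj")),
    SimpleNamespace(start=9, end=13, dependency=SimpleNamespace(label="root")),
    SimpleNamespace(start=14, end=22, dependency=SimpleNamespace(label="obj")),
]

def tokens_by_label(text, tokens):
    """Map each dependency label to the surface forms that carry it."""
    groups = defaultdict(list)
    for token in tokens:
        groups[token.dependency.label].append(text[token.start:token.end])
    return dict(groups)

by_label = tokens_by_label(sample_text, sample_tokens)
print(by_label.get("nsubj", []))
```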

## Named Entities
Going a step beyond tokens, *named entities* add another layer of context.  Named entities are accessible through the `entities` object.

In [None]:
document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'entities'})


print (f'{"ENTITY":{20}} {"TYPE":{10}}')
       
for entity in document.entities:
    print (f'{entity.lemma:{20}} {entity.type_:{10}}')
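
When a text contains many entities, grouping them by type is often more readable than a flat list. The sketch below does that with stand-in objects exposing the `lemma` and `type_` attributes used above; the type codes are placeholders chosen for illustration, and `lemmas_by_type` is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-in entities; lemmas and type codes are illustrative placeholders.
sample_entities = [
    SimpleNamespace(lemma="Facebook", type_="COM"),
    SimpleNamespace(lemma="Springfield", type_="GEO"),
    SimpleNamespace(lemma="Chicago", type_="GEO"),
]

def lemmas_by_type(entities):
    """Group entity lemmas under their entity type."""
    grouped = {}
    for entity in entities:
        grouped.setdefault(entity.type_, []).append(entity.lemma)
    return grouped

grouped = lemmas_by_type(sample_entities)
for entity_type, lemmas in grouped.items():
    print(f"{entity_type:{10}} {', '.join(lemmas)}")
```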

Then you can get the open data connected with an entity, for example `Springfield, IL`:

In [None]:
print(document.entities[1].lemma)

In [None]:
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            print (f'{prop.type_:{12}} {prop.value:{30}}')
    

Springfield has been recognized as [Q28515](https://www.wikidata.org/wiki/Q28515) on Wikidata, which is the Q-id for Springfield, IL (that is, not for the Springfield in Vermont or the one in California).
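
The syncon match can be wrapped in a helper that returns the properties as a dictionary, from which a Wikidata link is easy to build. This is a sketch on stand-in data: the syncon numbers are made up, the property type name `WikiDataId` is an assumption to check against a real response, and `properties_for` is a hypothetical helper.

```python
from types import SimpleNamespace

WIKIDATA_URL = "https://www.wikidata.org/wiki/{}"

def properties_for(entity, knowledge):
    """Return {type: value} for the knowledge entry matching the entity's syncon."""
    for entry in knowledge:
        if entry.syncon == entity.syncon:
            return {prop.type_: prop.value for prop in entry.properties}
    return {}

# Stand-in data: syncon ids are invented and "WikiDataId" is an assumed type name.
sample_entity = SimpleNamespace(lemma="Springfield", syncon=1001)
sample_knowledge = [
    SimpleNamespace(syncon=2002, properties=[]),
    SimpleNamespace(
        syncon=1001,
        properties=[SimpleNamespace(type_="WikiDataId", value="Q28515")],
    ),
]

props = properties_for(sample_entity, sample_knowledge)
qid = props.get("WikiDataId")
if qid:
    print(WIKIDATA_URL.format(qid))
```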

## Key Elements
*Key elements* are identified from the document as main sentences, main keywords, main lemmas and relevant topics; let's focus on the main lemmas of the document.

In [None]:
document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'relevants'})


print (f'{"LEMMA":{20}} {"SCORE":{5}} ')
       
for mainlemma in document.main_lemmas:
    print (f'{mainlemma.value:{20}} {mainlemma.score:{5}}')
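
To keep only the most relevant lemmas, the list can be sorted by score and truncated. The sketch below works on stand-in objects with the `value` and `score` attributes used above; the scores are invented for illustration and `top_lemmas` is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-in main lemmas; values and scores are invented for illustration.
sample_lemmas = [
    SimpleNamespace(value="startup", score=9.5),
    SimpleNamespace(value="Facebook Inc.", score=14.2),
    SimpleNamespace(value="Springfield", score=6.1),
]

def top_lemmas(main_lemmas, n=2):
    """Return the n main lemmas with the highest score, best first."""
    return sorted(main_lemmas, key=lambda item: item.score, reverse=True)[:n]

best = top_lemmas(sample_lemmas)
for item in best:
    print(f"{item.value:{20}} {item.score:{5}}")
```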

## Classification
Let's see how to classify documents according to the **IPTC Media Topics Taxonomy**. We'll use a longer text that gives the classifier more to work with, and then plot the categorization result with Matplotlib.
* [taxonomy definition](http://cv.iptc.org/newscodes/mediatopic)

Results will be displayed using Matplotlib, so first install it:

In [1]:
!pip install -U matplotlib

Collecting matplotlib
  Downloading matplotlib-3.4.2-cp38-cp38-win_amd64.whl (7.1 MB)
Installing collected packages: matplotlib
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.3.4
    Uninstalling matplotlib-3.3.4:
      Successfully uninstalled matplotlib-3.3.4
Successfully installed matplotlib-3.4.2




Then, set the text you want to classify:

In [None]:
text = """Britain gave emergency approval on Wednesday to Pfizer’s American-developed coronavirus vaccine, leaping ahead of the United States to become the first Western country to allow its health service to begin mass inoculations against a disease that has killed more than 1.4 million people worldwide.
The approval kicks off a vaccination campaign with little precedent in modern medicine, encompassing not only ultracold dry ice and trays of glass vials but also a crusade against anti-vaccine misinformation.
The specter of Britain beating the United States to approval had already angered the White House in recent days, heaping additional pressure on American regulators to match Britain’s pace.
But while the go-ahead for Pfizer bodes well for rich countries like Britain that have ordered tens of millions of doses, it offered little relief to poorer countries that could not afford to buy supplies in advance and may struggle to pay for the exceptional demands of distributing the vaccine.
Already, the quandary of transporting vials at South Pole–like temperatures was dictating who could be vaccinated: Nursing-home residents were supposed to be Britain’s top priority under an advisory committee’s plans, but a limit on how many times officials believe the Pfizer vaccine can be moved before it loses effectiveness means that National Health Service staff members will receive the shots first."""

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

iptc_classification = client.classification(body={"document": {"text": text}}, params={'taxonomy': 'iptc', 'language': language})

iptc_categories = []
iptc_scores = []

print (f'{"CATEGORY":{27}} {"ID":{10}} {"FREQUENCY":{8}}')
for category in iptc_classification.categories:
    iptc_categories.append(category.label)
    iptc_scores.append(category.frequency)
    print (f'{category.label:{27}} {category.id_:{10}} {category.frequency:{8}}')
    
    

In [None]:
plt.bar(iptc_categories, iptc_scores, color='#17a2b8')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Media Topics Classification")

plt.show()
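
The bars appear in the order the categories were collected. Sorting the two lists together by score, highest first, can make the chart easier to read. A minimal sketch with invented sample values; `sort_for_plot` is a hypothetical helper.

```python
def sort_for_plot(labels, scores):
    """Return labels and scores reordered from highest to lowest score."""
    pairs = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    if not pairs:
        return [], []
    sorted_labels, sorted_scores = zip(*pairs)
    return list(sorted_labels), list(sorted_scores)

# Invented sample values standing in for iptc_categories and iptc_scores.
sample_labels = ["Health", "Politics", "Medicine"]
sample_scores = [12.5, 30.0, 57.5]

labels, scores = sort_for_plot(sample_labels, sample_scores)
print(labels)
print(scores)
```

The two returned lists can be passed to `plt.bar` in place of `iptc_categories` and `iptc_scores`.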


NL API v2 introduced an additional classifier that classifies documents according to a geographic taxonomy.
* [taxonomy definition](https://docs.expert.ai/nlapi/latest/guide/taxonomies/)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

geo_classification = client.classification(body={"document": {"text": text}}, params={'taxonomy': 'geotax', 'language': language})

geo_categories = []
geo_scores = []

print (f'{"CATEGORY":{27}} {"ID":{10}} {"FREQUENCY":{8}}')
for category in geo_classification.categories:
    geo_categories.append(category.label)
    geo_scores.append(category.frequency)
    print (f'{category.label:{27}} {category.id_:{10}} {category.frequency:{8}}')
    

In [None]:
plt.bar(geo_categories, geo_scores, color='#66E295')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Geographic Taxonomy Classification")

plt.show()
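
The two classification cells differ only in the taxonomy name, so the shared logic can live in one function. The sketch below mirrors the `client.classification` call used in this notebook; `category_frequencies` and `StubClient` are hypothetical names, and the stand-in client returns canned, invented categories so the cell runs without credentials.

```python
from types import SimpleNamespace

def category_frequencies(client, text, language, taxonomy):
    """Classify text and return {label: frequency} for the chosen taxonomy."""
    result = client.classification(
        body={"document": {"text": text}},
        params={"taxonomy": taxonomy, "language": language},
    )
    return {category.label: category.frequency for category in result.categories}

class StubClient:
    """Hypothetical stand-in returning canned categories per taxonomy."""

    canned = {
        "iptc": [("Health", 60.0), ("Politics", 40.0)],
        "geotax": [("United Kingdom", 75.0), ("United States of America", 25.0)],
    }

    def classification(self, body, params):
        rows = self.canned[params["taxonomy"]]
        categories = [SimpleNamespace(label=l, frequency=f) for l, f in rows]
        return SimpleNamespace(categories=categories)

stub = StubClient()
results = {
    taxonomy: category_frequencies(stub, "sample text", "en", taxonomy)
    for taxonomy in ("iptc", "geotax")
}
for taxonomy, frequencies in results.items():
    print(taxonomy, frequencies)
```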

Good job! You're an expert in the expert.ai community! 

Check out other language SDKs available on our [GitHub page](https://github.com/therealexpertai).