 <img style="float: right;" src="https://docs.expert.ai/logo.png" width="150px">
 
# Detect Personally Identifiable Information (PII) in Italian documents

In this notebook you will learn how to detect [PII](https://en.wikipedia.org/wiki/Personal_data) in Italian documents using the expert.ai [Natural Language API](https://docs.expert.ai/nlapi).  
Detecting PII lets you determine whether a document contains sensitive data and helps you create a new version of the document in which that data is [de-identified](https://en.wikipedia.org/wiki/De-identification).
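
As a rough illustration of what de-identification can look like once PII has been located, the sketch below replaces character spans in a text with a placeholder. The function name `mask_spans`, the `(start, end)` span format and the sample sentence are assumptions made for this example; they are not part of the expert.ai API, and the spans are assumed not to overlap.

```python
def mask_spans(text, spans, placeholder="[REDACTED]"):
    """Return text with every (start, end) character span replaced by placeholder.

    Hypothetical helper: spans are assumed to be non-overlapping.
    """
    masked = text
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        masked = masked[:start] + placeholder + masked[end:]
    return masked


# Made-up sentence; the span (0, 11) covers the name "Mario Rossi".
sample = "Mario Rossi vive a Roma."
print(mask_spans(sample, [(0, 11)]))
```

In a real workflow the spans would come from the positions reported by the detector rather than being written by hand.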

## Prerequisites

This notebook uses [expertai-nlapi](https://pypi.org/project/expertai-nlapi/) to access the Natural Language API and [pandas](https://pypi.org/project/pandas/) to present results, so install both packages:

In [None]:
!pip install expertai-nlapi

In [None]:
!pip install pandas

To access the API you need to set two environment variables with your expert.ai developer account credentials.  
If you don't have an account already, get one for free by signing up on [developer.expert.ai](https://developer.expert.ai).  
Replace `YOUR USERNAME` and `YOUR PASSWORD` with your credentials:

In [None]:
import os
os.environ["EAI_USERNAME"] = 'YOUR USERNAME'
os.environ["EAI_PASSWORD"] = 'YOUR PASSWORD'
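
Before calling the API, it can help to confirm that both variables are set and no longer hold the placeholder text. The following is a minimal sketch; `missing_credentials` is a hypothetical helper written for this notebook, not a function of the expertai-nlapi package.

```python
import os

REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(env=os.environ):
    """Return the required variable names that are unset, empty or still placeholders."""
    placeholders = {"", "YOUR USERNAME", "YOUR PASSWORD"}
    return [name for name in REQUIRED_VARS if env.get(name, "") in placeholders]


missing = missing_credentials()
if missing:
    print("Set these variables before calling the API:", ", ".join(missing))
```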

## Instantiate the Natural Language API client

In [None]:
from expertai.nlapi.cloud.client import ExpertAiClient
import json, os

client = ExpertAiClient()

## Load the documents from the `documents_it` folder
The `documents_it` folder is located in the same folder of the [GitHub repository](https://github.com/therealexpertai/) as this notebook.

In [None]:
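filesTexts = []

# Read every file in documents_it, assuming the files are UTF-8 encoded
# so that accented Italian characters are decoded correctly on any platform.
for fileName in os.listdir("documents_it"):
    with open(os.path.join("documents_it", fileName), encoding="utf-8") as file:
        filesTexts.append({'text': file.read(), 'fileName': fileName})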

## Detect PII in all the documents

In [None]:
filesResults=[]

for fileText in filesTexts:
    filesResults.append({
        'fileName': fileText['fileName'],
        'results': client.detection(body={"document": {"text": fileText['text']}}, params={'language': 'it','detector':'pii'})
    })
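
With many documents, a single failed request would stop the loop above. One possible way to keep going is to collect failures separately, as in this sketch. `detect_all` and `fake_detect` are hypothetical names used only for illustration; the fake detector stands in for the real API call so the example runs without credentials.

```python
def detect_all(documents, detect):
    """Run detect(text) on each document, separating successes from failures."""
    results, failures = [], []
    for doc in documents:
        try:
            results.append({'fileName': doc['fileName'], 'results': detect(doc['text'])})
        except Exception as error:  # deliberately broad: this is only a sketch
            failures.append({'fileName': doc['fileName'], 'error': str(error)})
    return results, failures


def fake_detect(text):
    """Stand-in for the API call: fails on empty input, otherwise echoes the length."""
    if not text:
        raise ValueError("empty document")
    return len(text)


docs = [{'fileName': 'a.txt', 'text': 'ciao'}, {'fileName': 'b.txt', 'text': ''}]
ok, failed = detect_all(docs, fake_detect)
print(ok)
print(failed)
```

In this notebook, `detect` could be a small function that wraps the `client.detection` call shown above.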

## Present detected information with a pandas DataFrame

In [None]:
import pandas
import json
from IPython.display import display, HTML

pandas.set_option('display.max_rows', None)
mapColoredCell = set()

def coloredCell(s):
    key = '-'.join(s.name[0:3])
    if(key not in mapColoredCell):
        mapColoredCell.add(key)
        return ['border-top: 1px solid !important']
    
    return['']
   
dataToShow = []

for fileResults in filesResults:
    mapInstances = {}
    fieldName=""
    
    for extraction in fileResults['results'].extractions:
        
        if extraction.template in mapInstances:
            mapInstances[extraction.template] += 1
        else:
            mapInstances[extraction.template] = 1
            
        dateCount = 0
        
        for field in extraction.fields:
            fieldName = field.name
            if field.name == "dateTime":
                dateCount+=1
                fieldName+=" #" + str(dateCount)
            row = {
                "file": fileResults['fileName'],
                "template": extraction.template,
                "instance": '#' + str(mapInstances[extraction.template]),
                'field': fieldName,
                'value': field.value
            }

            dataToShow.append(row)
           
dataFrame = pandas.DataFrame(dataToShow)
dataFrame.set_index(['file', 'template', 'instance', 'field'], inplace=True)
leftAlignedDataFrame = dataFrame.style.set_properties(**{'text-align': 'left', 'padding-left': '30px'})  
leftAlignedDataFrame.apply(coloredCell,axis=1)
display(leftAlignedDataFrame)

## Print the JSON-LD object
The PII detector output includes a [JSON-LD](https://json-ld.org/) object. It contains the same detected information, expressed in JSON-LD format with data types linked to [schema.org](https://schema.org/) types.

In [None]:
for fileResults in filesResults:
    print("************************")
    print (fileResults['fileName']+": ")
    print(json.dumps(fileResults['results'].extra_data, indent=2, sort_keys=True))
    print("************************")
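
Because JSON-LD nodes declare their type with the `@type` key, a quick way to summarize such an object is to walk it and count the types it contains. The sketch below does this on a small made-up object; the `sample` data and the `collect_types` helper are assumptions for illustration and do not reproduce the detector's actual output structure.

```python
from collections import Counter


def collect_types(node, found=None):
    """Recursively gather every "@type" value in a JSON-LD-like structure."""
    found = [] if found is None else found
    if isinstance(node, dict):
        node_type = node.get("@type")
        if isinstance(node_type, str):
            found.append(node_type)
        elif isinstance(node_type, list):
            found.extend(node_type)
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)
    return found


# Made-up JSON-LD object, not real detector output.
sample = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Person",
            "name": "Mario Rossi",
            "address": {"@type": "PostalAddress", "addressLocality": "Roma"},
        }
    ],
}
print(Counter(collect_types(sample)))
```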

Congratulations, you're done. It's that simple!  
Read the [documentation](https://docs.expert.ai/nlapi/latest/guide/detectors/#pii-detector) to learn more about the capabilities of the PII detector.