In [23]:
import pandas as pd
import numpy as np
import os

import sys

from google.api_core.client_options import ClientOptions
from google.cloud import automl_v1
from google.cloud.automl_v1.proto import service_pb2

## Set the credentials

In [41]:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="my-project-12195-46fed19566f2.json"
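
Before creating a client, it can help to confirm that the key file the variable points to actually exists. The following is a minimal sketch; the helper name `credentials_file_exists` is hypothetical and is not part of the Google client library.

```python
import os

def credentials_file_exists(env=os.environ):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    # An unset or empty variable counts as missing credentials.
    return bool(path) and os.path.isfile(path)
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching the real environment.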

## 1) Single-label classification: classify documents by assigning one label to each

#### Read the dataset

In [11]:
happiness = pd.read_csv('happiness.csv',header=None,names=['items','label'])
happiness.head()

|   | items | label |
|---|-------|-------|
| 0 | We had a serious talk with some friends of our... | bonding |
| 1 | I meditated last night. | leisure |
| 2 | My grandmother start to walk from the bed afte... | affection |
| 3 | I picked my daughter up from the airport and w... | bonding |
| 4 | when i received flowers from my best friend | bonding |


In [14]:
happiness.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12664 entries, 0 to 12663
Data columns (total 2 columns):
items    12664 non-null object
label    12664 non-null object
dtypes: object(2)
memory usage: 198.0+ KB


The DataFrame has 12,664 rows in total.

##### Let's find the distinct classes in the label column

In [16]:
happiness.label.unique()

array(['bonding', 'leisure', 'affection', 'enjoy_the_moment',
       'achievement', 'nature', 'exercise'], dtype=object)

The label column has 7 distinct classes.
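
Besides listing the classes, it is often worth checking how evenly the rows are spread across them, since a heavily skewed label column can affect training. The sketch below uses a tiny stand-in DataFrame with the same two columns as `happiness.csv`; the rows are illustrative only, and the same two calls could be applied to `happiness.label`.

```python
import pandas as pd

# Stand-in rows in the same two-column layout as happiness.csv (illustrative only).
sample = pd.DataFrame({
    "items": [
        "I meditated last night.",
        "I went for a run.",
        "I read a book.",
        "We had dinner together.",
    ],
    "label": ["leisure", "exercise", "leisure", "bonding"],
})

# Number of rows per class, most frequent first.
class_counts = sample["label"].value_counts()

# Share of each class as a fraction of all rows.
class_share = sample["label"].value_counts(normalize=True)

print(class_counts)
print(class_share)
```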

#### Steps to train the model
1-The whole model is trained on Google Cloud Platform.

2-Open the AutoML Natural Language UI and select Get started in the box corresponding to the type of model you plan to train.

3-Click the New Dataset button in the title bar.

4-Enter a name for the dataset and select single-label classification as the model objective, matching the sample dataset.

5-Leave the Location set to Global.

6-In the Import text items section, choose Select a CSV file on Cloud Storage, and enter the path to the dataset you want to use in the text box.

7-The import takes some time; a notification arrives once it is done.

8-Review the dataset. By default it is split 80/10/10 (training/validation/test), and the split can be changed.

9-When you are done reviewing the dataset, click the Train tab just below the title bar and click Start Training.

10-Enter a name for the new model and check the Deploy model after training finishes check box.

11-Once the model is trained, evaluate it. Adjust the score threshold as required.
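
The 80/10/10 split can also be fixed in the CSV itself by adding a leading column of `TRAIN`, `VALIDATION`, or `TEST`, which is the layout of the sentiment dataset used later in this notebook. The following is a sketch under that assumption; the helper `assign_splits` and the demo rows are hypothetical, not part of the original workflow.

```python
import numpy as np
import pandas as pd

def assign_splits(df, seed=0):
    """Return a copy of df with a leading split column (80% / 10% / 10%)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))   # shuffled row positions
    n_train = len(df) * 8 // 10        # integer arithmetic avoids float rounding
    n_val = len(df) // 10
    split = np.empty(len(df), dtype=object)
    split[order[:n_train]] = "TRAIN"
    split[order[n_train:n_train + n_val]] = "VALIDATION"
    split[order[n_train + n_val:]] = "TEST"   # whatever remains
    out = df.copy()
    out.insert(0, "split", split)
    return out

# Illustrative rows only; a real dataset would come from happiness.csv.
demo = pd.DataFrame({
    "items": [f"text {i}" for i in range(20)],
    "label": ["leisure", "bonding"] * 10,
})
with_splits = assign_splits(demo)

# Write without header or index, one document per line.
csv_text = with_splits.to_csv(header=False, index=False)
```

A fixed seed keeps the assignment reproducible, so retraining later uses the same test rows.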


#### Now let's make a prediction on new text

In [42]:
def inline_text_payload(file_path):
    # Read the file as UTF-8 text; the text_snippet content field holds a string.
    with open(file_path, 'r', encoding='utf-8') as ff:
        content = ff.read()
    return {'text_snippet': {'content': content, 'mime_type': 'text/plain'}}

def pdf_payload(file_path):
    # For PDFs, file_path is the Cloud Storage URI of the document.
    return {'document': {'input_config': {'gcs_source': {'input_uris': [file_path]}}}}

def get_prediction(file_path, model_name):
    options = ClientOptions(api_endpoint='automl.googleapis.com')
    prediction_client = automl_v1.PredictionServiceClient(client_options=options)

    payload = inline_text_payload(file_path)
    # Uncomment the following line (and comment out the line above) to predict on PDFs.
    # payload = pdf_payload(file_path)

    params = {}
    # predict() blocks until the service returns a response.
    response = prediction_client.predict(model_name, payload, params)
    return response

get_prediction('item.txt', 'projects/my-project-12195/locations/us-central1/models/TCN8667988922455818240')

payload {
  annotation_spec_id: "4152891701993668608"
  classification {
    score: 0.7859312295913696
  }
  display_name: "nature"
}
payload {
  annotation_spec_id: "6458734711207362560"
  classification {
    score: 0.09789568930864334
  }
  display_name: "achievement"
}
payload {
  annotation_spec_id: "2711739821235109888"
  classification {
    score: 0.0628107562661171
  }
  display_name: "enjoy_the_moment"
}
payload {
  annotation_spec_id: "5017582830448803840"
  classification {
    score: 0.03903582692146301
  }
  display_name: "exercise"
}
payload {
  annotation_spec_id: "7323425839662497792"
  classification {
    score: 0.012882535345852375
  }
  display_name: "leisure"
}
payload {
  annotation_spec_id: "8764577720421056512"
  classification {
    score: 0.0009245725232176483
  }
  display_name: "bonding"
}
payload {
  annotation_spec_id: "405896812021415936"
  classification {
    score: 0.0005194258992560208
  }
  display_name: "affection"
}

Output:
The text "I picked up running again now that the winter weather has finally died down. I feel much better physically and emotionally because of it." is scored against each class as follows:

    nature class with confidence score of 0.7859312295913696
    achievement class with confidence score of 0.09789568930864334
    enjoy_the_moment class with confidence score of 0.0628107562661171
    exercise class with confidence score of 0.03903582692146301
    leisure class with confidence score of 0.012882535345852375
    bonding class with confidence score of 0.0009245725232176483
    affection class with confidence score of 0.0005194258992560208

**Thus the model assigns the text to the nature class, the label with the highest score.**

## 2) Prediction on a custom sentiment analysis model

##### The dataset below is used to train the sentiment model

In [32]:
Sentiment = pd.read_csv('crowdflower-twitter-claritin-80-10-10.csv',header=None,names=['split','content','Score'])
Sentiment.head()

|   | split | content | Score |
|---|-------|---------|-------|
| 0 | TRAIN | @freewrytin God is way too good for Claritin | 2 |
| 1 | TRAIN | I need Claritin. So bad. When did I become cur... | 3 |
| 2 | TRAIN | Thank god for Claritin. | 4 |
| 3 | TRAIN | And what's worse is that I reached my 3-day li... | 2 |
| 4 | TRAIN | Time to take some Claritin or Allegra or somet... | 3 |


In [36]:
Sentiment.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4574 entries, 0 to 4573
Data columns (total 3 columns):
split      4574 non-null object
content    4574 non-null object
Score      4574 non-null int64
dtypes: int64(1), object(2)
memory usage: 107.3+ KB


##### Let's find the distinct Score values

In [34]:
Sentiment.Score.unique()

array([2, 3, 4, 1, 0], dtype=int64)

##### The same steps as above are followed, but a sentiment analysis model is selected instead of single-label classification, with the maximum sentiment score set to 4

##### Now let's make a prediction on new text

#### Call the function and pass the text file and the sentiment model path to predict the sentiment

In [49]:
get_prediction('sentiment1.txt','projects/my-project-12195/locations/us-central1/models/TST7841015440879910912')

payload {
  text_sentiment {
    sentiment: 1
  }
}
metadata {
  key: "sentiment_score"
  value: "-0.4350065"
}

##### The predicted sentiment for the text "I picked the perfect day to forget to take claritin" is 1, on the 0 to 4 scale of the training data
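
The response carries two values: the integer `sentiment` in the payload and a `sentiment_score` string in the metadata. The sketch below pulls out both; `read_sentiment` is a hypothetical helper, and the stand-in object only mimics the fields visible in the printed output rather than a real client response.

```python
from types import SimpleNamespace

def read_sentiment(response):
    """Return (integer sentiment, metadata sentiment_score as float or None)."""
    sentiment = response.payload[0].text_sentiment.sentiment
    raw = response.metadata.get("sentiment_score")
    # The metadata value arrives as a string, so convert it when present.
    return sentiment, (float(raw) if raw is not None else None)

# Stand-in mirroring the output printed above.
fake_response = SimpleNamespace(
    payload=[SimpleNamespace(text_sentiment=SimpleNamespace(sentiment=1))],
    metadata={"sentiment_score": "-0.4350065"},
)

print(read_sentiment(fake_response))
```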

## 3) Prediction on a custom entity extraction model

The entity extraction model is trained on a corpus of biomedical research abstracts that mention hundreds of diseases and concepts. The resulting model identifies these medical entities in other documents.

Call the function and pass the text file and the entity extraction model path to extract the entities.

In [52]:
get_prediction('entity2.txt','projects/my-project-12195/locations/us-central1/models/TEN7001657060328734720')

payload {
  annotation_spec_id: "6712906615177084928"
  display_name: "SpecificDisease"
  text_extraction {
    score: 0.9993165135383606
    text_segment {
      start_offset: 36
      end_offset: 43
      content: "obesity"
    }
  }
}
payload {
  annotation_spec_id: "227723151763570688"
  display_name: "Modifier"
  text_extraction {
    score: 0.9985511898994446
    text_segment {
      start_offset: 394
      end_offset: 412
      content: "mesenchymal tumour"
    }
  }
}
payload {
  annotation_spec_id: "9018749624390778880"
  display_name: "DiseaseClass"
  text_extraction {
    score: 0.9135051369667053
    text_segment {
      start_offset: 436
      end_offset: 452
      content: "fat-cell tumours"
    }
  }
}
payload {
  annotation_spec_id: "9018749624390778880"
  display_name: "DiseaseClass"
  text_extraction {
    score: 0.9969135522842407
    text_segment {
      start_offset: 455
      end_offset: 462
      content: "lipomas"
    }
  }
}
payload {
  annotation_spec_id: "671

**Result**

In the output above (shown truncated):

"obesity" belongs to class "SpecificDisease" with a score of 0.9993165135383606

"mesenchymal tumour" belongs to class "Modifier" with a score of 0.9985511898994446

"fat-cell tumours" belongs to class "DiseaseClass" with a score of 0.9135051369667053

"lipomas" belongs to class "DiseaseClass" with a score of 0.9969135522842407

"obese" belongs to class "Modifier" with a score of 0.9989087581634521

"partial or complete deficiency of Hmgic" belongs to class "CompositeMention" with a score of 0.9989087581634521

"leptin deficiency" belongs to class "SpecificDisease" with a score of 0.9995229244232178

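
Reading entities off the raw response gets tedious for longer documents, so grouping them by class can help. The sketch below works on plain dictionaries copied from the four complete entries in the printed response; `group_by_label` and its `min_score` parameter are hypothetical, and a real response would first need converting into this shape.

```python
from collections import defaultdict

# Records copied from the complete payload entries in the printed response.
entities = [
    {"label": "SpecificDisease", "content": "obesity",
     "start": 36, "end": 43, "score": 0.9993165135383606},
    {"label": "Modifier", "content": "mesenchymal tumour",
     "start": 394, "end": 412, "score": 0.9985511898994446},
    {"label": "DiseaseClass", "content": "fat-cell tumours",
     "start": 436, "end": 452, "score": 0.9135051369667053},
    {"label": "DiseaseClass", "content": "lipomas",
     "start": 455, "end": 462, "score": 0.9969135522842407},
]

def group_by_label(records, min_score=0.0):
    """Map each class name to the extracted strings scoring at least min_score."""
    grouped = defaultdict(list)
    for record in records:
        if record["score"] >= min_score:
            grouped[record["label"]].append(record["content"])
    return dict(grouped)

# In these records, end - start equals the length of the extracted text.
assert all(r["end"] - r["start"] == len(r["content"]) for r in entities)

print(group_by_label(entities))
print(group_by_label(entities, min_score=0.95))
```

Raising `min_score` drops low-confidence spans, here "fat-cell tumours" at about 0.91.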