## Natural Language Understanding
https://natural-language-understanding-demo.mybluemix.net/  
https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/natural_language_understanding_v1.py  

### Run on GCC
Everyone should be able to get access to the "free tier" of IBM Cloud; the offer includes IBM Cloud as well as Watson services.

In [1]:
import sys
import os
import json
import re
import pandas as pd

In [2]:
# !pip install ibm_watson

In [3]:
import ibm_watson
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import Features, EntitiesOptions, \
    KeywordsOptions, CategoriesOptions, ConceptsOptions, EmotionOptions, SentimentOptions, \
    RelationsOptions, SemanticRolesOptions
    
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

In [4]:
authenticator = IAMAuthenticator('your_credentials')
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2019-07-12',
    authenticator=authenticator
)

natural_language_understanding.set_service_url('https://gateway.watsonplatform.net/natural-language-understanding/api')
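
Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming the key and service URL are kept in environment variables; the variable names `NLU_APIKEY` and `NLU_URL` and the helper `load_nlu_config` are hypothetical and not part of the SDK:

```python
import os

def load_nlu_config(env=os.environ):
    """Read NLU credentials from the environment (variable names are assumptions)."""
    apikey = env.get("NLU_APIKEY")
    if not apikey:
        raise RuntimeError("Set NLU_APIKEY before creating the authenticator")
    # Fall back to the service URL used in this notebook
    url = env.get(
        "NLU_URL",
        "https://gateway.watsonplatform.net/natural-language-understanding/api",
    )
    return {"apikey": apikey, "url": url}

# Usage with the objects from the cell above:
# cfg = load_nlu_config()
# authenticator = IAMAuthenticator(cfg["apikey"])
# natural_language_understanding.set_service_url(cfg["url"])
```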

#### Copy files to local FS from GCP bucket

In [5]:
!mkdir -p /home/jupyter/data/tweets

In [6]:
# !gsutil -m cp -n 'gs://msca-bdp-data-open/tweets/jeep_new.txt' '/home/jupyter/data/tweets/'

https://watson-api-explorer.mybluemix.net/apis/natural-language-understanding-v1

#### Concepts

Identify general concepts that are referenced or alluded to in your content. Concepts that are detected typically have an associated link to a DBpedia resource.

#### Entities
Detect important people, places, geopolitical entities and other types of entities in your content. Entity detection recognizes consecutive coreferences of each entity. For example, analysis of the following text would count "Barack Obama" and "He" as the same entity:
"Barack Obama was the 44th President of the United States. He took office in January 2009."

#### Keywords
Determine the most important keywords in your content. Keyword phrases are organized by relevance in the results.

#### Categories
Categorize your content into a hierarchical 5-level taxonomy. For example, "Leonardo DiCaprio won an Oscar" returns "/art and entertainment/movies and tv/movies" as the most confident classification.
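
Because each category label is a slash-separated path through the taxonomy, it can be split into levels for grouping or filtering. A small sketch; `category_levels` is a hypothetical helper, not an SDK function:

```python
def category_levels(label):
    """Split a category label such as '/a/b/c' into its taxonomy levels."""
    return [level for level in label.split("/") if level]

levels = category_levels("/art and entertainment/movies and tv/movies")
print(levels)       # ['art and entertainment', 'movies and tv', 'movies']
print(len(levels))  # 3
```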

#### Sentiment
Determine whether your content conveys positive or negative sentiment. Sentiment information can be returned for detected entities, keywords, or user-specified target phrases found in the text.

#### Emotion
Detect anger, disgust, fear, joy, or sadness that is conveyed by your content. Emotion information can be returned for detected entities, keywords, or user-specified target phrases found in the text.

#### Relations
Recognize when two entities are related, and identify the type of relation. For example, you can identify an "awardedTo" relation between an award and its recipient.

#### Semantic Roles
Parse sentences into subject-action-object form, and identify entities and keywords that are subjects or objects of an action.
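
The subject-action-object structure can be flattened into simple triples once a response is available. A sketch, assuming the response shape shown in the outputs later in this notebook; `role_triples` is a hypothetical helper, and the sample values are trimmed from those outputs. Some roles have no `object`, so missing parts become `None`:

```python
def role_triples(response):
    """Return (subject, action, object) text triples; missing parts become None."""
    triples = []
    for role in response.get("semantic_roles", []):
        triples.append((
            role.get("subject", {}).get("text"),
            role.get("action", {}).get("text"),
            role.get("object", {}).get("text"),
        ))
    return triples

# Trimmed sample in the shape returned for semantic_roles
sample_roles = {
    "semantic_roles": [
        {
            "subject": {"text": "Caterpillar earnings"},
            "action": {"text": "will 'beat", "normalized": "will 'beat"},
        },
        {
            "subject": {"text": "the world"},
            "action": {"text": "s", "normalized": "s"},
            "object": {"text": "largest luxury automaker"},
        },
    ]
}

for triple in role_triples(sample_roles):
    print(triple)
```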

## Analyzing text

In [8]:
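# Sample sentence analyzed throughout this section
text = "After overtaking BMW last year as the world’s largest luxury automaker, Mercedes-Benz is turning its attention to a newer, untapped audience: millennials seeking smaller cars as they progress through the stages of early adulthood."

response = natural_language_understanding.analyze(
            language = "en", text = text,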
            features = Features(
                        entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "entities": [
    {
      "type": "Company",
      "text": "BMW",
      "sentiment": {
        "score": 0.938528,
        "label": "positive"
      },
      "relevance": 0.958279,
      "emotion": {
        "sadness": 0.19455,
        "joy": 0.595719,
        "fear": 0.069535,
        "disgust": 0.050423,
        "anger": 0.046226
      },
      "disambiguation": {
        "subtype": [
          "Organization",
          "Engine",
          "AutomobileCompany",
          "AwardWinner"
        ],
        "name": "BMW",
        "dbpedia_resource": "http://dbpedia.org/resource/BMW"
      },
      "count": 1,
      "confidence": 0.997132
    },
    {
      "type": "Company",
      "text": "Benz",
      "sentiment": {
        "score": 0.938528,
        "label": "positive"
      },
      "relevance": 0.854734,
      "emotion": {
        "sadness": 0.19455,
        "joy": 0.595719,
   

In [9]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "keywords": [
    {
      "text": "world\u2019s largest luxury automaker",
      "sentiment": {
        "score": 0.938528,
        "label": "positive"
      },
      "relevance": 0.880562,
      "emotion": {
        "sadness": 0.19455,
        "joy": 0.595719,
        "fear": 0.069535,
        "disgust": 0.050423,
        "anger": 0.046226
      },
      "count": 1
    },
    {
      "text": "last year",
      "sentiment": {
        "score": 0.938528,
        "label": "positive"
      },
      "relevance": 0.792378,
      "emotion": {
        "sadness": 0.19455,
        "joy": 0.595719,
        "fear": 0.069535,
        "disgust": 0.050423,
        "anger": 0.046226
      },
      "count": 1
    },
    {
      "text": "smaller cars",
      "sentiment": {
        "score": 0.938528,
        "label": "positive"
      },
      "relevance": 0.749198,
      "emotion": {
        "sadne

In [10]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "concepts": [
    {
      "text": "BMW",
      "relevance": 0.888452,
      "dbpedia_resource": "http://dbpedia.org/resource/BMW"
    },
    {
      "text": "Mercedes-Benz",
      "relevance": 0.888266,
      "dbpedia_resource": "http://dbpedia.org/resource/Mercedes-Benz"
    },
    {
      "text": "Automotive industry",
      "relevance": 0.882851,
      "dbpedia_resource": "http://dbpedia.org/resource/Automotive_industry"
    },
    {
      "text": "Luxury vehicle",
      "relevance": 0.852593,
      "dbpedia_resource": "http://dbpedia.org/resource/Luxury_vehicle"
    },
    {
      "text": "Karl Benz",
      "relevance": 0.780259,
      "dbpedia_resource": "http://dbpedia.org/resource/Karl_Benz"
    },
    {
      "text": "Diesel engine",
      "relevance": 0.765713,
      "dbpedia_resource": "http://dbpedia.org/resource/Diesel_engine"
    },
    {
      "text": "World",
    

In [11]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        categories = CategoriesOptions())).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 2
  },
  "relations": [
    {
      "type": "basedIn",
      "sentence": "After overtaking BMW last year as the world's largest luxury automaker, Mercedes-Benz is turning its attention to a newer, untapped audience: millennials seeking smaller cars as they progress through the stages of early adulthood.",
      "score": 0.808415,
      "arguments": [
        {
          "text": "automaker",
          "location": [
            61,
            70
          ],
          "entities": [
            {
              "type": "Organization",
              "text": "Mercedes-Benz",
              "disambiguation": {
                "subtype": [
                  "Commercial"
                ]
              }
            }
          ]
        },
        {
          "text": "world",
          "location": [
            38,
            43
          ],
          "entities": [
            {
              "type": "Geopolitical

In [12]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "sentiment": {
    "document": {
      "score": 0.938528,
      "label": "positive"
    }
  },
  "language": "en"
}


In [13]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "the world",
        "keywords": [
          {
            "text": "world"
          }
        ]
      },
      "sentence": "After overtaking BMW last year as the world\u2019s largest luxury automaker, Mercedes-Benz is turning its attention to a newer, untapped audience: millennials seeking smaller cars as they progress through the stages of early adulthood.",
      "object": {
        "text": "largest luxury automaker",
        "keywords": [
          {
            "text": "largest luxury automaker"
          }
        ]
      },
      "action": {
        "verb": {
          "text": "has",
          "tense": "present"
        },
        "text": "s",
        "normalized": "s"
      }
    },
    {
      "subject": {
        "text": "Mercedes-Benz",
        "keywords": [
          {
            "text": "Mercedes-Benz"
          }
        ],

In [14]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "emotion": {
    "document": {
      "emotion": {
        "sadness": 0.19455,
        "joy": 0.595719,
        "fear": 0.069535,
        "disgust": 0.050423,
        "anger": 0.046226
      }
    }
  }
}
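
The document-level emotion scores above are a flat dictionary, so the dominant emotion is just the key with the highest score. A minimal sketch; `dominant_emotion` is a hypothetical helper, and the sample repeats the values from the output above:

```python
def dominant_emotion(response):
    """Return (emotion_name, score) for the highest-scoring document emotion."""
    scores = response["emotion"]["document"]["emotion"]
    name = max(scores, key=scores.get)
    return name, scores[name]

sample_emotion = {
    "emotion": {
        "document": {
            "emotion": {
                "sadness": 0.19455,
                "joy": 0.595719,
                "fear": 0.069535,
                "disgust": 0.050423,
                "anger": 0.046226,
            }
        }
    }
}

print(dominant_emotion(sample_emotion))  # ('joy', 0.595719)
```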


In [15]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "categories": [
    {
      "score": 0.996276,
      "label": "/automotive and vehicles/cars"
    },
    {
      "score": 0.983816,
      "label": "/automotive and vehicles/cars/performance vehicles"
    },
    {
      "score": 0.962343,
      "label": "/automotive and vehicles/vehicle brands/mercedes-benz"
    }
  ]
}
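
When only the best match matters, the highest-scoring category can be picked out of the list. A sketch using the values from the output above; `top_category` and its `min_score` cutoff are hypothetical, not part of the service:

```python
def top_category(response, min_score=0.0):
    """Return the label of the highest-scoring category, or None if none qualifies."""
    candidates = [c for c in response.get("categories", []) if c["score"] >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["score"])["label"]

sample_categories = {
    "categories": [
        {"score": 0.996276, "label": "/automotive and vehicles/cars"},
        {"score": 0.983816, "label": "/automotive and vehicles/cars/performance vehicles"},
        {"score": 0.962343, "label": "/automotive and vehicles/vehicle brands/mercedes-benz"},
    ]
}

print(top_category(sample_categories))  # /automotive and vehicles/cars
```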


## Analyzing URLs

In [16]:
url='https://www.chicagotribune.com/news/criminal-justice/ct-fatal-chicago-police-shooting-hearing-20190809-kmahnsrqirff5c52aqnjrgl64i-story.html'

response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        entities = EntitiesOptions(sentiment = True, emotion = True),
                        keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 1592,
    "features": 3
  },
  "retrieved_url": "https://www.chicagotribune.com/news/criminal-justice/ct-fatal-chicago-police-shooting-hearing-20190809-kmahnsrqirff5c52aqnjrgl64i-story.html",
  "language": "en",
  "keywords": [
    {
      "text": "Chicago police officer",
      "sentiment": {
        "score": 0,
        "label": "neutral"
      },
      "relevance": 0.736369,
      "emotion": {
        "sadness": 0.490538,
        "joy": 0.042388,
        "fear": 0.17918,
        "disgust": 0.333886,
        "anger": 0.289149
      },
      "count": 1
    },
    {
      "text": "high-speed car chase",
      "sentiment": {
        "score": 0,
        "label": "neutral"
      },
      "relevance": 0.718916,
      "emotion": {
        "sadness": 0.490538,
        "joy": 0.042388,
        "fear": 0.17918,
        "disgust": 0.333886,
        "anger": 0.289149
      },
      "count": 1
    },
    {
      "text": "Officer Jose Diaz"

In [17]:
response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 1592,
    "features": 1
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "A Chicago police officer",
        "keywords": [
          {
            "text": "Chicago police officer"
          }
        ],
        "entities": [
          {
            "type": "Location",
            "text": "Chicago",
            "disambiguation": {
              "subtype": [
                "City"
              ],
              "name": "Chicago",
              "dbpedia_resource": "http://dbpedia.org/resource/Chicago"
            }
          },
          {
            "type": "JobTitle",
            "text": "officer"
          }
        ]
      },
      "sentence": " A Chicago police officer who fatally shot a black teen in a South Side backyard in 2016 testified Friday he believed the male had fired at him moments earlier during a chaotic, high-speed car chase that ended in a collision with his police SUV.",
      "object": {
 

### Semantically difficult examples

In [18]:
url='http://www.cnbc.com/2017/05/11/caterpillar-earnings-will-beat-all-year-long-bank-of-america-says.html'

response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 1037,
    "features": 1
  },
  "retrieved_url": "https://www.cnbc.com/2017/05/11/caterpillar-earnings-will-beat-all-year-long-bank-of-america-says.html",
  "language": "en",
  "concepts": [
    {
      "text": "Bank of America",
      "relevance": 0.955122,
      "dbpedia_resource": "http://dbpedia.org/resource/Bank_of_America"
    },
    {
      "text": "Merrill Lynch",
      "relevance": 0.88865,
      "dbpedia_resource": "http://dbpedia.org/resource/Merrill_Lynch"
    },
    {
      "text": "Private banking",
      "relevance": 0.571594,
      "dbpedia_resource": "http://dbpedia.org/resource/Private_banking"
    },
    {
      "text": "Stock",
      "relevance": 0.551895,
      "dbpedia_resource": "http://dbpedia.org/resource/Stock"
    },
    {
      "text": "Troubled Asset Relief Program",
      "relevance": 0.485637,
      "dbpedia_resource": "http://dbpedia.org/resource/Troubled_Asset_Relief_Program"
    },
    {
      "

In [19]:
text="Caterpillar earnings will 'beat all year long,' Bank of America says"

response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True)
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 68,
    "features": 1
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "Caterpillar earnings",
        "keywords": [
          {
            "text": "Caterpillar earnings"
          }
        ],
        "entities": [
          {
            "type": "Company",
            "text": "Caterpillar"
          }
        ]
      },
      "sentence": "Caterpillar earnings will 'beat all year long,' Bank of America says",
      "action": {
        "verb": {
          "text": "'beat",
          "tense": "future"
        },
        "text": "will 'beat",
        "normalized": "will 'beat"
      }
    },
    {
      "subject": {
        "text": "Bank of America",
        "keywords": [
          {
            "text": "Bank"
          },
          {
            "text": "America"
          }
        ],
        "entities": [
          {
            "type": "Company",
            "text": "Bank of America"
          }
        ]
 

## Expect an explosion in data volume

In [20]:
url='https://en.wikipedia.org/wiki/University_of_Chicago'

response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        entities = EntitiesOptions(sentiment = True, emotion = True),
                        keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        relations = RelationsOptions(),
                        sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        emotion = EmotionOptions(),
                        categories = CategoriesOptions()
            )).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 5,
    "text_characters": 50000,
    "features": 8
  },
  "sentiment": {
    "document": {
      "score": 0.658094,
      "label": "positive"
    }
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "The University of Chicago (UChicago, U of C, or Chicago)",
        "keywords": [
          {
            "text": "Chicago"
          },
          {
            "text": "University"
          },
          {
            "text": "UChicago"
          }
        ],
        "entities": [
          {
            "type": "Organization",
            "text": "University of Chicago",
            "disambiguation": {
              "subtype": [
                "Location",
                "Company",
                "PeriodicalPublisher",
                "CollegeUniversity"
              ],
              "name": "University of Chicago",
              "dbpedia_resource": "http://dbpedia.org/resource/University_of_Chicago"
            }
          },
         

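
The `usage` block at the top of this output is worth checking before scaling up. Assuming IBM's pricing model in which one billable NLU item is one feature applied to one text unit of up to 10,000 characters (an assumption to verify against the current plan), this single call counts as 5 × 8 = 40 items. The round `text_characters` value of 50000 also suggests the page was cut off at a character limit. A sketch; `nlu_items` is a hypothetical helper:

```python
def nlu_items(usage):
    """Estimate billable items as text units times features (pricing model is an assumption)."""
    return usage["text_units"] * usage["features"]

# Usage blocks copied from outputs in this notebook
wiki_usage = {"text_units": 5, "text_characters": 50000, "features": 8}
tweet_usage = {"text_units": 1, "text_characters": 127, "features": 1}

print(nlu_items(wiki_usage))   # 40
print(nlu_items(tweet_usage))  # 1
```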
## Running Tweets through NLU

In [21]:
directory = '/home/jupyter/data/tweets/'
file = 'jeep_new.txt'
path = directory + file

In [22]:
tweets = pd.read_csv(path,sep='\t', names = ['id', 'lang', 'created_at', 'screen_name', \
                                                       'name', 'location', 'retweet_count', 'text'])

tweets = tweets[tweets['text'].str.contains("love|hate", case=False, na=False)]

tweets = tweets.sample(n=1000)

In [23]:
tweets.head(5)

Unnamed: 0,id,lang,created_at,screen_name,name,location,retweet_count,text
11274,9.235088e+17,tl,Thu Oct 26 11:17:00 +0000 2017,jprlta_,♎,bp🐻🐼,0.0,nakaisip ako bigla ng loveteam name kanina sa ...
46968,9.228689e+17,en,Tue Oct 24 16:54:28 +0000 2017,_________beast,beast,,0.0,A royal pair. 📸: Victoria D. #jeep #itsajeepth...
32751,9.217452e+17,en,Sat Oct 21 14:29:04 +0000 2017,yendijackson,- yend🌴. 🇦🇬,"New York, NY",0.0,RT @YourTravelGroup: Such pretty views whilst ...
35505,9.218368e+17,en,Sat Oct 21 20:33:08 +0000 2017,joshhurleysnuts,jh,,0.0,@realDonaldTrump I jeep hearing about how you ...
14536,9.224361e+17,tl,Mon Oct 23 12:14:33 +0000 2017,josellevelasco,Joselle Velasco,Philippines,0.0,nakakita ng shs couple sa loob ng jeep na sina...


In [24]:
#Take a small sample
#tweets_eng = tweets[tweets['lang']=='en'].reset_index(drop=True).sample(frac=0.01, replace=True)
tweets_eng = tweets[tweets['lang']=='en'].reset_index(drop=True).head(10)

In [25]:
len(tweets_eng)

10

In [26]:
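# NLU does not require removing special characters, so this cleaned column is optional;
# the raw text is what gets analyzed below.
# The hyphen goes last in the character class so it is kept literally instead of forming a range.
tweets_eng['text_clean'] = tweets_eng['text'].map(lambda x: re.sub(r'[^a-zA-Z0-9 @.,:_-]', '', str(x)))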
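
Inside a regex character class, a hyphen is literal only at the start or end; between two characters it defines a range. A standalone sketch of a keep-list cleaner that can be tried without the dataframe; `clean_tweet` and `KEEP_PATTERN` are hypothetical names:

```python
import re

# Keep letters, digits, spaces and the characters @ . , : _ - and drop everything else
KEEP_PATTERN = re.compile(r"[^a-zA-Z0-9 @.,:_-]")

def clean_tweet(text):
    """Remove every character that is not in the keep-list."""
    return KEEP_PATTERN.sub("", str(text))

print(repr(clean_tweet("Tag your off-roading buddy! #jeep ☺")))
```

Hashtag markers, punctuation outside the keep-list and emoji are removed, while the hyphen in "off-roading" survives.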

In [27]:
pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_rows', 10)
tweets_eng[['text']].head(5)

Unnamed: 0,text
0,A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/3gEXqREPic
1,RT @YourTravelGroup: Such pretty views whilst on the jeep safari 🚙 heading to #NelsonsDockyard ⛵ @antiguabarbuda #LoveAntiguaBarbuda https:…
2,"@realDonaldTrump I jeep hearing about how you are fucking the middle amd lower class and the uneducated love it, go… https://t.co/3boAgJhYIo"
3,Do your friends love Jeep as much as you do? Tag your off-roading buddy! https://t.co/svhbu7oE4X
4,#JustForYouSweepstakes @sprint I would love to win the Jeep ☺


In [28]:
# Read individual columns
raw_id = tweets_eng["id"].values
raw_text = tweets_eng["text"].values

In [29]:
raw_text

array(['A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/3gEXqREPic',
       'RT @YourTravelGroup: Such pretty views whilst on the jeep safari 🚙 heading to #NelsonsDockyard ⛵ @antiguabarbuda #LoveAntiguaBarbuda https:…',
       '@realDonaldTrump I jeep hearing about how you are fucking the middle amd lower class and the uneducated love it, go… https://t.co/3boAgJhYIo',
       'Do your friends love Jeep as much as you do? Tag your off-roading buddy! https://t.co/svhbu7oE4X',
       '#JustForYouSweepstakes @sprint I would love to win the Jeep ☺',
       'New design for Utah Lovers! #MokiDugway #Utah #SR261 #moab #roadtrip #4x4 #Jeep #mexicanhat #adventure #redrock #4wd https://t.co/6tu15dGkRT',
       'A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/p2P4cvopFm',
       '@eightyeightpark @eightyeightpark Love it! How long have y

In [30]:
# Set up lists to collect sentiment labels and scores
sentiment_list = []
score_list = []

# carry over existing values
tweet_id = []
tweet_list = []

In [31]:
count = 0
# Use separate loop variable names so the raw_id / raw_text arrays are not overwritten
for current_id, current_text in zip(raw_id, raw_text):
    response = natural_language_understanding.analyze(
        language = "en",  # do not let NLU guess the language of tweets
        text = current_text,
        features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
        )).get_result()

    # Document-level sentiment; fall back to placeholders if a field is missing
    document = response["sentiment"]["document"]

    if "label" in document:
        sentiment_list.append(str(document["label"]))
    else:
        sentiment_list.append("NA")

    if "score" in document:
        score_list.append(str(document["score"]))
    else:
        score_list.append("0")

    # Carry over the tweet text and id alongside the NLU results
    tweet_list.append(current_text)
    tweet_id.append(current_id)
    #print(count)
    #print(tweet_list)
    #print(tweet_id)
    count += 1
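
A single failed request (for example, a tweet too short to analyze) would stop the whole loop. A sketch of a wrapper that records a placeholder instead; `safe_sentiment` is a hypothetical helper, and `fake_analyze` is a stub standing in for the service call so the example runs offline. In real use the `except` clause should be narrowed to the SDK's `ApiException`:

```python
def safe_sentiment(analyze_fn, text):
    """Call analyze_fn(text) and return (label, score); ('NA', 0.0) on failure."""
    try:
        response = analyze_fn(text)
    except Exception as err:  # assumption: narrow this to the SDK's ApiException in real use
        print(f"skipped {text[:30]!r}: {err}")
        return "NA", 0.0
    document = response.get("sentiment", {}).get("document", {})
    return document.get("label", "NA"), document.get("score", 0.0)

# Stub standing in for natural_language_understanding.analyze(...).get_result()
def fake_analyze(text):
    if len(text) < 5:
        raise ValueError("not enough text to analyze")
    return {"sentiment": {"document": {"score": 0.514657, "label": "positive"}}}

print(safe_sentiment(fake_analyze, "Rise and shine. #jeep"))
print(safe_sentiment(fake_analyze, "hi"))
```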

#### Dealing with nested JSONs

In [32]:
response

{'usage': {'text_units': 1, 'text_characters': 127, 'features': 1},
 'sentiment': {'document': {'score': 0.514657, 'label': 'positive'}},
 'language': 'en'}

In [33]:
response["sentiment"]

{'document': {'score': 0.514657, 'label': 'positive'}}

In [34]:
response["sentiment"]["document"]

{'score': 0.514657, 'label': 'positive'}

In [35]:
response["sentiment"]["document"]["label"]

'positive'

In [36]:
response["sentiment"]["document"]["score"]

0.514657
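
Chained indexing raises `KeyError` as soon as one level is missing. A small sketch of a tolerant lookup; `dig` is a hypothetical helper, and the sample repeats the response shown above:

```python
def dig(data, *keys, default=None):
    """Walk nested dicts key by key; return default if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

sample_response = {
    "usage": {"text_units": 1, "text_characters": 127, "features": 1},
    "sentiment": {"document": {"score": 0.514657, "label": "positive"}},
    "language": "en",
}

print(dig(sample_response, "sentiment", "document", "label"))        # positive
print(dig(sample_response, "emotion", "document", default="NA"))     # NA
```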

#### Combining results together

In [37]:
# Create a dataframe from the results
column_names = ["Tweet_Id", "Tweet_Text", "NLU_Sentiment", "NLU_Score"]
nlu_results = [tweet_id, tweet_list, sentiment_list, score_list]
results_dict = dict(zip(column_names,nlu_results))
results = pd.DataFrame.from_dict(results_dict, orient='columns')
results = results[column_names]   # set specific column order

In [38]:
results.head(10)

Unnamed: 0,Tweet_Id,Tweet_Text,NLU_Sentiment,NLU_Score
0,9.228689e+17,A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/3gEXqREPic,neutral,0.0
1,9.217452e+17,RT @YourTravelGroup: Such pretty views whilst on the jeep safari 🚙 heading to #NelsonsDockyard ⛵ @antiguabarbuda #LoveAntiguaBarbuda https:…,positive,0.628498
2,9.218368e+17,"@realDonaldTrump I jeep hearing about how you are fucking the middle amd lower class and the uneducated love it, go… https://t.co/3boAgJhYIo",negative,-0.964076
3,9.236069e+17,Do your friends love Jeep as much as you do? Tag your off-roading buddy! https://t.co/svhbu7oE4X,positive,0.640436
4,9.226445e+17,#JustForYouSweepstakes @sprint I would love to win the Jeep ☺,positive,0.964391
5,9.225307e+17,New design for Utah Lovers! #MokiDugway #Utah #SR261 #moab #roadtrip #4x4 #Jeep #mexicanhat #adventure #redrock #4wd https://t.co/6tu15dGkRT,neutral,0.0
6,9.225938e+17,A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/p2P4cvopFm,neutral,0.0
7,9.236309e+17,@eightyeightpark @eightyeightpark Love it! How long have you had your Jeep?,positive,0.675406
8,9.228586e+17,"@fonzdelivers he drives a jeep, has a beard , trynna give some love , and he is the nicest nigga around",positive,0.744969
9,9.228842e+17,Rise and shine. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepfamily #OIIIIIIIO https://t.co/R1yoieiLjL,positive,0.514657
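
A quick tally of the labels gives a first summary of the sample. A sketch using the ten labels from the table above; on the dataframe itself, `results["NLU_Sentiment"].value_counts()` gives the same counts:

```python
from collections import Counter

# Labels copied from the NLU_Sentiment column above, rows 0 to 9
labels = ["neutral", "positive", "negative", "positive", "positive",
          "neutral", "neutral", "positive", "positive", "positive"]

counts = Counter(labels)
print(counts.most_common())  # [('positive', 6), ('neutral', 3), ('negative', 1)]
```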


In [39]:
# Data Export
outfile = "jeep_with_sentiment.xlsx"
outpath = directory + outfile

results.to_excel(outpath)

In [40]:
!ls -l /home/jupyter/data/tweets/*.xlsx

-rw-r--r-- 1 root root 37889 Nov 28 17:23 /home/jupyter/data/tweets/jeep_adv_sentiment.xlsx
-rw-r--r-- 1 root root  6495 Nov 29 17:29 /home/jupyter/data/tweets/jeep_with_sentiment.xlsx


In [41]:
import datetime
import pytz

datetime.datetime.now(pytz.timezone('US/Central')).strftime("%a, %d %B %Y %H:%M:%S")

'Sun, 29 November 2020 11:29:42'