# Text Analysis of News
<font color="red"><b>This is NOT an Official Google Product and is only for education!!!</b></font>
<br><br>
Google Cloud Natural Language reveals the structure and meaning of text by offering powerful machine learning models in an easy-to-use REST API. You can use it to extract information about people, places, events, and much more that is mentioned in text documents, news articles, or blog posts. You can also use it to understand sentiment about your product on social media, or to parse intent from customer conversations happening in a call center or a messaging app.

In this example, we will use the [Natural Language API](https://cloud.google.com/natural-language/) to analyze news headlines and abstracts from The New York Times (the data was gathered from the [public API of The New York Times](https://developer.nytimes.com/)).

In [None]:
!gsutil cp gs://fox_workshop/news.csv .

In [None]:
import pandas as pd
import re
import time
df = pd.read_csv("news.csv")
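# drop_duplicates returns a new DataFrame, so the result must be assigned.
# keep='first' retains one copy of each duplicated row.
df = df.drop_duplicates(subset=['title', 'abstract', 'section'], keep='first')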
df.head()
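
Note that `drop_duplicates` returns a new DataFrame rather than modifying the original in place. The sketch below uses made-up rows (not taken from `news.csv`) to show this, along with the difference between `keep='first'` and `keep=False`:

```python
import pandas as pd

# Illustrative rows only; the real data comes from news.csv.
sample = pd.DataFrame({
    "title": ["A", "A", "B"],
    "abstract": ["x", "x", "y"],
    "section": ["World", "World", "Tech"],
})
cols = ["title", "abstract", "section"]

# Without assignment, the original frame is left unchanged.
sample.drop_duplicates(subset=cols)
unchanged_rows = len(sample)

# keep='first' retains one copy of each duplicated row.
deduped = sample.drop_duplicates(subset=cols, keep="first")

# keep=False drops every row that has a duplicate, including the first copy.
strict = sample.drop_duplicates(subset=cols, keep=False)

print(unchanged_rows, len(deduped), len(strict))
```

With `keep=False`, a headline that appears twice disappears entirely, which is usually not what is wanted when the goal is simply to de-duplicate.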

<br><br>
## Analyzing News Headlines for Entities
Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities. Entity analysis is performed with the analyzeEntities method. For information on which languages are supported by the Natural Language API, see [Language Support](https://cloud.google.com/natural-language/docs/languages).

In [None]:
def analyze_title(text):
    """Detects entities in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    entities = client.analyze_entities(document).entities

    # Compare against enums.Entity.Type directly, so that entity types outside
    # the eight original ones (UNKNOWN ... OTHER) cannot cause an IndexError.
    entities_people = []
    entities_locations = []
    entities_organizations = []
    for entity in entities:
        # Keep only people with a Wikipedia page (a rough filter for well-known individuals).
        if (entity.type == enums.Entity.Type.PERSON
                and entity.metadata.get('wikipedia_url', '-') != '-'):
            entities_people.append(entity.name)
        elif entity.type == enums.Entity.Type.LOCATION:
            entities_locations.append(entity.name)
        elif entity.type == enums.Entity.Type.ORGANIZATION:
            entities_organizations.append(entity.name)
    return entities_people, entities_locations, entities_organizations

<br><br>
## Sentiment Analysis for News Headlines

Sentiment Analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. Sentiment analysis is performed through the analyzeSentiment method.
<br><br><br>
### Understanding the response
The response has two elements:
* score of the sentiment ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text.
* magnitude indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. 

Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes).
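
To make the interplay of the two values concrete, here is a minimal sketch that turns a (score, magnitude) pair into a coarse label. The function name `describe_sentiment` and both thresholds are assumptions chosen for illustration; they are not defined by the API and would need tuning for real data.

```python
def describe_sentiment(score, magnitude,
                       score_threshold=0.25, magnitude_threshold=1.0):
    """Turn a (score, magnitude) pair into a coarse label.

    The thresholds are illustrative assumptions, not values defined by the API.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Near-zero score: a low magnitude suggests little emotion overall, while a
    # high magnitude suggests positive and negative statements cancelling out.
    if magnitude >= magnitude_threshold:
        return "mixed"
    return "neutral"


for pair in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]:
    print(pair, describe_sentiment(*pair))
```

The `sentiment_text` function in this notebook returns only `sentiment.score`; the magnitude is available on the same object as `sentiment.magnitude`.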

In [None]:
def sentiment_text(text):
    """Detects sentiment in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)
    sentiment = client.analyze_sentiment(document).document_sentiment
    return sentiment.score

<br><br>
### Now let's analyze entities in our news headlines

In [None]:
# Import Google Cloud Libraries for NLP
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
import six

for index, row in df.iterrows():
  individuals,locations,organizations = analyze_title(row['title'])
  df.loc[index,'individuals'] = ', '.join(individuals)
  df.loc[index,'locations'] = ', '.join(locations)
  df.loc[index,'organizations'] = ', '.join(organizations)

In [None]:
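# Requires the 'sentiment_score_title' column, which is created by the sentiment cell below.
# Each condition needs its own parentheses because & binds more tightly than !=.
df[df['individuals'].str.contains('Trump') & (df['sentiment_score_title'] != 0)]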
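
When combining conditions in a pandas filter like the one above, each comparison should be wrapped in parentheses or computed as a separate mask. Python evaluates `a & b != 0` as `(a & b) != 0`, which is not the intended filter. A small sketch on made-up rows (the column names match this notebook, the values do not come from the dataset):

```python
import pandas as pd

# Illustrative data; column names match the notebook's DataFrame.
demo = pd.DataFrame({
    "individuals": ["Donald Trump", "Donald Trump", "Angela Merkel"],
    "sentiment_score_title": [0.0, -0.4, 0.3],
})

# Compute each condition as its own boolean mask...
mentions_trump = demo["individuals"].str.contains("Trump")
has_sentiment = demo["sentiment_score_title"] != 0

# ...and only then combine them with &.
filtered = demo[mentions_trump & has_sentiment]
print(filtered)
```

Naming the masks also makes the filter easier to read and reuse in later cells.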

<br><br>
### Now let's analyze sentiment in our news headlines

In [None]:
for index, row in df.iterrows():
  sentiment_score_title = sentiment_text(row['title'])
  df.loc[index,'sentiment_score_title'] = float(sentiment_score_title)
  time.sleep(.300)
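
As an alternative to writing values row by row with `iterrows` and `df.loc`, the scoring step can be sketched with `Series.apply`. The helper `add_scores` and the stand-in `fake_scorer` below are hypothetical names introduced for illustration; the stand-in only exists so the sketch runs without calling the API.

```python
import pandas as pd


def add_scores(frame, scorer, source_col="title",
               target_col="sentiment_score_title"):
    """Return a copy of `frame` with `scorer` applied to each value in `source_col`."""
    result = frame.copy()
    result[target_col] = result[source_col].apply(scorer).astype(float)
    return result


# Stand-in scorer for demonstration only; in the notebook this would be sentiment_text.
def fake_scorer(text):
    return 0.5 if "win" in text.lower() else 0.0


demo = pd.DataFrame({"title": ["Team wins final", "Markets close flat"]})
scored = add_scores(demo, fake_scorer)
print(scored)
```

In the notebook this could be called as `df = add_scores(df, sentiment_text)`; the 0.3-second pause between requests from the original loop would then need to move into the scorer.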

In [None]:
df

<br><br>
## BIGQUERY

BigQuery is Google's serverless, highly scalable, low-cost enterprise data warehouse. Because there is no infrastructure to manage, you can focus on analyzing data to find meaningful insights using familiar SQL, without needing a database administrator. BigQuery lets you analyze all your data by creating a logical data warehouse over managed, columnar storage as well as data from object storage and spreadsheets. It makes it easy to securely share insights within your organization and beyond as datasets, queries, spreadsheets, and reports. With its streaming ingestion capability, BigQuery also lets organizations capture and analyze data in real time, so that insights are always current.
<br>
* Learn more [here](https://cloud.google.com/bigquery/)
* A quick video is [here](https://www.youtube.com/watch?time_continue=4&v=eyBK9nj-7AA), in case you don't like reading :)

<br>
Below, we will insert our DataFrame into BigQuery for further analysis.


In [None]:
import google.datalab.bigquery as bq

bigquery_dataset_name = 'news_feed_0622'
bigquery_table_name = 'news_entity_sent_headlines'

# Define BigQuery dataset and table
dataset = bq.Dataset(bigquery_dataset_name)
table = bq.Table(bigquery_dataset_name + '.' + bigquery_table_name)

# Create the BigQuery dataset if it does not exist yet
if not dataset.exists():
  print("Dataset Not Found in BigQuery!! Creating One!!")
  dataset.create()

# Infer the schema from the DataFrame and create the table if it does not exist yet
table_schema = bq.Schema.from_data(df)
if not table.exists():
  print("Table Not Found in BigQuery!! Creating One!!")
  table.create(schema=table_schema, overwrite=True)
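
As a rough illustration of what inferring a table schema from a DataFrame involves, the sketch below maps pandas dtypes to BigQuery type names. The helper `sketch_schema` is hypothetical and is not the implementation behind `bq.Schema.from_data`; it only shows the general idea on made-up rows.

```python
import pandas as pd


def sketch_schema(frame):
    """Map each column's pandas dtype to a BigQuery type name (illustrative only)."""
    schema = []
    for column, dtype in frame.dtypes.items():
        if pd.api.types.is_bool_dtype(dtype):
            bq_type = "BOOLEAN"
        elif pd.api.types.is_integer_dtype(dtype):
            bq_type = "INTEGER"
        elif pd.api.types.is_float_dtype(dtype):
            bq_type = "FLOAT"
        elif pd.api.types.is_datetime64_any_dtype(dtype):
            bq_type = "TIMESTAMP"
        else:
            # Text and any other object columns fall back to STRING.
            bq_type = "STRING"
        schema.append({"name": column, "type": bq_type})
    return schema


demo = pd.DataFrame({
    "title": ["Example headline"],
    "sentiment_score_title": [0.3],
})
demo_schema = sketch_schema(demo)
print(demo_schema)
```

Checking the inferred types before creating the table helps catch columns that ended up with an unexpected dtype, such as a score column stored as text.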

## Insert
Insert the DataFrame we created into BigQuery.

In [None]:
table.insert(df)

<br><br>
## Plotting Queries
You can run SQL queries against BigQuery and plot the results. More examples can be found [here](https://cloud.google.com/bigquery/docs/visualize-datalab).

In [None]:
%%bq query --name section_sentiment_avg 
SELECT section, avg(sentiment_score_title) AS sentiment
## ENTER YOUR OWN Project ID, DataSet Name & Table Name Below 
## FROM `<PROJECT ID>.<DATASET NAME>.<TABLE NAME>`
FROM `ml-workshop-198917.news_feed_0622.news_entity_sent_headlines`
WHERE sentiment_score_title != 0
GROUP BY section
ORDER BY sentiment DESC

In [None]:
%chart columns --data section_sentiment_avg --fields section,sentiment
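
For a quick local cross-check of the aggregation query above, the same computation can be sketched in pandas. The rows below are made up for illustration; in the notebook the input would be `df`.

```python
import pandas as pd

# Illustrative rows; column names match the notebook's DataFrame.
demo = pd.DataFrame({
    "section": ["World", "World", "Tech", "Tech", "Sports"],
    "sentiment_score_title": [0.5, -0.1, 0.0, 0.4, 0.0],
})

section_sentiment = (
    demo[demo["sentiment_score_title"] != 0]       # WHERE sentiment_score_title != 0
    .groupby("section", as_index=False)["sentiment_score_title"]
    .mean()                                         # avg(sentiment_score_title)
    .rename(columns={"sentiment_score_title": "sentiment"})
    .sort_values("sentiment", ascending=False)      # ORDER BY sentiment DESC
    .reset_index(drop=True)
)
print(section_sentiment)
```

Sections whose headlines all scored exactly 0 drop out of the result, just as they do in the SQL version.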

In [None]:
%%bq query
SELECT title, sentiment_score_title AS sentiment
FROM `ml-workshop-198917.news_feed_0622.news_entity_sent_headlines`
WHERE sentiment_score_title != 0 and individuals LIKE "%Trump%"
ORDER BY sentiment

In [None]:
%%bq query
SELECT title, sentiment_score_title AS sentiment
FROM `ml-workshop-198917.news_feed_0622.news_entity_sent_headlines`
WHERE sentiment_score_title != 0 and organizations LIKE "%Netflix%"
ORDER BY sentiment

<br><br>
## BigQuery to DataFrame
Below is an example of how you can convert BigQuery query results into a pandas DataFrame.

In [None]:
query="""
SELECT
  title,
  abstract,
  section
FROM `ml-workshop-198917.news_feed_0622.news_entity_sent_headlines`
"""

import google.datalab.bigquery as bq
df_news = bq.Query(query).execute().result().to_dataframe()
df_news.head()
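
Since the queries in this notebook hard-code a project ID, one option is to build the table reference from variables instead. The helpers `table_ref` and `build_news_query` and the placeholder `my-project` below are hypothetical names introduced for illustration.

```python
def table_ref(project_id, dataset_name, table_name):
    """Build a backtick-quoted, fully qualified BigQuery table reference."""
    return "`{}.{}.{}`".format(project_id, dataset_name, table_name)


def build_news_query(project_id, dataset_name, table_name):
    """Return the title/abstract/section query for the given table."""
    return (
        "SELECT\n"
        "  title,\n"
        "  abstract,\n"
        "  section\n"
        "FROM {}".format(table_ref(project_id, dataset_name, table_name))
    )


# 'my-project' is a placeholder; substitute your own project ID.
news_query = build_news_query(
    "my-project", "news_feed_0622", "news_entity_sent_headlines")
print(news_query)
```

The resulting string can be passed to `bq.Query(...)` in the same way as the literal query above.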

<br> <br>
# Bonus Lab - Analyze real-time tweets with the NLP API & BigQuery

Explore this [tutorial](https://github.com/vcarpenter/google_cloud_machine_learning_api#natural-language-api-bigquery-demo) on streaming real-time tweets from Twitter to the NLP API and saving them in BigQuery for deeper analysis. The codebase can be found [here](https://github.com/vcarpenter/google_cloud_machine_learning_api/tree/master/natural-language).