# Pygooglenews Tutorial

I'm using a conda environment with Python version 3.8.0. To install pygooglenews, run

`pip install pygooglenews --upgrade`

If you're having issues with the `feedparser` dependency, run 

`conda install feedparser==5.2.1 --no-cache-dir` 

and then rerun the pygooglenews installation command. This notebook is a tutorial on how to use pygooglenews, written while I get used to the library myself.

In [1]:
from pygooglenews import GoogleNews
import pandas as pd
import datetime

In [2]:
# Initialize a GoogleNews instance.
gn = GoogleNews(lang="en", country="US")

This package uses the Google News RSS feed, which is an XML page, and relies heavily on the Feedparser package to parse it. The library's basic functionality is derived directly from the RSS feed.
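To get a feel for what "an XML page" means here, this is a minimal sketch that parses a small hand-written RSS document with the standard library. The sample feed and the `parse_items` helper are made up for illustration; this is not how pygooglenews or Feedparser work internally, and Feedparser handles far more cases than this.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (illustrative, not real Google News data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <pubDate>Thu, 20 Jul 2023 20:03:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <pubDate>Fri, 21 Jul 2023 08:15:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_items(rss_text):
    """Return one dict per <item> element found in the RSS text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        })
    return items


items = parse_items(SAMPLE_RSS)
for row in items:
    print(row["published"], "|", row["title"])
```

Each `<item>` in the feed corresponds to one article, which is roughly what ends up in the `entries` list that pygooglenews returns.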

## Top Stories

`top_news()` returns the top stories for the country and language set on the `GoogleNews` instance. The returned object is a dictionary with two keys: `feed`, of type FeedParserDict, and `entries`, a list of the articles found with all their data parsed.

In [None]:
top = gn.top_news(proxies=None, scraping_bee=None)
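Since the result is a dictionary with an `entries` list, a small helper can pull out the fields I care about. The sketch below assumes each entry has `title`, `published`, and `link` keys. `summarize_entries` is a hypothetical helper, and `sample_result` is placeholder data standing in for a real result so that the cell runs offline.

```python
def summarize_entries(result, limit=5):
    """Return (title, published, link) tuples for the first `limit` entries."""
    rows = []
    for entry in result["entries"][:limit]:
        # .get() returns None instead of raising if a field is missing.
        rows.append((entry.get("title"), entry.get("published"), entry.get("link")))
    return rows


# Placeholder data shaped like the returned dictionary (not real headlines).
sample_result = {
    "feed": {"title": "Placeholder feed title"},
    "entries": [
        {"title": "Headline A - Source One",
         "published": "Thu, 20 Jul 2023 20:03:00 GMT",
         "link": "https://example.com/a"},
        {"title": "Headline B - Source Two",
         "published": "Thu, 20 Jul 2023 21:10:00 GMT",
         "link": "https://example.com/b"},
    ],
}

for title, published, link in summarize_entries(sample_result):
    print(published, "|", title)
```

With a live result, `summarize_entries(top)` should work the same way, and the list of tuples can be passed to `pd.DataFrame` with column names if a table is more convenient.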

## Stories by Topic

The returned dictionary contains `feed` (FeedParserDict) and `entries`, a list of the articles found with all their data parsed. Accepted topics are:
- WORLD
- NATION
- BUSINESS
- TECHNOLOGY
- ENTERTAINMENT
- SCIENCE
- SPORTS
- HEALTH

In [10]:
business = gn.topic_headlines(topic="BUSINESS", proxies=None, scraping_bee=None)
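To avoid typos in topic names, the eight topics listed above can be checked before calling `topic_headlines()`. This is a minimal sketch; `ACCEPTED_TOPICS` and `normalize_topic` are hypothetical names of my own, not part of pygooglenews, and I have not checked how the library itself treats unknown topics.

```python
# The topic names listed in this section.
ACCEPTED_TOPICS = (
    "WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
    "ENTERTAINMENT", "SCIENCE", "SPORTS", "HEALTH",
)


def normalize_topic(topic):
    """Upper-case and validate a topic name, raising ValueError if unknown."""
    cleaned = topic.strip().upper()
    if cleaned not in ACCEPTED_TOPICS:
        raise ValueError(
            f"Unknown topic {topic!r}; expected one of {', '.join(ACCEPTED_TOPICS)}"
        )
    return cleaned


print(normalize_topic(" business "))  # BUSINESS
```

The validated value can then be passed on, for example `gn.topic_headlines(topic=normalize_topic("business"))`.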

## Stories by Geolocation

Returns a dictionary formatted like the previous two.

In [49]:
ny = gn.geo_headlines("new york", proxies=None, scraping_bee=None)

## Stories by Query

Returns a similarly formatted dictionary.

In [None]:
# This raises Exception "Could not parse your date."
# "bad escape \d at position 7."
apple = gn.search(query="APPL", helper=True, when=None, from_="2005-01-01", 
                  to_="2005-01-01", proxies=None, scraping_bee=None)

In [4]:
# Parsing the date only changes the query; this works.
apple = gn.search(query="intitle:Apple after: 2012-01-01 before: 2012-01-01", 
                  when=None, helper=True, proxies=None, scraping_bee=True)
print(apple["feed"])

Exception: ScrapingBee status_code: 400 {"errors":{"query":{"custom_google":["If you wish to scrape Google, use the custom_google=True parameter. ! Each requests                     will costs 20 credits !"]}}}
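Because the date range only ends up as extra terms in the query string, one way around the `from_`/`to_` parsing error is to build those terms myself. The sketch below uses a hypothetical helper, `with_date_range`, and writes `after:` and `before:` without a space after the colon, which is the form I understand Google's search operators to take. Treat that as an assumption to verify against live results.

```python
import datetime


def with_date_range(query, after=None, before=None):
    """Append after:/before: terms to a query.

    `after` and `before` are expected to be datetime.date objects,
    whose isoformat() is YYYY-MM-DD.
    """
    parts = [query.strip()]
    if after is not None:
        parts.append("after:" + after.isoformat())
    if before is not None:
        parts.append("before:" + before.isoformat())
    return " ".join(parts)


q = with_date_range(
    "intitle:Apple",
    after=datetime.date(2012, 1, 1),
    before=datetime.date(2012, 1, 31),
)
print(q)  # intitle:Apple after:2012-01-01 before:2012-01-31
```

The resulting string can be passed as `query` to `gn.search()` with `from_` and `to_` left unset.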


A query can include additional operators to narrow the search.

In [43]:
# OR: returns results that contain either APPL or MSFT or both.
a = gn.search(query="APPL OR MSFT")

In [44]:
# Exclude [-]: restricts results to ones that do not contain a word.
b = gn.search(query="APPL -MSFT")

In [46]:
# Include [+]: the word must occur in all results.
c = gn.search(query="APPL +MSFT")

In [26]:
# In title [intitle:]: search for articles that mention word in title
d = gn.search(query="intitle:APPL")
d["entries"]

[{'title': 'Apple Price Prediction: APPL Stock Price Rises, Will It Sustain? - The Coin Republic',
  'title_detail': {'type': 'text/plain',
   'language': None,
   'base': '',
   'value': 'Apple Price Prediction: APPL Stock Price Rises, Will It Sustain? - The Coin Republic'},
  'links': [{'rel': 'alternate',
    'type': 'text/html',
    'href': 'https://news.google.com/rss/articles/CBMiaWh0dHBzOi8vd3d3LnRoZWNvaW5yZXB1YmxpYy5jb20vMjAyMy8wNy8yMC9hcHBsZS1wcmljZS1wcmVkaWN0aW9uLWFwcGwtc3RvY2stcHJpY2UtcmlzZXMtd2lsbC1pdC1zdXN0YWluL9IBAA?oc=5'}],
  'link': 'https://news.google.com/rss/articles/CBMiaWh0dHBzOi8vd3d3LnRoZWNvaW5yZXB1YmxpYy5jb20vMjAyMy8wNy8yMC9hcHBsZS1wcmljZS1wcmVkaWN0aW9uLWFwcGwtc3RvY2stcHJpY2UtcmlzZXMtd2lsbC1pdC1zdXN0YWluL9IBAA?oc=5',
  'id': 'CBMiaWh0dHBzOi8vd3d3LnRoZWNvaW5yZXB1YmxpYy5jb20vMjAyMy8wNy8yMC9hcHBsZS1wcmljZS1wcmVkaWN0aW9uLWFwcGwtc3RvY2stcHJpY2UtcmlzZXMtd2lsbC1pdC1zdXN0YWluL9IBAA',
  'guidislink': False,
  'published': 'Thu, 20 Jul 2023 20:03:00 GMT',
  'published_par

A few more operators are available, such as exact phrase search (for example, searching for a famous quote), allintext, allintitle, inurl, and allinurl.
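The operators shown in the cells above are plain text in the query, so they can be combined with a few small string helpers. All function names below are hypothetical conveniences of my own and not part of pygooglenews. The phrase helper assumes that an exact phrase is written in double quotes.

```python
def any_of(*terms):
    """OR: results may contain any of the terms."""
    return " OR ".join(terms)


def exclude(term):
    """Exclude [-]: results must not contain the term."""
    return "-" + term


def require(term):
    """Include [+]: the term must occur in all results."""
    return "+" + term


def in_title(term):
    """In title [intitle:]: the term must appear in the article title."""
    return "intitle:" + term


def phrase(text):
    """Exact phrase: wrap the text in double quotes (assumed syntax)."""
    return '"' + text + '"'


query = " ".join([in_title("Apple"), exclude("MSFT"), phrase("price prediction")])
print(query)  # intitle:Apple -MSFT "price prediction"
```

The combined string is then passed as usual, for example `gn.search(query=query)`.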

We should find a way to use pygooglenews with ScrapingBee, proxies, or another service that lets us rotate proxies.
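As a starting point for rotation, here is a minimal round-robin sketch. The proxy URLs are hypothetical placeholders, `proxy_cycle` is a helper of my own, and I am assuming that the `proxies` argument accepts a requests-style dictionary keyed by scheme. I have not verified that against a working proxy.

```python
import itertools

# Hypothetical proxy endpoints; replace with real ones.
PROXY_URLS = [
    "http://proxy-one.example:8080",
    "http://proxy-two.example:8080",
    "http://proxy-three.example:8080",
]


def proxy_cycle(urls):
    """Yield proxy dicts in round-robin order, repeating indefinitely."""
    for url in itertools.cycle(urls):
        yield {"http": url, "https": url}


rotation = proxy_cycle(PROXY_URLS)
first = next(rotation)
print(first["https"])  # http://proxy-one.example:8080

# Assumed usage (needs network access and working proxies, so left commented out):
# top = gn.top_news(proxies=next(rotation))
```

Each call to `next(rotation)` hands out the next proxy in the list and wraps around at the end, so consecutive requests go through different endpoints.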