## Python's requests package

### A. Scraping HTML webpages
Python's requests package allows you to perform HTTP requests.

In [None]:
import requests 
from requests.exceptions import RequestException

#### 1. Perform HTTP Request

We will begin by creating a "url" variable that contains the URL of the webpage we want to retrieve data from. We can then use the "get" function, which performs the HTTP request and returns a requests.Response Python object.

In [None]:
url = 'http://composingprograms.com/shakespeare.txt' 
r = requests.get(url) 
print(type(r))

The "text" attribute returns the text content of the webpage.

In [None]:
# r.text contains the HTTP response body decoded as text
text = r.text
print(text[:200])

In [None]:
# access the response body as bytes (returns binary data)
r.content[:200] 

The details of the HTTP request that was sent are also available, through the r.request attribute.

In [None]:
# The request information is saved as a Python object in r.request: 
print(r.request)

In [None]:
# What were the HTTP request headers? 
request_headers = r.request.headers
print(request_headers)

In [None]:
request_headers['User-Agent']

Other attributes and methods give more information about the HTTP response, such as the status code, status message, response headers, etc.

In [None]:
# Which HTTP status code did we get back from the server? 
print(r.status_code) 

In [None]:
# If the response was successful, no Exception will be raised
# otherwise HTTPError will be raised for certain status codes
r.raise_for_status()

If we made a bad request, the server will answer with an error status code such as 404 or 405, and the above method will raise an HTTPError.

HTTP response codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes, and the codes in each class have their own meaning. For more information on status codes, visit https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
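
The five classes can be told apart by the first digit of the status code. The following is a minimal sketch using only the standard library; the helper name status_class is made up for this illustration, and the class labels follow the grouping used on the MDN page linked above.

```python
from http import HTTPStatus


def status_class(code):
    """Return the name of the response class a status code belongs to."""
    classes = {
        1: "informational",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The first digit of a status code identifies its class
    return classes.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    # HTTPStatus(code).phrase is the standard reason phrase for the code
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

The same helper can be applied to r.status_code from the response above.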

In [None]:
# What is the textual status code? 
print(r.reason) 

In [None]:
# What were the HTTP response headers? 
print(r.headers)

#### 2. Working with URLs with Parameters

URLs may contain a "query string", which carries data that does not fit within a URL's normal hierarchical path structure.

In [None]:
url = 'https://finance.yahoo.com/quote/%5EGSPC/history'
parameters = {'period1':1551648546,
              'period2':1583270946,
              'interval':'1d',
              'filter':'history',
              'frequency':'1d'
             }

# perform HTTP GET request
r = requests.get(url, params=parameters) 

# The final URL (including the encoded query string) and the response body:
print(r.url)
print(r.text)
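
To see how a parameters dictionary ends up in the URL without contacting the server, the request can be prepared but not sent. This is a small sketch that assumes a shortened version of the parameters above; it only inspects the URL that requests would use.

```python
from urllib.parse import urlsplit, parse_qs

import requests

url = 'https://finance.yahoo.com/quote/%5EGSPC/history'
parameters = {'period1': 1551648546, 'period2': 1583270946, 'interval': '1d'}

# Build the request without sending it, so no network access is needed
prepared = requests.Request('GET', url, params=parameters).prepare()
print(prepared.url)

# Split the final URL to recover the query string as a dictionary
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

Everything after the "?" in the printed URL is the query string, with each key=value pair separated by "&".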

Most modern web frameworks also allow us to define "nice looking" URLs that include the parameters in the path of the URL, for example, "/product/307/" instead of "products.html?p=307". Hence, a URL may contain dynamic parts to which the server responds in different ways.


In some circumstances, requests will try to help you out and encode some characters for you:

In [None]:
import requests
url = 'https://finance.yahoo.com/quote/' + ' ^GSPC '
r = requests.get(url)  
print(r.url)
r.text
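
The encoding mentioned above is percent-encoding: characters that are not allowed in a URL are replaced by "%" followed by their hexadecimal byte value. The sketch below reproduces it with the standard library's urllib.parse; it illustrates the encoding itself and is not a claim about exactly how requests handles every character.

```python
from urllib.parse import quote, unquote

symbol = '^GSPC'

# '^' is not a safe URL character, so it is replaced by its hex code
encoded = quote(symbol)
print(encoded)

# unquote reverses the encoding
print(unquote(encoded))

# Spaces are characters too and get encoded as %20
print(quote(' ^GSPC '))
```

Since stray spaces are encoded rather than dropped here, it is usually worth stripping them (for example with str.strip) before building a URL.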

#### 3. Timeout

When making requests to an external server, your program has to wait for the response before moving on. By default, requests will wait indefinitely for the response, so you should almost always specify a timeout duration. You can use the timeout parameter (in seconds) to do so.

In [None]:
import requests
url = 'https://finance.yahoo.com/quote/' + ' ^GSPC '
r = requests.get(url, timeout=1)  
r

You can also set a separate timeout for establishing the connection to the server. Pass a tuple where the first item specifies the time allowed to connect to the server and the second item specifies the time to wait for a response once the connection is established.

Requests also provides exceptions for timeouts, such as requests.exceptions.Timeout, which are raised when one of these limits is exceeded.

In [None]:
import requests
url = 'https://finance.yahoo.com/quote/' + ' ^GSPC '
r = requests.get(url, timeout=(2,5))  
r
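
The cell above does not handle the case where the limit is actually exceeded. One possible way to do that is sketched below. The function name fetch and its get argument are hypothetical; the get argument only exists so the function can be tried with a stand-in instead of a real network call.

```python
import requests
from requests.exceptions import RequestException, Timeout


def fetch(url, timeout=(2, 5), get=requests.get):
    """Return the response, or None if the request timed out or failed."""
    try:
        return get(url, timeout=timeout)
    except Timeout:
        # Raised when the connect or the read limit is exceeded
        print('Request timed out:', url)
    except RequestException as err:
        # Base class for the other errors raised by requests
        print('Request failed:', err)
    return None
```

Calling fetch(url) with the url defined above returns either a Response object or None, so the caller can check the result before using it.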

### B. Working with APIs

#### 1. Authentications

You can also use the requests library with APIs. Usually, an API requires authentication before you can access it. You can provide the credentials in the "auth" argument.

In [None]:
from requests.auth import HTTPBasicAuth

r = requests.get('https://api.yelp.com/v3/businesses/',
#                  auth=('username','password')
                 auth=HTTPBasicAuth('TOKEN', 'ACCESS_KEY' )
                )


Normally, however, you would not copy and paste your username and password into your code. A more secure and common form of authentication for many web APIs is OAuth. The requests-oauthlib package makes it easy to send OAuth1 authenticated requests.

In order to access the Yelp API using OAuth, you will first need to create an app and obtain your private API keys.
Follow the instructions from their API documentation:
https://www.yelp.com/developers/documentation/v3/authentication

In [None]:
from requests_oauthlib import OAuth1

url = 'https://api.yelp.com/v3/businesses/'
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
requests.get(url, auth=auth)
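
Whichever scheme is used, the keys themselves are better kept out of the notebook. A common approach, sketched here under the assumption that the keys are stored in environment variables, is to read them at run time. The function name and the variable names API_TOKEN and API_ACCESS_KEY are made up for this example.

```python
import os


def load_credentials(token_var='API_TOKEN', key_var='API_ACCESS_KEY'):
    """Read credentials from environment variables instead of the source code."""
    token = os.environ.get(token_var)
    key = os.environ.get(key_var)
    if token is None or key is None:
        # Fail early with a clear message rather than sending empty credentials
        raise RuntimeError(f'Set {token_var} and {key_var} before running')
    return token, key
```

The returned pair can then be passed on, for example as auth=HTTPBasicAuth(*load_credentials()).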

#### 2. Getting to know JSON data

In [None]:
url = 'https://api.coindesk.com/v1/bpi/historical/close'
parameters = {'start':'2011-01-01',
              'end':'2019-09-05'
             }
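r = requests.get(url, params=parameters)
if r.ok:
    # parse the JSON response body into Python objects
    data = r.json()
    print(data)
else:
    # data is not defined if the request failed, so report the status instead
    print(r.status_code, r.reason)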

In [None]:
type(data)

In [None]:
data.keys()

In [None]:
data['bpi']
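
The json method parses the response body into ordinary Python objects, much like json.loads does for a string. The sketch below uses a made-up sample in the same shape as the response above (a "bpi" key mapping dates to prices); the values are invented for illustration and are not real prices.

```python
import json

# Made-up sample in the same shape as the response above
raw = '{"bpi": {"2019-09-01": 100.5, "2019-09-02": 101.25}}'

# json.loads turns JSON objects into dicts and JSON numbers into int/float
data_sample = json.loads(raw)
print(type(data_sample))
print(list(data_sample['bpi'].items()))

# ISO-formatted dates sort chronologically as plain strings
latest = max(data_sample['bpi'])
print(latest, data_sample['bpi'][latest])
```

Once parsed, the data can be handled with the usual dictionary operations, as in the cells that follow.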

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt
from datetime import datetime as dt

price = list(data['bpi'].values())
period = list(data['bpi'].keys())
date = [dt.strptime(i, '%Y-%m-%d') for i in period] 
plt.plot(date, price)
plt.show()