# Using the Requests library
---
**Author**: Marko Bajec

**Last update**: 24.2.2019

**Description**: <code>Requests</code> is a high-level HTTP library for Python. It is well suited to fetching pages because it supports many useful features, such as *Keep-Alive & Connection Pooling*, *Sessions with Cookie Persistence*, *Proxy Handling*, *Connection Timeouts*, etc.

This notebook shows a few examples of using <code>Requests</code> to fetch pages.

**Official web page:** http://docs.python-requests.org/. For more details check [here](http://www.python-requests.org/en/latest/api/#classes).

---
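
The session feature mentioned in the description can be sketched without touching the network. This is a minimal illustration, not part of the original notebook: the helper `make_session` and the User-Agent string `example-crawler/0.1` are assumptions made up for the example. It shows how settings stored on a <code>requests.Session</code> are merged into every request prepared through it.

```python
import requests


def make_session(user_agent="example-crawler/0.1"):
    """Return a Session that sends the same User-Agent with every request.

    Both the helper name and the default User-Agent are hypothetical.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()

# prepare_request() merges the session settings into the request
# without sending anything over the network.
prepared = session.prepare_request(requests.Request("GET", "http://github.com"))
print(prepared.headers["User-Agent"])

# To actually send a request through the session (network access needed):
# response = session.get("http://github.com", timeout=10)
```

Requests sent through the same session also share cookies and the underlying connection pool, which is usually what a crawler visiting many pages of one site wants.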

In [None]:
import requests
# url samples
url1 = 'http://github.com'
url2 = 'http://www.times.si'
url3 = 'http://www.delo.si'
url4 = 'http://dev.vitabits.org'
url5 = 'http://en.knu.ac.kr/main/main.htm'

### Simple fetch using an HTTP GET request

In [None]:
response = requests.get(url2)
print('status code:', response.status_code)
print('url:', response.url)
print('content:', response.text)
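
Printing the whole body can flood the output for large pages. As a possible alternative, the sketch below collects a few fields of the response and only a short preview of the text. The helper name `summarize` and the `preview_chars` parameter are assumptions introduced for illustration.

```python
import requests


def summarize(response, preview_chars=200):
    """Collect a few response fields and a short preview of the body."""
    return {
        "status": response.status_code,
        "content_type": response.headers.get("Content-Type"),
        "encoding": response.encoding,
        "preview": response.text[:preview_chars],
    }


# Usage, assuming network access:
# print(summarize(requests.get('http://www.times.si', timeout=10)))
```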

### Identifying redirections

In [None]:
# verify=False turns off TLS certificate verification; use it only for testing
response = requests.get(url3, verify=False, allow_redirects=True, timeout=50)
print('status code:', response.status_code)
print('starting url:', url3)
print('ending url:', response.url)
print('history:', response.history)
print('headers:', response.headers)
# Note the history attribute. If it is not empty, it lists the responses received before we got to the last URL,
# in our case https://www.delo.si/. Remember that we requested the insecure http://www.delo.si and not https://www.delo.si.
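
To make the redirect path easier to read, the entries of `response.history` can be flattened into a list of hops. This is a small sketch; the helper name `redirect_chain` is an assumption, not something the library provides.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Usage, assuming network access:
# for status, url in redirect_chain(requests.get('http://www.delo.si', timeout=10)):
#     print(status, url)
```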

### Blocking redirections

In [None]:
response = requests.get(url3, verify=False, allow_redirects=False, timeout=50)
print('status code:', response.status_code)
print('url:', response.url)
print('history:', response.history)
print('headers:', response.headers)
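
When redirects are blocked, the target of a redirect is still available in the `Location` header. The sketch below resolves it to an absolute URL so that a crawler could decide for itself whether to follow it. The helper name `redirect_target` is hypothetical.

```python
from urllib.parse import urljoin

import requests


def redirect_target(response):
    """Return the absolute URL a redirect response points to, or None."""
    # is_redirect is True for a redirect status code with a Location header
    if not response.is_redirect:
        return None
    # Location may be relative, so resolve it against the requested URL
    return urljoin(response.url, response.headers["Location"])


# Usage, assuming network access:
# response = requests.get('http://www.delo.si', allow_redirects=False, timeout=10)
# print(redirect_target(response))
```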

### HTTP HEAD request

In [None]:
response = requests.head(url2)
print('status code:', response.status_code)
print('url:', response.url)
print('headers:', response.headers)
print('text:', response.text)
# Note that since we made an HTTP HEAD request, the response.text attribute is empty.
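
One possible use of a HEAD request in a crawler is to inspect the `Content-Type` header before downloading the body. The following is a sketch under that assumption; the helper name `looks_like_html` is made up for the example. Note that `requests.head` does not follow redirects unless `allow_redirects=True` is passed.

```python
import requests


def looks_like_html(response):
    """Return True if the Content-Type header announces an HTML document."""
    content_type = response.headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing
    return content_type.split(";")[0].strip().lower() == "text/html"


# Usage, assuming network access:
# head = requests.head('http://www.times.si', allow_redirects=True, timeout=10)
# if looks_like_html(head):
#     page = requests.get(head.url, timeout=10)
```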

### Handling exceptions
**Error and exception handling** is of the utmost importance for crawlers, which need to be robust in order to visit a large portion of the web. <code>Requests</code> raises several types of exceptions that a crawler can catch.

In [None]:
try:
    response = requests.get(url5, timeout=1)
    print(response.url)
    print(response.status_code)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
# Subclasses must be caught before their parents: ConnectTimeout derives from
# both ConnectionError and Timeout, and ReadTimeout derives from Timeout.
except requests.ConnectTimeout:
    print('The request timed out while trying to connect to the remote server.')
except requests.ReadTimeout:
    print('The server did not send any data in the allotted amount of time.')
except requests.Timeout:
    print('The request timed out.')
except requests.ConnectionError:
    print('A Connection error occurred.')
except requests.HTTPError:
    print('An HTTP error occurred.')
except requests.URLRequired:
    print('A valid URL is required to make a request.')
except requests.TooManyRedirects:
    print('Too many redirects.')
except requests.RequestException as e:
    print(e)
except Exception:
    print('Unknown error occurred!')
    raise
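
The order of the `except` clauses matters because some exception classes inherit from others. The sketch below checks two of these relationships and maps an exception to a label, testing the most specific classes first. The helper name `classify` and its labels are assumptions made for illustration.

```python
import requests


def classify(exc):
    """Map an exception to a short label, checking the most specific classes first."""
    checks = [
        (requests.ConnectTimeout, "connect timeout"),
        (requests.ReadTimeout, "read timeout"),
        (requests.Timeout, "timeout"),
        (requests.ConnectionError, "connection error"),
        (requests.HTTPError, "http error"),
        (requests.TooManyRedirects, "too many redirects"),
        (requests.RequestException, "other requests error"),
    ]
    for exc_type, label in checks:
        if isinstance(exc, exc_type):
            return label
    return "not a requests error"


# ConnectTimeout inherits from both ConnectionError and Timeout
print(issubclass(requests.ConnectTimeout, requests.ConnectionError))
print(issubclass(requests.ConnectTimeout, requests.Timeout))
print(classify(requests.ConnectTimeout()))
```

If `requests.ConnectionError` were checked before `requests.ConnectTimeout`, a connect timeout would be reported as a generic connection error.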

### Basic authentication
<code>HTTP Basic Auth</code> is a very common authentication mechanism for web services. <code>Requests</code> supports it straight out of the box.

In [None]:
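# Note: GitHub's API no longer accepts account passwords for Basic auth; supply a personal access token as the password.
response = requests.get('https://api.github.com/user', auth=('enter username here', 'enter password here'))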
print(response.url)
print(response.status_code)
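
To see what the `auth` tuple turns into, a request can be prepared without being sent. This is a sketch with placeholder credentials (`user` and `secret` are made up). It shows that Basic auth only Base64-encodes the credentials, which is why it should be used over HTTPS.

```python
import base64

import requests

# Build the request without sending it; the credentials are placeholders
prepared = requests.Request(
    "GET", "https://api.github.com/user", auth=("user", "secret")
).prepare()

# The header has the form "Basic <base64 of username:password>"
scheme, token = prepared.headers["Authorization"].split(" ", 1)
decoded = base64.b64decode(token).decode("utf-8")
print(scheme)
print(decoded)
```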

In [None]:
url = 'https://moji.kd-skladi.net/#/index'
response = requests.get(url, allow_redirects=False, auth=('enter username here', 'enter password here'))
print(response.url)
print(response.status_code)
print(response.headers)
print(response.text)