In [1]:
import json
import requests
from pprint import pprint  # prettyprint

Now we fetch the python.org home page by calling the `requests.get()` method.

In [2]:
r = requests.get('https://www.python.org/')
print(r)

<Response [200]>


The status code is a three-digit number the server sends back to indicate the outcome of the request. The first digit gives the class:

2xx: Success (e.g. 200 OK)<br>
3xx: Redirection<br>
4xx: Client Error (e.g. 404 Not Found)<br>
5xx: Server Error<br>

Normally a code below 400 is regarded as a successful request; `r.ok` is `True` in that case.

In [3]:
print(r.status_code)

200
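
The status classes above can be sketched with the standard library's `http.HTTPStatus`. The helper name `describe_status` and the `STATUS_CLASSES` table are illustrative assumptions, not part of `requests`; any `r.status_code` can be passed in.

```python
from http import HTTPStatus

# Map the first digit of a status code to its class (illustrative table).
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (Client Error)'."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered HTTP status codes.
        phrase = "Unregistered code"
    return f"{code} {phrase} ({status_class})"


for code in (200, 301, 404, 500):
    print(describe_status(code))
```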


Use `dir(r)` to list the attributes of the response object; the comprehension below keeps only the public ones.

In [4]:
r_attribs = [c for c in dir(r) if not c.startswith("_")]

## Content And Text
We can use `r.content` to read the body of the response as raw bytes, or `r.text` to read it as a decoded string.

In [5]:
print(r.content[:100])

b'<!doctype html>\n<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->\n<!-'


In [6]:
print(r.text[:500])

<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->
<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->

<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">

    <link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jqu
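
The difference between bytes and text can be seen without a network call. In this minimal sketch the `raw` value is a hand-made stand-in for `r.content`, and the decode step is roughly what `r.text` does using the encoding it detects; the actual detection logic in `requests` is more involved.

```python
# A short byte string standing in for r.content (illustrative, not fetched).
raw = "<title>Café</title>".encode("utf-8")

# bytes: the length counts bytes, and "é" takes two of them in UTF-8.
print(type(raw).__name__, len(raw))

# str: decoding turns the bytes into characters, as r.text would.
text = raw.decode("utf-8")
print(type(text).__name__, len(text))
```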


## Scraping Images From Website

It is basically the same as requesting info in the previous blocks. To download an image, however, we have to write the response bytes (`r2.content`) to a file opened in `wb` (write binary) mode under a chosen filename, `No_color_2.png` in this example.

In [7]:
r2 = requests.get('https://s.yimg.com/ny/api/res/1.2/yhOnw.7ddMaRneVo.JBlvw--/YXBwaWQ9aGlnaGxhbmRlcjt3PTk2MDtjZj13ZWJw/https://s.yimg.com/os/creatr-uploaded-images/2021-09/4fd0c420-1c3c-11ec-b1ff-ea1868351416')

with open("No_color_2.png", "wb") as f:
    f.write(r2.content)
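
Since the file extension is chosen by hand, it can help to check that the downloaded bytes really are a PNG before saving them as one. The sketch below assumes a hypothetical `looks_like_png` helper and uses a hand-made `sample` in place of `r2.content`; it relies only on the fixed 8-byte signature that starts every PNG file.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def looks_like_png(data: bytes) -> bool:
    """Return True if the data starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def save_bytes(data: bytes, path: str) -> int:
    """Write data in binary mode and return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(data)


# Stand-in for r2.content (not a real image, just the signature plus filler).
sample = PNG_SIGNATURE + b"rest-of-image"

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example_image.png")
    if looks_like_png(sample):
        written = save_bytes(sample, target)
        size_on_disk = os.path.getsize(target)
        print(written, size_on_disk)
```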

## Headers

Headers carry metadata about the response. `r.headers` is a dictionary-like object with case-insensitive keys, and `r.headers.items()` yields its (key, value) pairs.

In [8]:
for k, v in r.headers.items():
    print(k, ":", v)

Connection : keep-alive
Content-Length : 49890
Server : nginx
Content-Type : text/html; charset=utf-8
X-Frame-Options : DENY
Via : 1.1 vegur, 1.1 varnish, 1.1 varnish
Accept-Ranges : bytes
Date : Fri, 08 Jul 2022 09:45:39 GMT
Age : 3286
X-Served-By : cache-iad-kiad7000025-IAD, cache-lcy19257-LCY
X-Cache : HIT, HIT
X-Cache-Hits : 395, 5
X-Timer : S1657273540.898550,VS0,VE0
Vary : Cookie
Strict-Transport-Security : max-age=63072000; includeSubDomains
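
Individual headers can also be looked up by name. `r.headers` is a `requests.structures.CaseInsensitiveDict`, so the sketch below rebuilds a few of the headers printed above by hand (no request is sent) to show that lookups ignore case and that values are plain strings.

```python
from requests.structures import CaseInsensitiveDict

# A few of the headers printed above, rebuilt by hand for illustration.
headers = CaseInsensitiveDict({
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": "49890",
    "Server": "nginx",
})

print(headers["content-type"])           # lookup ignores case
print(int(headers["CONTENT-LENGTH"]))    # values are strings; convert as needed
print(headers.get("X-Missing", "n/a"))   # .get() works as on a dict
```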


## Query Parametrization
Just as parametrized queries guard against injection in SQL, we can pass the parameters as a dictionary, or as a list of (key, value) tuples if you wish, and let `requests` encode them and append them to the URL of the designated website.

The format would be `requests.get("YourWebsite.com", params=YourParams)`.

We can also check the URL of the parametrized query by using `r.url`.

In [9]:
# Setting parameters
parameter = {'page': 5, 'count': 10}
r = requests.get('http://httpbin.org/', params=parameter)
print(r.text[:600])

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>httpbin.org</title>
    <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,700|Source+Code+Pro:300,600|Titillium+Web:400,600,700"
        rel="stylesheet">
    <link rel="stylesheet" type="text/css" href="/flasgger_static/swagger-ui.css">
    <link rel="icon" type="image/png" href="/static/favicon.ico" sizes="64x64 32x32 16x16" />
    <style>
        html {
            box-sizing: border-box;
            overflow: -moz-scrollbars-vertical;
            overflow-y: scroll;
        }

        *,
        


In [10]:
r.url

'http://httpbin.org/?page=5&count=10'
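
The encoded URL can be inspected before anything is sent by preparing the request instead of issuing it. This is a minimal sketch; the helper name `build_url` and the `tag` and `q` parameters are assumptions made for illustration.

```python
import requests


def build_url(url, params):
    """Return the final URL requests would use, without contacting the server."""
    return requests.Request("GET", url, params=params).prepare().url


# Same parameters as the cell above.
print(build_url("http://httpbin.org/", {"page": 5, "count": 10}))

# A list of (key, value) tuples allows the same key to appear more than once.
print(build_url("http://httpbin.org/get", [("tag", "a"), ("tag", "b")]))

# Special characters are percent-encoded, so they cannot alter the query structure.
print(build_url("http://httpbin.org/get", {"q": "a b&c"}))
```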

## Submitting Data To Server

### Post

Similar to the previous section, we need a collection of key-value pairs to determine what to submit for the server to handle. This time, however, the values travel in the request body rather than in the URL, so we use `requests.post("YourWebsite.com", data=YourData)` to submit them.

In [11]:
# POST request
param = { 'Name':'Jhon', 'Email': 'John@mail.com'}
r = requests.post('http://httpbin.org/post', data=param)

# The response body is JSON, so parse it with json() instead of reading r.text.
pprint(r.json())

{'args': {},
 'data': '',
 'files': {},
 'form': {'Email': 'John@mail.com', 'Name': 'Jhon'},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br',
             'Content-Length': '31',
             'Content-Type': 'application/x-www-form-urlencoded',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.27.1',
             'X-Amzn-Trace-Id': 'Root=1-62c7fcc4-486aeee81e2258c971c9cade'},
 'json': None,
 'origin': '31.205.121.71',
 'url': 'http://httpbin.org/post'}
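
The `form` and `Content-Type` entries in the output above follow from passing `data=`. As a hedged sketch, preparing the same payload with `data=` and with `json=` (without sending either) shows how the body and header differ; the variable names `form_req` and `json_req` are illustrative.

```python
import json

import requests

payload = {"Name": "Jhon", "Email": "John@mail.com"}
url = "http://httpbin.org/post"

# Prepare both requests locally; nothing is sent over the network.
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

# data= produces a URL-encoded form body.
print(form_req.headers["Content-Type"], form_req.body)

# json= serializes the payload as a JSON document instead.
print(json_req.headers["Content-Type"], json_req.body)
```

With `json=`, httpbin would report the payload under `json` rather than `form`.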


### Put

`requests.put()` and `requests.post()` are called the same way. With PUT, we send new data to the server to replace the existing data at the given URL.

In [14]:
r = requests.put('https://httpbin.org/put', data ={'name':'abcd'})
print(r) 
pprint(r.json())

<Response [200]>
{'args': {},
 'data': '',
 'files': {},
 'form': {'name': 'abcd'},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br',
             'Content-Length': '9',
             'Content-Type': 'application/x-www-form-urlencoded',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.27.1',
             'X-Amzn-Trace-Id': 'Root=1-62c7fd55-5f95081d18f7a76c2636aeed'},
 'json': None,
 'origin': '31.205.121.71',
 'url': 'https://httpbin.org/put'}


## Delete
`requests.delete("YourWebsite.com", data=YourData)` asks the server to delete the resource at the given URL. Three status codes commonly signal a successful DELETE:

1. 200: [OK] The resource has been removed and the response body describes the result.
2. 202: [Accepted] The request has been accepted but not yet enacted.
3. 204: [No Content] The resource has been removed and the response carries no body.

Note that even though the request has been sent to the server, the action may not always be executed.

In [19]:
r = requests.delete('https://httpbin.org/delete', data ={'name':'abcd'}) 
print(r) 
pprint(r.json())

<Response [200]>
{'args': {},
 'data': '',
 'files': {},
 'form': {'name': 'abcd'},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br',
             'Content-Length': '9',
             'Content-Type': 'application/x-www-form-urlencoded',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.27.1',
             'X-Amzn-Trace-Id': 'Root=1-62c7feb7-055f2e493f04dbec1c34c0e0'},
 'json': None,
 'origin': '31.205.121.71',
 'url': 'https://httpbin.org/delete'}


## Head

Sometimes we want only the metadata rather than the whole content returned by `requests.get()`. `requests.head()` retrieves only the headers of the given URL. You can see that `r.text` is empty in the third line of code.

In [28]:
r = requests.head('https://httpbin.org/') 
print(r.headers) 
print("The content of the url with head command is: ", r.text)

{'Date': 'Fri, 08 Jul 2022 09:59:46 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '9593', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
The content of the url with head command is:  
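
One use for a HEAD request is deciding whether a body is worth downloading at all. The sketch below assumes a hypothetical `small_enough` helper and a hand-copied subset of the headers printed above; in practice the mapping would come from `requests.head(url).headers`.

```python
def small_enough(headers, max_bytes):
    """Decide from Content-Length whether a body is within the size limit."""
    length = headers.get("Content-Length")
    if length is None:
        # Header absent: the size is unknown, so be conservative.
        return False
    return int(length) <= max_bytes


# Headers copied by hand from the HEAD response above.
head_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": "9593",
}

print(small_enough(head_headers, max_bytes=10_000))
print(small_enough(head_headers, max_bytes=1_000))
```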


## Authentication

Authentication tells the server who you are. To use it, pass `auth=("accountname", "password")` as an argument to any request sent to the server.

### Correct Combination of Credentials

In [31]:
r= requests.get('http://httpbin.org/basic-auth/abcd/efgh', auth=('abcd','efgh'))
print(r)
print(r.text)

<Response [200]>
{
  "authenticated": true, 
  "user": "abcd"
}



### Wrong Credentials

An incorrect combination of credentials leads to status code 401 (Unauthorized): the server does not recognize the client and rejects the request.

In [32]:
r= requests.get('http://httpbin.org/basic-auth/abcd/efgh', auth=('abcdfgh','efgh'))
print(r)

<Response [401]>
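
Passing a `(username, password)` tuple selects HTTP Basic authentication, which sends the credentials in an `Authorization` header. The following is a sketch of that scheme using only the standard library, not the code `requests` itself runs; the helper name `basic_auth_header` is an assumption.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("abcd", "efgh")
print(header)

# Base64 is an encoding, not encryption: anyone who sees the header can reverse it.
print(base64.b64decode(header.split(" ", 1)[1]))
```

Because the token is trivially reversible, Basic authentication is best used over `https://` rather than plain `http://`.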


## Timeout
The `timeout` argument limits how long the client waits for the server, either while connecting or while reading the response. Without it, a request can wait indefinitely.

The usage is as follows:<br>
    &emsp;1\. `requests.get("YourWebsite.com", timeout=5)` sets both the connect and the read timeout to 5 seconds.<br>
    &emsp;2\. `requests.get("YourWebsite.com", timeout=(5,10))` sets the connect timeout to 5 seconds and the read timeout to 10 seconds.

In [35]:
r=requests.get('http://httpbin.org/basic-auth/abcd/efgh', timeout=(3,7))
print(r)

<Response [401]>
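
When a timeout expires, `requests` raises an exception rather than returning a response, so the call is usually wrapped in `try`/`except`. In this minimal sketch the names `status_or_none` and `always_slow` are hypothetical, and the `get` parameter exists only so the timeout path can be exercised without a network.

```python
import requests


def status_or_none(url, timeout=(3, 7), get=requests.get):
    """Return the status code, or None if the request times out.

    `get` is injectable so the function can be tried without a real server.
    """
    try:
        return get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None


def always_slow(url, timeout):
    """Stand-in for requests.get that simulates a server that never answers."""
    raise requests.exceptions.ConnectTimeout(f"no connection within {timeout[0]}s")


print(status_or_none("http://httpbin.org/", get=always_slow))
```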
