In [3]:
%matplotlib inline
import matplotlib.pyplot as plt

import requests
import json

import os
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup

import boto3

from IPython.display import HTML, display

# Lecture 05 - JSON, XML, HTML, Requests and APIs
by Jan Šíla, Vítek Macháček <br />
March 16th, 2021

### Contents

* Standardized data representation
* JSON
* XML
* Introduction to BeautifulSoup
* Basics of HTML (+ Element Inspection)
* Introduction to Requests (GET vs. POST) and APIs


### Goals:
    
* work with online/real-time data
* acquisition, processing -> results
* today an introduction, next week a practical example

## Microservice architecture

Do one thing and do it well.

![Microservice architecture schema](./img/microservice-architecture.png "Microservice Architecture")


## Data exchange formats - JSON, XML

`Language of the internet`

* You can send/receive a message with (almost) any service

* send .docx -> what if I do not have MS Word?
* we need a simple data format that works on any machine (system agnostic), is general (can express anything) and is editable in basic editors



* More complex than simple tables
* Highly structured - if you don't follow the rules, you are out
* Both sides need to understand the structure
* only data. It does not do anything!
* programming language/machine agnostic
* distributed as text/string (to be precise, as `bytes`)
* parsed to objects
* Can be persisted as files or arrive as data streams from APIs
* Human readable
* Hierarchical
* Can be fetched using standard web APIs

### Purpose

1. Communication 
    * All imaginable communication channels
    * Applications within single server/machine
    * Only transferring of data
    * Both sides need to understand the structure

2. Storing
    * self-descriptive
    * human readable
    * also in DBs - SQL, MongoDB etc.

3. Standardization
    * predictability
    * cooperation
    * spillovers from standardization


### Dimensionality problem

* rich information comes at the cost of data complexity
* to interrelate information, you need high dimensionality (or A LOT of columns)
* Strongly object-oriented


### 1D:
* logs

### 2D: CSVs
* tabular data (like pandas DFs)

### 3+D:
#### XML
* eXtensible Markup Language is a software- and hardware-independent tool for storing and transporting data.
* Officially defined in 1998, but its roots are even older.
* XML was designed to carry data - with focus on what data is
* HTML was designed to display data - with focus on how data looks
* XML tags are not predefined like HTML tags are
* more verbose than JSON
* can have comments (actually a really cool and useful feature!)
* used historically as a transaction format in many areas: 
    * Scientific measurements
    * News information
    * Weather measurements
    * Financial transactions
* An XML parser is necessary to work with it in Python or in JavaScript


### JSON
* JavaScript Object Notation
* often *.json* files
* but also used in the web etc.
* supports standard datatypes - strings, numbers (integers, floats), booleans, null, lists (arrays) and nested objects
* No comments
* More compact, less verbose
* No closing tags
* Used EVERYWHERE, BUT [NOT LICENSED FOR EVIL](https://www.json.org/license.html). If you want to do evil stuff, use XML instead.
* Native in JavaScript and close to native in Python (dictionary)
* Jupyter Notebooks


* common pitfall: properly formatted JSON is different from a Python dict -> check: https://jsonlint.com/
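To make the pitfall concrete, here is a minimal sketch using only the standard `json` module. The `record` dictionary is a made-up example, not data from the lecture. It shows that the printed form of a Python dict (single quotes, `True`, `None`, tuples) is not valid JSON, while `json.dumps` produces the valid equivalent.

```python
import json

# A Python dict literal may use single quotes, True/False/None and tuples.
record = {'name': 'Data Processing in Python', 'active': True, 'room': None, 'slots': (1, 2)}

# json.dumps converts it to valid JSON text: double quotes, true/null, arrays.
as_json = json.dumps(record)
print(as_json)

# The repr of the dict is NOT valid JSON, so json.loads rejects it.
try:
    json.loads(str(record))
    parsed_repr = True
except json.JSONDecodeError:
    parsed_repr = False
print(parsed_repr)

# Round trip: the tuple comes back as a list, because JSON only knows arrays.
restored = json.loads(as_json)
print(restored['slots'])
```

The round trip is therefore not always lossless: tuples become lists, and dictionary keys that are not strings are converted to strings.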

## Schema of XML or JSON
* defines allowed values etc.
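As a rough illustration of what a schema does, the sketch below checks a record against a hand-written description of required keys and types. `COURSE_SCHEMA` and `validate` are hypothetical names invented for this example. This is not the JSON Schema or XSD standard, which are far richer, only the core idea in plain Python.

```python
# Hypothetical, hand-written "schema": required keys and their expected Python types.
COURSE_SCHEMA = {'name': str, 'ECTS': int, 'teachers': list}

def validate(record, schema):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for key, expected_type in schema.items():
        if key not in record:
            problems.append(f'missing key: {key}')
        elif not isinstance(record[key], expected_type):
            problems.append(f'{key} should be {expected_type.__name__}')
    return problems

good = {'name': 'Advanced Econometrics', 'ECTS': 6, 'teachers': [3421, 1234]}
bad = {'name': 'Applied Econometrics', 'ECTS': '6'}  # ECTS is a string, teachers is missing

good_problems = validate(good, COURSE_SCHEMA)
bad_problems = validate(bad, COURSE_SCHEMA)
print(good_problems)
print(bad_problems)
```

Both sides of a communication channel can run the same check, which is what makes the exchanged structure predictable.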

### yml

# JSON

In [64]:
# general representation of a dictionary
# emphasis on accessibility -> key-value ( hash table )
# contains records, lists, or other dictionaries

teachers = [
    {'name':'Jozef Baruník','titles':['doc.','PhDr.','Ph.D.','Bc.','Mgr.'],'ID':1234,'courses':['JEM005','JEM116','JEM059','JEM061']},
    {'name':'Martin Hronec','titles':['Bc.','Mgr.'],'ID':3421,'courses':['JEM005','JEM207']},
    {'name':'Lukáš Vácha'}]

courses = {
    "JEM005":{'name':'Advanced Econometrics','ECTS':6,'teachers':[3421,1234]},
    'JEM207':{'name':'Data Processing in Python','ECTS':5,'teachers':[3421]},
    'JEM116':{'name':'Applied Econometrics','ECTS':6,'teachers':[1234]},
    'JEM059':{'name':'Quantitative Finance I.','ECTS':6,'teachers':[1234,5678]},
    'JEM061':{'name':'Quantitative Finance II.','ECTS':6,'teachers':[1234,5678]}
}
jsondata = {'teachers':teachers,'courses':courses}
jsondata

{'teachers': [{'name': 'Jozef Baruník',
   'titles': ['doc.', 'PhDr.', 'Ph.D.', 'Bc.', 'Mgr.'],
   'ID': 1234,
   'courses': ['JEM005', 'JEM116', 'JEM059', 'JEM061']},
  {'name': 'Martin Hronec',
   'titles': ['Bc.', 'Mgr.'],
   'ID': 3421,
   'courses': ['JEM005', 'JEM207']},
  {'name': 'Lukáš Vácha'}],
 'courses': {'JEM005': {'name': 'Advanced Econometrics',
   'ECTS': 6,
   'teachers': [3421, 1234]},
  'JEM207': {'name': 'Data Processing in Python',
   'ECTS': 5,
   'teachers': [3421]},
  'JEM116': {'name': 'Applied Econometrics', 'ECTS': 6, 'teachers': [1234]},
  'JEM059': {'name': 'Quantitative Finance I.',
   'ECTS': 6,
   'teachers': [1234, 5678]},
  'JEM061': {'name': 'Quantitative Finance II.',
   'ECTS': 6,
   'teachers': [1234, 5678]}}}

https://jsonformatter.curiousconcept.com/

![python and JSON](./img/python_json.png)

In [66]:
js = json.dumps(
    jsondata['courses']
) #json formatted string!




In [70]:
json.loads(js)

{'JEM005': {'name': 'Advanced Econometrics',
  'ECTS': 6,
  'teachers': [3421, 1234]},
 'JEM207': {'name': 'Data Processing in Python',
  'ECTS': 5,
  'teachers': [3421]},
 'JEM116': {'name': 'Applied Econometrics', 'ECTS': 6, 'teachers': [1234]},
 'JEM059': {'name': 'Quantitative Finance I.',
  'ECTS': 6,
  'teachers': [1234, 5678]},
 'JEM061': {'name': 'Quantitative Finance II.',
  'ECTS': 6,
  'teachers': [1234, 5678]}}
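A small sketch of two `json.dumps` options that often matter with Czech names: `ensure_ascii` and `indent`. The `teacher` record is a shortened copy of the data above. Both output strings are valid JSON and decode to the same object.

```python
import json

teacher = {'name': 'Jozef Baruník', 'ID': 1234}

# Default: non-ASCII characters are escaped, the output is still valid JSON.
escaped = json.dumps(teacher)

# ensure_ascii=False keeps the characters readable; indent pretty-prints.
readable = json.dumps(teacher, ensure_ascii=False, indent=2)
print(escaped)
print(readable)

# Both strings decode to the same Python object.
same = json.loads(escaped) == json.loads(readable) == teacher
print(same)
```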

In [72]:
pd.read_json(js)

Unnamed: 0,JEM005,JEM207,JEM116,JEM059,JEM061
name,Advanced Econometrics,Data Processing in Python,Applied Econometrics,Quantitative Finance I.,Quantitative Finance II.
ECTS,6,5,6,6,6
teachers,"[3421, 1234]",[3421],[1234],"[1234, 5678]","[1234, 5678]"


In [73]:
pd.DataFrame(jsondata['courses'])

Unnamed: 0,JEM005,JEM207,JEM116,JEM059,JEM061
name,Advanced Econometrics,Data Processing in Python,Applied Econometrics,Quantitative Finance I.,Quantitative Finance II.
ECTS,6,5,6,6,6
teachers,"[3421, 1234]",[3421],[1234],"[1234, 5678]","[1234, 5678]"


In [74]:
dfc = pd.read_json(json.dumps(jsondata['courses']),orient='index')
dfc

Unnamed: 0,name,ECTS,teachers
JEM005,Advanced Econometrics,6,"[3421, 1234]"
JEM207,Data Processing in Python,5,[3421]
JEM116,Applied Econometrics,6,[1234]
JEM059,Quantitative Finance I.,6,"[1234, 5678]"
JEM061,Quantitative Finance II.,6,"[1234, 5678]"
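The `teachers` column above still holds lists, so the table is not truly two-dimensional. One possible way to flatten it, sketched here in plain Python on a two-course excerpt of the data, is to emit one row per (course, teacher) pair. The column names `course` and `teacher_id` are choices made for this example.

```python
courses = {
    'JEM005': {'name': 'Advanced Econometrics', 'ECTS': 6, 'teachers': [3421, 1234]},
    'JEM207': {'name': 'Data Processing in Python', 'ECTS': 5, 'teachers': [3421]},
}

# One row per (course, teacher) pair: the nested list becomes repeated rows.
rows = [
    {'course': code, 'name': info['name'], 'ECTS': info['ECTS'], 'teacher_id': teacher_id}
    for code, info in courses.items()
    for teacher_id in info['teachers']
]
for row in rows:
    print(row)
```

The resulting list of flat dictionaries can be passed to `pd.DataFrame(rows)`. The price is redundancy: the course name and ECTS are repeated for every teacher.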


In [None]:
# lets come back to this a little later

## GeoJSON

* One standardized data format for transferring geodata
* Plenty of geodata out there
* see for example http://opendata.iprpraha.cz/CUR/OVZ/OVZ_Klima_ZnecOvzdusi_p/WGS_84/OVZ_Klima_ZnecOvzdusi_p.json
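Before downloading the real file, it may help to see the shape of a GeoJSON document. The snippet below is a hand-made example, not taken from the Prague dataset; only the property names `OBJECTID` and `GRIDVALUE` mirror it. A `FeatureCollection` holds a list of features, each with `properties` and a `geometry` whose coordinates are `[longitude, latitude]` pairs.

```python
import json

# Hand-made example, NOT taken from the Prague dataset; coordinates are [longitude, latitude].
geojson_text = '''
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"OBJECTID": 1, "GRIDVALUE": 2},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[14.40, 50.05], [14.45, 50.05], [14.45, 50.10], [14.40, 50.05]]]}},
    {"type": "Feature",
     "properties": {"OBJECTID": 2, "GRIDVALUE": 4},
     "geometry": {"type": "Point", "coordinates": [14.43, 50.08]}}
  ]
}
'''
collection = json.loads(geojson_text)

# Since GeoJSON is ordinary JSON, plain dict/list access is enough to read it.
grid_values = [f['properties']['GRIDVALUE'] for f in collection['features']]
geometry_types = [f['geometry']['type'] for f in collection['features']]
print(grid_values, geometry_types)
```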

In [75]:
verbose_request = requests.get('http://opendata.iprpraha.cz/CUR/OVZ/OVZ_Klima_ZnecOvzdusi_p/WGS_84/OVZ_Klima_ZnecOvzdusi_p.json')


In [80]:
verbose_request.status_code == 200

True

In [77]:
dir(verbose_request)

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

In [None]:
verbose_request.json()

### Convert to python data-types

In [83]:
d = requests.get('http://opendata.iprpraha.cz/CUR/OVZ/OVZ_Klima_ZnecOvzdusi_p/WGS_84/OVZ_Klima_ZnecOvzdusi_p.json').json()

In [89]:
d['features'][0]['properties']

{'OBJECTID': 1,
 'GRIDVALUE': 2,
 'Shape_Length': 0.12654374175241262,
 'Shape_Area': 0.0005407785136354545}

In [90]:
import branca
import folium

colorscale = branca.colormap.linear.YlOrRd_09.scale(0, 5)

def style_function(feature):
    gridvalue = feature['properties']['GRIDVALUE']
    return {
        'fillOpacity': 0.5,
        'weight': 0,
        'fillColor': colorscale(gridvalue)
    }

m = folium.Map(location=[50.085,14.45],zoom_start=11)
folium.GeoJson('http://opendata.iprpraha.cz/CUR/OVZ/OVZ_Klima_ZnecOvzdusi_p/WGS_84/OVZ_Klima_ZnecOvzdusi_p.json',style_function=style_function).add_to(m)
m

# eXtensible Markup Language (XML)

* elements
* attributes
* tags

### Tag
> <>

### Element

In [None]:
#either
'''<element>content</element>'''

#or self-closing (no content)
'''<element />''';
#<br /> 

### Attributes

In [None]:
'''<element attr="value" />''';

![XML tree structure](./img/xml_tree_structure.png)

```xml
<bookstore>
    <book category="fiction">
        <title lang="ENG">Everyday Italian</title>
        <title lang="CZE">AAaAA</title>
        <author>Giada De Laurentis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
</bookstore>
```

```json
{
    "bookstore":[
        {
            "title":"Everyday Italian",
            "lang":"ENG",
            "author":"Giada de Laurentis",
            "year":2005,
            "price":30
        }
    ]
}
```


Takeaway: JSON and XML are not equivalents and cannot be freely mirrored. Unfortunately.

JSON object keys must be unique, so repeated tags with different attributes (the two `<title>` tags here) need a workaround, perhaps separate keys such as `title_en` and `title_cze`.
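One possible mapping, sketched with the standard-library `xml.etree.ElementTree` parser rather than BeautifulSoup: the repeated `<title>` tags are collected into a nested object keyed by their `lang` attribute. This layout is only one of several reasonable choices, which is exactly why the two formats cannot be mirrored automatically.

```python
import json
import xml.etree.ElementTree as ET

xml_text = '''
<bookstore>
    <book category="fiction">
        <title lang="ENG">Everyday Italian</title>
        <title lang="CZE">AAaAA</title>
        <author>Giada De Laurentis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
</bookstore>
'''
root = ET.fromstring(xml_text.strip())

books = []
for book in root.findall('book'):
    books.append({
        'category': book.get('category'),
        # Repeated <title> tags become one object keyed by the lang attribute.
        'title': {t.get('lang'): t.text for t in book.findall('title')},
        'author': book.findtext('author'),
        # XML carries only text, so numeric types must be converted by hand.
        'year': int(book.findtext('year')),
        'price': float(book.findtext('price')),
    })

print(json.dumps({'bookstore': books}, indent=2))
```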

## Navigation
* Xpath
* CSS selectors 
* **BeautifulSoup**
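For comparison with BeautifulSoup, here is a minimal XPath-style sketch using the standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. The XML is a shortened version of the course data used in this lecture.

```python
import xml.etree.ElementTree as ET

xml_text = '''
<ies_data>
    <courses>
        <course id="JEM207" ects="5" name="Data Processing in Python">
            <teacher-id>3421</teacher-id>
        </course>
        <course id="JEM059" ects="6" name="Quantitative Finance I.">
            <teacher-id>1234</teacher-id>
            <teacher-id>5678</teacher-id>
        </course>
    </courses>
</ies_data>
'''
root = ET.fromstring(xml_text.strip())

# ".//" searches all descendants; "[@id='...']" filters on an attribute value.
jem059 = root.find(".//course[@id='JEM059']")
teacher_ids = [int(t.text) for t in jem059.findall('teacher-id')]

# The same lookup written as a single path expression.
same_ids = [int(t.text) for t in root.findall(".//course[@id='JEM059']/teacher-id")]
print(jem059.get('name'), teacher_ids)
```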

### BeautifulSoup in detail
each BS object represents
* an element
* the position in tree

In [91]:
xml = '''
<?xml version="1.0" encoding="utf-8"?>
<ies_data>
    <courses>
        <course id="JEM005" ects="6" name="Advanced Econometrics">
           <teacher-id>3421</teacher-id>
           <teacher-id>1234</teacher-id>
        </course>
        <course id="JEM207" ects="5" name="Data Processing in Python">
            <teacher-id>3421</teacher-id>
        </course>
        <course id="JEM116" ects="6" name="Applied Econometrics I.">
            <teacher-id>1234</teacher-id>
        </course>
        <course id="JEM059" ects="6" name="Quantitative Finance I.">
            <teacher-id>1234</teacher-id>
            <teacher-id>5678</teacher-id>
        </course>
        <course id="JEM061" ects="6" name="Quantitative Finance II.">
            <teacher-id>1234</teacher-id>
            <teacher-id>5678</teacher-id>
        </course>
    </courses>
    <teachers>
        <teacher teacher-id="3421">
            <name>Martin Hronec</name>
        </teacher>
        <teacher teacher-id="1234">
            <name>Jozef Baruník</name>
        </teacher>
        <teacher teacher-id="5678">
            <name>Lukáš Vácha</name>
        </teacher>
    </teachers>
</ies_data>
'''

#unlike HTML, those tag names are defined by Vitek - no one else 'can' understand them -> flexibility is limited. But same issue with JSON to be fair

soup = BeautifulSoup(xml)

In [None]:
dir(soup)

```find()``` will find the **first** element matching the input

```find_all()``` or ```findAll()``` finds **all** elements matching the input

In [94]:
jem059 = soup.find('course',{'id':'JEM059'}) #looking for a tag, optionally filtered by attributes


In [95]:
jem059

<course ects="6" id="JEM059" name="Quantitative Finance I.">
<teacher-id>1234</teacher-id>
<teacher-id>5678</teacher-id>
</course>

In [99]:
jem059.findAll('teacher-id')

[<teacher-id>1234</teacher-id>, <teacher-id>5678</teacher-id>]

`tag['attr']` returns the value of an attribute of that tag

In [100]:
print(jem059['ects'])
print(jem059['name'])

6
Quantitative Finance I.


In [101]:
soup.findAll('teacher-id')

[<teacher-id>3421</teacher-id>,
 <teacher-id>1234</teacher-id>,
 <teacher-id>3421</teacher-id>,
 <teacher-id>1234</teacher-id>,
 <teacher-id>1234</teacher-id>,
 <teacher-id>5678</teacher-id>,
 <teacher-id>1234</teacher-id>,
 <teacher-id>5678</teacher-id>]

you can also navigate horizontally

In [104]:
jem059.findNext('course').findNext('course')

In [105]:
jem059.findPrevious('course').findPrevious('course')

<course ects="5" id="JEM207" name="Data Processing in Python">
<teacher-id>3421</teacher-id>
</course>

and even upstream!

In [108]:
jem059.parent.parent

<ies_data>
<courses>
<course ects="6" id="JEM005" name="Advanced Econometrics">
<teacher-id>3421</teacher-id>
<teacher-id>1234</teacher-id>
</course>
<course ects="5" id="JEM207" name="Data Processing in Python">
<teacher-id>3421</teacher-id>
</course>
<course ects="6" id="JEM116" name="Applied Econometrics I.">
<teacher-id>1234</teacher-id>
</course>
<course ects="6" id="JEM059" name="Quantitative Finance I.">
<teacher-id>1234</teacher-id>
<teacher-id>5678</teacher-id>
</course>
<course ects="6" id="JEM061" name="Quantitative Finance II.">
<teacher-id>1234</teacher-id>
<teacher-id>5678</teacher-id>
</course>
</courses>
<teachers>
<teacher teacher-id="3421">
<name>Martin Hronec</name>
</teacher>
<teacher teacher-id="1234">
<name>Jozef Baruník</name>
</teacher>
<teacher teacher-id="5678">
<name>Lukáš Vácha</name>
</teacher>
</teachers>
</ies_data>

In [111]:
#get all teacher ids
teacher_ids = [int(t.text) for t in soup.findAll('teacher-id')]
print(teacher_ids)
#get unique
set(teacher_ids)

[3421, 1234, 3421, 1234, 1234, 5678, 1234, 5678]


{1234, 3421, 5678}

In [112]:
course = soup.find('course')
d = {
    'id':course['id'],
    'name':course['name'],
    'ects':course['ects'],
    'teachers':[int(t.text) for t in course.findAll('teacher-id')]
}
d

{'id': 'JEM005',
 'name': 'Advanced Econometrics',
 'ects': '6',
 'teachers': [3421, 1234]}

### Can convert to JSON-like

In [113]:
l = []
for course in soup.findAll('course'):
    d = {'id':course['id'],
         'name':course['name'],
         'ects':course['ects'],
         'teachers':[int(t.text) for t in course.findAll('teacher-id')]}
    l.append(d)
l

[{'id': 'JEM005',
  'name': 'Advanced Econometrics',
  'ects': '6',
  'teachers': [3421, 1234]},
 {'id': 'JEM207',
  'name': 'Data Processing in Python',
  'ects': '5',
  'teachers': [3421]},
 {'id': 'JEM116',
  'name': 'Applied Econometrics I.',
  'ects': '6',
  'teachers': [1234]},
 {'id': 'JEM059',
  'name': 'Quantitative Finance I.',
  'ects': '6',
  'teachers': [1234, 5678]},
 {'id': 'JEM061',
  'name': 'Quantitative Finance II.',
  'ects': '6',
  'teachers': [1234, 5678]}]

### Or in list-comprehension syntax

In [114]:
l = [{
    'id':course['id'],
    'name':course['name'],
    'ects':course['ects'],
    'teachers':[int(t.text) for t in course.findAll('teacher-id')]
} for course in soup.findAll('course')]

In [115]:
l

[{'id': 'JEM005',
  'name': 'Advanced Econometrics',
  'ects': '6',
  'teachers': [3421, 1234]},
 {'id': 'JEM207',
  'name': 'Data Processing in Python',
  'ects': '5',
  'teachers': [3421]},
 {'id': 'JEM116',
  'name': 'Applied Econometrics I.',
  'ects': '6',
  'teachers': [1234]},
 {'id': 'JEM059',
  'name': 'Quantitative Finance I.',
  'ects': '6',
  'teachers': [1234, 5678]},
 {'id': 'JEM061',
  'name': 'Quantitative Finance II.',
  'ects': '6',
  'teachers': [1234, 5678]}]

In [116]:
pd.DataFrame(l)

Unnamed: 0,id,name,ects,teachers
0,JEM005,Advanced Econometrics,6,"[3421, 1234]"
1,JEM207,Data Processing in Python,5,[3421]
2,JEM116,Applied Econometrics I.,6,[1234]
3,JEM059,Quantitative Finance I.,6,"[1234, 5678]"
4,JEM061,Quantitative Finance II.,6,"[1234, 5678]"


# HTML
standard web-page consists of:

* Browser-executed code (`front-end`)
    * HTML "DOM" structure - the website content
        * List of elements that are on website
        * Links to CSS classes, ids and
    * CSS stylesheets - website graphics
    * JavaScripts - website interactivity    

* Server-executed (`back-end`)
    * Server, database, app logic etc.
    * Not available for scraping!
    * May be available as API


## Web-scraping
* client side only
* Navigating HTML DOM by taking advantage of CSS structure

## DOM (Document Object Model):

In [117]:
html = '''
<html>
    <head>
        <title>Sample page</title>
    <script>
        function click_button() {
            alert('Button clicked!')
        }
    </script>
    <style>
        #content div {
            color:black;
        }
        .firstRow {
            background-color:#ddd;
        }

        .normalRow {
            background-color:white;
        }
    </style>
    </head>
    
    <body>
        <div id="header">
            My page header
        </div>
        <div id="table_container">
            <table>
                <tr class="firstRow">
                    <td>name</td>
                    <td>number</td>
                </tr>
                <tr class="normalRow">
                    <td>B</td>
                    <td>2</td>
                </tr>
                <tr class="normalRow">
                    <td>C</td>
                    <td>3</td>
                </tr>
            </table>
        </div>
        <div id="button_container">
            <button id="btn" onclick="click_button()">Click Me!</button>
        </div>
    </body>
</html>
'''
display(HTML(html))

0,1
name,number
B,2
C,3


In [119]:
soup = BeautifulSoup(html,'html')
soup


<html>
<head>
<title>Sample page</title>
<script>
        function click_button() {
            alert('Button clicked!')
        }
    </script>
<style>
        #content div {
            color:black;
        }
        .firstRow {
            background-color:#ddd;
        }

        .normalRow {
            background-color:white;
        }
    </style>
</head>
<body>
<div id="header">
            My page header
        </div>
<div id="table_container">
<table>
<tr class="firstRow">
<td>name</td>
<td>number</td>
</tr>
<tr class="normalRow">
<td>B</td>
<td>2</td>
</tr>
<tr class="normalRow">
<td>C</td>
<td>3</td>
</tr>
</table>
</div>
<div id="button_container">
<button id="btn" onclick="click_button()">Click Me!</button>
</div>
</body></html>

In [120]:
rows = soup.findAll('tr',{'class':'normalRow'})  # attrs must be a dict: {'class': 'normalRow'}
rows

[<tr class="normalRow">
 <td>B</td>
 <td>2</td>
 </tr>,
 <tr class="normalRow">
 <td>C</td>
 <td>3</td>
 </tr>]

In [121]:
d = {}

for row in rows:
    key = row.findAll('td')[0].text
    val = int(row.findAll('td')[1].text)
    d[key] = val
pd.Series(d)

B    2
C    3
dtype: int64

In [123]:
d

{'B': 2, 'C': 3}

In [124]:
pd.Series({
    row.findAll('td')[0].text:int(row.findAll('td')[1].text) 
    for row in BeautifulSoup(html).findAll('tr',{'class':'normalRow'})})

B    2
C    3
dtype: int64

In [51]:
soup = BeautifulSoup(html)

In [52]:
row = soup.findAll('tr',{'class':'normalRow'})[0]

In [53]:
row

In [54]:
row.findAll('td')[0].text

In [55]:
int(row.findAll('td')[1].text)

In [125]:
{row.findAll('td')[0].text:int(row.findAll('td')[1].text) for row in soup.findAll('tr',{'class':'normalRow'})}

{'B': 2, 'C': 3}
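The same table can also be read with CSS selectors, the second navigation style listed earlier. This is a sketch on a shortened copy of the sample page, assuming a BeautifulSoup version that provides `select()`. The names `html_snippet` and `snippet_soup` are used only here, so the `soup` object above is not overwritten.

```python
from bs4 import BeautifulSoup

html_snippet = '''
<table>
  <tr class="firstRow"><td>name</td><td>number</td></tr>
  <tr class="normalRow"><td>B</td><td>2</td></tr>
  <tr class="normalRow"><td>C</td><td>3</td></tr>
</table>
'''
# 'html.parser' is the parser bundled with Python, so no extra parser install is needed.
snippet_soup = BeautifulSoup(html_snippet, 'html.parser')

# "tr.normalRow" selects <tr> elements with the class normalRow.
result = {}
for tr in snippet_soup.select('tr.normalRow'):
    name_cell, number_cell = tr.select('td')
    result[name_cell.text] = int(number_cell.text)
print(result)
```

The selector syntax is the one used in the `<style>` block of the page, which is why inspecting a site's CSS is a good way to find scraping targets.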

## HTML Inspection
http://ies.fsv.cuni.cz/cs/node/51

In [57]:
import requests

# requests and internet communication

* `Client` asks/requests questions (your Jupyter client)
* `Server` replies/serves answers (your Jupyter server)


API = *Application Programming Interface*

very general term! Not only used in web communication

## HTTP requests

The most standard web-server communication channel around

A standard HTTP request/response exchange contains:

* URL 

    * domain
    * route
    * parameters

* Request Type - GET, POST, PUT, DELETE (see below)

* Content specification - 
    * Application/JSON
    * Application/XML
    * text/html
    * text/css

* Content

* Outgoing data (will see below)

* Cookies 

* Status Code:

    * 200 - success
    * 404 - resource does not exist
    * 500 - the server failed during processing your request
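The three status codes listed above belong to larger families defined by their first digit. The sketch below uses the standard-library `http.HTTPStatus` enum to look up the official phrase; the `describe` helper and its family labels are written for this example only.

```python
from http import HTTPStatus

def describe(status_code):
    """Return a short label such as '404 Not Found (client error)'."""
    status = HTTPStatus(status_code)
    if 200 <= status_code < 300:
        family = 'success'
    elif 300 <= status_code < 400:
        family = 'redirect'
    elif 400 <= status_code < 500:
        family = 'client error'
    elif status_code >= 500:
        family = 'server error'
    else:
        family = 'informational'
    return f'{status.value} {status.phrase} ({family})'

labels = [describe(code) for code in (200, 404, 500)]
print(labels)
```

With `requests`, the same number is available as `response.status_code`, and `response.raise_for_status()` raises an exception for the 4xx and 5xx families.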


1) REST API - uses HTTP requests and typically returns JSON

2) SOAP API - uses HTTP requests and returns XML

3) Website - uses HTTP requests and returns a set of HTML, JavaScript, CSS and other files

### When to use?
* whenever more applications need to communicate
* user-friendly interface for complicated tasks - DEEP AI, Google Maps
* Data - Golemio, OpenStreetMaps

### GET request
* fast
* public
* data flow only one direction
* parameters via request address

> https://www.google.com/search?q=how+to+understand+url+parameters&rlz=1C1GCEU_csCZ860CZ860&oq=how+to+understand+url+parameters&aqs=chrome..69i57j33i22i29i30l7.5237j0j4&sourceid=chrome&ie=UTF-8
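To see how parameters sit in the address, the sketch below takes a shortened version of the Google URL above (some parameters were dropped for readability) and splits it with the standard-library `urllib.parse`. It then builds a query string from a dictionary, which is roughly what `requests.get(url, params=...)` does for you.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = 'https://www.google.com/search?q=how+to+understand+url+parameters&sourceid=chrome&ie=UTF-8'

parts = urlsplit(url)
params = parse_qs(parts.query)   # each value is a list, since a key may repeat
print(parts.netloc, parts.path)  # domain and route
print(params)

# Building a query string from a dict (spaces are encoded as '+').
query = urlencode({'q': 'url parameters', 'ie': 'UTF-8'})
print(f'https://www.google.com/search?{query}')
```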


In [None]:
r = requests.get('https://cs.wikipedia.org/wiki/Institut_ekonomick%C3%BDch_studi%C3%AD_Fakulty_soci%C3%A1ln%C3%ADch_v%C4%9Bd_Univerzity_Karlovy')
#plain request - like browser
r.text

In [127]:
soup = BeautifulSoup(r.text,'html')
tags=soup.findAll('span', {'class':"wd"})

In [128]:
tags

[<span class="wd"><span lang="cs">Budova IES FSV UK v Praze v Opletalově ulici</span></span>,
 <span class="wd"><a href="/wiki/Opletalova" title="Opletalova">Opletalova</a>, <a href="/wiki/Praha" title="Praha">Praha</a>, <a href="/wiki/%C4%8Cesko" title="Česko">Česko</a></span>,
 <span class="wd"><span></span><span class="coordinates"><a class="external text" href="//geohack.toolforge.org/geohack.php?language=cs&amp;pagename=Institut+ekonomick%C3%BDch+studi%C3%AD+Fakulty+soci%C3%A1ln%C3%ADch+v%C4%9Bd+Univerzity+Karlovy&amp;params=50.082219444444_N_14.431111111111_E_type:landmark"><span style="white-space:pre">50°4′55,99″ s. š.</span>, <span style="white-space:pre">14°25′52″ v. d.</span></a></span></span>,
 <span class="wd"><span class="sisterproject sisterproject-commons"><span class="sisterproject_image"><a href="/wiki/Wikimedia_Commons" title="Wikimedia Commons"><img alt="Logo Wikimedia Commons" data-file-height="1376" data-file-width="1024" decoding="async" height="16" src="//upload

### POST request
* slow
* private
* both sides can send data
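In a POST request the data travels in the request body rather than in the address. The sketch below builds such a request with the standard-library `urllib.request` and inspects it without sending anything. The URL `https://example.com/api` is a placeholder, not a real endpoint.

```python
import json
import urllib.request

payload = {'greeting': 'hello'}
body = json.dumps(payload).encode('utf-8')   # request bodies travel as bytes

# https://example.com/api is a placeholder URL; nothing is sent until urlopen() is called.
req = urllib.request.Request(
    'https://example.com/api',
    data=body,
    headers={'Content-Type': 'application/json'},
    method='POST',
)
print(req.get_method())
print(req.get_header('Content-type'))  # urllib stores header names in capitalized form
print(req.data)
```

With `requests`, passing `json=payload` to `requests.post` serializes the dictionary for you, whereas `data=json.dumps(payload)` sends the string as-is.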

In [139]:
r=requests.post('https://instagram.com', data=json.dumps({'greeting':'hello'}))

In [140]:
display(HTML(r.text))

In [130]:
?requests.post

Signature: requests.post(url, data=None, json=None, **kwargs)
Docstring:
Sends a POST request.

:param url: URL for the new :class:`Request` object.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
    object to send in the body of the :class:`Request`.
:param json: (optional) json data to send in the body of the :class:`Request`.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:return: :class:`Response <Response>` object
:rtype: requests.Response
File:      /usr/local/lib/python3.7/site-packages/requests/api.py
Type:      function


In [None]:
r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=IBM&apikey=demo")
r.json()

## Static pages x Dynamic pages x JavaScript-rendered pages

### Static

* pages that do not get updated instantly
* all information necessary for rendering a website is available after entering the URL
* It may ask the database, but the output is stable.
* all parameters within the address!
* Typical example:
    
### JavaScript rendered: 
* De facto static, but you cannot take advantage of the HTML/CSS structure

### Dynamic content
* webpage instantly communicates with the webserver and the database
* 
* solution -> Selenium!

### Is this website static or dynamic?

1. Facebook
2. Sreality.cz
3. IES website



## How to choose a data source for a project

You need to know in advance what data you will download:

1. full or satisfactory access to API
2. the web-page is parsable (prefer not too much javascript)
3. plan to generate all requests

# APIs Example
### Get wiki data using GET

In [None]:
#if time, return to geodata

In [None]:
response = requests.get('https://en.wikipedia.org/wiki/Charles_University')
soup = BeautifulSoup(response.text)
div = soup.find('div',{'id':'mw-content-text'})  # CSS selector from the browser inspector: #mw-content-text > div > p:nth-child(10)
article = ' '.join([p.text for p in div.find_all('p')])
print(article)

# Bonus example:

<img src="http://ies.fsv.cuni.cz/default/file/get/id/31996" height="500" width="300">

In [142]:
client=boto3.client('rekognition')
with open('/notebooks/05_htm_xml_json/img/iespic.jpeg','rb') as f:
    response = client.recognize_celebrities(Image={'Bytes': f.read()})

In [143]:
response

{'CelebrityFaces': [],
 'UnrecognizedFaces': [{'BoundingBox': {'Width': 0.3861386775970459,
    'Height': 0.3658880293369293,
    'Left': 0.2979205846786499,
    'Top': 0.18226316571235657},
   'Confidence': 99.99385070800781,
   'Landmarks': [{'Type': 'mouthLeft',
     'X': 0.41166001558303833,
     'Y': 0.4617924392223358},
    {'Type': 'eyeRight', 'X': 0.591668426990509, 'Y': 0.3395422399044037},
    {'Type': 'mouthRight', 'X': 0.5646594762802124, 'Y': 0.4696613550186157},
    {'Type': 'eyeLeft', 'X': 0.40747493505477905, 'Y': 0.3298698663711548},
    {'Type': 'nose', 'X': 0.49969977140426636, 'Y': 0.39484962821006775}],
   'Pose': {'Roll': 2.916618585586548,
    'Yaw': 1.5092723369598389,
    'Pitch': 9.721925735473633},
   'Quality': {'Brightness': 90.49933624267578,
    'Sharpness': 89.85481262207031},
   'Emotions': [{'Type': 'CALM', 'Confidence': 99.33566284179688},
    {'Type': 'SAD', 'Confidence': 0.37673279643058777},
    {'Type': 'ANGRY', 'Confidence': 0.11169449239969254},