# Homework #3 (30%)

* Each task is followed by a checking cell. If that cell runs without raising an error, you get full points for the task.
* Fill in and modify the cells marked `# YOUR FUNCTION HERE`.
* If identical solutions are detected (similarity of 90% or more according to the anti-plagiarism algorithm), you will receive half of your points.

In [2]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import json
import warnings
warnings.filterwarnings('ignore')

# Task 1 (6%)

Write a function `city_tz(name)` that takes a city name and returns its time zone as a string, in the same format as it appears on Russian Wikipedia (for example, `UTC+7:00`). If the city has no page on Wikipedia, return `None`.

In [3]:
def city_tz(name):
    # Article URL on Russian Wikipedia; a missing page returns a non-200 status
    url = f"https://ru.wikipedia.org/wiki/{name}"
    response = requests.get(url)
    if response.status_code != 200:
        return None
    soup = BeautifulSoup(response.content, 'html.parser')
    # The time zone is listed in the article's infobox table
    infobox = soup.find(class_='infobox')
    if infobox is None:
        return None
    # Find the row whose header cell reads 'Часовой пояс' ("Time zone")
    for row in infobox.find_all('tr'):
        th = row.find('th')
        if th and th.text.strip() == 'Часовой пояс':
            td = row.find('td')
            if td:
                return td.text.strip()
    return None


In [4]:
res = [('Абакан', 'UTC+7:00'), 
       ('Анадырь', 'UTC+12:00'), 
       ('Киров (Кировская область)', 'UTC+3:00'), 
       ('Южно-Сахалинск', 'UTC+11:00'), 
       ('Усть-Каменоустюгск', None)]
for city, site in res:
    assert city_tz(city) == site, (site, city_tz(city))
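
The checks above need network access to ru.wikipedia.org. As a minimal offline sketch, the same header/value row lookup can be tried on a hand-written table using only the standard library's `html.parser`. The names `InfoboxRowFinder` and `find_row_value` and the sample markup are hypothetical, and unlike `city_tz` this version does not restrict the search to an element with class `infobox`.

```python
from html.parser import HTMLParser


class InfoboxRowFinder(HTMLParser):
    """Stores the <td> text of the first row whose <th> text equals `label`."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.value = None      # result; stays None if no row matches
        self._cell = None      # 'th', 'td', or None when outside a cell
        self._th_text = []
        self._td_text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            # A new row starts: forget the previous row's cells
            self._th_text, self._td_text = [], []
        elif tag in ('th', 'td'):
            self._cell = tag

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            self._cell = None
        elif tag == 'tr' and self.value is None:
            # Row finished: compare its header text with the wanted label
            if ''.join(self._th_text).strip() == self.label:
                self.value = ''.join(self._td_text).strip()

    def handle_data(self, data):
        if self._cell == 'th':
            self._th_text.append(data)
        elif self._cell == 'td':
            self._td_text.append(data)


def find_row_value(html, label):
    parser = InfoboxRowFinder(label)
    parser.feed(html)
    parser.close()
    return parser.value


# Hand-written stand-in for an infobox (not copied from Wikipedia)
sample = """
<table class="infobox">
  <tr><th>Страна</th><td>Россия</td></tr>
  <tr><th>Часовой пояс</th><td><a href="#">UTC+7:00</a></td></tr>
</table>
"""

print(find_row_value(sample, 'Часовой пояс'))
print(find_row_value(sample, 'Население'))
```

A label that is absent from the table yields `None`, which mirrors the fall-through `return None` at the end of `city_tz`.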

# Task 2 (6%)

Write a function `diff_lat(place1, place2)` that uses the [Yandex Geocoder](https://yandex.com/dev/maps/geocoder/doc/desc/concepts/input_params.html) (create your own API key) to find the geographical coordinates of two places, `place1` and `place2`, and returns a float. The float answers the following question: by how many degrees is `place2` farther north than `place1`?

In [5]:
import requests

def diff_lat(place1, place2):
    api_key = "9dc4b168-7bba-4e6c-bfcd-979c8e4f2ff4"
    base_url = "https://geocode-maps.yandex.ru/1.x/"

    def latitude(place):
        # Passing the query via `params` lets requests URL-encode the place name
        params = {"format": "json", "apikey": api_key, "geocode": place}
        data = requests.get(base_url, params=params).json()
        # 'pos' of the first match is split as longitude first, latitude second
        pos = data['response']['GeoObjectCollection']['featureMember'][0]['GeoObject']['Point']['pos']
        _longitude, lat = map(float, pos.split())
        return lat

    # Positive when place2 lies farther north than place1
    return latitude(place2) - latitude(place1)


In [6]:
assert abs(diff_lat("Москва", "Апатиты") - 11.81) < 0.1
assert abs(diff_lat("Краснодар", "Петропавловск-Камчатский") - 8) < 0.1
assert abs(diff_lat("Геленджик", "Саратов") - 7) < 0.1
assert abs(diff_lat("Саратов", "Геленджик") + 7) < 0.1
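
The asserts above call the live Geocoder. The sketch below exercises two steps offline: URL-encoding a Cyrillic place name and reading the `pos` string, which `diff_lat` splits as longitude first and latitude second. The payload is a hand-written stand-in that follows only the key path used in `diff_lat`. The coordinates are made up, and the helper names and the `YOUR_API_KEY` placeholder are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://geocode-maps.yandex.ru/1.x/"


def build_geocode_url(place, api_key):
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes
    query = urlencode({"format": "json", "apikey": api_key, "geocode": place})
    return f"{BASE_URL}?{query}"


def latitude_from_payload(payload):
    # Same key path as in diff_lat; the first match is used
    member = payload["response"]["GeoObjectCollection"]["featureMember"][0]
    pos = member["GeoObject"]["Point"]["pos"]
    _longitude, latitude = map(float, pos.split())
    return latitude


def fake_payload(lon, lat):
    # Hypothetical stand-in for a Geocoder response (illustrative values only)
    point = {"Point": {"pos": f"{lon} {lat}"}}
    return {"response": {"GeoObjectCollection": {"featureMember": [{"GeoObject": point}]}}}


south = fake_payload(38.0, 44.5)
north = fake_payload(46.0, 51.5)
diff = latitude_from_payload(north) - latitude_from_payload(south)
print(diff)  # 7.0

url = build_geocode_url("Петропавловск-Камчатский", "YOUR_API_KEY")
print(url)
```

Swapping the two payloads flips the sign of the result, which is the property the last two asserts of the checking cell rely on.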

# Task 3 (6%)

Use the [Google Books API](https://developers.google.com/books/docs/v1/getting_started) with your own API key. Given an ISBN, you can get extended information about a particular book: [example](https://www.googleapis.com/books/v1/volumes?q=isbn:9785699648146). Write a function `book_table(isbns)` that takes a list of ISBN codes and returns a pandas DataFrame with each book's title, authors, language, and page count (if a book has several authors, separate them with a comma and a space). Use the example output in the testing cell below as a reference.

In [9]:
import requests
import pandas as pd

def book_table(isbns):
    api_key = "AIzaSyB4B_HxgfXaDU9Kmt6EAchuqUAcAKwmRFQ"
    url = "https://www.googleapis.com/books/v1/volumes"

    rows = []
    for isbn in isbns:
        # One request per ISBN; `params` handles URL encoding of the query
        response = requests.get(url, params={"q": f"isbn:{isbn}", "key": api_key})

        # Use the first volume returned for this ISBN
        book_data = response.json()["items"][0]["volumeInfo"]

        rows.append({
            "title": book_data["title"],
            # Several authors are joined with a comma and a space
            "authors": ", ".join(book_data["authors"]),
            "language": book_data["language"],
            "pageCount": book_data["pageCount"],
        })

    # One row per ISBN, in input order
    return pd.DataFrame(rows, columns=["title", "authors", "language", "pageCount"])


In [10]:
obtained = book_table(['9781292153964', '9780262035613', '9785457499850'])
expected = pd.DataFrame({'authors': {0: 'Stuart Russell, Peter Norvig',
  1: 'Ian Goodfellow, Yoshua Bengio, Aaron Courville',
  2: 'Рэй Брэдбери'},
 'language': {0: 'en', 1: 'en', 2: 'ru'},
 'pageCount': {0: 1152, 1: 801, 2: 270},
 'title': {0: 'Artificial Intelligence',
  1: 'Deep Learning',
  2: 'Вино из одуванчиков'}})
assert obtained.to_dict() == expected.to_dict()
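
`book_table` indexes `authors` and `pageCount` directly, so a volume that lacks either field would raise `KeyError`. Assuming some volumes do come back without these fields, a tolerant extraction step could look like the sketch below. It runs on hand-written dictionaries shaped like `volumeInfo`. The second record is made up, and the helper name `volume_row` is hypothetical.

```python
def volume_row(volume_info):
    """Pick the four table fields, tolerating missing keys."""
    return {
        "title": volume_info.get("title"),
        # No 'authors' key gives an empty string instead of an exception
        "authors": ", ".join(volume_info.get("authors", [])),
        "language": volume_info.get("language"),
        "pageCount": volume_info.get("pageCount"),
    }


samples = [
    {
        "title": "Deep Learning",
        "authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"],
        "language": "en",
        "pageCount": 801,
    },
    # Made-up record with 'authors' and 'pageCount' missing
    {"title": "Untitled sample", "language": "en"},
]

rows = [volume_row(info) for info in samples]
for row in rows:
    print(row)
```

A list of such dictionaries can be passed to `pd.DataFrame(rows)` to get one table row per book.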

# Task 4 (6%)

Write a function `get_strong(html)` that takes an HTML page as input (as a `str`). The function should return the text located inside the `strong` tag (assume the page contains exactly one `strong` tag).

In [11]:
from bs4 import BeautifulSoup

def get_strong(html):
    soup = BeautifulSoup(html, 'html.parser')

    strong_text = soup.find('strong').text

    return strong_text


In [12]:
assert get_strong("<html><body><p>Hello, <strong>World</strong>!") == "World"
html = """<html>
    <body>
        <p>
            Hello,
            <strong>
                World
            </strong>
        </p>
    </body>
</html>"""
assert get_strong(html).strip() == "World"
assert get_strong("<html><body><p>tag &lt;strong&gt; is used in HTML\n to make letters <strong>stronger</strong>") == "stronger"
assert get_strong("<html><body><strong>One\nTwo</strong></body></html>") == "One\nTwo"
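
As a cross-check that does not depend on BeautifulSoup, the same extraction can be sketched with the standard library's `html.parser`. This is an alternative technique, not the solution above, and the names `StrongTextCollector` and `strong_text` are hypothetical. It also shows why the escaped `&lt;strong&gt;` in the third assert is not mistaken for a tag: character references arrive as plain text data.

```python
from html.parser import HTMLParser


class StrongTextCollector(HTMLParser):
    """Collects all text that appears inside <strong>...</strong>."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while inside a <strong> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Escaped markup such as &lt;strong&gt; is delivered here as text
        if self._depth:
            self.parts.append(data)


def strong_text(html):
    parser = StrongTextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strong_text("<html><body><p>Hello, <strong>World</strong>!"))
```

Whitespace inside the tag is kept as-is, so callers that want a trimmed value still need `.strip()`, as in the second assert above.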

# Task 5 (6%)

The `<img>` tag is used to embed an image in an HTML file. Its `src` attribute stores the link to the image, for example `<img src="https://upload.wikimedia.org/wikipedia/commons/b/bd/Struthio_camelus_portrait_Whipsnade_Zoo.jpg"/>`. Write a function `all_images_src(html)` that takes an HTML page as input (as a `str`) and returns a `list` of the links (image locations) for all images on the page, in the order in which the images appear.

In [13]:
from bs4 import BeautifulSoup

def all_images_src(html):
    # Parse the HTML string with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # Collect 'src' from every 'img' tag in document order; src=True skips
    # tags without the attribute, which would otherwise raise KeyError
    img_srcs = [img['src'] for img in soup.find_all('img', src=True)]

    return img_srcs


In [14]:
assert all_images_src('<html><body><img src="https://upload.wikimedia.org/wikipedia/commons/b/bd/Struthio_camelus_portrait_Whipsnade_Zoo.jpg"/>') == ["https://upload.wikimedia.org/wikipedia/commons/b/bd/Struthio_camelus_portrait_Whipsnade_Zoo.jpg"]
assert all_images_src( ('<html><body><IMG src="test.jpg">\n'
                        '<p>Some text\n'
                        '<img SRC=\'well.png\'>\n'
                        '</p></body></html>') ) == ["test.jpg", "well.png"]
assert all_images_src('<html><body><p><a href="link.html">'
                      '<img alt="Just a test image" src="this is a test.jpg"><ul>' + "\n"
                      .join("<li><img src='img%04i.png'></li>" % i for i in range(1000)) + 
                      "</ul></p></body></html>"
                     ) == ['this is a test.jpg'] + ['img%04i.png' % i for i in range(1000)]
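
The second assert above mixes upper-case tag and attribute names (`<IMG src=...>`, `<img SRC=...>`). As a sketch of why case does not matter, here is a standard-library variant built on `html.parser`, which passes tag and attribute names to the handler in lower case. This is an alternative technique to the BeautifulSoup solution, and the names `ImgSrcCollector` and `images_from` are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img .../>
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:   # skip images without a src attribute
                self.sources.append(src)


def images_from(html):
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


page = ('<html><body><IMG src="test.jpg">\n'
        '<p><img SRC=\'well.png\'><img alt="no source"></p>'
        '<img src="last.gif"/></body></html>')
print(images_from(page))
```

The image without a `src` attribute is skipped rather than reported as `None`, so the result contains only usable links.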