# UPA
Jakub Zárybnický (xzaryb00), Matěj Mlejnek (xmlejn04). The third team member stopped communicating with us during the semester, so the two of us continued on our own.

For the notebook to work correctly, Jupyter needs the [Matplotlib integration](https://github.com/matplotlib/ipympl) installed and enabled, and the Python kernel environment needs the following packages:
- psycopg2
- pymongo
- pandas
- matplotlib

# USE JUPYTER TO GENERATE THE FIGURES CORRECTLY


In [1]:
%matplotlib widget
import sys; sys.path.insert(0, '.')

import csv
from datetime import date, datetime
import json
import os
import time

from bson.json_util import dumps
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt
import numpy as np
from psycopg2 import extensions
import pandas as pd
import pandas.io.sql as sqlio
import requests

from db_connects import MONGO_DB_CURRENCIES, MONGO_DB_COL_CURRENCIES, connect_to_postgres, connect_to_mongodb
from scrape import parse

We do not demonstrate every part of the solution in full here. For processing the input data and for connecting to the databases we use functions predefined in our library files. We do outline the main structure, though, starting with downloading and processing the input files.

## Downloading the source files

In [2]:
scrape_dir = 'scraped/'
if not os.path.isdir(scrape_dir):
    os.mkdir(scrape_dir)
len(os.listdir(scrape_dir))

123

In [3]:
def scrape(base_url, output_dir, start_date, end_date):
    any_downloads = False
    for ordinal in range(start_date.toordinal(), end_date.toordinal()):
        url = base_url + date.fromordinal(ordinal).strftime('%d.%m.%Y')
        path = date.fromordinal(ordinal).strftime('%Y-%m-%d') + '.txt'
        filename = os.path.join(output_dir, path)
        if os.path.isfile(filename):
            continue
        any_downloads = True
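        print("Requesting %s..." % url, end='')
        try:
            # Fail fast on network stalls and HTTP errors instead of saving an error page.
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            if not response.text:
                print(' Empty!')
                continue
            print(' OK')
            with open(filename, 'w') as handle:
                handle.write(response.text)
            # Be polite to the server between downloads.
            time.sleep(0.2)
        except Exception as ex:
            print(' %s' % ex)
            continue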
    if not any_downloads:
        print("All files already present.")

start_date = datetime.today() - relativedelta(months=4)
end_date = datetime.today()
scrape(
    base_url='https://www.cnb.cz/cs/financni-trhy/devizovy-trh/kurzy-devizoveho-trhu/kurzy-devizoveho-trhu/denni_kurz.txt?date=',
    start_date=start_date,
    end_date=end_date,
    output_dir=scrape_dir,
)
print("Got %s input files" % len(os.listdir(scrape_dir)))
print()
with open(os.path.join(scrape_dir, os.listdir(scrape_dir)[0]), 'r') as f:
    print(f.read())

All files already present.
Got 123 input files

09.10.2020 #196
země|měna|množství|kód|kurz
Austrálie|dolar|1|AUD|16,526
Brazílie|real|1|BRL|4,120
Bulharsko|lev|1|BGN|13,862
Čína|žen-min-pi|1|CNY|3,430
Dánsko|koruna|1|DKK|3,643
EMU|euro|1|EUR|27,110
Filipíny|peso|100|PHP|47,490
Hongkong|dolar|1|HKD|2,966
Chorvatsko|kuna|1|HRK|3,579
Indie|rupie|100|INR|31,449
Indonesie|rupie|1000|IDR|1,563
Island|koruna|100|ISK|16,652
Izrael|nový šekel|1|ILS|6,802
Japonsko|jen|100|JPY|21,694
Jižní Afrika|rand|1|ZAR|1,395
Kanada|dolar|1|CAD|17,443
Korejská republika|won|100|KRW|2,007
Maďarsko|forint|100|HUF|7,610
Malajsie|ringgit|1|MYR|5,554
Mexiko|peso|1|MXN|1,081
MMF|ZPČ|1|XDR|32,441
Norsko|koruna|1|NOK|2,496
Nový Zéland|dolar|1|NZD|15,212
Polsko|zlotý|1|PLN|6,065
Rumunsko|leu|1|RON|5,565
Rusko|rubl|100|RUB|29,811
Singapur|dolar|1|SGD|16,958
Švédsko|koruna|1|SEK|2,602
Švýcarsko|frank|1|CHF|25,162
Thajsko|baht|100|THB|74,009
Turecko|lira|1|TRY|2,908
USA|dolar|1|USD|22,983
Velká Británie|libra|1|GBP|29,7

In [4]:
for input_file in parse(scrape_dir):
    print(input_file)
    break

{'date': datetime.datetime(2020, 10, 9, 0, 0), 'currency': {'country': 'Austrálie', 'name': 'dolar', 'code': 'AUD'}, 'lotSize': '1', 'price': '16,526'}


We now have all the text/CSV input files downloaded and parsed into a format that can be inserted into MongoDB directly, without further processing. Up to this point, all processing consisted of reading each CSV file and attaching the date to every row so that the rows can be processed further.
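The `parse` helper from `scrape.py` is not shown in this notebook. The following is a minimal sketch of what such a parser could look like, assuming the file layout printed above (one line with the date, one line with column names, then pipe-separated rows). The function name `parse_text` and its behavior are assumptions for illustration, not the project's actual code.

```python
from datetime import datetime


def parse_text(text):
    """Yield one record per currency row of a single daily file (hypothetical helper)."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The first line looks like "09.10.2020 #196": the date, then an issue number.
    day = datetime.strptime(lines[0].split()[0], "%d.%m.%Y")
    # The second line holds the column names; data rows start at index 2.
    for line in lines[2:]:
        country, name, lot_size, code, price = line.split("|")
        yield {
            "date": day,
            "currency": {"country": country, "name": name, "code": code},
            "lotSize": lot_size,
            "price": price,
        }


sample = "09.10.2020 #196\nzemě|měna|množství|kód|kurz\nAustrálie|dolar|1|AUD|16,526\n"
records = list(parse_text(sample))
print(records[0])
```

The sketch keeps `lotSize` and `price` as strings, which matches the record printed by `In [4]`; the numeric conversion happens later, when the data is moved to PostgreSQL.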

In [5]:
client = connect_to_mongodb()
collection = client[MONGO_DB_CURRENCIES][MONGO_DB_COL_CURRENCIES]
collection.drop()

res = collection.insert_many(parse(scrape_dir))
print("Loaded %s records to MongoDB" % len(res.inserted_ids))

collection.find_one()

Loaded 4059 records to MongoDB


{'_id': ObjectId('5fceb0e70595952406faa5b4'),
 'date': datetime.datetime(2020, 10, 9, 0, 0),
 'currency': {'country': 'Austrálie', 'name': 'dolar', 'code': 'AUD'},
 'lotSize': '1',
 'price': '16,526'}

This is what every record in MongoDB looks like. We now convert the records to PostgreSQL, specifically into a normalized format with two tables: one table of currencies and one table of exchange rates.

- `Currency = Currency code (string, primary key) x Name (string) x Country (string)`
- `Rate = Day (date) x Currency code (foreign key) x Normalized rate (float)`

(Float is not an ideal representation for financial calculations, but it is sufficient for our purposes.)
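If exact decimal arithmetic were needed, the normalization could be done with Python's `decimal.Decimal` (paired with a `NUMERIC` column) instead of a float. A small sketch follows; `normalize` is a hypothetical helper name, and the inputs are rows from the sample file shown earlier.

```python
from decimal import Decimal


def normalize(price, lot_size):
    """Convert a price string such as '47,490', quoted per `lot_size` units, to a per-unit Decimal."""
    return Decimal(price.replace(",", ".")) / Decimal(lot_size)


print(normalize("47,490", "100"))  # PHP row: quoted per 100 units
print(normalize("16,526", "1"))    # AUD row: quoted per 1 unit
```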

In [6]:
conn = connect_to_postgres()

conn.set_isolation_level(extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cursor = conn.cursor()
cursor.execute("DROP TABLE IF EXISTS kurz")
cursor.execute("DROP TABLE IF EXISTS mena")
cursor.execute("CREATE TABLE mena (zeme varchar(100), nazev varchar(100), kod varchar(10) primary KEY)")
cursor.execute(
    "CREATE TABLE kurz (den DATE, kod varchar(10), "
    "CONSTRAINT fk_mena FOREIGN KEY(kod) REFERENCES mena(kod) ON DELETE SET NULL, "
    "normalizovany_kurz FLOAT)"
)

In [7]:
# Insert each distinct currency once; values are passed as parameters so psycopg2 handles quoting.
mena_res = collection.find({}, {"currency": 1, "_id": 0}).distinct("currency")
for mena_item in mena_res:
    cursor.execute(
        "INSERT INTO mena (zeme, nazev, kod) VALUES (%s, %s, %s)",
        (mena_item["country"], mena_item["name"], mena_item["code"]),
    )

# Normalize each rate to a single unit of the currency (price / lot size).
for item in collection.find({}, {"_id": 0}):
    cursor.execute(
        "INSERT INTO kurz (den, kod, normalizovany_kurz) VALUES (%s, %s, %s)",
        (
            item["date"].date(),
            item["currency"]["code"],
            float(item["price"].replace(',', '.')) / int(item["lotSize"]),
        ),
    )

In [8]:
cursor.execute("SELECT * from mena")
print("%s rows" % cursor.rowcount)
for row in cursor:
    print(row)
    break
print()
cursor.execute("SELECT * from kurz")
print("%s rows" % cursor.rowcount)
for row in cursor:
    print(row)
    break

33 rows
('Austrálie', 'dolar', 'AUD')

4059 rows
(datetime.date(2020, 10, 9), 'AUD', 16.526)


We now have all the data in a structured representation in PostgreSQL and can move on to the individual tasks.

## Task A

The first task we picked from the assignment is to build a ranking of the currencies that strengthened or weakened the most over the given period.

In [9]:
# Rates on the first day of the period, keyed by currency code.
cursor.execute(
    "select kod, normalizovany_kurz from kurz where den = (SELECT MIN(den) from kurz)"
    " ORDER BY kod ASC"
)
min_hash = dict(cursor)
# Rates on the last day of the period.
cursor.execute(
    "select kod, normalizovany_kurz from kurz where den = (SELECT MAX(den) from kurz)"
    " ORDER BY kod ASC"
)
# Difference = first-day rate minus last-day rate (CZK per unit); a positive value
# means the currency became cheaper in CZK over the period.
diff = {}
for kod, kurz in cursor:
    diff[kod] = min_hash[kod] - kurz
diff = dict(sorted(diff.items(), key=lambda x: -x[1]))

fig = plt.figure()
x = np.arange(len(diff))
plt.bar(x, height=diff.values())
plt.xticks(x, diff.keys(), rotation=-90);

[Figure: bar chart of the first-day minus last-day rate difference per currency, sorted in descending order]

In [10]:
top_code, top_diff = next(iter(diff.items()))
print("Between %s and %s the largest drop in the CZK rate was for %s, which fell by %s CZK per unit." % (
    start_date.date(), end_date.date(), top_code, round(top_diff, 2)
))

Between 2020-08-07 and 2020-12-07 the largest drop in the CZK rate was for USD, which fell by 0.31 CZK per unit.
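Absolute differences favor currencies with a high nominal rate, so a relative change can make the ranking easier to compare across currencies. The following is a minimal sketch of that idea, not part of the original solution. The helper name `relative_change` is an assumption, and the four value pairs are copied from the first and last rows shown in the pivoted table of the next task.

```python
def relative_change(first, last):
    """Return the change from `first` to `last` as a percentage of `first`."""
    return (last - first) / first * 100.0


# (rate on 2020-08-06, rate on 2020-12-03) in CZK per unit of the currency
rates = {
    "USD": (22.125, 21.748),
    "EUR": (26.200, 26.420),
    "TRY": (3.059, 2.783),
    "ZAR": (1.262, 1.425),
}
changes = {code: relative_change(first, last) for code, (first, last) in rates.items()}
# Positive values mean the currency became more expensive in CZK (it strengthened).
ranking = sorted(changes.items(), key=lambda kv: kv[1], reverse=True)
for code, pct in ranking:
    print("%s %+.2f %%" % (code, pct))
```

With this measure the sign convention is also easier to read: a currency at the top of the ranking gained value against the koruna, one at the bottom lost value.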


## Task B

The second task is to find groups of currencies with similar behavior (currencies that usually strengthen or weaken at the same time) using a correlation matrix.

In [11]:
sql = "SELECT * FROM kurz ORDER BY den ASC"
df = sqlio.read_sql_query(sql, conn, parse_dates="den")
df = df.pivot_table(columns='kod', index="den", values="normalizovany_kurz")
df

den,AUD,BGN,BRL,CAD,CHF,CNY,DKK,EUR,GBP,HKD,...,PLN,RON,RUB,SEK,SGD,THB,TRY,USD,XDR,ZAR
2020-08-06,15.888,13.397,4.137,16.638,24.353,3.183,3.517,26.200,29.101,2.855,...,5.945,5.417,0.30138,2.540,16.138,0.71209,3.059,22.125,31.240,1.262
2020-08-07,16.020,13.436,4.155,16.669,24.326,3.196,3.528,26.280,29.080,2.870,...,5.962,5.432,0.30212,2.547,16.220,0.71388,3.091,22.241,31.423,1.269
2020-08-10,15.924,13.388,4.112,16.637,24.269,3.196,3.516,26.185,29.046,2.872,...,5.948,5.415,0.30235,2.547,16.207,0.71482,3.043,22.262,31.411,1.256
2020-08-11,15.941,13.371,4.085,16.706,24.344,3.197,3.512,26.155,29.110,2.864,...,5.942,5.409,0.30522,2.543,16.188,0.71480,3.062,22.194,31.223,1.268
2020-08-12,15.815,13.352,4.095,16.692,24.280,3.194,3.507,26.115,28.866,2.862,...,5.931,5.401,0.30258,2.548,16.153,0.71302,3.034,22.185,31.273,1.273
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-11-30,16.123,13.391,4.122,16.878,24.162,3.325,3.520,26.190,29.148,2.820,...,5.858,5.374,0.28732,2.573,16.340,0.72233,2.811,21.861,31.158,1.422
2020-12-01,16.121,13.416,4.128,16.900,24.216,3.336,3.525,26.240,29.213,2.828,...,5.860,5.387,0.28906,2.569,16.348,0.72461,2.787,21.922,31.351,1.431
2020-12-02,16.121,13.501,4.195,16.909,24.406,3.335,3.548,26.410,29.177,2.823,...,5.904,5.420,0.28919,2.566,16.323,0.72402,2.793,21.887,31.277,1.423
2020-12-03,16.150,13.509,4.193,16.846,24.394,3.322,3.549,26.420,29.246,2.806,...,5.906,5.422,0.29087,2.564,16.284,0.72074,2.783,21.748,31.143,1.425


In [12]:
corr = df.corr()
corr

kod,AUD,BGN,BRL,CAD,CHF,CNY,DKK,EUR,GBP,HKD,...,PLN,RON,RUB,SEK,SGD,THB,TRY,USD,XDR,ZAR
AUD,1.0,0.830581,0.469135,0.864231,0.777882,0.803441,0.834628,0.831635,0.708502,0.765298,...,0.494267,0.79126,0.014979,0.726132,0.857877,0.751915,-0.190327,0.766223,0.819544,0.60402
BGN,0.830581,1.0,0.218837,0.950046,0.984522,0.833479,0.999578,0.999879,0.822542,0.947429,...,0.52511,0.988002,0.188869,0.808955,0.95461,0.78567,-0.135619,0.947312,0.967763,0.448734
BRL,0.469135,0.218837,1.0,0.20907,0.144362,0.235185,0.227533,0.22191,-0.030606,0.131143,...,0.261377,0.186188,0.201694,0.206714,0.208351,0.245979,0.127408,0.131031,0.185194,0.258776
CAD,0.864231,0.950046,0.20907,1.0,0.93808,0.908475,0.950105,0.950551,0.888014,0.908704,...,0.398503,0.917444,0.021808,0.871349,0.970631,0.857295,-0.289149,0.910027,0.954253,0.600433
CHF,0.777882,0.984522,0.144362,0.93808,1.0,0.801546,0.983113,0.98425,0.810807,0.948238,...,0.508661,0.986031,0.221564,0.804502,0.932793,0.757112,-0.139475,0.948155,0.956359,0.397289
CNY,0.803441,0.833479,0.235185,0.908475,0.801546,1.0,0.829084,0.832854,0.846289,0.76307,...,0.089646,0.750262,-0.258765,0.895076,0.937638,0.957796,-0.530662,0.76635,0.860735,0.83823
DKK,0.834628,0.999578,0.227533,0.950105,0.983113,0.829084,1.0,0.999736,0.81909,0.94553,...,0.53185,0.988055,0.192643,0.805652,0.951751,0.778101,-0.128475,0.945307,0.965821,0.445788
EUR,0.831635,0.999879,0.22191,0.950551,0.98425,0.832854,0.999736,1.0,0.822696,0.947589,...,0.525691,0.988196,0.189186,0.808059,0.954173,0.784499,-0.133922,0.947473,0.967911,0.447845
GBP,0.708502,0.822542,-0.030606,0.888014,0.810807,0.846289,0.81909,0.822696,1.0,0.781841,...,0.208899,0.785219,-0.088797,0.838988,0.871904,0.806603,-0.359386,0.784027,0.848624,0.600869
HKD,0.765298,0.947429,0.131143,0.908704,0.948238,0.76307,0.94553,0.947589,0.781841,1.0,...,0.517598,0.953059,0.240995,0.701015,0.929772,0.74821,-0.011863,0.999948,0.979125,0.316315


In [13]:
corr[(corr != 1.0) & (corr > 0.98)].stack()

kod  kod
BGN  CHF    0.984522
     DKK    0.999578
     EUR    0.999879
     HRK    0.983191
     RON    0.988002
CHF  BGN    0.984522
     DKK    0.983113
     EUR    0.984250
     RON    0.986031
DKK  BGN    0.999578
     CHF    0.983113
     EUR    0.999736
     HRK    0.983073
     RON    0.988055
EUR  BGN    0.999879
     CHF    0.984250
     DKK    0.999736
     HRK    0.983500
     RON    0.988196
HKD  USD    0.999948
HRK  BGN    0.983191
     DKK    0.983073
     EUR    0.983500
     RON    0.991832
MYR  SGD    0.988273
PHP  XDR    0.985594
RON  BGN    0.988002
     CHF    0.986031
     DKK    0.988055
     EUR    0.988196
     HRK    0.991832
SGD  MYR    0.988273
USD  HKD    0.999948
XDR  PHP    0.985594
dtype: float64
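Each pair appears twice in the listing above because the correlation matrix is symmetric. One way to keep every pair only once is to mask everything except the upper triangle before stacking. The sketch below shows this on a small made-up frame; the numbers in `toy` are illustrative only and are not CNB data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 6.0, 8.2],  # moves almost exactly with A
    "C": [4.0, 1.0, 3.0, 2.0],  # does not follow A or B
})
toy_corr = toy.corr()

# Boolean mask that is True strictly above the diagonal (k=1 skips the diagonal itself).
upper = np.triu(np.ones(toy_corr.shape, dtype=bool), k=1)
pairs = toy_corr.where(upper).stack()
# The threshold filter keeps only strongly correlated pairs; masked cells never pass it.
strong = pairs[pairs > 0.98]
print(strong)
```

The same mask applied to `corr` would also remove the diagonal, so the separate `corr != 1.0` condition would no longer be needed.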

In [14]:
fig, ax = plt.subplots(figsize=(len(corr) / 3, len(corr) / 3))
cax = ax.matshow(corr, cmap='RdYlGn')
plt.xticks(range(len(corr.columns)), corr.columns, rotation=90);
plt.yticks(range(len(corr.columns)), corr.columns);

# Add the colorbar legend
cbar = fig.colorbar(cax, ticks=[-1, 0, 1], aspect=40, shrink=.8)


[Figure: heatmap of the correlation matrix with a red-yellow-green color scale from -1 to 1]
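The task asks for groups of currencies, while the output of `In [13]` only lists pairs. A possible next step is to merge pairs that share a currency into connected groups. The plain-Python sketch below does this for the pairs printed by `In [13]` (correlation above 0.98). The function name `group_pairs` is an assumption, and this kind of single-linkage grouping is only one of several reasonable choices.

```python
def group_pairs(pairs):
    """Merge pairs that share a member into sorted groups (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for code in list(parent):
        groups.setdefault(find(code), []).append(code)
    return sorted(sorted(members) for members in groups.values())


# Unique pairs with correlation above 0.98, taken from the output of In [13].
pairs = [
    ("BGN", "CHF"), ("BGN", "DKK"), ("BGN", "EUR"), ("BGN", "HRK"), ("BGN", "RON"),
    ("CHF", "DKK"), ("CHF", "EUR"), ("CHF", "RON"), ("DKK", "EUR"), ("DKK", "HRK"),
    ("DKK", "RON"), ("EUR", "HRK"), ("EUR", "RON"), ("HRK", "RON"),
    ("HKD", "USD"), ("MYR", "SGD"), ("PHP", "XDR"),
]
groups = group_pairs(pairs)
for group in groups:
    print(group)
```

On these pairs the grouping yields one larger European group around EUR and three two-member groups.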

## Task C

For the third task, one of our own, we decided to check whether the day of the week has any effect on the exchange rate.

In [15]:
# Map PostgreSQL day-of-week numbers (0 = Sunday) to short weekday names.
DOW_NAMES = {1: "mon", 2: "tue", 3: "wed", 4: "thu", 5: "fri"}

cursor.execute(
    "SELECT date_part('dow', den::date) as dow, AVG(normalizovany_kurz) FROM kurz GROUP BY dow order by dow"
)
# Average normalized rate over all currencies for each weekday.
days = {}
for dow, avg_rate in cursor:
    days[DOW_NAMES.get(int(dow), "")] = avg_rate

   
fig = plt.figure()
x = np.arange(len(days))
plt.bar(x, height=days.values())
plt.xticks(x, days.keys(), rotation=-90);

[Figure: bar chart of the average normalized rate per weekday]

In [16]:
days

{'mon': 8.10535261497326,
 'tue': 8.1323463315508,
 'wed': 8.11074049431818,
 'thu': 8.11117252693603,
 'fri': 8.11452631074379}

We can see that the difference between the individual days is almost negligible, although there is a slight jump between the Monday and Tuesday values.
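Averaging the rate level across all currencies mixes currencies of very different magnitudes, so an alternative is to look at the average day-over-day percentage change per weekday. The sketch below shows this on a small synthetic frame shaped like `df` from Task B (dates as the index, one column per currency). The values are made up for illustration, and this is a suggestion rather than the method used above.

```python
import pandas as pd

# Synthetic stand-in for the pivoted frame: business days as index, one column per currency.
index = pd.date_range("2020-08-03", periods=10, freq="B")
toy = pd.DataFrame(
    {
        "EUR": [26.0, 26.1, 26.1, 26.2, 26.1, 26.0, 26.2, 26.2, 26.3, 26.2],
        "USD": [22.0, 22.2, 22.1, 22.1, 22.0, 21.9, 22.1, 22.0, 22.0, 21.9],
    },
    index=index,
)

# Percentage change against the previous trading day; the first row has no predecessor.
returns = toy.pct_change().dropna() * 100.0

# Average over all currencies and all occurrences of each weekday (0 = Monday).
by_weekday = returns.mean(axis=1).groupby(returns.index.dayofweek).mean()
by_weekday.index = by_weekday.index.map({0: "mon", 1: "tue", 2: "wed", 3: "thu", 4: "fri"})
print(by_weekday)
```

Because percentage changes are scale-free, a currency quoted at 26 CZK and one quoted at 0.3 CZK contribute equally to the weekday average.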