# Rampart flats

This notebook is a convenient local development tool that bundles a Python interpreter, an interactive browser-based editor and a pre-started local PostgreSQL database. Use it to explore databases populated by the [rampart](https://github.com/xXxRisingTidexXx/rampart) miners and parsers. Happy coding!

In [1]:
from warnings import filterwarnings
from pandas import read_sql, DataFrame
from sqlalchemy import create_engine
from shapely.wkb import loads
from numpy import array
from scipy.spatial.distance import cdist
from sklearn.preprocessing import RobustScaler
filterwarnings('ignore', message='numpy.dtype size changed')
filterwarnings('ignore', message='numpy.ufunc size changed')

Let's load the Kyiv flats from the *flats* table in the DB. Note that the hostname is the name of the DB container, because the notebook and the database share a Docker network.

In [2]:
engine = create_engine('postgresql://postgres:postgres@rampart-database:5432/rampart')
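
Because the hostname only resolves inside the shared Docker network, the same cell fails when the notebook runs elsewhere. Below is a minimal sketch of one way to make the host configurable. The `RAMPART_DB_HOST` variable and the `database_url` helper are hypothetical names chosen for illustration and are not part of rampart. The sketch uses the `postgresql://` scheme, which SQLAlchemy 1.4 and later require.

```python
import os


def database_url(env=os.environ):
    """Build the connection URL, falling back to the Docker container name."""
    # RAMPART_DB_HOST is an assumed variable name, not defined by the project.
    host = env.get('RAMPART_DB_HOST', 'rampart-database')
    return f'postgresql://postgres:postgres@{host}:5432/rampart'


print(database_url())
```

The resulting string can be passed to `create_engine` unchanged. Setting `RAMPART_DB_HOST=localhost` would then point the notebook at a database published on the host machine.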

In [3]:
with engine.connect() as connection:
    flats = read_sql(
        '''
        select id, price, room_number, st_x(point) as longitude, st_y(point) as latitude
        from flats
        where city = 'Київ'
        ''', 
        connection, 
        index_col=['id']
    )

In [4]:
flats.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 452 entries, 1299 to 3913
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   price        452 non-null    float64
 1   room_number  452 non-null    int64  
 2   longitude    452 non-null    float64
 3   latitude     452 non-null    float64
dtypes: float64(3), int64(1)
memory usage: 17.7 KB


In [5]:
flats.describe()

|       | price     | room_number | longitude | latitude  |
|-------|-----------|-------------|-----------|-----------|
| count | 452.0     | 452.0       | 452.0     | 452.0     |
| mean  | 182064.1  | 2.334071    | 30.525275 | 50.440771 |
| std   | 350068.7  | 1.120812    | 0.08295   | 0.057008  |
| min   | 25000.0   | 1.0         | 30.189363 | 50.052951 |
| 25%   | 70000.0   | 1.0         | 30.48171  | 50.421083 |
| 50%   | 107400.0  | 2.0         | 30.519562 | 50.444574 |
| 75%   | 169250.0  | 3.0         | 30.550017 | 50.465183 |
| max   | 4500000.0 | 9.0         | 30.92854  | 50.754404 |


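Now it's time to explore the numerical data: scale the features robustly, then rank every flat by its weighted cosine and Euclidean distance to a preferred flat.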

In [6]:
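# Scale each column by its median and interquartile range so price outliers don't dominate.
scaler = RobustScaler(quantile_range=(25, 75))
scaler.fit(flats)
candidates = scaler.transform(flats)
# Preferred flat, in column order: price, room_number, longitude, latitude.
preferences = scaler.transform(array([[30000, 1, 30.525688, 50.418102]]))
# Per-feature weights, in the same column order.
weights = array([0.07, 0.23, 0.35, 0.35])
# Weighted distance from every flat to the preferred one.
flats['cosine'] = cdist(candidates, preferences, 'cosine', w=weights)
flats['euclidean'] = cdist(candidates, preferences, 'euclidean', w=weights)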
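
To make the cell above less of a black box, here is a NumPy-only sketch of what the two steps compute, as far as I understand the documented behaviour of `RobustScaler` and SciPy's weighted metrics: each column is centred on its median and divided by its interquartile range, and the weights enter the distances as per-feature multipliers inside the sums. The helper names and the toy data are made up for illustration and are not taken from the flats table.

```python
import numpy as np


def robust_scale(values, reference):
    """Centre on the column medians of `reference` and divide by its IQR."""
    median = np.median(reference, axis=0)
    q25, q75 = np.percentile(reference, [25, 75], axis=0)
    return (values - median) / (q75 - q25)


def weighted_euclidean(u, v, w):
    """sqrt(sum_i w_i * (u_i - v_i)^2)"""
    return np.sqrt(np.sum(w * (u - v) ** 2))


def weighted_cosine(u, v, w):
    """1 - <u, v>_w / (|u|_w * |v|_w), where <u, v>_w = sum_i w_i * u_i * v_i"""
    dot = np.sum(w * u * v)
    return 1.0 - dot / np.sqrt(np.sum(w * u * u) * np.sum(w * v * v))


# Toy data (assumed): columns are price and room_number.
toy = np.array([[50000.0, 1.0], [100000.0, 2.0], [150000.0, 3.0], [900000.0, 4.0]])
wish = np.array([[60000.0, 1.0]])
toy_weights = np.array([0.25, 0.75])

scaled = robust_scale(toy, toy)
target = robust_scale(wish, toy)[0]
for row in scaled:
    print(weighted_euclidean(row, target, toy_weights), weighted_cosine(row, target, toy_weights))
```

Note that the 900,000 outlier barely moves the median or the quartiles, which is the reason to prefer this scaling over mean and standard deviation for a price column as skewed as the one summarised above.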

In [7]:
flats.sort_values('cosine').head(5)

| id   | price   | room_number | longitude | latitude  | cosine   | euclidean |
|------|---------|-------------|-----------|-----------|----------|-----------|
| 2841 | 61000.0 | 1           | 30.517193 | 50.425704 | 0.03096  | 0.150474  |
| 2599 | 73726.0 | 1           | 30.537369 | 50.415119 | 0.053009 | 0.159446  |
| 2472 | 92000.0 | 1           | 30.522426 | 50.424904 | 0.06486  | 0.190891  |
| 3433 | 81956.0 | 1           | 30.515585 | 50.425658 | 0.065562 | 0.192649  |
| 3832 | 82341.0 | 1           | 30.515585 | 50.425658 | 0.066295 | 0.193388  |


In [8]:
flats.sort_values('euclidean').head(5)

| id   | price   | room_number | longitude | latitude  | cosine   | euclidean |
|------|---------|-------------|-----------|-----------|----------|-----------|
| 2841 | 61000.0 | 1           | 30.517193 | 50.425704 | 0.03096  | 0.150474  |
| 2599 | 73726.0 | 1           | 30.537369 | 50.415119 | 0.053009 | 0.159446  |
| 2472 | 92000.0 | 1           | 30.522426 | 50.424904 | 0.06486  | 0.190891  |
| 3433 | 81956.0 | 1           | 30.515585 | 50.425658 | 0.065562 | 0.192649  |
| 3832 | 82341.0 | 1           | 30.515585 | 50.425658 | 0.066295 | 0.193388  |
