# Rampart flats

This notebook is a convenient local development tool that bundles a Python interpreter, an interactive browser-based editor and a pre-started local PostgreSQL database. Use it to explore the databases populated by the [rampart](https://github.com/xXxRisingTidexXx/rampart) miners and parsers. Happy coding!

In [1]:
from warnings import filterwarnings
from pandas import read_sql, unique
from sqlalchemy import create_engine
# Silence the noisy numpy binary-compatibility warnings.
filterwarnings('ignore', message='numpy.dtype size changed')
filterwarnings('ignore', message='numpy.ufunc size changed')

<br />Let's load the whole *flats* table from the DB. Note that the hostname equals the DB container name because the notebook and the database share a Docker network.

In [2]:
# SQLAlchemy 1.4+ accepts only the 'postgresql' scheme; 'postgres' was removed.
engine = create_engine('postgresql://postgres:postgres@rampart-database:5432/rampart')
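
The connection URL above hard-codes the credentials and the container hostname. As a minimal sketch, the same URL could be assembled from environment variables, so that the notebook can also point at a database outside the Docker network. The helper name `database_url` and all `RAMPART_DB_*` variable names are hypothetical and not part of rampart; the defaults simply mirror the values used in this notebook, and the sketch uses the `postgresql://` scheme.

```python
from os import environ
from urllib.parse import quote_plus


def database_url(env=environ):
    """Build a PostgreSQL URL; every RAMPART_DB_* name is a hypothetical variable."""
    # Percent-encode the credentials so characters like '@' don't break the URL.
    user = quote_plus(env.get('RAMPART_DB_USER', 'postgres'))
    password = quote_plus(env.get('RAMPART_DB_PASSWORD', 'postgres'))
    # Inside the Docker network the hostname is the DB container name.
    host = env.get('RAMPART_DB_HOST', 'rampart-database')
    port = env.get('RAMPART_DB_PORT', '5432')
    name = env.get('RAMPART_DB_NAME', 'rampart')
    return f'postgresql://{user}:{password}@{host}:{port}/{name}'


print(database_url({}))
```

The resulting string can be passed to `create_engine` in place of the literal URL.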

In [3]:
def read_flats():
    with engine.connect() as connection:
        return read_sql('select * from flats', connection, index_col=['id'])

In [4]:
flats = read_flats()
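
`read_flats` pulls the whole table into memory. When only a slice is needed, the filtering can happen in SQL instead. The sketch below tries that pattern against an in-memory SQLite table, so it runs without the Docker database. The table contents are made up, the helper `read_cheap_flats` is hypothetical, and the `?` placeholder is SQLite's style (a PostgreSQL driver would typically use a different one).

```python
import sqlite3

from pandas import read_sql

# A throwaway stand-in for the real flats table, with made-up rows.
connection = sqlite3.connect(':memory:')
connection.execute(
    'create table flats (id integer primary key, city text, price real)'
)
connection.executemany(
    'insert into flats (city, price) values (?, ?)',
    [('Київ', 55000.0), ('Одеса', 42000.0), ('Київ', 91000.0)],
)
connection.commit()


def read_cheap_flats(connection, max_price):
    """Load only the flats at or below max_price, indexed by id."""
    return read_sql(
        'select * from flats where price <= ? order by id',
        connection,
        index_col=['id'],
        params=(max_price,),
    )


sample = read_cheap_flats(connection, 60000.0)
print(sample)
```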

In [5]:
flats.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61266 entries, 1 to 61266
Data columns (total 19 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   origin_url    61266 non-null  object        
 1   image_url     61266 non-null  object        
 2   update_time   61266 non-null  datetime64[ns]
 3   parsing_time  61266 non-null  datetime64[ns]
 4   price         61266 non-null  float64       
 5   total_area    61266 non-null  float64       
 6   living_area   61266 non-null  float64       
 7   kitchen_area  61266 non-null  float64       
 8   room_number   61266 non-null  int64         
 9   floor         61266 non-null  int64         
 10  total_floor   61266 non-null  int64         
 11  housing       61266 non-null  object        
 12  complex       61266 non-null  object        
 13  point         61266 non-null  object        
 14  state         61266 non-null  object        
 15  city          61266 non-null  object

In [6]:
flats.describe()

| | price | total_area | living_area | kitchen_area | room_number | floor | total_floor |
|---|---|---|---|---|---|---|---|
| count | 61266.0 | 61266.0 | 61266.0 | 61266.0 | 61266.0 | 61266.0 | 61266.0 |
| mean | 91989.33 | 69.464737 | 22.67543 | 10.935118 | 1.99432 | 8.150328 | 15.135165 |
| std | 198212.7 | 35.90109 | 25.224309 | 9.6384 | 0.955263 | 6.200472 | 7.845294 |
| min | 38.0 | 14.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 |
| 25% | 34500.0 | 45.0 | 0.0 | 0.0 | 1.0 | 3.0 | 9.0 |
| 50% | 55199.5 | 62.0 | 18.0 | 11.0 | 2.0 | 6.0 | 14.0 |
| 75% | 100000.0 | 81.99 | 36.0 | 15.5 | 3.0 | 11.0 | 24.0 |
| max | 32698080.0 | 555.0 | 485.0 | 130.0 | 9.0 | 39.0 | 50.0 |


<br />Now let's take a closer look at the data frame contents. First of all, the states:

In [7]:
states = (
    flats.groupby(['state'])['state']
    .count()
    .reset_index(name='count')
    .sort_values(['count'], ascending=False, ignore_index=True)
)

In [8]:
print(f'Found {len(states)} states, {len(states[states["count"] < 100])} out of them seem to be insufficient.')

Found 35 states, 16 out of them seem to be insufficient.
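
The count-and-threshold pattern used here can be wrapped in a small helper and reused for any column. This is only a sketch on made-up data: the helper name `count_by` is hypothetical, and it uses `value_counts`, which already sorts the groups from largest to smallest, instead of the `groupby` chain.

```python
from pandas import DataFrame


def count_by(frame, column):
    """Count rows per distinct value of `column`, largest groups first."""
    return (
        frame[column]
        .value_counts()
        .rename_axis(column)
        .reset_index(name='count')
    )


# Made-up sample: three flats in one state, two in another, one in a third.
sample = DataFrame({'state': ['A', 'B', 'A', 'C', 'A', 'B']})
counts = count_by(sample, 'state')
# Groups below the threshold are the ones with too few rows to analyse.
sparse = counts[counts['count'] < 2]
print(counts)
print(f'{len(sparse)} of {len(counts)} states fall below the threshold.')
```

With the real data, `count_by(flats, 'state')` and `count_by(flats, 'city')` would stand in for the two near-identical chains in this notebook.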


In [9]:
states[states['count'] > 1000]

| | state | count |
|---|---|---|
| 0 | Київська область | 29340 |
| 1 | Одеська область | 14939 |
| 2 | Івано-Франківська область | 3844 |
| 3 | Харківська область | 2617 |
| 4 | Вінницька область | 2383 |
| 5 | Львівська область | 1705 |
| 6 | Дніпропетровська область | 1666 |
| 7 | Хмельницька область | 1310 |


<br />Next, let's count the flats per city.

In [10]:
cities = (
    flats.groupby(['city'])['city']
    .count()
    .reset_index(name='count')
    .sort_values(['count'], ascending=False, ignore_index=True)
)

In [11]:
print(f'Found {len(cities)} cities, {len(cities[cities["count"] < 100])} out of them seem to be insufficient.')

Found 187 cities, 163 out of them seem to be insufficient.


In [12]:
cities[cities['count'] > 1000]

| | city | count |
|---|---|---|
| 0 | Київ | 23713 |
| 1 | Одеса | 14764 |
| 2 | Івано-Франківськ | 3798 |
| 3 | Ірпінь | 3202 |
| 4 | Харків | 2608 |
| 5 | Вінниця | 2366 |
| 6 | Львів | 1642 |
| 7 | Києво-Святошинський | 1543 |
| 8 | Дніпро | 1396 |
| 9 | Хмельницький | 1223 |
