In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set(style="white", color_codes=True)

%matplotlib inline

What is the StatBank API?

The Central Statistics Office (CSO) StatBank Application Programming Interface (API) provides access to StatBank data in JSON-stat format. This dissemination tool gives developers machine-to-machine access to CSO StatBank data. Details about JSON-stat can be found at http://json-stat.org/.
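To make the JSON-stat layout a little more concrete, here is a minimal sketch of how a JSON-stat 2.0 dataset maps onto rows. The `toy` dictionary and the `flatten` helper are hypothetical and hand-written for illustration (the numbers are not CSO data); it assumes each dimension stores its category `index` as a list, which is only one of the forms the format allows.

```python
from itertools import product

# A tiny hand-written JSON-stat 2.0 style dataset (illustrative values, not CSO data)
toy = {
    "version": "2.0",
    "class": "dataset",
    "id": ["County", "Month"],
    "size": [2, 2],
    "dimension": {
        "County": {"category": {"index": ["C", "D"],
                                "label": {"C": "Carlow", "D": "Dublin"}}},
        "Month": {"category": {"index": ["2010M01", "2010M02"],
                               "label": {"2010M01": "2010M01",
                                         "2010M02": "2010M02"}}},
    },
    "value": [8, 14, 300, 410],
}


def flatten(ds):
    """Pair each value with its category labels (the last dimension varies fastest)."""
    labels = []
    for dim_id in ds["id"]:
        cat = ds["dimension"][dim_id]["category"]
        labels.append([cat["label"][code] for code in cat["index"]])
    return [combo + (v,) for combo, v in zip(product(*labels), ds["value"])]


rows = flatten(toy)
for row in rows:
    print(row)
```

This is roughly the reshaping a JSON-stat reader has to perform before the data can sit in a DataFrame: one row per combination of categories, plus a value column.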

The majority of StatBank tables are available in JSON-stat format through the StatBank API. Tables with more than 150,000 data points are not available, and a message to this effect is shown when you select the JSON table (for example, TDM04 is not available). The CSO site lists the StatBank tables that are unavailable.

If you encounter any difficulties please email databank@cso.ie.

CSO wish you every success using this new data dissemination tool and we welcome your feedback at databank@cso.ie.

Download Data

To download a StatBank table in JSON-stat format, start from the web service client pages linked below.

The themes are listed in a structure similar to StatBank's. Select the theme and subtheme you are interested in and the available data tables will be displayed. Tables are grouped as Key Tables, Current Tables and Archived Tables to help you choose. The final theme, PSSN, holds the data hosted by the CSO for other government departments through the CSO Public Sector Statistics Network. Remember that tables with over 150,000 data points, although listed, are not available for download with this application.


http://www.cso.ie/webserviceclient/
http://www.cso.ie/webserviceclient/DatasetListing.aspx  

pyjstat is a Python library for manipulating JSON-stat formatted data. It reads and writes the JSON-stat [1] format using the DataFrame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination, currently at version 2.0. pyjstat is inspired by rjstat [3], a library by ajschumacher for reading and writing JSON-stat with R. Note that, as in the rjstat project, not all features are supported (for example, not all metadata are converted). pyjstat is provided under the Apache License 2.0.

[1]	http://json-stat.org/ for JSON-stat information
[2]	http://pandas.pydata.org for Python Data Analysis Library information
[3]	https://github.com/ajschumacher/rjstat for rjstat library information
This library was first developed to work with Python 2.7. With some fixes (thanks to @andrekittredge), now it works with Python 3.4 too.

https://pypi.python.org/pypi/pyjstat/

###### Residential Dwelling Property Transactions by County, Dwelling Status, Stamp Duty Event, Type of Buyer, Type of Sale, Month and Statistic

###### Variables

* County (27), including the "All Counties" aggregate
* Dwelling Status (3)
* Stamp Duty Event (2)
* Type of Buyer (6)
* Type of Sale (3)
* Time (87)
* Contents (4)
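As a quick sanity check, the category counts above determine how many rows the table should have once it is flattened. The sketch below just multiplies them out; the dictionary keys are copied from the list, and the only assumption is that every combination of categories appears exactly once.

```python
import math

# Category counts copied from the variable list above
sizes = {
    "County": 27,
    "Dwelling Status": 3,
    "Stamp Duty Event": 2,
    "Type of Buyer": 6,
    "Type of Sale": 3,
    "Time": 87,
    "Contents": 4,
}

# One row per combination of categories
total_cells = math.prod(sizes.values())

# Dropping the single "All Counties" aggregate leaves 26 counties
without_all = total_cells // sizes["County"] * (sizes["County"] - 1)

print(total_cells)
print(without_all)
```

The second figure works out to 977184, which agrees with the shape reported further down once the "All Counties" rows have been filtered out.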

In [2]:
from pyjstat import pyjstat
# read from json-stat
url = 'http://www.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/HPM02'
dataset = pyjstat.Dataset.read(url)
df = dataset.write('dataframe')
print(df)

INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.cso.ie


               County        Dwelling Status Stamp Duty Event  \
0        All Counties  All Dwelling Statuses          Filings   
1        All Counties  All Dwelling Statuses          Filings   
2        All Counties  All Dwelling Statuses          Filings   
3        All Counties  All Dwelling Statuses          Filings   
4        All Counties  All Dwelling Statuses          Filings   
5        All Counties  All Dwelling Statuses          Filings   
6        All Counties  All Dwelling Statuses          Filings   
7        All Counties  All Dwelling Statuses          Filings   
8        All Counties  All Dwelling Statuses          Filings   
9        All Counties  All Dwelling Statuses          Filings   
10       All Counties  All Dwelling Statuses          Filings   
11       All Counties  All Dwelling Statuses          Filings   
12       All Counties  All Dwelling Statuses          Filings   
13       All Counties  All Dwelling Statuses          Filings   
14       All Counties  Al

In [3]:
df1 = pd.DataFrame.from_records(df)

In [5]:
df1.head(200)

Unnamed: 0,County,Dwelling Status,Stamp Duty Event,Type of Buyer,Type of Sale,Month,Statistic,value
0,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Volume of Sales (Number),1051.0
1,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Value of Sales (Euro Million),236.1
2,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Average Sale Price (Euro),224686.0
3,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Median Price (Euro),190000.0
4,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M02,Volume of Sales (Number),1599.0
5,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M02,Value of Sales (Euro Million),312.4
6,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M02,Average Sale Price (Euro),195343.0
7,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M02,Median Price (Euro),179000.0
8,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M03,Volume of Sales (Number),2020.0
9,All Counties,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M03,Value of Sales (Euro Million),374.4


In [6]:
# Select every county, excluding the 'All Counties' aggregate rows
# https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas
df2 = df1.loc[df1['County'] != 'All Counties']

In [7]:
df2.head()

Unnamed: 0,County,Dwelling Status,Stamp Duty Event,Type of Buyer,Type of Sale,Month,Statistic,value
37584,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Volume of Sales (Number),8.0
37585,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Value of Sales (Euro Million),1.5
37586,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Average Sale Price (Euro),190438.0
37587,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Median Price (Euro),176750.0
37588,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M02,Volume of Sales (Number),14.0


In [8]:
print ("\n\n---------------------")
print ("TRAIN SET INFORMATION")
print ("---------------------")
print ("Shape of training set:", df2.shape, "\n")
print ("Column Headers:", list(df2.columns.values), "\n")
print (df2.dtypes)



---------------------
TRAIN SET INFORMATION
---------------------
Shape of training set: (977184, 8) 

Column Headers: ['County', 'Dwelling Status', 'Stamp Duty Event', 'Type of Buyer', 'Type of Sale', 'Month', 'Statistic', 'value'] 

County               object
Dwelling Status      object
Stamp Duty Event     object
Type of Buyer        object
Type of Sale         object
Month                object
Statistic            object
value               float64
dtype: object
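The dtypes show that `Month` is stored as plain text such as `2010M01`. If the series is going to be plotted or resampled over time, it may help to parse it into real timestamps. A minimal sketch on a toy series, assuming every value follows the four-digit-year, `M`, two-digit-month layout seen in the output above:

```python
import pandas as pd

# Toy sample in the same 'YYYYMmm' layout as the Month column
months = pd.Series(["2010M01", "2010M02", "2017M03"])

# %Y = four-digit year, a literal 'M', %m = two-digit month
parsed = pd.to_datetime(months, format="%YM%m")

print(parsed.dt.year.tolist())
print(parsed.dt.month.tolist())
```

Each value becomes the first day of its month, so the result can be used directly as a time index.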


In [9]:
import re
missing_values = []
nonumeric_values = []

print ("TRAINING SET INFORMATION")
print ("========================\n")

for column in df2:
    # Find all the unique feature values
    uniq = df2[column].unique()
    print ("'{}' has {} unique values" .format(column,uniq.size))
    if (uniq.size > 10):
        print("~~Listing up to 10 unique values~~")
    print (uniq[0:10])
    print ("\n-----------------------------------------------------------------------\n")
    
    # Find features with missing values
    if (True in pd.isnull(uniq)):
        s = "{} has {} missing" .format(column, pd.isnull(df2[column]).sum())
        missing_values.append(s)
    
    # Find features with non-numeric values
    for value in uniq:
        if re.match('nan', str(value)):
            break
        # Raw string so the regex escapes (\d, \.) are passed through unchanged
        if not re.search(r'(^\d+\.?\d*$)|(^\d*\.?\d+$)', str(value)):
            nonumeric_values.append(column)
            break
  
print ("\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n")
print ("Features with missing values:\n{}\n\n" .format(missing_values))
print ("Features with non-numeric values:\n{}" .format(nonumeric_values))
print ("\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n")

TRAINING SET INFORMATION

'County' has 26 unique values
~~Listing up to 10 unique values~~
['Carlow' 'Dublin' 'Kildare' 'Kilkenny' 'Laois' 'Longford' 'Louth' 'Meath'
 'Offaly' 'Westmeath']

-----------------------------------------------------------------------

'Dwelling Status' has 3 unique values
['All Dwelling Statuses' 'New' 'Existing']

-----------------------------------------------------------------------

'Stamp Duty Event' has 2 unique values
['Filings' 'Executions']

-----------------------------------------------------------------------

'Type of Buyer' has 6 unique values
['All Buyer Types' 'Household Buyer - All'
 'Household Buyer - First-Time Buyer Owner-Occupier'
 'Household Buyer - Former Owner-Occupier' 'Household Buyer - Non-Occupier'
 'Non-Household Buyer']

-----------------------------------------------------------------------

'Type of Sale' has 3 unique values
['All Sale Types' 'Market' 'Non-Market']

-------------------------------------------------------------
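The same two questions (which columns have missing values, and which are non-numeric) can also be answered with built-in pandas calls. The sketch below uses a small toy frame with made-up values rather than the CSO table. Note that it checks column dtypes, so unlike the regex loop above it would flag an object column even if it only held numeric-looking strings.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kinds of columns (values are illustrative)
toy = pd.DataFrame({
    "County": ["Carlow", "Dublin", "Kildare"],
    "Statistic": ["Median Price (Euro)"] * 3,
    "value": [176750.0, np.nan, 250000.0],
})

# Count missing entries per column and keep only columns that have any
missing_counts = toy.isnull().sum()
missing = missing_counts[missing_counts > 0].to_dict()

# Columns whose dtype is not numeric
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()

print(missing)
print(non_numeric)
```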

In [10]:
df2.Statistic

37584           Volume of Sales (Number)
37585      Value of Sales (Euro Million)
37586         Average Sale Price  (Euro)
37587                Median Price (Euro)
37588           Volume of Sales (Number)
37589      Value of Sales (Euro Million)
37590         Average Sale Price  (Euro)
37591                Median Price (Euro)
37592           Volume of Sales (Number)
37593      Value of Sales (Euro Million)
37594         Average Sale Price  (Euro)
37595                Median Price (Euro)
37596           Volume of Sales (Number)
37597      Value of Sales (Euro Million)
37598         Average Sale Price  (Euro)
37599                Median Price (Euro)
37600           Volume of Sales (Number)
37601      Value of Sales (Euro Million)
37602         Average Sale Price  (Euro)
37603                Median Price (Euro)
37604           Volume of Sales (Number)
37605      Value of Sales (Euro Million)
37606         Average Sale Price  (Euro)
37607                Median Price (Euro)
37608           
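Because the four statistics repeat for every combination of the other dimensions, the table is in "long" form. One possible way to compare statistics side by side is to pivot `Statistic` into columns. A minimal sketch on a four-row toy frame, with values taken from the Carlow rows shown in this notebook and `aggfunc="first"` chosen on the assumption that each combination occurs only once:

```python
import pandas as pd

# Four rows in the same long layout as df2 (other dimensions omitted for brevity)
long_df = pd.DataFrame({
    "County": ["Carlow"] * 4,
    "Month": ["2010M01", "2010M01", "2010M02", "2010M02"],
    "Statistic": ["Volume of Sales (Number)", "Median Price (Euro)"] * 2,
    "value": [8.0, 176750.0, 14.0, 140332.0],
})

# One row per (County, Month), one column per statistic
wide = long_df.pivot_table(
    index=["County", "Month"],
    columns="Statistic",
    values="value",
    aggfunc="first",
)

print(wide)
```

On the full table the remaining dimensions (Dwelling Status, Stamp Duty Event, Type of Buyer, Type of Sale) would need to be added to `index`, otherwise several rows would collapse into one cell.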

In [None]:
# Median-price rows for a single county (Carlow, matching the df3 output shown below)
value1 = 'Carlow'
df3 = df2.loc[(df2['County'] == value1) & (df2['Statistic'] == 'Median Price (Euro)')]

In [24]:
df3.value.mean()

95805.6290250779
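To get the same kind of average for every county at once, rather than one county at a time, a `groupby` is one option. The sketch below runs on a toy frame with made-up prices; the column names match `df2`, but the numbers are purely illustrative.

```python
import pandas as pd

# Toy data in the same shape as df2 (prices are made up for illustration)
toy = pd.DataFrame({
    "County": ["Carlow", "Carlow", "Dublin", "Dublin"],
    "Statistic": ["Median Price (Euro)"] * 4,
    "value": [100000.0, 120000.0, 300000.0, 340000.0],
})

# Keep only the median-price rows, then average them per county
medians = toy.loc[toy["Statistic"] == "Median Price (Euro)"]
mean_by_county = medians.groupby("County")["value"].mean()

print(mean_by_county)
```

On the real table the rows for a county also include aggregate categories such as "All Buyer Types" alongside their breakdowns, so it would probably make sense to fix those dimensions before averaging.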

In [23]:
df3

Unnamed: 0,County,Dwelling Status,Stamp Duty Event,Type of Buyer,Type of Sale,Month,Statistic,value
37587,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M01,Median Price (Euro),176750.0
37591,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M02,Median Price (Euro),140332.0
37595,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M03,Median Price (Euro),82500.0
37599,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M04,Median Price (Euro),57000.0
37603,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M05,Median Price (Euro),133000.0
37607,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M06,Median Price (Euro),140000.0
37611,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M07,Median Price (Euro),149000.0
37615,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M08,Median Price (Euro),172500.0
37619,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M09,Median Price (Euro),113000.0
37623,Carlow,All Dwelling Statuses,Filings,All Buyer Types,All Sale Types,2010M10,Median Price (Euro),180000.0
