# Cursory check of the PCRAFI database

This notebook serves to give a quick view of the quality of the information held in the [PCRAFI database](http://pcrafi.spc.int/beta), to help inform priority areas for enhancement of the database. 

We check the fraction of records with valid entries for wall material, wall frame, roof material, foundations and minimum floor height. These attributes are important for selecting appropriate vulnerability relations to estimate damage or, in the case of floor height, impact.
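The completeness check amounts to the share of non-null values in each column. Below is a minimal sketch on a small, made-up DataFrame (the values are illustrative only; the column names mirror the shapefile fields used later in this notebook). The `completeness` helper is hypothetical, not part of any library.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the building attribute table; values are made up.
buildings = pd.DataFrame({
    "B_FRAME1": ["Timber", None, "Concrete", None],
    "WALL_MAT1": ["Timber", "Masonry", None, None],
    "ROOF_MAT_1": ["Metal", "Metal", "Metal", None],
    "FOUND1": [None, None, None, None],
    "F_MINHT": [0.3, np.nan, 0.5, 0.2],
})

def completeness(df, columns):
    """Hypothetical helper: percentage of records with a non-null entry per column."""
    return df[columns].notna().mean() * 100

pct = completeness(buildings, ["B_FRAME1", "WALL_MAT1", "ROOF_MAT_1", "FOUND1", "F_MINHT"])
print(pct.round(2))
```

If missing entries in the real shapefile are stored as empty strings rather than nulls (an assumption worth checking), they would need converting first, e.g. with `df.replace('', np.nan)`.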

This notebook uses matplotlib, numpy, pandas and its geospatially enabled counterpart geopandas. Seaborn is used to simplify visualisation of statistical analyses.

```python
%matplotlib inline

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd

import seaborn as sns
```


Start by loading the data from file. I'd prefer to set this up to read a web feature service (so that others could readily use and extend this notebook), but alas, the documentation for the underlying module ([fiona](http://toblerity.org/fiona/index.html)) doesn't describe any support for doing so.
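For reference, a WFS request is an ordinary HTTP URL built from standard OGC key-value parameters. The sketch below only constructs such a URL; the endpoint and layer name are hypothetical placeholders, not a real PCRAFI service, and `wfs_getfeature_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def wfs_getfeature_url(endpoint, type_name, output_format="application/json"):
    """Hypothetical helper: build a WFS 2.0.0 GetFeature URL (KVP encoding)."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": output_format,
    }
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and layer name, for illustration only.
url = wfs_getfeature_url("https://example.org/geoserver/wfs", "pcrafi:to_buildings")
print(url)
```

Recent geopandas releases accept a URL in `gpd.read_file`, so a URL like this could in principle be passed straight in; whether a given server offers GeoJSON output, and under which `outputFormat` string, depends on the server.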

```python
to_bld = gpd.read_file("R:/Pacific/data/external/pcrafi/TO/to_buildings.shp")

to_bld.info()
```

A very quick look at the first few entries confirms that the data loaded correctly. Not all the fields (72 in this example) are shown.

```python
to_bld.head()
```

Next, define some groups based on building attributes. Here the grouping is purely by building attribute, so commercial buildings are lumped in with residential and industrial buildings and critical infrastructure. Further down, we break the data down by building use.
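Completeness can also be broken down by building use rather than pooled across all buildings. A minimal sketch on made-up records follows; the `USE_GRP` codes `RES` and `COM` are hypothetical placeholders, not the actual PCRAFI codes.

```python
import pandas as pd

# Made-up records; the USE_GRP codes are hypothetical placeholders.
buildings = pd.DataFrame({
    "USE_GRP": ["RES", "RES", "RES", "COM", "COM"],
    "B_FRAME1": ["Timber", None, "Timber", "Concrete", None],
})

# Share (%) of records in each use group that have a wall-frame entry.
frame_complete = buildings["B_FRAME1"].notna().groupby(buildings["USE_GRP"]).mean() * 100
print(frame_complete.round(1))
```

Grouping the boolean "has an entry" mask, rather than the attribute itself, keeps records with missing values in the denominator of each group.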

```python
# Group records by each attribute of interest
wall_frame = to_bld.groupby('B_FRAME1')
wall_material = to_bld.groupby('WALL_MAT1')
roof_material = to_bld.groupby('ROOF_MAT_1')
foundations = to_bld.groupby('FOUND1')
floor_height = to_bld.groupby('F_MINHT')
nrecords = len(to_bld)

# Percentage of records with a non-null entry for each attribute
pwallframe = 100 * to_bld['B_FRAME1'].notnull().mean()
pwall_material = 100 * to_bld['WALL_MAT1'].notnull().mean()
proof_material = 100 * to_bld['ROOF_MAT_1'].notnull().mean()
pfoundations = 100 * to_bld['FOUND1'].notnull().mean()
pfloor_height = 100 * to_bld['F_MINHT'].notnull().mean()

print("Percentage of complete records")
print("------------------------------")
print("Wall frame:    {0:.2f}%".format(pwallframe))
print("Wall material: {0:.2f}%".format(pwall_material))
print("Roof material: {0:.2f}%".format(proof_material))
print("Foundation:    {0:.2f}%".format(pfoundations))
print("Floor height:  {0:.2f}%".format(pfloor_height))
```

Here we give the percentage of records in each class of each attribute. Note that the totals do not sum to 100%; the shortfall indicates how incomplete the data are. In most cases, only around 35% of records have an entry.
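To see why the class percentages fall short of 100%, it can help to keep the "missing" bucket explicitly. A minimal sketch on a made-up series of wall-frame entries:

```python
import pandas as pd

# Made-up wall-frame entries; None marks a record with no entry.
frame = pd.Series(["Timber", "Timber", "Concrete", None, None, None], name="B_FRAME1")

# Percentage of all records in each class; dropna=False keeps the missing bucket,
# so these shares sum to 100%.
shares = frame.value_counts(dropna=False, normalize=True) * 100
print(shares.round(2))

# Summing only the real classes gives the completeness figure instead.
classified = frame.value_counts(dropna=True).sum() / len(frame) * 100
print(round(classified, 2))
```

The gap between 100% and `classified` is exactly the share of records with no entry.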

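```python
# size() counts every record in each class; count()['AGE'] would only count
# records whose AGE field is also filled in.
100 * wall_frame.size() / nrecords
```

```python
100 * wall_material.size() / nrecords
```

```python
100 * roof_material.size() / nrecords
```

```python
100 * foundations.size() / nrecords
```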

```python
# Percentage of all records in each (use group, wall frame) combination
grouped = to_bld.groupby(['USE_GRP', 'B_FRAME1'])
100 * grouped.size() / nrecords
```

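```python
def autolabel(rects, rotation='horizontal'):
    """Write the height (count) of each bar just above it."""
    for rect in rects:
        height = rect.get_height()
        if np.isnan(height):
            # Empty bars can report a NaN height; label them as 0
            height = 0
        # Use the bar's own Axes rather than relying on a global `ax`
        rect.axes.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                       '%d' % int(height),
                       ha='center', va='bottom', rotation=rotation, fontsize='small')
```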

```python
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
ax = sns.countplot(x='USE_GRP', data=to_bld, palette='prism', hue='B_FRAME1', ax=ax)
autolabel(ax.patches, rotation='vertical')
ax.legend(loc=1)
ax.set_xlabel('Building use group')
#labels = ax.get_xticklabels()
#ax.set_xticklabels(labels,rotation='vertical')
```


```python
fig, ax = plt.subplots(1, 1, figsize=(8, 16))

villages = gpd.read_file("R:/Pacific/data/external/pcrafi/TO/to_village.shp")
# Draw village boundaries first, then overlay the buildings on the same axes
base = villages.plot(ax=ax)
to_bld.plot(ax=base)
```