# Application-to-Permit in New York City

This notebook, along with its associated files, examines the time elapsed between permit application and permit issuance in New York City.

## Introduction

### Construction Permit Process

Before construction of a new building can begin, the project has to go through a long professional vetting process. The Department of Buildings (DOB) is the principal government agency responsible for overseeing the inspections and paperwork required before breaking ground.

The exact process, for the building manager involved, is summarized by [this handy chart](http://www1.nyc.gov/site/buildings/homeowner/dob-forms-applications.page). Here are the steps:

1. The applicant prefiles the application. Prefiling saves time, but not all applicants use it.
2. The applicant submits a work plan.
3. The plan is either professionally certified (a so-called "pro cert" or "self cert") or reviewed by the DOB.
4. The plan is approved or denied by the DOB.
5. Once a plan is approved, the applicant requests a work permit.
6. DOB issues a work permit.
7. Construction begins.

### Retrieving Construction Permits

Construction permits are retrieved as a part of the work done by the `Permit Issuance` companion notebook, which came first.

### Retrieving Application Jobs

DOB publishes a list of all [building job applications](https://data.cityofnewyork.us/Housing-Development/DOB-Job-Application-Filings/ic3t-wcy2) as a part of the New York City Open Data Portal. This dataset is missing new building permit issuances (which are posted to a separate open dataset), but contains all of the information on the preceding steps of the timeline. Retrieving these applications is merely a matter of accessing, extracting, and filtering this dataset.
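
As a rough illustration of the "access, extract, filter" step, the sketch below reads a CSV in chunks and keeps only rows of one job type. The in-memory sample, the `load_job_type` helper, and the chunk size are assumptions made up for this example; only the `Job Type` and `Bin #` column names come from the dataset as used in this notebook.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for DOB_Job_Application_Filings.csv (made-up rows).
sample_csv = io.StringIO(
    "Job #,Job Type,Bin #\n"
    "1,A2,1000001\n"
    "2,NB,1000002\n"
    "3,NB,1000003\n"
    "4,DM,1000004\n"
)

def load_job_type(source, job_type, chunksize=2):
    """Read a CSV in chunks and keep only rows with the given job type."""
    kept = []
    # dtype=str sidesteps mixed-type guessing on wide, messy columns.
    for chunk in pd.read_csv(source, dtype=str, chunksize=chunksize):
        kept.append(chunk[chunk["Job Type"].str.strip() == job_type])
    return pd.concat(kept, ignore_index=True)

nb_sample = load_job_type(sample_csv, "NB")
print(nb_sample)
```

Reading every column as a string is a deliberate simplification here; numeric and date columns would need to be converted afterwards.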

## Reading Job Application Data

In [1]:
import pandas as pd
import numpy as np

In [2]:
permits = pd.read_csv("DOB_Job_Application_Filings.csv")



In [3]:
pd.set_option("display.max_columns", 500)

In [4]:
permits['Job Status Descrp'].value_counts()[:-1] # Last one is a parsing error.

APPLICATION PROCESSED - ENTIRE              60217
PLAN EXAM - APPROVED                        36135
APPLICATION PROCESSED - ENTIRE              26308
PLAN EXAM - IN PROCESS                      23352
PLAN EXAM - APPROVED                        13840
PERMIT ISSUED - ENTIRE JOB/WORK             11975
PLAN EXAM - DISAPPROVED                      9453
PLAN EXAM - IN PROCESS                       7559
PERMIT ISSUED - ENTIRE JOB/WORK              7397
PRE-FILING                                   7052
PERMIT ISSUED - PARTIAL JOB                  5498
APPLICATION ASSIGNED TO PLAN EXAMINER        5216
PRE-FILING                                   3864
APPLICATION PROCESSED - NO PLAN EXAM         3554
PLAN EXAM - DISAPPROVED                      3291
PERMIT ISSUED - PARTIAL JOB                  2358
APPLICATION ASSIGNED TO PLAN EXAMINER        1411
APPLICATION PROCESSED - NO PLAN EXAM         1266
APPLICATION PROCESSED-PART-NO PAYMENT         378
APPLICATION PROCESSED-PART-NO PAYMENT         175
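
Several labels appear twice in the output above. One plausible cause, which is an assumption rather than something verified against the raw file, is that some values carry trailing whitespace. A minimal sketch on made-up data shows how stripping the strings before counting collapses such duplicates:

```python
import pandas as pd

# Made-up status values; two are padded with trailing spaces.
statuses = pd.Series([
    "PRE-FILING",
    "PRE-FILING   ",
    "PLAN EXAM - APPROVED",
    "PLAN EXAM - APPROVED ",
    "PLAN EXAM - APPROVED",
])

raw_counts = statuses.value_counts()
clean_counts = statuses.str.strip().value_counts()

print(raw_counts)    # padded and unpadded labels are counted separately
print(clean_counts)  # one row per distinct status
```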


The records begin on 04/25/2013.

### `NB` Job Types

This dataset includes submissions for all job types, but we're only interested in new building construction jobs. These correspond to records with a `Job Type` of `NB`.

In [5]:
new_buildings = permits[permits['Job Type'] == 'NB']
all_bins = np.unique(new_buildings['Bin #'])
print("There are {0} new building records, corresponding with {1} buildings.".format(len(new_buildings), len(all_bins)))

There are 12481 new building records, corresponding with 6951 buildings.


In [6]:
new_buildings['Job Status Descrp'].value_counts()

APPLICATION PROCESSED - ENTIRE              4850
APPLICATION ASSIGNED TO PLAN EXAMINER       1860
APPLICATION PROCESSED - ENTIRE              1748
PRE-FILING                                  1712
PRE-FILING                                   751
APPLICATION ASSIGNED TO PLAN EXAMINER        553
PLAN EXAM - IN PROCESS                       351
PLAN EXAM - DISAPPROVED                      329
PLAN EXAM - DISAPPROVED                       96
PLAN EXAM - IN PROCESS                        80
PLAN EXAM - APPROVED                          76
PLAN EXAM - APPROVED                          45
APPLICATION PROCESSED-PART-NO PAYMENT         18
APPLICATION PROCESSED-PART-NO PAYMENT          6
PERMIT ISSUED - ENTIRE JOB/WORK                5
PERMIT ISSUED - PARTIAL JOB                    1
Name: Job Status Descrp, dtype: int64

In [7]:
permits['Job Type'].value_counts()

A2    150156
A3     44696
A1     14262
NB     12481
DM      5151
PA      2281
SC       877
SI       694
Name: Job Type, dtype: int64

Surprisingly, there are almost no new building permit issuances in this dataset! It turns out that they are published in a completely separate dataset.

In [73]:
new_buildings['Job Status Descrp'].value_counts()[['PERMIT ISSUED - ENTIRE JOB/WORK', 'PERMIT ISSUED - PARTIAL JOB']]

PERMIT ISSUED - ENTIRE JOB/WORK    5
PERMIT ISSUED - PARTIAL JOB        1
Name: Job Status Descrp, dtype: int64

## Application&mdash;Permit&mdash;Occupancy Join

The `Permit-to-Occupancy.ipynb` notebook contains all of the processing code for the permit-to-occupancy data. In the following few cells we join the application data to it.

We will do so by taking, for each building, the first application document which is not a pre-filing. Pre-filing speeds up the process somewhat, but far from all applicants choose to exercise that option, so a simpler metric is fine.

In [9]:
permits = pd.read_csv("permit_sample.csv")

In [10]:
# Work on a copy so that later column assignments do not hit a slice of `permits`.
applications = new_buildings.copy()

In [13]:
import arrow

In [43]:
applications['Latest Action Date'] = applications['Latest Action Date'].map(lambda x: np.datetime64(arrow.get(x,"MM/DD/YYYY").datetime))



In [44]:
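def get_application_doc(srs):
    """Return the earliest non-pre-filing action date for the permit's building (BIN)."""
    bin_number = srs['Bin #']
    # Status strings are sometimes padded with trailing whitespace, so strip before comparing.
    status = applications['Job Status Descrp'].str.strip()
    non_pre_applications_jobs = applications[(applications['Bin #'] == bin_number) &
                                             (status != 'PRE-FILING')]
    # If no application matches, the minimum of the empty selection is a missing value.
    return non_pre_applications_jobs['Latest Action Date'].min()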

In [45]:
permits_copy = permits.copy()
permits_copy['Application Date'] = permits.apply(get_application_doc, axis=1)
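
The row-wise `apply` above rescans the whole applications table once per permit. A possible vectorized alternative, sketched here on made-up stand-in frames (`applications_demo` and `permits_demo` are hypothetical names, not part of the notebook), computes the earliest qualifying date per `Bin #` once and then maps it onto the permits:

```python
import pandas as pd

# Made-up stand-ins for the `applications` and `permits` frames.
applications_demo = pd.DataFrame({
    "Bin #": [1, 1, 1, 2],
    "Job Status Descrp": ["PRE-FILING   ", "PLAN EXAM - APPROVED",
                          "APPLICATION PROCESSED - ENTIRE", "PRE-FILING"],
    "Latest Action Date": pd.to_datetime(
        ["2013-05-01", "2013-06-10", "2013-05-20", "2013-07-01"]),
})
permits_demo = pd.DataFrame({"Bin #": [1, 2, 3]})

# Drop pre-filing rows, then take the earliest action date per building.
is_prefiling = applications_demo["Job Status Descrp"].str.strip() == "PRE-FILING"
first_action = (applications_demo[~is_prefiling]
                .groupby("Bin #")["Latest Action Date"]
                .min())

# Buildings without a qualifying application get NaT.
permits_demo["Application Date"] = permits_demo["Bin #"].map(first_action)
print(permits_demo)
```

This sketch assumes the `Bin #` values in both frames are directly comparable; with the real data it would be worth checking that the two columns share a dtype before mapping.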

In [47]:
permits_copy = permits_copy.drop(["Unnamed: 0", "Unnamed: 0.1"], axis=1)

In [49]:
permits_copy.head(1)

Unnamed: 0,BOROUGH,Bin #,House #,Street Name,Job #,Job doc. #,Job Type,Self_Cert,Block,Lot,Community Board,Zip Code,Bldg Type,Residential,Special District 1,Special District 2,Work Type,Permit Status,Filing Status,Permit Type,Permit Sequence #,Permit Subtype,Oil Gas,Site Fill,Filing Date,Issuance Date,Expiration Date,Job Start Date,Permittee's First Name,Permittee's Last Name,Permittee's Business Name,Permittee's Phone #,Permittee's License Type,Permittee's License #,Act as Superintendent,Permittee's Other Title,HIC License,Site Safety Mgr's First Name,Site Safety Mgr's Last Name,Site Safety Mgr Business Name,Superintendent First & Last Name,Superintendent Business Name,Owner's Business Type,Non-Profit,Owner's Business Name,Owner's First Name,Owner's Last Name,Owner's House #,Owner's House Street Name,Owner’s House City,Owner’s House State,Owner’s House Zip Code,Owner's Phone #,DOBRunDate,Year,Certificate Date,Build Time,Application Date
0,BRONX,2818049.0,92,WEST TREMONT AVE,220177350.0,1.0,NB,,2867.0,125.0,205,10453,2.0,YES,,,,ISSUED,INITIAL,NB,1.0,,,ON-SITE,2013-06-21,06/21/2013 12:00:00 AM,03/20/2014,06/21/2013,JIMMY,ZERVOUDIS,GALAXY G.C.GROUP LLC,7186017000,GENERAL CONTRACTOR,610672,,,,,,,PHILIP ANTONAKOS,GALAXY GENERAL CONTRACTIO,HPD,Y,HOUSING PRESERVATION AND DEVELOP,TIMOTHY,JOSEPH,100,GOLD STREET,NEW YORK,NY,10038.0,2128637000.0,06/22/2013 12:00:00 AM,2013,2015-11-02,864 days 00:00:00.000000000,NaT


In [56]:
permits_copy['Issuance Date'] = permits_copy['Issuance Date'].map(lambda dstr: np.datetime64(arrow.get(dstr.split(" ")[0], "MM/DD/YYYY").datetime))
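
The date-parsing cells in this notebook go through `arrow` one value at a time. As a hedged alternative, `pd.to_datetime` with an explicit format can parse a whole column at once. The sketch below uses made-up values in the two string shapes that appear in this notebook:

```python
import pandas as pd

# Made-up values in the two string shapes seen in this notebook.
latest_action = pd.Series(["04/25/2013", "12/01/2014"])
issuance = pd.Series(["06/21/2013 12:00:00 AM", "03/20/2014 12:00:00 AM"])

latest_action_dt = pd.to_datetime(latest_action, format="%m/%d/%Y")
# Keep only the date part before parsing, mirroring the split(" ")[0] idiom.
issuance_dt = pd.to_datetime(issuance.str.split(" ").str[0], format="%m/%d/%Y")

print(latest_action_dt)
print(issuance_dt)
```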



In [57]:
# Column-wise subtraction; rows with a missing application date yield NaT.
permits_copy['Application Time'] = permits_copy['Issuance Date'] - permits_copy['Application Date']

Note that the permit data and the application data cover different stretches of the timeline, so many permits have no matching application record. Further thoughts on this are in the `Analysis` notebook.

In [66]:
permits_copy['Application Time'].isnull().sum() / float(len(permits_copy))

0.52171196501093409

In [71]:
permits_copy.to_csv("combined_data.csv", date_format='%Y-%m-%d')

## Backtracking on Application Data

On further reflection, adding the application data to the dataset above, though easy to do, has limited utility. The sample of permits that we took runs from May 2013 through the end of 2014. The permit is the last step of the application document trail, yet our application data goes back only as far as the permits do, to May 2013, so for many permits the earlier application documents are not on record.

This is the opposite of the problem that we have in examining permit-to-occupancy data: there, occupancy certificates come afterwards, so every site has at least 550 days of construction on the record.

There is no reason to eliminate the variable, but it's not of much use.
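
To make the limitation concrete, the sketch below computes, for a few made-up issuance dates, the longest application-to-permit interval that the data could possibly show. The `records_start` value is taken from the record start date reported earlier in this notebook; the `demo` frame and the `Max Observable Wait` column are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Start of the application records, as reported earlier (04/25/2013).
records_start = pd.Timestamp("2013-04-25")

# Made-up issuance dates spanning the permit sample window.
demo = pd.DataFrame({
    "Issuance Date": pd.to_datetime(["2013-05-15", "2013-12-01", "2014-11-30"]),
})

# Longest application-to-permit interval that could appear in the data:
# anything filed before records_start is simply not on record.
demo["Max Observable Wait"] = demo["Issuance Date"] - records_start
print(demo)
```

Permits issued soon after the records begin can only ever show very short application times, which is one way to read the large share of missing values above.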

## Application&mdash;Permit Join