## Notes

- The goal is to pull building permit data from the U.S. Census Bureau for our markets and for the rest of the country.
- I downloaded the master file from the Bureau (link below).
- A column whose name contains "REP" holds data from reported sources only. A column without "REP", such as "BLDGS_1_UNIT", holds estimated data.

Data link: https://www2.census.gov/econ/bps/Master%20Data%20Set/
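
The "REP" naming rule in the notes can be used to separate the two kinds of columns programmatically. This is a minimal sketch, not something from the original notebook: the helper `split_reported_columns` is hypothetical, the sample names ending in `_REP` are assumed for illustration, and matching "REP" as a whole underscore-separated token is my own assumption. Check all of this against the real header of the master file.

```python
def split_reported_columns(columns):
    """Split column names into reported-only ("REP") columns and all others."""
    # Match "REP" as a whole underscore-separated token (an assumption) so an
    # unrelated name that merely contains the letters, e.g. "REPORT_ID", is skipped.
    reported = [c for c in columns if "REP" in c.split("_")]
    others = [c for c in columns if c not in reported]
    return reported, others


# Hypothetical sample of column names, not read from the real file.
sample_columns = ["BLDGS_1_UNIT", "BLDGS_1_UNIT_REP", "TOTAL_UNITS", "TOTAL_UNITS_REP"]
reported_cols, other_cols = split_reported_columns(sample_columns)
```

Note that the "others" list would also pick up descriptive columns such as names and dates when run on a full header, so it is not the same thing as "estimated measures only".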

## Imports

In [1]:
import pandas as pd
import numpy as np
import os

## PD Set Options

In [7]:
pd.set_option('display.max_columns',None)

## Data Read-in

In [3]:
pd.read_csv?

In [6]:
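# Read every column as a string so code-like fields are not coerced to numbers;
# encoding_errors='ignore' silently drops bytes that cannot be decoded.
df = pd.read_csv("BPS_Compiled_File_202412.csv", dtype='str', encoding_errors='ignore')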
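
Because the file is read with every column as a string, the counts and values need an explicit conversion before any arithmetic. The sketch below shows the idea on a tiny made-up CSV; the `FIPS_STATE` column and all of the sample values are assumptions for illustration and are not taken from the master file.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for the master file (illustrative values only).
sample_csv = io.StringIO(
    "FIPS_STATE,YEAR,TOTAL_UNITS\n"
    "01,2024,12\n"
    "37,2024,4\n"
)
sample = pd.read_csv(sample_csv, dtype=str)

# Leading zeros survive because nothing was parsed as a number.
first_code = sample.loc[0, "FIPS_STATE"]

# Convert a measure column explicitly before summing it;
# errors="coerce" turns unparseable entries into NaN instead of raising.
sample["TOTAL_UNITS"] = pd.to_numeric(sample["TOTAL_UNITS"], errors="coerce")
total_units = sample["TOTAL_UNITS"].sum()
```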

## Data work

In [14]:
df_monthly = df[df['PERIOD'] == 'Monthly'] 

In [20]:
df_monthly_county = df_monthly[df_monthly['LOCATION_TYPE'] == 'County']

In [24]:
df_monthly_county_year = df_monthly_county[df_monthly_county['YEAR'] == '2024']
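
The three filters above (monthly records, county-level rows, year 2024) could also be combined into a single boolean mask, which avoids the intermediate DataFrames. This is a sketch on a toy frame; the values "Annual" and "State" are assumed for illustration and may not match the labels used in the real file.

```python
import pandas as pd

# Toy data; only the first row satisfies all three conditions.
toy = pd.DataFrame({
    "PERIOD": ["Monthly", "Annual", "Monthly", "Monthly"],
    "LOCATION_TYPE": ["County", "County", "State", "County"],
    "YEAR": ["2024", "2024", "2024", "2023"],  # strings, as when read with dtype=str
})

mask = (
    (toy["PERIOD"] == "Monthly")
    & (toy["LOCATION_TYPE"] == "County")
    & (toy["YEAR"] == "2024")
)

# .copy() gives an independent DataFrame that is safe to add columns to later.
toy_filtered = toy.loc[mask].copy()
```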

In [37]:
keep_columns = ['BLDGS_1_UNIT', 'BLDGS_2_UNITS','BLDGS_3_4_UNITS',
       'BLDGS_5_UNITS','COUNTY_NAME','DIVISION_NAME','MONTH','PERIOD','REGION_NAME',
       'STATE_NAME', 'SURVEY_DATE', 'TOTAL_BLDGS','TOTAL_UNITS','TOTAL_VALUE',
       'UNITS_1_UNIT', 'UNITS_2_UNITS','UNITS_3_4_UNITS',
       'UNITS_5_UNITS', 'VALUE_1_UNIT', 'VALUE_2_UNITS',
       'VALUE_3_4_UNITS', 'VALUE_5_UNITS','YEAR', 'ZIP_CODE', 'LOCATION_NAME']

In [39]:
# .copy() makes final_df an independent DataFrame, so adding columns below
# does not trigger pandas' SettingWithCopyWarning.
final_df = df_monthly_county_year[keep_columns].copy()

In [58]:
final_df['COUNTY_STATE_NAME'] = final_df['COUNTY_NAME'] + ', ' + final_df['STATE_NAME']

In [47]:
import calendar
# MONTH was read as a string, so cast it to int before looking up the month name.
final_df['MONTH_NAME'] = final_df['MONTH'].astype(int).apply(lambda x: calendar.month_name[x])

In [51]:
final_df['MONTH_NAME_YEAR'] = final_df["MONTH_NAME"] + " " + final_df["YEAR"]


In [60]:
final_df.to_csv("2024_CB_BuildingPermit_data.csv")
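
By default `to_csv` also writes the DataFrame index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. If the original row numbers are not needed downstream, passing `index=False` leaves them out. The sketch below illustrates the difference on a one-row toy frame using in-memory buffers instead of files; whether to keep the index is a judgment call for this dataset.

```python
import io

import pandas as pd

toy = pd.DataFrame(
    {"COUNTY_NAME": ["Swain County"], "TOTAL_UNITS": ["4"]},
    index=[3208622],
)

# Default behaviour: the index is written as an unnamed leading column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index, dtype=str).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index, dtype=str).columns.tolist()
```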

In [59]:
final_df.head(1)

| | BLDGS_1_UNIT | BLDGS_2_UNITS | BLDGS_3_4_UNITS | BLDGS_5_UNITS | COUNTY_NAME | DIVISION_NAME | MONTH | PERIOD | REGION_NAME | STATE_NAME | SURVEY_DATE | TOTAL_BLDGS | TOTAL_UNITS | TOTAL_VALUE | UNITS_1_UNIT | UNITS_2_UNITS | UNITS_3_4_UNITS | UNITS_5_UNITS | VALUE_1_UNIT | VALUE_2_UNITS | VALUE_3_4_UNITS | VALUE_5_UNITS | YEAR | ZIP_CODE | LOCATION_NAME | COUNTY_STATE_NAME | MONTH_NAME | MONTH_NAME_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3208622 | 4 | 0 | 0 | 0 | Swain County | South Atlantic Division | 2 | Monthly | South Region | North Carolina | 202402 | 4 | 4 | 1615000 | 4 | 0 | 0 | 0 | 1615000 | 0 | 0 | 0 | 2024 | | Swain County | Swain County, North Carolina | February | February 2024 |


In [56]:
final_df.columns

Index(['BLDGS_1_UNIT', 'BLDGS_2_UNITS', 'BLDGS_3_4_UNITS', 'BLDGS_5_UNITS',
       'COUNTY_NAME', 'DIVISION_NAME', 'MONTH', 'PERIOD', 'REGION_NAME',
       'STATE_NAME', 'SURVEY_DATE', 'TOTAL_BLDGS', 'TOTAL_UNITS',
       'TOTAL_VALUE', 'UNITS_1_UNIT', 'UNITS_2_UNITS', 'UNITS_3_4_UNITS',
       'UNITS_5_UNITS', 'VALUE_1_UNIT', 'VALUE_2_UNITS', 'VALUE_3_4_UNITS',
       'VALUE_5_UNITS', 'YEAR', 'ZIP_CODE', 'LOCATION_NAME',
       'COUNTY_STATE_NAME', 'MONTH_NAME', 'MONTH_NAME_YEAR'],
      dtype='object')