# AI Developer Technical Test - Alianza Team


## **Introduction**
> 🎯 **Objective**: Build a predictive model with machine learning (ML) techniques to estimate the density of commercial licenses per inhabitant in each ZIP code of New York City.

## **Inputs**
| File | Description | Source |
| :--- | :--- | :--- |
| `demographic-statistics-by-zip-code-1.csv` | **Demographic statistics** for participants in programs funded by the New York City Department of Youth and Community Development (DYCD), broken down by ZIP code. | [Demographic Statistics By Zip Code \| data.world](https://data.world/city-of-ny/kku6-nxdu) |
| `DCA_Legally_Operating_Businesses_03062015.xlsx` | Licenses issued by the Department of Consumer and Worker Protection (DCWP) to **businesses** (*Premises*) and **natural persons** (*Individuals*) so that they can operate legally in New York City. | [Legally operating businesses \| NYC Open Data](https://data.cityofnewyork.us/Business/Legally-Operating-Businesses/w7w3-xahh/about_data) |

## **Definitions**
### **Commercial License Density**
$$ \text{Commercial License Density} = \frac{\text{Number of Commercial Licenses}}{\text{Total Population}} $$

**Note**:
1. The number of commercial licenses comes from the file `DCA_Legally_Operating_Businesses_03062015.xlsx`.
2. The total population comes from the file `demographic-statistics-by-zip-code-1.csv`.
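
As a minimal sketch of the definition above, the ratio can be computed per ZIP code as follows. The column names (`zip_code`, `license_count`, `population`) and all values are hypothetical and chosen only for illustration; they are not taken from the source files.

```python
import pandas as pd

# Hypothetical toy values for illustration only (not from the source files).
toy = pd.DataFrame({
    "zip_code": [10001, 10002],
    "license_count": [50, 30],
    "population": [1000, 600],
})

# Density = number of commercial licenses / total population, per ZIP code.
toy["license_density"] = toy["license_count"] / toy["population"]
print(toy)
```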

## **1. Data Preparation**

In [372]:
# Import libraries (pandas)
import pandas as pd

In [373]:
# Load the New York City demographic data table by ZIP code
demographics = pd.read_csv('Sources/demographic-statistics-by-zip-code-1.csv', sep=';')

In [374]:
demographics.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 236 entries, 0 to 235
Data columns (total 46 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   JURISDICTION NAME                    236 non-null    int64  
 1   COUNT PARTICIPANTS                   236 non-null    int64  
 2   COUNT FEMALE                         236 non-null    int64  
 3   PERCENT FEMALE                       236 non-null    float64
 4   COUNT MALE                           236 non-null    int64  
 5   PERCENT MALE                         236 non-null    float64
 6   COUNT GENDER UNKNOWN                 236 non-null    int64  
 7   PERCENT GENDER UNKNOWN               236 non-null    int64  
 8   COUNT GENDER TOTAL                   236 non-null    int64  
 9   PERCENT GENDER TOTAL                 236 non-null    int64  
 10  COUNT PACIFIC ISLANDER               236 non-null    int64  
 11  PERCENT PACIFIC ISLANDER        

In [375]:
# Load the New York City business license data table
licenses = pd.read_excel('Sources/DCA_Legally_Operating_Businesses_03062015.xlsx')

In [376]:
# Dimensions of the data
print('Shape of the demographics data: ', demographics.shape)
print('Shape of the licenses data: ', licenses.shape)


Shape of the demographics data:  (236, 46)
Shape of the licenses data:  (65798, 14)


### **Table 1. Demographics**
#### Removing unnecessary records

In [377]:
demographics.head()

Unnamed: 0,JURISDICTION NAME,COUNT PARTICIPANTS,COUNT FEMALE,PERCENT FEMALE,COUNT MALE,PERCENT MALE,COUNT GENDER UNKNOWN,PERCENT GENDER UNKNOWN,COUNT GENDER TOTAL,PERCENT GENDER TOTAL,...,COUNT CITIZEN STATUS TOTAL,PERCENT CITIZEN STATUS TOTAL,COUNT RECEIVES PUBLIC ASSISTANCE,PERCENT RECEIVES PUBLIC ASSISTANCE,COUNT NRECEIVES PUBLIC ASSISTANCE,PERCENT NRECEIVES PUBLIC ASSISTANCE,COUNT PUBLIC ASSISTANCE UNKNOWN,PERCENT PUBLIC ASSISTANCE UNKNOWN,COUNT PUBLIC ASSISTANCE TOTAL,PERCENT PUBLIC ASSISTANCE TOTAL
0,10001,44,22,0.5,22,0.5,0,0,44,100,...,44,100,20,0.45,24,0.55,0,0,44,100
1,10002,35,19,0.54,16,0.46,0,0,35,100,...,35,100,2,0.06,33,0.94,0,0,35,100
2,10003,1,1,1.0,0,0.0,0,0,1,100,...,1,100,0,0.0,1,1.0,0,0,1,100
3,10004,0,0,0.0,0,0.0,0,0,0,0,...,0,0,0,0.0,0,0.0,0,0,0,0
4,10005,2,2,1.0,0,0.0,0,0,2,100,...,2,100,0,0.0,2,1.0,0,0,2,100


Given the objective of this study, the **most relevant columns** for this first phase of data preparation are:
- `JURISDICTION NAME`: ZIP code. → Lets us join this table with the commercial licenses table.
- `COUNT PARTICIPANTS`: Number of participants. → Used as the population figure when computing the commercial license density.
    - As the output above shows, some ZIP codes have a participant count of zero, for example the fourth row (index 3, ZIP code 10004).
    - Because the density divides by this count, it is undefined for those ZIP codes, so they add **no** useful information to the predictive model. These records are therefore **removed**.
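
To illustrate why zero-participant rows cannot yield a usable density, here is a small sketch with hypothetical license counts (invented for illustration). Dividing by zero in pandas does not raise an error; it silently produces infinite or missing values, which would then contaminate the target variable.

```python
import numpy as np
import pandas as pd

# Hypothetical license counts paired with zero participants.
license_counts = pd.Series([12, 0])
participants = pd.Series([0, 0])

density = license_counts / participants

# A positive count over zero is infinite; zero over zero is NaN.
print(np.isinf(density.iloc[0]), np.isnan(density.iloc[1]))
```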

In [378]:
print("Records removed: ", demographics[demographics['COUNT PARTICIPANTS'] == 0].shape[0])
print("Useful records remaining: ", demographics[demographics['COUNT PARTICIPANTS'] != 0].shape[0])

Records removed:  131
Useful records remaining:  105


In [379]:
# Drop records with 0 participants
demographics = demographics[demographics['COUNT PARTICIPANTS'] != 0]

#### Removing unnecessary columns
With the unnecessary records removed, the next step is to **drop** the columns that add no useful information to the predictive model. We start by reviewing the columns of the demographics table.

In [380]:
demographics.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 105 entries, 0 to 233
Data columns (total 46 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   JURISDICTION NAME                    105 non-null    int64  
 1   COUNT PARTICIPANTS                   105 non-null    int64  
 2   COUNT FEMALE                         105 non-null    int64  
 3   PERCENT FEMALE                       105 non-null    float64
 4   COUNT MALE                           105 non-null    int64  
 5   PERCENT MALE                         105 non-null    float64
 6   COUNT GENDER UNKNOWN                 105 non-null    int64  
 7   PERCENT GENDER UNKNOWN               105 non-null    int64  
 8   COUNT GENDER TOTAL                   105 non-null    int64  
 9   PERCENT GENDER TOTAL                 105 non-null    int64  
 10  COUNT PACIFIC ISLANDER               105 non-null    int64  
 11  PERCENT PACIFIC ISLANDER        

From this review we can infer that:
- We can drop the count columns (`COUNT`) and the total columns (`TOTAL`): the counts restate what the `PERCENT` columns already express, and the totals only aggregate each group.
- Columns that contain only zeros can also be dropped, since a constant column carries no information for the predictive model.

In [381]:
# Drop the COUNT columns (except 'COUNT PARTICIPANTS', which is used to compute the density), the TOTAL columns, and the all-zero columns
demographics = demographics.drop(columns=[col for col in demographics.columns if 'COUNT' in col and col != 'COUNT PARTICIPANTS'])
demographics = demographics[demographics.columns.drop(list(demographics.filter(regex='TOTAL')))]
demographics = demographics.loc[:, (demographics != 0).any(axis=0)]

As a result, 29 columns were removed from the demographics table (46 → 17).
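
The all-zero filter used in the cell above, `(df != 0).any(axis=0)`, can be illustrated on a small frame. The two-row data below is a hypothetical excerpt built for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical two-row excerpt; the middle column contains only zeros.
toy = pd.DataFrame({
    "PERCENT FEMALE": [0.50, 0.54],
    "PERCENT GENDER UNKNOWN": [0, 0],
    "PERCENT MALE": [0.50, 0.46],
})

# True for every column that has at least one non-zero value.
keep_mask = (toy != 0).any(axis=0)

# Keep only the columns flagged True.
filtered = toy.loc[:, keep_mask]
print(filtered.columns.tolist())
```

The mask is evaluated column by column (`axis=0`), so a column survives as soon as a single row is non-zero.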

In [382]:
demographics.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 105 entries, 0 to 233
Data columns (total 17 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   JURISDICTION NAME                    105 non-null    int64  
 1   COUNT PARTICIPANTS                   105 non-null    int64  
 2   PERCENT FEMALE                       105 non-null    float64
 3   PERCENT MALE                         105 non-null    float64
 4   PERCENT PACIFIC ISLANDER             105 non-null    float64
 5   PERCENT HISPANIC LATINO              105 non-null    float64
 6   PERCENT AMERICAN INDIAN              105 non-null    float64
 7   PERCENT ASIAN NON HISPANIC           105 non-null    float64
 8   PERCENT WHITE NON HISPANIC           105 non-null    float64
 9   PERCENT BLACK NON HISPANIC           105 non-null    float64
 10  PERCENT OTHER ETHNICITY              105 non-null    float64
 11  PERCENT ETHNICITY UNKNOWN       

> 🔍 **Observations**
> - Of the 17 remaining columns, the 15 `PERCENT` columns are the ones that can serve as predictors for the model.
> - The `PERCENT` columns describe the following 4 variables:
>    - **Gender**: {`FEMALE`, `MALE`}
>    - **Ethnicity**: {`PACIFIC ISLANDER`, `HISPANIC LATINO`, `AMERICAN INDIAN`, `ASIAN NON HISPANIC`, `WHITE NON HISPANIC`, `BLACK NON HISPANIC`, `OTHER`, `UNKNOWN`}
>    - **Citizenship Status**: {`PERMANENT RESIDENT ALIEN`, `US CITIZEN`, `OTHER`}
>    - **Public Assistance**: {`RECEIVES PUBLIC ASSISTANCE`, `DOES NOT RECEIVE PUBLIC ASSISTANCE`}

👁️ <span style="background-color: #FFA500; color: black;">**Note:**</span>
- The shares within each group add up to approximately 1, so to avoid multicollinearity we should drop one column from each group of categorical variables.
- In addition, the columns of the demographics table will be renamed for clarity.
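
The redundancy behind the multicollinearity concern can be checked directly. The sketch below uses the gender shares of the first three rows shown by `demographics.head()` earlier; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Gender shares of the first three rows displayed by demographics.head().
gender = pd.DataFrame({
    "PERCENT FEMALE": [0.50, 0.54, 1.00],
    "PERCENT MALE": [0.50, 0.46, 0.00],
})

# Within a group, the shares of each row add up to 1.
row_sums = gender.sum(axis=1)
print(np.isclose(row_sums, 1.0).all())

# One share is therefore fully determined by the other.
reconstructed_male = 1.0 - gender["PERCENT FEMALE"]
print(np.allclose(reconstructed_male, gender["PERCENT MALE"]))
```

Because the published shares are rounded to two decimals, a tolerance-based comparison such as `np.isclose` is safer here than strict equality.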

In [383]:
# Rename the gender, ethnicity, citizenship status, and public assistance columns
demographics = demographics.rename(columns=lambda x: x.replace('PERCENT ', ''))
demographics.rename(columns={'FEMALE': 'GENDER_FEMALE', 'MALE': 'GENDER_MALE'}, inplace=True)
demographics.rename(
    columns={
        'PACIFIC ISLANDER': 'ETHNICITY_PACIFIC_ISLANDER',
        'HISPANIC LATINO': 'ETHNICITY_HISPANIC_LATINO',
        'AMERICAN INDIAN': 'ETHNICITY_AMERICAN_INDIAN',
        'ASIAN NON HISPANIC': 'ETHNICITY_ASIAN_NON_HISPANIC',
        'WHITE NON HISPANIC': 'ETHNICITY_WHITE_NON_HISPANIC',
        'BLACK NON HISPANIC': 'ETHNICITY_BLACK_NON_HISPANIC',
        'OTHER ETHNICITY': 'ETHNICITY_OTHER',
        'ETHNICITY UNKNOWN': 'ETHNICITY_UNKNOWN'
    },
    inplace=True
)
demographics.rename(
    columns={
        'PERMANENT RESIDENT ALIEN': 'CITIZENSHIP_PERMANENT_RESIDENT_ALIEN',
        'US CITIZEN': 'CITIZENSHIP_US_CITIZEN',
        'OTHER CITIZEN STATUS': 'CITIZENSHIP_OTHER'
    },
    inplace=True
)
demographics.rename(
    columns={
        'RECEIVES PUBLIC ASSISTANCE': 'PUBLIC_ASSISTANCE_YES',
        'NRECEIVES PUBLIC ASSISTANCE': 'PUBLIC_ASSISTANCE_NO'
    },
    inplace=True
)

In [384]:
# Data types of the demographics table
demographics.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 105 entries, 0 to 233
Data columns (total 17 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   JURISDICTION NAME                     105 non-null    int64  
 1   COUNT PARTICIPANTS                    105 non-null    int64  
 2   GENDER_FEMALE                         105 non-null    float64
 3   GENDER_MALE                           105 non-null    float64
 4   ETHNICITY_PACIFIC_ISLANDER            105 non-null    float64
 5   ETHNICITY_HISPANIC_LATINO             105 non-null    float64
 6   ETHNICITY_AMERICAN_INDIAN             105 non-null    float64
 7   ETHNICITY_ASIAN_NON_HISPANIC          105 non-null    float64
 8   ETHNICITY_WHITE_NON_HISPANIC          105 non-null    float64
 9   ETHNICITY_BLACK_NON_HISPANIC          105 non-null    float64
 10  ETHNICITY_OTHER                       105 non-null    float64
 11  ETHNICITY_UNKNOWN  

In [385]:
# Drop the GENDER_MALE, ETHNICITY_UNKNOWN, CITIZENSHIP_OTHER and PUBLIC_ASSISTANCE_NO columns
demographics = demographics.drop(columns=['GENDER_MALE', 'ETHNICITY_UNKNOWN', 'CITIZENSHIP_OTHER', 'PUBLIC_ASSISTANCE_NO'])

In [386]:
demographics.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 105 entries, 0 to 233
Data columns (total 13 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   JURISDICTION NAME                     105 non-null    int64  
 1   COUNT PARTICIPANTS                    105 non-null    int64  
 2   GENDER_FEMALE                         105 non-null    float64
 3   ETHNICITY_PACIFIC_ISLANDER            105 non-null    float64
 4   ETHNICITY_HISPANIC_LATINO             105 non-null    float64
 5   ETHNICITY_AMERICAN_INDIAN             105 non-null    float64
 6   ETHNICITY_ASIAN_NON_HISPANIC          105 non-null    float64
 7   ETHNICITY_WHITE_NON_HISPANIC          105 non-null    float64
 8   ETHNICITY_BLACK_NON_HISPANIC          105 non-null    float64
 9   ETHNICITY_OTHER                       105 non-null    float64
 10  CITIZENSHIP_PERMANENT_RESIDENT_ALIEN  105 non-null    float64
 11  CITIZENSHIP_US_CITI

In [387]:
demographics.head(20)

Unnamed: 0,JURISDICTION NAME,COUNT PARTICIPANTS,GENDER_FEMALE,ETHNICITY_PACIFIC_ISLANDER,ETHNICITY_HISPANIC_LATINO,ETHNICITY_AMERICAN_INDIAN,ETHNICITY_ASIAN_NON_HISPANIC,ETHNICITY_WHITE_NON_HISPANIC,ETHNICITY_BLACK_NON_HISPANIC,ETHNICITY_OTHER,CITIZENSHIP_PERMANENT_RESIDENT_ALIEN,CITIZENSHIP_US_CITIZEN,PUBLIC_ASSISTANCE_YES
0,10001,44,0.5,0.0,0.36,0.0,0.07,0.02,0.48,0.07,0.05,0.95,0.45
1,10002,35,0.54,0.0,0.03,0.0,0.8,0.17,0.0,0.0,0.06,0.94,0.06
2,10003,1,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
4,10005,2,1.0,0.0,0.0,0.0,0.5,0.0,0.5,0.0,0.5,0.5,0.0
5,10006,6,0.33,0.0,0.33,0.0,0.0,0.17,0.5,0.0,0.0,1.0,0.0
6,10007,1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0
7,10009,2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
9,10011,3,0.67,0.0,0.33,0.0,0.0,0.0,0.33,0.33,0.0,1.0,0.0
11,10013,8,0.13,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.13
13,10016,17,0.71,0.0,0.53,0.0,0.0,0.0,0.47,0.0,0.0,1.0,0.53


### **Table 2. Commercial Licenses**
#### Removing unnecessary records

Given the objective of this study, the **most relevant columns** for data preparation are:
- `LICENSE_TYPE`: License type.
- `ZIP`: ZIP code. → Lets us join this table with the demographics table.

In [388]:
licenses.sample(8, random_state=9).sort_values(by='LICENSE_TYPE')

Unnamed: 0,LICENSE_NBR,LICENSE_TYPE,LIC_EXPIR_DD,INDUSTRY,BUSINESS_NAME,BUSINESS_NAME2,BUILDING,STREET,STREET_2,CITY,STATE,ZIP,PHONE,BOROUGH
32723,1120701-DCA,Individual,2017-02-28 00:00:00.000,Home Improvement Salesperson,"CHUN, MIN G",,,,,FLUSHING,NY,,,
37102,0976248-DCA,Individual,2017-02-28 00:00:00.000,Home Improvement Salesperson,"MIELESZKO, JAROSLAW",,,,,MANHASSET,NY,,,
46769,0794375-DCA,Individual,2015-05-31 00:00:00.000,Locksmith,"GONZALEZ, ANTHONY",,,,,PEARL RIVER,NY,,,
51906,1350201-DCA,Individual,2016-02-28 00:00:00.000,Process Server Individual,"PRINGLE, FREDERICK JOSEPH",,,,,BRONX,NY,,,
3397,1462240-DCA,Premise,2016-12-31 00:00:00.000,Cigarette Retail Dealer,BMI DELI & GROCERY CORP,,665.0,ONDERDONK AVE,,RIDGEWOOD,NY,11385.0,3477983613,Queens
59368,1307588-DCA,Premise,2015-12-15 00:00:00.000,Sidewalk Cafe,480 REST AMSTERDAM INC.,,480.0,AMSTERDAM AVE,,NEW YORK,NY,10024.0,2125794299,Manhattan
61117,1339002-DCA,Premise,2016-03-31 00:00:00.000,Stoop Line Stand,"DOLORES CONVENIENCE & GROCERY, INC.",,3967.0,61ST ST,,WOODSIDE,NY,11377.0,718-457-3182,Queens
62567,1376598-DCA,Premise,2016-03-31 00:00:00.000,Stoop Line Stand,"WHOLE FOODS MARKET GROUP, INC.",,270.0,GREENWICH ST,,NEW YORK,NY,10007.0,2123496555,Manhattan


Licenses issued to individuals do not appear to have a ZIP code assigned. Let's verify that hypothesis:

In [389]:
nulos_no_nulos = licenses.groupby('LICENSE_TYPE')['ZIP'].apply(lambda x: pd.Series({
    'Non-null': x.notnull().sum(),
    'Null': x.isnull().sum()
})).unstack().fillna(0)

# Show the result
print(nulos_no_nulos)

              Non-null   Null
LICENSE_TYPE                 
Individual           0  20359
Premise          45409     30


- As the output shows, only licenses issued to businesses (`Premise`) have a ZIP code assigned, and 30 of those are missing one as well.
- For simplicity, records without a ZIP code are removed, since the goal is to link the `demographics` and `licenses` tables through this field.

In [390]:
# Drop records with null values in the 'ZIP' column → 45409 rows remain
licenses = licenses.dropna(subset=['ZIP'])
licenses.shape

(45409, 14)

In addition, because the ZIP codes in the `demographics` table are integers, records whose ZIP code is not a whole number are removed.

In [391]:
# Drop records with non-numeric ZIP codes and convert the 'ZIP' column to an integer dtype
licenses = licenses[licenses['ZIP'].str.isnumeric()].astype({'ZIP': 'int64'})
licenses.shape

(45401, 14)
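
The numeric-ZIP filter above relies on the `.str.isnumeric()` accessor, which assumes the column holds strings. A minimal sketch on hypothetical values (the malformed entries are invented for illustration) shows which kinds of records it discards:

```python
import pandas as pd

# Hypothetical ZIP strings; the last two are malformed on purpose.
zips = pd.Series(["10001", "11385", "1001A", "10024-1234"])

# True only when every character of the string is numeric.
mask = zips.str.isnumeric()

# Keep the valid entries and cast them to integers for the later merge.
clean = zips[mask].astype("int64")
print(clean.tolist())
```

Note that this check only guarantees digits; it does not verify that a value is a valid New York City ZIP code.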

#### Removing unnecessary columns
Rather than removing unnecessary columns, here we need a summary: the number of commercial licenses per ZIP code, which lets us compute the commercial license density.

In [392]:
# Count licenses per ZIP code and convert the result to a DataFrame with columns 'ZIP' and 'COUNT_OF_LICENSES'
licenseStats = licenses.groupby('ZIP').size().reset_index(name='COUNT_OF_LICENSES')
licenseStats.sample(5, random_state=9)

Unnamed: 0,ZIP,COUNT_OF_LICENSES
628,11575,7
1092,50266,1
1062,44444,2
1395,95762,1
993,33870,1


### **Table 3. Merging the Tables**

In [393]:
# Merge the demographics and license tables
data = pd.merge(demographics, licenseStats, left_on='JURISDICTION NAME', right_on='ZIP', how='inner')
data = data.drop(columns=['ZIP'])
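
An inner join silently discards ZIP codes that appear in only one table. One way to audit this, sketched below on a hypothetical three-row subset (the pairing of rows is invented for illustration), is an outer merge with pandas' `indicator=True` option, which labels each row as `both`, `left_only`, or `right_only`.

```python
import pandas as pd

# Hypothetical subsets of the two tables, for illustration only.
left = pd.DataFrame({"JURISDICTION NAME": [10001, 10002, 10003]})
right = pd.DataFrame({
    "ZIP": [10001, 10003, 50266],
    "COUNT_OF_LICENSES": [515, 483, 1],
})

# Outer merge keeps every key; '_merge' records where each row came from.
audit = left.merge(
    right,
    left_on="JURISDICTION NAME",
    right_on="ZIP",
    how="outer",
    indicator=True,
)
print(audit["_merge"].value_counts())

# Rows labelled 'both' are exactly those an inner join would keep.
n_matched = int((audit["_merge"] == "both").sum())
print(n_matched)
```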

In [394]:
# Reorder columns
new_order = ['JURISDICTION NAME', 'COUNT_OF_LICENSES', 'COUNT PARTICIPANTS', 'GENDER_FEMALE',
       'ETHNICITY_PACIFIC_ISLANDER', 'ETHNICITY_HISPANIC_LATINO',
       'ETHNICITY_AMERICAN_INDIAN', 'ETHNICITY_ASIAN_NON_HISPANIC',
       'ETHNICITY_WHITE_NON_HISPANIC', 'ETHNICITY_BLACK_NON_HISPANIC',
       'ETHNICITY_OTHER', 'CITIZENSHIP_PERMANENT_RESIDENT_ALIEN',
       'CITIZENSHIP_US_CITIZEN', 'PUBLIC_ASSISTANCE_YES']
data = data[new_order]

In [395]:
data.head()

Unnamed: 0,JURISDICTION NAME,COUNT_OF_LICENSES,COUNT PARTICIPANTS,GENDER_FEMALE,ETHNICITY_PACIFIC_ISLANDER,ETHNICITY_HISPANIC_LATINO,ETHNICITY_AMERICAN_INDIAN,ETHNICITY_ASIAN_NON_HISPANIC,ETHNICITY_WHITE_NON_HISPANIC,ETHNICITY_BLACK_NON_HISPANIC,ETHNICITY_OTHER,CITIZENSHIP_PERMANENT_RESIDENT_ALIEN,CITIZENSHIP_US_CITIZEN,PUBLIC_ASSISTANCE_YES
0,10001,515,44,0.5,0.0,0.36,0.0,0.07,0.02,0.48,0.07,0.05,0.95,0.45
1,10002,441,35,0.54,0.0,0.03,0.0,0.8,0.17,0.0,0.0,0.06,0.94,0.06
2,10003,483,1,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
3,10005,63,2,1.0,0.0,0.0,0.0,0.5,0.0,0.5,0.0,0.5,0.5,0.0
4,10006,30,6,0.33,0.0,0.33,0.0,0.0,0.17,0.5,0.0,0.0,1.0,0.0


#### Creating the target variable: Commercial License Density

In [396]:
# Create a new column 'LICENSES_PER_CAPITA' with the number of licenses per participant
data['LICENSES_PER_CAPITA'] = data['COUNT_OF_LICENSES'] / data['COUNT PARTICIPANTS']

In [397]:
# Drop the 'COUNT_OF_LICENSES' and 'COUNT PARTICIPANTS' columns to avoid redundancy
data = data.drop(columns=['COUNT_OF_LICENSES', 'COUNT PARTICIPANTS'])

# Rename 'JURISDICTION NAME' to 'ZIP CODE' and make it the index
data = data.rename(columns={'JURISDICTION NAME': 'ZIP CODE'}).set_index('ZIP CODE')

In [398]:
data.head()

Unnamed: 0_level_0,GENDER_FEMALE,ETHNICITY_PACIFIC_ISLANDER,ETHNICITY_HISPANIC_LATINO,ETHNICITY_AMERICAN_INDIAN,ETHNICITY_ASIAN_NON_HISPANIC,ETHNICITY_WHITE_NON_HISPANIC,ETHNICITY_BLACK_NON_HISPANIC,ETHNICITY_OTHER,CITIZENSHIP_PERMANENT_RESIDENT_ALIEN,CITIZENSHIP_US_CITIZEN,PUBLIC_ASSISTANCE_YES,LICENSES_PER_CAPITA
ZIP CODE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
10001,0.5,0.0,0.36,0.0,0.07,0.02,0.48,0.07,0.05,0.95,0.45,11.704545
10002,0.54,0.0,0.03,0.0,0.8,0.17,0.0,0.0,0.06,0.94,0.06,12.6
10003,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,483.0
10005,1.0,0.0,0.0,0.0,0.5,0.0,0.5,0.0,0.5,0.5,0.0,31.5
10006,0.33,0.0,0.33,0.0,0.0,0.17,0.5,0.0,0.0,1.0,0.0,5.0
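
As a quick arithmetic check, the target values shown above can be reproduced from the license and participant counts displayed for the same five ZIP codes in the merged table. The frame name `check` is an illustrative assumption.

```python
import pandas as pd

# Counts for the first five ZIP codes, as displayed in the merged table.
check = pd.DataFrame(
    {
        "COUNT_OF_LICENSES": [515, 441, 483, 63, 30],
        "COUNT PARTICIPANTS": [44, 35, 1, 2, 6],
    },
    index=pd.Index([10001, 10002, 10003, 10005, 10006], name="ZIP CODE"),
)

# Same ratio as the target column: licenses divided by participants.
check["LICENSES_PER_CAPITA"] = (
    check["COUNT_OF_LICENSES"] / check["COUNT PARTICIPANTS"]
)
print(check["LICENSES_PER_CAPITA"].round(6))
```

ZIP code 10003 illustrates how sensitive the ratio is to small denominators: a single participant turns 483 licenses into a density of 483.0.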


In [399]:
data.shape

(85, 12)