## Overview

This notebook will show you how to create and query a table or DataFrame that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) is a Databricks File System that allows you to store data for querying inside of Databricks. This notebook assumes that you have a file already inside of DBFS that you would like to read from.

This notebook is written in **Python** so the default cell type is Python. However, you can use different languages by using the `%LANGUAGE` syntax. Python, Scala, SQL, and R are all supported.

In [2]:
from pyspark.sql.types import IntegerType,StringType
from pyspark.sql.functions import col
from pyspark.sql import Row

# from pyspark import SparkContext as spark

# File location and type
file_location = "/FileStore/tables/COVID_19_Nursing_Home_Dataset.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","

# Read data
data=spark.read.format(file_type)\
.option('inferSchema',True)\
.option('header',True)\
.option('sep',',')\
.load(file_location)

display(data)

Week Ending,Federal Provider Number,Provider Name,Provider Address,Provider City,Provider State,Provider Zip Code,Submitted Data,Passed Quality Assurance Check,Residents Weekly Admissions COVID-19,Residents Total Admissions COVID-19,Residents Weekly Confirmed COVID-19,Residents Total Confirmed COVID-19,Residents Weekly Suspected COVID-19,Residents Total Suspected COVID-19,Residents Weekly All Deaths,Residents Total All Deaths,Residents Weekly COVID-19 Deaths,Residents Total COVID-19 Deaths,Number of All Beds,Total Number of Occupied Beds,Resident Access to Testing in Facility,Laboratory Type Is State Health Dept,Laboratory Type Is Private Lab,Laboratory Type Is Other,Staff Weekly Confirmed COVID-19,Staff Total Confirmed COVID-19,Staff Weekly Suspected COVID-19,Staff Total Suspected COVID-19,Staff Weekly COVID-19 Deaths,Staff Total COVID-19 Deaths,Shortage of Nursing Staff,Shortage of Clinical Staff,Shortage of Aides,Shortage of Other Staff,Any Current Supply of N95 Masks,One-Week Supply of N95 Masks,Any Current Supply of Surgical Masks,One-Week Supply of Surgical Masks,Any Current Supply of Eye Protection,One-Week Supply of Eye Protection,Any Current Supply of Gowns,One-Week Supply of Gowns,Any Current Supply of Gloves,One-Week Supply of Gloves,Any Current Supply of Hand Sanitizer,One-Week Supply of Hand Sanitizer,Ventilator Dependent Unit,Number of Ventilators in Facility,Number of Ventilators in Use for COVID-19,Any Current Supply of Ventilator Supplies,One-Week Supply of Ventilator Supplies,"Total Resident Confirmed COVID-19 Cases Per 1,000 Residents","Total Resident COVID-19 Deaths Per 1,000 Residents",Total Residents COVID-19 Deaths as a Percentage of Confirmed COVID-19 Cases,County,Three or More Confirmed and Suspected COVID-19 Cases This Week,Initial Confirmed COVID-19 Case This Week,Geolocation
06/07/2020,165418,SIMPSON MEMORIAL HOME,1000 NORTH MILLER STREET,WEST LIBERTY,IA,52776,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,65.0,38.0,Y,Y,N,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Muscatine,N,N,POINT (-91.259059 41.578372)
06/28/2020,115615,"BAPTIST VILLAGE, INC.",2650 CARSWELL AVE,WAYCROSS,GA,31502,Y,Y,2.0,2.0,5.0,12.0,19.0,145.0,1.0,36.0,0.0,1.0,254.0,192.0,Y,Y,Y,N,3.0,16.0,14.0,199.0,0.0,0.0,Y,N,Y,Y,Y,N,Y,N,Y,N,Y,N,Y,Y,Y,Y,N,,,,,62.5,5.2,8.3,Ware,Y,N,POINT (-82.414236 31.213024000000004)
06/07/2020,146189,LITTLE SISTERS OF THE POOR OF PALATINE,80 WEST NORTHWEST HIGHWAY,PALATINE,IL,60067,Y,Y,0.0,0.0,0.0,0.0,0.0,3.0,0.0,5.0,0.0,0.0,59.0,49.0,Y,N,Y,N,0.0,1.0,0.0,9.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Cook,N,N,POINT (-88.04550200000001 42.121632)
07/05/2020,115276,PRUITTHEALTH - MARIETTA,70 SAINE DRIVE SW,MARIETTA,GA,30008,Y,Y,1.0,7.0,0.0,116.0,0.0,13.0,0.0,2.0,0.0,2.0,119.0,81.0,Y,Y,Y,N,0.0,10.0,0.0,7.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,1432.1,24.7,1.7,Cobb,N,N,POINT (-84.556764 33.916387)
06/21/2020,165366,LAKE MILLS CARE CENTER,406 SOUTH TENTH AVENUE EAST,LAKE MILLS,IA,50450,Y,Y,0.0,0.0,0.0,0.0,0.0,4.0,0.0,2.0,0.0,0.0,78.0,45.0,Y,Y,Y,N,0.0,0.0,1.0,4.0,0.0,0.0,N,N,N,N,N,N,N,N,Y,N,N,N,Y,Y,Y,N,N,,,,,0.0,0.0,,Winnebago,N,N,POINT (-93.528394 43.410183)
05/24/2020,175334,GOOD SAMARITAN SOCIETY - LIBERAL,2160 ZINNIA LANE,LIBERAL,KS,67901,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,45.0,31.0,Y,Y,N,N,1.0,1.0,0.0,0.0,0.0,0.0,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Seward,N,N,POINT (-100.924909 37.064107)
05/31/2020,175309,ARKANSAS CITY PRESBYTERIAN MANOR,1711 N 4TH STREET,ARKANSAS CITY,KS,67005,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,60.0,48.0,Y,Y,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Cowley,N,N,POINT (-97.04464200000001 37.083424)
05/31/2020,115346,BOLINGREEN HEALTH AND REHABILITATION,529 BOLINGREEN DRIVE,MACON,GA,31210,Y,Y,0.0,0.0,3.0,7.0,1.0,91.0,1.0,2.0,0.0,0.0,121.0,90.0,Y,N,Y,N,4.0,8.0,0.0,0.0,0.0,0.0,N,N,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,77.8,0.0,0.0,Monroe,Y,N,POINT (-83.765283 32.939608)
06/28/2020,106097,FLORIDA BAPTIST RETIREMENT CENTER,1006 33RD ST,VERO BEACH,FL,32960,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,24.0,17.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Indian River,N,N,POINT (-80.392647 27.652939)
06/28/2020,155546,BETHEL POINTE HEALTH AND REHAB,3400 W COMMUNITY DR,MUNCIE,IN,47304,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,1.0,5.0,0.0,0.0,114.0,100.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Delaware,N,N,POINT (-85.426484 40.217895)


In [3]:
# Show all Columns, Column Type, meta data if any and if it can contain null values
data.printSchema()

# Filter columns which are only numeric
numeric_cols=[c.name for c in data.schema.fields if isinstance(c.dataType,IntegerType)]
print(numeric_cols)

# Run stats on all numeric columns
for r in numeric_cols:
  data.select(r).describe().show()

In [4]:
# Preprocessing
cont_cols=[]
cat_cols=[]
for col_name in numeric_cols:
  # Remove all numeric columns where % of 0s is more than 75%
  if (data.select(col(col_name)).where(col(col_name)==0).count()/data.count()) >= 0.95:
    print(col_name)
    data.drop(col_name)
    
  # Find  continious cols
  if data.select(col(col_name)).distinct().count() > 10:
    cont_cols.append(col_name)
  # Find Categorical cols
  if data.select(col(col_name)).distinct().count() <= 10:
    cat_cols.append(col_name)
        
#display(data)   
data.columns
print(cat_cols,cont_cols)
display(data)

In [5]:
from pyspark.sql.functions import corr
# EDA

q=data.stat.approxQuantile(cont_cols,[0.25, 0.5, 0.75], 0)
print(q)
#df.select(corr(col(cont_cols[0]),col(cont_cols[2])))
for c in cont_cols:
  data.select(   corr(col(c),col('Residents Weekly All Deaths'))).show()

In [6]:
display(data)

Week Ending,Federal Provider Number,Provider Name,Provider Address,Provider City,Provider State,Provider Zip Code,Submitted Data,Passed Quality Assurance Check,Residents Weekly Admissions COVID-19,Residents Total Admissions COVID-19,Residents Weekly Confirmed COVID-19,Residents Total Confirmed COVID-19,Residents Weekly Suspected COVID-19,Residents Total Suspected COVID-19,Residents Weekly All Deaths,Residents Total All Deaths,Residents Weekly COVID-19 Deaths,Residents Total COVID-19 Deaths,Number of All Beds,Total Number of Occupied Beds,Resident Access to Testing in Facility,Laboratory Type Is State Health Dept,Laboratory Type Is Private Lab,Laboratory Type Is Other,Staff Weekly Confirmed COVID-19,Staff Total Confirmed COVID-19,Staff Weekly Suspected COVID-19,Staff Total Suspected COVID-19,Staff Weekly COVID-19 Deaths,Staff Total COVID-19 Deaths,Shortage of Nursing Staff,Shortage of Clinical Staff,Shortage of Aides,Shortage of Other Staff,Any Current Supply of N95 Masks,One-Week Supply of N95 Masks,Any Current Supply of Surgical Masks,One-Week Supply of Surgical Masks,Any Current Supply of Eye Protection,One-Week Supply of Eye Protection,Any Current Supply of Gowns,One-Week Supply of Gowns,Any Current Supply of Gloves,One-Week Supply of Gloves,Any Current Supply of Hand Sanitizer,One-Week Supply of Hand Sanitizer,Ventilator Dependent Unit,Number of Ventilators in Facility,Number of Ventilators in Use for COVID-19,Any Current Supply of Ventilator Supplies,One-Week Supply of Ventilator Supplies,"Total Resident Confirmed COVID-19 Cases Per 1,000 Residents","Total Resident COVID-19 Deaths Per 1,000 Residents",Total Residents COVID-19 Deaths as a Percentage of Confirmed COVID-19 Cases,County,Three or More Confirmed and Suspected COVID-19 Cases This Week,Initial Confirmed COVID-19 Case This Week,Geolocation
06/07/2020,165418,SIMPSON MEMORIAL HOME,1000 NORTH MILLER STREET,WEST LIBERTY,IA,52776,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,65.0,38.0,Y,Y,N,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Muscatine,N,N,POINT (-91.259059 41.578372)
06/28/2020,115615,"BAPTIST VILLAGE, INC.",2650 CARSWELL AVE,WAYCROSS,GA,31502,Y,Y,2.0,2.0,5.0,12.0,19.0,145.0,1.0,36.0,0.0,1.0,254.0,192.0,Y,Y,Y,N,3.0,16.0,14.0,199.0,0.0,0.0,Y,N,Y,Y,Y,N,Y,N,Y,N,Y,N,Y,Y,Y,Y,N,,,,,62.5,5.2,8.3,Ware,Y,N,POINT (-82.414236 31.213024000000004)
06/07/2020,146189,LITTLE SISTERS OF THE POOR OF PALATINE,80 WEST NORTHWEST HIGHWAY,PALATINE,IL,60067,Y,Y,0.0,0.0,0.0,0.0,0.0,3.0,0.0,5.0,0.0,0.0,59.0,49.0,Y,N,Y,N,0.0,1.0,0.0,9.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Cook,N,N,POINT (-88.04550200000001 42.121632)
07/05/2020,115276,PRUITTHEALTH - MARIETTA,70 SAINE DRIVE SW,MARIETTA,GA,30008,Y,Y,1.0,7.0,0.0,116.0,0.0,13.0,0.0,2.0,0.0,2.0,119.0,81.0,Y,Y,Y,N,0.0,10.0,0.0,7.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,1432.1,24.7,1.7,Cobb,N,N,POINT (-84.556764 33.916387)
06/21/2020,165366,LAKE MILLS CARE CENTER,406 SOUTH TENTH AVENUE EAST,LAKE MILLS,IA,50450,Y,Y,0.0,0.0,0.0,0.0,0.0,4.0,0.0,2.0,0.0,0.0,78.0,45.0,Y,Y,Y,N,0.0,0.0,1.0,4.0,0.0,0.0,N,N,N,N,N,N,N,N,Y,N,N,N,Y,Y,Y,N,N,,,,,0.0,0.0,,Winnebago,N,N,POINT (-93.528394 43.410183)
05/24/2020,175334,GOOD SAMARITAN SOCIETY - LIBERAL,2160 ZINNIA LANE,LIBERAL,KS,67901,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,45.0,31.0,Y,Y,N,N,1.0,1.0,0.0,0.0,0.0,0.0,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Seward,N,N,POINT (-100.924909 37.064107)
05/31/2020,175309,ARKANSAS CITY PRESBYTERIAN MANOR,1711 N 4TH STREET,ARKANSAS CITY,KS,67005,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,60.0,48.0,Y,Y,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Cowley,N,N,POINT (-97.04464200000001 37.083424)
05/31/2020,115346,BOLINGREEN HEALTH AND REHABILITATION,529 BOLINGREEN DRIVE,MACON,GA,31210,Y,Y,0.0,0.0,3.0,7.0,1.0,91.0,1.0,2.0,0.0,0.0,121.0,90.0,Y,N,Y,N,4.0,8.0,0.0,0.0,0.0,0.0,N,N,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,77.8,0.0,0.0,Monroe,Y,N,POINT (-83.765283 32.939608)
06/28/2020,106097,FLORIDA BAPTIST RETIREMENT CENTER,1006 33RD ST,VERO BEACH,FL,32960,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,24.0,17.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Indian River,N,N,POINT (-80.392647 27.652939)
06/28/2020,155546,BETHEL POINTE HEALTH AND REHAB,3400 W COMMUNITY DR,MUNCIE,IN,47304,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,1.0,5.0,0.0,0.0,114.0,100.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Delaware,N,N,POINT (-85.426484 40.217895)


In [7]:
data2=data.dropDuplicates()
display(data2)

Week Ending,Federal Provider Number,Provider Name,Provider Address,Provider City,Provider State,Provider Zip Code,Submitted Data,Passed Quality Assurance Check,Residents Weekly Admissions COVID-19,Residents Total Admissions COVID-19,Residents Weekly Confirmed COVID-19,Residents Total Confirmed COVID-19,Residents Weekly Suspected COVID-19,Residents Total Suspected COVID-19,Residents Weekly All Deaths,Residents Total All Deaths,Residents Weekly COVID-19 Deaths,Residents Total COVID-19 Deaths,Number of All Beds,Total Number of Occupied Beds,Resident Access to Testing in Facility,Laboratory Type Is State Health Dept,Laboratory Type Is Private Lab,Laboratory Type Is Other,Staff Weekly Confirmed COVID-19,Staff Total Confirmed COVID-19,Staff Weekly Suspected COVID-19,Staff Total Suspected COVID-19,Staff Weekly COVID-19 Deaths,Staff Total COVID-19 Deaths,Shortage of Nursing Staff,Shortage of Clinical Staff,Shortage of Aides,Shortage of Other Staff,Any Current Supply of N95 Masks,One-Week Supply of N95 Masks,Any Current Supply of Surgical Masks,One-Week Supply of Surgical Masks,Any Current Supply of Eye Protection,One-Week Supply of Eye Protection,Any Current Supply of Gowns,One-Week Supply of Gowns,Any Current Supply of Gloves,One-Week Supply of Gloves,Any Current Supply of Hand Sanitizer,One-Week Supply of Hand Sanitizer,Ventilator Dependent Unit,Number of Ventilators in Facility,Number of Ventilators in Use for COVID-19,Any Current Supply of Ventilator Supplies,One-Week Supply of Ventilator Supplies,"Total Resident Confirmed COVID-19 Cases Per 1,000 Residents","Total Resident COVID-19 Deaths Per 1,000 Residents",Total Residents COVID-19 Deaths as a Percentage of Confirmed COVID-19 Cases,County,Three or More Confirmed and Suspected COVID-19 Cases This Week,Initial Confirmed COVID-19 Case This Week,Geolocation
07/05/2020,165257,GOLDEN AGE CARE CENTER,1915 SOUTH 18TH STREET,CENTERVILLE,IA,52544,N,,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,58.0,47.0,,,,,0.0,0.0,0.0,2.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,Appanoose,,,POINT (-92.867878 40.715107)
05/24/2020,155703,BROOKSIDE VILLAGE INC,1111 CHURCH AVE,JASPER,IN,47546,Y,Y,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,27.0,17.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,N,,,,,0.0,0.0,,Dubois,N,N,POINT (-86.912508 38.37595900000001)
06/07/2020,135082,MCCALL REHABILITATION AND CARE CENTER,418 FLOYDE STREET,MCCALL,ID,83638,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,65.0,28.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Valley,N,N,POINT (-116.094323 44.901428)
06/14/2020,145978,SHAWNEE ROSE CARE CENTER,1000 WEST SLOAN STREET,HARRISBURG,IL,62946,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,0.0,68.0,22.0,Y,Y,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Saline,N,N,POINT (-88.554538 37.731493)
07/05/2020,155556,MILLER'S MERRY MANOR,300 FAIRGROUNDS RD,TIPTON,IN,46072,Y,Y,0.0,0.0,1.0,2.0,0.0,0.0,1.0,5.0,0.0,0.0,150.0,121.0,Y,N,N,Y,0.0,3.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,16.5,0.0,0.0,Tipton,N,N,POINT (-86.041605 40.269716)
07/05/2020,165323,CORRECTIONVILLE SPECIALTY CARE,1116 EAST HIGHWAY 20,CORRECTIONVILLE,IA,51016,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,39.0,28.0,Y,Y,Y,N,0.0,0.0,0.0,2.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Woodbury,N,N,POINT (-95.77667 42.482178)
07/05/2020,14E212,FRANKFORT TERRACE,40 NORTH SMITH,FRANKFORT,IL,60423,N,,0.0,0.0,0.0,2.0,0.0,29.0,0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,0.0,Will,,,
06/28/2020,175399,CHENEY GOLDEN AGE HOME,724 N MAIN PO BOX 370,CHENEY,KS,67025,Y,Y,0.0,0.0,0.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,45.0,44.0,Y,Y,Y,N,0.0,0.0,1.0,2.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Sedgwick,N,N,
07/05/2020,106095,CLUB HEALTH AND REHABILITATION CENTER AT THE VILLA,16529 SE 86TH BELLE MEADE CIRCLE,THE VILLAGES,FL,32162,Y,Y,0.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,68.0,53.0,Y,Y,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,Alachua,N,N,
05/24/2020,155062,GOLDEN LIVING CENTER-LAPORTE,1700 I STREET,LA PORTE,IN,46350,Y,Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,81.0,59.0,Y,N,Y,N,0.0,0.0,0.0,0.0,0.0,0.0,N,N,N,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,,,,,0.0,0.0,,LaPorte,N,N,POINT (-86.72939800000002 41.592112)


In [8]:
print(data.count(),data2.count())