## Start date: July 5th, 2023 (Data Cleaning and Assumptions)

Author: Kenthia

Goals:

Two possible ways to experiment with filtering the data:

- Choose which SIC/NAICS codes to use for filtering the data
- String-match the companies to their parent companies (Big 4), then use the corresponding SIC code for filtering


## Experimenting with NAICS Codes: 

### NAICS Codes:

- **445110 - Supermarkets and Other Grocery (except Convenience) Stores** (Kroger Co and Albertsons Companies LLC)

- **455219 - All Other General Merchandise Retailers** (Costco Wholesale Corporation)

- **452311 - Warehouse Clubs and Supercenters** (Walmart)

In [2]:
import math

import fiona
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
from shapely.geometry import Point
from thefuzz import fuzz, process

# Show every column when displaying wide DataFrames
pd.set_option('display.max_columns', None)



In [3]:
# Read in the dataset
stores_2022 = pd.read_csv('/srv/data/my_shared_data_folder/rafi/2022_Business_Academic_QCQ_grocery.csv')

# Drop the leftover 'Unnamed: 0' index column
stores_2022 = stores_2022.drop(columns=['Unnamed: 0'])

# Exclude rows from these U.S. territories and freely associated states
territories = ['PR', 'FM', 'MP', 'GU', 'VI', 'MH']
stores_2022 = stores_2022[~stores_2022['STATE'].isin(territories)]
stores_2022


Unnamed: 0,COMPANY,ADDRESS LINE 1,CITY,STATE,ZIPCODE,ZIP4,COUNTY CODE,AREA CODE,IDCODE,LOCATION EMPLOYEE SIZE CODE,LOCATION SALES VOLUME CODE,PRIMARY SIC CODE,SIC6_DESCRIPTIONS,PRIMARY NAICS CODE,NAICS8 DESCRIPTIONS,SIC CODE,SIC6_DESCRIPTIONS (SIC),SIC CODE 1,SIC6_DESCRIPTIONS (SIC1),SIC CODE 2,SIC6_DESCRIPTIONS(SIC2),SIC CODE 3,SIC6_DESCRIPTIONS(SIC3),SIC CODE 4,SIC6_DESCRIPTIONS(SIC4),ARCHIVE VERSION YEAR,YELLOW PAGE CODE,EMPLOYEE SIZE (5) - LOCATION,SALES VOLUME (9) - LOCATION,BUSINESS STATUS CODE,INDUSTRY SPECIFIC FIRST BYTE,YEAR ESTABLISHED,OFFICE SIZE CODE,COMPANY HOLDING STATUS,ABI,SUBSIDIARY NUMBER,PARENT NUMBER,PARENT ACTUAL EMPLOYEE SIZE,PARENT ACTUAL SALES VOLUME,PARENT EMPLOYEE SIZE CODE,PARENT SALES VOLUME CODE,SITE NUMBER,ADDRESS TYPE INDICATOR,POPULATION CODE,CENSUS TRACT,CENSUS BLOCK,LATITUDE,LONGITUDE,MATCH CODE,CBSA CODE,CBSA LEVEL,CSA CODE,FIPS CODE
0,GOMART,55 POSTAL PLZ,MORGANTOWN,WV,26508,7005.0,61.0,304,2,C,C,541103,CONVENIENCE STORES,44512001.0,CONVENIENCE STORES,554101.0,SERVICE STATIONS-GASOLINE & OIL,554110.0,ALTERNATIVE FUELS,,,,,,,2022,21303.0,13.0,2482.0,2,,,,,998372387,,124929449.0,50.0,389500.0,E,I,998372387.0,,7,11000.0,3.0,39.594376,-79.954437,P,34060.0,2.0,390.0,54061.0
1,7-ELEVEN,485 E MAIN ST,EL CENTRO,CA,92243,2619.0,25.0,760,2,B,C,541103,CONVENIENCE STORES,44512001.0,CONVENIENCE STORES,554101.0,SERVICE STATIONS-GASOLINE & OIL,,,,,,,,,2022,,6.0,1273.0,2,W,,,,495660326,,5863311.0,800.0,1272634.0,H,K,495660326.0,,7,11400.0,3.0,32.792679,-115.536058,0,20940.0,2.0,0.0,6025.0
2,EL SOL MARKET,110 W MAIN ST,WESTMORLAND,CA,92281,,25.0,760,2,B,C,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,39106.0,5.0,1061.0,9,,,,,519605455,,,,,,,,,3,10200.0,1.0,33.051800,-115.581800,X,20940.0,2.0,0.0,6025.0
3,COOL SPRINGS GROCERY,241 WILDWOOD ST,MORGANTOWN,WV,26505,3141.0,61.0,304,2,A,A,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,39106.0,2.0,382.0,9,,,,,817953730,,,,,,,,,7,10400.0,4.0,39.652885,-79.986398,P,34060.0,2.0,390.0,54061.0
4,CIRCLE K,123 E MAIN ST,WESTMORLAND,CA,92281,,25.0,760,2,B,C,541103,CONVENIENCE STORES,44512001.0,CONVENIENCE STORES,554101.0,SERVICE STATIONS-GASOLINE & OIL,,,,,,,,,2022,,6.0,1273.0,2,F,,,,855113817,,450720289.0,650.0,4981020.0,H,J,,,3,10200.0,1.0,33.051800,-115.581800,X,20940.0,2.0,0.0,6025.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
203498,H-E-B,1434 W WELLS BRANCH PKWY,PFLUGERVILLE,TX,78660,3153.0,453.0,512,2,G,H,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,554101.0,SERVICE STATIONS-GASOLINE & OIL,729954.0,VEHICLE & DRIVERS LICENSING SERVICES,581208.0,RESTAURANTS,,,,,2022,39106.0,270.0,71857.0,2,F,,,,403451198,,436443592.0,1000.0,23000000.0,I,K,403451198.0,,8,1840.0,3.0,30.442331,-97.664580,P,12420.0,2.0,0.0,48453.0
203499,ELECTRIC CHARGING STATION,1A LAKEVIEW DR,HALFMOON,NY,12065,4101.0,91.0,518,2,,,554112,ELECTRIC CHARGING STATION,44719010.0,OTHER GASOLINE STATIONS,,,,,,,,,,,2022,,,0.0,9,J,,,,739034319,,,,,,,,,7,62405.0,1.0,42.851508,-73.767767,0,10580.0,2.0,104.0,36091.0
203500,SAC N PAC,2101 N STATE HIGHWAY 123,SAN MARCOS,TX,78666,1441.0,209.0,512,2,A,A,541103,CONVENIENCE STORES,44512001.0,CONVENIENCE STORES,541105.0,GROCERS-RETAIL,,,,,,,,,2022,21303.0,1.0,304.0,2,,,,,898604384,625576871.0,5863311.0,800.0,1272634.0,H,K,729087954.0,,8,10400.0,3.0,29.845163,-97.940228,P,12420.0,2.0,0.0,48209.0
203501,PRICE CHOPPER,1 KENDALL WAY,BALLSTON SPA,NY,12020,4399.0,91.0,518,2,F,F,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,543101.0,FRUITS & VEGETABLES & PRODUCE-RETAIL,546103.0,BAKERS-CAKE & PIE,,,,,,,2022,,100.0,18804.0,2,U,,,,734631888,4352464.0,769874507.0,,,,,734631888.0,,7,61901.0,5.0,42.973191,-73.794719,P,10580.0,2.0,104.0,36091.0


In [5]:
# Filter by NAICS prefix "44511" - SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS

# Exact matching on 6-digit codes is unlikely to work here: the column holds
# 8-digit codes stored as floats (e.g. 44511003.0).
#grocery_naics = stores_2022.loc[(stores_2022["PRIMARY NAICS CODE"] == 445110) | (stores_2022["PRIMARY NAICS CODE"] == 455219 ) | (stores_2022["PRIMARY NAICS CODE"] == 452311) ]

# Match on the leading digits of the code instead
is_grocery = stores_2022['PRIMARY NAICS CODE'].astype(str).str.startswith('44511')
grocery_naics = stores_2022.loc[is_grocery]
grocery_naics

Unnamed: 0,COMPANY,ADDRESS LINE 1,CITY,STATE,ZIPCODE,ZIP4,COUNTY CODE,AREA CODE,IDCODE,LOCATION EMPLOYEE SIZE CODE,LOCATION SALES VOLUME CODE,PRIMARY SIC CODE,SIC6_DESCRIPTIONS,PRIMARY NAICS CODE,NAICS8 DESCRIPTIONS,SIC CODE,SIC6_DESCRIPTIONS (SIC),SIC CODE 1,SIC6_DESCRIPTIONS (SIC1),SIC CODE 2,SIC6_DESCRIPTIONS(SIC2),SIC CODE 3,SIC6_DESCRIPTIONS(SIC3),SIC CODE 4,SIC6_DESCRIPTIONS(SIC4),ARCHIVE VERSION YEAR,YELLOW PAGE CODE,EMPLOYEE SIZE (5) - LOCATION,SALES VOLUME (9) - LOCATION,BUSINESS STATUS CODE,INDUSTRY SPECIFIC FIRST BYTE,YEAR ESTABLISHED,OFFICE SIZE CODE,COMPANY HOLDING STATUS,ABI,SUBSIDIARY NUMBER,PARENT NUMBER,PARENT ACTUAL EMPLOYEE SIZE,PARENT ACTUAL SALES VOLUME,PARENT EMPLOYEE SIZE CODE,PARENT SALES VOLUME CODE,SITE NUMBER,ADDRESS TYPE INDICATOR,POPULATION CODE,CENSUS TRACT,CENSUS BLOCK,LATITUDE,LONGITUDE,MATCH CODE,CBSA CODE,CBSA LEVEL,CSA CODE,FIPS CODE
2,EL SOL MARKET,110 W MAIN ST,WESTMORLAND,CA,92281,,25.0,760,2,B,C,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,39106.0,5.0,1061.0,9,,,,,519605455,,,,,,,,,3,10200.0,1.0,33.051800,-115.581800,X,20940.0,2.0,0.0,6025.0
3,COOL SPRINGS GROCERY,241 WILDWOOD ST,MORGANTOWN,WV,26505,3141.0,61.0,304,2,A,A,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,39106.0,2.0,382.0,9,,,,,817953730,,,,,,,,,7,10400.0,4.0,39.652885,-79.986398,P,34060.0,2.0,390.0,54061.0
5,EL SOL MEAT MARKET,1100 MEADOWS RD # B,CALEXICO,CA,92231,5917.0,25.0,760,2,D,E,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,542107.0,MEAT-RETAIL,,,,,,,,,2022,,29.0,6153.0,9,,,,,995912391,,,,,,,995912391.0,,7,12002.0,2.0,32.679923,-115.482166,P,20940.0,2.0,0.0,6025.0
7,KROGER,1600 S OHIO ST,MARTINSVILLE,IN,46151,3317.0,109.0,765,2,F,G,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,39106.0,102.0,21931.0,2,P,,,,140765967,,7521503.0,1200.0,137888000.0,I,I,140765967.0,,7,510701.0,2.0,39.411880,-86.425148,P,26900.0,2.0,294.0,18109.0
8,FOOD 4 LESS,2420 COTTONWOOD DR,EL CENTRO,CA,92243,1604.0,25.0,760,2,E,F,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,39106.0,70.0,14851.0,2,ï¿½,,,,215261702,402627138.0,7521503.0,1200.0,137888000.0,I,I,215261702.0,,7,11201.0,1.0,32.815719,-115.572166,P,20940.0,2.0,0.0,6025.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
203493,SAFEWAY,2204 W NOB HILL BLVD # A,YAKIMA,WA,98902,6200.0,77.0,509,2,F,G,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,546102.0,BAKERS-RETAIL,599201.0,FLORISTS-RETAIL,592102.0,LIQUORS-RETAIL,,,,,2022,,145.0,35556.0,2,V,,,,840802979,9606492.0,5995907.0,800.0,,H,,840802979.0,,7,1100.0,1.0,46.584035,-120.539230,P,49420.0,2.0,0.0,53077.0
203496,CORNUCOPIA,1104 THORPE LN # J,SAN MARCOS,TX,78666,7126.0,209.0,512,2,B,C,541108,GROCERS-HEALTH FOODS,44511007.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,549904.0,VITAMIN & FOOD SUPPLEMENTS,549913.0,HERBS,514949.0,NATURAL FOODS-WHOLESALE,549935.0,ORGANIC FOODS & SERVICES,512205.0,VITAMINS & FOOD SUPPLEMENTS-WHOLESALE,2022,,6.0,1823.0,9,,1970.0,,,164470890,,,,,,,,,8,10304.0,1.0,29.886013,-97.924983,P,12420.0,2.0,0.0,48209.0
203497,T-MART,103 SUNRISE DR,SUNRISE BEACH,TX,78643,9287.0,299.0,325,2,A,A,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,343201.0,PLUMBING FIXTURES & SUPPLIES-MFRS,572222.0,PUMPS-RETAIL,508444.0,PUMPS (WHLS),525104.0,HARDWARE-RETAIL,594131.0,FISHING TACKLE-DEALERS,2022,,2.0,441.0,9,,,,,880617683,,,,,,,,,5,970300.0,1.0,30.593381,-98.425843,P,0.0,,0.0,48299.0
203498,H-E-B,1434 W WELLS BRANCH PKWY,PFLUGERVILLE,TX,78660,3153.0,453.0,512,2,G,H,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,554101.0,SERVICE STATIONS-GASOLINE & OIL,729954.0,VEHICLE & DRIVERS LICENSING SERVICES,581208.0,RESTAURANTS,,,,,2022,39106.0,270.0,71857.0,2,F,,,,403451198,,436443592.0,1000.0,23000000.0,I,K,403451198.0,,8,1840.0,3.0,30.442331,-97.664580,P,12420.0,2.0,0.0,48453.0


In [6]:
grocery_type = grocery_naics["PRIMARY NAICS CODE"].unique().tolist()
grocery_type

[44511003.0, 44511006.0, 44511007.0, 44511004.0]

# We will have to find another way to filter, as the NAICS codes are not giving us exactly what we want.
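
One possible reason the exact-match filter fell short is that the codes in this extract appear as 8-digit values stored as floats, so they never equal the 6-digit codes listed earlier. The sketch below uses a few made-up rows (the `toy` frame and the `naics6` helper are illustrative names, not part of the notebook) to show one way of truncating to six digits before comparing.

```python
import pandas as pd

# Toy rows standing in for stores_2022; the values are illustrative only.
toy = pd.DataFrame({
    "COMPANY": ["KROGER", "7-ELEVEN", "CORNUCOPIA", "UNKNOWN"],
    "PRIMARY NAICS CODE": [44511003.0, 44512001.0, 44511007.0, None],
})


def naics6(series):
    """Return the first six characters of each code after casting to text.

    A float such as 44511003.0 becomes "44511003.0", so the slice is "445110".
    Missing values become "nan" and will not match any real code.
    """
    return series.astype(str).str[:6]


toy["NAICS6"] = naics6(toy["PRIMARY NAICS CODE"])

# The three 6-digit codes listed at the top of this section.
target_codes = {"445110", "455219", "452311"}
matches = toy[toy["NAICS6"].isin(target_codes)]
print(matches[["COMPANY", "NAICS6"]])
```

Even with this normalization, a 6-digit code groups together very different stores, which is the motivation for validating company names by hand.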

## Some String Matching Work and Validation
### (Just a snippet; other lists are in other group members' notebooks)
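
A minimal sketch of how candidate names might be surfaced for this kind of manual check. It uses the standard library's `difflib` as a stand-in for `thefuzz` so that it runs without extra dependencies; the `similarity` and `candidates` helpers and the threshold of 60 are assumptions for illustration, not the group's actual settings.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two names, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def candidates(brand, names, threshold=60):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, similarity(brand, name)) for name in names]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


# Illustrative company names taken from the validation notes in this section.
names = ["BAKER'S", "BAKERS", "BAKER'S MARKET", "BAKER STREET MARKET", "7-ELEVEN"]
result = candidates("BAKER'S", names)
for name, score in result:
    print(name, score)
```

A high score only marks a name as worth checking. Similar-looking names can belong to unrelated businesses, which is why each match below is reviewed and tagged keep or remove.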

In [31]:
# Ahold Delhaize:

stores_2022.loc[stores_2022["COMPANY"] == "GIANT EAGLE FLORAL"]

# Checking for actual supermarkets owned by Ahold Delhaize:

# FRESH DIRECT - remove
# GIANT EAGLE - keep
# GIANT EAGLE FLORAL - remove
# PEA POD - keep
# BFRESH MARKET - keep


Unnamed: 0,COMPANY,ADDRESS LINE 1,CITY,STATE,ZIPCODE,ZIP4,COUNTY CODE,AREA CODE,IDCODE,LOCATION EMPLOYEE SIZE CODE,LOCATION SALES VOLUME CODE,PRIMARY SIC CODE,SIC6_DESCRIPTIONS,PRIMARY NAICS CODE,NAICS8 DESCRIPTIONS,SIC CODE,SIC6_DESCRIPTIONS (SIC),SIC CODE 1,SIC6_DESCRIPTIONS (SIC1),SIC CODE 2,SIC6_DESCRIPTIONS(SIC2),SIC CODE 3,SIC6_DESCRIPTIONS(SIC3),SIC CODE 4,SIC6_DESCRIPTIONS(SIC4),ARCHIVE VERSION YEAR,YELLOW PAGE CODE,EMPLOYEE SIZE (5) - LOCATION,SALES VOLUME (9) - LOCATION,BUSINESS STATUS CODE,INDUSTRY SPECIFIC FIRST BYTE,YEAR ESTABLISHED,OFFICE SIZE CODE,COMPANY HOLDING STATUS,ABI,SUBSIDIARY NUMBER,PARENT NUMBER,PARENT ACTUAL EMPLOYEE SIZE,PARENT ACTUAL SALES VOLUME,PARENT EMPLOYEE SIZE CODE,PARENT SALES VOLUME CODE,SITE NUMBER,ADDRESS TYPE INDICATOR,POPULATION CODE,CENSUS TRACT,CENSUS BLOCK,LATITUDE,LONGITUDE,MATCH CODE,CBSA CODE,CBSA LEVEL,CSA CODE,FIPS CODE
1074,GIANT EAGLE FLORAL,5620 BAPTIST RD,PITTSBURGH,PA,15236,,3.0,412,2,A,A,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,599201.0,FLORISTS-RETAIL,,,,,,,,,2022,,1.0,207.0,2,,,,,404236039,,6829915.0,1300.0,389500.0,I,I,,,7,477300.0,5.0,40.3446,-79.9775,X,38300.0,2.0,430.0,42003.0


In [32]:
# Kroger:

stores_2022.loc[stores_2022["COMPANY"] == "BAKER STREET MARKET"]

# Checking for Kroger-owned supermarkets/companies:

# BAKER'S - keep
# BAKERS - keep
# BAKER'S MARKET - remove
# BAKER STREET MARKET - remove
# FRY'S FOOD - keep
# GERBES SUPER MARKET - keep

Unnamed: 0,COMPANY,ADDRESS LINE 1,CITY,STATE,ZIPCODE,ZIP4,COUNTY CODE,AREA CODE,IDCODE,LOCATION EMPLOYEE SIZE CODE,LOCATION SALES VOLUME CODE,PRIMARY SIC CODE,SIC6_DESCRIPTIONS,PRIMARY NAICS CODE,NAICS8 DESCRIPTIONS,SIC CODE,SIC6_DESCRIPTIONS (SIC),SIC CODE 1,SIC6_DESCRIPTIONS (SIC1),SIC CODE 2,SIC6_DESCRIPTIONS(SIC2),SIC CODE 3,SIC6_DESCRIPTIONS(SIC3),SIC CODE 4,SIC6_DESCRIPTIONS(SIC4),ARCHIVE VERSION YEAR,YELLOW PAGE CODE,EMPLOYEE SIZE (5) - LOCATION,SALES VOLUME (9) - LOCATION,BUSINESS STATUS CODE,INDUSTRY SPECIFIC FIRST BYTE,YEAR ESTABLISHED,OFFICE SIZE CODE,COMPANY HOLDING STATUS,ABI,SUBSIDIARY NUMBER,PARENT NUMBER,PARENT ACTUAL EMPLOYEE SIZE,PARENT ACTUAL SALES VOLUME,PARENT EMPLOYEE SIZE CODE,PARENT SALES VOLUME CODE,SITE NUMBER,ADDRESS TYPE INDICATOR,POPULATION CODE,CENSUS TRACT,CENSUS BLOCK,LATITUDE,LONGITUDE,MATCH CODE,CBSA CODE,CBSA LEVEL,CSA CODE,FIPS CODE
8484,BAKER STREET MARKET,96 BAKER ST,MAPLEWOOD,NJ,7040,2502.0,13.0,973,2,A,A,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,,2.0,486.0,9,,2019.0,,,746507627,,,,,,,,,7,19900.0,2.0,40.730231,-74.279081,0,35620.0,2.0,408.0,34013.0


In [44]:
# Albertsons:

stores_2022.loc[stores_2022["COMPANY"] == "CARRS QUALITY CTR PALMER SHPG"]

# Checking for Albertsons-owned supermarkets/companies:

# SHAW'S SUPERMARKET - keep
# SHAW'S SUPERMARKETS INC - keep
# SUPER SAVER COST PLUS - keep
# SAAR'S SUPER SAVER FOODS - keep
# CARRS QUALITY CTR PALMER SHPG - keep
# BIG CHEAP CASH & CARR - remove

Unnamed: 0,COMPANY,ADDRESS LINE 1,CITY,STATE,ZIPCODE,ZIP4,COUNTY CODE,AREA CODE,IDCODE,LOCATION EMPLOYEE SIZE CODE,LOCATION SALES VOLUME CODE,PRIMARY SIC CODE,SIC6_DESCRIPTIONS,PRIMARY NAICS CODE,NAICS8 DESCRIPTIONS,SIC CODE,SIC6_DESCRIPTIONS (SIC),SIC CODE 1,SIC6_DESCRIPTIONS (SIC1),SIC CODE 2,SIC6_DESCRIPTIONS(SIC2),SIC CODE 3,SIC6_DESCRIPTIONS(SIC3),SIC CODE 4,SIC6_DESCRIPTIONS(SIC4),ARCHIVE VERSION YEAR,YELLOW PAGE CODE,EMPLOYEE SIZE (5) - LOCATION,SALES VOLUME (9) - LOCATION,BUSINESS STATUS CODE,INDUSTRY SPECIFIC FIRST BYTE,YEAR ESTABLISHED,OFFICE SIZE CODE,COMPANY HOLDING STATUS,ABI,SUBSIDIARY NUMBER,PARENT NUMBER,PARENT ACTUAL EMPLOYEE SIZE,PARENT ACTUAL SALES VOLUME,PARENT EMPLOYEE SIZE CODE,PARENT SALES VOLUME CODE,SITE NUMBER,ADDRESS TYPE INDICATOR,POPULATION CODE,CENSUS TRACT,CENSUS BLOCK,LATITUDE,LONGITUDE,MATCH CODE,CBSA CODE,CBSA LEVEL,CSA CODE,FIPS CODE
14700,CARRS QUALITY CTR PALMER SHPG,664 E PALMER WASILLA HWY,PALMER,AK,99645,6573.0,170.0,907,2,B,C,541105,GROCERS-RETAIL,44511003.0,SUPERMARKETS/OTHER GROCERY (EXC CONVENIENCE) STRS,,,,,,,,,,,2022,,6.0,1857.0,9,,,,,427093379,,,,,,,,,7,1202.0,2.0,61.599356,-149.133774,4,11260.0,2.0,0.0,2170.0
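
The keep/remove notes in the cells above could be turned into an explicit exclusion step. The sketch below is one hedged way to do that in plain Python; the `REMOVE` set and the `apply_review` helper are hypothetical names, and the set contains only the names marked "remove" in this snippet.

```python
# Hypothetical exclusion set built from the "remove" notes in the cells above.
REMOVE = {
    "FRESH DIRECT",
    "GIANT EAGLE FLORAL",
    "BAKER'S MARKET",
    "BAKER STREET MARKET",
    "BIG CHEAP CASH & CARR",
}


def apply_review(rows, remove=REMOVE):
    """Keep only rows whose COMPANY name was not marked for removal.

    Names are stripped and upper-cased before the lookup so that stray
    whitespace or casing differences do not hide a match.
    """
    return [row for row in rows if row["COMPANY"].strip().upper() not in remove]


# Illustrative rows; in the notebook these would come from stores_2022.
rows = [
    {"COMPANY": "GIANT EAGLE"},
    {"COMPANY": "GIANT EAGLE FLORAL"},
    {"COMPANY": "Baker Street Market "},
    {"COMPANY": "FRY'S FOOD"},
]
kept = apply_review(rows)
print([row["COMPANY"] for row in kept])
```

On the full DataFrame, the equivalent filter would be a boolean mask such as `stores_2022[~stores_2022["COMPANY"].isin(REMOVE)]`.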


### Research Question 2


To what extent is corporate concentration found today within the United States grocery industry? How does this vary by state or geographic region? How has the monopoly power of groceries’ parent companies changed over time?