##### DELIVERABLES
- A Jupyter notebook that aggregates raw counts of optometrist and ophthalmologist locations and types by different geographies (e.g., census tracts, zip codes, counties), displays the results as choropleth maps, and then computes the spatial lag of the counts to identify clusters.

- A Jupyter notebook that maps optometrist and ophthalmologist locations across the United States, draws “zones” of eyecare provider access using a spatial interaction model based on driving distance and building characteristics, and then merges the zones with different geographies of interest.

- A Jupyter notebook that merges demographic variables with the computed access zones and raw provider counts and then explores whether access significantly varies by race, ethnicity, education level, etc.


In [None]:
map_1 = KeplerGl()
map_1

In [None]:
# mapping locations across united states moment

In [1]:
#importing necessary libraries
import requests
from urllib.request import urlopen
import json
from shapely.geometry import shape, Polygon
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
plt.style.use('default')
import geopandas as gpd
import geodatasets
import ast
from shapely import wkt
from pyproj import Geod
import contextily
pd.set_option('display.max_columns', None)
import pyarrow as pyarrow
import fastparquet as fastparquet
import plotly.express as px
from keplergl import KeplerGl

In [3]:
# importing and setting up vision providers dataset
vision = pd.read_parquet("../Data/vision_providers_minimal.parquet")

def classify_provider(taxonomy):
    codes = taxonomy.split('|')
    for code in codes:
        if code.startswith('152'):
            return 'Optometry'
        if code.startswith('207'):
            return 'Ophthalmology'
        if code.startswith('156'):
            return 'Others'
    return 'Unknown'
vision['Provider Type'] = vision['Taxonomy'].apply(classify_provider)

In [6]:
vision2 = vision[vision["Entity Type Code"] == 1]
vision2.reset_index(inplace=True)
vision2.drop(columns=['Entity Type Code', 'Replacement NPI', 'Employer Identification Number (EIN)', 'Provider Organization Name (Legal Business Name)',
                          'Provider Name Suffix Text', 'Provider Other Organization Name', 'Provider Other Organization Name Type Code',
                          'Provider Other Last Name', 'Provider Other First Name', 'Provider Other Middle Name', 'Provider Other Name Prefix Text', 
                          'Provider Other Name Suffix Text', 'Provider Other Credential Text', 'Provider Other Last Name Type Code',
                          'Authorized Official Last Name', 'Authorized Official First Name', 'Authorized Official Middle Name', 
                          'Authorized Official Title or Position', 'Authorized Official Telephone Number', 'Certification Date'], inplace=True)

# set up empty columns for the api function later
vision2['polygons'] = None
vision2['list of dict'] = None

#getting number of providers per location
vpg_unique_counts = pd.DataFrame(vision2['Full Address'].value_counts(dropna=False))
vpg_m = vision2.merge(vpg_unique_counts, how='left', on=['Full Address'])
vision2['number of providers at this location'] = vpg_m['count']
print(vision2.head(2))
print(len(vision2))
# drops rows with duplicate addresses so that only one provider per address is kept. optional but helps run the function a lot faster and looks cleaner on map
vision2.drop_duplicates(subset=['Full Address'], ignore_index=True, inplace=True)
print(len(vision2))
# drops rows with missing values in the latitude column. optional but you might run into issues if u try to save as a csv? and then load back to jupyter? idk its so weird
vision2.dropna(subset=['Latitude'], inplace=True)
print(len(vision2))
# creating geodataframe
vpg = gpd.GeoDataFrame(
   vision2, geometry=gpd.points_from_xy(vision2.Longitude, vision2.Latitude), crs="EPSG:4326")

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vision2.drop(columns=['Entity Type Code', 'Replacement NPI', 'Employer Identification Number (EIN)', 'Provider Organization Name (Legal Business Name)',
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vision2['polygons'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vision2['list of dict'] = None
A value is trying to be set on a copy of a slice fr

   index         NPI Provider Last Name (Legal Name) Provider First Name  \
0      0  1427051994                            CHEN             SANFORD   
1      1  1326041880                             YOU             TIMOTHY   

  Provider Middle Name Provider Name Prefix Text Provider Credential Text  \
0                 None                       DR.                     M.D.   
1                    T                       DR.                     M.D.   

  Provider First Line Business Mailing Address  \
0                            1200 N TUSTIN AVE   
1                            1200 N TUSTIN AVE   

  Provider Second Line Business Mailing Address  \
0                                       STE 140   
1                                       STE 140   

  Provider Business Mailing Address City Name  \
0                                   SANTA ANA   
1                                   SANTA ANA   

  Provider Business Mailing Address State Name  \
0                                   

Unnamed: 0,index,NPI,Provider Last Name (Legal Name),Provider First Name,Provider Middle Name,Provider Name Prefix Text,Provider Credential Text,Provider First Line Business Mailing Address,Provider Second Line Business Mailing Address,Provider Business Mailing Address City Name,Provider Business Mailing Address State Name,Provider Business Mailing Address Postal Code,Provider Business Mailing Address Country Code (If outside U.S.),Provider Gender Code,Clean Zip,Full Address,Latitude,Longitude,Taxonomy,Provider Type,polygons,list of dict,number of providers at this location,geometry
0,0,1427051994,CHEN,SANFORD,,DR.,M.D.,1200 N TUSTIN AVE,STE 140,SANTA ANA,CA,92705,US,M,92705,1200 N TUSTIN AVE STE 140 SANTA ANA CA 92705,33.755588,-117.834918,207W00000X|_|_|_|_|_|_|_|_|_|_|_|_|_|_,Ophthalmology,,,3,POINT (-117.83492 33.75559)
1,2,1770586232,ISAEFF,WAYNE,B,,M.D.,1880 E WASHINGTON ST,,COLTON,CA,92324,US,M,92324,1880 E WASHINGTON ST COLTON CA 92324,34.047868,-117.295679,207W00000X|_|_|_|_|_|_|_|_|_|_|_|_|_|_,Ophthalmology,,,1,POINT (-117.29568 34.04787)
2,3,1548263007,VENDELAND,JAMES,LEE,DR.,M.D.,5 SEVERANCE CIR,STE 112,CLEVELAND HEIGHTS,OH,44118,US,M,44118,5 SEVERANCE CIR STE 112 CLEVELAND HEIGHTS OH 4...,41.515893,-81.547908,207W00000X|_|_|_|_|_|_|_|_|_|_|_|_|_|_,Ophthalmology,,,1,POINT (-81.54791 41.51589)
3,4,1477556421,ROTH,STEVEN,S.,DR.,O.D.,1107 MANTUA PIKE,STE 722,MANTUA,NJ,8051,US,M,8051,1107 MANTUA PIKE STE 722 MANTUA NJ 08051,39.802445,-75.170818,152W00000X|152W00000X|_|_|_|_|_|_|_|_|_|_|_|_|_,Optometry,,,1,POINT (-75.17082 39.80244)
4,5,1093718892,GOYAL,DINESH,K,DR.,MD,825 NICOLLET MALL,STE 2000,MINNEAPOLIS,MN,55402,US,M,55402,825 NICOLLET MALL STE 2000 MINNEAPOLIS MN 55402,44.975,-93.273498,207W00000X|_|_|_|_|_|_|_|_|_|_|_|_|_|_,Ophthalmology,,,7,POINT (-93.27350 44.97500)


In [9]:
vpgcopy = vpg

In [10]:
# function that feeds values from the longitude and latitude columns to the api to get driving distance isochrones. works per row

def polygonf(row):
    """
    takes longitude and latitude data from their respective columns in vision providers dataframe
    requests mapbox api using that data outputs polygon data as a column in vpdf
    """
    apierror = "Failed to make the API request"
    features = 'no features'
    long = row['Longitude']
    lat = row['Latitude']
# defining mapbox api parameters. adjusting the number in the minutes string will change the distance. 
# check 'https://docs.mapbox.com/api/navigation/isochrone/' for full documentation
    apiurl = 'https://api.mapbox.com/isochrone/v1/mapbox/driving-traffic/'
    coord = '%2C'
    minutes ='?contours_minutes=15&polygons=true&denoise=1&'
    token = 'access_token=sk.eyJ1IjoibXlsZXNuZGlyaXR1IiwiYSI6ImNsanllY2hueTAwbXcza3JraTBkc2Z2bzIifQ.rxgKQl5MA6GIkKLbc4nAvA'
    Mapbox = apiurl + str(long) + coord + str(lat) + minutes + token

    # requesting data from api
    data_response = requests.get(Mapbox)

# Check if the request was successful (status code 200)
    if data_response.status_code == 200:
        full_json = data_response.json()
       
        if "features" in full_json:
            polygonz = full_json['features'][0]['geometry']

            return polygonz
        else:
            return features
    else:
        return apierror

In [None]:
# # timer function because api can only take up to 300 requests per minute before breaking lmaoo thats not the only thing thats breaking

# import time

# chunk_size = 300
# delay_seconds = 66

# total_rows = len(vpg)
# num_chunks = (total_rows // chunk_size) + 1

# current_chunk = 0

# while current_chunk < num_chunks:
#     start_index = current_chunk * chunk_size
#     end_index = min((current_chunk + 1) * chunk_size, total_rows)

#     # Process the chunk of the DataFrame
#     chunk = vpg.iloc[start_index:end_index]
#     print(f'Processing chunk: {current_chunk+1}/{num_chunks}')
#     print(f'Chunk index: {chunk.index}')
#     chunk['polygons'] = chunk.apply(polygonf, axis=1)
#     vpg.iloc[start_index:end_index] = chunk

#     # Print progress and DataFrame
#     print(f"Processed chunk: {current_chunk+1}/{num_chunks}")
#     print(f"Updated dataframe: {vpg[['NPI', 'polygons']]}")

#     # Wait for the specified time delay
#     time.sleep(delay_seconds)

#     # Move to the next chunk
#     current_chunk += 1


Processing chunk: 1/206
Chunk index: Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
       ...
       290, 291, 292, 293, 294, 295, 296, 297, 298, 299],
      dtype='int64', length=300)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 1/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 2/206
Chunk index: Index([300, 301, 302, 303, 304, 305, 306, 307, 308, 309,
       ...

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 2/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 3/206
Chunk index: Index([600, 601, 602, 603, 604, 605, 606, 607, 608, 609,
       ...

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 3/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 4/206
Chunk index: Index([ 900,  901,  902,  903,  904,  905,  906,  907,  908,  909,


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 4/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 5/206
Chunk index: Index([1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211,


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 5/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 6/206
Chunk index: Index([1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511,


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 6/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 7/206
Chunk index: Index([1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812,


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 7/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 8/206
Chunk index: Index([2103, 2104, 2105, 2106, 2107, 2108, 2109, 2110, 2111, 2112,


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 8/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 9/206
Chunk index: Index([2403, 2404, 2405, 2406, 2407, 2408, 2409, 2410, 2411, 2412,


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Processed chunk: 9/206
Updated dataframe:               NPI                                           polygons
0      1427051994  {'coordinates': [[[-117.880918, 33.90019], [-1...
1      1770586232  {'coordinates': [[[-117.397679, 34.219885], [-...
2      1548263007  {'coordinates': [[[-81.505908, 41.599651], [-8...
3      1477556421  {'coordinates': [[[-75.115818, 39.917977], [-7...
4      1093718892  {'coordinates': [[[-93.306498, 45.120543], [-9...
...           ...                                                ...
61841  1598261018                                               None
61842  1871712299                                               None
61843  1699236943                                               None
61844  1487113486                                               None
61845  1770221202                                               None

[61634 rows x 2 columns]
Processing chunk: 10/206
Chunk index: Index([2703, 2704, 2705, 2706, 2707, 2708, 2709, 2710, 2711, 2712,