# Analysis of Potential Store Locations

## Introduction

This Jupyter Notebook analyzes potential store locations based on data obtained from loopnet.com and the census website.

### Goals
- Determine the suitability of potential store locations.
- Analyze the correlation between various factors and suitability.

## Data Overview
- **Data Sources:**
  - *Addresses:* Addresses of retail spaces available for rent, obtained from loopnet.com.
  - *Existing Apple Store addresses:* Obtained from loopnet.com.
  - *Income:* Income data obtained from the census website.

- **Initial Variables:**
  - `Address`: Address of the potential location.
  - `City`: City where the potential location is located.
  - `ZIP`: ZIP code of the potential location.
  - `Year Built`: Year the building was constructed.
  - `SF`: Square footage of the potential location.
  - `Price`: Rental price of the potential location, quoted per square foot per year.

- **Derived Variables:**
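  - `potential_location`: Binary variable indicating the suitability of a location (1 = suitable, 0 = not suitable).
  - `nearest_distance`: Distance in meters to the nearest existing Apple Store.
  - `weighted_avg_income`: Average income in the location's ZIP code, weighted by the number of earners in each income bracket.
  - `yearly_price_per_SF`: Yearly rental price per square foot (the upper bound when a range is listed).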



## Libraries Import

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import re
import matplotlib.pyplot as plt

In [242]:
def text_file_content(file_path):
    """Return the lines of a text file as a list, without newline characters."""
    with open(file_path, 'r') as file:
        content = file.read().splitlines()
    return content

# Load the raw listing text
file_path = 'nyc_rentals_ret.txt'  # loopnet addresses text file
lines_text = text_file_content(file_path)

In [243]:

# Column labels for the listings DataFrame
labels = ['Address', 'City and ZIP', 'Year Built', 'SF', 'Price']


In [244]:
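def extract_info(text_lines):
    """Group the raw lines into five field lists, one per label.

    Each listing spans six consecutive lines; the first line of every
    listing is skipped and the next five hold the fields named in `labels`.
    """
    var_list = []
    for j in range(1, 6):
        var_list.append([text_lines[i] for i in range(j, len(text_lines), 6)])
    return var_list

sublists_vals = extract_info(lines_text)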

In [245]:
df = pd.DataFrame(sublists_vals, index=labels).T
df

Unnamed: 0,Address,City and ZIP,Year Built,SF,Price
0,2586 Linden Blvd,"Brooklyn, NY, 11208",Built in 2015,"8,500 SF Retail Space",$48.00 SF/YR
1,103 Macdougal St,"New York, NY, 10012",Built in 1900,"5,000 - 12,000 SF Retail Spaces",$120.00 SF/YR
2,336 W 23rd St,"New York, NY, 10011",Built in 1910,"2,000 - 6,200 SF Retail Spaces",48.00 - $49.00 SF/YR
3,1 Wall Street,"New York, NY, 10005",Built in 1904,"5,000 - 12,500 SF Retail Spaces",84.00 - $88.00 SF/YR
4,31 W 21st St,"New York, NY, 10010",Built in 1908,"9,000 SF Retail Spaces",$115.00 SF/YR
...,...,...,...,...,...
506,625-649 Eighth Ave,"New York, NY, 10018",Built in 1950,"196 - 29,905 SF Retail Spaces",none
507,4473 Amboy Rd,"Staten Island, NY, 10312",Built in 2022,"2,000 - 15,000 SF Retail Space",none
508,143 Fulton St,"New York, NY, 10038",Built in 2018,"570 - 8,444 SF Retail Spaces",none
509,768 5th Ave,"New York, NY, 10019",Built in 1907,"440 - 32,940 SF Retail Spaces",none
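
The stride-based parsing above can be illustrated on a tiny made-up input. This is a minimal sketch, assuming (as the indexing in `extract_info` implies) that each listing occupies six consecutive lines and that the first line of each record is not needed. The sample lines, the `'listing header'` placeholder, and the helper name `extract_fields` are hypothetical.

```python
import pandas as pd

labels = ['Address', 'City and ZIP', 'Year Built', 'SF', 'Price']

# Two hypothetical listings, six lines each; line 0 of every record is skipped.
sample_lines = [
    'listing header', '1 Example St', 'Brooklyn, NY, 11208',
    'Built in 2015', '8,500 SF Retail Space', '$48.00 SF/YR',
    'listing header', '2 Sample Ave', 'New York, NY, 10012',
    'Built in 1900', '5,000 - 12,000 SF Retail Spaces', 'none',
]

def extract_fields(text_lines, record_len=6, first_field=1):
    """Collect each field column by stepping through the lines record by record."""
    return [text_lines[offset::record_len]
            for offset in range(first_field, record_len)]

# One row per label, then transpose so that each listing becomes a row.
sample_df = pd.DataFrame(extract_fields(sample_lines), index=labels).T
print(sample_df)
```

Slicing with a step (`text_lines[offset::record_len]`) selects the same elements as the explicit index loop, so the result has one row per listing and one column per label.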


In [246]:
list(df.Price)

['$48.00 SF/YR',
 '$120.00 SF/YR',
 '48.00 - $49.00 SF/YR',
 '84.00 - $88.00 SF/YR',
 '$115.00 SF/YR',
 '$250.00 SF/YR',
 '$76.50 SF/YR',
 '$29.00 SF/YR',
 '$19.68 SF/YR',
 '35.00 - $40.00 SF/YR',
 '$22.00 SF/YR',
 '110.00 - $200.00 SF/YR',
 '$30.00 SF/YR',
 '$55.00 SF/YR',
 '36.00 - $60.00 SF/YR',
 '$30.00 SF/YR',
 '$56.25 SF/YR',
 '$175.00 SF/YR',
 '$40.00 SF/YR',
 '$51.00 SF/YR',
 '$25.00 - $45.00 SF/YR',
 '$70.00 SF/YR',
 '75.00 - $150.00 SF/YR',
 '$180.00 SF/YR',
 '$85.00 SF/YR',
 '$50.00 SF/YR',
 '$150.00 SF/YR',
 '$30.00 - $32.00 SF/YR',
 '$60.00 SF/YR',
 '$30.55 - $150.00 SF/YR',
 '$40.75 - $155.00 SF/YR',
 '$38.00 SF/YR',
 '36.00 - $47.24 SF/YR',
 '$45.00 SF/YR',
 '$105.00 SF/YR',
 '$16.26 SF/YR',
 '$53.60 SF/YR',
 '$65.00 SF/YR',
 '$40.00 - $50.00 SF/YR',
 '$25.00 SF/YR',
 '$60.00 - $75.00 SF/YR',
 '$44.00 SF/YR',
 '50.00 - $150.00 SF/YR',
 '$45.00 SF/YR',
 '$55.00 SF/YR',
 '$50.00 SF/YR',
 '$35.00 SF/YR',
 '$28.00 SF/YR',
 '$50.00 SF/YR',
 '$34.00 - $60.00 SF/YR',
 '$42.50 S

In [247]:
df[['City', 'ZIP']] = df['City and ZIP'].str.split('NY, ', expand=True)


In [248]:
df.head(20)

Unnamed: 0,Address,City and ZIP,Year Built,SF,Price,City,ZIP
0,2586 Linden Blvd,"Brooklyn, NY, 11208",Built in 2015,"8,500 SF Retail Space",$48.00 SF/YR,"Brooklyn,",11208
1,103 Macdougal St,"New York, NY, 10012",Built in 1900,"5,000 - 12,000 SF Retail Spaces",$120.00 SF/YR,"New York,",10012
2,336 W 23rd St,"New York, NY, 10011",Built in 1910,"2,000 - 6,200 SF Retail Spaces",48.00 - $49.00 SF/YR,"New York,",10011
3,1 Wall Street,"New York, NY, 10005",Built in 1904,"5,000 - 12,500 SF Retail Spaces",84.00 - $88.00 SF/YR,"New York,",10005
4,31 W 21st St,"New York, NY, 10010",Built in 1908,"9,000 SF Retail Spaces",$115.00 SF/YR,"New York,",10010
5,59 N 6th St,"Brooklyn, NY, 11249",Built in 2020,"4,200 SF Retail Space",$250.00 SF/YR,"Brooklyn,",11249
6,135 W 36th St,"New York, NY, 10018",Built in 1925,"2,439 - 27,499 SF Spaces",$76.50 SF/YR,"New York,",10018
7,946 McDonald Ave,"Brooklyn, NY, 11218",Built in 1933,"1,000 - 4,500 SF Retail Spaces",$29.00 SF/YR,"Brooklyn,",11218
8,2705-2715 Mermaid Ave,"Brooklyn, NY, 11224",Built in 1930,"5,000 SF Retail Space",$19.68 SF/YR,"Brooklyn,",11224
9,218 Newel St,"Brooklyn, NY, 11222",Built in 1931,"2,830 - 11,760 SF",35.00 - $40.00 SF/YR,"Brooklyn,",11222
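
As the output shows, splitting on `'NY, '` leaves a trailing comma in `City` (for example `"Brooklyn,"`). One possible alternative, sketched here on made-up rows, is to capture the two parts with a regular expression. It assumes every value follows the `City, NY, 12345` pattern seen above; the named groups and the sample frame are illustrative only.

```python
import pandas as pd

sample = pd.DataFrame({'City and ZIP': ['Brooklyn, NY, 11208',
                                        'Staten Island, NY, 10312']})

# Capture the city before ", NY, " and the five-digit ZIP code after it.
pattern = r'^(?P<City>.+?),\s*NY,\s*(?P<ZIP>\d{5})$'
parts = sample['City and ZIP'].str.extract(pattern)

# The named groups become the columns 'City' and 'ZIP'.
sample = sample.join(parts)
print(sample)
```

Rows that do not match the pattern would receive `NaN` in both columns, which makes malformed values easy to spot.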


In [249]:
df['Price'] = df['Price'].str.strip()

# Listings without a price are given a placeholder price of 0
df['Price'] = df['Price'].replace('none', ' - 0 SF/YR')

def get_substring(string, start_vals, end_vals):
    """Return the text after the last start marker and before the first end marker."""
    # Index of the right-most start marker (-1 if none occurs)
    max_index_from_left = max((string.rfind(val) for val in start_vals), default=-1)
    # The start markers are single characters, so skip exactly one character
    truncate_left_side = string[max_index_from_left + 1:]
    # Index of the first end marker; markers that do not occur are ignored
    end_indices = [i for i in (truncate_left_side.find(val) for val in end_vals) if i != -1]
    return truncate_left_side[:min(end_indices)] if end_indices else truncate_left_side


# For ranges such as '48.00 - $49.00 SF/YR', keep the upper bound
df['yearly_price_per_SF'] = df['Price'].apply(lambda x: get_substring(x, ['$', '-'], ['SF/YR']))
df['SF'] = df['SF'].apply(lambda x: get_substring(x, ['-'], ['SF']))

df['year_built'] = df['Year Built'].str.replace('Built in ', '')

def remove_info(s):
    """Drop everything up to and including the first hyphen, e.g. '625-649 Eighth Ave' -> '649 Eighth Ave'."""
    return re.sub(r'^.*?-', '', s)


df['Address'] = df['Address'].apply(remove_info)

# Drop the raw columns that have now been parsed
df.drop(columns=['Price', 'Year Built'], inplace=True)



In [250]:
df

Unnamed: 0,Address,City and ZIP,SF,City,ZIP,yearly_price_per_SF,year_built
0,2586 Linden Blvd,"Brooklyn, NY, 11208",8500,"Brooklyn,",11208,48.00,2015
1,103 Macdougal St,"New York, NY, 10012",12000,"New York,",10012,120.00,1900
2,336 W 23rd St,"New York, NY, 10011",6200,"New York,",10011,49.00,1910
3,1 Wall Street,"New York, NY, 10005",12500,"New York,",10005,88.00,1904
4,31 W 21st St,"New York, NY, 10010",9000,"New York,",10010,115.00,1908
...,...,...,...,...,...,...,...
506,649 Eighth Ave,"New York, NY, 10018",29905,"New York,",10018,0,1950
507,4473 Amboy Rd,"Staten Island, NY, 10312",15000,"Staten Island,",10312,0,2022
508,143 Fulton St,"New York, NY, 10038",8444,"New York,",10038,0,2018
509,768 5th Ave,"New York, NY, 10019",32940,"New York,",10019,0,1907
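
The cleaning step above keeps the upper bound of a price range and encodes a missing price as 0. As an alternative for comparison, the following sketch returns both bounds and uses `nan` for listings without a price, so that missing values cannot be mistaken for a very cheap listing. This is not the notebook's method; the helper name `parse_price` is hypothetical.

```python
import math
import re

def parse_price(price_text):
    """Return (low, high) yearly price per SF, or (nan, nan) if no price is listed."""
    # Drop thousands separators, then pick up every decimal number in the text.
    cleaned = price_text.replace(',', '')
    numbers = [float(n) for n in re.findall(r'\d+(?:\.\d+)?', cleaned)]
    if not numbers:
        return (math.nan, math.nan)
    return (min(numbers), max(numbers))

print(parse_price('$48.00 SF/YR'))
print(parse_price('36.00 - $47.24 SF/YR'))
print(parse_price('none'))
```

A single price yields identical lower and upper bounds, so downstream code can treat both cases uniformly.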


In [251]:
mystr = '123df6780df'
char = 'df'
# Each c is a single character, so it never equals the two-character string 'df'
min_index = min((i for i, c in enumerate(mystr) if c == char), default=-1)
print(min_index)  # Output: -1


-1


In [252]:
start = ['-']
end = []
s = '625-649 Eighth Ave'

def get_substring(string, start_vals, end_vals):
    max_indices_from_left = ([string.rfind(val) for val in start_vals])
    max_index_from_left = max(max_indices_from_left) if max_indices_from_left else -1
    truncate_left_side = string[max_index_from_left+len(string[max_index_from_left]) :]
    min_indices_from_right = [truncate_left_side.find(val) for val in end_vals]
    # With no end markers the index falls back to -1, so the slice below drops the last character
    min_index_from_right = min(min_indices_from_right) if min_indices_from_right else -1
    truncate_right_side = truncate_left_side[:(min_index_from_right)]
    print('tls',truncate_left_side, min_index_from_right)
    return truncate_right_side
    
print( get_substring(s, start,end))

tls 649 Eighth Ave -1
649 Eighth Av


In [253]:
import re
output_str = re.sub(r'^.*?-', '', s)
print(output_str)

649 Eighth Ave


In [254]:
df['full_address'] = df['Address'] + ' ' + df['City and ZIP']
df.head(20)

Unnamed: 0,Address,City and ZIP,SF,City,ZIP,yearly_price_per_SF,year_built,full_address
0,2586 Linden Blvd,"Brooklyn, NY, 11208",8500,"Brooklyn,",11208,48.0,2015,"2586 Linden Blvd Brooklyn, NY, 11208"
1,103 Macdougal St,"New York, NY, 10012",12000,"New York,",10012,120.0,1900,"103 Macdougal St New York, NY, 10012"
2,336 W 23rd St,"New York, NY, 10011",6200,"New York,",10011,49.0,1910,"336 W 23rd St New York, NY, 10011"
3,1 Wall Street,"New York, NY, 10005",12500,"New York,",10005,88.0,1904,"1 Wall Street New York, NY, 10005"
4,31 W 21st St,"New York, NY, 10010",9000,"New York,",10010,115.0,1908,"31 W 21st St New York, NY, 10010"
5,59 N 6th St,"Brooklyn, NY, 11249",4200,"Brooklyn,",11249,250.0,2020,"59 N 6th St Brooklyn, NY, 11249"
6,135 W 36th St,"New York, NY, 10018",27499,"New York,",10018,76.5,1925,"135 W 36th St New York, NY, 10018"
7,946 McDonald Ave,"Brooklyn, NY, 11218",4500,"Brooklyn,",11218,29.0,1933,"946 McDonald Ave Brooklyn, NY, 11218"
8,2715 Mermaid Ave,"Brooklyn, NY, 11224",5000,"Brooklyn,",11224,19.68,1930,"2715 Mermaid Ave Brooklyn, NY, 11224"
9,218 Newel St,"Brooklyn, NY, 11222",11760,"Brooklyn,",11222,40.0,1931,"218 Newel St Brooklyn, NY, 11222"


In [255]:
import ast
coords_from_text = (text_file_content('geospatial_coords_nyc.txt'))

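def text_to_coords(string_coords_from_text):
    """Convert lines of coordinate text into a list of (latitude, longitude) tuples."""
    # Every line except the last evaluates to a 1-tuple wrapping the coordinate
    # pair, hence the [0]; the last line evaluates to the pair itself.
    last_coord = [ast.literal_eval(string_coords_from_text[-1])]
    coords_from_text_tuples = [ast.literal_eval(str_coord)[0]
                               for str_coord in string_coords_from_text[:-1]] + last_coord
    return coords_from_text_tuples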
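
The special handling of the last line suggests that every line but the last ends with a trailing comma. Under that assumption about the file format (the file itself is not shown here), a line-by-line parser can treat all lines the same way by stripping the comma first. The helper name `parse_coord_line` and the sample lines are hypothetical.

```python
import ast

def parse_coord_line(line):
    """Parse '(lat, lon)' or '(lat, lon),' into a (lat, lon) tuple of floats."""
    # Remove surrounding whitespace and an optional trailing comma.
    value = ast.literal_eval(line.strip().rstrip(','))
    lat, lon = value  # raises ValueError if the line does not hold exactly two values
    return (float(lat), float(lon))

sample_lines = ['(40.668856, -73.868809),', '(40.72965, -74.000913)']
coords = [parse_coord_line(line) for line in sample_lines]
print(coords)
```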

In [256]:
coords_from_text_tuples = text_to_coords(coords_from_text)

coords_list_lat, coords_list_long = list(map(list, zip(*coords_from_text_tuples)))
df['latitude'] = coords_list_lat
df['longitude'] = coords_list_long
df

Unnamed: 0,Address,City and ZIP,SF,City,ZIP,yearly_price_per_SF,year_built,full_address,latitude,longitude
0,2586 Linden Blvd,"Brooklyn, NY, 11208",8500,"Brooklyn,",11208,48.00,2015,"2586 Linden Blvd Brooklyn, NY, 11208",40.668856,-73.868809
1,103 Macdougal St,"New York, NY, 10012",12000,"New York,",10012,120.00,1900,"103 Macdougal St New York, NY, 10012",40.729650,-74.000913
2,336 W 23rd St,"New York, NY, 10011",6200,"New York,",10011,49.00,1910,"336 W 23rd St New York, NY, 10011",40.745588,-74.000071
3,1 Wall Street,"New York, NY, 10005",12500,"New York,",10005,88.00,1904,"1 Wall Street New York, NY, 10005",40.707302,-74.011693
4,31 W 21st St,"New York, NY, 10010",9000,"New York,",10010,115.00,1908,"31 W 21st St New York, NY, 10010",40.741147,-73.992115
...,...,...,...,...,...,...,...,...,...,...
506,649 Eighth Ave,"New York, NY, 10018",29905,"New York,",10018,0,1950,"649 Eighth Ave New York, NY, 10018",42.715204,-73.711278
507,4473 Amboy Rd,"Staten Island, NY, 10312",15000,"Staten Island,",10312,0,2022,"4473 Amboy Rd Staten Island, NY, 10312",40.544183,-74.162864
508,143 Fulton St,"New York, NY, 10038",8444,"New York,",10038,0,2018,"143 Fulton St New York, NY, 10038",40.710639,-74.007970
509,768 5th Ave,"New York, NY, 10019",32940,"New York,",10019,0,1907,"768 5th Ave New York, NY, 10019",40.764460,-73.974494


In [257]:
# Read the NYC income Excel file into a pandas DataFrame
df_income = pd.read_excel(r'nyc_income_by_zip.xlsx')
# Remove rows with missing values (NaN)
df_income.dropna(inplace=True)
# Assign a representative income to each of the six income brackets; the pattern
# assumes the file holds exactly 1535 groups of six rows, in bracket order
df_income['avg_income'] = [17, 37, 63, 87, 150, 250] * 1535
# Convert the 'ZIP' column to strings and remove everything after '.'
df_income['ZIP'] = df_income['ZIP'].astype(str).apply(lambda x: x.split('.')[0])
# Keep only the ZIP codes that appear in the listings (the file covers the whole state of NY)
df_income = df_income[df_income['ZIP'].isin(list(df.ZIP))].copy()
# Total income per row: representative income times the number of earners in the bracket
df_income['total_income'] = df_income['avg_income'] * df_income['nr_earners']
# Group by 'ZIP' and sum the total income and the number of earners
grouped_data = df_income.groupby('ZIP').agg({'total_income': 'sum', 'nr_earners': 'sum'})
# Calculate the weighted average income for each ZIP code
grouped_data['weighted_avg_income'] = grouped_data['total_income'] / grouped_data['nr_earners']
# Reset the index to make 'ZIP' a column again
grouped_data = grouped_data.reset_index()
grouped_data

Unnamed: 0,ZIP,total_income,nr_earners,weighted_avg_income
0,10001,1859520.0,16070.0,115.713752
1,10002,2577690.0,39840.0,64.701054
2,10003,3455670.0,26790.0,128.991041
3,10004,352780.0,2320.0,152.060345
4,10005,801510.0,5860.0,136.776451
...,...,...,...,...
79,11237,1208060.0,23440.0,51.538396
80,11238,3132360.0,31290.0,100.107383
81,11239,394020.0,8160.0,48.286765
82,11361,1064990.0,14380.0,74.060501
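
The weighted average computed above is the sum of (bracket income × earners) divided by the total number of earners in the ZIP code. A small worked example on made-up numbers makes the arithmetic easy to check by hand; the two ZIP codes and their earner counts are hypothetical.

```python
import pandas as pd

# Hypothetical bracket data for two ZIP codes (values are made up).
income = pd.DataFrame({
    'ZIP': ['10001', '10001', '10002', '10002'],
    'avg_income': [37, 150, 37, 150],    # representative income per bracket
    'nr_earners': [100, 300, 300, 100],  # earners in that bracket
})

income['total_income'] = income['avg_income'] * income['nr_earners']
by_zip = income.groupby('ZIP')[['total_income', 'nr_earners']].sum()
by_zip['weighted_avg_income'] = by_zip['total_income'] / by_zip['nr_earners']
print(by_zip)
```

For `10001` this gives (37·100 + 150·300) / 400 = 121.75, and for `10002` (37·300 + 150·100) / 400 = 65.25: the same brackets produce different averages because the earners are distributed differently.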


In [258]:
# ZIP codes that appear in the listings but have no income data
l2 = list(set(grouped_data.ZIP))
l1 = list(set(list(df.ZIP)))
not_in_list2 = [x for x in l1 if x not in l2]

print(not_in_list2)

['11243', '10174', '10020', '10121', '10165', '10153', '11249', '10118']
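
The same comparison can be written as a set difference, which also gives a stable, sorted result. A minimal sketch with hypothetical ZIP lists:

```python
listing_zips = ['10001', '11249', '10001', '10118']  # hypothetical listing ZIP codes
income_zips = ['10001', '10002']                     # hypothetical ZIP codes with income data

# ZIP codes present in the listings but absent from the income table.
missing = sorted(set(listing_zips) - set(income_zips))
print(missing)
```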


In [259]:
# Keep only the listings whose ZIP code has income data
df = df[df['ZIP'].isin(list(grouped_data.ZIP))]
# Add the weighted average income to each listing by merging on the ZIP code
df = pd.merge(df, grouped_data[['ZIP', 'weighted_avg_income']], on='ZIP', how='left')
df

Unnamed: 0,Address,City and ZIP,SF,City,ZIP,yearly_price_per_SF,year_built,full_address,latitude,longitude,weighted_avg_income
0,2586 Linden Blvd,"Brooklyn, NY, 11208",8500,"Brooklyn,",11208,48.00,2015,"2586 Linden Blvd Brooklyn, NY, 11208",40.668856,-73.868809,42.136414
1,103 Macdougal St,"New York, NY, 10012",12000,"New York,",10012,120.00,1900,"103 Macdougal St New York, NY, 10012",40.729650,-74.000913,124.969352
2,336 W 23rd St,"New York, NY, 10011",6200,"New York,",10011,49.00,1910,"336 W 23rd St New York, NY, 10011",40.745588,-74.000071,136.636492
3,1 Wall Street,"New York, NY, 10005",12500,"New York,",10005,88.00,1904,"1 Wall Street New York, NY, 10005",40.707302,-74.011693,136.776451
4,31 W 21st St,"New York, NY, 10010",9000,"New York,",10010,115.00,1908,"31 W 21st St New York, NY, 10010",40.741147,-73.992115,130.907326
...,...,...,...,...,...,...,...,...,...,...,...
487,649 Eighth Ave,"New York, NY, 10018",29905,"New York,",10018,0,1950,"649 Eighth Ave New York, NY, 10018",42.715204,-73.711278,123.035061
488,4473 Amboy Rd,"Staten Island, NY, 10312",15000,"Staten Island,",10312,0,2022,"4473 Amboy Rd Staten Island, NY, 10312",40.544183,-74.162864,89.055025
489,143 Fulton St,"New York, NY, 10038",8444,"New York,",10038,0,2018,"143 Fulton St New York, NY, 10038",40.710639,-74.007970,106.706181
490,768 5th Ave,"New York, NY, 10019",32940,"New York,",10019,0,1907,"768 5th Ave New York, NY, 10019",40.764460,-73.974494,119.985222
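
Filtering and merging can also be combined into one step by asking `merge` to report where each row came from. The sketch below uses made-up frames; `indicator=True` adds a `_merge` column whose value `'left_only'` marks listings without income data.

```python
import pandas as pd

# Hypothetical listings and income table.
listings = pd.DataFrame({'Address': ['A St', 'B Ave', 'C Blvd'],
                         'ZIP': ['10001', '11249', '10002']})
income = pd.DataFrame({'ZIP': ['10001', '10002'],
                       'weighted_avg_income': [115.7, 64.7]})

merged = listings.merge(income, on='ZIP', how='left', indicator=True)

# Listings whose ZIP code has no income row.
unmatched = merged[merged['_merge'] == 'left_only']
# Listings with income data, without the helper column.
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
print(unmatched[['Address', 'ZIP']])
```

Keeping the unmatched rows in a separate frame makes it explicit which listings are dropped from the analysis and why.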


In [260]:
from geopy.distance import geodesic

# Read the Apple Store coordinates from the text file
apple_stores_coords = text_file_content('apple_store_locs_nyc.txt')
last_coord = [ast.literal_eval(apple_stores_coords[-1])]
apple_stores_coords_from_text_tuples = [ast.literal_eval(str_coord)[0] for str_coord in apple_stores_coords][:-1]+last_coord
print(apple_stores_coords_from_text_tuples)

# For each listing, find the geodesic distance in meters to the nearest Apple Store
for index, row in df.iterrows():
    min_distance = float('inf')  # Initialize with infinity
    for coord in apple_stores_coords_from_text_tuples:
        # Distance between the listing and this store
        distance = geodesic((row['latitude'], row['longitude']), coord).meters
        min_distance = min(min_distance, distance)
    # Store the minimum distance in the 'nearest_distance' column
    df.at[index, 'nearest_distance'] = min_distance

# Display DataFrame with nearest distances
df

[(-40.76353265, -73.97225599017521), (40.75449347916667, -73.97754116666667), (40.775066100000004, -73.98266860124198), (40.71155485, -74.01141990722878), (40.725039, -73.9991532), (40.7411861, -74.0054573), (40.7733551, -73.9644948), (40.73459645, -73.87001159452466), (40.7156238, -73.959899), (40.6855893, -73.9782267), (40.75020495, -73.9853222), (40.864143049999996, -73.8276784603631)]


Unnamed: 0,Address,City and ZIP,SF,City,ZIP,yearly_price_per_SF,year_built,full_address,latitude,longitude,weighted_avg_income,nearest_distance
0,2586 Linden Blvd,"Brooklyn, NY, 11208",8500,"Brooklyn,",11208,48.00,2015,"2586 Linden Blvd Brooklyn, NY, 11208",40.668856,-73.868809,42.136414,7301.069926
1,103 Macdougal St,"New York, NY, 10012",12000,"New York,",10012,120.00,1900,"103 Macdougal St New York, NY, 10012",40.729650,-74.000913,124.969352,533.161249
2,336 W 23rd St,"New York, NY, 10011",6200,"New York,",10011,49.00,1910,"336 W 23rd St New York, NY, 10011",40.745588,-74.000071,136.636492,667.802884
3,1 Wall Street,"New York, NY, 10005",12500,"New York,",10005,88.00,1904,"1 Wall Street New York, NY, 10005",40.707302,-74.011693,136.776451,472.868250
4,31 W 21st St,"New York, NY, 10010",9000,"New York,",10010,115.00,1908,"31 W 21st St New York, NY, 10010",40.741147,-73.992115,130.907326,1126.920292
...,...,...,...,...,...,...,...,...,...,...,...,...
487,649 Eighth Ave,"New York, NY, 10018",29905,"New York,",10018,0,1950,"649 Eighth Ave New York, NY, 10018",42.715204,-73.711278,123.035061,205823.387846
488,4473 Amboy Rd,"Staten Island, NY, 10312",15000,"Staten Island,",10312,0,2022,"4473 Amboy Rd Staten Island, NY, 10312",40.544183,-74.162864,89.055025,22151.785754
489,143 Fulton St,"New York, NY, 10038",8444,"New York,",10038,0,2018,"143 Fulton St New York, NY, 10038",40.710639,-74.007970,106.706181,308.763344
490,768 5th Ave,"New York, NY, 10019",32940,"New York,",10019,0,1907,"768 5th Ave New York, NY, 10019",40.764460,-73.974494,119.985222,1136.272060
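
The nested loop above calls `geodesic` once per listing-store pair. For larger datasets, a vectorized approximation may be sufficient. The sketch below swaps in the haversine (great-circle) formula on a spherical Earth, which differs slightly from geopy's ellipsoidal geodesic; the function name `nearest_distance_m` and the choice of a 6,371 km mean radius are assumptions of this example.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def nearest_distance_m(points, stores):
    """Haversine distance in meters from each point to its nearest store.

    points, stores: array-likes of (latitude, longitude) pairs in degrees.
    """
    p = np.radians(np.asarray(points, dtype=float))[:, None, :]  # shape (n, 1, 2)
    s = np.radians(np.asarray(stores, dtype=float))[None, :, :]  # shape (1, m, 2)
    dlat = s[..., 0] - p[..., 0]
    dlon = s[..., 1] - p[..., 1]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(p[..., 0]) * np.cos(s[..., 0]) * np.sin(dlon / 2) ** 2)
    distances = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))       # shape (n, m)
    return distances.min(axis=1)

# Two listings and two store coordinates taken from the outputs above.
listings = [(40.707302, -74.011693), (40.668856, -73.868809)]
stores = [(40.71155485, -74.01141990722878), (40.6855893, -73.9782267)]
d = nearest_distance_m(listings, stores)
print(d)
```

Broadcasting builds the full listing-by-store distance matrix in one step, and `min(axis=1)` then picks the nearest store for each listing. Because only two stores are used here, the second value is not comparable to the notebook's result for that listing.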


In [261]:
# Remove thousands separators and surrounding whitespace, then convert to float
df['SF'] = df['SF'].str.replace(',', '').str.strip().astype(float)

# convert price to float
df['yearly_price_per_SF'] = df['yearly_price_per_SF'].astype(float)

# Convert 'year_built' column to integers
df['year_built'] = df['year_built'].astype(int)

In [268]:
# Create binary criteria based on quantile thresholds
# Distance to the nearest store at or above the 25th percentile
potential_location_distance = df['nearest_distance'] >= np.quantile(df.nearest_distance, 0.25)
# Income and floor area at or above their 25th percentiles
potential_location_income = df['weighted_avg_income'] >= np.quantile(df.weighted_avg_income, 0.25)
potential_location_sf = df['SF'] >= np.quantile(df.SF, 0.25)
# Price at or below the 95th percentile (listings with the placeholder price 0 also pass)
potential_location_price = df['yearly_price_per_SF'] <= np.quantile(df.yearly_price_per_SF, 0.95)
#potential_location_year = df['year_built'] >= np.median(df.year_built)

# Combine criteria
potential_location_combined = (
    potential_location_distance & 
    potential_location_income & 
    potential_location_sf & 
    potential_location_price 
   #& potential_location_year
)

# Label dataframe based on combined criteria
df['potential_location'] = potential_location_combined.astype(int)
df

Unnamed: 0,Address,City and ZIP,SF,City,ZIP,yearly_price_per_SF,year_built,full_address,latitude,longitude,weighted_avg_income,nearest_distance,potential_location
0,2586 Linden Blvd,"Brooklyn, NY, 11208",8500.0,"Brooklyn,",11208,48.0,2015,"2586 Linden Blvd Brooklyn, NY, 11208",40.668856,-73.868809,42.136414,7301.069926,0
1,103 Macdougal St,"New York, NY, 10012",12000.0,"New York,",10012,120.0,1900,"103 Macdougal St New York, NY, 10012",40.729650,-74.000913,124.969352,533.161249,0
2,336 W 23rd St,"New York, NY, 10011",6200.0,"New York,",10011,49.0,1910,"336 W 23rd St New York, NY, 10011",40.745588,-74.000071,136.636492,667.802884,0
3,1 Wall Street,"New York, NY, 10005",12500.0,"New York,",10005,88.0,1904,"1 Wall Street New York, NY, 10005",40.707302,-74.011693,136.776451,472.868250,0
4,31 W 21st St,"New York, NY, 10010",9000.0,"New York,",10010,115.0,1908,"31 W 21st St New York, NY, 10010",40.741147,-73.992115,130.907326,1126.920292,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
487,649 Eighth Ave,"New York, NY, 10018",29905.0,"New York,",10018,0.0,1950,"649 Eighth Ave New York, NY, 10018",42.715204,-73.711278,123.035061,205823.387846,1
488,4473 Amboy Rd,"Staten Island, NY, 10312",15000.0,"Staten Island,",10312,0.0,2022,"4473 Amboy Rd Staten Island, NY, 10312",40.544183,-74.162864,89.055025,22151.785754,1
489,143 Fulton St,"New York, NY, 10038",8444.0,"New York,",10038,0.0,2018,"143 Fulton St New York, NY, 10038",40.710639,-74.007970,106.706181,308.763344,0
490,768 5th Ave,"New York, NY, 10019",32940.0,"New York,",10019,0.0,1907,"768 5th Ave New York, NY, 10019",40.764460,-73.974494,119.985222,1136.272060,1
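
To see how the quantile thresholds interact, it can help to apply the same rule to a handful of made-up rows and count how many pass each criterion. This is a toy illustration of the rule above, not a re-run of the analysis; all values are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'nearest_distance': [100, 500, 1000, 2000, 4000, 8000],
    'weighted_avg_income': [40, 60, 80, 100, 120, 140],
    'SF': [1000, 2000, 3000, 4000, 5000, 6000],
    'yearly_price_per_SF': [20, 40, 60, 80, 100, 300],
})

# One boolean column per criterion, using the same quantiles as above.
criteria = pd.DataFrame({
    'distance': toy['nearest_distance'] >= toy['nearest_distance'].quantile(0.25),
    'income': toy['weighted_avg_income'] >= toy['weighted_avg_income'].quantile(0.25),
    'size': toy['SF'] >= toy['SF'].quantile(0.25),
    'price': toy['yearly_price_per_SF'] <= toy['yearly_price_per_SF'].quantile(0.95),
})

# A row is a potential location only if it passes every criterion.
toy['potential_location'] = criteria.all(axis=1).astype(int)
print(criteria.sum())  # number of rows passing each criterion
print(toy['potential_location'].tolist())
```

The first two rows fall below the 25th-percentile thresholds and the last row exceeds the 95th-percentile price, so only the three middle rows are flagged.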


In [273]:
import folium
from folium.plugins import MarkerCluster
from IPython.display import IFrame


map_center = [40.7128, -74.0060]  # New York City coordinates
map_zoom = 10  # Zoom level
m = folium.Map(location=map_center, zoom_start=map_zoom)

# MarkerCluster layer
marker_cluster = MarkerCluster().add_to(m)

# Add markers for potential locations
for index, row in df.iterrows():
    if row['potential_location'] == 1:
        print(row['latitude'], row['longitude'])
        folium.Marker(
            location=[row['latitude'], row['longitude']],
            popup=f"Location: {index}",
            icon=folium.Icon(color='green', icon='check')
        ).add_to(marker_cluster)
    else:
        folium.Marker(
            location=[row['latitude'], row['longitude']],
            popup=f"Location: {index}",
            icon=folium.Icon(color='red', icon='times')
        ).add_to(marker_cluster)

# Save the map as an HTML file
m.save('potential_locations_map.html')

# Display the map directly in the notebook
IFrame(src='potential_locations_map.html', width=700, height=600)


40.7411472 -73.99211529319186
40.7296393 -73.9500212
40.5639272 -74.1143462
40.723597322580645 -73.95460551612904
40.7642505 -73.99793080889339
40.697211 -73.960922
40.52819158809241 -74.23743545740977
40.755799499999995 -73.99738739525293
40.520736 -74.217683
40.7189793 -73.946675
40.7500533877551 -74.00365836734694
40.50995335 -74.24713775000001
42.7352973 -73.71131
40.94783525 -73.8937227937998
40.63428605 -74.12324575091051
40.7041127 -73.9868413
40.6995838 -73.98721707678408
40.6102309 -73.9439397
40.7834733 -73.9502583
40.5273907 -74.233939
40.6728674 -73.9910249
40.6155853 -73.92962860064989
40.7437261 -73.9835592
40.74380889744286 -73.99266935386287
40.63783188095238 -74.07661871428571
40.7298686 -73.9590216
40.7189793 -73.946675
40.7500533877551 -74.00365836734694
40.50995335 -74.24713775000001
42.7352973 -73.71131
40.94783525 -73.8937227937998
40.63428605 -74.12324575091051
40.7041127 -73.9868413
40.6995838 -73.98721707678408
40.6102309 -73.9439397
40.7834733 -73.9502583
40.5

In [274]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Calculate the correlation matrix of the numeric columns
# (text columns such as 'Address' cannot be correlated)
correlation_matrix = df.corr(numeric_only=True)

# Visualize correlation matrix using heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Matrix')
plt.show()

# Visualize pairwise relationships between variables
sns.pairplot(df, vars=['potential_location', 'nearest_distance', 'weighted_avg_income', 'SF', 'yearly_price_per_SF', 'year_built'])
plt.show()


ModuleNotFoundError: No module named 'seaborn'
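
Since the seaborn import failed in this run, the correlation step can still be carried out with pandas alone. The sketch below, on a made-up frame, selects the numeric columns explicitly before calling `corr`; the column values are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'ZIP': ['10001', '10002', '10003', '10004'],  # text column, excluded below
    'SF': [1000.0, 2000.0, 3000.0, 4000.0],
    'yearly_price_per_SF': [80.0, 60.0, 40.0, 20.0],
    'potential_location': [0, 1, 1, 0],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = toy.select_dtypes(include='number')
corr = numeric.corr()
print(corr.round(2))
```

In this toy data the price falls linearly as the floor area grows, so the two columns are perfectly negatively correlated, while the text column `ZIP` does not appear in the matrix at all.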

In [271]:
len(df[df['potential_location'] == 1])

169

In [266]:
print(df.full_address[487], df.potential_location[487])


649 Eighth Ave  New York, NY, 10018 1
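
Row 487 is flagged as a potential location, yet its latitude in the tables above is 42.715204 and its nearest-store distance exceeds 200 km, which suggests a geocoding miss; the first Apple Store coordinate is likewise printed with a negative latitude. A rough bounding-box check can surface such points before they influence the distance criterion. The box limits below are approximate values chosen for this sketch, not taken from the notebook.

```python
# Approximate bounding box around New York City (assumed values).
LAT_MIN, LAT_MAX = 40.45, 40.95
LON_MIN, LON_MAX = -74.30, -73.65

def outside_box(lat, lon):
    """Return True if the coordinate lies outside the rough NYC bounding box."""
    return not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX)

# Coordinates as they appear in the outputs above.
coords = {
    '1 Wall Street': (40.707302, -74.011693),
    '649 Eighth Ave': (42.715204, -73.711278),
    'first Apple Store entry': (-40.76353265, -73.97225599017521),
}

suspect = [name for name, (lat, lon) in coords.items() if outside_box(lat, lon)]
print(suspect)
```

Points flagged this way would need to be re-geocoded or corrected at the source rather than silently dropped.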


In [272]:
169/490

0.3448979591836735
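
The share of flagged locations can also be derived directly from the column, which avoids typing the row count by hand. Note that a table whose last index is 490 has 491 rows if the index starts at 0. A minimal sketch on hypothetical flags:

```python
import pandas as pd

flags = pd.Series([1, 0, 0, 1, 0], name='potential_location')  # hypothetical flags

n_potential = int(flags.sum())
share = flags.mean()  # mean of a 0/1 column equals n_potential / len(flags)
print(n_potential, len(flags), share)
```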