# Organizing OSM Data

This notebook organizes street data for Tel Aviv from OpenStreetMap (OSM): it removes duplicates, fixes names, and corrects roundabout junctions. It starts by loading street data from the Tel Aviv (TA) open data and from OSM, then refines both.

The goal is an accurate, comprehensive street dataset built by combining the two sources. That involves reconciling street names that differ between them, handling roundabout junctions, and creating polygons that represent the crossroads. The data is also visualized on maps to support the analysis.

* Removing Duplicates
* Fixing Names
* Correcting Roundabout Junctions


List of files that are used
* TA_streets_20240724_031704
* disjointed_roads_to_be_dropped

In [2]:
import matplotlib.pyplot as plt
from shapely.geometry import MultiLineString, LineString
from shapely.geometry import Point
from shapely.ops import unary_union
import leafmap

import networkx as nx
import osmnx as ox
import geopandas as gpd
import pandas as pd
import numpy as np
from IPython.display import Image, display

### Loading ta_streets from TA open data

In [3]:
ta_streets = gpd.read_file('./csv_tables/TA_streets_20240724_031704/Streets.shp')
ta_streets

Unnamed: 0,oidrechov,krechov,trechov,shemangli,mslamas,tsug,kkivun,UniqueId,shemarvit,kreka,geometry
0,1.0,915.0,הרוגי מלכות,HARUGEY MALKHOT,336.0,רחוב,0.0,507-10001,قتل مملكة,100.0,"LINESTRING (672865.880 3554095.253, 672895.216..."
1,2.0,0.0,0,UKNOWN,0.0,רחוב,3.0,507-10002,,100.0,"LINESTRING (666990.498 3551436.940, 667065.337..."
2,3.0,265.0,אמסטרדם,AMSTERDAM,516.0,רחוב,1.0,507-10003,أمستردام,100.0,"LINESTRING (667879.712 3551424.162, 667940.741..."
3,4.0,644.0,אלון יגאל,YIG'AL ALLON,2524.0,רחוב,0.0,507-10004,ألون ييغال,200.0,"LINESTRING (669570.036 3550420.535, 669581.404..."
4,5.0,634.0,מרגולין,MARGOLIN,2649.0,רחוב,1.0,507-10005,مارغولين,100.0,"LINESTRING (669329.153 3548322.758, 669409.403..."
...,...,...,...,...,...,...,...,...,...,...,...
8874,9851.0,3007.0,שבטי ישראל,SHIVTEY YISRA'EL,1983.0,רחוב,0.0,507-17843,قبائل إسرائيل,100.0,"LINESTRING (665771.816 3547023.159, 665760.256..."
8875,9852.0,3058.0,אבינרי יצחק,AVINERY,2027.0,רחוב,0.0,507-20562,Avinri Yitzhak,100.0,"LINESTRING (665585.719 3547178.152, 665627.936..."
8876,9853.0,3058.0,אבינרי יצחק,AVINERY,2027.0,רחוב,0.0,507-20563,Avinri Yitzhak,100.0,"LINESTRING (665700.142 3547064.296, 665759.119..."
8877,9855.0,3907.0,3907,,1703.0,רחוב,0.0,507-21960,3907,100.0,"LINESTRING (665087.059 3546677.092, 665075.120..."


In [4]:
ta_streets = ta_streets.drop(columns=['kreka', 'UniqueId', 'kkivun', 'mslamas', 'krechov','tsug'])
ta_streets

Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,geometry
0,1.0,הרוגי מלכות,HARUGEY MALKHOT,قتل مملكة,"LINESTRING (672865.880 3554095.253, 672895.216..."
1,2.0,0,UKNOWN,,"LINESTRING (666990.498 3551436.940, 667065.337..."
2,3.0,אמסטרדם,AMSTERDAM,أمستردام,"LINESTRING (667879.712 3551424.162, 667940.741..."
3,4.0,אלון יגאל,YIG'AL ALLON,ألون ييغال,"LINESTRING (669570.036 3550420.535, 669581.404..."
4,5.0,מרגולין,MARGOLIN,مارغولين,"LINESTRING (669329.153 3548322.758, 669409.403..."
...,...,...,...,...,...
8874,9851.0,שבטי ישראל,SHIVTEY YISRA'EL,قبائل إسرائيل,"LINESTRING (665771.816 3547023.159, 665760.256..."
8875,9852.0,אבינרי יצחק,AVINERY,Avinri Yitzhak,"LINESTRING (665585.719 3547178.152, 665627.936..."
8876,9853.0,אבינרי יצחק,AVINERY,Avinri Yitzhak,"LINESTRING (665700.142 3547064.296, 665759.119..."
8877,9855.0,3907,,3907,"LINESTRING (665087.059 3546677.092, 665075.120..."


### Loading OSM TA Data

#### Loading the drivable street network:

Download the drivable street network for Tel Aviv from OSM with osmnx, then convert it to node and edge GeoDataFrames.

In [5]:
G = ox.graph_from_place("Tel Aviv, Israel", network_type="drive")

In [6]:
# you can convert your graph to node and edge GeoPandas GeoDataFrames
os_ta_streets_nodes, os_ta_streets_edges = ox.graph_to_gdfs(G)

os_ta_streets_nodes = os_ta_streets_nodes.to_crs('32636')
os_ta_streets_edges = os_ta_streets_edges.to_crs('32636')

display(os_ta_streets_nodes.head(3)), os_ta_streets_nodes.shape

Unnamed: 0_level_0,y,x,highway,street_count,ref,geometry
osmid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
139693,32.09384,34.790572,traffic_signals,4,,POINT (668968.683 3552240.237)
139698,32.093869,34.791231,,3,,POINT (669030.815 3552244.552)
139707,32.095354,34.7785,,3,,POINT (667826.578 3552389.242)


(None, (6488, 6))

In [7]:
os_ta_streets_nodes = os_ta_streets_nodes.reset_index()
os_ta_streets_nodes

Unnamed: 0,osmid,y,x,highway,street_count,ref,geometry
0,139693,32.093840,34.790572,traffic_signals,4,,POINT (668968.683 3552240.237)
1,139698,32.093869,34.791231,,3,,POINT (669030.815 3552244.552)
2,139707,32.095354,34.778500,,3,,POINT (667826.578 3552389.242)
3,139708,32.095052,34.778329,,3,,POINT (667810.983 3552355.494)
4,139709,32.094527,34.778842,,4,,POINT (667860.387 3552298.098)
...,...,...,...,...,...,...,...
6483,12292683832,32.062786,34.785004,,1,,POINT (668500.151 3548788.711)
6484,12292683834,32.063149,34.785276,traffic_signals,3,,POINT (668525.204 3548829.326)
6485,12361383714,32.057552,34.763578,,1,,POINT (666486.760 3548175.193)
6486,12361652364,32.054494,34.771553,,1,,POINT (667245.361 3547848.419)


In [8]:
os_ta_streets_edges = os_ta_streets_edges.reset_index()
display(os_ta_streets_edges.head(3)), os_ta_streets_edges.shape

Unnamed: 0,u,v,key,osmid,oneway,name,highway,reversed,length,geometry,maxspeed,lanes,ref,access,tunnel,bridge,junction,width
0,139693,5723720351,0,5118378,True,ויצמן,tertiary,False,4.37,"LINESTRING (668968.683 3552240.237, 668968.629...",,,,,,,,
1,139693,139698,0,167691710,True,יהודה המכבי,tertiary,False,62.173,"LINESTRING (668968.683 3552240.237, 668972.601...",,,,,,,,
2,139698,139723,0,167691710,True,יהודה המכבי,tertiary,False,110.029,"LINESTRING (669030.815 3552244.552, 669082.911...",,,,,,,,


(None, (12451, 18))

In [9]:
os_ta_streets_edges.name

0              ויצמן
1        יהודה המכבי
2        יהודה המכבי
3             ירמיהו
4           אוסישקין
            ...     
12446           הרכב
12447          המסגר
12448            NaN
12449           1133
12450           הרצל
Name: name, Length: 12451, dtype: object

Meaning of edges columns:

* u, v: These represent the unique identifiers of the nodes (intersections) at the start and end of the edge (street segment).
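* key: Distinguishes parallel edges between the same u and v; it is 0 for the first edge and counts up for each additional one.
* osmid: The OpenStreetMap way ID of the edge, or a list of IDs when several ways were merged into one edge.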
* oneway: A Boolean indicating whether the street is one-way.
* name: The name of the street.
* highway: The functional class of the road, such as 'primary', 'secondary', 'tertiary', 'residential', etc.
* reversed: A Boolean indicating whether the edge runs opposite to the direction in which the OSM way was originally drawn.
* length: The length of the edge in meters.
* geometry: The geographic shape of the edge, typically a LineString representing the street segment.
* maxspeed: The maximum allowed speed on the street.
* lanes: The number of lanes on the street.
* ref: Reference number of the street, often used for road signs.
* access: Information about vehicle access to the street, such as 'private' or 'permissive'.
* tunnel: The OSM tunnel tag of the street, for example 'yes'; empty when the street is not a tunnel.
* bridge: The OSM bridge tag of the street, for example 'yes'; empty when the street is not a bridge.
* junction: The OSM junction tag of the street, such as 'roundabout'.
* width: The width of the street in meters.


In [10]:
os_ta_streets_edges[~os_ta_streets_edges.width.isna()]['width'].shape

(56,)

In [11]:
os_ta_streets_edges[(~os_ta_streets_edges.key.isna()) & (os_ta_streets_edges.key != 0)]['key'].shape


(54,)

### Dropping u v duplicates:

In [12]:
os_ta_streets_nodes.duplicated(subset='osmid').sum()

0

In [13]:
os_ta_streets_edges.duplicated(subset=['u','v']).sum()

54
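
The 54 duplicated `(u, v)` pairs match the 54 edges with a non-zero `key` found above, which is what one would expect if `key` simply numbers parallel edges between the same pair of nodes. A minimal sketch of that relationship, using made-up edge tuples rather than the real graph:

```python
from collections import Counter

# Hypothetical (u, v, key) edge identifiers; key counts up from 0 for each
# additional edge that shares the same start and end node.
edges = [
    (1, 2, 0),
    (1, 2, 1),  # parallel edge: same u and v, next key
    (2, 3, 0),
    (3, 1, 0),
    (3, 1, 1),  # another parallel edge
]

pair_counts = Counter((u, v) for u, v, _ in edges)

# Every occurrence of a (u, v) pair after the first counts as a duplicate,
# mirroring duplicated(subset=['u', 'v']).sum().
duplicated_pairs = sum(count - 1 for count in pair_counts.values())
nonzero_keys = sum(1 for _, _, key in edges if key != 0)

print(duplicated_pairs, nonzero_keys)
```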

In [14]:
os_ta_streets_edges = os_ta_streets_edges.drop_duplicates(subset=['u','v'])
# os_ta_streets_nodes = os_ta_streets_nodes.drop_duplicates(subset=['u','v'])


In [15]:
os_ta_streets_edges.shape

(12397, 18)

I checked that dropping duplicates removes the intended rows.<br>
`normalize()` is used only to detect duplicate geometries: since I don't want to change the order of the points in each linestring, I take the index of the duplicates and filter with it.

In [16]:
idx_geo_dup = os_ta_streets_edges[os_ta_streets_edges.normalize().duplicated()].index
os_ta_streets_edges = os_ta_streets_edges[~(os_ta_streets_edges.index.isin(idx_geo_dup))].copy()
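
The idea behind comparing normalized geometries is that two lines with the same points in opposite order collapse to one canonical form, while the stored geometry keeps its original point order. The sketch below imitates this with plain coordinate tuples; `canonical` is a hypothetical stand-in for what `normalize()` does, not shapely's actual algorithm, and the edges are made up.

```python
def canonical(coords):
    """Return a direction-independent form of a coordinate sequence."""
    forward = tuple(coords)
    backward = tuple(reversed(coords))
    return min(forward, backward)

# Hypothetical edges: index -> coordinates in their original order.
edges = {
    0: [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
    1: [(2.0, 1.0), (1.0, 0.0), (0.0, 0.0)],  # same line as 0, reversed
    2: [(5.0, 5.0), (6.0, 5.0)],
}

seen = set()
duplicate_idx = []
for idx, coords in edges.items():
    key = canonical(coords)
    if key in seen:
        duplicate_idx.append(idx)  # a later copy of an already seen line
    else:
        seen.add(key)

# Filter by index so the surviving geometries keep their original point order.
kept = {idx: coords for idx, coords in edges.items() if idx not in duplicate_idx}
print(duplicate_idx, list(kept))
```

On a street network this kind of comparison probably also treats the two directions of a two-way street as one line, which is worth keeping in mind when reading the edge counts.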

In [17]:
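# The first reset keeps the old row labels in an 'index' column.
os_ta_streets_edges = os_ta_streets_edges.reset_index()
# The second reset adds a fresh 0..n-1 counter as 'level_0' (renamed to 'os_ta_index' below).
os_ta_streets_edges = os_ta_streets_edges.reset_index()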
os_ta_streets_edges

Unnamed: 0,level_0,index,u,v,key,osmid,oneway,name,highway,reversed,length,geometry,maxspeed,lanes,ref,access,tunnel,bridge,junction,width
0,0,0,139693,5723720351,0,5118378,True,ויצמן,tertiary,False,4.370,"LINESTRING (668968.683 3552240.237, 668968.629...",,,,,,,,
1,1,1,139693,139698,0,167691710,True,יהודה המכבי,tertiary,False,62.173,"LINESTRING (668968.683 3552240.237, 668972.601...",,,,,,,,
2,2,2,139698,139723,0,167691710,True,יהודה המכבי,tertiary,False,110.029,"LINESTRING (669030.815 3552244.552, 669082.911...",,,,,,,,
3,3,3,139707,139708,0,26516058,False,ירמיהו,residential,False,37.249,"LINESTRING (667826.578 3552389.242, 667810.983...",,,,,,,,
4,4,4,139707,10985355495,0,1183058410,False,אוסישקין,residential,False,190.227,"LINESTRING (667826.578 3552389.242, 667831.045...",,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9523,9523,12440,12287814424,544561044,0,"[396627330, 396627331, 396627332, 139744099, 1...",True,איילון דרום,motorway,False,1158.030,"LINESTRING (668470.214 3547868.809, 668469.837...",90,3,20,,,yes,,
9524,9524,12441,12287814424,2213119636,0,1082074722,True,,motorway_link,False,656.725,"LINESTRING (668470.214 3547868.809, 668467.408...",,1,,no,,,,
9525,9525,12443,12292683830,1790662878,0,167691679,True,המסגר,secondary,False,136.733,"LINESTRING (668470.453 3548624.735, 668458.737...",,3,,,,,,
9526,9526,12446,12292683834,6145368549,0,1106792626,True,הרכב,residential,False,31.335,"LINESTRING (668525.204 3548829.326, 668538.677...",,,,,,,,


In [18]:
os_ta_streets_edges.columns

Index(['level_0', 'index', 'u', 'v', 'key', 'osmid', 'oneway', 'name',
       'highway', 'reversed', 'length', 'geometry', 'maxspeed', 'lanes', 'ref',
       'access', 'tunnel', 'bridge', 'junction', 'width'],
      dtype='object')

In [19]:
os_ta_streets_edges.columns = ['os_ta_index', 'index', 'u', 'v', 'key', 'osmid', 'oneway', 'name',
       'highway', 'reversed', 'length', 'geometry', 'maxspeed', 'lanes', 'ref',
       'access', 'tunnel', 'bridge', 'junction', 'width']

In [20]:
os_ta_streets_edges.head(3)

Unnamed: 0,os_ta_index,index,u,v,key,osmid,oneway,name,highway,reversed,length,geometry,maxspeed,lanes,ref,access,tunnel,bridge,junction,width
0,0,0,139693,5723720351,0,5118378,True,ויצמן,tertiary,False,4.37,"LINESTRING (668968.683 3552240.237, 668968.629...",,,,,,,,
1,1,1,139693,139698,0,167691710,True,יהודה המכבי,tertiary,False,62.173,"LINESTRING (668968.683 3552240.237, 668972.601...",,,,,,,,
2,2,2,139698,139723,0,167691710,True,יהודה המכבי,tertiary,False,110.029,"LINESTRING (669030.815 3552244.552, 669082.911...",,,,,,,,


Let's drop these columns: index, key, ref, access, width, highway, oneway, maxspeed, lanes

They either hold little data or are not relevant to my exploration.

In [21]:
os_ta_streets_edges = os_ta_streets_edges.drop(columns=['index','key', 'ref', 'access', 'width','highway', 'oneway','maxspeed','lanes'])
os_ta_streets_edges

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction
0,0,139693,5723720351,5118378,ויצמן,False,4.370,"LINESTRING (668968.683 3552240.237, 668968.629...",,,
1,1,139693,139698,167691710,יהודה המכבי,False,62.173,"LINESTRING (668968.683 3552240.237, 668972.601...",,,
2,2,139698,139723,167691710,יהודה המכבי,False,110.029,"LINESTRING (669030.815 3552244.552, 669082.911...",,,
3,3,139707,139708,26516058,ירמיהו,False,37.249,"LINESTRING (667826.578 3552389.242, 667810.983...",,,
4,4,139707,10985355495,1183058410,אוסישקין,False,190.227,"LINESTRING (667826.578 3552389.242, 667831.045...",,,
...,...,...,...,...,...,...,...,...,...,...,...
9523,9523,12287814424,544561044,"[396627330, 396627331, 396627332, 139744099, 1...",איילון דרום,False,1158.030,"LINESTRING (668470.214 3547868.809, 668469.837...",,yes,
9524,9524,12287814424,2213119636,1082074722,,False,656.725,"LINESTRING (668470.214 3547868.809, 668467.408...",,,
9525,9525,12292683830,1790662878,167691679,המסגר,False,136.733,"LINESTRING (668470.453 3548624.735, 668458.737...",,,
9526,9526,12292683834,6145368549,1106792626,הרכב,False,31.335,"LINESTRING (668525.204 3548829.326, 668538.677...",,,


#### Some street names and osmid values are lists, so add type columns for easier filtering

In [22]:
os_ta_streets_edges['name_type'] = os_ta_streets_edges.name.apply(type).astype(str)

os_ta_streets_edges['osmid_type'] = os_ta_streets_edges.osmid.apply(type).astype(str)
os_ta_streets_edges.name_type.value_counts(), os_ta_streets_edges.osmid_type.value_counts()

(name_type
 <class 'str'>      8227
 <class 'float'>    1125
 <class 'list'>      176
 Name: count, dtype: int64,
 osmid_type
 <class 'int'>     8870
 <class 'list'>     658
 Name: count, dtype: int64)

In [23]:
os_ta_streets_edges[os_ta_streets_edges.name_type != "<class 'list'>"]['name'].unique().shape

(1843,)

In [24]:
def fix_name(x):
    """Join a list of street names into one string; return anything else unchanged."""
    if isinstance(x, list):
        return ' ,'.join(x)
    return x

In [25]:
os_ta_streets_edges['name_fixed'] = os_ta_streets_edges['name'].apply(lambda x: ' ,'.join(x) if isinstance(x,list) else x )

In [26]:
os_ta_streets_edges[os_ta_streets_edges.name_type == "<class 'list'>"]

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed
89,89,1627588,4831211731,"[30146001, 31878573]","[שדרות דוד בן גוריון, הירקון]",False,258.017,"LINESTRING (667000.201 3551145.904, 667008.054...",,,,<class 'list'>,<class 'list'>,"שדרות דוד בן גוריון ,הירקון"
103,103,3359271,412400474,"[38434304, 38434306, 387701324, 387701325, 106...","[אליעזר פרי, הרברט סמואל, הירקון]",False,654.135,"LINESTRING (667129.890 3551539.533, 667070.571...",yes,,,<class 'list'>,<class 'list'>,"אליעזר פרי ,הרברט סמואל ,הירקון"
157,157,34651074,983958272,"[1066112490, 1062311107]","[סלומון, שדרות הר ציון]",False,151.922,"LINESTRING (667838.319 3548364.659, 667865.259...",,,,<class 'list'>,<class 'list'>,"סלומון ,שדרות הר ציון"
275,275,280004122,10985378458,"[1183058448, 1183058449, 1183058450]","[בני דן, אוסישקין]",False,63.286,"LINESTRING (668339.603 3552428.038, 668333.580...",,,,<class 'list'>,<class 'list'>,"בני דן ,אוסישקין"
367,367,286542526,540430271,"[43139680, 43139713]","[אברבנאל, רבנו חננאל]",False,197.156,"LINESTRING (666789.481 3547886.269, 666789.167...",,,,<class 'list'>,<class 'list'>,"אברבנאל ,רבנו חננאל"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9212,9212,9917964182,1252167870,"[35102350, 35102351]","[גשר ע""ש יצחק מודעי, יעקב דורי]",False,36.685,"LINESTRING (669699.206 3551298.634, 669687.871...",,yes,,<class 'list'>,<class 'list'>,"גשר ע""ש יצחק מודעי ,יעקב דורי"
9309,9309,10771925346,10814329460,"[1225595552, 1158287430, 1216081223, 121608122...","[2430, 2433, 2040]",False,391.187,"LINESTRING (668862.050 3556246.923, 668856.457...",,,,<class 'list'>,<class 'list'>,"2430 ,2433 ,2040"
9422,9422,11107954571,384687032,"[239183875, 33631292]","[אידלסון, ביאליק]",False,343.822,"LINESTRING (667065.971 3549667.851, 667065.418...",,,,<class 'list'>,<class 'list'>,"אידלסון ,ביאליק"
9452,9452,11267492433,10771925346,"[1158287431, 1216081225, 1216081228, 121608123...","[2430, 2433, 2040]",False,367.315,"LINESTRING (668861.085 3555985.906, 668848.472...",,,,<class 'list'>,<class 'list'>,"2430 ,2433 ,2040"


All street names that are float are NaNs
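
A small sketch of the same checks on made-up values (the sample names are hypothetical): it counts the Python type of each name, confirms that the only floats are NaN, and joins list-valued names with the same `' ,'` separator used above.

```python
import math
from collections import Counter

# Hypothetical sample of the 'name' column: strings, a list, and a missing value.
names = ["Herzl", ["Bialik", "Idelson"], float("nan"), "Allenby"]

# Count how many names are of each Python type.
type_counts = Counter(type(n).__name__ for n in names)

# Check that every float in the column is a NaN placeholder.
floats_are_nan = all(math.isnan(n) for n in names if isinstance(n, float))

# Join list-valued names; leave strings and NaN untouched.
name_fixed = [" ,".join(n) if isinstance(n, list) else n for n in names]

print(type_counts)
print(floats_are_nan)
print(name_fixed[1])
```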

### Checking number of unique streets and difference between os and ta

In [27]:
ta_streets.trechov.nunique(), os_ta_streets_edges.name_fixed.nunique()

(2519, 2010)

There is a difference of about 500 unique names, not counting streets whose names differ only slightly.

Because of those slight differences, trying to pick out the unmatched streets by name alone gives confusing results.

To find the TA streets that are not part of the OSM data set, we need to overlay the streets geographically and take the difference.

We will do it iteratively, increasing the size of the buffer polygons on each pass.

Running this takes quite a while, so I exported the results and load them below.


In [28]:
# df1_disjoint = ta_streets[ta_streets.disjoint(os_ta_streets_edges.unary_union)]
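
The disjoint test with a growing buffer can be pictured without any GIS library. In the sketch below, which is an approximation and not the notebook's code, two lines that are each buffered by `buffer_m` are treated as touching once the gap between them is at most `2 * buffer_m`. The helper names and coordinates are made up, and the distance only looks at vertices, so it would miss lines that cross without a nearby vertex.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def min_vertex_distance(line_a, line_b):
    """Smallest distance from any vertex of one polyline to the other polyline."""
    best = math.inf
    for src, dst in ((line_a, line_b), (line_b, line_a)):
        for p in src:
            for a, b in zip(dst, dst[1:]):
                best = min(best, point_segment_distance(p, a, b))
    return best

def still_disjoint(ta_line, osm_lines, buffer_m):
    """True if ta_line stays clear of every OSM line at this buffer size."""
    return all(min_vertex_distance(ta_line, o) > 2 * buffer_m for o in osm_lines)

# Hypothetical TA street running 3 m away from the only OSM street.
ta_line = [(0.0, 0.0), (10.0, 0.0)]
osm_lines = [[(0.0, 3.0), (10.0, 3.0)]]

for buffer_m in (1, 4):
    print(buffer_m, still_disjoint(ta_line, osm_lines, buffer_m))
```

With a 1 m buffer the street still counts as disjoint, while a 4 m buffer joins it to the OSM line, which is the effect the larger buffer is meant to have.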


In [29]:
# df1_disjoint.to_parquet('./csv_tables/ta_streets_disjointed_from_os.parquet')

In [30]:
# ta_streets.geometry = ta_streets.buffer(1)
# os_ta_streets_edges_cp = os_ta_streets_edges.copy()

# os_ta_streets_edges_cp.geometry = os_ta_streets_edges_cp.buffer(1)

In [31]:
# disjointed_buff_1 = gpd.read_parquet('./csv_tables/ta_streets_disjointed_from_os.parquet')

In [32]:
# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(os_ta_streets_edges)
# m.add_gdf(disjointed_buff_1, zoom_to_layer=True, fill_colors='black')

# m

After going over some of the disjointed roads, we see that some should be joined, like בראלי.

Let's increase our buffer to 5 and continue looking for disjoint roads.

In [33]:
# # ta_streets.geometry = ta_streets.buffer(4)
# disjointed_buff_4 = disjointed_buff_1.copy()
# os_ta_streets_edges.geometry = os_ta_streets_edges.buffer(4)

# disjointed_buff_4.geometry = disjointed_buff_4.buffer(4)


In [34]:
# disjointed_buff_4 = disjointed_buff_4[disjointed_buff_4.disjoint(os_ta_streets_edges.unary_union)]

In [35]:
os_ta_streets_edges.shape

(9528, 14)

In [36]:
# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(os_ta_streets_edges)
# m.add_gdf(disjointed_buff_4, zoom_to_layer=True, fill_colors='black')

# m

After viewing the map, we can conclude that all the remaining disjointed roads are irrelevant and can be dropped.

In [37]:
# disjointed_buff_4.to_parquet('./csv_tables/disjointed_roads_to_be_dropped.parquet')

### Dropping disjointed roads that exist in TA but not in OSM from ta_streets

In [38]:
disjointed = gpd.read_parquet('./csv_tables/disjointed_roads_to_be_dropped.parquet')

disjointed


Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,geometry
5,6.0,הטייסים,HATASSIM DERAKH,الطيارون,"POLYGON ((671209.740 3546862.619, 671209.641 3..."
11,12.0,הר סיני,HAR SINAI,جبل سيناء,"POLYGON ((667307.216 3548993.331, 667307.590 3..."
51,53.0,יון מצולה,YEVEN METSULA,أيون مسلية,"POLYGON ((668136.418 3547970.343, 668171.279 3..."
56,58.0,נתיבי אילון צפון,AYALON NORTH,شمال أيالون مسارات,"POLYGON ((669036.161 3546496.238, 669036.003 3..."
66,69.0,0,UKNOWN,,"POLYGON ((665489.832 3547715.986, 665489.469 3..."
...,...,...,...,...,...
8831,9800.0,פרוץ לאו,PERUYZ LEO,انهيار,"POLYGON ((671055.604 3546696.692, 671055.479 3..."
8832,9801.0,ויל קורט,WEIL KURT,سوف المحكمة,"POLYGON ((671182.962 3546708.410, 671182.785 3..."
8836,9805.0,דוחן,DOHAN,الدخن,"POLYGON ((669238.339 3546686.043, 669238.540 3..."
8838,9807.0,גופר,GOFER,جوببر,"POLYGON ((669354.407 3546649.755, 669354.440 3..."


In [39]:
ta_streets.shape

(8879, 5)

In [40]:
# dropping

ta_streets.drop(index=disjointed.index, inplace=True)

### Finding all instances where the street name in OSM is different than in TA

In [41]:
pip install thefuzz

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [42]:
os_ta_streets_edges.shape

(9528, 14)

In [43]:
import pandas as pd
from thefuzz import fuzz

def best_fuzzy_match(name_1, name_2):
    """
    Compare two strings (name_1, name_2) under different transformations
    (removing punctuation, reversing word order) and return the best fuzzy score.
    Return a tuple of (original_name_1, original_name_2, best_fuzz_score).
    """
    original_name_1 = name_1
    original_name_2 = name_2
    
    # Handle the case where name_1 or name_2 might not be a string (e.g., NaN or float)
    if not isinstance(name_1, str):
        if pd.isna(name_1):
            name_1 = ""
        else:
            name_1 = str(name_1)
    if not isinstance(name_2, str):
        if pd.isna(name_2):
            name_2 = ""
        else:
            name_2 = str(name_2)

    # Clean up commas
    name_1_clean = name_1.replace(",", "").strip()
    name_2_clean = name_2.replace(",", "").strip()
    
    # If multiple words, reverse word order
    def reverse_if_multiple_words(s):
        words = s.split()
        return " ".join(words[::-1]) if len(words) > 1 else s
    
    variants_1 = [
        name_1_clean,
        reverse_if_multiple_words(name_1_clean)
    ]
    variants_2 = [
        name_2_clean,
        reverse_if_multiple_words(name_2_clean)
    ]
    
    best_score = 0
    for v1 in variants_1:
        for v2 in variants_2:
            score = fuzz.ratio(v1, v2)
            if score > best_score:
                best_score = score
    
    return (original_name_1, original_name_2, best_score)


# ---------------------------------------------------
# Example usage:

# 1. Extract unique street names from each DataFrame.
#    - 'trechov' in ta_streets
#    - 'name_fixed' in os_ta_streets_edges
unique_ta_names = ta_streets['trechov'].dropna().unique()
unique_os_names = os_ta_streets_edges['name_fixed'].dropna().unique()
print(unique_ta_names.shape, unique_os_names.shape)
# 2. For each unique street in ta_streets, find the single best match in os_ta_streets_edges.
fuzz_results = []

counter = 0

for ta_name in unique_ta_names:
    counter = 1 + counter
    ta_name_str = str(ta_name).strip()
    # Optional: skip empty strings if needed
    if not ta_name_str:
        continue

    best_score_for_this_ta = 0
    best_os_street_name = None

    # Compare with every unique os_name
    for os_name in unique_os_names:
        _, _, score = best_fuzzy_match(ta_name_str, os_name)
        
        if score > best_score_for_this_ta:
            best_score_for_this_ta = score
            best_os_street_name = os_name

    # After checking all os_name, store only the single best match
    if best_os_street_name is not None:
        fuzz_results.append({
            'trechov': ta_name_str,             # Unique street from ta_streets
            'name': best_os_street_name,        # Best match from os_ta_streets_edges
            'best_score': best_score_for_this_ta
        })
    if counter%500 == 0:
        print(counter)

# 3. Convert to DataFrame
fuzz_score = pd.DataFrame(fuzz_results, columns=['trechov', 'name', 'best_score'])
fuzz_score


(2447,) (2010,)
500
1000
1500
2000


Unnamed: 0,trechov,name,best_score
0,הרוגי מלכות,הרוגי מלכות,100
1,0,907,50
2,אמסטרדם,אמסטרדם,100
3,אלון יגאל,יגאל אלון,100
4,מרגולין,מרגולין,100
...,...,...,...
2441,470,4870,86
2442,חבר הלאומים,חבר הלאומים,100
2443,3969,3629,75
2444,3967,3956,75
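
To see why the reversed-word variants matter, here is a sketch that uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer. This is an assumption for illustration: `thefuzz` may compute its ratio differently, so the direct score is not expected to match the notebook's numbers. The helper names are hypothetical; the two street names are taken from row 3 of the table above.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    """Similarity score from 0 to 100 (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def reverse_words(s):
    """Reverse the word order of a multi-word name."""
    words = s.split()
    return " ".join(words[::-1]) if len(words) > 1 else s

def best_score(name_1, name_2):
    """Best score over the original and word-reversed forms of both names."""
    variants_1 = {name_1, reverse_words(name_1)}
    variants_2 = {name_2, reverse_words(name_2)}
    return max(ratio(v1, v2) for v1 in variants_1 for v2 in variants_2)

# Surname-first (TA) versus given-name-first (OSM) spelling of the same street.
ta_name, osm_name = "אלון יגאל", "יגאל אלון"
print(ratio(ta_name, osm_name))       # direct comparison, below 100
print(best_score(ta_name, osm_name))  # 100 once word order is tried both ways
```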


I decided to go over the names row by row, sorted by score. I believe that above some score I can reliably accept the matches without looking at them. For the scores I'm not sure about, I go over the rows and note the matches I am sure of (keep) and the ones I want to check (check).

keep: 
* בלום ליאון	בלום ליאון ,הברון הירש
* עמיקם אליהו (קשאק)	אליהו (קשאק) עמיקם
* הבעש"ט	שדרות הבעש"ט
* מוצקין	שדרות מוצקין
* גלוסקא זכריה	גלוסקא	
* אבולעפיה רבקה ושלמה	רבקה ושמעון אבולעפיה
* דן שיקה (ישעיהו)	שייקה דן
* פלד (רופין) רות ד"ר	רות רובין-פלד
* חנינא בן דוסא	בן דוסא
* אנדרומדה	מדרגות אנדרומדה
* צור צבי (צ'רה)	צבי צור (צ'רה)
* נמרי דוידקה	דוידקה	
* טריטש	דוד טריטש
* ישורון אבות	ישורון
* אבן שושן אברהם	אברהם אבן שושן
* ויסר חנה ומרדכי	חנה ומרדכי וייסר
* בן עטר	חיים בן עטר
* ביליס מנחם מנדל	מנחם מנדל בייליס
* שפירא צבי הרמן	צבי הרמן שפירא
* שתי האחיות,	בת גלים שתי האחיות
* פרנקל ידידיה,	הרב יצחק ידידיה פרנקל
* יד לבנים	שדרות, יד לבנים
* שוסטקוביץ' דמיטרי	שוסטקוביץ', דמיטרי דוד אויסטרך
* נתיבי אילון דרום,	איילון דרום
* גרציאני יצחק (זיקו),	יצחק (זיקו) גרציאני
* מיצקביץ אדם,	מיצקביץ'
* בוסקוביץ' אלכסנדר אוריה,	אלכסנדר אוריה בוסקוביץ'

I figured I can just write down the index of what to keep and what to check:
[2356, 1965, 69, 1524, 1786, 2051, 520, 1176, 1853, 256, 1908, 2299, 2135, 749, 1268, 2017, 335, 429, 1857, 675, 1639, 591, 656, 1600, 428, 1614, 1463,
1342, 1291, 273]

Check:
* בן הלל מרדכי	בן הלל
* חנינא בן תרדיון	רבי חנינא
* שינמן פנחס	רבי פנחס	
* אנקאווא רפאל הרב	הרב אלנקוה
* בן יאיר פנחס	רבי פנחס
* שניאור זלמן	שניאור
* שינקין מנחם	שינקין
* פרופס	צבי פרופס
* ליפסקי לואי	ליפסקי
* זטורי משה	זטורי
* הרב טולדנו	טולדאנו
* רוקח נמיר	דרך נמיר
* קרן היסוד	היסוד
* גנני	ש. גנני
* הופמן יעקב ד"ר	בר הופמן
* לוי יוסף גונדר	יוסף לוי
* ההגנה-נשרי צבי	צבי נשרי
* כרמי דב	כרמי
* צינה דיזנגוף	דיזנגוף

[1135, 1082, 1357, 192, 219, 1688,2102, 655, 1848, 870, 217, 1137, 1677, 2136, 1348]

Stopped at 400:450

In [44]:
fuzz_score[(fuzz_score.best_score < 87) & (fuzz_score.best_score > 85)]

Unnamed: 0,trechov,name,best_score
83,"חי""ש",חיש,86
106,סלואדור,סלוודור,86
131,קהילת צ'רנוביץ,קהילת טשרנוביץ,86
206,נמיר מרדכי,מרדכי מאייר,86
383,ברוק,ברק,86
387,הנביאים,הנשיאים,86
500,429,3429,86
505,בן ציון,בת ציון,86
510,356,3956,86
513,עין דור,עין ורד,86


### Conclusions after going over names by score:

From 87 to 100 can be trusted.


### Fixing Street Names

Open TA has names I have corrected in Hebrew and English, so:<br>
* I will overlay the Open TA data on the OSM data.

-------

Next: <br>
* Remove roads from ta_streets and os using the names of the streets with a score between 87 and 100.
* Go over the remaining roads and perform an overlay.


In [45]:
fuzz_score[fuzz_score.best_score > 87]['trechov'].shape

(1513,)

In [46]:
fuzz_score

Unnamed: 0,trechov,name,best_score
0,הרוגי מלכות,הרוגי מלכות,100
1,0,907,50
2,אמסטרדם,אמסטרדם,100
3,אלון יגאל,יגאל אלון,100
4,מרגולין,מרגולין,100
...,...,...,...
2441,470,4870,86
2442,חבר הלאומים,חבר הלאומים,100
2443,3969,3629,75
2444,3967,3956,75


In [47]:
# street names in ta with a trusted score; note the filter is > 87, so a score of exactly 87 is excluded
# later on these will be used to remove rows from ta
ta_street_names_to_be_removed = fuzz_score[fuzz_score.best_score > 87]['trechov'].unique()
os_street_names_to_be_removed = fuzz_score[fuzz_score.best_score > 87]['name'].unique()
ta_street_names_to_be_removed.shape, os_street_names_to_be_removed.shape

((1513,), (1509,))

In [48]:
ta_street_names_to_be_removed

array(['הרוגי מלכות', 'אמסטרדם', 'אלון יגאל', ..., '3629', 'חבר הלאומים',
       '3907'], dtype=object)

In [49]:
os_streets_after_name_remove = os_ta_streets_edges[~(os_ta_streets_edges.name.isin(os_street_names_to_be_removed))].copy()
os_ta_streets_edges.shape, os_streets_after_name_remove.shape

((9528, 14), (2854, 14))

In [50]:
# removing edges from ta_street
ta_streets_after_name_remove = ta_streets[~(ta_streets.trechov.isin(ta_street_names_to_be_removed))].copy()
ta_streets.shape, ta_streets_after_name_remove.shape

((8455, 5), (2063, 5))

### Trying to find overlapping polygons and their share

In [51]:
os_streets_after_name_remove.geometry = os_streets_after_name_remove.buffer(1, cap_style='flat')
ta_streets_after_name_remove.geometry = ta_streets_after_name_remove.buffer(1, cap_style='flat')


In [52]:
# Perform intersection
intersection_gdf = gpd.overlay(
    ta_streets_after_name_remove, 
    os_streets_after_name_remove, 
    how='intersection'
)
print(intersection_gdf.shape)
intersection_gdf.head(3)

(2184, 18)


Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,os_ta_index,u,v,osmid,name,reversed,length,tunnel,bridge,junction,name_type,osmid_type,name_fixed,geometry
0,2.0,0,UKNOWN,,103,3359271,412400474,"[38434304, 38434306, 387701324, 387701325, 106...","[אליעזר פרי, הרברט סמואל, הירקון]",False,654.135,yes,,,<class 'list'>,<class 'list'>,"אליעזר פרי ,הרברט סמואל ,הירקון","POLYGON ((667058.224 3551426.511, 667056.055 3..."
1,15.0,שפירא צבי הרמן,SHAPIRA HERMAN,شابيرو زفي هيرمان,1458,352934158,352934309,31539281,צבי הרמן שפירא,False,71.638,,,,<class 'str'>,<class 'int'>,צבי הרמן שפירא,"POLYGON ((667703.568 3550110.223, 667693.932 3..."
2,72.0,שיטרית בכור,SHITREET,شيتريت في البكر,1536,354027893,2349733321,"[1134738131, 657412542]",שטרית,False,77.893,,,,<class 'str'>,<class 'list'>,שטרית,"POLYGON ((672257.365 3553735.558, 672257.855 3..."


#### Calculating Share for os_streets_edges

In [53]:
intersection_gdf['overlap_area'] = intersection_gdf.geometry.area

In [54]:
os_ta_original_area = os_ta_streets_edges[['os_ta_index','geometry']].copy()
os_ta_original_area.geometry = os_ta_original_area.buffer(1)
os_ta_original_area['os_area'] = os_ta_original_area.geometry.area
os_ta_original_area

Unnamed: 0,os_ta_index,geometry,os_area
0,0,"POLYGON ((668967.629 3552244.583, 668967.633 3...",11.852045
1,1,"POLYGON ((668972.601 3552241.236, 668974.489 3...",127.731089
2,2,"POLYGON ((669082.797 3552250.437, 669140.168 3...",223.627744
3,3,"POLYGON ((667811.891 3552355.074, 667811.845 3...",77.491866
4,4,"POLYGON ((667831.409 3552388.362, 667842.035 3...",384.274329
...,...,...,...
9523,9523,"POLYGON ((668470.828 3547866.295, 668467.145 3...",2312.953060
9524,9524,"POLYGON ((668468.408 3547727.719, 668467.228 3...",1313.225034
9525,9525,"POLYGON ((668459.727 3548556.358, 668451.331 3...",275.868776
9526,9526,"POLYGON ((668538.536 3548832.225, 668538.661 3...",65.929849
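
As a rough sanity check on these areas, a straight line of length L buffered by r with the default round caps should cover about 2·r·L plus one disc of radius r. The function below is a hypothetical helper written for this check, not something geopandas exposes; it ignores bends in the line and the fact that the round caps are approximated by polygons.

```python
import math

def approx_buffer_area(length_m, radius_m, round_caps=True):
    """Approximate area of a straight line buffered by radius_m."""
    area = 2 * radius_m * length_m        # rectangle along the line
    if round_caps:
        area += math.pi * radius_m ** 2   # two half-discs at the ends
    return area

# First edge in the table above: length 4.37 m, buffered area 11.852045.
print(round(approx_buffer_area(4.37, 1), 3))
# With flat caps the end discs disappear.
print(round(approx_buffer_area(4.37, 1, round_caps=False), 3))
```

The estimate lands close to the 11.85 reported for the first edge. Since the intersection polygons were built from flat-capped buffers while this denominator uses round caps, the resulting shares are probably slightly lower than they would be with matching cap styles.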


In [55]:
# Now merge the area back into the intersection GDF:

intersection_gdf = intersection_gdf.merge(
    os_ta_original_area[['os_ta_index', 'os_area']],
    on='os_ta_index',
    how='left'
)

In [56]:
intersection_gdf.shape

(2184, 20)

In [57]:
intersection_gdf.head(3)

Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,os_ta_index,u,v,osmid,name,reversed,length,tunnel,bridge,junction,name_type,osmid_type,name_fixed,geometry,overlap_area,os_area
0,2.0,0,UKNOWN,,103,3359271,412400474,"[38434304, 38434306, 387701324, 387701325, 106...","[אליעזר פרי, הרברט סמואל, הירקון]",False,654.135,yes,,,<class 'list'>,<class 'list'>,"אליעזר פרי ,הרברט סמואל ,הירקון","POLYGON ((667058.224 3551426.511, 667056.055 3...",4.379963,1308.798959
1,15.0,שפירא צבי הרמן,SHAPIRA HERMAN,شابيرو زفي هيرمان,1458,352934158,352934309,31539281,צבי הרמן שפירא,False,71.638,,,,<class 'str'>,<class 'int'>,צבי הרמן שפירא,"POLYGON ((667703.568 3550110.223, 667693.932 3...",51.428333,146.113806
2,72.0,שיטרית בכור,SHITREET,شيتريت في البكر,1536,354027893,2349733321,"[1134738131, 657412542]",שטרית,False,77.893,,,,<class 'str'>,<class 'list'>,שטרית,"POLYGON ((672257.365 3553735.558, 672257.855 3...",51.911374,159.173965


In [58]:
intersection_gdf['os_overlap_share'] = (
    intersection_gdf['overlap_area'] / intersection_gdf['os_area']
)

#### Calculating share for ta_streets

In [59]:
ta_streets_area = ta_streets[['oidrechov','geometry']].copy()
ta_streets_area.geometry = ta_streets_area.buffer(1)

ta_streets_area['ta_area'] = ta_streets_area.geometry.area
ta_streets_area

Unnamed: 0,oidrechov,geometry,ta_area
0,1.0,"POLYGON ((672895.663 3554081.464, 672895.749 3...",68.747672
1,2.0,"POLYGON ((667065.475 3551427.522, 667065.571 3...",154.255249
2,3.0,"POLYGON ((667941.006 3551408.388, 667941.099 3...",129.703351
3,4.0,"POLYGON ((669580.505 3550444.333, 669580.552 3...",55.095628
4,5.0,"POLYGON ((669409.421 3548322.350, 669409.518 3...",163.661697
...,...,...,...
8874,9851.0,"POLYGON ((665761.176 3546995.662, 665761.133 3...",62.069636
8875,9852.0,"POLYGON ((665628.588 3547141.361, 665695.882 3...",333.043508
8876,9853.0,"POLYGON ((665759.584 3547029.798, 665772.228 3...",168.570465
8877,9855.0,"POLYGON ((665074.943 3546678.250, 665074.845 3...",65.178864


In [60]:
# Now merge the area back into the intersection GDF:

intersection_gdf = intersection_gdf.merge(
    ta_streets_area[['oidrechov', 'ta_area']],
    on='oidrechov',
    how='left'
)

In [61]:
intersection_gdf.head(3)

Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,os_ta_index,u,v,osmid,name,reversed,...,bridge,junction,name_type,osmid_type,name_fixed,geometry,overlap_area,os_area,os_overlap_share,ta_area
0,2.0,0,UKNOWN,,103,3359271,412400474,"[38434304, 38434306, 387701324, 387701325, 106...","[אליעזר פרי, הרברט סמואל, הירקון]",False,...,,,<class 'list'>,<class 'list'>,"אליעזר פרי ,הרברט סמואל ,הירקון","POLYGON ((667058.224 3551426.511, 667056.055 3...",4.379963,1308.798959,0.003347,154.255249
1,15.0,שפירא צבי הרמן,SHAPIRA HERMAN,شابيرو زفي هيرمان,1458,352934158,352934309,31539281,צבי הרמן שפירא,False,...,,,<class 'str'>,<class 'int'>,צבי הרמן שפירא,"POLYGON ((667703.568 3550110.223, 667693.932 3...",51.428333,146.113806,0.351974,153.476994
2,72.0,שיטרית בכור,SHITREET,شيتريت في البكر,1536,354027893,2349733321,"[1134738131, 657412542]",שטרית,False,...,,,<class 'str'>,<class 'list'>,שטרית,"POLYGON ((672257.365 3553735.558, 672257.855 3...",51.911374,159.173965,0.32613,98.247133


In [62]:
intersection_gdf['ta_overlap_share'] = (
    intersection_gdf['overlap_area'] / intersection_gdf['ta_area']
)

In [63]:
# intersection_gdf_sorted = intersection_gdf[['name','trechov','overlap_area','os_area','os_overlap_share','ta_area','ta_overlap_share','shemangli']].sort_values(by=['os_overlap_share','ta_overlap_share'], ascending=False).copy()

# intersection_gdf_sorted[intersection_gdf_sorted]

In [64]:
intersection_gdf.columns

Index(['oidrechov', 'trechov', 'shemangli', 'shemarvit', 'os_ta_index', 'u',
       'v', 'osmid', 'name', 'reversed', 'length', 'tunnel', 'bridge',
       'junction', 'name_type', 'osmid_type', 'name_fixed', 'geometry',
       'overlap_area', 'os_area', 'os_overlap_share', 'ta_area',
       'ta_overlap_share'],
      dtype='object')

In [65]:
intersection_gdf['sum_share'] = intersection_gdf['os_overlap_share'] + intersection_gdf['ta_overlap_share']
intersection_gdf

Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,os_ta_index,u,v,osmid,name,reversed,...,name_type,osmid_type,name_fixed,geometry,overlap_area,os_area,os_overlap_share,ta_area,ta_overlap_share,sum_share
0,2.0,0,UKNOWN,,103,3359271,412400474,"[38434304, 38434306, 387701324, 387701325, 106...","[אליעזר פרי, הרברט סמואל, הירקון]",False,...,<class 'list'>,<class 'list'>,"אליעזר פרי ,הרברט סמואל ,הירקון","POLYGON ((667058.224 3551426.511, 667056.055 3...",4.379963,1308.798959,0.003347,154.255249,0.028394,0.031741
1,15.0,שפירא צבי הרמן,SHAPIRA HERMAN,شابيرو زفي هيرمان,1458,352934158,352934309,31539281,צבי הרמן שפירא,False,...,<class 'str'>,<class 'int'>,צבי הרמן שפירא,"POLYGON ((667703.568 3550110.223, 667693.932 3...",51.428333,146.113806,0.351974,153.476994,0.335088,0.687063
2,72.0,שיטרית בכור,SHITREET,شيتريت في البكر,1536,354027893,2349733321,"[1134738131, 657412542]",שטרית,False,...,<class 'str'>,<class 'list'>,שטרית,"POLYGON ((672257.365 3553735.558, 672257.855 3...",51.911374,159.173965,0.326130,98.247133,0.528375,0.854505
3,97.0,קרן קיימת לישראל,KEREN KAYEMET LEISRA,كيرين موجود إلى إسرائيل,3484,530842257,1156944947,510740947,"שדרות קק""ל",False,...,<class 'str'>,<class 'int'>,"שדרות קק""ל","POLYGON ((669938.859 3555328.253, 669863.036 3...",61.929598,867.319911,0.071403,312.781807,0.197996,0.269400
4,109.0,בן יוסף שלמה,BEN YOSEF,بن يوسف شلومو,2691,415448941,415448762,35417547,,False,...,<class 'float'>,<class 'int'>,,"POLYGON ((669617.990 3555715.980, 669617.692 3...",4.774089,58.865285,0.081102,181.505448,0.026303,0.107405
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2179,9791.0,עכו,AKKO,عكا,157,34651074,983958272,"[1066112490, 1062311107]","[סלומון, שדרות הר ציון]",False,...,<class 'list'>,<class 'list'>,"סלומון ,שדרות הר ציון","POLYGON ((667860.788 3548410.049, 667862.707 3...",4.048996,306.726319,0.013201,507.224027,0.007983,0.021183
2180,9791.0,עכו,AKKO,عكا,6399,1574692681,8423778839,227905985,שביל עכו,False,...,<class 'str'>,<class 'int'>,שביל עכו,"MULTIPOLYGON (((667748.649 3548437.531, 667746...",165.400200,434.929999,0.380292,507.224027,0.326089,0.706381
2181,9791.0,עכו,AKKO,عكا,6400,1574692681,2365353786,1035607025,,False,...,<class 'float'>,<class 'int'>,,"POLYGON ((667661.683 3548517.651, 667661.753 3...",5.430058,39.603396,0.137111,507.224027,0.010705,0.147816
2182,9814.0,1244,1244,1244,8185,3644526490,3713791729,367458134,1276,True,...,<class 'str'>,<class 'int'>,1276,"POLYGON ((668392.232 3548839.208, 668394.228 3...",1.802595,81.302271,0.022172,151.144693,0.011926,0.034098
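
As a worked check of the share columns, the values of row 1 in the table above (oidrechov 15.0, os_ta_index 1458) can be recomputed by hand. This is only a sketch that repeats the arithmetic of the cells above on numbers copied from the table; the variable names are illustrative.

```python
# Values copied from row 1 of the table above (oidrechov 15.0, os_ta_index 1458)
overlap_area = 51.428333
os_area = 146.113806
ta_area = 153.476994

# Share of each buffered street that is covered by the overlap polygon
os_overlap_share = overlap_area / os_area
ta_overlap_share = overlap_area / ta_area

# Both shares are ratios of an intersection to one of its parents,
# so their sum can range from 0 (no overlap) to 2 (identical buffers)
sum_share = os_overlap_share + ta_overlap_share

print(os_overlap_share, ta_overlap_share, sum_share)
```

Up to rounding, this reproduces the 0.351974, 0.335088 and 0.687063 shown for that row.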


#### Checking sum_share in different ranges, e.g. above 1.8 (meaning 0.9 + 0.9), etc.

In [66]:
intersection_gdf[(intersection_gdf['sum_share'] < 5) & (intersection_gdf['sum_share'] > 0.001)][['os_ta_index','oidrechov','name','trechov','shemangli','geometry','os_overlap_share','ta_overlap_share','sum_share']].sort_values(by=['sum_share'], ascending=False)

Unnamed: 0,os_ta_index,oidrechov,name,trechov,shemangli,geometry,os_overlap_share,ta_overlap_share,sum_share
245,6594,1117.0,ילין מור,ילין מור נתן,MOR,"POLYGON ((669082.040 3549586.436, 669072.991 3...",0.943799,0.907792,1.851591
250,6590,1125.0,ילין מור,ילין מור נתן,MOR,"POLYGON ((669070.977 3549521.297, 669072.876 3...",0.942522,0.908332,1.850854
173,3346,862.0,שדרות דוד המלך,דוד המלך,DAVID HAMELLECH,"POLYGON ((668805.599 3550676.051, 668653.543 3...",0.931505,0.910306,1.841812
181,445,873.0,שדרות דוד המלך,דוד המלך,DAVID HAMELLECH,"POLYGON ((668315.403 3550714.521, 668315.261 3...",0.852566,0.893886,1.746452
1150,4577,5884.0,מבוא גרופית,גרופית,GROFIT,"POLYGON ((672472.795 3555244.075, 672537.429 3...",0.948299,0.785801,1.734099
...,...,...,...,...,...,...,...,...,...
500,3716,2406.0,רוזנבלט,רוזנבלט צבי,ROSENBLAT,"POLYGON ((666050.433 3546647.155, 666050.457 3...",0.000573,0.000546,0.001119
854,3569,4732.0,הרב יצחק ידידיה פרנקל,פרנקל ידידיה,FRENKEL,"POLYGON ((667334.359 3548125.464, 667334.241 3...",0.000312,0.000805,0.001117
588,2977,3267.0,אלתרמן,אלתרמן נתן,ALTERMAN,"POLYGON ((671816.767 3555224.930, 671816.685 3...",0.000778,0.000318,0.001096
675,8320,3498.0,דרך נמיר,נמיר מרדכי,NAMIR,"POLYGON ((669279.008 3550629.732, 669277.431 3...",0.000571,0.000524,0.001095


List of contradicting names to check:

* os: 2161    ta: 6131 
* os: 3223    ta: 7240.0
* os: 5785 	  ta: 9187.0
* os: 5087	  ta: 6215.0
* os: 7606	  ta: 7707.0
* os: 3465	  ta: 153.0
* os: 4664	  ta: 2982.0
* os: 5425	  ta: 6953.0
* os: 6235	  ta: 1651.0
* os: 2260	  ta: 7804.0	
* os: 4807	  ta: 7922.0
* os: 5685	  ta: 7777.0
* os: 6894	  ta: 1183.0
* os: 1885	  ta: 332.0
* os: 1419	  ta: 8230.0
* os: 2877	  ta: 7615.0
* os: 2979	  ta: 7823.0
* os: 2688	  ta: 7844.0
* os: 4760	  ta: 7615.0
* os: 8977	  ta: 7929.0
* os: 5101	  ta: 8199.0
* os: 8044	  ta: 6324.0
* os: 7412	  ta: 9286.0	
* os: 8677	  ta: 9307.0

-----

Conclusions:

With a buffer of 1 and a sum_share between 0.8 and 1.9, I got 274 rows, and most of them are a good match name-wise.<br>
Below 0.8, the results are not as trustworthy.
To handle this, we need to increase the buffer and perform the same overlay steps again.

Next steps:
1. Get the os_ta_index and TA oidrechov of the edges we consider good matches, so that we can drop them.
2. Perform the same overlay steps as before with a bigger buffer.
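
One way to see why a wider buffer can recover matches that a 1 m buffer misses is to look at two centerlines of the same street that were digitized slightly apart. The sketch below uses two hypothetical parallel lines 2 m apart and a hypothetical helper `sum_share_for`; it is not taken from the notebook data.

```python
from shapely.geometry import LineString

# Two hypothetical centerlines of the same 100 m street, digitized 2 m apart
ta_line = LineString([(0, 0), (100, 0)])
os_line = LineString([(0, 2), (100, 2)])

def sum_share_for(line_a, line_b, buffer_m):
    """Buffer both lines, intersect them, and add the two overlap shares."""
    a = line_a.buffer(buffer_m)
    b = line_b.buffer(buffer_m)
    overlap = a.intersection(b).area
    return overlap / a.area + overlap / b.area

share_1m = sum_share_for(ta_line, os_line, 1)  # buffers only touch along y = 1
share_5m = sum_share_for(ta_line, os_line, 5)  # buffers overlap over most of their width

print(f"buffer 1 m: {share_1m:.3f}, buffer 5 m: {share_5m:.3f}")
```

The trade-off is that a wider buffer also overlaps more with neighbouring and crossing streets, which is presumably why a stricter sum_share threshold is used in the second pass.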


In [67]:
os_to_check = [2161, 3223, 5785, 5087, 7606, 3465, 4664, 5425, 6235, 2260, 4807, 5685, 6894, 1885, 1419, 2877, 2979, 2688, 4760, 8977, 5101, 8044, 7412, 8677]
ta_to_check = [6131, 7240.0, 9187.0, 6215.0, 7707.0, 153.0, 2982.0, 6953.0, 1651.0, 7804.0, 7922.0, 7777.0, 1183.0, 332.0, 8230.0, 7615.0, 7823.0, 7844.0, 7615.0, 7929.0, 8199.0, 6324.0, 9286.0, 9307.0]

In [68]:
intersection_gdf[(intersection_gdf['sum_share'] <= 1.9) & (intersection_gdf['sum_share'] >= 0.8)]['oidrechov'].shape, intersection_gdf[(intersection_gdf['sum_share'] <= 1.9) & (intersection_gdf['sum_share'] >= 0.8)]['oidrechov'].unique().shape

((274,), (271,))

In [69]:
intersection_gdf[(intersection_gdf['sum_share'] <= 1.9) & (intersection_gdf['sum_share'] >= 0.8)]['os_ta_index'].shape ,intersection_gdf[(intersection_gdf['sum_share'] <= 1.9) & (intersection_gdf['sum_share'] >= 0.8)]['os_ta_index'].unique().shape

((274,), (269,))

In [70]:
# Getting the os_index and ta oidrechov:

oidrechov_to_drop   = intersection_gdf[(intersection_gdf['sum_share'] <= 1.9) & (intersection_gdf['sum_share'] >= 0.8)]['oidrechov'].unique()
os_ta_index_to_drop = intersection_gdf[(intersection_gdf['sum_share'] <= 1.9) & (intersection_gdf['sum_share'] >= 0.8)]['os_ta_index'].unique()
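
The same threshold mask is rebuilt several times in the cells above. A minimal sketch of building it once with `Series.between`, on a hypothetical four-row stand-in for `intersection_gdf` (the values are toy values, not notebook results):

```python
import pandas as pd

# Hypothetical stand-in for intersection_gdf with only the columns needed here
df = pd.DataFrame({
    "oidrechov":   [15.0, 72.0, 72.0, 97.0],
    "os_ta_index": [1458, 1536, 1537, 3484],
    "sum_share":   [0.687063, 0.854505, 1.9, 0.269400],
})

# between() is inclusive on both ends by default, matching (>= 0.8) & (<= 1.9)
good_match = df["sum_share"].between(0.8, 1.9)

oidrechov_to_drop = df.loc[good_match, "oidrechov"].unique()
os_ta_index_to_drop = df.loc[good_match, "os_ta_index"].unique()

print(good_match.sum(), len(oidrechov_to_drop), len(os_ta_index_to_drop))
```

Keeping the mask in one variable also makes it harder for the strict (`<`, `>`) and inclusive (`<=`, `>=`) variants of the filter to drift apart between cells.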

Filtering the original ta and os again, since there seems to be an issue when buffering a second time.

In [71]:
# First drop by name
os_streets_after_name_remove_2 = os_ta_streets_edges[~(os_ta_streets_edges.name.isin(os_street_names_to_be_removed))].copy()
print(os_ta_streets_edges.shape, os_streets_after_name_remove_2.shape)

# second drop by os_ta_index
os_streets_after_name_remove_2 = os_streets_after_name_remove_2[~(os_streets_after_name_remove_2.os_ta_index.isin(os_ta_index_to_drop))].copy()
os_streets_after_name_remove.shape, os_streets_after_name_remove_2.shape

(9528, 14) (2854, 14)


((2854, 14), (2585, 14))

In [72]:
# First drop by name
ta_streets_after_name_remove_2 = ta_streets[~(ta_streets.trechov.isin(ta_street_names_to_be_removed))].copy()
print(ta_streets.shape, ta_streets_after_name_remove_2.shape)

# second drop by oid
ta_streets_after_name_remove_2 = ta_streets_after_name_remove_2[~(ta_streets_after_name_remove_2.oidrechov.isin(oidrechov_to_drop))].copy()
ta_streets_after_name_remove.shape, ta_streets_after_name_remove_2.shape

(8455, 5) (2063, 5)


((2063, 5), (1792, 5))

In [73]:
# buffer again
ta_streets_after_name_remove_2.geometry =  ta_streets_after_name_remove_2.buffer(5)
os_streets_after_name_remove_2.geometry =  os_streets_after_name_remove_2.buffer(5)

In [74]:
# Perform intersection
intersection_gdf_2 = gpd.overlay(
    ta_streets_after_name_remove_2, 
    os_streets_after_name_remove_2, 
    how='intersection'
)
print(intersection_gdf_2.shape)
intersection_gdf_2.head(3)

# creating overlap
intersection_gdf_2['overlap_area'] = intersection_gdf_2.geometry.area

# getting area of os
os_ta_original_area = os_ta_streets_edges[['os_ta_index','geometry']].copy()
os_ta_original_area.geometry = os_ta_original_area.buffer(5)
os_ta_original_area['os_area'] = os_ta_original_area.geometry.area

# Now merge the area back into the intersection GDF:
intersection_gdf_2 = intersection_gdf_2.merge(
    os_ta_original_area[['os_ta_index', 'os_area']],
    on='os_ta_index',
    how='left'
)



intersection_gdf_2['os_overlap_share'] = (
    intersection_gdf_2['overlap_area'] / intersection_gdf_2['os_area']
)

ta_streets_area = ta_streets[['oidrechov','geometry']].copy()
ta_streets_area.geometry = ta_streets_area.buffer(5)

ta_streets_area['ta_area'] = ta_streets_area.geometry.area


# Now merge the area back into the intersection GDF:

intersection_gdf_2 = intersection_gdf_2.merge(
    ta_streets_area[['oidrechov', 'ta_area']],
    on='oidrechov',
    how='left'
)

intersection_gdf_2['ta_overlap_share'] = (
    intersection_gdf_2['overlap_area'] / intersection_gdf_2['ta_area']
)

intersection_gdf_2['sum_share'] = intersection_gdf_2['os_overlap_share'] + intersection_gdf_2['ta_overlap_share']
intersection_gdf_2

(3677, 18)


Unnamed: 0,oidrechov,trechov,shemangli,shemarvit,os_ta_index,u,v,osmid,name,reversed,...,name_type,osmid_type,name_fixed,geometry,overlap_area,os_area,os_overlap_share,ta_area,ta_overlap_share,sum_share
0,2.0,0,UKNOWN,,103,3359271,412400474,"[38434304, 38434306, 387701324, 387701325, 106...","[אליעזר פרי, הרברט סמואל, הירקון]",False,...,<class 'list'>,<class 'list'>,"אליעזר פרי ,הרברט סמואל ,הירקון","POLYGON ((667066.026 3551431.484, 667066.392 3...",110.131311,6606.641959,0.016670,834.007213,0.132051,0.148721
1,15.0,שפירא צבי הרמן,SHAPIRA HERMAN,شابيرو زفي هيرمان,1458,352934158,352934309,31539281,צבי הרמן שפירא,False,...,<class 'str'>,<class 'int'>,צבי הרמן שפירא,"POLYGON ((667720.562 3550158.108, 667720.801 3...",658.785745,793.299997,0.830437,830.115938,0.793607,1.624044
2,15.0,שפירא צבי הרמן,SHAPIRA HERMAN,شابيرو زفي هيرمان,1460,352934199,352934309,32002848,צבי הרמן שפירא,True,...,<class 'str'>,<class 'int'>,צבי הרמן שפירא,"POLYGON ((667720.562 3550158.108, 667720.801 3...",54.964205,1118.327091,0.049149,830.115938,0.066213,0.115361
3,23.0,רוטשילד,ROTHSHILD,روتشيلد,153,34650800,384648219,762625690,שדרות רוטשילד,False,...,<class 'str'>,<class 'int'>,שדרות רוטשילד,"POLYGON ((667560.291 3548916.600, 667560.269 3...",17.140954,683.555060,0.025076,695.980636,0.024628,0.049705
4,23.0,רוטשילד,ROTHSHILD,روتشيلد,1956,384651116,4014929254,762625690,שדרות רוטשילד,False,...,<class 'str'>,<class 'int'>,שדרות רוטשילד,"POLYGON ((667509.129 3548893.362, 667500.005 3...",35.419056,707.179049,0.050085,695.980636,0.050891,0.100976
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3672,9791.0,עכו,AKKO,عكا,6400,1574692681,2365353786,1035607025,,False,...,<class 'float'>,<class 'int'>,,"POLYGON ((667659.183 3548506.700, 667657.763 3...",133.909211,260.724190,0.513605,2598.755305,0.051528,0.565133
3673,9809.0,1362,,1362,6745,1695279008,1695279012,28582615,שדרות ההשכלה,False,...,<class 'str'>,<class 'int'>,שדרות ההשכלה,"POLYGON ((669684.280 3549772.038, 669681.567 3...",0.448420,887.981750,0.000505,1618.047524,0.000277,0.000782
3674,9814.0,1244,1244,1244,8183,3644526483,3739773085,370265704,,True,...,<class 'float'>,<class 'int'>,,"POLYGON ((668391.001 3548833.283, 668390.252 3...",27.072024,403.833403,0.067038,818.454435,0.033077,0.100115
3675,9814.0,1244,1244,1244,8185,3644526490,3713791729,367458134,1276,True,...,<class 'str'>,<class 'int'>,1276,"POLYGON ((668388.368 3548842.780, 668388.604 3...",87.624710,468.980322,0.186841,818.454435,0.107061,0.293902


Checking

In [75]:
m = leafmap.Map(center=(32.047, 34.785), zoom=11)
m.add_gdf(intersection_gdf_2)
# m.add_gdf(ta_streets, zoom_to_layer=True, fill_colors='black')

m

Map(center=[32.047, 34.785], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom…

In [76]:
intersection_gdf_2[(intersection_gdf_2['sum_share'] < 1.9) & (intersection_gdf_2['sum_share'] > 1.3)][['trechov','name','os_ta_index','oidrechov','os_overlap_share','ta_overlap_share','sum_share']].sort_values(by=['sum_share'], ascending=False)

Unnamed: 0,trechov,name,os_ta_index,oidrechov,os_overlap_share,ta_overlap_share,sum_share
1491,מוסנזון בן-ציון דר',מוסינזון,1517,5338.0,0.856214,0.874290,1.730504
438,גורדון יהודה ליב,י. ל. גורדון,144,897.0,0.883663,0.838524,1.722187
428,דוד המלך,שדרות דוד המלך,338,865.0,0.851272,0.870395,1.721666
1371,נפחא יצחק,נפחא,1975,4456.0,0.832840,0.886264,1.719104
2281,זליבנסקי דויד הלוי,דוד זליבנסקי,5372,6945.0,0.882018,0.836460,1.718478
...,...,...,...,...,...,...,...
1299,אהרונסון,הרב אהרונסון,3057,4103.0,0.747475,0.557452,1.304927
2714,2409,,1548,7793.0,0.820718,0.483468,1.304186
319,דנין יחזקאל,דנין,1492,694.0,0.692224,0.610815,1.303039
3479,מרק יעקב,הרב מרק,3027,9303.0,0.668865,0.632759,1.301624




List of os_ta_index and TA oidrechov pairs to check:

* os: 5091	ta: 5628.0
* os: 5077	ta: 8374.0
* os: 3455	ta: 152.0
* os: 6968	ta: 6090.0
* os: 5097	ta: 5627.0	
* os: 2684	ta: 186.0
* os: 1596	ta: 7848.0
* os: 9339	ta: 7629.0
* os: 3598	ta: 9785.0
* os: 95	ta: 6670.0	
* os: 7980	ta: 6445.0
* os: 5177	ta: 7822.0
* os: 5082	ta: 7770.0
* os: 1890	ta: 275.0
* os: 1504	ta: 7429.0
* os: 8837	ta: 7711.0
* os: 8125	ta: 9747.0
* os: 8183	ta: 1377.0
* os: 1875	ta: 7610.0
* os: 3875	ta: 9282.0	
* os: 5329	ta: 7824.0
* os: 9456	ta: 7792.0	
* os: 9298	ta: 7611.0
* os: 8778	ta: 1916.0	
* os: 6141	ta: 2160.0
* os: 7307	ta: 7845.0	
* os: 3453	ta: 7846.0
* os: 3598	ta: 9785.0
* os: 5486	ta: 5181.0	
* os: 8043	ta: 7918.0
* os: 9402	ta: 8054.0
* os: 1548	ta: 7793.0	
* os: 2781	ta: 5436.0
* os: 1524	ta: 7949.0
* os: 2849	ta: 8592.0


In [77]:
# os = [5091, 5077, 3455, 6968, 5097, 2684, 1596, 9339, 3598, 95, 7980, 5177, 5082, 1890, 1504, 8837, 8125, 8183, 1875, 3875, 5329, 9456, 9298, 8778, 6141, 7307, 3453, 3598, 5486, 8043, 9402, 1548, 2781, 1524, 2849]
# ta = [5628.0, 8374.0, 152.0, 6090.0, 5627.0, 186.0, 7848.0, 7629.0, 9785.0, 6670.0, 6445.0, 7822.0, 7770.0, 275.0, 7429.0, 7711.0, 9747.0, 1377.0, 7610.0, 9282.0, 7824.0, 7792.0, 7611.0, 1916.0, 2160.0, 7845.0, 7846.0, 9785.0, 5181.0, 7918.0, 8054.0, 7793.0, 5436.0, 7949.0, 8592.0]

In [78]:
intersection_gdf_2[(intersection_gdf_2['sum_share'] < 1.5) & (intersection_gdf_2['sum_share'] > 1) & (intersection_gdf_2.name.isna())][['trechov','name','os_ta_index','oidrechov','os_overlap_share','ta_overlap_share','sum_share','geometry']]

Unnamed: 0,trechov,name,os_ta_index,oidrechov,os_overlap_share,ta_overlap_share,sum_share,geometry
77,שלוש משה,,6885,275.0,0.860877,0.342755,1.203631,"POLYGON ((673254.088 3553990.690, 673254.122 3..."
642,הרב באזוב דוד,,8183,1377.0,0.745176,0.662708,1.407884,"POLYGON ((668385.686 3548839.104, 668385.786 3..."
728,יועזר איש הבירה,,8778,1916.0,0.783406,0.587511,1.370917,"POLYGON ((665881.715 3547874.893, 665882.154 3..."
1207,אלעזר דוד,,8297,3622.0,0.909674,0.145650,1.055325,"POLYGON ((668776.295 3549976.301, 668776.301 3..."
1305,התעודה האדומה,,5453,4141.0,0.368571,0.817070,1.185642,"POLYGON ((665280.248 3547402.121, 665280.686 3..."
...,...,...,...,...,...,...,...,...
3465,2411,,7412,9286.0,0.920486,0.357249,1.277735,"POLYGON ((668468.765 3554444.536, 668467.847 3..."
3467,2411,,7415,9286.0,0.806885,0.276421,1.083306,"POLYGON ((668468.766 3554444.534, 668467.847 3..."
3481,לוין דב השופט,,2625,9307.0,0.903701,0.252138,1.155838,"POLYGON ((669870.658 3555698.434, 669870.659 3..."
3486,לוין דב השופט,,8677,9307.0,0.938595,0.249655,1.188249,"POLYGON ((669898.171 3555703.461, 669898.490 3..."


In [79]:
m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(intersection_gdf_2[(intersection_gdf_2['sum_share'] < 1.5) & (intersection_gdf_2['sum_share'] > 1) & (intersection_gdf_2.name.isna())][['trechov','name','os_ta_index','oidrechov','os_overlap_share','ta_overlap_share','sum_share', 'geometry']], fill_colors='black')

m.add_gdf(ta_streets)
m.add_gdf(os_ta_streets_edges)
m.add_gdf(intersection_gdf[intersection_gdf.os_ta_index.isin(os_to_check)][['trechov','name','os_ta_index','oidrechov', 'geometry']], fill_colors='black')

# m.add_gdf(ta_streets, zoom_to_layer=True, fill_colors='black')

m

Map(center=[32.047, 34.785], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom…

* os: 1885  ta: 332   TA
* os: 3223  ta: 9746  OS
* os: 6894  ta: 1183  TA
* os: 7606  ta: 7707  OS
* os: 5101  ta: 6335  TA  
* os: 5087  ta: 6215  OS
* os: 8044  ta: 9478  TA
* os: 5812  ta: 8230  TA
* os: 4664  ta: 2982  TA
* os: 4666  ta: 2982  TA
* os: 5787  ta: 9187  TA
* os: 2161  ta: 4879  TA
* os: 6235  ta: 1651  TA
* os: 2849  ta: 8549  TA
* os: 9339  ta: 7629  TA
* os: 3455  ta: 152   TA
* os: 2684  ta: 186   TA
* os: 5453  ta: 4141  TA
* os: 712   ta: 7074  TA
* os: 724   ta: 7401  TA
* os: 6481  ta: 9041  DROP
* os: 3878  ta: 8048  TA
* os: 4811  ta: 8047  TA
* os: 4812  ta: 8041  TA
* os: 5486  ta: 5181  TA
* os: 8778  ta: 1916  TA
* os: 7614  ta: 9461  TA
* os: 7720  ta: 7652  TA
* os: 7994  ta: 6457  TA
* os: 620   ta: 5878  TA
* os: 7989  ta: 6723  TA
* os: 7990  ta: 5587  TA
* os: 9402  ta: 8054  TA
* os: 3875  ta: 9282  TA
* os: 3598  ta: 9785  OS
* os: 6141  ta: 2160  OS
* os: 95    ta: 6468  TA
* os: 8943  ta: 7918  OS
* os: 5077  ta: 8374  TA
* os: 5082  ta: 7770  TA
* os: 5097  ta: 5627  TA
* os: 6968  ta: 6090  TA
* os: 1524  ta: 7949  OS
* os: 1504  ta: 7429  TA
* os: 7980  ta: 6445  OS
* os: 8125  ta: 9748  OS

In [80]:
os_ta_to_change = [(1885, 332, 'TA'), (3223, 9746, 'OS'), (6894, 1183, 'TA'), (7606, 7707, 'OS'), (5101, 6335, 'TA'), (5087, 6215, 'OS'), (8044, 9478, 'TA'), (5812, 8230, 'TA'), (4664, 2982, 'TA'), (4666, 2982, 'TA'), (5787, 9187, 'TA'), (2161, 4879, 'TA'), (6235, 1651, 'TA'), (2849, 8549, 'TA'), (9339, 7629, 'TA'), (3455, 152, 'TA'), (2684, 186, 'TA'), (5453, 4141, 'TA'), (712, 7074, 'TA'), (724, 7401, 'TA'), (6481, 9041, 'DROP'), (3878, 8048, 'TA'), (4811, 8047, 'TA'), (4812, 8041, 'TA'), (5486, 5181, 'TA'), (8778, 1916, 'TA'), (7614, 9461, 'TA'), (7720, 7652, 'TA'), (7994, 6457, 'TA'), (620, 5878, 'TA'), (7989, 6723, 'TA'), (7990, 5587, 'TA'), (9402, 8054, 'TA'), (3875, 9282, 'TA'), (3598, 9785, 'OS'), (6141, 2160, 'OS'), (95, 6468, 'TA'), (8943, 7918, 'OS'), (5077, 8374, 'TA'), (5082, 7770, 'TA'), (5097, 5627, 'TA'), (6968, 6090, 'TA'), (1524, 7949, 'OS'), (1504, 7429, 'TA'), (7980, 6445, 'OS'), (8125, 9748, 'OS')]

### Finding out how many street names were changed.

In [81]:
# 1) Filter fuzz_score based on best_score > 87
filtered_fuzz_score = fuzz_score[fuzz_score['best_score'] > 87]

# 2) Get number of unique values in 'trechov' and 'name' columns
num_unique_trechov_fuzz = filtered_fuzz_score['trechov'].nunique()
num_unique_name_fuzz    = filtered_fuzz_score['name'].nunique()

print("fuzz_score (filtered) unique trechov:", num_unique_trechov_fuzz)
print("fuzz_score (filtered) unique name:   ", num_unique_name_fuzz)

# 3) Number of unique trechov and name in intersection_gdf (filtered by sum_share)
intersection_gdf_filtered = intersection_gdf[
    (intersection_gdf['sum_share'] < 1.9) & 
    (intersection_gdf['sum_share'] > 0.8)
]

num_unique_trechov_intersect = intersection_gdf_filtered['trechov'].nunique()
num_unique_name_intersect    = intersection_gdf_filtered['name_fixed'].nunique()

print("intersection_gdf unique trechov:", num_unique_trechov_intersect)
print("intersection_gdf unique name:   ", num_unique_name_intersect)

# 4) Number of unique trechov and name in intersection_gdf_2 (filtered by sum_share)
intersection_gdf_2_filtered = intersection_gdf_2[
    (intersection_gdf_2['sum_share'] < 1.9) & 
    (intersection_gdf_2['sum_share'] > 1.3)
]

num_unique_trechov_intersect2 = intersection_gdf_2_filtered['trechov'].nunique()
num_unique_name_intersect2    = intersection_gdf_2_filtered['name_fixed'].nunique()

print("intersection_gdf_2 unique trechov:", num_unique_trechov_intersect2)
print("intersection_gdf_2 unique name:   ", num_unique_name_intersect2)


# 5) Combine all unique NAMES into one set
unique_name_fuzz        = set(filtered_fuzz_score['name'].unique())
unique_name_intersect   = set(intersection_gdf_filtered['name_fixed'].unique())
unique_name_intersect2  = set(intersection_gdf_2_filtered['name_fixed'].unique())

combined_unique_names   = unique_name_fuzz.union(unique_name_intersect, unique_name_intersect2)
total_unique_names      = len(combined_unique_names)

# 6) Calculate the difference:
#    total - (unique in fuzz_score) - (unique in intersection_gdf) - (unique in intersection_gdf_2)
difference = (
    total_unique_names
    - len(unique_name_fuzz)
    - len(unique_name_intersect)
    - len(unique_name_intersect2)
)

print("\n--- Combined Stats ---")
print("Number of TOTAL unique names (across all filtered sets):", total_unique_names)
print("Difference (total - fuzz - intersect - intersect2):", difference)


fuzz_score (filtered) unique trechov: 1513
fuzz_score (filtered) unique name:    1509
intersection_gdf unique trechov: 173
intersection_gdf unique name:    155
intersection_gdf_2 unique trechov: 174
intersection_gdf_2 unique name:    160

--- Combined Stats ---
Number of TOTAL unique names (across all filtered sets): 1762
Difference (total - fuzz - intersect - intersect2): -64
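
Note that 1762 - 1509 - 155 - 160 = -62, while the cell prints -64. A possible explanation, assuming `name_fixed` contains missing values in both filtered intersection frames, is that `nunique()` skips missing values whereas `unique()` keeps them, so each `set(...)` built from `unique()` would hold one element more than the printed count. The toy series below is hypothetical and only demonstrates this pandas behaviour.

```python
import pandas as pd

# Hypothetical name column with one missing value
names = pd.Series(["הירקון", None, "דנין", "הירקון"])

n_unique = names.nunique()                          # missing values are skipped by default
n_in_set = len(set(names.unique()))                 # the missing value is kept as an element
n_in_set_clean = len(set(names.dropna().unique()))  # dropping it first makes the counts agree

print(n_unique, n_in_set, n_in_set_clean)
```

If that is the cause, calling `dropna()` before building the sets would make the set sizes consistent with the printed `nunique()` values.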


### Changing names in os_ta_streets_edges

In [82]:
# 1) Filter fuzz_score to rows with best_score > 87
filtered_fuzz_score = fuzz_score[fuzz_score['best_score'] > 87].copy()

# 2) Create a dictionary: name -> trechov
name_to_trechov = dict(zip(filtered_fuzz_score['name'], filtered_fuzz_score['trechov']))

# 3) In os_ta_streets_edges, create a new column ta_name,
#    mapping name_fixed to trechov. Fill unmatched names with empty string:
os_ta_streets_edges['ta_name'] = os_ta_streets_edges['name_fixed'].map(name_to_trechov).fillna('')

# adding names that are class list
name_type_list_idx = os_ta_streets_edges[os_ta_streets_edges.name_type == "<class 'list'>"].index
os_ta_streets_edges.loc[name_type_list_idx, 'ta_name'] = os_ta_streets_edges.loc[name_type_list_idx, 'name_fixed']

name_fixed_not_na_idx = os_ta_streets_edges[(os_ta_streets_edges.ta_name == '') & ~(os_ta_streets_edges.name_fixed.isna())].index
os_ta_streets_edges.loc[name_fixed_not_na_idx, 'ta_name'] = os_ta_streets_edges.loc[name_fixed_not_na_idx, 'name_fixed']


# Done! Now os_ta_streets_edges['ta_name'] holds the trechov value from fuzz_score
# wherever name_fixed == fuzz_score['name'], the original name_fixed for list-type
# or unmatched names, and an empty string where name_fixed is missing.
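
The mapping logic of the cell above can be condensed into a chain of `map` and `fillna` calls. This is a minimal sketch on hypothetical data; it ignores the special case for list-type names, which the cell always copies from `name_fixed`.

```python
import pandas as pd

# Hypothetical OSM names and a hypothetical name -> trechov dictionary
name_fixed = pd.Series(["שדרות דוד המלך", "שביל עכו", None])
name_to_trechov = {"שדרות דוד המלך": "דוד המלך"}

ta_name = (
    name_fixed.map(name_to_trechov)  # matched names become the TA spelling
    .fillna(name_fixed)              # unmatched names keep the OSM spelling
    .fillna("")                      # edges without any name get an empty string
)

print(ta_name.tolist())
```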


In [83]:
os_ta_streets_edges[(os_ta_streets_edges.ta_name == '') & (os_ta_streets_edges.name_fixed.isna())]

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name
31,31,1226885,2989957353,366078522,,False,63.366,"LINESTRING (669162.286 3549950.188, 669167.226...",,,,<class 'float'>,<class 'int'>,,
37,37,1227320,2473987496,1119409444,,False,16.801,"LINESTRING (668173.498 3549982.380, 668176.432...",,,,<class 'float'>,<class 'int'>,,
46,46,1228104,442898687,37757978,,False,42.959,"LINESTRING (669854.792 3549710.831, 669859.319...",,,,<class 'float'>,<class 'int'>,,
51,51,1228344,2481705198,240437112,,False,39.718,"LINESTRING (669890.550 3547051.209, 669883.330...",,,,<class 'float'>,<class 'int'>,,
64,64,1329714,17703741,8057470,,False,28.269,"LINESTRING (669910.804 3549703.681, 669908.304...",,,,<class 'float'>,<class 'int'>,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9512,9512,12115376830,12115376829,1308290584,,False,4.578,"LINESTRING (666980.298 3548968.052, 666977.405...",,,,<class 'float'>,<class 'int'>,,
9515,9515,12120455200,10926320334,1308836995,,False,16.541,"LINESTRING (669667.174 3557577.105, 669662.428...",,,,<class 'float'>,<class 'int'>,,
9516,9516,12120460101,12120460109,"[1308836988, 1308836989]",,False,121.938,"LINESTRING (669681.585 3557584.854, 669690.084...",yes,,,<class 'float'>,<class 'list'>,,
9518,9518,12120460109,12120455200,"[1308836987, 1308836990]",,False,146.222,"LINESTRING (669771.229 3557616.095, 669755.429...",yes,,,<class 'float'>,<class 'list'>,,


In [84]:
os_ta_streets_edges[os_ta_streets_edges.ta_name == ''].shape

(1125, 15)

### Changing street names in os_ta_streets_edges using intersection_gdf

In [85]:
# 1) Filter intersection_gdf
filtered_intersection = intersection_gdf[
    (intersection_gdf['sum_share'] < 1.9) & 
    (intersection_gdf['sum_share'] > 0.8)
]

# 2) Iterate over the filtered rows
for idx, row in filtered_intersection.iterrows():
    os_ta_idx = row['os_ta_index']
    
    # 3) Only update rows whose os_ta_index exists as a label in os_ta_streets_edges.index
    if os_ta_idx in os_ta_streets_edges.index:
        # 4) Check if the current 'name' is a list
        current_name = os_ta_streets_edges.at[os_ta_idx, 'name']
        
        if not isinstance(current_name, list):
            # Not a list → update it with trechov
            os_ta_streets_edges.at[os_ta_idx, 'ta_name'] = row['trechov']
        # else: if it is a list, do nothing


In [86]:
os_ta_streets_edges[os_ta_streets_edges.ta_name == ''].shape

(1097, 15)

### Changing street names in os_ta_streets_edges using intersection_gdf_2

In [87]:
# 1) Filter intersection_gdf_2
filtered_intersection = intersection_gdf_2[
    (intersection_gdf_2['sum_share'] < 1.9) & 
    (intersection_gdf_2['sum_share'] > 1.3)
    ]

# 2) Iterate over the filtered rows
for idx, row in filtered_intersection.iterrows():
    os_ta_idx = row['os_ta_index']
    
    # 3) Only update rows whose os_ta_index exists as a label in os_ta_streets_edges.index
    if os_ta_idx in os_ta_streets_edges.index:
        # 4) Check if the current 'name' is a list
        current_name = os_ta_streets_edges.at[os_ta_idx, 'name']
        
        if not isinstance(current_name, list):
            # Not a list → update it with trechov
            os_ta_streets_edges.at[os_ta_idx, 'ta_name'] = row['trechov']
        # else: if it is a list, do nothing
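
The row-by-row loop above could also be written as a single vectorized assignment. The sketch below uses small hypothetical tables (`edges`, `matches`) in place of `os_ta_streets_edges` and the filtered intersection frame, and assumes, as the loop does, that the edges table is indexed by `os_ta_index`.

```python
import pandas as pd

# Hypothetical edges table indexed by os_ta_index, and hypothetical filtered matches
edges = pd.DataFrame(
    {"name": ["דנין", ["סלומון", "שדרות הר ציון"], None], "ta_name": ["", "", ""]},
    index=[1492, 157, 8183],
)
matches = pd.DataFrame({
    "os_ta_index": [1492, 157, 8183, 9999],
    "trechov": ["דנין יחזקאל", "עכו", "הרב באזוב דוד", "לא קיים"],
})

# Keep only matches whose edge exists and whose 'name' is not a list
is_list = edges["name"].apply(lambda x: isinstance(x, list))
valid = matches[matches["os_ta_index"].isin(edges.index[~is_list])]

# If an edge matches several streets, keep the last one, as the loop would
valid = valid.drop_duplicates("os_ta_index", keep="last")

edges.loc[valid["os_ta_index"], "ta_name"] = valid["trechov"].to_numpy()
print(edges["ta_name"].tolist())
```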


In [88]:
os_ta_streets_edges[os_ta_streets_edges.ta_name == ''].shape

(1075, 15)

In [89]:
os_ta_streets_edges[os_ta_streets_edges.ta_name == ''].name_fixed.nunique()

0

In [90]:
os_ta_streets_edges[(os_ta_streets_edges.name_fixed.isna())].shape

(1125, 15)

In [91]:
### Changing street names in os_ta_streets_edges using the list of tuples (os_ta_to_change) created above for specific os_ta_index values

In [92]:
import pandas as pd

# Assumes the DataFrames 'ta_streets' and 'os_ta_streets_edges' are loaded
# and the list of tuples 'os_ta_to_change' was created above.

for os_val, ta_val, action in os_ta_to_change:
    if action == "OS":
        # os_val is the index
        if os_val in os_ta_streets_edges.index:
            # 1. Get the value from the 'name' column
            name_val = os_ta_streets_edges.loc[os_val, 'name']

            # ta_val is the index
            if ta_val in ta_streets.index:
                # 2. Update the 'trechov' column in ta_streets
                ta_streets.loc[ta_val, 'trechov'] = name_val
            else:
                print(f"TA index {ta_val} not found in ta_streets. Skipping this tuple.")
        else:
            print(f"OS index {os_val} not found in os_ta_streets_edges. Skipping this tuple.")

    elif action == "TA":
        # ta_val is the index
        if ta_val in ta_streets.index:
            # 1. Get the value from the 'trechov' column
            trechov_val = ta_streets.loc[ta_val, 'trechov']

            # os_val is the index
            if os_val in os_ta_streets_edges.index:
              # 2. Update the 'ta_name' column in os_ta_streets_edges
              os_ta_streets_edges.loc[os_val, 'ta_name'] = trechov_val
            else:
              print(f"OS index {os_val} not found in os_ta_streets_edges. Skipping this tuple.")
        else:
            print(f"TA index {ta_val} not found in ta_streets. Skipping this tuple.")

    elif action == "DROP":
        # os_val is the index
        if os_val in os_ta_streets_edges.index:
            # 1. Drop the row in os_ta_streets_edges
            os_ta_streets_edges.drop(os_val, inplace=True)
        else:
            print(f"OS index {os_val} not found in os_ta_streets_edges for dropping.")

        # ta_val is the index
        if ta_val in ta_streets.index:
            # 2. Drop the row in ta_streets
            ta_streets.drop(ta_val, inplace=True)
        else:
            print(f"TA index {ta_val} not found in ta_streets for dropping.")
    else:
        print("Invalid action")

TA index 9746 not found in ta_streets. Skipping this tuple.
TA index 9478 not found in ta_streets. Skipping this tuple.
TA index 9187 not found in ta_streets. Skipping this tuple.
TA index 9041 not found in ta_streets for dropping.
TA index 9461 not found in ta_streets. Skipping this tuple.
TA index 9282 not found in ta_streets. Skipping this tuple.
TA index 9785 not found in ta_streets. Skipping this tuple.
TA index 9748 not found in ta_streets. Skipping this tuple.
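
The "not found" messages above are consistent with `ta_val` holding `oidrechov` identifiers rather than row labels: in the `ta_streets_area` output earlier, row label 8874 carries oidrechov 9851.0, so the two numberings diverge. If that is the case, labels that do exist may also point at a different street than intended. A minimal sketch of looking rows up by the `oidrechov` column instead, on a hypothetical three-row table (the helper `set_trechov` is illustrative and not part of the notebook):

```python
import pandas as pd

# Hypothetical slice of ta_streets: the row labels differ from oidrechov
ta = pd.DataFrame(
    {"oidrechov": [9851.0, 9852.0, 9853.0], "trechov": ["א", "ב", "ג"]},
    index=[8874, 8875, 8876],
)

def set_trechov(df, oid, new_name):
    """Rename the street whose oidrechov equals oid; return True if it was found."""
    mask = df["oidrechov"] == oid
    if not mask.any():
        return False
    df.loc[mask, "trechov"] = new_name
    return True

found = set_trechov(ta, 9852, "שם חדש")             # matches by identifier
missing = set_trechov(ta, 8875, "לא אמור להשתנות")  # 8875 is a row label, not an oidrechov

print(found, missing, ta["trechov"].tolist())
```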


## Handling Roundabout

There are currently 505 rows categorized as roundabouts.

### Locating missing roundabouts

Where am I looking:

1. [X] edges that are **NOT** considered roundabout <br>
    
    * [X] name **NOT** na **AND** name type **NOT** ```<class 'list'>```<br>
        
    * [X] name **IS** na **AND** osmid type **IS** ```<class 'list'>```<br>
        
    * [X] name **IS** na **AND** osmid type **NOT** ```<class 'list'>```<br>
        

#### Helper function to check whether a group of LineStrings forms a closed loop

If it does, the group is more likely to be a roundabout.<br>
I will plot the results to make sure.

In [93]:
def is_group_closed(group):
    """
    Check if the LineStrings in a group form a closed loop.
    :param group: A GeoSeries of LineStrings
    :return: True if connected and closed, False otherwise
    """
    # Buffer each LineString slightly so that segments touching end to end overlap,
    # then merge all the buffers into a single geometry
    merged = unary_union(group.buffer(0.5).tolist())
    
    # A closed loop leaves a hole in the merged polygon, which shows up as an interior ring.
    # A MultiPolygon has no `interiors` attribute, so a disjoint group returns False.
    return hasattr(merged, 'interiors') and (len(merged.interiors) > 0)
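
One caveat with the `interiors` check: when the buffered pieces of a group do not all touch, `unary_union` returns a `MultiPolygon`, which has no `interiors` attribute, so the group is reported as not closed even if one of its parts is a ring. Below is a minimal sketch of a variant that inspects each part separately. The name `has_closed_loop` and the `buffer_dist` parameter are illustrative and not part of the notebook, and swapping this in would likely change the counts reported in the following cells.

```python
from shapely.geometry import LineString
from shapely.ops import unary_union


def has_closed_loop(lines, buffer_dist=0.5):
    """Return True if any connected part of the buffered lines encloses a hole."""
    merged = unary_union([line.buffer(buffer_dist) for line in lines])
    # Multi-part geometries expose their members through `.geoms`;
    # a single Polygon does not, so wrap it in a list.
    parts = getattr(merged, "geoms", [merged])
    return any(
        len(part.interiors) > 0
        for part in parts
        if part.geom_type == "Polygon"
    )


# Two segments that together form a square ring
ring = [
    LineString([(0, 0), (10, 0), (10, 10)]),
    LineString([(10, 10), (0, 10), (0, 0)]),
]
# A straight, open segment far away from the ring
stray = LineString([(100, 100), (120, 100)])

ring_only = has_closed_loop(ring)
open_only = has_closed_loop([stray])
ring_plus_stray = has_closed_loop(ring + [stray])
```

With the original helper, the `ring + [stray]` case would return `False`, because the union of the two disjoint buffers is a `MultiPolygon`.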

#### name **NOT** na **AND** name type **IS** ```<class 'list'>``` <br>

In [94]:
name_str_nametype_list_edges = os_ta_streets_edges[(os_ta_streets_edges.junction != 'roundabout') &
                                                  (~os_ta_streets_edges.name.isna()) &
                                                  (os_ta_streets_edges.name_type == "<class 'list'>") 
                                                  ]
name_str_nametype_list_edges.shape

(176, 15)

In [95]:
name_str_nametype_list_edges_cp = name_str_nametype_list_edges.copy()
name_str_nametype_list_edges_cp['name'] = name_str_nametype_list_edges_cp['name'].apply(lambda x: str(x) if isinstance(x, list) else x)

res = name_str_nametype_list_edges_cp.groupby('name').apply(is_group_closed)
res.sum()

1

In [96]:
# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(name_str_nametype_list_edges_cp)
# m.add_gdf(name_str_nametype_list_edges_cp[name_str_nametype_list_edges_cp.name.isin(res[res].index)], zoom_to_layer=True, fill_colors='black')

# m

Found no roundabouts in **name_str_nametype_list_edges**.

#### name **NOT** na **AND** name type **NOT** ```<class 'list'>```<br>

This subset contains most of the edges.

In [97]:
name_str_nametype_rest_edges = os_ta_streets_edges[(os_ta_streets_edges.junction != 'roundabout') &
                                                  (~os_ta_streets_edges.name.isna()) &
                                                  ~(os_ta_streets_edges.name_type == "<class 'list'>") 
                                                  ]
name_str_nametype_rest_edges.shape

(8180, 15)

In [98]:
res = name_str_nametype_rest_edges.groupby('name').apply(is_group_closed)
res.sum()

58

In [99]:
# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(name_str_nametype_rest_edges)
# m.add_gdf(name_str_nametype_rest_edges[name_str_nametype_rest_edges.name.isin(res[res].index)], zoom_to_layer=True, fill_colors='black')

# m

We see 3 roundabouts that have been missed:
* רביבים
* כיכר איסר הראל
* כיכר המלך אלברט

#### name **IS** na **AND** osmid type **IS** ```<class 'list'>```<br>

In [100]:
name_na_osmidtype_list_edges = os_ta_streets_edges[(os_ta_streets_edges.junction != 'roundabout') &
                                                  (os_ta_streets_edges.name.isna()) &
                                                  (os_ta_streets_edges.osmid_type == "<class 'list'>") 
                                                  ]
name_na_osmidtype_list_edges.shape

(56, 15)

In [101]:
# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(name_na_osmidtype_list_edges, fill_colors='black')
# m

No roundabouts.<br>

There are some other issues:

* Some of the osmid lists are actually connectors that have been split, such as [408943846, 28941694].
* [660595707, 454564565] is a connector; it is not clear why it has two osmids.

More edges have issues; they will be reviewed later.

#### name **IS** na **AND** osmid type **NOT** ```<class 'list'>```<br>

In [102]:
name_na_osmidtype_rest_edges = os_ta_streets_edges[(os_ta_streets_edges.junction != 'roundabout') &
                                                  (os_ta_streets_edges.name.isna()) &
                                                  ~(os_ta_streets_edges.osmid_type == "<class 'list'>") 
                                                  ]
name_na_osmidtype_rest_edges.shape

(610, 15)

In [103]:
res = name_na_osmidtype_rest_edges.groupby('osmid').apply(is_group_closed)
res.sum()

6

In [104]:
# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(name_na_osmidtype_rest_edges)
# m.add_gdf(name_na_osmidtype_rest_edges[name_na_osmidtype_rest_edges.osmid.isin(res[res].index)], fill_colors='black')

# m

Found no roundabouts.

#### Conclusion of locating missing roundabouts

In total, 3 roundabouts had been missed:

* רביבים
* כיכר איסר הראל
* כיכר המלך אלברט



### Changing missed roundabouts

In [105]:
os_ta_streets_edges[os_ta_streets_edges.name == 'כיכר המלך אלברט']

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name
1934,1934,384636414,384637410,473287488,כיכר המלך אלברט,False,8.906,"LINESTRING (667529.784 3549128.195, 667526.615...",,,,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט
1936,1936,384636415,384637413,33626593,כיכר המלך אלברט,True,9.028,"LINESTRING (667548.495 3549121.672, 667544.577...",,,,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט
1937,1937,384636415,384636414,"[473287488, 473287489]",כיכר המלך אלברט,False,19.787,"LINESTRING (667548.495 3549121.672, 667530.810...",,,,<class 'str'>,<class 'list'>,כיכר המלך אלברט,כיכר המלך אלברט
1938,1938,384637410,384637413,473287488,כיכר המלך אלברט,False,19.019,"LINESTRING (667526.615 3549119.892, 667544.577...",,,,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט


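There is an extra segment in row 2130 that just repeats row 2125.

So drop row 2130, change the osmid of row 2125 to 473287488,
and change the osmid of row 2126 to 473287488.
(These row labels do not appear in the output above, which is indexed 1934, 1936, 1937 and 1938.)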

In [106]:
missed_ra_idx = os_ta_streets_edges[os_ta_streets_edges.name.isin(['כיכר המלך אלברט','כיכר איסר הראל','רביבים'])].index

os_ta_streets_edges.loc[missed_ra_idx, 'junction'] = 'roundabout'

In [107]:
os_ta_streets_edges.loc[2126,'osmid'] = 473287489
os_ta_streets_edges.loc[2125,'osmid'] = 473287488
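
The assignments above rely on hard-coded row labels. With `.loc`, assigning to a label that is not in the index does not fail; it appends a new row instead. A minimal sketch of a guard against that follows. `set_value_checked` is a hypothetical helper and `toy_edges` is made-up data standing in for `os_ta_streets_edges`.

```python
import pandas as pd


def set_value_checked(frame, label, column, value):
    """Assign frame.loc[label, column] only if `label` already exists in the index."""
    if label not in frame.index:
        raise KeyError(f"row label {label!r} not found; refusing to append a new row")
    frame.loc[label, column] = value


# Toy frame standing in for os_ta_streets_edges (values are made up)
toy_edges = pd.DataFrame({"osmid": [111, 222]}, index=[10, 12])

# Plain .loc assignment with an unknown label silently enlarges the frame
enlarged = toy_edges.copy()
enlarged.loc[99, "osmid"] = 333

# The checked helper raises instead of adding a row
try:
    set_value_checked(toy_edges, 99, "osmid", 333)
    raised = False
except KeyError:
    raised = True

# Existing labels are updated as usual
set_value_checked(toy_edges, 10, "osmid", 444)
```

Selecting the rows by a condition (for example by `u` and `v`) rather than by a literal label is another way to keep such a cell valid after the index changes.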


In [108]:
os_ta_streets_edges[os_ta_streets_edges.name == 'כיכר המלך אלברט']


Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name
1934,1934,384636414,384637410,473287488,כיכר המלך אלברט,False,8.906,"LINESTRING (667529.784 3549128.195, 667526.615...",,,roundabout,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט
1936,1936,384636415,384637413,33626593,כיכר המלך אלברט,True,9.028,"LINESTRING (667548.495 3549121.672, 667544.577...",,,roundabout,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט
1937,1937,384636415,384636414,"[473287488, 473287489]",כיכר המלך אלברט,False,19.787,"LINESTRING (667548.495 3549121.672, 667530.810...",,,roundabout,<class 'str'>,<class 'list'>,כיכר המלך אלברט,כיכר המלך אלברט
1938,1938,384637410,384637413,473287488,כיכר המלך אלברט,False,19.019,"LINESTRING (667526.615 3549119.892, 667544.577...",,,roundabout,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט


In [109]:
os_ta_streets_edges[os_ta_streets_edges.name.isin(['כיכר המלך אלברט','כיכר איסר הראל','רביבים'])].shape

(15, 15)

In [110]:
os_ta_streets_edges[os_ta_streets_edges.junction == 'roundabout'].shape

(520, 15)

### How many of the roundabouts have no name?

In [111]:
os_ta_streets_edges[(os_ta_streets_edges.junction == 'roundabout') & (os_ta_streets_edges.name.isna())].shape

(458, 15)

### Handling roundabout naming

For each roundabout that has no name, I need to:<br>
Find all the streets that intersect the roundabout and get their names.<br>
Take 2 of those names.<br>
Write `כיכר סטריט1 וסטריט2` ("Square street1 and street2") to the `name` column.


In [112]:
# filter for junction=='roundabout' and turn to a separate df
os_ta_roundabouts = os_ta_streets_edges[os_ta_streets_edges.junction == 'roundabout'].copy()
os_ta_roundabouts

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name
177,177,35288627,2203627170,5118376,ה' באייר,False,27.760,"LINESTRING (668874.869 3551329.236, 668885.399...",,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,הא באייר
256,256,271878297,381571872,33492500,,False,28.725,"LINESTRING (672928.438 3554590.534, 672933.715...",,,roundabout,<class 'float'>,<class 'int'>,,
258,258,271878300,1170469323,33492500,,False,15.693,"LINESTRING (672940.500 3554645.491, 672933.864...",,,roundabout,<class 'float'>,<class 'int'>,,
459,459,289069630,2111357789,98095055,,False,5.463,"LINESTRING (669331.383 3553475.481, 669330.294...",,,roundabout,<class 'float'>,<class 'int'>,,
485,485,289499143,9709113999,1056620897,,False,9.208,"LINESTRING (667493.029 3552990.310, 667494.436...",,,roundabout,<class 'float'>,<class 'int'>,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9488,9488,11805962014,5604911212,355307991,,False,21.827,"LINESTRING (670047.940 3553648.775, 670050.132...",,,roundabout,<class 'float'>,<class 'int'>,,
9490,9490,11805962018,3609242009,355307991,,False,23.292,"LINESTRING (670024.513 3553661.126, 670024.322...",,,roundabout,<class 'float'>,<class 'int'>,,
9496,9496,11808919720,1801631024,168983276,כיכר חנה אבנור,False,10.027,"LINESTRING (669356.058 3552609.839, 669357.674...",,,roundabout,<class 'str'>,<class 'int'>,כיכר חנה אבנור,כיכר חנה אבנור
9501,9501,11846590955,2993052349,1275972347,,False,9.343,"LINESTRING (670392.267 3553411.071, 670394.928...",,,roundabout,<class 'float'>,<class 'int'>,,


### Turn the roundabout linestrings into polygons

In [113]:
# Step 1: Buffer and create roundabout polygons
roundabouts = os_ta_streets_edges[os_ta_streets_edges.junction == 'roundabout'].copy()
roundabouts['geometry'] = roundabouts.buffer(4)
roundabouts_geometry = unary_union(roundabouts.geometry)

# Create polygons GeoDataFrame
roundabouts_poly = gpd.GeoDataFrame(geometry=[roundabouts_geometry], crs=roundabouts.crs)
roundabouts_poly = roundabouts_poly.explode(index_parts=True).reset_index(drop=True)

# Step 2: Assign unique IDs to roundabout polygons
roundabouts_poly['poly_id'] = roundabouts_poly.index

# Step 3: Spatial join to associate edges with polygons
edges_with_polygons = gpd.sjoin(roundabouts, roundabouts_poly, how='inner', predicate='intersects')

# Step 4: Analyze or group information
# Example: For each polygon, find the first edge's data
roundabouts_with_info = (
    edges_with_polygons.groupby('poly_id')
    .first()  # Select the first edge information for each polygon
    .reset_index()
)

# View the resulting dataframe
roundabouts_with_info['geometry'] = roundabouts_poly.geometry
roundabouts_with_info


Unnamed: 0,poly_id,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name,index_right
0,0,662,318177901,1716729694,159555091,,False,1.317,"POLYGON ((664989.528 3545205.441, 664989.586 3...",,,roundabout,<class 'float'>,<class 'int'>,,,0
1,1,8965,6952780428,6952780430,742719147,,False,20.654,"POLYGON ((665607.578 3545525.736, 665607.613 3...",,,roundabout,<class 'float'>,<class 'int'>,,,1
2,2,5680,1283949630,1283949681,113140773,,False,10.751,"POLYGON ((664910.207 3545552.105, 664910.187 3...",,,roundabout,<class 'float'>,<class 'int'>,,,2
3,3,9089,8589321882,8589321892,925514478,,False,16.055,"POLYGON ((665190.400 3545579.390, 665190.429 3...",,,roundabout,<class 'float'>,<class 'int'>,,,3
4,4,5949,1458945605,1458945625,132638933,,False,37.276,"POLYGON ((666388.951 3545956.731, 666388.931 3...",,,roundabout,<class 'float'>,<class 'int'>,,,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111,111,1609,354047338,354058549,31651913,,False,18.347,"POLYGON ((669097.330 3555949.584, 669097.310 3...",,,roundabout,<class 'float'>,<class 'int'>,,,111
112,112,2635,414720184,414720184,35373026,,False,43.276,"POLYGON ((670003.503 3556007.020, 670003.501 3...",,,roundabout,<class 'float'>,<class 'int'>,,,112
113,113,1601,354045769,803481930,31651960,,False,8.331,"POLYGON ((669114.184 3556121.814, 669114.075 3...",,,roundabout,<class 'float'>,<class 'int'>,,,113
114,114,9351,10907653669,11267492245,1173903897,,False,14.612,"POLYGON ((669622.880 3557156.219, 669622.863 3...",,,roundabout,<class 'float'>,<class 'int'>,,,114
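
A note on Step 4: in pandas, `GroupBy.first()` returns the first non-null value of each column, not the first row of each group. When a polygon covers several edges, the resulting row may therefore combine values from different edges. The toy sketch below (made-up data, not the notebook's) contrasts this with keeping the first complete row per polygon.

```python
import pandas as pd

# Toy edges joined to polygons (values are made up); polygon 0 has two edges
edges_with_polygons = pd.DataFrame({
    "poly_id": [0, 0, 1],
    "osmid":   [10, 20, 30],
    "name":    [None, "B", None],
})

# first() picks the first non-null value of each column independently,
# so polygon 0 gets osmid 10 from its first edge and name "B" from its second
mixed = edges_with_polygons.groupby("poly_id").first().reset_index()

# drop_duplicates keeps the first complete row of each polygon
first_rows = edges_with_polygons.drop_duplicates("poly_id").reset_index(drop=True)
```

Whether mixing is acceptable depends on the goal: it fills in a name when any edge of the roundabout has one, but the `osmid` and the `name` in a row may then come from different edges.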


Found a problem in some roundabouts: some edges are not under the right osmid, or their junction is not categorized as a roundabout.

The difference is that some roundabouts had more than one osmid, and when performing the polygon creation we lost some of them, but the shape is the same.

#### Take the rest of the edges and split them into a separate df

In [114]:
os_ta_streets_no_ra_edges = os_ta_streets_edges[os_ta_streets_edges.junction != 'roundabout'].copy()
os_ta_streets_no_ra_edges.shape

(9007, 15)

In [115]:
# Returns a flat, de-duplicated list that is used to determine the overlapping edges of a roundabout
def iter_group_list(group):
    ls = []
    for item in group:
        if isinstance(item,list):
            for i in item:
                if not pd.isna(i):
                    ls.append(i)
        else:
            if not pd.isna(item):
                ls.append(item)
    return list(set(ls))
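
`iter_group_list` de-duplicates through `set`, so the order of the returned values is not guaranteed, and for strings it can differ between Python sessions because of hash randomization. The same applies to `turn_to_str`, which also passes its input through `set`. Since the roundabout name is later built from the first two names, an order-preserving variant may make the result reproducible. Here is a sketch using only the standard library; `flatten_unique` is a hypothetical name.

```python
import math


def flatten_unique(group):
    """Flatten scalars and lists, drop None/NaN, keep first-seen order."""
    seen = {}  # dicts preserve insertion order, so they work as an ordered set
    for item in group:
        values = item if isinstance(item, list) else [item]
        for value in values:
            if value is None:
                continue
            if isinstance(value, float) and math.isnan(value):
                continue
            seen.setdefault(value, None)
    return list(seen)


example = flatten_unique([[2, 1], 3, float("nan"), 1, None, "a"])
```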


In [116]:
# Add the new columns with empty strings as default values
os_ta_streets_edges['overlapping_names'] = ''
os_ta_streets_edges['overlapping_osmids'] = ''

In [117]:
# The final overlapping information is stored as a string, since it's easier to work with
def turn_to_str(ls):
    ls = list(set(ls))
    str_ls = []
    for item in ls:
        str_item = str(item)
        str_ls.append(str_item)
    return ' ,'.join(str_ls)

In [118]:
# can't sjoin if I already have index_right
roundabouts_with_info.drop(columns=['index_right'], inplace=True)

In [119]:
roundabouts_with_info[roundabouts_with_info.ta_name != '']

Unnamed: 0,poly_id,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name
10,10,3878,549256221,924516872,43518958,,False,56.526,"POLYGON ((667679.895 3546837.373, 667679.991 3...",,,roundabout,<class 'float'>,<class 'int'>,,3933
21,21,1769,365766891,2584166686,692585794,כיכר יוסי כרמל,False,5.14,"POLYGON ((665831.668 3547925.919, 665831.555 3...",,,roundabout,<class 'str'>,<class 'int'>,כיכר יוסי כרמל,כיכר יוסי כרמל
22,22,3199,446138533,563927922,44372507,,False,19.699,"POLYGON ((669420.882 3548096.166, 669420.792 3...",,,roundabout,<class 'float'>,<class 'int'>,,גרונדמן יעקב (יענקל'ה)
27,27,1934,384636414,384637410,473287488,כיכר המלך אלברט,False,8.906,"POLYGON ((667548.097 3549111.667, 667548.017 3...",,,roundabout,<class 'str'>,<class 'int'>,כיכר המלך אלברט,כיכר המלך אלברט
34,34,177,35288627,2203627170,5118376,ה' באייר,False,27.76,"POLYGON ((669024.607 3551547.628, 669024.716 3...",,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,הא באייר
40,40,6871,1801631001,11808919720,168983276,כיכר חנה אבנור,False,2.304,"POLYGON ((669365.011 3552597.458, 669364.989 3...",,,roundabout,<class 'str'>,<class 'int'>,כיכר חנה אבנור,כיכר חנה אבנור
58,58,5177,1135283061,1188072657,753366273,,False,15.813,"POLYGON ((669674.572 3554035.029, 669674.441 3...",,,roundabout,<class 'float'>,<class 'int'>,,לוין אריה הרב
59,59,4913,985869973,9256522643,84925664,השופט חיים כהן,False,17.131,"POLYGON ((668728.213 3554087.217, 668728.285 3...",,,roundabout,<class 'str'>,<class 'int'>,השופט חיים כהן,כהן חיים השופט
74,74,8498,4135912119,4135913599,912644179,כיכר איסר הראל,False,7.043,"POLYGON ((673674.751 3554594.036, 673674.740 3...",,,roundabout,<class 'str'>,<class 'int'>,כיכר איסר הראל,כיכר איסר הראל
78,78,2867,417632981,849307633,35665854,כיכר אהוד אבריאל,False,21.543,"POLYGON ((673763.317 3554740.856, 673763.511 3...",,,roundabout,<class 'str'>,<class 'int'>,כיכר אהוד אבריאל,כיכר אהוד אבריאל


### Adding the overlapping information to os_ta_streets_edges

In [120]:
overlap = gpd.sjoin(os_ta_streets_edges, roundabouts_with_info, how='inner', predicate='intersects')

# Group by `osmid_right` and aggregate `osmid_left` and `ta_name_left` into flat, de-duplicated lists
grouped_overlap = overlap.groupby('osmid_right').agg({
    'osmid_left': iter_group_list,  # Flatten nested lists or handle single values
    'ta_name_left': iter_group_list   # Same logic for ta_name_left
}).reset_index()

for idx,row  in grouped_overlap.iterrows():
    osmid          = row['osmid_right']
    overlap_osmids = row['osmid_left']
    overlap_names  = row['ta_name_left']

    osmid_mask = os_ta_streets_edges[os_ta_streets_edges.osmid == osmid].index
    for idx_os in osmid_mask:
        os_ta_streets_edges.at[idx_os, 'overlapping_names']  = turn_to_str(overlap_names)
        os_ta_streets_edges.at[idx_os, 'overlapping_osmids'] = turn_to_str(os_ta_streets_edges[(os_ta_streets_edges.osmid.isin(overlap_osmids)) & (os_ta_streets_edges.junction == 'roundabout')].osmid.to_list())


os_ta_streets_edges.head()


Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name,overlapping_names,overlapping_osmids
0,0,139693,5723720351,5118378,ויצמן,False,4.37,"LINESTRING (668968.683 3552240.237, 668968.629...",,,,<class 'str'>,<class 'int'>,ויצמן,ויצמן,,
1,1,139693,139698,167691710,יהודה המכבי,False,62.173,"LINESTRING (668968.683 3552240.237, 668972.601...",,,,<class 'str'>,<class 'int'>,יהודה המכבי,יהודה המכבי,,
2,2,139698,139723,167691710,יהודה המכבי,False,110.029,"LINESTRING (669030.815 3552244.552, 669082.911...",,,,<class 'str'>,<class 'int'>,יהודה המכבי,יהודה המכבי,,
3,3,139707,139708,26516058,ירמיהו,False,37.249,"LINESTRING (667826.578 3552389.242, 667810.983...",,,,<class 'str'>,<class 'int'>,ירמיהו,ירמיהו הנביא,,
4,4,139707,10985355495,1183058410,אוסישקין,False,190.227,"LINESTRING (667826.578 3552389.242, 667831.045...",,,,<class 'str'>,<class 'int'>,אוסישקין,אוסישקין,,


In [121]:
# Count the comma-separated entries in overlapping_names
def get_len(value):
    if value == '' or pd.isna(value):
        return 0
    else:
        return value.count(',') + 1
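
`get_len` counts commas, so a value whose first entry is empty (for example one that starts with a comma) is counted as having one more name than it really holds. The sketch below counts only non-empty entries. `count_names` is a hypothetical alternative, and using it would change which roundabouts pass the `overlapping_names_len` filters further down.

```python
def count_names(value, sep=","):
    """Count the non-empty, comma-separated entries of a string."""
    if not isinstance(value, str):
        return 0  # covers None and NaN
    return sum(1 for part in value.split(sep) if part.strip())


single_after_empty = count_names(" ,רמבה אייזיק")
three_names = count_names("a ,b ,c")
```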

In [122]:
os_ta_streets_edges['overlapping_names_len'] = os_ta_streets_edges.overlapping_names.apply(get_len)

In [123]:
os_ta_streets_edges[os_ta_streets_edges.overlapping_names_len > 2]

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name,overlapping_names,overlapping_osmids,overlapping_names_len
177,177,35288627,2203627170,5118376,ה' באייר,False,27.760,"LINESTRING (668874.869 3551329.236, 668885.399...",,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,הא באייר,"תש״ח ,ז'בוטינסקי ,חברה חדשה ,ויצמן ,עקיבא אריה...",5118376,6
256,256,271878297,381571872,33492500,,False,28.725,"LINESTRING (672928.438 3554590.534, 672933.715...",,,roundabout,<class 'float'>,<class 'int'>,,,",פתחיה מרגנשבורג ,משמר הירדן ,מרכוס דוד","1134738136 ,33492500",4
258,258,271878300,1170469323,33492500,,False,15.693,"LINESTRING (672940.500 3554645.491, 672933.864...",,,roundabout,<class 'float'>,<class 'int'>,,,",פתחיה מרגנשבורג ,משמר הירדן ,מרכוס דוד","1134738136 ,33492500",4
459,459,289069630,2111357789,98095055,,False,5.463,"LINESTRING (669331.383 3553475.481, 669330.294...",,,roundabout,<class 'float'>,<class 'int'>,,,",ברודצקי ,רדינג ,גריפל יגאל",98095055,4
485,485,289499143,9709113999,1056620897,,False,9.208,"LINESTRING (667493.029 3552990.310, 667494.436...",,,roundabout,<class 'float'>,<class 'int'>,,,",יקותיאלי יוסף ,ליפקין שחק- שטח הנמל ,1235",1056620897,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9475,9475,11498283485,11498267668,1237894128,,False,6.499,"LINESTRING (669622.601 3554231.222, 669620.806...",,,roundabout,<class 'float'>,<class 'int'>,,,",ברודצקי ,ברזיל",1237894128,3
9477,9477,11498283514,11498283483,1237894128,,False,12.749,"LINESTRING (669629.084 3554213.764, 669632.273...",,,roundabout,<class 'float'>,<class 'int'>,,,",ברודצקי ,ברזיל",1237894128,3
9488,9488,11805962014,5604911212,355307991,,False,21.827,"LINESTRING (670047.940 3553648.775, 670050.132...",,,roundabout,<class 'float'>,<class 'int'>,,,",רוזנפלד שלום ,וייז ג'ורג' ד""ר",355307991,3
9490,9490,11805962018,3609242009,355307991,,False,23.292,"LINESTRING (670024.513 3553661.126, 670024.322...",,,roundabout,<class 'float'>,<class 'int'>,,,",רוזנפלד שלום ,וייז ג'ורג' ד""ר",355307991,3


Because of how the polygons were assigned their names and osmids, some edges did not get any overlapping information.

To fix this, we iterate over the osmids of the roundabouts and recreate that information.

### Fill roundabout edges without any overlapping information with correct overlapping names and osmids

In [124]:
joined_round_polu_with_ta_edges = gpd.sjoin(roundabouts_with_info, os_ta_streets_edges[(os_ta_streets_edges.overlapping_names_len == 0) & (os_ta_streets_edges.junction == 'roundabout')])

joined_round_polu_with_ta_edges[joined_round_polu_with_ta_edges.osmid_left == 692585794][['osmid_left','osmid_right', 'overlapping_osmids', 'overlapping_names', 'overlapping_names_len']]

Unnamed: 0,osmid_left,osmid_right,overlapping_osmids,overlapping_names,overlapping_names_len
21,692585794,1284672741,,,0
21,692585794,1284672741,,,0
21,692585794,1284672742,,,0
21,692585794,1284672741,,,0
21,692585794,1284672741,,,0
21,692585794,1284672741,,,0


In [125]:
# osmid_left is the osmid in os_ta_streets_edges that already holds the overlapping information
os_ta_streets_edges[os_ta_streets_edges.osmid.isin(joined_round_polu_with_ta_edges.osmid_left)].shape

(21, 18)

In [126]:
# Filling the empty overlapping
for idx, row in joined_round_polu_with_ta_edges.iterrows():
    osmid_left  = row['osmid_left']
    osmid_rights = joined_round_polu_with_ta_edges[joined_round_polu_with_ta_edges.osmid_left == osmid_left]['osmid_right'].to_list()
    print()
    print('------------------------')
    print(osmid_left)
    print(osmid_rights)
    idx_osmid_to_fill_overlap = os_ta_streets_edges[os_ta_streets_edges.osmid.isin(osmid_rights)].index
    print(idx_osmid_to_fill_overlap)
    print('------------------------')
    print()

    os_ta_streets_edges.loc[idx_osmid_to_fill_overlap, ['overlapping_names']] = os_ta_streets_edges[os_ta_streets_edges.osmid == osmid_left].overlapping_names.values[0]
    os_ta_streets_edges.loc[idx_osmid_to_fill_overlap, ['overlapping_osmids']] = os_ta_streets_edges[os_ta_streets_edges.osmid == osmid_left].overlapping_osmids.values[0]    
    os_ta_streets_edges.loc[idx_osmid_to_fill_overlap, ['overlapping_names_len']] = os_ta_streets_edges[os_ta_streets_edges.osmid == osmid_left].overlapping_names_len.values[0]    



------------------------
132638933
[132638933, 132638933]
Index([5949, 5950], dtype='int64')
------------------------


------------------------
132638933
[132638933, 132638933]
Index([5949, 5950], dtype='int64')
------------------------


------------------------
692585794
[1284672741, 1284672741, 1284672742, 1284672741, 1284672741, 1284672741]
Index([4796, 4798, 7875, 7877, 8932, 8937], dtype='int64')
------------------------


------------------------
692585794
[1284672741, 1284672741, 1284672742, 1284672741, 1284672741, 1284672741]
Index([4796, 4798, 7875, 7877, 8932, 8937], dtype='int64')
------------------------


------------------------
692585794
[1284672741, 1284672741, 1284672742, 1284672741, 1284672741, 1284672741]
Index([4796, 4798, 7875, 7877, 8932, 8937], dtype='int64')
------------------------


------------------------
692585794
[1284672741, 1284672741, 1284672742, 1284672741, 1284672741, 1284672741]
Index([4796, 4798, 7875, 7877, 8932, 8937], dtype='int64')
----------

### Checking how many roundabouts have no overlapping names or osmids, and whether the previous step introduced mistakes, before naming the roundabouts

In [127]:
# os_ta_streets_edges[(os_ta_streets_edges.junction=='roundabout') & (os_ta_streets_edges.overlapping_names_len < 2)]

All these roundabouts can be ignored or dropped. They are not real roundabouts.

In [128]:
idx_roundabout_to_drop = os_ta_streets_edges[(os_ta_streets_edges.junction=='roundabout') & (os_ta_streets_edges.overlapping_names_len < 2)].index

os_ta_streets_edges.drop(index=idx_roundabout_to_drop, inplace=True)

### Name roundabouts that have no name but can be named (more than 1 street)

In [129]:
os_ta_streets_edges[(os_ta_streets_edges.name.isna()) & (os_ta_streets_edges.junction == 'roundabout')].shape, os_ta_streets_edges[(os_ta_streets_edges.name.isna()) & (os_ta_streets_edges.junction == 'roundabout') & (os_ta_streets_edges.overlapping_names_len > 1)].shape

((454, 18), (454, 18))

In [130]:
def name_roundabout_with_overlap(overlapping_names):
    overlapping_names = overlapping_names.values[0].split(',')    
    # Take the first two names and build the new name from them
    name0 = overlapping_names[0]
    name1 = overlapping_names[1]

    new_name = f"כיכר {name0} ו{name1}"
    return new_name
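
In the outputs that follow, several generated names consist of `כיכר` followed directly by `ו` and a single street name, which suggests that the first entry of `overlapping_names` was empty for those rows. Below is a minimal sketch of a variant that strips whitespace and skips empty entries before picking two names. `build_roundabout_name` and its `prefix` and `conjunction` parameters are hypothetical, not part of the notebook.

```python
def build_roundabout_name(overlapping_names, prefix="כיכר", conjunction="ו"):
    """Build '<prefix> <name0> <conjunction><name1>' from a comma-separated string.

    Returns None when fewer than two non-empty names are available.
    """
    names = [part.strip() for part in overlapping_names.split(",")]
    names = [name for name in names if name]  # drop empty entries
    if len(names) < 2:
        return None
    return f"{prefix} {names[0]} {conjunction}{names[1]}"


named = build_roundabout_name(" ,A ,B ,C", prefix="P", conjunction="&")
too_few = build_roundabout_name(" ,A")
```

Returning `None` for fewer than two names leaves such rows unnamed, so they can be reviewed instead of receiving a one-street name.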

In [131]:
# Name the roundabouts that have more than 1 street in overlapping_names

idx_ra_no_name_with_overlap = os_ta_streets_edges[(os_ta_streets_edges.name.isna()) & (os_ta_streets_edges.junction == 'roundabout') & (os_ta_streets_edges.overlapping_names_len > 1)].index

os_ta_streets_edges.loc[idx_ra_no_name_with_overlap,['name']] = os_ta_streets_edges.loc[idx_ra_no_name_with_overlap,['overlapping_names']].apply(name_roundabout_with_overlap, axis=1)

In [132]:
os_ta_streets_edges[ (os_ta_streets_edges.junction == 'roundabout') & (os_ta_streets_edges.overlapping_names_len > 1)]

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name,overlapping_names,overlapping_osmids,overlapping_names_len
177,177,35288627,2203627170,5118376,ה' באייר,False,27.760,"LINESTRING (668874.869 3551329.236, 668885.399...",,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,הא באייר,"תש״ח ,ז'בוטינסקי ,חברה חדשה ,ויצמן ,עקיבא אריה...",5118376,6
256,256,271878297,381571872,33492500,כיכר ופתחיה מרגנשבורג,False,28.725,"LINESTRING (672928.438 3554590.534, 672933.715...",,,roundabout,<class 'float'>,<class 'int'>,,,",פתחיה מרגנשבורג ,משמר הירדן ,מרכוס דוד","1134738136 ,33492500",4
258,258,271878300,1170469323,33492500,כיכר ופתחיה מרגנשבורג,False,15.693,"LINESTRING (672940.500 3554645.491, 672933.864...",,,roundabout,<class 'float'>,<class 'int'>,,,",פתחיה מרגנשבורג ,משמר הירדן ,מרכוס דוד","1134738136 ,33492500",4
459,459,289069630,2111357789,98095055,כיכר וברודצקי,False,5.463,"LINESTRING (669331.383 3553475.481, 669330.294...",,,roundabout,<class 'float'>,<class 'int'>,,,",ברודצקי ,רדינג ,גריפל יגאל",98095055,4
485,485,289499143,9709113999,1056620897,כיכר ויקותיאלי יוסף,False,9.208,"LINESTRING (667493.029 3552990.310, 667494.436...",,,roundabout,<class 'float'>,<class 'int'>,,,",יקותיאלי יוסף ,ליפקין שחק- שטח הנמל ,1235",1056620897,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9488,9488,11805962014,5604911212,355307991,כיכר ורוזנפלד שלום,False,21.827,"LINESTRING (670047.940 3553648.775, 670050.132...",,,roundabout,<class 'float'>,<class 'int'>,,,",רוזנפלד שלום ,וייז ג'ורג' ד""ר",355307991,3
9490,9490,11805962018,3609242009,355307991,כיכר ורוזנפלד שלום,False,23.292,"LINESTRING (670024.513 3553661.126, 670024.322...",,,roundabout,<class 'float'>,<class 'int'>,,,",רוזנפלד שלום ,וייז ג'ורג' ד""ר",355307991,3
9496,9496,11808919720,1801631024,168983276,כיכר חנה אבנור,False,10.027,"LINESTRING (669356.058 3552609.839, 669357.674...",,,roundabout,<class 'str'>,<class 'int'>,כיכר חנה אבנור,כיכר חנה אבנור,",הזוהר ,כיכר חנה אבנור ,בני דן ,קוסובסקי",168983276,5
9501,9501,11846590955,2993052349,1275972347,כיכר ורמבה אייזיק,False,9.343,"LINESTRING (670392.267 3553411.071, 670394.928...",,,roundabout,<class 'float'>,<class 'int'>,,,",רמבה אייזיק",1275972347,2


In [133]:
# filling ta_name for roundabouts
roundabout_edges_idx = os_ta_streets_edges[
    (os_ta_streets_edges.junction=='roundabout') & 
    (os_ta_streets_edges.overlapping_names_len > 1)
    ].index

os_ta_streets_edges.loc[roundabout_edges_idx, 'ta_name'] = os_ta_streets_edges.loc[roundabout_edges_idx, 'name']

os_ta_streets_edges.loc[roundabout_edges_idx, 'ta_name']


177                     ה' באייר
256     כיכר   ופתחיה מרגנשבורג 
258     כיכר   ופתחיה מרגנשבורג 
459             כיכר   וברודצקי 
485       כיכר   ויקותיאלי יוסף 
                  ...           
9488       כיכר   ורוזנפלד שלום 
9490       כיכר   ורוזנפלד שלום 
9496              כיכר חנה אבנור
9501         כיכר   ורמבה אייזיק
9502         כיכר   ורמבה אייזיק
Name: ta_name, Length: 515, dtype: object

Get the roundabout (rab) edges and the edges that intersect them. This is used for filtering later in the notebook.
* Get the rab edges
* Buffer the rab edges
* Split os_ta_streets_edges
* Join the split edges to get the rab edges and the edges that connect to them (these are the ones we care about)

We still need the connecting edges to check whether they connect to other streets, but we only care about the parts that do not connect to a rab.



In [134]:

os_ta_streets_edges_rab = os_ta_streets_edges[os_ta_streets_edges.junction == 'roundabout'].copy()
os_ta_streets_edges_rab.geometry = os_ta_streets_edges_rab.buffer(3)

# getting overlapping edges with roundabout

edges_inters_rab = gpd.sjoin(os_ta_streets_edges, os_ta_streets_edges_rab, how='inner', predicate='intersects')
display(edges_inters_rab.head(3))
edges_rab_and_inters = edges_inters_rab.drop_duplicates(subset=['u_left','v_left'])
edges_inters_rab = edges_rab_and_inters[edges_rab_and_inters.junction_left != 'roundabout'].copy()
edges_inters_rab.shape
os_ta_streets_edges['is_connected_to_rab'] = 0
inters_rab_idx = os_ta_streets_edges[(os_ta_streets_edges.u.isin(edges_inters_rab.u_left)) & (os_ta_streets_edges.v.isin(edges_inters_rab.v_left))].index
os_ta_streets_edges.loc[inters_rab_idx, 'is_connected_to_rab'] = 1


Unnamed: 0,os_ta_index_left,u_left,v_left,osmid_left,name_left,reversed_left,length_left,geometry,tunnel_left,bridge_left,...,tunnel_right,bridge_right,junction_right,name_type_right,osmid_type_right,name_fixed_right,ta_name_right,overlapping_names_right,overlapping_osmids_right,overlapping_names_len_right
177,177,35288627,2203627170,5118376,ה' באייר,False,27.76,"LINESTRING (668874.869 3551329.236, 668885.399...",,,...,,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,ה' באייר,"תש״ח ,ז'בוטינסקי ,חברה חדשה ,ויצמן ,עקיבא אריה...",5118376,6
177,177,35288627,2203627170,5118376,ה' באייר,False,27.76,"LINESTRING (668874.869 3551329.236, 668885.399...",,,...,,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,ה' באייר,"תש״ח ,ז'בוטינסקי ,חברה חדשה ,ויצמן ,עקיבא אריה...",5118376,6
177,177,35288627,2203627170,5118376,ה' באייר,False,27.76,"LINESTRING (668874.869 3551329.236, 668885.399...",,,...,,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,ה' באייר,"תש״ח ,ז'בוטינסקי ,חברה חדשה ,ויצמן ,עקיבא אריה...",5118376,6
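
The mask `u.isin(...) & v.isin(...)` in the cell above tests `u` and `v` independently, so an edge can be flagged when its `u` comes from one connecting edge and its `v` from another. If the intent is to match whole `(u, v)` pairs, one option is to compare them as a `MultiIndex`. This is a toy sketch with made-up node ids, not the notebook's data.

```python
import pandas as pd

# Toy stand-ins: all edges, and the subset known to connect to a roundabout
edges = pd.DataFrame({"u": [1, 1, 3, 3], "v": [2, 4, 4, 2]})
connecting = pd.DataFrame({"u_left": [1, 3], "v_left": [2, 4]})

# Independent test: every edge passes, because each u and each v appears somewhere
independent = edges.u.isin(connecting.u_left) & edges.v.isin(connecting.v_left)

# Pairwise test: only the exact (u, v) combinations pass
wanted_pairs = pd.MultiIndex.from_frame(connecting[["u_left", "v_left"]])
pairwise = pd.MultiIndex.from_frame(edges[["u", "v"]]).isin(wanted_pairs)

edges["is_connected_to_rab"] = pairwise.astype(int)
```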


#### Giving nodes an is_roundabout flag

In [135]:
os_ta_streets_nodes

Unnamed: 0,osmid,y,x,highway,street_count,ref,geometry
0,139693,32.093840,34.790572,traffic_signals,4,,POINT (668968.683 3552240.237)
1,139698,32.093869,34.791231,,3,,POINT (669030.815 3552244.552)
2,139707,32.095354,34.778500,,3,,POINT (667826.578 3552389.242)
3,139708,32.095052,34.778329,,3,,POINT (667810.983 3552355.494)
4,139709,32.094527,34.778842,,4,,POINT (667860.387 3552298.098)
...,...,...,...,...,...,...,...
6483,12292683832,32.062786,34.785004,,1,,POINT (668500.151 3548788.711)
6484,12292683834,32.063149,34.785276,traffic_signals,3,,POINT (668525.204 3548829.326)
6485,12361383714,32.057552,34.763578,,1,,POINT (666486.760 3548175.193)
6486,12361652364,32.054494,34.771553,,1,,POINT (667245.361 3547848.419)


In [136]:
edges_rab = os_ta_streets_edges[os_ta_streets_edges.junction == 'roundabout'].copy()
edges_rab.geometry = edges_rab.buffer(3)
edges_rab.head(3)

Unnamed: 0,os_ta_index,u,v,osmid,name,reversed,length,geometry,tunnel,bridge,junction,name_type,osmid_type,name_fixed,ta_name,overlapping_names,overlapping_osmids,overlapping_names_len,is_connected_to_rab
177,177,35288627,2203627170,5118376,ה' באייר,False,27.76,"POLYGON ((668885.943 3551329.703, 668902.637 3...",,,roundabout,<class 'str'>,<class 'int'>,ה' באייר,ה' באייר,"תש״ח ,ז'בוטינסקי ,חברה חדשה ,ויצמן ,עקיבא אריה...",5118376,6,1
256,256,271878297,381571872,33492500,כיכר ופתחיה מרגנשבורג,False,28.725,"POLYGON ((672932.947 3554593.994, 672936.413 3...",,,roundabout,<class 'float'>,<class 'int'>,,כיכר ופתחיה מרגנשבורג,",פתחיה מרגנשבורג ,משמר הירדן ,מרכוס דוד","1134738136 ,33492500",4,0
258,258,271878300,1170469323,33492500,כיכר ופתחיה מרגנשבורג,False,15.693,"POLYGON ((672934.666 3554642.011, 672931.248 3...",,,roundabout,<class 'float'>,<class 'int'>,,כיכר ופתחיה מרגנשבורג,",פתחיה מרגנשבורג ,משמר הירדן ,מרכוס דוד","1134738136 ,33492500",4,0


In [137]:
# nodes_rab = gpd.sjoin(os_ta_streets_nodes, edges_rab, how='inner', predicate='intersects')
# nodes_rab.geometry = nodes_rab.buffer(3)
# nodes_rab.head(3)

In [138]:
# nodes_rab_idx = os_ta_streets_nodes[os_ta_streets_nodes.osmid.isin(nodes_rab.osmid_left)].index
# os_ta_streets_nodes['is_roundabout'] = 0
# os_ta_streets_nodes.loc[nodes_rab_idx, 'is_roundabout'] = 1

#### Export nodes with column is_roundabout

In [139]:
# os_ta_streets_nodes.to_parquet('./csv_tables/os_ta_streets_nodes.parquet')

### Check overlap between edges and roundabout polygons

In [140]:
# ## plotting the nodes that intersect the buffered roundabout edges

# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(nodes_rab, fill_colors='black')

# m

## Handling Roads/Connector Roads

### Handling unnamed roads that act as connectors between named streets

How many of the unnamed streets are connectors?

My guess is that the segments whose osmid_type is class list are the ones.

In [141]:
os_ta_streets_edges.name_type.value_counts()

name_type
<class 'str'>      8226
<class 'float'>    1120
<class 'list'>      176
Name: count, dtype: int64

176 edges have a name_type of class list.
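
The labels in this output look like the result of `str(type(value))`. Assuming the `name_type` and `osmid_type` columns were built that way (an assumption, since the cell that created them is not shown in this part), a value that was NaN when the label was computed shows up as `float`, because NaN is a float, and a value holding several entries shows up as `list`. A minimal sketch with made-up values:

```python
import math

# Hypothetical sample values standing in for entries of the `name` column.
sample_names = ["Herzl", math.nan, ["Herzl", "Allenby"]]

# Assumed construction of the type label: str(type(value)).
type_labels = [str(type(value)) for value in sample_names]

for value, label in zip(sample_names, type_labels):
    print(repr(value), "->", label)
```

This is why the filters in the following cells compare against the literal string `"<class 'list'>"` rather than against the `list` type itself.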

Do these correspond to connector edges?

In [142]:
# # ## making sure all unnamed roads are actually connector

# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(os_ta_streets_edges[os_ta_streets_edges.name_type == "<class 'list'>"], fill_colors='black')

# m

These streets don't look like connectors.

How many NaN names are there?

How many edges have a NaN name and an osmid_type of class list?

In [143]:
os_ta_streets_edges[os_ta_streets_edges.name.isna()].shape, os_ta_streets_edges[(os_ta_streets_edges.name.isna()) & (os_ta_streets_edges.osmid_type == "<class 'list'>")].shape

((666, 19), (56, 19))

In [144]:
# # ## making sure all unnamed roads are actually connector

# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(os_ta_streets_edges[os_ta_streets_edges.name.isna()], fill_colors='black')
# m.add_gdf(os_ta_streets_edges[(os_ta_streets_edges.name.isna()) & (os_ta_streets_edges.osmid_type == "<class 'list'>")])


# m

These streets do seem like connectors.

In [145]:
# Step 1: Filter edges with no name that are considered connectors
# (no OSM name, a single osmid rather than a list, and not part of a roundabout)
idx_unnamed_roads = os_ta_streets_edges[(os_ta_streets_edges.name.isna()) & (os_ta_streets_edges.osmid_type != "<class 'list'>") & (os_ta_streets_edges.junction !='roundabout')].index
unnamed_roads = os_ta_streets_edges.loc[idx_unnamed_roads]
# Shape of the subset that also lacks a ta_name, then shape of all unnamed roads
unnamed_roads[unnamed_roads['ta_name'] == ''].shape, unnamed_roads.shape

((577, 19), (610, 19))

In [146]:
# ## making sure all unnamed roads are actually connector

# m = leafmap.Map(center=(32.047, 34.785), zoom=11)
# m.add_gdf(os_ta_streets_edges, fill_colors='black')

# m

In [147]:
os_ta_streets_edges.loc[idx_unnamed_roads].shape, os_ta_streets_edges[os_ta_streets_edges.name.isna()].shape

((610, 19), (666, 19))

Many of these unnamed roads connect other roads.

### Creating names for unnamed connector streets

A connector's name will be determined by the road that connects to its start point and the road that connects to its end point.

In [148]:
from shapely.geometry import Point, LineString

def get_start_mid_end_points(line: LineString):
    """
    Return the start point and the end point of a Shapely LineString.

    Despite the name, the midpoint is not returned; its computation is commented out below.
    """
    start = Point(line.coords[0])
    end = Point(line.coords[-1])
    # mid = line.interpolate(0.5, normalized=True)  # 0.5 means halfway along the total length

    return start, end


In [149]:
os_ta_streets_edges['start_point'], os_ta_streets_edges['end_point'] = zip(*os_ta_streets_edges['geometry'].apply(get_start_mid_end_points))

In [150]:
# Work on a copy of the DataFrame; the existing 'os_ta_index' column serves as the unique ID
os_ta_streets_edges_cp = os_ta_streets_edges.copy()
# os_ta_streets_edges_cp['row_id'] = np.arange(len(os_ta_streets_edges_cp))

def find_closest_edge(point, ta_streets, exclude_id):
    """Return the index label of the edge in ta_streets closest to point, skipping exclude_id."""
    # Only consider rows whose 'os_ta_index' differs from the edge being processed
    mask = ta_streets['os_ta_index'] != exclude_id

    # print(mask.sum())
    distances = ta_streets.loc[mask].distance(point)
    return distances.idxmin()

os_ta_streets_edges_cp['start_edge_idx'] = os_ta_streets_edges_cp.apply(
    lambda row: find_closest_edge(row['start_point'], os_ta_streets_edges_cp, exclude_id=row['os_ta_index']),
    axis=1
)

os_ta_streets_edges_cp['end_edge_idx'] = os_ta_streets_edges_cp.apply(
    lambda row: find_closest_edge(row['end_point'], os_ta_streets_edges_cp, exclude_id=row['os_ta_index']),
    axis=1
)
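
The `exclude_id` mask matters because an edge's own geometry always lies at distance 0 from its own start and end points, so without the mask every edge would match itself. The sketch below illustrates this in plain Python on made-up straight segments. `point_segment_distance` and `closest_other` are illustrative helpers that stand in for GeoPandas' `distance` and for `find_closest_edge`; they are not part of the notebook.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def closest_other(point, segments, exclude_id):
    """Id of the segment nearest to point, ignoring exclude_id."""
    candidates = {
        seg_id: point_segment_distance(point, a, b)
        for seg_id, (a, b) in segments.items()
        if seg_id != exclude_id
    }
    return min(candidates, key=candidates.get)

# Hypothetical edges keyed by id: a connector (1) between two streets (0 and 2).
segments = {
    0: ((0.0, 0.0), (0.0, 10.0)),
    1: ((0.0, 5.0), (8.0, 5.0)),
    2: ((8.0, 0.0), (8.0, 10.0)),
}

start, end = segments[1]
own_distance = point_segment_distance(start, *segments[1])  # the edge touches its own start point
start_neighbor = closest_other(start, segments, exclude_id=1)
end_neighbor = closest_other(end, segments, exclude_id=1)
print(own_distance, start_neighbor, end_neighbor)
```

One caveat applies to the real data: when several edges meet at the same node, all of them are at distance 0 from that point, and `idxmin` returns the first one in index order, so the neighbour that gets picked may itself be unnamed.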

In [151]:
# getting name from os_ta_streets_edges_cp 
os_ta_streets_edges_cp['start_name'] = os_ta_streets_edges_cp['start_edge_idx'].apply(lambda idx: os_ta_streets_edges_cp.loc[idx, 'ta_name'])
# os_ta_streets_edges['eng_name']   = os_ta_streets_edges['mid_edge_idx'].apply(lambda idx: os_ta_streets_edges.loc[idx, 'name'])
os_ta_streets_edges_cp['end_name']   = os_ta_streets_edges_cp['end_edge_idx'].apply(lambda idx: os_ta_streets_edges_cp.loc[idx, 'ta_name'])

After exploring start_name and end_name,<br>
I found there are 220 such cases (names that are still missing); trying to resolve them with the ta_streets map didn't help much.

So I decided to give them a pseudo name such as:


start no name 1<br>
start no name 2 <br>
OR <br>
end no name 1

etc
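
One possible reading of this scheme, sketched in plain Python: each missing start or end name gets a numbered placeholder. The function name `fill_pseudo_names`, the record layout and the exact label format are assumptions made for illustration; the code that applied the pseudo names is not shown in this part of the notebook.

```python
from itertools import count

def fill_pseudo_names(records):
    """Replace empty start/end names with numbered placeholders.

    `records` is a list of dicts with 'start_name' and 'end_name' keys,
    a hypothetical stand-in for rows of the edges table.
    """
    start_counter = count(1)
    end_counter = count(1)
    filled = []
    for record in records:
        new_record = dict(record)  # leave the input untouched
        if new_record["start_name"] == "":
            new_record["start_name"] = f"start no name {next(start_counter)}"
        if new_record["end_name"] == "":
            new_record["end_name"] = f"end no name {next(end_counter)}"
        filled.append(new_record)
    return filled

# Made-up rows: one complete, two missing a start name, one missing an end name.
rows = [
    {"start_name": "Herzl", "end_name": "Allenby"},
    {"start_name": "", "end_name": "Allenby"},
    {"start_name": "", "end_name": "Herzl"},
    {"start_name": "Herzl", "end_name": ""},
]
named = fill_pseudo_names(rows)
for row in named:
    print(row)
```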

In [152]:
os_ta_streets_edges_cp[(os_ta_streets_edges_cp.ta_name == '') & (os_ta_streets_edges_cp.start_name == '') &(os_ta_streets_edges_cp.end_name == '')].shape

(34, 25)

#### How many empty names do we have?

* ta_name is '' = 627
* ta_name and start_name are '' = 120
* ta_name and end_name are ''= 111
* ta_name and end_name and start_name are ''= 34
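
These four counts are consistent with the shapes printed by the following cells (430, 197, 163 and 34), which can be checked by inclusion-exclusion. The sketch uses only the numbers listed above; the variable names are illustrative.

```python
# Counts taken from the list above.
empty_ta = 627         # ta_name == ''
empty_ta_start = 120   # ta_name == '' and start_name == ''
empty_ta_end = 111     # ta_name == '' and end_name == ''
empty_all = 34         # ta_name, start_name and end_name all == ''

# Empty ta_name and at least one neighbour name missing (inclusion-exclusion).
at_least_one_missing = empty_ta_start + empty_ta_end - empty_all
# Empty ta_name but both neighbour names available.
both_available = empty_ta - at_least_one_missing
# Empty ta_name and exactly one neighbour name available.
exactly_one_available = at_least_one_missing - empty_all

print(both_available, at_least_one_missing, exactly_one_available)
```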

If an edge has both a start name and an end name but no ta_name, fill ta_name with the two names joined by a comma.



In [153]:
os_edges_name_na_idx = os_ta_streets_edges_cp[ 
                                              ~(os_ta_streets_edges_cp.end_name == '') & 
                                              ~(os_ta_streets_edges_cp.start_name == '') &
                                              (os_ta_streets_edges_cp.ta_name == '')].index
print(os_edges_name_na_idx.shape)
start_name_of_na = os_ta_streets_edges_cp.loc[os_edges_name_na_idx, 'start_name']
end_name_of_na = os_ta_streets_edges_cp.loc[os_edges_name_na_idx, 'end_name']

os_ta_streets_edges_cp.loc[os_edges_name_na_idx, 'ta_name'] = start_name_of_na + ', ' + end_name_of_na


(430,)


In [154]:
os_ta_streets_edges_cp[os_ta_streets_edges_cp.ta_name == ''].shape

(197, 25)

In [155]:
## checking on the map that the remaining unnamed roads are actually connectors

m = leafmap.Map(center=(32.047, 34.785), zoom=11)
m.add_gdf(os_ta_streets_edges_cp[os_ta_streets_edges_cp.ta_name == ''][['geometry','osmid','name']])

m.add_gdf(os_ta_streets_edges.loc[os_edges_name_na_idx,['geometry','osmid','name']], fill_colors='black')

m

Map(center=[32.047, 34.785], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom…

In [156]:
os_ta_streets_edges_cp[(os_ta_streets_edges_cp.ta_name == '') & (os_ta_streets_edges_cp.start_name == '') &(os_ta_streets_edges_cp.end_name == '')].shape

(34, 25)

Filling the remaining cases where ta_name is empty and only one of start_name **OR** end_name is available

In [157]:
os_edges_name_na_idx = os_ta_streets_edges_cp[ 
                                              (~(os_ta_streets_edges_cp.end_name == '') | 
                                              ~(os_ta_streets_edges_cp.start_name == '')) &
                                              (os_ta_streets_edges_cp.ta_name == '')].index
print(os_edges_name_na_idx.shape)
start_name_of_na = os_ta_streets_edges_cp.loc[os_edges_name_na_idx, 'start_name']
end_name_of_na = os_ta_streets_edges_cp.loc[os_edges_name_na_idx, 'end_name']

os_ta_streets_edges_cp.loc[os_edges_name_na_idx, 'ta_name'] = start_name_of_na + ', ' + end_name_of_na


(163,)
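
Note that when only one of the two names is present, `start_name + ', ' + end_name` leaves a leading or trailing `', '` in the result. If that is not wanted, joining only the non-empty parts avoids it. A minimal sketch in plain Python; `join_names` is a hypothetical helper, not something the notebook defines:

```python
def join_names(start_name, end_name, sep=", "):
    """Join the non-empty names with `sep`, skipping empty strings."""
    return sep.join(name for name in (start_name, end_name) if name)

# Made-up street names for illustration.
both = join_names("Herzl", "Allenby")
only_end = join_names("", "Allenby")
only_start = join_names("Herzl", "")
neither = join_names("", "")

print([both, only_end, only_start, neither])
```

On the DataFrame this could be applied row-wise, for example with `apply(lambda row: join_names(row['start_name'], row['end_name']), axis=1)`.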


In [159]:
os_ta_streets_edges_cp[os_ta_streets_edges_cp.ta_name == ''].shape

(34, 25)

### Exporting the os_ta_streets_edges after finally creating proper names (I think)

<!-- NOTE: since we have some columns with mixed list and string etc, parquet is not a good choice to export.

Instead I will export just the geometry and the other columns in a different format then concat.

We will also reload the files and start with from this part onward. -->

In [51]:
os_ta_streets_edges.to_parquet('./csv_tables/os_ta_streets_edges.parquet')