In this exercise we will calculate the number of stops within a given radius of a specified position.

First we initialise PySpark.


In [1]:
from pyspark import SparkContext

sc = SparkContext.getOrCreate()


We first read all the stops. The specified file is a preprocessed version of the JSON "stops.txt", in which 
each line contains one stop in the format 

halte_id;halte_name;lat;long;town_name

This makes it easier to parse, since each line only requires a call to str.split().
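
To make the line format concrete, here is a minimal sketch of the parsing step in plain Python, without Spark. The helper name `parse_stop` and the sample line are made up for illustration; they are not taken from the data file.

```python
def parse_stop(line):
    """Split one semicolon-separated line into a tuple of stripped fields."""
    return tuple(field.strip() for field in line.split(";"))

# Hypothetical sample line in the halte_id;halte_name;lat;long;town_name format.
sample_line = "12345; Example Stop ;51.2000;4.4000; Exampletown"
stop = parse_stop(sample_line)
print(stop)

# All fields are still strings, so the coordinates need an explicit conversion.
lat_value, long_value = float(stop[2]), float(stop[3])
print(lat_value, long_value)
```

Note that the coordinates stay strings after splitting, which is why they are converted with float() before any distance is computed.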


In [2]:
stops = sc.textFile("./converted_stops.csv").map(lambda stop: tuple([x.strip() for x in stop.split(";")]))


In order to determine the number of stops inside a radius, we first need to
add the radius and coordinate data to the stops.

The user selects a point by setting the variables "lat" and "long", and sets
"radius" to the search radius in meters.

The data is attached to each stop using a simple map.
The result is an RDD with tuples of the form (stop, radius, point).


In [3]:
# these need to be floats!
lat = 51.21989
long = 4.40346
radius = 3000.0 # in meters

stops_with_geodata = stops.map(lambda stop: (stop, radius, (lat, long)))


Next, we create a function that determines the distance between the specified
point and a set of coordinates. We use the function to map the
(stop, radius, point) tuples to (stop, radius, distance) tuples.
We then filter out the tuples where distance > radius, so that only the stops
within the radius remain.

Note: since the earth is (approximately) a sphere, Euclidean distances on the raw
coordinates are not accurate enough. I have used an online implementation of the haversine method:
http://evoling.net/code/haversine/
https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points/4913653#4913653
There are many sources; I don't know which one is the original.
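
For reference, the haversine formula as it is commonly stated, and as the code below computes it, can be written as follows. The symbols are my own notation: $\varphi$ is latitude and $\lambda$ is longitude, both in radians, and $R$ is the earth radius (6371 km in the code).

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
$$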



In [4]:
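def haversine(coord1, coord2):
    """
    Calculate the great circle distance in meters between two points
    on the earth (specified in decimal degrees, as (lat, lon) pairs)
    """
    # imported inside the function, after the docstring so the docstring is recognised
    from math import radians, cos, sin, asin, sqrt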
    
    lat1, lon1 = coord1
    lat2, lon2 = coord2
    
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2.0)**2.0 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2.0
    c = 2.0 * asin(sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371.0 * c
    m = km * 1000.0
    return m

def get_stop_coord(stop):
    """Retrieve the geo coordinate of a stop."""
    return float(stop[2]), float(stop[3])

# (stop, radius, point) -> (stop, radius, distance in meters)
stop_with_distance = stops_with_geodata.map(lambda x: (x[0], x[1], haversine(get_stop_coord(x[0]), x[2])))
stops_within_radius = stop_with_distance.filter(lambda x: x[1] >= x[2]) # keep stops where radius >= distance
stop_list = stops_within_radius.collect() # all the stops
stop_count = len(stop_list) # number of stops; reuses the collected list instead of running a second Spark job
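
As a quick sanity check of the distance calculation, the sketch below re-implements the same formula as a standalone function and compares it against a case with a known answer: one degree of latitude along a meridian should equal R * pi / 180, roughly 111.2 km for R = 6371 km. The name `haversine_m` and the test coordinates are chosen for illustration only; this is not part of the notebook's pipeline.

```python
from math import asin, cos, pi, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # same earth radius as used above, expressed in meters

def haversine_m(coord1, coord2):
    """Great-circle distance in meters between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(radians, coord1)
    lat2, lon2 = map(radians, coord2)
    a = sin((lat2 - lat1) / 2.0) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_M * asin(sqrt(a))

# One degree of latitude at constant longitude: the arc length is R * (pi / 180).
expected = EARTH_RADIUS_M * pi / 180.0
actual = haversine_m((51.0, 4.0), (52.0, 4.0))
print(abs(actual - expected) < 1e-3)

# The distance from a point to itself is zero.
same_point = haversine_m((51.21989, 4.40346), (51.21989, 4.40346))
print(same_point)
```

A check like this is cheap to run locally before applying the function to the full RDD.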


Finally, we print the result.


In [5]:
print("{} stops found within radius of {} meters:\n".format(stop_count, radius))
print("Stad - Halte naam - Afstand")
print("---------------------------\n")
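for stop in stop_list:
    # the .encode('utf-8') calls are for Python 2, where names such as "Belgiëlei" are unicode objects;
    # on Python 3 they would print bytes literals and should be dropped
    print("{} - {} - {}".format(stop[0][4].encode('utf-8'), stop[0][1].encode('utf-8'), stop[2]))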

459 stops found within radius of 3000.0 meters:

Stad - Halte naam - Afstand
---------------------------

Antwerpen - Zurenborg - 2158.39363984
Antwerpen - A. Grisarstraat - 2487.26174391
Antwerpen - A. Van Cauwelaert - 1971.34288902
Antwerpen - A. Van Cauwelaert - 2004.53944533
Antwerpen - Straatsburgdok - 2632.41403237
Antwerpen - Bestorming - 1571.41017311
Antwerpen - Begijnenvest - 1112.96963604
Antwerpen - Sint-Jansplein - 1009.95947498
Antwerpen - Sint-Jansplein - 1042.45534847
Antwerpen - F. Rooseveltplaats perron 25 - 1001.22774256
Antwerpen - Cadix - 1479.79224674
Antwerpen - Stadspark - 864.219171299
Antwerpen - Ballaarstraat - 1748.31038733
Antwerpen - Begijnenvest - 1131.42336092
Antwerpen - Montigny - 1715.09403069
Antwerpen - Bestorming - 1617.601695
Antwerpen - Belgiëlei - 1811.9008656
Antwerpen - Belgiëlei - 1920.76584647
Berchem - Station perron 21 - 2952.85534202
Berchem - Station perron 12 - 2962.39782124
Berchem - Station perron 13 - 2944.84554314
Berchem - Station 