In this exercise we determine the number of stops for each town.

First we initialise PySpark.


In [1]:
from pyspark import SparkContext

sc = SparkContext.getOrCreate()


We first read all the stops. The input file is a preprocessed version of the JSON "stops.txt", in which
each line contains one stop in the format

halte_id;halte_name;lat;long;town_name

This makes the data easier to parse, since each line only requires a call to str.split().
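
As a small illustration of the line format described above, the following sketch splits a single line into its five fields in plain Python, without Spark. The sample line and its values are made up for this example and are not taken from the actual file.

```python
# Hypothetical sample line; the id, name and coordinates are invented.
sample_line = "12345;Example Stop;51.0;4.5;Nijlen"

# One call to str.split() yields the five fields in order.
fields = sample_line.split(";")
halte_id, halte_name, lat, long_, town_name = fields

print(town_name)
```

The town name ends up at index 4 of the resulting list, which is the index used in the Spark job further down.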

Next we drop all stops that have no town specified, since they cannot be counted towards any town. This
is done with a simple filter on the town_name field.
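
The effect of this filter can be sketched on a few rows in plain Python. The rows below are hypothetical and assume that a missing town shows up as an empty fifth field, which is what the predicate in the Spark job checks for.

```python
# Hypothetical rows, already split on ";" (all values are invented).
rows = [
    ["1", "Stop A", "51.0", "4.5", "Nijlen"],
    ["2", "Stop B", "50.9", "4.4", ""],  # no town specified
    ["3", "Stop C", "51.1", "4.6", "Doel"],
]

# Same predicate as in the Spark job: keep rows whose fifth field is non-empty.
with_town = [row for row in rows if row[4] != ""]

print(len(with_town))
```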

To count the stops, we map each stop to the tuple (town_name, 1).

We then add the tuples as (X, n) + (X, m) = (X, n + m) using the reduceByKey method, which combines all tuples
that share the same first element (the key).
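
To make the (X, n) + (X, m) = (X, n + m) rule concrete, here is a minimal local stand-in for reduceByKey written in plain Python. The helper reduce_by_key and the sample pairs are hypothetical and only mimic the result; Spark itself performs the reduction in parallel across partitions, which is why the combining function should be associative and commutative.

```python
def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: fold the values of equal keys with func."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return list(result.items())

# Hypothetical (town_name, 1) tuples, one per stop.
pairs = [("Nijlen", 1), ("Doel", 1), ("Nijlen", 1), ("Nijlen", 1)]

counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts)
```

With these sample pairs, the three tuples for "Nijlen" collapse into a single tuple with the value 3, while "Doel" keeps the value 1.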


In [2]:
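stops_per_town = (
    sc.textFile("./converted_stops.csv")
    .map(lambda x: x.split(";"))      # one list of fields per stop
    .filter(lambda x: x[4] != '')     # drop stops without a town
    .map(lambda x: (x[4], 1))         # (town_name, 1)
    .reduceByKey(lambda x, y: x + y)  # sum the counts per town
    .collect()
)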


Finally we print the output.


In [3]:
for town, count in stops_per_town:
    # In Python 3 the town name is already a str, so no explicit encoding is needed.
    print("{}: {}".format(town, count))

Nijlen: 40
Banneux: 10
Huldenberg: 30
Wilskerke: 3
Doel: 13
Avekapelle: 8
Wijgmaal: 24
Burcht: 22
Pamel: 31
Budingen: 36
Deerlijk: 66
Niel-Bij-As: 6
Ellikom: 16
Kuurne: 71
SPV: 14
Wulveringem: 10
Ekeren: 58
Peutie: 18
Holheide: 1
Heukelom: 13
Aalbeke: 8
Peer: 55
Heers: 26
Tusvoort: 2
Oudenaarde: 69
Oedelem: 47
Beverst: 12
Spurk: 6
Nederbrakel: 49
Bree: 99
Wimmertingen: 4
Elen: 23
Sint-Rochus: 15
Munte: 4
Melsele: 21
Neerlinter: 16
Gooik: 40
Poederlee: 12
Ichtegem: 40
Wezembeek-Oppem: 14
Sint-Amandsberg: 54
Mater: 31
Kampenhout: 22
Lanklaar: 38
Helkijn: 12
Stokrooie: 18
Opitter: 33
Oordegem: 24
Nederhasselt: 16
Kaster: 7
Meerle: 13
Webbekom: 18
Schorisse: 21
Reet: 37
Ieper: 100
Olmen: 18
Sint-Maria-Horebeke: 22
Poppel: 16
Genendijk: 3
Elsene: 9
Rumbeke: 40
Berg: 20
Diest: 65
Koekelare: 59
Kwatrecht: 18
Hoeilaart: 44
Schiplaken: 2
Ulbeek: 15
Oud-Turnhout: 47
Ardooie: 46
Opvelp: 10
Zelem: 36
Loenhout: 28
Krombeke: 13
Zwankendamme: 3
Westrem: 10
Oeren: 1
Sint-Jansteen: 6
Engelmanshoven: 7
