### Kim Vo - Distilling the web into GIS datasets

#### Project details: 
This project scrapes the addresses of Light Rail stations in Seattle and displays them on an interactive web map. I chose Light Rail stations because the dataset is small, so I could easily follow the geocoding process, learn how it works, and check the final map to see whether any stations were lost.

#### My work process:
* Scraped data from the internet using lxml and cssselect
* Converted the data from a dictionary into a DataFrame using Pandas, and then into a GeoDataFrame using GeoPandas
* Created interactive web map using Folium

Light Rail Stations data from https://www.soundtransit.org/Rider-Guide/link-light-rail/link-light-rail-stations  <br>
Link to students.washington.edu web space: http://students.washington.edu/vokim/StationsMap.html <br>

#### Hours spent on this project:
* Find and scrape data: 1 hour
* Convert to geodataframe: 6 hours
* Display map with folium: 3 hours
* Prepare materials for submission: 1 hour

In [2]:
import urllib2, lxml, pandas, folium, os, urllib, json, geocoder
from lxml import html
from geopandas import GeoDataFrame
from shapely.geometry import Point

In [5]:
# Setting up URL to read and parse
url = "https://www.soundtransit.org/Rider-Guide/link-light-rail/link-light-rail-stations"
req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
con = urllib2.urlopen( req )
doc_text = con.read()
doc = lxml.html.fromstring(doc_text)
doc.make_links_absolute(url)

In [6]:
# Store data in lists
name, address = [], []
for row in doc.cssselect(".field-item p"):
    lines = row.text_content().split("\n") # Handle <br> tag
    name.append(lines[0].replace("\t",""))
    address.append(lines[1].replace("\t",""))
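
The loop above assumes every `.field-item p` paragraph contains at least two lines; a paragraph with only one line would raise an `IndexError` at `lines[1]`. Below is a minimal sketch of a more defensive split, assuming the same "name, then address" layout. `split_station_block` is a hypothetical helper name, the sample strings are made up for illustration, and the block is written to run on Python 3.

```python
def split_station_block(text):
    """Return (name, address) from a paragraph's text, or None if it lacks two lines.

    Hypothetical helper: assumes the first non-empty line is the station name
    and the second non-empty line is its address.
    """
    # Remove tabs and surrounding whitespace, then skip lines that end up empty
    lines = [line.replace("\t", "").strip() for line in text.split("\n")]
    lines = [line for line in lines if line]
    if len(lines) < 2:
        return None
    return lines[0], lines[1]


# Illustrative inputs only; the real text would come from row.text_content()
sample = "\tCapitol Hill Station\n\t140 Broadway E., Seattle, WA"
print(split_station_block(sample))
print(split_station_block("Just a caption"))
```

Paragraphs that return `None` can be skipped or logged, which makes it easier to see which parts of the page did not fit the expected layout.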

I tried OpenCage first but did not get good results, and found that the ArcGIS geocoder is easy to use and returns more accurate data <br>
Source: https://github.com/DenisCarriere/geocoder

In [7]:
# Put lists into dictionary
data = {'Name':[], 'Address':[], 'lat':[], 'lng':[]}
for station_name, station_address in zip(name, address):
    data['Name'].append(station_name)
    data['Address'].append(station_address)
    g = geocoder.arcgis(station_address, maxRows=1)    # Geocode the address with ArcGIS
    data['lat'].append(g.lat)                          # Read latitude and longitude from the result
    data['lng'].append(g.lng)
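
Because stations can silently drop out when a geocoder finds no match, it may help to record the misses explicitly. The sketch below assumes the geocoding step yields no coordinates (represented here as `None`) for an unmatched address. `geocode_all` and `fake_geocode` are hypothetical names, and the stub stands in for a real geocoder call so the example runs offline on Python 3.

```python
def geocode_all(addresses, geocode):
    """Geocode each address and return (hits, misses).

    `geocode` is any callable that takes an address and returns a
    (lat, lng) tuple, or None when no match is found (an assumption here).
    """
    hits, misses = [], []
    for address in addresses:
        result = geocode(address)
        if result is None:
            misses.append(address)
        else:
            lat, lng = result
            hits.append({"Address": address, "lat": lat, "lng": lng})
    return hits, misses


def fake_geocode(address):
    """Stand-in for a real geocoder call; it only knows a single address."""
    known = {"4th Ave & Pine St., Seattle, WA": (47.611179, -122.3376435)}
    return known.get(address)


hits, misses = geocode_all(
    ["4th Ave & Pine St., Seattle, WA", "Nowhere St., Seattle, WA"],
    fake_geocode,
)
print(len(hits), "geocoded;", "not found:", misses)
```

Comparing `len(hits)` with the number of scraped addresses gives a quick check for lost stations before the map is built.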

Source: https://gis.stackexchange.com/questions/174159/convert-a-pandas-dataframe-to-a-geodataframe/174168


In [8]:
# Convert data in dictionary to GeoDataFrame
df = pandas.DataFrame(data, columns=['Name','Address','lat','lng'])
geometry = [Point(xy) for xy in zip(df['lng'], df['lat'])]   # Create point geometry as (x=lng, y=lat)
df = df.drop(['lng', 'lat'], axis=1)
crs = {'init': 'epsg:4326'}                                  # Coordinate reference system: WGS 84 (lat/lng)
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)           # Convert DataFrame -> GeoDataFrame
gdf.head()

| | Name | Address | geometry |
|---|---|---|---|
| 0 | University of Washington Station | 3720 Montlake Blvd NE, Seattle, WA | POINT (-122.3045020146049 47.64911603275507) |
| 1 | Capitol Hill Station | 140 Broadway E., Seattle, WA | POINT (-122.3208259682629 47.6192193967835) |
| 2 | Westlake Station | 4th Ave & Pine St., Seattle, WA | POINT (-122.3376435 47.611179) |
| 3 | University Street Station | 3rd Ave & Seneca St., Seattle, WA | POINT (-122.3353215 47.60702099999999) |
| 4 | Pioneer Square Station | 3rd Ave & James St., Seattle, WA | POINT (-122.3314425 47.602755) |
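
The point geometry above is built as (lng, lat), that is x first and then y, which is easy to mix up with the (lat, lng) order used when placing map markers. As an illustration of the same x, y ordering without GeoPandas, the sketch below writes two of the rows shown above as GeoJSON using only the standard library. This is not the method used in this notebook, and `to_feature` is a hypothetical helper name.

```python
import json


def to_feature(name, address, lat, lng):
    """Build a GeoJSON Feature; GeoJSON positions are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lng, lat]},
        "properties": {"Name": name, "Address": address},
    }


# Two rows taken from the table above, given as (name, address, lat, lng)
rows = [
    ("Westlake Station", "4th Ave & Pine St., Seattle, WA", 47.611179, -122.3376435),
    ("Pioneer Square Station", "3rd Ave & James St., Seattle, WA", 47.602755, -122.3314425),
]
collection = {
    "type": "FeatureCollection",
    "features": [to_feature(*row) for row in rows],
}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```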


In [18]:
# Make map with custom icon and pop up
myMap = folium.Map(location=[47.5399, -122.3001], zoom_start=11,
                   tiles='openstreetmap')
for i,row in gdf.iterrows():
    folium.Marker([row.geometry.y, row.geometry.x],
                  icon = folium.features.CustomIcon('http://205.166.161.233/MyRide/css/ico/bus_pointer.svg', 
                                                    icon_size=(35, 35)),
                  popup= row.Name + "<br/>" + row.Address).add_to(myMap)
myMap
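
The map center above is hard-coded. One possible alternative is to derive it from the station coordinates themselves. The sketch below takes the plain mean of latitude and longitude, which is a reasonable approximation over an area this small. `map_center` is a hypothetical helper, and the input is only the five stations shown in the table, so the result will differ from the hard-coded center.

```python
def map_center(points):
    """Return [mean_lat, mean_lng] for a list of (lat, lng) pairs."""
    if not points:
        raise ValueError("need at least one point")
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return [sum(lats) / len(lats), sum(lngs) / len(lngs)]


# (lat, lng) pairs for the five stations listed in the table
stations = [
    (47.64911603275507, -122.3045020146049),
    (47.6192193967835, -122.3208259682629),
    (47.611179, -122.3376435),
    (47.60702099999999, -122.3353215),
    (47.602755, -122.3314425),
]
center = map_center(stations)
print(center)
```

The resulting list could be passed as `location=` to `folium.Map` in place of the fixed coordinates.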

In [10]:
gdf.to_file('stations.shp')       # Export shp
myMap.save('StationsMap.html')    # Save HTML doc

#### Challenges
I first used OpenCageData to convert addresses into longitudes and latitudes, but 3 stations were lost in the process, so I looked for another option and used the ArcGIS provider in geocoder instead. I'm also not entirely sure about the difference between Pandas and GeoPandas. Is it correct that Pandas creates a DataFrame table and GeoPandas creates a GeoDataFrame?
