# Request Routes
This notebook requests supplementary route data from the Mountain Project API.

In [36]:
import os
import re
import math
import yaml
import requests
import json
import pandas as pd

### Get the Route IDs

The API provides a few cumbersome methods, but the most useful one for this project is the `getRoutes()` method, which takes nine-digit route IDs as its argument. Luckily, the URL for each route contains this identifier, and `"URL"` is a column in our scraped.csv file.

In [4]:
scraped = pd.read_csv("scraped.csv")

In [5]:
scraped["URL"][0]

'https://www.mountainproject.com/route/105979968/dreamscape'

From each row, we need to pull out the nine-digit identifier in the URL (`105979968` in this example).

In [6]:
re.search(r"\d{9}", scraped["URL"][0]).group() # For one row (raw string so \d is not treated as a string escape)

'105979968'

Add this identifier as a new column in the dataframe.

In [7]:
get_id = lambda entry: int(re.search(r"\d{9}", entry).group()) # for all rows (and as integers)
scraped["route_id"] = scraped["URL"].apply(get_id)
scraped.head(2)

Unnamed: 0.1,Unnamed: 0,Route,Location,URL,Avg Stars,Your Stars,Route Type,Rating,Pitches,Length,Area Latitude,Area Longitude,route_id
0,0,Dreamscape,Sun Wall > Sand Rock > Alabama,https://www.mountainproject.com/route/10597996...,3.8,-1,Sport,5.11c,1,75.0,34.18041,-85.81555,105979968
1,1,Comfortably Numb,The Pinnacle > Sand Rock > Alabama,https://www.mountainproject.com/route/10590519...,3.6,-1,"Trad, TR",5.9,1,120.0,34.17948,-85.81775,105905196
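
One caveat with the `re.search(...).group()` pattern: `re.search` returns `None` when a URL contains no match, so `.group()` would raise an `AttributeError` on such a row. A minimal sketch of a more defensive extractor follows. The helper name `extract_route_id`, the choice to anchor on the `/route/` path segment, and the second test URL are assumptions for illustration, not part of the original notebook.

```python
import re

# Hypothetical pattern: match the digits that follow "/route/" in the URL.
ROUTE_ID_PATTERN = re.compile(r"/route/(\d+)")

def extract_route_id(url):
    """Return the numeric route ID from a route URL, or None if there is none."""
    match = ROUTE_ID_PATTERN.search(url)
    return int(match.group(1)) if match else None

# The first URL is the one shown above; the second is a made-up URL with no ID.
print(extract_route_id("https://www.mountainproject.com/route/105979968/dreamscape"))  # 105979968
print(extract_route_id("https://example.com/no-id-here"))  # None
```

Applied with `scraped["URL"].apply(extract_route_id)`, rows without an ID could then be inspected instead of stopping the cell with an exception.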


### Make the requests

I'll first need to load in my private key for using the API.  
You'll have to request access (it's pretty easy) on the site to get your own key.

In [8]:
yaml_path = "~/mtn-proj/.mp-pkey.yaml"

In [9]:
try:
    with open(os.path.expanduser(yaml_path)) as file:
        pkey = yaml.safe_load(file)
except FileNotFoundError:
    print("Cannot find file {}".format(yaml_path))

Here, I build the API request URL by appending one route ID at a time to a template string, then stripping the leftover placeholders.

In [48]:
def build_request_with_args(id_list_section):
    request_template = "https://www.mountainproject.com/data/get-routes?routeIds={}&key={}".format("{}{}",pkey)
    for routeid in id_list_section:
        request_template = request_template.format(routeid, ",{}{}")
    return request_template.replace(",{}{}", "") # to get rid of the last comma and template holders --> ,{}{}
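
The same comma-separated URL could also be assembled in one step with `str.join`. This is only a sketch of an alternative: `build_route_url` is a hypothetical name, it takes the key as an explicit argument instead of reading the global `pkey`, and `"YOUR_KEY"` is a placeholder rather than a real key.

```python
def build_route_url(route_ids, key):
    """Build a get-routes URL for a batch of route IDs (hypothetical helper)."""
    base = "https://www.mountainproject.com/data/get-routes"
    # Convert each ID to a string and join them with commas.
    id_string = ",".join(str(route_id) for route_id in route_ids)
    return "{}?routeIds={}&key={}".format(base, id_string, key)

# "YOUR_KEY" is a placeholder, not a real API key.
example_url = build_route_url([105979968, 105905196], "YOUR_KEY")
print(example_url)
# https://www.mountainproject.com/data/get-routes?routeIds=105979968,105905196&key=YOUR_KEY
```

Joining avoids the repeated `format` calls, so a key that happens to contain curly braces cannot interfere with the template.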

Create a folder to catch all the downloads.

In [57]:
os.makedirs("json-routes", exist_ok=True) # no error if the folder already exists

Next, I go through the list of route IDs, request data for 200 routes at a time, and save each response to a file in the folder we just made.

In [58]:
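id_list = list(scraped["route_id"])
batch_size = 200  # number of route IDs sent per request
num_queries = math.ceil(len(id_list) / batch_size)

for query_num in range(num_queries):
    # Slice out the next batch of up to 200 IDs
    first_index = query_num * batch_size
    next_200 = build_request_with_args(id_list[first_index:first_index + batch_size])

    api_response = requests.get(next_200)
    api_response.raise_for_status()  # stop on an HTTP error rather than saving a bad response

    # Save each batch to its own numbered file
    with open("json-routes/route_set_{}.JSON".format(query_num + 1), "w") as response_file:
        json.dump(api_response.json(), response_file, indent=4)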
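
The batching step can be checked on its own, without touching the API. The sketch below uses a hypothetical `chunked` generator and a stand-in list of 500 integers instead of real route IDs. It shows that slicing in steps of 200 gives two full batches and one partial batch.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in data: 500 fake IDs, split into batches of 200.
batches = list(chunked(list(range(500)), 200))
print([len(batch) for batch in batches])  # [200, 200, 100]
```

Python slices that run past the end of a list simply stop at the end, which is why the last batch needs no special handling.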

This took about 5 seconds for 500 routes, and they take up 400KB of space.  

So if there are 50,000 routes in North America (100 times this sample), it will take roughly 8 minutes to download all the routes and they will take up about 40MB of space. Totally doable.
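
The estimate is a straight linear scaling of the 500-route sample. The sketch below spells out that arithmetic, assuming download time and file size grow in proportion to the number of routes (which ignores any per-request overhead or throttling). The 50,000 figure is the rough guess from the text, not a measured count.

```python
sample_routes = 500     # routes downloaded in the test run
sample_seconds = 5      # approximate time for the test run
sample_kb = 400         # approximate size on disk for the test run
total_routes = 50_000   # rough guess for North America

scale = total_routes / sample_routes          # how many times larger the full job is
est_minutes = sample_seconds * scale / 60     # seconds -> minutes
est_mb = sample_kb * scale / 1000             # KB -> MB

print(round(est_minutes, 1), "minutes")  # 8.3 minutes
print(est_mb, "MB")                      # 40.0 MB
```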