# But where are my closest wineries?
#### Problem: The Oregon Wine Pass is great and its wineries offer great deals, but crawling around the website to find the closest ones sucks. Also, wouldn't it be great to know the ratings before you go?
1. Scrape Wine Passport site
2. Call Google API for map distances
3. todo: Get reviews from Yelp
4. todo: Topic model the Yelp reviews
5. todo: Return df with Name, address, distance, ratings, review topics



In [1]:
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import json
from datetime import datetime, timedelta
import re

In [2]:
# i keep address and api key in separate config because privacy
with open("wine_config.json", "r") as f:
    wine_config = json.load(f)

# here's the OWP's handy dandy website
url = r'https://oregonwinepass.com/sponsor-type/wineries/'
api_base = 'https://maps.googleapis.com/maps/api/directions/json?'
my_addr = wine_config['address']
myk = wine_config['map_api']
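
For reference, a minimal sketch of what `wine_config.json` is assumed to contain. The two keys, `address` and `map_api`, are the ones read in the cell above; the values are placeholders, and `load_config` is a hypothetical helper that is not part of the notebook.

```python
import json

REQUIRED_KEYS = ("address", "map_api")  # the two keys the notebook reads

def load_config(path):
    """Load the config and fail early if a key the notebook needs is missing."""
    with open(path, "r") as f:
        config = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise KeyError(f"wine_config is missing: {missing}")
    return config

# placeholder values only (assumption), never commit the real ones
example_config = {"address": "123 Example St, Salem", "map_api": "YOUR_API_KEY"}
```

Failing at load time gives a clearer error than a `KeyError` halfway through the API calls.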

##### Scrape with bs4

In [3]:
raw_html = get(url)
soup = BeautifulSoup(raw_html.content, 'html.parser')

In [4]:
wineries = soup.find_all("h3", class_="entry-title scroll-top")
addresses = soup.find_all("span", class_="addr")
winery_list = [x.contents[1].string.strip() for x in wineries]
addr_list = [x.contents[0].string.strip() for x in addresses]

In [5]:
details = soup.find_all("div", class_="details")
details_list = [x.findChildren("li") for x in details]
wine_dict = {}

##### Winery details are a little funky since there is more than one for each winery

In [6]:
# that shib is buried in there, let's get her out jim
details_list[0][0].contents[0]

'2-for-1 Wine Tasting'

##### I'm certain there is a more efficient way of doing this

In [7]:
for i in range(len(winery_list)):
    sub_details = details_list[i]
    sub_list = [x.contents[0] for x in sub_details]
    wine_dict[winery_list[i]] = sub_list
    
wine_details_df = pd.DataFrame.from_dict(wine_dict, orient='index')
wine_details_df.head()    

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| AlexEli Vineyard | 2-for-1 Wine Tasting | 10% off 4 Bottle Purchase | | |
| Angel Vine | Bonus pour with Tasting Fee | 10% off all bottle purchases | 15% off case purchase – mix & match | |
| Anindor Vineyards | Complimentary tasting for 2 | 10% off wine purchases | | |
| ArborBrook Vineyards | 2 for 1 tasting | 10% off on wine purchases | | |
| Archer Vineyard | 2 for 1 tasting (regularly $20/person) | waived tasting fee with a purchase of any 4 wi... | 20% off $100 wine purchase | |
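
On the "more efficient way" point: a sketch of the same name-to-deals mapping built with `zip` and a dict comprehension instead of an index loop. The two lists are small stand-ins for `winery_list` and the text pulled out of `details_list`, using values from the output above.

```python
# stand-ins for winery_list and the per-winery deal text (sample data only)
names = ["AlexEli Vineyard", "Angel Vine"]
deals = [
    ["2-for-1 Wine Tasting", "10% off 4 Bottle Purchase"],
    ["Bonus pour with Tasting Fee", "10% off all bottle purchases"],
]

# zip pairs each name with its deals, so no index bookkeeping is needed
sample_wine_dict = {name: list(deal_list) for name, deal_list in zip(names, deals)}
```

In the notebook itself the equivalent would presumably be `{name: [li.contents[0] for li in lis] for name, lis in zip(winery_list, details_list)}`. Note that `zip` silently stops at the shorter list, so a length check like the assert in the next step is still worth keeping.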


##### Gut check that we didn't mess anything up thus far

In [8]:
assert len(wine_details_df) == len(addr_list), 'you done did messed up'
# thank god these are in order, amirite?
wine_details_df['address'] = addr_list

In [9]:
wine_details_df.head()    

| | 0 | 1 | 2 | 3 | address |
|---|---|---|---|---|---|
| AlexEli Vineyard | 2-for-1 Wine Tasting | 10% off 4 Bottle Purchase | | | 35803 South Highway 213 , Molalla |
| Angel Vine | Bonus pour with Tasting Fee | 10% off all bottle purchases | 15% off case purchase – mix & match | | 2025 SE 7th Avenue, Portland |
| Anindor Vineyards | Complimentary tasting for 2 | 10% off wine purchases | | | 1171 Vintage Dr, Elkton |
| ArborBrook Vineyards | 2 for 1 tasting | 10% off on wine purchases | | | 17770 NE Calkins Lane , Newberg |
| Archer Vineyard | 2 for 1 tasting (regularly $20/person) | waived tasting fee with a purchase of any 4 wi... | 20% off $100 wine purchase | | 32230 NE Old Parrett Mountain Rd, Newberg |


In [10]:
def replace_reserved(string):
    '''remove reserved url/uri characters'''
    for char in ';/?:@&=+$,#':
        string = string.replace(char, '')

    return string
    
def time_string(string):
    '''turn google's human-friendly duration text (e.g. "1 hour 5 mins") into H:MM:SS'''
    # pull each unit out on its own so "1 hour", "5 mins" and "2 hours 1 min" all parse
    hours = re.search(r'(\d+)\s*hour', string)
    mins = re.search(r'(\d+)\s*min', string)
    t = timedelta(hours=int(hours.group(1)) if hours else 0,
                  minutes=int(mins.group(1)) if mins else 0)

    return str(t)
    
    
def route_data(address):
    '''return distance and formatted duration for a given address'''
    address = replace_reserved(address)
    # the winery is the origin of the route; my address is the destination
    origin = address.replace(' ', '+')
    api_str = api_base + 'origin=' + origin + '&destination=' + my_addr + '+Oregon' + '&key=' + myk
    # call api
    resp = get(api_str)
    json_resp = resp.json()
    # first leg of the first route holds the totals for a single-stop trip
    leg = json_resp['routes'][0]['legs'][0]
    dist = leg['distance']['text']
    dura = time_string(leg['duration']['text'])

    return (dist, dura)
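
An alternative to stripping reserved characters is to percent-encode them, which keeps commas and other punctuation in the address intact. A minimal sketch using the standard library's `urllib.parse.urlencode`; `build_directions_url` is a hypothetical helper, and the addresses and key below are placeholders.

```python
from urllib.parse import urlencode

API_BASE = "https://maps.googleapis.com/maps/api/directions/json?"

def build_directions_url(origin, destination, key):
    """Return a Directions API URL with every parameter percent-encoded."""
    params = {"origin": origin, "destination": destination, "key": key}
    # urlencode escapes reserved characters and turns spaces into '+'
    return API_BASE + urlencode(params)

example_url = build_directions_url(
    "108 5th St, Amity", "123 Example St, Salem", "YOUR_API_KEY"
)
```

This also encodes the destination address, which the string concatenation in `route_data` passes through untouched.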


##### Loop through all addresses and return distance and duration in lists

In [11]:
dists = []
durations = []

for address in wine_details_df['address']:
    dist, duration = route_data(address)
    dists.append(dist)
    durations.append(duration)
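
The loop above assumes every request comes back with a route. The Directions API response also carries a top-level `status` field, plus numeric `value` entries (metres for distance, seconds for duration) next to the `text` ones, so a more defensive parse might look like the sketch below. `parse_leg` is a hypothetical helper, and `sample` is a hand-written response fragment, not real API output.

```python
from datetime import timedelta

def parse_leg(json_resp):
    """Return (metres, 'H:MM:SS') for the first leg, or None if no route came back."""
    if json_resp.get("status") != "OK" or not json_resp.get("routes"):
        return None
    leg = json_resp["routes"][0]["legs"][0]
    metres = leg["distance"]["value"]    # distance in metres
    seconds = leg["duration"]["value"]   # duration in seconds
    return metres, str(timedelta(seconds=seconds))

# hand-written fragment shaped like a Directions response (assumption)
sample = {
    "status": "OK",
    "routes": [{"legs": [{
        "distance": {"text": "21.3 mi", "value": 34279},
        "duration": {"text": "28 mins", "value": 1680},
    }]}],
}
```

Working from the numeric values sidesteps parsing strings like "1 hour 5 mins" altogether, and returning `None` lets the caller skip a winery whose address did not resolve instead of crashing the whole loop.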
    

##### Add distance and duration to dataframe, sort

In [12]:
wine_details_df['duration'] = durations
wine_details_df['distances'] = dists
wine_details_df.sort_values(by='duration', inplace=True)

wine_details_df

| | 0 | 1 | 2 | 3 | address | duration | distances |
|---|---|---|---|---|---|---|---|
| Namaste Vineyards | 2-for-1 Tasting | 10% off any purchase of 3 bottles or more | | | 3250 Pacific Hwy, Independence | 0:28:00 | 21.3 mi |
| Willamette Valley Vineyards | Complimentary tasting for 2 | 10% off wine purchase | Passport Member benefits does NOT apply on win... | | 8800 Enchanted Way SE, Turner | 0:35:00 | 25.5 mi |
| Johan Vineyards | Complimentary tasting for 2 | | | | 4285 N Pacific Hwy W, Rickreall | 0:40:00 | 30.1 mi |
| Coria Estates | Complimentary Tasting for 2 | 10% off bottle purchases | | | 8252 Redstone Ave SE, Salem | 0:40:00 | 27.7 mi |
| Van Duzer Vineyards | 2 for 1 tasting | 10% off of 3 bottles or more | | | 11975 Smithfield Rd, Dallas | 0:41:00 | 31.1 mi |
| Pfeiffer Winery | enjoy 2 complimentary pinot plus tastings in t... | | | | 25040 Jaeg Road, Junction City | 0:42:00 | 28.6 mi |
| Silas Wines | 2 for 1 tasting | 10% off a purchase of 6 or more bottles | | | 100 5th St. , Amity | 0:46:00 | 37.5 mi |
| Franny Beck Wines | Complimentary tasting for two with a one bottl... | 10% off with a six bottle purchase | | | 108 5th St, Amity | 0:46:00 | 37.5 mi |
| Coelho Winery | 2 for 1 tasting | 10% off wine purchase on 6 or more bottles | | | 111 5th St., Amity | 0:46:00 | 37.5 mi |
| Mystic Wine | 2-for-1 Tasting | 5% off of wine purchase | | | 11931 SE Hood View Road , Amity | 0:54:00 | 43.2 mi |
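
One caveat on the sort: `duration` holds strings, so the ordering is lexicographic and would put `10:05:00` ahead of `2:30:00`. That does not bite for the short drives shown here, but a sketch of sorting on a parsed value instead is below. `to_seconds` is a hypothetical helper and the sample strings are illustrative.

```python
def to_seconds(duration_str):
    """Convert an 'H:MM:SS' string into total seconds."""
    hours, minutes, seconds = (int(part) for part in duration_str.split(":"))
    return hours * 3600 + minutes * 60 + seconds

sample_durations = ["10:05:00", "0:28:00", "2:30:00"]
by_string = sorted(sample_durations)                 # lexicographic order
by_time = sorted(sample_durations, key=to_seconds)   # actual travel-time order
```

With a reasonably recent pandas (1.1 or later, which added the `key` argument), the same idea should carry over as `wine_details_df.sort_values(by='duration', key=lambda s: s.map(to_seconds))`.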


In [17]:
wine_details_df.to_csv('oregon_wineries.csv')

# Yelp scraper