## Vivino Webscraper 

#### Introduction: 
Vivino is a popular wine enthusiast website database of information about wine from around the world. This project is a proof of concept for scraping data from Vivino for the purposes of wine understanding the wine market in Spain. This proof of concept has some limitations. First, the list of wines scraped are not a complete and comprehensive view of Spanish wine, as wine culture in Spain has traditionally been that of small production and barrel sales. For this reason, many small wine producers are not included in Vivinos database and therefore not represented in the data. Second, the wines shown only include wines currently available for purchase. Lastly, as a consumer database, the results of ratings may not represent professional opinions and may be limited by the 5-star rating system due to its overly simplistic nature. 

In [1]:
import requests
import json
import pandas as pd
import time

# Instantiate a dictionary of headers
# We only need to `manipulate` an User-Agent key
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
}

# Instantiate a dictionary of query strings
# Defines the only needed payload
payload = {
        "country_codes[]": ["au"],  # "FR", "IT", "DE", "CL", "PT", "AU", "AT", "AR", "US" <-- can add more country codes here
        "currency_code": "AUD",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": 1,
        "price_range_max": "500",
        "price_range_min": "0",
        "wine_type_ids[]": "1",
}

# Performs an initial request and gathers the amount of results
r = requests.get('https://www.vivino.com/api/explore/explore?',
                 params=payload, headers=headers)
n_matches = r.json()['explore_vintage']['records_matched']


In [4]:
# Create Dataframe
column_names=["Winery", "Year", "Wine ID", "Wine", "Rating", "num_review", "price",'Country','Region']
df = pd.DataFrame(columns = column_names)


In [None]:
# Iterates through the amount of possible pages
# A page is defined by n_matches divided by 25 (number of results per page)
dfs = []
for i in range(int(n_matches / 25)):
    # Adds the page on the payload
    payload['page'] = i + 1

    print(f'Requesting data from page: {payload["page"]}')

    # Performs the request and saves the matches
    r = requests.get('https://www.vivino.com/api/explore/explore?',
                 params=payload, headers=headers)
    # matches = r.json()['explore_vintage']['matches']
    results = [
        (
            t["vintage"]["wine"]["winery"]["name"],
            t["vintage"]["year"],
            t["vintage"]["wine"]["id"],
            f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
            t["vintage"]["statistics"]["ratings_average"],
            t["vintage"]["statistics"]["ratings_count"],
            t["prices"][0]["amount"],
            t['vintage']['wine']['region']['country']['name'],
            t['vintage']['wine']['region']['name'],

        )
        for t in r.json()["explore_vintage"]["matches"] 
    ]       

    df2 = pd.DataFrame(results, columns=column_names)


df = pd.merge([df, df2], axis=True)
    

Requesting data from page: 1


AttributeError: 'DataFrame' object has no attribute 'append'

In [81]:
df

Unnamed: 0,Winery,Year,Wine ID,Wine,Rating,num_review,price,Country,Region
0,Viña Oria,2019,1812139,Garnacha 2019,3.5,327,2.50,España,Cariñena
1,Ayuso,2017,2084964,Abadia del Roble Tinto 2017,2.9,52,2.54,España,La Mancha
2,Castillo de Liria,2020,1643745,Bobal - Shiraz 2020,3.4,85,2.56,España,Valencia
3,Castillo de Liria,2019,1643745,Bobal - Shiraz 2019,3.6,305,2.56,España,Valencia
4,Finca Venta de Don Quijote,2020,7166356,Tempranillo - Syrah 2020,3.6,129,2.72,España,Castilla
...,...,...,...,...,...,...,...,...,...
20,Loxera,2017,1895952,Sentada Sobre la Bestia 2017,3.7,116,12.90,España,Valencia
21,Vinos del Viento - Michael Cooper,2015,5049890,Seleccion Especial Cariñena 2015,4.0,51,12.90,España,Cariñena
22,Luzon,2018,1230455,Altos de Luzón 2018,3.9,261,12.90,España,Jumilla
23,Tridente,2018,2185839,Triton Tinta de Toro 2018,3.8,174,12.90,España,Toro


In [None]:
 # Iterates through every match
    for result in results:
        # Defines the wine's identifier
        _id = result['Wine ID']

        # Defines a page counter
        page_counter = 1

        # Performs an all-time true loop
        while True:
            print(f'Requesting reviews from wine: {_id} and page: {page_counter}')

            # Performs the request and saves the reviews
            r = requests.get(f'https://www.vivino.com/api/wines/{_id}/reviews?per_page=50&page={page_counter}',
                            # headers=headers)
            reviews = r.json()['reviews']
            df = df.append(reviews)

           print(f'Number of reviews: {len(reviews)}')

            # If there are no reviews anymore,
            # it indicates that the loop can be broken
           if len(reviews) == 0:
                # Breaks the loop
                break

            # Otherwise, increments the counter
           page_counter += 1


In [83]:
df.to_csv(r"C:\SD card\Documents\Data Analytics\Vivino Webscraper\vivino.csv", index=False)