# Arbitrage opportunities

### Steps for finding arbitrage opportunities:

#### Pre - algorithm
1. Setup
    - Reading the Vinmonopolet wines from a .csv file and storing them in a dataframe
    - Reading the Vivino wines from a .csv file and storing them in a dataframe
    - Setting up a dictionary to store the wines with an arbitrage opportunity

#### Algorithm
2. Creating a double for-loop to iterate over all pairs of wines from the two datasets
3. Using the SequenceMatcher class from Python's difflib module, which takes two strings and returns a similarity ratio between 0 and 1.
    - Vinmonopolet always includes both producer and wine name in the name; Vivino doesn't always include both.
    - Therefore we require a similarity ratio above 0.85 (85%) rather than an exact match.
4. If two wines are more than 85% similar, we check that their years match (which we consider important) by comparing the last word of each name
5. Finally, we check whether the Vinmonopolet price in Norway is lower than the market price on Vivino
6. If the wine passes all the checks, we store it in the dictionary as an arbitrage opportunity
7. Converting the dictionary to a dataframe
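
As a minimal sketch of step 3, the snippet below shows how SequenceMatcher scores two names. The helper `name_similarity` and the example wine names are made up for illustration; only the `SequenceMatcher(None, a, b).ratio()` call and the 0.85 threshold come from the notebook.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return the SequenceMatcher ratio (0.0 to 1.0) of two lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical names: the first includes the producer, the second does not
full_name = "Example Estate Pinot Noir 2018"
short_name = "Pinot Noir 2018"

score = name_similarity(full_name, short_name)
is_match = score > 0.85  # same threshold as in the algorithm
print(f"similarity: {score:.2f}, match: {is_match}")
```

A name that lacks the producer scores noticeably lower than an identical one, so the threshold is a trade-off between missing real matches and accepting false ones.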

#### Post - algorithm
8. Writing the dataframe from step 7 to a .csv file

In [1]:
import pandas as pd
from difflib import SequenceMatcher

In [3]:
#Load the saved Vinmonopolet wines, so the slow earlier step does not have to be run again
df = pd.read_csv("wines_vinmonopolet.csv")

#Read the dataset from Vivino (stored as a csv-file)
df_vivino = pd.read_csv("wine_scrape_clean_result.csv")


#Creating a dictionary to store values
arbitrage_wines = {
    "name" : [],
    "vpolet_price" : [],
    "vivino_price" : [],
    "region" : [],
    "country" : [],
    "ratings" : []
}


### Algorithm

In [None]:
#Double for-loop to find all of the matching wines (similarity ratio above 0.85)

for i in range(df.shape[0]):
    for j in range(df_vivino.shape[0]):

        #Lower-cased names; Vivino keeps producer and wine name in separate columns, so we join them
        #str() guards against missing values, which pandas reads as float NaN
        wine1 = str(df.loc[i, "Name"]).lower()
        wine2 = str(df_vivino.loc[j, "WineBrand"]).lower() + " " + str(df_vivino.loc[j, "WineName"]).lower()

        #Checking if the names have a similarity ratio above 0.85
        if SequenceMatcher(None, wine1, wine2).ratio() > 0.85:

            #Year always comes last (if there is a number), so we compare the last word of each name as strings
            if wine1.split()[-1] == wine2.split()[-1]:

                #Checking whether there is an arbitrage opportunity
                if float(df.loc[i, "Price"]) < float(df_vivino.loc[j, "WinePrice"]):

                    #If there is an arbitrage opportunity, we add the wine to the arbitrage_wines dictionary
                    arbitrage_wines["name"].append(df.loc[i, "Name"])
                    arbitrage_wines["vpolet_price"].append(df.loc[i, "Price"])
                    arbitrage_wines["region"].append(df.loc[i, "Region"])
                    arbitrage_wines["country"].append(df.loc[i, "Country"])
                    arbitrage_wines["vivino_price"].append(df_vivino.loc[j, "WinePrice"])
                    arbitrage_wines["ratings"].append(df_vivino.loc[j, "WineRating"])

                    #Print a separator for each opportunity found
                    print("\n--------------------\n")

                    #Stop at the first opportunity found for this Vinmonopolet wine
                    break

#Storing all arbitrage opportunities in a dataframe
arbitrage_wines_df = pd.DataFrame(arbitrage_wines)
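
The same three checks can also be sketched on plain Python records, which makes them easy to try on a few rows before running the full loop. This is only an illustration: the function `find_arbitrage` and the example rows are hypothetical, while the column names, the 0.85 threshold and the last-word comparison are taken from the cell above. The sketch builds each combined Vivino name once up front instead of once per comparison.

```python
from difflib import SequenceMatcher


def find_arbitrage(vinmonopolet, vivino, threshold=0.85):
    """Return one dict per Vinmonopolet wine that is cheaper than a matching Vivino wine.

    Both arguments are lists of dicts that use the notebook's column names.
    """
    # Build each combined Vivino name once, not once per pair
    vivino_names = [
        (str(row["WineBrand"]) + " " + str(row["WineName"])).lower() for row in vivino
    ]

    found = []
    for wine in vinmonopolet:
        name = str(wine["Name"]).lower()
        for other, other_name in zip(vivino, vivino_names):
            # Check 1: name similarity must be above the threshold
            if SequenceMatcher(None, name, other_name).ratio() <= threshold:
                continue
            # Check 2: last word (the year, if present) must be identical
            if name.split()[-1] != other_name.split()[-1]:
                continue
            # Check 3: Vinmonopolet must be cheaper than Vivino
            if float(wine["Price"]) < float(other["WinePrice"]):
                found.append({
                    "name": wine["Name"],
                    "vpolet_price": wine["Price"],
                    "vivino_price": other["WinePrice"],
                    "region": wine["Region"],
                    "country": wine["Country"],
                    "ratings": other["WineRating"],
                })
                break  # stop at the first opportunity for this wine
    return found


# Hypothetical example rows
vinmonopolet_rows = [
    {"Name": "Example Estate Pinot Noir 2018", "Price": 199.0,
     "Region": "Example Region", "Country": "Example Country"},
    {"Name": "Example Estate Pinot Noir 2017", "Price": 300.0,
     "Region": "Example Region", "Country": "Example Country"},
]
vivino_rows = [
    {"WineBrand": "Example Estate", "WineName": "Pinot Noir 2018",
     "WinePrice": 249.0, "WineRating": 4.1},
    {"WineBrand": "Example Estate", "WineName": "Pinot Noir 2017",
     "WinePrice": 250.0, "WineRating": 4.0},
]

result = find_arbitrage(vinmonopolet_rows, vivino_rows)
print(result)
```

With the dataframes from this notebook, the inputs could be produced with `df.to_dict("records")` and `df_vivino.to_dict("records")`, and the returned list passed to `pd.DataFrame`.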

### Post - Algorithm

In [None]:
#Create a csv-file with the dataset
arbitrage_wines_df.to_csv("arbitrage_wines.csv")
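
By default, `to_csv` also writes the dataframe's row index as an unnamed first column. If that column is not wanted in the output file, `index=False` leaves it out. The sketch below uses a hypothetical one-row dataframe and an in-memory buffer instead of a real file, purely to show the effect on the header line.

```python
import io

import pandas as pd

# Hypothetical one-row result, standing in for arbitrage_wines_df
example_df = pd.DataFrame({
    "name": ["Example Estate Pinot Noir 2018"],
    "vpolet_price": [199.0],
    "vivino_price": [249.0],
})

# Write to an in-memory buffer; index=False leaves out the row-number column
buffer = io.StringIO()
example_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

header = csv_text.splitlines()[0]
print(header)
```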