# Argos Search Results Notebook

## Installations

Import the modules needed to run this notebook. Ensure selenium, webdriver-manager, selectorlib, and fake-useragent have been installed before running it.

In [None]:
import requests
import json
import pandas as pd
import numpy as np
import time
import warnings

In [None]:
from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selectorlib import Extractor

## Loading Pre-Documented Gender Stereotyped Toys

Load predoc_stereotyped_items.csv, a CSV file containing 72 rows.

In [None]:
stereo_toys = pd.read_csv('../predoc_info/predoc_stereotyped_items.csv', delimiter=',')
stereo_toys[:10]

In [None]:
len(stereo_toys)

## Loading List of Toys Collected from Previous Research

all_items.txt contains a list of strings, where each string represents a toy that will be searched on Argos. This text file contains 166 rows.

In [None]:
with open('../predoc_info/all_items.txt') as f:
    all_items = f.read().splitlines()

In [None]:
len(all_items)

## Trial

Create a short list of 6 toys from all_items (the last 6 of its 166 entries). The trial list lets the following functions be tested on a small sample rather than on all 166 toys.

In [None]:
trial = all_items[160:]
trial

In [None]:
len(trial)

In [None]:
generic = ['toys', 'books', 'learning material', 'games', 'sports']

In [None]:
gender = ['boys', 'girls', 'neutral']

## Scraping Functions

### Unique Identifier Function

This function scrapes the EAN of each toy listed on the current results page.

In [None]:
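def ean(driver):
    # Read the aria-labelledby attribute of every matching anchor on the page;
    # this notebook treats that value as the product's EAN.
    ean_list = []
    anchors = driver.find_elements('xpath', '//a[@aria-labelledby]')
    for anchor in anchors:
        ean_list.append(anchor.get_attribute('aria-labelledby'))
    return ean_list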
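
If the list returned by ean contains repeated identifiers, they can be collapsed while keeping the order in which products appeared on the page. The following is a minimal sketch; the helper name `unique_in_order` is hypothetical and not part of the original notebook.

```python
def unique_in_order(values):
    # dict keys are unique and keep insertion order, so this drops
    # later repeats while preserving the first occurrence of each value.
    return list(dict.fromkeys(values))


# Illustrative identifiers only, not real scraped data.
print(unique_in_order(['a1', 'b2', 'a1', 'c3', 'b2']))
```

Keeping the first occurrence matters here because a product's position in the search results is part of what the notebook records.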

### Product Title Function

This function is used to scrape the name of each toy.

In [None]:
def item_info(driver):
    item = []
    elem = driver.find_elements('xpath', "//a[meta/@itemprop]")
    for i in elem:
        item.append(i.text)
    return item

### Product Link Function

This function is used to scrape the associated links of each toy.

In [None]:
def item_link(driver):
    href = []
    links = driver.find_elements('xpath', "//a[@data-test = 'component-product-card-link']")
    for link in links:
        href.append(link.get_attribute('href'))
    return href

### Search Function

This function calls the three functions above for one toy and one audience. Calling it with each value in gender produces the three queries per toy (for boys, for girls, for kids).

In [None]:
def search(item, who):
    if who == 'neutral':
        query = item + '-for-' + 'kids'
    else:
        query = item + '-for-' + who
    driver.get(f'https://www.argos.co.uk/search/{query}/?clickOrigin=searchbar:home:term:{query}')
    time.sleep(15)
    list_ean = ean(driver)
    item_list = item_info(driver)
    item_page = item_link(driver)
    return (list_ean, item_list), item_page
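
The query construction inside search can be checked without opening a browser. The sketch below separates it into two helpers, `build_query` and `build_search_url`, which are hypothetical names introduced for illustration. Percent-encoding the query with `urllib.parse.quote` is an assumption made here so that multi-word items such as 'learning material' yield a valid URL; the URL template itself is the one used in search.

```python
from urllib.parse import quote


def build_query(item, who):
    # 'neutral' is searched as "for kids"; other values are used as-is.
    target = 'kids' if who == 'neutral' else who
    return f'{item}-for-{target}'


def build_search_url(item, who):
    # Percent-encode spaces and other unsafe characters in the query.
    query = quote(build_query(item, who))
    return (f'https://www.argos.co.uk/search/{query}/'
            f'?clickOrigin=searchbar:home:term:{query}')


print(build_search_url('learning material', 'neutral'))
```

Printing the URLs for a few items before a full run is a cheap way to spot malformed queries.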

## Database Initialization

Initialize the DataFrames that store the scraped data.

In [None]:
columns1 = ['gender', 'query', 'result']
qr = pd.DataFrame(columns=columns1)
columns2 = ['gender', 'query', 'href']
qr_link = pd.DataFrame(columns=columns2)

## Running Queries for Boys, Girls, and Kids (Neutral)

This code scrapes the relevant data for every toy in all_items. The outer for loop currently iterates over all_items, the full list of toys. Replacing all_items with trial in that loop facilitates testing by running the code on a smaller sample.

In [None]:
from selenium.webdriver.chrome.service import Service

warnings.filterwarnings('ignore')
# Selenium 4 receives the driver path through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
data1 = []
data2 = []
try:
    for item in all_items:
        for g in gender:
            result, link = search(item, g)
            # One record per (gender, query) pair for each table.
            data1.append(dict(zip(columns1, [g, item, result])))
            data2.append(dict(zip(columns2, [g, item, link])))
            time.sleep(15)  # pause between requests
finally:
    driver.quit()  # shut the browser down even if a query fails
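
The record-building logic of this loop can be dry-run without a browser by passing the search function in as an argument. This is a sketch under stated assumptions: `collect` and `fake_search` are hypothetical names, and `fake_search` only mimics the return shape of search with made-up values.

```python
def collect(items, genders, search_fn, columns1, columns2):
    """Run search_fn for every (item, gender) pair and build both record lists."""
    data1, data2 = [], []
    for item in items:
        for g in genders:
            result, link = search_fn(item, g)
            data1.append(dict(zip(columns1, [g, item, result])))
            data2.append(dict(zip(columns2, [g, item, link])))
    return data1, data2


def fake_search(item, who):
    # Stand-in with the same return shape as search(): ((eans, titles), links).
    return (['id-1'], [f'{item} ({who})']), [f'https://example.invalid/{item}']


data1, data2 = collect(
    ['doll', 'truck'], ['boys', 'girls', 'neutral'], fake_search,
    ['gender', 'query', 'result'], ['gender', 'query', 'href'],
)
print(len(data1))  # 2 items x 3 audiences = 6 records
```

With 166 items, three audiences, and two 15-second pauses per query, a full run takes well over an hour, so checking the bookkeeping on a stub first can save time.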

Appending EAN data to previously initialized dataframe.

In [None]:
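# DataFrame.append was removed in pandas 2.0; pd.concat works on all versions.
qr = pd.concat([qr, pd.DataFrame(data1)], ignore_index=True)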
qr
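
Each result cell holds a tuple of two lists (identifiers and titles), which can be awkward to analyse after a round trip through CSV. One possible alternative, sketched here with a hypothetical helper `flatten_results`, is to expand every record into one row per product. The `rank` column is an assumption added for illustration, and the sample record is made up.

```python
def flatten_results(records):
    rows = []
    for rec in records:
        eans, titles = rec['result']
        # zip stops at the shorter list, so unmatched trailing entries are dropped.
        for rank, (ean_id, title) in enumerate(zip(eans, titles), start=1):
            rows.append({
                'gender': rec['gender'],
                'query': rec['query'],
                'rank': rank,
                'ean': ean_id,
                'title': title,
            })
    return rows


# Made-up record with the same keys as the entries in data1.
sample = [{'gender': 'girls', 'query': 'dolls',
           'result': (['e1', 'e2'], ['Doll A', 'Doll B'])}]
rows = flatten_results(sample)
print(rows)
```

The resulting list of dictionaries can be passed directly to pd.DataFrame to obtain a table with one product per row.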

Database of toys and their associated links.

In [None]:
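# DataFrame.append was removed in pandas 2.0; pd.concat works on all versions.
qr_link = pd.concat([qr_link, pd.DataFrame(data2)], ignore_index=True)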
qr_link

## Converting Data to CSV File

In [None]:
argos_search_results = pd.DataFrame()

In [None]:
argos_search_results = pd.concat([argos_search_results, qr], ignore_index=True)
# argos_search_results_link = pd.concat([argos_search_results_link, qr_link], ignore_index=True)

In [None]:
argos_search_results

Export data to CSV file.

In [None]:
argos_search_results.to_csv('argos_search_results.csv', index = False)