# [ Chapter 4 - Crowdsourced Relevance ] 
# Setting up the Retrotech Dataset

In [3]:
import sys
sys.path.append('..')
from aips import *  
from pyspark.sql import SparkSession
engine = get_engine()
spark = SparkSession.builder.appName("AIPS").getOrCreate()

## Download the Retrotech (Ecommerce) Products + Signals Dataset

In [1]:
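# Get datasets: clone the repository if it is missing, then pull the latest changes
![ ! -d 'retrotech' ] && git clone --depth 1 https://github.com/ai-powered-search/retrotech.git
! cd retrotech && git pull
# Extract both archives into data/retrotech, creating the directory if it does not exist
! cd retrotech && mkdir -p '../data/retrotech/' && tar -xvf products.tgz -C '../data/retrotech/' && tar -xvf signals.tgz -C '../data/retrotech/'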
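
If shell escapes are unavailable in your environment, the same extraction can be done from Python with the standard library. The sketch below is an assumption about how one might do it, not part of the book's `aips` helpers; `extract_archive` is a hypothetical name introduced here for illustration.

```python
import os
import tarfile

def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir, creating dest_dir if needed.

    Hypothetical helper for illustration; not part of the aips package.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip for .tgz files).
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest_dir)
    # Return the extracted top-level names so the caller can verify the result.
    return sorted(os.listdir(dest_dir))
```

For this notebook, a call such as `extract_archive("retrotech/products.tgz", "data/retrotech")` would mirror the first `tar` command above.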

## Get a Feel for the Product Catalog

### Listing 4.1

In [2]:
! cd data/retrotech/ && head products.csv
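
`head` prints raw lines, which can be hard to read when a description contains quoted commas. As a rough alternative, the sketch below parses the first few rows with Python's `csv` module. The column names come from the schema printed by Listing 4.2 later in this notebook; the two rows are placeholders made up for illustration, not records from the real dataset, and `peek_csv` is a hypothetical helper.

```python
import csv
import io
from itertools import islice

def peek_csv(lines, n=5):
    """Return the header and the first n rows of a CSV as dicts."""
    reader = csv.DictReader(lines)
    rows = list(islice(reader, n))
    return reader.fieldnames, rows

# Placeholder rows for illustration only; not taken from products.csv.
sample = io.StringIO(
    "upc,name,manufacturer,shortDescription,longDescription\n"
    "1,Example player,Example Co,short text,long text\n"
    '2,"Example case, black",Example Co,short text,long text\n'
)

fields, rows = peek_csv(sample, n=2)
print(fields)
for row in rows:
    print(row["upc"], row["name"])
```

To inspect the real file, pass an open file handle instead of the sample, for example `open("data/retrotech/products.csv", newline="")`.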

## Index the Products into the Search Engine

### Listing 4.2

In [4]:
products_collection = engine.create_collection("products")
products_collection.write_from_csv("data/retrotech/products.csv")

Wiping 'products' collection
Status: Failure; Response:[ {'responseHeader': {'status': 400, 'QTime': 39}, 'Operation delete caused exception:': 'org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : products', 'exception': {'msg': 'Could not find collection : products', 'rspCode': 400}, 'error': {'metadata': ['error-class', 'org.apache.solr.common.SolrException', 'root-error-class', 'org.apache.solr.common.SolrException'], 'msg': 'Could not find collection : products', 'code': 400}} ]
Creating 'products' collection
Status: Success
Loading products
products Schema: 
root
 |-- upc: long (nullable = true)
 |-- name: string (nullable = true)
 |-- manufacturer: string (nullable = true)
 |-- shortDescription: string (nullable = true)
 |-- longDescription: string (nullable = true)

Status: Success

Note: the `Status: Failure` response during the wipe step is expected on a first run. The collection does not exist yet, so there is nothing to delete, and the subsequent create and load steps both succeed.


## Verify Searches Work

### Listing 4.3

In [5]:
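def product_search_request(query):
    """Build a keyword search request for the products collection."""
    return {
        "query": query,
        # Fields to return for each matching document
        "fields": ["upc", "name", "manufacturer", "score"],
        # Return only the top 5 results
        "limit": 5,
        "params": {
            # Fields to match the query against
            "qf": "name manufacturer longDescription",
            "defType": "edismax",
            "indent": "true",
            # Highest score first; break ties by upc for a stable order
            "sort": "score desc, upc asc"
        }
    }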
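
The `sort` value `score desc, upc asc` ranks by relevance score and breaks ties on `upc`, so documents with equal scores always come back in the same order. The sketch below reproduces that ordering in plain Python over placeholder documents (made-up values, not real search results), purely to illustrate the tie-breaking behavior.

```python
# Placeholder documents for illustration only; not real search results.
docs = [
    {"upc": 30, "score": 1.5},
    {"upc": 10, "score": 2.0},
    {"upc": 20, "score": 1.5},
]

# Negating the score sorts it descending; upc then sorts ascending within ties.
ranked = sorted(docs, key=lambda d: (-d["score"], d["upc"]))
print([d["upc"] for d in ranked])  # [10, 20, 30]
```

Without the secondary sort key, the two documents scoring 1.5 could appear in either order from one request to the next.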

In [6]:
query = "ipod"
request = product_search_request(query)
response = products_collection.search(request)
display_product_search(query, engine.docs_from_response(response))

## Get a Feel for the Signals Data

In [7]:
! cd data/retrotech && head signals.csv

## Index the Signals into the Search Engine

### Listing 4.4

In [8]:
signals_collection = engine.create_collection("signals")
signals_collection.write_from_csv("data/retrotech/signals.csv")

## Success!

You have now indexed the Retrotech product catalog and signals into the search engine and run a sample query against the products collection. The results from the out-of-the-box keyword scoring function don't look very relevant yet, but we'll be working to improve that throughout the rest of this book!

In the next section, we'll take a look at our first crowdsourced AI-powered search technique: [Signals Boosting](2.signals-boosting.ipynb).