## **Funko Pop - Web Scraping Project**

### **Introduction**

> **To replicate this notebook, you may need to install some of the packages below with `pip`.**

### **Libraries**

In [1]:
import json
import math
import requests
import pandas as pd

from tqdm.notebook import tqdm

### **Verifying Request and Defining Variables**

The Funko API serves every query from a single endpoint. Let's define the URL!

In [2]:
url = 'https://www.funko.com/api/search/template'

Unlike the Just Watch project, this API expects a POST request to return the data.

The *collection* key value was taken from the requests the website itself makes, but it apparently makes little difference for this project.

In [3]:
post_data = {
    'collection': '241066344637',
    'page': 1,
    'pageCount': 50,
    'type': 'shop',
    'sort': {
        'title': 'asc'
    }
}

To send the payload in a POST request, we need to declare the content type of the body we are sending.

Let's create our request headers!

In [4]:
headers = {
    'content-type': 'application/json', 
}
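
The `content-type` header describes the body that the client sends, so the payload has to travel as a JSON string rather than as form fields. Here is a minimal sketch of that serialization step, using only the standard library; `example_payload` is a hypothetical stand-in that mirrors `post_data`:

```python
import json

# Hypothetical payload mirroring the shape of post_data above
example_payload = {
    'collection': '241066344637',
    'page': 1,
    'pageCount': 50,
    'type': 'shop',
    'sort': {'title': 'asc'},
}

# json.dumps turns the dict into the text that goes in the request body
body = json.dumps(example_payload)

# Decoding the body gives back an equal dict, so nothing is lost in transit
assert json.loads(body) == example_payload
print(type(body).__name__)
```

`requests.post` also accepts a `json=` argument that serializes the dict and sets the header itself; the explicit form is kept here because it matches the notebook.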

Now, let's check if the request is working!

In [5]:
req = requests.post(url, data=json.dumps(post_data), headers=headers)
req.status_code

200

Great! And now the total number of results!

In [6]:
total = req.json()['total']
total

1482

The API is paginated, so let's calculate how many pages there are!

In [7]:
n_requests = math.ceil(total/post_data['pageCount'])
n_requests

30
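
Rounding up matters because the last page is usually only partly filled. Here is a small sketch of the arithmetic, with hypothetical helper names `page_count` and `last_page_size`:

```python
import math


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to cover `total` items."""
    return math.ceil(total / page_size)


def last_page_size(total: int, page_size: int) -> int:
    """Number of items on the final page (0 when there are no items)."""
    if total == 0:
        return 0
    remainder = total % page_size
    return remainder if remainder else page_size


print(page_count(1482, 50))      # 30
print(last_page_size(1482, 50))  # 32
```

With 1482 results and 50 per page, 29 pages are full and the 30th holds the remaining 32 items; plain integer division would drop that last page.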

### **Scraping**

The data retrieved from the API contains a lot of information we don't need. Let's encapsulate the extraction of each item's fields in a function!

In [8]:
def get_item_data(data: dict) -> dict:
    """Keep only the fields of interest from one raw API hit."""
    item_data = {
        'uid': data['uid'],
        'title': data['title'],
        'product_type': data['productType'],
        'price': data['price'],
        'interest': data['interest'],
        'license': data['license'],
        'tags': data['tags'],
        'vendor': data['vendor'],
        'form_factor': data['formFactor'],
        'feature': data['feature'],
        'related': data['relatedProducts'],
        'description': data['description'],
        'gid': data['gid'],
        'created_at': data['createdAt'],
        'published_at': data['publishedAt'],
        'updated_at': data['updatedAt'],
        'handle': data['handle'],
        'img': data['media'][0]['src']
    }
    
    return item_data
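
Indexing with `data[...]` raises `KeyError` as soon as one product lacks a field, and `data['media'][0]` raises `IndexError` when a product has no images. Whether the API ever omits these fields is not verified here; the sketch below is a defensive variant under that assumption. `get_item_data_safe` and `FIELD_MAP` are hypothetical names, and only a few of the fields are listed:

```python
# Hypothetical mapping: output column name -> key in the raw API hit
FIELD_MAP = {
    'uid': 'uid',
    'title': 'title',
    'product_type': 'productType',
    'price': 'price',
    'tags': 'tags',
}


def get_item_data_safe(data: dict) -> dict:
    """Extract selected fields, using None for anything missing."""
    item_data = {column: data.get(key) for column, key in FIELD_MAP.items()}

    # 'media' may be absent or empty, so guard before taking the first image
    media = data.get('media') or []
    item_data['img'] = media[0].get('src') if media else None

    return item_data


sample = {'uid': 1, 'title': 'Example', 'productType': 'Pop!', 'media': []}
print(get_item_data_safe(sample))
```

Missing values then show up as empty cells in the DataFrame instead of aborting the whole scrape.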
    

Good. Let's collect all the data!

In [9]:
data = []

for i in tqdm(range(n_requests)):
    # Pages are 1-indexed, so shift the loop counter by one
    page = i + 1
    post_data['page'] = page
    req = requests.post(url, data=json.dumps(post_data), headers=headers)
    if req.status_code != 200:
        raise requests.ConnectionError(f'Page {page} returned status {req.status_code}')
    
    page_data = req.json()['hits']
    for item in page_data:
        data.append(get_item_data(item))
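
The loop above mixes paging logic with HTTP calls. One way to make it easier to test is to pass the page fetcher in as a function. The sketch below is a variant under that assumption, not the notebook's method: `collect_pages`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fake fetcher stands in for the real POST request.

```python
from typing import Callable, Dict, List


def collect_pages(fetch_page: Callable[[int], List[Dict]], n_pages: int) -> List[Dict]:
    """Call fetch_page for pages 1..n_pages and concatenate the hits."""
    items: List[Dict] = []
    for page in range(1, n_pages + 1):
        items.extend(fetch_page(page))
    return items


def fake_fetch(page: int) -> List[Dict]:
    """Stand-in for the API: two full pages of 3 items, then a page of 1."""
    sizes = {1: 3, 2: 3, 3: 1}
    return [{'uid': f'{page}-{n}'} for n in range(sizes[page])]


collected = collect_pages(fake_fetch, 3)
print(len(collected))  # 7
```

In the notebook, `fetch_page` would wrap the `requests.post` call and return `req.json()['hits']`, optionally mapped through `get_item_data`.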

Checking the total number of items collected.

In [10]:
len(data)

1482

Great!! Let's visualize it and save!

In [11]:
df = pd.DataFrame(data)
df.head()

| | uid | title | product_type | price | interest | license | tags | vendor | form_factor | feature | related | description | gid | created_at | published_at | updated_at | handle | img |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7051374362813 | "It's Crunch Time " Kids Tee - General Mills | Apparel | 7.0 | [Ad Icons] | [General Mills] | [Apparel, Cereal Day, Kids Tee, Markdown Item,... | Funko Pop Up Shop | [] | [] | [7051374788797, 7051374854333, 7051374723261, ... | `<p>"It's Crunch Time" with the Count Chocula P...` | gid://shopify/Product/7051374362813 | 2021-10-29T00:02:52-00:00 | 2021-10-29T16:30:00-00:00 | 2022-05-23T16:50:36-00:00 | ad-icons-general-mills-its-crunch-time-kids-bl... | https://cdn.shopify.com/s/files/1/1052/2158/pr... |
| 1 | 7231381405885 | "This is the Way" Kids Tee - The Mandalorian | Apparel | 10.0 | [Star Wars] | [Star Wars] | [Apparel, Kids Tee, May the 4th, May the 4th B... | Funko Shop | [] | [] | [7254814130365, 7254814064829, 4491928240194, ... | `<p>Celebrate May the Fourth in stellar style w...` | gid://shopify/Product/7231381405885 | 2022-05-03T20:57:34-00:00 | 2022-05-04T15:30:00-00:00 | 2022-05-27T13:25:49-00:00 | star-wars-this-is-the-way-kids-purple-tee | https://cdn.shopify.com/s/files/1/1052/2158/pr... |
| 2 | 7231381536957 | "This is the Way" Neon Blast Kids Tee - The Ma... | Apparel | 10.0 | [Star Wars] | [Star Wars] | [Apparel, May the 4th, May the 4th Be With You... | Funko Shop | [] | [] | [7254814130365, 7254814064829, 4491928240194, ... | `<p>Celebrate May the Fourth in stellar style w...` | gid://shopify/Product/7231381536957 | 2022-05-03T20:57:40-00:00 | 2022-05-04T15:30:01-00:00 | 2022-05-29T18:15:32-00:00 | star-wars-the-mandalorian-this-is-the-way-kids... | https://cdn.shopify.com/s/files/1/1052/2158/pr... |
| 3 | 7051374395581 | "Time for a Midnight Bite" Tee - General Mills | Apparel | 14.0 | [Ad Icons] | [General Mills] | [Apparel, Cereal Day, Markdown Item, Sale, T-S... | Funko Pop Up Shop | [] | [] | [7051374788797, 7051374854333, 7051374723261, ... | `<p>Like the Pop! Tee says, it's "Time for a Mi...` | gid://shopify/Product/7051374395581 | 2021-10-29T00:02:57-00:00 | 2021-10-29T16:30:01-00:00 | 2022-05-26T04:04:45-00:00 | ad-icons-general-mills-midnight-bite-black-tee | https://cdn.shopify.com/s/files/1/1052/2158/pr... |
| 4 | 7231381668029 | "Where He Goes, I Go" Grogu Kids Tee - The Man... | Apparel | 10.0 | [Star Wars] | [Star Wars] | [Apparel, May the 4th, May the 4th Be With You... | Funko Shop | [] | [] | [7254814130365, 7254814064829, 4491928240194, ... | `<p>Celebrate May the Fourth in stellar style w...` | gid://shopify/Product/7231381668029 | 2022-05-03T20:57:45-00:00 | 2022-05-04T15:30:02-00:00 | 2022-05-30T02:40:26-00:00 | star-wars-where-he-goes-i-go-grogu-kids-tee | https://cdn.shopify.com/s/files/1/1052/2158/pr... |


In [12]:
df.to_csv('data.csv', index=False)
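
Columns such as `tags`, `interest`, and `related` hold Python lists, and `to_csv` writes each one as its string representation, so reading `data.csv` back yields strings like `"['Apparel', 'Kids Tee']"` rather than lists. Here is a minimal sketch of turning such a cell back into a list with the standard library; `parse_list_cell` is a hypothetical helper:

```python
import ast


def parse_list_cell(cell: str) -> list:
    """Convert a stringified list from the CSV back into a Python list."""
    value = ast.literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError(f'expected a list, got {type(value).__name__}')
    return value


print(parse_list_cell("['Apparel', 'Kids Tee']"))  # ['Apparel', 'Kids Tee']
print(parse_list_cell('[]'))                       # []
```

After `pd.read_csv('data.csv')`, this could be applied per column, for example `df['tags'].apply(parse_list_cell)`.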