# Challenge: Promotions

In this challenge, you'll develop codes to parse and analyze data returned from another API on Zalando such as [Promos homme (Men's Promotions)
](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson but you'll work with different data.

## Obtaining the link

Wrote your codes in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

In [1]:
import requests
import pandas as pd
import json

In [17]:
# paste the url you obtained for your data
url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=0'

In [18]:
# create the headers with the User Agent in order to be able to do the requests.get
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0'}

In [19]:
# encapsulate the information
zalando = requests.get(url, headers=headers).json()

In [20]:
# create a DataFrame with the articles
zal_articles = pd.DataFrame(zalando['articles'])

In [21]:
# check the DataFrame
zal_articles.head(5)

Unnamed: 0,sku,name,price,sizes,url_key,media,brand_name,is_premium,family_articles,flags,tracking_information,product_group,delivery_promises,amount,outfits
0,VA222O03F-C12,T-shirt basique - athletic heather,"{'original': '20,00 €', 'promotional': '14,00...","[XS, S, M, L, XL, XXL]",vans-left-chest-logo-tee-t-shirt-basique-va222...,[{'path': 'spp-media-p1/37e898a8bf8c3eefa65b25...,Vans,False,"[{'sku': 'VA222O03F-C12', 'url_key': 'vans-lef...","[{'key': 'discountRate', 'value': '-30%', 'tra...",{'metrigo_impression_urls': ['https://ccp-et.a...,clothing,[],,
1,VA215O02L-Q11,COMFYCUSH OLD SKOOL - Baskets basses - black,"{'original': '85,00 €', 'promotional': '59,50...","[34.5, 35, 36.5, 37, 38, 39, 40, 40.5, 41, 42,...",vans-baskets-basses-black-va215o02l-q11,[{'path': 'spp-media-p1/68afc439d58734ffb24dcf...,Vans,False,"[{'sku': 'VA215O02L-Q11', 'url_key': 'vans-bas...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",{'metrigo_impression_urls': ['https://ccp-et.a...,shoe,[],,
2,C7642E05D-K11,BERMUDA - Short de sport - navy,"{'original': '29,95 €', 'promotional': '25,55...","[S, M, L, XL, XXL]",champion-bermuda-short-de-sport-navy-c7642e05d...,[{'path': 'spp-media-p1/acc1338f1191408fb05a92...,Champion,False,"[{'sku': 'C7642E05D-K11', 'url_key': 'champion...","[{'key': 'discountRate', 'value': '-15%', 'tra...",{'metrigo_impression_urls': ['https://ccp-et.a...,clothing,[],,
3,PI922O0GU-K11,5 PACK - T-shirt basique - dark blue/grey/khaki,"{'original': '29,99 €', 'promotional': '26,99...","[XS, S, M, L, XL, XXL, 3XL, 4XL, 5XL, 6XL]",pier-one-5-pack-t-shirt-basique-dark-bluegreyk...,[{'path': 'spp-media-p1/77c420d19d293a3ba68a1f...,Pier One,False,"[{'sku': 'PI922O0GU-K11', 'url_key': 'pier-one...","[{'key': 'discountRate', 'value': '-10%', 'tra...",,clothing,[],,
4,TO122O08I-Q11,LOGO TEE - T-shirt imprimé - black,"{'original': '39,95 €', 'promotional': '29,95...","[XS, S, M, L, XL, XXL, 3XL]",tommy-hilfiger-logo-tee-t-shirt-imprime-to122o...,[{'path': 'spp-media-p1/6a1582f6fde836a5917156...,Tommy Hilfiger,False,"[{'sku': 'TO122O08I-Q11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-25%', 'tra...",,clothing,[],,


In [22]:
# encapsulate the information with a new offset
zalando1 = requests.get(url, params={'offset': '84'}, headers=headers).json()

In [23]:
# create a DataFrame with the articles
zal_articles1 = pd.DataFrame(zalando1['articles'])

In [24]:
# check if the DataFrame has different articles than the previous one
zal_articles1.head(5)

Unnamed: 0,sku,name,price,sizes,url_key,media,brand_name,is_premium,family_articles,flags,tracking_information,product_group,delivery_promises,amount,outfits
0,NER22O000-T11,TEE 5 PACK - T-shirt basique - multi,"{'original': '31,95 €', 'promotional': '28,75...","[XS, S, M, L, XL, XXL]",newport-bay-sailing-club-newport-multi-tee-5-p...,[{'path': 'spp-media-p1/c3f10ffe878c337aa98cbc...,Newport Bay Sailing Club,False,"[{'sku': 'NER22O000-T11', 'url_key': 'newport-...","[{'key': 'discountRate', 'value': '-10%', 'tra...",{'metrigo_impression_urls': ['https://ccp-et.a...,clothing,[],,
1,VA222O03F-C12,T-shirt basique - athletic heather,"{'original': '20,00 €', 'promotional': '14,00...","[XS, S, M, L, XL, XXL]",vans-left-chest-logo-tee-t-shirt-basique-va222...,[{'path': 'spp-media-p1/37e898a8bf8c3eefa65b25...,Vans,False,"[{'sku': 'VA222O03F-C12', 'url_key': 'vans-lef...","[{'key': 'discountRate', 'value': '-30%', 'tra...",{'metrigo_impression_urls': ['https://ccp-et.a...,clothing,[],,
2,VA215O02L-Q11,COMFYCUSH OLD SKOOL - Baskets basses - black,"{'original': '85,00 €', 'promotional': '59,50...","[34.5, 35, 36.5, 37, 38, 39, 40, 40.5, 41, 42,...",vans-baskets-basses-black-va215o02l-q11,[{'path': 'spp-media-p1/68afc439d58734ffb24dcf...,Vans,False,"[{'sku': 'VA215O02L-Q11', 'url_key': 'vans-bas...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",{'metrigo_impression_urls': ['https://ccp-et.a...,shoe,[],,
3,PI922O0GU-K11,5 PACK - T-shirt basique - dark blue/grey/khaki,"{'original': '29,99 €', 'promotional': '26,99...","[XS, S, M, L, XL, XXL, 3XL, 4XL, 5XL, 6XL]",pier-one-5-pack-t-shirt-basique-dark-bluegreyk...,[{'path': 'spp-media-p1/77c420d19d293a3ba68a1f...,Pier One,False,"[{'sku': 'PI922O0GU-K11', 'url_key': 'pier-one...","[{'key': 'discountRate', 'value': '-10%', 'tra...",,clothing,[],,
4,TO122O08I-Q11,LOGO TEE - T-shirt imprimé - black,"{'original': '39,95 €', 'promotional': '29,95...","[XS, S, M, L, XL, XXL, 3XL]",tommy-hilfiger-logo-tee-t-shirt-imprime-to122o...,[{'path': 'spp-media-p1/6a1582f6fde836a5917156...,Tommy Hilfiger,False,"[{'sku': 'TO122O08I-Q11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-25%', 'tra...",,clothing,[],,


## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

1. Print and review the data you obtained.

In [29]:
# create a DataFrame with the articles of the first page
promo_homme = zalando['articles']

In [30]:
# to collect all the pages:
# Step 1: make a for loop with a range of the total number of pages
# Step 2: get the url of every page and obtain its articles and append them into the promo_homme list
for i in range(2, zalando['pagination']['page_count'] + 1):
    zalando_total = requests.get(url, headers=headers, params={'current_page':i}).json()
    promo_homme.extend(zalando_total['articles'])

In [31]:
len(promo_homme)

74927

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [33]:
# create a dataframe with the articles of all pages
zalando2 = pd.DataFrame(promo_homme)

In [34]:
# check the number of articles by brand - Trending brand
zalando2.groupby('brand_name')['name'].count().nlargest(1)

brand_name
Pier One    15918
Name: name, dtype: int64

In [36]:
# create a column for original prices by obtaining the key original and replacing spaces and € symbol
zalando2['original_price'] = zalando2['price'].apply(lambda x: x['original']).replace(to_replace='\s{1,}\W', value='', regex=True)

In [37]:
# from the str, change the ',' for the '.' to be able to convert it to float
zalando2['original_price'] = zalando2['original_price'].replace(to_replace=',', value='.', regex=True)

In [38]:
# convert to float
zalando2['original_price'] = zalando2['original_price'].apply(lambda x: float(x))

In [39]:
# create a column for promo prices by obtaining the key original and replacing spaces and € symbol
zalando2['promo_price'] = zalando2['price'].apply(lambda x: x['promotional']).replace(to_replace='\s{1,}\W', value='', regex=True)

In [40]:
# from the str, change the ',' for the '.' to be able to convert it to float
zalando2['promo_price'] = zalando2['promo_price'].replace(to_replace=',', value='.', regex=True)

In [41]:
# convert to float
zalando2['promo_price'] = zalando2['promo_price'].apply(lambda x: float(x))

In [42]:
# create a column for discount by subtracting original price minus promo price
zalando2['discount'] = zalando2['original_price'] - zalando2['promo_price']

In [55]:
# check the product with biggest discount (want to know name and brand)
# I use .mean() as there are some products that appear more than once
zalando2.groupby(['sku', 'name','brand_name'])[['discount']].mean().nlargest(1, 'discount')

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,discount
sku,name,brand_name,Unnamed: 3_level_1
TO112C035-B11,CLASSIC PENNY LOAFER - Mocassins - beige,Tommy Hilfiger,65.9


In [56]:
# check the total sum of discount
zalando2['discount'].sum()

683042.4