# Challenge: Promotions

In this challenge, you'll develop codes to parse and analyze data returned from another API on Zalando such as [Promos homme (Men's Promotions)
](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson but you'll work with different data.

In [35]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

## Obtaining the link

Wrote your codes in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

In [12]:
import urllib
import pandas as pd
import requests
import json

In [28]:
destination = 'promo-homme'
limit = 84
page_offset = 1
url_api = f'https://www.zalando.fr/api/catalog/articles?categories={destination}&limit={limit}&offset={page_offset * limit}'
#url_api

In [29]:
json_clothes = json.load(urllib.request.urlopen(url_api))
#len(data_clothes.get('articles'))

df_clothes = pd.DataFrame(json_clothes.get('articles'))
df_clothes.head(10)

Unnamed: 0,sku,name,price,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,amount,outfits
0,PI952M00B-C11,Montre - gunmetal,"{'original': '29,99 €', 'promotional': '24,09 ...",[One Size],pier-one-montre-a-aiguilles-gunmetal-pi952m00b...,[{'path': 'spp-media-p1/e303a6d51bce3d9aa6f5ff...,Pier One,False,[],"[{'key': 'discountRate', 'value': '-20%', 'tra...",accessoires,[],,
1,THC22O00A-T11,3 PACK - T-shirt imprimé - multi,"{'original': '69,99 €', 'promotional': '19,95 ...","[S, M, L, XL, XXL]",threadbare-t-shirt-imprime-multi-thc22o00a-t11,[{'path': 'spp-media-p1/ac35baa5299839569e73a4...,Threadbare,False,[],"[{'key': 'discountRate', 'value': '-71%', 'tra...",clothing,[],,
2,JA222S0N0-K11,JCOPINN HOOD REGULAR FIT - Sweat à capuche - n...,"{'original': '39,99 €', 'promotional': '27,99 ...","[S, M, L, XL, XXL]",jack-jones-jcopinn-sweatshirt-ja222s0n0-k11,[{'path': 'spp-media-p1/0df75f081a263e3b96d5e7...,Jack & Jones,False,[],"[{'key': 'discountRate', 'value': '-30%', 'tra...",clothing,[],,
3,AS142A0QQ-K11,FUJITRABUCO SKY - Chaussures de running - dire...,"{'original': '144,95 €', 'promotional': '108,9...","[40, 40.5, 41.5, 42, 42.5, 43.5, 44, 44.5, 45,...",asics-fujitrabuco-sky-chaussures-de-running-di...,[{'path': 'spp-media-p1/913cc125baf73845bdc92f...,ASICS,False,[],"[{'key': 'discountRate', 'value': '-25%', 'tra...",shoe,[],220 g,
4,LA222O008-A11,T-shirt à manches longues - weiß,"{'original': '49,90 €', 'promotional': '42,90 ...","[M, L, XL, XXL, 3XL]",lacoste-t-shirt-a-manches-longues-blanc-la222o...,[{'path': 'spp-media-p1/933e61649cd33af5989f5d...,Lacoste,False,[],"[{'key': 'discountRate', 'value': 'Jusqu’à -14...",clothing,[],,
5,H0422P01R-Q11,CORE PRINTS - Polo - black print,"{'original': '28,95 €', 'promotional': '24,75 ...","[XS, S, M, L, XL]",hollister-co-core-prints-polo-black-print-h042...,[{'path': 'spp-media-p1/753fab3d494e33238b126b...,Hollister Co.,False,[],"[{'key': 'discountRate', 'value': '-15%', 'tra...",clothing,[],,
6,HU752D008-802,BALDWIN - Ceinture - black,"{'original': '84,95 €', 'promotional': '59,45 ...","[80, 85, 90, 95, 100, 105, 110, 115]",hugo-baldwin-ceinture-noir-hu752d008-802,[{'path': 'spp-media-p1/280b909b3f2e33119a65ec...,HUGO,True,[],"[{'key': 'discountRate', 'value': '-30%', 'tra...",accessoires,[],,
7,DH022A007-K11,FASHION SLIM FIT - Costume - navy,"{'original': '99,95 €', 'promotional': '59,95 ...","[46, 50, 52, 54, 90, 94, 98, 102, 106]",isaac-dewhirst-1880-fashion-costume-navy-dh022...,[{'path': 'spp-media-p1/13551870c9f235fa8abe8a...,Isaac Dewhirst,False,[],"[{'key': 'discountRate', 'value': '-40%', 'tra...",clothing,[],,
8,LE222A01W-Q13,511 SLIM FIT - Jean slim - nightshine,"{'original': '99,95 €', 'promotional': '69,95 ...","[26x30, 27x30, 27x32, 28x30, 28x32, 29x30, 29x...",levi-s-511-slim-fit-jean-slim-le222a01w-q13,[{'path': 'spp-media-p1/01ff6d60dc4a3631a784f2...,Levi's®,False,[],"[{'key': 'discountRate', 'value': 'Jusqu’à -30...",clothing,[],,"[{'id': 'K806AqQCRuG', 'url_key': '/outfits/K8..."
9,YO121001P-A11,T-shirt à manches longues - white,"{'original': '14,99 €', 'promotional': '12,79 ...","[XXS, XS, S, M, L, XL, XXL]",yourturn-t-shirt-a-manches-longues-white-yo121...,[{'path': 'spp-media-p1/e4becef537343d789e7091...,YOURTURN,False,[],"[{'key': 'discountRate', 'value': '-15%', 'tra...",clothing,[],,


## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

1. Print and review the data you obtained.

In [37]:
import urllib
import pandas as pd
import requests
import json
import math

In [38]:
#destination = 'promo-homme'
#limit = 84
#page_offset = 0
#url_api = f'https://www.zalando.fr/api/catalog/articles?categories={destination}&limit={limit}&offset={page_offset * limit}'

In [39]:
destination = 'promo-homme'
limit = 84
page_offset = 0
url_api = f'https://www.zalando.fr/api/catalog/articles?categories={destination}&limit={limit}&offset={page_offset * limit}'

json_clothes = json.load(urllib.request.urlopen(url_api))

list_clothes = []
for i in range(math.ceil(int(json_clothes.get('total_count'))/84)):
    page_offset = i
    json_clothes = json.load(urllib.request.urlopen(url_api))
    list_clothes.extend(json_clothes.get('articles'))

df_clothes = pd.DataFrame(list_clothes)
df_clothes.head(10)
len(df_clothes)

Unnamed: 0,sku,name,price,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,outfits,amount
0,N1242E1AA-Q11,DRY PANT TAPER - Pantalon de survêtement - bla...,"{'original': '39,95 €', 'promotional': '33,95 ...","[S, L, XL, XXL]",nike-performance-dry-pant-taper-pantalon-de-su...,[{'path': 'spp-media-p1/ec7733610b873228a9face...,Nike Performance,False,[],"[{'key': 'discountRate', 'value': '-15%', 'tra...",clothing,[],,
1,TO122O08I-Q11,LOGO TEE - T-shirt imprimé - black,"{'original': '39,95 €', 'promotional': '31,95 ...","[S, M, L, XL, XXL, 3XL]",tommy-hilfiger-logo-tee-t-shirt-imprime-to122o...,[{'path': 'spp-media-p1/6a1582f6fde836a5917156...,Tommy Hilfiger,False,[],"[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],,
2,LA222O04B-K12,T-shirt imprimé - marine/guepe/blanc,"{'original': '59,95 €', 'promotional': '29,95 ...","[XS, S, M, L, XL, XXL, 3XL, 4XL]",lacoste-t-shirt-imprime-marineguepeblanc-la222...,[{'path': 'spp-media-p1/15ad9270132737eb84ce21...,Lacoste,False,[],"[{'key': 'discountRate', 'value': 'Jusqu’à -50...",clothing,[],,
3,C1822O09X-Q11,BACK INSTITUTIONAL TEE - T-shirt imprimé - black,"{'original': '37,95 €', 'promotional': '21,05 ...","[XS, S, M, L, XL, XXL]",calvin-klein-jeans-back-institutional-tee-t-sh...,[{'path': 'spp-media-p1/3433d9ab9c9337069c52f5...,Calvin Klein Jeans,False,[],"[{'key': 'discountRate', 'value': '-45%', 'tra...",clothing,[],,
4,TO152B00S-Q11,CLASSIC - Casquette - schwarz (15),"{'original': '29,95 €', 'promotional': '20,95 ...",[One Size],tommy-hilfiger-classic-casquette-black-to152b0...,[{'path': 'spp-media-p1/90294d3334933b8d93cff3...,Tommy Hilfiger,False,[],"[{'key': 'discountRate', 'value': '-30%', 'tra...",accessoires,[],,
5,TO122S06Q-Q11,LOGO HOODY - Sweat à capuche - black,"{'original': '99,95 €', 'promotional': '84,95 ...","[S, M, L, XL, XXL, 3XL]",tommy-hilfiger-logo-hoody-sweatshirt-to122s06q...,[{'path': 'spp-media-p1/99c495d0153e3c5c8881e6...,Tommy Hilfiger,False,[],"[{'key': 'discountRate', 'value': '-15%', 'tra...",clothing,[],,
6,TH342B06D-Q14,DREW PEAK - Sweat à capuche - black,"{'original': '74,95 €', 'promotional': '63,45 ...","[XS, S, M, L, XL, XXL]",the-north-face-drew-peak-hoodie-sweatshirt-th3...,[{'path': 'spp-media-p1/d9d6dab33e4f308c95d8aa...,The North Face,False,[],"[{'key': 'discountRate', 'value': '-15%', 'tra...",clothing,[],,
7,AS142A0I8-Q11,GEL-MISSION 3 - Chaussures de course - black/c...,"{'original': '54,95 €', 'promotional': '43,95 ...","[40, 41.5, 42, 42.5, 43.5, 44, 44.5, 45, 46, 4...",asics-gel-mission-3-chaussures-de-course-black...,[{'path': 'spp-media-p1/daf2bd54768a3f6c842ddf...,ASICS,False,[],"[{'key': 'discountRate', 'value': '-20%', 'tra...",shoe,[],,
8,PI922O0PY-A11,7 PACK - T-shirt basique - white/blue/green,"{'original': '38,99 €', 'promotional': '31,29 ...","[XS, S, M, L, XL, XXL]",pier-one-7-pack-t-shirt-basique-whitebluegreen...,[{'path': 'spp-media-p1/0447efe4c16c36a89a0976...,Pier One,False,[],"[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],,
9,NA622T02N-Q11,RAINFOREST WINTER - Veste mi-saison - black,"{'original': '199,95 €', 'promotional': '139,9...","[XS, S, M, L, XL, XXL]",napapijri-rainforest-winter-veste-mi-saison-bl...,[{'path': 'spp-media-p1/f4b922085dcb3c309db984...,Napapijri,False,[],"[{'key': 'discountRate', 'value': '-30%', 'tra...",clothing,[],,


76440

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [95]:
import re

list_prices_disc = []

for i in range(len(df_clothes)):
    sku = df_clothes.iloc[i].get('sku')
    brand = df_clothes.iloc[i].get('brand_name')
    discount = float(re.findall('[-\d]{1,3}', dict(df_clothes.iloc[i].get('flags')[0]).get('value').replace('%',''))[0])
    original_price = float(df_clothes.iloc[i].get('price').get('original').replace(u'\xa0€', '').replace(',','.'))
    promotional_price = float(df_clothes.iloc[i].get('price').get('promotional').replace(u'\xa0€', '').replace(',','.'))
    list_prices_disc.append([sku, brand, discount, original_price, promotional_price])
    
df_list_prices_disc = pd.DataFrame(list_prices_disc, columns=['SKU','Brand','Discount','OriginalPrice','PromotionalPrice'])
df_list_prices_disc.info()
df_list_prices_disc.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76440 entries, 0 to 76439
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   SKU               76440 non-null  object 
 1   Brand             76440 non-null  object 
 2   Discount          76440 non-null  float64
 3   OriginalPrice     76440 non-null  float64
 4   PromotionalPrice  76440 non-null  float64
dtypes: float64(3), object(2)
memory usage: 2.9+ MB


Unnamed: 0,Discount,OriginalPrice,PromotionalPrice
count,76440.0,76440.0,76440.0
mean,-24.044021,59.800039,44.372194
std,11.553133,39.168537,28.914689
min,-71.0,12.99,11.69
25%,-30.0,38.99,27.99
50%,-20.0,39.99,31.95
75%,-15.0,74.95,54.95
max,-5.0,234.95,187.95


In [103]:
#Sum of discounts of all goods
sum(df_list_prices_disc['PromotionalPrice']) / sum(df_list_prices_disc['OriginalPrice'])

0.7420094429561128

In [102]:
#Highest discount products
df_list_prices_disc[df_list_prices_disc['Discount'] == min(df_list_prices_disc['Discount'])]

Unnamed: 0,SKU,Brand,Discount,OriginalPrice,PromotionalPrice
81,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
165,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
249,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
333,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
417,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
...,...,...,...,...,...
60899,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
60983,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
61067,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
61235,THC22O00A-T11,Threadbare,-71.0,69.99,19.95
