# Challenge: Promotions

In this challenge, you'll develop codes to parse and analyze data returned from another API on Zalando such as [Promos homme (Men's Promotions)
](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson but you'll work with different data.

## Obtaining the link

Wrote your codes in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

1. Print and review the data you obtained.

1. Import libraries.

In [1]:
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize

2. Define the initial API endpoint URL.

In [2]:
url = "https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=0&sort=popularity"

3. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

In [3]:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}

In [4]:
response = requests.get(url, headers = headers) 
results = response.json()
results

{'total_count': 51330,
 'pagination': {'page_count': 612, 'current_page': 1, 'per_page': 84},
 'sort': 'popularity',
 'articles': [{'sku': 'TI112C04A-702',
   'name': '6 INCH PREMIUM - Bottes de neige - wheat',
   'price': {'original': '219,95\xa0€',
    'promotional': '175,95\xa0€',
    'has_different_prices': False,
    'has_different_original_prices': False,
    'has_different_promotional_prices': False,
    'has_discount_on_selected_sizes_only': False},
   'sizes': ['40',
    '41',
    '41.5',
    '42',
    '43',
    '43.5',
    '44',
    '44.5',
    '45',
    '45.5',
    '46',
    '47.5',
    '49',
    '50'],
   'url_key': 'timberland-6-inch-premium-boots-a-lacets-marron-ti112c04a-702',
   'media': [{'path': 'TI/11/2C/04/A7/02/TI112C04A-702@19.2.jpg',
     'role': 'DEFAULT',
     'packet_shot': True},
    {'path': 'TI/11/2C/04/A7/02/TI112C04A-702@24.1.jpg',
     'role': 'HOVER',
     'packet_shot': False}],
   'brand_name': 'Timberland',
   'is_premium': False,
   'family_articles

In [5]:
flattened_data = json_normalize(results)
flattened_data
flattened_data["articles"]
flattened_data.articles[0]
articles = json_normalize(flattened_data.articles[0])   
articles.head()

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,...,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,tracking_information.metrigo_impression_urls,tracking_information.impression_beacon,tracking_information.source,outfits
0,TI112C04A-702,6 INCH PREMIUM - Bottes de neige - wheat,"[40, 41, 41.5, 42, 43, 43.5, 44, 44.5, 45, 45....",timberland-6-inch-premium-boots-a-lacets-marro...,[{'path': 'TI/11/2C/04/A7/02/TI112C04A-702@19....,Timberland,False,"[{'sku': 'TI112C04A-702', 'url_key': 'timberla...","[{'key': 'discountRate', 'value': '-20%', 'tra...",shoe,...,"219,95 €","175,95 €",False,False,False,False,[https://ccp-et.metrigo.zalan.do/event/sbv?z=c...,https://ccp-et.metrigo.zalan.do/event/sbv?z=c6...,ccp,
1,GA322S02Y-A11,MEDIUM SHIELD HOODIE - Sweat à capuche - eggshell,"[XS, S, M, L, XL, XXL, 3XL]",gant-medium-shield-hoodie-sweatshirt-ga322s02y...,[{'path': 'GA/32/2S/02/YA/11/GA322S02Y-A11@4.j...,GANT,False,"[{'sku': 'GA322S02Y-A11', 'url_key': 'gant-med...","[{'key': 'discountRate', 'value': '-10%', 'tra...",clothing,...,"99,95 €","89,95 €",False,False,False,False,[https://ccp-et.metrigo.zalan.do/event/sbv?z=c...,https://ccp-et.metrigo.zalan.do/event/sbv?z=c6...,ccp,
2,1MI52F01P-O11,TALL CARD CASE - Étui pour cartes de visite - ...,[One Size],michael-kors-tall-card-case-etui-pour-cartes-d...,[{'path': '1M/I5/2F/01/PO/11/1MI52F01P-O11@8.j...,Michael Kors,False,"[{'sku': '1MI52F01P-O11', 'url_key': 'michael-...","[{'key': 'discountRate', 'value': '-60%', 'tra...",accessoires,...,"59,95 €","23,95 €",False,False,False,False,[https://ccp-et.metrigo.zalan.do/event/sbv?z=c...,https://ccp-et.metrigo.zalan.do/event/sbv?z=c6...,ccp,
3,AD115B01K-A12,STAN SMITH STREETWEAR-STYLE SHOES - Baskets ba...,"[36, 38, 40, 42, 44, 46, 48, 50, 36 2/3, 37 1/...",adidas-originals-stan-smith-baskets-basses-ad1...,[{'path': 'AD/11/5B/01/KA/12/AD115B01K-A12@12....,adidas Originals,False,"[{'sku': 'AD115B01K-A12', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': '-10%', 'tra...",shoe,...,"94,95 €","85,45 €",False,False,False,False,,,,
4,N1242E0W3-N12,PANT TAPER - Pantalon de survêtement - cargo k...,"[S, L, XL]",nike-performance-pant-taper-pantalon-de-survet...,[{'path': 'N1/24/2E/0W/3N/12/N1242E0W3-N12@9.j...,Nike Performance,False,"[{'sku': 'N1242E0W3-N12', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -15...",clothing,...,"42,95 €","36,45 €",True,False,True,False,,,,


4. Find out the total page count in the 1st page data.

In [6]:
page_count = results["pagination"]["page_count"] 
page_count

612

5. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

In [7]:
data = pd.DataFrame()

for page in range(page_count):
    url="https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=84&sort=popularity"
    response = requests.get(url, headers = headers) 
    results = response.json()
    flattened_data2 = json_normalize(results)
    flattened_data2 = json_normalize(flattened_data2.articles[0]) 
    flattened_data.append(flattened_data2)
    data = data.append(flattened_data2,sort=False)

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  sort=sort,


6. Print and review the data you obtained.

In [8]:
data.head()

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,outfits,amount,price.base_price
0,PO222D0IG-A11,NATURAL SLIM FIT - Chemise - white,"[M, XL, XXL]",polo-ralph-lauren-natural-slim-fit-chemise-whi...,[{'path': 'PO/22/2D/0I/GA/11/PO222D0IG-A11@9.j...,Polo Ralph Lauren,True,"[{'sku': 'PO222D0IG-A11', 'url_key': 'polo-ral...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],"99,95 €","79,95 €",False,False,False,False,,,
1,N1242A1R6-C12,AIR MAX ALPHA TRAINER 2 - Chaussures d'entraîn...,"[38.5, 39, 40, 40.5, 41, 42, 44, 44.5, 45]",nike-performance-air-max-alpha-trainer-2-chaus...,[{'path': 'N1/24/2A/1R/6C/12/N1242A1R6-C12@2.j...,Nike Performance,False,"[{'sku': 'N1242A1R6-C12', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -20...",shoe,[],"79,95 €","63,95 €",True,False,True,False,,,
2,C1422E028-K11,BARNES PANT - Pantalon de survêtement - dark n...,"[S, L, XXL]",carhartt-wip-barnes-pant-pantalon-de-surveteme...,[{'path': 'C1/42/2E/02/8K/11/C1422E028-K11@6.j...,Carhartt WIP,False,"[{'sku': 'C1422E028-K11', 'url_key': 'carhartt...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],"89,95 €","71,95 €",False,False,False,False,"[{'id': 'n5uzd__ETt6', 'url_key': '/outfits/n5...",,
3,PO222P04J-G11,SLIM SHORT SLEEVE - Polo - classic wine,"[M, XL, XXL]",polo-ralph-lauren-slim-fit-polo-po222p04j-g11,[{'path': 'PO/22/2P/04/JG/11/PO222P04J-G11@8.j...,Polo Ralph Lauren,True,"[{'sku': 'PO222P04J-G11', 'url_key': 'polo-ral...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],"109,95 €","87,95 €",False,False,False,False,,,
4,C1422O060-K11,MONUMENT - T-shirt imprimé - blue,"[XS, S, M, L, XL]",carhartt-wip-monument-t-shirt-imprime-c1422o06...,[{'path': 'C1/42/2O/06/0K/11/C1422O060-K11@7.j...,Carhartt WIP,False,"[{'sku': 'C1422O060-K11', 'url_key': 'carhartt...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],"34,95 €","27,95 €",False,False,False,False,,,


## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

1. The trending brand.

In [9]:
Trending_brand = pd.DataFrame(data["brand_name"].value_counts())
Trending_brand

Unnamed: 0,brand_name
Nike Performance,5664
Carhartt WIP,4392
Pier One,4150
Jack & Jones,3818
Polo Ralph Lauren,3781
Nike Sportswear,3471
Levi's®,2642
Tommy Jeans,1600
Lacoste,1516
BOSS,1394


2. The product(s) with the highest discount.

In [10]:
def cleaning_time(data, series):
    data[series] = [product.strip("€").strip() for product in data[series]]
    data[series] = data[series].str.replace(',','.')
    data[series] = pd.to_numeric(data[series],errors="coerce",downcast="float")

cleaning_time(data,"price.original")
cleaning_time(data,"price.promotional")

In [11]:
data["discount"] = data["price.original"] - data["price.promotional"]

In [15]:
data = data.sort_values(by=["discount"], ascending=False)
data[["brand_name","name","discount"]].head()

Unnamed: 0,brand_name,name,discount
57,BOSS,Portefeuille - black,171.0
58,BOSS,Portefeuille - black,171.0
65,BOSS,Portefeuille - black,171.0
66,BOSS,Portefeuille - black,171.0
58,BOSS,Portefeuille - black,171.0


3. The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [16]:
sum_of_discounts = data["price.promotional"].sum() / data["price.original"].sum()
sum_of_discounts

0.7122527