# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API, such as the one behind [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page offset in the API URL and check that the endpoint returns a different page of data.
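The last step can be sketched in code. This is a minimal sketch, assuming the endpoint accepts `limit` and `offset` query parameters as in the URL used later in this notebook; `build_page_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

# Base endpoint, taken from the URL used in this notebook.
BASE_URL = 'https://www.zalando.fr/api/catalog/articles'


def build_page_url(category, page, limit=84):
    """Return the URL of a zero-based `page`, assuming offset = page * limit."""
    params = {'categories': category, 'limit': limit, 'offset': page * limit}
    return f'{BASE_URL}?{urlencode(params)}'


first_page_url = build_page_url('promo-femme', page=0)
second_page_url = build_page_url('promo-femme', page=1)
print(second_page_url)
```

Pasting the two URLs into the browser and comparing the returned articles is a quick way to confirm that the offset really moves through the catalog.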

In [1]:
url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-femme&limit=84'

In [2]:
headers = {
    'Host': 'www.zalando.fr',
    'Accept': '*/*',
    'Cache-Control': 'no-cache',
    'Accept-Encoding': 'gzip, deflate',
    'Connection' : 'keep-alive',
    'User-Agent':'PostmanRuntime/7.23.0'
}

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data of the 1st page. Flatten the data and store it in a variable.

1. Find the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages, from page 2 to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
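The workflow above can be sketched without touching the network. This is only an illustration: `fetch_page` is a hypothetical callable standing in for a real request, and the `articles` and `pagination.page_count` keys are assumed to match the response used in this notebook.

```python
def fetch_all_articles(fetch_page, page_size=84):
    """Collect the articles of every page.

    `fetch_page(offset)` is a hypothetical callable that returns the parsed
    JSON of one page, e.g. a thin wrapper around `requests.get(...).json()`.
    """
    first = fetch_page(0)
    page_count = first['pagination']['page_count']
    articles = list(first['articles'])
    # Pages 2..page_count start at offsets page_size, 2 * page_size, ...
    for page in range(1, page_count):
        articles.extend(fetch_page(page * page_size)['articles'])
    return articles


# Stand-in for the real API so the sketch runs offline: 3 pages of 2 articles.
def fake_fetch_page(offset, page_size=2, total=6):
    items = [{'sku': f'SKU-{i}'} for i in range(offset, min(offset + page_size, total))]
    return {'articles': items, 'pagination': {'page_count': 3}}


all_articles = fetch_all_articles(fake_fetch_page, page_size=2)
print(len(all_articles))
```

Collecting plain dictionaries first means the flattening step can run once on the full list instead of once per page.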

In [3]:
# 1. Import libraries.
import json
import requests
import pandas as pd  # pd.json_normalize is used below, so the deprecated pandas.io.json import is not needed

In [4]:
# 2. Define the initial API endpoint URL.
url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-femme&limit=84'

In [5]:
# 3. Make a request for the 1st page, then flatten the data and store it in a variable.
response = requests.get(url, headers=headers).json()

In [6]:
zalando = pd.json_normalize(response['articles'])
zalando.head()

| | sku | name | sizes | url_key | media | brand_name | is_premium | family_articles | flags | product_group | ... | price.has_different_prices | price.has_different_original_prices | price.has_different_promotional_prices | price.has_discount_on_selected_sizes_only | tracking_information.metrigo_impression_urls | tracking_information.impression_beacon | tracking_information.source | outfits | amount | price.base_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ES421I0H8-K11 | Gilet - navy | [S, M, L, XL, XXL, 3XL] | esprit-collection-cardigan-gilet-navy-es421i0h... | [{'path': 'spp-media-p1/de329d5209133039a649e3... | Esprit Collection | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | clothing | ... | False | False | False | False | [https://ccp-et.adtechlab.zalan.do/event/sbv?z... | https://ccp-et.adtechlab.zalan.do/event/sbv?z=... | ccp | | | |
| 1 | LA251H01P-Q11 | Cabas - noir | [One Size] | lacoste-cabas-black-la251h01p-q11 | [{'path': 'spp-media-p1/9e976d99d30b39d98b4c59... | Lacoste | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | accessoires | ... | False | False | False | False | | | | | | |
| 2 | LE221G031-K11 | ORIGINAL TRUCKER - Veste en jean - soft as but... | [XXS, XS, S, M, L, XL] | levisr-original-trucker-veste-en-jean-soft-as-... | [{'path': 'spp-media-p1/b06b79cb3ac83c549f3008... | Levi's® | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | clothing | ... | False | False | False | False | | | | [{'id': 'z7fZ7k9yR-q', 'url_key': '/outfits/z7... | | |
| 3 | 5VA51H0GE-B11 | Sac à main - ecru | [One Size] | valentino-by-mario-valentino-sac-a-main-ecru-5... | [{'path': 'spp-media-p1/77e0d3b36786329b8f4e3d... | Valentino Bags | False | [] | [{'key': 'discountRate', 'value': '-25%', 'tra... | accessoires | ... | False | False | False | False | | | | | | |
| 4 | LE221N04U-Q11 | MILE HIGH SUPER SKINNY - Jeans Skinny - black ... | [23x28, 23x30, 24x28, 24x30, 24x32, 25x28, 25x... | levisr-mile-high-super-skinny-jeans-skinny-le2... | [{'path': 'spp-media-p1/df7d6a01a4833214aa11c6... | Levi's® | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | clothing | ... | False | False | False | False | | | | [{'id': 'AThyCLljTFy', 'url_key': '/outfits/AT... | | |
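Dotted column names such as `price.has_different_prices` come from flattening nested objects. A small sketch with made-up records (the SKUs, brands and prices below are invented for illustration) shows how `pd.json_normalize` turns a nested `price` dictionary into `price.*` columns:

```python
import pandas as pd

# Made-up records that mimic the nesting of the API's `articles` list.
articles = [
    {'sku': 'A1', 'brand_name': 'Brand A',
     'price': {'original': '50,00 €', 'promotional': '35,00 €'}},
    {'sku': 'B2', 'brand_name': 'Brand B',
     'price': {'original': '20,00 €', 'promotional': '15,00 €'}},
]

# Nested dictionaries become columns named parent.child.
flat = pd.json_normalize(articles)
print(flat.columns.tolist())
```

Lists inside a record, like `sizes` or `flags`, are not expanded this way and stay as single cells holding a Python list.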


In [7]:
# 4. Find out the total page count in the 1st page data.
pages = response['pagination']['page_count']
print(f'There are {pages} pages.')

There are 892 pages.


In [8]:
# 5. Use a FOR loop to request every page and collect the flattened data of each one.
# The loop starts at offset 0, so it fetches the 1st page again instead of reusing `zalando`.

page_size = 84  # must match the `limit` parameter in `url`
total_articles = []

for page in range(pages):
    offset = page * page_size
    total_response = requests.get(f'{url}&offset={offset}', headers=headers).json()
    total_articles.append(pd.json_normalize(total_response['articles']))

In [9]:
promo_femme = pd.concat(total_articles)
promo_femme.head()

| | sku | name | sizes | url_key | media | brand_name | is_premium | family_articles | flags | product_group | ... | price.has_different_promotional_prices | price.has_discount_on_selected_sizes_only | tracking_information.metrigo_impression_urls | tracking_information.impression_beacon | tracking_information.source | amount | price.base_price | outfits | condition | condition_key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ES421I0H8-K11 | Gilet - navy | [S, M, L, XL, XXL, 3XL] | esprit-collection-cardigan-gilet-navy-es421i0h... | [{'path': 'spp-media-p1/de329d5209133039a649e3... | Esprit Collection | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | clothing | ... | False | False | [https://ccp-et.adtechlab.zalan.do/event/sbv?z... | https://ccp-et.adtechlab.zalan.do/event/sbv?z=... | ccp | | | | | |
| 1 | ES121E1DG-Q11 | T-shirt imprimé - black | [36, 38, 40, 42, 44, 46] | esprit-blouse-black-es121e1dg-q11 | [{'path': 'spp-media-p1/b0516945db5b37b9bc65ab... | Esprit | False | [] | [{'key': 'discountRate', 'value': 'Jusqu’à -35... | clothing | ... | True | False | [https://ccp-et.adtechlab.zalan.do/event/sbv?z... | https://ccp-et.adtechlab.zalan.do/event/sbv?z=... | ccp | | | | | |
| 2 | LE115O01Q-A11 | COURTSET UNISEX - Baskets basses - optical whi... | [37, 39, 40, 41, 42, 43, 44, 45, 46] | le-coq-sportif-courtset-unisex-baskets-basses-... | [{'path': 'spp-media-p1/99042f6fcb4f37a89e86f9... | le coq sportif | False | [] | [{'key': 'discountRate', 'value': 'Jusqu’à -30... | shoe | ... | True | False | [https://ccp-et.adtechlab.zalan.do/event/sbv?z... | https://ccp-et.adtechlab.zalan.do/event/sbv?z=... | ccp | | | | | |
| 3 | LA251H01P-Q11 | Cabas - noir | [One Size] | lacoste-cabas-black-la251h01p-q11 | [{'path': 'spp-media-p1/9e976d99d30b39d98b4c59... | Lacoste | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | accessoires | ... | False | False | | | | | | | | |
| 4 | LE221G031-K11 | ORIGINAL TRUCKER - Veste en jean - soft as but... | [XXS, XS, S, M, L, XL] | levisr-original-trucker-veste-en-jean-soft-as-... | [{'path': 'spp-media-p1/b06b79cb3ac83c549f3008... | Levi's® | False | [] | [{'key': 'discountRate', 'value': '-30%', 'tra... | clothing | ... | False | False | | | | | | | | |


In [10]:
promo_femme.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 74928 entries, 0 to 83
Data columns (total 25 columns):
 #   Column                                        Non-Null Count  Dtype 
---  ------                                        --------------  ----- 
 0   sku                                           74928 non-null  object
 1   name                                          74928 non-null  object
 2   sizes                                         74928 non-null  object
 3   url_key                                       74928 non-null  object
 4   media                                         74928 non-null  object
 5   brand_name                                    74928 non-null  object
 6   is_premium                                    74928 non-null  bool  
 7   family_articles                               74928 non-null  object
 8   flags                                         74928 non-null  object
 9   product_group                                 74928 non-null  object
 10  d
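The `0 to 83` range in the index summary reflects that `pd.concat` keeps each page's own row labels by default, so the same labels repeat on every page. A minimal sketch with made-up frames, showing `ignore_index=True` as one way to get a unique index:

```python
import pandas as pd

# Two made-up "pages" of two rows each.
page_1 = pd.DataFrame({'sku': ['A1', 'A2']})
page_2 = pd.DataFrame({'sku': ['B1', 'B2']})

kept = pd.concat([page_1, page_2])                            # labels repeat per page
renumbered = pd.concat([page_1, page_2], ignore_index=True)   # labels run 0..n-1

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Repeated labels are harmless for the aggregations in this notebook, but they make label-based lookups such as `.loc[0]` return one row per page rather than a single row.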

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [11]:
# Trending brand
trending_brand = promo_femme['brand_name'].value_counts().nlargest(1).index[0]
print(f'The trending brand is "{trending_brand}".')

The trending brand is "DreiMaster".


In [12]:
# Product(s) with the highest discount
# Prices are strings: keep the part before '€' and swap the decimal comma for a point.
def to_float(price):
    return float(price.split('€')[0].replace(',', '.'))

promo_femme['price.original'] = promo_femme['price.original'].apply(to_float)
promo_femme['price.promotional'] = promo_femme['price.promotional'].apply(to_float)

promo_femme['total_discount'] = promo_femme['price.original'] - promo_femme['price.promotional']

# Sum the discount over all rows sharing a name, then take the name with the largest total.
product_highest_discount = promo_femme.groupby('name')['total_discount'].sum().idxmax()
print(f'The product with the highest discount is "{product_highest_discount}".')

The product with the highest discount is "Sac à main - schwarz".
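Grouping by `name` and summing adds up the discounts of every row that shares a name, which is not the same as finding the single article with the largest discount. The sketch below uses made-up rows to contrast the two readings; `parse_price` is a hypothetical helper and the price strings are invented.

```python
import pandas as pd


def parse_price(text):
    """Convert a price string with a decimal comma and a euro sign to a float."""
    return float(text.split('€')[0].replace(',', '.'))


# Made-up rows; two different products share the same name on purpose.
df = pd.DataFrame({
    'sku': ['A1', 'B2', 'C3'],
    'name': ['Sac - noir', 'Sac - noir', 'Manteau - beige'],
    'price.original': ['100,00 €', '120,00 €', '300,00 €'],
    'price.promotional': ['60,00 €', '70,00 €', '220,00 €'],
})

df['discount'] = (df['price.original'].map(parse_price)
                  - df['price.promotional'].map(parse_price))

# Name with the largest combined discount (what a groupby-sum reports).
by_name = df.groupby('name')['discount'].sum().idxmax()
# Row(s) with the largest individual discount; ties are all kept.
top_rows = df[df['discount'] == df['discount'].max()]

print(by_name)
print(top_rows['sku'].tolist())
```

Here the two readings disagree: the shared name wins on combined discount, while a different article has the largest discount on its own. Which one answers the bonus question depends on how "product" is defined.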


In [14]:
# Sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices)
sum_discounted_prices = promo_femme['price.promotional'].sum()

sum_original_prices = promo_femme['price.original'].sum()

sum_discounts = sum_discounted_prices / sum_original_prices
print(sum_discounts)

0.7412792155938533
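Because the ratio divides discounted prices by original prices, it is the share of the original total that is still paid. A short sketch of the complementary figure, using the value printed above:

```python
# Ratio printed by the cell above: discounted total / original total.
paid_ratio = 0.7412792155938533

# The complementary share is the overall discount across all goods.
overall_discount = 1 - paid_ratio
print(f'Overall discount: {overall_discount:.1%}')
```

In other words, the promotional prices add up to roughly three quarters of the original prices.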
