# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API, such as the one behind [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the `offset` parameter in the API URL to confirm that pagination works.
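Step 4 can also be tried from Python. The following is a minimal sketch, assuming the endpoint accepts the same `categories`, `limit`, `offset`, and `sort` query parameters as the URL used later in this notebook. The helper name `build_page_url` is hypothetical, and the sketch only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL used in this notebook.
BASE_URL = "https://www.zalando.fr/api/catalog/articles"

def build_page_url(category, page, limit=84, sort="sale"):
    """Return the API URL for a 1-based page number (hypothetical helper)."""
    # Assumption: the offset counts articles, so page 1 starts at offset 0.
    params = {
        "categories": category,
        "limit": limit,
        "offset": (page - 1) * limit,
        "sort": sort,
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Page 2 corresponds to offset=84 when each page holds 84 articles.
print(build_page_url("promo-sport-homme", 2))
```

Pasting the printed URL into the browser, then changing `page`, is a quick way to check that the offset behaves as expected.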

In [1]:
# your code here
import json
import requests
import pandas as pd
# Top-level import; the pandas.io.json path has been deprecated since pandas 1.0
from pandas import json_normalize

In [2]:
url='https://www.zalando.fr/api/catalog/articles?categories=promo-sport-homme&limit=84&offset=84&sort=sale'
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}


In [3]:
response = requests.get(url,headers=headers)
response

<Response [200]>

In [4]:
result=response.json()
# Flatten the top-level JSON object into a one-row DataFrame
data=json_normalize(result)
data.head()

Unnamed: 0,total_count,sort,articles,query_path,previous_page_path,next_page_path,page_gender,premium,filters,total_article_count,...,iconPaths.filters.standard_delivery_filter,iconPaths.filters.fast_delivery_filter,iconPaths.filters.zalando_plus,iconPaths.mobileFilters.standard_delivery_filter,iconPaths.mobileFilters.fast_delivery_filter,iconPaths.mobileFilters.zalando_plus,iconPaths.flags.slow_delivery_flag,iconPaths.flags.fast_delivery_flag,iconPaths.flags.plus_delivery_flag,iconPaths.flags.zalando_plus
0,5850,sale,"[{'sku': 'N1242A1R4-K11', 'name': 'JOYRIDE RUN...",/promo-sport-homme/?p=2&order=sale,/promo-sport-homme/?order=sale,/promo-sport-homme/?p=3&order=sale,men,False,"[{'key': 'sizes', 'label': 'Taille', 'url_key'...",5852,...,icons/truck.svg,icons/truck-fast.svg,icons/plus-short-1.svg,icons/truck.svg,icons/truck-fast.svg,icons/plus-short-1.svg,icons/clock.svg,icons/truck-fast-orange-3.svg,icons/plus-short-1.svg,icons/zalando-plus.svg


## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data of the 1st page. Flatten the data and store it in a variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to request the additional pages, from page 2 to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
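The workflow above can be sketched offline before pointing it at the real endpoint. In this minimal example, `fetch_page` is a hypothetical stand-in for `requests.get(url, headers=headers).json()`, and the five fake articles only imitate the shape of the API response (`total_count`, `pagination.per_page`, `articles`). It assumes pandas 1.0 or later for `pd.json_normalize`.

```python
import math
import pandas as pd

def fetch_page(offset, limit):
    """Stand-in for the real HTTP request: returns a fake page of results."""
    # Hypothetical data: 5 articles in total, shaped like the API response.
    articles = [
        {"sku": f"SKU{i}", "brand_name": "Brand", "price": {"original": "10,00 €"}}
        for i in range(5)
    ]
    return {
        "total_count": len(articles),
        "pagination": {"per_page": limit},
        "articles": articles[offset:offset + limit],
    }

limit = 2
first = fetch_page(offset=0, limit=limit)

# Page count: total articles divided by page size, rounded up.
page_count = math.ceil(first["total_count"] / first["pagination"]["per_page"])

# Flatten page 1, then request pages 2..page_count and flatten each one.
frames = [pd.json_normalize(first["articles"])]
for page in range(2, page_count + 1):
    result = fetch_page(offset=(page - 1) * limit, limit=limit)
    frames.append(pd.json_normalize(result["articles"]))

# Concatenate once at the end; ignore_index gives a clean 0..n-1 index.
all_articles = pd.concat(frames, ignore_index=True)
print(page_count, len(all_articles))
print(all_articles.columns.tolist())
```

Starting at offset 0 and rounding the page count up means the last, shorter page is fetched without a special case.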

In [5]:
# your code here
data2=json_normalize(data['articles'][0])
data2.head(1)

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,amount,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
0,N1242A1R4-K11,JOYRIDE RUN FK - Chaussures de running neutres...,"[38.5, 40, 40.5, 41, 42, 42.5, 43, 44, 44.5, 4...",nike-performance-joyride-run-chaussures-de-run...,[{'path': 'N1/24/2A/1R/4K/11/N1242A1R4-K11@5.j...,Nike Performance,False,"[{'sku': 'N1242A1R4-K11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",shoe,325 g,[],"179,95 €","125,95 €",True,False,True,False


In [6]:
total_articles=result['total_count']

total_articles

5850

In [7]:
articles_page=result['pagination']['per_page']
articles_page

84

In [9]:
total_articles=result['total_count']
articles_page=result['pagination']['per_page']

# Offsets to request, one page (84 articles) apart.
# Note: the range starts at 84 (page 2); start it at 0 to include the first page as well.
calls_api=list(range(84,total_articles+1,articles_page))
# Extra final call at offset == total_articles, handled by the else branch below
calls_api.append(total_articles)
pages=[]
for i in calls_api:
    if i%84==0:
        url=f'https://www.zalando.fr/api/catalog/articles?categories=promo-sport-homme&limit=84&offset={i}&sort=sale'
    else:
        # Last call: the limit is the gap between the last two offsets
        x=calls_api[-1]-calls_api[-2]
        url=f'https://www.zalando.fr/api/catalog/articles?categories=promo-sport-homme&limit={x}&offset={i}&sort=sale'

    response = requests.get(url,headers=headers)
    result=response.json()
    # Flatten this page's list of articles into one row per article
    data2=json_normalize(result['articles'])
    pages.append(data2)

# DataFrame.append was removed in pandas 2.0, so concatenate all pages once
final_result=pd.concat(pages)
    

In [10]:
final_result.head(1)

Unnamed: 0,amount,brand_name,delivery_promises,family_articles,flags,is_premium,media,name,outfits,price.has_different_original_prices,price.has_different_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,price.original,price.promotional,product_group,sizes,sku,url_key
0,325 g,Nike Performance,[],"[{'sku': 'N1242A1R4-K11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,[{'path': 'N1/24/2A/1R/4K/11/N1242A1R4-K11@5.j...,JOYRIDE RUN FK - Chaussures de running neutres...,,False,True,True,False,"179,95 €","125,95 €",shoe,"[38.5, 40, 40.5, 41, 42, 42.5, 43, 44, 44.5, 4...",N1242A1R4-K11,nike-performance-joyride-run-chaussures-de-run...


## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The overall discount ratio across all goods (sum_discounted_prices divided by sum_original_prices).
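One way to approach the three questions is sketched below on a small, made-up table. The sample rows, the `to_euros` helper, and the column `discount_rate` are hypothetical. The sketch also makes two assumptions: "trending" is read as the brand with the most discounted items, and "highest discount" is measured relative to the original price.

```python
import pandas as pd

# Hypothetical sample shaped like the flattened articles table.
articles = pd.DataFrame({
    "sku": ["A1", "A2", "B1", "C1"],
    "brand_name": ["Alpha", "Alpha", "Beta", "Gamma"],
    "price.original": ["100,00 €", "50,00 €", "200,00 €", "80,00 €"],
    "price.promotional": ["80,00 €", "45,00 €", "100,00 €", "60,00 €"],
})

def to_euros(column):
    """Convert strings such as '179,95 €' to floats (assumes no thousands separator)."""
    cleaned = column.str.replace("€", "", regex=False).str.replace(",", ".", regex=False)
    return cleaned.str.strip().astype(float)

original = to_euros(articles["price.original"])
promotional = to_euros(articles["price.promotional"])

# Trending brand, read here as the brand with the most discounted items.
trending_brand = articles["brand_name"].value_counts().idxmax()

# Highest discount, measured as a fraction of the original price.
articles["discount_rate"] = 1 - promotional / original
top_products = articles[articles["discount_rate"] == articles["discount_rate"].max()]

# Overall ratio: sum of discounted prices divided by sum of original prices.
overall_ratio = promotional.sum() / original.sum()

print(trending_brand)
print(top_products["sku"].tolist())
print(round(overall_ratio, 4))
```

Filtering on the maximum rate, rather than taking the first sorted row, keeps every product that ties for the highest discount.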

In [11]:
# your code here
Brand = final_result.groupby('brand_name', as_index=False)['sku'].count()
Brand=Brand.rename(columns={"sku": "Num_Items"})
Brand.sort_values(by=['Num_Items'], ascending=False).head(10)

| | brand_name | Num_Items |
|---|---|---|
| 100 | Nike Performance | 974 |
| 157 | adidas Performance | 616 |
| 115 | Puma | 345 |
| 147 | Under Armour | 236 |
| 139 | The North Face | 174 |
| 27 | Columbia | 169 |
| 117 | Quiksilver | 152 |
| 24 | Champion | 120 |
| 98 | New Era | 100 |
| 3 | ASICS | 100 |


In [12]:
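# Strip the euro sign and the decimal comma ('179,95 €' -> '17995'), then divide by 100 to get euros.
# A raw string avoids the invalid '\€' escape sequence.
final_result['price.original']=final_result['price.original'].replace(r'[€,]', '', regex=True).astype(float)/100
final_result['price.promotional']=final_result['price.promotional'].replace(r'[€,]', '', regex=True).astype(float)/100
# Absolute discount in euros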
final_result['discount']=final_result['price.original']-final_result['price.promotional']

final_result.head(2)


Unnamed: 0,amount,brand_name,delivery_promises,family_articles,flags,is_premium,media,name,outfits,price.has_different_original_prices,price.has_different_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,price.original,price.promotional,product_group,sizes,sku,url_key,discount
0,325 g,Nike Performance,[],"[{'sku': 'N1242A1R4-K11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,[{'path': 'N1/24/2A/1R/4K/11/N1242A1R4-K11@5.j...,JOYRIDE RUN FK - Chaussures de running neutres...,,False,True,True,False,179.95,125.95,shoe,"[38.5, 40, 40.5, 41, 42, 42.5, 43, 44, 44.5, 4...",N1242A1R4-K11,nike-performance-joyride-run-chaussures-de-run...,54.0
1,,Nike Performance,[],"[{'sku': 'N1242D2GQ-G11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -20...",False,[{'path': 'N1/24/2D/2G/QG/11/N1242D2GQ-G11@9.j...,AS ROM TEE TRAVEL CREST - Article de supporter...,,False,True,True,False,29.95,23.95,clothing,"[S, M, L, XL, XXL]",N1242D2GQ-G11,nike-performance-as-rom-tee-travel-crest-t-shi...,6.0


In [13]:
# Total discount (in euros) and number of items per brand
Discounts = final_result.groupby('brand_name', as_index=False)['discount'].sum()
Products = final_result.groupby('brand_name', as_index=False)['sku'].count()
Agg_disc = Discounts.merge(Products, on='brand_name')
Agg_disc = Agg_disc.rename(columns={"sku": "Num_Items"})
# Average discount per item for each brand
Agg_disc['avg_discount'] = Agg_disc['discount']/Agg_disc['Num_Items']

Agg_disc.sort_values(by=['avg_discount'], ascending=False).head(10)

| | brand_name | discount | Num_Items | avg_discount |
|---|---|---|---|---|
| 26 | Colmar | 824.0 | 6 | 137.333333 |
| 4 | Aigle | 135.0 | 1 | 135.0 |
| 136 | State of Elevenate | 215.0 | 2 | 107.5 |
| 110 | PYUA | 3164.89 | 31 | 102.093226 |
| 95 | Mo | 303.84 | 3 | 101.28 |
| 56 | Halti | 278.0 | 3 | 92.666667 |
| 15 | Bogner Fire + Ice | 4957.2 | 55 | 90.130909 |
| 82 | Kjus | 1078.0 | 12 | 89.833333 |
| 96 | Napapijri | 806.0 | 9 | 89.555556 |
| 75 | KARL LAGERFELD | 82.5 | 1 | 82.5 |


In [14]:
Agg_disc.sort_values(by=['avg_discount'], ascending=True).head(10)

| | brand_name | discount | Num_Items | avg_discount |
|---|---|---|---|---|
| 158 | camano | 1.5 | 1 | 1.5 |
| 153 | Your Turn Active | 118.6 | 28 | 4.235714 |
| 18 | Buff | 32.15 | 7 | 4.592857 |
| 131 | Smartwool | 41.6 | 9 | 4.622222 |
| 33 | Derbystar | 4.97 | 1 | 4.97 |
| 6 | Arena | 10.3 | 2 | 5.15 |
| 9 | Barts | 133.8 | 24 | 5.575 |
| 74 | K1X | 194.4 | 28 | 6.942857 |
| 133 | Speedo | 7.0 | 1 | 7.0 |
| 49 | Forvert | 42.0 | 6 | 7.0 |
