# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API, such as the one behind [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page offset in the API URL to check that pagination works.
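
The offset change in the last step can be scripted. The following is a minimal sketch, assuming the endpoint and the `categories`, `limit`, and `offset` parameters used in the solution cell of this notebook are still valid. `page_url` is a hypothetical helper, not part of any Zalando client.

```python
from urllib.parse import urlencode

# Endpoint taken from the solution cell of this notebook; it may have changed.
BASE_URL = "https://www.zalando.fr/api/catalog/articles"


def page_url(category, page, limit=84):
    """Build the API URL for a zero-based page index (hypothetical helper)."""
    query = urlencode(
        {"categories": category, "limit": limit, "offset": page * limit}
    )
    return f"{BASE_URL}?{query}"


# Print the URLs of the first three pages so they can be pasted into the browser.
for page in range(3):
    print(page_url("promo-homme", page))
```

Opening two consecutive URLs in the browser and comparing the first `sku` of each response is a quick way to confirm that the offset really moves through the catalogue.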

In [2]:
# your code here
import json
import requests
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data for the first page. Flatten the data and store it in a variable.

1. Find the total page count in the first page's data.

1. Use a `for` loop to request the additional pages, from page 2 up to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
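
The loop in the steps above can be sketched without touching the network. In this sketch, `fake_fetch` is a made-up stand-in for `requests.get(...).json()`, and the `pagination` and `articles` keys are assumed to match the response shape used in the solution cell. `collect_articles` is a hypothetical helper.

```python
def collect_articles(fetch_page, page_count):
    """Call fetch_page(page) for every page and gather all articles in one list."""
    articles = []
    for page in range(page_count):
        payload = fetch_page(page)
        articles.extend(payload["articles"])
    return articles


def fake_fetch(page):
    """Stand-in for the real request: returns two made-up articles per page."""
    return {
        "pagination": {"page_count": 3},
        "articles": [{"sku": f"SKU-{page}-{i}"} for i in range(2)],
    }


# Read the page count from the first page, then walk through every page.
first_page = fake_fetch(0)
page_count = first_page["pagination"]["page_count"]
all_articles = collect_articles(fake_fetch, page_count)
print(len(all_articles), all_articles[0])
```

Collecting plain dictionaries first and building the DataFrame once at the end (for example with `pd.DataFrame(all_articles)`) avoids repeated `pd.concat` calls inside the loop.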

In [8]:
# Ways to flatten a JSON object

# This function turns every element of response.json() into one flat dictionary
def flatten_json(y):  # https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

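# Alternative: use the flatten_json library
from flatten_json import flatten
# flatten() expects a dict, so each article is flattened separately
flat_json = [flatten(article) for article in response.json()['articles']]

# But what is the point of flattening the JSON? It is harder to read.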
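
One possible answer to the question above: after flattening, every leaf value sits under a single key, so each key can become one DataFrame column. The sketch below uses a made-up article and a hypothetical `flatten_record` helper that follows the same idea as `flatten_json`. It is an illustration, not the library's implementation.

```python
def flatten_record(value, prefix=""):
    """Return a flat dict whose keys join the nested path with underscores."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_record(child, f"{prefix}{key}_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_record(child, f"{prefix}{index}_"))
    else:
        # Leaf value: drop the trailing underscore from the accumulated path.
        flat[prefix[:-1]] = value
    return flat


# Made-up article with the same nesting style as the API response.
article = {
    "sku": "A1",
    "price": {"original": "10,00 €", "promotional": "8,00 €"},
    "sizes": ["S", "M"],
}
flat_article = flatten_record(article)
print(flat_article)
```

The nested `price` dictionary turns into the keys `price_original` and `price_promotional`, which are easier to sort and aggregate than a column of dictionaries.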

In [3]:
# your code here
from time import sleep

url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=0'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'}
response = requests.get(url, headers=headers)

paginas_count = response.json()['pagination']['page_count']  # 892 pages in total with an offset of 84

url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset='
df1 = pd.DataFrame()

for pagina in range(paginas_count):  # takes at least 15 minutes
    sleep(1)
    response = requests.get(url+str(pagina*84), headers=headers)
    df2 = pd.DataFrame(response.json()['articles'])
    df1 = pd.concat([df1,df2])
    print(pagina)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


In [5]:
df1.drop_duplicates(['sku'], inplace=True)  # drop duplicate articles
df1.reset_index(drop=True, inplace=True)  # reset the index

In [8]:
df1.set_index('sku',inplace=True)

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The overall discount ratio of all goods (sum_discounted_prices divided by sum_original_prices).
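
The three questions can be worked through on a tiny example before running them on the full data. The rows below are made up, and "trending brand" is read here as the brand with the most articles, which is an assumption. The sketch uses plain Python so the arithmetic is visible.

```python
from collections import Counter

# Made-up rows; only the column names mirror the notebook's DataFrame.
rows = [
    {"brand_name": "Brand A", "original_price": 100.0, "promotional_price": 50.0},
    {"brand_name": "Brand A", "original_price": 20.0, "promotional_price": 18.0},
    {"brand_name": "Brand B", "original_price": 80.0, "promotional_price": 60.0},
]

# Trending brand: the brand that appears most often.
trending_brand, article_count = Counter(r["brand_name"] for r in rows).most_common(1)[0]

# Highest discount: largest absolute difference between the two prices.
best = max(rows, key=lambda r: r["original_price"] - r["promotional_price"])
best_discount = best["original_price"] - best["promotional_price"]

# Overall ratio: sum of promotional prices divided by sum of original prices.
overall_ratio = sum(r["promotional_price"] for r in rows) / sum(
    r["original_price"] for r in rows
)

print(trending_brand, article_count, best_discount, overall_ratio)
```

The ratio of sums (128 / 200 here) weights expensive articles more heavily than the average of the per-article ratios would, so the two numbers generally differ.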

In [25]:
# your code here
#The trending brand.
df1.groupby('brand_name').count()[['name']].sort_values('name',ascending=False)

| brand_name | name |
| --- | --- |
| Pier One | 347 |
| Jack & Jones | 164 |
| INDICODE JEANS | 125 |
| Tommy Hilfiger | 102 |
| Levi's® | 73 |
| ... | ... |
| STUDIO ID | 1 |
| STOP THE WATER WHILE USING ME! | 1 |
| HARRINGTON | 1 |
| Hackett London | 1 |


In [29]:
#The product(s) with the highest discount.
import re
df1['original_price'] = df1[['price']].apply( (lambda row:  float(re.search(r".*(?=\xa0\xa0)",row['price']['original']).group().replace(',','.')) ), axis=1)
df1['promotional_price'] = df1[['price']].apply( (lambda row:  float(re.search(r".*(?=\xa0\xa0)",row['price']['promotional']).group().replace(',','.')) ), axis=1)
df1['discount'] = df1['original_price'] - df1['promotional_price']

df1.sort_values('discount',ascending=False).head(1)

| sku | name | price | sizes | url_key | media | brand_name | is_premium | family_articles | flags | tracking_information | product_group | delivery_promises | amount | outfits | condition | condition_key | original_price | promotional_price | discount |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T1022A05T-K11 | PIECE WOOL BLEND SLIM SUIT - Costume - blue | {'original': '549,00 €', 'promotional': '274,... | [46, 48, 50, 52, 60, 98] | tommy-hilfiger-tailored-piece-wool-blend-slim-... | [{'path': 'spp-media-p1/922aa254b681306085718c... | Tommy Hilfiger Tailored | False | [{'sku': 'T1022A05T-K11', 'url_key': 'tommy-hi... | [{'key': 'campaign', 'value': 'Prix Mini', 'tr... |  | clothing | [] |  |  |  |  | 549.0 | 274.0 | 275.0 |
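
The price conversion in the cell above relies on the exact spacing before the euro sign. A more forgiving variant is sketched below, assuming French-style display prices where the comma is the decimal separator. `parse_price` is a hypothetical helper and the sample strings are illustrative.

```python
import re


def parse_price(text):
    """Convert a display price such as '549,00\xa0€' to a float.

    Assumes French formatting: the comma is the decimal separator, and
    spaces, non-breaking spaces, or dots may separate thousands.
    """
    # Keep only digits and the decimal comma, then switch to a decimal point.
    cleaned = re.sub(r"[^\d,]", "", text).replace(",", ".")
    if not cleaned:
        raise ValueError(f"no price found in {text!r}")
    return float(cleaned)


print(parse_price("549,00\xa0\xa0€"))
print(parse_price("1\xa0299,95 €"))
```

With a helper like this, the two lambda calls could be replaced by something along the lines of `df1['price'].map(lambda p: parse_price(p['original']))`, assuming every `price` entry is a dictionary with that key.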


In [33]:
# The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).
df1['promotional_price'].sum() / df1['original_price'].sum()

0.7642494520971997

In [35]:
df1.groupby('brand_name')[['promotional_price','original_price']].sum()

| brand_name | promotional_price | original_price |
| --- | --- | --- |
| 274 | 23.95 | 31.95 |
| ASICS | 88.30 | 111.90 |
| ASICS SportStyle | 626.57 | 775.55 |
| Abercrombie & Fitch | 53.95 | 59.95 |
| Alessandro Zavetti | 155.85 | 194.85 |
| ... | ... | ... |
| edc by Esprit | 181.45 | 243.95 |
| klairs | 21.55 | 23.95 |
| le coq sportif | 221.35 | 264.85 |
| s.Oliver | 95.17 | 122.97 |


In [38]:
df1_disc = df1.groupby('brand_name')[['promotional_price','original_price']].sum()

df1_disc['brand_discount']=df1_disc['promotional_price']/df1_disc['original_price']

df1_disc.sort_values('brand_discount',ascending=True)

| brand_name | promotional_price | original_price | brand_discount |
| --- | --- | --- | --- |
| Gusti Leder | 75.90 | 209.90 | 0.361601 |
| Petrol Industries | 65.75 | 179.75 | 0.365786 |
| Quiksilver | 87.40 | 185.97 | 0.469968 |
| DeFacto Fit | 9.99 | 19.99 | 0.499750 |
| Hackett London | 94.95 | 189.95 | 0.499868 |
| ... | ... | ... | ... |
| Brett & Sons | 229.40 | 254.90 | 0.899961 |
| K-Way | 233.90 | 259.90 | 0.899962 |
| Armani Exchange | 53.50 | 56.60 | 0.945230 |
| Columbia | 75.95 | 79.95 | 0.949969 |
