# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API endpoint, such as the one behind [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page offset in the API URL to check that paging works.
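
The offset-based paging in the last step can be sketched as a small helper. This is a minimal sketch, assuming the endpoint accepts `categories`, `limit`, and `offset` query parameters as in the URL used later in this notebook. `build_page_url` is a hypothetical helper name, and the endpoint you find in DevTools may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the URL used later in this notebook.
BASE_URL = "https://www.zalando.fr/api/catalog/articles"

def build_page_url(category, page, limit=84):
    """Return the URL for a zero-based page number (hypothetical helper)."""
    if page < 0:
        raise ValueError("page must be >= 0")
    # The offset is the number of articles to skip: page * articles per page.
    params = {"categories": category, "limit": limit, "offset": page * limit}
    return f"{BASE_URL}?{urlencode(params)}"

print(build_page_url("promo-homme", 0))  # first page
print(build_page_url("promo-homme", 1))  # second page
```

Pasting the printed URLs into the browser is a quick way to confirm that changing the offset returns a different set of articles.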

In [None]:
import json
import requests
import pandas as pd
import urllib.request

In [None]:
# your code here

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Request the data for the first page, flatten it, and store it in a variable.

1. Find out the total page count in the 1st page data.

1. Use a `for` loop to request the remaining pages, from page 2 up to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
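
The workflow above can be sketched with the standard library alone. This is a minimal sketch, assuming the JSON response has a `pagination.page_count` field and an `articles` list, as the cells below rely on. `fetch_json`, `collect_articles`, `fake_fetch`, and the `example.invalid` URL are hypothetical names used for illustration, and the demonstration runs on made-up data rather than real API output.

```python
import json
import urllib.request

def fetch_json(url):
    """Download a URL and decode its JSON body (performs a real network call)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)

def collect_articles(url_template, limit=84, fetch=fetch_json, max_pages=None):
    """Fetch the first page, read the page count, then fetch the remaining pages."""
    first = fetch(url_template.format(offset=0))
    page_count = first["pagination"]["page_count"]
    if max_pages is not None:
        page_count = min(page_count, max_pages)
    articles = list(first["articles"])
    # Page 0 is already loaded, so continue with pages 1 .. page_count - 1.
    for page in range(1, page_count):
        data = fetch(url_template.format(offset=page * limit))
        articles.extend(data["articles"])
    return articles

# Offline demonstration with a stand-in fetch function (fake data).
def fake_fetch(url):
    offset = int(url.rsplit("=", 1)[1])
    return {"pagination": {"page_count": 3}, "articles": [{"sku": f"SKU-{offset}"}]}

demo = collect_articles(
    "https://example.invalid/articles?limit=1&offset={offset}",
    limit=1,
    fetch=fake_fetch,
)
print([article["sku"] for article in demo])
```

The resulting list of article dictionaries can then be passed to `pd.json_normalize` to obtain a flattened DataFrame. The `max_pages` argument is one way to limit the run when fetching every page takes too long.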

In [2]:
# Import libraries
import json
import requests
import pandas as pd
import urllib.request

In [3]:
# Define the URL (limit=84 articles per page; offset=84 skips the first 84 articles, so this is the second page)
url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=84'
# Make the request
response = urllib.request.urlopen(url)
# Parse the response as JSON
results = json.load(response)
# Flatten the whole JSON document into a pandas DataFrame
flattened_data = pd.json_normalize(results)
# Flatten the nested list of articles into its own DataFrame
flattened_data1 = pd.json_normalize(flattened_data.articles[0])
# Inspect the DataFrame
flattened_data1.head(2)

|  | sku | name | sizes | url_key | media | brand_name | is_premium | family_articles | flags | product_group | delivery_promises | price.original | price.promotional | price.has_different_prices | price.has_different_original_prices | price.has_different_promotional_prices | price.has_discount_on_selected_sizes_only | outfits | amount |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | LA222O045-C11 | T-shirt imprimé - silver chine | [XS, S, M, L, XL, XXL, 3XL] | lacoste-t-shirt-imprime-silver-chine-la222o045... | [{'path': 'spp-media-p1/458091b2d9cb3b55b037b9... | Lacoste | False | [] | [{'key': 'discountRate', 'value': 'Jusqu’à -50... | clothing | [] | 46,95 € | 23,45 € | True | False | True | False |  |  |
| 1 | AD115B01K-A11 | STAN SMITH - Baskets basses - run white/new navy | [36, 38, 40, 42, 44, 46, 48, 36 2/3, 37 1/3, 3... | adidas-originals-stan-smith-baskets-basses-bla... | [{'path': 'spp-media-p1/19047b3703d63f398e2d96... | adidas Originals | False | [] | [{'key': 'discountRate', 'value': 'Jusqu’à -20... | shoe | [] | 94,95 € | 75,95 € | True | False | True | False |  |  |


In [6]:
# The page count is stored in the "results" JSON under the 'pagination' key,
# in its 'page_count' entry, which we read here

pages = results['pagination']['page_count']
pages

892

In [15]:
# Collect one DataFrame per page in a list
frames = []
# Loop over the pages to fetch their articles
# Only a fifth of the pages are requested because the run did not complete with all of them
for i in range(pages // 5):
    # The offset selects the page: 84 articles per page, so page i starts at article 84 * i
    k = 84 * i
    # Build the URL with an f-string so the offset changes on each iteration
    url = f'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset={k}'
    # Make the request
    response = urllib.request.urlopen(url)
    # Parse the response as JSON
    results = json.load(response)
    # Flatten the whole JSON document into a pandas DataFrame
    flattened_data = pd.json_normalize(results)
    # Flatten the nested list of articles into its own DataFrame
    flattened_data1 = pd.json_normalize(flattened_data.articles[0])
    # Use the SKU as the index
    flattened_data1 = flattened_data1.set_index('sku')
    # Store this page's DataFrame
    frames.append(flattened_data1)
# Combine all pages into one master DataFrame (DataFrame.append was removed in pandas 2.0)
df_men = pd.concat(frames)

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The overall discount ratio across all goods (sum of discounted prices divided by sum of original prices).
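
The ratio in the last bullet can be illustrated with a short worked example. This is a minimal sketch that uses only the prices of the two sample articles shown in the preview above, already converted to floats; the variable names are illustrative.

```python
# Prices of the two sample articles from the preview, as floats.
original = [46.95, 94.95]
promotional = [23.45, 75.95]

# Share of the original total that is still paid after the promotion.
ratio = sum(promotional) / sum(original)
# Share of the original total that is saved.
overall_discount = 1 - ratio

print(f"paid: {ratio:.1%}, saved: {overall_discount:.1%}")
```

Note that the ratio itself is the share of the original price still paid; the overall discount is its complement.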

In [26]:
# Count how many times each brand appears and keep the most frequent one

df_men.brand_name.value_counts().head(1)

Pier One    678
Name: brand_name, dtype: int64

In [28]:
# Convert the price strings to numbers
# Keep only the numeric part, e.g. "46,95 €" -> "46,95"
df_men['price.original'] = df_men['price.original'].str.extract(r'(\d*,\d*)', expand=False)
df_men['price.promotional'] = df_men['price.promotional'].str.extract(r'(\d*,\d*)', expand=False)
# Replace the decimal comma (European notation) with a decimal point
df_men['price.original'] = df_men['price.original'].str.replace(',', '.', regex=False)
df_men['price.promotional'] = df_men['price.promotional'].str.replace(',', '.', regex=False)
# Add a computed column: original price minus promotional price, both cast to float
df_men['discount_amount'] = df_men['price.original'].astype(float) - df_men['price.promotional'].astype(float)
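
The price conversion above can also be written as a standalone function, which makes edge cases easier to test. This is a minimal sketch: `parse_price` is a hypothetical helper, and the handling of a space or non-breaking space as a thousands separator is an assumption about the price format, not something confirmed by the data shown here.

```python
import re

# One or more digits, optionally followed by a decimal comma and more digits.
_PRICE_RE = re.compile(r"\d+(?:,\d+)?")

def parse_price(text):
    """Convert a string such as '46,95 €' to a float; return None if no number is found."""
    # Assumption: a space or non-breaking space may separate thousands.
    cleaned = text.replace("\u00a0", "").replace(" ", "")
    match = _PRICE_RE.search(cleaned)
    if match is None:
        return None
    # Replace the decimal comma with a decimal point before converting.
    return float(match.group().replace(",", "."))

print(parse_price("46,95 €"))
```

With pandas, such a function could be applied to a whole column with `Series.map`, for example `df_men['price.original'].map(parse_price)`.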

In [46]:
df_men[['name','discount_amount']].sort_values('discount_amount',ascending=False).head()

| sku | name | discount_amount |
| --- | --- | --- |
| DI122T049-K11 | L-MAY JACKET - Veste en cuir - blue | 298.0 |
| CAR22A00I-K11 | TROPICAL SLIM SUIT - Costume - blue | 275.0 |
| T1022A05G-Q11 | PACKABLE SLIM FLEX STRIPE SUIT - Costume - blue | 275.0 |
| T1022A05T-K11 | PIECE WOOL BLEND SLIM SUIT - Costume - blue | 275.0 |
| SC922D00M-702 | Veste en cuir - dark brown | 225.0 |


In [60]:
# Total promotional price as a percentage of the total original price
f"{round((df_men['price.promotional'].astype(float).sum())/(df_men['price.original'].astype(float).sum())*100,2)}% of the original total price"

'76.8% of the original total price'