# Challenge: Promotions

In this challenge, you'll develop codes to parse and analyze data returned from another API on Zalando such as [Promos homme (Men's Promotions)
](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson but you'll work with different data.

## Obtaining the link

Wrote your codes in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

In [1]:
# your code here

url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-femme&limit=84&offset=84&sort=popularity'

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

1. Print and review the data you obtained.

In [2]:
# your code here
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize

In [3]:
url= 'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=84&sort=popularity'

In [4]:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

# Doing the request to the api, headers= user agent
response = requests.get(url, headers= headers)
results = response.json()
results
# Display the naked json
results["pagination"]["page_count"]

607

In [5]:
#Normalizing the first level of the JSON dictionary

df = json_normalize(results)

#Getting 'articles', the column we're interested in

df = json_normalize(df.articles[0])

print(df)

              sku                                               name  \
0   BB122O04L-G11                 T-SHIRT RN - T-shirt imprimé - red   
1   LE252F000-Q11   VINTAGE TWO HORSE - Portefeuille - regular black   
2   PI922EA0O-Q11                                      Chino - black   
3   JA222O2C4-K11  JCOBOOSTER TEE CREW NECK 2 PACK - T-shirt impr...   
4   TO122Q07C-K11                    SILK ZIP MOCK - Pullover - blue   
..            ...                                                ...   
79  NI122E06R-C11  PANT CARGO - Pantalon de survêtement - grey he...   
80  BB122P01R-K15                          PALLAS - Polo - dark blue   
81  NI122S0AT-K11                  CLUB - Sweatshirt - midnight navy   
82  PI922Q00F-C11                            Pullover - grey melange   
83  NE352P01U-T11  LEAGUE ESSENTIAL - Casquette - new york yankee...   

                       sizes  \
0                 [M, L, XL]   
1                 [One Size]   
2   [29, 30, 31, 32, 33, 34]   
3      

In [6]:
lst = []
for i in range(results["pagination"]["page_count"]): # Using just 10 pages
    count = 84*i
    urls = 'https://www.zalando.fr/api/catalog/articles?categories=promo-enfant&limit=84&offset='+str(count)+'&sort=sale'
    response_all = requests.get(urls, headers =headers)
    results_all = response_all.json()
    flatten_data_all = json_normalize(results_all)
    flatten_data_all = json_normalize(flatten_data_all.articles[0])
    lst.append(flatten_data_all)
    
df = pd.concat(lst)

# Your code

df.index = df['sku']

df['price.original'].head()

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  # This is added back by InteractiveShellApp.init_path()


sku
NI116D04L-Q11     54,95 €
NI114D0B9-Q14    104,95 €
NI114D0C8-Q11    114,95 €
AD116D007-A11     54,95 €
AD126L00J-Q11     64,95 €
Name: price.original, dtype: object

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [7]:
# The trending brand
df['brand_name'].value_counts().head(5)

Name it     1018
GAP          714
Esprit       675
Boboli       671
Benetton     529
Name: brand_name, dtype: int64

In [8]:
#Our data is still text. Convert prices into numbers.
import re

def extract_nums(string):
    string = string[:-2]
    num = string.replace(',', '.').replace(' ', '')
    return num
   

df['price.original'] = df['price.original'].apply(extract_nums).astype(float)
df['price.promotional'] = df['price.promotional'].apply(extract_nums).astype(float)

df[['price.original', 'price.promotional']]

Unnamed: 0_level_0,price.original,price.promotional
sku,Unnamed: 1_level_1,Unnamed: 2_level_1
NI116D04L-Q11,54.95,43.95
NI114D0B9-Q14,104.95,83.95
NI114D0C8-Q11,114.95,91.95
AD116D007-A11,54.95,38.45
AD126L00J-Q11,64.95,41.95
...,...,...
LE314G02K-K11,54.95,54.95
N1243D12J-Q11,55.00,55.00
LE314E01O-K11,54.95,54.95
CA313L002-J11,40.00,40.00


In [9]:
# Products with highest discount:
df['total.discount'] = df['price.original'] - df['price.promotional']

df['total.discount'].sort_values(ascending= False)

sku
AJ223L006-G11    164.0
M4O23F004-K11    150.0
TH343F01R-C11    150.0
KJ143F00J-H11    148.0
M4O23F006-T11    132.0
                 ...  
KE543B00D-K11      0.0
GI116L014-C11      0.0
CA316I006-C11      0.0
L5214K00M-K11      0.0
CA313L002-J11      0.0
Name: total.discount, Length: 22009, dtype: float64

In [10]:
# The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

discounts = df['price.promotional'].sum()
all_goods = df['price.original'].sum()

sum_of_discounts = discounts / all_goods

sum_of_discounts


0.7385473713581384