# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned by the API behind another Zalando page, such as [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page offset in the API URL to check that pagination works.

In [1]:
import json
import requests
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

In [2]:
# your code here

url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-femme&limit=84&offset=84&sort=popularity'
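
Step 4 of the recap amounts to changing only the `offset` query parameter. Here is a minimal sketch, assuming the endpoint keeps accepting the `categories`, `limit`, `offset` and `sort` parameters seen in the URL above. The helper name `build_page_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.zalando.fr/api/catalog/articles'

def build_page_url(category, page, limit=84, sort='popularity'):
    """Return the URL for a zero-based page index of the given category."""
    params = {
        'categories': category,
        'limit': limit,
        'offset': page * limit,  # page 0 -> offset 0, page 1 -> offset 84, ...
        'sort': sort,
    }
    return BASE_URL + '?' + urlencode(params)

# Page index 1 reproduces the URL used in the cell above.
print(build_page_url('promo-femme', 1))
```

Building the query string from a dictionary keeps the page arithmetic in one place, which makes it easier to loop over pages later.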

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data of the 1st page. Flatten the data and store it in a variable.

1. Find the total page count in the 1st page data.

1. Use a `for` loop to request the additional pages, from page 2 up to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
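
The workflow above can be sketched without committing to a particular endpoint. This is only an illustration: `collect_articles` and `fetch_page` are hypothetical names, and the response layout (an `articles` list plus a `pagination.page_count` field) is an assumption that should be checked against the JSON you actually see in DevTools.

```python
import pandas as pd

def collect_articles(fetch_page, max_pages=None):
    """Fetch page 0, read the page count, then fetch the remaining pages.

    fetch_page(page) must return the decoded JSON (a dict) for that page.
    """
    first = fetch_page(0)
    frames = [pd.json_normalize(first['articles'])]

    # Assumed field name; adjust it to the real payload.
    page_count = first['pagination']['page_count']
    if max_pages is not None:
        page_count = min(page_count, max_pages)

    for page in range(1, page_count):
        frames.append(pd.json_normalize(fetch_page(page)['articles']))

    # ignore_index gives one continuous 0..n-1 index across pages
    return pd.concat(frames, ignore_index=True, sort=False)


# Offline stand-in for the API so the sketch runs without network access.
def fake_fetch(page):
    return {
        'pagination': {'page_count': 3},
        'articles': [{'sku': f'SKU-{page}', 'price': {'original': '10,00 €'}}],
    }

demo = collect_articles(fake_fetch)
print(demo)
```

With `requests`, `fetch_page` could be a small function that calls `requests.get` with the offset for that page and returns `response.json()`. The `max_pages` cap is there to keep test runs short.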

In [11]:
# your code here

headers = {'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}

response = requests.get(url, headers=headers)
results = response.json()

In [4]:
dataframe = json_normalize(results)

dataframe = json_normalize(dataframe.articles[0])

In [5]:
# Collect the flattened data of each page in a list

lst = []

# Loop over a fixed number of pages instead of the total page count
# to avoid long execution times

for i in range(15):
    count = 84*i  # offset of page i (84 articles per page)
    urls = 'https://www.zalando.fr/api/catalog/articles?categories=promo-enfant&limit=84&offset='+str(count)+'&sort=sale'
    response_all = requests.get(urls, headers=headers)
    results_all = response_all.json()
    flatten_data_all = json_normalize(results_all)
    flatten_data_all = json_normalize(flatten_data_all.articles[0])
    lst.append(flatten_data_all)

# sort=False keeps the columns in their original order
dataframe = pd.concat(lst, sort=False)

dataframe.index = dataframe['sku']

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

### The trending brand:


In [10]:
dataframe['brand_name'].value_counts()

Nike Sportswear     98
Nike Performance    47
Lemon Beret         35
adidas Originals    33
s.Oliver            33
                    ..
Palladium            1
Scotch Shrunk        1
Boboli               1
The New              1
UGG                  1
Name: brand_name, Length: 107, dtype: int64
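
If "trending" is read as "most frequently listed", which is an assumption since the challenge does not define it, the top brand can be picked out directly instead of read off the printed counts. A small sketch on made-up rows:

```python
import pandas as pd

# Made-up sample; the notebook itself would use dataframe['brand_name'].
brands = pd.Series(
    ['Brand A', 'Brand A', 'Brand B', 'Brand C', 'Brand A', 'Brand B'],
    name='brand_name',
)

counts = brands.value_counts()   # sorted from most to least frequent
top_brand = counts.idxmax()      # label with the largest count
print(top_brand, counts.max())
```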

### The product(s) with the highest discount:

In [7]:
# The product(s) with the highest discount:

def remove_euro(string):
    
    '''
    Turn a price string into a string that float() can parse:
    drop the last two characters (the trailing euro suffix)
    and replace the decimal comma with a dot.
    '''
    
    string = string[:-2]
    reg = string.replace(",",".")
    return reg

# Apply the function to both price columns

dataframe['price.original'] = dataframe['price.original'].apply(remove_euro).astype(float)            
dataframe['price.promotional'] = dataframe['price.promotional'].apply(remove_euro).astype(float)            
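
Slicing off the last two characters works as long as every price ends in exactly two non-numeric characters. Here is a slightly more defensive sketch, assuming prices use a comma as the decimal separator and may contain spaces or dots as thousands separators. The helper name `parse_euro` is hypothetical.

```python
import re

def parse_euro(text):
    """Convert a price string such as '72,00 €' to a float."""
    # Keep digits and the decimal comma; drop the currency sign, spaces and dots.
    digits = re.sub(r'[^\d,]', '', text)
    return float(digits.replace(',', '.'))

print(parse_euro('72,00 €'))
print(parse_euro('1 299,95 €'))
```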

In [8]:
# Create a total_discount column

dataframe['total_discount'] = dataframe['price.original'] - dataframe['price.promotional']

# Sort the discounts in descending order so we can find the product(s) with the highest discount
dataframe['total_discount'].sort_values(ascending=False)

sku
RO543F01A-A11    72.00
PO224L03D-K11    66.00
PO224L03D-K11    66.00
K1343F06L-K11    66.00
QU124L020-K11    64.99
                 ...  
NL623J011-Q11     1.36
NL623G08P-Q11     0.91
N1243A113-Q11     0.80
VA253I004-K12     0.80
ES123G0JG-C11     0.09
Name: total_discount, Length: 840, dtype: float64
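
The sorted output lists every row. To keep only the product(s) tied for the largest discount, compare each discount against the maximum. The sketch below uses made-up prices and also drops repeated SKUs, which the output above suggests can occur.

```python
import pandas as pd

# Made-up rows standing in for the real dataframe.
df = pd.DataFrame({
    'sku': ['A1', 'B2', 'B2', 'C3'],
    'price.original': [100.0, 80.0, 80.0, 50.0],
    'price.promotional': [40.0, 20.0, 20.0, 45.0],
})

df = df.drop_duplicates(subset='sku')
df['total_discount'] = df['price.original'] - df['price.promotional']

# Keep every row whose discount equals the maximum (handles ties).
best = df[df['total_discount'] == df['total_discount'].max()]
print(best[['sku', 'total_discount']])
```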

### The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices):

In [15]:
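# your code here

# Sum of the discounted (promotional) prices and of the original prices
sum_discounts = dataframe['price.promotional'].sum()
sum_original_prices = dataframe['price.original'].sum()

# Share of the original total that is still paid after discounts
sum_discounts_all_goods = sum_discounts/sum_original_prices
sum_discounts_all_goods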

0.7252948058948033
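
The ratio above is the share of the original price that is still paid. The overall discount rate is its complement. A worked example with made-up totals:

```python
sum_promotional = 725.0   # made-up total of discounted prices
sum_original = 1000.0     # made-up total of original prices

paid_share = sum_promotional / sum_original   # fraction still paid
discount_rate = 1 - paid_share                # fraction taken off

print(f'{paid_share:.1%} paid, {discount_rate:.1%} off')
```

Applied to the result above, a ratio of about 0.725 corresponds to an overall discount of roughly 27.5%.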