# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API endpoint, such as the one behind [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.
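
The offset in the last step can be generated instead of edited by hand. Below is a minimal sketch, assuming the endpoint accepts the `categories`, `limit`, `offset`, and `sort` query parameters seen in the URL used later in this notebook. `build_page_url` is a hypothetical helper name chosen for illustration, and the endpoint itself may change or be restricted at any time.

```python
from urllib.parse import urlencode

# Endpoint as observed in DevTools; treat it as an assumption, not a stable API
BASE_URL = "https://www.zalando.fr/api/catalog/articles"


def build_page_url(category, page, limit=84, sort="sale"):
    """Return the API URL for a zero-based page number (hypothetical helper)."""
    params = {
        "categories": category,
        "limit": limit,
        "offset": page * limit,  # page 0 starts at offset 0, page 1 at 84, ...
        "sort": sort,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_page_url("promo-homme", 0))
print(build_page_url("promo-homme", 1))
```

Pasting the printed URLs into the browser is a quick way to check that consecutive offsets return different articles.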

In [3]:
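# your code here
import json
import urllib.request  # a bare "import urllib" does not guarantee urllib.request is loaded

import pandas as pd  # pd.json_normalize is used below, so no separate json_normalize import is needed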


## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data for the 1st page. Flatten the data and store it in a variable.

1. Find the total page count in the 1st page data.

1. Use a `for` loop to request the additional pages, from page 2 up to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
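
The steps above can be sketched without touching the network. The example below assumes each response is a dict with an `articles` list and a `pagination.page_count` field, which matches the keys read in the cell that follows. `collect_pages` and `fake_fetch` are hypothetical names, and the fake responses are made up so the sketch runs offline.

```python
import pandas as pd


def collect_pages(fetch_page, max_pages=None):
    """Fetch page 0, read the page count, then fetch the remaining pages.

    fetch_page(page) is expected to return a dict shaped like
    {"articles": [...], "pagination": {"page_count": N}} (assumed shape).
    """
    first = fetch_page(0)
    page_count = first["pagination"]["page_count"]
    if max_pages is not None:
        page_count = min(page_count, max_pages)  # optional cap on the run

    # Collect one flattened frame per page, then concatenate once at the end
    frames = [pd.json_normalize(first["articles"])]
    for page in range(1, page_count):
        frames.append(pd.json_normalize(fetch_page(page)["articles"]))

    return pd.concat(frames, ignore_index=True)


def fake_fetch(page):
    """Stand-in for a real HTTP request; returns made-up data."""
    return {
        "articles": [{"sku": f"SKU-{page}", "price": {"original": "10,00 €"}}],
        "pagination": {"page_count": 3},
    }


df_demo = collect_pages(fake_fetch)
print(df_demo.shape)
print(df_demo)
```

Passing the fetch function as an argument keeps the paging logic separate from the HTTP call, so a real implementation could swap `fake_fetch` for a function that downloads and decodes one page.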

In [6]:
# your code here
# offset=0 requests the 1st page; offset=84 would already be the 2nd page
url='https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=0&sort=sale'
json_url = urllib.request.urlopen(url)
results = json.loads(json_url.read())
flattened_data = pd.json_normalize(results)
flattened_data1 = pd.json_normalize(flattened_data.articles[0])
flattened_data1

total_pages=results['pagination']['page_count']
print(f'Total Number of pages: {total_pages}')
total_pages = 100  # cap the run: 892 pages * 84 items seems like too much data

# Your code
pages = []
for i in range(total_pages):
    k = 84 * i  # offset of the first article on page i
    url = f'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset={k}&sort=sale'
    json_url = urllib.request.urlopen(url)
    results = json.loads(json_url.read())
    flattened_data = pd.json_normalize(results)
    flattened_data1 = pd.json_normalize(flattened_data.articles[0])
    pages.append(flattened_data1.set_index('sku'))

# DataFrame.append was removed in pandas 2.0; concatenate all pages once instead
df = pd.concat(pages)

print(df.shape)


Total Number of pages: 892
(8400, 21)
                                                            name  \
sku                                                                
TS122H00S-Q11                    BLIGHT - Veste mi-saison - noir   
LA212O03J-A11                  GRADUATE - Baskets basses - white   
AD115O07M-A11  CONTINENTAL 80 SKATEBOARD SHOES - Baskets bass...   
AD115O0J1-A11  STAN SMITH - Baskets basses - footwear white/l...   
LA222O04B-K12               T-shirt imprimé - marine/guepe/blanc   
JA222Q0QM-Q11               JJEEMIL ROLL NECK - Pullover - black   
C1852D027-Q11                            BELT - Ceinture - black   
TS122T009-K11            TERRY - Veste sans manches - total navy   
BRH22Q005-Q11                            HUME - Pullover - black   
LA212O06W-A11           LEROND - Baskets basses - white/navy/red   

                                                           sizes  \
sku                                                                
TS122H00S

In [8]:
# your code here
df.columns

Index(['name', 'sizes', 'url_key', 'media', 'brand_name', 'is_premium',
       'family_articles', 'flags', 'product_group', 'delivery_promises',
       'price.original', 'price.promotional', 'price.has_different_prices',
       'price.has_different_original_prices',
       'price.has_different_promotional_prices',
       'price.has_discount_on_selected_sizes_only', 'outfits', 'amount',
       'price.base_price', 'condition', 'condition_key'],
      dtype='object')

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The overall discount ratio for all goods (`sum_discounted_prices` divided by `sum_original_prices`).
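
One possible reading of these three tasks is sketched below on a tiny made-up table. The column names `original` and `promotional` and all values are illustrative assumptions, not the real data. "Highest discount" is interpreted here as the largest relative discount; the largest absolute price difference would be an equally valid reading.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame(
    {
        "name": ["jacket", "sneakers", "belt"],
        "brand_name": ["A", "B", "A"],
        "original": [100.0, 80.0, 20.0],
        "promotional": [50.0, 60.0, 18.0],
    }
)

# Trending brand: the brand that appears most often in the listing
trending_brand = toy["brand_name"].value_counts().idxmax()

# Product(s) with the highest relative discount; ties are all kept
toy["discount_pct"] = 1 - toy["promotional"] / toy["original"]
best = toy[toy["discount_pct"] == toy["discount_pct"].max()]

# Overall ratio: sum of discounted prices divided by sum of original prices
overall_ratio = toy["promotional"].sum() / toy["original"].sum()

print(trending_brand)
print(best[["name", "discount_pct"]])
print(overall_ratio)
```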

In [9]:
df.brand_name.value_counts().index[0]  # trending brand: the brand with the most items on the website

'Jack & Jones'

In [11]:
# Our data is still text. Extract the numeric part of each price (raw-string regex):
df['price.original'] = df['price.original'].str.extract(r'(\d*,\d*)', expand=False)
df['price.promotional'] = df['price.promotional'].str.extract(r'(\d*,\d*)', expand=False)

# Swap the decimal comma for a point; .str.replace also tolerates missing values
df['price.original'] = df['price.original'].str.replace(',', '.', regex=False)
df['price.promotional'] = df['price.promotional'].str.replace(',', '.', regex=False)
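
The same conversion can be wrapped in a small helper that returns floats directly. This is a sketch, assuming prices are strings with a decimal comma followed by a currency symbol. `parse_price` is a hypothetical name and the sample values are made up.

```python
import pandas as pd


def parse_price(series):
    """Convert price strings with a decimal comma to floats.

    Values that are missing or contain no digits become NaN.
    """
    # Capture the digits, with an optional ",xx" decimal part
    number = series.str.extract(r"(\d+(?:,\d+)?)", expand=False)
    # Replace the decimal comma, then let pandas do the numeric conversion
    return pd.to_numeric(number.str.replace(",", ".", regex=False), errors="coerce")


sample = pd.Series(["29,95 €", "120,00 €", None, "15 €"])  # made-up values
prices = parse_price(sample)
print(prices)
```

Returning a numeric Series up front avoids repeating `.astype(float)` in later cells.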

In [13]:
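df['discount_amount'] = df['price.original'].astype(float) - df['price.promotional'].astype(float)
df1 = df.copy()
# Sum only the discount column; summing every column would also try to add up the text columns
total_disc = df1.groupby('brand_name')['discount_amount'].sum()
total_disc.sort_values(ascending=False).index[0]  # brand with the largest total discount (not a single item)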

'Tommy Hilfiger'

In [62]:
df2 = df.copy()
df2['price.original']= df['price.original'].astype(float)
total_original_price = df2.groupby(['brand_name'])['price.original'].sum()
total_original_price



brand_name
'47                  79.80
11 DEGREES          124.90
274                 106.85
A.S.98              229.95
ALDO               2062.50
                    ...   
camano               30.90
edc by Esprit       129.97
le coq sportif      198.00
s.Oliver            639.71
sergio tacchini     179.70
Name: price.original, Length: 339, dtype: float64