# Parsing APIs Example

## Intro

Now we will look at real data. When you collect data from the web, you will often encounter pages that load their content through an API.

For example, [zalando.fr](https://www.zalando.fr/accueil-homme/) is an API-based web page.

In this guided lab you will learn how to find the API links behind a web page and extract the data from them. Read through this document, execute the cells in order, and make sure you understand the explanations.

*Note: This guided lab uses Google Chrome. Other browsers such as Safari and Firefox have similar developer tools, but they work differently. To save time while following this lab, we strongly recommend that you install and use Google Chrome.*

## Obtaining the link

Zalando is an online store where you can buy clothes and accessories at a discount. The site is split into several sections. The general process is shown first, using the [Children section](https://www.zalando.fr/accueil-enfant/) as an example.

Here we will parse only the data about promotions. The final output will therefore be a DataFrame containing all the discounted goods.

[![Image from Gyazo](https://i.gyazo.com/fa4874d8e81c7570273bbfb853d66308.png)](https://gyazo.com/fa4874d8e81c7570273bbfb853d66308)


Go to the Promos page. Right-click anywhere on the page to open the context menu, then select Inspect.

<img src='https://i.gyazo.com/bccbd11d69c9040dc98758d443e32052.png' width="400">


The developer tools panel opens on the right side or at the bottom of the window. Click the Network tab:


[![Image from Gyazo](https://i.gyazo.com/f7e0db81cbfee67694183d1a7640bf81.png)](https://gyazo.com/f7e0db81cbfee67694183d1a7640bf81)

The panel then lists the files that the page requests. To see only the useful ones, select the following settings:
1. Enable Preserve log.
2. Filter by XHR.

[![Image from Gyazo](https://i.gyazo.com/9a899d4441d9d93e795f79747f1e47d5.png)](https://gyazo.com/9a899d4441d9d93e795f79747f1e47d5)

To trigger some requests, scroll down and go to the second page.

[![Image from Gyazo](https://i.gyazo.com/0956eb3d5125075a236c9a439c7749c7.png)](https://gyazo.com/0956eb3d5125075a236c9a439c7749c7)

In the Network panel you can now see the files being loaded. All the data on the page comes from a JSON file, which is one of the files listed. It is important to understand which file contains which kind of information.

<a href="https://gyazo.com/cf97a655869f0b22df0ada1cb2a41c3c"><img src="https://i.gyazo.com/cf97a655869f0b22df0ada1cb2a41c3c.png" alt="Image from Gyazo" width="724.8"/></a>

Once you have found a file that seems to contain the data you need, test it. Here we need the article... file:

<a href="https://gyazo.com/78b35bf492994b3f35c0564a21da202a"><img src="https://i.gyazo.com/78b35bf492994b3f35c0564a21da202a.png" alt="Image from Gyazo" width="727.2"/></a>

When we open the link in Chrome's incognito mode, we get the expected JSON file:


<a href="https://gyazo.com/b60453fa98454fa29771c731a5174443"><img src="https://i.gyazo.com/b60453fa98454fa29771c731a5174443.png" alt="Image from Gyazo" width="1530.4"/></a>

To move through the objects in the JSON file (a kind of pagination), change the `offset` parameter, which is the index of the first element on the page. If you look at the link, its structure is easy to understand.
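
The offset arithmetic can be sketched as follows. This is only an illustration: the endpoint and the query parameters are copied from the request URL used in this lab (with the gender filter left out), pages are assumed to be numbered from 1, and the helper name `build_page_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.zalando.fr/api/catalog/articles"

def build_page_url(page, per_page=84, category="enfant"):
    """Return the API URL for a 1-based page number (hypothetical helper)."""
    params = {
        "categories": category,
        "limit": per_page,
        # offset is the index of the first item on the requested page
        "offset": (page - 1) * per_page,
        "sort": "popularity",
    }
    return BASE_URL + "?" + urlencode(params)

print(build_page_url(1))
print(build_page_url(2))
```

With 84 items per page, page 1 maps to `offset=0` and page 2 to `offset=84`.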

## Reading the data

Now the fun begins! Once we know how to obtain the data, collecting the whole dataset from the web page is straightforward.
In this lab you will build your own database of Zalando products. You choose which goods you want to track, and you can apply as many filters to your data as you want. Just make sure that the data you collect matches those filters.




In [203]:
import json
import requests
import pandas as pd
import numpy as np
# json_normalize is available at the top level of pandas since version 1.0
from pandas import json_normalize
import re

In [204]:
# Paste the url you obtained for your data
# Note: offset=84 requests the second page; offset=0 would return the first one
url='https://www.zalando.fr/api/catalog/articles?categories=enfant&genders=MALE&limit=84&offset=84&sort=popularity'

#### Collect the first 84 objects of the data (1st page)

Your output should be a Pandas DataFrame of goods. Each cell should contain only text or numbers, except for the *family_articles, flags, media* and *sizes* columns, which remain lists. Hint: use the `headers` parameter to get the data!

In [205]:
# headers definition
response = requests.get(url, headers={"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"})
result=response.json()
result

{'total_count': 29501,
 'pagination': {'page_count': 352, 'current_page': 2, 'per_page': 84},
 'sort': 'popularity',
 'articles': [{'sku': 'NI114D0EI-K11',
   'name': 'COURT BOROUGH 2 - Baskets basses - midnight navy/lemon/black/anthracite',
   'price': {'original': '29,95\xa0€',
    'promotional': '29,95\xa0€',
    'has_different_prices': False,
    'has_different_original_prices': False,
    'has_different_promotional_prices': False,
    'has_discount_on_selected_sizes_only': False},
   'sizes': ['19.5', '21', '22', '23.5', '25', '26', '27'],
   'url_key': 'nike-sportswear-court-borough-low-2-baskets-basses-ni114d0ei-k11',
   'media': [{'path': 'NI/11/4D/0E/IK/11/NI114D0EI-K11@10.jpg',
     'role': 'DEFAULT',
     'packet_shot': False}],
   'brand_name': 'Nike Sportswear',
   'is_premium': False,
   'family_articles': [{'sku': 'NI114D0EI-K11',
     'url_key': 'nike-sportswear-court-borough-low-2-baskets-basses-ni114d0ei-k11',
     'media': [{'path': 'NI/11/4D/0E/IK/11/NI114D0EI-K11@1
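
The pagination fields in this response can be cross-checked with a little arithmetic. The sketch below uses only the values printed above; it assumes that pages are numbered from 1 and that the page count is the total count divided by the page size, rounded up, which matches this response but is not documented anywhere in the lab.

```python
import math

# Values copied from the response shown above
meta = {
    "total_count": 29501,
    "pagination": {"page_count": 352, "current_page": 2, "per_page": 84},
}

pagination = meta["pagination"]
# Number of pages needed to hold every article
expected_pages = math.ceil(meta["total_count"] / pagination["per_page"])
# Offset that corresponds to the current page
current_offset = (pagination["current_page"] - 1) * pagination["per_page"]

print(expected_pages)
print(current_offset)
```

The computed page count equals the reported `page_count`, and the computed offset equals the `offset=84` in the request URL.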

In [218]:
# Flatten the list of articles into one row per article
dfArticles=json_normalize(result['articles'])
dfArticles.head()

Unnamed: 0,amount,brand_name,delivery_promises,family_articles,flags,is_premium,media,name,price.has_different_original_prices,price.has_different_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,price.original,price.promotional,product_group,sizes,sku,url_key
0,,Nike Sportswear,[],"[{'sku': 'NI114D0EI-K11', 'url_key': 'nike-spo...","[{'key': 'new', 'value': 'Nouveau', 'tracking_...",False,[{'path': 'NI/11/4D/0E/IK/11/NI114D0EI-K11@10....,COURT BOROUGH 2 - Baskets basses - midnight na...,False,False,False,False,"29,95 €","29,95 €",shoe,"[19.5, 21, 22, 23.5, 25, 26, 27]",NI114D0EI-K11,nike-sportswear-court-borough-low-2-baskets-ba...
1,,Nike Performance,[],"[{'sku': 'N1243F02T-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-10%', 'tra...",False,[{'path': 'N1/24/3F/02/TQ/11/N1243F02T-Q11@10....,Veste Hardshell - black/black/white,False,False,False,False,"44,95 €","40,45 €",clothing,"[6-8a, 8-10a, 10-12a, 12-13a, 13-15a]",N1243F02T-Q11,nike-performance-veste-hardshell-blackblackwhi...
2,109 g,Nike Performance,[],"[{'sku': 'N1243A0XV-J11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -40...",False,[{'path': 'N1/24/3A/0X/VJ/11/N1243A0XV-J11@2.1...,DOWNSHIFTER 9 - Chaussures de running neutres...,False,True,True,False,"39,95 €","23,95 €",shoe,"[27.5, 28, 28.5, 29.5, 30, 31, 31.5, 32, 33, 3...",N1243A0XV-J11,nike-performance-downshifter-9-chaussures-de-r...
3,,Vans,[],"[{'sku': 'VA216D02W-G11', 'url_key': 'vans-old...","[{'key': 'new', 'value': 'Nouveau', 'tracking_...",False,[{'path': 'VA/21/6D/02/WG/11/VA216D02W-G11@7.j...,OLD SKOOL ELASTIC LACE - Mocassins - racing re...,False,False,False,False,"39,95 €","39,95 €",shoe,"[17, 19, 20, 21, 22, 23.5, 24, 25, 26]",VA216D02W-G11,vans-old-skool-elastic-lace-baskets-basses-rac...
4,,Tommy Hilfiger,[],"[{'sku': 'TO124G05S-G11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-25%', 'tra...",False,[{'path': 'TO/12/4G/05/SG/11/TO124G05S-G11@6.j...,BOYS BASIC - T-shirt basique - apple red heather,False,True,True,False,"14,95 €","11,20 €",clothing,"[4a, 5a, 6a, 7a, 8a, 10a, 12a, 14a, 16a]",TO124G05S-G11,tommy-hilfiger-boys-basic-t-shirt-basique-to12...
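
The dotted column names such as `price.original` come from the way `json_normalize` flattens nested dictionaries, while lists are kept as they are. A small sketch on one article, trimmed from the response above, shows the effect; it assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# One article trimmed from the response shown above
articles = [{
    "sku": "NI114D0EI-K11",
    "brand_name": "Nike Sportswear",
    "price": {"original": "29,95\xa0€", "promotional": "29,95\xa0€"},
    "sizes": ["19.5", "21", "22"],
}]

df = pd.json_normalize(articles)
# Nested dict keys become "parent.child" columns; the list stays in one cell
print(sorted(df.columns))
print(df.loc[0, "sizes"])
```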


In [219]:
dfArticles.count()

amount                                        8
brand_name                                   84
delivery_promises                            84
family_articles                              84
flags                                        84
is_premium                                   84
media                                        84
name                                         84
price.has_different_original_prices          84
price.has_different_prices                   84
price.has_different_promotional_prices       84
price.has_discount_on_selected_sizes_only    84
price.original                               84
price.promotional                            84
product_group                                84
sizes                                        84
sku                                          84
url_key                                      84
dtype: int64

In [224]:
print(dfArticles.columns)
# Lower-case the column names and replace '.', ' ', ':' and '+' with '_'
dfArticles.columns=[x.lower().strip() for x in dfArticles.columns]
dfArticles.columns=[re.sub(r"[. :+]","_",x) for x in dfArticles.columns]
dfArticles.columns=[re.sub("__","_",x) for x in dfArticles.columns]
print(dfArticles.columns)

Index(['amount', 'brand_name', 'delivery_promises', 'family_articles', 'flags',
       'is_premium', 'media', 'name', 'price.has_different_original_prices',
       'price.has_different_prices', 'price.has_different_promotional_prices',
       'price.has_discount_on_selected_sizes_only', 'price.original',
       'price.promotional', 'product_group', 'sizes', 'sku', 'url_key',
       'amount_c', 'brand_name_c', 'delivery_promises_c', 'path_c',
       'packet_shot_c'],
      dtype='object')
Index(['amount', 'brand_name', 'delivery_promises', 'family_articles', 'flags',
       'is_premium', 'media', 'name', 'price_has_different_original_prices',
       'price_has_different_prices', 'price_has_different_promotional_prices',
       'price_has_discount_on_selected_sizes_only', 'price_original',
       'price_promotional', 'product_group', 'sizes', 'sku', 'url_key',
       'amount_c', 'brand_name_c', 'delivery_promises_c', 'path_c',
       'packet_shot_c'],
      dtype='object')
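
The renaming step can also be tried on plain strings, without a DataFrame. The sketch below is a minimal version under one stated difference: it collapses any run of underscores, whereas the cell above only replaces pairs. The function name `normalise_column` is hypothetical.

```python
import re

def normalise_column(name):
    """Lower-case a column name and replace '.', ' ', ':' and '+' with '_'."""
    name = name.strip().lower()
    name = re.sub(r"[. :+]", "_", name)
    # Collapse runs of underscores left by adjacent separators
    return re.sub(r"_+", "_", name)

columns = ["price.original", "price.has_different_prices", "brand_name"]
print([normalise_column(c) for c in columns])
```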


In [221]:
for x in dfArticles.columns:
    print(dfArticles[x].value_counts())

121 g    1
109 g    1
195 g    1
120 g    1
136 g    1
40 g     1
66 g     1
206 g    1
Name: amount, dtype: int64
Nike Performance       21
Next                   13
Nike Sportswear         8
adidas Originals        6
adidas Performance      5
Champion                4
Tommy Hilfiger          4
Vans                    3
Levi's®                 3
Polo Ralph Lauren       2
Puma                    2
Calvin Klein Jeans      2
GAP                     2
Lacoste                 2
Fila                    1
Jordan                  1
New Balance             1
Jack & Jones Junior     1
Benetton                1
New Era                 1
Redskins                1
Name: brand_name, dtype: int64
[]    84
Name: delivery_promises, dtype: int64
[{'sku': 'AD124G02B-A11', 'url_key': 'adidas-originals-stripes-t-shirt-a-manches-longues-whiteblack-ad124g02b-a11', 'media': [{'path': 'AD/12/4G/02/BA/11/AD124G02B-A11@5.jpg', 'role': 'FAMILY', 'packet_shot': False}], 'name': 'T-shirt à manches longues - white/

In [225]:
# family_articles, flags, media and sizes
def clean_number(number):
    """Keep only the digits, e.g. '109 g' -> '109'."""
    return re.sub(r"\D+", "", str(number))

def clean_text(text):
    """Replace runs of non-word characters with a single space."""
    return re.sub(r"\W+\s*", " ", str(text))

def clean_path(txt):
    """Return the (path, packet_shot) pair of one media entry."""
    return (txt['path'], txt['packet_shot'])

def clean_float(float_v):
    """Extract the leading number as text, e.g. '29,95 €' -> '29.95'."""
    float_v = str(float_v).replace(",", ".")
    return re.match(r"\d+\.?\d*", float_v).group()

dfArticles['amount_c']= dfArticles.amount.apply(clean_number)
dfArticles['brand_name_c']= dfArticles.brand_name.apply(clean_text)
dfArticles['delivery_promises_c']= dfArticles.delivery_promises.apply(clean_text)
# Only the first media entry of each article is used
dfArticles[['path_c','packet_shot_c']]= dfArticles.media.apply(lambda x:clean_path(x[0])).apply(pd.Series)
dfArticles['price_original_c']= dfArticles.price_original.apply(clean_float)
dfArticles['price_promotional_c']= dfArticles.price_promotional.apply(clean_float)
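
The price strings returned by the API look like `'29,95\xa0€'`, with a comma as decimal separator and a non-breaking space before the currency sign. A slightly more defensive parser could look like the sketch below. It assumes that spaces only act as separators and that a missing number should become `None`; the name `parse_price` is hypothetical and not part of the lab's solution.

```python
import re

def parse_price(text):
    """Convert a price such as '29,95 €' to a float; return None if no number is found."""
    # Drop regular and non-breaking spaces, then use a dot as decimal separator
    cleaned = str(text).replace("\xa0", "").replace(" ", "").replace(",", ".")
    match = re.search(r"\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None

print(parse_price("29,95\xa0€"))
print(parse_price("11,20 €"))
print(parse_price(None))
```

Returning a float directly removes the need for a later `astype(float)` conversion, at the cost of `None` values that have to be handled downstream.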

In [226]:
dfOut=dfArticles[['amount_c', 'brand_name_c', 'delivery_promises_c', 'family_articles', 'flags',
       'is_premium', 'path_c','packet_shot_c', 'name', 'price_has_different_original_prices',
       'price_has_different_prices', 'price_has_different_promotional_prices',
       'price_has_discount_on_selected_sizes_only', 'price_original_c',
       'price_promotional_c', 'product_group', 'sizes', 'sku', 'url_key']]
dfOut.columns=['amount', 'brand_name', 'delivery_promises', 'family_articles', 'flags',
       'is_premium', 'path','packet_shot', 'name', 'price_has_different_original_prices',
       'price_has_different_prices', 'price_has_different_promotional_prices',
       'price_has_discount_on_selected_sizes_only', 'price_original',
       'price_promotional', 'product_group', 'sizes', 'sku', 'url_key']
dfOut.head()

Unnamed: 0,amount,brand_name,delivery_promises,family_articles,flags,is_premium,path,packet_shot,name,price_has_different_original_prices,price_has_different_prices,price_has_different_promotional_prices,price_has_discount_on_selected_sizes_only,price_original,price_promotional,product_group,sizes,sku,url_key
0,,Nike Sportswear,,"[{'sku': 'NI114D0EI-K11', 'url_key': 'nike-spo...","[{'key': 'new', 'value': 'Nouveau', 'tracking_...",False,NI/11/4D/0E/IK/11/NI114D0EI-K11@10.jpg,False,COURT BOROUGH 2 - Baskets basses - midnight na...,False,False,False,False,29.95,29.95,shoe,"[19.5, 21, 22, 23.5, 25, 26, 27]",NI114D0EI-K11,nike-sportswear-court-borough-low-2-baskets-ba...
1,,Nike Performance,,"[{'sku': 'N1243F02T-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-10%', 'tra...",False,N1/24/3F/02/TQ/11/N1243F02T-Q11@10.jpg,True,Veste Hardshell - black/black/white,False,False,False,False,44.95,40.45,clothing,"[6-8a, 8-10a, 10-12a, 12-13a, 13-15a]",N1243F02T-Q11,nike-performance-veste-hardshell-blackblackwhi...
2,109.0,Nike Performance,,"[{'sku': 'N1243A0XV-J11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': 'Jusqu’à -40...",False,N1/24/3A/0X/VJ/11/N1243A0XV-J11@2.1.jpg,True,DOWNSHIFTER 9 - Chaussures de running neutres...,False,True,True,False,39.95,23.95,shoe,"[27.5, 28, 28.5, 29.5, 30, 31, 31.5, 32, 33, 3...",N1243A0XV-J11,nike-performance-downshifter-9-chaussures-de-r...
3,,Vans,,"[{'sku': 'VA216D02W-G11', 'url_key': 'vans-old...","[{'key': 'new', 'value': 'Nouveau', 'tracking_...",False,VA/21/6D/02/WG/11/VA216D02W-G11@7.jpg,False,OLD SKOOL ELASTIC LACE - Mocassins - racing re...,False,False,False,False,39.95,39.95,shoe,"[17, 19, 20, 21, 22, 23.5, 24, 25, 26]",VA216D02W-G11,vans-old-skool-elastic-lace-baskets-basses-rac...
4,,Tommy Hilfiger,,"[{'sku': 'TO124G05S-G11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-25%', 'tra...",False,TO/12/4G/05/SG/11/TO124G05S-G11@6.jpg,True,BOYS BASIC - T-shirt basique - apple red heather,False,True,True,False,14.95,11.2,clothing,"[4a, 5a, 6a, 7a, 8a, 10a, 12a, 14a, 16a]",TO124G05S-G11,tommy-hilfiger-boys-basic-t-shirt-basique-to12...
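
The `path` and `packet_shot` columns come from the first entry of each article's `media` list, so an article without any media would raise an `IndexError`. The sketch below shows one way to guard against that. The sample entry is copied from the response above; the helper name `first_media` and the choice of `(None, None)` as fallback are assumptions.

```python
def first_media(media):
    """Return (path, packet_shot) of the first media entry, or (None, None) if there is none."""
    if not media:
        return (None, None)
    first = media[0]
    return (first.get("path"), first.get("packet_shot"))

sample = [{"path": "NI/11/4D/0E/IK/11/NI114D0EI-K11@10.jpg",
           "role": "DEFAULT",
           "packet_shot": False}]
print(first_media(sample))
print(first_media([]))
```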


#### Collect all the objects for the selected filters. The total number of pages can be found in the same JSON. Use the *sku* column as the index.

Your output should be a Pandas DataFrame of goods. Each cell should contain only text or numbers, except for the family_articles, flags, media and sizes columns, which remain lists.
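
The page loop itself can be sketched independently of the network. In the example below, `fetch_page` is a hypothetical callable that returns the parsed JSON of one page for a given offset (for instance a thin wrapper around `requests.get(...).json()`), and `fake_fetch` is a stand-in for the real API so that the logic can be checked offline.

```python
def collect_articles(fetch_page, per_page=84):
    """Collect the articles of every page.

    fetch_page(offset) is a hypothetical callable returning the parsed
    JSON of the page that starts at the given offset.
    """
    first = fetch_page(0)
    # The first response tells us how many pages exist for this filter
    page_count = first["pagination"]["page_count"]
    articles = list(first["articles"])
    for page in range(1, page_count):
        articles.extend(fetch_page(page * per_page)["articles"])
    return articles

# Stand-in for the real API: 5 fake articles served 2 per page
FAKE = [{"sku": f"SKU-{i}"} for i in range(5)]

def fake_fetch(offset, per_page=2):
    return {
        "pagination": {"page_count": 3, "per_page": per_page},
        "articles": FAKE[offset:offset + per_page],
    }

collected = collect_articles(fake_fetch, per_page=2)
print(len(collected))
```

Reading `page_count` from the first response of the same query keeps the loop bound consistent with the filter being downloaded.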

In [262]:
# Browser-like headers, as in the first request
headers={"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}
base_url="https://www.zalando.fr/api/catalog/articles"
params={"categories":"promo-enfant","limit":84,"offset":0,"sort":"popularity"}

# Get the total number of pages from the first page of this filter
result=requests.get(base_url, params=params, headers=headers).json()
page_count=result['pagination']['page_count']
per_page=result['pagination']['per_page']

# Collect one DataFrame per page and concatenate once at the end
pages=[json_normalize(result['articles'])]
for x in range(1,page_count):
    params['offset']=int(per_page)*x
    result=requests.get(base_url, params=params, headers=headers).json()
    pages.append(json_normalize(result['articles']))
dfTotalArticles=pd.concat(pages, ignore_index=True, sort=True)

print(dfTotalArticles.columns)
dfTotalArticles.columns=[x.lower().strip() for x in dfTotalArticles.columns]
dfTotalArticles.columns=[re.sub(r"[. :+]","_",x) for x in dfTotalArticles.columns]
dfTotalArticles.columns=[re.sub("__","_",x) for x in dfTotalArticles.columns]
print(dfTotalArticles.columns)
dfTotalArticles.head()

Index(['amount', 'brand_name', 'delivery_promises', 'family_articles', 'flags',
       'is_premium', 'media', 'name', 'price.base_price',
       'price.has_different_original_prices', 'price.has_different_prices',
       'price.has_different_promotional_prices',
       'price.has_discount_on_selected_sizes_only', 'price.original',
       'price.promotional', 'product_group', 'sizes', 'sku',
       'tracking_information.impression_beacon',
       'tracking_information.metrigo_impression_urls',
       'tracking_information.source', 'url_key'],
      dtype='object')
Index(['amount', 'brand_name', 'delivery_promises', 'family_articles', 'flags',
       'is_premium', 'media', 'name', 'price_base_price',
       'price_has_different_original_prices', 'price_has_different_prices',
       'price_has_different_promotional_prices',
       'price_has_discount_on_selected_sizes_only', 'price_original',
       'price_promotional', 'product_group', 'sizes', 'sku',
       'tracking_information_impress

Unnamed: 0,amount,brand_name,delivery_promises,family_articles,flags,is_premium,media,name,price_base_price,price_has_different_original_prices,...,price_has_discount_on_selected_sizes_only,price_original,price_promotional,product_group,sizes,sku,tracking_information_impression_beacon,tracking_information_metrigo_impression_urls,tracking_information_source,url_key
0,,Tommy Hilfiger,[],"[{'sku': 'TO113I00A-Q11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-20%', 'tra...",False,[{'path': 'TO/11/3I/00/AQ/11/TO113I00A-Q11@10....,BOOT - Bottines à lacets - black,,False,...,False,"99,95 €","79,95 €",shoe,"[28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 3...",TO113I00A-Q11,https://ccp-et.metrigo.zalan.do/event/sbv?z=95...,[https://ccp-et.metrigo.zalan.do/event/sbv?z=9...,ccp,tommy-hilfiger-boot-bottines-a-lacets-black-to...
1,,Nike Sportswear,[],"[{'sku': 'NI113D08L-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': 'Jusqu’à -20...",False,[{'path': 'NI/11/3D/08/LA/11/NI113D08L-A11@10....,AIR FORCE 1 - Baskets basses - black/wolf grey...,,False,...,False,"79,95 €","63,95 €",shoe,"[35.5, 36, 36.5, 37.5, 38]",NI113D08L-A11,https://ccp-et.metrigo.zalan.do/event/sbv?z=95...,[https://ccp-et.metrigo.zalan.do/event/sbv?z=9...,ccp,nike-sportswear-air-force-1-baskets-basses-bla...
2,,Nike Performance,[],"[{'sku': 'N1243A0XO-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-10%', 'tra...",False,[{'path': 'N1/24/3A/0X/OQ/11/N1243A0XO-Q11@11....,TEAM HUSTLE QUICK 2 - Chaussures de basket - b...,,False,...,False,"34,95 €","31,45 €",shoe,"[27.5, 28, 28.5, 29.5, 30, 31, 32, 33, 33.5, 3...",N1243A0XO-Q11,https://ccp-et.metrigo.zalan.do/event/sbv?z=95...,[https://ccp-et.metrigo.zalan.do/event/sbv?z=9...,ccp,nike-performance-team-hustle-quick-2-chaussure...
3,,LMTD,[],"[{'sku': 'L0423B00N-Q11', 'url_key': 'lmtd-nlf...","[{'key': 'discountRate', 'value': '-25%', 'tra...",False,[{'path': 'L0/42/3B/00/NQ/11/L0423B00N-Q11@1.1...,NLFDONNA - Pantalon classique - black,,False,...,False,"31,95 €","23,95 €",clothing,"[10a, 11a, 12a, 13a, 14a, 15a, 16a]",L0423B00N-Q11,,,,lmtd-nlfdonna-bootcut-pant-pantalon-classique-...
4,,adidas Originals,[],"[{'sku': 'AD116D008-A12', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,[{'path': 'AD/11/6D/00/8A/12/AD116D008-A12@12....,STAN SMITH - Baskets basses - white/bold pink,,False,...,False,"64,95 €","45,45 €",shoe,"[36, 38, 40, 35 1/2, 36 2/3, 37 1/3, 38 2/3]",AD116D008-A12,,,,adidas-originals-stan-smith-baskets-basses-ad1...


In [266]:
dfTotalArticles.count()

amount                                            158
brand_name                                      22016
delivery_promises                               22016
family_articles                                 22016
flags                                           22016
is_premium                                      22016
media                                           22016
name                                            22016
price_base_price                                    1
price_has_different_original_prices             22016
price_has_different_prices                      22016
price_has_different_promotional_prices          22016
price_has_discount_on_selected_sizes_only       22016
price_original                                  22016
price_promotional                               22016
product_group                                   22016
sizes                                           22016
sku                                             22016
tracking_information_impress

In [264]:
def clean_number(number):
    """Keep only the digits, e.g. '109 g' -> '109'."""
    return re.sub(r"\D+", "", str(number))

def clean_text(text):
    """Replace runs of non-word characters with a single space."""
    return re.sub(r"\W+\s*", " ", str(text))

def clean_path(txt):
    """Return the (path, packet_shot) pair of one media entry."""
    return (txt['path'], txt['packet_shot'])

def clean_float(float_v):
    """Extract the leading number as text, e.g. '29,95 €' -> '29.95'."""
    float_v = str(float_v).replace(",", ".")
    return re.match(r"\d+\.?\d*", float_v).group()

dfTotalArticles['amount_c']= dfTotalArticles.amount.apply(clean_number)
dfTotalArticles['brand_name_c']= dfTotalArticles.brand_name.apply(clean_text)
dfTotalArticles['delivery_promises_c']= dfTotalArticles.delivery_promises.apply(clean_text)
# Only the first media entry of each article is used
dfTotalArticles[['path_c','packet_shot_c']]= dfTotalArticles.media.apply(lambda x:clean_path(x[0])).apply(pd.Series)
dfTotalArticles['price_original_c']= dfTotalArticles.price_original.apply(clean_float)
dfTotalArticles['price_promotional_c']= dfTotalArticles.price_promotional.apply(clean_float)

In [270]:
dfOutTotal=dfTotalArticles[['amount_c', 'brand_name_c', 'delivery_promises_c', 'family_articles', 'flags',
       'is_premium', 'path_c','packet_shot_c', 'name', 'price_has_different_original_prices',
       'price_has_different_prices', 'price_has_different_promotional_prices',
       'price_has_discount_on_selected_sizes_only', 'price_original_c',
       'price_promotional_c', 'product_group', 'sizes', 'sku', 'url_key']]
dfOutTotal.columns=['amount', 'brand_name', 'delivery_promises', 'family_articles', 'flags',
       'is_premium', 'path','packet_shot', 'name', 'price_has_different_original_prices',
       'price_has_different_prices', 'price_has_different_promotional_prices',
       'price_has_discount_on_selected_sizes_only', 'price_original',
       'price_promotional', 'product_group', 'sizes', 'sku', 'url_key']
dfOutTotal.head()

Unnamed: 0,amount,brand_name,delivery_promises,family_articles,flags,is_premium,path,packet_shot,name,price_has_different_original_prices,price_has_different_prices,price_has_different_promotional_prices,price_has_discount_on_selected_sizes_only,price_original,price_promotional,product_group,sizes,sku,url_key
0,,Tommy Hilfiger,,"[{'sku': 'TO113I00A-Q11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-20%', 'tra...",False,TO/11/3I/00/AQ/11/TO113I00A-Q11@10.jpg,False,BOOT - Bottines à lacets - black,False,False,False,False,99.95,79.95,shoe,"[28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 3...",TO113I00A-Q11,tommy-hilfiger-boot-bottines-a-lacets-black-to...
1,,Nike Sportswear,,"[{'sku': 'NI113D08L-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': 'Jusqu’à -20...",False,NI/11/3D/08/LA/11/NI113D08L-A11@10.jpg,False,AIR FORCE 1 - Baskets basses - black/wolf grey...,False,True,True,False,79.95,63.95,shoe,"[35.5, 36, 36.5, 37.5, 38]",NI113D08L-A11,nike-sportswear-air-force-1-baskets-basses-bla...
2,,Nike Performance,,"[{'sku': 'N1243A0XO-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-10%', 'tra...",False,N1/24/3A/0X/OQ/11/N1243A0XO-Q11@11.1.jpg,True,TEAM HUSTLE QUICK 2 - Chaussures de basket - b...,False,False,False,False,34.95,31.45,shoe,"[27.5, 28, 28.5, 29.5, 30, 31, 32, 33, 33.5, 3...",N1243A0XO-Q11,nike-performance-team-hustle-quick-2-chaussure...
3,,LMTD,,"[{'sku': 'L0423B00N-Q11', 'url_key': 'lmtd-nlf...","[{'key': 'discountRate', 'value': '-25%', 'tra...",False,L0/42/3B/00/NQ/11/L0423B00N-Q11@1.1.jpg,True,NLFDONNA - Pantalon classique - black,False,False,False,False,31.95,23.95,clothing,"[10a, 11a, 12a, 13a, 14a, 15a, 16a]",L0423B00N-Q11,lmtd-nlfdonna-bootcut-pant-pantalon-classique-...
4,,adidas Originals,,"[{'sku': 'AD116D008-A12', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,AD/11/6D/00/8A/12/AD116D008-A12@12.jpg,True,STAN SMITH - Baskets basses - white/bold pink,False,True,True,False,64.95,45.45,shoe,"[36, 38, 40, 35 1/2, 36 2/3, 37 1/3, 38 2/3]",AD116D008-A12,adidas-originals-stan-smith-baskets-basses-ad1...


In [272]:
dfOutTotal.set_index('sku')

sku,amount,brand_name,delivery_promises,family_articles,flags,is_premium,path,packet_shot,name,price_has_different_original_prices,price_has_different_prices,price_has_different_promotional_prices,price_has_discount_on_selected_sizes_only,price_original,price_promotional,product_group,sizes,url_key
TO113I00A-Q11,,Tommy Hilfiger,,"[{'sku': 'TO113I00A-Q11', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-20%', 'tra...",False,TO/11/3I/00/AQ/11/TO113I00A-Q11@10.jpg,False,BOOT - Bottines à lacets - black,False,False,False,False,99.95,79.95,shoe,"[28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 3...",tommy-hilfiger-boot-bottines-a-lacets-black-to...
NI113D08L-A11,,Nike Sportswear,,"[{'sku': 'NI113D08L-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': 'Jusqu’à -20...",False,NI/11/3D/08/LA/11/NI113D08L-A11@10.jpg,False,AIR FORCE 1 - Baskets basses - black/wolf grey...,False,True,True,False,79.95,63.95,shoe,"[35.5, 36, 36.5, 37.5, 38]",nike-sportswear-air-force-1-baskets-basses-bla...
N1243A0XO-Q11,,Nike Performance,,"[{'sku': 'N1243A0XO-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-10%', 'tra...",False,N1/24/3A/0X/OQ/11/N1243A0XO-Q11@11.1.jpg,True,TEAM HUSTLE QUICK 2 - Chaussures de basket - b...,False,False,False,False,34.95,31.45,shoe,"[27.5, 28, 28.5, 29.5, 30, 31, 32, 33, 33.5, 3...",nike-performance-team-hustle-quick-2-chaussure...
L0423B00N-Q11,,LMTD,,"[{'sku': 'L0423B00N-Q11', 'url_key': 'lmtd-nlf...","[{'key': 'discountRate', 'value': '-25%', 'tra...",False,L0/42/3B/00/NQ/11/L0423B00N-Q11@1.1.jpg,True,NLFDONNA - Pantalon classique - black,False,False,False,False,31.95,23.95,clothing,"[10a, 11a, 12a, 13a, 14a, 15a, 16a]",lmtd-nlfdonna-bootcut-pant-pantalon-classique-...
AD116D008-A12,,adidas Originals,,"[{'sku': 'AD116D008-A12', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,AD/11/6D/00/8A/12/AD116D008-A12@12.jpg,True,STAN SMITH - Baskets basses - white/bold pink,False,True,True,False,64.95,45.45,shoe,"[36, 38, 40, 35 1/2, 36 2/3, 37 1/3, 38 2/3]",adidas-originals-stan-smith-baskets-basses-ad1...
JAN24B004-Q11,,Jack Jones Junior,,"[{'sku': 'JAN24B004-Q11', 'url_key': 'jackandj...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,JA/N2/4B/00/4Q/11/JAN24B004-Q11@8.1.jpg,True,Pantalon cargo - black,False,True,True,False,49.95,34.95,clothing,"[8a, 9a, 10a, 11a, 12a, 13a, 14a, 15a, 16a]",jackandjones-junior-pantalon-cargo-black-jan24...
TO184A02M-T12,,Tommy Hilfiger,,"[{'sku': 'TO184A02M-T12', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-15%', 'tra...",False,TO/18/4A/02/MT/12/TO184A02M-T12@9.jpg,False,TEE 2 PACK - T-shirt basique - multi,False,False,False,False,27.95,23.75,clothing,"[8-10a, 10-12a, 12-14a, 14-16a]",tommy-hilfiger-tee-2-pack-caraco-multi-to184a0...
DI124G049-Q11,,Diesel,,"[{'sku': 'DI124G049-Q11', 'url_key': 'diesel-t...","[{'key': 'discountRate', 'value': '-55%', 'tra...",False,DI/12/4G/04/9Q/11/DI124G049-Q11@7.jpg,False,TOBBY SLIM 2 C-C T-SHIRT - T-shirt à manches l...,False,False,False,False,49.95,22.45,clothing,"[6a, 10a, 12a, 14a, 16a]",diesel-tobby-slim-t-shirt-a-manches-longues-di...
AD116D007-A13,,adidas Originals,,"[{'sku': 'AD116D007-A13', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': 'Jusqu’à -25...",False,AD/11/6D/00/7A/13/AD116D007-A13@2.jpg,True,STAN SMITH - Baskets basses - white/bold pink,False,True,True,False,54.95,41.15,shoe,"[28, 29, 30, 31, 32, 33, 34, 35, 30 1/2, 33 1/2]",adidas-originals-stan-smith-baskets-basses-ad1...
AD116D007-A11,,adidas Originals,,"[{'sku': 'AD116D007-A11', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,AD/11/6D/00/7A/11/AD116D007-A11@12.jpg,True,STAN SMITH - Baskets basses - white,False,True,True,False,54.95,38.45,shoe,"[28, 29, 30, 31, 32, 33, 34, 35, 28 1/2, 30 1/...",adidas-originals-stan-smith-baskets-basses-bla...


#### Display the trending brand in the DataFrame

In [310]:
dfOutTotal.brand_name.value_counts().head(1)
# "Name it" is the brand that appears most often

Name it    1010
Name: brand_name, dtype: int64

#### Display the brand with the largest total discount (sum of the discounts on all goods)

In [309]:
# Our data is still text. Convert prices into numbers.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
dfOutTotal=dfOutTotal.assign(discount=dfOutTotal.price_original.astype(float)-dfOutTotal.price_promotional.astype(float))
dfOutTotal.groupby('brand_name')['discount'].agg('sum').sort_values(ascending=False).head(1)
# Naturino


brand_name
Naturino    10300.6
Name: discount, dtype: float64

#### Display the brands without any discount

In [315]:
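# your code
# Note: this keeps every row with a zero discount, so a brand is listed
# as soon as at least one of its items is sold at full price
dfOutTotal[dfOutTotal['discount']==0].groupby('brand_name')['discount'].agg('sum')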

brand_name
Birkenstock           0.0
Camper                0.0
DC Shoes              0.0
Esprit                0.0
Giesswein             0.0
Helly Hansen          0.0
Jack Jones Junior     0.0
Keen                  0.0
Kickers               0.0
Kids ONLY             0.0
LICO                  0.0
Lacoste               0.0
Lacoste Sport         0.0
Name it               0.0
Next                  0.0
Nike Performance      0.0
Nike Sportswear       0.0
O Neill               0.0
Palladium             0.0
Petrol Industries     0.0
Primigi               0.0
Quiksilver            0.0
Ricosta               0.0
Roxy                  0.0
Superfit              0.0
UGG                   0.0
Viking                0.0
WE Fashion            0.0
adidas Originals      0.0
adidas Performance    0.0
Name: discount, dtype: float64
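
The list above contains every brand that has at least one item with a zero discount, which is why Nike Performance appears even though discounted Nike Performance items are visible in the earlier tables. If the goal is to find brands whose items are never discounted, one possible approach is to sum the discount per brand first and filter afterwards. The sketch below uses toy data with hypothetical brand names `A`, `B` and `C` standing in for `dfOutTotal`.

```python
import pandas as pd

# Toy data with hypothetical brands, standing in for dfOutTotal
df = pd.DataFrame({
    "brand_name": ["A", "A", "B", "B", "C"],
    "discount":   [0.0, 5.0, 0.0, 0.0, 2.5],
})

# Row-level filter: brands with at least one undiscounted item
some_full_price = sorted(df[df["discount"] == 0]["brand_name"].unique())

# Brand-level filter: brands whose total discount is zero
total_discount = df.groupby("brand_name")["discount"].sum()
no_discount_brands = total_discount[total_discount == 0].index.tolist()

print(some_full_price)
print(no_discount_brands)
```

In the toy data, brand `A` passes the row-level filter because one of its items is undiscounted, but only brand `B` has no discount at all.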