### <span style="color:darksalmon;"> Groupby( ): </span>

The `groupby()` method in pandas groups rows that share a value in one or more columns and applies an aggregation function such as `sum()`, `mean()`, or `count()` to each group.

General syntax:
```Python
# `function` is a placeholder for an aggregation such as sum, mean or count
df.groupby("column").function()

# Or, to aggregate only specific columns:

df.groupby("column")[["other_col"]].function()

# You can also group by multiple columns:

df.groupby(["col1", "col2"]).function()
```
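
As a rough illustration of the three forms above, the following sketch runs them on a small made-up DataFrame. The `sales` frame and its column names are assumptions for the example, not part of the course dataset.

```python
import pandas as pd

# Hypothetical toy data, not the course dataset
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "channel": ["web", "store", "web", "web"],
    "units": [2, 3, 1, 4],
    "revenue": [20.0, 30.0, 10.0, 40.0],
})

# Form 1: aggregate every remaining column.
# numeric_only=True skips the text column "channel".
totals_all = sales.groupby("region").sum(numeric_only=True)

# Form 2: aggregate only the selected columns
totals_revenue = sales.groupby("region")[["revenue"]].sum()

# Form 3: group by several columns at once
totals_by_pair = sales.groupby(["region", "channel"])["units"].sum()

print(totals_all)
print(totals_revenue)
print(totals_by_pair)
```

Selecting the columns before aggregating (form 2) is usually the safer habit, because it makes explicit which columns the function is applied to.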
### Explanatory table

| Element                                 | Description                                                                                   |
| --------------------------------------- | --------------------------------------------------------------------------------------------- |
| `groupby("Region")`                     | Groups all rows by region                                                                     |
| `.sum()`                                | Sums the numeric values in each group                                                         |
| `.mean()`                               | Computes the mean of each group                                                               |
| `.count()`                              | Counts the non-null values of each column per group (`.size()` gives the number of rows)      |
| `.agg({"col1": "sum", "col2": "mean"})` | Aggregates with a different function for each column                                          |
| `.reset_index()`                        | Moves the group keys back into regular columns after `groupby()` (useful for display)         |


In [1]:
import pandas as pd
import numpy as np
# Load the CSV file into a DataFrame
df = pd.read_csv('../4.3_Python_fo_Data_Pandas_Manipulación_del_Dato/Online_Sales_modificado.csv')
#df = pd.read_csv('Online_Sales_modificado.csv')
df.head(5)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method,Total_Revenue_recal,Dif_Revenue,Precio con IVA,Ingreso por unidad,Precio con descuento,Unidades futuras
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card,1999.98,0.0,1209.9879,999.99,899.991,4
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal,499.99,0.0,604.9879,499.99,449.991,3
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card,209.97,0.0,84.6879,69.99,62.991,5
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card,63.96,0.0,19.3479,15.99,14.391,6
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal,89.99,0.0,108.8879,89.99,80.991,3


### 1. Run a short preliminary analysis

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Transaction ID        240 non-null    int64  
 1   Date                  240 non-null    object 
 2   Product Category      240 non-null    object 
 3   Product Name          240 non-null    object 
 4   Units Sold            240 non-null    int64  
 5   Unit Price            240 non-null    float64
 6   Total Revenue         240 non-null    float64
 7   Region                240 non-null    object 
 8   Payment Method        240 non-null    object 
 9   Total_Revenue_recal   240 non-null    float64
 10  Dif_Revenue           240 non-null    float64
 11  Precio con IVA        240 non-null    float64
 12  Ingreso por unidad    240 non-null    float64
 13  Precio con descuento  240 non-null    float64
 14  Unidades futuras      240 non-null    int64  
dtypes: float64(7), int64(3), object(5)

In [3]:
df.describe()

Unnamed: 0,Transaction ID,Units Sold,Unit Price,Total Revenue,Total_Revenue_recal,Dif_Revenue,Precio con IVA,Ingreso por unidad,Precio con descuento,Unidades futuras
count,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0
mean,10120.5,2.158333,236.395583,335.699375,335.699375,0.0,286.038656,236.395583,212.756025,4.158333
std,69.42622,1.322454,429.446695,485.804469,485.804469,0.0,519.630501,429.446695,386.502025,1.322454
min,10001.0,1.0,6.5,6.5,6.5,0.0,7.865,6.5,5.85,3.0
25%,10060.75,1.0,29.5,62.965,62.965,0.0,35.695,29.5,26.55,3.0
50%,10120.5,2.0,89.99,179.97,179.97,0.0,108.8879,89.99,80.991,4.0
75%,10180.25,3.0,249.99,399.225,399.225,0.0,302.4879,249.99,224.991,5.0
max,10240.0,10.0,3899.99,3899.99,3899.99,0.0,4718.9879,3899.99,3509.991,12.0


In [4]:
df.describe(include="O")

Unnamed: 0,Date,Product Category,Product Name,Region,Payment Method
count,240,240,240,240,240
unique,240,6,232,3,3
top,2024-01-01,Electronics,Dyson Supersonic Hair Dryer,North America,Credit Card
freq,1,40,2,80,120


### 2. Basic exercise: group by a single column and aggregate a single column. Do it once with `reset_index()` and once without to see the difference

In [5]:
df.head(5)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method,Total_Revenue_recal,Dif_Revenue,Precio con IVA,Ingreso por unidad,Precio con descuento,Unidades futuras
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card,1999.98,0.0,1209.9879,999.99,899.991,4
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal,499.99,0.0,604.9879,499.99,449.991,3
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card,209.97,0.0,84.6879,69.99,62.991,5
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card,63.96,0.0,19.3479,15.99,14.391,6
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal,89.99,0.0,108.8879,89.99,80.991,3


In [6]:
# Basic syntax, with reset_index()
df.groupby("Region")["Total Revenue"].sum().sort_values(ascending=False).reset_index()

Unnamed: 0,Region,Total Revenue
0,North America,36844.34
1,Asia,22455.45
2,Europe,21268.06


In [7]:
# Basic syntax, without reset_index()
df.groupby("Region")["Total Revenue"].sum().sort_values(ascending=False)

Region
North America    36844.34
Asia             22455.45
Europe           21268.06
Name: Total Revenue, dtype: float64

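The two outputs above differ in type as well as in layout. The sketch below, on a made-up `sales` frame (an assumption for the example), shows that without `reset_index()` the result is a Series indexed by the group labels, while with it the result is a DataFrame in which the group key is an ordinary column. It also shows `as_index=False`, a `groupby` parameter that should give the same layout in one step.

```python
import pandas as pd

# Hypothetical toy data
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": [20.0, 10.0, 30.0, 45.0],
})

# Without reset_index(): a Series whose index holds the group labels
as_series = sales.groupby("region")["revenue"].sum()

# With reset_index(): a DataFrame with "region" as an ordinary column
as_frame = as_series.reset_index()

# as_index=False keeps the group key as a column from the start
as_frame_direct = sales.groupby("region", as_index=False)["revenue"].sum()

print(type(as_series).__name__, list(as_series.index))
print(type(as_frame).__name__, list(as_frame.columns))
print(list(as_frame_direct.columns))
```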
### 3. Exercise with one column in the `groupby` and no column selected for the aggregation

In [8]:
df.groupby("Region").count().reset_index()

Unnamed: 0,Region,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Payment Method,Total_Revenue_recal,Dif_Revenue,Precio con IVA,Ingreso por unidad,Precio con descuento,Unidades futuras
0,Asia,80,80,80,80,80,80,80,80,80,80,80,80,80,80
1,Europe,80,80,80,80,80,80,80,80,80,80,80,80,80,80
2,North America,80,80,80,80,80,80,80,80,80,80,80,80,80,80

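Every column shows 80 here because the dataset has no missing values. In general, `count()` reports non-null values per column, whereas `size()` reports rows per group. A minimal sketch with a made-up `orders` frame containing one `NaN` (both are assumptions for the example) makes the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
orders = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe"],
    "revenue": [10.0, np.nan, 5.0],
})

# count(): non-null values per column, so the NaN is excluded
non_null_counts = orders.groupby("region").count()

# size(): number of rows per group, regardless of missing values
row_counts = orders.groupby("region").size()

print(non_null_counts)
print(row_counts)
```

When the goal is simply "how many rows per group", `size()` returns a single Series instead of one repeated count per column.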

### 4. Before grouping by product: review the unique values and their counts as a refresher on `unique()` and `value_counts()`

In [9]:
df["Product Category"].unique()

array(['Electronics', 'Home Appliances', 'Clothing', 'Books',
       'Beauty Products', 'Sports'], dtype=object)

In [10]:
df["Product Name"].value_counts()

Product Name
Dyson Supersonic Hair Dryer                         2
The Girl with the Dragon Tattoo by Stieg Larsson    2
Keurig K-Elite Coffee Maker                         2
The Silent Patient by Alex Michaelides              2
Dune by Frank Herbert                               2
                                                   ..
LG OLED TV                                          1
Uniqlo Ultra Light Down Jacket                      1
Sunday Riley Good Genes                             1
On Running Cloud Shoes                              1
Yeti Rambler 20 oz Tumbler                          1
Name: count, Length: 232, dtype: int64

In [11]:
df["Product Category"].value_counts()

Product Category
Electronics        40
Home Appliances    40
Clothing           40
Books              40
Beauty Products    40
Sports             40
Name: count, dtype: int64

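`value_counts()` can be read as a shortcut for grouping by a column and counting rows. The sketch below compares the two on a made-up `products` frame (an assumption for the example). The main visible difference is ordering: `value_counts()` sorts by frequency, while `groupby(...).size()` sorts by group label.

```python
import pandas as pd

# Hypothetical toy data
products = pd.DataFrame({
    "category": ["Books", "Sports", "Books", "Books", "Sports"],
})

# Sorted by frequency, most common first
by_value_counts = products["category"].value_counts()

# Sorted by group label
by_groupby = products.groupby("category").size()

# The counts agree once both results are put in the same order
same = by_value_counts.sort_index().tolist() == by_groupby.sort_index().tolist()

print(by_value_counts)
print(by_groupby)
print(same)
```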
### 5. Group by Product Category and aggregate several columns at once

In [12]:
df.head(2)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method,Total_Revenue_recal,Dif_Revenue,Precio con IVA,Ingreso por unidad,Precio con descuento,Unidades futuras
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card,1999.98,0.0,1209.9879,999.99,899.991,4
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal,499.99,0.0,604.9879,499.99,449.991,3


In [13]:
df.groupby("Product Category")[["Units Sold","Unit Price", "Total Revenue"]].mean().reset_index()

Unnamed: 0,Product Category,Units Sold,Unit Price,Total Revenue
0,Beauty Products,1.15,61.623,65.5475
1,Books,2.85,16.153,46.54825
2,Clothing,3.625,67.5365,203.22325
3,Electronics,1.65,691.5915,874.56025
4,Home Appliances,1.475,320.1855,466.154
5,Sports,2.2,261.284,358.163


### 6. Group by several columns at once

In [17]:
df.groupby(["Region", "Payment Method"])["Total Revenue"].sum().reset_index()

Unnamed: 0,Region,Payment Method,Total Revenue
0,Asia,Credit Card,14326.52
1,Asia,Debit Card,8128.93
2,Europe,PayPal,21268.06
3,North America,Credit Card,36844.34


In [18]:
df.groupby(["Region", "Payment Method" , "Product Category"])["Total Revenue"].sum().reset_index()

Unnamed: 0,Region,Payment Method,Product Category,Total Revenue
0,Asia,Credit Card,Sports,14326.52
1,Asia,Debit Card,Clothing,8128.93
2,Europe,PayPal,Beauty Products,2621.9
3,Europe,PayPal,Home Appliances,18646.16
4,North America,Credit Card,Books,1861.93
5,North America,Credit Card,Electronics,34982.41

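Note that the results above contain only the combinations that actually occur in the data (for example, Europe appears only with PayPal). Before `reset_index()`, the result is indexed by a MultiIndex built from the group keys. The sketch below uses a made-up `sales` frame (an assumption for the example) to show this, and uses `unstack()` to pivot one key into columns, filling unobserved combinations with 0.

```python
import pandas as pd

# Hypothetical toy data
sales = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe"],
    "payment": ["Credit Card", "Debit Card", "PayPal"],
    "revenue": [100.0, 50.0, 70.0],
})

# Series with a two-level MultiIndex; only the 3 observed pairs appear
grouped = sales.groupby(["region", "payment"])["revenue"].sum()

# Pivot the "payment" level into columns; missing pairs become 0
wide = grouped.unstack(fill_value=0)

print(grouped)
print(wide)
```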

### 7. Group by one column and apply several aggregation functions to a single column

In [19]:
df.groupby("Region")["Units Sold"].agg(["min", "max"]).reset_index()


Unnamed: 0,Region,min,max
0,Asia,1,10
1,Europe,1,3
2,North America,1,4


In [20]:
df.groupby(["Region", "Payment Method" , "Product Category"])["Units Sold"].agg(["min", "max"]).reset_index()


Unnamed: 0,Region,Payment Method,Product Category,min,max
0,Asia,Credit Card,Sports,1,6
1,Asia,Debit Card,Clothing,2,10
2,Europe,PayPal,Beauty Products,1,2
3,Europe,PayPal,Home Appliances,1,3
4,North America,Credit Card,Books,2,4
5,North America,Credit Card,Electronics,1,4

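Passing a list of function names to `agg()` yields one output column per function. As a hedged extension of the cells above, the sketch below (on a made-up `sales` frame, an assumption for the example) shows what happens when the same list is applied to two columns instead of one: the result columns become a two-level MultiIndex of (column, function).

```python
import pandas as pd

# Hypothetical toy data
sales = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Europe"],
    "units": [1, 10, 1, 3],
    "price": [5.0, 7.0, 2.0, 4.0],
})

# One column, several functions: flat columns named after the functions
one_col = sales.groupby("region")["units"].agg(["min", "max"])

# Two columns, several functions: two-level column index
two_cols = sales.groupby("region")[["units", "price"]].agg(["min", "max"])

print(one_col)
print(two_cols)
print(two_cols.loc["Europe", ("price", "max")])
```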

### 8. Group by one column and aggregate several columns, using a different aggregation function for each column

In [21]:
df.groupby("Product Category").agg({"Units Sold": "mean","Total Revenue": "sum"}).reset_index()

Unnamed: 0,Product Category,Units Sold,Total Revenue
0,Beauty Products,1.15,2621.9
1,Books,2.85,1861.93
2,Clothing,3.625,8128.93
3,Electronics,1.65,34982.41
4,Home Appliances,1.475,18646.16
5,Sports,2.2,14326.52


### 9. Use the `rename` method to rename the columns produced by the `groupby`

In [22]:
df.groupby("Product Category").agg({"Units Sold": "mean","Total Revenue": "sum"}).rename(columns={'Units Sold': 'Cantidad_Media','Total Revenue':'Importe_Total'}).reset_index()

Unnamed: 0,Product Category,Cantidad_Media,Importe_Total
0,Beauty Products,1.15,2621.9
1,Books,2.85,1861.93
2,Clothing,3.625,8128.93
3,Electronics,1.65,34982.41
4,Home Appliances,1.475,18646.16
5,Sports,2.2,14326.52

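An alternative to chaining `agg()` and `rename()` is pandas named aggregation, where each keyword argument to `agg()` is the output column name and its value is a `(column, function)` pair. The sketch below is a possible equivalent on a made-up `sales` frame (an assumption for the example), reusing the column names from the cell above.

```python
import pandas as pd

# Hypothetical toy data that mimics the dataset's column names
sales = pd.DataFrame({
    "Product Category": ["Books", "Books", "Sports"],
    "Units Sold": [2, 4, 1],
    "Total Revenue": [30.0, 60.0, 200.0],
})

# Named aggregation: output_name=(input_column, function)
summary = (
    sales.groupby("Product Category")
    .agg(
        Cantidad_Media=("Units Sold", "mean"),
        Importe_Total=("Total Revenue", "sum"),
    )
    .reset_index()
)

print(summary)
```

This keeps the choice of function and the final column name in one place, which can be easier to read than a separate `rename()` step.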