## Session 13
----
## Data Exploration and Processing with Pandas
----


## 1. GroupBy and Aggregation

GroupBy lets us group rows by the values in one or more columns and then aggregate (summarize) each group, for example by computing its mean, sum, or row count.

Imagine you have sales data from several cities. `groupby()` is like saying:

*"Group all rows by city, then add up the total sales in each city."*

| Function    | Purpose                                               |
| ----------- | ----------------------------------------------------- |
| `groupby()` | Groups the data by one or more columns                |
| `agg()`     | Applies several aggregation functions at once         |
| `mean()`    | Mean (average)                                        |
| `sum()`     | Total (sum)                                           |
| `count()`   | Counts the non-null values in each group              |


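For simplicity, let's use a small dataset. The column names are kept in Indonesian: Kategori (category), Toko (store), Penjualan (sales), and Transaksi (transactions).

In [22]:
import pandas as pd

# Build a small dataset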
data = {
    'Kategori': ['Elektronik', 'Elektronik', 'Pakaian', 'Pakaian', 'Makanan'],
    'Toko': ['Toko A', 'Toko B', 'Toko A', 'Toko B', 'Toko A'],
    'Penjualan': [1000, 1500, 500, 700, 300],
    'Transaksi': [10, 15, 5, 7, 3]
}

df_mini = pd.DataFrame(data)

df_mini


Unnamed: 0,Kategori,Toko,Penjualan,Transaksi
0,Elektronik,Toko A,1000,10
1,Elektronik,Toko B,1500,15
2,Pakaian,Toko A,500,5
3,Pakaian,Toko B,700,7
4,Makanan,Toko A,300,3


### groupby()

Groups the data by a given column.

In [23]:
# This does not show any result yet; it only builds a GroupBy object keyed on Kategori.
grouped = df_mini.groupby('Kategori')

print(grouped)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001904937F230>


### agg()
Applies several aggregations at once, here a different function for each column.

In [24]:
hasil_agg = df_mini.groupby('Kategori').agg({
    'Penjualan': 'sum',
    'Transaksi': 'mean'
})

hasil_agg

Unnamed: 0_level_0,Penjualan,Transaksi
Kategori,Unnamed: 1_level_1,Unnamed: 2_level_1
Elektronik,2500,12.5
Makanan,300,3.0
Pakaian,1200,6.0
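
As a variation on the dictionary form, `agg()` also accepts keyword arguments (named aggregation), which lets you pick the output column names directly. The sketch below rebuilds the same small dataset so it runs on its own; the output names `total_penjualan`, `rata_transaksi`, and `penjualan_maks` are illustrative choices, not names from the original material.

```python
import pandas as pd

# Same small dataset as above, rebuilt so this cell is self-contained
df_mini = pd.DataFrame({
    'Kategori': ['Elektronik', 'Elektronik', 'Pakaian', 'Pakaian', 'Makanan'],
    'Toko': ['Toko A', 'Toko B', 'Toko A', 'Toko B', 'Toko A'],
    'Penjualan': [1000, 1500, 500, 700, 300],
    'Transaksi': [10, 15, 5, 7, 3],
})

# Named aggregation: new_column=(source_column, function)
ringkasan = df_mini.groupby('Kategori').agg(
    total_penjualan=('Penjualan', 'sum'),    # illustrative output name
    rata_transaksi=('Transaksi', 'mean'),    # illustrative output name
    penjualan_maks=('Penjualan', 'max'),     # illustrative output name
)

print(ringkasan)
```

This form is convenient when the same source column needs more than one aggregation, because each result gets its own flat column name.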


### mean()
Computes the mean of each group.

In [None]:
mean_penjualan = df_mini.groupby('Kategori')['Penjualan'].mean().reset_index()
mean_penjualan.columns = ['Kategori', 'Rata-Rata Penjualan'] # only renames the columns

mean_penjualan

Unnamed: 0,Kategori,Rata-Rata Penjualan
0,Elektronik,1250.0
1,Makanan,300.0
2,Pakaian,600.0


### sum()
Computes the total of the values in each group.

In [29]:
total_penjualan = df_mini.groupby('Kategori')['Penjualan'].sum().reset_index()
total_penjualan.columns = ['Kategori', 'Total Penjualan']

total_penjualan


Unnamed: 0,Kategori,Total Penjualan
0,Elektronik,2500
1,Makanan,300
2,Pakaian,1200


### count()
Counts the rows (non-null values) in each group.

In [33]:
jumlah_data = df_mini.groupby('Kategori')['Penjualan'].count().reset_index()
jumlah_data.columns = ['Kategori', 'Jumlah Baris atau data']

jumlah_data

Unnamed: 0,Kategori,Jumlah Baris atau data
0,Elektronik,2
1,Makanan,1
2,Pakaian,2


## Summary
All of these functions follow the same pattern:

In [None]:
# Structure only; this cell is not meant to be run
df.groupby('Kolom')['Kolom_Ag'].fungsi()

# or
df.groupby('Kolom').agg({...})
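
The same pattern also works with more than one grouping column, which the examples so far do not show. A minimal sketch on the small dataset (rebuilt here so the cell runs on its own); the variable names `per_kategori_toko` and `tabel_datar` are illustrative.

```python
import pandas as pd

df_mini = pd.DataFrame({
    'Kategori': ['Elektronik', 'Elektronik', 'Pakaian', 'Pakaian', 'Makanan'],
    'Toko': ['Toko A', 'Toko B', 'Toko A', 'Toko B', 'Toko A'],
    'Penjualan': [1000, 1500, 500, 700, 300],
    'Transaksi': [10, 15, 5, 7, 3],
})

# Passing a list groups by every (Kategori, Toko) combination that occurs.
# The result is a Series indexed by a two-level MultiIndex.
per_kategori_toko = df_mini.groupby(['Kategori', 'Toko'])['Penjualan'].sum()
print(per_kategori_toko)

# as_index=False keeps the grouping keys as ordinary columns instead.
tabel_datar = df_mini.groupby(['Kategori', 'Toko'], as_index=False)['Penjualan'].sum()
print(tabel_datar)
```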

------------

## 2. Pivot Table and Crosstab

- A pivot table summarizes data along rows and columns.

- A crosstab counts frequencies, or aggregates values, across categories.

- A pivot table works like a PivotTable in Excel: you can see total sales per category in each store as a two-dimensional table.

- A crosstab is like a contingency table (how often each combination of two categorical variables occurs).

### Pivot Table

Function: `pd.pivot_table()`

In [34]:
import pandas as pd

data = {
    'Kategori': ['Elektronik', 'Elektronik', 'Pakaian', 'Pakaian', 'Makanan'],
    'Toko': ['Toko A', 'Toko B', 'Toko A', 'Toko B', 'Toko A'],
    'Penjualan': [1000, 1500, 500, 700, 300],
    'Transaksi': [10, 15, 5, 7, 3]
}

df_mini = pd.DataFrame(data)

df_mini

Unnamed: 0,Kategori,Toko,Penjualan,Transaksi
0,Elektronik,Toko A,1000,10
1,Elektronik,Toko B,1500,15
2,Pakaian,Toko A,500,5
3,Pakaian,Toko B,700,7
4,Makanan,Toko A,300,3


Total Penjualan per Kategori in each Toko:

In [37]:
pivot1 = pd.pivot_table(df_mini, 
                        values='Penjualan', 
                        index='Kategori', 
                        columns='Toko', 
                        aggfunc='sum', 
                        fill_value=0)

pivot1

Toko,Toko A,Toko B
Kategori,Unnamed: 1_level_1,Unnamed: 2_level_1
Elektronik,1000,1500
Makanan,300,0
Pakaian,500,700


### Explanation:
- values='Penjualan': the values to summarize.

- index='Kategori'  : placed on the rows.

- columns='Toko'    : placed on the columns.

- aggfunc='sum'     : aggregate by summing.

- fill_value=0      : combinations with no data are filled with 0.

Average Transaksi for each Toko and Kategori combination:

In [38]:
pivot2 = pd.pivot_table(df_mini, 
                        values='Transaksi', 
                        index='Toko', 
                        columns='Kategori', 
                        aggfunc='mean', 
                        fill_value=0)

pivot2

Kategori,Elektronik,Makanan,Pakaian
Toko,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Toko A,10.0,3.0,5.0
Toko B,15.0,0.0,7.0


### Crosstab

Function: `pd.crosstab()`

Number of rows for each combination of Kategori and Toko:

In [39]:
ctab = pd.crosstab(df_mini['Kategori'], df_mini['Toko'])

ctab
# This is well suited to counting how often each combination of categories occurs.

Toko,Toko A,Toko B
Kategori,Unnamed: 1_level_1,Unnamed: 2_level_1
Elektronik,1,1
Makanan,1,0
Pakaian,1,1


Crosstab with values and an aggregation, for example total Penjualan:

In [40]:
ctab_penjualan = pd.crosstab(df_mini['Kategori'], 
                             df_mini['Toko'], 
                             values=df_mini['Penjualan'], 
                             aggfunc='sum').fillna(0)

ctab_penjualan
# Same numbers as the pivot table; they show as floats because empty cells are NaN before fillna(0).

Toko,Toko A,Toko B
Kategori,Unnamed: 1_level_1,Unnamed: 2_level_1
Elektronik,1000.0,1500.0
Makanan,300.0,0.0
Pakaian,500.0,700.0


| Function      | Purpose                                                       |
| ------------- | ------------------------------------------------------------- |
| `pivot_table` | Summarizes values (sum, mean, count, etc.) as a 2D table      |
| `crosstab`    | Counts frequencies or aggregates values across categories     |
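
To connect this section with the previous one: a pivot table can be thought of as a `groupby()` on two keys followed by `unstack()`, which moves one key into the columns. The sketch below compares the two on the small dataset; it is an illustration of the relationship, not a claim about how pandas implements `pivot_table` internally. The variable name `via_groupby` is illustrative.

```python
import pandas as pd

df_mini = pd.DataFrame({
    'Kategori': ['Elektronik', 'Elektronik', 'Pakaian', 'Pakaian', 'Makanan'],
    'Toko': ['Toko A', 'Toko B', 'Toko A', 'Toko B', 'Toko A'],
    'Penjualan': [1000, 1500, 500, 700, 300],
    'Transaksi': [10, 15, 5, 7, 3],
})

# Pivot table: Kategori on the rows, Toko on the columns
pivot = pd.pivot_table(df_mini, values='Penjualan', index='Kategori',
                       columns='Toko', aggfunc='sum', fill_value=0)

# Same summary built by hand: group on both keys, then move Toko to the columns
via_groupby = (df_mini.groupby(['Kategori', 'Toko'])['Penjualan']
               .sum()
               .unstack(fill_value=0))

print(pivot)
print(via_groupby)
```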


-----------

## 3. Merge and Join Datasets
- Combine two DataFrames based on a shared column.

- Learn the join types: inner, left, right, and outer.

#### A Simple Analogy

Imagine you have:

- Table 1: customer data

- Table 2: sales data

Merging or joining combines two tables that share a key column, such as Customer ID.

In [41]:
import pandas as pd

# Customer data
pelanggan = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Nama': ['Andi', 'Budi', 'Citra']
})

# Order data
pesanan = pd.DataFrame({
    'OrderID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Total': [50000, 75000, 30000, 100000]
})


In [42]:
pelanggan

Unnamed: 0,CustomerID,Nama
0,1,Andi
1,2,Budi
2,3,Citra


In [43]:
pesanan

Unnamed: 0,OrderID,CustomerID,Total
0,101,1,50000
1,102,2,75000
2,103,2,30000
3,104,4,100000


### Inner Join
Keeps only the rows whose key exists in both tables:

In [44]:
inner = pd.merge(pelanggan, pesanan, on='CustomerID', how='inner')

inner

Unnamed: 0,CustomerID,Nama,OrderID,Total
0,1,Andi,101,50000
1,2,Budi,102,75000
2,2,Budi,103,30000


### Left Join
Keeps every row from the left table; the right table's columns are filled in where there is a match:

In [45]:
left = pd.merge(pelanggan, pesanan, on='CustomerID', how='left')

left

Unnamed: 0,CustomerID,Nama,OrderID,Total
0,1,Andi,101.0,50000.0
1,2,Budi,102.0,75000.0
2,2,Budi,103.0,30000.0
3,3,Citra,,


### Right Join
Keeps every row from the right table; the left table's columns are filled in where there is a match:

In [46]:
right = pd.merge(pelanggan, pesanan, on='CustomerID', how='right')

right

Unnamed: 0,CustomerID,Nama,OrderID,Total
0,1,Andi,101,50000
1,2,Budi,102,75000
2,2,Budi,103,30000
3,4,,104,100000


### Outer Join
Keeps every row from both tables, whether matched or not:

In [47]:
outer = pd.merge(pelanggan, pesanan, on='CustomerID', how='outer')

outer

Unnamed: 0,CustomerID,Nama,OrderID,Total
0,1,Andi,101.0,50000.0
1,2,Budi,102.0,75000.0
2,2,Budi,103.0,30000.0
3,3,Citra,,
4,4,,104.0,100000.0


### Join Summary

| Join Type | Rows from Left | Rows from Right | Unmatched Rows                               |
| --------- | -------------- | --------------- | -------------------------------------------- |
| Inner     | ✅ if matched   | ✅ if matched    | ❌ dropped                                    |
| Left      | ✅ all          | ✅ if matched    | left-only rows kept, right columns are NaN   |
| Right     | ✅ if matched   | ✅ all           | right-only rows kept, left columns are NaN   |
| Outer     | ✅ all          | ✅ all           | ✅ all kept, NaN wherever there is no match   |
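
One way to check which side each row came from is the `indicator=True` option of `pd.merge()`, which adds a `_merge` column with the values `both`, `left_only`, or `right_only`. A small sketch using the same two tables (rebuilt here so the cell runs on its own); the variable name `cek` is illustrative.

```python
import pandas as pd

pelanggan = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Nama': ['Andi', 'Budi', 'Citra'],
})
pesanan = pd.DataFrame({
    'OrderID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Total': [50000, 75000, 30000, 100000],
})

# Outer join keeps everything; _merge records where each row was found
cek = pd.merge(pelanggan, pesanan, on='CustomerID', how='outer', indicator=True)
print(cek[['CustomerID', 'Nama', 'OrderID', '_merge']])

# Customers without any order are the rows found only in the left table
tanpa_pesanan = cek[cek['_merge'] == 'left_only']
print(tanpa_pesanan['Nama'].tolist())
```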


### Illustration


The pelanggan (customers) table, on the left:
```diff
CustomerID   Nama
-----------  ------
1            Andi
2            Budi
3            Citra
```

The pesanan (orders) table, on the right:
```diff
OrderID   CustomerID   Total
--------  -----------  ------
101       1            50000
102       2            75000
103       2            30000
104       4            100000
```

### How Joins Work

### 1. Inner Join (how='inner')
Keep only the rows that match in both tables:
```lua
  pelanggan             pesanan
  ----------            ---------
  1  Andi     ─────▶     1  50000
  2  Budi     ─────▶     2  75000
              ─────▶     2  30000

⬇ Result:

CustomerID  Nama   OrderID  Total
----------- ------ -------- -------
1           Andi   101      50000
2           Budi   102      75000
2           Budi   103      30000
```

### 2. Left Join (how='left')
Keep everything from the left table; the right side is added where it matches:
```lua
  pelanggan             pesanan
  ----------            ---------
  1  Andi     ─────▶     1  50000
  2  Budi     ─────▶     2  75000
              ─────▶     2  30000
  3  Citra    ─────▶     (none)

⬇ Result:

CustomerID  Nama   OrderID  Total
----------- ------ -------- -------
1           Andi   101      50000
2           Budi   102      75000
2           Budi   103      30000
3           Citra  NaN      NaN
```

### 3. Right Join (how='right')
Keep everything from the right table; the left side is added where it matches:
```lua
  pelanggan             pesanan
  ----------            ---------
  1  Andi     ◀─────     1  50000
  2  Budi     ◀─────     2  75000
              ◀─────     2  30000
  (none)      ◀─────     4  100000

⬇ Result:

CustomerID  Nama   OrderID  Total
----------- ------ -------- -------
1           Andi   101      50000
2           Budi   102      75000
2           Budi   103      30000
4           NaN    104      100000
```

### 4. Outer Join (how='outer')
Keep everything from both tables and leave NaN where there is no match:
```lua
  pelanggan             pesanan
  ----------            ---------
  1  Andi     ◀─────▶     1  50000
  2  Budi     ◀─────▶     2  75000
              ◀─────▶     2  30000
  3  Citra    ◀─────▶     (none)
  (none)      ◀─────▶     4  100000

⬇ Result:

CustomerID  Nama   OrderID  Total
----------- ------ -------- -------
1           Andi   101      50000
2           Budi   102      75000
2           Budi   103      30000
3           Citra  NaN      NaN
4           NaN    104      100000
```

-----------

## 4. Reshape and Melt
This part covers how to change the shape (structure) of a DataFrame, from wide to long or the other way around.

The goal is to understand the `melt()` and `pivot()` functions.

They convert data from wide format to long format, and back.

### Analogy

Imagine your data looks like an Excel sheet with one column per month:
```lua
| Produk | Jan | Feb | Mar |
```
But you want to turn it into:
```lua
| Produk | Bulan | Penjualan |
```
That transformation is what `melt()` does.

In [48]:
import pandas as pd

data = {
    'Produk': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 180, 210],
    'Mar': [130, 170, 190]
}

df_a = pd.DataFrame(data)

df_a

Unnamed: 0,Produk,Jan,Feb,Mar
0,A,100,120,130
1,B,150,180,170
2,C,200,210,190


### melt()
Converts wide format to long format (one observation per row).

In [49]:
df_melt = pd.melt(df_a, 
                  id_vars='Produk', 
                  value_vars=['Jan', 'Feb', 'Mar'],
                  var_name='Bulan', value_name='Penjualan')

df_melt

Unnamed: 0,Produk,Bulan,Penjualan
0,A,Jan,100
1,B,Jan,150
2,C,Jan,200
3,A,Feb,120
4,B,Feb,180
5,C,Feb,210
6,A,Mar,130
7,B,Mar,170
8,C,Mar,190


To go back to the original shape, use `pivot()`, which converts long format back to wide.

Note that the month columns come back in alphabetical order (Feb, Jan, Mar), not in calendar order:

In [50]:
df_pivot = df_melt.pivot(index='Produk', columns='Bulan', values='Penjualan').reset_index()

df_pivot

Bulan,Produk,Feb,Jan,Mar
0,A,120,100,130
1,B,180,150,170
2,C,210,200,190
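
Because `pivot()` returns the new columns sorted by label, the months above are not in calendar order. One simple way to restore the order is to select the columns explicitly after pivoting. A minimal sketch, rebuilding `df_a` and `df_melt` so it runs on its own; `urutan_bulan` is an illustrative helper list.

```python
import pandas as pd

df_a = pd.DataFrame({
    'Produk': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 180, 210],
    'Mar': [130, 170, 190],
})
df_melt = pd.melt(df_a, id_vars='Produk', value_vars=['Jan', 'Feb', 'Mar'],
                  var_name='Bulan', value_name='Penjualan')

urutan_bulan = ['Jan', 'Feb', 'Mar']  # desired calendar order (illustrative helper)

df_pivot = df_melt.pivot(index='Produk', columns='Bulan', values='Penjualan')
df_pivot = df_pivot[urutan_bulan].reset_index()  # reorder columns, then flatten the index
df_pivot.columns.name = None                     # drop the leftover 'Bulan' label

print(df_pivot)
```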


### stack() and unstack()

`stack()` and `unstack()` move labels between the column axis and the row index, which produces or consumes a MultiIndex. They are typically used when the data has several index levels.

### 1. Make Produk the index
`stack()` moves the column labels into the index, so it helps to have a meaningful row index first. Set the Produk column as the index:

In [51]:
df_b = df_a.set_index('Produk')

df_b

Unnamed: 0_level_0,Jan,Feb,Mar
Produk,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A,100,120,130
B,150,180,170
C,200,210,190


### 2. Use stack() to turn columns into rows

In [52]:
# The result is a Series with a MultiIndex
df_stacked = df_b.stack()

print(df_stacked)

Produk     
A       Jan    100
        Feb    120
        Mar    130
B       Jan    150
        Feb    180
        Mar    170
C       Jan    200
        Feb    210
        Mar    190
dtype: int64


### Explanation:
- The Jan, Feb, and Mar columns have become rows.

- Produk is the first index level (level 0), and the month name (Jan, Feb, Mar) is the second level (level 1).

The result is quite similar to what `melt()` gives.

To turn it into a flat table:

In [53]:
df_stacked = df_b.stack().reset_index()
df_stacked.columns = ['Produk', 'Bulan', 'Penjualan']

df_stacked

Unnamed: 0,Produk,Bulan,Penjualan
0,A,Jan,100
1,A,Feb,120
2,A,Mar,130
3,B,Jan,150
4,B,Feb,180
5,B,Mar,170
6,C,Jan,200
7,C,Feb,210
8,C,Mar,190


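### 3. unstack() to return to the original format

`unstack()` reverses `stack()` only while the month labels are still an index level. The `df_stacked` from the previous cell was flattened with `reset_index()`, so calling `unstack()` on it would produce one long Series rather than the wide table. Unstack the MultiIndex Series instead:

In [54]:
df_unstacked = df_b.stack().unstack()

df_unstacked

Unnamed: 0_level_0,Jan,Feb,Mar
Produk,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A,100,120,130
B,150,180,170
C,200,210,190


This is the same table as `df_b`.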

### Differences from melt:
- `melt()` returns a DataFrame with the columns 'Produk', 'Bulan', and 'Penjualan'.

- `stack()` returns a Series with a MultiIndex.

- Use `stack()` if you are comfortable with multi-level indexes or want follow-up operations such as `groupby()` on a MultiIndex level.
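
As an illustration of the last point, a stacked Series can be grouped by one of its index levels. The sketch below rebuilds `df_b` so it runs on its own and names the two index levels with `rename_axis()`; the level name `Bulan` and the variable names are illustrative choices.

```python
import pandas as pd

df_b = pd.DataFrame({
    'Produk': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 180, 210],
    'Mar': [130, 170, 190],
}).set_index('Produk')

# Stack, then give both index levels a name so they can be referenced
stacked = df_b.stack().rename_axis(['Produk', 'Bulan'])

# Aggregate over one level of the MultiIndex
total_per_produk = stacked.groupby(level='Produk').sum()
rata_per_bulan = stacked.groupby(level='Bulan').mean()

print(total_per_produk)
print(rata_per_bulan)
```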

----------------

## 5. Time Series Data vs Cross-sectional Data

### 1. Time series
Time series data is collected in time order, usually at regular intervals such as daily, weekly, or monthly.

Examples: daily stock prices, monthly sales, hourly temperature.

In [55]:
import pandas as pd

data_ts = pd.DataFrame({
    'Tanggal': pd.date_range(start='2025-01-01', periods=5, freq='D'),
    'Penjualan': [100, 120, 130, 110, 150]
})

data_ts

Unnamed: 0,Tanggal,Penjualan
0,2025-01-01,100
1,2025-01-02,120
2,2025-01-03,130
3,2025-01-04,110
4,2025-01-05,150


```python
'Tanggal': pd.date_range(start='2025-01-01', periods=5, freq='D')
```
This expression creates a column of consecutive dates with the pandas function `pd.date_range()`.

| Parameter            | Meaning                                             |
| -------------------- | --------------------------------------------------- |
| `start='2025-01-01'` | Start date (1 January 2025)                         |
| `periods=5`          | Number of dates to generate (5 rows)                |
| `freq='D'`           | The frequency is daily (`D` = daily)                |


To switch to weekly dates, use `freq='W'`. For monthly (month-end) dates use `freq='M'`; pandas 2.2 and later deprecate `'M'` in favor of `'ME'`.
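
A short sketch of how the `freq` argument changes the generated dates. It uses `'MS'` (month start) for the monthly case, an alias chosen here only because it is spelled the same across pandas versions; the variable names are illustrative.

```python
import pandas as pd

# Daily: consecutive calendar days
harian = pd.date_range(start='2025-01-01', periods=3, freq='D')

# Weekly: 'W' anchors on Sundays, so the first date is the first Sunday on or after the start
mingguan = pd.date_range(start='2025-01-01', periods=3, freq='W')

# Monthly, first day of each month
awal_bulan = pd.date_range(start='2025-01-01', periods=3, freq='MS')

print(harian)
print(mingguan)
print(awal_bulan)
```

Note that the weekly range does not start on the `start` date itself unless that date falls on the anchor day.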

### Cross-sectional
Cross-sectional data covers many entities at a single point in time (it is not time-based).

Example: the revenue of 10 companies in 2025 (the sample below uses three).

In [56]:
data_cs = pd.DataFrame({
    'Perusahaan': ['A', 'B', 'C'],
    'Pendapatan 2025': [50000, 70000, 60000]
})

data_cs

Unnamed: 0,Perusahaan,Pendapatan 2025
0,A,50000
1,B,70000
2,C,60000


### 2. The datetime module and its year, month, and day attributes
Python's standard library includes the `datetime` module for working with dates and times.

In [57]:
from datetime import datetime

# Get the current date and time
sekarang = datetime.now()

print("Tanggal dan Waktu Saat Ini:", sekarang)
print("Tahun:", sekarang.year)
print("Bulan:", sekarang.month)
print("Hari:", sekarang.day)

Tanggal dan Waktu Saat Ini: 2025-08-26 08:55:01.698334
Tahun: 2025
Bulan: 8
Hari: 26


### Date and time functions in Python (datetime)
- today()

- weekday()

- isoweekday()

- isoformat()

### datetime.today()
Returns the current local date and time.

In [58]:
from datetime import datetime

print(datetime.today())

2025-08-26 08:55:52.066048


### weekday()
Returns the day of the week as a number (0 = Monday, 6 = Sunday).

In [60]:
hari_ini = datetime.today()
print(hari_ini.weekday())
# a result of 1 means Tuesday (2025-08-26, the day this cell was run, was a Tuesday)

1


### isoweekday()
Like `weekday()`, but numbering starts at 1 = Monday and ends at 7 = Sunday (the ISO standard).

In [61]:
print(hari_ini.isoweekday())

2


### isoformat()
Converts a datetime object to an ISO 8601 string → YYYY-MM-DDTHH:MM:SS

In [62]:
print(hari_ini.isoformat())

2025-08-26T08:56:29.486650


```lua
2025-07-22T14:30:00
          ↑
          This is the letter T (not part of the clock time)
```
| Part         | Description               | Example                    |
| ------------ | ------------------------- | -------------------------- |
| `2025-07-22` | **Date**                  | 22 July 2025               |
| `T`          | **Separator (delimiter)** | Not part of the time value |
| `14:30:00`   | **Time** (HH:MM:SS)       | 14:30:00                   |

So T stands for "Time": it marks that what follows is the time part.

| Function       | Purpose                                     | Example Output               |
| -------------- | ------------------------------------------- | ---------------------------- |
| `today()`      | Current date and time                       | `2025-07-22 03:12:45.123456` |
| `weekday()`    | Day index from 0 (Monday) to 6 (Sunday)     | `1`                          |
| `isoweekday()` | Day index from 1 (Monday) to 7 (Sunday)     | `2`                          |
| `isoformat()`  | ISO 8601 formatted string                   | `2025-07-22T03:12:45.123456` |
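
Because `today()` and `now()` change every time a cell is run, the methods are easier to verify with a fixed date. The sketch below uses 26 August 2025, a date that appears in the cell outputs above; the list `nama_hari` is an illustrative helper for turning the index into a day name.

```python
from datetime import datetime

# A fixed moment instead of "now", so the results are reproducible
contoh = datetime(2025, 8, 26, 8, 55, 1)

print(contoh.weekday())      # 1  (0 = Monday, so 1 = Tuesday)
print(contoh.isoweekday())   # 2  (1 = Monday, so 2 = Tuesday)
print(contoh.isoformat())    # 2025-08-26T08:55:01

# Illustrative helper: map the weekday() index to a day name
nama_hari = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
print(nama_hari[contoh.weekday()])   # Tuesday
```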


### The time object
Use `datetime.time()` to create a time-of-day object without a date.

In [63]:
from datetime import time

jam = time(14, 30, 0)
print(jam)               # 14:30:00
print(jam.hour)          # 14
print(jam.minute)        # 30
print(jam.second)        # 0

14:30:00
14
30
0


### The datetime object and now()
Combines a date and a time.

In [64]:
from datetime import datetime

sekarang = datetime.now()
print(sekarang)           # e.g. 2025-07-22 14:42:00
print(sekarang.year)      # e.g. 2025
print(sekarang.hour)      # e.g. 14

2025-08-26 08:58:49.685719
2025
8


### Pandas Timestamp
Pandas has its own powerful date-time type, `Timestamp`. It can be created from:

### String

In [65]:
pd.Timestamp('2025-01-01 12:30:00')

Timestamp('2025-01-01 12:30:00')

### Date components

In [66]:
pd.Timestamp(year=2025, month=1, day=1, hour=12)

Timestamp('2025-01-01 12:00:00')

### From a datetime object

In [67]:
dt = datetime(2025, 1, 1)
pd.Timestamp(dt)

Timestamp('2025-01-01 00:00:00')

### From an integer/float timestamp

In [68]:
pd.Timestamp(1672549200, unit='s')  # Unix timestamp

Timestamp('2023-01-01 05:00:00')

### The pd.to_datetime() function
Converts the following into datetime values:

- Strings

- Lists of dates

- DataFrame columns

In [69]:
tanggal = ['2025-01-01', '2025-01-02', '2025-01-03']
pd.to_datetime(tanggal)

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03'], dtype='datetime64[ns]', freq=None)
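
When the strings are not in ISO order, it can help to state the layout explicitly with `format=` and to decide what happens to unparseable values with `errors=`. A minimal sketch with month/day/year strings; the sample values, including the deliberately invalid one, are made up for illustration.

```python
import pandas as pd

# Month/day/year strings plus one value that is not a date (illustrative data)
teks = ['11/8/2016', '6/12/2016', 'bukan tanggal']

# format= states the layout; errors='coerce' turns bad values into NaT instead of raising
hasil = pd.to_datetime(teks, format='%m/%d/%Y', errors='coerce')

print(hasil)
print(hasil.isna())
```

With the default `errors='raise'`, the invalid string would stop the conversion with an exception, which is often the safer choice while exploring a new dataset.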

### Parsing strings into a DatetimeIndex

In [70]:
data = {
    'Tanggal': ['2025-01-01', '2025-01-02', '2025-01-03'],
    'Penjualan': [100, 200, 150]
}
df = pd.DataFrame(data)

# Convert the Tanggal column to datetime
df['Tanggal'] = pd.to_datetime(df['Tanggal'])

# Use it as the index
df = df.set_index('Tanggal')

df

Unnamed: 0_level_0,Penjualan
Tanggal,Unnamed: 1_level_1
2025-01-01,100
2025-01-02,200
2025-01-03,150
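
One reason to put the dates in the index is that rows can then be selected with date strings. A small sketch on the same three-row table (rebuilt here so it runs on its own); the variable names `satu_hari` and `rentang` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Tanggal': ['2025-01-01', '2025-01-02', '2025-01-03'],
    'Penjualan': [100, 200, 150],
})
df['Tanggal'] = pd.to_datetime(df['Tanggal'])
df = df.set_index('Tanggal')

# Select a single day by its date string
satu_hari = df.loc['2025-01-02':'2025-01-02']
print(satu_hari)

# Select a date range; with a DatetimeIndex both endpoints are included
rentang = df.loc['2025-01-02':'2025-01-03']
print(rentang['Penjualan'].sum())
```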


-----------

## Applying It to the Superstore Dataset

In [73]:
import pandas as pd

# Load the CSV file
df = pd.read_csv("Superstore.csv", encoding="ISO-8859-1")

# Show the first 5 rows
df.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2016-138688,6/12/2016,6/16/2016,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714
3,4,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.031
4,5,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,2,0.2,2.5164


## 1. GroupBy and Aggregation
Example use of the functions groupby(), agg(), mean(), sum(), and count()

In [74]:
# Total Sales per Region
sales_per_region = df.groupby("Region")["Sales"].sum().reset_index()
sales_per_region.columns = ["Region", "Total Sales"]

sales_per_region

Unnamed: 0,Region,Total Sales
0,Central,501239.8908
1,East,678781.24
2,South,391721.905
3,West,725457.8245


In [75]:
# Average Profit per Category
avg_profit_per_category = df.groupby("Category")["Profit"].mean().reset_index()
avg_profit_per_category.columns = ["Category", "Average Profit"]

avg_profit_per_category

Unnamed: 0,Category,Average Profit
0,Furniture,8.699327
1,Office Supplies,20.32705
2,Technology,78.752002


In [76]:
# Number of order lines (rows) per Segment; use nunique() instead of count() to count distinct orders
trans_count_per_segment = df.groupby("Segment")["Order ID"].count().reset_index()
trans_count_per_segment.columns = ["Segment", "Transaction Count"]

trans_count_per_segment

Unnamed: 0,Segment,Transaction Count
0,Consumer,5191
1,Corporate,3020
2,Home Office,1783


In [77]:
display(sales_per_region)
display(avg_profit_per_category)
display(trans_count_per_segment)


Unnamed: 0,Region,Total Sales
0,Central,501239.8908
1,East,678781.24
2,South,391721.905
3,West,725457.8245


Unnamed: 0,Category,Average Profit
0,Furniture,8.699327
1,Office Supplies,20.32705
2,Technology,78.752002


Unnamed: 0,Segment,Transaction Count
0,Consumer,5191
1,Corporate,3020
2,Home Office,1783


In [78]:
# Sort by a chosen column
sales_per_region.sort_values("Total Sales", ascending=True)

Unnamed: 0,Region,Total Sales
2,South,391721.905
0,Central,501239.8908
1,East,678781.24
3,West,725457.8245


## 2. Pivot Table and Crosstab
Example use of pivot_table() and pd.crosstab()

In [None]:
# Pivot Table: Total Profit per Region and Segment
pd.pivot_table(df, values="Profit", index="Region", columns="Segment", aggfunc="sum")

Segment,Consumer,Corporate,Home Office
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Central,8564.0481,18703.902,12438.4124
East,41190.9843,23622.5789,26709.2168
South,26913.5728,15215.2232,4620.6343
West,57450.604,34437.4299,16530.415


In [80]:
# Crosstab: frequency of each Segment per Ship Mode
pd.crosstab(df["Segment"], df["Ship Mode"])

Ship Mode,First Class,Same Day,Second Class,Standard Class
Segment,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Consumer,769,317,1020,3085
Corporate,485,114,609,1812
Home Office,284,112,316,1071


## 3. Merge & Join Dataset
Because the Superstore sample is a single file, we first create two subsets and then merge them:

In [81]:
import pandas as pd

# Read the Superstore file
df = pd.read_csv("Superstore.csv", encoding="ISO-8859-1")

# Subsets that simulate two tables
orders = df[["Order ID", "Order Date", "Customer ID"]].drop_duplicates() # left
details = df[["Order ID", "Product Name", "Sales"]] # right

In [82]:
# Check whether any values are missing
print(df.isnull().sum())

Row ID           0
Order ID         0
Order Date       0
Ship Date        0
Ship Mode        0
Customer ID      0
Customer Name    0
Segment          0
Country          0
City             0
State            0
Postal Code      0
Region           0
Product ID       0
Category         0
Sub-Category     0
Product Name     0
Sales            0
Quantity         0
Discount         0
Profit           0
dtype: int64


### Inner Join

In [83]:
inner_join = pd.merge(orders, details, on="Order ID", how="inner")
print("Inner Join:")

inner_join.head()

Inner Join:


Unnamed: 0,Order ID,Order Date,Customer ID,Product Name,Sales
0,CA-2016-152156,11/8/2016,CG-12520,Bush Somerset Collection Bookcase,261.96
1,CA-2016-152156,11/8/2016,CG-12520,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,CA-2016-138688,6/12/2016,DV-13045,Self-Adhesive Address Labels for Typewriters b...,14.62
3,US-2015-108966,10/11/2015,SO-20335,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,US-2015-108966,10/11/2015,SO-20335,Eldon Fold 'N Roll Cart System,22.368


### Left Join

In [84]:
left_join = pd.merge(orders, details, on="Order ID", how="left")
print("Left Join:")

left_join


Left Join:


Unnamed: 0,Order ID,Order Date,Customer ID,Product Name,Sales
0,CA-2016-152156,11/8/2016,CG-12520,Bush Somerset Collection Bookcase,261.9600
1,CA-2016-152156,11/8/2016,CG-12520,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400
2,CA-2016-138688,6/12/2016,DV-13045,Self-Adhesive Address Labels for Typewriters b...,14.6200
3,US-2015-108966,10/11/2015,SO-20335,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,US-2015-108966,10/11/2015,SO-20335,Eldon Fold 'N Roll Cart System,22.3680
...,...,...,...,...,...
9989,CA-2014-110422,1/21/2014,TB-21400,Ultra Door Pull Handle,25.2480
9990,CA-2017-121258,2/26/2017,DB-13060,Tenex B1-RE Series Chair Mats for Low Pile Car...,91.9600
9991,CA-2017-121258,2/26/2017,DB-13060,Aastra 57i VoIP phone,258.5760
9992,CA-2017-121258,2/26/2017,DB-13060,"It's Hot Message Books with Stickers, 2 3/4"" x 5""",29.6000


### Right Join

In [85]:
right_join = pd.merge(orders, details, on="Order ID", how="right")
print("Right Join:")

right_join.head()

Right Join:


Unnamed: 0,Order ID,Order Date,Customer ID,Product Name,Sales
0,CA-2016-152156,11/8/2016,CG-12520,Bush Somerset Collection Bookcase,261.96
1,CA-2016-152156,11/8/2016,CG-12520,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,CA-2016-138688,6/12/2016,DV-13045,Self-Adhesive Address Labels for Typewriters b...,14.62
3,US-2015-108966,10/11/2015,SO-20335,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,US-2015-108966,10/11/2015,SO-20335,Eldon Fold 'N Roll Cart System,22.368


### Outer Join

In [86]:
outer_join = pd.merge(orders, details, on="Order ID", how="outer")
print("Outer Join:")

outer_join.head()

Outer Join:


Unnamed: 0,Order ID,Order Date,Customer ID,Product Name,Sales
0,CA-2014-100006,9/7/2014,DK-13375,AT&T EL51110 DECT,377.97
1,CA-2014-100090,7/8/2014,EB-13705,Hon 2111 Invitation Series Corner Table,502.488
2,CA-2014-100090,7/8/2014,EB-13705,"Wilson Jones Ledger-Size, Piano-Hinge Binder, ...",196.704
3,CA-2014-100293,3/14/2014,NF-18475,Xerox 1887,91.056
4,CA-2014-100328,1/28/2014,JC-15340,"Pressboard Covers with Storage Hooks, 9 1/2"" x...",3.928


In [87]:
# Merge on Order ID (how defaults to 'inner' when it is not given)
merged = pd.merge(orders, details, on="Order ID")

merged.head()

Unnamed: 0,Order ID,Order Date,Customer ID,Product Name,Sales
0,CA-2016-152156,11/8/2016,CG-12520,Bush Somerset Collection Bookcase,261.96
1,CA-2016-152156,11/8/2016,CG-12520,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,CA-2016-138688,6/12/2016,DV-13045,Self-Adhesive Address Labels for Typewriters b...,14.62
3,US-2015-108966,10/11/2015,SO-20335,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,US-2015-108966,10/11/2015,SO-20335,Eldon Fold 'N Roll Cart System,22.368


### The how parameter explained:
- "inner" → only rows that match in both tables

- "left" → all rows from the left table (orders)

- "right" → all rows from the right table (details)

- "outer" → all rows from both tables, combined on Order ID
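
When one table is expected to have a single row per key, as `orders` is after `drop_duplicates()`, the `validate=` argument of `pd.merge()` can check that assumption at merge time. The sketch below uses tiny stand-in tables rather than the Superstore file, so all values in it are made up for illustration.

```python
import pandas as pd

# Stand-in tables (illustrative values, not taken from Superstore.csv)
orders = pd.DataFrame({
    'Order ID': ['A-1', 'A-2'],
    'Customer ID': ['C1', 'C2'],
})
details = pd.DataFrame({
    'Order ID': ['A-1', 'A-1', 'A-2'],
    'Sales': [10.0, 20.0, 5.0],
})

# One order row may match many detail rows, but never the other way around
merged = pd.merge(orders, details, on='Order ID', how='left',
                  validate='one_to_many')
print(merged)

# If the left table accidentally contains a duplicated key, the merge is rejected
orders_dup = pd.concat([orders, orders.iloc[[0]]], ignore_index=True)
ditolak = False
try:
    pd.merge(orders_dup, details, on='Order ID', validate='one_to_many')
except pd.errors.MergeError as exc:
    ditolak = True
    print('Rejected:', exc)
```

Without this check, a duplicated key on the left side would silently multiply the matching rows in the result.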

## 4. Reshape and Melt
Example use of melt(), stack(), and unstack()

In [88]:
# Melt several numeric columns
melted = pd.melt(df, id_vars=["Region", "Category"], value_vars=["Sales", "Profit"])

# Stack and unstack
pivot = df.pivot_table(index="Region", columns="Category", values="Sales", aggfunc="sum")
stacked = pivot.stack()
unstacked = stacked.unstack()

In [89]:
# Result of melting the numeric columns
melted

Unnamed: 0,Region,Category,variable,value
0,South,Furniture,Sales,261.9600
1,South,Furniture,Sales,731.9400
2,West,Office Supplies,Sales,14.6200
3,South,Furniture,Sales,957.5775
4,South,Office Supplies,Sales,22.3680
...,...,...,...,...
19983,South,Furniture,Profit,4.1028
19984,West,Furniture,Profit,15.6332
19985,West,Technology,Profit,19.3932
19986,West,Office Supplies,Profit,13.3200


In [90]:
# Pivot result (Sales by Region and Category):
pivot

Category,Furniture,Office Supplies,Technology
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Central,163797.1638,167026.415,170416.312
East,208291.204,205516.055,264973.981
South,117298.684,125651.313,148771.908
West,252612.7435,220853.249,251991.832


In [91]:
# Columns turned into rows
stacked

Region   Category       
Central  Furniture          163797.1638
         Office Supplies    167026.4150
         Technology         170416.3120
East     Furniture          208291.2040
         Office Supplies    205516.0550
         Technology         264973.9810
South    Furniture          117298.6840
         Office Supplies    125651.3130
         Technology         148771.9080
West     Furniture          252612.7435
         Office Supplies    220853.2490
         Technology         251991.8320
dtype: float64

In [92]:
# Back to columns
unstacked

Category,Furniture,Office Supplies,Technology
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Central,163797.1638,167026.415,170416.312
East,208291.204,205516.055,264973.981
South,117298.684,125651.313,148771.908
West,252612.7435,220853.249,251991.832


In [93]:
# Show all the results together
display(melted.head())
display(pivot)
display(stacked.head())
display(unstacked)

Unnamed: 0,Region,Category,variable,value
0,South,Furniture,Sales,261.96
1,South,Furniture,Sales,731.94
2,West,Office Supplies,Sales,14.62
3,South,Furniture,Sales,957.5775
4,South,Office Supplies,Sales,22.368


Category,Furniture,Office Supplies,Technology
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Central,163797.1638,167026.415,170416.312
East,208291.204,205516.055,264973.981
South,117298.684,125651.313,148771.908
West,252612.7435,220853.249,251991.832


Region   Category       
Central  Furniture          163797.1638
         Office Supplies    167026.4150
         Technology         170416.3120
East     Furniture          208291.2040
         Office Supplies    205516.0550
dtype: float64

Category,Furniture,Office Supplies,Technology
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Central,163797.1638,167026.415,170416.312
East,208291.204,205516.055,264973.981
South,117298.684,125651.313,148771.908
West,252612.7435,220853.249,251991.832
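
The reshaping steps above can be reproduced on a tiny table. This is a minimal sketch with made-up values (the `toy` frame is hypothetical, not taken from Superstore.csv), and it assumes the wide table is built with `pivot_table`:

```python
import pandas as pd

# Toy data (hypothetical values, not taken from Superstore.csv)
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Category": ["Furniture", "Technology", "Furniture", "Technology"],
    "Sales": [100.0, 200.0, 300.0, 400.0],
})

# Long -> wide: one row per Region, one column per Category
toy_pivot = toy.pivot_table(values="Sales", index="Region",
                            columns="Category", aggfunc="sum")

# Wide -> long: the Category columns become the inner index level
toy_stacked = toy_pivot.stack()

# Long -> wide again: the inner index level moves back to the columns
toy_unstacked = toy_stacked.unstack()

print(toy_stacked)
print(toy_unstacked)
```

`stack()` returns a Series with a two-level index (Region, Category), and `unstack()` reverses that step, which is why the two wide tables above look identical.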


## 5. Time Series
Topics: parsing dates, the datetime type, to_datetime(), weekday, etc.

In [95]:
# Convert the Order Date column to datetime
df["Order Date"] = pd.to_datetime(df["Order Date"])

# Extract date components
df["Year"] = df["Order Date"].dt.year
df["Month"] = df["Order Date"].dt.month
df["Day"] = df["Order Date"].dt.day
df["Weekday"] = df["Order Date"].dt.day_name()

# Filter rows by year
df_2017 = df[df["Order Date"].dt.year == 2017]

In [96]:
# Inspect the converted Order Date column and the new date columns
df[["Order Date", "Year", "Month", "Day", "Weekday"]].head()

Unnamed: 0,Order Date,Year,Month,Day,Weekday
0,2016-11-08,2016,11,8,Tuesday
1,2016-11-08,2016,11,8,Tuesday
2,2016-06-12,2016,6,12,Sunday
3,2015-10-11,2015,10,11,Sunday
4,2015-10-11,2015,10,11,Sunday
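
When the raw strings follow a known layout, passing an explicit `format` to `to_datetime()` avoids day/month ambiguity. A minimal sketch, assuming month/day/year strings like the ones shown in the Ship Date column (the `raw` series and the invalid entry are hypothetical):

```python
import pandas as pd

# Hypothetical raw strings in month/day/year form, plus one invalid entry
raw = pd.Series(["4/20/2017", "7/18/2017", "not a date"])

# An explicit format avoids day/month ambiguity; errors="coerce" turns
# unparseable values into NaT instead of raising an exception
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(parsed.dt.day_name())
print(parsed.isna().sum())  # number of values that could not be parsed
```

Counting the `NaT` values after parsing is a quick way to check how many rows failed to convert.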


In [97]:
# Show only the rows from 2017
df_2017.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,Year,Month,Day,Weekday
12,13,CA-2017-114412,2017-04-15,4/20/2017,Standard Class,AA-10480,Andrew Allen,Consumer,United States,Concord,...,Paper,Xerox 1967,15.552,3,0.2,5.4432,2017,4,15,Saturday
23,24,US-2017-156909,2017-07-16,7/18/2017,Second Class,SF-20065,Sandra Flanagan,Consumer,United States,Philadelphia,...,Chairs,"Global Deluxe Stacking Chair, Gray",71.372,2,0.3,-1.0196,2017,7,16,Sunday
34,35,CA-2017-107727,2017-10-19,10/23/2017,Second Class,MA-17560,Matt Abelman,Home Office,United States,Houston,...,Paper,Easy-staple paper,29.472,3,0.2,9.9468,2017,10,19,Thursday
41,42,CA-2017-120999,2017-09-10,9/15/2017,Standard Class,LC-16930,Linda Cazamias,Corporate,United States,Naperville,...,Phones,Panasonic Kx-TS550,147.168,4,0.2,16.5564,2017,9,10,Sunday
43,44,CA-2017-139619,2017-09-19,9/23/2017,Standard Class,ES-14080,Erin Smith,Corporate,United States,Melbourne,...,Storage,"Advantus 10-Drawer Portable Organizer, Chrome ...",95.616,2,0.2,9.5616,2017,9,19,Tuesday


In [98]:
# Number of rows in 2017
len(df_2017)

3312

In [99]:
df_2017.shape

(3312, 25)

### Explanation
- 3312 rows: 3312 records (order lines) fall in 2017.

- 25 columns: each row has 25 attributes (columns), such as Order Date, Sales, Profit, and Category, including the Year, Month, Day, and Weekday columns added above.

- df_2017 is therefore a subset of the original data, restricted to orders placed in 2017.

In [100]:
# Number of rows (order lines) per month in 2017
df_2017["Month"].value_counts().sort_index()

Month
1     155
2     107
3     238
4     203
5     242
6     245
7     226
8     218
9     459
10    298
11    459
12    462
Name: count, dtype: int64

In [102]:
# Same counts via groupby: non-null Order ID values per month
df_2017.groupby("Month")["Order ID"].count()

Month
1     155
2     107
3     238
4     203
5     242
6     245
7     226
8     218
9     459
10    298
11    459
12    462
Name: Order ID, dtype: int64
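
Note that `count()` counts rows, not distinct orders: an order with several products spans several rows. A minimal sketch of the difference, using a hypothetical `lines` frame:

```python
import pandas as pd

# Hypothetical order lines: order A-1 has two products, so it spans two rows
lines = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "B-1"],
    "Order Date": pd.to_datetime(["2017-01-05", "2017-01-05",
                                  "2017-01-20", "2017-02-03"]),
})
lines["Month"] = lines["Order Date"].dt.month

# count() counts rows; nunique() counts distinct Order IDs
rows_per_month = lines.groupby("Month")["Order ID"].count()
orders_per_month = lines.groupby("Month")["Order ID"].nunique()

print(rows_per_month)
print(orders_per_month)
```

If the goal is the number of orders rather than the number of order lines, `nunique()` is the safer choice.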

In [103]:
# Display all results together
display(df[["Order Date", "Year", "Month", "Day", "Weekday"]].head())
display(df_2017.head())
display(df_2017.groupby("Month")["Order ID"].count())

Unnamed: 0,Order Date,Year,Month,Day,Weekday
0,2016-11-08,2016,11,8,Tuesday
1,2016-11-08,2016,11,8,Tuesday
2,2016-06-12,2016,6,12,Sunday
3,2015-10-11,2015,10,11,Sunday
4,2015-10-11,2015,10,11,Sunday


Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,Year,Month,Day,Weekday
12,13,CA-2017-114412,2017-04-15,4/20/2017,Standard Class,AA-10480,Andrew Allen,Consumer,United States,Concord,...,Paper,Xerox 1967,15.552,3,0.2,5.4432,2017,4,15,Saturday
23,24,US-2017-156909,2017-07-16,7/18/2017,Second Class,SF-20065,Sandra Flanagan,Consumer,United States,Philadelphia,...,Chairs,"Global Deluxe Stacking Chair, Gray",71.372,2,0.3,-1.0196,2017,7,16,Sunday
34,35,CA-2017-107727,2017-10-19,10/23/2017,Second Class,MA-17560,Matt Abelman,Home Office,United States,Houston,...,Paper,Easy-staple paper,29.472,3,0.2,9.9468,2017,10,19,Thursday
41,42,CA-2017-120999,2017-09-10,9/15/2017,Standard Class,LC-16930,Linda Cazamias,Corporate,United States,Naperville,...,Phones,Panasonic Kx-TS550,147.168,4,0.2,16.5564,2017,9,10,Sunday
43,44,CA-2017-139619,2017-09-19,9/23/2017,Standard Class,ES-14080,Erin Smith,Corporate,United States,Melbourne,...,Storage,"Advantus 10-Drawer Portable Organizer, Chrome ...",95.616,2,0.2,9.5616,2017,9,19,Tuesday


Month
1     155
2     107
3     238
4     203
5     242
6     245
7     226
8     218
9     459
10    298
11    459
12    462
Name: Order ID, dtype: int64

------------------

## Exercises

### Exercise 1
What is the total sales (Sales) for each category (Category)?

In [104]:
# Write Your Code Here
sales_per_category = df.groupby("Category")["Sales"].sum().reset_index()
sales_per_category.columns = ["Category", "Total Sales"]

sales_per_category


Unnamed: 0,Category,Total Sales
0,Furniture,741999.7953
1,Office Supplies,719047.032
2,Technology,836154.033
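
The sum and the column rename can also be done in one step with named aggregation. A minimal sketch on a hypothetical `toy` frame; the dictionary unpacking is only needed because the new column name contains a space:

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology"],
    "Sales": [100.0, 50.0, 200.0],
})

# Named aggregation: output column name -> (input column, aggregation function)
totals = toy.groupby("Category", as_index=False).agg(
    **{"Total Sales": ("Sales", "sum")}
)

print(totals)
```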


### Exercise 2
Show the average profit per Ship Mode as a pivot table.

In [105]:
# Write Your Code Here
pd.pivot_table(df, values="Profit", index="Ship Mode", aggfunc="mean")

Unnamed: 0_level_0,Profit
Ship Mode,Unnamed: 1_level_1
First Class,31.839948
Same Day,29.266591
Second Class,29.535545
Standard Class,27.49477
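
A pivot table with a single index column and no `columns` argument computes the same numbers as a `groupby()` followed by the aggregation. A minimal sketch on a hypothetical `toy` frame:

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Ship Mode": ["First Class", "First Class", "Same Day"],
    "Profit": [10.0, 20.0, 5.0],
})

via_pivot = pd.pivot_table(toy, values="Profit", index="Ship Mode", aggfunc="mean")
via_groupby = toy.groupby("Ship Mode")[["Profit"]].mean()

print(via_pivot)
print(via_groupby)
```

`pivot_table` becomes the more convenient option once a second grouping variable has to be spread across the columns.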


### Exercise 3
Combine the order date (Order Date) with the product details, joining on Order ID.

In [106]:
# Write Your Code Here
orders = df[["Order ID", "Order Date"]].drop_duplicates()
details = df[["Order ID", "Product Name", "Sales"]]
pd.merge(orders, details, on="Order ID")

Unnamed: 0,Order ID,Order Date,Product Name,Sales
0,CA-2016-152156,2016-11-08,Bush Somerset Collection Bookcase,261.9600
1,CA-2016-152156,2016-11-08,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400
2,CA-2016-138688,2016-06-12,Self-Adhesive Address Labels for Typewriters b...,14.6200
3,US-2015-108966,2015-10-11,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,US-2015-108966,2015-10-11,Eldon Fold 'N Roll Cart System,22.3680
...,...,...,...,...
9989,CA-2014-110422,2014-01-21,Ultra Door Pull Handle,25.2480
9990,CA-2017-121258,2017-02-26,Tenex B1-RE Series Chair Mats for Low Pile Car...,91.9600
9991,CA-2017-121258,2017-02-26,Aastra 57i VoIP phone,258.5760
9992,CA-2017-121258,2017-02-26,"It's Hot Message Books with Stickers, 2 3/4"" x 5""",29.6000
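
The merge above relies on each Order ID appearing only once in the left table. One way to make that assumption explicit is the `validate` argument of `merge()`. A minimal sketch with hypothetical `toy_orders` and `toy_details` frames:

```python
import pandas as pd

# Toy tables (hypothetical values)
toy_orders = pd.DataFrame({
    "Order ID": ["A-1", "A-2"],
    "Order Date": pd.to_datetime(["2017-01-05", "2017-01-20"]),
})
toy_details = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2"],
    "Product Name": ["Chair", "Desk", "Pen"],
})

# validate="one_to_many" raises pandas.errors.MergeError if "Order ID"
# is not unique in the left table
merged = pd.merge(toy_orders, toy_details, on="Order ID", validate="one_to_many")

print(len(merged))
```

Without the check, duplicate keys on the left side would silently multiply the rows in the result.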


### Exercise 4
Use melt() to turn the Sales and Profit columns into rows, keyed by Region.

In [None]:
# Write Your Code Here
pd.melt(df, id_vars="Region", value_vars=["Sales", "Profit"])

Unnamed: 0,Region,variable,value
0,South,Sales,261.9600
1,South,Sales,731.9400
2,West,Sales,14.6200
3,South,Sales,957.5775
4,South,Sales,22.3680
...,...,...,...
19983,South,Profit,4.1028
19984,West,Profit,15.6332
19985,West,Profit,19.3932
19986,West,Profit,13.3200
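
The default output columns of `melt()` are named `variable` and `value`; `var_name` and `value_name` give them more descriptive names. A minimal sketch on a hypothetical `toy` frame (the names `Metric` and `Amount` are chosen only for illustration):

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Region": ["South", "West"],
    "Sales": [100.0, 50.0],
    "Profit": [10.0, 5.0],
})

long_df = pd.melt(toy, id_vars="Region", value_vars=["Sales", "Profit"],
                  var_name="Metric", value_name="Amount")

print(long_df)
```

The result has one row per (Region, Metric) pair, so two value columns over two rows become four rows.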


### Exercise 5
Show the number of orders placed on a Monday.

In [108]:
# Write Your Code Here
df["Order Date"] = pd.to_datetime(df["Order Date"])
df[df["Order Date"].dt.weekday == 0]["Order ID"].nunique()

920
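
In pandas, `dt.weekday` numbers the days from Monday = 0 to Sunday = 6, so filtering on `0` selects Mondays. Comparing against `dt.day_name()` is an equivalent, more readable filter. A minimal sketch with hypothetical dates:

```python
import pandas as pd

# Hypothetical dates: a Monday, a Tuesday, and the following Monday
dates = pd.Series(pd.to_datetime(["2017-04-17", "2017-04-18", "2017-04-24"]))

by_number = dates.dt.weekday == 0            # Monday is 0
by_name = dates.dt.day_name() == "Monday"    # English day names by default

print(by_number.tolist())
print(by_name.tolist())
```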

---------

### Dataset
Use Superstore.csv and split it into two dataframes:
```python
orders = df[["Order ID", "Order Date", "Customer ID", "Region"]].drop_duplicates()
details = df[["Order ID", "Product Name", "Category", "Sales", "Profit"]]
```

In [109]:
# Write Your Code Here
orders = df[["Order ID", "Order Date", "Customer ID", "Region"]].drop_duplicates()
details = df[["Order ID", "Product Name", "Category", "Sales", "Profit"]]

### Exercise 6: Inner Join
Join the orders and details tables with an inner join on Order ID.
Questions:

- How many rows does the joined result have?

- Show the first 5 rows that belong to the "West" region.

In [111]:
# Write Your Code Here
inner = pd.merge(orders, details, on="Order ID", how="inner")
print(inner.shape)  # Size of the joined result (rows, columns)
inner[inner["Region"] == "West"].head()  # First 5 rows in the "West" region

(9994, 8)


Unnamed: 0,Order ID,Order Date,Customer ID,Region,Product Name,Category,Sales,Profit
2,CA-2016-138688,2016-06-12,DV-13045,West,Self-Adhesive Address Labels for Typewriters b...,Office Supplies,14.620,6.8714
5,CA-2014-115812,2014-06-09,BH-11710,West,Eldon Expressions Wood and Plastic Desk Access...,Furniture,48.860,14.1694
6,CA-2014-115812,2014-06-09,BH-11710,West,Newell 322,Office Supplies,7.280,1.9656
7,CA-2014-115812,2014-06-09,BH-11710,West,Mitel 5320 IP Phone VoIP phone,Technology,907.152,90.7152
8,CA-2014-115812,2014-06-09,BH-11710,West,DXL Angle-View Binders with Locking Rings by S...,Office Supplies,18.504,5.7825


### Exercise 7: Left Join
Perform a left join between orders (left) and details (right).
Questions:

- Does this join produce more rows than the inner join?

- How many NaN values are there in the Product Name column?

In [112]:
# Write Your Code Here
left = pd.merge(orders, details, on="Order ID", how="left")
print(left.shape)
print(left["Product Name"].isna().sum())  # Count missing values

(9994, 8)
0


### Exercise 8: Right Join
Perform a right join between orders (left) and details (right).

Questions:

- How many transactions (Order IDs) in the right-join result are missing from the orders table?

In [113]:
# Write Your Code Here
right = pd.merge(orders, details, on="Order ID", how="right")
missing_orders = right.loc[right["Customer ID"].isna(), "Order ID"].nunique()  # Distinct Order IDs with no match in orders
print(missing_orders)

0


### Exercise 9: Full Outer Join
Join the two tables with an outer join on Order ID.
Questions:

- Count the number of unique Order IDs in the outer-join result.

- How many rows have a missing value in the Customer ID column?

In [114]:
# Write your Code Here
outer = pd.merge(orders, details, on="Order ID", how="outer")
unique_order_ids = outer["Order ID"].nunique()
missing_customer_id = outer["Customer ID"].isna().sum()
print("Order ID unik:", unique_order_ids)
print("Missing Customer ID:", missing_customer_id)


Order ID unik: 5009
Missing Customer ID: 0
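
In Exercises 6 to 9 every join returns 9994 rows and no missing values, because orders and details are both cut from the same df, so every Order ID exists on both sides. The join types only differ when some keys are unmatched. A minimal sketch with hypothetical `toy_orders` and `toy_details` frames, using `indicator=True` to label where each row came from:

```python
import pandas as pd

# Toy tables (hypothetical): A-2 has no details, A-3 has no order record
toy_orders = pd.DataFrame({"Order ID": ["A-1", "A-2"],
                           "Customer ID": ["C1", "C2"]})
toy_details = pd.DataFrame({"Order ID": ["A-1", "A-3"],
                            "Product Name": ["Chair", "Pen"]})

# indicator=True adds a "_merge" column: left_only, right_only, or both
toy_outer = pd.merge(toy_orders, toy_details, on="Order ID",
                     how="outer", indicator=True)

print(toy_outer)
print(toy_outer["_merge"].value_counts())
print(toy_outer["Customer ID"].isna().sum())  # rows with no match in toy_orders
```

Here the outer join keeps all three Order IDs, while an inner join would keep only A-1.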
