# Sales Behavior Analysis

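You have just taken over as manager of a convenience store. To understand how your customers shop, you plan to build a system that collects and analyzes sales data.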

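The system records each customer's gender and age group, along with every item they purchase, so that this data can be used to analyze sales behavior and customers.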

In [60]:
# Item structure
# p001 is the item ID
items = {
    "p001": {
        "name": "tomato",
        "categories": ["fruit", "vegetable"],
        "price": 32,
        "calories": 123
    },
    "p002": {
        "name": "watermelon",
        "categories": ["fruit"],
        "price": 123,
        "calories": 22        
    }
}


# Order structure
# age: 10 (under 10), 20 (10-20), 30, 40, 50, 60, 99 (appears to be over 60), -1 (age could not be estimated)
# gender: 0 (male), 1 (female)
orders = [
    {
        "time": "2020-03-01 12:00:12",
        "customer": {
            "age": 20, 
            "gender": 0,
        },
        "items": [
            "p001",  # item p001, quantity 1
            "p002",  # item p002, quantity 3 (one entry per unit)
            "p002",
            "p002",
        ]
    },
    {
        "time": "2020-03-01 12:00:12",
        "customer": {
            "age": -1,
            "gender": 0,
        },
        "items": [
            "p001",  # item p001, quantity 1
        ]
    },
]
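Because each purchased unit appears as its own entry in `items`, per-item quantities can be recovered by counting. A minimal sketch, assuming the order structure above; `sample_order` is a hypothetical order used only for illustration:

```python
from collections import Counter

# Hypothetical order following the structure above: one list entry per unit bought.
sample_order = {
    "time": "2020-03-01 12:00:12",
    "customer": {"age": 20, "gender": 0},
    "items": ["p001", "p002", "p002", "p002"],
}

# Counter tallies how often each item ID appears, i.e. the quantity bought.
quantities = Counter(sample_order["items"])
print(dict(quantities))  # {'p001': 1, 'p002': 3}
```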

The script below generates fake data. Be sure to run it.

In [1]:
import string
from datetime import datetime
from random import randint, choice

NUM_ITEMS = 50
NUM_ORDERS = 10000
GENDERS = [0, 1]
AGES = [-1, 99, 10, 20, 30, 40, 50, 60]
CATEGORIES = ['fruit', 'vegetable', 'drink', 'meat', 'bread']
items = { 
    "p{:03d}".format(idx): {
        "name": ''.join([choice(string.ascii_letters) for _ in range(randint(5, 10))]),
        # Draw 1-3 categories through a set so an item never lists the same category twice.
        "categories": sorted({choice(CATEGORIES) for _ in range(randint(1, 3))}),
        "price": randint(10, 500),
        "calories": randint(50, 1000)
    } 
    for idx in range(NUM_ITEMS)
}

now = datetime.now().timestamp()
orders = [
    {
        "time": datetime.fromtimestamp(now - randint(0, 86400 * 30)).strftime('%Y-%m-%d %H:%M:%S'),
        "customer": {
            "age": choice(AGES),
            "gender": choice(GENDERS),
        },
        "items": [choice(list(items.keys())) for _ in range(randint(1, 10))]
    }
    for _ in range(NUM_ORDERS)
]

## Q1 What is the total sales amount across all orders?

In [7]:
items["p001"]["price"]

116

In [15]:
# imperative: accumulate the price of every purchased unit in a loop
total_price = 0
for order in orders:
    for item_id in order["items"]:
        total_price += items[item_id]["price"]

# declarative: sum over a generator of the same prices
total_price_declarative = sum(
    items[item_id]["price"]
    for order in orders
    for item_id in order["items"]
)

print(total_price, total_price_declarative)

14399020 14399020


## Q2 What are the ten best-selling items?
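
One possible approach, sketched under the assumption that "best-selling" means the most units sold rather than the highest revenue. `top_selling` and `sample_orders` are hypothetical names; in the notebook you would pass the generated `orders` instead.

```python
from collections import Counter

def top_selling(orders, n=10):
    """Return the n most purchased item IDs with their unit counts."""
    counts = Counter(item_id for order in orders for item_id in order["items"])
    return counts.most_common(n)

# Tiny hypothetical dataset for illustration only.
sample_orders = [
    {"items": ["p001", "p002", "p002"]},
    {"items": ["p002", "p003"]},
]
print(top_selling(sample_orders, n=1))  # [('p002', 3)]
```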

## Q3 What is the total calorie count of the order with the lowest total calories?
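
A minimal sketch of one way to answer this: compute each order's calorie total, then take the minimum. `min_order_calories` and the sample data are hypothetical; the sample reuses the calorie values from the item structure shown earlier.

```python
def min_order_calories(orders, items):
    """Return the smallest per-order calorie total (every unit counts)."""
    return min(
        sum(items[item_id]["calories"] for item_id in order["items"])
        for order in orders
    )

# Hypothetical data for illustration only.
sample_items = {
    "p001": {"calories": 123},
    "p002": {"calories": 22},
}
sample_orders = [
    {"items": ["p001", "p002", "p002", "p002"]},  # 123 + 3 * 22 = 189
    {"items": ["p001"]},                          # 123
]
print(min_order_calories(sample_orders, sample_items))  # 123
```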

## Q4 What is the favorite item of young customers (ages 10-30)?
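
A possible sketch, under two assumptions that the notebook does not spell out: age code `30` stands for ages 20-30 (by analogy with `20` meaning 10-20), and "favorite" means the most units bought. `YOUNG_AGE_CODES`, `favorite_item_for_ages`, and the sample data are hypothetical.

```python
from collections import Counter

# Assumption: codes 20 and 30 together cover ages 10-30.
YOUNG_AGE_CODES = {20, 30}

def favorite_item_for_ages(orders, age_codes):
    """Return (item_id, units) for the item bought most by the given age codes."""
    counts = Counter(
        item_id
        for order in orders
        if order["customer"]["age"] in age_codes
        for item_id in order["items"]
    )
    return counts.most_common(1)[0] if counts else None

# Hypothetical data for illustration only.
sample_orders = [
    {"customer": {"age": 20, "gender": 0}, "items": ["p001", "p002", "p002"]},
    {"customer": {"age": 30, "gender": 1}, "items": ["p002"]},
    {"customer": {"age": 50, "gender": 1}, "items": ["p001", "p001", "p001"]},
]
print(favorite_item_for_ages(sample_orders, YOUNG_AGE_CODES))  # ('p002', 3)
```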

## Q5 What is the favorite category among male customers?
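
One way this could be approached, assuming "favorite" means the category with the most units bought and that an item with several categories counts once toward each. `favorite_category` and the sample data are hypothetical. Each item's categories go through `set()` so a category listed twice on one item is not double counted.

```python
from collections import Counter

def favorite_category(orders, items, gender):
    """Return (category, units) for the category bought most by one gender."""
    counts = Counter()
    for order in orders:
        if order["customer"]["gender"] != gender:
            continue
        for item_id in order["items"]:
            # set() guards against a category appearing twice on one item.
            counts.update(set(items[item_id]["categories"]))
    return counts.most_common(1)[0] if counts else None

# Hypothetical data for illustration only.
sample_items = {
    "p001": {"categories": ["fruit", "vegetable"]},
    "p002": {"categories": ["fruit"]},
    "p003": {"categories": ["drink"]},
}
sample_orders = [
    {"customer": {"age": 20, "gender": 0}, "items": ["p001", "p002"]},
    {"customer": {"age": 30, "gender": 1}, "items": ["p003", "p003", "p003"]},
    {"customer": {"age": 40, "gender": 0}, "items": ["p003"]},
]
print(favorite_category(sample_orders, sample_items, gender=0))  # ('fruit', 2)
```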


## Q6 What is the average order amount for female customers?
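
A minimal sketch: total each matching order, then divide the sum of those totals by the number of matching orders. `average_order_amount` and the sample data are hypothetical; the prices reuse the values from the item structure shown earlier.

```python
def average_order_amount(orders, items, gender):
    """Return the mean order total for one gender, or None if there are no such orders."""
    totals = [
        sum(items[item_id]["price"] for item_id in order["items"])
        for order in orders
        if order["customer"]["gender"] == gender
    ]
    return sum(totals) / len(totals) if totals else None

# Hypothetical data for illustration only.
sample_items = {"p001": {"price": 32}, "p002": {"price": 123}}
sample_orders = [
    {"customer": {"age": 20, "gender": 1}, "items": ["p001", "p002"]},  # 155
    {"customer": {"age": 30, "gender": 1}, "items": ["p001"]},          # 32
    {"customer": {"age": 40, "gender": 0}, "items": ["p002"]},          # excluded
]
print(average_order_amount(sample_orders, sample_items, gender=1))  # 93.5
```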

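## Q7 Within the drink category, what is the spending power of each age group?
Spending power = total amount spent. Suppose there are three orders:

1. age(20): tomato, milk (100)
2. age(10): milk tea (30)
3. age(10): juice (15), black tea (20)

Tomato is not in the drink category, so it is filtered out.

* age: 20 spending power = 100
* age: 10 spending power = 30 + 15 + 20 = 65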
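
The worked example above can be reproduced with a short sketch. `drink_spending_by_age`, the `d00x` item IDs, and the sample data are hypothetical; the tomato price is taken from the item structure shown earlier, and the drink prices from the example.

```python
from collections import defaultdict

def drink_spending_by_age(orders, items):
    """Sum the price of every drink unit bought, grouped by customer age code."""
    spending = defaultdict(int)
    for order in orders:
        age = order["customer"]["age"]
        for item_id in order["items"]:
            item = items[item_id]
            if "drink" in item["categories"]:  # skip non-drink items
                spending[age] += item["price"]
    return dict(spending)

# Hypothetical data mirroring the three example orders.
sample_items = {
    "p001": {"name": "tomato", "categories": ["fruit", "vegetable"], "price": 32},
    "d001": {"name": "milk", "categories": ["drink"], "price": 100},
    "d002": {"name": "milk tea", "categories": ["drink"], "price": 30},
    "d003": {"name": "juice", "categories": ["drink"], "price": 15},
    "d004": {"name": "black tea", "categories": ["drink"], "price": 20},
}
sample_orders = [
    {"customer": {"age": 20, "gender": 0}, "items": ["p001", "d001"]},
    {"customer": {"age": 10, "gender": 1}, "items": ["d002"]},
    {"customer": {"age": 10, "gender": 0}, "items": ["d003", "d004"]},
]
print(drink_spending_by_age(sample_orders, sample_items))  # {20: 100, 10: 65}
```

In this sketch an age group that bought no drinks is simply absent from the result rather than reported as 0.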

## Q8 Which item is most often purchased together with item p000?
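
A possible sketch, assuming "purchased together" is measured per order: an item counts once for each order that also contains the target, no matter how many units were bought. `most_co_purchased` and `sample_orders` are hypothetical names.

```python
from collections import Counter

def most_co_purchased(orders, target_id):
    """Return (item_id, order_count) for the item sharing the most orders with target_id."""
    co_counts = Counter()
    for order in orders:
        unique_ids = set(order["items"])  # count each item once per order
        if target_id in unique_ids:
            co_counts.update(unique_ids - {target_id})
    return co_counts.most_common(1)[0] if co_counts else None

# Hypothetical data for illustration only.
sample_orders = [
    {"items": ["p000", "p001", "p001"]},
    {"items": ["p000", "p002"]},
    {"items": ["p000", "p002", "p003"]},
    {"items": ["p001", "p003"]},  # no p000, so ignored
]
print(most_co_purchased(sample_orders, "p000"))  # ('p002', 2)
```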

## Q9 What are the male and female purchasing power for each item?
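
One possible sketch, assuming "purchasing power" follows the Q7 definition of spending power, i.e. the total amount spent. `purchasing_power_by_gender` and the sample data are hypothetical; the result maps each item ID to its totals keyed by gender code (0 for male, 1 for female).

```python
def purchasing_power_by_gender(orders, items):
    """Return {item_id: {0: male_total, 1: female_total}} of amounts spent."""
    # Start every item at zero so unsold items still appear in the result.
    power = {item_id: {0: 0, 1: 0} for item_id in items}
    for order in orders:
        gender = order["customer"]["gender"]
        for item_id in order["items"]:
            power[item_id][gender] += items[item_id]["price"]
    return power

# Hypothetical data for illustration only.
sample_items = {"p001": {"price": 32}, "p002": {"price": 123}}
sample_orders = [
    {"customer": {"age": 20, "gender": 0}, "items": ["p001", "p002", "p002"]},
    {"customer": {"age": 30, "gender": 1}, "items": ["p001"]},
]
print(purchasing_power_by_gender(sample_orders, sample_items))
# {'p001': {0: 32, 1: 32}, 'p002': {0: 246, 1: 0}}
```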