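# Understanding IPW-Learner

- Overview
  - Reference: https://github.com/st-tech/zr-obp/tree/master
- Inputs
  - Context: covariates
  - Action: the action that was taken
  - Reward: the observed reward
  - Propensity: propensity score (the probability with which the logged action was chosen)
- Training
  - Fit a classification model on the following:
    - X: Context
    - Y: Action
    - sample_weight: Reward / Propensity
- Prediction
  - Given a Context, the classification model outputs a probability for each action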
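
The training and prediction steps listed above can be sketched as follows. This is a minimal illustration, not the reference implementation from the linked repository: the data is synthetic, the reward rule (action 1 pays off when the covariate is positive, action 0 otherwise) is an assumption made only so that the two actions differ, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical logged data: one covariate, two actions, known logging propensities.
context = rng.uniform(-1.0, 1.0, size=(n, 1))
p_treat = 0.5
action = rng.binomial(1, p_treat, size=n)
propensity = np.where(action == 1, p_treat, 1.0 - p_treat)

# Assumed reward rule: the "right" action is 1 for positive covariates, 0 otherwise.
best_action = (context[:, 0] > 0).astype(int)
reward = np.where(action == best_action, 1.0, 0.2)

# IPW-Learner as outlined above: X = Context, y = Action, weight = Reward / Propensity.
clf = LogisticRegression()
clf.fit(context, action, sample_weight=reward / propensity)

# predict_proba returns one column per action, ordered as in clf.classes_.
proba = clf.predict_proba(np.array([[-0.8], [0.8]]))
print(clf.classes_)
print(proba)
```

Under these assumptions the classifier should put most of its probability on action 0 for the negative covariate and on action 1 for the positive one, because samples with a high reward relative to their propensity count more during fitting.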

In [30]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

### Case 1. Linear effect + logistic regression

- Settings
  - 20,000 users (N = 20000)
  - Train:Test = 5:5
  - Treatment:Control = 90%:10%
  - Age is the only covariate
  - Reward is 100 + age
- Expectation
  - Older users are recommended for Treatment

In [31]:
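N = 20000
np.random.seed(0)
age = np.random.randint(20, 60, N)  # integer ages from 20 to 59
context = age
reward = 100 + age  # reward depends on age only, not on the action
treatment_rate = 0.9
# Actions are assigned at random: 1 = Treatment (90%), 0 = Control (10%)
action = np.random.choice([0, 1], N, p=[1 - treatment_rate, treatment_rate])
# Propensity = probability of the action that was actually logged
prpensity = np.where(action == 1, treatment_rate, (1 - treatment_rate))
sample_weight = reward / prpensity  # IPW weight: Reward / Propensity
# Random 50/50 split: 1 = train, 0 = test
train_flag = np.random.choice([0, 1], N, p=[0.5, 0.5])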
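
As a sanity check on the weights, one could compare the total `Reward / Propensity` weight carried by each action. The sketch below regenerates data under the same settings with its own random generator, so the draws differ from the cell above; the names `weight`, `total_control`, `total_treatment` and `ratio` are hypothetical. Because the reward formula here is the same for both actions, each action is expected to carry roughly the same total weight (0.9 × r / 0.9 = 0.1 × r / 0.1 = r per user in expectation).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
age = rng.integers(20, 60, n)           # integer ages from 20 to 59
reward = 100 + age                      # same formula for both actions, as in the settings
treatment_rate = 0.9
action = rng.choice([0, 1], n, p=[1 - treatment_rate, treatment_rate])
propensity = np.where(action == 1, treatment_rate, 1 - treatment_rate)
weight = reward / propensity            # IPW weight: Reward / Propensity

# Total weight that the classifier would see for each action label.
total_control = weight[action == 0].sum()
total_treatment = weight[action == 1].sum()
ratio = total_treatment / total_control
print(total_control, total_treatment, ratio)
```

A ratio close to 1 means the weighted data gives the classifier little reason to prefer one action over the other; a reward that differs by action would be needed for the weights to favor one of them.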

In [32]:
# Collect everything into a DataFrame
df = pd.DataFrame({
  'age': age, 
  'context': context, 
  'reward': reward, 
  'action': action, 
  'prpensity': prpensity, 
  'sample_weight': sample_weight, 
  'train_flag': train_flag
})
df

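| | age | context | reward | action | prpensity | sample_weight | train_flag |
|---|---|---|---|---|---|---|---|
| 0 | 20 | 20 | 120 | 1 | 0.9 | 133.333333 | 0 |
| 1 | 23 | 23 | 123 | 1 | 0.9 | 136.666667 | 0 |
| 2 | 23 | 23 | 123 | 1 | 0.9 | 136.666667 | 0 |
| 3 | 59 | 59 | 159 | 1 | 0.9 | 176.666667 | 1 |
| 4 | 29 | 29 | 129 | 1 | 0.9 | 143.333333 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 58 | 58 | 158 | 1 | 0.9 | 175.555556 | 1 |
| 19996 | 24 | 24 | 124 | 0 | 0.1 | 1240.000000 | 0 |
| 19997 | 53 | 53 | 153 | 1 | 0.9 | 170.000000 | 0 |
| 19998 | 48 | 48 | 148 | 1 | 0.9 | 164.444444 | 0 |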


In [33]:
# .copy() makes each split an independent DataFrame, so adding the pred column later does not raise SettingWithCopyWarning
train_df = df[df['train_flag'] == 1].copy()
test_df = df[df['train_flag'] == 0].copy()

In [34]:
model = LogisticRegression()
# y is the reward column here, so the classifier treats each distinct reward value as a class
# and predict() returns a reward value (the overview above specifies Action as the label).
model.fit(X=train_df[['context']], y=train_df['reward'], sample_weight=train_df['sample_weight'])
test_df['pred'] = model.predict(test_df[['context']])

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [35]:
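# Intended as the predicted treatment rate per age bracket (20s, 30s, ...).
# As written, this divides the bracket mean of pred by the bracket row count, and pred holds
# predicted reward values (see the fit call above), so the result is not a rate of 0/1 actions.
test_df['pred'].groupby(test_df['age'] // 10 * 10).mean() / test_df['pred'].groupby(test_df['age'] // 10 * 10).count()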

age
20    0.049735
30    0.054803
40    0.056832
50    0.062834
Name: pred, dtype: float64
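
For reference, when `pred` holds a predicted action (0 or 1), the treatment rate per age bracket is simply the group mean. The sketch below uses a small hand-made table; the values and the names `demo`, `bracket` and `treatment_rate_by_bracket` are hypothetical and are not taken from the run above.

```python
import pandas as pd

# Hypothetical predictions: pred is the recommended action (1 = Treatment, 0 = Control).
demo = pd.DataFrame({
    'age':  [21, 25, 34, 38, 42, 47, 53, 58],
    'pred': [0, 0, 0, 1, 1, 1, 1, 1],
})

# Map each age to its bracket: 21 -> 20, 34 -> 30, 42 -> 40, ...
bracket = demo['age'] // 10 * 10

# The mean of a 0/1 column is the share of rows with pred == 1 in each bracket.
treatment_rate_by_bracket = demo.groupby(bracket)['pred'].mean()
print(treatment_rate_by_bracket)
```

No division by the group size is needed, since the mean already normalizes by the number of rows in each bracket.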

### Case 2. Nonlinear effect + logistic regression

- Settings
  - 20,000 users (N = 20000)
  - Train:Test = 5:5
  - Treatment:Control = 90%:10%
  - Age is the only covariate
  - Reward is defined as follows
    - Under 40: 100 + age
    - 40 and over: 180 - age
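- Expectation
  - We would like to recommend Treatment for users around age 40, but logistic regression on age alone is monotone in age and cannot represent this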

In [36]:
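N = 20000
np.random.seed(0)
age = np.random.randint(20, 60, N)  # integer ages from 20 to 59
context = age
# Piecewise reward that rises up to age 40 and falls afterwards; it does not depend on the action
reward = np.where(age < 40, 100 + age, 180 - age)
treatment_rate = 0.9
# Actions are assigned at random: 1 = Treatment (90%), 0 = Control (10%)
action = np.random.choice([0, 1], N, p=[1 - treatment_rate, treatment_rate])
# Propensity = probability of the action that was actually logged
prpensity = np.where(action == 1, treatment_rate, (1 - treatment_rate))
sample_weight = reward / prpensity  # IPW weight: Reward / Propensity
# Random 50/50 split: 1 = train, 0 = test
train_flag = np.random.choice([0, 1], N, p=[0.5, 0.5])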

In [37]:
# Collect everything into a DataFrame
df = pd.DataFrame({
  'age': age, 
  'context': context, 
  'reward': reward, 
  'action': action, 
  'prpensity': prpensity, 
  'sample_weight': sample_weight, 
  'train_flag': train_flag
})
df

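| | age | context | reward | action | prpensity | sample_weight | train_flag |
|---|---|---|---|---|---|---|---|
| 0 | 20 | 20 | 120 | 1 | 0.9 | 133.333333 | 0 |
| 1 | 23 | 23 | 123 | 1 | 0.9 | 136.666667 | 0 |
| 2 | 23 | 23 | 123 | 1 | 0.9 | 136.666667 | 0 |
| 3 | 59 | 59 | 121 | 1 | 0.9 | 134.444444 | 1 |
| 4 | 29 | 29 | 129 | 1 | 0.9 | 143.333333 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 58 | 58 | 122 | 1 | 0.9 | 135.555556 | 1 |
| 19996 | 24 | 24 | 124 | 0 | 0.1 | 1240.000000 | 0 |
| 19997 | 53 | 53 | 127 | 1 | 0.9 | 141.111111 | 0 |
| 19998 | 48 | 48 | 132 | 1 | 0.9 | 146.666667 | 0 |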


In [38]:
# .copy() makes each split an independent DataFrame, so adding the pred column later does not raise SettingWithCopyWarning
train_df = df[df['train_flag'] == 1].copy()
test_df = df[df['train_flag'] == 0].copy()

In [39]:
model = LogisticRegression()
# y is the reward column here, so the classifier treats each distinct reward value as a class
# and predict() returns a reward value (the overview above specifies Action as the label).
model.fit(X=train_df[['context']], y=train_df['reward'], sample_weight=train_df['sample_weight'])
test_df['pred'] = model.predict(test_df[['context']])

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [40]:
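# Intended as the predicted treatment rate per age bracket (20s, 30s, ...).
# As written, this divides the bracket mean of pred by the bracket row count, and pred holds
# predicted reward values (see the fit call above), so the result is not a rate of 0/1 actions.
test_df['pred'].groupby(test_df['age'] // 10 * 10).mean() / test_df['pred'].groupby(test_df['age'] // 10 * 10).count()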

age
20    0.049676
30    0.050854
40    0.050901
50    0.053582
Name: pred, dtype: float64

### Case 3. Nonlinear effect + Random Forest

- Settings
  - 20,000 users (N = 20000)
  - Train:Test = 5:5
  - Treatment:Control = 90%:10%
  - Age is the only covariate
  - Reward is defined as follows
    - Under 40: 100 + age
    - 40 and over: 180 - age
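- Expectation
  - We would like to recommend Treatment for users around age 40, which a model that is linear in age cannot represent; Random Forest is used here as a nonlinear classifier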
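
The difference between the two model families can be illustrated on a toy problem. The sketch below is not part of the experiment: the label (1 for ages 35 to 44, 0 otherwise) is a hypothetical stand-in for "recommend around age 40", and the names `p_lr`, `p_rf`, `lr_monotone` and `rf_peak_age` are made up for this example. It compares how the predicted probability varies with age for logistic regression and for a random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
age = rng.integers(20, 60, 4000).reshape(-1, 1)

# Hypothetical label with a peak in the middle of the age range.
label = ((age[:, 0] >= 35) & (age[:, 0] < 45)).astype(int)

# Evaluate both models on every integer age from 20 to 59.
grid = np.arange(20, 60).reshape(-1, 1)
lr = LogisticRegression(max_iter=1000).fit(age, label)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(age, label)
p_lr = lr.predict_proba(grid)[:, 1]
p_rf = rf.predict_proba(grid)[:, 1]

# With a single feature, logistic regression can only rise or fall with age
# (a small tolerance absorbs floating-point noise).
steps = np.diff(p_lr)
lr_monotone = bool(np.all(steps >= -1e-12) or np.all(steps <= 1e-12))

# The forest is free to assign its highest probability to the middle of the range.
rf_peak_age = int(grid[np.argmax(p_rf), 0])
print(lr_monotone, rf_peak_age)
```

In this toy setup the logistic regression curve stays monotone in age, while the forest places its highest probability inside the 35 to 44 band, which is the kind of pattern this case is meant to probe.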

In [41]:
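N = 20000
np.random.seed(0)
age = np.random.randint(20, 60, N)  # integer ages from 20 to 59
context = age
# Piecewise reward that rises up to age 40 and falls afterwards; it does not depend on the action
reward = np.where(age < 40, 100 + age, 180 - age)
treatment_rate = 0.9
# Actions are assigned at random: 1 = Treatment (90%), 0 = Control (10%)
action = np.random.choice([0, 1], N, p=[1 - treatment_rate, treatment_rate])
# Propensity = probability of the action that was actually logged
prpensity = np.where(action == 1, treatment_rate, (1 - treatment_rate))
sample_weight = reward / prpensity  # IPW weight: Reward / Propensity
# Random 50/50 split: 1 = train, 0 = test
train_flag = np.random.choice([0, 1], N, p=[0.5, 0.5])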

In [42]:
# Collect everything into a DataFrame
df = pd.DataFrame({
  'age': age, 
  'context': context, 
  'reward': reward, 
  'action': action, 
  'prpensity': prpensity, 
  'sample_weight': sample_weight, 
  'train_flag': train_flag
})
df

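| | age | context | reward | action | prpensity | sample_weight | train_flag |
|---|---|---|---|---|---|---|---|
| 0 | 20 | 20 | 120 | 1 | 0.9 | 133.333333 | 0 |
| 1 | 23 | 23 | 123 | 1 | 0.9 | 136.666667 | 0 |
| 2 | 23 | 23 | 123 | 1 | 0.9 | 136.666667 | 0 |
| 3 | 59 | 59 | 121 | 1 | 0.9 | 134.444444 | 1 |
| 4 | 29 | 29 | 129 | 1 | 0.9 | 143.333333 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 58 | 58 | 122 | 1 | 0.9 | 135.555556 | 1 |
| 19996 | 24 | 24 | 124 | 0 | 0.1 | 1240.000000 | 0 |
| 19997 | 53 | 53 | 127 | 1 | 0.9 | 141.111111 | 0 |
| 19998 | 48 | 48 | 132 | 1 | 0.9 | 146.666667 | 0 |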


In [43]:
# .copy() makes each split an independent DataFrame, so adding the pred column later does not raise SettingWithCopyWarning
train_df = df[df['train_flag'] == 1].copy()
test_df = df[df['train_flag'] == 0].copy()

In [44]:
model = RandomForestClassifier()
# y is the reward column here, so the classifier treats each distinct reward value as a class
# and predict() returns a reward value (the overview above specifies Action as the label).
model.fit(X=train_df[['context']], y=train_df['reward'], sample_weight=train_df['sample_weight'])
test_df['pred'] = model.predict(test_df[['context']])


In [45]:
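# Intended as the predicted treatment rate per age bracket (20s, 30s, ...).
# As written, this divides the bracket mean of pred by the bracket row count, and pred holds
# predicted reward values (see the fit call above), so the result is not a rate of 0/1 actions.
test_df['pred'].groupby(test_df['age'] // 10 * 10).mean() / test_df['pred'].groupby(test_df['age'] // 10 * 10).count()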

age
20    0.049693
30    0.054686
40    0.053368
50    0.051247
Name: pred, dtype: float64