Predicting total monthly card spending by region and industry for 2020.04, using credit card usage data from 2019.01 to 2020.03

Dataset source: https://dacon.io/competitions/official/235615/overview/

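# Understanding the dataset

REG_YYMM : Date (year and month, YYYYMM)

CARD_SIDO_NM : Card usage region, province/metropolitan city level (based on merchant address)

CARD_CCG_NM : Card usage region, city/county/district level (based on merchant address)

STD_CLSS_NM : Industry name

HOM_SIDO_NM : Residence region, province/metropolitan city level (based on customer home address)

HOM_CCG_NM : Residence region, city/county/district level (based on customer home address)

AGE : Age group

SEX_CTGO_CD : Sex (1: male, 2: female)

FLC : Family life cycle (1: single-person household, 2: household with infants or young children, 3: household with middle or high school children, 4: household with adult children, 5: elderly household)

CSTMR_CNT : Number of customers (persons)

AMT : Amount spent (KRW)

CNT : Number of transactions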

# Data preprocessing

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
import lightgbm as lgb
pd.options.display.float_format = '{:.1f}'.format

In [2]:
data = pd.read_csv('data/201901-202003.csv')
data.head()

Unnamed: 0,REG_YYMM,CARD_SIDO_NM,CARD_CCG_NM,STD_CLSS_NM,HOM_SIDO_NM,HOM_CCG_NM,AGE,SEX_CTGO_CD,FLC,CSTMR_CNT,AMT,CNT
0,201901,강원,강릉시,건강보조식품 소매업,강원,강릉시,20s,1,1,4,311200,4
1,201901,강원,강릉시,건강보조식품 소매업,강원,강릉시,30s,1,2,7,1374500,8
2,201901,강원,강릉시,건강보조식품 소매업,강원,강릉시,30s,2,2,6,818700,6
3,201901,강원,강릉시,건강보조식품 소매업,강원,강릉시,40s,1,3,4,1717000,5
4,201901,강원,강릉시,건강보조식품 소매업,강원,강릉시,40s,1,4,3,1047300,3


In [3]:
# Date handling: split YYYYMM into year and month
def grap_year(data):
    data = str(data)
    return int(data[:4])

def grap_month(data):
    data = str(data)
    return int(data[4:])

In [4]:
data = data.fillna('')
data['year'] = data['REG_YYMM'].apply(lambda x: grap_year(x))
data['month'] = data['REG_YYMM'].apply(lambda x: grap_month(x))
data = data.drop(['REG_YYMM'], axis=1)
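
The year and month can also be split with integer arithmetic instead of string slicing. A minimal sketch, assuming `REG_YYMM` is stored as an integer in `YYYYMM` form; the small frame below is made up for illustration and is not part of the dataset:

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
sample = pd.DataFrame({'REG_YYMM': [201901, 201912, 202003]})

# YYYYMM // 100 gives the year, YYYYMM % 100 gives the month.
sample['year'] = sample['REG_YYMM'] // 100
sample['month'] = sample['REG_YYMM'] % 100

print(sample)
```

This works on the whole column at once, so no per-row `apply` is needed.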

In [5]:
df = data.copy()

columns = ['CARD_SIDO_NM', 'STD_CLSS_NM','HOM_CCG_NM','CARD_CCG_NM', 'HOM_SIDO_NM', 'AGE', 'SEX_CTGO_CD', 'FLC', 'year', 'month']
df = df.groupby(columns).sum().reset_index(drop=False)

In [6]:
group = df.groupby(['CARD_SIDO_NM','STD_CLSS_NM']).sum()
df1 = df.set_index(['CARD_SIDO_NM','STD_CLSS_NM'])

# EDA

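- Card spending differs widely from one industry to another
- Total card spending did not change in the wake of COVID-19
- Spending per industry, however, did shift in the wake of COVID-19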
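
One way to check these observations is to compare the same month across the two years, both in total and per industry. The sketch below uses a small made-up frame with the same `year`, `month`, `STD_CLSS_NM`, and `AMT` columns; the industry labels `A` and `B` and the amounts are placeholders, not figures from the dataset.

```python
import pandas as pd

# Hypothetical data: two industries, the same month in 2019 and 2020.
toy = pd.DataFrame({
    'year':        [2019, 2019, 2020, 2020],
    'month':       [3, 3, 3, 3],
    'STD_CLSS_NM': ['A', 'B', 'A', 'B'],
    'AMT':         [100, 200, 150, 150],
})

# Spending per industry, one column per year.
by_industry = toy.pivot_table(index='STD_CLSS_NM', columns='year',
                              values='AMT', aggfunc='sum')
# Year-over-year ratio per industry: above 1 means spending grew.
by_industry['ratio'] = by_industry[2020] / by_industry[2019]

# Total spending per year, to compare with the per-industry view.
total = toy.groupby('year')['AMT'].sum()

print(by_industry)
print(total)
```

In this toy case the yearly totals match while the per-industry ratios move in opposite directions, which is the pattern the bullets describe.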

In [7]:
df_q = df.groupby(['year','month','STD_CLSS_NM']).agg({'AMT':'sum'})
df_q = df_q.loc[2019]
df_q

month,STD_CLSS_NM,AMT
1,건강보조식품 소매업,8605074944
1,골프장 운영업,11968748603
1,과실 및 채소 소매업,44453112689
1,관광 민예품 및 선물용품 소매업,955750428
1,그외 기타 분류안된 오락관련 서비스업,1017500
...,...,...
12,피자 햄버거 샌드위치 및 유사 음식점업,75293316726
12,한식 음식점업,1024610072785
12,호텔업,21380718943
12,화장품 및 방향제 소매업,40400412420


In [8]:
df_q = df.groupby(['year','month','STD_CLSS_NM']).agg({'AMT':'sum'})
df_q = df_q.loc[2020]
df_q

month,STD_CLSS_NM,AMT
1,건강보조식품 소매업,10380995655
1,골프장 운영업,13414089759
1,과실 및 채소 소매업,55612798228
1,관광 민예품 및 선물용품 소매업,981629002
1,그외 기타 분류안된 오락관련 서비스업,1390350
...,...,...
3,피자 햄버거 샌드위치 및 유사 음식점업,65972238656
3,한식 음식점업,666573459086
3,호텔업,5073633041
3,화장품 및 방향제 소매업,31002789304


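# Modeling

- Models are built in a for loop, drawing on the preprocessing and EDA above
- The data is split by region and industry, and a separate model is trained for each pair
- Finally, the prediction dataset needed for evaluation is generated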
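
The prediction step needs one input row for every combination of feature values in the target month. A minimal sketch of building such a grid with `itertools.product`; the value lists below are placeholders, not the encoded values from the real data.

```python
from itertools import product

# Placeholder feature values (the real ones would come from the encoded training frame).
features = {
    'HOM_SIDO_NM': [0, 1],
    'AGE':         [0, 1, 2],
    'SEX_CTGO_CD': [1, 2],
    'year':        [2020],
    'month':       [4],
}

# Cartesian product: one row per combination, keyed in the same order as `features`.
grid = [dict(zip(features, combo)) for combo in product(*features.values())]

print(len(grid))   # 2 * 3 * 2 * 1 * 1 combinations
print(grid[0])
```

A full Cartesian grid may include combinations that never appear in the training data, so sums over its predictions can cover segments that do not actually exist.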

In [None]:
predict = pd.DataFrame()
# df = df.drop(['HOM_CCG_NM', 'CARD_CCG_NM'], axis=1)
for i,j in group.index:
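    df = df1.loc[i,j] # e.g. the first pair is 강원 / 건강보조식품 소매업

    columns = ['CARD_SIDO_NM', 'STD_CLSS_NM', 'HOM_SIDO_NM', 'AGE', 'SEX_CTGO_CD', 'FLC', 'year', 'month']
    # Sum only the numeric measures; the city/district (CCG) columns drop out here
    df = df.groupby(columns)[['CSTMR_CNT', 'AMT', 'CNT']].sum().reset_index(drop=False)

    # Label encoding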
    df_re = df.copy()
    columns = ['CARD_SIDO_NM', 'STD_CLSS_NM', 'HOM_SIDO_NM','AGE']
    for r in columns:
      encoder = LabelEncoder()
      encoded = encoder.fit(df[r])
      df_re[r] = encoded.transform(df[r])

  
    # Set features and target (shuffle rows, then model log1p of AMT)
    train_num = df_re.sample(frac=1, random_state=0)
    x = train_num.drop(['CSTMR_CNT', 'AMT', 'CNT'], axis=1)
    y = np.log1p(train_num['AMT'])

    try:
      k = int(len(x)*0.9)

      x_train = x[:k]
      y_train = y[:k]
      x_val = x[k:]
      y_val = y[k:]

      train_ds = lgb.Dataset(x_train, label=y_train)
      val_ds = lgb.Dataset(x_val, label=y_val)

      params = {'learning_rate' : 0.05,
                  'boosting_type': 'gbdt',
                  'objective': 'tweedie',
                  'tweedie_variance_power': 1.1,
                  'metric': 'rmse',
                  'sub_row' : 0.75,
                  'lambda_l2' : 0.1
                  }

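      # Early stopping and logging go through callbacks; the verbose_eval and
      # early_stopping_rounds keyword arguments were removed in LightGBM 4.0
      model = lgb.train(params,
                          train_ds,
                          num_boost_round=1000,
                          valid_sets=[val_ds],
                          callbacks=[lgb.early_stopping(stopping_rounds=100),
                                     lgb.log_evaluation(period=100)]
                          )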
    
      # Build the prediction template: every feature combination for 2020.04
      CARD_SIDO_NMs = df_re['CARD_SIDO_NM'].unique()
      STD_CLSS_NMs  = df_re['STD_CLSS_NM'].unique()
      HOM_SIDO_NMs  = df_re['HOM_SIDO_NM'].unique()
      AGEs          = df_re['AGE'].unique()
      SEX_CTGO_CDs  = df_re['SEX_CTGO_CD'].unique()
      FLCs          = df_re['FLC'].unique()
      years         = [2020]
      months        = [4]

      temp = []
      for CARD_SIDO_NM in CARD_SIDO_NMs:
        for STD_CLSS_NM in STD_CLSS_NMs:
          for HOM_SIDO_NM in HOM_SIDO_NMs:
            for AGE in AGEs:
              for SEX_CTGO_CD in SEX_CTGO_CDs:
                for FLC in FLCs:
                  for year in years:
                    for month in months:
                      temp.append([CARD_SIDO_NM, STD_CLSS_NM, HOM_SIDO_NM, AGE, SEX_CTGO_CD, FLC, year, month])

      temp = np.array(temp)
      temp = pd.DataFrame(data=temp, columns=x.columns)

        
      pred = model.predict(temp)
      pred = np.expm1(pred)

      temp['AMT'] = np.round(pred, 0)

      temp['REG_YYMM'] = temp['year']*100 + temp['month']
      temp = temp[['REG_YYMM', 'AMT']]

      temp = temp.groupby('REG_YYMM').sum().reset_index(drop=False)

      temp['CARD_SIDO_NM'] = i
      temp['STD_CLSS_NM'] = j
      predict = pd.concat([predict, temp])  # DataFrame.append was removed in pandas 2.0
      print(i,j,"done")

    except Exception:
      # Fall back to a prediction of 0 when this pair cannot be modeled
      temp = pd.DataFrame()
      temp['REG_YYMM']=[202004]
      temp['CARD_SIDO_NM'] = i
      temp['STD_CLSS_NM'] = j
      temp['AMT']=0
      predict = pd.concat([predict, temp])
      print(i,j,"done")

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 40
[LightGBM] [Info] Number of data points in the train set: 295, number of used features: 6
[LightGBM] [Info] Start training from score 2.631352
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.9043
[200]	valid_0's rmse: 0.853412


[300]	valid_0's rmse: 0.828976
[400]	valid_0's rmse: 0.818451
[500]	valid_0's rmse: 0.799582


[600]	valid_0's rmse: 0.790553
[700]	valid_0's rmse: 0.771598
[800]	valid_0's rmse: 0.760088


[900]	valid_0's rmse: 0.759218
Early stopping, best iteration is:
[879]	valid_0's rmse: 0.7579
강원 건강보조식품 소매업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 1584, number of used features: 6
[LightGBM] [Info] Start training from score 2.685066
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 1.12057
[200]	valid_0's rmse: 1.03768
[300]	valid_0's rmse: 0.980797
[400]	valid_0's rmse: 0.94781


[500]	valid_0's rmse: 0.928167
[600]	valid_0's rmse: 0.920333
[700]	valid_0's rmse: 0.90921
[800]	valid_0's rmse: 0.903542
[900]	valid_0's rmse: 0.904429
Early stopping, best iteration is:
[804]	valid_0's rmse: 0.903029
강원 골프장 운영업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 1566, number of used features: 6
[LightGBM] [Info] Start training from score 2.628121
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.958098
[200]	valid_0's rmse: 0.816517
[300]	valid_0's rmse: 0.7712
[400]	valid_0's rmse: 0.741803
[500]	valid_0's rmse: 0.736579
Early stopping, best iteration is:
[437]	valid_0's rmse: 0.73541
강원 과실 및 채소 소매업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 836, number of use

강원 그외 기타 분류안된 오락관련 서비스업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 28
[LightGBM] [Info] Number of data points in the train set: 121, number of used features: 4
[LightGBM] [Info] Start training from score 2.563619
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.592977


[200]	valid_0's rmse: 0.583402
Early stopping, best iteration is:
[117]	valid_0's rmse: 0.581873
강원 그외 기타 스포츠시설 운영업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 40
[LightGBM] [Info] Number of data points in the train set: 515, number of used features: 6
[LightGBM] [Info] Start training from score 2.650424
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.654352
[200]	valid_0's rmse: 0.641439


[300]	valid_0's rmse: 0.627198
[400]	valid_0's rmse: 0.618227


[500]	valid_0's rmse: 0.612418
Early stopping, best iteration is:
[463]	valid_0's rmse: 0.610074
강원 그외 기타 종합 소매업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 2523, number of used features: 6
[LightGBM] [Info] Start training from score 2.682455
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.935012
[200]	valid_0's rmse: 0.781044
[300]	valid_0's rmse: 0.6772
[400]	valid_0's rmse: 0.622216
[500]	valid_0's rmse: 0.599213
[600]	valid_0's rmse: 0.582179
[700]	valid_0's rmse: 0.571813
[800]	valid_0's rmse: 0.566116
[900]	valid_0's rmse: 0.563478
Early stopping, best iteration is:
[898]	valid_0's rmse: 0.563011
강원 기타 대형 종합 소매업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 36
[LightGBM] [Info] Number of data points in the train set: 167, numbe

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 1265, number of used features: 6
[LightGBM] [Info] Start training from score 2.622075
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.825977
[200]	valid_0's rmse: 0.778821
[300]	valid_0's rmse: 0.778808
Early stopping, best iteration is:
[238]	valid_0's rmse: 0.775179
강원 기타 외국식 음식점업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 36
[LightGBM] [Info] Number of data points in the train set: 262, number of used features: 6
[LightGBM] [Info] Start training from score 2.719545
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.961934
[200]	valid_0's rmse: 0.927375
[300]	valid_0's rmse: 0.922957


Early stopping, best iteration is:
[273]	valid_0's rmse: 0.920978
강원 기타 주점업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 1728, number of used features: 6
[LightGBM] [Info] Start training from score 2.668807
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.901938
[200]	valid_0's rmse: 0.799823
[300]	valid_0's rmse: 0.777436
[400]	valid_0's rmse: 0.771869
[500]	valid_0's rmse: 0.770958
Early stopping, best iteration is:
[410]	valid_0's rmse: 0.770499
강원 기타음식료품위주종합소매업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 33
[LightGBM] [Info] Number of data points in the train set: 185, number of used features: 5
[LightGBM] [Info] Start training from score 2.455962
Training until validation scores don't improve for 100 rounds


[100]	valid_0's rmse: 0.849228
Early stopping, best iteration is:
[24]	valid_0's rmse: 0.812003
강원 내항 여객 운송업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 31
[LightGBM] [Info] Number of data points in the train set: 154, number of used features: 5
[LightGBM] [Info] Start training from score 2.644518
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.898494
[200]	valid_0's rmse: 0.877422
[300]	valid_0's rmse: 0.869952


[400]	valid_0's rmse: 0.865502
[500]	valid_0's rmse: 0.865852
Early stopping, best iteration is:
[437]	valid_0's rmse: 0.863308
강원 마사지업 done


[LightGBM] [Info] Total Bins 0
[LightGBM] [Info] Number of data points in the train set: 9, number of used features: 0
[LightGBM] [Info] Start training from score 2.530657
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 145.21
Early stopping, best iteration is:
[1]	valid_0's rmse: 145.21
강원 면세점 done


You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 42
[LightGBM] [Info] Number of data points in the train set: 278, number of used features: 5
[LightGBM] [Info] Start training from score 2.450569
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 1.13463
Early stopping, best iteration is:
[44]	valid_0's rmse: 1.11132
강원 버스 운송업 done


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 2409, number of used features: 6
[LightGBM] [Info] Start training from score 2.576690
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.875977
[200]	valid_0's rmse: 0.739686
[300]	valid_0's rmse: 0.687097
[400]	valid_0's rmse: 0.665797
[500]	valid_0's rmse: 0.648697
[600]	valid_0's rmse: 0.646627
[700]	valid_0's rmse: 0.640256
[800]	valid_0's rmse: 0.630507
Early stopping, best iteration is:
[789]	valid_0's rmse: 0.629489
강원 비알콜 음료점업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 1690, number of used features: 6
[LightGBM] [Info] Start training from score 2.582477
Training un

[100]	valid_0's rmse: 0.810566
[200]	valid_0's rmse: 0.80643


[300]	valid_0's rmse: 0.80994
Early stopping, best iteration is:
[205]	valid_0's rmse: 0.805455
강원 욕탕업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 48
[LightGBM] [Info] Number of data points in the train set: 1030, number of used features: 6
[LightGBM] [Info] Start training from score 2.669310
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 1.35588
[200]	valid_0's rmse: 1.2199
[300]	valid_0's rmse: 1.12692
[400]	valid_0's rmse: 1.07315
[500]	valid_0's rmse: 1.04023
[600]	valid_0's rmse: 1.02031
[700]	valid_0's rmse: 1.0063
[800]	valid_0's rmse: 1.00006
[900]	valid_0's rmse: 0.994784
[1000]	valid_0's rmse: 0.99272
Did not meet early stopping. Best iteration is:
[999]	valid_0's rmse: 0.992661
강원 육류 소매업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 43
[LightGBM] [Info] Number of data points in the train set: 527, number of used features: 6
[LightGBM] [Info] Star

[400]	valid_0's rmse: 0.524839
Early stopping, best iteration is:
[347]	valid_0's rmse: 0.523071
강원 일반유흥 주점업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 1659, number of used features: 6
[LightGBM] [Info] Start training from score 2.678059
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.861779
[200]	valid_0's rmse: 0.778452
[300]	valid_0's rmse: 0.734832
[400]	valid_0's rmse: 0.711139
[500]	valid_0's rmse: 0.696161
[600]	valid_0's rmse: 0.684867
[700]	valid_0's rmse: 0.67672
[800]	valid_0's rmse: 0.67097
[900]	valid_0's rmse: 0.664916
[1000]	valid_0's rmse: 0.663509
Did not meet early stopping. Best iteration is:
[968]	valid_0's rmse: 0.662041


강원 일식 음식점업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 24
[LightGBM] [Info] Number of data points in the train set: 45, number of used features: 3
[LightGBM] [Info] Start training from score 2.616142
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 1.01609


Early stopping, best iteration is:
[82]	valid_0's rmse: 1.01506
강원 자동차 임대업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 2036, number of used features: 6
[LightGBM] [Info] Start training from score 2.611275
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 1.26023
[200]	valid_0's rmse: 1.08032
[300]	valid_0's rmse: 1.02315
[400]	valid_0's rmse: 0.995635
[500]	valid_0's rmse: 0.971706
[600]	valid_0's rmse: 0.959788
[700]	valid_0's rmse: 0.951999
[800]	valid_0's rmse: 0.948589
[900]	valid_0's rmse: 0.946848
[1000]	valid_0's rmse: 0.944865
Did not meet early stopping. Best iteration is:
[974]	valid_0's rmse: 0.94252
강원 전시 및 행사 대행업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49


[100]	valid_0's rmse: 0.858894
[200]	valid_0's rmse: 0.644833
[300]	valid_0's rmse: 0.596272


[400]	valid_0's rmse: 0.588294
[500]	valid_0's rmse: 0.582831
[600]	valid_0's rmse: 0.580232


[700]	valid_0's rmse: 0.580101
Early stopping, best iteration is:
[637]	valid_0's rmse: 0.577427
강원 차량용 가스 충전업 done
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 2617, number of used features: 6
[LightGBM] [Info] Start training from score 2.712697
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.857806
[200]	valid_0's rmse: 0.744518
[300]	valid_0's rmse: 0.685381


[400]	valid_0's rmse: 0.644476
[500]	valid_0's rmse: 0.629782
[600]	valid_0's rmse: 0.61388
[700]	valid_0's rmse: 0.60863
[800]	valid_0's rmse: 0.605252
[900]	valid_0's rmse: 0.600587
[1000]	valid_0's rmse: 0.594009
Did not meet early stopping. Best iteration is:
[998]	valid_0's rmse: 0.593996
강원 차량용 주유소 운영업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 3859, number of used features: 6
[LightGBM] [Info] Start training from score 2.649802
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 1.0162
[200]	valid_0's rmse: 0.822598
[300]	valid_0's rmse: 0.705307
[400]	valid_0's rmse: 0.671726
[500]	valid_0's rmse: 0.649598
[600]	valid_0's rmse: 0.635606
[700]	valid_0's rmse: 0.6261
[800]	valid_0's rmse: 0.620414
[900]	valid_0's rmse: 0.61914
[1000]	valid_0's rmse: 0.616175
Did not meet early 

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 2079, number of used features: 6
[LightGBM] [Info] Start training from score 2.586274
Training until validation scores don't improve for 100 rounds
[100]	valid_0's rmse: 0.764362
[200]	valid_0's rmse: 0.663149
[300]	valid_0's rmse: 0.625981
[400]	valid_0's rmse: 0.612338
[500]	valid_0's rmse: 0.602251
[600]	valid_0's rmse: 0.591793
[700]	valid_0's rmse: 0.585001
[800]	valid_0's rmse: 0.578157
[900]	valid_0's rmse: 0.580218
Early stopping, best iteration is:
[836]	valid_0's rmse: 0.577861
강원 피자 햄버거 샌드위치 및 유사 음식점업 done
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 49
[LightGBM] [Info] Number of data points in the train set: 3704, number of used features: 6
[LightGBM] [Info] St

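# Interpreting and evaluating the results

- Build a dataframe containing the actual April AMT
- R² score (goodness of fit)
- RMSLE (root mean squared logarithmic error)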
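
For reference, RMSLE is the root mean squared difference between `log(1 + y_true)` and `log(1 + y_pred)`, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. A small NumPy sketch with made-up numbers; the helper names `rmsle` and `r2` are illustrative and not part of the notebook:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be non-negative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Made-up actual and predicted amounts.
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 300.0])

print(rmsle(y_true, y_pred))
print(r2(y_true, y_pred))
```

Because RMSLE works on log-scaled values, it penalizes relative rather than absolute errors, which suits amounts that span several orders of magnitude across industries.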


In [None]:
data4 = pd.read_csv('202004.csv')
# Select AMT before summing so that non-numeric columns are not aggregated
true = data4.groupby(['REG_YYMM','CARD_SIDO_NM','STD_CLSS_NM'])['AMT'].sum()
true = true.reset_index()
true

In [None]:
real = pd.merge(true, predict, how = 'left', on =['REG_YYMM', 'CARD_SIDO_NM','STD_CLSS_NM'] )
real = real.fillna(0)
real

In [None]:
from sklearn.metrics import mean_squared_log_error
# After the merge, AMT_x is the actual April amount and AMT_y the predicted one
np.sqrt(mean_squared_log_error(real.AMT_x, real.AMT_y))