# RFM Modeling

RFM stands for Recency, Frequency, and Monetary Value. Each dimension is defined as follows:

- **Recency**: How much time has elapsed since a customer’s last activity or transaction with the brand? Activity is usually a purchase, although variations are sometimes used, e.g., the last visit to a website or use of a mobile app. In most cases, the more recently a customer has interacted or transacted with a brand, the more likely that customer will be responsive to communications from the brand.
- **Frequency**: How often has a customer transacted or interacted with the brand during a particular period of time? Clearly, customers with frequent activities are more engaged, and probably more loyal, than customers who rarely do so. And one-time-only customers are in a class of their own.
- **Monetary**: Also referred to as “monetary value,” this factor reflects how much a customer has spent with the brand during a particular period of time. Big spenders should usually be treated differently than customers who spend little. Looking at monetary divided by frequency indicates the average purchase amount – an important secondary factor to consider when segmenting customers.
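The secondary factor mentioned above, average purchase amount, is simply monetary value divided by frequency. A minimal sketch follows, using a small hypothetical `orders` table (the customers and amounts are made up for illustration and are not from the dataset loaded below):

```python
import pandas as pd

# Hypothetical toy orders, not taken from the dataset used in this notebook.
orders = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "B", "C"],
    "grand_total": [100, 50, 20, 30, 10, 400],
})

summary = orders.groupby("customer")["grand_total"].agg(
    frequency="count",      # number of orders per customer
    monetary_value="sum",   # total spend per customer
)

# Average purchase amount = monetary value / frequency
summary["avg_order_value"] = summary["monetary_value"] / summary["frequency"]
print(summary)
```

Customer C has a single large order while customer B has several small ones, which is the kind of difference this ratio is meant to surface.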

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
data = pd.read_csv('RFM_data/sample-orders.csv')

In [3]:
data.head()

,order_date,order_id,customer,grand_total
0,9/7/11,CA-2011-100006,Dennis Kane,378
1,7/8/11,CA-2011-100090,Ed Braxton,699
2,3/14/11,CA-2011-100293,Neil Französisch,91
3,1/29/11,CA-2011-100328,Jasper Cacioppo,4
4,4/8/11,CA-2011-100363,Jim Mitchum,21


In [4]:
data.shape

(5009, 4)

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5009 entries, 0 to 5008
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   order_date   5009 non-null   object
 1   order_id     5009 non-null   object
 2   customer     5009 non-null   object
 3   grand_total  5009 non-null   int64 
dtypes: int64(1), object(3)
memory usage: 156.7+ KB


In [6]:
data.customer.value_counts().head(10)

Emily Phan             17
Noel Staavos           13
Zuschuss Carroll       13
Joel Eaton             13
Sally Hughsby          13
Chloris Kastensmidt    13
Erin Ashbrook          13
Patrick Gardner        13
Pete Kriz              12
Sanjit Jacobs          12
Name: customer, dtype: int64

In [7]:
import datetime as dt
data['order_date'] = pd.to_datetime(data['order_date'])
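Without a `format` argument, `pd.to_datetime` has to infer the layout of strings such as `9/7/11`. The sketch below passes the format explicitly, assuming the column is month/day/two-digit-year (values like `3/14/11` and `1/29/11` in the sample rows only make sense that way). The `raw` series is a stand-in for `data['order_date']`:

```python
import pandas as pd

# Sample values copied from the order_date column shown in the head() output.
raw = pd.Series(["9/7/11", "7/8/11", "3/14/11", "1/29/11", "4/8/11"])

# Assumption: month/day/two-digit-year. An explicit format removes any
# day-first vs month-first ambiguity and fails loudly on malformed rows.
parsed = pd.to_datetime(raw, format="%m/%d/%y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```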

In [8]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5009 entries, 0 to 5008
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   order_date   5009 non-null   datetime64[ns]
 1   order_id     5009 non-null   object        
 2   customer     5009 non-null   object        
 3   grand_total  5009 non-null   int64         
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 156.7+ KB


In [9]:
data.order_date.value_counts().head(10)

2013-09-06    19
2014-09-05    16
2014-11-04    16
2014-12-03    16
2014-09-08    15
2014-12-10    15
2014-11-20    15
2014-11-25    15
2014-09-03    14
2013-11-11    14
Name: order_date, dtype: int64

## RFM Table

The RFM table is built by aggregating the orders per customer: `order_date` gives recency, `order_id` gives frequency, and `grand_total` gives monetary value.

In [10]:
today = dt.datetime(2021, 2, 28)  # fixed reference date; TODO: automate this
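One way to automate the reference date is sketched below with two options: anchoring on the data (one day after the most recent order) or on the calendar (the current date). Which one fits depends on whether the dataset is a historical extract or a live feed. The `order_dates` series and the `snapshot` name are hypothetical stand-ins, not part of the notebook:

```python
import pandas as pd

# Hypothetical order dates standing in for data['order_date'].
order_dates = pd.to_datetime(pd.Series(["2014-11-27", "2014-12-10", "2013-09-06"]))

# Option 1: anchor on the data, one day after the most recent order.
snapshot = order_dates.max() + pd.Timedelta(days=1)

# Option 2: anchor on the calendar, i.e. the current date at midnight.
today = pd.Timestamp.today().normalize()

# Recency in days relative to the data-anchored snapshot.
recency_days = (snapshot - order_dates).dt.days
print(snapshot.date(), recency_days.tolist())
```

With a data-anchored snapshot the most recent customer always has a recency of 1 day, so the quantiles are not inflated by the age of the extract.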

In [11]:
rfm = data.groupby('customer').agg({
    'order_date': lambda x: today - x.max(),  # Recency: time since the last order
    'order_id': len,                          # Frequency: number of orders
    'grand_total': 'sum',                     # Monetary value: total spend
})

# Convert the recency timedelta to a whole number of days
rfm['order_date'] = pd.to_numeric(rfm['order_date'].dt.days, downcast='integer')

rfm.rename(columns={'order_date': 'recency',
                    'order_id': 'frequency',
                    'grand_total': 'monetary_value'}, inplace=True)
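The same table can also be produced with pandas named aggregation, which sets the output column names directly and avoids the separate rename step. This is an alternative sketch rather than the notebook's own method; the toy `orders` frame and the `rfm_alt` and `last_order` names are hypothetical:

```python
import pandas as pd

# Hypothetical toy orders with the same column names as the dataset.
orders = pd.DataFrame({
    "customer": ["A", "A", "B"],
    "order_id": ["o1", "o2", "o3"],
    "order_date": pd.to_datetime(["2014-11-01", "2014-11-27", "2013-06-27"]),
    "grand_total": [10, 20, 5],
})
today = pd.Timestamp(2021, 2, 28)

rfm_alt = orders.groupby("customer").agg(
    last_order=("order_date", "max"),       # most recent order per customer
    frequency=("order_id", "count"),        # number of orders
    monetary_value=("grand_total", "sum"),  # total spend
)

# Recency in days, computed in one vectorized step from the last order date.
rfm_alt["recency"] = (today - rfm_alt["last_order"]).dt.days
rfm_alt = rfm_alt[["recency", "frequency", "monetary_value"]]
print(rfm_alt)
```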

In [12]:
rfm.head()

customer,recency,frequency,monetary_value
Aaron Bergman,2666,3,887
Aaron Hawkins,2263,7,1744
Aaron Smayling,2339,7,3050
Adam Bellavance,2305,8,7756
Adam Hart,2285,10,3249


In [13]:
# Sanity check: Adam Hart should have 10 purchases totalling 3249, with the last purchase 2285 days ago
test1 = data[data['customer']=='Adam Hart']
test1

,order_date,order_id,customer,grand_total
703,2011-11-16,CA-2011-160066,Adam Hart,5
1606,2012-12-21,CA-2012-161795,Adam Hart,3
2371,2013-06-27,CA-2013-144015,Adam Hart,262
2429,2013-12-18,CA-2013-147109,Adam Hart,217
2473,2013-09-16,CA-2013-149797,Adam Hart,842
3045,2014-09-26,CA-2014-112004,Adam Hart,199
3175,2014-04-16,CA-2014-118857,Adam Hart,196
3315,2014-10-24,CA-2014-125451,Adam Hart,1170
3719,2014-05-20,CA-2014-145702,Adam Hart,342
4101,2014-11-27,CA-2014-165029,Adam Hart,13


In [14]:
test1.grand_total.sum()

3249

In [15]:
test1.customer.count()

10

In [16]:
(today - dt.datetime(2014,11,27)).days

2285
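The three manual checks above can be folded into assertions so that the verification fails loudly if the aggregation ever changes. A small sketch follows, rebuilding Adam Hart's ten orders from the output shown earlier; the `adam` frame is a stand-in for `test1`:

```python
import datetime as dt
import pandas as pd

# Adam Hart's ten orders, copied from the output above.
adam = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2011-11-16", "2012-12-21", "2013-06-27", "2013-12-18", "2013-09-16",
        "2014-09-26", "2014-04-16", "2014-10-24", "2014-05-20", "2014-11-27",
    ]),
    "grand_total": [5, 3, 262, 217, 842, 199, 196, 1170, 342, 13],
})
today = dt.datetime(2021, 2, 28)

# Each assertion mirrors one column of the RFM row for this customer.
assert len(adam) == 10                                     # frequency
assert adam["grand_total"].sum() == 3249                   # monetary value
assert (today - adam["order_date"].max()).days == 2285     # recency
print("RFM row for Adam Hart matches the raw orders")
```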

## RFM Matrix

Each RFM dimension is split at its 0.25, 0.5, and 0.75 quantiles, which yields four tiers per dimension. Broadly, customers fall into three value groups:

- **Low Value**: Customers who are less active than others, buy or visit infrequently, and generate very low, zero, or even negative revenue.
- **Mid Value**: Customers in the middle on every measure. They use the platform fairly often (though less than High Value customers) and generate moderate revenue.
- **High Value**: The group we don’t want to lose: high revenue, high frequency, and low inactivity.

Each dimension (R, F, M) is assigned a tier from 1 to 4 (1 being the best, 4 the worst). Combining the three tiers identifies segments such as:

- **Best Customers** – This group consists of those customers who are found in R-Tier-1, F-Tier-1 and M-Tier-1, meaning that they transacted recently, do so often and spend more than other customers. A shortened notation for this segment is 1-1-1; we’ll use this notation going forward.
- **High-spending New Customers** – This group consists of those customers in 1-4-1 and 1-4-2. These are customers in the lowest frequency tier (in the extreme case, a single purchase) who transacted very recently and spent a lot.
- **Lowest-Spending Active Loyal Customers** – This group consists of those customers in segments 1-1-3 and 1-1-4 (they transacted recently and do so often, but spend the least).
- **Churned Best Customers** – This segment consists of those customers in groups 4-1-1, 4-1-2, 4-2-1 and 4-2-2 (they transacted frequently and spent a lot, but it’s been a long time since they’ve transacted).
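The segment definitions above can be written as a small lookup on the three tier numbers. The function below is an illustrative sketch: `segment_label` is a hypothetical helper, and the `"Other"` catch-all is an assumption for tier combinations the list does not name.

```python
def segment_label(r: int, f: int, m: int) -> str:
    """Map R, F, M tiers (1 = best, 4 = worst) to the named segments above."""
    if (r, f, m) == (1, 1, 1):
        return "Best Customers"
    if r == 1 and f == 4 and m in (1, 2):
        return "High-spending New Customers"
    if r == 1 and f == 1 and m in (3, 4):
        return "Lowest-Spending Active Loyal Customers"
    if r == 4 and f in (1, 2) and m in (1, 2):
        return "Churned Best Customers"
    return "Other"  # hypothetical catch-all, not defined in the text


print(segment_label(1, 1, 1))
print(segment_label(4, 2, 1))
```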

In [17]:
section = rfm.quantile(q=[0.25,0.5,0.75])
section

,recency,frequency,monetary_value
0.25,2281.0,5.0,1145.0
0.5,2326.0,6.0,2257.0
0.75,2434.0,8.0,3784.0


In [18]:
#Low recency is good --> 1
def r_tier(Rcolumn,quantile_column,quantile_dict):
    rclass = []
    for value in Rcolumn:
        if value <= quantile_dict[quantile_column][0.25]:
            rclass.append(1)
        elif value <= quantile_dict[quantile_column][0.50]:
            rclass.append(2)
        elif value <= quantile_dict[quantile_column][0.75]: 
            rclass.append(3)
        else:
            rclass.append(4)
    return rclass

#High frequency is good --> 1
#High Monetary is good --> 1
def fm_tier(FMcolumn,quantile_column,quantile_dict):
    fmclass = []
    for value in FMcolumn:
        if value <= quantile_dict[quantile_column][0.25]:
            fmclass.append(4)
        elif value <= quantile_dict[quantile_column][0.50]:
            fmclass.append(3)
        elif value <= quantile_dict[quantile_column][0.75]: 
            fmclass.append(2)
        else:
            fmclass.append(1)
    return fmclass
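The two loops above can also be expressed without Python-level iteration. The sketch below uses `np.searchsorted` against the three quantile cut points, which reproduces the same `<=` boundaries. The `tier` helper is hypothetical, and the inputs are the quantiles and the five customers shown in the outputs of this notebook:

```python
import numpy as np
import pandas as pd

# Quantile cut points (0.25, 0.5, 0.75) copied from the `section` output.
cuts = {
    "recency": [2281, 2326, 2434],
    "frequency": [5, 6, 8],
    "monetary_value": [1145, 2257, 3784],
}

# The five customers shown in rfm.head().
sample = pd.DataFrame(
    {
        "recency": [2666, 2263, 2339, 2305, 2285],
        "frequency": [3, 7, 7, 8, 10],
        "monetary_value": [887, 1744, 3050, 7756, 3249],
    },
    index=["Aaron Bergman", "Aaron Hawkins", "Aaron Smayling",
           "Adam Bellavance", "Adam Hart"],
)


def tier(values, cut_points, low_is_good):
    """Return tiers 1-4; idx counts how many cut points lie strictly below each value."""
    idx = np.searchsorted(cut_points, values.to_numpy(), side="left")
    return idx + 1 if low_is_good else 4 - idx


sample["R_Class"] = tier(sample["recency"], cuts["recency"], low_is_good=True)
sample["F_Class"] = tier(sample["frequency"], cuts["frequency"], low_is_good=False)
sample["M_Class"] = tier(sample["monetary_value"], cuts["monetary_value"], low_is_good=False)
print(sample[["R_Class", "F_Class", "M_Class"]])
```

Unlike `pd.cut`, this approach does not raise an error when two quantiles coincide, which can happen for a low-cardinality column such as frequency.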

In [19]:
rfm['R_Class'] = r_tier(rfm['recency'],'recency',section)
rfm['F_Class'] = fm_tier(rfm['frequency'],'frequency',section)
rfm['M_Class'] = fm_tier(rfm['monetary_value'],'monetary_value',section)
rfm.head()

customer,recency,frequency,monetary_value,R_Class,F_Class,M_Class
Aaron Bergman,2666,3,887,4,4,4
Aaron Hawkins,2263,7,1744,1,2,3
Aaron Smayling,2339,7,3050,3,2,2
Adam Bellavance,2305,8,7756,2,2,1
Adam Hart,2285,10,3249,2,1,2


In [20]:
#finding RFM score
rfm['RFMClass'] = rfm.R_Class.map(str) + rfm.F_Class.map(str) + rfm.M_Class.map(str)

In [21]:
rfm.head()

customer,recency,frequency,monetary_value,R_Class,F_Class,M_Class,RFMClass
Aaron Bergman,2666,3,887,4,4,4,444
Aaron Hawkins,2263,7,1744,1,2,3,123
Aaron Smayling,2339,7,3050,3,2,2,322
Adam Bellavance,2305,8,7756,2,2,1,221
Adam Hart,2285,10,3249,2,1,2,212


In [22]:
#save RFM Result
rfm.to_csv('sample_rfm_table.csv')

## Evaluation

- **Best Customers** – Communications with this group should make them feel valued and appreciated. These customers likely generate a disproportionately high percentage of overall revenues and thus focusing on keeping them happy should be a top priority. Further analyzing their individual preferences and affinities will provide additional opportunities for even more personalized messaging.
- **High-spending New Customers** – It is always a good idea to carefully “incubate” all new customers, but it is even more important for these customers because they spent a lot on their first purchase. As with the Best Customers group, it’s important to make them feel valued and appreciated – and to give them terrific incentives to continue interacting with the brand.
- **Lowest-Spending Active Loyal Customers** – These repeat customers are active and loyal, but they are low spenders. Marketers should create campaigns for this group that make them feel valued, and incentivize them to increase their spend levels. As loyal customers, it often also pays to reward them with special offers if they spread the word about the brand to their friends, e.g., via social networks.
- **Churned Best Customers** – These are valuable customers who stopped transacting a long time ago. While it’s often challenging to re-engage churned customers, their high value makes the attempt worthwhile. As with the Best Customers group, it’s important to communicate with them on the basis of their specific preferences, as known from earlier transaction data.
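To see how revenue is distributed across segments, the RFM table can be grouped by `RFMClass`. The sketch below uses only the five customers shown in the `rfm.head()` output, so the shares it prints are illustrative and say nothing about the full dataset. The `rfm_sample` and `by_class` names are hypothetical:

```python
import pandas as pd

# The five customers from the rfm.head() output; a real run would use the full table.
rfm_sample = pd.DataFrame(
    {
        "monetary_value": [887, 1744, 3050, 7756, 3249],
        "RFMClass": ["444", "123", "322", "221", "212"],
    },
    index=["Aaron Bergman", "Aaron Hawkins", "Aaron Smayling",
           "Adam Bellavance", "Adam Hart"],
)

by_class = rfm_sample.groupby("RFMClass")["monetary_value"].agg(
    customers="count",  # customers in each RFM class
    revenue="sum",      # total spend in each RFM class
)

# Share of total revenue contributed by each class.
by_class["revenue_share"] = by_class["revenue"] / by_class["revenue"].sum()
print(by_class.sort_values("revenue_share", ascending=False))
```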