# RFM Managerial Segmentation
Managerial segmentation is simple: it is based on business rules rather than on ML or statistical models.

![Rules](rfm-seg-rules.png)

In [1]:
import java.util.concurrent.TimeUnit
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

In [2]:
val schema = StructType(
                List(
                    StructField("customer_id", StringType, false),
                    StructField("purchase_amount", DoubleType, false),
                    StructField("date_of_purchase", DateType, false)
                )
            )
val data = spark.read
                .option("sep", "\t")
                .option("mode","FAILFAST")
                .option("dateFormat","yyyy-MM-dd") // "yyyy" is the calendar year; "YYYY" would be the week-based year
                .schema(schema)
                .csv("../../data/foundation-marketing-analytics/purchases.txt")
                .toDF

schema = StructType(StructField(customer_id,StringType,false), StructField(purchase_amount,DoubleType,false), StructField(date_of_purchase,DateType,false))
data = [customer_id: string, purchase_amount: double ... 1 more field]
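
Date patterns for this data should use `yyyy` (calendar year). In Java-style patterns, `YYYY` means *week-based* year, which can shift dates near New Year. The sketch below shows the difference with plain `java.time.format.DateTimeFormatter`, not with Spark itself. The sample date and `Locale.US` are arbitrary choices for illustration, and week rules vary by locale.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object YearPatternDemo {
  // Monday 29 December 2014 belongs to the first week of 2015 under US week rules
  // (weeks start on Sunday; week 1 is the week that contains 1 January).
  val sample: LocalDate = LocalDate.of(2014, 12, 29)

  // "yyyy" is year-of-era, i.e. the calendar year.
  val calendarYear: String =
    DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.US).format(sample)

  // "YYYY" is week-based-year: it reports 2015 while the month is still December.
  val weekBasedYear: String =
    DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.US).format(sample)
}
```

Both strings describe the same day. Only the year field differs, which is why such a bug is easy to miss.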



In [3]:
val enriched1 = data
                    .withColumn("end_date", lit("2016-01-01"))
                    .withColumn("year_of_purchase", year($"date_of_purchase"))
                    .withColumn("days_since", datediff($"end_date", $"date_of_purchase"))
enriched1.printSchema()

root
 |-- customer_id: string (nullable = true)
 |-- purchase_amount: double (nullable = true)
 |-- date_of_purchase: date (nullable = true)
 |-- end_date: string (nullable = false)
 |-- year_of_purchase: integer (nullable = true)
 |-- days_since: integer (nullable = true)



enriched1 = [customer_id: string, purchase_amount: double ... 4 more fields]
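
`datediff(end, start)` returns the number of days from `start` to `end`. `days_since` is therefore the number of days between a purchase and the reference date 2016-01-01. Below is a per-row plain-Scala equivalent that uses `java.time.temporal.ChronoUnit` instead of Spark. The helper name `daysSince` is hypothetical.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DaysSinceSketch {
  // Reference date used as end_date in the notebook.
  val EndDate: LocalDate = LocalDate.of(2016, 1, 1)

  // Hypothetical helper mirroring datediff($"end_date", $"date_of_purchase").
  def daysSince(purchase: LocalDate, end: LocalDate = EndDate): Long =
    ChronoUnit.DAYS.between(purchase, end)
}
```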



In [4]:
val OneYear = 365
val TwoYears = OneYear * 2
val ThreeYears = OneYear * 3

val enriched2 = enriched1
                .groupBy($"customer_id")
                .agg(
                    max($"days_since").alias("first_purchase"),
                    min($"days_since").alias("recency"),
                    count($"*").alias("frequency"),
                    avg($"purchase_amount").alias("amount"))

enriched2.filter($"customer_id".isin("10", "90")).show(5)

+-----------+--------------+-------+---------+------+
|customer_id|first_purchase|recency|frequency|amount|
+-----------+--------------+-------+---------+------+
|         90|          3783|    758|       10| 115.8|
|         10|          3829|   3829|        1|  30.0|
+-----------+--------------+-------+---------+------+



OneYear = 365
TwoYears = 730
ThreeYears = 1095
enriched2 = [customer_id: string, first_purchase: int ... 3 more fields]
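
For each customer, the aggregation computes four values:

* `first_purchase`: the largest `days_since` (the oldest purchase).
* `recency`: the smallest `days_since`.
* `frequency`: the number of rows.
* `amount`: the mean purchase amount.

The sketch below reproduces that logic on plain Scala collections. The `Purchase` and `Rfm` case classes are hypothetical stand-ins for DataFrame rows.

```scala
object RfmSketch {
  // Hypothetical row types standing in for the DataFrame columns.
  final case class Purchase(customerId: String, amount: Double, daysSince: Int)
  final case class Rfm(customerId: String, firstPurchase: Int, recency: Int,
                       frequency: Long, amount: Double)

  // Same aggregation as the groupBy/agg cell: max, min, count and avg per customer.
  def rfm(purchases: Seq[Purchase]): Map[String, Rfm] =
    purchases.groupBy(_.customerId).map { case (id, ps) =>
      val days = ps.map(_.daysSince)
      id -> Rfm(
        customerId = id,
        firstPurchase = days.max,               // days since the oldest purchase
        recency = days.min,                     // days since the latest purchase
        frequency = ps.size.toLong,             // number of purchases
        amount = ps.map(_.amount).sum / ps.size // average purchase amount
      )
    }
}
```

A customer with a single purchase gets the same value for `first_purchase` and `recency`, as customer 10 does in the output above.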



## First level segmentation
Calculates only the first-level segmentation, which is based on recency alone.

In [5]:
val segment1Level = enriched2
                    .withColumn("segment1", 
                                when($"recency" > ThreeYears, "inactive")
                                .when($"recency" > TwoYears && $"recency" <= ThreeYears, "cold")
                                .when($"recency" > OneYear && $"recency" <= TwoYears, "warm")
                                .otherwise("active"))

segment1Level.groupBy($"segment1").count().show()
segment1Level.show()

+--------+-----+
|segment1|count|
+--------+-----+
|    warm| 1958|
|  active| 5398|
|    cold| 1903|
|inactive| 9158|
+--------+-----+

+-----------+--------------+-------+---------+------------------+--------+
|customer_id|first_purchase|recency|frequency|            amount|segment1|
+-----------+--------------+-------+---------+------------------+--------+
|       6240|          3752|   3005|        3| 76.66666666666667|inactive|
|      52800|          3320|   3320|        1|              15.0|inactive|
|     100140|          2750|     13|        4|             51.25|  active|
|     109180|          2616|     30|        8|             48.75|  active|
|     131450|          2228|    205|        8|            103.75|  active|
|      45300|          3667|    234|        6|29.166666666666668|  active|
|      69460|          3179|     15|        9| 28.88888888888889|  active|
|      86180|          2975|      2|        9| 21.11111111111111|  active|
|     161110|          1528|   1528|  

segment1Level = [customer_id: string, first_purchase: int ... 4 more fields]
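
`when(...).when(...).otherwise(...)` evaluates its branches top to bottom and returns the first match, so the recency thresholds read as a simple cascade. Here is a plain-Scala function with the same thresholds; the object and method names are chosen here for illustration only.

```scala
object Segment1Sketch {
  val OneYear = 365
  val TwoYears: Int = OneYear * 2
  val ThreeYears: Int = OneYear * 3

  // Mirrors the when/otherwise chain: the first matching branch wins.
  def segment1(recency: Int): String =
    if (recency > ThreeYears) "inactive"
    else if (recency > TwoYears) "cold" // recency in (730, 1095]
    else if (recency > OneYear) "warm"  // recency in (365, 730]
    else "active"                       // recency <= 365
}
```

Branches are checked in order, so the extra `<= ThreeYears` and `<= TwoYears` guards in the Spark version are redundant, though harmless.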



## Second level segmentation
Calculates ONLY the 2nd-level segmentation, which further splits the `warm` and `active` segments by first purchase and average amount.

In [6]:
//Make sure that the conditions for "warm new" and "active new" come earlier than the other conditions of their
//respective categories: `when` returns the first matching branch, so the order determines the result

val segment2Level = segment1Level.withColumn("segment2",
                        when($"segment1" === lit("warm") && $"first_purchase" <= TwoYears, "warm new")
                        .when($"segment1" === lit("warm") && $"amount" >= 100, "warm high value")
                        .when($"segment1" === lit("warm") && $"amount" < 100, "warm low value")
                        .when($"segment1" === lit("active") && $"first_purchase" <= OneYear, "active new")
                        .when($"segment1" === lit("active") && $"amount" >= 100, "active high value")
                        .when($"segment1" === lit("active") && $"amount" < 100, "active low value"))

segment2Level.groupBy($"segment2").count().show()
segment2Level.show()

+-----------------+-----+
|         segment2|count|
+-----------------+-----+
|  warm high value|  119|
|active high value|  573|
|             null|11061|
|         warm new|  938|
| active low value| 3313|
|       active new| 1512|
|   warm low value|  901|
+-----------------+-----+

+-----------+--------------+-------+---------+------------------+--------+-----------------+
|customer_id|first_purchase|recency|frequency|            amount|segment1|         segment2|
+-----------+--------------+-------+---------+------------------+--------+-----------------+
|       6240|          3752|   3005|        3| 76.66666666666667|inactive|             null|
|      52800|          3320|   3320|        1|              15.0|inactive|             null|
|     100140|          2750|     13|        4|             51.25|  active| active low value|
|     109180|          2616|     30|        8|             48.75|  active| active low value|
|     131450|          2228|    205|        8|            103.

segment2Level = [customer_id: string, first_purchase: int ... 5 more fields]
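
The ordering rule matters because, without it, a warm or active customer whose first purchase is recent would be labeled by amount instead of as "new". The sketch below mirrors the chain in plain Scala and returns `None` where Spark leaves `segment2` null. The names are hypothetical.

```scala
object Segment2Sketch {
  val OneYear = 365
  val TwoYears: Int = OneYear * 2

  // Mirrors the second-level when chain; None plays the role of Spark's null.
  // The "new" cases come first, so they take precedence over the amount checks.
  def segment2(segment1: String, firstPurchase: Int, amount: Double): Option[String] =
    segment1 match {
      case "warm" if firstPurchase <= TwoYears  => Some("warm new")
      case "warm" if amount >= 100              => Some("warm high value")
      case "warm"                               => Some("warm low value")
      case "active" if firstPurchase <= OneYear => Some("active new")
      case "active" if amount >= 100            => Some("active high value")
      case "active"                             => Some("active low value")
      case _                                    => None // cold and inactive
    }
}
```

For example, a warm customer with a first purchase 700 days ago and an average amount of 150 is "warm new", not "warm high value".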



In [7]:
val cols = segment1Level.schema.fieldNames.map(col(_))
cols.foreach(println)
val segmented = segment1Level
                    .join(segment2Level, segment1Level("customer_id") === segment2Level("customer_id"), "inner")
                    .select(segment1Level("customer_id"),
                            segment1Level("first_purchase"),
                            segment1Level("recency"),
                            segment1Level("frequency"),
                            segment1Level("amount"),
                            segment1Level("segment1"),
                            segment2Level("segment2"))
                    .withColumn("segment", when(segment2Level("segment2").isNotNull, $"segment2").otherwise(segment1Level("segment1")))
                    .orderBy("segment")

//Cache the result, since it is reused by the aggregations and show() calls below
segmented.cache()

segmented.groupBy($"segment").count().show()
segmented.show()

customer_id
first_purchase
recency
frequency
amount
segment1
+-----------------+-----+
|          segment|count|
+-----------------+-----+
|active high value|  573|
| active low value| 3313|
|       active new| 1512|
|             cold| 1903|
|         inactive| 9158|
|  warm high value|  119|
|   warm low value|  901|
|         warm new|  938|
+-----------------+-----+

+-----------+--------------+-------+---------+------------------+--------+-----------------+-----------------+
|customer_id|first_purchase|recency|frequency|            amount|segment1|         segment2|          segment|
+-----------+--------------+-------+---------+------------------+--------+-----------------+-----------------+
|     131450|          2228|    205|        8|            103.75|  active|active high value|active high value|
|     189280|          1106|      1|        3|             100.0|  active|active high value|active high value|
|     170050|          1520|     13|        2|             100.0|  acti

cols = Array(customer_id, first_purchase, recency, frequency, amount, segment1)
segmented = [customer_id: string, first_purchase: int ... 6 more fields]
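
The final `segment` column takes `segment2` when it is defined and falls back to `segment1` otherwise. `segment2Level` already contains every column of `segment1Level`, so the self-join is not strictly required. In Spark, `segment2Level.withColumn("segment", coalesce($"segment2", $"segment1"))` should produce the same column. In plain Scala, that fallback is `Option.getOrElse`; the helper names below are hypothetical.

```scala
object CombinedSegmentSketch {
  // Fall back to the first-level label when no second-level label exists,
  // as coalesce(segment2, segment1) would do in Spark.
  def segment(segment1: String, segment2: Option[String]): String =
    segment2.getOrElse(segment1)

  // Count customers per final segment, like groupBy($"segment").count().
  def counts(rows: Seq[(String, Option[String])]): Map[String, Int] =
    rows
      .map { case (s1, s2) => segment(s1, s2) }
      .groupBy(identity)
      .map { case (label, members) => label -> members.size }
}
```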



**NOTE: The 1st- and 2nd-level segments could be computed in a single step, but keeping them separate makes the code easier to test and maintain.**

### Profile of each segment

In [8]:
    val segments = segmented
                    .groupBy($"segment")
                    .agg(
                        round(avg($"recency"),2).alias("avg_r"),
                        round(avg($"frequency"),2).alias("avg_f"),
                        round(avg($"amount"),2).alias("avg_a"))
                    .orderBy($"segment")
    segments.show(10, truncate=false)

+-----------------+-------+-----+------+
|segment          |avg_r  |avg_f|avg_a |
+-----------------+-------+-----+------+
|active high value|88.82  |5.89 |240.05|
|active low value |108.36 |5.94 |40.72 |
|active new       |84.99  |1.05 |77.13 |
|cold             |857.78 |2.3  |51.74 |
|inactive         |2178.11|1.81 |48.11 |
|warm high value  |455.13 |4.71 |327.41|
|warm low value   |474.38 |4.53 |38.59 |
|warm new         |509.3  |1.04 |66.6  |
+-----------------+-------+-----+------+



segments = [segment: string, avg_r: double ... 2 more fields]
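
Each row of the profile is the mean recency, frequency and amount of the customers in one segment, rounded to two decimals. Spark's `round` uses HALF_UP rounding, so the plain-Scala sketch below uses `BigDecimal` with `HALF_UP` to match. The `Profile` case class and the tuple layout of `rows` are hypothetical stand-ins for the DataFrame.

```scala
import scala.math.BigDecimal.RoundingMode

object ProfileSketch {
  final case class Profile(avgR: Double, avgF: Double, avgA: Double)

  // Round to two decimals, half up, like round(col, 2) in Spark.
  private def round2(x: Double): Double =
    BigDecimal(x).setScale(2, RoundingMode.HALF_UP).toDouble

  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // rows: (segment, recency, frequency, amount) for each customer.
  def profile(rows: Seq[(String, Int, Long, Double)]): Map[String, Profile] =
    rows.groupBy(_._1).map { case (seg, rs) =>
      seg -> Profile(
        avgR = round2(mean(rs.map(_._2.toDouble))),
        avgF = round2(mean(rs.map(_._3.toDouble))),
        avgA = round2(mean(rs.map(_._4)))
      )
    }
}
```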



## Segment a Database Retrospectively
That is, segmenting the database as if we were back in time, one **year ago**.

**How does it work?**

The first step is to pretend that we are one year in the past. Whatever data we use, anything that happened during the last 365 days must be discarded.

We go back in time and assume that the data generated over the last year (the last period) never existed. We adapt how we compute recency, frequency and monetary value accordingly. Then we apply exactly the same steps as before: the same segmentation, transformations, analyses and tables.
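
In concrete terms, with `end_date` at 2016-01-01, going back one year could mean two things:

* Keep only purchases with `days_since` greater than 365.
* Subtract 365 from what remains, so that recency and first purchase are measured from 2015-01-01.

Treating a purchase exactly 365 days old as "in the last year" is an assumption. In the notebook, the same step would be roughly `enriched1.filter($"days_since" > OneYear).withColumn("days_since", $"days_since" - OneYear)`. The sketch below shows it in plain Scala over a hypothetical `Purchase` row type.

```scala
object RetrospectiveSketch {
  val OneYear = 365

  // Hypothetical row type: one purchase with days_since relative to 2016-01-01.
  final case class Purchase(customerId: String, amount: Double, daysSince: Int)

  // Pretend "today" is one year earlier: drop the last year's purchases and
  // re-express days_since relative to the earlier reference date.
  def asOfOneYearAgo(purchases: Seq[Purchase]): Seq[Purchase] =
    purchases
      .filter(_.daysSince > OneYear)
      .map(p => p.copy(daysSince = p.daysSince - OneYear))
}
```

The resulting rows would then feed the same RFM aggregation, segmentation and profiling steps as before.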

**Why do we need to segment retrospectively?**

From a managerial point of view, it is extremely useful to see not only to what extent each segment contributes to today's revenues, but also to what extent each segment today is likely to contribute to tomorrow's revenues.