# RFM Segmentation
Background reading: https://clevertap.com/blog/rfm-analysis/

In [13]:
import org.apache.spark.sql.types._

val schema = StructType(
                List(
                    StructField("customer_id", StringType, false),
                    StructField("purchase_amount", DoubleType, false),
                    StructField("date_of_purchase", DateType, false)
                )
            )
val data = spark.read
                .option("sep", "\t")
                .option("mode","FAILFAST")
                .option("dateFormat","yyyy-MM-dd")
                .schema(schema)
                .csv("../../data/foundation-marketing-analytics/purchases.txt")
                .toDF

schema = StructType(StructField(customer_id,StringType,false), StructField(purchase_amount,DoubleType,false), StructField(date_of_purchase,DateType,false))
data = [customer_id: string, purchase_amount: double ... 1 more field]


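
The `dateFormat` option takes a Java-style date pattern, and the case of the year letters matters: lowercase `yyyy` is the calendar year, while uppercase `YYYY` is the week-based year. The following is a minimal sketch of that difference using `java.time` directly rather than Spark's CSV reader, so it illustrates the pattern letters only; how a particular Spark version treats `Y` is not shown here. The chosen date and the `Locale.US` setting are assumptions made for the illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// 2014-12-29 is a Monday that falls in the first week of week-year 2015
val day = LocalDate.of(2014, 12, 29)

// 'y' is the calendar year, 'Y' is the week-based year
val calendarYear = DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.US).format(day)
val weekBasedYear = DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.US).format(day)

println(calendarYear)  // 2014-12-29
println(weekBasedYear) // 2015-12-29
```

Dates near a year boundary are where the two patterns diverge, which is why `yyyy-MM-dd` is the safer choice for plain calendar dates.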

**Calculate RECENCY in days**

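Recency: how recently a customer made a purchase, measured here as the number of days between the customer's latest purchase and a fixed reference date (`end_date`).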

In [34]:
import org.apache.spark.sql.functions._

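// end_date is the reference date against which recency is measured.
// 2016-01-01 looks like the day after the last purchase in the data (the smallest recency in the summary statistics is 1).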

val enriched1 = data
                .withColumn("end_date", lit("2016-01-01"))
                .withColumn("days_since", datediff($"end_date", $"date_of_purchase"))
enriched1.show(5)

// Verify that no days_since value is null, i.e. every date difference was actually computed
val nullCount = enriched1.filter(isnull($"days_since")).count()
assert(nullCount == 0)

+-----------+---------------+----------------+----------+----------+
|customer_id|purchase_amount|date_of_purchase|  end_date|days_since|
+-----------+---------------+----------------+----------+----------+
|        760|           25.0|      2009-11-06|2016-01-01|      2247|
|        860|           50.0|      2012-09-28|2016-01-01|      1190|
|       1200|          100.0|      2005-10-25|2016-01-01|      3720|
|       1420|           50.0|      2009-07-09|2016-01-01|      2367|
|       1940|           70.0|      2013-01-25|2016-01-01|      1071|
+-----------+---------------+----------------+----------+----------+
only showing top 5 rows



enriched1 = [customer_id: string, purchase_amount: double ... 3 more fields]
nullCount = 0


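
Instead of hard-coding the reference date, it can be derived from the data as the day after the latest purchase. The sketch below assumes that this is the intent behind `2016-01-01`; the notebook does not state it. The names `session`, `purchases` and `referenceDate` are illustrative, and the three rows are copied from the output above to stand in for the full `data` DataFrame.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_add, max}

// getOrCreate() reuses an existing session (e.g. the notebook's) if there is one
val session = SparkSession.builder().master("local[*]").appName("rfm-reference-date").getOrCreate()
import session.implicits._

// Small stand-in for `data`: three rows taken from the sample output
val purchases = Seq(
  ("760", 25.0, Date.valueOf("2009-11-06")),
  ("860", 50.0, Date.valueOf("2012-09-28")),
  ("1940", 70.0, Date.valueOf("2013-01-25"))
).toDF("customer_id", "purchase_amount", "date_of_purchase")

// Reference date = latest purchase date + 1 day
val referenceDate: Date = purchases
  .agg(date_add(max(col("date_of_purchase")), 1).alias("end_date"))
  .head()
  .getDate(0)

println(referenceDate) // 2013-01-26 for this three-row sample
```

With a reference date one day after the last purchase, the most recent customer gets a recency of 1 rather than 0.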

**Calculate FREQUENCY and MONETARY VALUE**

Frequency: how often a customer makes a purchase, counted here as the number of purchase rows per customer.

Monetary value: how much a customer spends, computed here as the average amount per purchase.

In [38]:
val enriched2 = enriched1
                .groupBy($"customer_id")
                .agg(
                    min($"days_since").alias("recency"),
                    count($"customer_id").alias("frequency"),
                    avg($"purchase_amount").alias("amount"))
enriched2.filter($"customer_id".isin("10", "90")).show(5)

+-----------+-------+---------+------+
|customer_id|recency|frequency|amount|
+-----------+-------+---------+------+
|         90|    758|       10| 115.8|
|         10|   3829|        1|  30.0|
+-----------+-------+---------+------+



enriched2 = [customer_id: string, recency: int ... 2 more fields]



**Summary statistics**

In [40]:
enriched2.describe().show()

+-------+------------------+------------------+------------------+------------------+
|summary|       customer_id|           recency|         frequency|            amount|
+-------+------------------+------------------+------------------+------------------+
|  count|             18417|             18417|             18417|             18417|
|   mean|137573.51088668077|  1253.03789976652|2.7823749796383774|57.792985101815624|
| stddev|  69504.5998805089|1081.4378683668397| 2.936888270392829|154.36010930845674|
|    min|                10|                 1|                 1|               5.0|
|    max|             99990|              4014|                45|            4500.0|
+-------+------------------+------------------+------------------+------------------+
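
In the table above, `customer_id` has a `max` of 99990 even though its `mean` is about 137573. This is consistent with `customer_id` being declared as `StringType`: the `min` and `max` appear to be string comparisons, while the mean is computed on the numeric values. A minimal plain-Scala sketch of the same effect, using three ids that appear in this notebook's outputs:

```scala
// Three customer ids taken from the outputs in this notebook
val ids = Seq("10", "99990", "229650")

// String ordering compares character by character, so "99990" > "229650"
val stringMin = ids.min
val stringMax = ids.max

// Numeric ordering gives the expected maximum
val numericMax = ids.map(_.toInt).max

println(s"$stringMin $stringMax $numericMax") // 10 99990 229650
```

To keep the identifier out of the summary altogether, `enriched2.describe("recency", "frequency", "amount")` restricts the statistics to the three RFM columns.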



In [44]:
enriched2.createOrReplaceTempView("customers")
spark.sql("select * from customers").show()

+-----------+-------+---------+------------------+
|customer_id|recency|frequency|            amount|
+-----------+-------+---------+------------------+
|       6240|   3005|        3| 76.66666666666667|
|      52800|   3320|        1|              15.0|
|     100140|     13|        4|             51.25|
|     109180|     30|        8|             48.75|
|     131450|    205|        8|            103.75|
|      45300|    234|        6|29.166666666666668|
|      69460|     15|        9| 28.88888888888889|
|      86180|      2|        9| 21.11111111111111|
|     161110|   1528|        1|              30.0|
|      60070|   2074|        3|51.666666666666664|
|      13610|   1307|        8|           3043.75|
|     100010|    413|        7|27.857142857142858|
|     107930|    150|        5|              79.0|
|     132610|     30|        7|28.571428571428573|
|     154770|    427|        1|              45.0|
|      49290|    371|        5|              24.0|
|     229650|    419|        1|
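
With recency, frequency and amount available per customer, a common next step in RFM segmentation is to turn each measure into a score. The following is a sketch of one such convention, quintile scores from 1 to 5 computed with `ntile`; it is not taken from this notebook, and the score direction (5 = best) is an assumption. The names `session`, `customers` and `scored` are illustrative, and the five rows are copied (amounts rounded) from the output above to stand in for `enriched2`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, ntile}

// getOrCreate() reuses an existing session (e.g. the notebook's) if there is one
val session = SparkSession.builder().master("local[*]").appName("rfm-score-sketch").getOrCreate()
import session.implicits._

// Small stand-in for `enriched2`: five rows taken from the sample output
val customers = Seq(
  ("86180", 2, 9L, 21.11),
  ("100140", 13, 4L, 51.25),
  ("131450", 205, 8L, 103.75),
  ("6240", 3005, 3L, 76.67),
  ("52800", 3320, 1L, 15.0)
).toDF("customer_id", "recency", "frequency", "amount")

// Lower recency is better, so order it descending: the most recent buyers land in bucket 5.
// Higher frequency and amount are better, so order them ascending.
val scored = customers
  .withColumn("r_score", ntile(5).over(Window.orderBy(col("recency").desc)))
  .withColumn("f_score", ntile(5).over(Window.orderBy(col("frequency").asc)))
  .withColumn("m_score", ntile(5).over(Window.orderBy(col("amount").asc)))

scored.orderBy(col("r_score").desc).show()
```

A window without `partitionBy` moves all rows to a single partition, which is fine for a sketch; on a large customer table, `approxQuantile` thresholds would be a cheaper alternative.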
