# RFM Segmentation
https://clevertap.com/blog/rfm-analysis/

## 1. Data Acquisition & Analysis

In [7]:
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

import org.apache.spark.ml.feature.StandardScaler
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.mllib.util.MLUtils

import org.apache.spark.ml.clustering.BisectingKMeans
import org.apache.spark.ml.clustering.ClusteringSummary
import org.apache.spark.ml.evaluation.ClusteringEvaluator

## 2. Modeling - Data Preparation

In [8]:
val schema = StructType(
                List(
                    StructField("customer_id", StringType, false),
                    StructField("purchase_amount", DoubleType, false),
                    StructField("date_of_purchase", DateType, false)
                )
            )
val data = spark.read
                .option("sep", "\t")
                .option("mode","FAILFAST")
                .option("dateFormat","yyyy-MM-dd")
                .schema(schema)
                .csv("../../data/foundation-marketing-analytics/purchases.txt")
                .toDF

schema = StructType(StructField(customer_id,StringType,false), StructField(purchase_amount,DoubleType,false), StructField(date_of_purchase,DateType,false))
data = [customer_id: string, purchase_amount: double ... 1 more field]


[customer_id: string, purchase_amount: double ... 1 more field]

### Creating features

#### Derive days_since col for RECENCY calculation later

In [9]:
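// end_date is the reference date from which recency is measured.
// 2016-01-01 is one day after the latest purchase in the data (the minimum recency computed later is 1).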

val enriched1 = data
                .withColumn("end_date", lit("2016-01-01"))
                .withColumn("days_since", datediff($"end_date", $"date_of_purchase"))
enriched1.show(5)

//Verify if any calculations have failed
val nullCount = enriched1.filter(isnull($"days_since")).count()
assert(nullCount == 0)

+-----------+---------------+----------------+----------+----------+
|customer_id|purchase_amount|date_of_purchase|  end_date|days_since|
+-----------+---------------+----------------+----------+----------+
|        760|           25.0|      2009-11-06|2016-01-01|      2247|
|        860|           50.0|      2012-09-28|2016-01-01|      1190|
|       1200|          100.0|      2005-10-25|2016-01-01|      3720|
|       1420|           50.0|      2009-07-09|2016-01-01|      2367|
|       1940|           70.0|      2013-01-25|2016-01-01|      1071|
+-----------+---------------+----------------+----------+----------+
only showing top 5 rows



enriched1 = [customer_id: string, purchase_amount: double ... 3 more fields]
nullCount = 0


0

#### Create features: FREQUENCY, RECENCY (in days) and Monetary value

1. Recency: How recently a customer has made a purchase
2. Frequency: How often a customer makes a purchase
3. Monetary Value: How much money a customer spends on purchases
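
The three definitions above can be sketched on a small in-memory collection, without Spark. This is only an illustration: the `Purchase` and `Rfm` case classes and the sample rows are made up, and monetary value is taken as the average purchase amount, which is the choice this notebook makes.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object RfmSketch {
  // Hypothetical record types; field names mirror the notebook's columns
  case class Purchase(customerId: String, amount: Double, date: LocalDate)
  case class Rfm(recency: Long, frequency: Int, monetary: Double)

  // Reference date from which recency is measured (the notebook uses 2016-01-01)
  val endDate: LocalDate = LocalDate.parse("2016-01-01")

  def rfm(purchases: Seq[Purchase]): Map[String, Rfm] =
    purchases.groupBy(_.customerId).map { case (id, ps) =>
      // Recency: days since the most recent purchase
      val recency = ps.map(p => ChronoUnit.DAYS.between(p.date, endDate)).min
      // Frequency: number of purchases
      val frequency = ps.size
      // Monetary value: average purchase amount
      val monetary = ps.map(_.amount).sum / ps.size
      id -> Rfm(recency, frequency, monetary)
    }

  // Made-up sample data, for illustration only
  val sample = Seq(
    Purchase("a", 20.0, LocalDate.parse("2015-12-22")),
    Purchase("a", 40.0, LocalDate.parse("2015-06-01")),
    Purchase("b", 100.0, LocalDate.parse("2014-01-01"))
  )
}
```

With this sample, `RfmSketch.rfm(RfmSketch.sample)` gives customer "a" a recency of 10 days, a frequency of 2 and a monetary value of 30.0.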

In [10]:
val enriched2 = enriched1
                .groupBy($"customer_id")
                .agg(
                    min($"days_since").alias("recency"),
                    count($"customer_id").alias("frequency"),
                    avg($"purchase_amount").alias("amount"))
enriched2.filter($"customer_id".isin("10", "90")).show(5)

+-----------+-------+---------+------+
|customer_id|recency|frequency|amount|
+-----------+-------+---------+------+
|         90|    758|       10| 115.8|
|         10|   3829|        1|  30.0|
+-----------+-------+---------+------+



enriched2 = [customer_id: string, recency: int ... 2 more fields]


[customer_id: string, recency: int ... 2 more fields]

**Let us do some summary/descriptive stats**

In [11]:
enriched2.describe().show()

+-------+------------------+------------------+------------------+------------------+
|summary|       customer_id|           recency|         frequency|            amount|
+-------+------------------+------------------+------------------+------------------+
|  count|             18417|             18417|             18417|             18417|
|   mean|137573.51088668077|  1253.03789976652|2.7823749796383774|57.792985101815624|
| stddev|  69504.5998805089|1081.4378683668397| 2.936888270392829|154.36010930845674|
|    min|                10|                 1|                 1|               5.0|
|    max|             99990|              4014|                45|            4500.0|
+-------+------------------+------------------+------------------+------------------+



In [12]:
enriched2.createOrReplaceTempView("customers")
spark.sql("select * from customers").show()

+-----------+-------+---------+------------------+
|customer_id|recency|frequency|            amount|
+-----------+-------+---------+------------------+
|       6240|   3005|        3| 76.66666666666667|
|      52800|   3320|        1|              15.0|
|     100140|     13|        4|             51.25|
|     109180|     30|        8|             48.75|
|     131450|    205|        8|            103.75|
|      45300|    234|        6|29.166666666666668|
|      69460|     15|        9| 28.88888888888889|
|      86180|      2|        9| 21.11111111111111|
|     161110|   1528|        1|              30.0|
|      60070|   2074|        3|51.666666666666664|
|      13610|   1307|        8|           3043.75|
|     100010|    413|        7|27.857142857142858|
|     107930|    150|        5|              79.0|
|     132610|     30|        7|28.571428571428573|
|     154770|    427|        1|              45.0|
|      49290|    371|        5|              24.0|
|     229650|    419|        1|

### Feature Scaling

We need to address the following before we can train the model:

1. Data Dispersion: The *amount* is skewed. We need to take **log** to address this skewness.
2. Scale: Different features use different scales (see table). These features need to be standardized so that they contribute equally to the result.


|Feature|Scale|
|-------|-----|
|recency|days|
|frequency|purchase occasions|
|amount|dollars|

**1. Handle data dispersion**

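The histogram below shows that the majority of customers (18,126 of 18,417) have an average purchase amount between 5 and roughly 300 dollars. The data is right-skewed: a long tail extends toward large amounts.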

In [13]:
val (startValues, counts) = enriched2.select($"amount").map(v=>v.getDouble(0)).rdd.histogram(15)

startValues = Array(5.0, 304.6666666666667, 604.3333333333334, 904.0, 1203.6666666666667, 1503.3333333333333, 1803.0, 2102.6666666666665, 2402.3333333333335, 2702.0, 3001.6666666666665, 3301.3333333333335, 3601.0, 3900.6666666666665, 4200.333333333333, 4500.0)
counts = Array(18126, 144, 28, 56, 16, 8, 19, 4, 3, 5, 1, 0, 0, 5, 2)


Array(18126, 144, 28, 56, 16, 8, 19, 4, 3, 5, 1, 0, 0, 5, 2)

Let us derive a new column, *log_amount*

In [14]:
val enriched3 = enriched2.withColumn("log_amount", log($"amount"))
val (startValues, counts) = enriched3.select($"log_amount").map(v=>v.getDouble(0)).rdd.histogram(15)
enriched3.createOrReplaceTempView("customers")
spark.sql("select * from customers").show(5)

+-----------+-------+---------+-----------------+------------------+
|customer_id|recency|frequency|           amount|        log_amount|
+-----------+-------+---------+-----------------+------------------+
|       6240|   3005|        3|76.66666666666667| 4.339467020255086|
|      52800|   3320|        1|             15.0|  2.70805020110221|
|     100140|     13|        4|            51.25|3.9367156180185177|
|     109180|     30|        8|            48.75| 3.886705197443856|
|     131450|    205|        8|           103.75| 4.641984159110808|
+-----------+-------+---------+-----------------+------------------+
only showing top 5 rows



enriched3 = [customer_id: string, recency: int ... 3 more fields]
startValues = Array(1.6094379124341003, 2.062930896655721, 2.516423880877342, 2.9699168650989627, 3.423409849320583, 3.8769028335422036, 4.330395817763825, 4.783888801985445, 5.237381786207066, 5.690874770428687, 6.1443677546503075, 6.597860738871929, 7.051353723093549, 7.50484670731517, 7.958339691536791, 8.411832675758411)
counts = Array(109, 987, 1567, 6989, 3004, 3291, 1453, 487, 217, 95, 81, 73, 26, 26, 12)


Array(109, 987, 1567, 6989, 3004, 3291, 1453, 487, 217, 95, 81, 73, 26, 26, 12)
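
Why the log helps can be seen on a handful of numbers. The sketch below uses made-up amounts with one extreme value (they are not taken from the dataset) and compares mean and median before and after the natural log. A mean far above the median is a sign of a long right tail.

```scala
object LogSkewSketch {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.size
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2
  }

  // Made-up amounts with one extreme value, mimicking a long right tail
  val amounts: Seq[Double] = Seq(5.0, 30.0, 50.0, 100.0, 4500.0)

  // Natural log, the same function that Spark's log() column function applies
  val logAmounts: Seq[Double] = amounts.map(x => math.log(x))

  // Ratio of the largest to the smallest value before and after the transform
  val rawRatio: Double = amounts.max / amounts.min
  val logRatio: Double = logAmounts.max / logAmounts.min
}
```

On the raw values the mean (937.0) is far above the median (50.0) and the largest value is 900 times the smallest. After the log, the mean and median are close to each other and the largest value is only about 5 times the smallest.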

**2. Scale**

Distance-based algorithms such as K-means are sensitive to the range of each feature, because they use the distance between data points to measure similarity. Without scaling, features with larger magnitudes (here, *recency* in days) would dominate the distance and effectively receive a higher weight.

**Standardization** rescales the features so that they all **contribute equally to the result**. Each value is transformed as (value - mean) / standard deviation, which gives every feature zero mean and unit standard deviation. For roughly bell-shaped data, this puts most values between -2 and 2.
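
As a minimal sketch of that formula in plain Scala (no Spark): the `standardize` helper and the recency values are made up for illustration. The sketch divides by n - 1 (the corrected sample standard deviation), which to my knowledge is also what Spark's StandardScaler uses.

```scala
object StandardizeSketch {
  // Returns z-scores: (value - mean) / sample standard deviation
  def standardize(xs: Seq[Double]): Seq[Double] = {
    val n = xs.size
    val mean = xs.sum / n
    // Corrected sample standard deviation (divide by n - 1)
    val std = math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / (n - 1))
    xs.map(x => (x - mean) / std)
  }

  // Made-up recency values in days
  val recency: Seq[Double] = Seq(10.0, 20.0, 30.0)
  val scaled: Seq[Double] = standardize(recency)
}
```

Here the mean is 20 and the standard deviation is 10, so `scaled` is `Seq(-1.0, 0.0, 1.0)`.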

We will use [StandardScaler](https://spark.apache.org/docs/1.4.1/ml-features.html#standardscaler) from Spark.

In [15]:
val assembler = new VectorAssembler()
                    .setInputCols(Array("recency", "frequency", "log_amount"))
                    .setOutputCol("vectors")

val vectors = assembler.transform(enriched3)
vectors.show(3)


val scaler = new StandardScaler()
                .setInputCol("vectors")
                .setOutputCol("features")
                .setWithStd(true)
                .setWithMean(true)
// Compute summary statistics by fitting the StandardScaler
val scalerModel = scaler.fit(vectors)
// Standardize each feature to have zero mean and unit standard deviation.
val scaledFeatures = scalerModel
                        .transform(vectors)
                        .cache()
scaledFeatures.printSchema()
scaledFeatures.show(3, truncate=false)

+-----------+-------+---------+-----------------+------------------+--------------------+
|customer_id|recency|frequency|           amount|        log_amount|             vectors|
+-----------+-------+---------+-----------------+------------------+--------------------+
|       6240|   3005|        3|76.66666666666667| 4.339467020255086|[3005.0,3.0,4.339...|
|      52800|   3320|        1|             15.0|  2.70805020110221|[3320.0,1.0,2.708...|
|     100140|     13|        4|            51.25|3.9367156180185177|[13.0,4.0,3.93671...|
+-----------+-------+---------+-----------------+------------------+--------------------+
only showing top 3 rows

root
 |-- customer_id: string (nullable = true)
 |-- recency: integer (nullable = true)
 |-- frequency: long (nullable = false)
 |-- amount: double (nullable = true)
 |-- log_amount: double (nullable = true)
 |-- vectors: vector (nullable = true)
 |-- features: vector (nullable = true)

+-----------+-------+---------+-----------------+--------

assembler = vecAssembler_a1b0b05fea89
vectors = [customer_id: string, recency: int ... 4 more fields]
scaler = stdScal_5a47718d9676
scalerModel = stdScal_5a47718d9676
scaledFeatures = [customer_id: string, recency: int ... 5 more fields]


[customer_id: string, recency: int ... 5 more fields]

## 2. Modeling - Model Building

Extract the feature vector from the *scaledFeatures* dataframe so that we can fit the model.

We will apply [hierarchical clustering](http://spark.apache.org/docs/latest/mllib-clustering.html#bisecting-k-means) using BisectingKMeans.
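
To give an intuition for the bisecting idea, here is a simplified sketch on 1-D data in plain Scala. It is not Spark's implementation: it starts with all points in one cluster and repeatedly splits the cluster with the largest sum of squared errors in two, using a small 2-means seeded with the cluster's minimum and maximum. All names (`sse`, `split`, `bisect`) are made up for this illustration, and it assumes enough distinct values exist to reach `k` clusters.

```scala
object BisectingSketch {
  // Sum of squared errors of a cluster around its mean
  def sse(c: Seq[Double]): Double = {
    val m = c.sum / c.size
    c.map(x => (x - m) * (x - m)).sum
  }

  // Plain 2-means on 1-D data, seeded with the min and max of the cluster.
  // Assumes the cluster holds at least two distinct values.
  def split(c: Seq[Double], iters: Int = 10): (Seq[Double], Seq[Double]) = {
    var (c1, c2) = (c.min, c.max)
    var parts = c.partition(x => math.abs(x - c1) <= math.abs(x - c2))
    for (_ <- 1 to iters) {
      c1 = parts._1.sum / parts._1.size
      c2 = parts._2.sum / parts._2.size
      parts = c.partition(x => math.abs(x - c1) <= math.abs(x - c2))
    }
    parts
  }

  // Split the worst (highest SSE) divisible cluster until k clusters exist
  def bisect(data: Seq[Double], k: Int): List[Seq[Double]] = {
    var clusters = List(data)
    while (clusters.size < k) {
      val worst = clusters.filter(_.distinct.size > 1).maxBy(sse)
      val (left, right) = split(worst)
      clusters = left :: right :: clusters.filterNot(_ eq worst)
    }
    clusters
  }
}
```

For example, `BisectingSketch.bisect(Seq(1.0, 2.0, 10.0, 11.0, 50.0, 51.0), 3)` first separates {50, 51} from the rest and then splits the remainder into {1, 2} and {10, 11}.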

In [22]:
def summarize(summary: ClusteringSummary): Unit = {        
    println(s"# of clusters: ${summary.clusterSizes.length}, # of iterations: ${summary.numIter}")

    println("Frequency (or number of data points) of each cluster:")
    summary.clusterSizes.zipWithIndex.foreach { case(size, idx) =>
        println(s"\tCluster: ${idx}, # of members: ${size}")
    }
    
    println("Prediction for first 10 customers:")
    summary
        .predictions
        .withColumn("cust_id_numeric", $"customer_id".cast(IntegerType))
        .orderBy($"cust_id_numeric")
        .select("customer_id", "prediction")
        .limit(10)
        .show(10)

    println("Profile of each segment:")
    val segments = summary
                    .predictions
                    .groupBy($"prediction".alias("segment"))
                    .agg(
                        min($"recency").alias("min_r"),
                        round(avg($"recency"),2).alias("avg_r"),
                        max($"recency").alias("max_r"),

                        min($"frequency").alias("min_f"),
                        round(avg($"frequency"),2).alias("avg_f"),
                        max($"frequency").alias("max_f"),

                        min($"amount").alias("min_a"),
                        round(avg($"amount"),2).alias("avg_a"),
                        max($"amount").alias("max_a"))
                    .orderBy($"segment")
    segments.show(10, truncate=false)
}

def mean(list:List[Double]):Double = if(list.isEmpty) 0 else list.sum/list.size

summarize: (summary: org.apache.spark.ml.clustering.ClusteringSummary)Unit
mean: (list: List[Double])Double


## 3. Modeling - Model Evaluation

Accuracy is a useful metric in supervised learning, such as classification. In unsupervised learning (like KMeans or BisectingKMeans), however, there is no accuracy, because there is no labeled data (gold standard) to evaluate against.

Instead, we can use the [ClusteringEvaluator](https://spark.apache.org/docs/2.4.4/api/scala/index.html#org.apache.spark.ml.evaluation.ClusteringEvaluator) to assess the quality of our model. It computes the Silhouette score, a measure of consistency within clusters. The score ranges between -1 and 1, where a **value close to 1** means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters.
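
The Silhouette score can be sketched by hand on a tiny dataset. For each point, `a` is the mean distance to the other points in its own cluster, `b` is the smallest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The code below is a plain-Scala illustration of this textbook definition using squared Euclidean distance. It is not Spark's implementation, the data is made up, and it assumes at least two clusters.

```scala
object SilhouetteSketch {
  type Point = Seq[Double]

  // Squared Euclidean distance between two points
  def sqDist(p: Point, q: Point): Double =
    p.zip(q).map { case (x, y) => (x - y) * (x - y) }.sum

  // points(i) belongs to cluster labels(i); assumes at least two clusters
  def silhouette(points: Seq[Point], labels: Seq[Int]): Double = {
    val indexed = points.zip(labels).zipWithIndex
    val scores = indexed.map { case ((p, label), i) =>
      // Distances to the other members of the point's own cluster
      val own = indexed.collect {
        case ((q, l), j) if l == label && j != i => sqDist(p, q)
      }
      if (own.isEmpty) 0.0 // singleton cluster: score is defined as 0
      else {
        val a = own.sum / own.size
        // Smallest mean distance to the members of any other cluster
        val b = indexed
          .filter { case ((_, l), _) => l != label }
          .groupBy { case ((_, l), _) => l }
          .values
          .map(ms => ms.map { case ((q, _), _) => sqDist(p, q) }.sum / ms.size)
          .min
        (b - a) / math.max(a, b)
      }
    }
    scores.sum / scores.size
  }

  // Made-up 1-D data: two tight, well-separated groups
  val points: Seq[Point] = Seq(Seq(0.0), Seq(1.0), Seq(10.0), Seq(11.0))
  val good: Double = silhouette(points, Seq(0, 0, 1, 1)) // groups kept together
  val bad: Double = silhouette(points, Seq(0, 1, 0, 1))  // groups mixed up
}
```

Assigning the two groups to separate clusters gives a score close to 1 (`good`), while mixing them gives a negative score (`bad`).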

**Let us find optimal K**

In [23]:
def mean(list:List[Double]):Double = if(list.isEmpty) 0 else list.sum/list.size

/**
Determine the optimal number of clusters using Silhouette score analysis.
    :param input: the input dataframe; must contain a "features" vector column
    :param minClusters: the minimum number of clusters to try
    :param maxClusters: the maximum number of clusters to try
    :param numRuns: the number of runs (each with a random seed) for each number of clusters

    Prints the mean Silhouette score for each number of clusters; returns nothing (Unit).
**/
def optimalClusters(input:DataFrame, minClusters:Int, maxClusters:Int, numRuns:Int) = {
    val start = System.nanoTime()
    
    // Use mutable buffers: `list :+ x` on an immutable List returns a new list
    // and leaves the original unchanged, so the results would be silently lost.
    val numClusters = scala.collection.mutable.ListBuffer[Int]()
    val silhouetteAvgScores = scala.collection.mutable.ListBuffer[Double]()
    
    for(clustSize <- minClusters to maxClusters) {
        println(s"\nFit for cluster size (K): ${clustSize}")
        numClusters += clustSize
        val silScoresOfCurrCluster = scala.collection.mutable.ListBuffer[Double]()
        
        for(run <- 1 to numRuns) {
            val randSeed = new scala.util.Random().nextInt(100)
            println(s"Run: ${run}, Seed: ${randSeed}")
            
            // Clustering the data into clustSize clusters by BisectingKMeans
            val bkm = new BisectingKMeans()
                            .setK(clustSize)
                            .setSeed(randSeed)
                            .setFeaturesCol("features")
            // Fit on the dataframe passed in, not on the global scaledFeatures
            val model = bkm.fit(input)
            
            //Print summary
            // Show the compute cost and the cluster centers
            println(s"Within Set Sum of Squared Errors: ${model.computeCost(input)}")
            println("Cluster centers:")
            model.clusterCenters.zipWithIndex.foreach { case (center, idx) =>
              println(s"\tCluster Center ${idx}: ${center}")
            }
            summarize(model.summary)

            //Predict
            val predictions = model.transform(input)

            // Evaluate clustering by computing Silhouette score
            val evaluator = new ClusteringEvaluator()
            val silhouette = evaluator.evaluate(predictions)
            silScoresOfCurrCluster += silhouette
            println(s"Silhouette with squared euclidean distance = $silhouette")
        }
        
        // Record the mean score for this K before printing the running totals
        silhouetteAvgScores += mean(silScoresOfCurrCluster.toList)
        println(s"numClusters: ${numClusters}, silhouetteAvgScores: ${silhouetteAvgScores}")
    }
    
    val result = (numClusters zip silhouetteAvgScores).toList
    println(s"result: ${result}")
    
    result.foreach {
        case (clusters, meanSilScore) => println(s"Clusters: ${clusters}, No. of runs: ${numRuns}, Mean Silhouette score: ${meanSilScore}")
    }
    
    //Create a distributed collection i.e. RDD
    val resultRdd = sc.parallelize(result)
    spark.createDataFrame(resultRdd).show(20, truncate=false)

    val duration = System.nanoTime() - start
    
    println(TimeUnit.NANOSECONDS.toMinutes(duration) + " mins.")
    println(TimeUnit.NANOSECONDS.toSeconds(duration) + " secs.")
}

/**
def optimal_k(df_in,index_col,k_min, k_max,num_runs):
    start = time.time()
    silh_lst = []
    k_lst = np.arange(k_min, k_max+1)

    r_table = df_in.select(index_col).toPandas()
    r_table = r_table.set_index(index_col)
    centers = pd.DataFrame()

    for k in k_lst:
        silh_val = []
        for run in np.arange(1, num_runs+1):

            # Trains a k-means model.
            kmeans = KMeans()\
                    .setK(k)\
                    .setSeed(int(np.random.randint(100, size=1)))
            model = kmeans.fit(df_in)

            # Make predictions
            predictions = model.transform(df_in)
            r_table['cluster_{k}_{run}'.format(k=k, run=run)]= predictions.select('prediction').toPandas()

            # Evaluate clustering by computing Silhouette score
            evaluator = ClusteringEvaluator()
            silhouette = evaluator.evaluate(predictions)
            silh_val.append(silhouette)

        silh_array=np.asanyarray(silh_val)
        silh_lst.append(silh_array.mean())

    elapsed =  time.time() - start

    silhouette = pd.DataFrame(list(zip(k_lst,silh_lst)),columns = ['k', 'silhouette'])

    print('+------------------------------------------------------------+')
    print("|         The finding optimal k phase took %8.0f s.       |" %(elapsed))
    print('+------------------------------------------------------------+')


    return k_lst[np.argmax(silh_lst, axis=0)], silhouette , r_table
**/

mean: (list: List[Double])Double
optimalClusters: (input: org.apache.spark.sql.DataFrame, minClusters: Int, maxClusters: Int, numRuns: Int)Unit
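
Once a mean Silhouette score is available for each K, the best K is simply the one with the highest score. A minimal sketch, assuming the results are held as (K, mean score) pairs; the scores below are invented for illustration and are not results from this notebook:

```scala
object BestKSketch {
  // Hypothetical (k, mean silhouette) pairs; the scores are invented
  val scores: Seq[(Int, Double)] = Seq(3 -> 0.41, 4 -> 0.37, 5 -> 0.35)

  // Pick the k with the highest mean silhouette score
  val (bestK, bestScore) = scores.maxBy { case (_, score) => score }
}
```

With these invented scores, `BestKSketch.bestK` is 3.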


In [24]:
optimalClusters(scaledFeatures, 3, 9, 3)


Fit for cluster size (K): 3
Run: 1, Seed: 42
Within Set Sum of Squared Errors: 27924.569386362753
Cluster centers:
	Cluster Center 0: [-0.7141814317618062,-0.19798460395810444,0.3600638763470465]
	Cluster Center 1: [-0.8427370232348932,2.088039268480961,0.3412415902377899]
	Cluster Center 2: [0.8665655715310359,-0.43376735557953605,-0.4126103873057646]
# of clusters: 3, # of iterations: 20
Frequency (or number of data points) of each cluster:
	Cluser: 0, # of members: 7421
	Cluser: 1, # of members: 2474
	Cluser: 2, # of members: 8522
Prediction for first 10 customers:
+-----------+----------+
|customer_id|prediction|
+-----------+----------+
|         10|         2|
|         80|         1|
|         90|         1|
|        120|         2|
|        130|         2|
|        160|         2|
|        190|         2|
|        220|         2|
|        230|         2|
|        240|         0|
+-----------+----------+

Profile of each segment:
+-------+-----+-------+-----+-----+-----+-----+-

+-----------+----------+
|customer_id|prediction|
+-----------+----------+
|         10|         3|
|         80|         1|
|         90|         1|
|        120|         2|
|        130|         3|
|        160|         3|
|        190|         3|
|        220|         2|
|        230|         3|
|        240|         0|
+-----------+----------+

Profile of each segment:
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-------+
|segment|min_r|avg_r  |max_r|min_f|avg_f|max_f|min_a|avg_a|max_a  |
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-------+
|0      |1    |480.7  |3664 |1    |2.2  |5    |5.0  |82.55|4500.0 |
|1      |1    |341.67 |2822 |6    |8.91 |45   |6.75 |70.62|3043.75|
|2      |2    |1639.09|3691 |1    |1.57 |9    |5.0  |19.9 |50.0   |
|3      |1268 |2676.5 |4014 |1    |1.45 |8    |8.0  |43.64|1807.0 |
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-------+

Silhouette with squared euclidean distance = 0.3732320401084462
numClusters: Li

Silhouette with squared euclidean distance = 0.3684175175794214
Run: 2, Seed: 6
Within Set Sum of Squared Errors: 18119.26779248774
Cluster centers:
	Cluster Center 0: [-0.7548735781849052,-0.19541521314335572,-0.05024969148311409]
	Cluster Center 1: [-0.5669604365509189,-0.20728045846771836,1.8445461933131682]
	Cluster Center 2: [-0.8427370232348932,2.088039268480961,0.3412415902377899]
	Cluster Center 3: [0.35678463737577565,-0.4127735583688451,-0.8939074628951671]
	Cluster Center 4: [1.6175673224889875,-0.4970400698271032,-0.49287599534711213]
	Cluster Center 5: [1.0209469126413733,-0.4084307020730653,0.506574460759273]
# of clusters: 6, # of iterations: 20
Frequency (or number of data points) of each cluster:
	Cluser: 0, # of members: 5814
	Cluser: 1, # of members: 1607
	Cluser: 2, # of members: 2474
	Cluser: 3, # of members: 3994
	Cluser: 4, # of members: 2239
	Cluser: 5, # of members: 2289
Prediction for first 10 customers:
+-----------+----------+
|customer_id|prediction|
+-----

Silhouette with squared euclidean distance = 0.3285510625986899
Run: 3, Seed: 12
Within Set Sum of Squared Errors: 17564.539322782148
Cluster centers:
	Cluster Center 0: [-0.8031710574491647,-0.009294640729135316,-0.3275376308065666]
	Cluster Center 1: [-0.8811590846351864,2.3168037474496077,0.18990053738733403]
	Cluster Center 2: [-0.6162228469632147,-0.178855012339305,1.1895531759424154]
	Cluster Center 3: [0.046105182504868066,-0.40264821946281665,-0.641202160382231]
	Cluster Center 4: [0.8961682920708912,-0.4304681015340519,-1.3321178353886598]
	Cluster Center 5: [1.6027687792221723,-0.49240356248905454,-0.48594050399709776]
	Cluster Center 6: [1.0221981121970196,-0.41103829498718814,0.5232884006074435]
# of clusters: 7, # of iterations: 20
Frequency (or number of data points) of each cluster:
	Cluser: 0, # of members: 4185
	Cluser: 1, # of members: 1906
	Cluser: 2, # of members: 3804
	Cluser: 3, # of members: 2528
	Cluser: 4, # of members: 1467
	Cluser: 5, # of members: 2279
	Clus

+-----------+----------+
|customer_id|prediction|
+-----------+----------+
|         10|         6|
|         80|         2|
|         90|         2|
|        120|         5|
|        130|         7|
|        160|         6|
|        190|         7|
|        220|         5|
|        230|         6|
|        240|         0|
+-----------+----------+

Profile of each segment:
+-------+-----+-------+-----+-----+-----+-----+-----------------+------+------------------+
|segment|min_r|avg_r  |max_r|min_f|avg_f|max_f|min_a            |avg_a |max_a             |
+-------+-----+-------+-----+-----+-----+-----+-----------------+------+------------------+
|0      |1    |436.69 |1892 |1    |2.21 |5    |5.0              |37.48 |73.75             |
|1      |1    |639.91 |3664 |1    |2.17 |5    |66.66666666666667|245.61|4500.0            |
|2      |1    |373.38 |2822 |6    |7.61 |11   |6.75             |71.92 |3043.75           |
|3      |1    |222.51 |1939 |11   |13.82|45   |7.0              |65.73 |

+-----------+----------+
|customer_id|prediction|
+-----------+----------+
|         10|         6|
|         80|         2|
|         90|         2|
|        120|         4|
|        130|         7|
|        160|         6|
|        190|         7|
|        220|         4|
|        230|         6|
|        240|         0|
+-----------+----------+

Profile of each segment:
+-------+-----+-------+-----+-----+-----+-----+-----------------+------+-----------------+
|segment|min_r|avg_r  |max_r|min_f|avg_f|max_f|min_a            |avg_a |max_a            |
+-------+-----+-------+-----+-----+-----+-----+-----------------+------+-----------------+
|0      |1    |436.69 |1892 |1    |2.21 |5    |5.0              |37.48 |73.75            |
|1      |1    |639.91 |3664 |1    |2.17 |5    |66.66666666666667|245.61|4500.0           |
|2      |1    |373.38 |2822 |6    |7.61 |11   |6.75             |71.92 |3043.75          |
|3      |1    |222.51 |1939 |11   |13.82|45   |7.0              |65.73 |997.72

## 4. Segmentation summary

In [19]:
/*
summary.predictions.printSchema()
println("Prediction for first 10 customers:")
summary
    .predictions
    .withColumn("cust_id_numeric", $"customer_id".cast(IntegerType))
    .orderBy($"cust_id_numeric")
    .select("customer_id", "prediction")
    .limit(10)
    .show(10)

println("Profile of each segment:")
val segments = summary
                .predictions
                .groupBy($"prediction".alias("segment"))
                .agg(
                    min($"recency").alias("min_r"),
                    round(avg($"recency"),2).alias("avg_r"),
                    max($"recency").alias("max_r"),
                    
                    min($"frequency").alias("min_f"),
                    round(avg($"frequency"),2).alias("avg_f"),
                    max($"frequency").alias("max_f"),
                    
                    min($"amount").alias("min_a"),
                    round(avg($"amount"),2).alias("avg_a"),
                    max($"amount").alias("max_a"))
                .orderBy($"segment")
segments.show(10, truncate=false)
*/
