# **5-1 - Conjoint Analysis**

Marketing and Customer Analytics
---

**Outline**
1. Business Understanding
2. Read Data
3. Generate Combinations
4. Validate the Combinations

In [1]:
```python
import pandas as pd
import numpy as np
import itertools
import random
from collections import Counter
```

# **1. Business Understanding**
---

## **1.1 Business Context**
---
- Company ABC is a SaaS platform for online stores, e-commerce, and retail point-of-sale systems
- The ABC platform offers online retailers a suite of services, including payments, marketing, shipping, and customer engagement tools


## **1.2 Business Objective**
---
- Company ABC wants to know which platform attribute (feature) customers prefer most
- The company will use this information to decide which marketing messages to advertise and at what price

## **1.3 Modelling Task**
---

- Output target: customer choice (whether an option is chosen or not)
- The goal of this project is to model customer choice with the Choice-Based Conjoint (CBC) method
- Modelling task: classification
- We need an interpretable model, so we use logistic regression
- The target is imbalanced (each question shows several options but a respondent picks at most one), so we use the F1-score as the evaluation metric
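To see why accuracy can mislead on this target, consider a minimal sketch with hypothetical labels: if every question shows three profiles and exactly one is chosen, a model that always predicts "not chosen" already reaches about 67% accuracy, yet its F1-score for the "chosen" class is 0. If respondents sometimes pick "None", the share of positives drops even further. The `f1_score` and `accuracy` helpers below are hypothetical stdlib re-implementations for illustration, not the scikit-learn functions.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-score for the positive class (1 = profile chosen)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 20 questions x 3 profiles, one profile chosen per question
y_true = [1, 0, 0] * 20
y_always_no = [0] * len(y_true)  # naive model: never predicts "chosen"

print(round(accuracy(y_true, y_always_no), 3))  # 0.667
print(f1_score(y_true, y_always_no))            # 0.0
```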

# **2. Read Data**
---

- Read the attribute list
- The data contain the candidate price levels and the candidate attributes (features)
- We combine these levels into attribute-price pairs to build the survey

In [2]:
```python
# read dataset function
def read_data(path):
    """
    Read a CSV file at the given path
    and return its contents as a pandas DataFrame.

    Parameters
    ----------
    path : str
        Path to the input CSV file.

    Returns
    -------
    data : pandas DataFrame
        The loaded data.
    """
    # Read the CSV and report its shape
    data = pd.read_csv(path)
    print('Original data shape :', data.shape)

    return data
```

In [6]:
```python
# read data
df_attribute = read_data(path='Attribute_Conjoint.csv')
df_attribute
```

Original data shape : (8, 2)


|   | Price | Attribute |
|---|---|---|
| 0 | \$20 | Website Customization |
| 1 | \$25 | Integration with Third-Party Apps |
| 2 | \$30 | Shipping and Fulfillment |
| 3 | \$35 | SEO and Marketing Features |
| 4 | \$40 | Customer Support |
| 5 | \$45 | Analytics and Reporting |
| 6 | \$50 | Bulk Product Upload |
| 7 | \$55 | Automated Email Marketing |
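Before sampling, it helps to count the candidate profiles. A minimal sketch that hard-codes the eight levels from the table above instead of reading the CSV: pairing every attribute with every price (a full factorial) gives 8 × 8 = 64 candidate profiles, of which the survey will use 60.

```python
import itertools

# Price and attribute levels as listed in Attribute_Conjoint.csv (table above)
prices = ["$20", "$25", "$30", "$35", "$40", "$45", "$50", "$55"]
attributes = [
    "Website Customization",
    "Integration with Third-Party Apps",
    "Shipping and Fulfillment",
    "SEO and Marketing Features",
    "Customer Support",
    "Analytics and Reporting",
    "Bulk Product Upload",
    "Automated Email Marketing",
]

# Full factorial: every attribute paired with every price
full_factorial = list(itertools.product(attributes, prices))
print(len(full_factorial))  # 64
print(full_factorial[0])    # ('Website Customization', '$20')
```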


# **3. Generate Combinations**
---

- The company runs the conjoint analysis with 8 attributes and 8 price levels
- The survey design is limited to 20 questions
- Each question shows 3 options plus an additional "None" option
- This means we need to generate 60 combinations (3 × 20)
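The arithmetic above can be sketched as a grouping step: 60 profiles are split into 20 questions of 3 options each, and a "None" (opt-out) option is appended to every question. This is a minimal sketch; `build_questions` is a hypothetical helper, and the placeholder strings stand in for the real attribute-price pairs.

```python
N_QUESTIONS = 20
OPTIONS_PER_QUESTION = 3


def build_questions(profiles, n_questions=N_QUESTIONS, k=OPTIONS_PER_QUESTION):
    """Split a flat list of profiles into questions of k options plus 'None'.

    Hypothetical helper: assumes len(profiles) == n_questions * k.
    """
    if len(profiles) != n_questions * k:
        raise ValueError(f"expected {n_questions * k} profiles, got {len(profiles)}")
    questions = []
    for q in range(n_questions):
        options = list(profiles[q * k:(q + 1) * k])  # k consecutive profiles
        options.append("None")                       # opt-out alternative
        questions.append(options)
    return questions


# Placeholder profiles standing in for the 60 attribute-price pairs
profiles = [f"profile_{i}" for i in range(60)]
questions = build_questions(profiles)

print(len(questions))  # 20
print(questions[0])    # ['profile_0', 'profile_1', 'profile_2', 'None']
```

Splitting consecutive rows is the simplest choice; a real design may also want to avoid showing the same attribute twice within one question, which this sketch does not check.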


Consider question 1 as an example:

<center>
<img src="https://sekolahdata-assets.s3.ap-southeast-1.amazonaws.com/notebook-images/marketing-analytics-1/live_9_1.png">
</center>


**Survey Design:**


- In this survey design, we want to know which additional feature customers prefer most, and at what price
- In practice, every additional feature will be available in every subscription plan at a set price
- So the survey does not decide which features to offer (all of them will be offered); it only measures which feature (attribute) customers prefer most

In [7]:
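```python
def generate_combination(df_attribute, n_pairs=60, seed=52):
    """
    Generate random attribute-price combinations for the survey.

    Parameters
    ----------
    df_attribute : pandas DataFrame
        Attribute list with 'Attribute' and 'Price' columns.
    n_pairs : int
        Number of combinations to keep (default 60 = 20 questions x 3 options).
    seed : int
        Random seed for reproducibility.

    Returns
    -------
    df_pairs : pandas DataFrame
        DataFrame with n_pairs attribute-price combinations.
    """
    # 1. Generate all possible attribute-price combinations (8 x 8 = 64 here)
    combinations = list(itertools.product(df_attribute['Attribute'], df_attribute['Price']))

    # 2. Shuffle with a local seeded RNG; gives the same order as
    #    random.seed(seed) + random.shuffle without changing global random state
    rng = random.Random(seed)
    rng.shuffle(combinations)

    # 3. Keep the first n_pairs; balance across levels is only approximate
    #    and depends on the seed, so it is validated in Section 4
    pairs = combinations[:n_pairs]

    # 4. Create a DataFrame with the pairs
    df_pairs = pd.DataFrame(pairs, columns=['Attribute', 'Price'])

    return df_pairs
```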

In [8]:
```python
df_pairs = generate_combination(df_attribute)
df_pairs
```

First 10 of the 60 rows:

|   | Attribute | Price |
|---|---|---|
| 0 | Shipping and Fulfillment | \$40 |
| 1 | Shipping and Fulfillment | \$20 |
| 2 | Analytics and Reporting | \$35 |
| 3 | SEO and Marketing Features | \$45 |
| 4 | Bulk Product Upload | \$45 |
| 5 | Website Customization | \$55 |
| 6 | Customer Support | \$35 |
| 7 | Automated Email Marketing | \$40 |
| 8 | Customer Support | \$40 |
| 9 | Integration with Third-Party Apps | \$25 |


In [9]:
```python
# check frequency of occurrence
df_pairs['Attribute'].value_counts()
```

| Attribute | count |
|---|---|
| Shipping and Fulfillment | 8 |
| SEO and Marketing Features | 8 |
| Bulk Product Upload | 8 |
| Integration with Third-Party Apps | 8 |
| Analytics and Reporting | 7 |
| Website Customization | 7 |
| Customer Support | 7 |
| Automated Email Marketing | 7 |


In [10]:
```python
# check frequency of occurrence
df_pairs['Price'].value_counts()
```

| Price | count |
|---|---|
| \$40 | 8 |
| \$45 | 8 |
| \$25 | 8 |
| \$55 | 8 |
| \$35 | 7 |
| \$20 | 7 |
| \$50 | 7 |
| \$30 | 7 |
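The shuffled slice is only balanced because of the chosen seed: it drops 4 of the 64 profiles at random, and if all four shared one attribute, that attribute would appear only 4 times. As an alternative that this notebook does not use, a cyclic (Latin-square-style) construction guarantees that every attribute and every price appears 7 or 8 times, with no duplicate pairs. A minimal sketch, where `cyclic_design` is a hypothetical helper and the level names are placeholders:

```python
from collections import Counter


def cyclic_design(attributes, prices, n_profiles):
    """Pair attribute a with price (a + block) mod n, one cyclic shift per block.

    Assumes len(attributes) == len(prices) and n_profiles <= n * n.
    """
    n = len(attributes)
    if len(prices) != n or n_profiles > n * n:
        raise ValueError("need equal level counts and n_profiles <= n * n")
    design = []
    for i in range(n_profiles):
        block, a = divmod(i, n)  # block selects which cyclic shift is used
        design.append((attributes[a], prices[(a + block) % n]))
    return design


attributes = [f"A{i}" for i in range(8)]          # stand-ins for the 8 features
prices = [f"${p}" for p in range(20, 60, 5)]      # $20 ... $55
design = cyclic_design(attributes, prices, 60)

attr_counts = Counter(a for a, _ in design)
price_counts = Counter(p for _, p in design)
print(sorted(attr_counts.values()))   # [7, 7, 7, 7, 8, 8, 8, 8]
print(sorted(price_counts.values()))  # [7, 7, 7, 7, 8, 8, 8, 8]
print(len(set(design)))               # 60 (no duplicate pairs)
```

The resulting list is ordered, so it would still be shuffled before being grouped into questions; shuffling changes the order but not the counts.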


# **4. Validate the Combinations**
---

A set of combinations is considered good when:
- every attribute level and every price level from the attribute list appears in it
- the levels are evenly distributed, i.e. their occurrence counts are equal (or differ by at most a small threshold)
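A short calculation shows why a threshold of 1 is the strictest one that can be met here: 60 profiles spread over 8 levels would be 7.5 per level, which is impossible, so the tightest feasible split has some levels at 7 and the rest at 8.

```python
n_profiles, n_levels = 60, 8

# Ideal share is 60 / 8 = 7.5 per level, so split into base and base + 1
base, extra = divmod(n_profiles, n_levels)
print(base, extra)  # 7 4 -> 4 levels appear 8 times, the other 4 appear 7 times

# Smallest achievable max-minus-min difference in occurrence counts
best_range = 0 if extra == 0 else 1
print(best_range)   # 1
```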

In [11]:
```python
def check_attribute(df_pairs, df_attribute):
    """
    Check whether every column of 'df_pairs' contains exactly the same unique values
    as the corresponding column of 'df_attribute'.

    Parameters
    ----------
    df_pairs : pandas DataFrame
        A DataFrame containing pairs of attributes and the corresponding prices.

    df_attribute : pandas DataFrame
        A DataFrame containing a list of unique attributes and prices.

    Returns
    -------
    result : bool
        True if the unique values match for all columns, False otherwise.
    """
    # Loop through each column of df_pairs (Attribute and Price)
    for col in df_pairs.columns:
        # Fail as soon as one column is missing a level or has an unexpected one
        if set(df_pairs[col]) != set(df_attribute[col]):
            return False

    # Only return True after every column has been checked
    return True
```

In [12]:
```python
# check attribute
check_attribute(df_pairs, df_attribute)
```

True
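Both columns have to be checked: a loop that returns after the first column (Attribute) would accept a design in which one price level never appears. A minimal stdlib sketch with toy levels, where `levels_covered` is a hypothetical helper working on a list of (attribute, price) tuples:

```python
def levels_covered(pairs, attributes, prices):
    """True only if every attribute AND every price level appears in pairs."""
    used_attributes = {a for a, _ in pairs}
    used_prices = {p for _, p in pairs}
    return used_attributes == set(attributes) and used_prices == set(prices)


attributes = ["A", "B"]
prices = ["$20", "$25"]

# Hypothetical design: both attributes appear, but "$25" never does
incomplete = [("A", "$20"), ("B", "$20")]
print(levels_covered(incomplete, attributes, prices))  # False
print({a for a, _ in incomplete} == set(attributes))   # True: attribute-only check passes
```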

In [13]:
```python
def check_distribution(df_pairs, threshold=1):
    """
    Check whether the occurrence counts of the unique values in each column of
    'df_pairs' differ by at most 'threshold'.

    Parameters
    ----------
    df_pairs : pandas DataFrame
        A DataFrame containing pairs of attributes and their corresponding prices.

    threshold : int
        Maximum allowed difference between the highest and lowest occurrence count.
        The default value is 1.

    Returns
    -------
    results : list of bool
        One value per column of 'df_pairs': True if max count - min count <= threshold.
    """
    # Initialize an empty list to store results for each column
    results = []

    # Loop through each column of df_pairs
    for col in df_pairs.columns:
        # Count occurrences of each unique value in the current column
        _, counts = np.unique(df_pairs[col], return_counts=True)
        # Range of occurrence counts (max minus min); named to avoid shadowing built-in range()
        count_range = np.max(counts) - np.min(counts)
        # Append the result (True or False) to the results list
        results.append(count_range <= threshold)

    return results
```


In [14]:
```python
# check distribution
check_distribution(df_pairs, threshold=1)
```

[np.True_, np.True_]
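The same max-minus-min rule can be written with `collections.Counter`, which is imported at the top of the notebook but not otherwise used. A minimal sketch on toy price lists, where `count_range` is a hypothetical helper:

```python
from collections import Counter


def count_range(values):
    """Difference between the most and least frequent level that appears."""
    counts = Counter(values).values()
    return max(counts) - min(counts)


balanced = ["$20"] * 8 + ["$25"] * 7 + ["$30"] * 8 + ["$35"] * 7
skewed = ["$20"] * 8 + ["$25"] * 8 + ["$30"] * 8 + ["$35"] * 4

print(count_range(balanced), count_range(balanced) <= 1)  # 1 True
print(count_range(skewed), count_range(skewed) <= 1)      # 4 False
```

Like `np.unique`, a `Counter` only sees levels that actually occur, so a level that never appears does not lower the minimum count; that is why the coverage check (`check_attribute`) is still needed alongside this one.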

In [15]:
```python
df_pairs.to_csv('df_pairs.csv')
print("data saved")
```

data saved
