# ***Customer Loyalty Survey Analysis***
---
Andrea Wright, wrigh282@miamioh.edu | Reece Gordley, gordlerw@miamioh.edu



# 1. Introduction

## 1.1 Problem Background 

Surveys can be a powerful source of information. They give us insight into consumer preferences and consumers' opinions of a specific item. Throughout this project, Consumer-Based Brand Equity (CBBE) will be evaluated. This metric refers to consumer attitude towards a brand and is rated on a scale from 1 (unfavorable) to 5 (favorable). More specifically, we will be looking at the Customer Loyalty factor of CBBE. CBBE is evaluated using a series of statements relating to brand-event personality fit (BEPF), which were administered via the survey. 

Consumers were evaluated via a questionnaire that employed a Likert scale. For the BEPF statements in this survey, the scale runs from 1 (bad, dislike) to 7 (good, like). 

This project intends to help marketing companies better understand the consumer and the specific attributes most closely related to customer loyalty. With this information, partnerships and sponsorships between a company and a sport or team can be leveraged to improve marketing techniques and retain loyal customers. 




## 1.2 Research Goals



The goal of this project is to further understand consumer loyalty. More specifically, we seek to understand what makes our youngest consumers, 18-21 year-olds, loyal to a brand. This is an ideal age range in which to build loyalty, in the hope that these consumers will remain loyal throughout their lifetime rather than switch to a rival company. 

This analysis will allow the brand to further understand how to target this specific demographic. Then, marketing companies can implement strategies to keep these individuals and further build the relationship between consumer and brand. 

***Our ultimate goal is to better understand the factors contributing to customer loyalty and to find ways to apply these findings to the specified demographic range.***

## 1.3 Key Assumptions

We assume the following about the data before completing any analysis:


*   Simple random sampling was employed when distributing the questionnaire so as not to pull from a specific group (i.e., the questionnaire was not distributed to a single Greek organization attending the game).
*   Consumers completed the questionnaire honestly and to the best of their ability, and, if questionnaires were distributed at or around the time of a game, they were sufficiently sober. 
*   All observations provided are associated with a single company, the identity of which is unknown to the data analysts but was provided to questionnaire-takers (e.g., all observations concern customers' opinions of Coca-Cola, not a mix of Coca-Cola and Delta Airlines).
*   Indicators of Brand Awareness, Quality, and Loyalty were composed of 3 components each. We assume that, as the questionnaire is written, each component holds equal weight within its respective indicator (e.g., the first question regarding Brand Awareness does not hold higher importance or indicate greater awareness than the remaining two questions when the questionnaire is delivered to the consumer). 
 


# 2. Project Setup

First, we import relevant Python packages. 

In [None]:
import pandas as pd
import numpy as np
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import linear_model
pd.set_option('display.max_columns', None)

# 3. Data Manipulation


## 3.1 Data Download and Initial Review
In this section, the questionnaire results are loaded and saved as `q_df`. We take a brief look at the variables and their data types.

NOTE: The file `BEPF.csv` includes the first sheet of the .xlsx file provided with this assignment. Contact authors if it is necessary to provide `BEPF.csv`.

In [None]:
q_df = pd.read_csv('BEPF.csv')
q_df.head()

Unnamed: 0,PF_Dow,PF_Sta,PF_Res,PF_Act,PF_Dyn,PF_Inn,PF_Agg,PF_Bol,PF_Ord,PF_Sim,PF_Rom,PF_Sen,BE_Recog,BE_Aware,BE_ComMind,BE_RecalLG,BE_DIFF_Im,BE_Quality,BE_Function,BE_BLoyal,BE_FirstChoi,BE_NBuyOth,Gender,Age,Ethn,Schyea,EAtt_BG,EAtt_DL,EAtt_UP,EAtt_UF
0,5,6,7,7,6,5,7,6,2,2,1,2,5,5,5,5,5,4,4,2,2,1,1,1,1,3,7,7,7,7
1,4,7,5,5,4,6,5,5,6,3,1,2,5,5,5,5,5,3,4,2,2,1,2,1,1,2,6,5,5,5
2,6,6,6,5,5,7,3,5,4,4,4,4,4,4,3,5,4,3,4,2,2,2,1,1,1,3,6,6,6,6
3,2,6,5,6,4,5,6,6,5,4,1,2,5,5,3,5,5,3,4,1,1,1,2,2,1,4,6,6,6,5
4,5,6,5,7,6,5,3,5,2,4,2,4,5,5,4,5,5,4,4,3,3,3,1,1,1,3,6,7,7,6


Evaluate the data type of each variable.

In [None]:
q_df.dtypes

PF_Dow           int64
PF_Sta          object
PF_Res           int64
PF_Act          object
PF_Dyn           int64
PF_Inn           int64
PF_Agg           int64
PF_Bol           int64
PF_Ord           int64
PF_Sim           int64
PF_Rom           int64
PF_Sen          object
BE_Recog         int64
BE_Aware         int64
BE_ComMind       int64
BE_RecalLG       int64
BE_DIFF_Im       int64
BE_Quality       int64
BE_Function      int64
BE_BLoyal        int64
BE_FirstChoi     int64
BE_NBuyOth      object
Gender           int64
Age              int64
Ethn            object
Schyea          object
EAtt_BG         object
EAtt_DL         object
EAtt_UP         object
EAtt_UF         object
dtype: object

For consistency, convert the entire dataframe to a numeric type (`int64` or `float64`). Any value that cannot be parsed as a number is replaced with `NaN` by passing `errors='coerce'`.

In [None]:
q_df = q_df.apply(pd.to_numeric, errors='coerce')

Remove any rows that contain NaN values. 

In [None]:
q_df.dropna(inplace=True)
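
As a minimal sketch of what these two cleaning steps do, the toy frame below mixes numeric strings with one non-numeric entry. The data and the names `toy_df`, `toy_numeric`, and `toy_clean` are hypothetical and are not taken from `BEPF.csv`.

```python
import pandas as pd

# Hypothetical toy data: column 'b' contains a non-numeric entry
toy_df = pd.DataFrame({'a': [1, 2, 3], 'b': ['4', 'n/a', '6']})

# Entries that cannot be parsed become NaN instead of raising an error
toy_numeric = toy_df.apply(pd.to_numeric, errors='coerce')

# Rows containing any NaN are then removed
toy_clean = toy_numeric.dropna()

print(toy_numeric['b'].isna().sum())  # number of coerced entries
print(len(toy_clean))                 # rows remaining after dropna
```

Note that `dropna` keeps the original row labels, so the index of the cleaned frame can have gaps.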

## 3.2 Identifying the Target Audience
For this project, we seek to evaluate customer loyalty as it relates to the brand. For this reason, we remove the variables 
* `BE_Recog`
* `BE_Aware`
* `BE_ComMind`
* `BE_RecalLG`
* `BE_DIFF_Im`
* `BE_Quality`
* `BE_Function` 

that are indicative of the Brand Awareness and Perceived Qualities aspects of CBBE. 

In [None]:
q_df.drop(['BE_Recog','BE_Aware','BE_ComMind','BE_RecalLG','BE_DIFF_Im','BE_Quality','BE_Function'], axis=1, inplace=True)

Within our target audience, we are not concerned with School Year Classification, Gender, or Ethnicity. Those variables are dropped. 

In [None]:
q_df.drop(['Schyea', 'Gender', 'Ethn'], axis=1, inplace=True)

We also seek to identify only younger consumers, or those belonging to the 18-21 age range. Age groups 2, 3, and 4 are eliminated. Then, the variable `Age` is eliminated. 

In [None]:
# keep only age group 1 (18-21); .copy() avoids chained-assignment warnings later
q_df = q_df.query("Age == 1").copy()
q_df.drop('Age', axis=1, inplace=True)

## 3.3 Creating CBBE Loyalty Score

Rather than evaluate each CBBE item individually, we generate a CBBE Loyalty score. The score is as follows:

$CBBE_{Loyalty} =$ *BE_BLoyal + BE_FirstChoi + BE_NBuyOth*

where the CBBE score components correspond to the following questionnaire statements:
* *BE_BLoyal* = "I would consider myself to be loyal to X."
* *BE_FirstChoi* = "X would be my first choice."
* *BE_NBuyOth* = "I will not buy other brands if X is available at the store."

These statements are scored on a 1 (strongly disagree) to 5 (strongly agree) scale.  

Below, we create the CBBE Loyalty Score, `CBBE_Loyalty` and drop the individual components.

In [None]:
q_df['CBBE_Loyalty'] = q_df['BE_BLoyal'] + q_df['BE_FirstChoi'] + q_df['BE_NBuyOth']
q_df.drop(['BE_BLoyal', 'BE_FirstChoi', 'BE_NBuyOth'], axis=1, inplace=True)
q_df.head()

Unnamed: 0,PF_Dow,PF_Sta,PF_Res,PF_Act,PF_Dyn,PF_Inn,PF_Agg,PF_Bol,PF_Ord,PF_Sim,PF_Rom,PF_Sen,EAtt_BG,EAtt_DL,EAtt_UP,EAtt_UF,CBBE_Loyalty
0,5,6.0,7,7.0,6,5,7,6,2,2,1,2.0,7.0,7.0,7.0,7.0,5.0
1,4,7.0,5,5.0,4,6,5,5,6,3,1,2.0,6.0,5.0,5.0,5.0,5.0
2,6,6.0,6,5.0,5,7,3,5,4,4,4,4.0,6.0,6.0,6.0,6.0,6.0
4,5,6.0,5,7.0,6,5,3,5,2,4,2,4.0,6.0,7.0,7.0,6.0,9.0
5,5,6.0,6,6.0,5,5,5,5,5,5,4,4.0,7.0,7.0,7.0,7.0,12.0
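
Because each of the three items is scored from 1 to 5, the summed score can range from 3 to 15. The sketch below illustrates this with hypothetical responses (not survey data), using a row-wise `sum` as an equivalent way to write the addition.

```python
import pandas as pd

# Hypothetical responses to the three loyalty statements (1-5 scale)
toy_df = pd.DataFrame({
    'BE_BLoyal':    [1, 3, 5],
    'BE_FirstChoi': [1, 3, 5],
    'BE_NBuyOth':   [1, 3, 5],
})

loyalty_items = ['BE_BLoyal', 'BE_FirstChoi', 'BE_NBuyOth']

# Equal-weight sum across the three items for each respondent
toy_df['CBBE_Loyalty'] = toy_df[loyalty_items].sum(axis=1)

print(toy_df['CBBE_Loyalty'].tolist())  # [3, 9, 15]
```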


# 4. Analytical Techniques

## 4.1 Scaling

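First, the data are standardized so that all variables share a common scale. The BEPF and control variables (`EAtt_xx`) are measured on a 1-7 scale, while the individual CBBE items are measured on a 1-5 scale, so the summed `CBBE_Loyalty` score ranges from 3 to 15. 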

In [None]:
scaler = StandardScaler()

# create a new, scaled df (scale_df); q_df itself is left unscaled
scale_df = pd.DataFrame(scaler.fit_transform(q_df), columns=q_df.columns, index=q_df.index)
scale_df.head()
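
A quick way to confirm that standardization took effect is to check that every column has mean 0 and standard deviation 1 afterwards. The sketch below does this on hypothetical toy data; the column names `item_1to7` and `score_3to15` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data on two different scales, with a gap in the index
toy_df = pd.DataFrame(
    {'item_1to7': [1, 4, 7, 4], 'score_3to15': [3, 9, 15, 9]},
    index=[0, 1, 2, 4],
)

scaler = StandardScaler()
# fit_transform returns a NumPy array, so wrap it back into a DataFrame
toy_scaled = pd.DataFrame(
    scaler.fit_transform(toy_df), columns=toy_df.columns, index=toy_df.index
)

# StandardScaler uses the population standard deviation (ddof=0)
print(np.allclose(toy_scaled.mean(), 0))
print(np.allclose(toy_scaled.std(ddof=0), 1))
```

Passing `index=toy_df.index` keeps the row labels of the scaled frame aligned with the original, which matters once rows have been dropped.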


## 4.3 Multiple Linear Regression



In [None]:
# create indep and dep vars
X = q_df.drop('CBBE_Loyalty', axis=1)
y = q_df['CBBE_Loyalty']

In [None]:
# create training and test datasets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=1)

In [None]:
# OLS regression model
import statsmodels.api as sm
X_new = sm.add_constant(X_train)
reg_model = sm.OLS(y_train, X_new).fit()
print(reg_model.summary())

                            OLS Regression Results                            
Dep. Variable:           CBBE_Loyalty   R-squared:                       0.353
Model:                            OLS   Adj. R-squared:                  0.238
Method:                 Least Squares   F-statistic:                     3.068
Date:                Mon, 20 Mar 2023   Prob (F-statistic):           0.000385
Time:                        00:25:26   Log-Likelihood:                -256.25
No. Observations:                 107   AIC:                             546.5
Df Residuals:                      90   BIC:                             591.9
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6710      2.514      0.665      0.5

# 5. Results

## 5.1 Significant Variables

The independent variables with significant p-values can be found in Table 1. We do not include the control variable `EAtt_UF` despite its significant p-value. 

**Table 1. Significant variable description**

| Variable Name | Variable Description | Coefficient | P-Value |
|---------------|:----------------------|--------------|---------|
|PF_Dow|Responsibility dimension, "Down to earth"|-0.6392|0.025|
|PF_Res|Responsibility dimension, "Responsible"|0.7069|0.066|
|PF_Sen|Emotionality dimension, "Sentimental"|0.4841|0.088|

We focused on variables with a p-value below 0.10. Three brand personality traits met this threshold: down to earth (`PF_Dow`, p = 0.025), responsible (`PF_Res`, p = 0.066), and sentimental (`PF_Sen`, p = 0.088). Only `PF_Dow` would remain significant at the conventional 0.05 level. 

## 5.2 Regression Statistics

First, $R^2$ is 0.353 and adjusted $R^2$ is 0.238. Neither value is high, meaning the model leaves most of the variation in loyalty unexplained, but we expected this to some degree based on the correlation matrix.
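
The gap between the two values comes from the penalty adjusted $R^2$ applies for the number of predictors. As a short worked check, the standard formula reproduces the reported figure from the values in the OLS summary (107 observations, 16 predictors); the variable names below are illustrative.

```python
# Values reported in the OLS summary
r_squared = 0.353
n_obs = 107        # No. Observations
n_predictors = 16  # Df Model

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(round(adj_r_squared, 3))  # 0.238
```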

# 6. Discussion

Targeting consumers who are currently 18-21 years old allows brands to gain loyalty early in someone's life, with the intention of keeping them as returning customers. 

The first variable, "down to earth," is negatively associated with loyalty, with a coefficient of -0.6392. This suggests that, for the most part, consumers in this age range do not reward a brand for being perceived as down to earth when it comes to loyalty. 

The second variable, responsibility, is positively associated with loyalty, with a coefficient of 0.7069. The third variable, sentimental, is positive as well, with a coefficient of 0.4841. Of the positively associated variables, responsibility plays a greater role in indicating consumer loyalty than the quality "sentimental."


# 7. Analysis of Project

## 7.1 Strengths



1. The biggest strength of this project is its specificity: it targets a single age range and a single aspect of brand equity, loyalty. Survey responses can be difficult to interpret directly, but fitting a model makes it easier to see which traits are most strongly associated with loyalty. 

2. Creation of the variable `CBBE_Loyalty` provided a single, concise measurement of consumer loyalty. This single performance indicator is preferable to three distinct indicators, assuming each component has an equal weight. 

## 7.2 Weaknesses


1. The multiple linear regression shows an adjusted $R^2$ value of only 0.238, indicating that, after adjusting for the number of predictors, the independent BEPF and `EAtt_xx` variables account for approximately 24% of the variation in customer loyalty score. Moving forward, it may be necessary to implement a polynomial model; however, this poses a significant risk of overfitting. 


2. The variables beginning with `EAtt_` were included in the analysis as a test of validity. These variables were included in the regression since high scores here are indicative of a positive brand association. High scores of `CBBE_Loyalty` are also indicative of positive brand association so, theoretically, these two groups of variables should have a positive relationship. There is room for further research into the association between these two groups alone, which may reveal a more complex relationship. 

3. An obvious weakness is that the data come from a survey. When distributing a questionnaire like this, there is no way to know how seriously respondents take it; they may rush or give answers that do not reflect how they truly feel. This may also vary by age group. For example, people aged 18-21 might care less about answering truthfully than someone aged 27 or older, though we have no way of verifying this. 

4. The scope of this project is limited. We evaluate loyalty for a single age range (18-21 years) in an attempt to maintain them as customers. However, these opinions may change with age. For example, the quality of a company being "down-to-earth" was negatively associated with loyalty for this age group. As this group grows older, this could change, and marketing strategies implemented for them may no longer work. Additionally, generational differences exist. In other words, the qualities that indicate loyalty for this particular set of 18-21 year-olds may be different from those of the next cohort of 18-21 year-olds. A longitudinal study may be required to evaluate whether the same BEPF factors are applicable across generations. Additional studies should be conducted to determine the most significant BEPF factors across different age groups. 

# 8. Conclusions and Call to Action





The corporation in question should use the results of the regression when building loyalty among consumers aged 18-21. Our recommendations based on the results of the multiple linear regression are as follows:

1. Positively associated trait: Responsible
  * Corporate responsibility is shown to be positively associated with consumer loyalty for this age group. 
  * Items in this category include the corporation's contributions to societal issues, the environment, or the economy. 
  * Action items here can include taking active responsibility when a fault is made by the company *or* acting of their own volition in response to current events. Examples include:
    * Appropriate treatment of workers including but not limited to wages, healthcare, hiring practices, etc. 
    * Charitable actions
    * Reducing carbon footprint, fair-trade practices, environmentally conscious efforts, etc. 
    * Public statements regarding current events
  * A historic example of corporate responsibility is Procter & Gamble donating money for every purchase of Dawn dish soap in response to the 2010 Deepwater Horizon Oil Spill. 

2. Positively associated trait: Sentimental
  * Sentimentality is shown to be positively associated with consumer loyalty for this age group. 
  * Items in this category can involve personalized customer service, creating happy experiences, and maintaining positive reputations. 
  * Marketing employees may consider using sentiment analysis on public or social media to understand where their company exists among competitors in terms of positive, neutral, or negative sentiments. 

3. Negatively associated trait: Down-to-earth
  * Being "down-to-earth" was negatively associated with consumer loyalty for this age group. 
  * To avoid impressions of being "down-to-earth," marketing executives can employ wistful, idealistic, or grandiose techniques when discussing or advertising the brand. 
  * For example, Delta Airlines may consider marketing air travel as "the most luxurious and exclusive mode of transportation," with smiling and attractive airline staff, opulent seating and dining options, and beaming customers, as opposed to realistic descriptions of air travel, which can be frustrating, stressful, and unglamorous. 
