In [5]:
# importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [6]:
my_df = pd.read_excel('Survey_Data_Final_Exam.xlsx')

## Introduction

The purpose of this analysis is to use unsupervised learning to uncover patterns linking personality traits to laptop manufacturer preference, and to check whether such a link exists at all. The findings will be used to derive insights for Microsoft about both its current and its potential customers. The data come from a survey containing questions for assessing personality, specifically the <a href="https://psychcentral.com/lib/the-big-five-personality-traits/">'The Big Five'</a> traits, as well as questions about the <a href="https://www.hult.edu/blog/why-every-leader-needs-growth-mindset/">Hult DNA</a> and demographic information.



### Domain Knowledge Research

Before analyzing the survey responses, we carried out domain research to better understand the data and to derive our hypotheses. Highlights from this research are summarized below.

#### <a href="https://www.researchgate.net/publication/259540094_I'm_a_Mac_versus_I'm_a_PC_Personality_Differences_between_Mac_and_PC_Users_in_a_College_Sample">Nemid & Pastva</a> (2013)

 - Big Five personality traits did not differentiate between Mac and PC owners. Students overall rated Macs higher on various product attributes (attractive style, cool, youthful, and exciting) and PCs higher on reasonable price and good for gaming.
 
 - PC owners placed greater importance on cost as a determinant of brand choice, whereas Mac owners placed greater emphasis on style. 
 
 - Personality traits may have more nuanced effects on brand choices, as shown by relationships between Neuroticism and greater importance placed on cost and lesser importance placed on ease of use. 
     - **Personality traits may still influence brand choice indirectly, through the product attributes people prioritize.**
     - Higher Neuroticism: more importance placed on cost, less on ease of use
     
 - Openness to Experience was associated with greater importance placed on reliability and lesser importance placed on style.
     - **Higher Openness: more importance placed on reliability, less on style**
    
    


___________


#### PC World

<a href="https://www.pcworld.com/article/141473/article.html">A study</a> carried out by Mindset Media found the following:

- People who purchase Macs fall into what the branding company calls the "Openness 5" personality category, meaning they are more liberal, less modest and more assured of their own superiority than the population at large.
- According to the company, people in the Openness 5 category seek rich, varied and novel experiences, and believe that imagination and intellectual curiosity are as important to life as more rational or pragmatic endeavors.

### Hypotheses
- Mac users will score higher on Openness to Experience than Windows users, based on the PC World research.
- Mac users will be more extroverted than Windows users. <a href="https://mashable.com/2011/04/23/mac-vs-pc-infographic/">This infographic</a> created by Hunch states that Windows users are 26% more likely to prefer fitting in with others, while Mac users are 50% more likely to say they frequently throw parties.
- Windows users will be more conscientious than Mac users. <a href="https://www.neosperience.com/blog/the-new-marketing-is-people-centric-know-your-customer-personality/">A Neosperience blog post</a> argues that, when purchasing, conscientious customers 'look for the utilitarian, functional, task-related, and rational value of shopping'.

### Determining Customer Types

In [8]:
# Standardize the 'MAC' answer label to 'Macbook' in both laptop columns
# (Series.replace matches whole cell values, not substrings)
my_df['What laptop do you currently have?'] = my_df['What laptop do you currently have?'].replace('MAC', 'Macbook')
my_df['What laptop would you buy in next assuming if all laptops cost the same?'] = my_df['What laptop would you buy in next assuming if all laptops cost the same?'].replace('MAC', 'Macbook')

The initial plan was to assign each respondent to one of four customer types: Loyal Windows, Loyal Macbook, Windows Deserter and Macbook Deserter. However, as the counts below show, the 'Macbook Deserter' group (19 respondents) was too small to analyze as a separate sample.

It should still be noted that only 10% of current Macbook users said they would switch to a different laptop brand, compared with 23% of Windows users. This suggests that, in this sample, brand loyalty is weaker among Windows laptop owners than among Macbook owners. We recommend that Microsoft carry out further research into users who are willing to change laptop brands, both to better understand its own customers who would consider leaving and to identify Macbook customers it could potentially win over.

In [11]:

# # Loop to determine customer type

# for index, row in my_df.iterrows():
#     if 'Windows laptop' in row['What laptop do you currently have?'] and 'Windows laptop' in row['What laptop would you buy in next assuming if all laptops cost the same?']:
#         my_df.loc[index, 'customer_type'] = 'Loyal Windows'
    
#     elif 'Windows laptop' in row['What laptop do you currently have?'] and 'Windows laptop' not in row['What laptop would you buy in next assuming if all laptops cost the same?']:
#         my_df.loc[index, 'customer_type'] = 'Windows Deserter'   
   
#     elif 'Macbook' in row['What laptop do you currently have?'] and 'Macbook' in row['What laptop would you buy in next assuming if all laptops cost the same?']:
#         my_df.loc[index, 'customer_type'] = 'Loyal Macbook'
        
#     elif 'Macbook' in row['What laptop do you currently have?'] and 'Macbook' not in row['What laptop would you buy in next assuming if all laptops cost the same?']:
#         my_df.loc[index, 'customer_type'] = 'Macbook Deserter'      
        
#     else:
#         my_df.loc[index, 'customer_type'] = 'error'

In [13]:
# my_df['customer_type'].value_counts()

Value counts produced by the code above:
~~~
Loyal Macbook       181
Loyal Windows       147
Windows Deserter     45
Macbook Deserter     19
~~~
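The switching rates quoted earlier (10% and 23%) follow from these counts: each brand's deserters divided by all current owners of that brand. The sketch below hard-codes the four counts shown above rather than recomputing them from `my_df`, and assumes every respondent currently owns either a Windows laptop or a Macbook (no 'error' rows appeared).

```python
import pandas as pd

# Four-group counts copied from the value counts above
counts = pd.Series({
    'Loyal Macbook': 181,
    'Loyal Windows': 147,
    'Windows Deserter': 45,
    'Macbook Deserter': 19,
})

# Share of each brand's current owners who say their next laptop will be a different brand
mac_switch_rate = counts['Macbook Deserter'] / (counts['Loyal Macbook'] + counts['Macbook Deserter'])
win_switch_rate = counts['Windows Deserter'] / (counts['Loyal Windows'] + counts['Windows Deserter'])

print(f"Macbook owners who would switch: {mac_switch_rate:.1%}")
print(f"Windows owners who would switch: {win_switch_rate:.1%}")
```

This prints 9.5% (19 of 200) and 23.4% (45 of 192), which round to the figures quoted above.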

To avoid this small-sample problem, we use three customer types instead. For Microsoft, the key question is how personality differs between customers with different purchasing preferences. Our three customer types are:
- Loyal Windows - Currently own a Windows laptop and would buy a Windows laptop next
- Loyal Macbook - Currently own a Macbook and would buy a Macbook next
- Not Brand Loyal - Currently own one brand but would buy a different brand next (the Windows Deserter and Macbook Deserter groups combined)

With these groups, we can examine whether certain personality traits are associated with brand loyalty, or a lack of it, toward Windows or Macbook.
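The three-group rule can also be expressed without a row-by-row loop. The sketch below uses `numpy.select`, which, like an if/elif chain, assigns each row the label of the first condition that is true. The four toy rows are hypothetical and only reuse the survey's answer labels; they are not survey responses.

```python
import numpy as np
import pandas as pd

CURRENT = 'What laptop do you currently have?'
NEXT = 'What laptop would you buy in next assuming if all laptops cost the same?'

# Hypothetical toy responses (not survey data)
toy = pd.DataFrame({
    CURRENT: ['Windows laptop', 'Macbook', 'Windows laptop', 'Macbook'],
    NEXT: ['Windows laptop', 'Macbook', 'Macbook', 'Windows laptop'],
})

# Literal substring checks, mirroring `'Windows laptop' in row[...]` in a loop
cur_win = toy[CURRENT].str.contains('Windows laptop', regex=False)
nxt_win = toy[NEXT].str.contains('Windows laptop', regex=False)
cur_mac = toy[CURRENT].str.contains('Macbook', regex=False)
nxt_mac = toy[NEXT].str.contains('Macbook', regex=False)

conditions = [
    cur_win & nxt_win,          # Loyal Windows
    cur_mac & nxt_mac,          # Loyal Macbook
    toy[CURRENT] != toy[NEXT],  # any change of brand
]
choices = ['Loyal Windows', 'Loyal Macbook', 'Not Brand Loyal']

# First matching condition wins; rows matching none are flagged 'error'
toy['customer_type'] = np.select(conditions, choices, default='error')
print(toy['customer_type'].tolist())
```

On large survey files this avoids the per-row overhead of `iterrows`, at the cost of computing every condition for every row.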

In [14]:

# Loop to determine customer type

# Short aliases for the two survey columns used below
current_col = 'What laptop do you currently have?'
next_col = 'What laptop would you buy in next assuming if all laptops cost the same?'

for index, row in my_df.iterrows():
    # Same brand now and next: loyal to that brand
    if 'Windows laptop' in row[current_col] and 'Windows laptop' in row[next_col]:
        my_df.loc[index, 'customer_type'] = 'Loyal Windows'

    elif 'Macbook' in row[current_col] and 'Macbook' in row[next_col]:
        my_df.loc[index, 'customer_type'] = 'Loyal Macbook'

    # Any change of brand between the current and the next laptop
    elif row[current_col] != row[next_col]:
        my_df.loc[index, 'customer_type'] = 'Not Brand Loyal'

    # Should not occur; flags unexpected answer combinations for inspection
    else:
        my_df.loc[index, 'customer_type'] = 'error'

In [15]:
my_df['customer_type'].value_counts()

Loyal Macbook      181
Loyal Windows      147
Not Brand Loyal     64
Name: customer_type, dtype: int64
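As a consistency check, merging the two deserter groups from the earlier four-group counts should reproduce these three-group counts. The sketch below hard-codes the earlier counts and relabels them with a mapping; it does not re-read the survey file.

```python
import pandas as pd

# Four-group counts from the initial (commented-out) classification
four_groups = pd.Series({
    'Loyal Macbook': 181,
    'Loyal Windows': 147,
    'Windows Deserter': 45,
    'Macbook Deserter': 19,
})

# Both deserter groups collapse into a single 'Not Brand Loyal' group
to_three = {'Windows Deserter': 'Not Brand Loyal', 'Macbook Deserter': 'Not Brand Loyal'}

# Relabel the index, then sum counts that now share a label
three_groups = four_groups.rename(index=to_three).groupby(level=0).sum()
print(three_groups)
```

Windows Deserter (45) plus Macbook Deserter (19) gives the 64 Not Brand Loyal respondents, and the total of 392 respondents is unchanged.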