Tauan Torres Mendes, Researcher
São Paulo-SP, Brazil

**`The Good Place`**
Giuseppe Garibaldi, CEO
Morumbi, São Paulo-SP, Brazil

Dear Giuseppe Garibaldi,

`Subject`: Pizza Study Design Proposal

I am writing to provide a detailed proposal for conducting a study that will help our small pizza company, **`The Good Place`**, create a compelling marketing advertisement comparing our offerings to our main competitor, `The Bad Place`. The primary focus of this study will be to highlight one or two of our company's strengths and present credible data to support our claims effectively.

**Research Question:**
To ensure the study is insightful and relevant to our target audience, I propose the following research question: "How does **`The Good Place`** compare to `The Bad Place` in terms of customer satisfaction regarding the amount of toppings on our pizzas?"

**Variables to Be Measured:**
1. **Customer Satisfaction:** We will measure customer satisfaction using a [Likert scale](https://en.wikipedia.org/wiki/Likert_scale), where 1 indicates "Very Dissatisfied," and 5 indicates "Very Satisfied." After each pizza purchase, customers will be asked to rate their satisfaction with the amount of toppings on a scale of 1 to 5.

2. **Amount of Toppings:** We will create a checklist for each pizza order, recording the specific toppings included on each pizza to ensure consistency and accuracy.

**Data Collection:**
To collect the necessary data, we will implement the following steps:
1. At the point of sale, our staff will distribute customer satisfaction surveys to patrons along with their orders. These surveys will include questions related to topping satisfaction.

2. Our order preparation team will meticulously record the toppings included on each pizza order and cross-reference them with the customer surveys to ensure accuracy.

**Data Summarization:**
To present the data effectively in our marketing advertisement, we will utilize the following graphical and numerical summaries:

1. **Bar Charts:** We will create bar charts to visualize the average customer satisfaction scores for both **`The Good Place`** and `The Bad Place`. This will provide a clear visual comparison of the two companies.

2. **Pie Charts:** We will also create pie charts to illustrate the distribution of customer satisfaction scores. This will help highlight the percentage of highly satisfied customers for each company.

3. **Mean and Standard Deviation:** We will calculate and report the mean and standard deviation of the customer satisfaction scores for both companies. This will provide a concise numerical summary of the data.

**Incentive:** To encourage customer participation in the survey, we will offer an incentive. Upon completing the survey, customers will receive a unique hash. When they place their next order through our website or app, they can enter this hash to receive a 10% discount. This strategy will not only boost engagement but also foster customer loyalty.

By following this study design, we can collect and summarize data that, if the results bear it out, will allow us to claim superiority over our competitor in the amount of toppings and in customer satisfaction, backed by credible evidence. These findings will serve as the foundation for our marketing campaign, emphasizing our strengths and attracting more customers to **`The Good Place`**.

I am confident that this study will provide valuable insights and support our marketing efforts effectively. Please let me know if you have any questions or if you would like to discuss this proposal further.

Sincerely,

Tauan Torres Mendes
E-mail: tauantorresm@gmail.com



### `How to select dataframe subsets from multivariate data`

In [9]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 30)  # show up to 30 columns before truncating


In [13]:
path = 'data/nhanes-2015-2016.csv'
df = pd.read_csv(path)
df.head()


Unnamed: 0,SEQN,ALQ101,ALQ110,ALQ130,SMQ020,RIAGENDR,RIDAGEYR,RIDRETH1,DMDCITZN,DMDEDUC2,DMDMARTL,DMDHHSIZ,WTINT2YR,SDMVPSU,SDMVSTRA,INDFMPIR,BPXSY1,BPXDI1,BPXSY2,BPXDI2,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST,HIQ210
0,83732,1.0,,1.0,1,1,62,3,1.0,5.0,1.0,2,134671.37,1,125,4.39,128.0,70.0,124.0,64.0,94.8,184.5,27.8,43.3,43.6,35.9,101.1,2.0
1,83733,1.0,,6.0,1,1,53,3,2.0,3.0,3.0,1,24328.56,1,125,1.32,146.0,88.0,140.0,88.0,90.4,171.4,30.8,38.0,40.0,33.2,107.9,
2,83734,1.0,,,1,1,78,3,1.0,3.0,1.0,2,12400.01,1,131,1.51,138.0,46.0,132.0,44.0,83.4,170.1,28.8,35.6,37.0,31.0,116.5,2.0
3,83735,2.0,1.0,1.0,2,2,56,3,1.0,5.0,6.0,1,102718.0,1,131,5.0,132.0,72.0,134.0,68.0,109.8,160.9,42.4,38.5,37.7,38.3,110.1,2.0
4,83736,2.0,1.0,1.0,2,2,42,4,1.0,4.0,3.0,5,17627.67,2,126,1.23,100.0,70.0,114.0,54.0,55.2,164.9,20.3,37.4,36.0,27.2,80.4,2.0


#### `Keep only the body-measure columns, i.e. columns with "BMX" in the name`

In [14]:
df.columns

Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR', 'RIDAGEYR',
       'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'WTINT2YR',
       'SDMVPSU', 'SDMVSTRA', 'INDFMPIR', 'BPXSY1', 'BPXDI1', 'BPXSY2',
       'BPXDI2', 'BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC',
       'BMXWAIST', 'HIQ210'],
      dtype='object')

In [52]:
# names of all columns that contain 'BMX' (the body-measure variables)
keep = [column for column in df.columns if 'BMX' in column]


In [53]:
df_BMX = df[keep]
df_BMX.head(3)


Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
0,94.8,184.5,27.8,43.3,43.6,35.9,101.1
1,90.4,171.4,30.8,38.0,40.0,33.2,107.9
2,83.4,170.1,28.8,35.6,37.0,31.0,116.5
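
As an alternative to the list comprehension, pandas also provides `DataFrame.filter`, which selects columns by a name pattern. The sketch below is only an illustration on a small hand-made frame (values copied from the first two rows shown above; the name `toy` is an assumption of this example), not on the full NHANES file.

```python
import pandas as pd

# Small stand-in for the NHANES frame (two rows, a few columns only)
toy = pd.DataFrame({
    'SEQN': [83732, 83733],
    'BPXSY1': [128.0, 146.0],
    'BMXWT': [94.8, 90.4],
    'BMXHT': [184.5, 171.4],
})

# like='BMX' keeps every column whose name contains the substring 'BMX'
by_filter = toy.filter(like='BMX')

# Same selection with the list comprehension used in this notebook
keep_toy = [column for column in toy.columns if 'BMX' in column]
by_list = toy[keep_toy]

print(by_filter.columns.tolist())
print(by_filter.equals(by_list))
```

Both routes should give the same frame; `filter` is shorter, while the explicit list is handy when the same column names are reused later, as `keep` is in this notebook.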


### [From pandas docs](https://pandas.pydata.org/pandas-docs/stable/indexing.html):  

* .loc is primarily label based, but may also be used with a boolean array.   
* .iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
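
To make the distinction concrete, here is a small sketch on a toy frame whose row labels are deliberately not 0, 1, 2. The frame and its labels are made up for illustration; only the column names and values come from this notebook.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {'BMXWT': [94.8, 90.4, 83.4], 'BMXHT': [184.5, 171.4, 170.1]},
    index=[10, 20, 30],  # row labels, chosen to differ from positions
)

by_label = toy.loc[10, 'BMXWT']   # row labelled 10, column named 'BMXWT'
by_position = toy.iloc[0, 0]      # first row, first column

# Same cell, reached by label in one case and by position in the other
print(by_label, by_position)

# A boolean array (one entry per row) is accepted by both indexers
mask = np.array([True, False, True])
rows_loc = toy.loc[mask]
rows_iloc = toy.iloc[mask]
print(rows_loc.index.tolist(), rows_iloc.index.tolist())
```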

In [54]:
df.loc[:, keep].head(3)

Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
0,94.8,184.5,27.8,43.3,43.6,35.9,101.1
1,90.4,171.4,30.8,38.0,40.0,33.2,107.9
2,83.4,170.1,28.8,35.6,37.0,31.0,116.5


In [55]:
index_bool = np.isin(df.columns, keep)
index_bool

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True,  True,  True,  True,
       False])

In [56]:
df.iloc[:, index_bool].head(3)

Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
0,94.8,184.5,27.8,43.3,43.6,35.9,101.1
1,90.4,171.4,30.8,38.0,40.0,33.2,107.9
2,83.4,170.1,28.8,35.6,37.0,31.0,116.5
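
The boolean mask does not have to come from `np.isin`. As a sketch, assuming the column names are plain strings, the string accessor on the column index gives a comparable boolean array, and the same array also works with `.loc`. The toy frame is a stand-in for `df`. Note that `startswith` is stricter than the substring test used earlier; for the column names listed above the two agree.

```python
import pandas as pd

toy = pd.DataFrame({
    'SEQN': [83732, 83733],
    'BMXWT': [94.8, 90.4],
    'BPXSY1': [128.0, 146.0],
    'BMXHT': [184.5, 171.4],
})

# One boolean per column: True where the name starts with 'BMX'
mask = toy.columns.str.startswith('BMX')
print(mask)

via_iloc = toy.iloc[:, mask]  # mask interpreted by position
via_loc = toy.loc[:, mask]    # same mask, label-based indexer

print(via_iloc.columns.tolist())
```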


#### `Selection by conditions`

In [57]:
waist_median = df_BMX['BMXWAIST'].median()
waist_median

98.3
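
Worth keeping in mind: `BMXWAIST` has missing entries (row 5732 in the full `df_BMX` printout has no waist value), and `median()` skips them by default (`skipna=True`). A minimal sketch with four values, three of them taken from the rows shown in this notebook and one set to `NaN` on purpose:

```python
import numpy as np
import pandas as pd

waist = pd.Series([101.1, 107.9, np.nan, 80.4], name='BMXWAIST')

median_default = waist.median()             # NaN is ignored
median_strict = waist.median(skipna=False)  # NaN propagates to the result
n_missing = waist.isna().sum()              # how many values were skipped

print(median_default)
print(median_strict)
print(n_missing)
```

Checking the number of missing values first makes it clear how many observations the reported median is actually based on.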

In [58]:
df_BMX

Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
0,94.8,184.5,27.8,43.3,43.6,35.9,101.1
1,90.4,171.4,30.8,38.0,40.0,33.2,107.9
2,83.4,170.1,28.8,35.6,37.0,31.0,116.5
3,109.8,160.9,42.4,38.5,37.7,38.3,110.1
4,55.2,164.9,20.3,37.4,36.0,27.2,80.4
...,...,...,...,...,...,...,...
5730,59.1,165.8,21.5,38.2,37.0,29.5,95.0
5731,112.1,182.2,33.8,43.4,41.8,42.3,110.2
5732,71.7,152.2,31.0,31.3,37.5,28.8,
5733,78.2,173.3,26.0,40.3,37.5,30.6,98.9


In [62]:
df_BMX[ df_BMX['BMXWAIST'] > waist_median ].head(3)

Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
0,94.8,184.5,27.8,43.3,43.6,35.9,101.1
1,90.4,171.4,30.8,38.0,40.0,33.2,107.9
2,83.4,170.1,28.8,35.6,37.0,31.0,116.5
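
One consequence of filtering on a column with missing values: a comparison against `NaN` evaluates to `False`, so rows without a waist measurement land in neither the "above median" nor the "at or below median" group. A small sketch with toy values, with the threshold hard-coded to the median printed earlier:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'BMXWAIST': [101.1, 80.4, np.nan]})
waist_median = 98.3  # value printed earlier in this notebook

above = toy['BMXWAIST'] > waist_median
not_above = toy['BMXWAIST'] <= waist_median

print(above.tolist())
print(not_above.tolist())

# Rows that belong to neither group: exactly the ones with a missing value
in_neither = ~above & ~not_above
print(in_neither.tolist())
```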


In [64]:
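# waist circumference above the median
condition_1 = df_BMX['BMXWAIST'] > waist_median
# upper leg length below 32
condition_2 = df_BMX['BMXLEG'] < 32

# keep rows where both conditions hold (element-wise AND)
df_BMX[condition_1 & condition_2].head()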


Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
15,80.5,150.8,35.4,31.6,32.7,33.7,113.5
27,75.6,145.2,35.9,31.0,33.1,36.0,108.0
39,63.7,147.9,29.1,26.0,34.0,31.5,110.0
52,105.9,157.7,42.6,29.2,35.0,40.7,129.1
55,77.5,148.3,35.2,30.5,34.0,34.4,107.6
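
The conditions are combined with `&` (element-wise AND) rather than Python's `and`, which does not operate element by element on a Series. When the comparisons are written inline instead of being stored in variables first, each one needs its own parentheses, because `&` binds more tightly than `>` and `<`. A sketch on a toy frame (values taken from rows shown in this notebook):

```python
import pandas as pd

toy = pd.DataFrame({
    'BMXLEG':   [43.3, 31.6, 31.0, 37.4],
    'BMXWAIST': [101.1, 113.5, 108.0, 80.4],
})
waist_median = 98.3  # value printed earlier in this notebook

# AND: large waist and short upper leg
both = toy[(toy['BMXWAIST'] > waist_median) & (toy['BMXLEG'] < 32)]

# OR: at least one of the two conditions holds
either = toy[(toy['BMXWAIST'] > waist_median) | (toy['BMXLEG'] < 32)]

# NOT: invert a condition with ~
negated = toy[~(toy['BMXWAIST'] > waist_median)]

print(both.index.tolist(), either.index.tolist(), negated.index.tolist())
```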


In [66]:
df_BMX.loc[ condition_1 & condition_2, :].head()


Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
15,80.5,150.8,35.4,31.6,32.7,33.7,113.5
27,75.6,145.2,35.9,31.0,33.1,36.0,108.0
39,63.7,147.9,29.1,26.0,34.0,31.5,110.0
52,105.9,157.7,42.6,29.2,35.0,40.7,129.1
55,77.5,148.3,35.2,30.5,34.0,34.4,107.6


In [67]:
tmp = df_BMX.loc[ condition_1 & condition_2, :].head()
tmp.index = ['a', 'b', 'c', 'd', 'e']
tmp


Unnamed: 0,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST
a,80.5,150.8,35.4,31.6,32.7,33.7,113.5
b,75.6,145.2,35.9,31.0,33.1,36.0,108.0
c,63.7,147.9,29.1,26.0,34.0,31.5,110.0
d,105.9,157.7,42.6,29.2,35.0,40.7,129.1
e,77.5,148.3,35.2,30.5,34.0,34.4,107.6
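
With string labels in place, the two indexers are easy to tell apart. The sketch below rebuilds a two-column version of `tmp` by hand (values from the table above; the name `tmp_toy` is an assumption of this example) so that it runs on its own. A label slice with `.loc` includes its end point, while a position slice with `.iloc` does not.

```python
import pandas as pd

tmp_toy = pd.DataFrame(
    {'BMXLEG': [31.6, 31.0, 26.0, 29.2, 30.5],
     'BMXWAIST': [113.5, 108.0, 110.0, 129.1, 107.6]},
    index=['a', 'b', 'c', 'd', 'e'],
)

by_label = tmp_toy.loc['a':'c', 'BMXLEG']  # rows a, b, c (end label included)
by_position = tmp_toy.iloc[0:2, 0]         # positions 0 and 1 only (end excluded)

print(by_label.index.tolist())
print(by_position.index.tolist())
```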
