# Drawing Conclusions Using Query
In the notebook below, you'll investigate two questions about the wine quality data using pandas' query method. Here are tips for answering each question:

### Q1: Do wines with higher alcoholic content receive better ratings?
To answer this question, use query to create two groups of wine samples:

> 1. Low alcohol (samples with an alcohol content less than the median)
> 2. High alcohol (samples with an alcohol content greater than or equal to the median)

Then, find the mean quality rating of each group.

### Q2: Do sweeter wines (more residual sugar) receive better ratings?
Similarly, use the median to split the samples into two groups by residual sugar and find the mean quality rating of each group.
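
The median split described in these tips can be sketched as follows. This is a minimal illustration on made-up data, not the wine dataset; the names `toy` and `alcohol_median` are assumptions introduced for the example. It relies on `query` resolving Python variables that are prefixed with `@` inside the query string, so the median does not have to be typed in by hand.

```python
import pandas as pd

# Made-up sample data for illustration only; not taken from the wine dataset.
toy = pd.DataFrame({
    'alcohol': [9.0, 9.5, 10.0, 11.0, 12.0],
    'quality': [5, 5, 6, 6, 7],
})

# Compute the median once, then reference it inside the query string with '@'.
alcohol_median = toy['alcohol'].median()

low = toy.query('alcohol < @alcohol_median')    # strictly below the median
high = toy.query('alcohol >= @alcohol_median')  # at or above the median

# Every row should land in exactly one of the two groups.
assert len(low) + len(high) == len(toy)

# Mean quality rating of each group.
low_mean = low['quality'].mean()
high_mean = high['quality'].mean()
print(low_mean, high_mean)
```

Referencing the variable keeps the two queries consistent with each other if the data changes, whereas a hard-coded threshold has to be updated by hand.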


In [39]:
# Load 'winequality_edited.csv', a file you created in a previous section
import pandas as pd
import numpy as np

df = pd.read_csv('winequality_edited.csv')
df.rename(columns = {'fixed acidity': 'fixed_acidity', 
                    'volatile acidity': 'volatile_acidity', 
                    'citric acid': 'citric_acid', 
                    'residual sugar': 'residual_sugar', 
                    'free sulfur dioxide': 'free_sulfur_dioxide',
                    'total sulfur dioxide': 'total_sulfur_dioxide', 
                    'acidity levels': 'acidity_levels'}, inplace = True)
df.head()

Unnamed: 0,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality,color,acidity_levels
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red,low
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red,mod_high
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red,medium
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red,mod_high
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red,low


### Do wines with higher alcoholic content receive better ratings?

In [5]:
# get the median amount of alcohol content
median = df.alcohol.median()
median

10.3

In [20]:
# select samples with alcohol content less than the median
low_alcohol = df.query('alcohol < 10.3')

In [21]:
# select samples with alcohol content greater than or equal to the median
high_alcohol = df.query('alcohol >= 10.3')

In [22]:
# ensure these queries included each sample exactly once
num_samples = df.shape[0]
num_samples == low_alcohol['quality'].count() + high_alcohol['quality'].count() # should be True

True

In [24]:
# get mean quality rating for the low alcohol and high alcohol groups
low_alcohol.quality.mean()

5.475920679886686

In [25]:
high_alcohol.quality.mean()

6.146084337349397

### Do sweeter wines receive better ratings?

In [41]:
# get the median amount of residual sugar
df.residual_sugar.median()

3.0

In [42]:
# select samples with residual sugar less than the median
low_sugar = df.query('residual_sugar < 3.0')

# select samples with residual sugar greater than or equal to the median
high_sugar = df.query('residual_sugar >= 3.0')

# ensure these queries included each sample exactly once
num_samples == low_sugar['quality'].count() + high_sugar['quality'].count() # should be True

True

In [43]:
# get mean quality rating for the low sugar and high sugar groups
low_sugar.quality.mean(), high_sugar.quality.mean()

(5.808800743724822, 5.82782874617737)
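
Both questions follow the same pattern, so the steps can be wrapped in one function. The sketch below is an assumption-laden illustration, not part of the original notebook: `mean_quality_by_median_split` is a hypothetical helper, the data is made up, and the column passed in is assumed to be a valid Python identifier (which is why names such as `residual_sugar` use underscores).

```python
import pandas as pd

def mean_quality_by_median_split(df, column):
    """Hypothetical helper: mean quality below vs. at-or-above the median of `column`."""
    median = df[column].median()
    # '@median' refers to the local variable defined just above.
    low = df.query(f'{column} < @median')
    high = df.query(f'{column} >= @median')
    return low['quality'].mean(), high['quality'].mean()

# Made-up sample data for illustration only; not taken from the wine dataset.
toy = pd.DataFrame({
    'alcohol': [9.0, 9.5, 10.0, 11.0, 12.0],
    'residual_sugar': [2.0, 5.0, 1.0, 4.0, 3.0],
    'quality': [5, 5, 6, 6, 7],
})

alcohol_means = mean_quality_by_median_split(toy, 'alcohol')
sugar_means = mean_quality_by_median_split(toy, 'residual_sugar')
print(alcohol_means, sugar_means)
```

Each call returns a `(low_mean, high_mean)` pair, so the two groups can be compared directly for any numeric column.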

In [44]:
# write the DataFrame, including the renamed columns, back to the CSV without the index
df.to_csv('winequality_edited.csv', index=False)