## Project - FetchMaker

In [1]:
import numpy as np

Let's start by including a data interface called fetchmaker that will give you access to FetchMaker's dog data.<br>
Use import fetchmaker at the top of your script.py file to import the fetchmaker package.

In [2]:
import fetchmaker

The attributes that FetchMaker keeps track of are:
- weight, an integer representing how heavy a dog is in pounds
- tail_length, a float representing tail length in inches
- age, in years
- color, a string such as "brown" or "grey"
- is_rescue, an integer flag: 1 if the dog is a rescue, 0 otherwise

The fetchmaker package lets you access this data for a specific breed of dog with the following format:
- fetchmaker.get_weight("poodle")

This returns a Pandas DataFrame of the weights of the poodles recorded in the system. The other methods are get_tail_length, get_color, get_age, and get_is_rescue, which all take a breed as an input.<br>
Get the tail lengths of all of the "rottweiler"s in the system, and store them in a variable called rottweiler_tl.

In [3]:
rottweiler_tl = fetchmaker.get_tail_length('rottweiler')

Print out the mean of rottweiler_tl and the standard deviation of rottweiler_tl, using np.mean and np.std.

In [4]:
print('mean: %0.6f, standard deviation: %0.6f' %(np.mean(rottweiler_tl), np.std(rottweiler_tl)))

mean: 4.236100, standard deviation: 2.064754
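One detail worth keeping in mind: by default np.std divides by n (the population formula), while passing ddof=1 divides by n - 1 (the sample formula). The sketch below illustrates the difference on a small made-up array; the values are hypothetical and are not FetchMaker data.

```python
import numpy as np

# Hypothetical tail lengths in inches (made up for illustration only)
sample_tl = np.array([3.0, 4.5, 5.0, 2.5, 6.0])

population_std = np.std(sample_tl)       # default ddof=0: divides by n
sample_std = np.std(sample_tl, ddof=1)   # ddof=1: divides by n - 1

print('population std: %0.6f, sample std: %0.6f' % (population_std, sample_std))
```

For a sample of only a few dogs the two values differ noticeably; for large samples they are nearly identical.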


Over the years, about 8% of dogs in the FetchMaker system have been rescues. We want to know whether whippets are significantly more or less likely than that to be rescues.<br>
Store the is_rescue values for "whippet"s in a variable called whippet_rescue.

In [5]:
whippet_rescue = fetchmaker.get_is_rescue('whippet')

Use np.count_nonzero to get the number of entries in whippet_rescue that are 1. Store this number in a variable called num_whippet_rescues.

In [6]:
num_whippet_rescues = np.count_nonzero(whippet_rescue)

Get the number of samples in the whippet set by taking the np.size of whippet_rescue. Store this in a variable called num_whippets.

In [7]:
num_whippets = np.size(whippet_rescue)
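As a quick sanity check before running a test, the two counts can be combined into an observed rescue rate and compared by eye with the expected 8%. This is a minimal sketch on a made-up 0/1 array; demo_rescue and the other names are hypothetical stand-ins for the real whippet data.

```python
import numpy as np

# Hypothetical is_rescue flags (1 = rescue, 0 = not a rescue)
demo_rescue = np.array([0, 1, 0, 0, 1, 0, 0, 0, 0, 0])

num_rescues = np.count_nonzero(demo_rescue)  # entries equal to 1
num_dogs = np.size(demo_rescue)              # total number of entries
observed_rate = num_rescues / num_dogs

# For 0/1 data, the mean of the array is the same proportion
print('observed rescue rate: %0.2f' % observed_rate)
print('np.mean gives:        %0.2f' % np.mean(demo_rescue))
```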

Use a binomial test to test the number of whippet rescues, num_whippet_rescues, against our expected percentage, 8%.<br>
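Remember to import the binomial test by using from scipy.stats import binom_test. (binom_test was removed in SciPy 1.12; on newer versions, use from scipy.stats import binomtest and read the .pvalue attribute of its result.)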

In [8]:
from scipy.stats import binom_test

Print out the p-value. Is your result significant?

In [9]:
print('%.06f' %binom_test(num_whippet_rescues, num_whippets, .08))

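0.581178

The p-value of about 0.58 is far above the conventional 0.05 threshold, so the whippet rescue rate is not significantly different from the expected 8%.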
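To see roughly what a two-sided binomial test computes, here is a pure-Python sketch: it sums the probabilities of every outcome that is no more likely than the observed one. The function names and the example counts are hypothetical, and the small tolerance is an assumption added to guard against floating-point ties; SciPy's own implementation may differ in its details.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binomial_p(k, n, p):
    """Sum the probabilities of all outcomes no more likely than k."""
    observed = binomial_pmf(k, n, p)
    threshold = observed * (1 + 1e-7)  # tolerance for floating-point ties
    return sum(pmf for i in range(n + 1)
               if (pmf := binomial_pmf(i, n, p)) <= threshold)

# Hypothetical counts: 6 rescues among 100 dogs, tested against 8%
demo_p = two_sided_binomial_p(6, 100, 0.08)
print('%.06f' % demo_p)
```

A result near 1 means the observed count is about as typical as it can be under the expected rate; a result below the chosen threshold (commonly 0.05) would count as significant.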


Three of our most popular mid-sized dog breeds are whippets, terriers, and pitbulls. Is there a significant difference in the average weights of these three dog breeds? Perform a comparative numerical test (a one-way ANOVA, since there are more than two groups) to determine if there is a significant difference.

In [10]:
from scipy.stats import f_oneway

w = fetchmaker.get_weight('whippet')
t = fetchmaker.get_weight('terrier')
p = fetchmaker.get_weight('pitbull')
print('%.06f' %f_oneway(w, t, p).pvalue)

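0.000000

The p-value rounds to zero at six decimal places, which is far below 0.05, so at least one breed's average weight differs significantly from the others.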
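The F statistic behind a one-way ANOVA compares the variation between group means with the variation inside each group. The following is a minimal pure-Python sketch of that ratio; one_way_f and the three small groups are hypothetical, and converting F to a p-value (which f_oneway does for you) is left out.

```python
def one_way_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of each value around its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical weights for three tiny groups
demo_f = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print('F = %.3f' % demo_f)  # F = 13.000
```

A large F means the group means are spread far apart relative to the scatter within each group, which is what drives the p-value toward zero.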


Now, perform another test (Tukey's range test) to determine which pairs of these dog breeds differ from each other.

In [11]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = np.concatenate([w, t, p])
labels = ['whippet'] * len(w) + ['terrier'] * len(t) + ['pitbull'] * len(p)
print(pairwise_tukeyhsd(values, labels, .05))

Multiple Comparison of Means - Tukey HSD,FWER=0.05
 group1  group2 meandiff  lower  upper  reject
----------------------------------------------
pitbull terrier  -13.24  -16.728 -9.752  True 
pitbull whippet  -3.34    -6.828 0.148  False 
terrier whippet   9.9     6.412  13.388  True 
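----------------------------------------------

Terriers differ significantly from both pitbulls and whippets (reject is True). The pitbull and whippet interval, from -6.828 to 0.148, contains 0, so that pair does not differ significantly.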
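The Tukey test only works if every value lines up with the label of the group it came from. As a hedged sketch, np.repeat can build the label array from the group sizes, and a boolean mask can confirm the alignment; the demo arrays below are made up and stand in for the real weight data.

```python
import numpy as np

# Hypothetical weights for three breeds (not FetchMaker data)
demo_w = np.array([30, 32, 35])
demo_t = np.array([20, 22])
demo_p = np.array([40, 45, 43, 41])

values = np.concatenate([demo_w, demo_t, demo_p])
# One label per value, repeated in the same order as the concatenation
labels = np.repeat(['whippet', 'terrier', 'pitbull'],
                   [len(demo_w), len(demo_t), len(demo_p)])

# Check the alignment by recomputing each group's mean from the flat arrays
for breed in ('whippet', 'terrier', 'pitbull'):
    print(breed, values[labels == breed].mean())
```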


We want to see if "poodle"s and "shihtzu"s have significantly different color breakdowns.<br>
Get the poodle colors and store them in a variable called poodle_colors.<br>
Get the shih tzu colors and store them in a variable called shihtzu_colors.

In [12]:
poodle_colors = fetchmaker.get_color('poodle')
shihtzu_colors = fetchmaker.get_color('shihtzu')


You can get the number of occurrences of brown poodles by using np.count_nonzero(poodle_colors == "brown").<br>
Use this function to build a Chi Square contingency table, called color_table, with the following structure:

| | Poodle | Shih Tzu |
|---|---|---|
| Black | x | x |
| Brown | x | x |
| Gold | x | x |
| Grey | x | x |
| White | x | x |

Fill in the "x" entries with the number of each poodle or shih tzu with the specified color.

In [13]:
color_table = [[np.count_nonzero(poodle_colors == "black"), np.count_nonzero(shihtzu_colors == "black")], 
               [np.count_nonzero(poodle_colors == "brown"), np.count_nonzero(shihtzu_colors == "brown")], 
               [np.count_nonzero(poodle_colors == "gold"), np.count_nonzero(shihtzu_colors == "gold")], 
               [np.count_nonzero(poodle_colors == "grey"), np.count_nonzero(shihtzu_colors == "grey")], 
               [np.count_nonzero(poodle_colors == "white"), np.count_nonzero(shihtzu_colors == "white")]]
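The same table can be built with less repetition by looping over the list of colors. This is a sketch only: build_color_table, palette, and the two short demo lists are hypothetical names and made-up data, used here in place of the real poodle and shih tzu arrays.

```python
import numpy as np

def build_color_table(colors_a, colors_b, palette):
    """Return one [count_a, count_b] row for each color in palette."""
    colors_a = np.asarray(colors_a)
    colors_b = np.asarray(colors_b)
    return [[np.count_nonzero(colors_a == color),
             np.count_nonzero(colors_b == color)]
            for color in palette]

palette = ["black", "brown", "gold", "grey", "white"]

# Hypothetical color data for two breeds
demo_a = ["black", "brown", "brown", "white"]
demo_b = ["gold", "grey", "grey", "black"]

demo_table = build_color_table(demo_a, demo_b, palette)
print(demo_table)
```

Keeping the colors in one list means the row order is defined in a single place, which makes it harder to mistype a color in one column but not the other.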

Feed your color_table into SciPy's Chi Square test, save the p-value and print it out.<br>
Is there a significant difference?

In [14]:
from scipy.stats import chi2_contingency

_, color_pval, _, _ = chi2_contingency(color_table)
print('%.06f' %color_pval)

0.005302

Since 0.005302 is below 0.05, the color breakdowns of poodles and shih tzus differ significantly.
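The chi-square statistic compares each observed count with the count expected if breed and color were independent, where the expected count is row total times column total divided by the grand total. Below is a minimal pure-Python sketch of that calculation; chi_square_statistic and the 3x2 demo table are hypothetical, and turning the statistic into a p-value is left to chi2_contingency.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical 3x2 table of counts
demo_stat, demo_dof = chi_square_statistic([[10, 20], [20, 20], [30, 20]])
print('statistic: %.4f, dof: %d' % (demo_stat, demo_dof))
```

For the 5x2 color_table in this project the degrees of freedom would be (5 - 1) * (2 - 1) = 4.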


Great job!<br>
Feel free to play around with fetchmaker more and run some hypothesis tests of your own.<br>
The breeds you can explore are "poodle", "rottweiler", "whippet", "greyhound", "terrier", "chihuahua", "shihtzu", and "pitbull".