# Goodness of Fit Chi-Squares in Python
Running a goodness of fit Chi-Square in Python is very similar to running a goodness of fit Chi-Square in R.

## Import Packages

In [13]:
import pandas as pd
import numpy as np
import scipy, scipy.stats

## Load Data

In [3]:
SW = pd.read_csv('./assets/SW_survey_renamed.csv')

In [6]:
SW.head(2)

Unnamed: 0,RespondentID,SeenYN,FanYN,SeenIYN,SeenIIYN,SeenIIIYN,SeenIVYN,SeenVYN,SeenVIYN,RankI,...,Favorable_Yoda,ShotFirst,ExpandedUniverseYN,FanExpandedUniverseYN,StarTrekFanYN,Gender,Age,Household Income,Education,Location
0,3292879998,Yes,Yes,Star Wars: Episode I The Phantom Menace,Star Wars: Episode II Attack of the Clones,Star Wars: Episode III Revenge of the Sith,Star Wars: Episode IV A New Hope,Star Wars: Episode V The Empire Strikes Back,Star Wars: Episode VI Return of the Jedi,3.0,...,Very favorably,I don't understand this question,Yes,No,No,Male,18-29,,High school degree,South Atlantic
1,3292879538,No,,,,,,,,,...,,,,,Yes,Male,18-29,"$0 - $24,999",Bachelor degree,West South Central


## Question Set Up
You will be testing the same premise as when you did a goodness of fit Chi-Square in R: You found something online that mentioned that 90% of people are Star Wars fans, and you want to see if that holds true in your own survey. In this way, you are comparing your sample (the survey) to the population at large (what you read online).

## Data Wrangling
You will need to get the number of people who were and were not fans of Star Wars. Luckily, this is relatively easy to do with the `pandas` method `value_counts()`, which leaves out missing responses by default:

In [11]:
SW.FanYN.value_counts()

Yes    552
No     284
Name: FanYN, dtype: int64

## Run the Analysis
Now you are ready to run your analysis! You will first create a variable that houses the observed values:

In [14]:
observed_values = np.array([552, 284])
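
If you would rather not type the counts by hand, they can be pulled straight out of `value_counts()`. The following is a minimal sketch, not part of the original lesson: the `demo` data frame is a small made-up stand-in for the survey, and `reindex` is used only to fix the order of the categories so that it lines up with the expected values later on.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey's FanYN column (not the real data)
demo = pd.DataFrame({"FanYN": ["Yes", "No", "Yes", np.nan, "Yes"]})

# value_counts() drops missing responses by default
counts = demo["FanYN"].value_counts()

# Put the categories in a fixed order: fans first, then non-fans
observed_demo = counts.reindex(["Yes", "No"]).to_numpy()
print(observed_demo)
```

With the real survey, the same pattern would be applied to `SW["FanYN"]` instead of `demo["FanYN"]`.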

Then you will create a variable that houses the expected values.

Unlike R, Python needs the expected values as raw counts rather than percentages, so you will need to calculate them yourself.

First, add up your observed values to get the total: `552 + 284 = 836`.

Then, multiply that total by `.9` to find how many people `90%` of the sample would be: `836 * .9 = 752.4`.

Rounded to a whole number, that is `752`, which becomes your first expected value.

Then subtract that number, `752`, from the total to get your other expected value: `836 - 752 = 84`.

In [15]:
expected_values = np.array([752, 84])
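
The same arithmetic can be done in code instead of by hand. This is a sketch under one assumption: the name `hypothesized` is introduced here for illustration and simply holds the 90% / 10% split from the question set up.

```python
import numpy as np

observed_values = np.array([552, 284])  # fans, non-fans
hypothesized = np.array([0.9, 0.1])     # claimed share of fans and non-fans

# Scale the hypothesized proportions up to the size of the sample
exact_expected = hypothesized * observed_values.sum()
print(exact_expected)

# The lesson rounds these to whole numbers
rounded_expected = np.array([752, 84])
```

Expected counts do not have to be whole numbers, so the unrounded values could be passed to the test as well. Doing so would give a slightly different statistic than the rounded values used in this lesson.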

Once you have those two variables, it is simply a matter of plugging them into the `chisquare()` function that comes in `scipy.stats`:

In [16]:
scipy.stats.chisquare(observed_values, f_exp=expected_values)

Power_divergenceResult(statistic=529.3819655521784, pvalue=3.849512370977756e-117)

The one labeled `statistic` is your Chi-Square value, and the one labeled `pvalue` is your p value!
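
To see where those two numbers come from, you can recompute them by hand. The sketch below assumes the standard goodness of fit formula (the sum of squared differences between observed and expected counts, each divided by the expected count) and degrees of freedom equal to the number of categories minus one. The names `manual_statistic`, `dof`, and `manual_pvalue` are made up for this illustration.

```python
import numpy as np
import scipy.stats

observed_values = np.array([552, 284])
expected_values = np.array([752, 84])

# Sum of (observed - expected)^2 / expected across the categories
manual_statistic = ((observed_values - expected_values) ** 2 / expected_values).sum()

# Degrees of freedom: number of categories minus one
dof = len(observed_values) - 1

# Upper-tail probability of the Chi-Square distribution at the statistic
manual_pvalue = scipy.stats.chi2.sf(manual_statistic, dof)

result = scipy.stats.chisquare(observed_values, f_exp=expected_values)
print(manual_statistic, result.statistic)
print(manual_pvalue, result.pvalue)
```

Both lines should print matching pairs of numbers, which shows that `chisquare()` is doing this same calculation for you.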

The p value will often be written in scientific notation, as it is here. The `e-117` means you move the decimal point to the left `117` times, so there are `116 zeros` between the decimal point and that 3! A p value this small is far below the usual `.05` cutoff, so the result is extremely significant. And remember that a significant goodness of fit Chi-Square means that your sample significantly differs from the population: the share of Star Wars fans in your survey does not match the 90% you read about online.
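
A quick way to check the result in code, and to see in which direction the sample differs, is sketched below. The cutoff `alpha = 0.05` is an assumption (a common convention, not something set in this lesson), and `observed_share` is a name made up for the illustration.

```python
import numpy as np
import scipy.stats

observed_values = np.array([552, 284])
expected_values = np.array([752, 84])

result = scipy.stats.chisquare(observed_values, f_exp=expected_values)

alpha = 0.05  # assumed significance cutoff
is_significant = result.pvalue < alpha
print("Significant:", is_significant)

# Share of fans and non-fans actually seen in the survey
observed_share = observed_values / observed_values.sum()
print(observed_share.round(2))
```

The observed share of fans comes out to roughly two thirds, well below the 90% claim, which is what drives the large Chi-Square value.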