Welcome to your DataCamp project audition! This notebook must be filled out and vetted before a contract can be signed and you can start creating your project.

There are two parts to be completed in this notebook:

1. **Project information**:  The title of the project, a project description, assumed student background, etc.
2. **Project introduction**: The first three text and code cells that will form the introduction of your project.

When you are happy with this document, email it to me as an attachment, along with the datasets used. If you have any questions, feel free to reach out to me at any time!

David Venturi<br>
david.venturi@datacamp.com

# 1. Project information

**Project title**: The title of the project. Maximum 41 characters.

**Name:** Jessica Chace.

**Email address associated with your DataCamp account:** chace.jessica@gmail.com

**Project description**: Calling all politicos! This project explores voter registration in New York's 27 congressional districts from 2014 through 2018. We'll ask questions of the data and look for findings such as whether the 2016 election drove voters to fringe political parties, and which districts were likely to flip in the 2018 midterm elections. We'll pull the data with Enigma Public's new SDK package, which makes API requests easy, and we'll manipulate and analyze it with pandas. We'll also plot our findings with static and interactive charts using Matplotlib and Plotly-Dash, a new data visualization tool that lets Python users deploy their work without any knowledge of JavaScript. By the end of this project, you'll have something that rivals the best of FiveThirtyEight. Nate Silver's got nothing on you.

This project assumes you have some familiarity with pandas and Matplotlib.  It also assumes you have some knowledge of machine learning and predictive modeling.  The last part of the project will guide you through a basic set-up of a Plotly-Dash app to create interactive features for your plots.

**Dataset(s) used**: This project uses five datasets from Enigma Public, pulled with the Enigma SDK: https://public.enigma.com/browse/tag/elections/34. Together they contain voter registration information for New York's 27 congressional districts from 2014 through 2018, covering 10 different political parties.

**Assumed student knowledge**: This project is designed for a beginner-to-intermediate Python user who wants to practice DataFrame manipulation, plotting, and analysis. It assumes the student is somewhat familiar with the pandas and Matplotlib libraries, including pandas methods such as `.pct_change()`.
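For reference, `.pct_change()` divides each value by the previous one and subtracts 1, so the first entry has no predecessor and comes out as `NaN`. Below is a minimal sketch using made-up registration counts for a single hypothetical party; these are not values from the Enigma data.

```python
import pandas as pd

# Made-up registration counts for one hypothetical party, indexed by year
registrations = pd.Series([1000, 1100, 990], index=[2014, 2015, 2016])

# Each entry is (current - previous) / previous; 2014 has no previous year, so it is NaN
change = registrations.pct_change()
print(change)  # 2015 is about +0.10, 2016 about -0.10
```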

# 2. Project introduction

The final output of a DataCamp project looks like a blog post: pairs of text and code cells that tell a story about data. The text is written from the perspective of the data analyst and *not* from the perspective of an instructor on DataCamp. So, for this blog post intro, all you need to do is pretend like you're writing a blog post -- forget the part about instructors and students.

Below you'll see the structure of a DataCamp project: a series of "tasks" where each task consists of a title, a **single** text cell, and a **single** code cell. There are 8-12 tasks in a project and each task can have up to 10 lines of code. What you need to do:
1. Read through the template structure.
2. As best you can, divide your project as it is currently visualized in your mind into tasks.
3. Fill out the template structure for the first three tasks of your project.

As you are completing each task, you may wish to consult the project notebook format in our [documentation](https://authoring.datacamp.com/projects/projects-format.html). Only the `@context` and `@solution` cells are relevant to this audition.

Planned task titles (each one a political slogan):

- Putting Data First
- Prosperity and Progress (and Pandas)
- A Chicken in Every Pot and a Subplot in Every Grid
- Don't Swap Functions in the Middle of the Stream
- Percent Change We Can Believe In
- Make America Great Again
- I'm with Her
- Yes, We Can Build This Dash App
- Believe in sklearn




## 1. Putting data first
![Putting Data First](https://github.com/thedatasleuth/New-York-Congressional-Districts/blob/master/puttingpeoplefirst.png "Putting People First")

Exciting intro paragraph about knowledge is power.


We'll be working with datasets made publicly available by Enigma Public. Enigma released and promoted these datasets during a competition it hosted in the run-up to the 2018 midterm elections. The goal: explore the data in a way that has some explanatory power over the 2018 midterms. We're going to recreate that competition here. We'll look at which political parties dominate which districts and when, which districts and parties are the most active, whether a district was ceded to a rival party, what effect fringe-party registration had on the established parties, and how registrations for different parties correlate. Finally, we'll build a simple model to predict the outcome of the 2018 midterm elections.


An exciting intro to the analysis. Provide context on the problem you're going to solve, the dataset(s) you're going to use, the relevant industry, etc. You may wish to briefly introduce the techniques you're going to use. Tell a story. Get students excited! It should have at most 1,800 characters.

The most common error instructors make in **context cells** is referring to the student or the project. We want project notebooks to appear as a blog post or a data analysis. Bad: *"In this project, you will..."* Good: *"We will..."*

Images are welcome additions to every Markdown cell, but especially this first one. Make sure the images you use have a [permissive license](https://support.google.com/websearch/answer/29508?hl=en) and display them using [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#images).

In [10]:
# First, download and import Enigma Public's SDK package.
# This is easy to do with a pip install right in the notebook.
# !pip install enigma-sdk
import os

import enigma
import matplotlib.pyplot as plt  # pyplot is the plotting interface; plain `matplotlib` is not
import pandas as pd

# Next, set up authentication for our API requests. Each user has a unique API key,
# available after creating an account with Enigma Public. We read it from an
# environment variable (ENIGMA_API_KEY is our own choice of name) so the key
# never appears in the published notebook.
public = enigma.Public()
public.set_auth(apikey=os.environ['ENIGMA_API_KEY'])

# Now that we have access to the database, we can pull the records. There are five
# years' worth of voter registration data, so we make five separate requests and
# then concatenate the DataFrames into one large dataset.

dataset = public.datasets.get('113dc95c-b3d0-41ad-8e0e-cb8cccd31f16')
newyork2018 = dataset.current_snapshot.export_dataframe()

dataset = public.datasets.get('a0c06879-3c2a-41d2-9de4-40d63d6a9be6')
newyork2017 = dataset.current_snapshot.export_dataframe()

dataset = public.datasets.get('cc0fa835-f6ba-4425-b0d9-b5a2226e547f')
newyork2016 = dataset.current_snapshot.export_dataframe()

dataset = public.datasets.get('0eb79e77-0a6f-48e5-be83-db89c03da8af')
newyork2015 = dataset.current_snapshot.export_dataframe()

dataset = public.datasets.get('895e1114-e783-4109-91a5-ad7262ee607c')
newyork2014 = dataset.current_snapshot.export_dataframe()


# Code for the first task
# It should consist of up to 10 lines of code (not including comments)
# and take at most 5 seconds to execute on an average laptop.

## 2. Prosperity and progress (and pandas)

Surprisingly, the datasets don't include a year column, most likely because each dataset is identified by the year in its title. Because we're going to concatenate the five datasets for an in-depth analysis spanning five years, we need to keep track of which rows came from which year. Let's add the year as a new column to each dataset.

In [8]:
newyork2014['Year'] = 2014
newyork2015['Year'] = 2015
newyork2016['Year'] = 2016
newyork2017['Year'] = 2017
newyork2018['Year'] = 2018

# def get_year(df):
#     df['Year'] = df.name[-4:]
#     return df
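The commented-out `get_year` helper reads `df.name`, but unlike a Series, a DataFrame has no `name` attribute by default, so `df.name` raises an `AttributeError` unless a column happens to be called `name`. A minimal sketch of a working alternative passes the year in explicitly and uses `DataFrame.assign`, which returns a new DataFrame rather than modifying the original. The helper name `add_year`, the dict of frames, and the toy values are hypothetical stand-ins, not the Enigma data.

```python
import pandas as pd


def add_year(df, year):
    """Return a copy of df with a constant 'Year' column (hypothetical helper)."""
    return df.assign(Year=year)


# Toy stand-ins for the yearly DataFrames (made-up values)
frames = {
    2014: pd.DataFrame({"district": [1, 2], "registered": [500, 600]}),
    2015: pd.DataFrame({"district": [1, 2], "registered": [510, 590]}),
}

# Tag every yearly frame with its year in one loop instead of one line per year
tagged = {year: add_year(df, year) for year, df in frames.items()}
print(tagged[2015])
```

Keying the frames by year keeps the year and its data together, so adding a sixth year only means adding one dictionary entry.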



Now, let's concatenate the individual DataFrames into one, stacking their rows along axis 0.

In [11]:
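# sort=False: don't sort the union of column names; ignore_index=True: fresh 0..n-1 row index
newyork = pd.concat([newyork2014, newyork2015, newyork2016, newyork2017, newyork2018], axis=0, sort=False, ignore_index=True)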

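Some pandas versions warn about sorting when the frames being concatenated don't share exactly the same columns. A minimal sketch with toy data (the column `extra_party` and all values are hypothetical) shows what `pd.concat` does in that case: a column missing from one frame is filled with `NaN` for that frame's rows, `sort=False` skips sorting the combined column labels, and `ignore_index=True` replaces the repeated row labels with a fresh 0..n-1 index.

```python
import pandas as pd

# Toy yearly frames; the 2015 frame has an extra hypothetical column
df2014 = pd.DataFrame({"district": [1, 2], "registered": [500, 600], "Year": 2014})
df2015 = pd.DataFrame(
    {"district": [1, 2], "registered": [510, 590], "extra_party": [5, 7], "Year": 2015}
)

# Stack the rows (axis=0); without ignore_index the row labels would repeat as 0, 1, 0, 1
combined = pd.concat([df2014, df2015], axis=0, sort=False, ignore_index=True)
print(combined)
```

Checking `combined.isna().sum()` after a concat like this is a quick way to spot columns that only exist in some years.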


## 3. Title of the third task (<= 55 chars)  (sentence case)

Context / background / story / etc. It should have at most 800 characters and/or 3 paragraphs.

The most common error instructors make in **context cells** is referring to the student or the project. We want project notebooks to appear as a blog post or a data analysis. Bad: *"In this project, you will..."* Good: *"We will..."*

In [3]:
# Code for the third task
# It should consist of up to 10 lines of code (not including comments)
# and take at most 5 seconds to execute on an average laptop.

*Stop here! Only the three first tasks. :)*