In [1]:
import numpy as np
from datascience import *

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# DSC 10 Discussion Week 4

Topics to go over: 
    - Joins
        - Implementation
    - Maps
        - Implementation 
    - Iteration
        - Comparison/Booleans
        - Random Selection
        - Control Statements

# Making Comparisons

# The Incumbency Effect
The following table contains data about every congressional representative from the 50 states (DC is not included) from 1995 onward. An entry of "nan" means the value is missing (e.g. a middlename of "nan" means the member of Congress had no middle name).

In [2]:
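congress = Table.read_table("congress-terms-since1995.csv").drop(2,6,13,14).where("incumbent", are.not_equal_to("nan"))
# Pull the year off the end of each "termstart" string:
# [:-5:-1] takes the last four characters in reverse, [::-1] restores their order.
true_date = make_array()
for i in congress.column("termstart"):
    true_date = np.append(true_date, int(i[:-5:-1][::-1]))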
congress = congress.with_column(
    "Term Start", true_date).drop("termstart")
congress

congress,chamber,firstname,middlename,lastname,birthday,state,party,incumbent,age,Term Start
104,house,Sidney,Richard,Yates,8/27/1909,IL,D,Yes,85.4,1995
104,house,James,Henry,Quillen,1/11/1916,TN,R,Yes,79.0,1995
104,house,Henry,Barbosa,González,5/3/1916,TX,D,Yes,78.7,1995
104,house,Sam,Melville,Gibbons,1/20/1920,FL,D,Yes,75.0,1995
104,house,George,E.,Brown,3/6/1920,CA,D,Yes,74.8,1995
104,house,Gillespie,V.,Montgomery,8/5/1920,MS,D,Yes,74.4,1995
104,house,Tom,,Bevill,3/27/1921,AL,D,Yes,73.8,1995
104,house,Barbara,Farrell,Vucanovich,6/22/1921,NV,R,Yes,73.5,1995
104,house,Carlos,John,Moorhead,5/6/1922,CA,R,Yes,72.7,1995
104,house,Benjamin,,Gilman,12/6/1922,NY,R,Yes,72.1,1995
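
The slice `i[:-5:-1][::-1]` used to build "Term Start" can look cryptic. The sketch below checks, on a few made-up strings, that it produces the same four characters as the shorter `i[-4:]`. It assumes the dates end in a four-digit year (as the birthday column does); the sample strings are illustrative and not taken from the CSV.

```python
# Hypothetical date strings in month/day/year form; not taken from the CSV.
sample_dates = ["1/4/1995", "1/3/2013", "11/30/1999"]

for date in sample_dates:
    reversed_tail = date[:-5:-1]     # last four characters, back to front
    year_long = reversed_tail[::-1]  # reverse again to restore the order
    year_short = date[-4:]           # the same four characters in one step
    print(date, "->", int(year_long), int(year_short))
```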


# Question 1.1:
Oftentimes, people who have already served in office are said to have an incumbency advantage: an advantage that comes from having prior experience, a well-known name, and franking privileges (they can send mail without paying postage). But just how big is this supposed advantage? Is there an advantage at all?

In the cell below, find out what proportion of representatives have been incumbents since 1995.

In [None]:
prop_incumbents = 
prop_incumbents
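
One way to think about a proportion is as the number of comparisons that come out True divided by the total number of entries. The sketch below uses a plain Python list of made-up labels rather than the congress table, so it illustrates the idea without being the answer to the exercise.

```python
# Made-up incumbency labels standing in for a table column.
incumbent_labels = ["Yes", "No", "Yes", "Yes", "No"]

# Each comparison is True or False; True counts as 1 when summed.
is_incumbent = [label == "Yes" for label in incumbent_labels]
proportion = sum(is_incumbent) / len(incumbent_labels)
print(proportion)  # 0.6
```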

# Question 1.2
Of course, the House and the Senate differ in many ways, from term lengths to the number of representatives to jurisdiction of powers. Does one chamber have a higher incidence of incumbent advantage than the other?

In [None]:
# house and senate are tables that only have data about their respective chambers 
house = 
senate = 

prop_incumbents_house = 
prop_incumbents_senate = 

print("Proportion of House who have been incumbents:", prop_incumbents_house)
print("Proportion of Senate who have been incumbents:", prop_incumbents_senate)

So it would seem that the Senate, at least since 1995, has had a more pronounced incumbency effect. I encourage you to think about why this may be.

# Parties of Congressional Representatives
Next, we are going to find what proportion of each state's representatives, since 1995, have been Republican or Democrat. Eventually, we're going to use that information to build a map.

First, let's see how the composition of each chamber of Congress has changed since 1995.

In [None]:
#run cell
house_plot = congress.where("chamber", "house").pivot("party", "Term Start").select("Term Start", "D", "R")\
.relabeled("D", "Democrats").relabeled("R", "Republicans").scatter(0)
house_plot

In [None]:
#run cell
senate_plot = congress.where("chamber", "senate").pivot("party", "Term Start").select("Term Start", "D", "R")\
.relabeled("D", "Democrats").relabeled("R", "Republicans").scatter(0)
senate_plot

Obviously, the composition of Congress has changed over this period. For the sake of simplicity, we'll only be looking at the overall proportion of Democratic versus Republican representatives each state has had since 1995.

The following cell creates a table called states that associates each state with the latitude and longitude of one of its cities (for our purposes it doesn't matter which city it is, so long as the coordinates correspond to a point within the state's borders).

In [None]:
#run cell
abbreviations = Table.read_table("State_names_w_abbreviations.csv")
states_w_cities = Table.read_table("zip_codes_states.csv").where(
    "latitude", are.between_or_equal_to(0.01,360))
states = abbreviations.join("Abbreviation", states_w_cities, "state").drop(2,5,6)
states

# Question 2.1
Now that we have a table with congressional data and one with coordinates associated with each state, it will be useful to join the two tables.
Use the .join method to create a new table that has a row for every representative and has all of the old congressional data PLUS the new longitude and latitude data.
(Remember, the order in which you use .join matters: congress.join(-arguments-) is not the same as states.join(-arguments-)!)

In [None]:
combined = 
combined
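
To see why the calling order of a join can matter, here is a minimal sketch of the general idea in plain Python. The helper `join_rows` is hypothetical and is not how the datascience library implements `.join`; the rows are made up. In this sketch the calling side's columns come first and its key label is the one that survives, while the set of matched pairs is the same either way.

```python
def join_rows(left, left_key, right, right_key):
    """Hypothetical helper: pair each left row with every right row whose key matches."""
    joined = []
    for left_row in left:
        for right_row in right:
            if left_row[left_key] == right_row[right_key]:
                combined_row = dict(left_row)  # left columns come first
                for label, value in right_row.items():
                    if label != right_key:     # keep only the left key column
                        combined_row[label] = value
                joined.append(combined_row)
    return joined

# Made-up rows; the labels mirror the worksheet but the values are illustrative.
people = [
    {"lastname": "A", "state": "CA"},
    {"lastname": "B", "state": "NV"},
    {"lastname": "C", "state": "CA"},
]
places = [
    {"Abbreviation": "CA", "latitude": 34.0},
    {"Abbreviation": "NV", "latitude": 39.5},
]

people_first = join_rows(people, "state", places, "Abbreviation")
places_first = join_rows(places, "Abbreviation", people, "state")

print(list(people_first[0]))  # ['lastname', 'state', 'latitude']
print(list(places_first[0]))  # ['Abbreviation', 'latitude', 'lastname']
print(len(people_first), len(places_first))  # 3 3
```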

Now our goal is to make two maps: one displaying the 10 most Democratic states, and one displaying the 10 most Republican states.

# Question 2.2
In order to build our maps, the first thing we have to do is find out what proportion of representatives from each state are Republican or Democrat. There are several ways to do this, one of which is iteration. The cell below contains a skeleton for solving the problem iteratively, but if you have another way to do it, feel free to use the blank cell after it.

NOTE: Make sure that each element of your array corresponds to exactly one state, and that the states the elements correspond to are in alphabetical order,
i.e. the first element of prop_democrat_array will be the proportion of Democrats in AK.

In [None]:
#make an array that has the proportion of democratic representatives 
#for every state, and one for the proportion of republicans.
prop_democrat_array = make_array()
prop_republican_array = make_array()
for i in _______: #______ = some iterable object. An array maybe? A column?
    prop_democrat_one_state = 
    prop_republican_one_state =     
    prop_democrat_array = np.append(prop_democrat_array, prop_democrat_one_state)
    prop_republican_array = np.append(prop_republican_array, prop_republican_one_state)
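
The accumulate-in-a-loop pattern from the skeleton can be sketched on a toy data set. The example below uses plain Python lists and made-up (state, party) pairs instead of the combined table, and `list.append` instead of `np.append`; it is one possible shape for the loop under those assumptions, not the intended solution.

```python
# Made-up (state, party) pairs standing in for rows of a table.
records = [("AK", "R"), ("AK", "R"), ("AL", "D"), ("AL", "R"), ("AZ", "D")]

# Sorting the distinct state codes gives one entry per state, alphabetically.
state_codes = sorted({state for state, _ in records})

prop_democrat = []
for code in state_codes:
    # Collect the parties of this state's rows, then take the share that are "D".
    parties = [party for state, party in records if state == code]
    prop_democrat.append(parties.count("D") / len(parties))

print(state_codes)    # ['AK', 'AL', 'AZ']
print(prop_democrat)  # [0.0, 0.5, 1.0]
```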

In [None]:
#Your own way
prop_democrat_array = 

# Question 2.3
Now make a table called proportions_and_locations that has your new arrays appended to it as columns. It should have 1 row for every state and the following columns, in this order: latitude, longitude, State Abbreviation, Proportion Republican, and Proportion Democrat.

In [None]:
proportions_and_locations = 
proportions_and_locations 

Now, we're almost ready to make our maps! First, make two tables: one called top_10_republican and one called top_10_democrat. Each should look like proportions_and_locations, but keep only the 10 states with the highest proportion of Republican and Democratic representatives, respectively. Each should also contain only its own party's proportion (top_10_republican shouldn't have a "Proportion Democrat" column and vice versa).

In [None]:
top_10_republican = 
top_10_republican

In [None]:
top_10_democrat = 
top_10_democrat
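
Picking the "top N" rows usually comes down to sorting in descending order and keeping the first N entries. A small sketch of that idea with made-up (state, proportion) pairs in plain Python follows; the numbers are illustrative only, and N is 3 here to keep the output short.

```python
# Made-up (state, proportion) pairs; the real table would have one per state.
proportions = [("AK", 0.9), ("CA", 0.35), ("TX", 0.7), ("NY", 0.3), ("UT", 0.95)]

# Sort from highest to lowest proportion, then keep the first few rows.
top_n = 3  # the worksheet asks for 10; 3 keeps the example small
ranked = sorted(proportions, key=lambda pair: pair[1], reverse=True)
top_states = ranked[:top_n]
print(top_states)  # [('UT', 0.95), ('AK', 0.9), ('TX', 0.7)]
```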

The following cell takes the two tables you made and puts them in a format that the .map_table function can read.

In [None]:
#run cell 
map_ready_top_10_democrat = top_10_democrat.drop(3).with_columns(
    "Color", np.array(["blue"] * 10),
    "Size", (top_10_democrat.column(3)**5)*(10**12))
map_ready_top_10_republican = top_10_republican.drop(3).with_columns(
    "Color", np.array(["red"] * 10),
    "Size", (top_10_republican.column(3)**5)*(10**12))

In [None]:
map_ready_top_10_democrat

In [None]:
map_ready_top_10_republican

Now, use .map_table to create two maps: one for each of the above tables. Both should use circles whose size is proportional to the proportion of Republican or Democratic representatives.

In [None]:
democrat_map = 
democrat_map

In [None]:
republican_map = 
republican_map

In [None]:
actors = Table.read_table('actors.csv')
actors

1.1 Find the number of actors who have appeared in more than 50 movies by using the table method ".where".

In [None]:
...

1.2 Find the number of actors who have appeared in more than 50 movies without using the table method ".where".

In [None]:
...
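
The two exercises above contrast filtering first and then counting with counting the results of a comparison directly. A sketch of both approaches on a made-up list of movie counts, in plain Python rather than with the actors table:

```python
# Made-up movie counts, one per actor.
movie_counts = [61, 12, 50, 75, 49, 53]

# Approach 1: build the filtered collection, then count its entries.
over_fifty = [count for count in movie_counts if count > 50]
count_by_filtering = len(over_fifty)

# Approach 2: sum the comparison results directly (True counts as 1).
count_by_summing = sum(count > 50 for count in movie_counts)

print(count_by_filtering, count_by_summing)  # 3 3
```

Note that an actor with exactly 50 movies is not counted, because the comparison is strictly greater than.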

# Conditional Statement Review

Chained Conditional Statement

if x < y:
    STATEMENTS_A
elif x > y:
    STATEMENTS_B
else:
    STATEMENTS_C

Nested Conditional Statement

if x < y:
    STATEMENTS_A
else:
    if x > y:
        STATEMENTS_B
    else:
        STATEMENTS_C
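
The chained and nested forms above take the same branch for every pair of inputs. As a quick check, the sketch below fills in the placeholder statements with return values; the function names and return strings are made up for illustration.

```python
def compare_chained(x, y):
    # Chained form: if / elif / else at a single indentation level.
    if x < y:
        return "less"
    elif x > y:
        return "greater"
    else:
        return "equal"

def compare_nested(x, y):
    # Nested form: the second test lives inside the outer else branch.
    if x < y:
        return "less"
    else:
        if x > y:
            return "greater"
        else:
            return "equal"

for x, y in [(1, 2), (2, 1), (3, 3)]:
    print(x, y, compare_chained(x, y), compare_nested(x, y))
```

The chained version is usually easier to read because every branch sits at the same indentation level.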

2.1 Write a function that takes in a string username and a string password and verifies whether they match the provided information.

In [None]:
myusername = "aliceinwonderland"
mypassword = "ilovenutella"

def verify(username, password):
    #if the username is not correct, print out an error message "No user found"
    
    #if the username is correct but the password is not, 
    #print out an error message "The password entered is not correct"
    
    #if both the username and password are correct, print out "Welcome back!"
    ...

In [None]:
# run verify(myusername, mypassword) to see if you implemented the function correctly
verify(myusername, mypassword)

# Iteration

3.1 Write a function that takes in a list of numbers and prints out all the negative numbers from the list.

In [None]:
def printnegative(mylist):
    ...

In [None]:
#verify if the function above behaves correctly
mylist = make_array(1,-1,4,0,-7,-12,3,-3)
printnegative(mylist)