# IRS migration analysis: Is money leaving CT?

by Jake Kara / TrendCT.org

This is my first attempt at using highly reliable but narrowly focused data to shed some light on the question of whether there is an exodus of wealth from the state. 

raw data available here: https://www.irs.gov/uac/SOI-Tax-Stats-Migration-Data

#### Notebook format

TrendCT.org strives to publish its data and analysis in as clear and reproducible a format as possible in an effort to encourage readers to check our work and expand upon our analysis.


In [1]:
import os, pandas as pd

data_dir = "ct_xls/renamed/"

files = os.listdir(data_dir)

In [2]:
## Load in each year's data

outflow = {}
inflow = {}

## The following years don't have aggregate income data:
## 1991, 1992
## So we'll skip those

## prototype for 1993 on 
#outflow[1993] =  pd.read_excel("ct_xls/renamed/1992-93 out.xls", skiprows=7)
#outflow[1993].head()

## Turn year int into YYYY-YY two-year format
def abbr(year): 
    year_abbr = ""
    if year < 2000:
        year_abbr = year - 1900
    elif year < 2010:
        year_abbr = "0" + str(year - 2000)
    else:
        year_abbr = year - 2000
    return str(year - 1) + "-" + str(year_abbr)
    
def fname(year, in_out):
    return "ct_xls/renamed/" + abbr(year) + " " + in_out + ".xls"

def loadyear(year, skip):
    print "Loading: " + str(year)
    inflow[year] = pd.read_excel(fname(year, "in"), skiprows = skip)
    outflow[year] = pd.read_excel(fname(year, "out"), skiprows = skip)
    #outflow[year]["year"] = year
    #return outflow[year]

def load93(year):
    loadyear(year, 6)


In [3]:
## Load 1993 data
load93(1993)

outflow[1993].columns = ["to","abbr","state","returns","exemptions","agi"]
outflow[1993]["year"] = 1993
outflow[1993] = outflow[1993].drop("to",1)

inflow[1993].columns = ["to","abbr","state","returns","exemptions","agi"]
inflow[1993]["year"] = 1993
inflow[1993] = inflow[1993].drop("to",1)

totals = {}
totals[1993] = outflow[1993][outflow[1993]["state"] == "Connecticut"]


in_totals = {}
in_totals[1993] = inflow[1993][inflow[1993]["state"] == "Connecticut"]

## data frame to hold data on aggregate filers leaving the state and how much money they took with them
totals[1993]
in_totals[1993]


Loading: 1993


Unnamed: 0,abbr,state,returns,exemptions,agi,year
1,XX,Connecticut,33195,60452,1681451,1993


In [4]:
## Load 1994 data
load93(1994)
outflow[1994].columns = ["to","abbr","state","returns","exemptions","agi"]
outflow[1994]["year"] = 1994
outflow[1994]

inflow[1994].columns = ["to","abbr","state","returns","exemptions","agi"]
inflow[1994]["year"] = 1994
inflow[1994]

totals[1994] = outflow[1994][outflow[1994]["state"] == "Total Outflow"]
totals[1994]

in_totals[1994] = inflow[1994][inflow[1994]["state"] == "Total Inflow"]
in_totals[1994]

Loading: 1994


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,63,,Total Inflow,32765,60222,1717898,1994


In [5]:
## Load 1995 data
load93(1995)
outflow[1995].columns = ["to","abbr","state","returns","exemptions","agi"]
outflow[1995]["year"] = 1995

totals[1995] = outflow[1995][outflow[1995]["state"] == "Total Outflow"]
totals[1995]

inflow[1995].columns = ["to","abbr","state","returns","exemptions","agi","drop"]
inflow[1995] = inflow[1995].drop("to",1)
inflow[1995] = inflow[1995].drop("drop",1)
inflow[1995]["year"] = 1995

in_totals[1995] = inflow[1995][inflow[1995]["state"] == "Total Inflow"]

in_totals[1995]

#inflow[1995]


Loading: 1995


Unnamed: 0,abbr,state,returns,exemptions,agi,year
1,,Total Inflow,35221,64065,1735618,1995


In [6]:
## Load 1996 data

#load93(1996)
#outflow[1996]
#outflow[1996].columns = ["to","abbr","state","returns","exemptions","agi"]
#outflow[1996]["year"] = 1996

#totals[1996] = outflow[1996][outflow[1996]["state"] == "CT Total Mig - US & For"]
#totals[1996]

## All the same through 2004, so we'll write a function to do it
## based on the prototype above
def load96(year):
    load93(year)
    outflow[year].columns = ["to","abbr","state","returns","exemptions","agi"]
    outflow[year]["year"] = year

    totals[year] = outflow[year][outflow[year]["state"] == "CT Total Mig - US & For"]
    inflow[year].columns = ["to","abbr","state","returns","exemptions","agi"]
    
    in_totals[year] = inflow[year][inflow[year]["state"] == "CT Total Mig - US & For"]
    in_totals[year]["year"] = year
    return in_totals[year]

load96(1996)

Loading: 1996


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,33780,61377,1803045,1996


In [7]:
## Load 1997 data
load96(1997)

Loading: 1997


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,35624,64742,2056093,1997


In [8]:
## Load 1998 data
load96(1998)

Loading: 1998


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,36956,66435,2168891,1998


In [9]:
## Load 1999 data
load96(1999)

Loading: 1999


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,37959,68166,2458195,1999


In [10]:
## Load 2000 data
load96(2000)

Loading: 2000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,39333,70686,2715637,2000


In [11]:
## Load 2001 data
load96(2001)

Loading: 2001


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,39182,70090,2837094,2001


In [12]:
## Load 2002 data
load96(2002)

Loading: 2002


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,40096,71022,2722527,2002


In [13]:
## Load 2003 data
load96(2003)

Loading: 2003


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,39449,70423,2566727,2003


In [14]:
## Load 2004 data
load96(2004)

Loading: 2004


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,37264,66966,2506512,2004


In [15]:
## Load 2005 data
load96(2005)

Loading: 2005


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
0,96,CT,CT Total Mig - US & For,38125,68651,2698556,2005


In [16]:
## Load 2006 data
load96(2006)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Loading: 2006


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
0,96,CT,CT Total Mig - US & For,38256,68038,2657640,2006


In [17]:
## Load 2007 data
load96(2007)

Loading: 2007


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
0,96,CT,CT Total Mig - US & For,36676,64428,2594852,2007


In [18]:
## Load 2008 data
load96(2008)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Loading: 2008


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,38072,66078,2838352,2008


In [19]:
## Load 2009 data
load96(2009)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Loading: 2009


Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
1,96,CT,CT Total Mig - US & For,36310,62186,2468447,2009


In [20]:
## Load 2010 data
#loadyear(2010, 3)
#outflow[2010].columns = ["code1","code2","abbr","state","returns","exemptions","agi"]
#outflow[2010]["year"] = 2010
#outflow[2010] = outflow[2010].drop("code1",1)
#outflow[2010] = outflow[2010].drop("code2",1)
#totals[2010] = outflow[2010][outflow[2010]["state"] == "CT Total Mig - US & For"]
#totals[2010]

# Base function on prototype above for the next years
def load10(year):
    loadyear(year, 3)
    outflow[year].columns = ["code1","code2","abbr","state","returns","exemptions","agi"]
    outflow[year]["year"] = year
    outflow[year] = outflow[year].drop("code1",1)
    outflow[year] = outflow[year].drop("code2",1)
    totals[year] = outflow[year][outflow[year]["state"] == "CT Total Mig - US & For"]
    
    inflow[year].columns = ["code1","code2","abbr","state","returns","exemptions","agi"]
    inflow[year]["year"] = year
    inflow[year] = inflow[year].drop("code1",1)
    outflow[year] = inflow[year].drop("code2",1)
    in_totals[year] = inflow[year][inflow[year]["state"] == "CT Total Mig - US & For"]
    
    return in_totals[year]
load10(2010)

Loading: 2010


Unnamed: 0,code2,abbr,state,returns,exemptions,agi,year
2,96,CT,CT Total Mig - US & For,32415,56130,2134638,2010


In [21]:
load10(2011)

Loading: 2011


Unnamed: 0,code2,abbr,state,returns,exemptions,agi,year
2,96,CT,CT Total Mig - US & For,33881,59285,2494120,2011


In [22]:
## Load 2012 data
#outflow[2012] = pd.read_excel("ct_xls/renamed/2011-12 in out.xls", sheetname="State Outflow", skiprows=4)
#outflow[2012].columns = ["code1","code2","abbr","state","returns","exemptions","agi"]
#outflow[2012] = outflow[2012].drop("code1",1)
#outflow[2012] = outflow[2012].drop("code2",1)
#outflow[2012]

def load12(year):
    outflow[year] = pd.read_excel("ct_xls/renamed/" + abbr(year) + " in out.xls", sheetname="State Outflow", skiprows=4)
    outflow[year].columns = ["code1","code2","abbr","state","returns","exemptions","agi"]
    outflow[year] = outflow[year].drop("code1",1)
    outflow[year] = outflow[year].drop("code2",1)
    outflow[year]["year"] = year
    totals[year] =  outflow[year][outflow[year]["state"] == "CT Total Migration US and Foreign"]
    
    inflow[year] = pd.read_excel("ct_xls/renamed/" + abbr(year) + " in out.xls", sheetname="State Inflow", skiprows=4)
    inflow[year].columns = ["code1","code2","abbr","state","returns","exemptions","agi"]
    inflow[year] = inflow[year].drop("code1",1)
    inflow[year] = inflow[year].drop("code2",1)
    inflow[year]["year"] = year
    in_totals[year] =  inflow[year][inflow[year]["state"] == "CT Total Migration US and Foreign"]
    
    return in_totals[year]
    
load12(2012)

Unnamed: 0,abbr,state,returns,exemptions,agi,year
1,CT,CT Total Migration US and Foreign,39062,67699,3165218,2012


In [23]:
load12(2013)

Unnamed: 0,abbr,state,returns,exemptions,agi,year
1,CT,CT Total Migration US and Foreign,39385,69577,3601931,2013


## Create non-migrant dataframes

Use this data to compare the returns/exemptions/income of CT non-migrant residents to migrants leaving the state

In [24]:
nonmigrant = {}

def get_nonmigrant(year,label):
    nonmigrant[year] = outflow[year][outflow[year]["state"] == label]
    return nonmigrant[year]


In [25]:
## Get 1993 non-migrant data
get_nonmigrant(1993,"tate Non-Migrant")


Unnamed: 0,abbr,state,returns,exemptions,agi,year
53,Ct S,tate Non-Migrant,1233330,2574811,58081271,1993


In [26]:
get_nonmigrant(1994,"State Non-Migrant")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
53,9,Ct,State Non-Migrant,1216708,2551533,57626608,1994


In [27]:
get_nonmigrant(1995,"State Non-Migrant")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
53,9,Ct,State Non-Migrant,1209762,2539958,58713546,1995


In [28]:
get_nonmigrant(1996,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1214701,2551739,64190713,1996


In [29]:
get_nonmigrant(1997,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1221385,2569421,68960919,1997


In [30]:
get_nonmigrant(1998,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1231278,2588083,73926815,1998


In [31]:
get_nonmigrant(1999,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1251218,2627741,79815776,1999


In [32]:
get_nonmigrant(2000,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1270081,2670644,87245246,2000


In [33]:
get_nonmigrant(2001,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1280310,2688956,94932236,2001


In [34]:
get_nonmigrant(2002,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1295421,2715895,91361107,2002


In [35]:
get_nonmigrant(2003,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1302971,2745004,88871823,2003


In [36]:
get_nonmigrant(2004,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1298471,2748590,90332924,2004


In [37]:
get_nonmigrant(2005,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
3,9,CT,CT Non-Migrants,1294962,2739282,96894716,2005


In [38]:
get_nonmigrant(2006,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
3,9,CT,CT Non-Migrants,1289941,2720319,100383499,2006


In [39]:
get_nonmigrant(2007,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
3,9,CT,CT Non-Migrants,1300673,2723531,105079131,2007


In [40]:
get_nonmigrant(2007,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
3,9,CT,CT Non-Migrants,1300673,2723531,105079131,2007


In [41]:
get_nonmigrant(2008,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1345270,2781620,112353473,2008


In [42]:
get_nonmigrant(2009,"CT Non-Migrants")

Unnamed: 0,to,abbr,state,returns,exemptions,agi,year
4,9,CT,CT Non-Migrants,1360079,2803701,108996493,2009


In [43]:
get_nonmigrant(2010,"CT Non-Migrants")

Unnamed: 0,abbr,state,returns,exemptions,agi,year
5,CT,CT Non-Migrants,1348693,2789649,101699729,2010


In [44]:
get_nonmigrant(2011,"CT Non-Migrants")

Unnamed: 0,abbr,state,returns,exemptions,agi,year
5,CT,CT Non-Migrants,1343294,2774447,106050133,2011


In [45]:
get_nonmigrant(2012,"CT Non-migrants")

Unnamed: 0,abbr,state,returns,exemptions,agi,year
4,CT,CT Non-migrants,1367856,2870405,133130211,2012


In [46]:
get_nonmigrant(2013,"CT Non-migrants")

Unnamed: 0,abbr,state,returns,exemptions,agi,year
4,CT,CT Non-migrants,1365345,2855206,142228765,2013


## Merge frame sets together

In [47]:
## Create monolithic frame containing returns, exemptions agi and year 
## for two groups per year: out-flowing migrants and non-migrant residents

## Build first year manually
key = 0
totals
mono = totals[1993]
mono = mono[mono["state"].notnull()]

mono = mono.drop("state",1)
mono = mono.drop("abbr",1)
#mono = mono.drop("key",1)
#mono[mono["state"].notnull,"key"] = 0

# Delete first row
mono = mono.drop(1,0)

for year in nonmigrant:
    nonm_row = nonmigrant[year]
    #nonm_row["key"] = key
    nonm_row["type"] = "nonmigrants"
    nonm_row = nonm_row.drop("state",1)
    nonm_row = nonm_row.drop("abbr",1)
    #print nonm_row
    key += 1
    mono = mono.append(nonm_row)
    
#print "--" * 20    
    
for year in totals:
    print year
    tot_row =totals[year]
    #tot_row["key"] = key
    tot_row["type"] = "out-migrants"
    tot_row = tot_row.drop("state",1)
    tot_row = tot_row.drop("abbr",1)

    #print tot_row
    key += 1
    
    mono = mono.append(tot_row)
        
mono = mono.drop("to",1)
mono["year"].value_counts()
#mono.sort_values(by="year")

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


1993
1994
1995
1996
1997
1998
1999
2000
2001


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013


2013    2
2002    2
1994    2
1995    2
1996    2
1997    2
1998    2
1999    2
2000    2
2001    2
2003    2
2012    2
2004    2
2005    2
2006    2
2007    2
2008    2
2009    2
2010    2
2011    2
1993    2
Name: year, dtype: int64

In [48]:
mono["avg"] = mono["agi"] / mono["returns"]
print mono[mono["type"] == "nonmigrants"]["agi"].mean()
print mono[mono["type"] == "out-migrants"]["agi"].mean()

91470244.4762
3006155.61905


In [49]:
def outmigrants(year):
    return mono[(mono["type"] == "out-migrants")&(mono["year"] == year)]

def nonmigrants(year):
    return mono[(mono["type"] == "nonmigrants")&(mono["year"] == year)]

def inmigrants(year):
    return mono[(mono["type"] == "in-migrants")&(mono["year"] == year)]

In [50]:
inmigrants(1993)

Unnamed: 0,agi,exemptions,returns,type,year,avg


In [51]:
nonmigrants(1993)


Unnamed: 0,agi,exemptions,returns,type,year,avg
53,58081300.0,2574810.0,1233330.0,nonmigrants,1993,47.093


In [52]:
outmigrants(1993)

Unnamed: 0,agi,exemptions,returns,type,year,avg
1,2060260.0,83912,46731,out-migrants,1993,44.0877


## Add in-flow data

In [53]:
print mono.columns
for year in in_totals:
    inrow = in_totals[year]
    inrow = inrow[["agi","exemptions","returns","year"]]
    inrow["type"] = "in-migrants"
    
    mono = mono.append(inrow)
    #print inrow.columns
mono["type"].value_counts()

Index([u'agi', u'exemptions', u'returns', u'type', u'year', u'avg'], dtype='object')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


out-migrants    21
in-migrants     21
nonmigrants     21
Name: type, dtype: int64

## Summarize with in-flow and outflow data

In [54]:
## Helper functions
def dollars(num):
    return "${:,.2f}".format(num)

def money (num):
    return "${:,.2f}".format(num / (1000*1000*1000)) + " billion"
last = 0

def billion(num):
    return round(float(num) / (1000 * 1000 * 1000), 2)


In [55]:
## Summarize results and see if it makes sense:

net_gains = 0
net_losses = 0

def summarize(year):
    global net_gains, net_losses
    
    outm = outmigrants(year)
    nonm = nonmigrants(year)
    inm = inmigrants(year)
    
    returns_out = outm.iloc[0]["returns"]
    dollars_out = outm.iloc[0]["agi"] * 1000
    avg_out = dollars_out / returns_out
    
    returns_non = nonm.iloc[0]["returns"]
    dollars_non = nonm.iloc[0]["agi"] * 1000
    avg_non = dollars_non / returns_non
    
    returns_in = inm.iloc[0]["returns"]
    dollars_in = inm.iloc[0]["agi"] * 1000
    avg_in = dollars_in / returns_in
        
    #print "----" * 10
    #print "In " + str(year) + ", " + str(returns_out) + " former residents filed taxes in another state"
    #print "\twith an aggregate gross income of " + str(dollars_out) + "."
    #print "\tMeanwhile, " + str(returns_non) + " non-migrant residents filed taxes with an AGI of " + str(dollars_out) + "."
    #print "====" * 10
    
   # print "----" * 4 + "[" + str(year) + "]" + "----" * 4
    print "Out: ${:,.2f}".format(round(avg_out))
    #print "Dollars out: ${:,.2f}".format(round(dollars_out))
    #print "Avg $$ out : ${:,.2f}".format(round(dollars_out / returns_out))
    #print ""
    print "In: ${:,.2f}".format(round(avg_in))

    print "Stay: ${:,.2f}".format(round(avg_non))
    #print "Dollars non: ${:,.2f}".format(round(dollars_non))
    #print "Avg $$ non : ${:,.2f}".format(round(dollars_non / returns_non))
    #print ""
    
    avg_diff = avg_out - avg_non
    avg_pct_diff = avg_diff * 100 / avg_non
    print str(year) + ": {:,.2f}".format(round(avg_diff))
    print "pct diff: {:,.2f}".format(round(avg_pct_diff))
    print "====" * (8) + "======"

    if dollars_in < dollars_out:
        net_losses += 1
    elif dollars_in > dollars_out:
        net_gains  +=1
for year in totals.keys():
    summarize(year)

print "NET LOSSES: " + str(net_losses)
print "NET GAINS: " + str(net_gains)

Out: $44,088.00
In: $50,654.00
Stay: $47,093.00
1993: -3,005.00
pct diff: -6.00
Out: $44,994.00
In: $52,431.00
Stay: $47,363.00
1994: -2,369.00
pct diff: -5.00
Out: $46,112.00
In: $49,277.00
Stay: $48,533.00
1995: -2,421.00
pct diff: -5.00
Out: $50,638.00
In: $53,376.00
Stay: $52,844.00
1996: -2,206.00
pct diff: -5.00
Out: $54,439.00
In: $57,717.00
Stay: $56,461.00
1997: -2,022.00
pct diff: -4.00
Out: $60,335.00
In: $58,688.00
Stay: $60,041.00
1998: 294.00
pct diff: 0.00
Out: $65,131.00
In: $64,759.00
Stay: $63,790.00
1999: 1,340.00
pct diff: 2.00
Out: $71,420.00
In: $69,042.00
Stay: $68,693.00
2000: 2,727.00
pct diff: 4.00
Out: $76,723.00
In: $72,408.00
Stay: $74,148.00
2001: 2,576.00
pct diff: 3.00
Out: $68,837.00
In: $67,900.00
Stay: $70,526.00
2002: -1,690.00
pct diff: -2.00
Out: $62,229.00
In: $65,064.00
Stay: $68,207.00
2003: -5,978.00
pct diff: -9.00
Out: $65,949.00
In: $67,264.00
Stay: $69,569.00
2004: -3,619.00
pct diff: -5.00
Out: $67,239.00
In: $70,781.00
Stay: $74,824.00
20

# Chart 1: Simple in-flow versus out-flow

In [56]:
## CHART 1
# Net wealth change among people moving in and out of the state

ret = {}

print "year\tin-flow\tout-flow"#\tnet"
for year in totals.keys():
    inm = inmigrants(year)
    outm = outmigrants(year)
    nonm = nonmigrants(year)
    
    money_in = inm.iloc[0]["agi"] * 1000
    money_out = outm.iloc[0]["agi"] * 1000
    money_non = nonm.iloc[0]["agi"] * 1000
    money_net = money_in - money_out
    
    people_in = inm.iloc[0]["returns"] 
    people_out = outm.iloc[0]["returns"] 
    people_non = nonm.iloc[0]["returns"]
    people_net = people_in - people_out
    
    money_avg_in = money_in / people_in
    money_avg_out = money_out / people_out
    
    diff_avg = money_avg_in - money_avg_out
    
    net_people = people_in - people_out
    net_money = money_in - money_out
    
    #print "----[" + str(year) + "]----"
    #print "    \tPEOPLE\tMONEY\t\tINCOME"
    #print "IN:\t" + str(int(people_in)) + "\t" + str(round(money_in)) + "\t" +  str(round(money_avg_in))
    #print "OUT:\t" + str(int(people_out)) + "\t" +  str(round(money_out)) + "\t" +  str(round(money_avg_out))
    #print "DIFF:\t" + str(int(people_net)) + "\t" + str(round(money_net)) + "\t" + str(round(money_avg_in - money_avg_out))
    #print "NON:\t" + str(int(people_non)) + "\t" +  str(round(money_non)) + "\t" +  str(round(money_non / people_non))

    #print "NET WEALTH CHANGE: {:,.2f}".format(int(money_net))
    #print "NET PEOPLE CHANGE: " + str(int(people_net))
    #print money_in
    #print money_out
    #print net_money
    print str(year) + "\t" + str(billion(money_in)) + "\t" + str(billion(money_out))# + "\t" + str(billion(net_money))
    #print "--"
print ret


year	in-flow	out-flow
1993	1.68	2.06
1994	1.72	2.06
1995	1.74	2.14
1996	1.8	2.25
1997	2.06	2.5
1998	2.17	2.71
1999	2.46	2.84
2000	2.72	3.11
2001	2.84	3.36
2002	2.72	2.91
2003	2.57	2.56
2004	2.51	2.91
2005	2.7	3.01
2006	2.66	3.18
2007	2.59	3.36
2008	2.84	3.55
2009	2.47	2.87
2010	2.13	2.48
2011	2.49	2.68
2012	3.17	5.19
2013	3.6	5.4
{}


# Chart 2: Wealth of people in state (new and existing residents)


In [57]:
## CHART 2

## year-over-year wealth of people in the state (whether they moved in or stayed here)

gains = -1
losses = 0
print "year\twealth\tchange"
for year in totals.keys():
    inm = inmigrants(year)
    nonm = nonmigrants(year)
    
    #print inm.iloc[0]["agi"]
    
    wealth = nonm.iloc[0]["agi"] * 1000 + inm.iloc[0]["agi"] * 1000
    diff = wealth - last
    last = wealth
    #print str(year) + "\t" + money(wealth) + "\t" + money(diff)
    print str(year) + "\t" + str(billion(wealth)) + "\t" + str(billion(diff))
    
    if (diff) > 0:
        gains += 1
    else:
        losses += 1

print "Gains: " + str(gains)
print "Losses: " + str(losses)

year	wealth	change
1993	59.76	59.76
1994	59.34	-0.42
1995	60.45	1.1
1996	65.99	5.54
1997	71.02	5.02
1998	76.1	5.08
1999	82.27	6.18
2000	89.96	7.69
2001	97.77	7.81
2002	94.08	-3.69
2003	91.44	-2.65
2004	92.84	1.4
2005	99.59	6.75
2006	103.04	3.45
2007	107.67	4.63
2008	115.19	7.52
2009	111.46	-3.73
2010	103.83	-7.63
2011	108.54	4.71
2012	136.3	27.75
2013	145.83	9.54
Gains: 15
Losses: 5


# Chart 3: Income (averaged per person)

In [58]:
## CHART 3

## Per-return income for in-migrants, out-migrants and non-migrants

print "year\tin-migrants\tout-migrants\tnon-migrants"

def money_people_average(row):
    money = row.iloc[0]["agi"] * 1000
    people = row.iloc[0]["returns"]
    avg = money/people
    return (money,people, avg)

for year in totals.keys():
    inm = money_people_average(inmigrants(year))
    outm = money_people_average(outmigrants(year))
    nonm = money_people_average(nonmigrants(year))

    print str(year) + "\t" + str(int(inm[2])) + "\t" + str(int(outm[2])) + "\t" + str(int(nonm[2]))

year	in-migrants	out-migrants	non-migrants
1993	50653	44087	47093
1994	52430	44993	47362
1995	49277	46112	48533
1996	53376	50638	52844
1997	57716	54438	56461
1998	58688	60334	60040
1999	64759	65130	63790
2000	69042	71419	68692
2001	72408	76723	74147
2002	67900	68836	70526
2003	65064	62229	68207
2004	67263	65949	69568
2005	70781	67239	74824
2006	69469	69603	77820
2007	70750	75291	80788
2008	74552	80161	83517
2009	67982	69470	80139
2010	65853	64131	75406
2011	73614	65171	78947
2012	81030	111397	97327
2013	91454	111753	104170


# Number of people moving in / out

In [59]:
print "year\tin\tout"
    
for year in totals.keys():
    inm = money_people_average(inmigrants(year))
    outm = money_people_average(outmigrants(year))
    nonm = money_people_average(nonmigrants(year))
    
    print str(year) + "\t" + str(int(inm[1])) + "\t" + str(int(outm[1]))

year	in	out
1993	33195	46731
1994	32765	45825
1995	35221	46476
1996	33780	44391
1997	35624	45841
1998	36956	44953
1999	37959	43575
2000	39333	43526
2001	39182	43809
2002	40096	42246
2003	39449	41126
2004	37264	44152
2005	38125	44772
2006	38256	45650
2007	36676	44687
2008	38072	44230
2009	36310	41322
2010	32415	38637
2011	33881	41151
2012	39062	46576
2013	39385	48362
