In [1]:
import arcpy
from arcgis.gis import GIS

import pandas as pd

In [2]:
# arcpy.env.workspace = r"C:\data\Classes\Current\Analysis In GIS"

# If running outside of ArcGIS Pro, you may prefer this:
arcpy.env.workspace = "."

arcpy.env.overwriteOutput = True
arcpy.env.addOutputsToMap = True

# Load Data

In [3]:
# Load the 2000-2016 county-level election results
#
# This data is assumed to be in the environment workspace and can be downloaded here:
# https://github.com/thomaspingel/geodata/raw/master/election/election.gpkg
#
# See: https://github.com/thomaspingel/geodata/tree/master/election
# for documentation of this data layer

input_layer = r"election.gpkg/data"
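
If the GeoPackage is not yet in the workspace, it can be fetched from the URL quoted above. The following is a minimal sketch using only the standard library; the helper name `download_if_missing` is hypothetical and not part of arcpy or this notebook.

```python
import os
import urllib.request

# URL quoted in the cell above; the local file name matches input_layer.
ELECTION_URL = "https://github.com/thomaspingel/geodata/raw/master/election/election.gpkg"


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


# Example (requires network access), run from the workspace folder:
# download_if_missing(ELECTION_URL, "election.gpkg")
```

Because the helper skips the download when the file already exists, it is safe to leave in a notebook that gets re-run.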

In [4]:
# While geoprocessing tools can operate on saved data, I often prefer to load my data into memory, particularly
# if I need to make modifications to it (like generating a unique_id field for use in OLS regression)
# Whether you do this or not is entirely dependent on what you're trying to accomplish

input_data = arcpy.CopyFeatures_management(input_layer, "in_memory/data")

# Adding a Unique ID Field

In [5]:
# If you have a license for it, the Add Incrementing ID Field tool will do this.
# If not, you can add a unique ID field like so:

arcpy.AddField_management(input_data,"unique_id","LONG")

# arcpy uses "cursors" to loop through rows in order to inspect and modify data.
# This is fairly awkward compared to the usual Pandas syntax.
# Here is an example of how it can be done:

with arcpy.da.UpdateCursor(input_data, ["unique_id"]) as cursor:
    # Number the rows 1, 2, 3, ... in cursor order
    for x, row in enumerate(cursor, start=1):
        row[0] = x
        cursor.updateRow(row)
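
For comparison, here is roughly what the same numbering looks like in Pandas. This is only a sketch on a small stand-in table (`df_demo` is a hypothetical name), and it changes the DataFrame only, not the feature class that the regression tools read.

```python
import pandas as pd

# Stand-in table; in the notebook this would be a DataFrame copy of the feature class.
df_demo = pd.DataFrame({"NAME": ["Pike", "Aurora", "Eau Claire"]})

# Number the rows 1..n, mirroring the cursor loop above.
df_demo["unique_id"] = range(1, len(df_demo) + 1)
```

The cursor is still needed when the ID has to live in the feature class itself, as it does for the OLS tool.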

# Inspecting data

In [6]:
# Inspecting data is most easily done by creating a Pandas DataFrame representation of the feature class like so.
# Keep in mind this is a copy of your data: changes to df do not affect the feature class.

df = pd.DataFrame.spatial.from_featureclass(input_data)

In [7]:
df.head()

Unnamed: 0,OBJECTID,STATEFP,COUNTYFP,COUNTYNS,AFFGEOID,GEOID,NAME,LSAD,ALAND,AWATER,...,dem_2012_prc,gop_minus_dem_prc_2012,gop_2016_votes,dem_2016_votes,totalvotes_2016,gop_2016_prc,dem_2016_prc,gop_minus_dem_prc_2016,unique_id,SHAPE
0,1,39,131,1074078,0500000US39131,39131,Pike,6,1140324458,9567612,...,49.08,0.01,7902.0,3539.0,11879.0,66.52,29.79,36.73,1,"{""rings"": [[[-83.35353099999998, 39.1975850000..."
1,2,46,3,1266983,0500000US46003,46003,Aurora,6,1834813753,11201379,...,39.71,17.72,974.0,340.0,1407.0,69.23,24.16,45.07,2,"{""rings"": [[[-98.80777099999995, 43.9352230000..."
2,3,55,35,1581077,0500000US55035,55035,Eau Claire,6,1652211310,18848512,...,55.95,-13.52,23311.0,27294.0,54885.0,42.47,49.73,-7.26,3,"{""rings"": [[[-91.65045499999997, 44.8559510000..."
3,4,72,145,1804553,0500000US72145,72145,Vega Baja,13,118766803,57805868,...,,,,,,,,,4,"{""rings"": [[[-66.44898899999998, 18.3872140000..."
4,5,48,259,1383915,0500000US48259,48259,Kendall,6,1715747531,1496797,...,17.11,64.47,15700.0,3643.0,20120.0,78.03,18.11,59.92,5,"{""rings"": [[[-98.92014699999999, 30.1382900000..."


In [9]:
df.columns.values

array(['OBJECTID', 'STATEFP', 'COUNTYFP', 'COUNTYNS', 'AFFGEOID', 'GEOID',
       'NAME', 'LSAD', 'ALAND', 'AWATER', 'FIPS', 'gop_2000_votes',
       'dem_2000_votes', 'totalvotes_2000', 'gop_2000_prc',
       'dem_2000_prc', 'gop_minus_dem_prc_2000', 'gop_2004_votes',
       'dem_2004_votes', 'totalvotes_2004', 'gop_2004_prc',
       'dem_2004_prc', 'gop_minus_dem_prc_2004', 'gop_2008_votes',
       'dem_2008_votes', 'totalvotes_2008', 'gop_2008_prc',
       'dem_2008_prc', 'gop_minus_dem_prc_2008', 'gop_2012_votes',
       'dem_2012_votes', 'totalvotes_2012', 'gop_2012_prc',
       'dem_2012_prc', 'gop_minus_dem_prc_2012', 'gop_2016_votes',
       'dem_2016_votes', 'totalvotes_2016', 'gop_2016_prc',
       'dem_2016_prc', 'gop_minus_dem_prc_2016', 'unique_id', 'SHAPE'],
      dtype=object)
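
Since the table has many similarly named columns, it can help to select them by prefix and check for missing values before modeling. The sketch below uses a stand-in DataFrame (`sample`, a hypothetical name) holding only the five rows and two margin columns visible in the `df.head()` output above; on the real data you would use `df` directly.

```python
import numpy as np
import pandas as pd

# Stand-in for df: the five rows shown by df.head(), two margin columns only.
sample = pd.DataFrame({
    "NAME": ["Pike", "Aurora", "Eau Claire", "Vega Baja", "Kendall"],
    "gop_minus_dem_prc_2012": [0.01, 17.72, -13.52, np.nan, 64.47],
    "gop_minus_dem_prc_2016": [36.73, 45.07, -7.26, np.nan, 59.92],
})

# Pick every margin column by its shared prefix.
margin_cols = [c for c in sample.columns if c.startswith("gop_minus_dem_prc_")]

# Count missing values per column and list the rows that have any.
missing_per_column = sample[margin_cols].isna().sum()
rows_with_gaps = sample.loc[sample[margin_cols].isna().any(axis=1), "NAME"].tolist()
```

In this sample, the only row with missing margins is Vega Baja, which has no election results in the table.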

# Ordinary Least Squares

The OLS run looks like this.  

* It requires an input of the feature class, a unique integer ID, and the output feature class of residuals.
* The next two parameters are the dependent variable (what you're trying to predict) and independent variables.
    * If you need more than one independent variable, separate them by semicolons
* The last three optional parameters include diagnostic tables and reports

See the [OLS documentation for more information](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/ordinary-least-squares.htm).

The results are not formatted as nicely in Notebook form, but you can see all the parameters, including the R<sup>2</sup> values, in the output.  The R<sup>2</sup> value is the third item under "OLS Diagnostics".

Results are also written out more nicely to the console (Python Command Window).
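
As a rough cross-check that does not depend on ArcGIS, an ordinary least squares fit and its R<sup>2</sup> can be computed with NumPy on the DataFrame copy. This is a sketch, not the tool's implementation: the function name `ols_r_squared` is hypothetical, rows with missing values are simply dropped, and only plain (not adjusted) R<sup>2</sup> is returned, so the numbers may not match the tool's report exactly.

```python
import numpy as np
import pandas as pd


def ols_r_squared(frame, dependent, independents):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    data = frame[[dependent] + independents].dropna()
    y = data[dependent].to_numpy(dtype=float)
    # Design matrix: a column of ones for the intercept, then the predictors.
    X = np.column_stack([np.ones(len(data)), data[independents].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic check data (not election results): y = 1 + 2*x1 + 3*x2 exactly.
demo = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0, 4.0], "x2": [1.0, 0.0, 2.0, 1.0, 3.0]})
demo["y"] = 1 + 2 * demo["x1"] + 3 * demo["x2"]
coef, r2 = ols_r_squared(demo, "y", ["x1", "x2"])
```

On the election data, the equivalent call would be `ols_r_squared(df, "gop_minus_dem_prc_2016", ["gop_minus_dem_prc_2012", "gop_minus_dem_prc_2008"])`.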

In [8]:
arcpy.OrdinaryLeastSquares_stats(input_data,"unique_id",'ols_residuals.shp',
                                 'gop_minus_dem_prc_2016','gop_minus_dem_prc_2012;gop_minus_dem_prc_2008',
                                 'ols_coefficients.dbf','ols_diagnostics.dbf','ols_output_report.pdf')

id,value
0,.\ols_residuals.shp
1,.\ols_coefficients.dbf
2,.\ols_diagnostics.dbf
3,.\ols_output_report.pdf


# Generalized Linear Regression

GLR is a newer version of the OLS tool.  It doesn't require a unique_id, and includes the ability to specify continuous, count (Poisson), and binary (logistic) models.

The GLR run looks like this.  

* It requires inputs of:
    * The feature class
    * The dependent variable field
    * The model type: CONTINUOUS, COUNT, or BINARY
    * The output feature class for the residuals
    * The independent variables.  If more than one, separate with semicolons
    * More optional parameters

See the [GLR documentation](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/generalized-linear-regression.htm) for a full explanation.
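
When the variable list is built in code, it is convenient to keep the field names in a Python list and join them only when calling the tool. The helpers below are a small sketch; `explanatory_string` and `check_model_type` are hypothetical names, and the allowed model types are just the three listed above.

```python
MODEL_TYPES = ("CONTINUOUS", "COUNT", "BINARY")


def explanatory_string(fields):
    """Join field names into the semicolon-delimited form described above."""
    return ";".join(fields)


def check_model_type(model_type):
    """Return the model type in upper case, or raise if it is not one of the three."""
    normalized = model_type.upper()
    if normalized not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}, got {model_type!r}")
    return normalized


independents = ["gop_minus_dem_prc_2012", "gop_minus_dem_prc_2008"]
explanatory = explanatory_string(independents)
model_type = check_model_type("continuous")
```

Checking the model type up front catches typos before the tool is called.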

The results are not formatted as nicely in Notebook form, but you can see all the parameters, including the R<sup>2</sup> values, in the output.  The R<sup>2</sup> value is the third item under "GLR Diagnostics".

Results are also written out more nicely to the console (Python Command Window).

In [9]:
arcpy.stats.GeneralizedLinearRegression(input_data,"gop_minus_dem_prc_2016",'CONTINUOUS',
                                        'glr_residuals.shp','gop_minus_dem_prc_2012;gop_minus_dem_prc_2008')

id,value
0,.\glr_residuals.shp
1,


# Geographically Weighted Regression

GWR uses nearby neighbors to improve prediction, and can be considered a spatial extension to OLS/GLR.
It doesn't require a unique_id, and includes the ability to specify continuous, count (Poisson), and binary (logistic) models.

The GWR run looks like this.

It requires inputs of:
* The feature class
* The dependent variable field
* The model type: CONTINUOUS, COUNT, or BINARY
* The independent variables. If more than one, separate with semicolons
* The output feature class for the residuals [Note: this parameter and the previous one are in the reverse order from GLR]
* Two parameters to specify how neighbors are defined
* More optional parameters

See the [GWR documentation](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/geographicallyweightedregression.htm) for a full explanation.

The results are more nicely formatted than those of the other two tools.
R<sup>2</sup> values are given in the last table, "Model Diagnostics".
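
Because the output and independent-variable parameters swap positions between GLR and GWR, one way to avoid mixing them up is to pass arguments by keyword. The sketch below only builds the argument dictionaries; the keyword names are assumed from the Esri tool reference and should be checked against your ArcGIS Pro version before use.

```python
explanatory = "gop_minus_dem_prc_2012;gop_minus_dem_prc_2008"

# Keyword names are an assumption based on the tool reference; verify before use.
glr_args = {
    "in_features": "in_memory/data",
    "dependent_variable": "gop_minus_dem_prc_2016",
    "model_type": "CONTINUOUS",
    "output_features": "glr_residuals.shp",
    "explanatory_variables": explanatory,
}

gwr_args = {
    "in_features": "in_memory/data",
    "dependent_variable": "gop_minus_dem_prc_2016",
    "model_type": "CONTINUOUS",
    "explanatory_variables": explanatory,
    "output_features": "gwr_residuals.shp",
    "neighborhood_type": "NUMBER_OF_NEIGHBORS",
    "neighborhood_selection_method": "GOLDEN_SEARCH",
}

# With arcpy available, the calls would then be:
# arcpy.stats.GeneralizedLinearRegression(**glr_args)
# arcpy.stats.GWR(**gwr_args)
```

With keywords, the positional order no longer matters, so the same habit works for both tools.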

In [10]:
arcpy.stats.GWR(input_data,"gop_minus_dem_prc_2016",'CONTINUOUS',
                'gop_minus_dem_prc_2012;gop_minus_dem_prc_2008','gwr_residuals.shp',
                "NUMBER_OF_NEIGHBORS","GOLDEN_SEARCH")

id,value
0,.\gwr_residuals.shp
1,
2,
