# Part 2: Spatial Interaction models
For this section, you will be given a “symbolic” population and the number of jobs for the stations in the underground. You will also be given the number of people that commute from one station to another, through an OD matrix.

## III. Models and calibration



### III.1. Briefly introduce the spatial interaction models covered in the lectures using equations and defining the terms, taking particular care in explaining the role of the parameters.


The spatial interaction models covered in the lectures include the unconstrained model, the production-constrained model, the attraction-constrained model, and the doubly-constrained model.

#### The unconstrained model is expressed as:

\begin{equation} \label{eq:1} \tag{1}
T_{ij} = k O_i^\alpha D_j^\gamma d_{ij}^{-\beta}
\end{equation}

where $T_{ij}$ represents the number of trips between origin $i$ and destination $j$, $O_i$ represents the total number of trips generated by origin $i$, $D_j$ represents the total number of trips attracted to destination $j$, and $d_{ij}$ represents the distance between origin $i$ and destination $j$. $k$ is a constant of proportionality, while $\alpha$, $\gamma$, and $\beta$ are parameters that control the sensitivity of the model to changes in $O_i$, $D_j$, and $d_{ij}$, respectively. All of them are to be estimated.

\begin{equation} \tag{2}
T= \sum_i \sum_j T_{ij}
\end{equation}

\begin{equation} \tag{3}
k = \frac{T}{\sum_i \sum_j O_i^\alpha  D_j^\gamma  d_{ij}^{-\beta}}
\end{equation}


The unconstrained (total-constrained) model assumes that all flows in the model sum to a known total $T$, given by Equation (2). The constant of proportionality $k$ is obtained by dividing $T$ by the sum of the remaining terms of the model over all origin-destination pairs, as shown in Equation (3). The model can be calibrated using observed flow data together with origin and destination attributes. The parameters capture how strongly each term influences the flows: $\alpha$ and $\gamma$ capture the effect of the origin and destination attributes, respectively, while $\beta$ captures the deterrent effect of distance.
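
As a small illustration of Equations (1) to (3), the sketch below computes $k$ and the resulting flows for a toy system. The function name `unconstrained_flows`, the toy values for `O`, `D` and `dist`, and the chosen parameter values are all hypothetical and are not taken from the Underground data.

```python
def unconstrained_flows(O, D, dist, total, alpha=1.0, gamma=1.0, beta=1.0):
    """Return (flows, k) for T_ij = k * O_i^alpha * D_j^gamma * d_ij^-beta."""
    # Unscaled term O_i^alpha * D_j^gamma * d_ij^-beta for every pair (i, j)
    raw = [
        [(O[i] ** alpha) * (D[j] ** gamma) * (dist[i][j] ** -beta)
         for j in range(len(D))]
        for i in range(len(O))
    ]
    # Equation (3): k scales the unscaled terms so that they sum to the known total T
    k = total / sum(sum(row) for row in raw)
    flows = [[k * value for value in row] for row in raw]
    return flows, k


# Toy data (hypothetical): 2 origins, 3 destinations
O = [100.0, 200.0]
D = [150.0, 50.0, 100.0]
dist = [[1.0, 2.0, 4.0], [3.0, 1.0, 2.0]]

flows, k = unconstrained_flows(O, D, dist, total=300.0, beta=2.0)
print(sum(sum(row) for row in flows))  # total of the estimated flows
```

Only the grand total is reproduced here; the individual row and column sums are free to differ from $O_i$ and $D_j$.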


#### The production-constrained model is expressed as:

\begin{equation} \tag{4}
T_{ij} = A_i O_i D_j^\gamma d_{ij}^{-\beta}
\end{equation}

subject to the origin constraint

\begin{equation} \tag{5}
O_i = \sum_j T_{ij}
\end{equation}

Here $A_i$ is the balancing factor for origin $i$, which ensures that the flow estimates from each origin sum to the known total $O_i$. The production-constrained model has no exponent on $O_i$ because it is a known constraint. $\gamma$ and $\beta$ are the exponents that determine the relative importance of destination attractiveness and distance in determining the flow between locations.

The balancing factor $A_i$ is calculated using

\begin{equation} \tag{6}
A_i = \frac{1}{\sum_j D_j^\gamma d_{ij}^{-\beta}}
\end{equation}

\begin{equation} \tag{7}
\lambda_{ij} = \exp (\alpha_i + \gamma \ln D_j - \beta \ln d_{ij})
\end{equation}

To calibrate the production-constrained model, a Poisson regression is used. The flows $T_{ij}$ are assumed to be Poisson distributed with mean $\lambda_{ij}$, and the logarithm of $\lambda_{ij}$ is modelled as a linear combination of the logged terms on the right-hand side of Equation (4). Equation (4) is therefore re-specified as Equation (7), which includes a fixed-effect term $\alpha_i$ that takes the place of the vector of balancing factors $A_i$ (together with the known totals $O_i$). In the Poisson regression, $\alpha_i$ is modelled as a categorical predictor, which means that a categorical identifier is used for the origin rather than the numeric value of $O_i$.
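
The role of the balancing factor in Equations (4) and (6) can be checked numerically. The sketch below is not the Poisson calibration itself; it simply applies the two equations for given values of $\gamma$ and $\beta$. The function name `production_constrained` and all toy values are hypothetical.

```python
def production_constrained(O, D, dist, gamma, beta):
    """Apply Equations (4) and (6) for fixed gamma and beta."""
    flows = []
    for i, o_i in enumerate(O):
        # D_j^gamma * d_ij^-beta for every destination j seen from origin i
        weights = [(D[j] ** gamma) * (dist[i][j] ** -beta) for j in range(len(D))]
        # Equation (6): A_i is the reciprocal of the summed weights
        a_i = 1.0 / sum(weights)
        # Equation (4): T_ij = A_i * O_i * D_j^gamma * d_ij^-beta
        flows.append([a_i * o_i * w for w in weights])
    return flows


# Toy data (hypothetical)
O = [100.0, 200.0]
D = [150.0, 50.0, 100.0]
dist = [[1.0, 2.0, 4.0], [3.0, 1.0, 2.0]]

pc_flows = production_constrained(O, D, dist, gamma=1.0, beta=2.0)
row_sums = [sum(row) for row in pc_flows]  # should reproduce O
```

Because $A_i$ normalises the weights of each origin, every row of the estimated matrix sums to its $O_i$ regardless of the parameter values.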

#### The attraction-constrained model is expressed as:

\begin{equation} \tag{8}
T_{ij} = D_j B_j O_i^\alpha d_{ij}^{-\beta}
\end{equation}

subject to the destination constraint

\begin{equation} \tag{9}
D_j = \sum_i T_{ij}
\end{equation}

and 

\begin{equation} \tag{10}
B_j = \frac{1}{\sum_i O_i^\alpha d_{ij}^{-\beta}}
\end{equation}

and

\begin{equation} \tag{11}
\lambda_{ij} = \exp (\alpha \ln O_i + \gamma_j - \beta \ln d_{ij})
\end{equation}

Here $B_j$ is the balancing factor for destination $j$, which ensures that the flow estimates to each destination sum to the known total $D_j$. The attraction-constrained model has no exponent on $D_j$ because it is a known constraint. $\alpha$ and $\beta$ are the exponents that determine the relative importance of origin size and distance in determining the flow between locations. In the Poisson regression form of Equation (11), the destination fixed effect $\gamma_j$ takes the place of the balancing factors $B_j$.


#### The doubly-constrained model is expressed as:

\begin{equation} \tag{12}
T_{ij} = A_i B_j O_i D_j d_{ij}^{-\beta}
\end{equation}

where $T_{ij}$ represents the flow between locations $i$ and $j$, $A_i$ and $B_j$ are the balancing factors for the origins and destinations respectively, $O_i$ and $D_j$ represent the total flow from location $i$ and to location $j$ respectively, $d_{ij}$ is the distance between locations $i$ and $j$, and $\beta$ is the exponent that determines the relative importance of distance in determining the flow between locations.

$A_i$ and $B_j$ are calculated as:

\begin{equation} \tag{13}
A_i = \frac{1}{\sum_j B_j D_j d_{ij}^{-\beta}}
\end{equation}

\begin{equation} \tag{14}
B_j = \frac{1}{\sum_i A_i O_i d_{ij}^{-\beta}}
\end{equation}

In the Poisson regression form, the model parameters are the origin fixed effects $\alpha_i$, which take the place of $A_i O_i$, and the destination fixed effects $\gamma_j$, which take the place of $B_j D_j$. The relationship between the flow and the model parameters is defined as:

\begin{equation} \tag{15}
\lambda_{ij} = \exp (\alpha_i + \gamma_j -\beta \ln d_{ij})
\end{equation}

where $\lambda_{ij}$ represents the predicted flow between locations i and j.

Because $A_i$ and $B_j$ depend on each other, they are calculated iteratively: both are initially set to 1, and each is then recalculated in turn from the latest values of the other until the change between successive iterations is small enough not to matter.
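
The iterative procedure for Equations (13) and (14) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `balance`, the tolerance, the iteration cap and the toy data are hypothetical, and the toy totals are chosen so that $\sum_i O_i = \sum_j D_j$.

```python
def balance(O, D, dist, beta, tol=1e-9, max_iter=1000):
    """Iterate Equations (13) and (14), then apply Equation (12)."""
    n, m = len(O), len(D)
    cost = [[dist[i][j] ** -beta for j in range(m)] for i in range(n)]
    A = [1.0] * n  # start with every balancing factor equal to 1
    B = [1.0] * m
    for _ in range(max_iter):
        # Equation (13): update A_i from the current B_j
        A_new = [1.0 / sum(B[j] * D[j] * cost[i][j] for j in range(m))
                 for i in range(n)]
        # Equation (14): update B_j from the freshly updated A_i
        B_new = [1.0 / sum(A_new[i] * O[i] * cost[i][j] for i in range(n))
                 for j in range(m)]
        # Largest change in any factor since the previous iteration
        change = max(
            max(abs(a - b) for a, b in zip(A_new, A)),
            max(abs(a - b) for a, b in zip(B_new, B)),
        )
        A, B = A_new, B_new
        if change < tol:
            break
    flows = [[A[i] * B[j] * O[i] * D[j] * cost[i][j] for j in range(m)]
             for i in range(n)]
    return A, B, flows


# Toy data (hypothetical); origin and destination totals both sum to 300
O = [100.0, 200.0]
D = [150.0, 50.0, 100.0]
dist = [[1.0, 2.0, 4.0], [3.0, 1.0, 2.0]]

A, B, dc_flows = balance(O, D, dist, beta=2.0)
row_sums = [sum(row) for row in dc_flows]
col_sums = [sum(dc_flows[i][j] for i in range(len(O))) for j in range(len(D))]
```

After convergence, the row sums reproduce $O_i$ and the column sums reproduce $D_j$, which is the defining property of the doubly-constrained model.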

In Python, the doubly-constrained model can be run in the same way as the singly-constrained models, by fitting Equation (15) with the origin and destination entered as categorical predictors. The parameter $\beta$ can be estimated using methods such as maximum likelihood estimation (as in a Poisson regression) or least squares estimation.


### III.2. Using the information of population, jobs and flows, select a spatial interaction model and calibrate the parameter for the cost function (usually denoted as $\beta$). It is essential that you justify the model selected.

I used the doubly-constrained model to compute the flows because it respects both the production (origin) and attraction (destination) constraints: the total flow leaving each origin station and the total flow arriving at each destination station are both known from the OD matrix. I specified the model formula using the variables "flows", "station_destination", "station_origin", and "log_distance", and removed the intercept with "-1". I then fitted the model as a Poisson regression and calculated the goodness-of-fit.

The resulting $R^2$ value indicated that the model explained 40.77% of the variance in the observed flows, and the RMSE indicated a typical error of 101.3 commuters per origin-destination pair. Although the model explained a substantial portion of the variance, there was still room to reduce the prediction errors.
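
For reference, the two goodness-of-fit measures can be computed as in the sketch below. It assumes that $R^2$ is taken as the squared Pearson correlation between observed and estimated flows, which is one common convention and may differ from the definition used for the figures above. The function names and sample numbers are hypothetical.

```python
import math


def r_squared(observed, estimated):
    """Squared Pearson correlation between observed and estimated flows."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov ** 2 / (var_o * var_e)


def rmse(observed, estimated):
    """Root mean squared error, in the same units as the flows."""
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Hypothetical observed and estimated flows for four OD pairs
obs = [10.0, 250.0, 40.0, 900.0]
est = [25.0, 180.0, 60.0, 700.0]
print(r_squared(obs, est), rmse(obs, est))
```

RMSE is expressed in commuters, so it is directly comparable between models fitted to the same flows, whereas $R^2$ is unitless.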

To account for cases where the effect of distance is less severe, I replaced the negative power law with a negative exponential cost function, $\exp(-\beta d_{ij})$. The resulting model had a higher $R^2$ of approximately 0.498 and a lower RMSE of 93.368, indicating a better fit to the data.

The value of $\beta$ calculated using the negative exponential model was approximately $1.544 \times 10^{-4}$, while the value calculated using the inverse power model was approximately 0.9096. Since the balancing factors $A_i$ and $B_j$ depend on each other, they were calculated iteratively. I obtained estimates of $A_i$ and $B_j$ by substituting the $\beta$ value calculated using the negative exponential model into the previously fitted Poisson model.
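
To see how the two cost functions treat distance, the sketch below evaluates both with rounded versions of the calibrated $\beta$ values, relative to a 1 km trip. It assumes that distance is measured in metres, which is not stated in this section; the function names and the sample distances are hypothetical.

```python
import math


def power_cost(d, beta):
    """Inverse power law: d^-beta."""
    return d ** -beta


def exp_cost(d, beta):
    """Negative exponential: exp(-beta * d)."""
    return math.exp(-beta * d)


BETA_POWER = 0.9096   # rounded value reported for the inverse power model
BETA_EXP = 0.000154   # rounded value reported for the negative exponential model

# Deterrence at each distance relative to a 1,000 m trip (distances assumed in metres)
for d in (1_000, 5_000, 20_000):
    rel_power = power_cost(d, BETA_POWER) / power_cost(1_000, BETA_POWER)
    rel_exp = exp_cost(d, BETA_EXP) / exp_cost(1_000, BETA_EXP)
    print(d, round(rel_power, 3), round(rel_exp, 3))
```

The two $\beta$ values are not directly comparable in magnitude, because one multiplies the logarithm of distance and the other multiplies distance itself.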

The table shows estimated passenger flows (SIM_est_exp) between stations. The highest estimated flow is from Waterloo to Bank and Monument, at 4,557 people. High flows are also estimated between stations such as Stratford and Liverpool Street, which serve business and financial centres and tourist attractions and act as major interchange points that must handle heavy passenger traffic. Similarly, the high estimated flows between Stratford and Canary Wharf can be attributed to the large number of commuters travelling between these two areas. Overall, the estimates seem reasonable and consistent with prior knowledge of the network.

## IV. Scenarios


### IV.1. Scenario A: assume that Canary Wharf has a 50% decrease in jobs after Brexit. Using the calibrated parameter $\beta$, compute the new flows for scenario A. Make sure the number of commuters is conserved, and explain how you ensured this.


Scenario A assumes that the number of jobs in Canary Wharf is reduced by 50% after Brexit. The predicted flows under this scenario are calculated using the adjusted jobsnew field, the $A_i$ and $B_j$ fields from the previous model, and the calibrated $\beta$ parameter. To conserve the total number of commuters, the predicted flows are scaled by a factor $k$, equal to the ratio of the total observed flows to the total predicted flows from scenario A. The new predicted flows are stored in a new column called "senarioA1".
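
The scaling step described above can be sketched as follows. The function name `scenario_a_flows`, the balancing-factor values and all toy data are hypothetical placeholders rather than values from the fitted model; the negative exponential cost function is assumed, and the first destination stands in for Canary Wharf.

```python
import math


def scenario_a_flows(A, B, O, jobs_new, dist, beta, observed_total):
    """Predict flows with adjusted jobs, then rescale to the observed total."""
    n, m = len(O), len(jobs_new)
    # Unscaled prediction using the existing balancing factors and new jobs
    raw = [
        [A[i] * O[i] * B[j] * jobs_new[j] * math.exp(-beta * dist[i][j])
         for j in range(m)]
        for i in range(n)
    ]
    # k = total observed flows / total predicted flows under the scenario
    k = observed_total / sum(sum(row) for row in raw)
    return [[k * value for value in row] for row in raw], k


# Toy data (hypothetical)
O = [100.0, 200.0]
jobs = [150.0, 50.0, 100.0]
jobs_new = [jobs[0] * 0.5] + jobs[1:]   # 50% fewer jobs at the first destination
A = [0.004, 0.005]                       # placeholder balancing factors
B = [1.0, 1.2, 0.9]
dist = [[1000.0, 2000.0, 4000.0], [3000.0, 1000.0, 2000.0]]

flows_a, k = scenario_a_flows(A, B, O, jobs_new, dist,
                              beta=0.000154, observed_total=sum(O))
print(sum(sum(row) for row in flows_a))  # equals the observed total
```

A single factor $k$ conserves the network-wide total, but the flows leaving each individual origin need not match $O_i$; recomputing the balancing factors would be required to preserve the per-origin totals as well.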

The table shows the top 10 origin-destination pairs with the largest increase in predicted passenger flow under scenario A. These include an increase of approximately 94 between Waterloo and Bank and Monument, approximately 63 between Ilford and Stratford, and approximately 60 between London Bridge and Bank and Monument. These increases may reflect commuters being redirected to other destinations as a result of the job losses at Canary Wharf.

A closer look at the scenario A results shows that the predicted increases in passenger flow are concentrated in central and east London, where many of the job losses are expected to occur. This suggests that the reduction in job opportunities may change commuting patterns, with commuters from these areas seeking employment elsewhere. The large increase in flow between Waterloo and Bank and Monument may also reflect the fact that these are major interchange stations on the Waterloo & City and Northern lines, which are important links for commuters travelling to the financial district.

These changes may also affect the resilience of the London Underground network. For example, a large increase in footfall at some stations could lead to overcrowding, reducing the efficiency and safety of operations. If the network cannot absorb these changes as it stands, adjustments and improvements may be required to keep it resilient and sustainable.

### IV.2. Scenario B: assume that there is a significant increase in the cost of transport. Select 2 values for the parameter in the cost function reflecting scenario B. Recompute the distribution of flows.


### IV.3. Discuss how the flows change for the 3 different situations: scenario A, and scenario B with two selections of parameters. Which scenario would have more impact in the redistribution of flows? Explain and justify your answers using the results of the analysis.