# How to transfer data from one GIS layer to another - Overlay

## Setup

Suppose we have two separate GeoDataFrames, both containing different pieces of information about the same route, but the line breaks don't line up across the two GeoDataFrames. 

Here is a visual example of the problem:

![example setup](union_setup.png "Example Setup")
Figure 1: Example Setup

The blue line on top (labeled "Reference") has three segments: from 0 to 2, from 2 to 4 and from 4 to 5. 
The orange line on the bottom (labeled "Input") has two segments: from 1 to 3, and from 3 to 6. 
Notice how the breaks don't line up nicely across them, and there are even parts where the Reference layer has data and the Input doesn't (and vice versa). Furthermore, notice how the Reference layer has information about the `Value` variable for each of its segments, while the Input has information about the `Category` variable for each of its segments.

It is worth noting that, in this example, all of the lines shown are assumed to belong to the same Route ID, so that information is omitted from the images.

## The idea behind the solution

Situations like this one are quite common: each layer's linework was generated with different breakpoints, each layer carries its own important information, and we are tasked with combining the data to determine the characteristics of the entire route for all variables across the disparate layers. 

The idea behind the solution is quite simple. A new DataFrame is created with a break at every breakpoint that appears in either of the two layers. You can think about this process as finding the smallest common pieces, one little segment at a time, across the Reference and Input layers. 

![how the results look](union_result.png "Example Results")
Figure 2: Solution

In the output, we can see several smaller segments: one from 0 to 1, another from 1 to 2, and so on. This way, it becomes easy to figure out what the `Value` and `Category` values need to be for each little segment in the output.
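
The idea can be sketched in plain Python before bringing in any library. This is only an illustration of the concept on the Figure 1 data, not how `linref` implements it; the `lookup` helper and the tuple layout are assumptions made for this sketch, and it handles a single route only.

```python
# Each layer is a list of (beg, end, attribute) tuples for a single route.
reference = [(0, 2, 3.5), (2, 4, 6.0), (4, 5, 1.5)]  # carries `Value`
input_layer = [(1, 3, 'A'), (3, 6, 'B')]              # carries `Category`


def lookup(layer, beg, end):
    """Return the attribute of the segment in `layer` that fully covers
    [beg, end], or None if no segment covers it (hypothetical helper)."""
    for seg_beg, seg_end, attr in layer:
        if seg_beg <= beg and end <= seg_end:
            return attr
    return None


# 1. Collect every breakpoint from both layers, without duplicates, and sort.
breaks = sorted({p for beg, end, _ in reference + input_layer for p in (beg, end)})

# 2. Each pair of consecutive breakpoints becomes one output segment.
# 3. For each output segment, look up the attribute from each layer.
segments = [
    (beg, end, lookup(reference, beg, end), lookup(input_layer, beg, end))
    for beg, end in zip(breaks, breaks[1:])
]

for row in segments:
    print(row)
```

Because every output segment lies between two consecutive breakpoints, it can never straddle a boundary in either layer, which is why a simple containment check is enough to find its `Value` and `Category`.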

## Solving the problem with code

In `linref`, we can solve this problem using a process called `union`. For reference, this process is sometimes called "overlay" (or overlaying) in other software, such as ArcGIS Pro. 

Let's walk through how we would make this work using `linref`.

### Basic setup
The first thing to do is import `pandas` and `linref`, then create the input data we will use.

In [1]:
# Loading important libraries
import pandas as pd
import linref as lr

In [2]:
# Creating synthetic data 
reference_df = pd.DataFrame({'reference_rowid':[1,2,3],
                             'route_id':['main_route','main_route','main_route'],
                             'beg':[0,2,4],
                             'end':[2,4,5],
                             'val':[3.5,6,1.5]})

reference_ec = lr.EventsCollection(reference_df,
                                   keys=['route_id'],
                                   beg='beg',
                                   end='end',
                                   )

input_df = pd.DataFrame({'input_rowid':[100,200],
                         'route_id':['main_route','main_route'],
                         'beg':[1,3],
                         'end':[3,6],
                         'categ':['A','B']
                         })

input_ec = lr.EventsCollection(input_df, 
                               keys=['route_id'],
                               beg='beg',
                               end='end',
                               )

### Creating an `EventsUnion` and executing a `union()`

Now we have two `EventsCollection` objects: `reference_ec` (analogous to the blue "Reference" linework) and `input_ec` (analogous to the orange "Input" linework), identical to the setup shown in Figure 1 above. 

The only thing we need to do is create an `EventsUnion` object and then execute the `union()` method, as shown below.

In [3]:
# Creating the EventsUnion object
eu = lr.EventsUnion([reference_ec,input_ec])

# Executing the `union()` method
union_ec = eu.union()

Once the `union()` method is executed, it creates a new `EventsCollection` object containing the results.

We can then look inside the resulting `DataFrame` as follows:

In [4]:
# Extracting the DataFrame from the result of the EventsUnion
union_df = union_ec.df.copy()

print(union_df)

     route_id  beg  end  index_0  index_1
0  main_route  0.0  1.0      0.0      NaN
1  main_route  1.0  2.0      0.0      0.0
2  main_route  2.0  3.0      1.0      0.0
3  main_route  3.0  4.0      1.0      1.0
4  main_route  4.0  5.0      2.0      1.0
5  main_route  5.0  6.0      NaN      1.0


### Merging information from the original DataFrames into the output from the `union()`

The resulting `DataFrame` is broken down into the smallest common segments across both members of the union, just like we showed in Figure 2. 

The two `index` columns (`index_0` and `index_1`) describe how each segment relates to the rows in the Reference and Input layers. Specifically, `index_0` and `index_1` are the indexes of the rows of the `reference_df` and `input_df` objects, respectively. A `NaN` in either column means that the corresponding layer has no data for that segment, as in the first and last rows above. 

Therefore, if we want to join or merge any data from those original layers, we can just use a basic pandas `merge()`, with `left_on="index_0"` (or `left_on="index_1"`) and with `right_index=True`. 

In [5]:
# Merging the output of the EventsUnion with the original `reference_df` and `input_df` 
# to carry over the `'val'` and `'categ'` columns, respectively.
out_df = (union_df
          .merge(reference_df[['val']], 
                 how='left', 
                 left_on='index_0', 
                 right_index=True)
          .merge(input_df[['categ']], 
                 how='left', 
                 left_on='index_1', 
                 right_index=True)
          )

print(out_df)

     route_id  beg  end  index_0  index_1  val categ
0  main_route  0.0  1.0      0.0      NaN  3.5   NaN
1  main_route  1.0  2.0      0.0      0.0  3.5     A
2  main_route  2.0  3.0      1.0      0.0  6.0     A
3  main_route  3.0  4.0      1.0      1.0  6.0     B
4  main_route  4.0  5.0      2.0      1.0  1.5     B
5  main_route  5.0  6.0      NaN      1.0  NaN     B


Finally, the result looks exactly like the green Output line shown in Figure 2, where each little segment has information about both the `Value` and `Category` variables. 
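
Segments covered by only one layer carry `NaN` in the corresponding index column, as rows 0 and 5 of the merged table show. If only the portions where both layers have data are needed, one option is to drop those rows with pandas. The sketch below rebuilds the merged table by hand, copied from the printed output above, so that it runs without `linref`; the name `both_df` is chosen here purely for illustration.

```python
import pandas as pd

# Merged table copied from the printed output above (None becomes NaN/missing).
out_df = pd.DataFrame({
    'route_id': ['main_route'] * 6,
    'beg': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    'end': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'index_0': [0.0, 0.0, 1.0, 1.0, 2.0, None],
    'index_1': [None, 0.0, 0.0, 1.0, 1.0, 1.0],
    'val': [3.5, 3.5, 6.0, 6.0, 1.5, None],
    'categ': [None, 'A', 'A', 'B', 'B', 'B'],
})

# Keep only the segments that are covered by both the Reference and Input layers.
both_df = (out_df
           .dropna(subset=['index_0', 'index_1'])
           .reset_index(drop=True)
           )

print(both_df)
```

Filtering on the index columns rather than on `val` or `categ` avoids dropping segments whose source row exists but happens to hold a missing attribute value.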