## Problem 3: How far did people travel? (8 points)

During this task, the aim is to calculate the (air-line) distance in meters that each social media user in the data set prepared in *Problem 2* has travelled in-between the posts. We’re interested in the Euclidean distance between subsequent points generated by the same user.

For this, we will need to use the `userid` column of the data set `kruger_posts.shp` that we created in *Problem 2*.

Answer the following questions:
- What was the shortest distance a user travelled between all their posts (in meters)?
- What was the mean distance travelled per user (in meters)?
- What was the maximum distance a user travelled (in meters)?

---


### a) Read the input file and re-project it

- Read the input file `kruger_points.shp` into a geo-data frame `kruger_points`
- Transform the data from WGS84 to an `EPSG:32735` projection (UTM Zone 35S, suitable for South Africa). This CRS has *metres* as units.

In [1]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import pathlib
from shapely.geometry import LineString, Polygon, Point

In [2]:
# ADD YOUR OWN CODE HERE
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_PATH = NOTEBOOK_PATH / "data" / "kruger_points.shp"
kruger_points = gpd.read_file(DATA_PATH)

# Transform projection
kruger_points = kruger_points.to_crs("EPSG:32735")

In [3]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Check the data
kruger_points.head()

Unnamed: 0,lat,lon,timestamp,userid,geometry
0,-24.980792,31.484633,2015-07-07 03:02,66487960,POINT (952912.890 7229683.258)
1,-25.499225,31.508906,2015-07-07 03:18,65281761,POINT (953433.223 7172080.632)
2,-24.342578,30.930866,2015-03-07 03:38,90916112,POINT (898955.144 7302197.408)
3,-24.854614,31.519718,2015-10-07 05:04,37959089,POINT (956927.218 7243564.942)
4,-24.921069,31.520836,2015-10-07 05:19,27793716,POINT (956794.955 7236187.926)


In [4]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Check that the crs is correct after re-projecting (should be epsg:32735)
import pyproj
assert kruger_points.crs == pyproj.CRS("EPSG:32735")

### b) Group the data by user id

Group the data by `userid` and store the grouped data in a variable `grouped_by_users`

In [5]:
# ADD YOUR OWN CODE HERE

In [6]:
grouped_by_users = kruger_points.groupby(['userid'])

In [7]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Check the number of groups:
assert len(grouped_by_users.groups) == kruger_points["userid"].nunique(), "Number of groups should match number of unique users!"

### c) Create `shapely.geometry.LineString` objects for each user connecting the points from oldest to most recent

There are multiple ways to solve this problem (see the [hints for this exercise](https://autogis-site.readthedocs.io/en/latest/lessons/lesson-2/exercise-2.html). You can use, for instance, a dictionary or an empty GeoDataFrame to collect data that is generated using the steps below:

- Use a for-loop to iterate over the grouped object. For each user’s data: 
    - [sort](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) the rows by timestamp 
    - create a `shapely.geometry.LineString` based on the user’s points

**CAREFUL**: Remember that every LineString needs at least two points. Skip users who have less than two posts.

Store the results in a `geopandas.GeoDataFrame` called `movements`, and remember to assign a CRS.

In [8]:
# ADD YOUR OWN CODE HERE
movements = {"geometry":[]}

for userid, group in grouped_by_users:
    sorted_group = group.sort_values(by = ["timestamp"])

    if len(sorted_group) >= 2:
        # Create a LineString from the sorted points
        lines = LineString(sorted_group[["lat", "lon"]].values)
        movements["geometry"].append(lines)

movements = gpd.GeoDataFrame(movements, crs="EPSG:32735")

In [9]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Check the result
print(type(movements))
print(movements.crs)

movements

<class 'geopandas.geodataframe.GeoDataFrame'>
EPSG:32735


Unnamed: 0,geometry
0,"LINESTRING (-24.760 31.371, -24.750 31.338, -2..."
1,"LINESTRING (-25.321 31.026, -25.321 31.026)"
2,"LINESTRING (-24.770 31.394, -24.993 31.593, -2..."
3,"LINESTRING (-25.329 31.000, -25.329 31.000)"
4,"LINESTRING (-25.067 31.551, -24.993 31.593)"
...,...
9021,"LINESTRING (-25.290 31.000, -25.295 31.011, -2..."
9022,"LINESTRING (-24.993 31.593, -24.993 31.592, -2..."
9023,"LINESTRING (-24.305 31.322, -24.305 31.322)"
9024,"LINESTRING (-24.299 31.293, -24.276 31.299)"


### d) Calculate the distance between all posts of a user

- Check once more that the CRS of the data frame is correct
- Compute the lengths of the lines, and store it in a new column called `distance`

In [10]:
# ADD YOUR OWN CODE HERE
def calculate_distance(row):
    return row["geometry"].length

movements["distance"] = movements.apply(calculate_distance, axis = 1)

In [11]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

#Check the output
movements.head()

Unnamed: 0,geometry,distance
0,"LINESTRING (-24.760 31.371, -24.750 31.338, -2...",3.158937
1,"LINESTRING (-25.321 31.026, -25.321 31.026)",0.0
2,"LINESTRING (-24.770 31.394, -24.993 31.593, -2...",1.490359
3,"LINESTRING (-25.329 31.000, -25.329 31.000)",7.28011e-07
4,"LINESTRING (-25.067 31.551, -24.993 31.593)",0.08527984


### e) Answer the original questions

You should now be able to quickly find answers to the following questions: 
- What was the shortest distance a user travelled between all their posts (in meters)? (store the value in a variable `shortest_distance`)
- What was the mean distance travelled per user (in meters)? (store the value in a variable `mean_distance`)
- What was the maximum distance a user travelled (in meters)? (store the value in a variable `longest_distance`)

In [12]:
# ADD YOUR OWN CODE HERE
shortest_distance = round(movements["distance"].min(), 2)
mean_distance = round(movements["distance"].mean(), 2)
longest_distance = round(movements["distance"].max(), 2)

print(f"Shortest: {shortest_distance}; Mean: {mean_distance}; Longest: {longest_distance}")

Shortest: 0.0; Mean: 1.0; Longest: 63.95


### f) Save the movements in a file

Save the `movements` into a new Shapefile called `movements.shp` inside the `data` directory.

In [13]:
# ADD YOUR OWN CODE HERE
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
movements.to_file(DATA_DIRECTORY / "movements.shp")

In [14]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

assert (DATA_DIRECTORY / "movements.shp").exists()


---

# Fantastic job!

That’s all for this week! 