In [1]:
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from pathlib import Path
import shutil
import gdown
from exif import Image
import gpxpy
import gpxpy.gpx

# Project scope 🔍

**Mi camino** aims to be a small project for tracking my daily progress during my *Camino de Santiago* by bike. It will consist of several Python scripts that perform the following actions:

- Download all files in a *Google Drive* folder (gpx files and pictures) at set intervals.
- Parse the gpx files.
- Obtain the location of the pictures from their metadata.
- Display the latest progress (parsed gpx files) on a map via *Folium*.
- Send the pictures to my *Raspberry Pi* and display them on the map with a marker at the point where they were taken.
- Display the current progress (km/elevation).
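The last item, progress in km and elevation, boils down to summing over consecutive GPS points. A minimal sketch, assuming the points are `(latitude, longitude, elevation)` tuples and using the haversine formula with a mean Earth radius of 6371 km (an approximation); `haversine_km` and `progress` are hypothetical helpers, not part of the project code yet:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def progress(points):
    """Return (distance in km, elevation gain in m) for (lat, lon, elevation) tuples in recording order."""
    distance = 0.0
    gain = 0.0
    for (lat1, lon1, ele1), (lat2, lon2, ele2) in zip(points, points[1:]):
        distance += haversine_km(lat1, lon1, lat2, lon2)
        if ele2 > ele1:  # only climbs count towards the elevation gain
            gain += ele2 - ele1
    return distance, gain

# First three points of the Camino track parsed later in this notebook
track = [(43.010221, -1.319525, 953.053),
         (43.009372, -1.319931, 951.804),
         (43.009108, -1.319748, 950.914)]
km, climb = progress(track)
print(f"{km:.3f} km, {climb:.1f} m climbed")
```

GPS elevation is noisy, so a real track will likely report more climbing than the terrain justifies; smoothing the elevations before summing would help.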

For convenience, I chose to use a *Google Drive* folder to store all the gpx files and pictures I want to upload during the route. This way I can limit data uploads to once or twice a day and keep the scripts *relatively* simple, with just a few lines of web scraping.

## Downloading data from *Google Drive*

*Google* provides an API to interact with *Drive*, but since I only want to download a few files at a time and perform no uploads whatsoever, I found a much simpler way to do it: the *gdown* library.

Once the folder is downloaded, it's simply a matter of accessing it using our beloved *Selenium*. Note that I'm not using a manually installed *chromedriver* but a library (*webdriver_manager*) that automatically downloads and runs it for you, eliminating the risk of an out-of-date chromedriver.

In [2]:
# For security purposes it's good practice to store private links in txt files and add them to .gitignore

with open('download_link.txt', 'r') as f: # Reading the file containing the link
    link = f.readline().strip() # strip() removes the trailing newline

In [3]:
# Downloading the folder using gdown

url = link
gdown.download_folder(url, quiet=True, use_cookies=False)

['C:\\Users\\User\\micamino\\camino\\Primera_etapa_Pirinexus.gpx',
 'C:\\Users\\User\\micamino\\camino\\PXL_20220629_145708603.jpg']

## Reading the downloaded files

The download folder now contains both types of files we'll encounter: *gpx* files and pictures. Let's sort the filenames into those two categories.

In [5]:
directory = r'C:\Users\User\micamino\camino' # Our download folder
files = Path(directory).glob('*') # Using all files in the folder as input
files = list(files)

gpx = [] # We'll hold gpx file paths
images = [] # Same for images
for file in files:
    if '.jpg' in str(file): # Filtering by filename
        images.append(file)
    elif '.gpx' in str(file):
        gpx.append(file)
    else:
        pass
    
print(gpx)
print(images)

[WindowsPath('C:/Users/User/micamino/camino/Primera_etapa_Pirinexus.gpx')]
[WindowsPath('C:/Users/User/micamino/camino/PXL_20220629_145708603.jpg')]
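A substring test on the full path works for these filenames, but an extension check with `Path.suffix` is a bit stricter and also handles upper-case extensions such as `.JPG`. A sketch with a hypothetical `classify_files` helper (accepting `.jpeg` as well is an assumption), run against a throw-away temporary folder with made-up filenames:

```python
import tempfile
from pathlib import Path

def classify_files(directory):
    """Split the files in `directory` into gpx tracks and jpg images by extension."""
    gpx, images = [], []
    for file in sorted(Path(directory).iterdir()):
        suffix = file.suffix.lower()  # '.JPG' and '.jpg' are treated the same
        if suffix in ('.jpg', '.jpeg'):
            images.append(file)
        elif suffix == '.gpx':
            gpx.append(file)
    return {'gpx': gpx, 'images': images}

# Demo on a temporary folder with hypothetical filenames
with tempfile.TemporaryDirectory() as tmp:
    for name in ['stage_1.gpx', 'PXL_0001.JPG', 'notes.txt']:
        (Path(tmp) / name).touch()
    result = classify_files(tmp)
    print([f.name for f in result['gpx']], [f.name for f in result['images']])
```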


## Image processing: obtaining coordinates

Since we want to display on the map the images with a marker on the point they were taken, we will need to extract their coordinates. This can be achieved via the *EXIF* data embedded in each picture. Let's try it!

In [6]:
# Let's open the image

img_path = images[0] # Using the path we just obtained
with open(img_path, 'rb') as src:
    img = Image(src)

In [7]:
# Now let's access its longitude

img.gps_longitude

(2.0, 24.0, 8.57)

As we can see, the longitude and latitude are expressed in degrees, minutes and seconds. We'll need a small function to convert those coordinates to decimal degrees, which will also streamline the process.
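Applied to the longitude above, the conversion (degrees, plus minutes over 60, plus seconds over 3600) works out as follows; the sign is then flipped for `S` or `W` references:

$$
\text{decimal} = d + \frac{m}{60} + \frac{s}{3600} = 2 + \frac{24}{60} + \frac{8.57}{3600} \approx 2 + 0.4 + 0.0023806 = 2.4023806
$$

This matches the longitude of 2.40238... that the `coordinates` function returns further down.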

In [8]:
# Let's first define a function that simply converts the coordinates to decimal degrees.
# We have to take the orientation (ref) into account: latitudes south of the equator and longitudes west of Greenwich are negative.

def converter(coords, ref):
    ####################
    # Input: coordinates (degrees, minutes, seconds) and ref (N/S/E/W) of the picture, as returned by the exif parser
    # Output: the coordinates in decimal degrees
    ####################
    decimal_degrees = coords[0] + coords[1] / 60 + coords[2] / 3600 # Converting to decimal degrees
    if ref == 'S' or ref == 'W':
        decimal_degrees = -decimal_degrees # Changing sign if it's facing south or west
    return decimal_degrees

In [9]:
# Now let's incorporate it into a new function that returns the coordinates if there are any, and simply
# returns False if there aren't. This way we can use the same function both to check whether an image has
# coordinates and to retrieve them.

def coordinates(image_path):
    ####################
    # Input: path of an image
    # Output: (latitude, longitude) in decimal degrees if available, False in any other case
    ####################
    with open(image_path, 'rb') as src: # Accessing the image passed as argument
        img = Image(src)
    if not img.has_exif:
        return False # No EXIF metadata at all
    try:
        coords = (converter(img.gps_latitude, # Using our previously defined function
                            img.gps_latitude_ref),
                  converter(img.gps_longitude,
                            img.gps_longitude_ref))
    except Exception:
        return False # Missing GPS tags (or any parsing error) mean there are no coordinates

    return coords # Returning the coords

Success!

In [10]:
#Let's try it out with the image we downloaded

coordinates(images[0])

(41.65405277777778, 2.4023805555555553)

# Flowchart 🌊

The tool behind the **Mi camino** webpage will have two main components: the main loop and the map creator.

The main loop will check every *xx* minutes whether there are new files (either gpx files or images) in the shared folder, and process them accordingly.

The map creator will use the files generated or updated by the main loop to create a new map, which will be displayed on the website.
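The main loop's schedule could be as simple as a timed `while` loop. A minimal sketch with a hypothetical `run_periodically` helper; the actual interval (the *xx* minutes above) is still open, and if the loop ends up running on the *Raspberry*, a cron job would be an equally valid choice:

```python
import time

def run_periodically(task, interval_s, iterations=None):
    """Call `task()` every `interval_s` seconds and return how many times it ran.

    `iterations=None` loops forever (the intended main-loop behaviour);
    a finite number is handy for testing.
    """
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        task()
        count += 1
        if iterations is None or count < iterations:
            # Sleep for whatever is left of the interval, never a negative amount
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_s - elapsed))
    return count

calls = []
n = run_periodically(lambda: calls.append('checked'), interval_s=0.01, iterations=3)
print(n, calls)
```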

I will now proceed to explain every part in detail, as well as the file system I'll have in place:

## File system

The *gpx* files will stay in the original folder, since they only need to be parsed once. All images will be moved to a separate **img** folder, where they will be indexed and accessed by the website (via **Nginx**).
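Moving a picture into **img** only needs the standard library. A sketch with a hypothetical `move_images` helper that keeps the original filename; the folder layout in the demo is made up and lives in a temporary directory, and the "picture" is placeholder bytes:

```python
import shutil
import tempfile
from pathlib import Path

def move_images(image_paths, img_dir):
    """Move each image into `img_dir` (created if missing) and return the new paths."""
    img_dir = Path(img_dir)
    img_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in image_paths:
        target = img_dir / Path(path).name  # keep the original filename
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved

with tempfile.TemporaryDirectory() as tmp:
    download_dir = Path(tmp) / 'camino'
    download_dir.mkdir()
    photo = download_dir / 'PXL_20220629_145708603.jpg'
    photo.write_bytes(b'placeholder')  # not a real JPEG, just something to move
    new_paths = move_images([photo], Path(tmp) / 'img')
    moved_ok = new_paths[0].exists() and not photo.exists()
    print(new_paths[0].name, moved_ok)
```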

Both the main loop and the map creator will use and access several *csv* files, which will mainly act as lightweight dataframe holders. Using *pkl* files was also considered, but it wasn't worth the hassle since read/write speeds aren't critical in our use case. 

The *csv* files that will be used consist of the following:

- **file_log.csv**: contains the original path of every processed file, to prevent duplicates.

- **images.csv**: contains both the filepath and coordinates of every picture.

- **route.csv**: holds the parsed gpx files of the route I've cycled up to that point. Every row is a point, as per the GPX standard.

- **camino.csv**: the original Camino de Santiago route, more specifically the French Way. Since every day I'll be traversing part of this route, it will get shorter accordingly: its contents will always be the original route minus the part already covered in *route.csv*.

- **markers.csv**: contains the information (coordinates, text, HTML code...) needed to create map markers. For example, there will be a marker at both the start and end of the route and at the end of every day's journey, which is vital for tracking overall progress.
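A minimal sketch of how *camino.csv* could shrink, assuming both routes are lists of `(lat, lon)` tuples. It uses an equirectangular approximation (longitude scaled by the cosine of the latitude), which should be plenty for picking the nearest point; `remaining_route` is a hypothetical name:

```python
import math

def approx_dist(p, q):
    """Approximate planar distance (in degrees of latitude) between two (lat, lon) points."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dx = (q[1] - p[1]) * math.cos(mean_lat)  # longitude degrees shrink away from the equator
    dy = q[0] - p[0]
    return math.hypot(dx, dy)

def remaining_route(camino_pts, last_position):
    """Drop every camino point before the one closest to `last_position`."""
    nearest = min(range(len(camino_pts)),
                  key=lambda i: approx_dist(camino_pts[i], last_position))
    return camino_pts[nearest:]

camino_pts = [(43.010221, -1.319525), (43.009372, -1.319931),
              (43.009108, -1.319748), (43.00853, -1.319887)]
print(remaining_route(camino_pts, (43.0091, -1.31975)))
```

On a route that passes close to itself, the nearest point could belong to the wrong pass; restricting the search to points after the previous cut would avoid that.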

## Main loop

The main loop will go through the following steps:

**1.** Download all files from the shared folder.

**2.** Check filenames against a file log (*file_log.csv*) to detect duplicates.

**3.** If there are no new files, the loop stops at this point. Otherwise, it continues.

**4.** New files are added to the file log, marking them as processed.

**5.** Move images to *img* folder. Store their file paths and image coordinates as a new row in *images.csv*.

**6.** Parse gpx files and add the new points to *route.csv*.

**7.** Find the point in *camino.csv* (which contains the original route from start to finish) closest to the end of the newly cycled track. Delete the rows before it, so that the remaining route is the original route minus the distance travelled.

**8.** If the date of the parsed *gpx* files differs from that of the last gpx, a new entry will be created in *markers.csv*.
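Steps 2 to 4 amount to a set difference against the log. A sketch, assuming the log is the *file_log.csv* DataFrame with its single `filepath` column and that paths are compared as strings; `new_files` is a hypothetical name, and writing the updated log back with `to_csv('file_log.csv', index=False)` is left to the caller:

```python
import pandas as pd

def new_files(downloaded, file_log):
    """Return the downloaded paths not yet in `file_log`, plus the updated log."""
    seen = set(file_log['filepath'].astype(str))
    fresh = [str(p) for p in downloaded if str(p) not in seen]
    if fresh:  # step 4: mark the new files as processed
        file_log = pd.concat([file_log, pd.DataFrame({'filepath': fresh})],
                             ignore_index=True)
    return fresh, file_log

log = pd.DataFrame({'filepath': ['camino/Primera_etapa_Pirinexus.gpx']})
downloaded = ['camino/Primera_etapa_Pirinexus.gpx', 'camino/PXL_20220629_145708603.jpg']
fresh, log = new_files(downloaded, log)
print(fresh)
print(len(log))
```

An empty `fresh` list is the signal to stop (step 3).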

## Map creator

The map creator will perform the following actions:

**1.** Create a new map with bounds (size auto-adjusts).

**2.** Plot both routes (*route.csv* and *camino.csv*).

**3.** Create and display a marker for every image in *images.csv*.

**4.** Create and display a marker for every row in *markers.csv*.

**5.** Save the resulting map with the required filename. 
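Step 1 relies on knowing the bounding box of everything to be plotted. A sketch of a hypothetical `map_bounds` helper returning `[[south, west], [north, east]]`, the corner format that Folium's `Map.fit_bounds` accepts:

```python
def map_bounds(*point_lists):
    """Smallest [[south, west], [north, east]] box containing every (lat, lon) point."""
    points = [p for group in point_lists for p in group]
    if not points:
        raise ValueError("no points to bound")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

route_pts = [(43.010221, -1.319525), (43.009372, -1.319931)]
image_pts = [(41.65405277777778, 2.4023805555555553)]
print(map_bounds(route_pts, image_pts))
```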

Now that the basic logic behind our project has been established, let's get to business.

# Development 🔧  

Let's begin by creating and saving the *csv* files we defined earlier.

In [11]:
file_log = pd.DataFrame(columns=['filepath'])
file_log.to_csv('file_log.csv', index=False)

images = pd.DataFrame(columns=['filepath', 'coords'])
images.to_csv('images.csv', index=False)

route = pd.DataFrame(columns=['coords','alt', 'time'])
route.to_csv('route.csv', index=False)

camino = pd.DataFrame(columns=['coords','alt'])
camino.to_csv('camino.csv', index=False)

markers = pd.DataFrame(columns=['coords', 'text', 'html', 'icon', 'color'])
markers.to_csv('markers.csv', index=False)

Now we have the empty *csv* files, which is fine for all of them except *camino.csv*, which should hold the parsed *gpx* file containing the whole route. I won't be following it all the time, but it's a good guideline.

In this step we'll parse the *gpx* file and store its contents.

In [12]:
filename = 'camino.gpx' # The gpx file we need to parse
with open(filename, 'r', encoding='utf-8') as gpx_file: # Explicit UTF-8 avoids encoding issues (e.g. on Windows)
    gpx = gpxpy.parse(gpx_file) # Parsing the file
data = gpx.tracks[0].segments[0].points # Extracting all data points

Now we'll use the latitude/longitude/elevation attributes to extract the coordinates from each point.

In [13]:
coords = [] #Storing the coordinates
alt = [] # Same for the elevation

for point in data:
    point_coords = (point.latitude,point.longitude) # Obtaining the coordinates from every point
    point_alt = point.elevation
    coords.append(point_coords) #Appending it to the list
    alt.append(point_alt)

In [14]:
# Let's store those values inside the corresponding csv.

df = pd.read_csv('camino.csv')

df['coords'] = coords
df['alt'] = alt

df.head()

|   | coords | alt |
|---|---|---|
| 0 | (43.010221, -1.319525) | 953.053 |
| 1 | (43.009372, -1.319931) | 951.804 |
| 2 | (43.009108, -1.319748) | 950.914 |
| 3 | (43.00853, -1.319887) | 947.508 |
| 4 | (43.007335, -1.319483) | 942.526 |


In [15]:
# The only thing left to do is save the csv.

df.to_csv('camino.csv', index=False)
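Saving tuples to CSV has a side effect worth knowing: the `coords` column is written as text, so reading *camino.csv* back gives strings like `'(43.010221, -1.319525)'` rather than tuples. A sketch of restoring them with the standard library's `ast.literal_eval`, passed through the `converters` argument of `pd.read_csv`; an in-memory buffer stands in for the real file:

```python
import ast
import io
import pandas as pd

sample = pd.DataFrame({'coords': [(43.010221, -1.319525), (43.009372, -1.319931)],
                       'alt': [953.053, 951.804]})

buffer = io.StringIO()  # stands in for e.g. 'camino.csv'
sample.to_csv(buffer, index=False)

buffer.seek(0)
naive = pd.read_csv(buffer)  # coords come back as strings
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={'coords': ast.literal_eval})  # back to tuples

print(type(naive.loc[0, 'coords']), type(parsed.loc[0, 'coords']))
```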

### Detecting new files

The first step in the main loop will be downloading the files in the shared folder and checking for new files. To perform this task we'll re-use the code at the beginning of this notebook.

In [16]:
def downloader():
    ####################
    # Input: none required, but download_link.txt must contain a valid GDrive folder URL
    # Output: dictionary containing a list of gpx file paths and a list of image paths
    ####################
    with open('download_link.txt', 'r') as f: # Reading the private link
        url = f.readline().strip() # strip() removes the trailing newline
    gdown.download_folder(url, quiet=True, use_cookies=False)

    directory = r'C:\Users\User\micamino\camino' # Our download folder

    files = Path(directory).glob('*') # Using all files in the folder as input
    files = list(files)

    gpx = []
    images = []
    for file in files:
        if '.jpg' in str(file):
            images.append(file)
        elif '.gpx' in str(file):
            gpx.append(file)

    return {'gpx': gpx, 'images': images} # The function returns a dictionary of lists with all file paths

In [17]:
files = downloader() # Running the function we just created

print(files['gpx']) # Accessing the gpx files
print(files['images']) # Accessing the images

[WindowsPath('C:/Users/User/micamino/camino/Primera_etapa_Pirinexus.gpx')]
[WindowsPath('C:/Users/User/micamino/camino/PXL_20220629_145708603.jpg')]
