# Processing data

Our fake data will be geo data: a set of locations, each with a latitude and a longitude.

For the purposes of this tutorial, we will simulate doing some analysis and generate two outputs from each bundle of data:
* The geo centroid (the average lat and long)
* The most northerly datapoint
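In symbols, for a bundle of $n$ points with latitudes $\varphi_i$ and longitudes $\lambda_i$ (notation introduced here only for illustration), the centroid is $(\bar{\varphi}, \bar{\lambda})$ and the most northerly datapoint is point $i^{\ast}$:

$$
\bar{\varphi} = \frac{1}{n}\sum_{i=1}^{n} \varphi_i, \qquad
\bar{\lambda} = \frac{1}{n}\sum_{i=1}^{n} \lambda_i, \qquad
i^{\ast} = \arg\max_{i} \varphi_i
$$

If several points share the maximum latitude, the code below picks the first of them.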

We can then return these outputs in a data structure that could be written to a database.

```python
from faker import Faker
import random
```

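First, let's create some fake geo data. Faker's `location_on_land()` returns a tuple of `(latitude, longitude, place name, two-letter country code, timezone)`, with the coordinates given as strings.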

```python
fake = Faker()

# Generate 8 random locations, each guaranteed to be on land
data = [fake.location_on_land() for _ in range(8)]

data
```

```
[('53.94313', '10.30215', 'Bad Segeberg', 'DE', 'Europe/Berlin'),
 ('13.48082', '-86.58208', 'Somoto', 'NI', 'America/Managua'),
 ('-31.4488', '-60.93173', 'Esperanza', 'AR', 'America/Argentina/Cordoba'),
 ('33.08014', '-83.2321', 'Milledgeville', 'US', 'America/New_York'),
 ('30.16688', '-96.39774', 'Brenham', 'US', 'America/Chicago'),
 ('17.94979', '-94.91386', 'Acayucan', 'MX', 'America/Mexico_City'),
 ('41.16704', '-73.20483', 'Bridgeport', 'US', 'America/New_York'),
 ('23.29549', '113.82465', 'Licheng', 'CN', 'Asia/Shanghai')]
```
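If you would rather not depend on Faker, Python's built-in `random` module can produce arbitrary coordinates instead. The sketch below is one possible approach; `fake_locations` is a hypothetical helper, not part of Faker. It returns `(lat, long)` string pairs in the same format as the first two fields of Faker's tuples. Unlike `location_on_land()`, these points are not restricted to land, and sampling latitude uniformly over-represents the polar regions relative to their area.

```python
import random


def fake_locations(n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Return n random (lat, long) pairs formatted as strings.

    Hypothetical helper: mimics only the first two fields of Faker's
    location_on_land() tuples. Points are NOT restricted to land.
    """
    rng = random.Random(seed)  # private, seeded generator so runs are reproducible
    points = []
    for _ in range(n):
        # Uniform in latitude, so the poles are over-represented by area
        lat = rng.uniform(-90.0, 90.0)
        lon = rng.uniform(-180.0, 180.0)
        # Faker returns coordinates as strings, so format them the same way
        points.append((f"{lat:.5f}", f"{lon:.5f}"))
    return points
```

Because the processing below only reads the latitude and longitude fields, a list such as `fake_locations(8)` can be used in place of `data`.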

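```python
def process_data(data: list) -> dict:
    """Compute the centroid and the most northerly point of a bundle of locations.

    Each item in `data` is a tuple whose first two fields are the latitude
    and longitude as strings, as returned by Faker's `location_on_land()`.
    """
    # Calculate the average lat and average long from the data
    lats = [float(x[0]) for x in data]
    average_lat = sum(lats) / len(lats)
    longs = [float(x[1]) for x in data]
    average_long = sum(longs) / len(longs)

    # The most northerly datapoint is the one with the largest latitude
    most_northerly_lat = max(lats)
    most_northerly_index = lats.index(most_northerly_lat)
    most_northerly = data[most_northerly_index]

    # Package up into processed data
    result = {
        "centroid": (average_lat, average_long),
        "most_northerly": most_northerly,
    }

    return result
```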

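Averaging raw latitudes and longitudes is a reasonable approximation when the points are close together, but it can mislead when they are spread around the globe: two points just either side of the 180° meridian, at longitudes 179 and -179, average to longitude 0 on the opposite side of the world. If that matters for your analysis, a common alternative (not what `process_data` does) is to convert each point to a 3D unit vector, average the vectors, and convert the mean back to latitude and longitude. A minimal sketch, assuming points are given as `(lat, long)` floats in degrees; `spherical_centroid` is a hypothetical helper name:

```python
import math


def spherical_centroid(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Centroid of (lat, long) points in degrees, computed by averaging
    3D unit vectors on the sphere instead of raw angles."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        # Convert each point to a unit vector on the sphere
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the mean vector back to latitude/longitude in degrees
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lat, lon
```

To apply it to the bundle above, convert the string coordinates first: `spherical_centroid([(float(x[0]), float(x[1])) for x in data])`. The result is meaningless when the vectors cancel out, for example for two antipodal points.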
```python
process_data(data)
```

```
{'centroid': (22.70431125, -46.3919425),
 'most_northerly': ('53.94313',
  '10.30215',
  'Bad Segeberg',
  'DE',
  'Europe/Berlin')}
```

This function serves as a stand-in for our real data processing.
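The processed result would then be written to a database. A minimal sketch using Python's built-in `sqlite3` module follows; the table name `processed_bundles`, its columns, and the `save_result` helper are illustrative assumptions, not part of any real schema. It expects a dict shaped like the one `process_data` returns.

```python
import sqlite3


def save_result(conn: sqlite3.Connection, result: dict) -> None:
    """Write one processed bundle to a hypothetical `processed_bundles` table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS processed_bundles (
               centroid_lat REAL, centroid_long REAL,
               northerly_lat REAL, northerly_long REAL, northerly_place TEXT
           )"""
    )
    lat, lon = result["centroid"]
    north = result["most_northerly"]  # (lat, long, place, country, timezone)
    # Parameterised query: sqlite3 binds the values, no string formatting
    conn.execute(
        "INSERT INTO processed_bundles VALUES (?, ?, ?, ?, ?)",
        (lat, lon, float(north[0]), float(north[1]), north[2]),
    )
    conn.commit()
```

For example, `conn = sqlite3.connect(":memory:")` followed by `save_result(conn, process_data(data))` stores one row per processed bundle in an in-memory database.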