<a href="https://colab.research.google.com/github/whrc/ARTS/blob/main/Tutorial/data_formatting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metadata formatting for the ARTS data set

Heidi Rodenhizer, Yili Yang

Jan 2024

## Dependencies

In [1]:
%%capture
pip install git+https://github.com/whrc/ARTS.git    

In [2]:
import uuid
import numpy as np
import pandas as pd
import geopandas as gpd
import warnings
import re
from datetime import datetime
import os
from os.path import dirname
from pathlib import Path
from ARTS import dataformatting

## User-Defined Input

Are you using Colab to run this script? Provide 'True' or 'False':

In [3]:
colab = False

if colab:
    from google.colab import drive
    
    drive.mount("/content/drive")


Would you like to run this as a demo with the mock data set? Provide 'True' (mock data demo) or 'False' (actual data processing of a new contribution):

In [4]:
demo = True

Before starting, copy your new RTS data set (a shapefile or a GeoJSON file) into a directory called "input_data" inside the directory in which you would like to work. If you leave the following code chunk unchanged, the working directory defaults to the directory one level above this script when you run the script locally, **OR** to a folder called "ARTS" in MyDrive when you use Colab (you will have to create this folder manually).

Default file structure:
    
    > ARTS (default working directory)
        > ARTS_main_dataset
        > img
        > **input_data**
        > src
        > Tutorial

Provide the location of the directory in which you are working:

In [5]:
if colab:
    base_dir = Path("/content/drive/MyDrive/ARTS")
else:
    base_dir = Path('..')

print('Your base directory is ' + str(base_dir.resolve()))

Your base directory is /home/hrodenhizer/Documents/permafrost_pathways/rts_mapping/ARTS
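
Before going further, it can help to confirm that the working directory contains the "input_data" folder described above. The following is a minimal sketch and not part of the ARTS package: `missing_subdirs` is a hypothetical helper, and the temporary directory only stands in for your real `base_dir`.

```python
from pathlib import Path
import tempfile


def missing_subdirs(base_dir, expected=("input_data",)):
    """Return the names in `expected` that are not directories under base_dir."""
    base = Path(base_dir)
    return [name for name in expected if not (base / name).is_dir()]


# Demonstration in a throwaway directory (a stand-in for your real base_dir)
with tempfile.TemporaryDirectory() as tmp:
    before = missing_subdirs(tmp)           # input_data does not exist yet
    (Path(tmp) / "input_data").mkdir()
    after = missing_subdirs(tmp)            # input_data now exists

print(before)  # ['input_data']
print(after)   # []
```

In the notebook itself you would call `missing_subdirs(base_dir)` and create any folder it reports before continuing.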


Provide the file name of the data:

In [6]:
# set these two values - if demo == True, the value of your_file doesn't matter
your_file = 'new_data.geojson'
dataset_version = 'v.1.0.0' # set this to the most recent version

# leave everything else in this chunk alone
if demo:
    # RTS data set to be processed
    your_rts_dataset_file = 'rts_dataset_test_polygons_new.geojson'
    your_rts_dataset_filepath = base_dir / 'Tutorial' / 'mock_dataset' / 'input_data' / your_rts_dataset_file
    
    # ARTS main dataset to be appended
    ARTS_main_dataset_filepath = base_dir / 'Tutorial' / 'mock_dataset' / 'input_data' / 'rts_dataset_test_polygons_current.geojson'
    
else:
    # RTS data set to be processed
    your_rts_dataset_file = your_file
    your_rts_dataset_filepath = base_dir / 'input_data' / your_rts_dataset_file
    
    # ARTS main dataset to be appended
    ARTS_main_dataset_filepath = base_dir / 'ARTS_main_dataset' / dataset_version / 'ARTS_main_dataset.geojson'
    
# Metadata Description file
metadata_filepath = base_dir / 'Metadata_Format_Summary.csv'


Provide the names of any metadata fields in your new file that are not already in the official RTS Data Set (please check the list to ensure that the field has not been included previously) that you would like to be included in the compiled data set:

In [7]:
# Provide new metadata fields as a list of column names (strings). If there are no new fields, leave the empty lists below unchanged.
# If your new file is a shapefile, also provide a list of the abbreviated names
# Example:
# new_fields = ['CustomColumn1', 'CustomColumn2']
# Shapefile example:
# new_fields_abbreviated = ['CstmCl1', 'CstmCl2']
new_fields = []
new_fields_abbreviated = []
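
The request above to check that a new field has not been included previously can be partly automated. The sketch below is an illustration only: `check_new_fields` is a hypothetical helper, not an ARTS function, and the case-insensitive comparison is an assumption. The 10-character limit reflects the field-name length limit of the shapefile (dBASE) attribute format.

```python
def check_new_fields(new_fields, new_fields_abbreviated, existing_fields,
                     is_shapefile=False):
    """Return a list of human-readable problems; an empty list means no problems."""
    problems = []

    # Assumption: field names are compared without regard to case
    existing_lower = {name.lower() for name in existing_fields}
    for name in new_fields:
        if name.lower() in existing_lower:
            problems.append(f"{name!r} already exists in the data set")

    if is_shapefile:
        # Each new field needs exactly one abbreviated counterpart
        if len(new_fields) != len(new_fields_abbreviated):
            problems.append("new_fields and new_fields_abbreviated differ in length")
        # Shapefile attribute names cannot exceed 10 characters
        for abbreviation in new_fields_abbreviated:
            if len(abbreviation) > 10:
                problems.append(f"{abbreviation!r} is longer than 10 characters")

    return problems


# Example with the placeholder names used in the comments above
example_problems = check_new_fields(
    ["CustomColumn1", "CustomColumn2"],
    ["CstmCl1", "CstmCl2"],
    existing_fields=["RegionName", "CreatorLab"],
    is_shapefile=True,
)
print(example_problems)  # []
```

Once the metadata summary has been imported later in this notebook, the existing field names could be taken from its `FieldName` column.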

Have you already created RTS centroid columns, or would you like them to be created within this script? Provide either True, if the columns do not exist yet, or False, if you have already created them:

In [8]:
# Example:
# calculate_centroid = False
calculate_centroid = False

Would you like your formatted new data to be output in its own file or appended to the compiled data set? If you choose a separate file, you will email the file of new features to us to merge with the compiled data set. If you choose to append, you will commit your updated file to your forked GitHub repository and create a pull request to add the file to the official GitHub repository. Your decision should mostly be based on your comfort with GitHub: if the GitHub steps described here are unfamiliar, please opt for the separate file and email it to us.

In [9]:
# Example
# separate_file = True
separate_file = False

# Import Metadata Format Summary

In [10]:
metadata_format_summary = pd.read_csv(metadata_filepath)

required_fields = list(
    metadata_format_summary[metadata_format_summary.Required == "True"].FieldName.values
)

generated_fields = list(
    metadata_format_summary[
        metadata_format_summary.Required == "Generated"
    ].FieldName.values
)

optional_fields = list(
    metadata_format_summary[
        metadata_format_summary.Required == "False"
    ].FieldName.values
)

all_fields = required_fields + generated_fields + optional_fields + new_fields

metadata_format_summary

Unnamed: 0,FieldName,Format,Required,Description
0,CentroidLat,Decimal Degrees,True,"Polygon centroid latitude in EPSG:4326, round ..."
1,CentroidLon,Decimal Degrees,True,"Polygon centroid longitude in EPSG:4326, round..."
2,RegionName,String,True,Name of the geographical region
3,CreatorLab,String,True,Data creator and associated organization
4,BaseMapDate,String,True,Date of base map used for RTS delineation in Y...
5,BaseMapSource,String,True,Name of the satellite sensor used for RTS deli...
6,BaseMapResolution,Number,True,Resolution of the imagery used for RTS delinea...
7,TrainClass,String,True,'Positive’ for genuine RTS and ‘Negative’ for ...
8,LabelType,String,True,"Type of digitisation, e.g. ‘Polygon’, ‘Boundin..."
9,MergedRTS,String,Generated,UIDs of intersecting RTS that merged into one RTS


# Load the Main ARTS Data Set

In [11]:
ARTS_main_dataset = gpd.read_file(ARTS_main_dataset_filepath).filter(
    items=required_fields + generated_fields + optional_fields + ["geometry"]
)

ARTS_main_dataset.ContributionDate = pd.to_datetime(ARTS_main_dataset.ContributionDate)

for field in required_fields:  # Check if all required columns are present
    if field not in ARTS_main_dataset.columns:
        raise ValueError(
            "{field} is missing. Has the RTS data set been modified since download?".format(
                field=repr(field)
            )
        )

ARTS_main_dataset

Unnamed: 0,CentroidLat,CentroidLon,RegionName,CreatorLab,BaseMapDate,BaseMapSource,BaseMapResolution,TrainClass,LabelType,MergedRTS,StabilizedRTS,ContributionDate,UID,Area,geometry
0,70.01668,68.33918,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-09-30",WorldView-2,4.0,Positive,Polygon,,,2023-09-01,b4bae416-9fde-5d91-920d-731bcf042b2d,7581.395967,"POLYGON ((2007198.307 865988.469, 2007189.916 ..."
1,70.01622,68.33917,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-09-30",WorldView-2,4.0,Positive,Polygon,,,2023-09-01,10f75ab9-2297-5b04-97ad-559b34fa020f,3621.349764,"POLYGON ((2007253.161 866032.001, 2007235.776 ..."
2,70.01648,68.33242,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-09-30",WorldView-2,4.0,Positive,Polygon,,,2023-09-01,ff0d265e-385c-53c2-9c3a-28e885a220d2,1339.292585,"POLYGON ((2007310.378 865857.070, 2007340.692 ..."
3,70.0155,68.3295,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-09-30",WorldView-2,4.0,Positive,Polygon,,,2023-09-01,297dc622-3584-5d79-8b7b-b4f5a67fa8a4,3482.02968,"POLYGON ((2007453.557 865845.775, 2007456.723 ..."
4,70.01451,68.33296,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-09-30",WorldView-2,4.0,Positive,Polygon,,,2023-09-01,7c64ad8e-07be-5ba5-8f97-19374f809af1,134.941981,"POLYGON ((2007514.965 865926.094, 2007508.132 ..."
5,70.01437,68.33493,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-09-30",WorldView-2,4.0,Positive,Polygon,,,2023-09-01,e36abf57-ffb3-5c6a-be32-d8278f385a73,411.580601,"POLYGON ((2007496.803 865994.362, 2007488.866 ..."


# Load Your New RTS Data Set

In [12]:
# pre-processing your rts data
if re.search('\\.geojson', str(your_rts_dataset_file)):
    new_dataset = dataformatting.preprocessing(
        your_rts_dataset_filepath,
        required_fields,
        generated_fields,
        optional_fields,
        new_fields,
        None,
        calculate_centroid = calculate_centroid
    )

elif re.search('\\.shp', str(your_rts_dataset_file)):
    new_dataset = dataformatting.preprocessing(
        your_rts_dataset_filepath,
        required_fields,
        generated_fields,
        optional_fields,
        new_fields,
        new_fields_abbreviated,
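        calculate_centroid = calculate_centroid
    )

else:
    # fail early with a clear message instead of a NameError on new_dataset below
    raise ValueError('your_rts_dataset_file must be a .geojson or .shp file')

new_dataset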

Unnamed: 0,CentroidLat,CentroidLon,RegionName,CreatorLab,BaseMapDate,BaseMapSource,BaseMapResolution,TrainClass,LabelType,geometry
0,70.01655,68.33926,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007199.012 865984.608, 2007188.217 ..."
1,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007289.337 866129.959, 2007283.521 ..."
2,70.01652,68.33235,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007340.152 865838.514, 2007326.959 ..."
3,70.01531,68.33115,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007453.557 865845.775, 2007456.416 ..."
4,70.01457,68.33342,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007492.587 865949.766, 2007492.342 ..."
5,70.01448,68.33495,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007479.747 865990.850, 2007472.484 ..."
6,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-9-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007289.337 866129.959, 2007283.521 ..."


# Check Metadata Format of New Data

In [13]:
dataformatting.run_formatting_checks(new_dataset)

Formatting looks good!


# Generate UIDs

Create the seed for UID generation by concatenating all required metadata columns (except UID) into a single string:

In [14]:
dataformatting.seed_gen(new_dataset)
new_dataset.seed

0    70.0165568.33926Yamal-GydanRodenhizer2023-05-0...
1    70.0154368.34071Yamal-GydanRodenhizer2023-05-0...
2    70.0165268.33235Yamal-GydanRodenhizer2023-05-0...
3    70.0153168.33115Yamal-GydanRodenhizer2023-05-0...
4    70.0145768.33342Yamal-GydanRodenhizer2023-05-0...
5    70.0144868.33495Yamal-GydanRodenhizer2023-05-0...
6    70.0154368.34071Yamal-GydanRodenhizer2022-05-0...
Name: seed, dtype: object
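
To illustrate the idea of a seed built by concatenation, here is a minimal sketch. It is not the implementation of `dataformatting.seed_gen`: `build_seed` is a hypothetical helper, only a subset of the required fields is used, and the real function may format numeric columns differently.

```python
def build_seed(record, fields):
    """Concatenate the string form of each field value, with no separator."""
    return "".join(str(record[field]) for field in fields)


# One record with values taken from the first row of the mock data set
record = {
    "CentroidLat": 70.01655,
    "CentroidLon": 68.33926,
    "RegionName": "Yamal-Gydan",
    "CreatorLab": "Rodenhizer",
    "BaseMapDate": "2023-05-01,2023-09-30",
}
fields = ["CentroidLat", "CentroidLon", "RegionName", "CreatorLab", "BaseMapDate"]

seed = build_seed(record, fields)
print(seed)  # 70.0165568.33926Yamal-GydanRodenhizer2023-05-01,2023-09-30
```

Because the seed is derived only from metadata values, two rows with identical metadata produce identical seeds, and any change to a metadata value changes the seed.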

Generate UIDs

In [15]:
new_dataset["UID"] = [
    str(uuid.uuid5(uuid.NAMESPACE_DNS, name=seed)) for seed in new_dataset.seed
]
new_dataset.UID

0    697680a4-9707-59fb-aabb-540308cf0705
1    edb47fed-2c5d-59f0-9609-dd230ab25a58
2    d10b5ffe-ff73-57cf-a12b-4b74142f0b98
3    1fae5c95-c99a-5400-ad29-6273ecbbaf94
4    e9627675-6398-5d4f-a3c0-19d0ef5511df
5    f6593433-6229-5324-85cf-e3edccae5420
6    aedeff78-0897-5159-aefd-c5a5885475c8
Name: UID, dtype: object
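
The UIDs above are version 5 UUIDs, which are name-based: the same namespace and seed always give the same UID. The short sketch below demonstrates that property with shortened, made-up seed strings; `uid_from_seed` is a hypothetical wrapper around the same `uuid.uuid5` call used in the cell above.

```python
import uuid


def uid_from_seed(seed):
    """Derive a deterministic, name-based (version 5) UUID string from a seed."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, seed))


# Shortened example seeds, for illustration only
seed_a = "70.0165568.33926Yamal-Gydan"
seed_b = "70.0154368.34071Yamal-Gydan"

uid_a = uid_from_seed(seed_a)
uid_b = uid_from_seed(seed_b)

print(uid_a == uid_from_seed(seed_a))  # True: same seed, same UID
print(uid_a == uid_b)                  # False: different seed, different UID
print(uuid.UUID(uid_a).version)        # 5
```

A practical consequence is that rerunning this notebook on unchanged metadata reproduces the same UIDs, while editing any metadata value that feeds the seed produces a different UID.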

# Check for Intersections with RTS Data Set

Find intersecting RTS polygons from the official RTS data set and retrieve their UIDs. Create empty columns to manually classify the repeated polygons.

In [16]:
with warnings.catch_warnings():
    warnings.filterwarnings('ignore', r'All-NaN (slice|axis) encountered')
    if demo:
        intersections_output_filepath = (
            base_dir / 'Tutorial' / 'mock_dataset' / 'output' / (
                str(your_rts_dataset_file).split('.')[0] + "_overlapping_polygons.geojson"
            )
        )
    else:
        intersections_output_filepath = (
            base_dir / 'output' / (
                str(your_rts_dataset_file).split('.')[0] + "_overlapping_polygons.geojson"
            )
        )
    
    new_dataset = dataformatting.check_intersections(
        new_dataset, ARTS_main_dataset, intersections_output_filepath, demo
    )
new_dataset

Unnamed: 0,CentroidLat,CentroidLon,RegionName,CreatorLab,BaseMapDate,BaseMapSource,BaseMapResolution,TrainClass,LabelType,geometry,BaseMapResolutionStr,seed,UID,Intersections,SelfIntersections
0,70.01655,68.33926,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007199.012 865984.608, 2007188.217 ...",4,70.0165568.33926Yamal-GydanRodenhizer2023-05-0...,697680a4-9707-59fb-aabb-540308cf0705,"b4bae416-9fde-5d91-920d-731bcf042b2d,10f75ab9-...",
1,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007289.337 866129.959, 2007283.521 ...",4,70.0154368.34071Yamal-GydanRodenhizer2023-05-0...,edb47fed-2c5d-59f0-9609-dd230ab25a58,,aedeff78-0897-5159-aefd-c5a5885475c8
2,70.01652,68.33235,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007340.152 865838.514, 2007326.959 ...",4,70.0165268.33235Yamal-GydanRodenhizer2023-05-0...,d10b5ffe-ff73-57cf-a12b-4b74142f0b98,ff0d265e-385c-53c2-9c3a-28e885a220d2,
3,70.01531,68.33115,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007453.557 865845.775, 2007456.416 ...",4,70.0153168.33115Yamal-GydanRodenhizer2023-05-0...,1fae5c95-c99a-5400-ad29-6273ecbbaf94,297dc622-3584-5d79-8b7b-b4f5a67fa8a4,
4,70.01457,68.33342,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007492.587 865949.766, 2007492.342 ...",4,70.0145768.33342Yamal-GydanRodenhizer2023-05-0...,e9627675-6398-5d4f-a3c0-19d0ef5511df,7c64ad8e-07be-5ba5-8f97-19374f809af1,
5,70.01448,68.33495,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007479.747 865990.850, 2007472.484 ...",4,70.0144868.33495Yamal-GydanRodenhizer2023-05-0...,f6593433-6229-5324-85cf-e3edccae5420,e36abf57-ffb3-5c6a-be32-d8278f385a73,
6,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-9-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007289.337 866129.959, 2007283.521 ...",4,70.0154368.34071Yamal-GydanRodenhizer2022-05-0...,aedeff78-0897-5159-aefd-c5a5885475c8,,edb47fed-2c5d-59f0-9609-dd230ab25a58


At this point, you will need to use your preferred GIS software to manually check every polygon that has intersections against the intersecting polygons in the official RTS data set, and every polygon that has self-intersections against the polygons in your new data set that it overlaps. Save the result as a geojson file (the next section expects a file name ending in `_overlapping_polygons_edited.geojson`). When possible/necessary, try to find imagery that matches the date of the intersecting polygons - this may require contacting the lab that did the original delineation.

Your job is to compare each previously published polygon listed in the 'Intersections' column with the new RTS feature and, based on the relationship between the two polygons, manually copy and paste its UID from the 'Intersections' column into one of the 'RepeatRTS', 'StabilizedRTS', 'NewRTS', 'MergedRTS', 'AccidentalOverlap', or 'UnknownRelationship' columns. Similarly, you need to inspect each polygon listed in the 'SelfIntersections' column and copy and paste its UID from the 'SelfIntersections' column into one of the same six columns, again based on the relationship between the two polygons.

- Paste the UID into the RepeatRTS column when the RTS feature in the current row is the same RTS feature as the RTS feature in the 'Intersections' or 'SelfIntersections' column, but was delineated at a different point in time, by a different lab at the same point in time, or from different imagery at the same point in time. The RTS feature is the same when it was the result of the same RTS initiation event.

- Paste the UID into the NewRTS column when the RTS feature in the 'Intersections' or 'SelfIntersections' column is a new RTS feature which formed on top of the RTS feature in the current row.

- Paste the UID into the StabilizedRTS column when the RTS feature in the 'Intersections' or 'SelfIntersections' column is a stabilized RTS scar as of the date of the imagery used in the new RTS delineations.

- Paste the UID into the MergedRTS column when multiple RTS features in the 'Intersections' or 'SelfIntersections' column merged to form the new RTS feature.

- Paste the UID into the AccidentalOverlap column when inaccuracies in delineation of separate RTS features lead to overlap (e.g. features that are very close to each other and the polygons barely overlap). 

- If you are unable to determine the relationship based on an inspection of the original imagery and the available information, you can copy the UID into the UnknownRelationship column. NOTE: This should be a last resort used on rare occasions (e.g. the researcher who delineated the feature cannot be contacted and insufficient information was recorded to make a reasonably confident decision), as it will limit the utility of the row of data to researchers.

When this is done, each of the UIDs in the 'Intersections' and 'SelfIntersections' columns should have been copied into one (and only one) of the 'RepeatRTS', 'StabilizedRTS', 'NewRTS', 'MergedRTS', 'AccidentalOverlap', or 'UnknownRelationship' columns.
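
The "one (and only one)" rule can be checked row by row before you load the edited file. The following is a rough sketch, not the logic of `dataformatting.check_intersection_info`: the helper names are hypothetical, and it assumes that multiple UIDs in a cell are comma-separated and that empty cells are empty strings or `None`.

```python
RELATIONSHIP_COLUMNS = [
    "RepeatRTS", "StabilizedRTS", "NewRTS",
    "MergedRTS", "AccidentalOverlap", "UnknownRelationship",
]


def split_uids(value):
    """Split a comma-separated cell into a list of UIDs (empty cell -> [])."""
    if not value:
        return []
    return [uid.strip() for uid in str(value).split(",") if uid.strip()]


def misclassified_uids(row):
    """Return UIDs from 'Intersections'/'SelfIntersections' that do not
    appear in exactly one relationship column."""
    to_classify = (split_uids(row.get("Intersections"))
                   + split_uids(row.get("SelfIntersections")))
    counts = {uid: 0 for uid in to_classify}
    for column in RELATIONSHIP_COLUMNS:
        for uid in split_uids(row.get(column)):
            if uid in counts:
                counts[uid] += 1
    return [uid for uid, n in counts.items() if n != 1]


# A correctly classified row: the single intersecting UID appears once
good_row = {
    "Intersections": "297dc622-3584-5d79-8b7b-b4f5a67fa8a4",
    "SelfIntersections": "",
    "StabilizedRTS": "297dc622-3584-5d79-8b7b-b4f5a67fa8a4",
}
# An incomplete row: the intersecting UID was never classified
bad_row = {
    "Intersections": "7c64ad8e-07be-5ba5-8f97-19374f809af1",
    "SelfIntersections": "",
}

print(misclassified_uids(good_row))  # []
print(misclassified_uids(bad_row))   # ['7c64ad8e-07be-5ba5-8f97-19374f809af1']
```

If your GIS software writes missing values in another form (for example the text 'nan'), `split_uids` would need to be adapted accordingly.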


# Load Manually Edited File and Join to Processed Data

Add the 'RepeatRTS', 'StabilizedRTS', 'NewRTS', 'MergedRTS', 'AccidentalOverlap', and 'UnknownRelationship' columns that you just edited back into `new_dataset`.

In [17]:
from IPython.display import display, HTML
display(HTML("<style>.jp-OutputArea-output {display:flex}</style>"))


In [18]:
# path to the manually-edited file
if demo:
    edited_filepath = base_dir / 'Tutorial' / 'mock_dataset' / 'output' / (
        str(your_rts_dataset_file).split('.')[0] + "_overlapping_polygons_edited.geojson"
    )

else:
    edited_filepath = base_dir / 'output' / (
        str(your_rts_dataset_file).split('.')[0] + "_overlapping_polygons_edited.geojson"
        )

merged_data = dataformatting.merge_data(new_dataset, edited_filepath)
merged_data

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data[column].loc[new_data[column] == 'nan'] = ''

Unnamed: 0,CentroidLat,CentroidLon,RegionName,CreatorLab,BaseMapDate,BaseMapSource,BaseMapResolution,TrainClass,LabelType,geometry,...,UID,Intersections,SelfIntersections,RepeatRTS,MergedRTS,NewRTS,StabilizedRTS,AccidentalOverlap,UnknownRelationship,ContributionDate
0,70.01655,68.33926,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007199.012 865984.608, 2007188.217 ...",...,697680a4-9707-59fb-aabb-540308cf0705,"b4bae416-9fde-5d91-920d-731bcf042b2d,10f75ab9-...",,,"b4bae416-9fde-5d91-920d-731bcf042b2d,10f75ab9-...",,,,,2024-01-31
1,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007289.337 866129.959, 2007283.521 ...",...,aedeff78-0897-5159-aefd-c5a5885475c8,,aedeff78-0897-5159-aefd-c5a5885475c8,aedeff78-0897-5159-aefd-c5a5885475c8,,,,,,2024-01-31
2,70.01652,68.33235,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007340.152 865838.514, 2007326.959 ...",...,ff0d265e-385c-53c2-9c3a-28e885a220d2,ff0d265e-385c-53c2-9c3a-28e885a220d2,,ff0d265e-385c-53c2-9c3a-28e885a220d2,,,,,,2024-01-31
3,70.01531,68.33115,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007453.557 865845.775, 2007456.416 ...",...,1fae5c95-c99a-5400-ad29-6273ecbbaf94,297dc622-3584-5d79-8b7b-b4f5a67fa8a4,,,,,297dc622-3584-5d79-8b7b-b4f5a67fa8a4,,,2024-01-31
4,70.01457,68.33342,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007492.587 865949.766, 2007492.342 ...",...,e9627675-6398-5d4f-a3c0-19d0ef5511df,7c64ad8e-07be-5ba5-8f97-19374f809af1,,,,,,7c64ad8e-07be-5ba5-8f97-19374f809af1,,2024-01-31
5,70.01448,68.33495,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007479.747 865990.850, 2007472.484 ...",...,e36abf57-ffb3-5c6a-be32-d8278f385a73,e36abf57-ffb3-5c6a-be32-d8278f385a73,,e36abf57-ffb3-5c6a-be32-d8278f385a73,,,,,,2024-01-31
6,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-9-30",WorldView-2,4.0,Positive,Polygon,"POLYGON ((2007289.337 866129.959, 2007283.521 ...",...,aedeff78-0897-5159-aefd-c5a5885475c8,,edb47fed-2c5d-59f0-9609-dd230ab25a58,edb47fed-2c5d-59f0-9609-dd230ab25a58,,,,,,2024-01-31


# Check Completeness of Intersection Information

In [19]:
if demo:
    dataformatting.check_intersection_info(merged_data, None, base_dir, demo)
else:
    dataformatting.check_intersection_info(merged_data, your_rts_dataset_file, base_dir, demo)

Intersection information is complete.


# Final Column Selection

In [20]:
formatted_data = dataformatting.add_empty_columns(
    merged_data,
    [col for col in optional_fields],
)

formatted_data = formatted_data[all_fields + ["geometry"]]

formatted_data

Unnamed: 0,CentroidLat,CentroidLon,RegionName,CreatorLab,BaseMapDate,BaseMapSource,BaseMapResolution,TrainClass,LabelType,MergedRTS,NewRTS,StabilizedRTS,UnknownRelationship,ContributionDate,UID,BaseMapID,Area,Notes,geometry
0,70.01655,68.33926,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,"b4bae416-9fde-5d91-920d-731bcf042b2d,10f75ab9-...",,,,2024-01-31,697680a4-9707-59fb-aabb-540308cf0705,,,,"POLYGON ((2007199.012 865984.608, 2007188.217 ..."
1,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,,,,,2024-01-31,aedeff78-0897-5159-aefd-c5a5885475c8,,,,"POLYGON ((2007289.337 866129.959, 2007283.521 ..."
2,70.01652,68.33235,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,,,,,2024-01-31,ff0d265e-385c-53c2-9c3a-28e885a220d2,,,,"POLYGON ((2007340.152 865838.514, 2007326.959 ..."
3,70.01531,68.33115,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,,,297dc622-3584-5d79-8b7b-b4f5a67fa8a4,,2024-01-31,1fae5c95-c99a-5400-ad29-6273ecbbaf94,,,,"POLYGON ((2007453.557 865845.775, 2007456.416 ..."
4,70.01457,68.33342,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,,,,,2024-01-31,e9627675-6398-5d4f-a3c0-19d0ef5511df,,,,"POLYGON ((2007492.587 865949.766, 2007492.342 ..."
5,70.01448,68.33495,Yamal-Gydan,Rodenhizer,"2023-05-01,2023-09-30",WorldView-2,4.0,Positive,Polygon,,,,,2024-01-31,e36abf57-ffb3-5c6a-be32-d8278f385a73,,,,"POLYGON ((2007479.747 865990.850, 2007472.484 ..."
6,70.01543,68.34071,Yamal-Gydan,Rodenhizer,"2022-05-01,2022-9-30",WorldView-2,4.0,Positive,Polygon,,,,,2024-01-31,aedeff78-0897-5159-aefd-c5a5885475c8,,,,"POLYGON ((2007289.337 866129.959, 2007283.521 ..."


In [21]:
# Increment the first version number and reset the rest, e.g. 'v.1.0.0' -> 'v.2.0.0'
version_parts = dataset_version.split('.')
new_increment = int(version_parts[1].split('-')[0]) + 1
new_version = '.'.join(version_parts[0:1] + [str(new_increment), '0', '0'])

updated_ARTS_filepath = base_dir / 'ARTS_main_dataset' / new_version

dataformatting.output(
    formatted_data,
    ARTS_main_dataset,
    new_fields,
    all_fields,
    base_dir,
    your_rts_dataset_file,
    updated_ARTS_filepath,
    separate_file,
    demo
)

Now you are ready to submit the above file.