# Train-validation tagging

How to split a training dataset into train and validation subsets using tags.

**Input**:
- Source Project
- Train-validation split ratio

**Output**:
- New Project with images randomly tagged `train` or `val` according to the split ratio
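
The tagging rule itself does not depend on the Supervisely API, so it can be tried out in isolation. The following is a minimal sketch in plain Python, assuming one independent random draw per image; the function name `assign_split_tags`, the `seed` parameter and the file names are hypothetical and exist only for this illustration.

```python
import random

def assign_split_tags(image_names, validation_portion, seed=None):
    """Return {image_name: 'train' | 'val'} using one independent random draw per image."""
    rng = random.Random(seed)  # local generator, so a fixed seed makes the result repeatable
    tags = {}
    for name in image_names:
        # random() is uniform on [0, 1), so this is 'val' with probability validation_portion
        tags[name] = "val" if rng.random() < validation_portion else "train"
    return tags

names = ["img_{:03d}.jpg".format(i) for i in range(1000)]  # made-up file names
tags = assign_split_tags(names, validation_portion=0.1, seed=0)
val_count = sum(1 for t in tags.values() if t == "val")
print("val images: {} of {}".format(val_count, len(names)))
```

Because every image is drawn independently, the share of `val` images only approximates the requested ratio, and the approximation gets better as the dataset grows.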

## Configuration

Edit the following settings to match your own setup.

In [12]:
import supervisely_lib as sly
from tqdm import tqdm
import random
import os

In [2]:
team_name = "jupyter_tutorials"
workspace_name = "cookbook"
project_name = "tutorial_project"

dst_project_name = "tutorial_project_tagged"

validation_portion = 0.1

tag_meta_train = sly.TagMeta('train', sly.TagValueType.NONE)
tag_meta_val = sly.TagMeta('val', sly.TagValueType.NONE)

# Obtain server address and your api_token from environment variables
# Edit those values if you run this notebook on your own PC
address = os.environ['SERVER_ADDRESS']
token = os.environ['API_TOKEN']
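
When the notebook runs outside the hosted environment, `os.environ[...]` raises a bare `KeyError` if a variable is missing. A small helper can turn that into a clearer message. This is only a sketch: `read_required_env` and the `DEMO_SERVER_ADDRESS` variable are hypothetical names introduced here, not part of the Supervisely SDK.

```python
import os

def read_required_env(name):
    """Return the value of environment variable `name`, or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError("Environment variable {!r} is not set or is empty".format(name))
    return value

# Demonstration with a throwaway variable (hypothetical name, used only in this example)
os.environ["DEMO_SERVER_ADDRESS"] = "https://example.invalid"
print(read_required_env("DEMO_SERVER_ADDRESS"))
```

In the configuration cell the same helper could be called with `'SERVER_ADDRESS'` and `'API_TOKEN'`.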

In [4]:
# Initialize API object
api = sly.Api(address, token)

## Verify input values

Check that the context (team / workspace / project) exists.

In [5]:
# Get IDs of team, workspace and project by names

team = api.team.get_info_by_name(team_name)
if team is None:
    raise RuntimeError("Team {!r} not found".format(team_name))

workspace = api.workspace.get_info_by_name(team.id, workspace_name)
if workspace is None:
    raise RuntimeError("Workspace {!r} not found".format(workspace_name))
    
project = api.project.get_info_by_name(workspace.id, project_name)
if project is None:
    raise RuntimeError("Project {!r} not found".format(project_name))
    
print("Team: id={}, name={}".format(team.id, team.name))
print("Workspace: id={}, name={}".format(workspace.id, workspace.name))
print("Project: id={}, name={}".format(project.id, project.name))

Team: id=30, name=jupyter_tutorials
Workspace: id=76, name=cookbook
Project: id=898, name=tutorial_project
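
The three lookups above repeat the same "raise if `None`" check. One possible way to factor it out is sketched below. The helper `require` and the `FakeInfo` stand-in are hypothetical and only mimic the `id` / `name` fields used in this notebook; the real info objects come from the API calls shown above.

```python
from collections import namedtuple

def require(info, kind, name):
    """Return `info` unchanged, or raise if the lookup returned None."""
    if info is None:
        raise RuntimeError("{} {!r} not found".format(kind, name))
    return info

# Stand-in for the info objects returned by the API (hypothetical, for demonstration only)
FakeInfo = namedtuple("FakeInfo", ["id", "name"])

demo_team = require(FakeInfo(30, "jupyter_tutorials"), "Team", "jupyter_tutorials")
print("Team: id={}, name={}".format(demo_team.id, demo_team.name))

try:
    require(None, "Workspace", "missing_workspace")
except RuntimeError as exc:
    print(exc)
```

With such a helper, each lookup becomes a single line, for example `team = require(api.team.get_info_by_name(team_name), "Team", team_name)`.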


## Get Source ProjectMeta

In [6]:
project = api.project.get_info_by_name(workspace.id, project_name)
meta_json = api.project.get_meta(project.id)
meta = sly.ProjectMeta.from_json(meta_json)
print("Source ProjectMeta: \n", meta)

Source ProjectMeta: 
 ProjectMeta:
Object Classes
+--------+-----------+----------------+
|  Name  |   Shape   |     Color      |
+--------+-----------+----------------+
| person |   Bitmap  |  [0, 255, 18]  |
|  dog   |  Polygon  |  [253, 0, 0]   |
|  car   |  Polygon  | [190, 85, 206] |
|  bike  | Rectangle | [246, 255, 0]  |
+--------+-----------+----------------+
Image Tags
+-------------+--------------+-----------------------+
|     Name    |  Value type  |    Possible values    |
+-------------+--------------+-----------------------+
| cars_number |  any_number  |          None         |
|     like    |     none     |          None         |
|   situated  | oneof_string | ['inside', 'outside'] |
+-------------+--------------+-----------------------+
Object Tags
+---------------+--------------+-----------------------+
|      Name     |  Value type  |    Possible values    |
+---------------+--------------+-----------------------+
|  vehicle_age  | oneof_string | ['modern', 'vintag

## Construct Destination ProjectMeta

In [7]:
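def process_meta(input_meta):
    # Copy the source meta unchanged so that object classes and existing tags are kept
    output_meta = input_meta.clone()
    # Register the two new image-level tags
    output_meta = output_meta.add_img_tag_meta(tag_meta_train)
    output_meta = output_meta.add_img_tag_meta(tag_meta_val)
    return output_meta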

In [8]:
dst_meta = process_meta(meta)
print("Destination ProjectMeta:\n", dst_meta)

Destination ProjectMeta:
 ProjectMeta:
Object Classes
+--------+-----------+----------------+
|  Name  |   Shape   |     Color      |
+--------+-----------+----------------+
| person |   Bitmap  |  [0, 255, 18]  |
|  dog   |  Polygon  |  [253, 0, 0]   |
|  car   |  Polygon  | [190, 85, 206] |
|  bike  | Rectangle | [246, 255, 0]  |
+--------+-----------+----------------+
Image Tags
+-------------+--------------+-----------------------+
|     Name    |  Value type  |    Possible values    |
+-------------+--------------+-----------------------+
| cars_number |  any_number  |          None         |
|     like    |     none     |          None         |
|   situated  | oneof_string | ['inside', 'outside'] |
|    train    |     none     |          None         |
|     val     |     none     |          None         |
+-------------+--------------+-----------------------+
Object Tags
+---------------+--------------+-----------------------+
|      Name     |  Value type  |    Possible values

## Create Destination project

In [14]:
# Check whether the destination project already exists; if it does, generate a free name
if api.project.exists(workspace.id, dst_project_name):
    dst_project_name = api.project.get_free_name(workspace.id, dst_project_name)
print("Destination project name: ", dst_project_name)

Destination project name:  tutorial_project_tagged_001
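
To make the idea of a "free name" concrete, here is a plain-Python sketch of one way such a name could be derived. It is an assumption based solely on the `_001` suffix visible in the output above and is not the library's implementation; `free_name` and `existing` are hypothetical names for this example.

```python
def free_name(existing_names, name):
    """Return `name` if unused, else the first unused `name_NNN` (NNN = 001, 002, ...)."""
    if name not in existing_names:
        return name
    index = 1
    while True:
        candidate = "{}_{:03d}".format(name, index)
        if candidate not in existing_names:
            return candidate
        index += 1

# Assumed set of project names already present in the workspace
existing = {"tutorial_project", "tutorial_project_tagged"}
print(free_name(existing, "tutorial_project_tagged"))
```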


In [15]:
dst_project = api.project.create(workspace.id, dst_project_name)
api.project.update_meta(dst_project.id, dst_meta.to_json())
print("Destination project has been created: id={}, name={!r}".format(dst_project.id, dst_project.name))

Destination project has been created: id=904, name='tutorial_project_tagged_001'


## Iterate over all images, tag them and add to destination project

In [16]:
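for dataset in api.dataset.get_list(project.id):
    print('Dataset: {}'.format(dataset.name))
    # Mirror the source dataset in the destination project
    dst_dataset = api.dataset.create(dst_project.id, dataset.name)

    for image in tqdm(api.image.get_list(dataset.id)):
        # Download the source annotation and deserialize it with the source meta
        ann_json = api.annotation.download(image.id).annotation
        ann = sly.Annotation.from_json(ann_json, meta)

        # random.random() is uniform on [0, 1), so `<` picks `val` with probability validation_portion
        tag = sly.Tag(tag_meta_val) if random.random() < validation_portion else sly.Tag(tag_meta_train)
        ann = ann.add_tag(tag)

        # Register the image in the destination dataset by its hash, then attach the tagged annotation
        dst_image = api.image.add(dst_dataset.id, image.name, image.hash)
        api.annotation.upload(dst_image.id, ann.to_json())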

  0%|          | 0/3 [00:00<?, ?it/s]

Dataset: dataset_01


100%|██████████| 3/3 [00:00<00:00,  5.94it/s]
 50%|█████     | 1/2 [00:00<00:00,  8.58it/s]

Dataset: dataset_02


100%|██████████| 2/2 [00:00<00:00,  8.95it/s]


In [17]:
print("Project {!r} has been successfully uploaded".format(dst_project.name))
print("Number of images: ", api.project.get_images_count(dst_project.id))

Project 'tutorial_project_tagged_001' has been successfully uploaded
Number of images:  5
