## **Reformat of svg files to fit the format of Matlab code**

In [1]:
from xml.dom import minidom
import svgpathtools 
from svgpathtools import wsvg, Line, QuadraticBezier, Path, parse_path
import re
import os
import os.path

## **Functions**
The findSVGFiles() function walks the given directory, including all of its subfolders, and collects every .svg file. It stores the full path of each file in files_list
and the 'category + "\\" + filename' string in the file_names list.

The create_directory() function creates a folder at the given path.

The findTheDPathsOfSVGs() function parses the svg files and extracts the path data (the d attribute) of each svg. The svg path is stored
together with the file's 'category + "\\" + name' as a tuple.

The bezierCurveAndLineCheck() function takes a set of allowed letters and checks each path against it. The Matlab code I have allows the letters
{M, C, L, Z}, so the purpose of this function is to check that the letters in a path are a subset of these letters. It ensures
that the paths have the correct form and discards the ones that do not fulfill the requirement.


In [2]:
def findSVGFiles(folder_src_path):
    files_list = []
    file_names = []
    for src_path, src_names, filenames in os.walk(folder_src_path):
        for filename in [f for f in filenames if f.endswith('.svg')]:
            files_list.append(src_path + "\\" + filename)
            #category + name
            file_names.append(src_path.replace(folder_src_path + "\\", "") + "\\" + filename)
    return files_list, file_names

def create_directory(path):
    try:
        os.mkdir(path)
    except OSError:
        print ("Creation of the directory %s failed" % path)
    else:
        print ("Successfully created the directory %s " % path)

def findTheDPathsOfSVGs(files_list, file_names):
    name_and_path_list = []
    i = 0
    for svg_file in files_list:
        doc = minidom.parse(svg_file)  # parseString also exists
        path_strings = [path.getAttribute('d') for path
                        in doc.getElementsByTagName('path')]
        # The svg files have only one path in my case, that is why i am taking
        # only path_strings[0]
        name_and_path_list.append((file_names[i], path_strings[0]))
        i = i + 1
        doc.unlink()
    return name_and_path_list

def bezierCurveAndLineCheck(allowed_letters, name_and_path_list):
    allowed_paths = []
    for i in range(0, len(name_and_path_list)):
        letters = set()
        path_string =  ''.join(filter(str.isalpha, name_and_path_list[i][1]))
        for letter in path_string:
            letters.add(letter)

        # (A Union B)
        a_or_b = allowed_letters.union(letters)
        if len(a_or_b) == len(allowed_letters):
            allowed_paths.append(name_and_path_list[i])
    return allowed_paths
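
The union-based test in bezierCurveAndLineCheck() amounts to asking whether the letters found in a path form a subset of the allowed set. A minimal sketch of that idea follows; the helper name `uses_only_allowed` and the two path strings are made up for illustration and are not part of the notebook.

```python
def uses_only_allowed(path_string, allowed_letters):
    """Return True if every alphabetic character in path_string is allowed."""
    # Keep only the alphabetic characters, i.e. the SVG command letters.
    letters = {ch for ch in path_string if ch.isalpha()}
    return letters.issubset(allowed_letters)


allowed = {'m', 'M', 'C', 'c', 'Z', 'z', 'L', 'l'}

# Hypothetical path strings, for illustration only.
line_path = "M10 10L20 20z"
arc_path = "M10 10A5 5 0 0 1 20 20z"

print(uses_only_allowed(line_path, allowed))
print(uses_only_allowed(arc_path, allowed))
```

The second string contains an arc command (A), so it would be discarded by the check.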


The following code creates the list with the directory paths of the original svgs, and stores the name and the svg path into the name_and_path_list.

In [3]:
folder_src_path = "C:\\Users\\arist\\Desktop\\icon"
files_list, file_names = findSVGFiles(folder_src_path)

name_and_path_list = findTheDPathsOfSVGs(files_list, file_names)
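
findSVGFiles() builds its paths with a hard-coded backslash, which ties the notebook to Windows. A possible portable variant is sketched below using os.path.join and os.path.relpath; the function name `find_svg_files` and the temporary folder are assumptions for the demonstration. The later cells split names on "\\", so they would need the same adjustment before this variant could replace the original.

```python
import os
import tempfile


def find_svg_files(folder_src_path):
    """Return (full paths, paths relative to folder_src_path) of all .svg files."""
    files_list = []
    file_names = []
    for src_path, _dirs, filenames in os.walk(folder_src_path):
        for filename in sorted(filenames):
            if filename.endswith('.svg'):
                full_path = os.path.join(src_path, filename)
                files_list.append(full_path)
                # category + name, using the separator of the current OS
                file_names.append(os.path.relpath(full_path, folder_src_path))
    return files_list, file_names


# Build a tiny throwaway folder tree to demonstrate the function.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "arrows"))
    for name in ("angle-up.svg", "notes.txt"):
        with open(os.path.join(root, "arrows", name), "w") as f:
            f.write("")
    demo_files, demo_names = find_svg_files(root)

print(demo_names)
```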


This code checks the svg paths of the original svgs against the allowed letters. It finds which original svgs (52) fulfill the requirement and prints their names along with their category.

In [4]:
allowed_letters = {'m', 'M', 'C', 'c', 'Z', 'z', 'L', 'l'}
allowed_paths = bezierCurveAndLineCheck(allowed_letters, name_and_path_list)

print(len(allowed_paths))
for path in allowed_paths:
    print(path[0])

52
accessibility\accessible-icon.svg
arrows\angle-double-down.svg
arrows\angle-double-left.svg
arrows\angle-double-right.svg
arrows\angle-double-up.svg
arrows\angle-down.svg
arrows\angle-left.svg
arrows\angle-right.svg
arrows\angle-up.svg
arrows\chevron-down.svg
arrows\chevron-left.svg
arrows\chevron-right.svg
arrows\chevron-up.svg
beverage\wine-bottle.svg
business\certificate.svg
chat\comment-slash.svg
code\code.svg
communication\phone-slash.svg
communication\phone.svg
construction\ruler.svg
currency\ethereum.svg
currency\gg.svg
design\magic.svg
design\paint-brush.svg
design\pen.svg
design\splotch.svg
design\tint-slash.svg
editors\unlink.svg
education\theater-masks.svg
fitness\heart.svg
fitness\heart1.svg
food\candy-cane.svg
food\lemon.svg
games\xbox.svg
interfaces\check-double.svg
interfaces\check.svg
mathematics\percentage.svg
mathematics\times.svg
objects\gavel.svg
objects\glass-cheers.svg
objects\utensil-spoon.svg
pharmacy\syringe.svg
shapes\heart-broken.svg
shapes\star.svg
shapes

This code translates the svg paths of the original svgs into Path objects of the svgpathtools library. It checks that the segments in each Path object are only of class "Line" or "CubicBezier" and discards the Path objects that do not fulfill the check. With this transformation, 845 out of 1188 svgs end up consisting of lines and cubic Bezier curves only, which is promising since only 52 of the original svg paths used just lines and cubic Bezier curves. The Path objects are then written as svgs into a new folder with category subfolders.

In [5]:
# valid_list has all the svgs which are translated into lines and cubic bezier
# in valid_list the 'category + "\\" + filename" is stored along with path object
# the path object has objects of Line and CubicBezier only
counter = 0
valid_list = []
for i in range(0, len(name_and_path_list)):
    path_alt = parse_path(name_and_path_list[i][1])
    # set the flag once per svg, before looking at its segments;
    # an empty path has nothing to write, so it is discarded
    is_valid = len(path_alt) > 0
    for path in path_alt:
        if path.__class__.__name__ !="Line" and path.__class__.__name__ !="CubicBezier":
            counter = counter + 1
            is_valid = False
            break
    if is_valid:
        valid_list.append((name_and_path_list[i][0].replace(folder_src_path, ""), path_alt))


print(len(valid_list))

# Create new svg with only C and L
dst_path = "C:\\Users\\arist\\Desktop\\ReformatedIcons\\"
create_directory(dst_path)
for cat_name, path in valid_list:
    category = cat_name.split("\\")[0]
    create_directory(dst_path + category)
    wsvg(path, filename=dst_path + cat_name)

845
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\ failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\ReformatedIcons\alert failed
Creation of the d
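
The repeated "failed" messages above are consistent with os.mkdir() raising an error when the directory already exists, which create_directory() reports as a failure once per file of a category. One possible alternative, sketched here with a hypothetical helper `ensure_directory` and a temporary folder, is os.makedirs() with exist_ok=True, which does nothing when the folder is already there.

```python
import os
import tempfile


def ensure_directory(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    category_dir = os.path.join(root, "ReformatedIcons", "accessibility")
    ensure_directory(category_dir)
    ensure_directory(category_dir)  # second call is a silent no-op
    created = os.path.isdir(category_dir)

print(created)
```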

With this code we read the svg paths as before, but this time for the newly reformatted svgs.

In [6]:

# The path that the SVGs are created
reformated_src_path = "C:\\Users\\arist\\Desktop\\ReformatedIcons"
reformated_list, reformated_names = findSVGFiles(reformated_src_path)

reformated_name_and_path_list = findTheDPathsOfSVGs(reformated_list, reformated_names)

I check the new paths against the allowed letters to ensure that they contain only those letters. The letter "e" is added because some coordinates are written in scientific notation, i.e. with a power of 10 (for example 1e-05).

In [8]:
# the letter e is allowed because it is the power of 10 for some points
allowed_letters = {'M', 'C', 'L', 'e'}

reformated_allowed_paths = bezierCurveAndLineCheck(allowed_letters, reformated_name_and_path_list)

print(len(reformated_allowed_paths))

845
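
To see why "e" has to be allowed, the short sketch below applies the same str.isalpha filter to a made-up path string that contains coordinates in scientific notation. The path string is an assumption for illustration, not taken from the icon set.

```python
# Hypothetical path string; very small floats may be written as 1e-05.
sample_path = "M 1e-05,2.0 L 3.0,4.0 C 5.0,6.0 7.0,8.0 9.0,1.5e-07"

# Same filtering idea as in bezierCurveAndLineCheck()
letters = set(filter(str.isalpha, sample_path))
print(sorted(letters))

# The exponent is part of a valid number, not an SVG command
print(float("1e-05") == 0.00001)
```

Without "e" in the allowed set, a path like this one would be rejected even though it only uses M, L and C commands.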


The function createCommandDictionary() creates a list of dictionaries. Each dictionary represents an M, L or C command. For an M or L command we store its letter as "name" and its coordinates as "start". For a C command we store the "name" and the three coordinate pairs that follow it as "start", "controls" and "end". Each coordinate pair is stored as a single string with the two values separated by a comma.

The createCcommand() function builds a sequence of C commands when they are successive. It returns the sequence string, the index in the list of the command after the last C command, and the end coordinates of the last C command.

The createSVGpathList() function creates the path strings depending on the commands. The Matlab code for stroke removal and deformation expects a specific format. Successive C commands are allowed to be in the same path. L commands must each be placed in a separate path, no matter what. When we separate the commands into different paths, the end point of the last command of the previous path becomes the (M)oving point of the next path. Each path has an M command followed by either a sequence of C commands (possibly only one) or exactly one L command.
This code builds the paths following the constraints above.

The createSVGfileFormat() function creates an svg file following the exact format of the svg files of the Matlab code.

Running the code we get the final format of the svg files. With this format we are able to run the Matlab code, which supports only M, C and L commands.

Matlab code: https://github.com/yuqian1023/sketch-specific-data-augmentation
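
The splitting rule described above (one L command per path, runs of C commands kept together, each new path starting at the previous end point) can be illustrated with a compact, simplified sketch. The function `split_into_subpaths` and the example path are hypothetical and only mirror the idea; they are not the notebook's implementation, and the sketch assumes space-separated tokens with "x,y" coordinate pairs.

```python
def split_into_subpaths(d):
    """Split an M/L/C path string into paths with one L or a run of C each."""
    tokens = d.split()
    subpaths = []
    current_point = None  # end point of the last command seen
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if name == "M":
            current_point = tokens[i + 1]
            i += 2
        elif name == "L":
            # every L command gets its own path
            subpaths.append("M " + current_point + " L " + tokens[i + 1])
            current_point = tokens[i + 1]
            i += 2
        elif name == "C":
            parts = ["M " + current_point]
            # collect every directly following C command into the same path
            while i < len(tokens) and tokens[i] == "C":
                parts.append("C " + " ".join(tokens[i + 1:i + 4]))
                current_point = tokens[i + 3]
                i += 4
            subpaths.append(" ".join(parts))
        else:
            raise ValueError("Unexpected token: " + name)
    return subpaths


# Hypothetical input path, for illustration only.
example = "M 0,0 L 10,0 L 10,10 C 10,15 5,20 0,20 C -5,20 -10,15 -10,10 L 0,0"
for sub in split_into_subpaths(example):
    print(sub)
```

For this input the two L commands at the start become two separate paths, the two successive C commands share one path that starts at 10,10, and the final L command starts at the end point of the last curve.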

In [9]:
def createCommandDictionary(splitted_path):
    # create a list of dictionaries, each dictionary is a command
    command_list = []
    count = 0
    while count < len(splitted_path):
        command_dict = {}
        if splitted_path[count] == "M" or splitted_path[count] == "L":
            command_dict["name"] = splitted_path[count]
            command_dict["start"] = splitted_path[count+1]
            count = count + 2
            command_list.append(command_dict)
        elif splitted_path[count] == "C":
            command_dict["name"] = splitted_path[count]
            command_dict["start"] = splitted_path[count+1]
            command_dict["controls"] = splitted_path[count+2]
            command_dict["end"] = splitted_path[count+3]
            count = count + 4
            command_list.append(command_dict)
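        else:
            # stop here: count is not advanced, so printing alone would loop forever
            raise ValueError("Command format is wrong: " + repr(splitted_path[count]))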
    return command_list

def createCcommand(command_list, start_index):
    c_string = command_list[start_index]["name"] + " " 
    c_string = c_string + command_list[start_index]["start"] + " "
    c_string = c_string + command_list[start_index]["controls"] + " "
    c_string = c_string + command_list[start_index]["end"]
    end_point = command_list[start_index]["end"]
    end_index = start_index + 1

    for i in range(start_index, len(command_list)):
        if i + 1 > len(command_list) - 1:
            break
        elif command_list[i]["name"] == "C" and command_list[i + 1]["name"] == "C":
            c_string = c_string + " " + command_list[i + 1]["name"] + " " 
            c_string = c_string + command_list[i + 1]["start"] + " "
            c_string = c_string + command_list[i + 1]["controls"] + " "
            c_string = c_string + command_list[i + 1]["end"]
            end_point = command_list[i + 1]["end"]
            end_index = end_index + 1
        else:
            break
    return c_string, end_index, end_point

def createSVGpathList(command_list):
    index = 0
    svg_path_list = []
    while(index < len(command_list)):
        if command_list[index]["name"] == "M":
            m_string = command_list[index]["name"] + " " + command_list[index]["start"]
            if command_list[index + 1]["name"] == "L":
                m_string = m_string + " " + command_list[index + 1]["name"] + " " + command_list[index + 1]["start"]
                global_end_point = command_list[index + 1]["start"]
                index = index + 2
                svg_path_list.append(m_string)
            elif command_list[index + 1]["name"] == "C":
                c_string, end_index, c_end_point = createCcommand(command_list, index + 1)
                index = end_index
                svg_path_list.append(m_string + " " + c_string)
                global_end_point = c_end_point
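            else:
                # stop here: index is not advanced, so printing alone would loop forever
                raise ValueError("Something went wrong: M must be followed by L or C")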
        elif command_list[index]["name"] == "C":
            c_string, end_index, c_end_point = createCcommand(command_list, index)
            svg_path_list.append("M " + global_end_point + " " + c_string)
            global_end_point = c_end_point
            index = end_index
        elif command_list[index]["name"] == "L":
            l_string = "M " + global_end_point + " " + command_list[index]["name"] + " " + command_list[index]["start"]
            global_end_point = command_list[index]["start"]
            index = index + 1
            svg_path_list.append(l_string)
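        else:
            # stop here: index is not advanced, so printing alone would loop forever
            raise ValueError("Something went wrong: unknown command " + repr(command_list[index]["name"]))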
    return svg_path_list

def createSVGfileFormat(svg_path_list, svg_file_name, stroke_width):
    headline1 = '<?xml version="1.0" encoding="utf-8"?>\n'
    headline2 = '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
    headline3 = '<svg viewBox="0 0 800 800" preserveAspectRatio="xMinYMin meet" xmlns="http://www.w3.org/2000/svg" version="1.1"\n>'
    headline4 = '<g fill="none" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="' + str(stroke_width) + '">\n'
    headline5 = '<g transform="translate(100,100) scale(1) translate(0,0)">\n'

    with open(svg_file_name, "w+") as f:
        f.write(headline1)
        f.write(headline2)
        f.write(headline3)
        f.write(headline4)
        f.write(headline5)

        count = 0
        for path in svg_path_list:
            path = "<path id=" + '"' + str(count) +'"' + ' d="' + path + '"/>\n'
            count = count + 1
            f.write(path)
        
        f.write('</g>\n')
        f.write('</g>\n')
        f.write('</svg>')
    # the with block closes the file, so no explicit f.close() is needed

final_format_dst_path = "C:\\Users\\arist\\Desktop\\FinalFormatIcons\\"
create_directory(final_format_dst_path)

stroke_width = 1.2    
for i in range(0, len(reformated_allowed_paths)):
    # split the path with space
    svg_path = reformated_allowed_paths[i][1]
    splitted_path = svg_path.split(" ")

    command_list = createCommandDictionary(splitted_path)
    svg_path_list = createSVGpathList(command_list)

    category = reformated_allowed_paths[i][0].split("\\")[0]
    create_directory(final_format_dst_path + category)
    createSVGfileFormat(svg_path_list, final_format_dst_path + reformated_allowed_paths[i][0], stroke_width)



Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\ failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\accessibility failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\alert failed
Creation of the directory C:\Users\arist\Desktop\FinalFormatIcons\alert failed
Creation 