# SDA Coding Challenge 
**Applicant: Sasa Redzepovic**

 ## Task introduction

Import YOLO object detection output.  
 
A YOLO-based network detects different objects with a camera. We have the output of an example execution. In 
this challenge we want to parse the process output of the example and set up a data structure for analysis.  

 ## Features we would like to see:  
1. Python-based program which reads the example output attached to this task 
2. A data structure in memory to work with.   
3. Some basic statistics like: 
    1. How many object detections does the run contain? 
    2. How many different objects were detected? 
    3. Is there a typical position of the objects in the camera view? 
4. An idea (no implementation) of how your program would work if the output is a stream and not a text file.

## Import packages

In [1]:
import re
import io
import os
import numpy as np

In [2]:
# path to data
file_path = "data/pkg_dump.txt"

## 1. Python-based program which reads the example output attached to this task 
## Notes:
I noticed that the output contains information from 2 demos, so I split it into 2 separate text files, using the word "Demo" as the start and stop marker for the split. I saved the 2 text files as demo0.txt and demo1.txt in the data/demos/ directory.  
I continued with feature 2 using demo0.  

I also preprocessed the files. This was mainly done to get a cleaner layout when the file is printed and to guide my decision process.  
1. Removed all double line breaks '\n\n'.
2. Renamed the first occurrence of Object, since it did not contain any further information.
3. I assume that some objects were captured within the same frame, because these objects were summarized within one "Objects" message. I decided to add the "Objects" message to the lines where it was missing. FPS and AVG_FPS were also missing for some categories. I copied the values from the line before, since I think they belong to the same frame; but see the output and explanation below.  
4. Added a line break at the keyword FPS so that the output is correctly aligned.
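
The split on the word "Demo" could look roughly like the sketch below. This is not the code in `scripts.parse_doc_utils`; the function name `split_demos` and the assumption that every run starts on a line containing the marker are illustrative only.

```python
def split_demos(text, marker="Demo"):
    """Split a dump into chunks; a new chunk starts at each line containing `marker`."""
    chunks, current = [], []
    for line in text.splitlines(keepends=True):
        # A marker line closes the chunk collected so far and opens a new one.
        if marker in line and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


# Tiny made-up dump with two runs (not the real challenge file).
sample = "Demo\nFPS:15.9\nDemo\nFPS:16.0\n"
demos = split_demos(sample)
print(len(demos))
```

Any text before the first marker line would end up in a chunk of its own, so a real file may need that first chunk dropped.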

In [3]:
from scripts.parse_doc_utils import _split_and_preprocess

demos_file_path,original_demo1,preporcessed_demo1  = _split_and_preprocess(file_path)

There were 2 demos detected in the challenge text file.
---------------
The challenge text file was split into 2 separate text files.
The text files were renamed and saved under ['./data/demos/demo0.txt', './data/demos/demo1.txt']
---------------


 ## Example output where multiple objects are detected in one frame. 

In [4]:
# find lines
#[print(x) for x in re.finditer(('Flugzeug'),original_demo1)]
print('-'*150)
print(original_demo1[10730:11000])
print('-'*150)

------------------------------------------------------------------------------------------------------------------------------------------------------
Flugzeug:   95%   (left_x:  109   top_y:  36   width:  201   height:  52)

FPS:15.9 	 AVG_FPS:15.7
Objects:

Flachhänger: 100% 	(left_x:  423   top_y:  437   width:  455   height:  260)
Flugzeug:   92%   (left_x:  105   top_y:  34   width:  204   height:  57)

FPS:16.0 
------------------------------------------------------------------------------------------------------------------------------------------------------


**Flachhänger and Flugzeug are reported together under a single Objects, FPS and AVG_FPS message.**

 ## Example output preprocessed file

In [5]:
# find lines
#[print(x) for x in re.finditer(('Flugzeug'),preporcessed_demo1)]
print('-'*150)
print(preporcessed_demo1[10800:12000])
print('-'*150)

------------------------------------------------------------------------------------------------------------------------------------------------------
  456   height:  261)
FPS: NULL 	 AVG_FPS: NULL Objects: Flugzeug:   95%   (left_x:  109   top_y:  36   width:  201   height:  52) 
FPS: 15.9 	 AVG_FPS: 15.7 Objects: Flachhänger: 100% 	(left_x:  423   top_y:  437   width:  455   height:  260)
FPS: NULL 	 AVG_FPS: NULL Objects: Flugzeug:   92%   (left_x:  105   top_y:  34   width:  204   height:  57) 
FPS: 16.0 	 AVG_FPS: 15.7 Objects: Flachhänger: 100% 	(left_x:  424   top_y:  437   width:  453   height:  259)
FPS: NULL 	 AVG_FPS: NULL Objects: Flugzeug:   86%   (left_x:  109   top_y:  42   width:  207   height:  52) 
FPS: 15.9 	 AVG_FPS: 15.7 Objects: Flachhänger: 100% 	(left_x:  423   top_y:  437   width:  454   height:  259)
FPS: NULL 	 AVG_FPS: NULL Objects: Flugzeug:   91%   (left_x:  115   top_y:  45   width:  209   height:  56) 
FPS: 15.9 	 AVG_FPS: 15.7 Objects: Flachhänger: 100

**I added "FPS: NULL  AVG_FPS: NULL Objects:" to all lines where a Flugzeug was detected. The output looks neater and is easier for humans to read.** 

 ## 2. A data structure in memory to work with. 
 ## Notes:
Feature 2 asks for a data structure in memory. I decided not to use SQLite or any other database, since the instructions explicitly ask for a structure. I am not aware of any special structure that would satisfy this condition, so I assume I can use any data structure that is held in RAM. As far as I know this includes dictionaries, lists, tuples and dataframes. I feel this might be a trick question.  

I built a dictionary with the following keys: "Objects:","left_x:","top_y:","width:","height:","\nFPS:","AVG_FPS:","Video stream:","Accuracy"  
The values are lists of strings, integers or floats.
I read out the values for all but one variable using regular expressions. The Accuracy readout is somewhat hard-coded: I look for % signs in the text and index 3 positions back to get the percentages.  
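
As an illustration of the regular-expression approach, the sketch below parses raw detection lines into a dictionary of lists. It is an assumption about how this could be done, not the code in `scripts.data_struct_utils`; the pattern is derived only from the sample lines shown earlier, and the names `DETECTION_RE` and `parse_detections` are hypothetical. Capturing the percentage in the same pattern would avoid the fixed offset from the % sign.

```python
import re

# Assumed raw line layout, taken from the sample output shown earlier:
#   Flugzeug:   95%   (left_x:  109   top_y:  36   width:  201   height:  52)
DETECTION_RE = re.compile(
    r"^(?P<label>[^:\n]+):\s*(?P<conf>\d+)%\s*"
    r"\(left_x:\s*(?P<left_x>-?\d+)\s+top_y:\s*(?P<top_y>-?\d+)\s+"
    r"width:\s*(?P<width>\d+)\s+height:\s*(?P<height>\d+)\)",
    re.MULTILINE,
)


def parse_detections(text):
    """Return a dict of equally long lists, one entry per detection line."""
    data = {"Objects": [], "Accuracy": [], "left_x": [], "top_y": [], "width": [], "height": []}
    for m in DETECTION_RE.finditer(text):
        data["Objects"].append(m.group("label").strip())
        data["Accuracy"].append(int(m.group("conf")))
        for key in ("left_x", "top_y", "width", "height"):
            data[key].append(int(m.group(key)))
    return data


sample = (
    "Flachhänger: 100% \t(left_x:  423   top_y:  437   width:  455   height:  260)\n"
    "Flugzeug:   92%   (left_x:  105   top_y:  34   width:  204   height:  57)\n"
    "\n"
    "FPS:16.0 \t AVG_FPS:15.7\n"
)
parsed = parse_detections(sample)
print(parsed["Objects"], parsed["Accuracy"])
```

The FPS line is skipped because it contains no percentage followed by a bounding box.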







In [6]:
from scripts.data_struct_utils import _build_data_struct

data_struct,text = _build_data_struct(demos_file_path[0],'dict')
print('-'*150)
print(data_struct.keys())
print('-'*150)

Keys have the same lengths
------------------------------------------------------------------------------------------------------------------------------------------------------
dict_keys(['Objects', 'left_x', 'top_y', 'width', 'height', 'FPS', 'AVG_FPS', 'Video stream', 'Accuracy'])
------------------------------------------------------------------------------------------------------------------------------------------------------


 ## 3. Some basic statistics like:

 ### 3A How many object detections does the run contain?

In [7]:
from scripts.data_struct_utils import _get_number_of_object_detections

nr_of_obj_det = _get_number_of_object_detections(data_struct)

print('-'*150)
print(f'{nr_of_obj_det} objects were detected if we count all objects separately.')
print('-'*150)

------------------------------------------------------------------------------------------------------------------------------------------------------
991 objects were detected if we count all objects separately.
------------------------------------------------------------------------------------------------------------------------------------------------------


**Here I count the absolute number of detections over all objects. I am not sure whether the FPS could be used to tell if multiple objects of one category were present.**
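
One possible way to approach the open question above without the FPS values: in the raw sample shown earlier, each `Objects:` line opens the detections of one frame. The sketch below assumes that this layout holds for the whole file and uses hypothetical names. It groups the detected labels per frame and lists the frames in which a category occurs more than once.

```python
from collections import Counter


def detections_per_frame(lines):
    """Group detection labels by frame; an 'Objects:' line starts a new frame (assumption)."""
    frames = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("Objects:"):
            frames.append([])
        elif "%" in line and "left_x" in line and frames:
            # The label is everything before the first colon.
            frames[-1].append(line.split(":", 1)[0].strip())
    return frames


# Lines in the raw layout, with values taken from the example output above.
sample = [
    "FPS:15.9 \t AVG_FPS:15.7",
    "Objects:",
    "",
    "Flachhänger: 100% \t(left_x:  423   top_y:  437   width:  455   height:  260)",
    "Flugzeug:   92%   (left_x:  105   top_y:  34   width:  204   height:  57)",
    "",
    "FPS:16.0 \t AVG_FPS:15.7",
    "Objects:",
    "",
    "Flachhänger: 100% \t(left_x:  424   top_y:  437   width:  453   height:  259)",
]
frames = detections_per_frame(sample)

# Indices of frames in which at least one category occurs more than once.
repeated = [i for i, labels in enumerate(frames) if any(n > 1 for n in Counter(labels).values())]
print(len(frames), sum(len(f) for f in frames), repeated)
```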

 ### 3B How many different objects were detected? 

In [8]:
from scripts.data_struct_utils import _get_number_of_unique_objects

nr_of_obj, obj = _get_number_of_unique_objects(data_struct)

print('-'*150)
print(f'There were {nr_of_obj} unique objects: {obj}.')
print('-'*150)

------------------------------------------------------------------------------------------------------------------------------------------------------
There were 2 unique objects: ['Flugzeug', 'Flachhänger'].
------------------------------------------------------------------------------------------------------------------------------------------------------
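
With a dictionary of lists as described in section 2, both counts follow from the label list alone. A minimal sketch using `collections.Counter`; the `toy_struct` below is made up for illustration and is not the real data.

```python
from collections import Counter

# Toy stand-in for the real structure; only the 'Objects' key is needed here.
toy_struct = {"Objects": ["Flachhänger", "Flugzeug", "Flachhänger", "Flachhänger"]}

counts = Counter(toy_struct["Objects"])
total_detections = sum(counts.values())  # 3A: all detections, counted separately
unique_objects = sorted(counts)          # 3B: distinct categories
print(total_detections, len(unique_objects), dict(counts))
```

The per-category counts come for free with this approach, which helps to judge how much data each mean position in 3C is based on.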


### 3C Is there a typical position of the objects in the camera view?
I split the data structure into the 2 object categories and converted the numerical data to a NumPy array in order to calculate the mean position of the objects.

In [9]:
from scripts.data_struct_utils import _get_typical_position, _get_means_for_print_output

groups, mean_dict, ident_dict = _get_typical_position(data_struct)

for key in groups.keys():
    top,bot,w,h = _get_means_for_print_output(mean_dict, key)
    print(f'The screen resolution was {data_struct["Video stream"][0]} x {data_struct["Video stream"][1]}.\n\nThe mean position for a {key} is:\nmean xy(top): {top}\nmean xy(bot): {bot}\nmean width: {w}\nmean hight: {h} ')
    print('-'*150)


The screen resolution was 1620 x 1283.

The mean position for a Flugzeug is:
mean xy(top): (104.67, 35.33)
mean xy(bot): (301.0, 85.33)
mean width: 196.33
mean hight: 50.0 
------------------------------------------------------------------------------------------------------------------------------------------------------
The screen resolution was 1620 x 1283.

The mean position for a Flachhänger is:
mean xy(top): (378.84, 440.61)
mean xy(bot): (846.36, 712.74)
mean width: 467.52
mean hight: 272.14 
------------------------------------------------------------------------------------------------------------------------------------------------------


**I assumed the data was correct here, so I didn't check for outlier bounding boxes. As a next step I would also plot all detection boxes and calculate a probability density map for each category.**  
We can see that the Flugzeug sits closer to the top of the screen, y(35.33,85.33), than the Flachhänger, y(440.61,712.74).
The Flugzeug box is also narrower and flatter, presumably because of its greater distance to the camera.
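
The per-category means reported in this section can be computed along the following lines. The sketch is an assumption about the approach, not the code in `scripts.data_struct_utils`, and the function name `typical_position` is hypothetical. It also returns the median, which is less sensitive to outlier boxes than the mean. The four sample boxes are taken from the example output earlier in this notebook.

```python
import numpy as np


def typical_position(labels, boxes):
    """Return per-label mean and median of [left_x, top_y, width, height]."""
    labels = np.asarray(labels)
    boxes = np.asarray(boxes, dtype=float)
    stats = {}
    for label in np.unique(labels):
        group = boxes[labels == label]  # rows belonging to this category
        stats[str(label)] = {
            "mean": group.mean(axis=0),
            "median": np.median(group, axis=0),
        }
    return stats


labels = ["Flugzeug", "Flachhänger", "Flugzeug", "Flachhänger"]
boxes = [
    (109, 36, 201, 52),
    (423, 437, 455, 260),
    (105, 34, 204, 57),
    (424, 437, 453, 259),
]
stats = typical_position(labels, boxes)

left_x, top_y, width, height = stats["Flugzeug"]["mean"]
# Bottom-right corner = top-left corner + size.
print((left_x, top_y), (left_x + width, top_y + height))
```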

## Demo2 / run 2

In [10]:
data_struct2,text2 = _build_data_struct(demos_file_path[1],'dict')
nr_of_obj_det2 = _get_number_of_object_detections(data_struct2)

print('-'*150)
print(f'{nr_of_obj_det2} objects were detected if we count all objects separately.')
print('-'*150)
# unique objects of the second run
nr_of_obj2, obj2 = _get_number_of_unique_objects(data_struct2)

print('-'*150)
print(f'There were {nr_of_obj2} unique objects: {obj2}.')
print('-'*150)

groups2, mean_dict2, ident_dict2 = _get_typical_position(data_struct2)

for key in groups2.keys():
    top,bot,w,h = _get_means_for_print_output(mean_dict2, key)
    print(f'The screen resolution was {data_struct2["Video stream"][0]} x {data_struct2["Video stream"][1]}.\n\nThe mean position for a {key} is:\nmean xy(top): {top}\nmean xy(bot): {bot}\nmean width: {w}\nmean hight: {h} ')
    print('-'*150)

Keys have the same lengths
------------------------------------------------------------------------------------------------------------------------------------------------------
1716 objects were detected if we count all objects separately.
------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------
There were 2 unique objects: ['Flugzeug', 'Flachhänger'].
------------------------------------------------------------------------------------------------------------------------------------------------------
The screen resolution was 1620 x 1283.

The mean position for a Flugzeug is:
mean xy(top): (109.5, 39.25)
mean xy(bot): (314.75, 93.5)
mean width: 205.25
mean hight: 54.25 
---------------------------------------------------------------------------

 ## 4. An idea (no implementation) of how your program would work if the output is a stream and not a text file.

I have no experience with streams, but I opened the text file as one, so the code should work as is, unless I have not implemented it correctly. The benefit of a stream would be that none of the data has to be saved locally.
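
A minimal sketch of the streaming idea, under the assumption that the detector's output arrives line by line on any iterable (an open file, `sys.stdin`, or a pipe). All names here are hypothetical and the line pattern is derived only from the sample output shown earlier. Instead of building the full dictionary first, the statistics are updated per detection, so memory use does not grow with the length of the stream.

```python
import re
from collections import defaultdict

# Assumed detection line layout, taken from the sample output in this notebook.
DETECTION_RE = re.compile(
    r"\s*(?P<label>[^:\n]+):\s*(?P<conf>\d+)%\s*"
    r"\(left_x:\s*(?P<left_x>-?\d+)\s+top_y:\s*(?P<top_y>-?\d+)\s+"
    r"width:\s*(?P<width>\d+)\s+height:\s*(?P<height>\d+)\)"
)


class RunningStats:
    """Per-label detection count and running sums of the box values."""

    def __init__(self):
        self.count = defaultdict(int)
        self.sums = defaultdict(lambda: [0, 0, 0, 0])

    def update(self, label, box):
        self.count[label] += 1
        self.sums[label] = [s + v for s, v in zip(self.sums[label], box)]

    def mean(self, label):
        # Only meaningful for labels that have been seen at least once.
        n = self.count[label]
        return [s / n for s in self.sums[label]]


def consume(lines, stats):
    """Update `stats` from an iterable of text lines, one line at a time."""
    for line in lines:
        m = DETECTION_RE.match(line)
        if m:
            box = [int(m.group(k)) for k in ("left_x", "top_y", "width", "height")]
            stats.update(m.group("label").strip(), box)
    return stats


# An iterator stands in for a live stream here.
stream = iter([
    "Flugzeug:   95%   (left_x:  109   top_y:  36   width:  201   height:  52)\n",
    "FPS:15.9 \t AVG_FPS:15.7\n",
    "Objects:\n",
    "Flachhänger: 100% \t(left_x:  423   top_y:  437   width:  455   height:  260)\n",
    "Flugzeug:   92%   (left_x:  105   top_y:  34   width:  204   height:  57)\n",
])
stats = consume(stream, RunningStats())
print(dict(stats.count), stats.mean("Flugzeug"))
```

With a real stream, `consume(sys.stdin, RunningStats())` would process the detector's standard output when it is piped into the script, and the current statistics could be printed at regular intervals.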