
# Motivation
During the current COVID-19 pandemic, many people are tied to their home offices and have to do most of their work with meager equipment. Many students are likewise increasingly dependent on online tutorials, in which they are expected to present their solutions, among other things. Some students are tutors themselves and lead the sessions. Under normal circumstances, everyone would sit in a room with a projector and a blackboard or whiteboard. The latter two help immensely in answering questions because they visually support what is being said. At home, however, this is not easy to replicate and requires technical equipment such as a graphics tablet or a document camera. Not everyone has access to such equipment, and the quality of the tutorial suffers as a result.

To counteract this, an interactive camera system presents itself as an attractive solution. This active camera system should be able to identify an outstretched index finger and zoom in on its tip in order to better display what is being shown. Furthermore, the zoom should be controlled by a gesture using a flat palm with an extended thumb, so that the entire control of the software, after an initial start, can be done hands-free.

In the following, we will implement our interactive camera step by step and go into each aspect in more detail.

# Gesture recognition with TensorFlow


## TensorFlow as our Machine-Learning Software
Machine learning is an important part of our gesture recognition. The machine learning software available today, TensorFlow included, is far more powerful than this project requires. Nevertheless, we decided to use TensorFlow because it is easy to integrate and the initial setup requires no prior knowledge. In addition, the course this project is part of provides more in-depth material on how to use it.

## Acquire training data
To recognize gestures consistently, we first need a large data set. We also have to decide at which stage of the pipeline gesture recognition should take place, and there are several options. The gesture could be recognized on the raw image before any processing. However, this would cause many problems: we would have to take hundreds of pictures of each gesture under different lighting conditions, with different skin colors and backgrounds, to achieve even a rudimentarily accurate result. A second option is to recognize the gesture after backprojection. At that point the most important element, the hand, has already been isolated. The image is also black and white, so neither the background nor the skin color matters for training. However, some artifacts remain, because certain areas of the image have a hue similar to the hand without belonging to it. The best option is therefore the last processing step, thresholding, which filters out the artifacts left over from backprojection and further highlights the important areas.

Now, to get as much data as possible, we collected over 1000 images per gesture. Since we don't have to pay attention to background, lighting conditions, or skin color, we were able to create the dataset very quickly.


### Live video capture of gestures we want to recognise
We had several options for collecting the data. The software itself could store the processed images, which we would then only have to sift through once to sort out inaccurate results. However, this would noticeably reduce performance and slow down the creation of the data set.
We found it more practical to record the displayed output of the processed frames as a screen video.


## Prepare training data
Since the recorded videos cannot serve directly as training data for TensorFlow, we had to prepare them first.


### Converting the Videos to Images, cropping and resizing
We converted the recorded videos into sequences of images and cropped each image to the area relevant to us. We then had to sift through these images as well to sort out faulty ones. Since the TensorFlow model we use expects 224 x 224 input images, we scaled our images to that size.
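
The cropping and scaling step can be sketched as follows. This is a minimal illustration, not the project's actual preparation script: the function name `crop_and_resize`, the region of interest and the synthetic frame are assumptions, and nearest-neighbour sampling with NumPy stands in for `cv2.resize`, which the project code uses elsewhere.

```python
import numpy as np


def crop_and_resize(frame, roi, size=224):
    """Crop frame to roi = (x, y, w, h) and resize it to size x size pixels."""
    x, y, w, h = roi
    cropped = frame[y:y + h, x:x + w]
    # Nearest-neighbour sampling: pick one source row/column per output pixel.
    rows = np.arange(size) * cropped.shape[0] // size
    cols = np.arange(size) * cropped.shape[1] // size
    return cropped[rows][:, cols]


# Hypothetical 720p frame with a white 400 x 400 region of interest.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
frame[100:500, 200:600] = 255
sample = crop_and_resize(frame, roi=(200, 100, 400, 400))
print(sample.shape)  # (224, 224, 3)
```

Each extracted video frame would pass through such a function once before being saved as a training image.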


## Training of the Model
Training could now begin. For this we used Teachable Machine (https://teachablemachine.withgoogle.com), a very helpful website that handles the entire training of the model and offers suitable training methods depending on the use case. We chose the Image Classification model for our purpose.
![TeachableMachine.com](https://github.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/blob/main/Documentation/Pictures/27.01.21/teachableMachine.jpg?raw=true)


## Implementation of the Model
Using the trained model turned out to be very simple. After training, Teachable Machine offers a short example that shows how to use the exported Keras model in Python. We only had to adapt it slightly to our needs.


In [None]:
import operator

import cv2
import numpy as np
import tensorflow as tf

# Module-level state: the most recent gesture and how often it repeated in a row
lastDetection = None
lastDetectionCount = 0


def getGesturePredictionFromTensorflow(frame, model):
    """Classify a processed 3-channel frame as "LEFT", "RIGHT" or "OTHER"."""
    # Reject missing or wrongly typed inputs
    if not isinstance(frame, np.ndarray) or not isinstance(model, tf.keras.Sequential):
        return "OTHER"

    # Create the array of the right shape to feed into the keras model
    # The 'length' or number of images you can put into the array is
    # determined by the first position in the shape tuple, in this case 1.
    data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)

    # Scale the frame to the 224 x 224 input size of the model
    dimension = (224, 224)
    image = cv2.resize(frame, dimension, interpolation=cv2.INTER_AREA)

    # turn the image into a numpy array
    image_array = np.asarray(image)

    # Normalize the image from [0, 255] to roughly [-1, 1]
    normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1

    # Load the image into the array
    data[0] = normalized_image_array

    # run the inference
    prediction = model.predict(data)

    # Map the class probabilities to gesture names
    # (the index order follows the class order of the trained model)
    predictionDictionary = {
        "LEFT": prediction[0][0],
        "RIGHT": prediction[0][1],
        "OTHER": prediction[0][2]
    }
    global lastDetection, lastDetectionCount
    # Pick the gesture with the highest probability
    detection = max(predictionDictionary.items(), key=operator.itemgetter(1))[0]
    # Count how many consecutive frames produced the same gesture
    if lastDetection is None or lastDetection != detection:
        lastDetection = detection
        lastDetectionCount = 0
    else:
        lastDetectionCount += 1

    return detection
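
The function above counts in `lastDetectionCount` how often the same gesture was detected in a row, but this section does not show how the counter is used. One plausible use is to act on a gesture only once it has been stable for several frames, so that a single misclassified frame does not trigger a zoom. The following sketch illustrates that idea; the class `GestureDebouncer` and the threshold `required_repeats` are hypothetical and not part of the project code.

```python
class GestureDebouncer:
    """Reports a gesture only after it was seen in several consecutive frames."""

    def __init__(self, required_repeats=5):
        self.required_repeats = required_repeats  # hypothetical threshold
        self.last = None
        self.count = 0

    def update(self, detection):
        """Feed the newest per-frame detection; return a stable gesture or None."""
        if detection != self.last:
            # A new gesture starts a new streak.
            self.last = detection
            self.count = 0
        else:
            self.count += 1
        if self.count >= self.required_repeats and detection != "OTHER":
            return detection
        return None


debouncer = GestureDebouncer(required_repeats=2)
results = [debouncer.update(g) for g in ["LEFT", "RIGHT", "RIGHT", "RIGHT", "OTHER"]]
print(results)  # [None, None, None, 'RIGHT', None]
```

The counting mirrors the function above: the first occurrence of a gesture sets the counter to 0, and every repetition increments it.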

# GUI
Software like ours, which is meant to be used as a collaborative tool, needs a GUI if only for the live preview. We therefore wanted to make it as simple and clear as possible. We also had to take into account that the software will run on systems with multiple cameras and screens, so we needed a way to switch between monitors and cameras as easily as possible, without restarting the software.

We also had to think about how to display multiple windows that reflect different steps in the processing of the image and thus visualize our processing pipeline.

In [None]:
import tkinter as tk

import cv2
import numpy as np
from PIL import Image, ImageTk


class ImageShower(object):
    """Creates another TKInter Window and shows the given Image
    """

    def __init__(self, name="Window", window=None):
        """
        Initialize a new ImageShower, by creating another TKInter Window and setting its Name
        :param name: The Title of the new Window
        :param window: Optional existing Window to use instead of creating a new one
        """
        if window is None:
            # `app` is the application's main TKInter widget, defined elsewhere
            self.window = tk.Toplevel(app)
            self.window.title(name)
        else:
            self.window = window

        self.panel = None
        self.frame = None

    def update(self, image):
        """
        Update the Image which will be shown in this Window
        :param image: The Image as cv2 Image in BGR
        """
        self.frame = image

    def show(self, width=640, height=360):
        """
        Shows the Image which has already been set by the update Method
        :param width: The Optional scaled Width of the Image
        :param height: The Optional scaled Height of the Image
        :return: None if no Image is given
        """
        if self.frame is None:
            return
        try:
            # Resize and Convert cv2 Image to TKInter Image
            img = cv2.resize(np.array(self.frame), (width, height), interpolation=cv2.INTER_AREA)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGBA)
            img = Image.fromarray(img)
            img = ImageTk.PhotoImage(img)
            # if the panel is None, we need to initialize it
            if self.panel is None:
                self.panel = tk.Label(self.window, image=img)
                # keep a reference so the image is not garbage collected
                self.panel.image = img
                self.panel.pack(side=tk.TOP)

            # otherwise, simply update the panel
            else:
                self.panel.configure(image=img)
                self.panel.image = img
        except RuntimeError:
            print("[INFO] caught a RuntimeError")
        except cv2.error:
            print("[DEBUG] Image error! (Is the format correct?)")

## Showing Windows
The first step was to display the current monitor within the software. For this purpose we used the Python library 'mss'. It can enumerate all connected monitors and provides data such as the current screen content and the dimensions of the selected monitor. For debugging, we also wanted to display individual intermediate steps as well as the various metadata on which certain actions are based.

For this we could then also use the ImageShower shown earlier.

In [None]:
# Create Optional Windows for Debugging and Additional Infos
histogramWindow = ImageShower("Histogram")
histogramThreshWindow = ImageShower("Histogram with Threshold")
mainCameraWithInfo = ImageShower("Main Camera with Info")

### Live Camera Feed with generated metadata
To check whether our software works correctly, the frame read from the camera is shown in a small extra window. This view also contains further data such as an activation circle, the last recognized finger positions, and the assumed position of the back of the hand. The current zoom level is displayed as well.


### Processed Image with Backprojection
Another window shows a specific stage of the actual image processing. After a histogram has been recorded, it is applied to the current camera image using backprojection. The result contains all pixels that match parts of the histogram; all other parts of the image are black. This view was important to us because it shows the effect of the processing steps performed so far, and whether changes to the size of the histogram or to the backprojection parameters improved the result.
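
To illustrate what backprojection computes, here is a minimal NumPy sketch on a single hue channel. OpenCV offers `cv2.calcHist` and `cv2.calcBackProject` for this task; the sketch does not reproduce the project's code, and the function name `back_project`, the bin count of 18 and the sample values are assumptions. Whether the project uses hue alone or hue together with saturation is not stated here.

```python
import numpy as np


def back_project(hue_image, hue_sample, bins=18, hue_range=180):
    """Map every pixel to how common its hue is in the sample region (0..255)."""
    bin_width = hue_range // bins
    # Histogram of the sampled hand region, scaled so the fullest bin is 255.
    hist = np.bincount((hue_sample // bin_width).ravel(), minlength=bins).astype(np.float32)
    hist = hist / hist.max() * 255
    # Look up each pixel's bin: hues missing from the sample stay black (0).
    return hist[hue_image // bin_width].astype(np.uint8)


# Hypothetical data: the sample contains one skin-like hue (12) only.
hue_sample = np.full((4, 4), 12, dtype=np.uint8)
hue_image = np.array([[12, 15, 90], [100, 11, 170]], dtype=np.uint8)
mask = back_project(hue_image, hue_sample)
print(mask)
# [[255 255   0]
#  [  0 255   0]]
```

Pixels whose hue falls into the same bin as the sample (12, 15 and 11) become white, while all others turn black. A larger bin width tolerates more hue variation but also lets more background through, which is why the histogram size affects the result.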


### Processed Image with additional Thresholding
A further debugging window shows the processed camera image after the additional thresholding. Here, too, the preceding processing steps have a major influence on the final quality. External factors such as different lighting conditions, or the different skin tones of the back and the palm of the hand, also affect the result.
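
As a rough illustration of this step, the sketch below applies a plain binary threshold with NumPy, which follows the same rule as OpenCV's `cv2.threshold` with `cv2.THRESH_BINARY`. The threshold value of 50 and the input values are assumptions, and the project may combine thresholding with further filtering that is not shown here.

```python
import numpy as np


def binary_threshold(gray, threshold=50):
    """Set pixels above the threshold to white (255) and all others to black (0)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


# Hypothetical backprojection result with weak (30, 49) and strong responses.
back_projection = np.array([[0, 30, 200], [255, 49, 51]], dtype=np.uint8)
binary = binary_threshold(back_projection)
print(binary)
# [[  0   0 255]
#  [255   0 255]]
```

Weak responses, which typically stem from background areas with a hue similar to the hand, are removed, while strong responses are raised to full white.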


### Main-Window (Screen + PiP)
To bring all the processing steps together, there is a main window. It offers a choice between the available monitors and cameras and displays the selected monitor together with the processed camera image. The camera image is only displayed while a finger is in the frame. In addition, the image can be zoomed in or out using the aforementioned gesture recognition. The zoomed image always follows the finger and zooms in on the position it points to. The zoom level is maintained even if the finger leaves the frame.
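
How a zoomed view can follow the fingertip is sketched below. The function `zoom_window` is hypothetical and only illustrates the geometry, assuming a zoom factor of at least 1: the crop shrinks with the zoom level, is centered on the fingertip, and is clamped so that it never leaves the camera frame.

```python
def zoom_window(center_x, center_y, zoom, width, height):
    """Return (left, top, right, bottom) of the crop shown at the given zoom."""
    crop_w = int(width / zoom)
    crop_h = int(height / zoom)
    # Center the crop on the fingertip, then clamp it to the frame borders.
    left = min(max(center_x - crop_w // 2, 0), width - crop_w)
    top = min(max(center_y - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h


print(zoom_window(640, 360, 2.0, 1280, 720))   # (320, 180, 960, 540)
print(zoom_window(1270, 10, 2.0, 1280, 720))   # (640, 0, 1280, 360)
```

The second call shows the clamping: a fingertip near the top right corner yields a crop that touches the frame border instead of extending beyond it. The cropped region would then be scaled back up to the display size.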


## Performance Improvements
One problem we noticed quite quickly was a drop in performance once the window not only displayed the current monitor but also processed the incoming camera image. The cause was that everything ran on a single thread: reading the monitor, reading the camera, the entire processing, and the display of the results. We therefore moved some parts of the program to separate threads, which read the next image ahead of time and make it available through a variable. For this purpose we wrote two classes, one that reads the camera and one that reads the monitor.

In [None]:
from threading import Thread

import cv2
import mss
import numpy as np

# Shared mss instance for listing and grabbing monitors
# (assumption: created once at module level, as the name `sct` suggests)
sct = mss.mss()


class MonitorGrabber(object):
    """
    Reads the Current Screen in another Thread and Stores it for easy Access
    """

    def __init__(self, src=1, width=1280, height=720):
        """
        Initialize a new MonitorGrabber
        :param src: MonitorIndex from mss (1 is the first monitor, 0 is all monitors combined)
        :param width: Scaled Output Image width
        :param height: Scaled Output Image height
        """
        self.setSrc(src)
        self.width = width
        self.height = height

        # Grab Monitor Image, Resize, Convert and Store it
        img = sct.grab(self.src)
        # noinspection PyTypeChecker
        img = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)
        self.picture = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)
        self.stopped = False

    def start(self):
        """
        Starts another Thread for its own get-Method, to grab the Image out of Mainloop
        :return:  Optional: The Own Object to create, start the Thread and save the Object at the same Time
        """
        Thread(target=self.get, args=()).start()
        return self

    def setSrc(self, src):
        """
        Re-Sets the Monitor Input Source Index of mss
        :param src: The new Monitor Index
        """
        self.src = sct.monitors[src]

    def get(self):
        """
        Grabs the current Monitor Image, Resizes, converts and stores it
        """
        while not self.stopped:
            img = sct.grab(self.src)
            img = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)
            self.picture = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)

    def stop(self):
        """
        Stops the MonitorGrabber-Get-Thread started by the start-Method
        """
        self.stopped = True

In [None]:
from threading import Thread

import cv2
import numpy as np


class CameraGrabber(object):
    """
    Reads the Current Camera-feed in another Thread and Stores it for easy Access
    """

    def __init__(self, src, width=1280, height=720):
        """
        Initialize a new CameraGrabber
        :param src: Camera index for cv2.VideoCapture
        :param width: Scaled Output Image width
        :param height: Scaled Output Image height
        """
        self.width = width
        self.height = height

        # Grab Camera Image, Resize, Convert and Store it
        self.stream = cv2.VideoCapture(src)
        (self.grabbed, img) = self.stream.read()
        self.picture = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)
        self.stopped = False

    def start(self):
        """
        Starts another Thread for its own get-Method, to grab the Image out of Mainloop
        :return:  Optional: The Own Object to create, start the Thread and save the Object at the same Time
        """
        Thread(target=self.get, args=()).start()
        return self

    def setSrc(self, src):
        """
        Re-Sets the Camera Input Source Index of cv2.VideoCapture
        :param src: The new Camera Index
        """
        self.stream = cv2.VideoCapture(src)

    def get(self):
        """
        Grabs the current Camera Image, Resizes and stores it
        """
        while not self.stopped:
            (self.grabbed, img) = self.stream.read()
            if not self.grabbed:
                # No frame was delivered: stop instead of resizing an empty image
                self.stop()
            else:
                self.picture = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)

    def stop(self):
        """
        Stops the CameraGrabber-Get-Thread started by the start-Method
        """
        self.stopped = True
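
Both grabber classes follow the same pattern: a background thread keeps overwriting one variable with the newest image, and the main loop simply reads that variable whenever it needs a frame. The sketch below reduces this pattern to the standard library so that it runs without a camera or monitor. The class `LatestValueGrabber` and its stand-in source are hypothetical; the lock, the daemon thread and the joining `stop` are additions that the project classes do not contain.

```python
import threading
import time


class LatestValueGrabber:
    """Polls a source in a background thread and keeps only the newest value."""

    def __init__(self, read_source):
        self.read_source = read_source      # hypothetical callable returning a frame
        self.latest = read_source()         # make sure a value exists before start()
        self.lock = threading.Lock()
        self.stopped = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()
        return self

    def _run(self):
        while not self.stopped.is_set():
            value = self.read_source()
            with self.lock:
                self.latest = value
            time.sleep(0.001)  # yield briefly instead of spinning at full speed

    def read(self):
        with self.lock:
            return self.latest

    def stop(self):
        self.stopped.set()
        self.thread.join()


# A counter stands in for the camera: every "frame" is just the next number.
counter = iter(range(1_000_000))
grabber = LatestValueGrabber(lambda: next(counter)).start()
time.sleep(0.05)
frame_number = grabber.read()
grabber.stop()
print(frame_number)
```

The main loop never waits for the source; it only sees the most recent value, and frames that arrive between two reads are dropped. For a live preview this is the desired behavior, since an outdated frame is of no use.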