# Motivation
Due to the COVID-19 pandemic, many people are tied to their home offices and often have to work with meager equipment. Moreover, many students increasingly take part in online tutoring sessions in which, for example, they have to present solutions to exercises. In some cases, students even act as tutors and lead the lessons. Under normal circumstances, one would sit in a room with a projector and a blackboard or whiteboard. The latter two media help to answer questions by visually supporting what is being said. However, this technique is not easy to reproduce at home. It would require technical equipment such as a graphics tablet or a document camera. One of many problems is that not everyone has access to such technology, so the quality of the tutoring session could suffer. It can also be difficult for someone who has access to such equipment but does not know how to use it properly. To counteract such difficulties, an interactive camera system can be an attractive solution.

As a possible solution to the problems named above, we would like to introduce our approach to an interactive camera within an active camera system.
For our approach to be successful, two goals have to be met: On the one hand, the active camera system should be able to identify an outstretched index finger and zoom in on its tip in order to better display what is being shown. On the other hand, the zoom should be controlled by a gesture using a flat palm with an extended thumb, so that the entire control of the software, after an initial start, can be done hands-free.

In this documentation, we would like to show how we implemented our approach to an interactive camera and explain each aspect in more detail.

# Finger recognition
As described above, one of the goals of the interactive camera is to track an outstretched finger based on its fingertip. Therefore, the first problem to solve is to have the interactive camera recognize a hand as a whole and then isolate an outstretched finger. The following figure shows a processing pipeline of all steps, which will now be discussed in more detail.



## Histogram creation
To detect a hand in the first place, a histogram-assisted masking of the given camera image is performed. A histogram represents the frequency distribution of the pixel colors in an image. If a color is represented by a high number of pixels, this color has a high frequency or intensity in the histogram. Therefore, histograms are useful for filtering differently colored objects in a scene so that the filtered objects can be represented as a binary image, which is more convenient to work with. Because the histogram is created from the user's own hand, problems that would arise from hands with a different skin color than the developer's can be avoided, and no skin type is neglected. However, this requires the user to create the histogram before the actual software is used. Thus, as in Figure XX, the user needs to place their hand under the drawn auxiliary areas and create the histogram by pressing the "Z" key.

![alt Text](https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/18.11.20/Messung_Hautfarbe.png "Figure 7: Measuring points on hand")

**Figure XX:** Measuring points on hand 

This is done by merging the drawn areas into a separate image and then using "[*OpenCV*](https://pypi.org/project/opencv-python/)" to create the required histogram. The following code example shows how the measurement areas are drawn onto a given image and how a histogram is created from them. It should be mentioned that the calculated histogram is min-max normalized to the range 0 to 255, so that its values do not depend on the absolute number of sampled pixels.
```python
import cv2
import numpy as np

# assumed to be 9 here, matching the nine coordinates and the 90x10 ROI used below
amountOfMeasuringRectangles = 9


def drawMeasuringRectangles(frame):
    """Draws 'amountOfMeasuringRectangles' Rectangles on the given frame and returns the modified image"""
    rows, cols, dontCare = frame.shape
    global xCoordinatesOfMeasuringRectangles_topLeft, yCoordinatesOfMeasuringRectangles_topLeft, \
        xCoordinatesOfMeasuringRectangles_bottomRight, yCoordinatesOfMeasuringRectangles_bottomRight

    # position measuring points of the hand histogram (x values index rows, y values index columns)
    xCoordinatesOfMeasuringRectangles_topLeft = np.array(
        [6 * rows / 20, 6 * rows / 20, 6 * rows / 20, 9 * rows / 20, 9 * rows / 20, 9 * rows / 20, 12 * rows / 20,
         12 * rows / 20, 12 * rows / 20], dtype=np.uint32)

    yCoordinatesOfMeasuringRectangles_topLeft = np.array(
        [9 * cols / 20, 10 * cols / 20, 11 * cols / 20, 9 * cols / 20, 10 * cols / 20, 11 * cols / 20, 9 * cols / 20,
         10 * cols / 20, 11 * cols / 20], dtype=np.uint32)

    # define shape of drawn small rectangles | here 10x10
    xCoordinatesOfMeasuringRectangles_bottomRight = xCoordinatesOfMeasuringRectangles_topLeft + 10
    yCoordinatesOfMeasuringRectangles_bottomRight = yCoordinatesOfMeasuringRectangles_topLeft + 10

    # draw calculated rectangles; cv2 expects points as (column, row) given as plain Python ints
    for i in range(amountOfMeasuringRectangles):
        cv2.rectangle(frame,
                      (int(yCoordinatesOfMeasuringRectangles_topLeft[i]),
                       int(xCoordinatesOfMeasuringRectangles_topLeft[i])),
                      (int(yCoordinatesOfMeasuringRectangles_bottomRight[i]),
                       int(xCoordinatesOfMeasuringRectangles_bottomRight[i])),
                      (0, 255, 0), 1)

    return frame


def createHandHistogram(frame):
    """Creates a normalized hue/saturation histogram from the pixels inside the measuring rectangles"""
    global xCoordinatesOfMeasuringRectangles_topLeft, yCoordinatesOfMeasuringRectangles_topLeft

    # convert cv2 bgr colorspace to hsv colorspace for easier handling
    hsv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # create new blank Region Of Interest matrix/image (9 rectangles of 10x10 pixels stacked vertically)
    roi = np.zeros([90, 10, 3], dtype=hsv_frame.dtype)

    # fill ROI with the sample rectangles
    for i in range(amountOfMeasuringRectangles):
        roi[i * 10: i * 10 + 10, 0: 10] = hsv_frame[xCoordinatesOfMeasuringRectangles_topLeft[i]:
                                                    xCoordinatesOfMeasuringRectangles_topLeft[i] + 10,
                                                    yCoordinatesOfMeasuringRectangles_topLeft[i]:
                                                    yCoordinatesOfMeasuringRectangles_topLeft[i] + 10]

    # create a hand histogram over hue (channel 0) and saturation (channel 1)
    hand_hist = cv2.calcHist([roi], [0, 1], None, [180, 256], [0, 180, 0, 256])

    # scale the histogram values to the range 0..255 and return
    return cv2.normalize(hand_hist, hand_hist, 0, 255, cv2.NORM_MINMAX)
```


## Masking the hand
Once a histogram is created, it can be used to convert a given image into gray levels using OpenCV's so-called *backprojection*. A white pixel indicates that the color at this position in the original image corresponds to the most frequent color in the histogram, while a black pixel indicates the opposite. Thus, it is already possible to recognize the desired object in the resulting image.

However, as can be seen in Figure XY, the result is still quite noisy and is not yet the binary image mentioned above.

![altText](https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/18.11.20/Hand_Nach_Histogram_BackProjection.png)

**Figure XY:** Hand after `cv2.calcBackProject( [...] )`

```python
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# mask area that matches with the histogram via back projection
histogramMaskBackProjection = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 5)
```
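
To make the lookup behind backprojection more tangible, the following minimal sketch reproduces the idea in plain Python without OpenCV. It is only an illustration: the function name `backProject`, the tiny hand-written histogram and the pixel values are assumptions made for this example, and the real `cv2.calcBackProject` additionally handles bin ranges and scaling.

```python
def backProject(hsvPixels, histogram):
    """Replace every (hue, saturation) pair by the histogram value of its bin.

    hsvPixels: 2D list of (hue, saturation) tuples (hypothetical toy image)
    histogram: dict mapping (hue, saturation) to a value in 0..255
    """
    return [[histogram.get(pixel, 0) for pixel in row] for row in hsvPixels]


# toy histogram: one very frequent skin-like color and one rarer color
toyHistogram = {(10, 150): 255, (12, 140): 90}

# toy image: three skin-like pixels and one background pixel
toyImage = [
    [(10, 150), (100, 30)],
    [(12, 140), (10, 150)],
]

grayLevels = backProject(toyImage, toyHistogram)
print(grayLevels)  # [[255, 0], [90, 255]]
```

Colors that were frequent in the sampled hand areas become bright, colors that never occurred become black, and everything in between ends up as a gray level.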

Furthermore, isolated false positives can be seen, which appear at the edge of the image around the object being searched for. To solve these problems, a closing operation followed by an opening operation is required, which in turn is followed by a thresholding operation. In Figure YY, one can see that the hand to be recognized already has fewer gaps after closing. Yet false positives can still be detected. This is because a closing operation closes smaller holes, i.e. black pixels, and thus also allows white pixels that are close to each other to grow together.

![altText](https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Hand_nach_closing_operation.png)

**Figure YY:** Hand after *Closing-Operation*

```python
maskingCircle = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))

closedBackProjection = cv2.morphologyEx(histogramMaskBackProjection, cv2.MORPH_CLOSE, maskingCircle, iterations=2)
```

The resulting undesired effect of larger false positives can be counteracted by an opening operation. An opening operation removes smaller objects, i.e. white pixels, and thus removes almost all false positives, as can be seen in Figure YZ.

![altText](https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Hand_nach_opening_operation.png)

**Figure YZ:** Hand after *Opening-Operation*

```python
openedBackProjection = cv2.morphologyEx(closedBackProjection, cv2.MORPH_OPEN, maskingCircle, iterations=2)
```
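
The effect of closing followed by opening can also be reproduced on a small binary grid. The following plain-Python sketch is an assumption-laden illustration: it uses a 3x3 square neighborhood and a single iteration instead of the 10x10 elliptical structuring element and two iterations used above, and all function names are chosen for this example only.

```python
def _neighborhood(img, y, x):
    """Yield the values of the 3x3 neighborhood around (y, x), clipped at the image border."""
    height, width = len(img), len(img[0])
    for ny in range(max(0, y - 1), min(height, y + 2)):
        for nx in range(max(0, x - 1), min(width, x + 2)):
            yield img[ny][nx]


def dilate(img):
    """A pixel becomes white if any pixel in its neighborhood is white."""
    return [[max(_neighborhood(img, y, x)) for x in range(len(img[0]))] for y in range(len(img))]


def erode(img):
    """A pixel stays white only if all pixels in its neighborhood are white."""
    return [[min(_neighborhood(img, y, x)) for x in range(len(img[0]))] for y in range(len(img))]


def closing(img):
    """Dilation followed by erosion: fills small black holes."""
    return erode(dilate(img))


def opening(img):
    """Erosion followed by dilation: removes small white specks."""
    return dilate(erode(img))


# 7x7 toy mask: a 3x3 "hand" with a hole in the middle and one isolated false positive
noisy = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        noisy[y][x] = 1
noisy[3][3] = 0  # hole inside the hand
noisy[0][6] = 1  # isolated false positive in the corner

closed = closing(noisy)    # the hole is filled, the speck is still there
cleaned = opening(closed)  # the speck is removed, the hand is kept

for row in cleaned:
    print(row)
```

In this toy example the closing step fills the hole but keeps the speck, and only the subsequent opening step removes it, which mirrors the order of operations described above.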


Finally, the image must be converted to a binary, i.e. black and white, image for easier handling later on. This can be done by a simple thresholding operation, which turns every pixel that is not black into a white pixel.

A result of this operation can be seen in Figure ZZ.

![altText](https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Hand_nach_Threshholding.png)

**Figure ZZ:** Hand after *Thresholding-Operation*

```python
ret, thresh = cv2.threshold(openedBackProjection, 1, 255, cv2.THRESH_BINARY)

thresh = cv2.merge((thresh, thresh, thresh))

# keep the masked result ('maskedFrame' is an illustrative name)
maskedFrame = cv2.bitwise_and(frame, thresh)
```

As shown in the code example, after the thresholding operation, a new image is generated using the ``cv2.merge`` command, which uses the values of the thresholding operation in each of the three color channels. This produces a result as shown in Figure ZZ. This binary image is then used to mask the original image by the ``cv2.bitwise_and`` command, so that only the relevant area of the hand keeps its color. The rest of the image remains black.


# Unsuccessful approach with Machine Learning
Before the histogram-based hand recognition was implemented, there was an attempt to recognize a pointing hand in an image using machine learning. However, the acquisition of data proved to be a challenge. Different lighting conditions can have a strong negative impact on the results, so data must be collected under different lighting conditions. That is tedious and difficult to automate. **Therefore, the training data amounted to only a few hundred images, which is why the hit rate of the model turned out to be correspondingly sobering.** This means that pointing and non-pointing hands could only be distinguished with difficulty. In addition, actions such as writing or wiping caused some false positives. Thus, machine learning proved to be insufficiently accurate at this point, which is why this approach was discarded.

## Fingertip recognition
To recognize the user's fingertip, the software uses a contour-based solution. OpenCV can extract the contours of an object from an image, as the following code shows.
```python
def getContoursFromMaskedImage(maskedHistogramImage):
    """Returns the contours of a given masked Image"""
    grayscaledMaskedHistogramImage = cv2.cvtColor(maskedHistogramImage, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(grayscaledMaskedHistogramImage, 0, 255, 0)
    cont, hierarchyDontCare = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    return cont
```
Using the list of straight lines obtained, i.e. the contours, the center of the contour is determined in order to obtain the center of the hand as well. This is needed later to draw a line to an outstretched finger. The following code shows how the center of mass of a series of points can be found using so-called *moments*. The points are represented here by the ends of the vectors describing the contour.
```python
def getCenterCoordinatesOfContour(maxContour):
    """Returns the Centercoordinates of a given contour in the shape  X, Y"""
    moment = cv2.moments(maxContour)
    if moment['m00'] == 0:
        return None
    cx = int(moment['m10'] / moment['m00'])
    cy = int(moment['m01'] / moment['m00'])
    return cx, cy
```
In addition, it is possible to form a convex (i.e. outwardly curved) hull around the contour, which in turn can be used to detect convex defects. A convex defect is characterized by the fact that a point of the contour is not located on the line of the hull, but inside it. Figure ZA shows this behavior using the gray hand line and the red convex hull.

<img src="https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Zeichnung_konvexeHuelle_mit_Defekten.png" width=50%>
<img src="https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Zeichnung_maximaler_konvexer_defekt.png" width = 40%>

**Figure ZA:** Convex hull with general (left) and maximum (right) convex defects drawn in.

With the help of all convex defects and the center of mass, the most distant point of the hull can be identified, which is assumed to be the fingertip. In other words, the point or defect with the largest distance to the center is searched for.
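
The search for the most distant point can be sketched as follows. This is a minimal illustration and not necessarily the exact implementation used in the software: the function name `getFarthestPoint` and the sample coordinates are assumptions, and the candidate points are expected to come from the hull or defect points computed beforehand.

```python
import math


def getFarthestPoint(candidatePoints, center):
    """Return the candidate with the largest Euclidean distance to center, or None."""
    if not candidatePoints or center is None:
        return None
    return max(candidatePoints,
               key=lambda point: math.hypot(point[0] - center[0], point[1] - center[1]))


# hypothetical center of the hand and candidate points in pixel coordinates (x, y)
handCenter = (100, 100)
candidates = [(110, 90), (100, 20), (140, 130)]

fingertip = getFarthestPoint(candidates, handCenter)
print(fingertip)  # (100, 20)
```

Returning `None` for an empty candidate list allows the caller to skip frames in which no hand was found.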

These two points are recalculated for each frame and each added to a list of points in order to determine the average center of both point lists over time. This ensures that the center of the hand and the recognized fingertip are less volatile: because an average point over time is used, possible measuring errors cause only a small deflection and therefore carry less weight in further computations. However, it should be noted that a long list covers a longer period of time, i.e. more frames, which means that the average center point can only move sluggishly. This may be desirable, as it allows the point to move smoothly and not make "jumps". However, it can also result in the center point moving too slowly and thus becoming unusable. Therefore, finding a suitable number of points is of high importance. By trial and error, a value of 25 points has turned out to work best.
Figure AA shows the points of the fingertip in yellow with their average center in green. In addition, the center of the hand and its average center can be seen in pink and red, respectively.

<img src="https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Hauptkamera_Mit_Infos.png" width = 100%>

**Figure AA:** Detected points of the fingertip and the center of the hand with their average centers
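
The averaging over the most recent points can be sketched with a bounded queue. The class name `PointSmoother` and its methods are assumptions made for this illustration; only the list length of 25 is taken from the text above.

```python
from collections import deque


class PointSmoother:
    """Keeps the most recent points and returns their average position."""

    def __init__(self, maxPoints=25):
        # a deque with maxlen drops the oldest point automatically
        self.points = deque(maxlen=maxPoints)

    def add(self, point):
        self.points.append(point)

    def average(self):
        """Return the average (x, y) as integers, or None if no point was added yet."""
        if not self.points:
            return None
        sumX = sum(point[0] for point in self.points)
        sumY = sum(point[1] for point in self.points)
        return sumX // len(self.points), sumY // len(self.points)


# 24 stable measurements and a single outlier 100 pixels to the right
smoother = PointSmoother(maxPoints=25)
for _ in range(24):
    smoother.add((100, 100))
smoother.add((200, 100))

smoothedPoint = smoother.average()
print(smoothedPoint)  # (104, 100)
```

The single outlier shifts the averaged point by only 4 pixels, which illustrates why measuring errors carry little weight, but also why the averaged point reacts slowly to real movement.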

The large unfilled circles in red and green represent another control instance. Points that appear outside these radii are discarded during processing in order to ignore possibly misrecognized points. This means that in this frame the respective point is treated as not recognized, and therefore this frame does not contribute to the point list. This kind of self-checking is based on the assumption that the user does not move their hand so frantically that it falls outside the radius. If this does happen, the tracking stalls and all points remain at their last position. This undesired behavior can easily be corrected by the user moving their hand back to the last detected position and resuming the tracking from there.
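
The radius check described above could look roughly like this. The function name `isWithinRadius` and the radius of 50 pixels are assumptions for this sketch, since the text does not state the actual value.

```python
import math


def isWithinRadius(newPoint, averagePoint, radius):
    """Return True if newPoint lies inside the circle around averagePoint."""
    if averagePoint is None:
        # nothing has been tracked yet, so the first point is always accepted
        return True
    distance = math.hypot(newPoint[0] - averagePoint[0], newPoint[1] - averagePoint[1])
    return distance <= radius


averageFingertip = (100, 100)
acceptanceRadius = 50  # hypothetical value in pixels

measurements = [(120, 110), (300, 100), (95, 130)]
accepted = [point for point in measurements
            if isWithinRadius(point, averageFingertip, acceptanceRadius)]
print(accepted)  # [(120, 110), (95, 130)]
```

The point at (300, 100) is discarded, so that frame would not contribute to the point list.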

# Zoom to the fingertip
Since what is being pointed at may appear too small in the camera image, a digital zoom is required. It should focus on the user's fingertip. However, the focus should not be directly on the fingertip, but slightly beyond it, since the focus would otherwise be on the finger itself rather than on what is being shown. Therefore, a line or vector is drawn from the center of the hand to the fingertip, which is then extended by 50%. The end of the extended vector marks the center of the image section to be focused.
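
The extension of the vector can be sketched in a few lines. The function name `getFocusPoint` and the sample coordinates are assumptions for this illustration; only the extension by 50% is taken from the text.

```python
def getFocusPoint(handCenter, fingertip, extension=0.5):
    """Extend the vector from handCenter to fingertip by `extension` and return its end point."""
    dx = fingertip[0] - handCenter[0]
    dy = fingertip[1] - handCenter[1]
    return int(fingertip[0] + extension * dx), int(fingertip[1] + extension * dy)


# hypothetical pixel coordinates (x, y); the finger points up and to the right
focusPoint = getFocusPoint(handCenter=(100, 200), fingertip=(140, 120))
print(focusPoint)  # (160, 80)
```
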
A previously defined but variable zoom factor determines the size of the newly calculated image section. Starting from the center of the new image section, the respective corner coordinates are determined with the following formula:
```python
# frame.shape[1] := width ; frame.shape[0] := height
leftX = int(xCenterOfNewFrame - frame.shape[1] // zoomFactor // 2)
rightX = int(xCenterOfNewFrame + frame.shape[1] // zoomFactor // 2)
bottomY = int(yCenterOfNewFrame - frame.shape[0] // zoomFactor // 2)
topY = int(yCenterOfNewFrame + frame.shape[0] // zoomFactor // 2)
```
After this calculation, it is important to check if the new coordinates are actually inside the original image. If that is not the case, as in Figure AB, the outer coordinates must be moved to the nearest edge so that the new image section is at the edge of the original image.

<img src="https://raw.githubusercontent.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/main/Documentation/Pictures/Verschobener_Frame.png" width = 100%>

**Figure AB:** Illegal image cropping of the zoomed window
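
One way to move such an illegal image section back into the original image is to shift it while keeping its size. The following sketch assumes exactly that interpretation; the function name `clampSection`, its parameter names and the sample values are hypothetical, and the section is assumed to be no larger than the frame.

```python
def clampSection(left, right, top, bottom, frameWidth, frameHeight):
    """Shift the section so that it lies inside the frame while keeping its size."""
    if left < 0:
        # shift to the right until the left border touches the frame edge
        right -= left
        left = 0
    elif right > frameWidth:
        # shift to the left until the right border touches the frame edge
        left -= right - frameWidth
        right = frameWidth

    if top < 0:
        bottom -= top
        top = 0
    elif bottom > frameHeight:
        top -= bottom - frameHeight
        bottom = frameHeight

    return left, right, top, bottom


# a 640x360 section that sticks out at the top right of a 1280x720 frame
section = clampSection(left=900, right=1540, top=-100, bottom=260,
                       frameWidth=1280, frameHeight=720)
print(section)  # (640, 1280, 0, 360)
```

The section keeps its size of 640x360 pixels and ends up flush with the top right corner of the frame.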

# Gesture recognition with TensorFlow


## TensorFlow as our Machine-Learning Software
Machine learning is an important part of our gesture recognition. The software available today, including TensorFlow, is far more powerful than what was needed in this case. Nevertheless, we decided to use TensorFlow because it is easy to integrate and no further knowledge is required for the initial setup. In addition, we had already been given more in-depth information on its use in the course.

## Acquire training data
In order to achieve a high degree of consistency in the recognized gestures, a large data set is needed.
It also has to be considered at which point in the software the gesture recognition should take place, since there are several options. It would be possible to recognize the gesture before any processing of the image. However, this would cause a lot of problems. For example, hundreds of pictures of each gesture with different lighting conditions, skin colors and backgrounds would have to be taken in order to achieve even a rudimentarily accurate result. Another possibility would be to recognize the gesture after backprojection. At that point, what is most important for gesture recognition has already been filtered out: the hand. At the same time, the result is a grayscale image, so neither the background nor the skin color would be needed for training. However, there are still some artifacts to be seen, as certain areas of the image have a similar hue but do not belong to the hand. Therefore, the best choice is the last step of the processing: thresholding. As already described above, the artifacts that occur during backprojection have been filtered out by then.
The important areas are additionally highlighted as well.

Now, to get as much data as possible, more than 1000 images per gesture need to be collected. The data could be created rather quickly because lighting conditions, background and skin color required only minimal attention.


### Live video capture of gestures we want to recognise
To collect the data, several options were possible. On the one hand, the software itself could store the processed images. One would only have to sift through them once and sort out any inaccurate results. However, this would have a strong impact on the performance and slow down the creation of the data set. It was deemed more useful to record the displayed output of the processed frame as a screen video.


## Prepare training data
Since the created videos cannot simply serve as training data in TensorFlow, they had to be prepared further beforehand.


### Converting the Videos to Images, cropping and resizing
The created videos were converted into sequences of images. In addition, the images were cropped to the area relevant for us. These images also had to be sifted through afterwards in order to sort out errors. Since the TensorFlow model used here works with an input of 224 x 224 pixels, the images were scaled to that size.


## Training of the Model
The training of the model could now be started. 
There is a very helpful website (https://teachablemachine.withgoogle.com), which handles the entire training of the model and provides suitable training methods depending on the different domains of usage. 
The Image Classification model was chosen for this purpose.
![TeachableMachine.com](https://github.com/uol-mediaprocessing-202021/medienverarbeitung-e-interactive-camera-system/blob/main/Documentation/Pictures/27.01.21/teachableMachine.jpg?raw=true)


## Implementation of the Model
Using the trained model in code turned out to be rather simple. After the model was trained, the website offered a simple example of how to use the resulting Keras model in Python. This template had to be adapted to fit into the chosen approach.


```python
import operator

import cv2
import numpy as np
import tensorflow as tf

# state of the most recent detection; assumed here to be initialized at module level
lastDetection = None
lastDetectionCount = 0


def getGesturePredictionFromTensorflow(frame, model):
    """Returns the predicted gesture ("LEFT", "RIGHT" or "OTHER") for the given frame"""
    if frame is None or model is None or type(frame) != np.ndarray or type(model) != tf.keras.Sequential:
        return "OTHER"

    # Create the array of the right shape to feed into the keras model
    # The 'length' or number of images you can put into the array is
    # determined by the first position in the shape tuple, in this case 1.
    data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)

    # Resize the frame to the input size expected by the model
    dimension = (224, 224)
    image = cv2.resize(frame, dimension, interpolation=cv2.INTER_AREA)

    # turn the image into a numpy array
    image_array = np.asarray(image)

    # Normalize the image
    normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1

    # Load the image into the array
    data[0] = normalized_image_array

    # run the inference
    prediction = model.predict(data)

    # map the three model outputs to their gesture names
    predictionDictionary = {
        "LEFT": prediction[0][0],
        "RIGHT": prediction[0][1],
        "OTHER": prediction[0][2]
    }
    global lastDetection, lastDetectionCount
    # pick the gesture with the highest score
    detection = max(predictionDictionary.items(), key=operator.itemgetter(1))[0]
    # count how often the same gesture has been detected in a row
    if lastDetection is None or lastDetection != detection:
        lastDetection = detection
        lastDetectionCount = 0
    else:
        lastDetectionCount += 1

    return detection
```
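
The function above counts how often the same gesture was detected in a row. One possible use of such a counter, sketched here under the assumption that a gesture should only trigger an action after several consecutive identical detections, is a small debouncer. The class name `GestureDebouncer` and the threshold value are hypothetical and not taken from the software.

```python
class GestureDebouncer:
    """Reports a gesture as stable once it has been repeated often enough in a row."""

    def __init__(self, requiredRepeats=5):
        self.requiredRepeats = requiredRepeats
        self.lastDetection = None
        self.lastDetectionCount = 0

    def update(self, detection):
        """Store the detection and return True if it is considered stable."""
        if self.lastDetection != detection:
            # a new gesture starts a new count
            self.lastDetection = detection
            self.lastDetectionCount = 0
        else:
            self.lastDetectionCount += 1
        return self.lastDetectionCount >= self.requiredRepeats


debouncer = GestureDebouncer(requiredRepeats=2)
detections = ["LEFT", "LEFT", "RIGHT", "RIGHT", "RIGHT"]
stableFlags = [debouncer.update(detection) for detection in detections]
print(stableFlags)  # [False, False, False, False, True]
```

Only the third consecutive "RIGHT" is reported as stable, so a single misclassified frame would not change the zoom level.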

# GUI
The software presented here, which is meant to be used as a collaborative tool, needs a GUI if only for the live preview.
Therefore, one of the aims was to make it as simple and clear as possible. It also had to be considered that the software will be used with systems that have multiple cameras and screens. Therefore, a way to switch between the different monitors and cameras as easily as possible had to be created, without having to restart the software.
It was also important to think about how to display multiple windows that reflect different steps in the processing of the image and thus visualize our processing pipeline.

```python
import tkinter as tk

import cv2
import numpy as np
from PIL import Image, ImageTk


class ImageShower(object):
    """Creates another TKInter Window and shows the given Image
    """

    def __init__(self, name="Window", window=None):
        """
        Initialize a new ImageShower, by creating another TKInter Window and setting its Name
        :param name: The Title of the new Window
        :param window: An Optional existing Window to use instead of creating a new one
        """
        if window is None:
            # 'app' is the TKInter root window of the application (defined elsewhere)
            self.window = tk.Toplevel(app)
            self.window.title(name)
        else:
            self.window = window

        self.panel = None
        self.frame = None

    def update(self, image):
        """
        Update the Image which will be shown in this Window
        :param image: The Image as cv2 Image in BGR
        """
        self.frame = image

    def show(self, width=640, height=360):
        """
        Shows the Image which has already been set by the Update Method
        :param width: The Optional scaled Width of the Image
        :param height: The Optional scaled Height of the Image
        :return: None if no Image is given
        """
        if self.frame is None:
            return
        try:
            # Resize and Convert cv2 Image to TKInter Image
            img = cv2.resize(np.array(self.frame), (width, height), interpolation=cv2.INTER_AREA)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGBA)
            img = Image.fromarray(img)
            img = ImageTk.PhotoImage(img)
            # if the panel is None, we need to initialize it
            if self.panel is None:
                self.panel = tk.Label(self.window, image=img)
                # keep a reference so the image is not garbage collected
                self.panel.image = img
                self.panel.pack(side=tk.TOP)

            # otherwise, simply update the panel
            else:
                self.panel.configure(image=img)
                self.panel.image = img
        except RuntimeError:
            print("[INFO] caught a RuntimeError")
        except cv2.error:
            print("[DEBUG] Image error! (Is the format correct?)")
```

## Showing Windows
The first step was to display the current monitor within the software. For this purpose the Python library 'mss' was used. It is able to read all connected monitors and to provide data such as the current screen content or the dimensions of the selected monitor.

Some of the processing steps required precise adjustments to the parameters. For this purpose, some steps needed to be displayed with additional information. From this, relevant information can be extracted as to how the software reacts in certain situations.

The ImageShower shown earlier can be used to achieve that.

```python
# Create Optional Windows for Debugging and Additional Infos
histogramWindow = ImageShower("Histogram")
histogramThreshWindow = ImageShower("Histogram with Threshold")
mainCameraWithInfo = ImageShower("Main camera with info")
```

### Live camera feed with generated metadata
In order to see how the software performs on different devices, the current frame rate is displayed on the processed frame.

The frame also contains further data such as an activation circle, the last recognized positions of the finger, as well as the assumed position of the back of the hand. The current zoom level is shown in this view as well.


### Processed Image with Backprojection
Another output represents a specific point in the actual image processing. After a histogram has been recorded, it is applied to the current camera image using backprojection. The result shows all pixels that match parts of the histogram, while all other parts of the image are black. This display was valuable because it provided important information about the processing steps that had already been performed. Moreover, it showed whether changes in the size of the histogram or in the parameters of the backprojection produced better results.


### Processed Image with additional Thresholding
Another processing step that was displayed for debugging purposes was a small window showing the processed camera image after the additional thresholding. Various factors and previously performed processing steps played a major role in the final quality. Examples are different lighting conditions or different skin tones on the back and palm of the hand.


### Main-Window (Screen + PiP)
To bring all the processing steps together, there is a main window. It contains both the choice between different monitors and cameras, as well as the display of the selected monitor and the processed picture of the camera. The camera image is then only displayed when a finger is in the image. In addition, the image can be zoomed in or out using the aforementioned gesture recognition. The zoomed image always follows the finger and zooms to the displayed position. The zoom level is maintained even if the finger leaves the picture.


## Performance Improvements
An issue that was detected quite quickly was the drop in performance once not only the current monitor was displayed in the window, but the incoming camera image was processed as well. The problem was that all actions happened on one thread: the reading of the monitor, the reading of the camera, the entire processing and the subsequent display of the results.
The solution discussed was to move some sections of the program to separate threads, so that the image to be processed is already read in ahead of time and made available through a variable.
To resolve the issue, two separate program sections were written, one taking care of reading the camera and one of reading the monitor.

```python
from threading import Thread

import cv2
import mss
import numpy as np

# shared mss instance; assumed here to be created once at module level
sct = mss.mss()


class MonitorGrabber(object):
    """
    Reads the Current Screen in another Thread and Stores it for easy Access
    """

    def __init__(self, src=1, width=1280, height=720):
        """
        Initialize a new MonitorGrabber
        :param src: MonitorIndex from mss
        :param width: Scaled Output Image width
        :param height: Scaled Output Image height
        """
        self.setSrc(src)
        self.width = width
        self.height = height

        # Grab Monitor Image, Resize, Convert and Store it
        img = sct.grab(self.src)
        # noinspection PyTypeChecker
        img = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)
        self.picture = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)
        self.stopped = False

    def start(self):
        """
        Starts another Thread for its own get-Method, to grab the Image out of Mainloop
        :return:  Optional: The Own Object to create, start the Thread and save the Object at the same Time
        """
        Thread(target=self.get, args=()).start()
        return self

    def setSrc(self, src):
        """
        Re-Sets the Monitor Input Source Index of mss
        :param src: The new Monitor Index
        """
        self.src = sct.monitors[src]

    def get(self):
        """
        Grabs the current Monitor Image, resizes, converts and stores it
        """
        while not self.stopped:
            img = sct.grab(self.src)
            img = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)
            self.picture = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)

    def stop(self):
        """
        Stops the MonitorGrabber-Get-Thread started by the start-Method
        """
        self.stopped = True
```

```python
from threading import Thread

import cv2
import numpy as np


class CameraGrabber(object):
    """
    Reads the Current Camera-feed in another Thread and Stores it for easy Access
    """

    def __init__(self, src, width=1280, height=720):
        """
        Initialize a new CameraGrabber
        :param src: CameraIndex for cv2.VideoCapture
        :param width: Scaled Output Image width
        :param height: Scaled Output Image height
        """
        self.width = width
        self.height = height

        # Grab Camera Image, Resize and Store it
        self.stream = cv2.VideoCapture(src)
        (self.grabbed, img) = self.stream.read()
        self.picture = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)
        self.stopped = False

    def start(self):
        """
        Starts another Thread for its own get-Method, to grab the Image out of Mainloop
        :return:  Optional: The Own Object to create, start the Thread and save the Object at the same Time
        """
        Thread(target=self.get, args=()).start()
        return self

    def setSrc(self, src):
        """
        Re-Sets the Camera Input Source Index of cv2.VideoCapture
        :param src: The new Camera Index
        """
        self.stream = cv2.VideoCapture(src)

    def get(self):
        """
        Grabs the current Camera Image, resizes and stores it
        """
        while not self.stopped:
            (self.grabbed, img) = self.stream.read()
            if not self.grabbed:
                # reading failed, so there is no image to resize; stop the thread
                self.stop()
            else:
                self.picture = cv2.resize(np.array(img), (self.width, self.height), interpolation=cv2.INTER_AREA)

    def stop(self):
        """
        Stops the CameraGrabber-Get-Thread started by the start-Method
        """
        self.stopped = True
```
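
The pattern shared by both grabbers can also be tried out without a camera or monitor. The following sketch uses only the standard library; the class `LatestValueGrabber` and the function `fakeRead` are hypothetical stand-ins for the grabbers and their image sources, and the timings are arbitrary.

```python
import itertools
import threading
import time


class LatestValueGrabber:
    """Keeps calling readFunction in a background thread and stores the newest result."""

    def __init__(self, readFunction):
        self.readFunction = readFunction
        # read once up front so that the main loop never sees an empty value
        self.value = readFunction()
        self.stopped = False
        self.thread = None

    def start(self):
        self.thread = threading.Thread(target=self.get, daemon=True)
        self.thread.start()
        return self

    def get(self):
        while not self.stopped:
            self.value = self.readFunction()

    def stop(self):
        self.stopped = True
        if self.thread is not None:
            # wait until the background thread has finished its current read
            self.thread.join()


frameNumbers = itertools.count()


def fakeRead():
    """Simulates a slow image source by sleeping briefly and returning a frame number."""
    time.sleep(0.001)
    return next(frameNumbers)


grabber = LatestValueGrabber(fakeRead).start()

# the main loop only reads the stored value and never waits for the source itself
deadline = time.time() + 2.0
while grabber.value < 3 and time.time() < deadline:
    time.sleep(0.001)

latest = grabber.value
grabber.stop()
print("latest frame number:", latest)
```

The main loop always works with the most recent value and is not blocked by the slow read, which is the effect the two grabber classes aim for.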