# Automated Game Playing With OpenCV and PyAutoGUI

We have already seen how to automate everyday activities with PyAutoGUI. We will now combine that kind of automation with OpenCV image recognition. OpenCV detects a face in the webcam feed, we interpret the movements of that face as actions in our game, and we send those actions to the game as automated key presses.

## Workflow

Our workflow involves two steps:
* Face detection with OpenCV
* Automation with PyAutoGUI

**Image recognition**
<br>
Here we will have the following steps:
* Load a model
* Capture frames from webcam
* Feed the model with frames
* Get predictions from the model
* Detect whether face has left centre bounding box
* If it has left, make the appropriate move

**Automation**
<br>
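Once we detect a move outside the bounding box, we simulate the matching key press. The code below does this with `win32api` rather than PyAutoGUI, for the speed reasons given in the Press Key section.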

OpenCV supports deep neural networks through its `dnn` module. It has methods to load models from various frameworks such as TensorFlow, PyTorch and Caffe, and to run a loaded model on an input to get predictions.
<br>
For this example, we are going to use a face detection model in the Caffe format. We will load it with OpenCV's `readNetFromCaffe` method.

In [3]:
import cv2
import numpy as np

## Methods

We will begin by describing all the methods that we will use and then go on to the final flow.

### Draw Rectangle

We define a simple function to draw a rectangle on the screen. This function will be used to draw the bounding box, and face position on a frame.

In [4]:
def draw_rectangle(top, bottom, frame, color=(0, 255, 0)):
    '''
    Given coordinates and a frame, draw a rectangle.

    Takes two tuples, top (top-left corner) and bottom
    (bottom-right corner), and the frame.
    Optionally takes a color (in BGR order).
    Draws on the frame in place and returns nothing.
    '''

    cv2.rectangle(frame, top, bottom, color=color, thickness=3)

### Press Key

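This function takes the virtual-key code of a key (conventionally written in hexadecimal), then presses and releases that key. For this, we use `win32api` instead of PyAutoGUI, because PyAutoGUI is a bit too slow for a game. Note that `win32api` and `win32con` come from the pywin32 package and work only on Windows.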

In [5]:
import win32api
import win32con

def press_key(hexKeyCode):
    # Press key
    win32api.keybd_event(hexKeyCode, 0, 0, 0)
    # Release key
    win32api.keybd_event(hexKeyCode, 0,
                         win32con.KEYEVENTF_KEYUP, 0)

### Select Key

This function works with the `press_key` function above. It selects a key based on user movement and passes it to `press_key` for pressing.

In [6]:
def move_key(key:str):
    '''
    Make a keystroke, depending on the direction given.

    It takes the value returned by face_inbox and presses
    the matching arrow key. 'center' (or any other value)
    presses nothing.

    You can customize for a single or for all keys.
    '''

    if key == 'left':
        press_key(win32con.VK_LEFT)
    elif key == 'right':
        press_key(win32con.VK_RIGHT)
    elif key == 'up':
        press_key(win32con.VK_UP)
    elif key == 'down':
        press_key(win32con.VK_DOWN)
    else:
        pass

We use this function to press the four arrow keys, since most games use them, but it can be customized for any key or keys. The function takes a string, which is one of 'left', 'right', 'up', 'down' or 'center'. These values give the position of the face relative to the central bounding box of the frame. If the face is within the bounding box, the value is 'center' and no action is taken. If it is outside the bounding box, the value names the side on which the face left, and we press the matching key.
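As a possible way to customize the key mapping, the `if`/`elif` chain can be replaced by a lookup table. The sketch below is only an illustration: `ARROW_KEYS` and `move_key_table` are hypothetical names, not part of the notebook, and the key press is passed in as a callable so that the example runs without `win32api`. The numeric values are the Windows virtual-key codes behind `win32con.VK_LEFT`, `VK_UP`, `VK_RIGHT` and `VK_DOWN`.

```python
# Windows virtual-key codes for the arrow keys (the values behind
# win32con.VK_LEFT, VK_UP, VK_RIGHT and VK_DOWN).
ARROW_KEYS = {
    'left': 0x25,
    'up': 0x26,
    'right': 0x27,
    'down': 0x28,
}


def move_key_table(direction, press, keymap=ARROW_KEYS):
    """Press the key mapped to `direction`; do nothing for 'center'.

    `press` is any callable that takes a key code, for example the
    notebook's press_key. Returns the key code sent, or None.
    (Hypothetical helper, shown only as an alternative to move_key.)
    """
    code = keymap.get(direction)
    if code is not None:
        press(code)
    return code


# Demonstration with a stand-in for press_key that only records codes
sent = []
move_key_table('left', sent.append)
move_key_table('center', sent.append)   # inside the box: no key press
move_key_table('up', sent.append)
print([hex(code) for code in sent])
```

To remap the controls, for example to the W, A, S and D keys, only the dictionary would need to change.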

### Get position of face w.r.t bounding box

We now create a function that tells us whether or not the face is within the bounding box. The output of this function is the input of the `move_key` function above.

In [7]:
def face_inbox(bbox, face_coords):
    '''
    This function will check whether the face is
    in the bounding box or not.

    It takes two 4 value tuples:
    (topx, topy, bottx, botty)

    Returns 'left', 'right', 'center', 'down' or 'up'
    depending on the position of the inner box wrt the
    bounding box
    '''

    if(face_coords[0] < bbox[0]):
        return 'left'
    elif(face_coords[1] < bbox[1]):
        return 'up'
    elif(face_coords[2] > bbox[2]):
        return 'right'
    elif(face_coords[3] > bbox[3]):
        return 'down'
    
    return 'center'
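
To see how the checks behave, the sketch below runs the same comparisons on a few made-up face boxes. The bounding box values assume a 640x480 frame, and the sample coordinates are illustrative only. The function is repeated so that the example runs on its own.

```python
def face_inbox(bbox, face_coords):
    """Same checks as the notebook's face_inbox, repeated so this runs alone."""
    if face_coords[0] < bbox[0]:
        return 'left'
    elif face_coords[1] < bbox[1]:
        return 'up'
    elif face_coords[2] > bbox[2]:
        return 'right'
    elif face_coords[3] > bbox[3]:
        return 'down'
    return 'center'


# Central box for an assumed 640x480 frame: (topx, topy, bottx, botty)
bbox = (230, 155, 410, 340)

# Made-up face boxes, one per case
samples = {
    'inside':      (250, 170, 390, 320),
    'past left':   (200, 170, 340, 320),
    'past top':    (250, 100, 390, 250),
    'past right':  (300, 170, 440, 320),
    'past bottom': (250, 200, 390, 360),
    'top-left':    (200, 100, 340, 250),
}

results = {name: face_inbox(bbox, coords) for name, coords in samples.items()}
for name, position in results.items():
    print(f'{name}: {position}')
```

Because the checks run in a fixed order (left, up, right, down), a face that crosses two edges at once, such as the 'top-left' sample, is reported for the first edge that matches.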

### Detecting the face

We will now go ahead and detect faces in a frame. Our function `get_predictions` takes a model and a frame and returns a list of dictionaries, one per face found in the image. Each dictionary holds the location of the face and the degree of confidence that it is a face.

In [8]:
def get_predictions(net, frame):
    '''
    This function takes the frame as input to the model and
    gets the prediction of whether it has a face or not.

    It returns a list with one dictionary per detected face:
    {
      'top': (TopX, TopY),
      'bottom': (BottomX, BottomY),
      'confidence': score }
    '''
    # Will hold one dictionary of coordinates per detected face
    face_coordinates = []

    h, w = frame.shape[:2]
    # We will first create a blob from the image
    # A blob is an image/images that have the same depth,
    # shape (width, height) and that have been
    # preprocessed in the same manner
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=1.0,
                                 size=(300,300),
                                 mean=(104.0, 177.0, 123.0))
    
    # The function blobFromImage returns a 4-D array shaped like so:
    # (num_images, num_channels, height, width)

    # We then feed our blob into the net
    net.setInput(blob)

    # We perform a feed forward across
    # all layers to get a prediction
    predictions = net.forward()

    # The forward() function also returns a 4-D array, shaped like so:
    # (1, 1, 200, 7)
    # 1, 1 - singleton dimensions (we pass in one image)
    # 200 - number of candidate detections
    # 7 - a vector of 7 values like so:
    # [Image number, class label (0 or 1), confidence score (0 to 1),
    # StartX, StartY, EndX, EndY]
    # The four coordinates are fractions of the frame width/height

    # With this data, we can filter based on confidence score
    # we iterate through every face
    for face in range(predictions.shape[2]):
        #get confidence
        confidence = predictions[0, 0, face, 2]

        if confidence > 0.5:
            #Take the coordinates
            (Top_x, Top_y) = predictions[0, 0, face, 3:5] * np.array([w, h])
            (Bott_x, Bott_y) = predictions[0, 0, face, 5:] * np.array([w, h])
        else:
            continue
        
        face_coordinates.append({
            'top': (int(Top_x), int(Top_y)),
            'bottom' : (int(Bott_x), int(Bott_y)),
            'confidence' : confidence
        })
    return face_coordinates

The function does quite a lot, so take some time to make sure you understand what it is doing. Every step is documented in the comments above.
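
The filtering and scaling at the end of `get_predictions` can be tried without a model or a webcam. The sketch below is an assumption-laden illustration: `decode_detections` is a hypothetical helper, and the `fake` array is hand-made data that imitates the shape of the network output, not real predictions. It uses NumPy boolean indexing in place of the explicit loop.

```python
import numpy as np


def decode_detections(predictions, frame_width, frame_height,
                      min_confidence=0.5):
    """Turn an array shaped (1, 1, N, 7) into a list of face dictionaries.

    Hypothetical helper: same filtering and scaling as get_predictions,
    but vectorized with NumPy instead of looping over detections.
    """
    detections = predictions[0, 0]                  # shape (N, 7)
    keep = detections[:, 2] > min_confidence        # confidence column
    scale = np.array([frame_width, frame_height,
                      frame_width, frame_height])
    # Coordinates are fractions of the frame size; scale to pixels
    boxes = (detections[keep, 3:7] * scale).astype(int)
    confidences = detections[keep, 2]
    return [
        {'top': (int(x1), int(y1)),
         'bottom': (int(x2), int(y2)),
         'confidence': float(conf)}
        for (x1, y1, x2, y2), conf in zip(boxes, confidences)
    ]


# Hand-made stand-in for net.forward(): two candidate detections
fake = np.array([[[
    [0, 1, 0.9, 0.25, 0.25, 0.75, 0.75],   # confident detection
    [0, 1, 0.3, 0.0, 0.0, 0.5, 0.5],       # below the 0.5 threshold
]]], dtype=np.float32)

faces = decode_detections(fake, frame_width=640, frame_height=480)
print(faces[0]['top'], faces[0]['bottom'])
```

Only the first candidate passes the 0.5 confidence threshold, and its fractional coordinates are scaled by the assumed 640x480 frame size.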

## The main loop

Now that we have defined all our methods, we can get into the main part of the execution. Our steps are simple:
* We load the model
* Create a `VideoCapture` object
* Create a bounding box
* Start the main loop

### Load the model

In [9]:
# We load the model
net = cv2.dnn.readNetFromCaffe('model/deploy.prototxt',
                               'model/res10_300x300_ssd_iter_140000.caffemodel')

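We are using a face detection model in the Caffe format. The `deploy.prototxt` file describes the network architecture and the `.caffemodel` file holds the trained weights.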

### VideoCapture object and bounding box

In [11]:
# We open the VideoCapture object here to be able
# to create a bounding box
cap = cv2.VideoCapture(0)

frame_width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
frame_height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

#create a bounding box: w=180, h=185
top_x, top_y = int(frame_width//2 - 90), int(frame_height//2 - 85)
bottom_x, bottom_y = int(frame_width//2 + 90), int(frame_height//2 + 100)
bbox = [top_x, top_y, bottom_x, bottom_y]

#assign in tuples
centre_top = (top_x, top_y)
centre_bottom = (bottom_x, bottom_y)

We create a bounding box by specifying its top-left and bottom-right corners, placed around the centre of the frame. We store these values both in `bbox` and in the tuples `centre_top` and `centre_bottom`, because other functions need them in different formats: `face_inbox` takes the four-value list, while `draw_rectangle` takes the two corner tuples.
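
The arithmetic for the box can be checked without a webcam. The following sketch wraps it in a hypothetical helper, `centred_bbox`, whose parameter names are illustrative. It assumes a 640x480 frame and passes the sizes as floats, since `cap.get` returns floats.

```python
def centred_bbox(frame_width, frame_height,
                 half_width=90, above=85, below=100):
    """Return (topx, topy, bottx, botty) for a box around the frame centre.

    Hypothetical helper mirroring the notebook's arithmetic: the box is
    2 * half_width wide and above + below tall (180 x 185 by default).
    """
    centre_x = frame_width // 2
    centre_y = frame_height // 2
    return (int(centre_x - half_width), int(centre_y - above),
            int(centre_x + half_width), int(centre_y + below))


# cap.get(...) returns floats, so floats are used here as well
bbox = centred_bbox(640.0, 480.0)
print(bbox)   # (230, 155, 410, 340)

width = bbox[2] - bbox[0]
height = bbox[3] - bbox[1]
print(width, height)   # 180 185
```

Because the box extends 85 pixels above the centre and 100 below, it sits slightly lower than the exact middle of the frame.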

### Define necessary variables

We define 3 variables:
* `init` - tracks initialization. While it is 0, we cannot start playing. We will see what triggers it in the main loop.
* `last_move` and `move` - these describe the last and current position of the face's bounding box relative to the main bounding box. We will see why we need to record the last move in the main loop.

In [12]:
init = 0
move = ''
last_move = ''

### The while loop

In [13]:
#the main loop
while True:
    if not cap.isOpened():
        cap = cv2.VideoCapture(0)

    ret, frame = cap.read()
    
    if ret is False:
        break

    # Mirror the frame horizontally so that on-screen movement
    # matches our own (a positive flip code flips horizontally)
    frame = cv2.flip(frame, 1)

    # Get predictions
    faces = get_predictions(net, frame)

    # If we find a face
    if len(faces) > 0:
        # Sort according to highest confidence,
        # in case of many points
        sorted_faces = sorted(faces, key=lambda x: x['confidence'],
                              reverse=True)

        # Take the best sorted image
        face = sorted_faces[0]

        #Get the face coordinates
        face_top_x, face_top_y = face['top']
        face_bott_x, face_bott_y = face['bottom']
        
        #draw face bounding rectangle on the frame
        draw_rectangle((face_top_x, face_top_y),
                        (face_bott_x, face_bott_y),
                        frame, color=(0, 0, 255))
    
        # check whether the face is in bbox:
        face_coords = [face_top_x, face_top_y, face_bott_x, face_bott_y]

        move = face_inbox(bbox, face_coords)

        cv2.putText(frame, move, (30, 50), 0, 3, (180, 0, 180), 2)

        #initialize game
        if init == 0:
            if move == 'center':
                init = 1
                cv2.putText(frame, 'Initialized', (30, 50), 0, 3, (10, 100, 180), 10)
        else:
            if last_move == 'center':
                move_key(move)

        last_move = move
    
    
    # Draw the central bounding box
    draw_rectangle(centre_top, centre_bottom, frame)

    cv2.imshow('current frame', frame)

    key = cv2.waitKey(5)
    if key == 27:
        break

Here is what happens in the loop above:
* We first make sure that the video capture is open.
* We then try to read a frame using `cap.read()`. If the frame cannot be read, we break out of the loop.
* We flip the frame horizontally so that it better matches our movement.
* We use the `get_predictions` function to get the coordinates of any detected faces.
* If no face is found, we skip the whole face-handling section and only draw the central bounding box.
* If at least one face is found, we:
  * sort the faces by confidence
  * take the first (most confident) face
  * get the face coordinates
  * draw the face's bounding rectangle on the frame
  * determine the position of the face using `face_inbox`
  * write the position of the face relative to the bounding box on the frame
  * check whether the game is initialized. The game is initialized, and `init` is set to 1, the first time the face is inside the central bounding box.
  * if the game is initialized, press a key only if the last move was 'center'. This reduces the number of accidental moves and lag.
* Lastly, we show the frame, and exit the loop if the user presses the `Esc` key.
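
The effect of `init` and `last_move` is easier to see on a recorded sequence of positions. The sketch below replays the same decision logic on a made-up list of moves and collects the keys that would be pressed. `keys_for_moves` is a hypothetical helper and the sequence is illustrative.

```python
def keys_for_moves(moves):
    """Replay the loop's init/last_move logic on a list of positions.

    Hypothetical helper: returns the directions for which a key press
    would be sent, in order.
    """
    init = False
    last_move = ''
    pressed = []
    for move in moves:
        if not init:
            # Nothing is pressed until the face has been centred once
            if move == 'center':
                init = True
        elif last_move == 'center' and move != 'center':
            # Only the frame on which the face leaves the centre counts
            pressed.append(move)
        last_move = move
    return pressed


sequence = ['left', 'center', 'left', 'left', 'center', 'up', 'center']
print(keys_for_moves(sequence))   # ['left', 'up']
```

The first 'left' is ignored because the game is not yet initialized, and the repeated 'left' is ignored because the face has not returned to the centre in between. Holding the head to one side therefore sends a single key press rather than one per frame.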

In [None]:
cap.release()
cv2.destroyAllWindows()

After the while loop, we release the `VideoCapture` object and destroy all windows. And there we have it: with this code, we can play video games using only head movements. We can speed it up by processing only every other frame. You will also get more out of it by running it as a script from an IDE rather than in the notebook.
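
One way to read "every other frame" is to keep reading frames as before but run the face detector only on every second one. The sketch below shows just the counting logic, under that assumption. `should_process` is a hypothetical helper, and the `range` stands in for frames coming from the webcam.

```python
def should_process(frame_index, every=2):
    """Return True for the frames on which detection should run.

    Hypothetical helper: with every=2, detection runs on frames
    0, 2, 4, ... and is skipped on the frames in between.
    """
    return frame_index % every == 0


# Stand-in for ten consecutive webcam frames
processed = [index for index in range(10) if should_process(index)]
print(processed)   # [0, 2, 4, 6, 8]
```

In the main loop, this would amount to keeping a frame counter and calling `get_predictions` only when the check passes, while still showing every frame.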