# Fall detection

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/uclasystem/VQPy/blob/main/examples/fall_detection/demo.ipynb)

## Introduction

In this example we adapt code from [Human Falling Detection and Tracking](https://github.com/GajuuzZ/Human-Falling-Detect-Tracks) to track human movement and detect actions. Two models are used: [AlphaPose](https://github.com/MVIG-SJTU/AlphaPose), which extracts a person's body keypoints from the cropped image of that person; and [ST-GCN](https://github.com/yysijie/st-gcn), which predicts an action from every 30 frames of each person's keypoints.

## Environment setup

Python 3.8 is recommended to avoid compatibility issues when installing YOLOX.

Please download the video from [here](https://youtu.be/ctniCxIdpTY), save it as `fall.mp4`, and place it in the same directory as this notebook.

You'll also need to download the pre-trained models for [SPPE FastPose (AlphaPose)](https://drive.google.com/file/d/1IPfCDRwCmQDnQy94nT1V-_NVtTEi4VmU/view?usp=sharing) and [ST-GCN](https://drive.google.com/file/d/1mQQ4JHe58ylKbBqTjuKzpwN2nwKOWJ9u/view?usp=sharing) and place them in the same directory. Both models come from [Human Falling Detection and Tracking](https://github.com/GajuuzZ/Human-Falling-Detect-Tracks#pre-trained-models).

In [None]:
# install YOLOX
!git clone https://github.com/Megvii-BaseDetection/YOLOX.git
# download YOLOX pretrained model
!wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_x.pth
!cd YOLOX && pip3 install .
# download VQPy, move vqpy/ to root directory for import
!git clone https://github.com/uclasystem/VQPy.git
!mv VQPy/vqpy ./
# install VQPy's dependencies
!pip3 install lap cython_bbox shapely
import numpy as np
import torch
import vqpy
import os
import sys
sys.path.append("VQPy/examples/fall_detection/detect")
video_path = "./fall.mp4"  # path to video
save_folder = "./vqpy_outputs"
model_dir = "./"
# import AlphaPose and ST-GCN models
from PoseEstimateLoader import SPPE_FastPose
from ActionsEstLoader import TSSTG

For reference, the working directory should look like:

```text
.
├── VQPy
├── YOLOX
├── fall.mp4	# video to query on
├── vqpy	# make vqpy available for import
├── yolox_x.pth	# YOLOX model checkpoint
├── fast_res50_256x192.pth	# AlphaPose model checkpoint
└── tsstg-model.pth	# ST-GCN model checkpoint
```
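
Before running the rest of the notebook, it can help to confirm that the downloaded files are in place. The following is a minimal sketch that assumes the file names shown in the directory layout above; `missing_files` is a hypothetical helper written for this example, not part of VQPy.

```python
from pathlib import Path

# File names taken from the directory layout above.
REQUIRED_FILES = (
    "fall.mp4",
    "yolox_x.pth",
    "fast_res50_256x192.pth",
    "tsstg-model.pth",
)


def missing_files(root=".", names=REQUIRED_FILES):
    """Return the names from `names` that are not regular files under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]


# Prints an empty list when every required file is present.
print(missing_files("."))
```

If the printed list is not empty, download the listed files before loading the models.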

## Fall detection with VQPy

### Step 1: Define `VObj` type for person

Since we are interested in people's poses, we create a `Person` VObj:

In [None]:
class Person(vqpy.VObjBase):
    pass

#### Adapt models for pose prediction

Two models are used to predict the pose of a person:

- AlphaPose: takes the frame and the person's bounding box, and returns a list of keypoints. The keypoints are intermediate results that are passed to ST-GCN.
- ST-GCN: takes the keypoint lists of the last 30 frames, and returns the predicted pose.

In [None]:
# loading the two models for inference
pose_model = SPPE_FastPose('resnet50', 224, 160, device='cuda',
        weights_file=os.path.join(
            os.path.abspath(model_dir), "fast_res50_256x192.pth"
        )
    )
action_model = TSSTG(
    weight_file=os.path.join(os.path.abspath(model_dir), "tsstg-model.pth")
)

To store the final output (the person's pose) and the intermediate result (the list of keypoints), we create two properties in the `Person` VObj.

Since ST-GCN requires keypoints from the last 30 frames, the function that computes `keypoints` needs to be decorated with `@vqpy.stateful(30)`, where `30` specifies that values from the last 30 frames should be saved.
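
The idea of keeping values for a fixed number of frames can be pictured as a bounded buffer. The sketch below is only an illustration built on `collections.deque`; the `History` class is hypothetical and does not describe how VQPy implements stateful properties.

```python
from collections import deque


class History:
    """Hypothetical fixed-length history; not part of VQPy."""

    def __init__(self, length):
        # deque(maxlen=...) drops the oldest value once the buffer is full
        self._values = deque(maxlen=length)

    def push(self, value):
        self._values.append(value)

    def get(self, index=-1):
        """Return a stored value (-1 is the most recent), or None if out of range."""
        try:
            return self._values[index]
        except IndexError:
            return None

    def __len__(self):
        return len(self._values)


history = History(30)
for frame_id in range(100):
    history.push(frame_id)  # stand-in for one frame's keypoints

print(len(history), history.get(-1), history.get(-30), history.get(-31))
```

After 100 frames only the last 30 values remain, so a request for anything older returns `None`.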

Adding the two properties to `Person`, we have:

In [None]:
class Person(vqpy.VObjBase):
    required_fields = ['class_id', 'tlbr']

    @vqpy.property()
    @vqpy.stateful(30)  # keep values from the last 30 frames
    def keypoints(self):
        image = self._ctx.frame
        tlbr = self.getv('tlbr')
        # per-frame property, tlbr could be None when tracking is lost
        # temporary work around until we have better dependency control
        if tlbr is None:
            return None
        return pose_model.predict(image, torch.tensor([tlbr]))

    @vqpy.property()
    def pose(self) -> str:
        keypoints_list = []
        # retrieve list of keypoints from the last 30 frames
        # also need to deal with object lost during tracking
        # return 'unknown' if not enough keypoints
        for i in range(-self._track_length, 0):
            keypoint = self.getv('keypoints', i)
            if keypoint is not None:
                keypoints_list.append(keypoint)
            if len(keypoints_list) >= 30:
                break
        if len(keypoints_list) < 30:
            return 'unknown'
        # type conversion to adapt data to model input
        pts = np.array(keypoints_list, dtype=np.float32)
        out = action_model.predict(pts, self._ctx.frame.shape[:2])
        action_name = action_model.class_names[out[0].argmax()]
        return action_name

> The check `if tlbr is None` in `keypoints` and the loop `for i in range(-self._track_length, 0)` in `pose` are workarounds for getting the most recent bounding box and keypoints. We are still working on a more convenient way to access properties (and stateful properties) in VObjs.
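
The intent described in the note, collecting the most recent valid values while skipping frames where tracking was lost, can be sketched in plain Python. `latest_valid` is a hypothetical helper for illustration only; it is not part of VQPy and is not a drop-in replacement for the loop in `pose`.

```python
def latest_valid(values, n):
    """Return the most recent `n` non-None items of `values` in
    chronological order, or None if fewer than `n` are available."""
    picked = []
    for value in reversed(values):  # walk from newest to oldest
        if value is not None:
            picked.append(value)
        if len(picked) == n:
            break
    if len(picked) < n:
        return None
    picked.reverse()  # restore oldest-to-newest order
    return picked


# None marks frames where tracking was lost.
print(latest_valid([1, None, 2, 3, None, 4], 3))  # [2, 3, 4]
print(latest_valid([1, None], 2))                 # None
```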

### Step 2: Query on `Person`'s pose

To select people who are falling down, we filter on `pose` having the value `"Fall Down"`. The action model should support 7 actions: `"Standing", "Walking", "Sitting", "Lying Down", "Stand up", "Sit down", "Fall Down"`.

`filter_cons` is:

In [None]:
filter_cons = {
    '__class__': lambda x: x == Person,
    'pose': lambda x: x == "Fall Down"
}
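
Conceptually, each entry of the filter dictionary is a predicate applied to one property, and an object passes only if every predicate accepts it. The sketch below illustrates that reading with plain dictionaries; `matches` and `DemoPerson` are hypothetical names for this example and do not reflect how VQPy evaluates constraints internally.

```python
def matches(record, filter_cons):
    """True if every predicate in `filter_cons` accepts the record's value."""
    return all(
        name in record and predicate(record[name])
        for name, predicate in filter_cons.items()
    )


class DemoPerson:  # stand-in for the Person VObj
    pass


demo_filter = {
    "__class__": lambda x: x == DemoPerson,
    "pose": lambda x: x == "Fall Down",
}

records = [
    {"__class__": DemoPerson, "pose": "Walking", "track_id": 1},
    {"__class__": DemoPerson, "pose": "Fall Down", "track_id": 2},
]
falling = [r["track_id"] for r in records if matches(r, demo_filter)]
print(falling)  # [2]
```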

For output, we select:

- tracker id, selected with `track_id`
- bounding box, given as the coordinates of the top-left and bottom-right corners, selected with `tlbr`

`select_cons` is:

In [None]:
select_cons = {
    'track_id': None,
    'tlbr': lambda x: str(x)
}

The query can then be defined as:

In [None]:
class FallDetection(vqpy.QueryBase):
    @staticmethod
    def setting() -> vqpy.VObjConstraint:
        return vqpy.VObjConstraint(
            filter_cons=filter_cons,
            select_cons=select_cons,
            filename='fall'
        )

## Running the query

With the `Person` VObj and the query defined, we can run the query by calling `vqpy.launch` with the following arguments:

- `cls_name` is a tuple that maps the numerical outputs of the object detector to class names (str).

    For example, `vqpy.COCO_CLASSES` starts with `("person", "bicycle", ...)`, so output `0` of the object detector is mapped to the COCO class `"person"`.

- `cls_type` is a dictionary that maps class names (str) to the VObj types defined above.

    `{"person": Person}` maps the COCO class `"person"` to the VObj type `Person`.

- `tasks` is a list of queries to run on the video.

In [None]:
vqpy.launch(
    cls_name=vqpy.COCO_CLASSES,
    cls_type={"person": Person},
    tasks=[FallDetection()],
    video_path=video_path,
    save_folder=save_folder,
    detector_model_dir=model_dir
)
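
The `cls_name` and `cls_type` arguments can be read as a two-step lookup: detector output to class name, then class name to VObj type. The sketch below shows that lookup in isolation; `resolve_vobj_type` and `DemoPerson` are hypothetical names for this example, and how VQPy performs the mapping internally is not shown here.

```python
def resolve_vobj_type(class_id, cls_name, cls_type):
    """Map a detector class id to a VObj type, or None if it is unmapped."""
    if not 0 <= class_id < len(cls_name):
        return None
    return cls_type.get(cls_name[class_id])


class DemoPerson:  # stand-in for the Person VObj
    pass


demo_cls_name = ("person", "bicycle")  # first two COCO classes
demo_cls_type = {"person": DemoPerson}

print(resolve_vobj_type(0, demo_cls_name, demo_cls_type))  # the DemoPerson class
print(resolve_vobj_type(1, demo_cls_name, demo_cls_type))  # None: no VObj for "bicycle"
```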

## Expected result

The result of the query is written to `{save_folder}/{video_name}_{task_name}_{detector_name}.json`. For this example, the output should be in `./vqpy_outputs/fall_fall_yolox.json`.

One entry is created for each frame in which the filter condition is satisfied.

For example, the entry for frame 133:

```json
{
  "frame_id": 133,
  "data": [
    { "track_id": 188, "tlbr": "[485. 270. 796. 588.]" }
  ]
}
```
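
Because `select_cons` converts `tlbr` with `str`, the bounding box is stored as text and has to be parsed before further processing. The sketch below parses the entry shown above using only the standard library; `parse_tlbr` is a hypothetical helper, and the entry is embedded as a string here rather than read from the output file, whose overall layout is not described in this document.

```python
import json


def parse_tlbr(text):
    """Convert a string such as "[485. 270. 796. 588.]" to a list of floats."""
    return [float(value) for value in text.strip("[]").split()]


# One entry, copied from the example above.
entry = json.loads(
    '{"frame_id": 133, "data": [{"track_id": 188, '
    '"tlbr": "[485. 270. 796. 588.]"}]}'
)
for obj in entry["data"]:
    print(entry["frame_id"], obj["track_id"], parse_tlbr(obj["tlbr"]))
```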

<img src="./demo.assets/fall133.png" alt="with coordinate marked" style="zoom: 60%;" />