Human Activity Detection with TensorFlow and Python

A simple baseline object detection model (Faster-RCNN with a ResNet101 backbone) that can detect basic human activities such as walking, running, and sitting in images and videos. The model is pre-trained on the Google AVA Actions dataset, which contains bounding box annotations for 60 basic human actions such as sit, stand, walk, and run. The full list can be found in the label file. Check out the blog post to learn more.

Installation

Clone the repository, download the pre-trained model, and install the dependencies using the commands below.

git clone https://github.com/visiongeeklabs/human-activity-detection.git
cd human-activity-detection
wget https://github.com/visiongeeklabs/human-activity-detection/releases/download/v0.1.0/frozen_inference_graph.pb
pip install -r requirements.txt
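
If you want to load the downloaded frozen graph directly from Python, a minimal sketch (assuming TensorFlow 1.x, the runtime typically used with frozen inference graphs) could look like the snippet below. The repo's scripts may load it differently.

import tensorflow as tf

# Build a graph and import the serialized GraphDef from the downloaded .pb file
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')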

Running inference on an image

Run inference on an image using the command below.

python detect_activity_image.py /path/to/input/image

# For example
python detect_activity_image.py sample_inputs/input_image1.webp
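
Under the hood, single-image inference amounts to feeding the image into the frozen graph and keeping detections above a score threshold. The sketch below assumes the standard tensor names exported by the TensorFlow Object Detection API (image_tensor, detection_boxes, detection_scores, detection_classes) and the detection_graph loaded above; the actual detect_activity_image.py may be organized differently.

import cv2
import numpy as np
import tensorflow as tf

def detect(graph, image_bgr, score_threshold=0.5):
    # Run the detection tensors on a single RGB frame and keep confident boxes
    with tf.Session(graph=graph) as sess:
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        boxes, scores, classes = sess.run(
            ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
            feed_dict={'image_tensor:0': np.expand_dims(image_rgb, axis=0)})
        keep = scores[0] >= score_threshold
        return boxes[0][keep], scores[0][keep], classes[0][keep].astype(int)

image = cv2.imread('sample_inputs/input_image1.webp')
boxes, scores, classes = detect(detection_graph, image)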

Running inference on a video

Run inference on a video using the command below.

python detect_activity_video.py /path/to/input/video

# For example
python detect_activity_video.py sample_inputs/input_video.mp4
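
Video inference is essentially the same detection run frame by frame. A minimal sketch using OpenCV is shown below, reusing the hypothetical detect() helper from the image sketch; in practice the TensorFlow session would be created once outside the loop rather than per frame, and the actual detect_activity_video.py may differ.

import cv2

cap = cv2.VideoCapture('sample_inputs/input_video.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
writer = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, size)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, scores, classes = detect(detection_graph, frame)  # per-frame detection, as in the image sketch
    # draw boxes and class labels on `frame` here before writing it out
    writer.write(frame)

cap.release()
writer.release()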

Limitations

This model has some known limitations that should be kept in mind while using it.

  • It is an object detection model that works on a single frame at a time, so it has no memory of previous frames. Recognizing complex actions, however, requires knowing what was happening in earlier frames.
  • Faster-RCNN with a ResNet101 backbone is a heavy model, so running it on a reasonably powerful GPU is recommended. For example, a single 1280x720 frame takes around 110 ms on average on an Nvidia T4 GPU (15 GB RAM) and around 3.5 seconds on an Intel Core i5 CPU (1.8 GHz, 8 GB RAM).
  • The same person may be doing multiple activities at once, such as watching another person while standing. The model produces a separate bounding box for each activity of the same person, which can clutter the output image (this is why a few classes are omitted from processing; a sketch of such a class filter follows this list).
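
A hypothetical version of that kind of class filter is sketched below: detections whose label ID is in a skip-list are simply dropped. The IDs shown are placeholders, not the classes the scripts actually omit.

# Placeholder label IDs; the classes actually omitted are defined in the repo's scripts
SKIP_CLASSES = {11, 12}

def filter_detections(boxes, scores, classes, skip=SKIP_CLASSES):
    # Keep only detections whose class ID is not in the skip-list
    keep = [i for i, c in enumerate(classes) if int(c) not in skip]
    return boxes[keep], scores[keep], classes[keep]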

Support on Patreon

If you are getting value out of this work, please consider supporting on Patreon to unlock exclusive perks such as:

  • Downloadable PDFs
  • Ready-to-run Google Colab notebooks (with all dependencies pre-configured)
  • Early access to blog posts and video tutorials
  • Hands-on live coding sessions and Q&A
  • Access to an exclusive Discord server