* This project is supported by the Cloud Application and Platform Lab, led by Prof. Yonggang Wen.
An intelligent multimodal-learning based system for video, product and ads analysis. You can build various downstream applications with the system, such as product recommendation and video retrieval. Several examples are provided.
V2 is currently under active development. You are welcome to create an issue or a pull request here. We will credit contributions in V2.
- Highlights
- Updates
- Showcase
- Download Data
- Installation
- Configuration
- Demo
- Some Useful Tools
- Credits
- Contribute to Hysia-V2O
- Paper Citation
- About Us
V1.3:
To be updated.
V1.2:
- A light model database
- model exporter
V1.1:
- Docker support
- Frontend separation
- More models
- Improve documents
- Multimodal learning-based video analysis:
- Scene / Object / Face detection and recognition
- Multimodality data preprocessing
- Result alignment and storage
- Downstream applications:
- Intelligent ads insertion
- Content-product match
- Visualized testbed
- Visualize multimodality results
- Can be installed separately
Here is a summary of required data / packed libraries.
File name | Description | File ID | Unzipped directory |
---|---|---|---|
hysia-decoder-lib-linux-x86-64.tar.gz | Hysia Decoder dependent lib | 1fi-MSLLsJ4ALeoIP4ZjUQv9DODc1Ha6O | hysia/core/HysiaDecode |
weights.tar.gz | Pretrained model weights | 1O1-QT8HJRL1hHfkRqprIw24ahiEMkfrX | . |
object-detection-data.tar.gz | Object detection data | 1an7KGVer6WC3Xt2yUTATCznVyoSZSlJG | third/object_detection |
For users without Google Drive access, you can download from Baidu Wangpan and unzip files correspondingly. (See Option 2)
# Make sure this script is run from project root
bash scripts/download-data.sh
cd ..
Note: `curl` can be used to download from Google Drive directly, according to amit-chahar's Gist. File names and file IDs are available in the table above:
fileid=<file id>
filename=<file name>
curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
rm cookie
Please `cd` to the target folder (see the Unzipped directory column in the table above) before executing `curl`.
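For scripting the same download from Python, the URL used by the `curl` commands can be assembled as in the sketch below. The endpoint and file IDs come from the table and commands above; the names `FILE_IDS` and `drive_download_url` are hypothetical helpers, not part of this repository.

```python
from urllib.parse import urlencode

# File IDs copied from the table above.
FILE_IDS = {
    "hysia-decoder-lib-linux-x86-64.tar.gz": "1fi-MSLLsJ4ALeoIP4ZjUQv9DODc1Ha6O",
    "weights.tar.gz": "1O1-QT8HJRL1hHfkRqprIw24ahiEMkfrX",
    "object-detection-data.tar.gz": "1an7KGVer6WC3Xt2yUTATCznVyoSZSlJG",
}

# Endpoint taken from the curl commands above.
DRIVE_ENDPOINT = "https://drive.google.com/uc"


def drive_download_url(file_id, confirm=None):
    """Build the download URL for one file ID.

    `confirm` is the token read from the cookie file after the first
    request; leave it as None for that first request.
    """
    params = {"export": "download"}
    if confirm is not None:
        params["confirm"] = confirm
    params["id"] = file_id
    return f"{DRIVE_ENDPOINT}?{urlencode(params)}"
```

Calling `drive_download_url(FILE_IDS["weights.tar.gz"])` gives the URL of the first `curl` request; passing `confirm=` gives the second one.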
1. Download the Hysia Decoder dependent libraries and unzip them:
decoder_path=hysia/core/HysiaDecode
mv hysia-decoder-lib-linux-x86-64.tar.gz "${decoder_path}"
cd "${decoder_path}"
tar xvzf hysia-decoder-lib-linux-x86-64.tar.gz
rm -f hysia-decoder-lib-linux-x86-64.tar.gz
cd -
2. Download the pretrained model weights and unzip them:
tar xvzf weights.tar.gz
# and remove the weights zip
rm -f weights.tar.gz
3. Download the object detection data for the third-party library and unzip it:
mv object-detection-data.tar.gz third/object_detection
cd third/object_detection
tar xvzf object-detection-data.tar.gz
rm object-detection-data.tar.gz
cd -
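The three steps above follow one pattern: move the archive into its unzipped directory, extract it, and delete it. A minimal Python sketch of that pattern is shown below, assuming the archives sit in a single download folder; `UNZIP_DIRS` and `unpack_archives` are hypothetical names, and the directory mapping is copied from the table above.

```python
import tarfile
from pathlib import Path

# Archive name -> directory (relative to the project root) it is unpacked in,
# copied from the table above.
UNZIP_DIRS = {
    "hysia-decoder-lib-linux-x86-64.tar.gz": "hysia/core/HysiaDecode",
    "weights.tar.gz": ".",
    "object-detection-data.tar.gz": "third/object_detection",
}


def unpack_archives(download_dir, project_root, remove=True):
    """Extract every archive found in download_dir; return the names handled."""
    download_dir, project_root = Path(download_dir), Path(project_root)
    done = []
    for name, rel_dir in UNZIP_DIRS.items():
        archive = download_dir / name
        if not archive.is_file():
            continue  # not downloaded yet, skip it
        target = project_root / rel_dir
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions: refuse unsafe archive members.
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        if remove:
            archive.unlink()  # mirrors the `rm` in the steps above
        done.append(name)
    return done
```

For example, `unpack_archives("~/Downloads", ".")` would need the `~` expanded first (for instance with `Path.expanduser`), since the sketch does not do that itself.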
Requirements:
- Conda
- Nvidia driver
- CUDA = 9*
- CUDNN
- g++
- zlib1g-dev
We recommend installing this V2O platform on a UNIX-like system. These scripts are tested on Ubuntu 16.04 x86-64 with CUDA 9.0 and cuDNN 7.
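Before running the install script, it may help to check that the command-line tools behind these requirements are on `PATH`. The sketch below is only a rough pre-flight check: the executable names (`conda`, `nvidia-smi`, `nvcc`, `g++`) are assumptions about a typical setup, not something this project prescribes, and cuDNN and zlib1g-dev are not covered because they ship no executable.

```python
import shutil

# Executables assumed to indicate each requirement; adjust to your setup.
REQUIRED_TOOLS = {
    "Conda": "conda",
    "Nvidia driver": "nvidia-smi",
    "CUDA": "nvcc",
    "g++": "g++",
}


def missing_requirements(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the requirement names whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use needs no arguments.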
Run the following script:
# Execute this script at project root
bash ./scripts/install-build.sh
cd ..
See Run with Docker to build and install.
# First, make sure that Conda is set up correctly and that CUDA and
# cuDNN are installed on your system.
# Install Conda virtual environment
conda env create -f environment.yml
conda activate V2O
export BASE_DIR=${PWD}
# Compile HysiaDecoder
cd "${BASE_DIR}"/hysia/core/HysiaDecode
make clean
# If nvidia driver is higher than 396, set NV_VERSION=<your nvidia major version>
make NV_VERSION=<your nvidia driver major version>
# Build mmdetect
# ROI align op
cd "${BASE_DIR}"/third/
cd mmdet/ops/roi_align
rm -rf build
python setup.py build_ext --inplace
# ROI pool op
cd ../roi_pool
rm -rf build
python setup.py build_ext --inplace
# NMS op
cd ../nms
make clean
make PYTHON=python
# Initialize Django
# This will prompt you for some input
cd "${BASE_DIR}"/server
python -m grpc_tools.protoc -I . --python_out=. --grpc_python_out=. protos/api2msl.proto
python manage.py makemigrations restapi
python manage.py migrate
python manage.py loaddata dlmodels.json
python manage.py createsuperuser
unset BASE_DIR
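The `NV_VERSION` rule in the HysiaDecoder step can be automated. The sketch below derives the major driver version and builds the matching `make` command; the function names are hypothetical, and falling back to a plain `make` for drivers at or below 396 is an assumption based on the comment in the script above.

```python
import subprocess


def driver_major(version):
    """Return the major number of a driver version string such as '410.48'."""
    return int(version.strip().split(".")[0])


def decoder_make_command(version):
    """Build the make command for HysiaDecode from a driver version string."""
    major = driver_major(version)
    if major > 396:
        return ["make", f"NV_VERSION={major}"]
    return ["make"]  # assumption: the Makefile default covers drivers <= 396


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()[0]
```

On a machine with a GPU, `decoder_make_command(installed_driver_version())` yields the argument list to run inside `hysia/core/HysiaDecode`.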
You can omit this part as we have provided a pre-built frontend. If the frontend is updated, please run the following:
Option 1: auto-rebuild
cd server/react-build
bash ./build.sh
Option 2: Step-by-step rebuild
cd server/react-front
# Install dependencies
npm i
npm audit fix
# Build static files
npm run-script build
# fix js path
python fix_js_path.py build
# create a copy of build static files
mkdir -p tmp
cp -r build/* tmp/
# move static folder to static common
mv tmp/*html ../templates/
mv tmp/* ../static/
cp -rfl ../static/static/* ../static/
rm -r ../static/static/
# clear temp
rm -r tmp
- Decode hardware:
  Change the configuration here at the last line: `DECODING_HARDWARE = 'CPU'`. The value can be `CPU` or `GPU:<number>` (e.g. `GPU:0`).
- ML model running hardware:
  Change the configuration of the model servers under this directory:

      # Custom request servicer
      class Api2MslServicer(api2msl_pb2_grpc.Api2MslServicer):
          def __init__(self):
              ...
              os.environ['CUDA_VISIBLE_DEVICES'] = '0'

  A possible value is your device ID: `0`, `0,1`, ...
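A typo in either setting is easy to make, so the sketch below validates the two value formats described above. Both functions are hypothetical helpers, not part of the project, and they only accept the forms named in this section (`CPU`, `GPU:<number>`, and comma-separated device indices).

```python
import re


def parse_decoding_hardware(value):
    """Split 'CPU' or 'GPU:<number>' into (kind, device index or None)."""
    if value == "CPU":
        return ("CPU", None)
    match = re.fullmatch(r"GPU:(\d+)", value)
    if match:
        return ("GPU", int(match.group(1)))
    raise ValueError(f"expected 'CPU' or 'GPU:<number>', got {value!r}")


def is_valid_visible_devices(value):
    """Check a CUDA_VISIBLE_DEVICES string such as '0' or '0,1'."""
    return re.fullmatch(r"\d+(,\d+)*", value) is not None
```

For instance, `parse_decoding_hardware("GPU:0")` returns `("GPU", 0)`, while a misspelled value such as `"gpu0"` raises `ValueError`.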
cd server
# Start model server
python start_model_servers.py
# Run Django
python manage.py runserver 0.0.0.0:8000
Then you can go to http://localhost:8000. Use username: admin and password: admin to log in.
- Large dataset preprocessing
- Video/audio decoding
- Model profiling
- Multimodality data testbed
Here is a list of models that we used in Hysia-V2O.
Models | GitHub Repo | License |
---|---|---|
MMDetection | ||
Google Object detection | ||
Scene Recognition | ||
Audio Recognition | ||
Image Retrieval | ||
Face Detection | ||
Face Recognition | ||
Text Detection | ||
Text Recognition | ||
You are welcome to open a pull request. We will credit your contribution in version 2.0.
Coming soon!
- Huaizheng Zhang [GitHub]
- Yuanming Li yli056@e.ntu.edu.sg [GitHub]
- Qiming Ai [GitHub]
- Shengsheng Zhou [GitHub]