Sample solution development for the Video Track of Low Power Computer Vision Challenge 2020.

tanliyon/lpcvc-2020

LPCVC-2020 Sample Solution

Overview

This is the sample solution for the Low Power Computer Vision Challenge (LPCVC) 2020 Video Track. It is intended only as a baseline, and there is considerable room to optimize its performance further.

The proposed solution is made up of three blocks. The first block (sampling block) takes in a video file and determines which frames are worth running detection and recognition on. This sample solution does so by analyzing the motion vectors from the H.264 encoding of the video to pick out stationary I-frames. The second block (detection block) performs word detection on the frames selected by the sampling block. This sample solution uses the EAST detector. Lastly, the third block (recognition block) performs optical character recognition (OCR) on the cropped words. The sample solution provides two choices: Connectionist Temporal Classification (CTC) or Attention OCR.
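The sampling idea described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `Frame` record, the `select_stationary_iframes` function, and the `threshold` value are all hypothetical, and the sketch assumes that frame types and motion vectors have already been extracted from the H.264 stream by some other tool. Because I-frames are intra-coded and carry no motion vectors of their own, the sketch uses the nearest preceding inter-coded frame as a proxy for how much the scene is moving, which is an assumption.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple


@dataclass
class Frame:
    """Hypothetical per-frame record; not the repository's data structure."""
    index: int
    pict_type: str  # "I", "P" or "B"
    motion_vectors: List[Tuple[float, float]] = field(default_factory=list)


def mean_motion(frame: Frame) -> float:
    """Average motion-vector magnitude; 0.0 when the frame carries none."""
    if not frame.motion_vectors:
        return 0.0
    total = sum(hypot(dx, dy) for dx, dy in frame.motion_vectors)
    return total / len(frame.motion_vectors)


def select_stationary_iframes(frames: List[Frame], threshold: float = 1.0) -> List[int]:
    """Return indices of I-frames whose preceding inter-coded frame shows little motion.

    Assumption of this sketch: the most recent P/B frame stands in for the
    motion around an I-frame, since I-frames have no motion vectors.
    """
    selected = []
    last_inter_motion = 0.0  # treat the start of the video as stationary
    for frame in frames:
        if frame.pict_type == "I":
            if last_inter_motion < threshold:
                selected.append(frame.index)
        else:
            last_inter_motion = mean_motion(frame)
    return selected
```

Only the frame indices returned by such a filter would then be passed on to the detection and recognition blocks, which is what keeps the number of inference calls down.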

Contents

  1. Setup
  2. Usage
  3. Notes

Setup

  1. Clone the code from the master branch.
    git clone https://github.com/tanliyon/lpcvc-2020.git

  2. Download the model files for the EAST detector, CTC, and Attention OCR.
    EAST-Detector
    CTC
    Attention-Encoder
    Attention-Decoder

  3. Install dependencies.
    pip install -r requirements.txt
    Note that lanms might not work on Windows.

  4. Check the directory structure. It should be:
    lpcvc-2020
    |_wrapper.py
    |_detector.pth
    |_ctc.pth
    |_encoder.pth
    |_decoder.pth
    |_(all other folders pulled from master)
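
The directory check can also be done with a short script. The sketch below is not part of the repository; the helper name `missing_files` is hypothetical, and the file names are copied from the listing above. It assumes the script is run from the root of the cloned repository.

```python
from pathlib import Path
from typing import List

# File names taken from the directory listing in the Setup section.
EXPECTED_FILES = ["wrapper.py", "detector.pth", "ctc.pth", "encoder.pth", "decoder.pth"]


def missing_files(repo_root: str) -> List[str]:
    """Return the expected files that are not present under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Directory structure looks complete.")
```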

Usage

The call syntax is:

python main.py video_file_path.mp4 question_file_path.txt

To switch between the two recognition options, set the USE_ATTN_OCR flag in main.py. The SHOW_BOXES flag controls whether the detection output is saved to a folder, and the SHOW_TEXT flag controls whether the recognition predictions are printed to stdout.
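
As a rough illustration of how flags like these might route the recognition step, consider the sketch below. Only the flag names come from this README. The functions `recognize`, `recognize_with_ctc`, and `recognize_with_attention` are hypothetical stand-ins, and the actual logic in main.py may be organized differently.

```python
# Flag names come from the README; everything else here is hypothetical.
USE_ATTN_OCR = False  # False: CTC recognizer, True: Attention OCR
SHOW_TEXT = True      # print each recognition prediction to stdout


def recognize_with_ctc(crop):
    """Hypothetical stand-in for the CTC recognizer."""
    return "ctc-prediction"


def recognize_with_attention(crop):
    """Hypothetical stand-in for the Attention OCR recognizer."""
    return "attention-prediction"


def recognize(crop, use_attn_ocr=USE_ATTN_OCR, show_text=SHOW_TEXT):
    """Pick a recognizer based on the flag and optionally print its output."""
    recognizer = recognize_with_attention if use_attn_ocr else recognize_with_ctc
    text = recognizer(crop)
    if show_text:
        print(text)
    return text
```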

Notes

  1. Currently, the solution takes a long time to run because of the number of frames it runs inference on. If you want to test only a portion of the video, run the code for a set amount of time, then comment out the line frames_list = iFRAMES(video_path) in wrapper.py and run the code again.

References

  1. Low Power Computer Vision Challenge (LPCVC) 2020 Video Track
  2. EAST Detector
  3. Connectionist Temporal Classification (CTC)
  4. Attention OCR
