This is the sample solution for the Low Power Computer Vision Challenge (LPCVC) 2020 Video Track. It serves only as a baseline, and there is considerable room to optimize its performance further.
The proposed solution is made up of three blocks. The first block (sampling block) takes in a video file and determines which frames are worth running detection and recognition on. This sample solution does so by dissecting the motion vectors from the H.264 encoding of the video to pick out stationary I-frames. The second block (detection block) performs word detection on the frames selected by the sampling block. This sample solution uses the EAST detector. Lastly, the third block (recognition block) performs optical character recognition (OCR) on the cropped words. The sample solution provides two choices: Connectionist Temporal Classification (CTC) or Attention OCR.
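The way the three blocks chain together can be sketched as below. This is only an illustration of the data flow, not the repository's actual code: `run_pipeline` and its three callable parameters are hypothetical names, and the real sampling, detection, and recognition functions would be passed in their place.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Crop = TypeVar("Crop")


def run_pipeline(
    video_path: str,
    sample_frames: Callable[[str], Iterable[Frame]],   # hypothetical: sampling block
    detect_words: Callable[[Frame], Sequence[Crop]],   # hypothetical: detection block
    recognize_word: Callable[[Crop], str],             # hypothetical: recognition block
) -> List[List[str]]:
    """Return the recognized words for each sampled frame, in frame order."""
    results: List[List[str]] = []
    for frame in sample_frames(video_path):      # block 1: choose frames
        crops = detect_words(frame)              # block 2: crop detected words
        words = [recognize_word(crop) for crop in crops]  # block 3: OCR each crop
        results.append(words)
    return results
```

Passing the blocks as arguments keeps the sketch independent of any particular detector or recognizer, so each stage can be swapped or stubbed out while testing.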
- Clone the code from the master branch.
git clone https://github.com/tanliyon/lpcvc-2020.git
- Download the model files for the EAST detector, CTC, and Attention OCR.
EAST-Detector
CTC
Attention-Encoder
Attention-Decoder
- Install dependencies.
pip install -r requirements.txt
Note that lanms might not work on Windows.
- Check directory structure. It should be:
lpcvc-2020
|_wrapper.py
|_detector.pth
|_ctc.pth
|_encoder.pth
|_decoder.pth
|_(all other folders pulled from master)
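Before running, it can help to confirm that the files in the listing above are actually in place. The following is a minimal sketch; `missing_files` is a hypothetical helper that is not part of the repository, and it checks only the five files named in the listing.

```python
from pathlib import Path
from typing import List

# File names taken from the directory listing above.
REQUIRED_FILES = ("wrapper.py", "detector.pth", "ctc.pth", "encoder.pth", "decoder.pth")


def missing_files(repo_root: str) -> List[str]:
    """Return the required files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from inside the lpcvc-2020 directory.
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```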
The call syntax is:
python main.py video_file_path.mp4 question_file_path.txt
To switch between the two recognition options, toggle the USE_ATTN_OCR flag in main.py. The SHOW_BOXES flag controls whether the detection output is saved to a folder, and the SHOW_TEXT flag controls whether the recognition predictions are printed to stdout.
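As a rough illustration of how such flags might steer the recognition step, consider the sketch below. Only the flag names come from main.py as described above; the default values, `select_recognizer`, and `recognize_all` are assumptions made for this example and may not match the actual code.

```python
from typing import Callable, Iterable, List

USE_ATTN_OCR = False  # flag name from the text; this default is an assumption
SHOW_TEXT = True      # flag name from the text; this default is an assumption


def select_recognizer(
    use_attn_ocr: bool,
    ctc_recognizer: Callable[[object], str],
    attn_recognizer: Callable[[object], str],
) -> Callable[[object], str]:
    """Pick the Attention OCR recognizer when the flag is set, otherwise CTC."""
    return attn_recognizer if use_attn_ocr else ctc_recognizer


def recognize_all(
    crops: Iterable[object],
    recognizer: Callable[[object], str],
    show_text: bool,
) -> List[str]:
    """Run the recognizer on every crop, printing predictions if requested."""
    words = [recognizer(crop) for crop in crops]
    if show_text:
        for word in words:
            print(word)
    return words
```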
- Currently, the solution takes a long time to run because of the number of frames it runs inference on. If you want to test only a portion of it, run the code for a set amount of time, then comment out the line
frames_list = iFRAMES(video_path)
in wrapper.py. Then run the code again.
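A different way to shorten a test run, not the approach described above, is to cap the number of frames passed on to detection. The sketch below assumes that `frames_list` is an ordinary iterable of frames; `limit_frames` is a hypothetical helper that does not exist in the repository.

```python
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def limit_frames(frames: Iterable[T], max_frames: Optional[int] = None) -> List[T]:
    """Return at most max_frames items from frames, or all of them if max_frames is None."""
    if max_frames is None:
        return list(frames)
    if max_frames < 0:
        raise ValueError("max_frames must be non-negative")
    return list(islice(frames, max_frames))
```

For example, wrapping the existing call as `limit_frames(iFRAMES(video_path), 20)` would keep only the first 20 selected frames, provided `iFRAMES` returns an iterable as assumed here.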