
We introduce the Humans in Context dataset in our work. The dataset is sourced from 10 existing research datasets, heavily filtered to contain 19M frames of humans in everyday environments, and supplemented with pose labels obtained using OpenPose.

The final dataset cannot be released directly due to licensing limitations of the source datasets. However, the instructions below describe how to download all 10 source datasets and construct the full Humans in Context meta-dataset used in our paper. It is also possible to construct a similar dataset from different sources or from a subset of the source datasets.

## Download source datasets

## Construct Humans in Context meta-dataset

After downloading the source datasets, the Humans in Context meta-dataset can be constructed by running a sequence of three processing scripts on each source dataset.

**1. Extract high-quality frames.**

We filter images and video frames for sufficient resolution and bitrate, and resize so that the short edge is 256 pixels. Run `python data_from_images.py input_dir=INPUT_DIR output_dir=OUTPUT_DIR` when the source dataset consists of directories of frames stored as images, or `python data_from_videos.py input_dir=INPUT_DIR output_dir=OUTPUT_DIR` when the source dataset consists of video files. The output directory should be specific to each source dataset and will be used as the input to the next phase of processing. For example:

```
python data_from_videos.py input_dir=kinetics output_dir=kinetics_frames_256
```
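For reference, the short-edge resize in this step amounts to the logic below. This is a minimal sketch assuming OpenCV; the actual filtering thresholds, bitrate checks, and output format are defined by the scripts in the repository.

```python
import cv2

SHORT_EDGE = 256  # target short-edge resolution used in this step

def resize_short_edge(frame, short_edge=SHORT_EDGE):
    """Resize a frame so its short edge equals `short_edge`, keeping aspect ratio."""
    h, w = frame.shape[:2]
    scale = short_edge / min(h, w)
    new_w, new_h = round(w * scale), round(h * scale)
    # INTER_AREA is a reasonable interpolation choice when downscaling.
    return cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_AREA)

def frame_is_large_enough(frame, min_short_edge=SHORT_EDGE):
    """Example resolution filter: drop frames whose short edge is below the target.
    (The bitrate filter applied by data_from_videos.py is not reproduced here.)"""
    h, w = frame.shape[:2]
    return min(h, w) >= min_short_edge
```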

**2. Filter frames with person detection.**

Next, we filter clips using a person detection network. Run `python data_filter_people.py input_dir=INPUT_DIR output_dir=OUTPUT_DIR`, where the input directory is the output of the previous step and the output directory is again specific to each source. For example:

```
python data_filter_people.py input_dir=kinetics_frames_256 output_dir=kinetics_people
```
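The detection network used by `data_filter_people.py` is defined in the repository; as an illustration only, the sketch below keeps a frame when an off-the-shelf torchvision Faster R-CNN (a stand-in, not necessarily the model we use) finds a person above a confidence threshold.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Stand-in detector for illustration; the repo's actual model and thresholds may differ.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

PERSON_LABEL = 1  # "person" class index in torchvision's COCO-trained detectors

@torch.no_grad()
def frame_contains_person(frame_rgb, score_threshold=0.9):
    """Return True if the detector finds at least one confident person box."""
    outputs = model([to_tensor(frame_rgb)])[0]
    keep = (outputs["labels"] == PERSON_LABEL) & (outputs["scores"] > score_threshold)
    return bool(keep.any())
```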

**3. Filter and label using OpenPose.**

Lastly, we filter clips and produce pose labels by running OpenPose. Make sure to download the OpenPose pretrained model as described in the main README. Then run `python data_detect_pose.py input_dir=INPUT_DIR output_dir=OUTPUT_DIR`, where the input directory is again the output of the previous step. For example:

```
python data_detect_pose.py input_dir=kinetics_people output_dir=kinetics_pose
```
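As an illustration of the kind of pose-based filtering this step can apply, the sketch below keeps a detection only if enough body keypoints are visible at sufficient confidence. It assumes OpenPose's BODY_25 output format (one (25, 3) array of x, y, confidence per person); the thresholds are placeholders, not the values used by `data_detect_pose.py`.

```python
import numpy as np

def pose_is_usable(keypoints, min_visible=10, min_confidence=0.3):
    """keypoints: (25, 3) array of (x, y, confidence) from OpenPose BODY_25.

    Keep the detection when at least `min_visible` keypoints are detected
    with confidence above `min_confidence`.
    """
    confidences = np.asarray(keypoints)[:, 2]
    return int((confidences > min_confidence).sum()) >= min_visible
```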

The LMDB outputs for poses, clips and frames should then be arranged to form the following directory structure:

```
charades
  clips_db
  frames_db
  poses_db
hvu
  clips_db
  frames_db
  poses_db
insta_variety
  clips_db
  frames_db
  poses_db
kinetics
  clips_db
  frames_db
  poses_db
moments
  clips_db
  frames_db
  poses_db
mpii
  clips_db
  frames_db
  poses_db
oops
  clips_db
  frames_db
  poses_db
penn_action
  clips_db
  frames_db
  poses_db
vlog_people
  clips_db
  frames_db
  poses_db
youtube_vos
  clips_db
  frames_db
  poses_db
```
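To sanity-check the arranged databases, you can iterate over any of the LMDB environments with the Python `lmdb` package. This is a generic sketch; the key scheme and value serialization produced by the processing scripts are not described here, so only entry counts are inspected.

```python
import lmdb

def count_entries(db_path):
    """Count key/value pairs in one of the per-dataset LMDB databases."""
    env = lmdb.open(db_path, readonly=True, lock=False, readahead=False)
    with env.begin() as txn:
        n = txn.stat()["entries"]
    env.close()
    return n

# Example: verify that a few source datasets have all three databases.
for source in ["charades", "hvu", "kinetics"]:
    for db in ["clips_db", "frames_db", "poses_db"]:
        print(source, db, count_entries(f"{source}/{db}"))
```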