The data are contained in pickle files with download links: TAP-Vid-DAVIS and TAP-Vid-RGB-Stacking.
For DAVIS, the pickle file contains a dictionary whose keys are DAVIS video names and whose values hold the frames (a 4D uint8 tensor), the points (a float32 tensor with three axes: the first is point id, the second is time, and the third is x/y), and the occlusions (a bool tensor with two axes: the first is point id, the second is time). RGB-Stacking uses the same format, except there are no video names, so it is a list of these structures rather than a dictionary.
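As an illustration, the sketch below round-trips a small synthetic structure laid out as described above. The key names (`'video'`, `'points'`, `'occluded'`) and the concrete shapes are assumptions made for this sketch; check the downloaded file for the exact keys.

```python
import pickle
import numpy as np

# Synthetic stand-in for one DAVIS entry; key names are assumptions.
example = {
    'bear': {
        'video': np.zeros((24, 256, 256, 3), dtype=np.uint8),  # [time, height, width, 3]
        'points': np.zeros((5, 24, 2), dtype=np.float32),      # [point id, time, x/y]
        'occluded': np.zeros((5, 24), dtype=bool),             # [point id, time]
    }
}

# Write and read it back exactly as one would load the released pickle.
with open('/tmp/tapvid_davis_example.pkl', 'wb') as f:
    pickle.dump(example, f)
with open('/tmp/tapvid_davis_example.pkl', 'rb') as f:
    data = pickle.load(f)

for name, entry in data.items():
    print(name, entry['video'].shape, entry['points'].shape, entry['occluded'].shape)
```

For the RGB-Stacking file, `data` would instead be a list, so the loop would iterate over `data` directly without video names.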
The labels are contained in a csv file with download link: TAP-Vid-Kinetics.
The videos are expected to be the raw clips from the Kinetics700-2020 validation set, stored in a local folder <video_root_path>. The clips should be stored as MP4, following the name pattern f'{youtube_id}_{start_time_sec:06}_{end_time_sec:06}.mp4', e.g. 'abcdefghijk_000010_000020.mp4'. Clips can be stored in any subfolder within <video_root_path>; the most common pattern is <video_root_path>/<action_label>/<clip_name>.
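The f-string pattern above zero-pads the start and end times to six digits. A quick check that the pattern reproduces the example filename:

```python
# Build the expected clip filename from the documented f-string pattern.
youtube_id = 'abcdefghijk'
start_time_sec = 10
end_time_sec = 20

# ':06' zero-pads each timestamp to six digits.
clip_name = f'{youtube_id}_{start_time_sec:06}_{end_time_sec:06}.mp4'
print(clip_name)  # abcdefghijk_000010_000020.mp4
```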
Once the validation clips have been downloaded, a pickle file containing all the information can be generated using the provided script:
pip3 install -r requirements.txt
python3 generate_tapvid.py \
--input_csv_path=<path_to_tapvid_kinetics.csv> \
--output_base_path=<path_to_pickle_folder> \
--video_root_path=<path_to_raw_videos_root_folder> \
--alsologtostderr
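Because clips may sit in any subfolder under the root, the generation step has to locate each clip by name. The helper below is a hypothetical sketch of such a lookup (it is not code from generate_tapvid.py), demonstrated on a throwaway directory using the common <video_root_path>/<action_label>/<clip_name> layout:

```python
import pathlib
import tempfile

def find_clip(video_root_path, clip_name):
    # Recursively search every subfolder under the root for the named clip.
    matches = sorted(pathlib.Path(video_root_path).rglob(clip_name))
    return matches[0] if matches else None

# Throwaway directory mimicking <video_root_path>/<action_label>/<clip_name>.
root = pathlib.Path(tempfile.mkdtemp())
(root / 'riding bike').mkdir()
(root / 'riding bike' / 'abcdefghijk_000010_000020.mp4').touch()

found = find_clip(root, 'abcdefghijk_000010_000020.mp4')
print(found)
```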
We also provide a script that generates an MP4 with the points painted on top of the frames. The script works with any of the pickle files: it chooses a random clip from the available ones and paints all of its point tracks.
pip3 install -r requirements.txt
python3 visualize.py \
--input_path=<path_to_pickle_file.pkl> \
--output_path=<path_to_output_video.mp4> \
--alsologtostderr
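To make the painting step concrete, here is a minimal sketch of overlaying point tracks on frames in the format described earlier; this is an illustrative stand-in, not the code in visualize.py, and the square-marker rendering is an assumption:

```python
import numpy as np

def paint_points(frames, points, occluded, radius=2, color=(255, 0, 0)):
    # frames: [time, height, width, 3] uint8; points: [point id, time, x/y];
    # occluded: [point id, time] bool. Draws a small square at each visible point.
    out = frames.copy()
    num_points, num_frames = occluded.shape
    for t in range(num_frames):
        for i in range(num_points):
            if occluded[i, t]:
                continue  # skip points that are occluded in this frame
            x, y = points[i, t]
            y0, y1 = max(int(y) - radius, 0), int(y) + radius + 1
            x0, x1 = max(int(x) - radius, 0), int(x) + radius + 1
            out[t, y0:y1, x0:x1] = color
    return out

# One track over two frames; the point is occluded in the second frame.
frames = np.zeros((2, 32, 32, 3), dtype=np.uint8)
points = np.array([[[16.0, 16.0], [8.0, 8.0]]], dtype=np.float32)
occluded = np.array([[False, True]])
painted = paint_points(frames, points, occluded)
print(painted[0, 16, 16], painted[1, 8, 8])  # painted vs. untouched pixel
```

Writing the painted frames out as an MP4 (as the provided script does) would additionally require a video encoder such as ffmpeg.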
For visualization examples, we have the full TAP-Vid-DAVIS as well as 10 examples from the synthetic TAP-Vid-Kubric and TAP-Vid-RGB-Stacking datasets.
TAP-Vid-DAVIS, TAP-Vid-RGB-Stacking and TAP-Vid-Kinetics are mainly used for evaluation purposes. To train the model, we use TAP-Vid-Kubric.