Inspired by https://github.com/mdnestor/yt-mt3.
Use it to build a new multimodal dataset, specifically one of video-MIDI pairs.
Open one of the following Colab notebooks with your Google account.
If you run into a problem, please send an email.