ATEPP is a dataset of expressive piano performances by virtuoso pianists. The dataset contains 11677 performances (~1000 hours) by 49 pianists and covers 1580 movements by 25 composers. All of the MIDI files in the dataset were transcribed from existing audio recordings of piano performances. Scores in MusicXML format are also available for around half of the tracks. The dataset is organized and aligned by composition and movement to support comparative studies.
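Because performances are aligned by composition and movement, a common first step in a comparative study is to collect every performance of the same movement. The sketch below assumes the metadata is a CSV file with hypothetical columns `composer`, `movement` and `midi_path`; check the header of the metadata file in your download and adjust the constants.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: adjust to match the metadata file you downloaded.
COMPOSER_COL = "composer"
MOVEMENT_COL = "movement"
MIDI_COL = "midi_path"


def group_by_movement(csv_file):
    """Map (composer, movement) -> list of MIDI paths, so that all
    performances of the same movement can be compared side by side.
    `csv_file` is any open text file (or file-like object) with a header row."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row[COMPOSER_COL], row[MOVEMENT_COL])
        groups[key].append(row[MIDI_COL])
    return dict(groups)


if __name__ == "__main__":
    # Tiny in-memory example with made-up rows.
    sample = io.StringIO(
        "composer,movement,midi_path\n"
        "Composer A,Movement 1,a/1/perf1.mid\n"
        "Composer A,Movement 1,a/1/perf2.mid\n"
        "Composer B,Movement 2,b/2/perf1.mid\n"
    )
    for (composer, movement), paths in group_by_movement(sample).items():
        print(composer, "|", movement, "|", len(paths), "performance(s)")
```

With a real file, open it with `open(path, newline="")` and pass the handle to `group_by_movement`.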
Please read disclaimer.md and agree to the disclaimer before downloading the latest version of ATEPP (~212MB).
You can transcribe your own recordings with the modified code and new checkpoint in piano_transcription-master. The environment and setup are the same as for https://github.com/bytedance/piano_transcription. For example:
python3 pytorch/inference.py --model_type=Regress_onset_offset_frame_velocity_CRNN --checkpoint_path=300000_iterations.pth --audio_path="resources/schumann_romanzen.mp3" --cuda
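To transcribe a whole folder, one option is to call the same script once per file. This is a minimal sketch that reuses only the flags shown in the command above; the list of audio suffixes and the `transcribe_folder` helper are assumptions, and where each MIDI file is written is left to inference.py itself.

```python
import subprocess
from pathlib import Path

# Values taken from the example command above.
MODEL_TYPE = "Regress_onset_offset_frame_velocity_CRNN"
CHECKPOINT = "300000_iterations.pth"
# Assumption: formats your audio loader can decode.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def build_command(audio_path, checkpoint=CHECKPOINT, use_cuda=True):
    """Build the argument list for one call to pytorch/inference.py."""
    cmd = [
        "python3", "pytorch/inference.py",
        f"--model_type={MODEL_TYPE}",
        f"--checkpoint_path={checkpoint}",
        f"--audio_path={audio_path}",
    ]
    if use_cuda:
        cmd.append("--cuda")
    return cmd


def transcribe_folder(folder, repo_dir=".", use_cuda=True, dry_run=False):
    """Run inference.py once per audio file in `folder` (non-recursive).

    Commands run with `repo_dir` as the working directory, so audio paths are
    made absolute. With dry_run=True the commands are only returned."""
    commands = []
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        cmd = build_command(str(audio.resolve()), use_cuda=use_cuda)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands
```

For instance, `transcribe_folder("my_recordings", repo_dir="piano_transcription-master", dry_run=True)` lists the commands without running them. Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.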
Statistics (version 1.0):
- 11742 performances (in MIDI format)
- 1007 hours
- 1580 movements
- 25 composers
- 49 performers
- 43% with scores
Updates: When creating ATEPP version-1.0, we applied only movement-wise matching to remove erroneously downloaded audio. We have since detected repeated recordings by fingerprint matching on the audio itself. Only 65 recordings were found to be repeats, and their transcribed MIDI files were removed. repeats.csv lists the repeated transcribed files that have been removed.
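The fingerprinting algorithm used for ATEPP is not described here. As a hedged illustration of the matching step only, the sketch below assumes each recording has already been reduced to a set of fingerprint hashes (for example, landmark hashes built from pairs of spectral peaks) and flags pairs of recordings whose hash sets overlap strongly. The Jaccard score and the `threshold` of 0.5 are placeholders chosen for the example, not values from ATEPP.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of fingerprint hashes shared by two recordings (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(fingerprints, threshold=0.5):
    """Return (id1, id2, score) for every pair of recordings whose
    fingerprint-hash sets overlap by at least `threshold`.

    `fingerprints` maps a recording id (string) to a set of hashes."""
    pairs = []
    for id1, id2 in combinations(sorted(fingerprints), 2):
        score = jaccard(fingerprints[id1], fingerprints[id2])
        if score >= threshold:
            pairs.append((id1, id2, score))
    return pairs
```

Comparing every pair is quadratic in the number of recordings; restricting comparisons to recordings already matched to the same movement, or indexing recordings by hash, keeps the work manageable.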
Changed Statistics:
- 11677 performances
- 1002 hours
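If you still have a local copy of version 1.0, the files listed in repeats.csv can be dropped to match the updated statistics. This sketch assumes repeats.csv has a header row and a column holding each MIDI path in the same form as your own file list; the column name `midi_path` is a guess, so check the header and pass the right name.

```python
import csv


def load_removed(repeats_csv, column="midi_path"):
    """Read the set of paths listed in repeats.csv.
    `column` is a hypothetical default: use the actual header name."""
    with open(repeats_csv, newline="") as f:
        return {row[column].strip() for row in csv.DictReader(f)}


def drop_repeats(midi_paths, removed):
    """Keep only the paths that repeats.csv does not list, preserving order."""
    return [p for p in midi_paths if p not in removed]
```

Applied to the 11742 files of version 1.0, this should leave 11742 − 65 = 11677 files, provided the paths in repeats.csv match your local paths exactly.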
We have released a Python package for linking classical music recordings and tracks to their corresponding compositions and movements, which is useful for cleaning up metadata in classical music datasets.
Package on PyPI: https://pypi.org/project/composition-entity-linker/
- Huan Zhang, @github/anusfoil, huan.zhang@qmul.ac.uk
- Jingjing Tang, @github/BetsyTang, jingjing.tang@qmul.ac.uk
- Syed Rafee, @github/syedrafee, s.rafee@qmul.ac.uk
License: CC BY 4.0