
End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding

Installation

  1. Clone this repository and create a conda environment with the provided dependencies:
git clone --recursive https://github.com/wei-zeng98/piano-a2s.git
cd piano-a2s
conda env create -f environment.yaml
conda activate a2s2024
  2. Install Fluidsynth and FFmpeg, which are used to synthesize the audio:
sudo apt-get install fluidsynth ffmpeg
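
To confirm both tools are available on your $PATH, you can print their versions (output varies by system):

fluidsynth --version
ffmpeg -version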
  3. Download piano soundfonts.

Download TimGM6mb.sf2, FluidR3_GM.sf2, UprightPianoKW-20220221.sf2, SalamanderGrandPiano-V3+20200602.sf2 and YDP-GrandPiano-20160804.sf2. Extract the .sf2 files, move them to your soundfont folder (e.g. /usr/share/sounds/sf2/), and set soundfont_folder in hparams/pretrain.yaml to that folder path.
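
For reference, a hedged sketch of this step, assuming the archives were extracted into the current directory and /usr/share/sounds/sf2/ is used as the soundfont folder:

# Move the extracted soundfonts into place (paths are examples; adjust as needed)
sudo mkdir -p /usr/share/sounds/sf2/
sudo mv *.sf2 /usr/share/sounds/sf2/
# Then point hparams/pretrain.yaml at the folder:
#   soundfont_folder: /usr/share/sounds/sf2/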

  4. Get the Humdrum extra tools.

You can choose either a) or b) to get the Humdrum extra tools.

a) Build from source:

git clone https://github.com/mangelroman/humextra.git
cd humextra
make library extractx hum2xml tiefix transpose
cd ..

b) Download the precompiled binaries directly from the Humdrum website: extractx, hum2xml, tiefix, transpose

Once you have built or downloaded the Humdrum extra tools, add the folder containing the binaries to your $PATH.
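
For example, assuming the binaries ended up under humextra/bin in the current directory (adjust the path to wherever yours actually are):

# Hypothetical location; point this at the folder that holds the compiled binaries
export PATH="$(pwd)/humextra/bin:$PATH"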

  5. Install Verovio:
git clone https://github.com/rism-digital/verovio.git
cd verovio/tools
cmake ../cmake
make
sudo make install
cd ../..

If you do not install Verovio system-wide, you need to use the -r option to set the resource directory to ./verovio/data wherever data_processing/render.py invokes the Verovio command line. You may refer to the guide for more details.
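
For reference, a hedged example of such an invocation (input and output file names are placeholders):

# -r points Verovio at its resource directory when it is not installed system-wide
verovio -r ./verovio/data -o output.svg input.krn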

  6. Clone VirtuosoNet.

This step should already have been done in Step 1 via the --recursive flag. However, you will probably encounter the error xml.etree.ElementTree.Element object has no attribute 'getchildren' when using VirtuosoNet, because xml.etree.ElementTree.Element.getchildren was removed in Python 3.9. You may refer to this issue to solve the problem.
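
Since getchildren was removed, list(element) is the drop-in replacement. A hedged one-liner to patch simple call sites in bulk, assuming the submodule is checked out at ./virtuosoNet (review the resulting diff; complex expressions may still need manual fixes):

# Rewrites elem.getchildren() -> list(elem) across the submodule
grep -rl "getchildren" virtuosoNet | \
  xargs sed -i 's/\([A-Za-z_][A-Za-z0-9_.]*\)\.getchildren()/list(\1)/g'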

  7. Install MV2H:
git clone https://github.com/apmcleod/MV2H.git
cd MV2H
make
cd ..

After compiling, set mv2h_bin in hparams/finetune.yaml to the path of the bin folder.
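
As a quick, hedged sanity check that compilation succeeded, you can invoke the evaluator's main class (MV2H is a Java project; the class name below follows its README, and running it with no arguments should complain about missing arguments rather than fail with ClassNotFoundException):

java -cp MV2H/bin mv2h.Main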

Preparing Datasets

Get MuseSyn

  1. Request access to MuseSyn via https://zenodo.org/records/4527460.

  2. Copy the XML files from MuseSyn for further processing:

cp -r path/to/MuseSyn/xml data_processing/xml/

Get HumSyn

Get the kern files for the HumSyn dataset:

chmod +x data_processing/get_kern.sh
bash data_processing/get_kern.sh

Data Synthesis

  1. Change the workspace in both hparams/pretrain.yaml and hparams/finetune.yaml to your own path; it is used to store the synthesized data, trained models, etc. (see the sketch at the end of this section).

  2. Synthesize the data:

python data_processing/render.py

The synthesized dataset will be saved under the workspace indicated in hparams/pretrain.yaml.
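
For step 1 above, a hedged one-liner to set the same workspace in both config files, assuming workspace is a top-level key in each yaml file (the path is a placeholder):

sed -i 's|^workspace:.*|workspace: /data/piano-a2s-workspace|' hparams/pretrain.yaml hparams/finetune.yaml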

Prepare ASAP

Please refer to the ASAP repo and prepare the ASAP dataset. Once finished, set the asap_folder in hparams/finetune.yaml to the ASAP dataset folder, and run:

python datasets/asap.py
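
A hedged sketch of the whole step (the repository URL is an assumption based on the official ASAP dataset repo; paths are placeholders):

git clone https://github.com/fosfrancesco/asap-dataset.git
# In hparams/finetune.yaml:
#   asap_folder: /path/to/asap-dataset
python datasets/asap.py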

Train

Pretrain

python pretrain.py hparams/pretrain.yaml

To use multiple GPUs (e.g. 4 GPUs, CUDA:0 through CUDA:3):

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 pretrain.py hparams/pretrain.yaml

Finetune

python finetune.py hparams/finetune.yaml

To use multiple GPUs (e.g. 4 GPUs, CUDA:0 through CUDA:3):

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune.py hparams/finetune.yaml

Evaluate

python evaluate.py

Limitations

Although it has been tested on real-world audio, the model is not yet ready for real-world use, primarily for the following reasons:

  • The length of each audio clip is constrained to 5 bars, and cutting an arbitrary recording into 5-bar clips without downbeat information is impractical.
  • The maximum length of the input audio is limited to 12 seconds due to memory constraints; longer audio exceeds the model's capacity (see the FFmpeg sketch below).
  • The model sometimes outputs illegal Kern sequences, which require a post-processing step to handle.

We plan to address these limitations in our future work and welcome researchers to test the model.
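
For quick experiments within the 12-second limit above, a recording can be trimmed with the FFmpeg installed in Step 2 (a hedged example; file names are placeholders):

# Extract a 12-second excerpt starting at 0s
ffmpeg -i recording.wav -ss 0 -t 12 excerpt.wav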

Contact

Wei Zeng (Ph.D. Student at NUS)

w.zeng@u.nus.edu
