- Clone this repository and create a conda environment with the provided dependencies:

```bash
git clone --recursive https://github.com/wei-zeng98/piano-a2s.git
cd piano-a2s
conda env create -f environment.yaml
conda activate a2s2024
```
- Install Fluidsynth and FFmpeg, which are used to synthesize the audio:

```bash
sudo apt-get install fluidsynth ffmpeg
```
- Download piano soundfonts: `TimGM6mb.sf2`, `FluidR3_GM.sf2`, `UprightPianoKW-20220221.sf2`, `SalamanderGrandPiano-V3+20200602.sf2` and `YDP-GrandPiano-20160804.sf2`. Extract the `.sf2` files, move them to your soundfont folder (e.g. `/usr/share/sounds/sf2/`), and set `soundfont_folder` in `hparams/pretrain.yaml` to that folder path.
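For reference, a minimal sketch of the corresponding entry in `hparams/pretrain.yaml` (the key name follows the step above; the value is just the example path from above):

```yaml
# hparams/pretrain.yaml (excerpt, illustrative)
soundfont_folder: /usr/share/sounds/sf2/
```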
- Get the Humdrum extra tools. You can choose either a) or b):

a) Build from source:

```bash
git clone https://github.com/mangelroman/humextra.git
cd humextra
make library extractx hum2xml tiefix transpose
cd ..
```

b) Directly download the compiled executable binaries (extractx, hum2xml, tiefix, transpose) from the Humdrum website.

Once you have built or downloaded the Humdrum extra tools, add the folder containing the binaries to `$PATH`.
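For example, assuming the built or downloaded binaries ended up in `./humextra/bin` (adjust the path to your setup):

```bash
# Make the Humdrum extra tools (extractx, hum2xml, tiefix, transpose) discoverable
export PATH="$PATH:$(pwd)/humextra/bin"
# Add the same line to your shell rc file (e.g. ~/.bashrc) to make it permanent
```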
- Install Verovio:

```bash
git clone https://github.com/rism-digital/verovio.git
cd verovio/tools
cmake ../cmake
make
sudo make install
cd ../..
```

If you do not install Verovio, you need to use the `-r` option to set the resource directory to `./verovio/data` wherever the Verovio command line is invoked in `data_processing/render.py`. You may refer to the guide for more details.
- Clone VirtuosoNet.

This step should already be done in Step 1, as specified by `--recursive`. However, you will probably encounter the error `xml.etree.ElementTree.Element object has no attribute getchildren` when using VirtuosoNet, because the method `xml.etree.ElementTree.Element.getchildren` was removed in Python 3.9. You may refer to this issue to solve the problem.
- Install MV2H:

```bash
git clone https://github.com/apmcleod/MV2H.git
cd MV2H
make
cd ..
```

After compiling, set `mv2h_bin` in `hparams/finetune.yaml` to the path of the `bin` folder.
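For example (the key name follows the step above; the path is illustrative):

```yaml
# hparams/finetune.yaml (excerpt, illustrative)
mv2h_bin: /path/to/MV2H/bin
```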
- Request access to MuseSyn via https://zenodo.org/records/4527460.
- Copy the XML files in MuseSyn for further processing:

```bash
cp -r path/to/MuseSyn/xml data_processing/xml/
```

- Get the kern files for the HumSyn dataset:

```bash
chmod +x data_processing/get_kern.sh
bash data_processing/get_kern.sh
```
- Change `workspace` in both `hparams/pretrain.yaml` and `hparams/finetune.yaml` to your own path, which is used for saving synthesized data, trained models, etc.
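For example, with an illustrative path (the same value goes in both files):

```yaml
# hparams/pretrain.yaml and hparams/finetune.yaml (excerpt, illustrative)
workspace: /path/to/your/workspace
```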
- Synthesizing:

```bash
python data_processing/render.py
```

The synthesized dataset will be saved under the `workspace` indicated in `hparams/pretrain.yaml`.
Please refer to the ASAP repo and prepare the ASAP dataset. Once finished, set `asap_folder` in `hparams/finetune.yaml` to the ASAP dataset folder, and run:

```bash
python datasets/asap.py
```
To pretrain the model, run:

```bash
python pretrain.py hparams/pretrain.yaml
```

To use multiple GPUs (e.g. 4 GPUs: CUDA:0, CUDA:1, CUDA:2 and CUDA:3):

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 pretrain.py hparams/pretrain.yaml
```
To finetune the model, run:

```bash
python finetune.py hparams/finetune.yaml
```

To use multiple GPUs (e.g. 4 GPUs: CUDA:0, CUDA:1, CUDA:2 and CUDA:3):

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune.py hparams/finetune.yaml
```
To evaluate the model, run:

```bash
python evaluate.py
```
Currently, this model is not yet applicable in real-world scenarios, despite having been tested on real-world audio. This is primarily due to the following reasons:
- The length of the audio clips is constrained to 5 bars. Cutting an arbitrary recording into 5-bar clips without downbeat information is inflexible.
- The maximum length of the input audio is limited to 12 seconds due to memory constraints; longer audio will exceed the model's capacity.
- The model sometimes outputs illegal Kern sequences, which require a post-processing step to handle.
We plan to address these limitations in our future work and welcome researchers to test the model.
Wei Zeng (Ph.D. Student at NUS)