This repository is based on the project "Archive of 20th Century Cantonese Chanting in Hong Kong"「二十世紀香港粵語吟誦典藏」.
Cantonese Chanting is a traditional Chinese scholarly practice of reading, composing, and teaching classical poetry and prose in a specific melodic style. It is characterized by its melodic nature and room for improvisation, with distinct schools shaped by dialect, lineage, and personal preference. The art of chanting is gradually fading as the elderly who once learned it in private schools pass away, making the preservation of this art form urgent.
Because many recordings come from class sessions, chanting is intermingled with teaching, and researchers have to locate the chanting passages within these hours-long recordings. Doing this by hand is time-consuming.
Using the Python-based module pyAudioAnalysis, speech and chanting can be distinguished by machine learning. This tool serves as a quick filter to detect whether chanting exists and to label the relevant periods in the audio.
One of the default output files is a .txt file containing "start time", "end time", and "class" labels. It is recommended to import this .txt file into Audacity for further audio editing.
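Since the label .txt is meant to be imported into Audacity, it presumably follows Audacity's label-track layout: one tab-separated `start`, `end`, `label` line per segment, with times in seconds. Assuming that layout, the file can also be parsed directly in Python (the sample line below is illustrative, not taken from a real run):

```python
# Parse an Audacity-style label file: one "start<TAB>end<TAB>label"
# line per detected segment, times in seconds.
def parse_labels(text):
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split("\t")
        segments.append((float(start), float(end), label))
    return segments

print(parse_labels("12.5\t47.0\tChanting"))  # [(12.5, 47.0, 'Chanting')]
```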
- Clone the source of this library:

```
git clone https://github.com/tylercuhklib/Machine-Learning-Chanting.git
```

or download the ZIP directly from this page.
- Copy your test audio (in .wav format) into the folder "audio/test/source"
- Open a terminal and cd to the project's file path, e.g.

```
cd C:\Users\Users\Documents\Machine-Learning-Chanting
```
- Use the trained SVM model in the folder "model" to predict the result. Run the following in the terminal:

```
python predict_result.py
```
- The result .txt file will be saved in the folder "audio/test/result". It contains the start time, end time, and label (only "Chanting" is shown, for convenience, because we only want to extract the chanting sections from the source file).
- Open Audacity and import both the audio and the label file for further editing. Adjust the actual start/end time of each labeled segment as needed.
- Export the edited label file as a .segments file and save it alongside the source .wav file. It will be used to extract the chanting sections from the original file.
- To extract chanting from the source audio:

```
python extract_chanting.py source_folder target_folder
```

For example:

```
python extract_chanting.py audio/test/source audio/result_final
```
- All extracted chanting audio will be saved in the folder "audio/result_final"
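The core of the extraction step is just cutting labeled spans out of a WAV file. The sketch below illustrates this with only the standard-library wave module; it is an illustration of the idea, not the project's actual extract_chanting.py:

```python
# Illustrative sketch (not the project's actual extract_chanting.py):
# cut one labeled span out of a WAV file with the stdlib wave module.
import wave

def extract_segment(src_path, dst_path, start_s, end_s):
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(start_s * rate))          # jump to the segment start
        frames = src.readframes(int((end_s - start_s) * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)                    # same channels/width/rate
        dst.writeframes(frames)
```

Calling this once per (start, end) pair from the edited label file would produce one clip per chanting section.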
- The data are mainly from the "Archive of 20th Century Cantonese Chanting in Hong Kong"「二十世紀香港粵語吟誦典藏」. Save the URLs to urls.txt.
- To download the .mp3 files:

```
python getaudio.py
```
- All .mp3 files have to be converted to .wav files.
- The audio is denoised and the volume normalized.
- The audio files are segmented and divided into three classes: Chanting, Speech, Silence.
- Using pyAudioAnalysis, the segmented audio clips are used for feature extraction and classifier training.
- The SVM method is used, as its performance was the best in our case.
- The training function also includes hyperparameter tuning and evaluation.
```
python train_the_model.py
```
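The source of train_the_model.py is not reproduced here, but pyAudioAnalysis ships a segment-classifier trainer that covers the steps above (feature extraction, SVM training, C-parameter grid search, and evaluation). A minimal sketch of such a call follows; the class folder names are assumptions, and pyAudioAnalysis must be installed with one folder of WAV segments per class:

```python
# Hypothetical sketch of SVM training with pyAudioAnalysis; the folder
# names are assumptions (one folder of WAV segments per class).
CLASS_DIRS = ["audio/train/chanting", "audio/train/speech", "audio/train/silence"]

def train_svm(model_path="model/svm_chanting"):
    # Imported lazily so the sketch is readable without the library installed.
    from pyAudioAnalysis import audioTrainTest as aT
    # 1 s mid-term windows over 50 ms short-term frames; "svm" makes
    # pyAudioAnalysis grid-search the C parameter and print an evaluation
    # (confusion matrix and best F1), as in the results below.
    aT.extract_features_and_train(
        CLASS_DIRS, 1.0, 1.0,
        aT.shortTermWindow, aT.shortTermStep,
        "svm", model_path, False,
    )
```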
For our trained model, the confusion matrix (values in % of all segments):

| | Chanting | Speaking | Silence |
|---|---|---|---|
| Chanting | 32.84 | 3.37 | 0.09 |
| Speaking | 3.25 | 47.25 | 0.74 |
| Silence | 0.12 | 0.74 | 11.59 |
- Best macro F1: 91.9
- Best macro F1 std: 1.9
- Selected params: C = 0.1

The classifier with parameter C = 0.1 achieved the best macro F1 score of 91.9%.
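As a sanity check, the macro F1 can be recomputed from the confusion matrix above, assuming the conventional layout (rows are true classes, columns are predicted classes):

```python
# Recompute macro F1 from the confusion matrix above (assumed layout:
# rows = true class, columns = predicted class, entries in percent).
cm = [
    [32.84, 3.37, 0.09],   # Chanting
    [3.25, 47.25, 0.74],   # Speaking
    [0.12, 0.74, 11.59],   # Silence
]
f1_scores = []
for i in range(len(cm)):
    tp = cm[i][i]
    precision = tp / sum(row[i] for row in cm)  # column sum = all predicted i
    recall = tp / sum(cm[i])                    # row sum = all true i
    f1_scores.append(2 * precision * recall / (precision + recall))
macro_f1 = sum(f1_scores) / len(f1_scores)
print(round(100 * macro_f1, 1))  # 92.0 -- matches the reported 91.9 up to rounding
```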
pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:
- Extract audio features and representations (e.g. mfccs, spectrogram, chromagram)
- Train, parameter tune and evaluate classifiers of audio segments
- Classify unknown sounds
- Detect audio events and exclude silence periods from long recordings
- Perform supervised segmentation (joint segmentation - classification)
- Perform unsupervised segmentation (e.g. speaker diarization) and extract audio thumbnails
- Train and use audio regression models (example application: emotion recognition)
- Apply dimensionality reduction to visualize audio data and content similarities
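For example, the first bullet (feature extraction) boils down to a few calls. A sketch, assuming pyAudioAnalysis is installed; the file name is a placeholder:

```python
# Sketch of short-term feature extraction with pyAudioAnalysis.
# WINDOW/STEP are in seconds; 50 ms frames with 50% overlap is a common choice.
WINDOW, STEP = 0.050, 0.025

def extract_short_term_features(wav_path):
    # Imported lazily so the sketch is readable without the library installed.
    from pyAudioAnalysis import audioBasicIO, ShortTermFeatures
    fs, signal = audioBasicIO.read_audio_file(wav_path)
    signal = audioBasicIO.stereo_to_mono(signal)
    # Returns a (n_features x n_frames) matrix plus the feature names
    # (zero-crossing rate, energy, MFCCs, chroma, ...).
    return ShortTermFeatures.feature_extraction(
        signal, fs, int(WINDOW * fs), int(STEP * fs))
```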
This is general info; for the complete documentation see the library's wiki, and for a more generic intro to audio data handling see the articles listed below.
Apart from this README file, to better understand how to use this library one should read the following:
- Audio Handling Basics: Process Audio Files In Command-Line or Python, if you want to learn how to handle audio files from command line, and some basic programming on audio signal processing. Start with that if you don't know anything about audio.
- Intro to Audio Analysis: Recognizing Sounds Using Machine Learning This goes a bit deeper than the previous article, by providing a complete intro to theory and practice of audio feature extraction, classification and segmentation (includes many Python examples).
- The library's wiki
- How to Use Machine Learning to Color Your Lighting Based on Music Mood. An interesting use-case of using this lib to train a real-time music mood estimator.
- A more general and theoretic description of the adopted methods (along with several experiments on particular use-cases) is presented in this publication. Please use the following citation when citing pyAudioAnalysis in your research work:
```
@article{giannakopoulos2015pyaudioanalysis,
  title={pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis},
  author={Giannakopoulos, Theodoros},
  journal={PloS one},
  volume={10},
  number={12},
  year={2015},
  publisher={Public Library of Science}
}
```
For Matlab-related audio analysis material check this book.