This is based on the project "Archive of 20th Century Cantonese Chanting in Hong Kong"「二十世紀香港粵語吟誦典藏」

Classification of Cantonese Chanting by Machine Learning.

Cantonese Chanting is a traditional Chinese scholarly practice of reading, composing, and teaching classical poetry and prose in a specific melodic style. It is characterized by its melodic nature and room for improvisation, with various schools shaped by dialect, lineage, and personal preference. The art of chanting is gradually fading as the elderly who once learned it in private schools pass away, which makes preserving this art form urgent.

Many recordings are taken from class sessions in which chanting is intermingled with teaching, so researchers have to locate the chanting within recordings that run for hours. This work can be very time-consuming.

Using the Python library pyAudioAnalysis, speech and chanting can be distinguished with a machine learning classifier. The tool serves as a quick filter to detect whether chanting is present in a recording and to label the periods in which it occurs.
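Segment-level filtering of this kind typically classifies short fixed-length windows and then merges consecutive windows that share a label into segments. The sketch below illustrates only that merging step with the standard library; it is not pyAudioAnalysis's own code, and both the helper name `windows_to_segments` and the 1-second window step are assumptions for illustration.

```python
from itertools import groupby


def windows_to_segments(labels, step=1.0):
    """Merge consecutive identical window labels into (start, end, label) segments.

    `labels` holds one predicted class per analysis window; `step` is the
    window step in seconds (1.0 s is an assumed value, not the project's setting).
    """
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive windows with this label
        segments.append((index * step, (index + length) * step, label))
        index += length
    return segments


if __name__ == "__main__":
    predicted = ["Speech", "Speech", "Chanting", "Chanting", "Chanting", "Silence"]
    for start, end, label in windows_to_segments(predicted):
        print(f"{start:.1f}\t{end:.1f}\t{label}")
```

Merging runs this way turns thousands of per-window predictions into a short list of periods that a researcher can review.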

One of the default outputs is a .txt file containing the labels "start time", "end time" and "class". We recommend importing this .txt file into Audacity for further audio editing.
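Audacity reads label files as plain text with one label per line: start time and end time in seconds, then the label text, separated by tabs. The sketch below formats (start, end, label) segments that way and keeps only "Chanting", as this project's result file does; `to_audacity_labels` is a hypothetical helper, not part of pyAudioAnalysis.

```python
def to_audacity_labels(segments, keep="Chanting"):
    """Format (start, end, label) tuples as Audacity label-track text.

    Only segments whose label equals `keep` are written. Times are in seconds,
    printed with six decimals; fields are tab-separated.
    """
    lines = [
        f"{start:.6f}\t{end:.6f}\t{label}"
        for start, end, label in segments
        if label == keep
    ]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    segs = [(0.0, 12.5, "Speech"), (12.5, 40.0, "Chanting"), (40.0, 41.0, "Silence")]
    print(to_audacity_labels(segs), end="")
```

Saving this text to a .txt file and using File > Import > Labels in Audacity places each chanting period on a label track aligned with the audio.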


  • Clone this repository with git clone, or download the ZIP directly from this page.
  • Copy your test audio (in .wav format) into the folder "audio/test/source".
  • Open a terminal and cd to the project folder, e.g.:
cd C:\Users\Users\Documents\Machine-Learning-Chanting
  • Use the trained SVM model in the folder "model" to predict the result. Run the following in the terminal:
  • The resulting .txt file will be saved in the folder "audio/test/result". It contains the start time, end time and label (only "Chanting" is shown, for convenience, because we only want to extract the chanting sections from the source file).
  • Open Audacity, then import the audio and the label file for further editing. We need to adjust the actual start/end time of each labeled chanting section by hand.
  • Export the edited label file as a .segments file and save it in the same folder as the source .wav file. It will be used to extract the chanting sections from the original file.
  • To extract chanting from the source audio:
python source_folder target_folder
python audio/test/source audio/result_final
  • All extracted chanting audio will be saved in the folder "audio/result_final".
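The following is not the repository's own extraction script; it is a standard-library sketch of the same step under stated assumptions: each .segments file sits next to its .wav file with the same base name, each line holds start and end times in seconds plus a label separated by commas or tabs, and only segments labeled "Chanting" are written out. The names `read_segments`, `extract_chanting`, `extract_folder` and the file name `extract_sketch.py` are hypothetical.

```python
import os
import re
import sys
import wave


def read_segments(path):
    """Read (start, end, label) rows from a .segments file.

    Assumes one segment per line: start and end in seconds, then the label,
    separated by commas or tabs (both layouts are accepted here).
    """
    segments = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = re.split(r"[,\t]", line.strip(), maxsplit=2)
            if len(parts) == 3:
                segments.append((float(parts[0]), float(parts[1]), parts[2].strip()))
    return segments


def extract_chanting(wav_path, segments_path, out_dir, keep="Chanting"):
    """Write every segment labeled `keep` as its own WAV file in `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate, total = src.getframerate(), src.getnframes()
        for i, (start, end, label) in enumerate(read_segments(segments_path)):
            if label != keep:
                continue
            first = min(int(start * rate), total)  # clamp to the file length
            last = min(int(end * rate), total)
            src.setpos(first)
            frames = src.readframes(max(last - first, 0))
            out_path = os.path.join(out_dir, f"{stem}_{i:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width and rate
                dst.writeframes(frames)  # header frame count is corrected on write
            written.append(out_path)
    return written


def extract_folder(source_dir, target_dir):
    """Pair each NAME.wav with NAME.segments (assumed naming) and extract chanting."""
    outputs = []
    for name in sorted(os.listdir(source_dir)):
        stem, ext = os.path.splitext(name)
        seg_path = os.path.join(source_dir, stem + ".segments")
        if ext.lower() == ".wav" and os.path.exists(seg_path):
            outputs += extract_chanting(os.path.join(source_dir, name), seg_path, target_dir)
    return outputs


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python extract_sketch.py audio/test/source audio/result_final
    for path in extract_folder(sys.argv[1], sys.argv[2]):
        print(path)
```

Cutting by frame index keeps the original sample rate and bit depth, so the extracted clips can be played or re-imported without resampling.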

Machine Learning

Data Collection

Data Preprocessing

  • All .mp3 files have to be converted to .wav files.
  • The audio is denoised and its volume normalized.
  • The audio files are segmented, and the segments are divided into three classes: Chanting, Speech, Silence.
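The denoising and normalization tools used are not specified. As one possible reading of "normalized volume", the sketch below applies peak normalization to 16-bit PCM WAV files with the standard library only; the target peak of 0.9 of full scale and the names `peak_normalize` and `normalize_wav` are assumptions, and denoising and MP3 conversion are not covered.

```python
import array
import sys
import wave


def peak_normalize(samples, target=0.9):
    """Scale 16-bit PCM samples so the largest absolute value reaches target * 32767."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # silent input: nothing to scale
    gain = target * 32767 / peak
    return array.array("h", (max(-32768, min(32767, round(s * gain))) for s in samples))


def normalize_wav(in_path, out_path, target=0.9):
    """Peak-normalize a 16-bit PCM WAV file (any channel count) into a new file."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    out = peak_normalize(samples, target)
    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian before writing
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(out.tobytes())
```

Peak normalization applies one gain to the whole file, so relative loudness within a recording is preserved while recordings made at different levels become comparable.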

Training of the model

  • Using pyAudioAnalysis functions, the segmented audio clips are used for feature extraction and classifier training.
  • An SVM classifier is used, as it performed best in our case.
  • The training function also performs hyperparameter tuning and evaluation.

Confusion matrix of our trained model (values are percentages of all evaluated samples):

            Chanting  Speaking  Silence
Chanting       32.84      3.37     0.09
Speaking        3.25     47.25     0.74
Silence         0.12      0.74    11.59

  • Best macro F1: 91.9
  • Best macro F1 std: 1.9
  • Selected params (C): 0.10000

The classifier with parameter C = 0.1 achieved the best macro F1 score, 91.9%.
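Per-class and macro F1 can be recomputed from the confusion matrix as a sanity check. The sketch below uses F1 = 2*TP / (2*TP + FP + FN) for each class and averages the three scores; because transposing the matrix only swaps FP and FN, it does not matter whether rows or columns hold the true labels. The function name `macro_f1` is illustrative.

```python
def macro_f1(cm):
    """Return (macro F1, per-class F1 list) for a square confusion matrix.

    Per class k: TP is cm[k][k], FN is the rest of row k, FP is the rest of
    column k, and F1 = 2*TP / (2*TP + FP + FN). Transposing cm swaps FP and
    FN, so the result is the same whichever axis holds the true labels.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # rest of row k
        fp = sum(cm[i][k] for i in range(n)) - tp  # rest of column k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n, scores


if __name__ == "__main__":
    # Confusion matrix reported above (Chanting, Speaking, Silence), in percent.
    cm = [
        [32.84, 3.37, 0.09],
        [3.25, 47.25, 0.74],
        [0.12, 0.74, 11.59],
    ]
    macro, per_class = macro_f1(cm)
    print(f"macro F1 = {macro:.3f}")
```

Run on the matrix above, it prints `macro F1 = 0.920`, close to the reported 91.9. A small gap is plausible if the reported figure is averaged over repeated validation splits rather than computed from the pooled matrix (an assumption about how it was computed).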

About pyAudioAnalysis

pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:

  • Extract audio features and representations (e.g. mfccs, spectrogram, chromagram)
  • Train, parameter tune and evaluate classifiers of audio segments
  • Classify unknown sounds
  • Detect audio events and exclude silence periods from long recordings
  • Perform supervised segmentation (joint segmentation - classification)
  • Perform unsupervised segmentation (e.g. speaker diarization) and extract audio thumbnails
  • Train and use audio regression models (example application: emotion recognition)
  • Apply dimensionality reduction to visualize audio data and content similarities

This is general information. See the pyAudioAnalysis wiki for the complete documentation and for a more generic introduction to audio data handling.

Further reading

Apart from this README file, to better understand how to use this library one should read the following:

  Giannakopoulos, Theodoros. "pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis." PLoS ONE. Public Library of Science.

For Matlab-related audio analysis material check this book.

