This repo contains Jupyter/IPython notebooks and Python and R scripts that analyze the acoustic measurements of American English vowels from Hillenbrand et al. (1995). The vowels are either classified with a Support Vector Machine or grouped into clusters with k-means clustering and Gaussian mixture models. The learning algorithms are implemented in scikit-learn, and visualization (clusters only) is performed in R using ggplot2. For the supervised learning, formant values are used as predictors; for the cluster analyses, formants and formant ratios are used as features.
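
As a rough illustration of what "formants and formant ratios" could look like as clustering features, here is a minimal sketch. The choice of ratios (F2/F1 and F3/F2), the function name `add_formant_ratios`, and the example values are assumptions made for illustration; they are not taken from the notebooks or the dataset.

```python
import numpy as np


def add_formant_ratios(formants):
    """Append F2/F1 and F3/F2 columns to an (n, 3) array of F1, F2, F3 in Hz.

    Hypothetical helper: the ratios actually used in this repo may differ.
    """
    formants = np.asarray(formants, dtype=float)
    f1, f2, f3 = formants[:, 0], formants[:, 1], formants[:, 2]
    ratios = np.column_stack([f2 / f1, f3 / f2])
    return np.hstack([formants, ratios])


# Made-up formant values for two tokens, not real measurements
example = np.array([
    [300.0, 2400.0, 3000.0],
    [700.0, 1100.0, 2500.0],
])
features = add_formant_ratios(example)
print(features.shape)
```
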
- Import data and remove rows with at least one missing observation
- Identify each observation by word, vowel, and sex of speaker
- Map vowel characteristics to each observation (e.g., front, open-mid, etc.)
- Create targets for speaker sex, word, and vowel (for supervised learning)
- Normalize features (z-score)
- Create feature matrices
- Implement classification or clustering algorithms
- Visualize clusters and feature space
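
The steps above can be sketched roughly as follows with pandas and scikit-learn. This is a minimal sketch on synthetic data, not the code from the notebooks: the column names (`vowel`, `f1`, `f2`), the two made-up vowel classes, and all parameter choices (RBF kernel, two clusters, 25% test split) are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the vowel table (hypothetical columns and values)
n = 100
centers = {"iy": (300.0, 2300.0), "aa": (750.0, 1200.0)}
frames = [
    pd.DataFrame({
        "vowel": vowel,
        "f1": rng.normal(f1, 40.0, n),
        "f2": rng.normal(f2, 100.0, n),
    })
    for vowel, (f1, f2) in centers.items()
]
df = pd.concat(frames, ignore_index=True)
df.loc[0, "f1"] = np.nan  # simulate one missing measurement

# Remove rows with at least one missing observation
df = df.dropna().reset_index(drop=True)

# Targets and raw feature matrix
y = df["vowel"].to_numpy()
X_raw = df[["f1", "f2"]].to_numpy()

# Supervised: z-score inside the pipeline so the scaler is fit on training data only
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.25, random_state=0, stratify=y
)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised: z-score all observations, then cluster
X = StandardScaler().fit_transform(X_raw)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

print(f"SVM test accuracy: {accuracy:.2f}")
```

The cluster labels can then be written next to the observations in a CSV file and plotted separately, for example in R with ggplot2.
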
- Vowel measurement data in CSV format
- Vowel observations and cluster assignments for both k-means and Gaussian mixture models
- Jupyter/IPython notebook using Python kernel to format data for unsupervised learning
- Jupyter/IPython notebook using Python kernel to classify vowels using a Support Vector Machine
- Jupyter/IPython notebook using Python kernel to implement clustering of vowel observations
- Jupyter/IPython notebooks using R kernel to plot the data
- Python script to format data for unsupervised learning
- Python script to classify vowels using a Support Vector Machine
- Python script to implement clustering of vowel observations
- R scripts to plot the data