Clustering of the Hillenbrand vowel data

This repo contains Jupyter/IPython notebooks and Python and R scripts that use the data from Hillenbrand et al. 1995, a set of acoustic measurements of American English vowels. The vowels are either identified with a Support Vector Machine (supervised learning) or grouped into clusters with k-means clustering and Gaussian mixture models (unsupervised learning). The learning algorithms are implemented in scikit-learn, and visualization (of the clusters only) is done in R with ggplot2. For the supervised learning, formant values are used as predictors; for the cluster analyses, formants and formant ratios are used as the clustering features.
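As a rough illustration of what the k-means step does, here is a minimal from-scratch sketch of Lloyd's algorithm in plain Python. It is not the repo's code: the notebooks and scripts use scikit-learn, where the equivalent call would be along the lines of `KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)`. The function name `kmeans` and the toy (F1, F2) values in Hz are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm (sketch): returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize the centroids by sampling k distinct observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy (F1, F2) pairs in Hz: three /i/-like and three /a/-like tokens (made up).
toy = [(300, 2300), (310, 2250), (290, 2350), (750, 1200), (760, 1150), (740, 1250)]
labels, centroids = kmeans(toy, k=2)
print(labels)
```

A Gaussian mixture model (scikit-learn's `GaussianMixture`) generalizes this idea: it can fit a separate covariance for each cluster and assigns observations by posterior probability rather than by distance to the nearest centroid. Note also that F2 spans a much larger range than F1, so unscaled Euclidean distances are dominated by F2; this is one reason to z-score the features first.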

Steps:

1. Import the data and remove rows with at least one missing observation
2. Identify each observation by word, vowel, and speaker sex
3. Map vowel characteristics (e.g., front, open-mid) to each observation
4. Create targets for speaker sex, word, and vowel (for supervised learning)
5. Normalize the features (z-score)
6. Create the feature matrices
7. Implement the classification or clustering algorithms
8. Visualize the clusters and the feature space
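The first six steps can be sketched in plain Python as follows. This is a minimal sketch under several assumptions that are not taken from the repo: each row is `(obs_id, F1, F2, F3)`, identifiers follow a Hillenbrand-style pattern such as `m01ae` (speaker-group letter, speaker number, two-letter vowel code), missing values have already been read in as `None`, and the formant ratios are taken to be F2/F1 and F3/F2. The dictionaries `VOWEL_WORD`, `VOWEL_TRAITS`, and `GROUP_SEX` and all function names are hypothetical, and the phonetic labels are illustrative IPA-style descriptions that may differ from the ones used in the notebooks.

```python
import statistics

# Hypothetical lookup tables (assumed Hillenbrand-style codes, /hVd/ words).
VOWEL_WORD = {
    "iy": "heed", "ih": "hid", "ei": "hayed", "eh": "head",
    "ae": "had", "ah": "hod", "aw": "hawed", "oa": "hoed",
    "oo": "hood", "uw": "who'd", "uh": "hud", "er": "heard",
}
# Illustrative (backness, height) labels; the repo's own mapping may differ.
VOWEL_TRAITS = {
    "iy": ("front", "close"), "ih": ("front", "near-close"),
    "ei": ("front", "close-mid"), "eh": ("front", "open-mid"),
    "ae": ("front", "near-open"), "ah": ("back", "open"),
    "aw": ("back", "open-mid"), "oa": ("back", "close-mid"),
    "oo": ("back", "near-close"), "uw": ("back", "close"),
    "uh": ("back", "open-mid"), "er": ("central", "mid"),
}
# Assumed speaker-group letters: men, women, boys, girls.
GROUP_SEX = {"m": "male", "w": "female", "b": "male", "g": "female"}


def zscore(column):
    """Standardize a column using the population standard deviation."""
    mu = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(x - mu) / sd for x in column]


def build_features(rows):
    """rows: (obs_id, f1, f2, f3) tuples; returns two feature matrices and targets."""
    # Step 1: drop rows with at least one missing measurement.
    complete = [r for r in rows if all(v is not None for v in r[1:])]
    # Steps 2-4: derive sex, word, vowel, and vowel traits from the identifier.
    codes = [r[0][-2:] for r in complete]
    targets = {
        "sex": [GROUP_SEX[r[0][0]] for r in complete],
        "word": [VOWEL_WORD[c] for c in codes],
        "vowel": codes,
        "traits": [VOWEL_TRAITS[c] for c in codes],
    }
    f1 = [r[1] for r in complete]
    f2 = [r[2] for r in complete]
    f3 = [r[3] for r in complete]
    # Assumed formant ratios F2/F1 and F3/F2.
    f2_f1 = [b / a for a, b in zip(f1, f2)]
    f3_f2 = [c / b for b, c in zip(f2, f3)]
    # Steps 5-6: z-score each column, then assemble row-wise matrices.
    formant_X = list(zip(*(zscore(col) for col in (f1, f2, f3))))
    ratio_X = list(zip(*(zscore(col) for col in (f2_f1, f3_f2))))
    return formant_X, ratio_X, targets


# Toy rows (made-up values in Hz); the last one is missing F1 and is dropped.
rows = [
    ("m01ae", 660, 1720, 2410),
    ("w02iy", 440, 2750, 3300),
    ("b03uw", 480, 1200, 2900),
    ("g04eh", None, 2300, 2950),
]
formant_X, ratio_X, targets = build_features(rows)
print(targets["word"])  # ['had', 'heed', "who'd"]
```

Each column is scaled with the population standard deviation, which matches the behavior of scikit-learn's `StandardScaler`; a constant column would make `zscore` divide by zero.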

Directories and contents

`data`:

- Vowel measurement data in CSV format
- Vowel observations and cluster assignments for both k-means and Gaussian mixture models
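One way to inspect the saved cluster assignments is to cross-tabulate them against the known vowel labels and compute cluster purity. The sketch below assumes a CSV with columns named `vowel` and `cluster`; those column names and the functions `crosstab` and `purity` are hypothetical, so check the actual headers of the files in `data` before using it. Purity is a generic external measure of cluster quality, not necessarily what the repo reports.

```python
import csv
import io
from collections import Counter, defaultdict


def crosstab(csv_file, label_col="vowel", cluster_col="cluster"):
    """Return {cluster: Counter({label: count})} from a CSV file object."""
    table = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        table[row[cluster_col]][row[label_col]] += 1
    return dict(table)


def purity(table):
    """Share of observations that carry their cluster's majority label."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values())
    return majority / total


# Toy input; with a real file, use: with open(path, newline="") as csv_file: ...
sample = io.StringIO("vowel,cluster\niy,0\niy,0\nae,0\nae,1\nae,1\n")
table = crosstab(sample)
print(purity(table))  # 0.8
```

A purity of 1.0 would mean every cluster contains a single vowel category; with twelve vowels and overlapping formant distributions across speakers, lower values are expected.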

`notebooks`:

- Jupyter/IPython notebook using a Python kernel to format the data for unsupervised learning
- Jupyter/IPython notebook using a Python kernel to classify vowels with a Support Vector Machine
- Jupyter/IPython notebook using a Python kernel to cluster the vowel observations
- Jupyter/IPython notebooks using an R kernel to plot the data

`scripts`:

- Python script to format data for unsupervised learning
- Python script to classify vowels using a Support Vector Machine
- Python script to implement clustering of vowel observations
- R scripts to plot the data

zixin-yan/hillenbrand-vowel-clustering