scripts to model depression in speech and text


This repo contains scripts to model depression in speech and text. LSTM models are applied at the segment level of an interview (not at the word level). The two modalities are also combined and fed into a feedforward network.


The data can be downloaded from the Distress Analysis Interview Corpus, and contain audio, video, and text of interviews with 189 subjects, about 20% of whom had some level of depression.


The features are either segment-level statistics of the audio, or doc2vec embeddings of the words in a segment. Higher-level audio features (mean, max, min, median, std) were computed from the COVAREP and FORMANT features provided in the corpus, and the doc2vec embeddings were generated using this script. I trained on both the binary outcomes and the multi-class outcomes.
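As a rough illustration of the segment-level audio statistics described above, here is a minimal sketch using only the Python standard library. It is not the code from this repo: the function name `segment_statistics`, the input layout (a list of frames, each a list of feature values), the output ordering, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, median, pstdev


def segment_statistics(frames):
    """Collapse a segment's frame-level features into one fixed-length vector.

    frames: non-empty list of equal-length lists of floats (one list per frame).
    Returns [mean..., max..., min..., median..., std...] over the frames,
    i.e. 5 * n_features values.
    """
    if not frames:
        raise ValueError("segment has no frames")
    # Transpose so that each tuple holds one feature's values across frames.
    columns = list(zip(*frames))
    stats = []
    # Assumption: population std (pstdev); the repo may use a different variant.
    for func in (mean, max, min, median, pstdev):
        stats.extend(float(func(col)) for col in columns)
    return stats


# Toy segment: 3 frames with 2 features each (values are made up).
segment = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
vector = segment_statistics(segment)
print(len(vector))  # 5 statistics x 2 features
```

Each segment then yields one vector of the same length regardless of how many frames it contains, which is what allows a sequence of segments to be fed to a sequence model.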


The repo contains the following files:

  • which contains the methods used to train the models.
  • requirements.txt, which lists the libraries used in the conda environment of this project.

Keras with the tensorflow back-end was used for modeling.

Interested in using my audio/text features? Let me know.


I used the following libraries:


Reference Paper

T. Alhanai, MM. Ghassemi, J. Glass, 
"Detecting Depression with Audio/Text Sequence Modeling of Interviews,"
Interspeech 2018, India

Paper can be found here

DISCLAIMER: The user accepts the code / configuration / repo AS IS, WITH ALL FAULTS.