Code for Conditional Topic Allocations for Open Ended Survey Responses

twekhof/CTA

CTApy

Python package for "Conditional Topic Allocation" (CTA), a text-analysis method that identifies topics correlated with numerical outcomes.

How does CTA work?

CTA finds topics by conditioning on observables. For example, it can address whether Republicans write differently about politics than Democrats. The method consists of three steps:


1. Predict the outcome variable with text.
  • Uses DistilBERT to predict the outcome.

2. Select words with high predictive power (positive or negative).
  • Calculates SHAP values for each word and selects words with a statistically significant SHAP value.

3. Group words by semantic similarity.
  • Returns topics with either positive or negative correlation with the outcome.
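
Steps 2 and 3 can be illustrated with a toy sketch that uses only the Python standard library. It is not the CTApy implementation: step 1 (the DistilBERT prediction) is not run, the per-word scores are made-up numbers standing in for SHAP values, and the two-dimensional vectors stand in for real word embeddings. The function names, the t-statistic cutoff, and the greedy cosine-similarity grouping are all assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev

# Hypothetical per-occurrence attribution scores for each word.
# These stand in for SHAP values; they are invented for this sketch.
scores = {
    "tax":     [0.30, 0.25, 0.35, 0.28],
    "taxes":   [0.22, 0.27, 0.31, 0.24],
    "climate": [-0.40, -0.35, -0.45, -0.38],
    "the":     [0.05, -0.04, 0.02, -0.03],
}

# Hypothetical 2-d word vectors standing in for real embeddings.
vectors = {
    "tax": (1.0, 0.1),
    "taxes": (0.9, 0.2),
    "climate": (0.1, 1.0),
    "the": (0.5, 0.5),
}


def t_statistic(values):
    """One-sample t-statistic for the null hypothesis that the mean is zero."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def select_words(scores, threshold=3.0):
    """Step 2 (sketch): keep words whose mean score is clearly non-zero.

    `threshold` is an arbitrary cutoff on |t|, assumed for illustration.
    Returns two lists: words with positive and with negative scores.
    """
    positive, negative = [], []
    for word, values in scores.items():
        t = t_statistic(values)
        if t > threshold:
            positive.append(word)
        elif t < -threshold:
            negative.append(word)
    return positive, negative


def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def group_words(words, vectors, min_similarity=0.9):
    """Step 3 (sketch): greedy grouping by similarity to a group's first word."""
    groups = []
    for word in words:
        for group in groups:
            if cosine(vectors[word], vectors[group[0]]) >= min_similarity:
                group.append(word)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([word])
    return groups


positive, negative = select_words(scores)
positive_topics = group_words(positive, vectors)
negative_topics = group_words(negative, vectors)
```

With these toy numbers, "tax" and "taxes" end up in one positively correlated topic, "climate" forms a negatively correlated topic, and "the" is dropped because its scores average to zero.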

CTA supports all languages.

Installation

CTApy requires Python 3.9 and pip.
It is highly recommended to use a virtual environment (or conda environment) for the installation.

# upgrade pip, wheel and setuptools
python -m pip install -U pip wheel setuptools

# install the package
python -m pip install -U CTApy

If you want to use Jupyter, make sure you have it installed in the current environment.
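
After installing, a short check like the following can confirm which interpreter is active and whether the packages are visible in the current environment. This is a sketch using only the standard library's `importlib.metadata`; the helper names are hypothetical, `CTApy` is the distribution name from the pip command above, and `jupyter` is assumed to be the distribution name of your Jupyter installation.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def environment_report(distributions=("CTApy", "jupyter")):
    """Collect the Python version and the version of each named distribution."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in distributions:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that you used for `python -m pip install`, so that the report reflects the environment the package was installed into.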

Quickstart

Please see the hands-on tutorials, which replicate the research paper: https://github.com/twekhof/CTA/tree/main/tutorials.

Author

CTApy was developed by

Tobias Wekhof, ETH Zurich

Disclaimer

This Python package is a research tool currently under development. The authors take no responsibility for the accuracy or reliability of the results produced by it.
