A competency extractor using CoreNLP dependency parsing and Semgrex


CompEx


Extract competency triples from written text.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

  • Python 3.7
  • pipenv
  • (optional) pyenv to install the required Python automatically
    • If pyenv is not installed, Python 3.7 must already be available; with pyenv, it is installed for you
  • Java JRE 1.8+ for CoreNLP server
  • Stanford CoreNLP

Installing

Set up a Python virtual environment and download all dependencies

$ pipenv install --dev

CompEx requires an installation of CoreNLP with German models. Download the required CoreNLP Java server and German models from here to a destination of your choosing. You can use the following script to automate this process; it downloads all required files to ./.corenlp:

$ ./download_corenlp.sh

Enter the pipenv virtual environment

$ pipenv shell

Running

Set the environment variable $CORENLP_HOME to the directory where CoreNLP and the German models are located. If you used the helper script download_corenlp.sh, the files are in ./.corenlp.

$ export CORENLP_HOME=./.corenlp
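If the variable is missing or points at the wrong directory, the CoreNLP server will fail to start later. A minimal sketch of the kind of sanity check you can run yourself beforehand (this helper is illustrative and not part of compex):

```python
import os
from pathlib import Path

def check_corenlp_home() -> Path:
    """Verify that $CORENLP_HOME points at a directory containing CoreNLP jars.

    Illustrative helper, not part of compex itself.
    """
    home = os.environ.get("CORENLP_HOME")
    if not home:
        raise RuntimeError("CORENLP_HOME is not set; see the Running section")
    path = Path(home)
    if not any(path.glob("*.jar")):
        raise RuntimeError(f"no CoreNLP jars found in {path}")
    return path
```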

Show help

$ python -m compex -h

Extraction

Show help

$ python -m compex extract -h

Extract competencies from a simple sentence (you can pipe text data into compex!)

$ echo "Die studierenden beherrschen grundlegende Techniken des wissenschaftlichen Arbeitens." | python -m compex extract

or use a file

$ python -m compex extract testsentences.txt

or use stdin

$ python -m compex extract < testsentences.txt
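The three invocations above differ only in where the text comes from; conceptually, the CLI falls back to stdin when no file argument is given. A sketch of that pattern (not compex's actual implementation):

```python
import sys
from typing import Optional

def read_sentences(path: Optional[str] = None) -> str:
    """Return the input text: from a file if a path is given, else from stdin."""
    if path is not None:
        with open(path, encoding="utf-8") as f:
            return f.read()
    return sys.stdin.read()
```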

Check for taxonomy verbs. This checks whether a found competency verb is in the given taxonomy verb dictionary; if not, it is ignored. In addition, this parameter fills the taxonomy_dimension field of the extracted competency. You can use the sample file blooms_taxonomy.json.

$ python -m compex extract --taxonomyjson blooms_taxonomy.json testsentences.txt
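Conceptually, the taxonomy check is a dictionary lookup: find the dimension whose verb list contains the extracted verb. A sketch assuming the taxonomy maps dimension names to verb lists (the actual schema of blooms_taxonomy.json may differ, and the excerpt below is hypothetical):

```python
from typing import Dict, List, Optional

def taxonomy_dimension(verb: str, taxonomy: Dict[str, List[str]]) -> Optional[str]:
    """Return the taxonomy dimension whose verb list contains the verb, or None."""
    for dimension, verbs in taxonomy.items():
        if verb in verbs:
            return dimension
    return None

# Hypothetical excerpt; the real blooms_taxonomy.json may be structured differently.
blooms = {
    "knowledge": ["nennen", "beschreiben"],
    "application": ["beherrschen", "anwenden"],
}
```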

Sample output on stdout (formatted for better readability)

{
    "Die studierenden beherrschen grundlegende Techniken des wissenschaftlichen Arbeitens.": [
        {
            "objects": [],
            "taxonomy_dimension": null,
            "word": {
                "index": 2,
                "word": "beherrschen"
            }
        }
    ]
}
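The output maps each input sentence to its list of extracted competencies, so downstream code can consume it with plain json, e.g.:

```python
import json

# The sample output shown above, as emitted on stdout.
raw = """{
    "Die studierenden beherrschen grundlegende Techniken des wissenschaftlichen Arbeitens.": [
        {
            "objects": [],
            "taxonomy_dimension": null,
            "word": {"index": 2, "word": "beherrschen"}
        }
    ]
}"""

result = json.loads(raw)
for sentence, competencies in result.items():
    verbs = [c["word"]["word"] for c in competencies]
    print(verbs)  # ['beherrschen']
```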

Evaluation

Evaluate compex against pre-annotated data. Outputs recall, precision, and F1. To evaluate, a pre-annotated WebAnno TSV 3.2 file is needed. See here for the file format. You can use WebAnno to annotate data and evaluate compex with it. This repository contains pre-annotated data from module handbooks of Department VI of Beuth University of Applied Sciences Berlin. It can be found here: tests/resources/bht-annotated. The corresponding WebAnno project is located at tests/resources/webanno/BHT+Test_2020-03-22_1808.zip.

Show help

$ python -m compex evaluate -h

Evaluate only competency verbs

$ python -m compex evaluate tests/resources/test.tsv

Evaluate competency verbs and objects

$ python -m compex evaluate --objects tests/resources/test.tsv

Evaluate competency verbs, objects and contexts

$ python -m compex evaluate --objects --contexts tests/resources/test.tsv

It is possible to use a dedicated taxonomy JSON file, just like with the extract function

$ python -m compex evaluate --taxonomyjson blooms_taxonomy.json tests/resources/test.tsv

Sample evaluation output on stdout (formatted for better readability)

{
    "f1": 0.5024705551113972,
    "negatives": {
        "false": 168.36206347622323,
        "true": 81.63793652377686
    },
    "positives": {
        "false": 137.53333333333336,
        "true": 154.4666666666666
    },
    "precision": 0.5289954337899542,
    "recall": 0.4784786862008745
}
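The scores follow the standard precision/recall/F1 definitions over the (possibly fractional) true/false positive and negative counts; recomputing them from the counts above reproduces the reported values:

```python
def precision_recall_f1(tp: float, fp: float, fn: float):
    """Standard precision, recall, and F1 from (possibly fractional) counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts taken from the sample evaluation output above.
p, r, f1 = precision_recall_f1(
    tp=154.4666666666666,
    fp=137.53333333333336,
    fn=168.36206347622323,
)
# p ≈ 0.5290, r ≈ 0.4785, f1 ≈ 0.5025 — matching the sample output
```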

Running the tests

Run the unit tests. A CoreNLP installation in ./.corenlp is required!

$ pytest

Get test coverage

Run coverage

$ coverage run --source=./compex/ -m pytest

Export the coverage report as HTML

$ coverage html

Generate coverage badge

$ coverage-badge -o coverage.svg

Built With

Authors

  • Timo Raschke - Initial work - traschke

License

This project is licensed under the MIT License - see the LICENSE file for details

Acknowledgments

Sources for Bloom's Taxonomy verbs: