MAUD Supplementary Extraction Task

This repository contains the dataset and baseline code for the MAUD supplementary extraction task, as described in the appendix of "MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding".

For the main MAUD dataset and baselines, see github.com/TheAtticusProject/maud.

Known bug: The baselines reported in the paper underperform because of a training bug. See issue #1.

Installation

pip install torch transformers tensorboard pandas scikit-learn tqdm
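A quick way to confirm the install is to check that each package can be found. The snippet below is a minimal sketch and is not part of the repository. It assumes the import names shown in the code. Note that scikit-learn is imported as `sklearn`. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names for the packages installed by the pip command above.
# scikit-learn is imported as "sklearn", not "scikit-learn".
REQUIRED_MODULES = ["torch", "transformers", "tensorboard", "pandas", "sklearn", "tqdm"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found (find_spec does not import them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```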

Notes

During the first run, feature caching and evaluation require a large amount of CPU memory (at least 150 GB) and write about 25 GB of files to disk. This memory requirement can be reduced, at the cost of speed, by lowering the --threads count in run_maud.sh.

Training uses around 22 GB of GPU memory.
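Before a first run, it can help to compare the machine against the figures above. The following is a minimal pre-flight sketch and is not part of the repository. The thresholds are the approximate numbers quoted in these notes, and the function names are hypothetical. Several further assumptions apply. RAM is read with `os.sysconf`, which is available on Linux but not on Windows. The GPU check runs only if `torch` with CUDA is available. Sizes are in binary gigabytes (GiB).

```python
import os
import shutil

GB = 1024 ** 3  # binary gigabytes (GiB)

# Approximate requirements quoted in the notes above.
MIN_CPU_RAM_GB = 150  # feature caching and evaluation on the first run
MIN_DISK_GB = 25      # cached files written on the first run
MIN_GPU_MEM_GB = 22   # training


def total_ram_gb():
    """Total physical RAM in GiB, or None if sysconf does not expose it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        return None


def free_disk_gb(path="."):
    """Free space in GiB on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / GB


def gpu_mem_gb(device=0):
    """Total memory of a CUDA device in GiB, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / GB


def preflight(path="."):
    """Return warning strings; an empty list means no shortfall was detected."""
    warnings = []
    ram = total_ram_gb()
    if ram is not None and ram < MIN_CPU_RAM_GB:
        warnings.append(f"Only {ram:.0f} GiB RAM; consider lowering --threads in run_maud.sh.")
    disk = free_disk_gb(path)
    if disk < MIN_DISK_GB:
        warnings.append(f"Only {disk:.0f} GiB free disk space for cached files.")
    gpu = gpu_mem_gb()
    if gpu is not None and gpu < MIN_GPU_MEM_GB:
        warnings.append(f"GPU has {gpu:.0f} GiB; training uses around {MIN_GPU_MEM_GB} GB.")
    return warnings


if __name__ == "__main__":
    for message in preflight():
        print("WARNING:", message)
```

The checks only issue warnings and never block anything. This is deliberate because the stated requirements are approximate and depend on the --threads setting.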

Validation runs (grid-search)

./run_maud.sh

Test runs with best-performing hyperparameters

./run_maud_best_hp.sh
