Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction

This repository contains the source code and dataset for the paper: A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian and Yi Chang. ACL 2020. [pdf]


At the core of the proposed CasRel framework is the fresh perspective that instead of treating relations as discrete labels on entity pairs, we actually model the relations as functions that map subjects to objects. More precisely, instead of learning relation classifiers f(s,o) -> r, we learn relation-specific taggers f_{r}(s) -> o, each of which recognizes the possible object(s) of a given subject under a specific relation. Under this framework, relational triple extraction is a two-step process: first we identify all possible subjects in a sentence; then for each subject, we apply relation-specific taggers to simultaneously identify all possible relations and the corresponding objects.



This repo was tested on Python 3.7 and Keras 2.2.4. The main requirements are:

  • tqdm
  • codecs
  • keras-bert = 0.80.0
  • tensorflow-gpu = 1.13.1



  1. Get pre-trained BERT model for Keras

    Download Google's pre-trained BERT model (BERT-Base, Cased). Then decompress it under pretrained_bert_models/. More pre-trained models are available here.

  2. Build dataset in the form of triples

    Take the NYT dataset for example:

    a) Switch to the corresponding directory and download the dataset

    cd CasRel/data/NYT/raw_NYT

    b) Follow the instructions at the same directory, and just run


    c) Finally, build dataset in the form of triples

    cd CasRel/data/NYT

    This will convert the raw numerical dataset into a proper format for our model and generate train.json, test.json and val.json(if not provided in the raw dataset, it will randomly sample 5% or 10% data from the train.json or test.json to create val.json as in line with previous works). Then split the test dataset by type and num for in-depth analysis on different scenarios of overlapping triples.

  3. Specify the experimental settings

    By default, we use the following settings in

        "bert_model": "cased_L-12_H-768_A-12",
        "max_len": 100,
        "learning_rate": 1e-5,
        "batch_size": 6,
        "epoch_num": 100,
  4. Train and select the model

    Specify the running mode and dataset at the command line

    python ---train=True --dataset=NYT

    The model weights that lead to the best performance on validation set will be stored in saved_weights/DATASET/.

  5. Evaluate on the test set

    Specify the test dataset at the command line

    python --dataset=NYT

    The extracted result will be saved in results/DATASET/ with the following format:

        "text": "Tim Brooke-Taylor was the star of Bananaman , an STV series first aired on 10/03/1983 and created by Steve Bright .",
        "triple_list_gold": [
                "subject": "Bananaman",
                "relation": "starring",
                "object": "Tim Brooke-Taylor"
                "subject": "Bananaman",
                "relation": "creator",
                "object": "Steve Bright"
        "triple_list_pred": [
                "subject": "Bananaman",
                "relation": "starring",
                "object": "Tim Brooke-Taylor"
                "subject": "Bananaman",
                "relation": "creator",
                "object": "Steve Bright"
        "new": [],
        "lack": []


  title={A Novel Cascade Binary Tagging Framework for Relational Triple Extraction},
  author={Wei, Zhepei and Su, Jianlin and Wang, Yue and Tian, Yuan and Chang, Yi},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},


A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Accepted by ACL 2020.








No releases published


No packages published