CaseGNN

Code for the ECIR 2024 paper.

Title: CaseGNN: Graph Neural Networks for Legal Case Retrieval with Text-Attributed Graphs

Authors: Yanran Tang, Ruihong Qiu, Yilun Liu, Xue Li and Zi Huang

Installation

Requirements are listed in /requirements.txt.

Dataset

Datasets can be downloaded from COLIEE2022 and COLIEE2023.

Specifically, the downloaded COLIEE2023 folders task1_train_files_2023 and task1_test_files_2023 should be put into /PromptCase/task1_train_2023/ and /PromptCase/task1_test_2023/ respectively.

The label files task1_train_labels_2023.json and task1_test_labels_2023.json should be put into the folder /label/.

The COLIEE2022 folders should be set up in the same way.

The final project structure is as follows:

```
$ ./CaseGNN/
.
├── DATASET
│   └── data_load.py
├── Graph_generation
│   ├── graph
│   │   ├── graph_bin_2022
│   │   └── graph_bin_2023
│   └── TACG.py
├── Information_extraction  
│   ├── coliee2022_ie    
│   ├── coliee2023_ie
│   ├── lexnlp             
│   ├── stanford-openie
│   ├── create_structured_csv.py
│   ├── knowledge_graph.py
│   └── relation_extractor.py             
├── label 
│   ├── hard_neg_top50_train_2022.json
│   ├── hard_neg_top50_train_2023.json
│   ├── task1_test_labels_2022.json            
│   ├── task1_test_labels_2023.json 
│   ├── task1_train_labels_2022.json 
│   ├── task1_train_labels_2023.json 
│   ├── test_2022_candidate_with_yearfilter.json
│   └── test_2023_candidate_with_yearfilter.json     
├── PromptCase
│   ├── preprocessing
│   │   ├── openaiAPI.py
│   │   ├── process.py
│   │   └── reference.py
│   ├── promptcase_embedding
│   ├── PromptCase_embedding_generation.py
│   ├── task1_test_2022
│   │   └── task1_test_files_2022
│   ├── task1_test_2023
│   │   └── task1_test_files_2023
│   ├── task1_train_2022
│   │   └── task1_train_files_2022
│   └── task1_train_2023
│       └── task1_train_files_2023
├── CaseGNN_2022.sh
├── CaseGNN_2023.sh
├── LegalFeatureExtraction_2022.sh
├── LegalFeatureExtraction_2023.sh
├── PromptcaseEmbeddingGeneration_2022.sh
├── PromptcaseEmbeddingGeneration_2023.sh
├── RelationExtraction_2022.sh
├── RelationExtraction_2023.sh
├── TACG_2022.sh
├── TACG_2023.sh
├── main.py
├── model.py
├── torch_metrics.py
├── train.py
├── requirements.txt
└── README.md          
```
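Before running any of the scripts, it can help to confirm that the downloaded data sits where the tree above expects it. The following is a minimal sketch, not part of the repository: the function names `expected_dataset_paths` and `missing_dataset_paths` are hypothetical, and the paths are derived only from the folder and label file names listed above.

```python
from pathlib import Path


def expected_dataset_paths(root, year):
    """Return the dataset folders and label files the README places under root."""
    root = Path(root)
    paths = []
    for split in ("train", "test"):
        # e.g. PromptCase/task1_train_2023/task1_train_files_2023
        paths.append(
            root / "PromptCase" / f"task1_{split}_{year}" / f"task1_{split}_files_{year}"
        )
        # e.g. label/task1_train_labels_2023.json
        paths.append(root / "label" / f"task1_{split}_labels_{year}.json")
    return paths


def missing_dataset_paths(root, year):
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_dataset_paths(root, year) if not p.exists()]


if __name__ == "__main__":
    # Assumes the script is started from the CaseGNN project root.
    for path in missing_dataset_paths(".", 2023):
        print(f"missing: {path}")
```

Calling `missing_dataset_paths(".", 2022)` performs the same check for COLIEE2022. An empty result means every expected folder and label file was found.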

1. Information Extraction

    1. Legal Feature Extraction
    • PromptCase Preprocessing is used to extract the facts and issues from the cases.

    • Run . ./LegalFeatureExtraction_2023.sh to generate files in the following three folders:

      • /PromptCase/task1_test_2023/processed/,
      • /PromptCase/task1_test_2023/processed_new/, which contains the legal issues of the cases,
      • /PromptCase/task1_test_2023/summary_test_2023_txt/, which contains the legal facts of the cases.
    • The same process applies to COLIEE2022.

    2. Relation Extraction
    • Run . ./RelationExtraction_2023.sh.

    • The final relation triplets are in the folder /Information_extraction/coliee2023_ie/coliee2023train(or test)_sum(or fact)/result/.

    • The same process applies to COLIEE2022.

    • The relation extraction is based on knowledge_graph_from_unstructured_text and lexnlp.

  • Note: Legal feature extraction must be done first, since the relation extraction is based on the extracted legal features.

  • The extracted information can also be downloaded here.
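The folder pattern coliee2023train(or test)_sum(or fact) used for the relation triplets is compact, so the sketch below spells out the four folders it appears to stand for. This is an interpretation of the notation rather than code from the repository: the helper name `relation_result_dirs` is hypothetical, and it assumes every combination of train/test and sum/fact has its own result folder.

```python
from itertools import product
from pathlib import Path


def relation_result_dirs(root, year):
    """List the result folders implied by 'train(or test)_sum(or fact)'."""
    base = Path(root) / "Information_extraction" / f"coliee{year}_ie"
    return [
        base / f"coliee{year}{split}_{kind}" / "result"
        # Assumption: all four split/kind combinations exist.
        for split, kind in product(("train", "test"), ("sum", "fact"))
    ]


if __name__ == "__main__":
    for folder in relation_result_dirs(".", 2023):
        status = "found" if folder.is_dir() else "missing"
        print(f"{status}: {folder}")
```

Run from the project root, this reports which of the four folders are present after relation extraction, or after downloading the extracted information.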

2. PromptCase Embedding Generation

  • PromptCase is used to generate the case embedding (the feature of the virtual global node).
    • Run . ./PromptcaseEmbeddingGeneration_2023.sh.
    • The generated case embeddings and the corresponding index list of cases are saved in the folder /PromptCase/promptcase_embedding/.
    • The same process applies to COLIEE2022.
  • The generated PromptCase embedding can also be downloaded here.

3. TACG Construction

  • TACG construction utilises the results of Information Extraction and PromptCase Embedding Generation. Please ensure that the folders coliee2023_ie/coliee2023train(or test)_sum(or fact)/result/ and /PromptCase/promptcase_embedding/ have been generated or downloaded.

  • Run . ./TACG_2023.sh.

  • The TACG graphs are saved in the folder /Graph_generation/graph/.

  • The same process applies to COLIEE2022.

4. CaseGNN Model Training

Run . ./CaseGNN_2022.sh and . ./CaseGNN_2023.sh for COLIEE2022 and COLIEE2023, respectively.
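The four stages depend on each other, so the order of the shell scripts matters. The sketch below lists them in dependency order for one dataset year. It is an illustration under assumptions, not part of the repository: the names `STEPS`, `pipeline_scripts` and `run_pipeline` are hypothetical, and it assumes that TACG_<year>.sh from the project tree is the script for TACG construction. Note also that the README sources each script with `. ./script.sh`, while this sketch runs it in a child `bash` process, so any environment changes made inside a script do not carry over to the next one.

```python
import subprocess

# Script name prefixes in dependency order, taken from the project tree.
STEPS = (
    "LegalFeatureExtraction",         # 1.1 facts and issues
    "RelationExtraction",             # 1.2 relation triplets
    "PromptcaseEmbeddingGeneration",  # 2   virtual global node features
    "TACG",                           # 3   graph construction (assumed script)
    "CaseGNN",                        # 4   model training
)


def pipeline_scripts(year):
    """Return the script file names for one COLIEE year, in run order."""
    return [f"{step}_{year}.sh" for step in STEPS]


def run_pipeline(root, year, dry_run=True):
    """Run (or, with dry_run, only list) each script from the project root."""
    commands = [["bash", script] for script in pipeline_scripts(year)]
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(command, cwd=root, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(".", 2023, dry_run=True)
```

With `dry_run=True` nothing is executed, which makes it easy to confirm the order before starting the long-running stages.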

Cite

If you find this repo useful, please cite:

@article{CaseGNN,
  author    = {Yanran Tang and Ruihong Qiu and Yilun Liu and Xue Li and Zi Huang},
  title     = {CaseGNN: Graph Neural Networks for Legal Case Retrieval with Text-Attributed Graphs},
  journal   = {CoRR},
  volume    = {abs/2312.11229},
  year      = {2023},
}

@article{PromptCase,
  author    = {Yanran Tang and Ruihong Qiu and Xue Li},
  title     = {Prompt-based Effective Input Reformulation for Legal Case Retrieval},
  journal   = {CoRR},
  volume    = {abs/2309.02962},
  year      = {2023},
}
