Code for CaseLink paper (SIGIR 2024).
Title: CaseLink: Inductive Graph Learning for Legal Case Retrieval
Author: Yanran Tang, Ruihong Qiu, Hongzhi Yin, Xue Li, and Zi Huang
Requirements can be seen in /requirements.txt
Datasets can be downloaded from COLIEE2022 and COLIEE2023.
Specifically, the downloaded COLIEE2022 folders task1_train_files_2022
and task1_test_files_2022
should be put into /Graph_generation/COLIEE2022/
.
The label file task1_train_labels_2022.json
and task1_test_labels_2022.json
shoule be put into folder /label/
.
COLIEE2023 folders should be set in a similar way.
The final project files are as follows:
```
$ ./CaseGNN/
.
├── DATASET
│ └── data_load.py
├── Grpah_generation
│ ├── bm25
│ │ ├── bm25_matrix
│ │ ├── bm25_coliee+lecard.py
│ │ └── bm25_model.py
│ ├── casegnn_embedding
│ ├── graph
│ │ ├── graph_bin_2022
│ │ └── graph_bin_2023
│ ├── COLIEE2022
│ ├── COLIEE2023
│ ├── graph_construction.py
│ └── federal_charges_coliee.txt
├── label
│ ├── hard_neg_top50_train_2022.json
│ ├── hard_neg_top50_train_2023.json
│ ├── task1_test_labels_2022.json
│ ├── task1_test_labels_2023.json
│ ├── task1_train_labels_2022.json
│ ├── task1_train_labels_2023.json
│ ├── test_2022_candidate_with_yearfilter.json
│ └── test_2023_candidate_with_yearfilter.json
├── CaseLink2022_run.sh
├── CaseLink2023_run.sh
├── Graph.sh
├── BM25_matrix_generation.sh
├── CaseLink_model.py
├── main.py
├── train.py
├── torch_metrics.py
├── requirements.txt
└── README.md
```
-
Run
. ./BM25_matrix_generation.sh
to generate COLIEE2022 bm25 ranking matrix files in folder./Graph_generation/bm25/bm25_matrix/
. -
The same process for COLIEE2023, please change the
--data 2022
to--data 2023
inBM25_matrix_generation.sh
. -
The BM25 ranking matrix can be also downloaded here.
-
To generate the CaseGNN case embeddings, please run the CaseGNN2022_run.sh and CaseGNN2023_run.sh in CaseGNN github
-
The CaseGNN case embedding can be also downloaded here.
-
CaseLink graph constrction utilises the result of bm25 ranking result and case embedding from CaseGNN, please ensure the folders of
./Graph_generation/bm25/bm25_matrix/
and./Graph_generation/casegnn_embedding/
have been generated or downloaded. -
Run
. ./Graph.sh
--topk_neighbor
can be chosen from {3, 5, 10, 20}.--charge_threshold
can be chosen from {0.85, 0.9, 0.95}.
-
The CaseLink graphs are saved in folder
/Graph_generation/graph/
-
The same process for COLIEE2023, please change the
--data 2022
to--data 2023
inGraph.sh
. -
The CaseLink graphs can be also downloaded here.
Run . ./CaseLink2022_run.sh
and . ./CaseLink2023_run.sh
for COLIEE2022 and COLIEE2023, respectively.
If you find this repo useful, please cite
@article{CaseLink,
author = {Yanran Tang and
Ruihong Qiu and
Hongzhi Yin and
Xue Li and
Zi Huang},
title = {CaseLink: Inductive Graph Learning for Legal Case Retrieval},
journal = {CoRR},
volume = {abs/2403.17780},
year = {2024}
}