ColBERT for dense retrieval, including a multi-view version, with dureader-retrieval as an example

wuyaoxuehun/colbert

ColBERT

Environment

#pip install -r requirements.txt
conda env create -f environment.yaml [-p /data/anaconda/env/vir_base] 

1. Train the model
    ./eval.sh train
2. Encode the corpus
    ./eval.sh index
3. Build the IVFPQ index with faiss
    ./eval.sh faiss
4. Start the dense retrieval service
    ./eval.sh server
5. Evaluate retrieval
    ./eval.sh eval
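
The batch stages above can be chained from Python. The following is only a minimal sketch: the helper names `stage_command` and `run_stages` are hypothetical and not part of this repository, and it assumes that `./eval.sh server` keeps running while it serves requests, so `server` and `eval` are left out and would be started separately.

```python
import subprocess

# Stage names copied from the steps above, in the order they are listed.
# "server" and "eval" are excluded on the assumption that the server
# blocks while serving and must be running before "eval" is started.
BATCH_STAGES = ["train", "index", "faiss"]


def stage_command(stage, script="./eval.sh"):
    """Return the argv list for one pipeline stage."""
    return [script, stage]


def run_stages(stages=BATCH_STAGES, runner=subprocess.run):
    """Run each stage in order, stopping at the first failure."""
    for stage in stages:
        # check=True makes subprocess.run raise CalledProcessError
        # when the script exits with a non-zero status.
        runner(stage_command(stage), check=True)
```

Calling `run_stages()` from the repository root would run training, corpus encoding, and index building back to back. The `runner` argument exists only so the sequence can be inspected without executing anything.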

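The steps above target the DuReader Chinese dataset.
Non-training model parameters are in proj_conf/dense.yaml
The code that loads these parameters is in colbert/utils/dense_conf.py
Format of the training and validation files: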

[{
  "question": "...",
  "positive_ctxs": [
    "...",
    "..."
  ],
  "hard_negative_ctxs": [
    "...",
    "..."
  ]
}]
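
Before training on your own data, it can help to check that a file matches this layout. The sketch below is not taken from the repository: `validate_records` and `load_training_file` are hypothetical helpers, and it assumes all three keys are required and that the file is stored as JSON, neither of which the README states explicitly.

```python
import json

# Keys shown in the example record above. Treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = ("question", "positive_ctxs", "hard_negative_ctxs")


def validate_records(records):
    """Raise ValueError if records do not match the layout shown above."""
    if not isinstance(records, list):
        raise ValueError("top-level JSON value must be a list")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i}: expected an object")
        for key in REQUIRED_KEYS:
            if key not in rec:
                raise ValueError(f"record {i}: missing key {key!r}")
        if not isinstance(rec["question"], str):
            raise ValueError(f"record {i}: 'question' must be a string")
        for key in ("positive_ctxs", "hard_negative_ctxs"):
            ctxs = rec[key]
            if not isinstance(ctxs, list) or not all(
                isinstance(c, str) for c in ctxs
            ):
                raise ValueError(f"record {i}: {key!r} must be a list of strings")
    return records


def load_training_file(path):
    """Load a training or validation file and validate its records."""
    with open(path, encoding="utf-8") as f:
        return validate_records(json.load(f))
```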

The dataset format is:

[
  "...", 
  "..."
]

The dataset loading code is get_dureader_ori_corpus() in colbert/proj_utils/dureader_utils.py; you can modify it to load your own dataset.
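
As a starting point for adapting your own data, the sketch below converts raw text lines into the list-of-strings layout shown above and reads it back. It does not reproduce get_dureader_ori_corpus(), whose signature is not documented here; `passages_to_corpus`, `write_corpus`, and `read_corpus` are hypothetical names, and storing the corpus as a JSON file is an assumption.

```python
import json


def passages_to_corpus(lines):
    """Turn an iterable of raw text lines into a list of passage strings.

    Surrounding whitespace is stripped and blank lines are dropped.
    """
    return [line.strip() for line in lines if line.strip()]


def write_corpus(passages, path):
    """Write passages as a JSON list of strings."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file.
        json.dump(passages, f, ensure_ascii=False, indent=2)


def read_corpus(path):
    """Read a corpus file and check that it is a list of strings."""
    with open(path, encoding="utf-8") as f:
        corpus = json.load(f)
    if not isinstance(corpus, list) or not all(isinstance(p, str) for p in corpus):
        raise ValueError("corpus must be a JSON list of strings")
    return corpus
```

A replacement loader could call `read_corpus` on your own file and return the resulting list, provided that matches what the rest of the pipeline expects from get_dureader_ori_corpus().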

Using the multi-view version
