USTC-StarTeam/GenKI

Code for GenKI

./msmarco_jsons:

We filter out MSMARCO examples that have no correct answer, then randomly sample as many examples as TriviaQA contains. This folder holds the resulting MSMARCO dev and test sets.
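The filtering and sampling step described above can be sketched roughly as follows. This is only an illustration, not the script used to build these files: the `answers` field name, the `filter_and_sample` helper, and the fixed seed are assumptions made for the example.

```python
import random


def filter_and_sample(records, target_size, seed=0):
    """Keep records that have at least one answer, then sample target_size of them.

    `records` is a list of dicts; the "answers" key is a hypothetical field
    name chosen for this sketch, not necessarily the one in the dataset files.
    """
    # Drop examples whose answer list is missing or empty.
    answered = [r for r in records if r.get("answers")]
    if target_size > len(answered):
        raise ValueError("not enough answered examples to sample from")
    # A seeded generator makes the random subset reproducible.
    rng = random.Random(seed)
    return rng.sample(answered, target_size)


if __name__ == "__main__":
    toy = [
        {"query": "q1", "answers": ["a1"]},
        {"query": "q2", "answers": []},
        {"query": "q3", "answers": ["a3"]},
    ]
    subset = filter_and_sample(toy, target_size=2)
    print(len(subset))
```

Here `target_size` would be set to the number of TriviaQA examples being matched.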

The JSON files we use for tuning and inference have the following format:

[
 {
  "instruction": "Please extract the answer keyword",
  "input": "Question: Kim Carnes' nine weeks at No 1 with Bette Davis Eyes was interrupted for one week by which song? Answer:  Stars on 45 medley ",
  "output": "Processed answer: Stars on 45"
 }
]
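As a minimal sketch, records in this format could be assembled and checked with a few lines of Python. The helper names `make_record` and `check_records` are hypothetical and not part of this repository, and the exact whitespace inside the `input` string is an assumption.

```python
import json

# Instruction string copied from the example record above.
INSTRUCTION = "Please extract the answer keyword"
REQUIRED_KEYS = {"instruction", "input", "output"}


def make_record(question, answer, processed_answer):
    """Build one record with the three keys shown in the example (hypothetical helper)."""
    return {
        "instruction": INSTRUCTION,
        "input": f"Question: {question} Answer: {answer}",
        "output": f"Processed answer: {processed_answer}",
    }


def check_records(records):
    """Return True if every record is a dict with exactly the three expected string fields."""
    return all(
        isinstance(r, dict)
        and set(r) == REQUIRED_KEYS
        and all(isinstance(v, str) for v in r.values())
        for r in records
    )


if __name__ == "__main__":
    records = [make_record("Which song?", "Stars on 45 medley", "Stars on 45")]
    assert check_records(records)
    # The file is a JSON array of such records.
    print(json.dumps(records, indent=1))
```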

./retriever

retrieving knowledge from the knowledge base

to generate embeddings of the knowledge base:

python generate_embedding.py

to retrieve knowledge by embedding:

python retriever.py

./K_PP_glm6b

for the knowledge integration and post-processing parts of our framework, using GLM-6B

to process the JSON file, do

sh cover_jasonl.sh

to finetune models, do

sh finetune.sh

to infer, do

python infer.py --saveI 1

./K_PP_LLaMA

for the knowledge integration and post-processing parts of our framework, using LLaMA-65B

to finetune models, do

sh finetune.sh

to infer, do

sh generate.sh

./RM

for the reward model part of our framework

to finetune models, do

sh train_rm.sh

to infer, do

sh inference_rm.sh

./Consistency-calc

for calculating the consistency and fluency of generated answers

to calculate, do

sh test.sh
