./msmarco_jsons:
we filter out MSMARCO examples that lack a correct answer and randomly sample the same amount of data as TriviaQA; these are the dev and test sets of MSMARCO we use.
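The filtering-and-sampling step above can be sketched as follows (a minimal sketch, not the project's actual preprocessing script; the `answers` field and the "No Answer Present." marker follow the public MSMARCO QnA schema, which is an assumption here):

```python
import random

def filter_and_sample(examples, target_size, seed=0):
    """Keep only examples with a usable answer, then downsample."""
    # Treat empty answer lists and the MSMARCO "No Answer Present."
    # marker as missing answers (assumed field names).
    answerable = [
        ex for ex in examples
        if ex.get("answers") and ex["answers"] != ["No Answer Present."]
    ]
    random.seed(seed)
    # Sample down to the same amount of data as TriviaQA.
    return random.sample(answerable, min(target_size, len(answerable)))

examples = [
    {"query": "q1", "answers": ["a1"]},
    {"query": "q2", "answers": ["No Answer Present."]},
    {"query": "q3", "answers": []},
]
kept = filter_and_sample(examples, target_size=1)
```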
The JSON format we use for fine-tuning and inference is as follows:
[
{
"instruction": "Please extract the answer keyword",
"input": "Question: Kim Carnes' nine weeks at No 1 with Bette Davis Eyes was interrupted for one week by which song? Answer: Stars on 45 medley ",
"output": "Processed answer: Stars on 45"
}
]
./retriever
for retrieving knowledge from the knowledge base
to generate embeddings of the knowledge base, do
python generate_embedding.py
to retrieve knowledge by embedding, do
python retriever.py
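The retrieval step amounts to nearest-neighbor search over the precomputed embeddings; a minimal NumPy sketch of cosine-similarity top-k lookup (the actual logic lives in generate_embedding.py and retriever.py; the toy vectors below are illustrative only):

```python
import numpy as np

def top_k(query_emb, kb_embs, k=2):
    """Return indices of the k knowledge-base entries most similar
    to the query under cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    kb = kb_embs / np.linalg.norm(kb_embs, axis=1, keepdims=True)
    sims = kb @ q                  # cosine similarity of each row to q
    return np.argsort(-sims)[:k]  # indices sorted by descending similarity

# Toy 2-D "embeddings" standing in for real encoder output.
kb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([1.0, 0.1])
idx = top_k(query, kb, k=2)
```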
./K_PP_glm6b
for the knowledge integration and post-processing parts of our framework, using GLM-6B
to process the JSON file, do
sh cover_jasonl.sh
to finetune models, do
sh finetune.sh
to infer, do
python infer.py --saveI 1
./K_PP_LLaMA
for the knowledge integration and post-processing parts of our framework, using LLaMA-65B
to finetune models, do
sh finetune.sh
to infer, do
sh generate.sh
./RM
for the reward model part of our framework
to finetune models, do
sh train_rm.sh
to infer, do
sh inference_rm.sh
./Consistency-calc
for calculating the consistency and fluency of generated answers
to calculate, do
sh test.sh