Name
Aplaca
FLAN-T5
GPT-4
LLaMA2
OPT
TULU
Vicuna
chatgpt
davinci1
davinci2
davinci3
readme.md

readme.md

Introduction

This folder contains answers to questions, organized by model.

The models we compare include ChatGPT, davinci1, davinci2, davinci3 (OpenAI models), FLAN-T5 (google/flan-t5-xxl), and GPT-4.

Each model folder contains that model's answers (responses) on 8 datasets.

Format Description

If a model's answer file is a txt file, each line has the format: index '\t' question '\t' answer

For example:


8	What has alternative rock in common with Greg Graffin?	Greg Graffin is a musician who is often associated with the alternative rock genre, specifically as the lead singer and songwriter of the punk rock band Bad Religion.
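The tab-separated layout above can be read with a few lines of Python. This is a minimal sketch, not the authors' own tooling: the names `AnswerRecord` and `parse_answer_line` are hypothetical, and it assumes the index is always an integer and that the question itself contains no tab characters.

```python
from typing import NamedTuple


class AnswerRecord(NamedTuple):
    """One line of a txt answer file (hypothetical helper type)."""
    index: int
    question: str
    answer: str


def parse_answer_line(line: str) -> AnswerRecord:
    """Parse one 'index<TAB>question<TAB>answer' line."""
    # maxsplit=2 keeps any tab characters inside the answer text intact.
    index, question, answer = line.rstrip("\n").split("\t", 2)
    # Assumption: the index column is an integer, as in the example above.
    return AnswerRecord(int(index), question, answer)


# The answer text is shortened here for readability.
record = parse_answer_line(
    "8\tWhat has alternative rock in common with Greg Graffin?\t"
    "Greg Graffin is a musician who is often associated with alternative rock.\n"
)
print(record.index, record.question)
```

Splitting with a maximum of two splits is a deliberate choice: model answers are free text and may contain tabs, while the first two columns are assumed not to.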

If the file is a JSON file, the format is as in the following example:


{
        "id": "0",  
        "language": "fr",  
        "question": "Quel est le fuseau horaire de Salt Lake City?",  
        "answer": "Le fuseau horaire de Salt Lake City est Mountain Time Zone (MT)."  
}
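A record like the one above can be loaded with Python's standard `json` module. The sketch below is illustrative only: `load_answers` is a hypothetical helper, and because the README does not say whether a file holds a single object or a list of objects, it accepts both.

```python
import json


def load_answers(text: str) -> list[dict]:
    """Return a list of answer records parsed from JSON text."""
    data = json.loads(text)
    # Assumption: a file may hold one object or a list of objects.
    return data if isinstance(data, list) else [data]


sample = """
{
    "id": "0",
    "language": "fr",
    "question": "Quel est le fuseau horaire de Salt Lake City?",
    "answer": "Le fuseau horaire de Salt Lake City est Mountain Time Zone (MT)."
}
"""

records = load_answers(sample)
for rec in records:
    print(rec["id"], rec["language"], rec["question"])
```

Note that `id` is a string in the example record, so it is left as a string rather than converted to an integer.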

About Some Answers From Questions With Prompt

For the davinci3, davinci2 and FLAN-T5 models, we also ran the QALD-9 dataset with an English prompt for comparison; the corresponding file names end with "QALD-9_en_prompt.json".

For the davinci1 model, we did not run the English-prompt experiment.

For the ChatGPT model, we ran multiple comparison experiments. On the WQSP, QALD-9 and GraphQuestions datasets, we compared responses collected in February and March; the corresponding file names end with "_February.txt" and "_March.txt" respectively.

Additionally, for the QALD-9 and GraphQuestions datasets, we compared Chinese prompts and English prompts; the corresponding file names end with "zh_cn_prompt.txt" and "en_prompt.txt" respectively.
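The file-name suffixes described in this section can be used to tell experiment variants apart programmatically. The following is a rough sketch under stated assumptions: the function `describe_variant`, the label strings, and the example file names are all hypothetical; only the suffixes come from the text above.

```python
# Suffixes are taken from the naming rules in this section;
# the labels on the right are illustrative, not part of the repository.
SUFFIX_LABELS = [
    ("QALD-9_en_prompt.json", "English prompt (QALD-9, JSON)"),
    ("_February.txt", "ChatGPT, February"),
    ("_March.txt", "ChatGPT, March"),
    ("zh_cn_prompt.txt", "Chinese prompt"),
    ("en_prompt.txt", "English prompt"),
]


def describe_variant(filename: str) -> str:
    """Map an answer file name to the experiment variant its suffix indicates."""
    for suffix, label in SUFFIX_LABELS:
        if filename.endswith(suffix):
            return label
    # Files without a known suffix are treated as the default run.
    return "default"


# Hypothetical file names, used only to exercise the function.
for name in ["WQSP_March.txt", "QALD-9_zh_cn_prompt.txt", "QALD-9.txt"]:
    print(name, "->", describe_variant(name))
```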

🔥We also tested the newly released GPT-4. Due to OpenAI's usage restrictions, we sampled 500 questions from each dataset for testing. We tried to ensure that the sampled data covers all answer and inference types as far as possible. The result files use the same format as the ChatGPT result files.
