
# Stanford-Alpaca

Stanford Alpaca Regenerate: An Instruction-following LLaMA Model

Updated to the latest OpenAI Python SDK. The data generation part is kept mostly as-is; the model training part is replaced with an axolotl config.

## Create dataset

You need a `FRIENDLI_TOKEN`. You can get one from the Friendli documentation.
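The generation script calls an OpenAI-compatible endpoint, so the token is typically read from the environment and handed to the client. Below is a minimal sketch, assuming the Friendli serverless base URL and a hypothetical model name (neither is taken from this repository's code):

```python
import os

from openai import OpenAI

# Minimal sketch: an OpenAI SDK client pointed at an
# OpenAI-compatible Friendli endpoint. The base_url and model
# name are assumptions, not values read from this repository.
client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],
    base_url="https://api.friendli.ai/serverless/v1",
)

response = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```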

```bash
# Generate 52 new data points -> regen.json
python -m generate_instruction generate_instruction_following_data \
  --output_dir ./ \
  --num_instructions_to_generate 52

# Convert regen.json to jsonl -> output.jsonl
python convert_to_jsonl.py
```
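The conversion step is a straightforward reshaping: `regen.json` holds a JSON array of instruction records, and `output.jsonl` writes one record per line. A minimal sketch of what `convert_to_jsonl.py` plausibly does, assuming the field names shown in the example below (the actual script may differ in details):

```python
import json

# Minimal sketch of the regen.json -> output.jsonl conversion.
# Assumes regen.json is a JSON array of records with
# "instruction", "input", and "output" keys; the actual
# convert_to_jsonl.py in this repository may differ.
with open("regen.json") as f:
    records = json.load(f)

with open("output.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```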

## Example of generated data

```bash
$ head -n 1 output.jsonl
{"instruction": "What can you infer from the following conversation?", "input": "John: How was your weekend?\nJane: It was great. I went to the beach with friends and had a lot of fun.", "output": "Jane had a great weekend and enjoyed her time at the beach with friends."}
```