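Fine-tuning for shiritori

Overview

A shiritori dataset was built from Jumandic and used for both fine-tuning and LoRA tuning.

Three things were then examined: how the amount of training data affects accuracy, how fine-tuning and LoRA tuning compare in accuracy, and how accuracy differs among the Japanese LLMs used (cyberagent/open-calm-1b, 3b, 7b) and ChatGPT.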

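About the code

Creating the training data

noun.txt: the nouns extracted from Jumandic, saved as a .txt file

makedict.py: code that re-saves the nouns stored in noun.txt in a dictionary-like form, index (leading hiragana) -> word

noun_dict_sorted.txt: the noun data after makedict.py has rearranged its order and structure into dictionary form

tojson.py: code that takes N shiritori examples from noun_dict_sorted.txt and writes them to a JSON file that can be used for shiritori instruction tuning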
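The data pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the contents of makedict.py or tojson.py: the function names `build_index` and `make_examples`, the instruction/input/output record layout, the instruction text, and the sample nouns are all assumptions, and the nouns are assumed to be given as hiragana readings.

```python
import json
import random
from collections import defaultdict


def build_index(nouns):
    """Group nouns by their first character (assumed to be a hiragana reading)."""
    index = defaultdict(list)
    for word in nouns:
        if word:  # skip empty lines
            index[word[0]].append(word)
    # Sort keys and words so the result is in dictionary order.
    return {head: sorted(words) for head, words in sorted(index.items())}


def make_examples(index, n, seed=0):
    """Build up to n records of the form previous word -> next word."""
    rng = random.Random(seed)
    # Only words whose last character has at least one continuation are usable as input.
    candidates = [w for words in index.values() for w in words if w[-1] in index]
    rng.shuffle(candidates)
    examples = []
    for word in candidates[:n]:
        # Exclude answers ending in "ん" (they lose the game) and the word itself.
        answers = [a for a in index[word[-1]] if not a.endswith("ん") and a != word]
        if not answers:
            continue
        examples.append({
            "instruction": "しりとりをしてください。",  # hypothetical instruction text
            "input": word,
            "output": rng.choice(answers),
        })
    return examples


# Hypothetical sample nouns standing in for noun.txt.
nouns = ["りんご", "ごりら", "らっぱ", "ぱんだ", "だるま", "まくら", "みかん"]
index = build_index(nouns)
examples = make_examples(index, n=3)
print(json.dumps(examples, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps the hiragana readable in the JSON output instead of escaping it as `\uXXXX` sequences.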

Running LoRA tuning and fine-tuning

loratune.py

finetune.py
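To make the difference between the two scripts concrete, the sketch below compares how many parameters are trained for a single weight matrix under full fine-tuning and under LoRA, which learns a low-rank update B @ A instead of updating the matrix itself. The layer size and rank are hypothetical values chosen for illustration; they are not taken from loratune.py or from the open-calm models.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA update W + B @ A (B: d_out x rank, A: rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 2048, 2048, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters ({lora / full:.2%} of full)")
```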

Running shiritori with zero-shot ChatGPT

gpt.py
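A zero-shot run needs a prompt and some way to judge the reply. The sketch below shows one possible shape for both; it does not call any API and is not the contents of gpt.py. The prompt wording, the helper names, and the simplified rule handling (a trailing long-vowel mark is ignored and small kana are treated as their full-size forms) are assumptions.

```python
# Map small kana to full-size kana (one common shiritori convention).
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉゃゅょっ", "あいうえおやゆよつ")


def last_kana(word):
    """Return the kana the next word must start with (simplified convention)."""
    trimmed = word.rstrip("ー")  # ignore a trailing long-vowel mark
    return trimmed[-1].translate(SMALL_TO_LARGE)


def build_prompt(previous_word):
    """Build a zero-shot prompt; the wording here is hypothetical."""
    return (
        "しりとりをします。"
        f"「{previous_word}」に続く名詞をひらがなで1語だけ答えてください。"
    )


def is_valid_reply(previous_word, reply, used=()):
    """Check a reply against the basic shiritori rules."""
    reply = reply.strip()
    if not reply or reply in used:
        return False  # empty answer or a word that was already played
    if reply.endswith("ん"):
        return False  # a word ending in "ん" loses the game
    return reply[0] == last_kana(previous_word)


print(build_prompt("りんご"))
print(is_valid_reply("りんご", "ごりら"))
print(is_valid_reply("りんご", "ごはん"))
```

One way to turn this into an accuracy figure would be to take the fraction of replies for which `is_valid_reply` returns True, though the metric actually used in this project is not stated here.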

tryuuu/project_research_spring

Folders and files

Latest commit

History

Repository files navigation

しりとりのfinetuning

概要

コードについて

学習データの作成

LoRA tuning, fine tuningの実行

zero-shotのChatGPTによるしりとりの実行

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages