Repository of the paper: "Spoken Language Intelligence of Large Language Models for Language Learning"

vocaliodmiku/SLI-LL

Spoken Language Intelligence of Large Language Models for Language Learning

     


1 NetEase Youdao   2 The University of Texas at Austin   3 Beijing University of Posts and Telecommunications

We introduce a new multiple-choice question dataset to evaluate the effectiveness of LLMs in spoken language learning, covering both understanding and application of spoken language knowledge. We investigate the influence of various prompting techniques, such as zero- and few-shot methods (prepending the question with question-answer exemplars), chain-of-thought (CoT, think step-by-step), in-domain exemplars, and external tools (Google, Wikipedia). We conducted a large-scale evaluation of popular LLMs (20 distinct models) using these methods, and the advanced methods achieved significant performance improvements over the zero-shot baseline on practical question reasoning (GPT-3.5, 49.1% -> 63.1%; LLaMA2-70B-Chat, 42.2% -> 48.6%). We found that models of different sizes have a good understanding of concepts in phonetics, phonology, and second language acquisition, but show limitations in reasoning about real-world problems. We also report preliminary findings on conversational communication. These results highlight the impressive Spoken Language Intelligence exhibited by LLMs, and chatbots based on large language models have significant potential to enhance conversational spoken language learning.

SLIQ-LL Dataset

The data is stored as JSON files in the "data" directory.

Knowledge & Concept subset

  • concept_dev.json
  • concept_test.json

Application Questions

  • capt_dev.json
  • capt_test.json
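The files listed above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the repository: it assumes the script runs from the repository root so that the "data" directory is reachable, and it makes no assumption about the JSON schema, which this README does not document. The helper names load_split and describe are hypothetical.

```python
import json
from pathlib import Path

# File names as listed in this README; grouping them by subset is our own convention.
SPLITS = {
    "concept": ("concept_dev.json", "concept_test.json"),
    "capt": ("capt_dev.json", "capt_test.json"),
}


def load_split(data_dir, filename):
    """Parse one JSON file from the data directory and return it unchanged."""
    path = Path(data_dir) / filename
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def describe(obj):
    """Return (top-level type name, number of entries) without assuming a schema."""
    size = len(obj) if isinstance(obj, (list, dict)) else 1
    return type(obj).__name__, size


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for subset, files in SPLITS.items():
        for filename in files:
            if not (Path("data") / filename).exists():
                print(f"{subset}/{filename}: not found")
                continue
            kind, size = describe(load_split("data", filename))
            print(f"{subset}/{filename}: {kind} with {size} entries")
```

Printing the top-level type and entry count first is a cheap way to learn the layout of each file before writing any code that depends on specific field names.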

Citation

If you find our work useful in your research, please consider citing:

@misc{peng2023spoken,
      title={Spoken Language Intelligence of Large Language Models for Language Learning}, 
      author={Linkai Peng and Baorian Nuchged and Yingming Gao},
      year={2023},
      eprint={2308.14536},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Disclaimer

This is not an official product of NetEase Youdao. This repository can only be used for personal/research/non-commercial purposes.