
Evaluation of Language Models for Large Input Sizes #55

Closed
EKebriaei opened this issue Feb 27, 2024 · 1 comment
Comments

@EKebriaei

In certain datasets (such as EDTSUM, finQA,...), the input size might exceed the default maximum context length of the language models. I am curious to know the methodologies and considerations employed by the PIXIU team when dealing with such situations. How do you handle evaluations for large input sizes, and what strategies or techniques are implemented to ensure accurate and meaningful results?

@jiminHuang
Contributor

Thank you for raising this issue and for your interest in how we handle evaluations with large input sizes. Here's how we approach this challenge:

  1. Context Length Limitation: Context length is a key constraint for large language models (LLMs), especially for smaller models in the 7B-parameter range. This limitation becomes particularly significant for datasets with inherently long inputs, such as EDTSUM and finQA.

  2. Truncation for Fair Comparison: To manage this issue and ensure a fair comparison across different models, our strategy is truncation. We truncate the input so it fits within the maximum context length the model can handle. Although this approach may not capture the full context of the data, it allows for consistent evaluation metrics across models (a minimal truncation sketch follows this list).

  3. Impact on Model Performance: For certain datasets such as fintrade, the impact of the context length limitation is especially evident. Smaller models, due to their limited context length, often fail to generate a trading action at all. This demonstrates how critical context length is to the performance of LLMs, especially for tasks that require analyzing large volumes of data.
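For reference, here is a minimal sketch of how such truncation can be applied with a Hugging Face tokenizer before the prompt is built. The model name, context length, and helper function here are illustrative assumptions, not the exact PIXIU implementation:

```python
from transformers import AutoTokenizer

# Placeholder model; in practice this is whichever model is being evaluated.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def truncate_to_context(text: str, max_length: int = 1024) -> str:
    """Keep only the first `max_length` tokens of `text` (hypothetical helper)."""
    # Tokenize with truncation enabled so tokens past max_length are dropped.
    ids = tokenizer(text, truncation=True, max_length=max_length)["input_ids"]
    # Decode back to a string so the same prompt template can be reused downstream.
    return tokenizer.decode(ids, skip_special_tokens=True)

# Example: a long EDTSUM/finQA document is shortened to fit the context window.
prompt = truncate_to_context("very long financial report ... " * 500)
```

Applying the same cutoff to every model being compared keeps the evaluation consistent, at the cost of discarding whatever context falls beyond the window.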

We're continuously exploring ways to mitigate these limitations and improve our models' ability to handle large input sizes more effectively. Your interest and inquiries are invaluable to our ongoing efforts and discussions on this front.

@jiminHuang jiminHuang added the question Further information is requested label Feb 27, 2024
@jiminHuang jiminHuang self-assigned this Feb 27, 2024