
Evaluation of Language Models for Large Input Sizes #55

Closed
EKebriaei opened this issue Feb 27, 2024 · 1 comment
Comments

@EKebriaei

In certain datasets (such as EDTSUM, finQA,...), the input size might exceed the default maximum context length of the language models. I am curious to know the methodologies and considerations employed by the PIXIU team when dealing with such situations. How do you handle evaluations for large input sizes, and what strategies or techniques are implemented to ensure accurate and meaningful results?

@jiminHuang
Contributor

Thank you for raising this issue and for your interest in how we handle evaluations with large input sizes. Here's how we approach this challenge:

  1. Context Length Limitation: Context length is a key constraint for large language models (LLMs), especially for smaller models in the 7B-parameter range. This limitation becomes particularly significant for datasets with inherently long inputs, such as EDTSUM and finQA.

  2. Truncation for Fair Comparison: To manage this issue and ensure a fair comparison across different models, our strategy is truncation. We truncate the input so it fits within the maximum context length the model can handle. Although this approach may not capture the full context of the data, it allows for consistent evaluation metrics across models (a minimal truncation sketch follows this list).

  3. Impact on Model Performance: For certain datasets such as fintrade, the impact of the context length limitation is especially evident. Smaller models, due to their limited context length, often fail to generate a trading action at all. This demonstrates how critical context length is to the performance of LLMs, especially for tasks that require analyzing large volumes of data.
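For reference, here is a minimal sketch of how such truncation can be applied with a Hugging Face tokenizer before the prompt is built. The model name, context length, and helper function here are illustrative assumptions, not the exact PIXIU implementation:

```python
from transformers import AutoTokenizer

# Placeholder model; in practice this is whichever model is being evaluated.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def truncate_to_context(text: str, max_length: int = 1024) -> str:
    """Keep only the first `max_length` tokens of `text` (hypothetical helper)."""
    # Tokenize with truncation enabled so tokens past max_length are dropped.
    ids = tokenizer(text, truncation=True, max_length=max_length)["input_ids"]
    # Decode back to a string so the same prompt template can be reused downstream.
    return tokenizer.decode(ids, skip_special_tokens=True)

# Example: a long EDTSUM/finQA document is shortened to fit the context window.
prompt = truncate_to_context("very long financial report ... " * 500)
```

Applying the same cutoff to every model being compared keeps the evaluation consistent, at the cost of discarding whatever context falls beyond the window.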

We're continuously exploring ways to mitigate these limitations and improve our models' ability to handle large input sizes more effectively. Your interest and inquiries are invaluable to our ongoing efforts and discussions on this front.

@jiminHuang jiminHuang added the question Further information is requested label Feb 27, 2024
@jiminHuang jiminHuang self-assigned this Feb 27, 2024