This file presents an evaluation of GPT-3.5 Turbo's reasoning abilities using a dataset of academic prompts published in the latter half of 2023.
The evaluation analyzes the model's responses to a range of reasoning tasks under different prompting techniques, such as zero-shot-CoT, Prompt and plan, Analogical Prompting, and Emotion Prompt.
The results provide insights into the model's strengths and weaknesses in reasoning.
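As a rough illustration of how two of the techniques named above change a prompt, the Python sketch below appends a trigger phrase to a base question. The phrases are the ones commonly associated with zero-shot-CoT and Emotion Prompt in the literature. The function name `build_prompt` and the `TRIGGERS` table are hypothetical and are not taken from this file, so the wording used in the actual evaluation may differ.

```python
# Hypothetical sketch: not the implementation used by this file.
# Trigger phrases are assumptions based on commonly cited wording.
TRIGGERS = {
    "baseline": "",
    "zero-shot-CoT": "Let's think step by step.",
    "Emotion Prompt": "This is very important to my career.",
}


def build_prompt(question: str, technique: str) -> str:
    """Return the question with the technique's trigger phrase appended."""
    trigger = TRIGGERS[technique]  # raises KeyError for unknown techniques
    return f"{question} {trigger}".strip()


if __name__ == "__main__":
    for name in TRIGGERS:
        print(f"{name}: {build_prompt('What is 17 * 3?', name)}")
```

Keeping the trigger phrases in one table makes it easy to run the same question under each technique and compare the responses side by side.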
- OpenAI API Key
- A Google account
- Beginners
- Intermediate users familiar with basic spreadsheet concepts
Follow the instructions to recreate your own file here.
In short:
- In the Tester sheet, enter your OpenAI API key in cell A2
- Select Evaluator->Generate test from the menu
- After two test runs, go to the Analysis sheet and select Evaluator->Calculate similarity
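The similarity step compares the responses from the two test runs. This file does not state which metric Evaluator->Calculate similarity uses, so the Python sketch below substitutes a simple character-level ratio from the standard library's `difflib` purely as a stand-in. The function name `run_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def run_similarity(first: str, second: str) -> float:
    """Score two responses between 0.0 (no overlap) and 1.0 (identical).

    Assumption: a character-level ratio stands in for whatever metric
    the Evaluator menu actually computes.
    """
    # Normalize case and whitespace so formatting noise is ignored.
    a = " ".join(first.lower().split())
    b = " ".join(second.lower().split())
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    run_1 = "The train arrives at 3 pm."
    run_2 = "The train will arrive at 3 pm."
    print(round(run_similarity(run_1, run_2), 3))
```

A score close to 1.0 suggests the model answered consistently across the two runs, while a low score flags prompts worth inspecting by hand.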