Add Vercel AI and Gemini Flash #22

Merged
merged 7 commits into from
Jun 7, 2024

Conversation

zachblume
Owner

@zachblume zachblume commented Jun 7, 2024

I don't yet know which base model performs best, but GPT-4o was expensive to run hundreds or thousands of times in development. This PR swaps the OpenAI SDK for the Vercel AI SDK, which supports many (cheaper) models, addressing #20.

Again, not a performance review, but:

I tested briefly with Gemini Flash and found it vaguely capable enough to use in development runs, and crucially it is currently free up to the following limits:

15 RPM (requests per minute)
1 million TPM (tokens per minute)
1,500 RPD (requests per day)

That makes a big difference in cost while developing.
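
To make the limits above concrete, here is a minimal sketch of the arithmetic behind staying inside the free tier. The helper names and the `callsPerRun` figure are hypothetical and not part of this PR; only the 15 RPM and 1,500 RPD numbers come from the list above.

```typescript
// Free-tier limits quoted above for Gemini Flash (as of this PR).
const LIMITS = { requestsPerMinute: 15, requestsPerDay: 1500 };

// Smallest spacing between requests that stays under the per-minute cap.
function minIntervalMs(requestsPerMinute: number): number {
  return Math.ceil(60000 / requestsPerMinute);
}

// Hypothetical helper: how many full test runs fit in the daily quota,
// given an assumed number of model calls per run.
function runsPerDay(requestsPerDay: number, callsPerRun: number): number {
  return Math.floor(requestsPerDay / callsPerRun);
}

// Hypothetical throttle: milliseconds to wait before sending the next
// request, given when the previous one was sent.
function waitBeforeNext(lastSentAtMs: number, nowMs: number, intervalMs: number): number {
  return Math.max(0, lastSentAtMs + intervalMs - nowMs);
}
```

At 15 RPM the spacing works out to one request every 4 seconds, so a development run is limited by the per-minute cap well before the token limit matters.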

Example trajectory output below:

2024-06-07T11:51:55.990Z [INFO] - Test Summary:
2024-06-07T11:51:55.991Z [INFO] - ✘ 1. The user should be able to sort the todo items by clicking on the sort buttons
2024-06-07T11:51:55.991Z [INFO] -   1.1) action: markSpecAsComplete, reason: The spec failed, explanationWhySpecComplete: There are no sort buttons on the page, so the user cannot sort the todo items., planningThoughtAboutTheActionIWillTake: The spec is asking for the user to be able to sort the todo items by clicking on the sort buttons. However, there are no sort buttons on the page. Therefore, the spec fails.
2024-06-07T11:51:55.991Z [INFO] - ✘ 2. The user should be able to clear all completed todo items by clicking on the clear completed button
2024-06-07T11:51:55.991Z [INFO] -   2.1) action: markSpecAsComplete, reason: The spec failed, explanationWhySpecComplete: The spec is asking for a button to clear completed items, but there are no completed items to clear., planningThoughtAboutTheActionIWillTake: The spec is asking for a button to clear completed items, but there are no completed items to clear. The spec is failing.
2024-06-07T11:51:55.991Z [INFO] - ✔ 3. The user should be able to add a new todo item by typing in the input field and pressing enter
2024-06-07T11:51:55.991Z [INFO] -   3.1) action: clickOn, cssSelector: #todo-input, nth: 0, planningThoughtAboutTheActionIWillTake: I need to click on the input field to be able to type in it
2024-06-07T11:51:55.991Z [INFO] -   3.2) action: keyboardInputString, cssSelector: #todo-input, nth: 0, string: Buy groceries, planningThoughtAboutTheActionIWillTake: I need to type in the input field
2024-06-07T11:51:55.991Z [INFO] -   3.3) action: keyboardInputSingleKey, cssSelector: #todo-input, nth: 0, key: Enter, planningThoughtAboutTheActionIWillTake: I need to press enter to submit the todo item
2024-06-07T11:51:55.991Z [INFO] -   3.4) action: markSpecAsComplete, reason: The spec passed, explanationWhySpecComplete: The todo item was added to the list, planningThoughtAboutTheActionIWillTake: The spec has passed, the todo item was added to the list
2024-06-07T11:51:55.991Z [INFO] - ✔ 4. The user should be able to mark a todo item as completed by clicking on the checkbox next to it
2024-06-07T11:51:55.991Z [INFO] -   4.1) action: keyboardInputString, cssSelector: #todo-input, nth: 0, string: Buy milk, planningThoughtAboutTheActionIWillTake: The spec requires a todo item to be present to be marked as completed. Since there are no todo items present, I will add a todo item first.
2024-06-07T11:51:55.991Z [INFO] -   4.2) action: keyboardInputSingleKey, cssSelector: #todo-input, nth: 0, key: Enter, planningThoughtAboutTheActionIWillTake: I will press the enter key to add the todo item to the list.
2024-06-07T11:51:55.991Z [INFO] -   4.3) action: clickOn, cssSelector: .toggle, nth: 0, planningThoughtAboutTheActionIWillTake: I will click on the checkbox next to the todo item to mark it as completed.
2024-06-07T11:51:55.991Z [INFO] -   4.4) action: markSpecAsComplete, reason: The spec passed, explanationWhySpecComplete: The todo item is now marked as completed. The spec is complete., planningThoughtAboutTheActionIWillTake: The todo item is now marked as completed. The spec is complete.
2024-06-07T11:51:55.991Z [INFO] - ✘ 5. The user should be able to filter the todo items by clicking on the filter buttons
2024-06-07T11:51:55.991Z [INFO] -   5.1) action: hoverOver, cssSelector: footer p, nth: 0, planningThoughtAboutTheActionIWillTake: I need to find the filter buttons to test the filtering functionality. I will look for buttons with text like 'All', 'Active', and 'Completed' in the footer of the page.
2024-06-07T11:51:55.991Z [INFO] -   5.2) action: hoverOver, cssSelector: footer p, nth: 0, planningThoughtAboutTheActionIWillTake: I need to move the mouse cursor to the first filter button, which is the 'All' button. I will use the footer p tag to find the button.
2024-06-07T11:51:55.991Z [INFO] -   5.3) action: clickOn, cssSelector: footer p, nth: 0, planningThoughtAboutTheActionIWillTake: I need to click on the 'All' button to filter the todo items. I will use the footer p tag to find the button.
2024-06-07T11:51:55.991Z [INFO] -   5.4) action: clickOn, cssSelector: footer p, nth: 1, planningThoughtAboutTheActionIWillTake: I have clicked on the 'All' button, and the page is still the same. I will now click on the 'Active' button to see if the page changes.
2024-06-07T11:51:55.991Z [INFO] -   5.5) action: clickOn, cssSelector: footer p, nth: 2, planningThoughtAboutTheActionIWillTake: I have clicked on the 'Active' button, and the page is still the same. I will now click on the 'Completed' button to see if the page changes.
2024-06-07T11:51:55.991Z [INFO] -   5.6) action: markSpecAsComplete, reason: The spec failed, explanationWhySpecComplete: The page did not change after clicking on the 'Completed' button. The page is still the same as before., planningThoughtAboutTheActionIWillTake: The spec has failed because the page did not change after clicking on the 'Completed' button. The page is still the same as before.
2024-06-07T11:51:55.991Z [INFO] - ✔ 6. The user should be able to delete an existing todo item by clicking on the delete button next to it
2024-06-07T11:51:55.991Z [INFO] -   6.1) action: keyboardInputString, cssSelector: .new-todo, nth: 0, string: Buy milk, planningThoughtAboutTheActionIWillTake: The application is empty, so I need to add a todo item first to be able to delete it. I will type 'Buy milk' into the input field and press enter to add a todo item.
2024-06-07T11:51:55.991Z [INFO] -   6.2) action: keyboardInputSingleKey, cssSelector: .new-todo, nth: 0, key: Enter, planningThoughtAboutTheActionIWillTake: I need to press enter to add the todo item.
2024-06-07T11:51:55.991Z [INFO] -   6.3) action: clickOn, cssSelector: .destroy, nth: 0, planningThoughtAboutTheActionIWillTake: I need to click on the delete button next to the todo item.
2024-06-07T11:51:55.991Z [INFO] -   6.4) action: hoverOver, cssSelector: .view, nth: 0, planningThoughtAboutTheActionIWillTake: The delete button is not visible, so I need to hover over the todo item to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   6.5) action: scroll, deltaX: 0, deltaY: -100, planningThoughtAboutTheActionIWillTake: The mouse cursor is not positioned over the delete button, so I need to adjust it.
2024-06-07T11:51:55.991Z [INFO] -   6.6) action: clickOn, cssSelector: .destroy, nth: 0, planningThoughtAboutTheActionIWillTake: The mouse cursor is now positioned over the delete button, so I can click on it.
2024-06-07T11:51:55.991Z [INFO] -   6.7) action: markSpecAsComplete, reason: The spec passed, explanationWhySpecComplete: The todo item was deleted successfully., planningThoughtAboutTheActionIWillTake: The todo item has been deleted, and the application is now empty. The spec has passed.
2024-06-07T11:51:55.991Z [INFO] - ✘ 7. The user should be able to edit an existing todo item by double-clicking on it and typing in the input field
2024-06-07T11:51:55.991Z [INFO] -   7.1) action: keyboardInputString, cssSelector: #todo-input, nth: 0, string: Buy groceries, planningThoughtAboutTheActionIWillTake: The spec requires me to edit an existing todo item, but there are no todo items on the page. I will add a todo item first.
2024-06-07T11:51:55.991Z [INFO] -   7.2) action: doubleClickOn, cssSelector: #root > main > ul > li > div > label, nth: 0, planningThoughtAboutTheActionIWillTake: I need to double click on the todo item to edit it.
2024-06-07T11:51:55.991Z [INFO] -   7.3) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   7.4) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   7.5) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   7.6) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   7.7) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   7.8) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.
2024-06-07T11:51:55.991Z [INFO] -   7.9) action: clickOn, cssSelector: #root > main > ul, nth: 0, planningThoughtAboutTheActionIWillTake: The todo item is not visible on the page. I will need to click on the todo list to make it visible.

info.message = stripAnsi(info.message);
return info;
const testPlanSchema = z.object({
arrayOfSpecs: z.array(z.string()),

Owner Author

GPT-4o will only accept JSON/zod schemas where the base type is an object. I originally tried const testPlanSchema = z.array(z.string()); and it threw an error saying that. So this object shape ensures cross-compatibility.
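
A dependency-free sketch of the same idea, for illustration: the model returns an object at the top level and the caller unwraps the array. The `isTestPlan` and `specsFromModelOutput` helpers are hypothetical stand-ins for the zod parse step and are not code from this PR.

```typescript
// Shape the model is asked to return: an object at the top level, with the
// list of specs under a single key (mirrors testPlanSchema above).
interface TestPlan {
  arrayOfSpecs: string[];
}

// Hypothetical runtime check standing in for zod's validation.
function isTestPlan(value: unknown): value is TestPlan {
  if (typeof value !== "object" || value === null) return false;
  const specs = (value as { arrayOfSpecs?: unknown }).arrayOfSpecs;
  return Array.isArray(specs) && specs.every((s) => typeof s === "string");
}

// Unwrap the object so the rest of the code can work with a plain array.
function specsFromModelOutput(raw: string): string[] {
  const parsed: unknown = JSON.parse(raw);
  if (!isTestPlan(parsed)) {
    throw new Error("Expected an object of the form { arrayOfSpecs: string[] }");
  }
  return parsed.arrayOfSpecs;
}
```

The wrapper costs one extra property access on the caller's side and keeps the schema acceptable to providers that reject a top-level array.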

winston.format.printf(
({ timestamp, level, message }) =>
-`${timestamp} [${level.toUpperCase()}] - ${message}`,
+`${timestamp} [${level.toUpperCase()}] - ${stripAnsi(message)}`,

Owner Author

The ANSI stripping didn't need to be so complicated, and there were type issues with the previous way the object was being handled for some reason.
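
For readers without the diff context, a self-contained sketch of stripping inside the format callback. `stripAnsiColors` is a simplified, hypothetical stand-in for the `stripAnsi` import used in the PR: it only removes color/style (SGR) sequences, which is an assumption about what the log messages contain.

```typescript
// Simplified stand-in for strip-ansi: removes only SGR (color/style)
// escape sequences such as "\u001b[32m".
function stripAnsiColors(text: string): string {
  return text.replace(/\u001b\[[0-9;]*m/g, "");
}

interface LogInfo {
  timestamp: string;
  level: string;
  message: string;
}

// Same template as the printf callback in the diff above, with the
// stripping done inline instead of mutating the info object first.
function formatLine({ timestamp, level, message }: LogInfo): string {
  return `${timestamp} [${level.toUpperCase()}] - ${stripAnsiColors(message)}`;
}
```

Because the message is cleaned where it is interpolated, nothing has to reassign `info.message`, which is likely where the earlier type issues came from.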

Collaborator

@craigmulligan craigmulligan left a comment

Looks good, just a few minor notes.


const actionStepSchema = z.object({
planningThoughtAboutTheActionIWillTake: z.string(),
action: z.object({

Collaborator

Could you make the action type a discriminated union? That way you could have stricter types for each action type.

Owner Author

Yup, let me open an issue for this

Owner Author

#23
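
A rough sketch of what the suggested discriminated union could look like in plain TypeScript. The variants and field names are inferred from the example trajectory log in the PR description, not from the actual schema, so treat them as assumptions; with zod the equivalent would presumably be built with `z.discriminatedUnion("action", [...])`.

```typescript
// Each variant carries only the fields that make sense for that action.
// The "action" property is the discriminant.
type Action =
  | { action: "clickOn"; cssSelector: string; nth: number }
  | { action: "doubleClickOn"; cssSelector: string; nth: number }
  | { action: "hoverOver"; cssSelector: string; nth: number }
  | { action: "keyboardInputString"; cssSelector: string; nth: number; string: string }
  | { action: "keyboardInputSingleKey"; cssSelector: string; nth: number; key: string }
  | { action: "scroll"; deltaX: number; deltaY: number }
  | { action: "markSpecAsComplete"; reason: string; explanationWhySpecComplete: string };

// Switching on the discriminant narrows the type in each branch, so
// step.key is only accessible for keyboardInputSingleKey, and so on.
function describeAction(step: Action): string {
  switch (step.action) {
    case "clickOn":
    case "doubleClickOn":
    case "hoverOver":
      return `${step.action} ${step.cssSelector} [${step.nth}]`;
    case "keyboardInputString":
      return `type "${step.string}" into ${step.cssSelector}`;
    case "keyboardInputSingleKey":
      return `press ${step.key} in ${step.cssSelector}`;
    case "scroll":
      return `scroll by (${step.deltaX}, ${step.deltaY})`;
    case "markSpecAsComplete":
      return `done: ${step.reason}`;
  }
}
```

With this shape, a `scroll` step that includes a `cssSelector`, or a `clickOn` step without one, is rejected at compile time instead of surfacing at runtime.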


logger.info(output.choices[0].message.content);

Collaborator

Wouldn't we still need some rate-limiting retry mechanism? Or does Vercel's SDK have this baked in?

Owner Author

Yeah, I went this way because I saw it baked in, apparently with an exponential backoff:
https://sdk.vercel.ai/docs/ai-sdk-core/settings#maxretries

Want to give it a try and see if we need to bump maxRetries above the default of 2?
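
For reference, a generic sketch of retrying with exponential backoff. This illustrates the technique only and is not the SDK's internal implementation; the base delay and factor are illustrative assumptions, and `withRetries` is a hypothetical helper. Only the default of 2 retries comes from the comment above.

```typescript
// Delay before retry number `attempt` (0-based): baseMs, baseMs*factor, ...
function delayForAttempt(attempt: number, baseMs: number, factor: number): number {
  return baseMs * Math.pow(factor, attempt);
}

// Full delay schedule for a given retry budget.
function backoffDelays(maxRetries: number, baseMs = 2000, factor = 2): number[] {
  return Array.from({ length: maxRetries }, (_, i) => delayForAttempt(i, baseMs, factor));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn once, then retries up to maxRetries more times on failure,
// waiting longer before each retry. Rethrows the last error if all fail.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 2,
  baseMs = 2000,
  factor = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) await sleep(delayForAttempt(attempt, baseMs, factor));
    }
  }
  throw lastError;
}
```

With the SDK itself none of this should be needed: per the linked settings page, raising the `maxRetries` setting on the call is the documented knob.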

},
"dependencies": {
"@ai-sdk/anthropic": "^0.0.19",

Collaborator

Nit: doesn't look like anthropic is actually used.

Owner Author

It's wired up in the model config for claude-3-haiku! Haven't tested it yet but will soon


2 participants