# Banana.dev CodeLlama-34B-Python-GPTQ starter template

This is a CodeLlama-34B-Python-GPTQ starter template from Banana.dev that allows on-demand serverless GPU inference.

You can fork this repository and deploy it on Banana as-is, or customize it to fit your own needs.

## Running this app

### Deploying on Banana.dev

  1. Fork this repository to your own GitHub account.
  2. Connect your GitHub account on Banana.
  3. Create a new model on Banana from the forked GitHub repository.

### Running after deploying

  1. Wait for the model to build after creating it.
  2. Make an API request to it using one of the provided snippets in your Banana dashboard.
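The API request in step 2 is typically a JSON POST containing the model inputs. As a minimal sketch, here is how such a request body might be assembled — the field names (`prompt`, `max_new_tokens`, `temperature`) are assumptions for illustration; use the exact snippet and endpoint shown in your Banana dashboard:

```python
import json

def build_inference_payload(prompt: str,
                            max_new_tokens: int = 256,
                            temperature: float = 0.7) -> str:
    """Build an illustrative JSON body for the model's inference endpoint.

    NOTE: field names here are hypothetical; the authoritative schema is
    the request snippet shown in your Banana dashboard for this model.
    """
    payload = {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }
    return json.dumps(payload)

# Example: ask the Python code model to complete a function definition.
body = build_inference_payload("def fibonacci(n):")
```

You would then send `body` as the POST data (with a `Content-Type: application/json` header) to the model's inference URL, authenticated with your Banana API key.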

For more info, check out the Banana.dev docs.