The purpose of this cookbook is to provide a starting template for building DSPy applications on Databricks. It covers how to get started using DSPy with Databricks features such as Vector Search, MLflow, the Databricks Agent Framework, and Mosaic AI Agent Evaluation.
It also provides a guide to understanding the value of DSPy when building GenAI applications on Databricks.
As a teaching example, we build a simple RAG agent; however, DSPy really excels at more complex and nuanced tasks, so we encourage you to use these notebooks as a starting point for more advanced projects.
- Generate a personal access token (PAT) and store it as a Databricks secret.
- Create a compute instance to run the notebooks. This example is not compute intensive, so a general-purpose instance with 2 cores (e.g. m5d.large) is sufficient. If you are a more advanced user planning to run multi-threaded optimization, size the compute so that the number of threads equals the number of cores.
- Populate the `config.yaml` file with values for all the given fields. These values are referenced throughout the notebooks.
- Navigate to the `setup` folder and run the first 3 notebooks.
- Once the volume is created, upload the files in the `data` folder to the volume.
- Run notebooks 04 and 05 in the `setup` folder.
- Go through the `01_dspy_without_opt_rag_agent` notebook.
- Go through the `02_dspy_with_opt_rag_agent` notebook.
- Go through the `03_register_and_deploy` notebook.
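The PAT stored as a Databricks secret in the first step has to be read back at runtime. The following is a minimal sketch, assuming the secret lives under a scope named `dspy_cookbook` and a key named `databricks_pat` (both hypothetical; use the names you chose, for example when running `databricks secrets create-scope` and `databricks secrets put-secret` with the Databricks CLI). Inside a notebook you would pass `dbutils.secrets.get`; outside Databricks the sketch falls back to the `DATABRICKS_TOKEN` environment variable.

```python
import os
from typing import Callable, Optional

# Hypothetical scope/key names: replace with the ones used when creating the secret.
SECRET_SCOPE = "dspy_cookbook"
SECRET_KEY = "databricks_pat"


def resolve_token(
    secret_getter: Optional[Callable[..., str]] = None,
    scope: str = SECRET_SCOPE,
    key: str = SECRET_KEY,
) -> str:
    """Return the PAT from a secret getter, falling back to DATABRICKS_TOKEN."""
    if secret_getter is not None:
        # In a Databricks notebook, pass dbutils.secrets.get (takes scope= and key=).
        return secret_getter(scope=scope, key=key)
    token = os.environ.get("DATABRICKS_TOKEN")
    if not token:
        raise RuntimeError("No secret getter supplied and DATABRICKS_TOKEN is not set")
    return token
```

In a notebook this would be called as `token = resolve_token(dbutils.secrets.get)`. Injecting the getter keeps the helper testable outside a cluster, where `dbutils` does not exist.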
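The compute-sizing guidance (number of threads equals number of cores) can be applied programmatically. This is a sketch under the assumption that you pass the result to a DSPy component that accepts a `num_threads` argument, such as `dspy.Evaluate`; the helper name `recommended_num_threads` is hypothetical.

```python
import os
from typing import Optional


def recommended_num_threads(cores: Optional[int] = None) -> int:
    """Pick a thread count equal to the number of available cores (at least 1)."""
    if cores is None:
        # os.cpu_count() can return None on some platforms, so default to 1.
        cores = os.cpu_count() or 1
    return max(1, cores)


# On an m5d.large (2 vCPUs) this evaluates to 2.
num_threads = recommended_num_threads()
```

Matching threads to cores avoids oversubscribing a small instance; if most of the work is waiting on remote model endpoints, you may find a higher value acceptable, but that is a tuning decision outside this cookbook's guidance.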
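Because every notebook reads from `config.yaml`, it can help to check that no field was left blank before running the setup notebooks. A minimal sketch follows, assuming the file has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the field names in `example_config` are hypothetical and not the cookbook's actual schema.

```python
from typing import Any, Dict, List


def find_missing_fields(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of fields whose value is None or a blank string."""
    missing: List[str] = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested YAML sections.
            missing.extend(find_missing_fields(value, prefix=f"{path}."))
        elif value is None or (isinstance(value, str) and not value.strip()):
            missing.append(path)
    return missing


# Hypothetical fields for illustration only.
example_config = {"catalog": "main", "schema": "", "vector_search": {"endpoint": None}}
print(find_missing_fields(example_config))  # ['schema', 'vector_search.endpoint']
```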
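Unity Catalog volumes are addressable as `/Volumes/<catalog>/<schema>/<volume>`, so the upload step can also be done from a notebook instead of the UI. This sketch assumes the repository is cloned into the workspace (so the `data` folder is on the local filesystem) and that the volume already exists; the catalog, schema, and volume names in the usage line are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def volume_path(catalog: str, schema: str, volume: str) -> Path:
    """Build the /Volumes/<catalog>/<schema>/<volume> path for a Unity Catalog volume."""
    return Path("/Volumes") / catalog / schema / volume


def copy_data_files(src_dir: Path, dest_dir: Path) -> List[Path]:
    """Copy every regular file in src_dir into an existing dest_dir; return new paths."""
    copied: List[Path] = []
    for src in sorted(Path(src_dir).iterdir()):
        if src.is_file():  # skip subdirectories
            dest = Path(dest_dir) / src.name
            # copyfile copies contents only, avoiding metadata writes on the volume.
            shutil.copyfile(src, dest)
            copied.append(dest)
    return copied
```

Usage, with hypothetical names: `copy_data_files(Path("data"), volume_path("main", "dspy", "docs"))`.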