# output-parser-databricks

This notebook calls Databricks foundation models to generate structured JSON data in response to a question.

In [0]:
%pip install --upgrade langchain mlflow

In [0]:
dbutils.library.restartPython()

In [0]:
import os
os.environ["DATABRICKS_HOST"] = "https://e2-demo-west.cloud.databricks.com" # set to your server URI
os.environ["DATABRICKS_TOKEN"] = dbutils.secrets.get('vbalasu', 'e2-demo-west-token')

In [0]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
class NutritionFacts(BaseModel):
  calories: float = Field(description='Total calories')
  fat: float = Field(description='Total fat in grams')
  carbohydrates: float = Field(description='Total carbohydrates in grams')
  sodium: float = Field(description='Total sodium in milligrams')
parser = JsonOutputParser(pydantic_object=NutritionFacts)

In [0]:
format_instructions = parser.get_format_instructions()
format_instructions

In [0]:
from langchain_community.llms.databricks import Databricks
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import LLMChain

In [0]:
import os
os.environ.get('DATABRICKS_HOST')

In [0]:
llm = Databricks(endpoint_name='databricks-mpt-30b-instruct', extra_params={'temperature': 0.0})

In [0]:
prompt = PromptTemplate.from_template(template="""What are the nutrition facts for a single {fruit}? You must format your output as a JSON object.\n{format_instructions}""",
                                     partial_variables={'format_instructions': format_instructions})
prompt

In [0]:
# The standard JsonOutputParser fails on MPT-30B output because the model emits extra text around the JSON.
# Hence we import our own "extract_json" function to pull the JSON out of the raw response.
from extract_json import extract_json

In [0]:
# output_parser=parser is deliberately left out here; extract_json is applied to the raw output instead
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
# chain = prompt | llm
chain

In [0]:
avocado_nutrition_facts = extract_json(chain.run('avocado'))
avocado_nutrition_facts

In [0]:
apple_nutrition_facts = extract_json(chain.run('apple'))
apple_nutrition_facts

In [0]:
avocado_nutrition_facts, apple_nutrition_facts

##### Observation
The MPT 30b Instruct model returns the expected information, but it generates extra text in addition to the JSON. This extra text breaks the standard output parser, so a custom function was required to extract the JSON from the response.
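
The `extract_json` module imported above is not shown in this notebook. As a rough illustration only, a helper of this kind could scan the raw completion for the first substring that decodes as a JSON object. The body below is an assumption about how such a function might work, not the actual implementation used here.

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object embedded in `text`.

    Hypothetical stand-in for the notebook's extract_json helper;
    the real module may behave differently.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


# Placeholder text, not real model output
raw = 'Sure, here you go:\n{"calories": 1.0, "fat": 2.0}\nLet me know if you need more.'
print(extract_json(raw))
```

Scanning from each opening brace keeps the helper tolerant of both leading and trailing chatter, at the cost of silently ignoring any additional JSON objects later in the response.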

### Use a chat model - Llama2 70b Chat

In [0]:
from langchain_community.chat_models.databricks import ChatDatabricks
chat = ChatDatabricks(endpoint='databricks-llama-2-70b-chat', extra_params={'temperature': 0.0})

In [0]:
chain = LLMChain(llm=chat, prompt=prompt, output_parser=parser, verbose=True)
chain

In [0]:
avocado_nutrition_facts = chain.run('avocado')

In [0]:
avocado_nutrition_facts

##### Observation
The Llama2 70b Chat model produced valid JSON, but the JSON did not contain the expected nutrition information.
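
Because `JsonOutputParser` only guarantees that the output parses as JSON, a result like this can pass through unnoticed. One way to catch it is to check the parsed dictionary against the fields declared in `NutritionFacts`. The sketch below uses only the standard library; the function name `find_problems` and the sample values are hypothetical.

```python
from numbers import Real

# Field names taken from the NutritionFacts model defined earlier
EXPECTED_FIELDS = ("calories", "fat", "carbohydrates", "sodium")


def find_problems(facts: dict) -> list:
    """List the ways `facts` deviates from the NutritionFacts fields."""
    problems = []
    for name in EXPECTED_FIELDS:
        if name not in facts:
            problems.append(f"missing field: {name}")
        elif isinstance(facts[name], bool) or not isinstance(facts[name], Real):
            # bool is excluded explicitly because it is a subclass of int
            problems.append(f"non-numeric value for {name}: {facts[name]!r}")
    return problems


# Placeholder values, not real nutrition data
complete = {"calories": 1.0, "fat": 2.0, "carbohydrates": 3.0, "sodium": 4.0}
print(find_problems(complete))
print(find_problems({"calories": "unknown"}))
```

An empty list means every expected field is present and numeric; anything else signals that the model returned well-formed JSON with the wrong content.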