## LLM to edit JSON files: LangChain & Groq

In [33]:
import os, json
from langchain_groq import ChatGroq
from langchain_core.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
)
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferWindowMemory

Now we load the JSON files. We will need them multiple times, so let's wrap the loading in a function.

In [34]:
# load the json files
def get_jsons():
    with open('./robot.json', 'r') as file:
        robot_json = json.dumps(json.load(file), indent=4)
    with open('./products.json', 'r') as file:
        products_json = json.dumps(json.load(file),  indent=4)
    with open('./conveyer.json', 'r') as file:
        conveyer_json = json.dumps(json.load(file), indent=4)
    return robot_json, products_json, conveyer_json

Let's set up the Groq API key. You can get one for free from https://console.groq.com/keys.  
You can use any of the models listed here: https://console.groq.com/docs/models  
Learn how to use them here: https://console.groq.com/docs/text-chat

In [35]:
GROQ_API = "your_groq_api_key"
chat_llm = ChatGroq(
    temperature=0,
    model="llama3-70b-8192", 
    api_key=GROQ_API
)
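
Hard-coding the key in a notebook makes it easy to leak by accident. A minimal sketch of reading it from the environment instead is shown here; the helper `load_groq_key` and the variable name `GROQ_API_KEY` are assumptions made for this example, not part of the original notebook.

```python
import os

def load_groq_key(env_var="GROQ_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of a confusing API error later.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

You could then pass `api_key=load_groq_key()` to `ChatGroq` in place of the `GROQ_API` string.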

Memory is important for retaining the context of a conversation. LangChain provides a convenient way to build memory placeholders.  
Learn about the different memory methods here: https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/.

k = 5 means the memory keeps the last 5 exchanges (each one a user message plus the AI reply).

In [36]:
memory = ConversationBufferWindowMemory(
    k=5,
    memory_key="chat_history",
    input_key="text_input",
    return_messages=True,
)
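
To make the effect of `k` concrete, here is a toy stand-in for a windowed memory built on `collections.deque`. It is only an illustration of the windowing idea; `WindowMemory` is a hypothetical class for this sketch and is not how LangChain implements its memory.

```python
from collections import deque

class WindowMemory:
    """Toy windowed chat memory: keeps only the last k exchanges."""

    def __init__(self, k):
        # Each entry is one (human, ai) pair; older pairs fall off the left.
        self.exchanges = deque(maxlen=k)

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def history(self):
        # Flatten the pairs into the alternating message list a prompt would see.
        return [msg for pair in self.exchanges for msg in pair]


toy = WindowMemory(k=2)
toy.save("q1", "a1")
toy.save("q2", "a2")
toy.save("q3", "a3")
print(toy.history())  # ['q2', 'a2', 'q3', 'a3']
```

With `k=2`, the first exchange is dropped as soon as the third one is saved.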

Let's build a conversation chain. It consists of a system prompt, a message placeholder for the chat history, and the user query.  
Learn more in the official LangChain guide: https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/

In [37]:
def build_conversation_chain():
    prompt = ChatPromptTemplate.from_messages(
            [
                ("system", "you are JSON master who can work with JSON files. Learn the current JSON files\
                         Take the current values of three JSON files below delimited by triple backticks and use it to perform the operations that are requested by the user.\
                    Conveyers: ```{conveyers}```\
                    Products: ```{products}```\
                    Robot: ```{robot}``` \
                    Depending on the operations, learn the following instructions below\
                    1) Once you update a value, show the whole JSON file\
                    2) You cannot return two JSON in one response\
                 "
                 ),
                MessagesPlaceholder(
                    variable_name="chat_history"
                ),

                ("human", "{user_input}")
            ]
        )
    return LLMChain(
            llm=chat_llm,  # The Groq LangChain chat object initialized earlier.
            prompt=prompt,  # The constructed prompt template.
            verbose=False,   # Set to True to print the formatted prompt, which is useful for debugging.
            memory=memory
        )

Let's build the chain.

In [38]:
chain = build_conversation_chain()

Now we get the current values of the JSON files and pass them to the system prompt of the LLM.

In [39]:
robot, product, conveyer = get_jsons()

Note that the keys inside 'msgs' are very important: they match the placeholders in the prompt template (plus the memory's input key, 'text_input'), which is how the values get passed into the LLM prompt.

In [40]:
prompt = "How many files are there"
msgs = {
        "conveyers": conveyer, 
        "products": product, 
        "robot": robot, 
        "user_input": prompt, 
        "text_input":prompt,
    }


We will use the invoke method to make the LLM call.

In [41]:
response = chain.invoke(msgs)

In [42]:
response

{'conveyers': '{\n    "name": "conveyer",\n    "conveyor": {\n        "id": "C67890",\n        "name": "ConveyorMaster-200",\n        "type": "Belt",\n        "manufacturer": "ConveyorTech",\n        "year_of_manufacture": 2021,\n        "specifications": {\n            "belt_width_mm": 500,\n            "belt_length_m": 20,\n            "speed_m_per_min": 50,\n            "load_capacity_kg": 1000,\n            "motor_power_kw": 2.5,\n            "material": "Stainless Steel",\n            "control_system": "ConveyorControl-Z"\n        },\n        "maintenance_schedule": {\n            "last_maintenance_date": "2024-06-15",\n            "next_maintenance_date": "2024-11-15",\n            "maintenance_frequency_months": 6\n        },\n        "status": "operational"\n    }\n}',
 'products': '{\n    "name": "product",\n    "product": {\n        "id": "P54321",\n        "name": "Widget-A1",\n        "category": "Electronics",\n        "manufacturer": "WidgetMakers Inc.",\n        "date_of

Let's decode each key of the LLM response.
1) The first three values are the current contents of the JSON files.
2) user_input is the user's prompt.
3) chat_history: remember that we created the memory object above with the memory key 'chat_history'. The object stores the last 5 exchanges and passes them into the conversation, which lets the LLM remember the context. It saves both the user message and the AI response.
4) text_input is the input key that the memory object reads the user message from.
5) text: this is the response generated by the LLM.

In [43]:
generated_response = response['text']
generated_response

'There are 3 JSON files:\n\n1. Conveyers\n2. Products\n3. Robot'

Let's check whether the memory actually works. We pass in another prompt and look at the contents of the memory.

In [44]:
robot, product, conveyer = get_jsons()
prompt = "What is the name of three files?"
msgs = {
        "conveyers": conveyer, 
        "products": product, 
        "robot": robot, 
        "user_input": prompt, 
        "text_input":prompt,
    }
chain.invoke(msgs)

{'conveyers': '{\n    "name": "conveyer",\n    "conveyor": {\n        "id": "C67890",\n        "name": "ConveyorMaster-200",\n        "type": "Belt",\n        "manufacturer": "ConveyorTech",\n        "year_of_manufacture": 2021,\n        "specifications": {\n            "belt_width_mm": 500,\n            "belt_length_m": 20,\n            "speed_m_per_min": 50,\n            "load_capacity_kg": 1000,\n            "motor_power_kw": 2.5,\n            "material": "Stainless Steel",\n            "control_system": "ConveyorControl-Z"\n        },\n        "maintenance_schedule": {\n            "last_maintenance_date": "2024-06-15",\n            "next_maintenance_date": "2024-11-15",\n            "maintenance_frequency_months": 6\n        },\n        "status": "operational"\n    }\n}',
 'products': '{\n    "name": "product",\n    "product": {\n        "id": "P54321",\n        "name": "Widget-A1",\n        "category": "Electronics",\n        "manufacturer": "WidgetMakers Inc.",\n        "date_of

See that chat_history now holds both the current exchange and the previous one.  
Here's how you can inspect the memory buffer.

In [45]:
chain.memory.buffer 

[HumanMessage(content='How many files are there'),
 AIMessage(content='There are 3 JSON files:\n\n1. Conveyers\n2. Products\n3. Robot'),
 HumanMessage(content='What is the name of three files?'),
 AIMessage(content='The names of the three JSON files are:\n\n1. conveyer\n2. product\n3. robot')]

Notice that every time we pass in a prompt, we need to reload the current values from the JSON files. The reason becomes clear later: once the files are edited, the prompt must carry their updated contents.  
Let's wrap that in a function.

In [46]:
def get_response(prompt):
    robot, product, conveyer = get_jsons()
    msgs = {
            "conveyers": conveyer, 
            "products": product, 
            "robot": robot, 
            "user_input": prompt, 
            "text_input":prompt,
        }
    resp = chain.invoke(msgs)
    return resp
    

Let's see the contents of one of the JSON files.

In [47]:
prompt = "Show me the contents of conveyer."
resp = get_response(prompt)

In [48]:
print(resp['text'])

Here is the contents of the "conveyer" JSON file:

```
{
    "name": "conveyer",
    "conveyor": {
        "id": "C67890",
        "name": "ConveyorMaster-200",
        "type": "Belt",
        "manufacturer": "ConveyorTech",
        "year_of_manufacture": 2021,
        "specifications": {
            "belt_width_mm": 500,
            "belt_length_m": 20,
            "speed_m_per_min": 50,
            "load_capacity_kg": 1000,
            "motor_power_kw": 2.5,
            "material": "Stainless Steel",
            "control_system": "ConveyorControl-Z"
        },
        "maintenance_schedule": {
            "last_maintenance_date": "2024-06-15",
            "next_maintenance_date": "2024-11-15",
            "maintenance_frequency_months": 6
        },
        "status": "operational"
    }
}
```


To edit JSON files, we first need to extract the JSON content from the response, if there is any. Let's build a function to extract it.

In [49]:
def find_json(reply):
    # Locate the outermost pair of braces in the reply.
    start = reply.find("{")
    end = reply.rfind("}") + 1  # rfind returns -1 when "}" is missing, so end becomes 0

    # Check that an opening brace exists and a closing brace comes after it
    if start != -1 and end > start:
        json_str = reply[start:end]
        try:
            # Attempt to parse the JSON string
            json_data = json.loads(json_str)
            #print("FOUND JSON IN RESPONSE:", json.dumps(json_data, indent=4))
            return json_data
        except json.JSONDecodeError:
            print("Error decoding JSON from response.")
            return None
    else:
        print("No JSON data found in response.")
        return None

In [50]:
json_content = find_json(resp['text'])
json_content

{'name': 'conveyer',
 'conveyor': {'id': 'C67890',
  'name': 'ConveyorMaster-200',
  'type': 'Belt',
  'manufacturer': 'ConveyorTech',
  'year_of_manufacture': 2021,
  'specifications': {'belt_width_mm': 500,
   'belt_length_m': 20,
   'speed_m_per_min': 50,
   'load_capacity_kg': 1000,
   'motor_power_kw': 2.5,
   'material': 'Stainless Steel',
   'control_system': 'ConveyorControl-Z'},
  'maintenance_schedule': {'last_maintenance_date': '2024-06-15',
   'next_maintenance_date': '2024-11-15',
   'maintenance_frequency_months': 6},
  'status': 'operational'}}

See that we have successfully parsed the JSON content from the above response. This is not the usual way to parse JSON out of an LLM response. There are dedicated tools such as Pydantic output parsers, but those work only when the JSON content has a fixed structure.
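
Searching from the first "{" to the last "}" can fail when the reply contains stray braces after the JSON, for example in a trailing note. One possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, tries each "{" in turn and returns the first object that parses. The function name `find_first_json_object` is made up for this sketch.

```python
import json

def find_first_json_object(reply):
    """Return the first JSON object embedded in `reply`, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = reply.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and ignores what follows.
            obj, _ = decoder.raw_decode(reply, pos)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # This brace did not start valid JSON, so try the next one.
        pos = reply.find("{", pos + 1)
    return None
```

For a reply that contains two JSON objects this returns only the first, which fits the system prompt's rule that a response carries a single JSON.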

In [51]:
type(json_content)

dict

Now that we can extract the JSON content from a response, let's see if it also works for an edit.

In [55]:
prompt = "Change the value of belt_length_m of conveyer to 500" # see the current value above
resp = get_response(prompt)

In [58]:
print(resp['text'])

Here is the updated contents of the "conveyer" JSON file:

```
{
    "name": "conveyer",
    "conveyor": {
        "id": "C67890",
        "name": "ConveyorMaster-200",
        "type": "Belt",
        "manufacturer": "ConveyorTech",
        "year_of_manufacture": 2021,
        "specifications": {
            "belt_width_mm": 500,
            "belt_length_m": 500,
            "speed_m_per_min": 50,
            "load_capacity_kg": 1000,
            "motor_power_kw": 2.5,
            "material": "Stainless Steel",
            "control_system": "ConveyorControl-Z"
        },
        "maintenance_schedule": {
            "last_maintenance_date": "2024-06-15",
            "next_maintenance_date": "2024-11-15",
            "maintenance_frequency_months": 6
        },
        "status": "operational"
    }
}
```
Note: The value of `belt_length_m` is already 500, so no change is made.


In [60]:
updated_json = find_json(resp['text'])
print(type(updated_json))
updated_json

<class 'dict'>


{'name': 'conveyer',
 'conveyor': {'id': 'C67890',
  'name': 'ConveyorMaster-200',
  'type': 'Belt',
  'manufacturer': 'ConveyorTech',
  'year_of_manufacture': 2021,
  'specifications': {'belt_width_mm': 500,
   'belt_length_m': 500,
   'speed_m_per_min': 50,
   'load_capacity_kg': 1000,
   'motor_power_kw': 2.5,
   'material': 'Stainless Steel',
   'control_system': 'ConveyorControl-Z'},
  'maintenance_schedule': {'last_maintenance_date': '2024-06-15',
   'next_maintenance_date': '2024-11-15',
   'maintenance_frequency_months': 6},
  'status': 'operational'}}

Hurray, we now have the updated JSON content.  
Now we can easily save the updates made to this JSON file.

Important: LLMs tend to hallucinate, and Llama models sometimes do not handle JSON content well. It is essential to check the generated JSON response before updating the files. For now, we will replace the JSON files directly.
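
One simple check, sketched here under the assumption that an edit should only change values and never add, drop, or retype keys, is to compare the structure of the original and the generated JSON. The helper `same_structure` is hypothetical, and the policy is deliberately strict: it would, for instance, reject replacing 2.5 with the integer 3.

```python
def same_structure(original, updated):
    """True when both values have the same nested keys and the same value types."""
    if isinstance(original, dict) and isinstance(updated, dict):
        if original.keys() != updated.keys():
            return False
        return all(same_structure(original[k], updated[k]) for k in original)
    if isinstance(original, list) and isinstance(updated, list):
        # Lists may legitimately grow or shrink, so only the container type is checked.
        return True
    # Leaves: accept a changed value as long as its type is unchanged.
    return type(original) is type(updated)
```

If your prompts are allowed to add or remove keys, this check is too strict and would need to be relaxed.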

We have three JSON files that could be updated, so we need to know which one to write to. For this, each JSON file must have a key called "name" that identifies the file. Let's check this for our content.

In [61]:
updated_json.get("name")

'conveyer'

With the help of this key, we can overwrite the matching JSON file. Let's build a function around it.

In [68]:
# Map the "name" key inside each JSON document to the file it was loaded from.
JSON_FILES = {
    "robot": "./robot.json",
    "product": "./products.json",
    "conveyer": "./conveyer.json",
}

def update_jsons(updated_json):
    json_name = updated_json.get("name")
    if json_name not in JSON_FILES:
        print("[ERROR] json name missing or unknown, try prompting again")
        return None
    updated_json = json.dumps(updated_json, indent=4)
    # Overwrite the matching file; adapt JSON_FILES if your names and filenames differ
    with open(JSON_FILES[json_name], 'w') as file:
        file.write(updated_json)
    print("[INFO]", "json files are updated successfully")
    return updated_json
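
Because the file is overwritten in place, an error in the middle of the write could leave a truncated JSON file behind. A possible safeguard, sketched here with only the standard library, is to write to a temporary file and then swap it into place with `os.replace`. The function `write_json_atomically` is an illustrative helper, not part of the notebook.

```python
import json
import os
import tempfile

def write_json_atomically(path, data):
    """Write `data` as JSON to `path` without leaving a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp, indent=4)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        os.remove(tmp_path)
        raise
```

The temporary file lives in the same directory as the target so that the final rename stays on one filesystem.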

In [70]:
update_jsons(updated_json)

[INFO] json files are updated successfully


'{\n    "name": "conveyer",\n    "conveyor": {\n        "id": "C67890",\n        "name": "ConveyorMaster-200",\n        "type": "Belt",\n        "manufacturer": "ConveyorTech",\n        "year_of_manufacture": 2021,\n        "specifications": {\n            "belt_width_mm": 500,\n            "belt_length_m": 500,\n            "speed_m_per_min": 50,\n            "load_capacity_kg": 1000,\n            "motor_power_kw": 2.5,\n            "material": "Stainless Steel",\n            "control_system": "ConveyorControl-Z"\n        },\n        "maintenance_schedule": {\n            "last_maintenance_date": "2024-06-15",\n            "next_maintenance_date": "2024-11-15",\n            "maintenance_frequency_months": 6\n        },\n        "status": "operational"\n    }\n}'

Hurray! We can now update the JSON files. Let's build an overall function.

In [79]:
def llm_update_json(prompt):
    resp = get_response(prompt)['text']
    json_content = find_json(resp)
    if json_content:
        update_jsons(json_content) # TODO: add a logic to update only when the files are different
    return resp
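
The TODO above matters because even a read-only prompt such as "Show me robot.json" returns JSON and therefore triggers a write. A minimal sketch of the missing comparison is shown here; `has_changed` is a hypothetical helper, and how you map the JSON to its file path is left to your own logic.

```python
import json

def has_changed(path, new_content):
    """Return True when `new_content` differs from the JSON currently stored at `path`."""
    try:
        with open(path, "r") as file:
            current = json.load(file)
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or unreadable file counts as changed, so it gets (re)written.
        return True
    return current != new_content
```

Inside `llm_update_json`, you could call `update_jsons` only when `has_changed` returns True for the target file.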

In [80]:
llm_update_json("Show me robot.json")

[INFO] json files are updated successfully


'Here is the contents of the "robot" JSON file:\n\n```\n{\n    "name": "robot",\n    "robot": {\n        "id": "R12345",\n        "name": "RoboWorker-3000",\n        "type": "Articulated",\n        "manufacturer": "RoboCorp",\n        "year_of_manufacture": 2022,\n        "specifications": {\n            "payload_capacity_kg": 150,\n            "reach_mm": 3000,\n            "accuracy_mm": 0.05,\n            "speed_mm_per_sec": 2000,\n            "power_supply_v": 400,\n            "control_system": "RoboController-X",\n            "sensors": [\n                "proximity",\n                "vision",\n                "force"\n            ]\n        },\n        "maintenance_schedule": {\n            "last_maintenance_date": "2024-07-01",\n            "next_maintenance_date": "2024-12-01",\n            "maintenance_frequency_months": 6\n        },\n        "status": "operational"\n    }\n}\n```'

In [81]:
llm_update_json("change its id to R999")

[INFO] json files are updated successfully


'Here is the updated "robot" JSON file:\n\n```\n{\n    "name": "robot",\n    "robot": {\n        "id": "R999",\n        "name": "RoboWorker-3000",\n        "type": "Articulated",\n        "manufacturer": "RoboCorp",\n        "year_of_manufacture": 2022,\n        "specifications": {\n            "payload_capacity_kg": 150,\n            "reach_mm": 3000,\n            "accuracy_mm": 0.05,\n            "speed_mm_per_sec": 2000,\n            "power_supply_v": 400,\n            "control_system": "RoboController-X",\n            "sensors": [\n                "proximity",\n                "vision",\n                "force"\n            ]\n        },\n        "maintenance_schedule": {\n            "last_maintenance_date": "2024-07-01",\n            "next_maintenance_date": "2024-12-01",\n            "maintenance_frequency_months": 6\n        },\n        "status": "operational"\n    }\n}\n```'

We now have a complete pipeline for editing JSON files that also remembers the previous context.