# JSON Loader

Let's look at how to load files with the `.json` extension using LangChain's `JSONLoader`.

- Author: [leebeanbin](https://github.com/leebeanbin)
- Peer Review: [Teddy Lee](https://github.com/teddylee777)
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/tree/main/06-DocumentLoader)


## Overview
This tutorial demonstrates how to use LangChain's JSONLoader to load and process JSON files. We'll explore how to extract specific data from structured JSON files using jq-style queries.

### Table of Contents
- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Generate JSON Data](#generate-json-data)
- [JSONLoader](#jsonloader)
  

### References

- https://python.langchain.com/docs/how_to/document_loader_json/

## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**
- `langchain-opentutorial` is a package that provides easy-to-use environment setup tools, along with useful functions and utilities for tutorials.
- You can check out [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

```python
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
        "jq",
    ],
    verbose=False,
    upgrade=False,
)
```


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


```python
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "JSON-Loader",
    }
)
```

```
Environment variables have been set successfully.
```


Alternatively, you can set `OPENAI_API_KEY` in a `.env` file and load it.

[Note] This is not necessary if you've already set `OPENAI_API_KEY` in previous steps.

```python
from dotenv import load_dotenv

# Load variables from .env; override=True replaces values that are already set
load_dotenv(override=True)
```

```
True
```

## Generate JSON Data

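If you want to generate sample JSON data, you can use the following code. It asks an OpenAI chat model (`gpt-4o-mini`) in JSON mode to produce records for five fictional people and saves them to `data/people.json`.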


```python
import json
from pathlib import Path
from pprint import pprint

from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load API keys from the .env file (harmless if they are already set)
load_dotenv()

# Initialize ChatOpenAI in JSON mode so the model replies with a JSON object.
# JSON mode returns an object, not a bare array, so the prompt asks for a "people" key;
# the JSONLoader examples later rely on that key.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    model_kwargs={"response_format": {"type": "json_object"}},
)

# Create the prompt template (no input variables are needed)
prompt = PromptTemplate(
    input_variables=[],
    template="""Generate a JSON object with a single key "people" whose value is an array containing detailed personal information for 5 people.
        Include various fields like name, age, contact details, address, personal preferences, and any other interesting information you think would be relevant.""",
)

# Compose the prompt and model with the LCEL pipe syntax, then invoke the chain
response = (prompt | llm).invoke({})
generated_data = json.loads(response.content)

# Save to data/people.json in the current working directory
current_dir = Path().absolute()
data_dir = current_dir / "data"
data_dir.mkdir(exist_ok=True)

file_path = data_dir / "people.json"
with open(file_path, "w", encoding="utf-8") as f:
    json.dump(generated_data, f, ensure_ascii=False, indent=2)

print("Generated and saved JSON data:")
pprint(generated_data)
```

```
Generated and saved JSON data:
{'people': [{'address': {'city': 'Springfield',
                         'country': 'USA',
                         'state': 'IL',
                         'street': '123 Maple St',
                         'zip': '62701'},
             'age': 28,
             'contact': {'email': 'alice.johnson@example.com',
                         'phone': '+1234567890'},
             'interesting_info': {'pet': {'name': 'Buddy', 'type': 'dog'},
                                  'travel_history': ['France',
                                                     'Japan',
                                                     'Brazil']},
             'name': {'first': 'Alice', 'last': 'Johnson'},
             'personal_preferences': {'favorite_color': 'blue',
                                      'favorite_food': 'Italian',
                                      'hobbies': ['reading',
                                                  'hiking',
```
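
If you don't have an OpenAI API key, one minimal offline alternative is to write a small sample file by hand. The sketch below assumes the same `{"people": [...]}` shape shown above. The two records and the `sample_path` name are hypothetical placeholders. The file is only written if `data/people.json` does not already exist, so it will not overwrite generated data.

```python
import json
from pathlib import Path

# Hypothetical placeholder records that mimic the structure of the generated data
sample_data = {
    "people": [
        {
            "name": {"first": "Jane", "last": "Doe"},
            "age": 30,
            "contact": {"email": "jane.doe@example.com", "phone": "+1000000001"},
            "personal_preferences": {"hobbies": ["chess", "running"]},
        },
        {
            "name": {"first": "John", "last": "Doe"},
            "age": 35,
            "contact": {"email": "john.doe@example.com", "phone": "+1000000002"},
            "personal_preferences": {"hobbies": ["golf"]},
        },
    ]
}

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
sample_path = data_dir / "people.json"

# Only write the sample file if no data file exists yet
if not sample_path.exists():
    sample_path.write_text(
        json.dumps(sample_data, ensure_ascii=False, indent=2), encoding="utf-8"
    )

print(f"Using data file: {sample_path}")
```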
                                

To load your own JSON data (or the file saved above) directly, you can use Python's built-in `json` module:

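```python
import json
from pathlib import Path
from pprint import pprint

# Read the file as UTF-8 (it was written with ensure_ascii=False) and parse it
file_path = "data/people.json"
data = json.loads(Path(file_path).read_text(encoding="utf-8"))

pprint(data)
```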

```
{'people': [{'address': {'city': 'Springfield',
                         'country': 'USA',
                         'state': 'IL',
                         'street': '123 Maple St',
                         'zip': '62701'},
             'age': 28,
             'contact': {'email': 'alice.johnson@example.com',
                         'phone': '+1234567890'},
             'interesting_info': {'pet': {'name': 'Buddy', 'type': 'dog'},
                                  'travel_history': ['France',
                                                     'Japan',
                                                     'Brazil']},
             'name': {'first': 'Alice', 'last': 'Johnson'},
             'personal_preferences': {'favorite_color': 'blue',
                                      'favorite_food': 'Italian',
                                      'hobbies': ['reading',
                                                  'hiking',
                                                  'cooking'],
```
 

```python
print(type(data))
```

```
<class 'dict'>
```
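
Because `data` is an ordinary `dict`, you can reproduce by hand roughly what `JSONLoader` does with `jq_schema=".people[]"` and `text_content=False`. Each element of the `people` list becomes one document, and its text is that element serialized with `json.dumps`. This is only an illustrative sketch. It uses a small hypothetical `sample` dict and plain `(seq_num, text)` tuples instead of LangChain `Document` objects, and it does not overwrite `data`.

```python
import json

# Hypothetical data with the same top-level shape as people.json
sample = {
    "people": [
        {"name": {"first": "Jane", "last": "Doe"}, "age": 30},
        {"name": {"first": "John", "last": "Doe"}, "age": 35},
    ]
}

# Roughly mirror `.people[]` with text_content=False: one serialized record per
# array element, numbered from 1 like the loader's `seq_num` metadata
records = [
    (seq_num, json.dumps(person))
    for seq_num, person in enumerate(sample["people"], start=1)
]

for seq_num, text in records:
    print(seq_num, text)
```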


## JSONLoader

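`JSONLoader` uses a jq schema to decide which parts of a JSON file become documents. For example, `jq_schema=".people[]"` turns each element of the `people` array into its own `Document`. Setting `text_content=False` lets the loader accept non-string values, such as these nested objects, by serializing them to JSON strings.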
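
You can check what a schema selects before passing it to `JSONLoader` by running it with the `jq` Python package, which `JSONLoader` itself uses and which is installed above. The minimal sketch below runs a few schemas on a small hypothetical dict. `jq.compile(...).input(...).all()` returns every value the program emits as a list.

```python
import jq

# Hypothetical data shaped like people.json
sample = {
    "people": [
        {"name": {"first": "Jane", "last": "Doe"},
         "personal_preferences": {"hobbies": ["chess"]}},
        {"name": {"first": "John", "last": "Doe"},
         "personal_preferences": {"hobbies": ["golf", "jazz"]}},
    ]
}

# `.people[]` emits each record in the array (one Document per record in JSONLoader)
people_records = jq.compile(".people[]").input(sample).all()

# Longer paths select a nested field from every record
first_names = jq.compile(".people[].name.first").input(sample).all()
hobbies = jq.compile(".people[].personal_preferences.hobbies").input(sample).all()

print(len(people_records))  # 2
print(first_names)          # ['Jane', 'John']
print(hobbies)              # [['chess'], ['golf', 'jazz']]
```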

```python
from pprint import pprint

from langchain_community.document_loaders import JSONLoader

# Create a JSONLoader that yields one Document per person
loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[]",  # Access each item in the people array
    text_content=False,  # Records are dicts, so serialize them to JSON strings
)

# Example: extract only the contact details of each person
# loader = JSONLoader(
#     file_path="data/people.json",
#     jq_schema=".people[].contact",
#     text_content=False,
# )

# Or extract only hobbies from personal_preferences
# loader = JSONLoader(
#     file_path="data/people.json",
#     jq_schema=".people[].personal_preferences.hobbies",
#     text_content=False,
# )

# Load documents
docs = loader.load()
pprint(docs)
```

[Document(metadata={'source': '/Users/teddy/Documents/GitHub/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": {"first": "Alice", "last": "Johnson"}, "age": 28, "contact": {"email": "alice.johnson@example.com", "phone": "+1234567890"}, "address": {"street": "123 Maple St", "city": "Springfield", "state": "IL", "zip": "62701", "country": "USA"}, "personal_preferences": {"hobbies": ["reading", "hiking", "cooking"], "favorite_food": "Italian", "favorite_color": "blue", "languages_spoken": ["English", "Spanish"]}, "interesting_info": {"pet": {"type": "dog", "name": "Buddy"}, "travel_history": ["France", "Japan", "Brazil"]}}'),
 Document(metadata={'source': '/Users/teddy/Documents/GitHub/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": {"first": "Michael", "last": "Smith"}, "age": 34, "contact": {"email": "michael.smith@example.com", "phone": "+9876543210"}, "address": {"street": "456 Oak Ave", 