# JSON

Let's look at how to load files with the `.json` extension using a loader.

- Author: [leebeanbin](https://github.com/leebeanbin)
- Design:
- Peer Review : [syshin0116](https://github.com/syshin0116), [Teddy Lee](https://github.com/teddylee777)
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/tree/main/06-DocumentLoader)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/06-DocumentLoader/10-JSON-Loader.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/06-DocumentLoader/10-JSON-Loader.ipynb)

## Environment Setup

Setting up your environment is the first step. See the [Environment Setup](https://wikidocs.net/257836) guide for more details.

**[Note]**
- `langchain-opentutorial` is a package that bundles easy-to-use environment setup helpers, functions, and utilities for the tutorials.
- Check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

## Overview
This tutorial demonstrates how to use LangChain's JSONLoader to load and process JSON files. We'll explore how to extract specific data from structured JSON files using jq-style queries.

### Table of Contents
- [Environment Setup](#environment-setup)
- [JSON](#json)
- [Overview](#overview)
- [Generate JSON Data](#generate-json-data)
- [JSONLoader](#jsonloader)
  


### References
- https://python.langchain.com/docs/how_to/document_loader_json/

## Environment Setup

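If you want to generate a new JSON file with the code in this tutorial, set `OPENAI_API_KEY` in a `.env` file so it can be loaded at runtime. Note that `JSONLoader` depends on the `jq` Python package, which is installed below.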


In [1]:
%pip install langchain langchain_openai langchain_community jq python-dotenv


## Generate JSON Data

---

If you do not have a JSON file at hand, you can generate sample data with the following code.


In [33]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pathlib import Path
from dotenv import load_dotenv
from pprint import pprint
import json
import os

# Load .env file
load_dotenv()

# Initialize ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    model_kwargs={"response_format": {"type": "json_object"}}
)

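# Create prompt template
# JSON mode returns a JSON object, so ask for a top-level "people" key explicitly;
# the JSONLoader examples below rely on that key.
prompt = PromptTemplate(
    input_variables=[],
    template="""Generate a JSON object with a single top-level key "people" whose value is an array containing detailed personal information for 5 people.
        Include various fields like name, age, contact details, address, personal preferences, and any other interesting information you think would be relevant."""
)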

# Create and invoke runnable sequence using the new pipe syntax
response = (prompt | llm).invoke({})
generated_data = json.loads(response.content)

# Save to JSON file
current_dir = Path().absolute()
data_dir = current_dir / "data"
data_dir.mkdir(exist_ok=True)

file_path = data_dir / "people.json"
with open(file_path, "w", encoding="utf-8") as f:
    json.dump(generated_data, f, ensure_ascii=False, indent=2)

print("Generated and saved JSON data:")
pprint(generated_data)

Generated and saved JSON data:
{'people': [{'address': {'city': 'Springfield',
                         'country': 'USA',
                         'state': 'IL',
                         'street': '123 Maple St',
                         'zip': '62704'},
             'age': 28,
             'contact': {'email': 'alice.johnson@example.com',
                         'phone': '+1-555-0123',
                         'social_media': {'linkedin': 'linkedin.com/in/alicejohnson',
                                          'twitter': '@alice_j'}},
             'interesting_fact': 'Alice has traveled to over 15 countries and '
                                 'speaks 3 languages.',
             'name': {'first': 'Alice', 'last': 'Johnson'},
             'personal_preferences': {'favorite_food': 'Italian',
                                      'hobbies': ['Reading',
                                                  'Hiking',
                                                  'Cooking'],
           

To load your own JSON data with Python's standard library, read the file and parse it as follows.

In [34]:
import json
from pathlib import Path
from pprint import pprint


file_path = "data/people.json"
# The file was written as UTF-8 with ensure_ascii=False, so read it back as UTF-8
data = json.loads(Path(file_path).read_text(encoding="utf-8"))

pprint(data)

{'people': [{'address': {'city': 'Springfield',
                         'country': 'USA',
                         'state': 'IL',
                         'street': '123 Maple St',
                         'zip': '62704'},
             'age': 28,
             'contact': {'email': 'alice.johnson@example.com',
                         'phone': '+1-555-0123',
                         'social_media': {'linkedin': 'linkedin.com/in/alicejohnson',
                                          'twitter': '@alice_j'}},
             'interesting_fact': 'Alice has traveled to over 15 countries and '
                                 'speaks 3 languages.',
             'name': {'first': 'Alice', 'last': 'Johnson'},
             'personal_preferences': {'favorite_food': 'Italian',
                                      'hobbies': ['Reading',
                                                  'Hiking',
                                                  'Cooking'],
                                      'mus

In [35]:
print(type(data))

<class 'dict'>
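Because `data` is an ordinary `dict`, the jq expressions passed to `JSONLoader` in the next section can be read as plain Python lookups. The sketch below is only an illustration of that correspondence: the small `data` literal is a made-up stand-in for `data/people.json`, and the mapping is approximate (jq yields `null` for a missing key, whereas the Python lookups here raise `KeyError`).

```python
# A small stand-in for data/people.json (values are illustrative only).
data = {
    "people": [
        {
            "name": "Alice",
            "contact": {"email": "alice@example.com"},
            "personal_preferences": {"hobbies": ["reading", "yoga"]},
        },
        {
            "name": "Bob",
            "contact": {"email": "bob@example.com"},
            "personal_preferences": {"hobbies": ["hiking"]},
        },
    ]
}

# jq: .people[]  -> one result per element of the "people" array
people = list(data["people"])

# jq: .people[].contact  -> the "contact" object of each person
contacts = [person["contact"] for person in data["people"]]

# jq: .people[].personal_preferences.hobbies  -> each person's hobbies list
hobbies = [
    person["personal_preferences"]["hobbies"] for person in data["people"]
]

print(len(people))
print(contacts)
print(hobbies)
```

Each element produced by one of these expressions corresponds to one record that the loader would turn into a document.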


## JSONLoader

---

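`JSONLoader` uses a jq expression, passed as `jq_schema`, to select which parts of a JSON file become documents. For example, `.people[]` yields one `Document` per entry in the `people` array. Because each entry here is an object rather than a string, set `text_content=False` so the loader serializes it to a JSON string instead of raising an error.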

In [32]:
from langchain_community.document_loaders import JSONLoader

# Create JSONLoader
loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[]",  # Access each item in the people array
    text_content=False,
)

# Example: extract only the contact object of each person
# loader = JSONLoader(
#     file_path="data/people.json",
#     jq_schema=".people[].contact",
#     text_content=False,
# )

# Or extract only hobbies from personal_preferences
# loader = JSONLoader(
#     file_path="data/people.json",
#     jq_schema=".people[].personal_preferences.hobbies",
#     text_content=False,
# )

# Load documents
docs = loader.load()
pprint(docs)

[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Smith", "age": 32, "contact": {"email": "alice.smith@example.com", "phone": "555-123-4567"}, "address": {"street": "123 Main St", "city": "New York", "state": "NY", "zip": "10001"}, "personal_preferences": {"favorite_color": "blue", "hobbies": ["reading", "yoga"], "favorite_food": "sushi"}}'),
 Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "John Doe", "age": 45, "contact": {"email": "john.doe@example.com", "phone": "555-987-6543"}, "address": {"street": "456 Elm St", "city": "Los Angeles", "state": "CA", "zip": "90001"}, "personal_preferences": {"favorite_color": "green", "hobbies": ["hiking", "gardening"], "favorite_food": "pizza"}}'),
 Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial
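The output above suggests a simple mental model: each record selected by `jq_schema` becomes one document whose `page_content` is the record as a JSON string, with the absolute file path and a 1-based `seq_num` in its metadata. The following is a minimal pure-Python sketch of that behavior, not LangChain's actual implementation; `records_to_documents` is a hypothetical helper and plain dicts stand in for `Document` objects.

```python
import json
from pathlib import Path


def records_to_documents(file_path, records):
    """Approximate how selected records map to document-like dicts.

    Hypothetical helper for illustration; it only mirrors the shape of
    the output shown above.
    """
    source = str(Path(file_path).resolve())
    documents = []
    for seq_num, record in enumerate(records, start=1):
        # Non-string records are serialized to a JSON string, which is
        # the role text_content=False plays in the loader call above.
        if isinstance(record, str):
            page_content = record
        else:
            page_content = json.dumps(record)
        documents.append(
            {
                "page_content": page_content,
                "metadata": {"source": source, "seq_num": seq_num},
            }
        )
    return documents


# Illustrative records, standing in for the result of ".people[]"
docs = records_to_documents(
    "data/people.json",
    [{"name": "Alice"}, {"name": "Bob"}],
)
for doc in docs:
    print(doc["metadata"]["seq_num"], doc["page_content"])
```

Since `page_content` is a JSON string, calling `json.loads` on it recovers the original record when structured access is needed downstream.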