# JSON
JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).

JSON Lines is a file format where each line is a valid JSON value.
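As a minimal sketch of the JSON Lines idea, the snippet below parses a small in-memory string one line at a time using only the standard library. The sample content and the helper name `parse_json_lines` are hypothetical and are not part of the notebook's data.

```python
import json

# Hypothetical JSON Lines content: one complete JSON value per line.
jsonl_text = '{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n'


def parse_json_lines(text: str) -> list:
    """Parse every non-empty line as an independent JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(jsonl_text)
print(records)
```

Because each line is parsed on its own, a large file can be processed one record at a time instead of being loaded as a single JSON value.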

The JSONLoader uses a specified jq schema to parse JSON files. It relies on the jq Python package. See the jq manual for detailed documentation of the jq syntax.
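To get a feel for what a jq expression selects, the sketch below spells out plain-Python equivalents of three expressions on a hypothetical record. It does not call jq itself, and the record's values are invented for illustration; only the field names follow the queries used later in this notebook.

```python
# Hypothetical record; the values are made up for illustration.
record = {
    "uuid": "0000-example",
    "summary": "A short summary.",
    "proposition": ["First statement.", "Second statement."],
}

# jq `.summary` selects a single value, so there is one result.
summary_results = [record["summary"]]

# jq `.proposition[]` emits each array element as a separate result.
proposition_results = list(record["proposition"])

# jq `.[]` applied to an object emits each top-level value.
all_values = list(record.values())

print(len(summary_results), len(proposition_results), len(all_values))
```

The number of results matters because the loader is expected to build one document per result of the expression.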

In [None]:
from langchain_community.document_loaders import JSONLoader

In [None]:
import json
from pathlib import Path
from pprint import pprint


file_path = '../../../text_files/chapter_10_01.json'
data = json.loads(Path(file_path).read_text())

In [None]:
pprint(data)

## Using JSONLoader
Suppose we are interested in extracting the values under the summary, uuid, and proposition fields of the JSON data. This can be done with the JSONLoader, using one jq schema per field.

### JSON file

In [None]:
file_path = '../../../text_files/chapter_10_01.json'

# Load the value of the top-level "summary" field.
summary_loader = JSONLoader(
    file_path=file_path,
    jq_schema='.summary',
    # jq_schema='.proposition[]',
    text_content=False)

# Load the value of the top-level "uuid" field.
# text_content=True requires the selected value to be a string.
uuid_loader = JSONLoader(
    file_path=file_path,
    jq_schema='.uuid',
    text_content=True)

# Load each element of the "proposition" array as its own Document.
proposition_loader = JSONLoader(
    file_path=file_path,
    # jq_schema='.[]',
    jq_schema='.proposition[]',
    text_content=False)

summary_data = summary_loader.load()
uuid_data = uuid_loader.load()
proposition_data = proposition_loader.load()

In [None]:
# print(summary_data[0].page_content)
print(uuid_data)

In [None]:
# Attach the summary and uuid to every proposition document.
# Note: assigning a new dict replaces the metadata the loader set by default.
# The loop variable is named `doc` so it does not overwrite the earlier `data`.
for doc in proposition_data:
    doc.metadata = {
        "summary": summary_data[0].page_content,
        "uuid": uuid_data[0].page_content,
    }

In [None]:
# print(summary_data[0].metadata)
print(proposition_data)
# print(type(proposition_data[0]))

In [None]:
pprint(Path(file_path).read_text())

## Extracting metadata
Generally, we want to include metadata available in the JSON file in the documents that we create from the content.

The following demonstrates how metadata can be extracted using the JSONLoader.

There are some key changes to note. When metadata is not collected, the schema can point directly at the value to use for the page_content. For a file whose records sit under a messages key, that would be:

`.messages[].content`

To collect metadata, we instead have to tell the loader to iterate over the records in the messages field. The jq_schema then has to be:

`.messages[]`

This passes each record (a dict) into the metadata_func, which we have to implement. The metadata_func decides which pieces of information in the record are included in the metadata stored in the final Document object.

Additionally, we now have to tell the loader explicitly, via the content_key argument, which key of the record holds the value for the page_content.
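The interplay between content_key and metadata_func can be sketched without the loader. The `build_documents` helper below is a simplified, hypothetical stand-in written for illustration, not LangChain's implementation; the sample records and the `example.json` source name are also assumptions. The `source` and `seq_num` defaults mirror the metadata keys the loader sets on its documents.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    """Copy selected fields from the record into the document metadata."""
    metadata["summary"] = record.get("summary", "")
    metadata["keywords"] = record.get("keywords", [])
    return metadata


def build_documents(records: list, content_key: str, source: str) -> list:
    """Hypothetical helper: turn dict records into document-like dicts."""
    documents = []
    for seq_num, record in enumerate(records, start=1):
        # Start from default metadata, then let metadata_func extend it.
        base_metadata = {"source": source, "seq_num": seq_num}
        documents.append({
            "page_content": record[content_key],
            "metadata": metadata_func(record, base_metadata),
        })
    return documents


# Hypothetical records, as a schema that iterates over dicts would yield them.
records = [
    {"proposition": "First statement.", "summary": "Intro", "keywords": ["a"]},
    {"proposition": "Second statement."},
]

docs = build_documents(records, content_key="proposition", source="example.json")
print(docs)
```

Using `record.get` with a default keeps the metadata keys present even when a record lacks the field, as in the second sample record.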

In [None]:
# Define the metadata extraction function.
# Missing fields fall back to an empty string or list instead of None.
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["summary"] = record.get("summary", "")
    metadata["keywords"] = record.get("keywords", [])
    return metadata

file_path = '../../../text_files/chapter_10_01.json'

# content_key and metadata_func are left disabled here; they require
# each record selected by jq_schema to be a dict.
loader = JSONLoader(
    file_path=file_path,
    jq_schema='.[]',
    text_content=False,
    # content_key="proposition",
    # metadata_func=metadata_func
)

# loader = JSONLoader(
#     file_path=file_path,
#     # jq_schema='.[]',
#     jq_schema='.proposition[]',
#     text_content=False)

data = loader.load()

In [None]:
pprint(data)