# Loading a document

This notebook demonstrates the use of the TrustGraph librarian API to submit text for processing.

The API optionally accepts arbitrary extra metadata with the document. This metadata is associated with the document and added to the triple store.  In this example, the metadata is structured in line with the schema.org Organization, PublicationEvent and DigitalDocument schemas.

The additional metadata is optional; the metadata element can be omitted.  However, if additional metadata is known, it can be integrated with TrustGraph processing.
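
To make the schema.org structure concrete, here is a minimal sketch of the same kind of metadata written as plain Python dictionaries using schema.org property names. It is illustrative only: the dictionary layout and the `@context`/`@type` keys are assumptions borrowed from JSON-LD conventions, not the representation TrustGraph builds or stores.

```python
import json

# Illustrative only: plain dictionaries using schema.org property names.
# This is NOT the representation TrustGraph builds or stores.
organization = {
    "@type": "Organization",
    "name": "trustgraph.ai",
}

publication_event = {
    "@type": "PublicationEvent",
    "description": "Uploading to Github",
    "startDate": "2024-10-23",
    "endDate": "2024-10-23",
    "publishedBy": organization,
}

document = {
    "@context": "https://schema.org",
    "@type": "DigitalDocument",
    "name": "Mark's cats",
    "keywords": ["animals", "cats", "home-life"],
    "publication": publication_event,
}

# Show the nested structure: document -> publication event -> organization
print(json.dumps(document, indent=2))
```

The nesting mirrors how the three schemas relate: a document has a publication event, and the event is published by an organization.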

This particular processing uses the following API calls:
- Load a document into the library
- Create a new flow
- Submit the document for processing in that flow

The calls fail if the document, flow or processing submission already exists, so don't execute this notebook more than once.
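
If the notebook does need to be re-run, one option is to wrap each call so that a failure is reported rather than halting execution. The sketch below is a hypothetical helper, not part of the TrustGraph API; `run_step` and the stand-in steps are invented for illustration. It catches `Exception` broadly because the exception type raised for an already-existing resource is not stated here.

```python
def run_step(label, step):
    """Run one step, reporting a failure instead of raising."""
    try:
        result = step()
    except Exception as exc:  # broad on purpose: the API's exception type is not assumed
        print(f"{label}: skipped ({exc})")
        return None
    print(f"{label}: ok")
    return result


# Stand-in steps for demonstration; in practice each would wrap a real API call.
def already_exists():
    raise RuntimeError("document already exists")


run_step("add document", already_exists)
run_step("start flow", lambda: {})
```

A broad catch also hides genuine errors such as connection failures, so it is worth narrowing to a specific exception once the failure mode is known.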

In [1]:
!pip install trustgraph-base

In [2]:
import trustgraph.api as tg
from trustgraph.knowledge import DigitalDocument, Organization
from trustgraph.knowledge import PublicationEvent, hash, to_uri
from trustgraph.knowledge import PREF_PUBEV, PREF_ORG, PREF_DOC

In [3]:
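# Open a text document from this repo.  It is read as text here and
# encoded to bytes on submission; a PDF would instead be read as bytes.
with open("../sources/README.cats", "r", encoding="utf-8") as f:
    text = f.read()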

title = "Mark's cats"

In [4]:
cli = tg.Api()

In [5]:
org_id = to_uri(PREF_ORG, "3c35111a-f8ce-54b2-4dd6-c673f8bf0d09")
doc_id = to_uri(PREF_DOC, "4faa45c1-f91a-a96a-d44f-2e57b9813db8")
pub_id = to_uri(PREF_PUBEV, "a847d950-a281-4099-aaab-c5e35333ff61")
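
The identifiers above are hard-coded UUIDs. As an alternative, an identifier can be derived from the document content so that the same content always yields the same ID. The following is a minimal standard-library sketch; `EXAMPLE_PREFIX` and `content_id` are hypothetical names, and the sketch does not use or reproduce the behaviour of `to_uri`.

```python
import hashlib
import uuid

# Hypothetical URI prefix, for illustration only
EXAMPLE_PREFIX = "https://example.com/doc/"


def content_id(data: bytes, prefix: str = EXAMPLE_PREFIX) -> str:
    """Derive a stable URI from document bytes."""
    digest = hashlib.sha256(data).hexdigest()
    # uuid5 is deterministic: the same namespace and name give the same UUID
    return prefix + str(uuid.uuid5(uuid.NAMESPACE_URL, digest))


first = content_id(b"Mark has two cats.")
second = content_id(b"Mark has two cats.")
print(first == second)
```

Because the ID depends only on the bytes, re-submitting identical content produces an identical identifier, which makes duplicates easy to detect.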

In [6]:
org = Organization(
    id = org_id,
    name = "trustgraph.ai",
)

pubev = PublicationEvent(
    id = pub_id,
    description = "Uploading to Github",
    start_date = "2024-10-23",
    end_date = "2024-10-23",
    organization = org,
)

doc = DigitalDocument(
    id = doc_id,
    name = "Mark's cats",
    description = "This document describes Mark's cats",
    copyright_holder = "trustgraph.ai",
    copyright_notice = "Public domain",
    copyright_year = "2024",
    keywords = ["animals", "cats", "home-life"],
    publication = pubev,
    url = "https://example.com",
)

In [7]:
resp = cli.library().add_document(
    document = text.encode("utf-8"),
    id = doc_id,
    metadata = doc,
    user = "trustgraph",
    title = title,
    comments = "A test data document",
    kind = "text/plain",
    tags = [ "cats", "pets" ]
)
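
The `kind` argument is a MIME type describing the submitted bytes, so it should match the actual content. Below is a minimal standard-library sketch of choosing it from the data itself. `guess_kind` is a hypothetical helper, and which kinds the librarian accepts beyond those shown in this notebook is not stated here.

```python
def guess_kind(data: bytes) -> str:
    """Pick a MIME type from the leading bytes of a document."""
    # PDF files begin with the marker "%PDF-"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Neither a PDF nor valid UTF-8 text
        return "application/octet-stream"
    return "text/plain"


print(guess_kind("Mark has two cats.".encode("utf-8")))
print(guess_kind(b"%PDF-1.7\n"))
```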

# Start a flow

In [8]:
cli.flow().start(
    class_name = "document-rag+graph-rag",
    id = "my-flow",
    description = "My new flow",
)

# Submit document for processing

In [9]:
cli.library().start_processing(
    id = "proc01",
    document_id = doc_id,
    flow = "my-flow",
    user = "trustgraph",
    collection = "default",
    tags = [ "my document", "processing test" ]
)

{}