GitHub - novaladai/novalad: Novalad offers a unified, centralized platform enabling organizations to extract meaningful data and perform advanced processing at high speed.

Novalad is an AI-powered platform that transforms chaotic, unstructured files—such as PDFs and PowerPoints—into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.


Installation 🚀

Install the Novalad package using pip:

pip install novalad

Usage 📚

Generate API Key:
Log in to Novalad (https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.
Importing and Initializing the Client
Begin by importing NovaladClient from the package and initializing it with your API key. You can either pass the key to the client directly or set it in the NOVALAD_API_KEY environment variable.
```
from novalad import NovaladClient

# Initialize client with your API key
client = NovaladClient(api_key="YOUR_API_KEY") # or set env NOVALAD_API_KEY 
```
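When a script relies on the environment variable, it can help to fail early with a clear message if the key is missing. The following is a minimal sketch that assumes only the NOVALAD_API_KEY variable named above; `resolve_api_key` is a hypothetical helper and not part of the novalad package:

```python
import os


def resolve_api_key(explicit_key=None, env_var="NOVALAD_API_KEY"):
    """Return the explicit key if given, else the value of env_var.

    Hypothetical helper: raises early with a clear message when no key is set.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}"
        )
    return key
```

The returned value can then be passed along, for example as `NovaladClient(api_key=resolve_api_key())`.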

Uploading a File from Your Local System

If you have a file stored locally (e.g., a PDF document), specify its file path and use the upload method to send the file for processing.
Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.

# Define the path to your document
path = r"C:\path\to\your\document.pdf"

# Upload the file
client.upload(file_path=path)

After uploading your file, trigger the processing job using the run method:

# Start processing the uploaded file
client.run()
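The two steps above can be wrapped so that a wrong path is caught locally before anything is sent. This is a sketch under stated assumptions: `upload_and_run` is a hypothetical wrapper, and it relies only on the `upload(file_path=...)` and `run()` calls shown above.

```python
from pathlib import Path


def upload_and_run(client, file_path):
    """Check that file_path exists, then upload it and start processing.

    `client` is expected to expose upload(file_path=...) and run(),
    as in the snippets above. This wrapper itself is hypothetical.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"Document not found: {path}")
    client.upload(file_path=str(path))
    client.run()
    return path
```

Checking the path first turns a typo into an immediate FileNotFoundError instead of a failed job that has to be diagnosed later.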

OR

Processing a Document Directly from a URL

If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the run method. This approach avoids the local upload step.

# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)

Supported URL Types:

HTTPS URLs
AWS S3 pre-signed URLs
GCP Storage Signed URLs
Azure Blob HTTPS public URLs
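
Since every entry in this list is an HTTPS-based URL, a cheap local check can catch obvious mistakes (a local path or an `s3://` URI, for instance) before calling run. The sketch below uses only the Python standard library; `looks_like_https_url` is a hypothetical helper, and passing it does not guarantee that the service can actually fetch the document.

```python
from urllib.parse import urlparse


def looks_like_https_url(url):
    """Return True if url has an https scheme and a host part."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```

A pre-signed or signed URL passes this check unchanged, because the signature lives in the query string and does not affect the scheme or host.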

Checking Job Status

Monitor your processing job by calling the status method. Keep polling until the returned status is either "success" or "failed":

import time

while True:
    status = client.status()
    if status["status"] in ["success", "failed"]:
        break
    time.sleep(30)  # Check every 30 seconds
    print(".", end="")
print("\n", status)
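
The loop above runs until the job reaches a final state. If you prefer an upper bound on the wait, the same polling can be wrapped with a timeout. This is a minimal sketch: `wait_for_job`, its parameters, and the default timeout are assumptions for illustration, and it relies only on `client.status()` returning a dictionary with a "status" key as shown above.

```python
import time


def wait_for_job(client, poll_seconds=30, timeout_seconds=1800, sleep=time.sleep):
    """Poll client.status() until "success" or "failed", or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.status()
        if status["status"] in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job still running after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

The `sleep` parameter exists only so that the wait can be replaced in tests; in normal use `wait_for_job(client)` is enough.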

Retrieving and Rendering Outputs

After the job is complete, you can retrieve and render the results in various formats:

| Format | Description |
| --- | --- |
| JSON 🧾 | Raw layout and structured element data (ideal for developers) |
| Markdown 📘 | Clean, human-readable content for documentation and wikis |
| Knowledge Graph 🕸️ | Visual representation of semantic relations and entities |
| LangChain Docs 🔗 | Plug-and-play format optimized for LLM pipelines |

JSON Output

Retrieve the raw JSON response containing structured data, metadata, and extracted text:

json_response = client.output(format="json")
print(json_response)
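
To keep the result for later use, the response can be written to disk. The sketch below assumes that `json_response` is a JSON-serializable Python object such as a dict or list, which the source does not state explicitly; `save_json` is a hypothetical helper, and `default=str` is a fallback for any values the json module cannot encode natively.

```python
import json
from pathlib import Path


def save_json(data, out_path):
    """Write data to out_path as indented UTF-8 JSON and return the path."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False, default=str)
    return out_path
```

For example, `save_json(json_response, "output/result.json")` creates the `output` directory if needed and writes the file there.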

Markdown Output

Get a Markdown version of the output:

markdown_output = client.output(format="markdown")
print(markdown_output)

LangChain Document Format Output

Retrieve the output as structured document objects for further processing, for example in an LLM pipeline:

documents = client.output(format="document")
print(documents)

Knowledge Graph Output

Retrieve the relationships and entities within the document as a knowledge graph:

kg_output = client.output(format="graph")
print(kg_output)
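
When several formats are needed at once, the four calls above can be gathered in one loop. This is a sketch under stated assumptions: `collect_outputs` is a hypothetical helper, and it uses only `client.output(format=...)` with the format names shown above.

```python
def collect_outputs(client, formats=("json", "markdown", "document", "graph")):
    """Fetch each requested format via client.output(format=...).

    Returns (outputs, errors): two dicts keyed by format name.
    """
    outputs, errors = {}, {}
    for fmt in formats:
        try:
            outputs[fmt] = client.output(format=fmt)
        except Exception as exc:  # keep going so one failure does not hide the rest
            errors[fmt] = exc
    return outputs, errors
```

Returning the errors separately lets a caller use whichever formats succeeded and report the rest afterwards.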

Rendering the Outputs (Notebook Only)

If you are using Jupyter Notebook, Colab, or Kaggle, you can render or view the output formats directly in your notebook cells.

Render JSON Output:
This code renders images displaying the PDF document page-wise with elements and layouts highlighted.
Note: You can also save the rendered images to a local directory by passing save_dir=r"C:\path\to\save\visualization" to the render_elements function.

from novalad import render_elements

render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")

Render Markdown Output:

from novalad import render_markdown

render_markdown(markdown_output)

Render Knowledge Graph:

from novalad import render_knowledge_graph

render_knowledge_graph(kg_output)

Troubleshooting 🛠️

Job Failure: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
File Path Issues: Ensure the file path is correctly formatted (use raw strings for Windows paths).
URL Issues: Confirm that the document URL is correct and reachable, either publicly or through a valid pre-signed or signed URL.
API Key Problems: Verify that your API key is active and valid. If authentication issues persist, please contact support.
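
The raw-string advice for Windows paths can be illustrated with a short standalone example. It uses only the Python standard library, and the path `C:\temp\new.pdf` is a made-up value chosen because it contains `\t` and `\n`:

```python
from pathlib import PureWindowsPath

# Without the r prefix, "\t" and "\n" are read as tab and newline characters.
plain = "C:\temp\new.pdf"
raw = r"C:\temp\new.pdf"

print("\t" in plain, "\n" in plain)  # True True: the path was silently corrupted
print(PureWindowsPath(raw).parts)    # ('C:\\', 'temp', 'new.pdf')
```

The raw string keeps every backslash as a path separator, so it splits into the expected drive, folder, and file name.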

For any issue, please email us at info@novalad.ai.

License 📄

This project is licensed under the Apache License.

Support 🙋‍♂️🙋‍♀️

For additional help or to report issues, please refer to the official documentation or contact support at info@novalad.ai

Thank you for choosing Novalad! 🚀
