Novalad is an AI-powered platform that transforms chaotic, unstructured files—such as PDFs and PowerPoints—into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.
Install the Novalad package using pip:
pip install novalad
Generate API Key:
Log in to Novalad (https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.
Importing and Initializing the Client
Begin by importing NovaladClient from the package and initializing it with your API key. You can either set the NOVALAD_API_KEY environment variable or pass the key to the client directly:
from novalad import NovaladClient
# Initialize the client with your API key
client = NovaladClient(api_key="YOUR_API_KEY")  # or set the NOVALAD_API_KEY environment variable
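If you would rather keep the key out of your source code, the environment-variable option can be made explicit with a small helper. This is a minimal sketch: `load_api_key` is a hypothetical name and not part of the novalad package, and it assumes only that the key is stored in NOVALAD_API_KEY as described above.

```python
import os


def load_api_key(env_var: str = "NOVALAD_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper (not part of novalad): it fails early with a clear
    message instead of letting a later request fail on authentication.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Novalad API key."
        )
    return key
```

The returned value can then be passed along as `NovaladClient(api_key=load_api_key())`.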
If you have a file stored locally (e.g., a PDF document), specify its file path and use the upload method to send the file for processing.
Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.
# Define the path to your document
path = r"C:\path\to\your\document.pdf"
# Upload the file
client.upload(file_path=path)
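Since a wrong path is a common cause of failed uploads, it can help to check the file before calling upload. The sketch below uses only the standard library; `check_local_file` is a hypothetical helper name, not something the novalad package provides.

```python
from pathlib import Path


def check_local_file(file_path: str) -> Path:
    """Return the path as a Path object, or raise if it is not an existing file.

    Hypothetical helper (not part of novalad).
    """
    path = Path(file_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No file found at: {path}")
    return path
```

For example, `client.upload(file_path=str(check_local_file(path)))` would stop with a clear error before any request is made.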
After uploading your file, trigger the processing job using the run method:
# Start processing the uploaded file
client.run()
OR
If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the run method. This approach avoids the local upload step.
# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)
Supported URL Types:
- HTTPS URLs
- AWS S3 pre-signed URLs
- GCP Storage Signed URLs
- Azure Blob HTTPS public URLs
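A quick local check can catch an unsupported URL before it is sent for processing. The sketch below assumes every URL you pass is served over HTTPS, which the list above suggests but does not state outright for signed URLs; `is_https_url` is a hypothetical helper and not part of the novalad package.

```python
from urllib.parse import urlparse


def is_https_url(url: str) -> bool:
    """Return True if the URL uses the https scheme and names a host.

    Hypothetical helper (not part of novalad). It checks the URL's shape
    only; it does not confirm that the document is actually reachable.
    """
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.netloc)
```

A URL such as `s3://bucket/key` would be rejected by this check, which is a hint to generate a pre-signed HTTPS URL instead.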
Monitor the status of your processing job by calling the status method. Keep polling until the status is either "success" or "failed":
import time
while True:
    status = client.status()
    if status["status"] in ["success", "failed"]:
        break
    time.sleep(30)  # Check every 30 seconds
    print(".", end="")
print("\n", status)
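The loop above runs forever if a job never reaches a final state. A variant with a timeout might look like the sketch below. `wait_for_job` and its parameters are hypothetical names, not part of the novalad package, and the only assumption about the client is the one the loop above already makes: status() returns a dict whose "status" key ends up as "success" or "failed".

```python
import time


def wait_for_job(client, poll_seconds: float = 30.0, timeout_seconds: float = 1800.0) -> dict:
    """Poll client.status() until the job finishes or the timeout elapses.

    Hypothetical helper (not part of novalad). Returns the final status
    dict, or raises TimeoutError if the job is still running at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.status()
        if status["status"] in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job still not finished after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

Calling `status = wait_for_job(client)` then replaces the loop, and a "failed" result can be inspected in the returned dict.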
After the job is complete, you can retrieve and render the results in various formats:
Format | Description |
---|---|
JSON 🧾 | Raw layout and structured element data (ideal for developers) |
Markdown 📘 | Clean, human-readable content for documentation and wikis |
Knowledge Graph 🕸️ | Visual representation of semantic relations and entities |
LangChain Docs 🔗 | Plug-and-play format optimized for LLM pipelines |
Retrieve the raw JSON response containing structured data, metadata, and extracted text:
json_response = client.output(format="json")
print(json_response)
Get a Markdown version of the output (in a notebook, it can be rendered with the render_markdown helper):
markdown_output = client.output(format="markdown")
print(markdown_output)
Retrieve the output as a structured document object for further processing:
documents = client.output(format="document")
print(documents)
Retrieve the relationships and entities within the document as a knowledge graph:
kg_output = client.output(format="graph")
print(kg_output)
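The four calls above differ only in the format argument, so they can be collected in one pass. This is a minimal sketch: `fetch_outputs` is a hypothetical helper, not part of the novalad package, and it assumes only that the client exposes output(format=...) with the format names shown above.

```python
def fetch_outputs(client, formats=("json", "markdown", "document", "graph")) -> dict:
    """Call client.output(format=...) once per format and collect the results.

    Hypothetical helper (not part of novalad). The returned dict maps each
    format name to whatever the client returned for it.
    """
    return {fmt: client.output(format=fmt) for fmt in formats}
```

For instance, `outputs = fetch_outputs(client)` gives access to `outputs["markdown"]` and `outputs["graph"]` without repeating the call by hand.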
If you are using Jupyter Notebook, Google Colab, or Kaggle, you can render the output formats directly in your notebook cells.
Render JSON Output:
This code renders the PDF page by page as images, with the detected elements and layout highlighted.
Note: You can also save the rendered images to a local directory by passing save_dir=r"C:\path\to\save\visualization" to the render_elements function.
from novalad import render_elements
render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")
Render Markdown Output:
from novalad import render_markdown
render_markdown(markdown_output)
Render Knowledge Graph:
from novalad import render_knowledge_graph
render_knowledge_graph(kg_output)
- Job Failure: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
- File Path Issues: Ensure the file path is correctly formatted (use raw strings for Windows paths).
- URL Issues: Confirm that the document URL is correct and publicly accessible.
- API Key Problems: Verify that your API key is active and valid. If authentication issues persist, please contact support.
This project is licensed under the Apache License.
For additional help or to report issues, please refer to the official documentation or contact support at info@novalad.ai
Thank you for choosing Novalad! 🚀