<div align="center">
<p align="center" style="width: 100%;">
    <img src="https://raw.githubusercontent.com/vlm-run/.github/refs/heads/main/profile/assets/vlm-black.svg" alt="VLM Run Logo" width="80" style="margin-bottom: -5px; color: #2e3138; vertical-align: middle; padding-right: 5px;"><br>
</p>
<p align="center"><a href="https://docs.vlm.run"><b>Website</b></a> | <a href="https://docs.vlm.run/"><b>API Docs</b></a> | <a href="https://docs.vlm.run/blog"><b>Blog</b></a> | <a href="https://discord.gg/AMApC2UzVY"><b>Discord</b></a> | <a href="https://chat.vlm.run"><b>Chat</b></a>
</p>
</div>

# VLM Run Orion - 3D Reconstruction (Node.js)

This cookbook demonstrates [VLM Run Orion's](https://vlm.run/orion) 3D reconstruction capabilities using **Node.js/TypeScript**. For more details on the API, see the [Agent API docs](https://docs.vlm.run/agents/introduction).

In this notebook, we'll use the **VLM Run Agent Chat Completions API**, an OpenAI-compatible interface, to build 3D reconstruction workflows with the familiar chat-completions request format.

We'll cover the following topics:
 1. 3D Reconstruction from Single Images (depth estimation and geometry inference)
 2. Multi-Object 3D Reconstruction from a Single Image
 3. 3D Reconstruction from Multiple Images/Video (multi-view stereo reconstruction)

## Prerequisites

- Node.js 18+
- VLM Run API key (get one at [app.vlm.run](https://app.vlm.run))
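- Deno or tslab kernel for running TypeScript in Jupyter (the cells below use Deno's `npm:` import specifiers and `Deno.env`; under tslab, import by bare package name and read `process.env` instead)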


## Setup

First, install the required packages and configure the environment.


In [None]:
// Deno kernel: no separate install step is needed; the `npm:` import specifiers
// in the next cell download the packages on first use.

// tslab kernel: run `npm install vlmrun openai zod zod-to-json-schema` in your project directory


In [1]:
// Import the VLM Run SDK and dependencies
import { VlmRun } from "npm:vlmrun";
import { z } from "npm:zod";
import { zodToJsonSchema } from "npm:zod-to-json-schema";


In [2]:
// Get API key from environment variable
const VLMRUN_API_KEY = Deno.env.get("VLMRUN_API_KEY");

if (!VLMRUN_API_KEY) {
    throw new Error("Please set the VLMRUN_API_KEY environment variable");
}

console.log("✓ API Key loaded successfully");


✓ API Key loaded successfully


## Initialize the VLM Run Client

We use the OpenAI-compatible chat completions interface through the VLM Run SDK.


In [3]:
// Initialize the VLM Run client using the SDK
const client = new VlmRun({
    apiKey: VLMRUN_API_KEY,
    baseURL: "https://agent.vlm.run/v1"  // Use the agent API endpoint
});

console.log("✓ VLM Run client initialized successfully!");
console.log("Base URL: https://agent.vlm.run/v1");
console.log("Model: vlmrun-orion-1");


✓ VLM Run client initialized successfully!
Base URL: https://agent.vlm.run/v1
Model: vlmrun-orion-1


## Response Models (Schemas)

We define Zod schemas for structured outputs. These schemas provide type-safe, validated responses.


In [4]:
// 3D Reconstruction Response Schema
const Recon3DResponseSchema = z.object({
    recon_path: z.string().describe("Pre-signed URL to the 3D reconstruction file (PLY format)")
});

console.log("✓ Response schemas defined successfully!");
console.log("Schemas include type-safe validation for structured outputs.");


✓ Response schemas defined successfully!
Schemas include type-safe validation for structured outputs.
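
To make the runtime-validation step concrete, here is a dependency-free sketch of roughly what parsing a response against `Recon3DResponseSchema` checks. The `Recon3DResponse` interface and the `parseRecon3DResponse` helper are hypothetical names introduced for illustration; they are not part of Zod or the VLM Run SDK, and Zod itself reports failures with its own error type rather than `TypeError`.

```typescript
// Shape declared by Recon3DResponseSchema in the cell above.
interface Recon3DResponse {
    recon_path: string;
}

// Hypothetical helper (illustration only): parse raw response text and
// check it against the Recon3DResponse shape at runtime.
function parseRecon3DResponse(responseText: string): Recon3DResponse {
    const value: unknown = JSON.parse(responseText); // throws SyntaxError on invalid JSON
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
        throw new TypeError("Expected a JSON object");
    }
    const reconPath = (value as Record<string, unknown>).recon_path;
    if (typeof reconPath !== "string") {
        throw new TypeError("Expected `recon_path` to be a string");
    }
    // Return only the declared field, mirroring Zod's default of dropping unknown keys.
    return { recon_path: reconPath };
}

const example = parseRecon3DResponse(
    '{"recon_path": "https://example.com/model.ply", "extra": 1}' // placeholder URL
);
console.log(example.recon_path);
```

The point of validating at runtime is that `result.recon_path` can then be used as a string without further checks, even though the value arrived as untyped JSON text.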


## Helper Functions

We create helper functions to simplify making chat completion requests and working with 3D reconstruction files.


In [5]:
/**
 * Make a chat completion request with optional images/video and structured output.
 * 
 * @param prompt - The text prompt/instruction
 * @param images - Optional list of images to process (URLs)
 * @param video - Optional video URL to process
 * @param responseSchema - Optional Zod schema for structured output
 * @param model - Model to use (default: vlmrun-orion-1:auto)
 * @returns Parsed response if responseSchema provided, else raw response text
 */
async function chatCompletion(
    prompt,
    images,
    video,
    responseSchema,
    model = "vlmrun-orion-1:auto"
) {
    const content = [];
    content.push({ type: "text", text: prompt });

    if (images) {
        for (const image of images) {
            if (typeof image === "string") {
                if (!image.startsWith("http")) {
                    throw new Error("Image URLs must start with http or https");
                }
                content.push({
                    type: "image_url",
                    image_url: { url: image, detail: "auto" }
                });
            }
        }
    }

    if (video) {
        if (!video.startsWith("http")) {
            throw new Error("Video URLs must start with http or https");
        }
        content.push({
            type: "video_url",
            video_url: { url: video }
        });
    }

    const kwargs = {
        model: model,
        messages: [{ role: "user", content: content }]
    };

    if (responseSchema) {
        kwargs.response_format = {
            type: "json_schema",
            schema: zodToJsonSchema(responseSchema)
        };
    }

    const response = await client.agent.completions.create(kwargs);
    const responseText = response.choices[0].message.content || "";

    if (responseSchema) {
        const parsed = JSON.parse(responseText);
        return responseSchema.parse(parsed);
    }

    return responseText;
}

/**
 * Download a PLY file from a URL and save it to the current working directory.
 *
 * @param url - URL to the PLY file
 * @param filename - Optional filename to save as (defaults to the last URL path segment)
 * @returns The downloaded file data as Uint8Array
 */
async function downloadPly(url, filename) {
    // Derive a filename from the last path segment, dropping the query string
    const urlParts = url.split("/");
    const lastPart = urlParts[urlParts.length - 1] || "";
    const actualFilename = filename || lastPart.split("?")[0] || "model.ply";
    console.log(`Downloading PLY file → ${actualFilename}`);

    const response = await fetch(url);
    if (!response.ok) {
        throw new Error(`Failed to download PLY file: ${response.status} ${response.statusText}`);
    }

    const arrayBuffer = await response.arrayBuffer();
    const data = new Uint8Array(arrayBuffer);

    // Write the bytes to disk so the "saved as" message below is accurate
    await Deno.writeFile(actualFilename, data);

    console.log(`✓ Downloaded ${data.length} bytes`);
    console.log(`File saved as: ${actualFilename}`);

    return data;
}

console.log("✓ Helper functions defined!");
console.log("  - chatCompletion(): Make chat completion requests with images/video");
console.log("  - downloadPly(): Download 3D reconstruction PLY files");


✓ Helper functions defined!
  - chatCompletion(): Make chat completion requests with images/video
  - downloadPly(): Download 3D reconstruction PLY files
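
The message payload that `chatCompletion()` assembles can also be written with explicit types. The sketch below is a typed variant of that content-building step only; `ContentPart`, `assertHttpUrl`, and `buildContent` are hypothetical names introduced here, and the URL check is slightly stricter than the helper's `startsWith("http")` test.

```typescript
// Content-part shapes, matching the objects pushed by chatCompletion() above.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string; detail: "auto" } };
type VideoPart = { type: "video_url"; video_url: { url: string } };
type ContentPart = TextPart | ImagePart | VideoPart;

// Hypothetical helper: accept only http:// or https:// URLs.
function assertHttpUrl(url: string, kind: string): void {
    if (!/^https?:\/\//i.test(url)) {
        throw new Error(`${kind} URLs must start with http:// or https://`);
    }
}

// Hypothetical helper: build the `content` array for a single user message.
function buildContent(prompt: string, images: string[] = [], video?: string): ContentPart[] {
    const content: ContentPart[] = [{ type: "text", text: prompt }];
    for (const image of images) {
        assertHttpUrl(image, "Image");
        content.push({ type: "image_url", image_url: { url: image, detail: "auto" } });
    }
    if (video !== undefined) {
        assertHttpUrl(video, "Video");
        content.push({ type: "video_url", video_url: { url: video } });
    }
    return content;
}

const parts = buildContent("Generate a 3D reconstruction of the table in the image", [
    "https://example.com/table.png", // placeholder URL
]);
console.log(JSON.stringify(parts, null, 2));
```

With the union type in place, a misspelled `type` value or a missing `url` field is rejected at compile time rather than by the API.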


### 1. 3D Reconstruction from a Single Image

Create a 3D model from a single image. The model will infer depth and geometry to create a full 3D reconstruction.


In [6]:
const IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/agent_use_cases/guided-segmentation/image-11.png";

console.log("Input image:", IMAGE_URL);
console.log("\n>> Generating 3D reconstruction...");

const result = await chatCompletion(
    "Generate a 3D reconstruction of the table in the image",
    [IMAGE_URL],
    undefined,
    Recon3DResponseSchema
);

console.log("\n>> RESPONSE");
console.log(result);
console.log("\n>> 3D Reconstruction PLY file:", result.recon_path);


Input image: https://storage.googleapis.com/vlm-data-public-prod/hub/examples/agent_use_cases/guided-segmentation/image-11.png

>> Generating 3D reconstruction...

>> RESPONSE
{
  recon_path: "https://storage.googleapis.com/vlm-userdata-prod/agents/artifacts/ae8ae740-ddd6-426b-bde1-75540f99f277/2df2e00f-a978-4eca-9e49-66aa11391cec/3d_reconstruction_1766330097.ply?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=vlm-deployments%40vlm-infra-prod.iam.gserviceaccount.com%2F20251221%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20251221T151501Z&X-Goog-Expires=604800&X-Goog-SignedHeaders=host&X-Goog-Signature=c062c7cf75df5c91a963b19c3fe5bcf8bb403a7fc97632e10a78a61c45ad17823fd500c4ea6126e4a5e9e04969835998c2ed5e5fc1722b68d398c421897f6b07c8a0d7f0f142bfe0b28406e832d3c36b27b1c07fefa615d1bf8d5e0ffa239b368ebcb1a46cf6d2ad7a731865abc99a356868b0df4543c70f965b12b0a217a302b060b069e96f3833fbe0ecab655b30207304fabc6035db1f823a22a8804c3fbec0a09782cc69a9382f524bc2b182a953450da37fc92d1f99b313438c43ad8df7e3e

In [7]:
// Download the PLY file
const plyData = await downloadPly(result.recon_path);

console.log("\n>> The PLY file is now available for:");
console.log("  - 3D visualization in external tools (Blender, MeshLab, etc.)");
console.log("  - Further processing with 3D libraries");
console.log("  - Integration into 3D applications");

// Note: To visualize in the notebook, you would need a 3D visualization library
// For now, the file is downloaded and ready for use in external tools


Downloading PLY file → 3d_reconstruction_1766330097.ply
✓ Downloaded 45139360 bytes
File saved as: 3d_reconstruction_1766330097.ply

>> The PLY file is now available for:
  - 3D visualization in external tools (Blender, MeshLab, etc.)
  - Further processing with 3D libraries
  - Integration into 3D applications
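
As a first step toward "further processing", the downloaded bytes can be inspected without any 3D library. Every PLY file begins with a plain-text header that names the encoding and lists element counts, so a small parser is enough to see how many vertices a reconstruction contains. The sketch below assumes only the standard PLY header layout; `PlyHeader` and `parsePlyHeader` are hypothetical names, and nothing here is specific to the files Orion returns (their encoding and element list are not stated in this notebook).

```typescript
interface PlyHeader {
    format: string;                   // e.g. "ascii" or "binary_little_endian"
    elements: Record<string, number>; // element name -> count, e.g. { vertex: 3 }
}

// Hypothetical helper: read only the ASCII header at the start of a PLY file.
function parsePlyHeader(data: Uint8Array): PlyHeader {
    // Headers are small, so decode at most the first 64 KiB as ASCII.
    const limit = Math.min(data.length, 65536);
    const text = Array.from(data.subarray(0, limit), (b) => String.fromCharCode(b)).join("");
    const end = text.indexOf("end_header");
    if (end === -1) {
        throw new Error("No end_header line found in the first 64 KiB");
    }
    const lines = text.slice(0, end).split(/\r?\n/).map((l) => l.trim());
    if (lines[0] !== "ply") {
        throw new Error("Not a PLY file: missing 'ply' magic line");
    }
    let format = "unknown";
    const elements: Record<string, number> = {};
    for (const line of lines) {
        const [keyword, name, count] = line.split(/\s+/);
        if (keyword === "format" && name !== undefined) {
            format = name;
        }
        if (keyword === "element" && name !== undefined && count !== undefined) {
            elements[name] = Number(count);
        }
    }
    return { format, elements };
}

// Synthetic header for demonstration; a real file would follow it with vertex data.
const sample =
    "ply\nformat binary_little_endian 1.0\nelement vertex 3\n" +
    "property float x\nproperty float y\nproperty float z\nend_header\n";
const header = parsePlyHeader(Uint8Array.from(sample, (c) => c.charCodeAt(0)));
console.log(header.format, header.elements);
```

In this notebook the same function could be called as `parsePlyHeader(plyData)` on the bytes returned by `downloadPly()`.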


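### 2. Multi-Object 3D Reconstruction from a Single Image

Reconstruct several objects from one image. The prompt asks the agent to detect and segment the objects before reconstructing them.


In [8]: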
const IMAGE_URL_FURN = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/furniture-colorful.jpg";

console.log("Input image:", IMAGE_URL_FURN);
console.log("\n>> Generating multi-object 3D reconstruction...");

const resultMulti = await chatCompletion(
    "Generate a 3D reconstruction of the two chairs in the image, by first detecting them, segmenting them and then reconstructing",
    [IMAGE_URL_FURN],
    undefined,
    Recon3DResponseSchema
);

console.log("\n>> RESPONSE");
console.log(resultMulti);
console.log("\n>> 3D Reconstruction PLY file:", resultMulti.recon_path);


Input image: https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/furniture-colorful.jpg

>> Generating multi-object 3D reconstruction...

>> RESPONSE
{
  recon_path: "https://storage.googleapis.com/vlm-userdata-prod/agents/artifacts/ae8ae740-ddd6-426b-bde1-75540f99f277/33ea7d93-440f-4d37-8ead-b7dc95988e4c/3d_reconstruction_1766330375.ply?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=vlm-deployments%40vlm-infra-prod.iam.gserviceaccount.com%2F20251221%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20251221T151939Z&X-Goog-Expires=604800&X-Goog-SignedHeaders=host&X-Goog-Signature=0cd044d663f4bda6cf692225fbd20089904f3322171621e5e4d44f98eaca1f7faa1f111826f6e254d7dadaa5dbbb989c15502a2b991fb9e3ded797275992c2d9317c84c7a271a673dbc54ee00ddfc5bd2f21f798fff8c44619f01a141fcd6958306432c7a4c0e35c79b7c28f0593bb5706c5800e310e38c30adfdb5b39f74f239395f201952277c4bd7e9e60eb221790bcad1f6b0a062ad8ee3b0ee859539fde8216d1211e75bc05b41faaaa404f98c4bc3bd3ff709af058b2372d8275e1d22570

In [9]:
// Download the PLY file
const plyDataMulti = await downloadPly(resultMulti.recon_path);

console.log("\n>> Multi-object 3D reconstruction completed!");
console.log("The PLY file contains the reconstructed chairs and is ready for visualization.");


Downloading PLY file → 3d_reconstruction_1766330375.ply
✓ Downloaded 24204064 bytes
File saved as: 3d_reconstruction_1766330375.ply

>> Multi-object 3D reconstruction completed!
The PLY file contains the reconstructed chairs and is ready for visualization.
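
The `recon_path` values in the outputs above are Google Cloud Storage V4 pre-signed URLs, so they stop working after a fixed lifetime: the query string carries the signing time in `X-Goog-Date` and the lifetime in seconds in `X-Goog-Expires` (604800 seconds, or 7 days, in these runs). A minimal sketch for computing the expiry time follows; `signedUrlExpiry` is a hypothetical helper, and the URL in the example is a shortened placeholder that reuses the two parameter values from the first output.

```typescript
// Hypothetical helper: compute when a GCS V4 pre-signed URL expires, from
// X-Goog-Date (signing time, UTC) and X-Goog-Expires (lifetime in seconds).
function signedUrlExpiry(signedUrl: string): Date {
    const params = new URL(signedUrl).searchParams;
    const signedAt = params.get("X-Goog-Date");
    const lifetime = params.get("X-Goog-Expires");
    const match = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(signedAt ?? "");
    if (match === null || lifetime === null || !/^\d+$/.test(lifetime)) {
        throw new Error("URL is missing valid X-Goog-Date / X-Goog-Expires parameters");
    }
    // Date.UTC takes a zero-based month, hence the "- 1".
    const signedAtMs = Date.UTC(
        Number(match[1]), Number(match[2]) - 1, Number(match[3]),
        Number(match[4]), Number(match[5]), Number(match[6])
    );
    return new Date(signedAtMs + Number(lifetime) * 1000);
}

const expiry = signedUrlExpiry(
    "https://storage.googleapis.com/example-bucket/model.ply" + // placeholder path
    "?X-Goog-Date=20251221T151501Z&X-Goog-Expires=604800"
);
console.log(expiry.toISOString()); // 2025-12-28T15:15:01.000Z
```

If a reconstruction needs to outlive its link, download it with `downloadPly()` before the expiry time rather than storing the URL.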


### 3. 3D Reconstruction from Video

For better results, provide a video of the scene. The model will automatically extract frames from different viewpoints to create a more robust 3D reconstruction.


In [10]:
const VIDEO_URL = "https://storage.googleapis.com/vlm-data-public-prod/web/videos/tunnel.mp4";

console.log("Input video:", VIDEO_URL);
console.log("\n>> Generating 3D reconstruction from video...");

const resultScene = await chatCompletion(
    "Generate a 3D reconstruction of the scene by sampling some frames from the video",
    undefined,
    VIDEO_URL,
    Recon3DResponseSchema
);

console.log("\n>> RESPONSE");
console.log(resultScene);
console.log("\n>> 3D Reconstruction PLY file:", resultScene.recon_path);


Input video: https://storage.googleapis.com/vlm-data-public-prod/web/videos/tunnel.mp4

>> Generating 3D reconstruction from video...

>> RESPONSE
{
  recon_path: "https://storage.googleapis.com/vlm-userdata-prod/agents/artifacts/ae8ae740-ddd6-426b-bde1-75540f99f277/8f8a046a-1119-464d-8297-b8e99c304966/3d_reconstruction_multiview_1766330657.ply?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=vlm-deployments%40vlm-infra-prod.iam.gserviceaccount.com%2F20251221%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20251221T152422Z&X-Goog-Expires=604800&X-Goog-SignedHeaders=host&X-Goog-Signature=35950fc1adc692054d564973119e3965331845830e3eb63e0aa70ddecb4f9d6d8f48180af2f1f9b96a738b55f8b56cf1a4140bc82ef7ff1269d8687ccf41f778c578573fd3d0a2b8eba71d35a265b5429d0864308419039a3fe4dd088d18e693f91fd305d681cd1ad5544515865d7dcfde35f03cc7cee8c8e678033ab9592107606a21750a74fe376f6d4aa38f50b8097b926806b91d31245a8708249de4e29b5008451106c65dab18290d5d150716ab87a5ce07c96ebd26040f42630ce2b2380b2ed022ddfe4c69a44f7b

In [11]:
// Download the PLY file
const plyDataScene = await downloadPly(resultScene.recon_path);

console.log("\n>> 3D reconstruction from video completed!");
console.log("The PLY file contains the reconstructed scene and is ready for visualization.");


Downloading PLY file → 3d_reconstruction_multiview_1766330657.ply
✓ Downloaded 35732036 bytes
File saved as: 3d_reconstruction_multiview_1766330657.ply

>> 3D reconstruction from video completed!
The PLY file contains the reconstructed scene and is ready for visualization.


---

## Conclusion

This cookbook demonstrated the comprehensive 3D reconstruction capabilities of the **VLM Run Orion Agent API** using Node.js/TypeScript.

### Key Takeaways

1. **OpenAI-Compatible Interface**: The API follows the OpenAI chat completions format, making it easy to integrate with existing workflows and tools.
2. **Structured Outputs**: Pass a Zod schema (converted to JSON Schema) in the `response_format` parameter, then validate the returned JSON with the same schema to get type-safe responses.
3. **3D Reconstruction from Single Images**: Generate 3D models by inferring depth and geometry from a single input image.
4. **Multi-Object 3D Reconstruction**: Reconstruct multiple objects within a single image by first detecting and segmenting them.
5. **3D Reconstruction from Video**: Provide a video and let the agent sample frames from different viewpoints for a more robust reconstruction.
6. **Type Safety**: TypeScript and Zod provide compile-time and runtime type checking for a better developer experience.

### Next Steps

- Explore the [VLM Run Documentation](https://docs.vlm.run) for more details
- Check out the [Agent API docs](https://docs.vlm.run/agents/introduction) for advanced features
- Join our [Discord community](https://discord.gg/AMApC2UzVY) for support
- Check out more examples in the [VLM Run Cookbook](https://docs.vlm.run/blog)
- Review the [VLM Run Node.js SDK](https://github.com/vlm-run/vlmrun-node-sdk) documentation

Happy building!
