Skip to content

wordghost1234/agnes-ai-storyboard-studio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

preview

AuraSynth: Open Narrative Engine

Welcome to AuraSynth — an entirely self-hosted, open-source platform for generating AI-crafted narrative videos from raw text. Unlike conventional tools that lean on cloud dependencies or hidden pricing tiers, this project puts the full pipeline in your hands: text-to-multi-scene video with synchronized voiceover, intelligent subtitle embedding, and a virtual presenter entity. This is not merely another generator; it is a narrative constructor designed for creators, educators, storytellers, and developers who demand sovereign control over their media production.

AuraSynth eschews the typical "freemium" model. There are no usage caps, no watermark locks, and no per-minute fees. You run it on your infrastructure, and you own every pixel, every audio frame, every subtitle line. The underlying engine—inspired by the architectural principles of the Agnes AI lineage—has been reimagined from the ground up to prioritize modularity, local inference, and deep customization.

Whether your goal is to produce explainer videos, dynamic presentations, visual poetry, or synthetic news anchors, AuraSynth provides a unified Web UI that manages complexity while exposing the levers for fine-grained control.

Download


📖 Overview

AuraSynth transforms the traditional video creation workflow into a single, coherent process. Input a script or a series of prompts, and the system orchestrates a chain of AI models to generate visual scenes, convert text to speech, synchronize lip movements for a digital host, and composite everything into a final video file with burned-in subtitles.

The core philosophy is composition over compilation. Instead of a black box that outputs a video, AuraSynth breaks the process into inspectable stages: scene planning, visual generation, audio narration, subtitle timing, and final rendering. Each stage can be interrupted, adjusted, or replaced with external tools. This makes it an ideal foundation for experimentation, research into generative pipelines, or production workflows that require manual oversight on specific elements.

Core Capabilities

Feature Description
Multi-Scene Orchestration Break a single text block into a coherent sequence of scenes with automatic transition logic.
Present-Grade Narration Neural text-to-speech with emotion modulation, pitch variance, and pace control.
Embedded Subtitle Rendering Automatic speech-to-text alignment with customizable typography, positioning, and animation.
Digital Anchor Module A synthetic presenter that reads the narration with realistic lip-sync and gesture cues.
API-First Design All functionality is accessible via RESTful endpoints, enabling integration into larger media workflows.

🌟 Distinctive Features

1. Script-to-Storyboard Inference

Rather than requiring manual scene descriptions, AuraSynth employs a language model to parse a narrative script and infer visual elements: character positions, camera angles, lighting moods, and temporal transitions. This reduces the creative overhead from "describe every frame" to "provide the story."

2. Contextual Subtitle Intelligence

Subtitles are not merely burned as OCR output. The system analyzes sentence boundaries, emotional beats, and audio cadence to insert line breaks that align with natural speech pauses. The result is a subtitle track that feels timed to the performance, not pasted onto it.

3. Local Inference Sanctuary

All AI models—visual, audio, and textual—run on your hardware. No data leaves your network. This is especially critical for sensitive content, proprietary scripts, or projects that cannot rely on third-party inference providers. AuraSynth supports GPU acceleration via CUDA and Metal, with CPU fallback for smaller models.

4. Responsive Control Surface

The Web UI adapts to screen sizes from mobile to ultrawide monitors. The interface is built on a reactive framework that provides real-time progress visualization for each stage of the pipeline. You can pause, preview, and restart individual stages without losing previous work.

5. Multilingual Architecture

The text-to-speech and subtitle engines support over 40 languages, including tonal and non-Latin scripts. The narration pipeline automatically detects the language of the input and applies appropriate phonetic models and subtitle fonts.

6. 24/7 Operation Mode

Designed for unattended batch processing, AuraSynth includes a queue management system that allows you to stack dozens of video generation tasks. The system handles resource contention (e.g., VRAM allocation) between sequential operations without manual intervention.


🧠 Technical Philosophy

AuraSynth is built on three pillars: determinism, extensibility, and privacy.

  • Determinism: Given the same input text and seed values, the pipeline will produce identical output across runs. This is essential for iterative refinement, version control, and compliance scenarios.
  • Extensibility: Every module exposes clear interfaces. You can swap the visual generator (e.g., use a custom Stable Diffusion checkpoint), replace the TTS engine with a different provider, or inject post-processing filters via a plugin system.
  • Privacy: As mentioned, all computation occurs locally. There are no telemetry calls, no analytics pings, and no latent data exfiltration. The source code is auditable and has been reviewed for network calls.

Why "AuraSynth"?

The name evokes the idea of synthesizing an aura—the intangible atmosphere and emotional texture of a scene—from raw textual intention. Traditional video generators focus on fidelity and speed; AuraSynth emphasizes tone and narrative flow. It is a tool for those who care as much about the feeling of a video as its factual content.


📂 Repository Structure

This repository is organized as a monorepo with the following primary directories:

aurasynth/
├── engine/          # Core AI inference logic and model orchestration
├── ui/              # Web user interface (React-based)
├── api/             # RESTful API and event-driven architecture
├── models/          # Default model weights, configuration files, and adapter scripts
├── docs/            # Comprehensive documentation, including tutorials and API specs
└── tests/           # Unit and integration test suites

The engine/ directory is the heart of the system. It contains separate submodules for visual generation (using latent diffusion), audio synthesis (using neural vocoders), subtitle alignment (using forced alignment algorithms), and digital anchor animation (using 3D morphable models).


🚀 Immediate Value Proposition

If you are currently paying for cloud-based video generation services, or if you are constrained by their rate limits and content policies, AuraSynth offers an alternative that requires no subscription, no internet connection during processing, and no third-party approval for your content. You set the rules. You define the quality thresholds. You control the storage and distribution.

The platform is not a "free" alternative in the economic sense—it is an unconstrained alternative. The only cost is the hardware you allocate and the electricity it consumes.

Example Use Cases

  • Educational Content Creators: Generate lecture supplements with synchronized slides and narration without recurring platform fees.
  • Marketing Teams: Produce A/B test video variants for ad campaigns with complete brand control.
  • Storytellers and Authors: Create visual chapters or book trailers using textual excerpts.
  • Researchers: Study generative video pipelines end-to-end, modifying individual components for ablation studies.
  • Accessibility Advocates: Generate videos with precisely timed subtitles in any supported language.

⚠️ Disclaimer

AuraSynth is a powerful generative tool. As with any technology capable of synthesizing realistic media, the potential for misuse exists. The developers and contributors to this repository operate under a code of ethical responsibility: we do not condone the generation of misleading, defamatory, or harmful content. The software is provided "as is" under the MIT License, without warranty of any kind. By using this software, you agree to comply with all applicable laws and regulations in your jurisdiction regarding synthetic media, data privacy, and intellectual property.

This project does not include any usage monitoring, content filtering, or moderation capabilities by default. Implementing such measures is the responsibility of the deployer. We strongly recommend that any public-facing deployment of AuraSynth integrate content safety filters appropriate to the context.


📜 License

AuraSynth is released under the MIT License, effective 2026. You are free to use, modify, distribute, and sublicense the software, provided that the original copyright notice and disclaimer are included in all copies or substantial portions of the software.

For the full text of the license, see the LICENSE file in the repository root.


🤝 Contributing

We welcome contributions that align with the project's philosophy of open, local-first media generation. Whether you are improving model performance, expanding language support, refining the UI, or fixing bugs, please refer to the CONTRIBUTING.md guide for coding standards and pull request procedures.

All contributors are expected to adhere to the Code of Conduct. We maintain a respectful, constructive community space—no exceptions.


🧭 Getting Oriented

To begin working with AuraSynth, start by reviewing the docs/ directory. The Quick Start guide provides a step-by-step walkthrough for spinning up the Web UI on a local machine. For those interested in integration, the API Reference documents every endpoint, request schema, and error code.

The Configuration section explains environment variables, model download procedures, and hardware requirement estimation. AuraSynth is designed to run on consumer-grade hardware (16GB VRAM recommended for 720p output), but scales gracefully to multi-GPU server racks for higher resolution and batch throughput.


💬 Support and Community

For bug reports, feature requests, and general discussion, please use the Issues tab. We do not maintain a dedicated Discord or Slack server; instead, the issue tracker serves as the central knowledge base. Search before opening a new issue—your question may already have an answer.

Professional support is available through third-party consultants who have contributed to the codebase; contact information can be found in the SUPPORT.md file.


🔮 Future Directions (2026 and Beyond)

The road ahead includes:

  • Real-time collaborative editing: Multiple users working on the same narrative draft simultaneously.
  • Advanced lip-sync models: GAN-based refinement for digital anchor expressions.
  • Streaming output: Begin playback before the entire video is rendered.
  • Custom model fine-tuning: Tools to train the visual and audio models on proprietary datasets.
  • Plugin marketplace: A decentralized repository of community-contributed modules.

These features are driven by community demand. Please vote for or propose enhancements via the issue tracker.


Download

Releases

No releases published

Packages

 
 
 

Contributors

Languages