TaylorTorch is a modern Swift wrapper for LibTorch, designed to bring the elegance of Swift to the power of PyTorch's C++ backend. It is built from the ground up to feel idiomatic to Swift developers and to leverage the language's first-class automatic differentiation capabilities for seamless gradient computation. By embracing a protocol-oriented design, TaylorTorch provides a flexible “front-to-back” experience so you can compose complex models in pure Swift.
The journey of TaylorTorch begins with a deep appreciation for the pioneering work of the Swift for TensorFlow (S4TF) project. S4TF first introduced Differentiable Programming as a first-class feature in Swift, and although the project is now archived, its ambitious vision left a lasting impact on the community.
Fortunately, the story of automatic differentiation in Swift didn't end there. Thanks to the continued efforts of the Swift community and the dedicated team at PassiveLogic, the language's auto-diff capabilities have not only been preserved but have matured and strengthened significantly.
Witnessing these advancements, and in an era of rapid progress with technologies like Large Language Models (LLMs), I set a personal challenge: to see if the original vision of S4TF—a powerful, end-to-end deep learning framework in Swift—could be resurrected. TaylorTorch is the result of that challenge, aiming to rebuild that capability but this time powered by the robust and widely-used LibTorch backend.
This project is a foundational layer, and its future can be extended in countless ways—from adding higher-level APIs like torch.nn to developing integrations with other scientific computing libraries. The roadmap is intentionally open, as I eagerly await feedback, contributions, and ideas from the Swift community to guide its evolution.
While TaylorTorch is an experimental alpha release, it comes packed with features to let you start building powerful models right away. The core goal is to provide comprehensive access to the underlying Torch ATen tensor functionalities directly within Swift, giving you a massive library of operations at your fingertips.
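To give a feel for that surface, here is a tiny hedged sketch. The `Tensor(array:shape:dtype:)` initializer appears verbatim in the example further down this README; `matmul` and `sum` are assumed spellings for the bridged ATen operators, so check the sources for the exact names.

```swift
import Torch

// Tensor(array:shape:dtype:) appears verbatim later in this README;
// `matmul(_:)` and `sum()` are assumed names for the bridged ATen ops.
let a = Tensor(array: [1.0, 2.0, 3.0, 4.0], shape: [2, 2], dtype: .float64)
let b = Tensor(array: [0.5, 0.5, 0.5, 0.5], shape: [2, 2], dtype: .float64)
let c = a.matmul(b)   // assumed ATen `matmul` binding
print(c.sum())        // assumed ATen `sum` binding
```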
This release includes a robust set of common neural network layers, ready for you to compose your own architectures:
- Core Layers: Linear, Conv (1D, 2D, 3D), Dropout
- Normalization: BatchNorm, LayerNorm, GroupNorm
- Attention & Activations: Multi-Head Attention; SiLU, ELU, and GELU activations
- Recurrent Cells: LSTMCell, GRUCell
TaylorTorch adopts Swift's `Layer` protocol (an alias of `Module`) as the foundation for the high-level API. Every layer that you will find under `Sources/Torch/Modules/Layers` is both differentiable and composable:
- Initialisation – weight tensors now follow the `[in, out]` convention for dense projections (`Linear`, `Dense`) and `[Cin, Cout]`-style shapes for convolutional blocks, mirroring the semantics used across the tests.
- Builder syntax – `Sequential` is a single generic type that accepts any `Layer` instance. The nested chains created by the result builder eliminate the need to manually spell out `Sequential<Chain<…>>`. The examples and tests now use

  ```swift
  let head = Sequential {
    Dense(inputSize: 70, outputSize: 64)
    ReLU()
    Dense(inputSize: 64, outputSize: 32)
  }
  ```

  which matches the API available inside GraphNetwork, Transformers, and the examples.
- Differentiable semantics – most layers provide hand-written `TangentVector`s to keep Swift's autodiff compiler happy. When you compose layers in `Sequential` or within more specialised modules (e.g. the graph stack in the Karate example), their gradients can be consumed by optimisers such as `SGD` or `Adam` directly; a sketch of a full optimisation step follows this list.
- Graph utilities – the layers in `Modules/Graph` include batching helpers, GNN-style message passing (`GraphNetwork`), and pooling utilities. They expect host indices (Swift arrays) for sender/receiver lists, and the README examples demonstrate the required shapes.
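To see how these pieces compose, here is a minimal, hedged sketch of one optimisation step. `valueWithGradient(at:)` is Swift autodiff's standard entry point; the `SGD(for:learningRate:)` initializer, the `Tensor(randomNormal:)` constructor, and the `mean()` reduction are S4TF-style assumptions, so check the sources for the exact spellings.

```swift
import Torch

// A hedged, single-step sketch. Only Sequential/Dense/ReLU appear verbatim
// in the examples above; the optimizer and tensor helpers are assumptions
// following S4TF conventions.
var model = Sequential {
  Dense(inputSize: 70, outputSize: 64)
  ReLU()
  Dense(inputSize: 64, outputSize: 32)
}
var optimizer = SGD(for: model, learningRate: 1e-3)

let x = Tensor(randomNormal: [8, 70])       // batch of 8 inputs
let target = Tensor(randomNormal: [8, 32])  // matching regression targets

let (loss, grad) = valueWithGradient(at: model) { model -> Tensor in
  let prediction = model(x)
  // Mean-squared error, spelled with assumed Tensor operators.
  return ((prediction - target) * (prediction - target)).mean()
}
optimizer.update(&model, along: grad)  // S4TF-convention update
print("loss:", loss)
```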
Beyond standard layers, TaylorTorch includes initial building blocks for Graph Neural Networks (GNNs). Inspired by the excellent work from DeepMind's Graph Nets and Jraph libraries, these components are designed to make graph-based machine learning a first-class citizen in the Swift ecosystem.
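To make the expected data layout concrete, here is a hedged sketch: node features live in a tensor, while edge endpoints are host-side Swift arrays, as the graph utilities above require. The commented-out `GraphNetwork` call shape is illustrative only; the real signatures live in Sources/Torch/Modules/Graph.

```swift
import Torch

// Hedged sketch of the expected graph layout: node features in a Tensor,
// edge endpoints as plain Swift arrays of host indices.
let nodeFeatures = Tensor(randomNormal: [34, 16])  // assumed initializer
let senders: [Int] = [0, 0, 1, 2]     // edge source node indices
let receivers: [Int] = [1, 2, 2, 3]   // edge target node indices

// A message-passing layer consumes these together; the call below is
// illustrative, not the library's exact API.
// let gnn = GraphNetwork(...)
// let updated = gnn(nodeFeatures, senders: senders, receivers: receivers)
```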
To help you get started, the library includes several working examples that showcase its capabilities:
- MNIST: a classic image classification task using convolutional layers (`Conv2D`).
- ANKI English to Spanish Translation: a sequence-to-sequence model demonstrating the use of multi-head attention.
- Karate Club Classification: a two-class node classification problem solved with graph networks, showing how to leverage the GNN components.
These examples serve as a practical guide and a starting point for your own projects.
Please read this section carefully before building the project.
This initial release of TaylorTorch has been tested exclusively with the following configuration:
- PyTorch: version 2.8.0, compiled from source, CPU only.
- Platform: Mac with Apple Silicon (ARM), CPU only.
- Swift: version 6.3-dev (swift-DEVELOPMENT-SNAPSHOT-2025-10-02, LLVM 0d0246569621d5b, Swift 199240b3fe97eda).
Support for other platforms (Linux, Windows), hardware (GPUs), or different PyTorch versions is not yet confirmed.
To successfully build the project, you must manually update several paths.
- LibTorch Paths: in the `Package.swift` file, update the header and library search paths to point to the locations where your compiled PyTorch library is stored.
- C++ Interop Macros: given the project's use of C++ interoperability and custom macros, you also need to ensure the location of the customization macros in `<swift/bridging>` is correctly configured in your build settings.
Failure to update these paths will result in build errors.
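For orientation, here is what such a configuration might look like in a consumer's manifest. This is an illustrative SwiftPM fragment, not TaylorTorch's actual Package.swift: the target name is hypothetical, and every path must point at your own LibTorch build.

```swift
// swift-tools-version: 6.0
import PackageDescription

// Illustrative fragment only: the target name is hypothetical, and all
// paths below must point at your local LibTorch build.
let package = Package(
  name: "TaylorTorchApp",
  targets: [
    .executableTarget(
      name: "TaylorTorchApp",
      cxxSettings: [
        .unsafeFlags([
          "-I/path/to/libtorch/include",
          "-I/path/to/libtorch/include/torch/csrc/api/include",
        ])
      ],
      swiftSettings: [
        // C++ interop is what makes the <swift/bridging> customization
        // macros visible to the Swift compiler.
        .interoperabilityMode(.Cxx)
      ],
      linkerSettings: [
        .unsafeFlags(["-L/path/to/libtorch/lib"]),
        .linkedLibrary("torch"),
        .linkedLibrary("torch_cpu"),
        .linkedLibrary("c10"),
      ]
    )
  ]
)
```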
With the build configured, here's a quick taste of the high-level API: a small convolutional network in the style of the MNIST example, assembled with the `Sequential` builder.

```swift
import Torch
var model = Sequential {
    // ── Block 1 ──────────────────────────────────────────────────────────────
    Conv2D(
      kaimingUniformInChannels: 1, outChannels: 32,
      kernelSize: (3, 3), padding: (1, 1))
    Dropout(probability: 0.025)  
    BatchNorm(featureCount: 32)  // NCHW => axis 1
    ReLU()
    AvgPool2D(kernelSize: (2, 2), stride: (2, 2))
    // ── Block 2 ──────────────────────────────────────────────────────────────
    Conv2D(
      kaimingUniformInChannels: 32, outChannels: 64,
      kernelSize: (3, 3), padding: (1, 1))
    BatchNorm(featureCount: 64)
    ReLU()
    AvgPool2D(kernelSize: (2, 2), stride: (2, 2))
    // ── Head ─────────────────────────────────────────────────────────────────
    Flatten(startDim: 1)  // [N, 64*7*7] = [N, 3136]
    Linear(inputSize: 64 * 7 * 7, outputSize: 256)
    Dropout(probability: 0.025)
    ReLU()
    Linear(inputSize: 256, outputSize: 10)
}
```

And for a minimal end-to-end forward pass:

```swift
import Torch
let model = Sequential {
  Dense(inputSize: 3, outputSize: 2)
  ReLU()
}
let x = Tensor(array: [
  0.5, -1.0, 2.0,
  1.5,  0.0, -0.5,
], shape: [2, 3], dtype: .float64)
let y = model(x)
print(y)
```

TaylorTorch is a passion project developed in my free time, which means balancing development with important things like sleep and sports! The library has many potential avenues for growth, and the path forward will be heavily influenced by community feedback and contributions.
Here are some of the exciting directions we could explore:
- Expanded Operator Coverage: systematically increase the number of covered ATen tensor operators to provide more comprehensive access to the LibTorch backend.
- Robust Test Coverage: implement a thorough testing suite for both the Swift front-end and the C++ bridging code to ensure stability and reliability.
- GPU / Metal Support: unlock high-performance training and inference by adding support for GPU acceleration via Metal on Apple Silicon, which would be a game-changer for larger models.
- Richer Model Zoo: add more advanced layers and end-to-end examples, such as a full Transformer or a Vision Transformer (ViT).
- Ecosystem Interoperability: integrate support for standards like DLPack to allow zero-copy tensor sharing with other libraries (like NumPy or JAX), making it easier to use weights and data from different ecosystems.
- Exploring New Backends: investigate a potential future version that moves away from LibTorch and instead uses a more native backend like Apple's MLX (Swift) for a deeply integrated experience on Apple hardware.
If any of these ideas excite you, feel free to open an issue or submit a pull request. Your contributions are what will shape the next era of TaylorTorch!
Developing TaylorTorch has been a fascinating journey into the depths of Swift's automatic differentiation system. The library's multi-nested architecture, combined with its heavy reliance on the Differentiable protocol, makes it a powerful stress test for the Swift compiler.
Along the way, I encountered and resolved several interesting SIL (Swift Intermediate Language) generation issues related to differentiation. A key insight was that for many complex Differentiable structs, compiler-related problems could be consistently solved by explicitly implementing the associated TangentVector.
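For readers curious what that workaround looks like, here is a minimal, self-contained sketch, using plain `Double`s rather than TaylorTorch tensors, of spelling out the `TangentVector` by hand instead of relying on the compiler-synthesized conformance:

```swift
import _Differentiation

struct Affine: Differentiable {
  var weight: Double
  var bias: Double

  // Explicit tangent type; writing this out by hand is the kind of
  // workaround that resolved several SIL generation issues in practice.
  struct TangentVector: Differentiable, AdditiveArithmetic {
    var weight: Double
    var bias: Double
    typealias TangentVector = Self

    static var zero: Self { Self(weight: 0, bias: 0) }
    static func + (lhs: Self, rhs: Self) -> Self {
      Self(weight: lhs.weight + rhs.weight, bias: lhs.bias + rhs.bias)
    }
    static func - (lhs: Self, rhs: Self) -> Self {
      Self(weight: lhs.weight - rhs.weight, bias: lhs.bias - rhs.bias)
    }
    mutating func move(by offset: Self) { self = self + offset }
  }

  mutating func move(by offset: TangentVector) {
    weight += offset.weight
    bias += offset.bias
  }

  @differentiable(reverse)
  func callAsFunction(_ x: Double) -> Double { weight * x + bias }
}

// Gradients flow through the hand-written tangent as usual:
let (value, grad) = valueWithGradient(at: Affine(weight: 2, bias: 1)) { f in
  f(3)
}
// value == 7, grad.weight == 3, grad.bias == 1
```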
Given its structure, this library can serve as a valuable test suite for the Swift community. It provides a real-world, complex use case to help identify, debug, and improve the compiler's handling of automatic differentiation. The ultimate goal is to contribute to making differentiable programming a more robust and first-class citizen in the Swift ecosystem.
