# Parallel Computing with Tensor Networks

## 1. The Network Class and Tensor Contraction

In [3]:
import time

import numpy as np
from cupy.cuda.runtime import getDeviceCount
from cuquantum import Network, OptimizerOptions
from mpi4py import MPI

--------------------------------------------------------------------------------

  CuPy may not function correctly because multiple CuPy packages are installed
  in your environment:

    cupy, cupy-cuda12x

  Follow these steps to resolve this issue:

    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:

         $ pip uninstall <package_name>

      If you previously installed CuPy via conda, also run the following:

         $ conda uninstall cupy

    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.

         https://docs.cupy.dev/en/stable/install.html

--------------------------------------------------------------------------------



## 1.1 Tensor Network Definition, Path Optimization, and Slicing

![jupyter](./f.png)

In [153]:
# Define the tensor contraction expression
expr = 'aij,bjk,klc,lid'
# Define the tensor shapes
n = 12
shapes = [(2, 2**n, 2**n), (2, 2**n, 2**n), (2**n, 2**n, 2), (2**n, 2**n, 2)]
print(2**25)  # number of elements in each operand
# Prepare the tensor data
operands = [np.random.rand(*shape) for shape in shapes]


# Create the Network object
network = Network(expr, *operands)
# Create the OptimizerOptions object
opt_options = OptimizerOptions(samples=16, slicing={'min_slices': 2})
# Find the contraction path
path, info = network.contract_path(optimize=opt_options)
print("Contraction path:", path)
print("Total FLOPs:", info.opt_cost)  # estimated floating-point operation count
print("Best path:", info.path)  # the optimized path
print("Number of slices:", info.num_slices)  # number of slices
print("Sliced modes:", info.slices)  # sliced modes and their sliced extents
start = time.time()
result = network.contract()
end = time.time()
print(str(end - start))


33554432
Contraction path: [(0, 1), (0, 1), (0, 1)]
Total FLOPs: 1100048564208.0
Best path: [(0, 1), (0, 1), (0, 1)]
Number of slices: 4096
Sliced modes: (('i', 1),)
261.18039631843567
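
The optimizer output `(('i', 1),)` with 4096 slices says that mode `i` is fixed to one value at a time, each slice is contracted separately, and the partial results are summed. The following is a minimal NumPy sketch of that idea on a much smaller instance (the extent `n = 4` and the explicit output `->abcd` are assumptions chosen for illustration). It shows why slicing gives the same result; it does not describe how cuTensorNet implements slicing internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # small extent chosen for illustration only
shapes = [(2, n, n), (2, n, n), (n, n, 2), (n, n, 2)]
A, B, C, D = (rng.random(shape) for shape in shapes)

# Full contraction; modes a, b, c, d are the open (output) modes.
full = np.einsum('aij,bjk,klc,lid->abcd', A, B, C, D)

# Sliced contraction: fix mode i to one value per slice, then sum the slices.
sliced = np.zeros_like(full)
for i in range(n):
    sliced += np.einsum('aj,bjk,klc,ld->abcd', A[:, i, :], B, C, D[:, i, :])

slices_match = np.allclose(full, sliced)
print(slices_match)
```

Because every slice is an independent contraction, the slices can be handed to different processes or GPUs, which is what section 1.2 does.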


## 1.2 Parallel Execution

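1) Initialize MPI

In [4]:
root = 0
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

MPI.COMM_WORLD: the global communicator, which places all processes in a single communication domain.\
rank: the identifier of the current process.\
size: the total number of processes.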

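2) Set the tensor expression and shapes

In [5]:
# Define the tensor contraction expression
expr = 'ij,jkl,lm'
# Define the tensor shapes
shapes = [(2, 2), (2, 2, 2), (2, 2)]


expr: the contraction expression of the tensor network.\
shapes: the shape of each operand (that is, each tensor).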

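3) Broadcast the operand data

In [6]:
# Broadcast the operand data
operands = [np.random.rand(*shape)
            for shape in shapes] if rank == root else None
operands = comm.bcast(operands, root)

The operand data is created only on the process with rank == root (usually the main process).\
comm.bcast then sends it from the root process to all other processes, so every process holds identical operands.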

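4) Device assignment

In [8]:
# Assign a device
device_id = rank % getDeviceCount()

Taking the rank of the current process modulo the number of GPU devices assigns each process one GPU. When there are more processes than devices, several processes share a device.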
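
As a small illustration of the modulo mapping, the sketch below lists the device each rank would receive. The helper `assign_devices` and the job size (6 processes, 4 GPUs) are hypothetical and only serve the example.

```python
def assign_devices(num_ranks, num_devices):
    """Map each MPI rank to a device ID in round-robin fashion."""
    return {rank: rank % num_devices for rank in range(num_ranks)}

# Hypothetical job: 6 processes on a node with 4 GPUs.
mapping = assign_devices(6, 4)
print(mapping)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1}
```

Ranks 4 and 5 wrap around to devices 0 and 1, so those two GPUs each serve two processes.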

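5) Create the Network object

In [9]:
# Create the Network object
network = Network(expr, *operands,
                  options={'device_id' : device_id})

The Network class creates the tensor network object from the contraction expression and the operands, and places it on the specified GPU.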

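6) Compute the contraction path

In [10]:
# Compute the contraction path
path, info = network.contract_path(
    optimize={'samples': 8, 'slicing':
              {'min_slices': size}})

The contract_path method optimizes the contraction path of the tensor network.\
samples=8 sets the number of samples used by the hyperoptimizer.\
slicing={'min_slices': size} forces slicing into at least size slices, so that every process has work in the parallel contraction.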

7) Select the best path

In [11]:
# Select the contraction path with the lowest cost
opt_cost, sender = comm.allreduce(
    sendobj=(info.opt_cost, rank), op=MPI.MINLOC)
if rank == root:
    print(f"Process {sender} has the path with the lowest FLOP count {opt_cost}.")


Process 0 has the path with the lowest FLOP count 64.0.


The MPI.MINLOC operation selects the path with the lowest FLOP count and identifies the process, sender, that owns it.
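
The effect of the MINLOC reduction on `(cost, rank)` pairs can be reproduced in plain Python. The sketch below uses made-up costs for four hypothetical processes. MINLOC returns the minimum value and, on ties, the lowest index, which matches Python's tuple ordering.

```python
# Hypothetical (opt_cost, rank) pairs, one per process.
candidates = [(96.0, 0), (64.0, 1), (80.0, 2), (64.0, 3)]

# Tuples compare by cost first and by rank second, like MINLOC.
opt_cost, sender = min(candidates)
print(opt_cost, sender)  # 64.0 1
```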

8) Broadcast the best path information

In [13]:
# Broadcast the optimizer info from the sender so that all processes use the same contraction path
info = comm.bcast(info, sender)

The information about the best path is broadcast from the sender process to all other processes.

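9) Set the path and slicing information

In [11]:
path, info = network.contract_path(
    optimize={'path': info.path, 'slicing': info.slices})



Calling contract_path again with the previously computed path and slicing information sets them on the local network, so that every process contracts along the same path.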

10) Distribute the slices

In [None]:
path, info = network.contract_path(
    optimize={'path': info.path, 'slicing': info.slices})
num_slices = info.num_slices
chunk, extra = num_slices // size, num_slices % size
slice_begin = rank * chunk + min(rank, extra)
slice_end = num_slices if rank == size - 1 else (rank + 1) * chunk + min(rank + 1, extra)
slices = range(slice_begin, slice_end)
print(slices)

The slices are distributed across the processes:\
num_slices: the total number of slices.\
chunk and extra: each process receives chunk slices, and the first extra processes receive one more.\
slice_begin and slice_end: the range of slices handled by the current process.
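
To see how the slice ranges fall out, the same arithmetic can be evaluated for every rank at once. The helper name `slice_range` and the job size (10 slices, 4 processes) are assumptions for illustration.

```python
def slice_range(rank, size, num_slices):
    """Return the range of slices owned by `rank`, using the formula from the cell above."""
    chunk, extra = divmod(num_slices, size)
    begin = rank * chunk + min(rank, extra)
    end = num_slices if rank == size - 1 else (rank + 1) * chunk + min(rank + 1, extra)
    return range(begin, end)

# Hypothetical job: 10 slices shared by 4 processes.
ranges = [slice_range(rank, 4, 10) for rank in range(4)]
print(ranges)  # [range(0, 3), range(3, 6), range(6, 8), range(8, 10)]

# Every slice is covered exactly once.
covered = [s for r in ranges for s in r]
```

The first two ranks take three slices each and the last two take two each, so the load differs by at most one slice.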

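11) Perform the contraction

In [13]:
# Perform the contraction
result = network.contract(slices=slices)

The contract method performs the tensor contraction, computing only the slice range that belongs to the current process.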

12) Gather the result

In [14]:
# Sum the partial results
result = comm.reduce(
    sendobj=result, op=MPI.SUM, root=root)

MPI.SUM adds the partial results of all processes and delivers the total to the root process.

13) Check the result

In [None]:
if rank == root:
   result_np = np.einsum(expr, *operands, optimize=True)
   print("Does the cuQuantum parallel contraction result match the numpy.einsum result?", np.allclose(result, result_np))

The root process uses numpy.einsum to check that the result of the parallel contraction matches the serial computation, which verifies correctness.
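
The whole workflow can be rehearsed without MPI or a GPU. The sketch below simulates two hypothetical processes in a single Python loop, assuming that mode `j` is the sliced mode with slice size 1 (the mode the optimizer actually picks may differ). The list `partials` stands in for the per-process results, and the final `sum` stands in for `comm.reduce` with `MPI.SUM`.

```python
import numpy as np

rng = np.random.default_rng(0)
expr = 'ij,jkl,lm'
a, b, c = rng.random((2, 2)), rng.random((2, 2, 2)), rng.random((2, 2))

size = 2                  # hypothetical number of processes
num_slices = a.shape[1]   # one slice per value of the sliced mode j (assumption)

partials = []
for rank in range(size):
    # Same slice-distribution formula as in step 10.
    chunk, extra = divmod(num_slices, size)
    begin = rank * chunk + min(rank, extra)
    end = num_slices if rank == size - 1 else (rank + 1) * chunk + min(rank + 1, extra)
    # Contract only the slices owned by this simulated rank.
    partial = np.zeros((2, 2, 2))
    for j in range(begin, end):
        partial += np.einsum('i,kl,lm->ikm', a[:, j], b[j], c)
    partials.append(partial)

result = sum(partials)  # stands in for comm.reduce(..., op=MPI.SUM)
reference = np.einsum(expr, a, b, c, optimize=True)
matches = np.allclose(result, reference)
print(matches)
```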

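## 1.3 Resource Management

This example shows how to manage the memory resources used by stateful objects. Such management is useful when a tensor network needs a large amount of memory and calls to the execution methods of the stateful object (for example autotuning, contraction, and gradient computation) are interleaved with calls to other operations, including operations on other tensor networks, that may also need a large amount of memory.

The example uses two tensor networks, each representing a large matrix multiplication, and performs the two contractions alternately in a loop. We assume that the available device memory is only enough to hold one set of operands and one contraction.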

1) Import the libraries

In [76]:
import logging
import cupy as cp
import numpy as np
import cuquantum

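2) Configuration and initialization

In [77]:
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)-8s %(message)s', datefmt='%m-%d %H:%M:%S')
N = 1024

Logging is enabled at DEBUG level to monitor memory management messages, such as the details of memory allocation and release.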

3) First initialization and contraction

In [78]:
a = cp.random.rand(N, N)
b = cp.random.rand(N, N)
n1 = cuquantum.Network("ij,jk", a, b)
n1.contract_path()
r = n1.contract(release_workspace=True)


11-15 22:58:24 INFO     cuTensorNet version = 2.5.0
11-15 22:58:24 INFO     Beginning network creation...
11-15 22:58:24 INFO     The memory limit is 4.80 GiB.
11-15 22:58:24 DEBUG    Beginning output tensor creation...
11-15 22:58:24 DEBUG    The output tensor has been created.
11-15 22:58:24 INFO     The network has been created.
11-15 22:58:24 INFO     Setting user-provided path...
11-15 22:58:24 INFO     Finished setting user-provided path.
11-15 22:58:24 INFO     Optimizer Information:
    Largest intermediate = 1.00 MiElements
    Optimized cost = 2.147e+09 FLOPs
    Path = [(0, 1)]
    Slicing not needed.
    Intermediate tensor mode labels = [ik]
11-15 22:58:24 INFO     The workspace size requirements range from 24.00 MiB to 72.00 MiB.
11-15 22:58:24 INFO     The scratch workspace size has been set to 72.00 MiB.
11-15 22:58:24 INFO     The cache workspace size has been set to 0.00 B.
11-15 22:58:24 DEBUG    Creating contraction plan...
11-15 22:58:24 DEBUG    Finished creating 

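Tensor network 1 (n1):\
The contraction expression is ij,jk.\
Random tensors a and b are created.\
The Network object n1 is created and bound to the operands.\
Path optimization: n1.contract_path() computes the contraction path.\
Contraction: n1.contract() runs with release_workspace=True, which frees the temporary workspace memory after the contraction and keeps only the result r.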
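
The figures in the optimizer log can be checked by hand. An N×N matrix product has N² output elements and costs about 2N³ floating-point operations, and `cp.random.rand` produces float64 values (8 bytes each). The helper `matmul_stats` below is a hypothetical name used only for this check.

```python
def matmul_stats(n, itemsize=8):
    """Return (output size in MiElements, FLOP count, size of one operand in MiB)."""
    elements = n * n
    flops = 2 * n ** 3
    return elements / 2**20, flops, elements * itemsize / 2**20

small = matmul_stats(1024)
large = matmul_stats(2**13)
print(small)  # (1.0, 2147483648, 8.0)
print(large)  # (64.0, 1099511627776, 512.0)
```

These values agree with the log lines `Largest intermediate = 1.00 MiElements` and `Optimized cost = 2.147e+09 FLOPs` for N = 1024, and with `64.00 MiElements` and `1.100e+12 FLOPs` for N = 2**13. At N = 2**13 each matrix takes 512 MiB, so two operands, the result, and the 576 MiB workspace already use a large share of the 4.80 GiB memory limit.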

4) Release resources and initialize the second network

In [79]:
n1.reset_operands(None)
a = b = None

c = cp.random.rand(N, N)
d = cp.random.rand(N, N)
n2 = cuquantum.Network("ij,jk", c, d)
n2.contract_path()
r = n2.contract(release_workspace=True)


11-15 22:58:26 INFO     The operands have been reset to None.
11-15 22:58:26 INFO     cuTensorNet version = 2.5.0
11-15 22:58:26 INFO     Beginning network creation...
11-15 22:58:26 INFO     The memory limit is 4.80 GiB.
11-15 22:58:26 DEBUG    Beginning output tensor creation...
11-15 22:58:26 DEBUG    The output tensor has been created.
11-15 22:58:26 INFO     The network has been created.
11-15 22:58:26 INFO     Setting user-provided path...
11-15 22:58:26 INFO     Finished setting user-provided path.
11-15 22:58:26 INFO     Optimizer Information:
    Largest intermediate = 1.00 MiElements
    Optimized cost = 2.147e+09 FLOPs
    Path = [(0, 1)]
    Slicing not needed.
    Intermediate tensor mode labels = [ik]
11-15 22:58:26 INFO     The workspace size requirements range from 24.00 MiB to 72.00 MiB.
11-15 22:58:26 INFO     The scratch workspace size has been set to 72.00 MiB.
11-15 22:58:26 INFO     The cache workspace size has been set to 0.00 B.
11-15 22:58:26 DEBUG    Creating 

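Releasing resources:\
n1.reset_operands(None) drops the references that n1 holds to its operands a and b.\
Setting the variables a and b to None drops the remaining references, so that the memory becomes available to network n2.\
Tensor network 2 (n2):\
Random tensors c and d are created.\
The Network object n2 is created and bound to the operands.\
Path optimization: n2.contract_path() computes the contraction path.\
Contraction: release_workspace=True frees the temporary workspace memory after the contraction.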

5) Alternating contraction loop

In [80]:
num_iter = 3
with n1, n2:

    for i in range(num_iter):
        print(f"Iteration {i}")
        # Create and set new operands for n1.
        a = cp.random.rand(N, N)
        b = cp.random.rand(N, N)
        n1.reset_operands(a, b)

        # Perform the first contraction
        r = n1.contract(release_workspace=True)

        # Reset network n1 operands
        n1.reset_operands(None)
        a = b = None

        # Create and set new operands for n2
        c = cp.random.rand(N, N)
        d = cp.random.rand(N, N)
        n2.reset_operands(c, d)

        # Perform the second contraction
        r = n2.contract(release_workspace=True)

        # Reset network n2 operands
        n2.reset_operands(None)
        c = d = None


11-15 22:58:28 INFO     Resetting operands...
11-15 22:58:28 INFO     The operands have been reset.
11-15 22:58:28 DEBUG    Allocating scratch workspace for the tensor network computation...
11-15 22:58:28 DEBUG    _CupyCUDAMemoryManager (allocate memory): size = 75498752, ptr = 21718106112, device = 0, stream=<Stream 0 (device -1)>
11-15 22:58:28 DEBUG    Finished allocating device memory of size 72.00 MiB and host memory of size 0.00 B for contraction in the context of stream <ExternalStream 0 (device -1)>.
11-15 22:58:28 DEBUG    The scratch workspace memory (device pointer = 21718106112, host pointer = 94415127363760) has been set in the workspace descriptor.
11-15 22:58:28 DEBUG    Beginning output (empty) tensor creation...
11-15 22:58:28 DEBUG    The output (empty) tensor has been created.
11-15 22:58:28 INFO     All the available slices (1) will be contracted.
11-15 22:58:28 INFO     Starting network contraction...
11-15 22:58:28 INFO     This call is blocking and will return o

Iteration 0
Iteration 1


11-15 22:58:28 INFO     The contraction took 66.218 ms to complete.
11-15 22:58:28 DEBUG    Established ordering with respect to the computation before releasing the scratch workspace.
11-15 22:58:28 DEBUG    [_release_workspace_memory_perhaps] The scratch workspace memory has been released.
11-15 22:58:28 INFO     The operands have been reset to None.
11-15 22:58:28 INFO     Resetting operands...
11-15 22:58:28 INFO     The operands have been reset.
11-15 22:58:28 DEBUG    Allocating scratch workspace for the tensor network computation...
11-15 22:58:28 DEBUG    _CupyCUDAMemoryManager (allocate memory): size = 75498752, ptr = 21718106112, device = 0, stream=<Stream 0 (device -1)>
11-15 22:58:28 DEBUG    Finished allocating device memory of size 72.00 MiB and host memory of size 0.00 B for contraction in the context of stream <ExternalStream 0 (device -1)>.
11-15 22:58:28 DEBUG    The scratch workspace memory (device pointer = 21718106112, host pointer = 94415127363760) has been set in

Iteration 2


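The with statement uses both networks as context managers, so that all resources held by n1 and n2 are released automatically on exit.\
In each iteration:\
Bind new data: random tensors a, b, c, and d are generated and bound to n1 and n2, respectively.\
Contract:\
n1.contract() and n2.contract() perform the two contractions.\
After each contraction, release_workspace=True releases the workspace memory.\
Release operands:\
reset_operands(None) releases the current operands.\
The loop alternates between the contractions of the two networks.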
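
The cleanup behaviour of `with n1, n2:` can be illustrated with a toy class. `ToyNetwork` below is a hypothetical stand-in written for this sketch, not the cuQuantum API. It only records when its resources would be freed.

```python
class ToyNetwork:
    """Hypothetical stand-in for a stateful object that owns resources."""

    def __init__(self, name, log):
        self.name, self.log, self.operands = name, log, None

    def reset_operands(self, *operands):
        # reset_operands(None) drops the references to the current operands.
        self.operands = None if operands == (None,) else operands

    def free(self):
        self.log.append(f"free {self.name}")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.free()
        return False  # do not swallow exceptions


log = []
n1, n2 = ToyNetwork("n1", log), ToyNetwork("n2", log)
with n1, n2:
    for i in range(2):
        n1.reset_operands([1.0], [2.0])
        n1.reset_operands(None)
        n2.reset_operands([3.0], [4.0])
        n2.reset_operands(None)

print(log)  # ['free n2', 'free n1']
```

Both objects are cleaned up when the block exits, in reverse order of entry, and this also happens if the loop raises an exception.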

6) Complete code

In [84]:

N = 2**13
# Turn on logging and set the level to DEBUG to print memory management messages.
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)-8s %(message)s', datefmt='%m-%d %H:%M:%S')

# Create, prepare, and execute the first iteration on network n1.
a = cp.random.rand(N, N)
b = cp.random.rand(N, N)
n1 = cuquantum.Network("ij,jk", a, b)
n1.contract_path()
r = n1.contract(release_workspace=True)

# Reset network n1 operands to None, and set a and b to None to make memory available for the network n2.
n1.reset_operands(None)
a = b = None

# Create, prepare, and execute the first iteration on network n2.
c = cp.random.rand(N, N)
d = cp.random.rand(N, N)
n2 = cuquantum.Network("ij,jk", c, d)
n2.contract_path()
r = n2.contract(release_workspace=True)

# Reset network n2 operands to None, and set c and d to None to make memory available for next contraction of network n1.
n2.reset_operands(None)
c = d = None

num_iter = 3
# Use the networks as context managers so that internal library resources are properly cleaned up.
with n1, n2:

    for i in range(num_iter):
        print(f"Iteration {i}")
        # Create and set new operands for n1.
        a = cp.random.rand(N, N)
        b = cp.random.rand(N, N)
        n1.reset_operands(a, b)

        # Perform the first contraction, and request that the workspace be released at the end of the operation so that there is enough
        #   memory for the second one.
        r = n1.contract(release_workspace=True)

        # Reset network n1 operands to None, and set a and b to None to make memory available for the operands for and contracting network n2.
        n1.reset_operands(None)
        a = b = None

        # Create and set new operands for n2.
        c = cp.random.rand(N, N)
        d = cp.random.rand(N, N)
        n2.reset_operands(c, d)

        # Perform the second contraction, and request that the workspace be released at the end of the operation so that there is enough
        #   memory for the next iteration of the first contraction.
        r = n2.contract(release_workspace=True)

        # Reset network n2 operands to None, and set c and d to None to make memory available for next contraction and operands of network n1.
        n2.reset_operands(None)
        c = d = None

11-15 23:03:40 INFO     cuTensorNet version = 2.5.0
11-15 23:03:40 INFO     Beginning network creation...
11-15 23:03:40 INFO     The memory limit is 4.80 GiB.
11-15 23:03:40 DEBUG    Beginning output tensor creation...
11-15 23:03:40 DEBUG    The output tensor has been created.
11-15 23:03:40 INFO     The network has been created.
11-15 23:03:40 INFO     Setting user-provided path...
11-15 23:03:40 INFO     Finished setting user-provided path.
11-15 23:03:40 INFO     Optimizer Information:
    Largest intermediate = 64.00 MiElements
    Optimized cost = 1.100e+12 FLOPs
    Path = [(0, 1)]
    Slicing not needed.
    Intermediate tensor mode labels = [ik]
11-15 23:03:40 INFO     The workspace size requirements range from 528.00 MiB to 576.00 MiB.
11-15 23:03:40 INFO     The scratch workspace size has been set to 576.00 MiB.
11-15 23:03:40 INFO     The cache workspace size has been set to 0.00 B.
11-15 23:03:40 DEBUG    Creating contraction plan...
11-15 23:03:40 DEBUG    Finished creat

Iteration 0


11-15 23:03:59 INFO     The contraction took 6006.506 ms to complete.
11-15 23:03:59 DEBUG    Established ordering with respect to the computation before releasing the scratch workspace.
11-15 23:03:59 DEBUG    [_release_workspace_memory_perhaps] The scratch workspace memory has been released.
11-15 23:03:59 INFO     The operands have been reset to None.
11-15 23:03:59 INFO     Resetting operands...
11-15 23:03:59 INFO     The operands have been reset.
11-15 23:03:59 DEBUG    Allocating scratch workspace for the tensor network computation...
11-15 23:03:59 DEBUG    _CupyCUDAMemoryManager (allocate memory): size = 603981056, ptr = 22791847936, device = 0, stream=<Stream 0 (device -1)>
11-15 23:03:59 DEBUG    Finished allocating device memory of size 576.00 MiB and host memory of size 0.00 B for contraction in the context of stream <ExternalStream 0 (device -1)>.
11-15 23:03:59 DEBUG    The scratch workspace memory (device pointer = 22791847936, host pointer = 94415120035424) has been se

Iteration 1


11-15 23:04:12 INFO     The contraction took 6014.287 ms to complete.
11-15 23:04:12 DEBUG    Established ordering with respect to the computation before releasing the scratch workspace.
11-15 23:04:12 DEBUG    [_release_workspace_memory_perhaps] The scratch workspace memory has been released.
11-15 23:04:12 INFO     The operands have been reset to None.
11-15 23:04:12 INFO     Resetting operands...
11-15 23:04:12 INFO     The operands have been reset.
11-15 23:04:12 DEBUG    Allocating scratch workspace for the tensor network computation...
11-15 23:04:12 DEBUG    _CupyCUDAMemoryManager (allocate memory): size = 603981056, ptr = 22254977024, device = 0, stream=<Stream 0 (device -1)>
11-15 23:04:12 DEBUG    Finished allocating device memory of size 576.00 MiB and host memory of size 0.00 B for contraction in the context of stream <ExternalStream 0 (device -1)>.
11-15 23:04:12 DEBUG    The scratch workspace memory (device pointer = 22254977024, host pointer = 94415120035424) has been se

Iteration 2


11-15 23:04:24 INFO     The contraction took 5997.393 ms to complete.
11-15 23:04:24 DEBUG    Established ordering with respect to the computation before releasing the scratch workspace.
11-15 23:04:24 DEBUG    [_release_workspace_memory_perhaps] The scratch workspace memory has been released.
11-15 23:04:24 INFO     The operands have been reset to None.
11-15 23:04:24 INFO     Resetting operands...
11-15 23:04:24 INFO     The operands have been reset.
11-15 23:04:24 DEBUG    Allocating scratch workspace for the tensor network computation...
11-15 23:04:24 DEBUG    _CupyCUDAMemoryManager (allocate memory): size = 603981056, ptr = 22254977024, device = 0, stream=<Stream 0 (device -1)>
11-15 23:04:24 DEBUG    Finished allocating device memory of size 576.00 MiB and host memory of size 0.00 B for contraction in the context of stream <ExternalStream 0 (device -1)>.
11-15 23:04:24 DEBUG    The scratch workspace memory (device pointer = 22254977024, host pointer = 94415120035424) has been se

KeyboardInterrupt: 

## 2. The Network Class and Tensor Decomposition

![jupyter](./1.png)


1) Import the required libraries

In [3]:
import cupy as cp
import numpy as np
from cuquantum import cutensornet as cutn
from cuquantum.cutensornet.experimental import contract_decompose

In [5]:

# Generate the initial MPS
def generate_initial_mps(n_qubits, dtype='complex128'):
    """Generate the initial MPS (|000...0>)."""
    state_tensor = cp.asarray([1, 0], dtype=dtype).reshape(1, 2, 1)
    return [state_tensor.copy() for _ in range(n_qubits)]

# Apply a two-qubit gate
def apply_two_qubit_gate(mps, gate, qubits, cutoff=1e-12, handle=None):
    """
    Apply a two-qubit gate to the MPS and truncate the result.
    Args:
        mps: the current list of MPS tensors
        gate: rank-4 two-qubit gate tensor
        qubits: [q1, q2], the qubits to act on
        cutoff: truncation threshold
        handle: cuTensorNet library handle
    Returns:
        the updated list of MPS tensors
    """
    i, j = qubits
    if abs(i - j) != 1:
        raise ValueError("Only gates on adjacent qubits are supported")
    # Contract the two site tensors with the gate, then split the result with a truncated SVD.
    # With partition='V' the singular values are absorbed into the right tensor, so s is None.
    left, s, right = contract_decompose(
        'ipj,jqk,rspq->irj,jsk',
        mps[i], mps[j], gate,
        algorithm={'svd_method': {'partition': 'V', 'abs_cutoff': cutoff}},
        options={'handle': handle}  # reuse the library handle
    )
    print(s)
    mps[i], mps[j] = left, right
    return mps

# Initialize the MPS
n_qubits = 4
dtype = 'complex128'
mps = generate_initial_mps(n_qubits, dtype=dtype)

# Initialize the two-qubit gate (CNOT as an example).
# Index order is [out_control, out_target, in_control, in_target], matching 'rspq' in the expression.
cnot_gate = cp.zeros((2, 2, 2, 2), dtype=dtype)
cnot_gate[0, 0, 0, 0] = cnot_gate[0, 1, 0, 1] = cnot_gate[1, 1, 1, 0] = cnot_gate[1, 0, 1, 1] = 1

# Create the cuTensorNet handle
handle = cutn.create()

# Apply CNOT gates to the MPS and truncate
mps = apply_two_qubit_gate(mps, cnot_gate, qubits=(0, 1), cutoff=1e-25, handle=handle)
mps = apply_two_qubit_gate(mps, cnot_gate, qubits=(1, 2), cutoff=1e-25, handle=handle)
mps = apply_two_qubit_gate(mps, cnot_gate, qubits=(2, 3), cutoff=1e-12, handle=handle)

# Print the result
print("Shapes of the final MPS tensors:")
for idx, tensor in enumerate(mps):
    print(f"Site {idx}, shape: {tensor.shape}")

# Destroy the cuTensorNet handle to release its resources
cutn.destroy(handle)


2024-11-25 14:25:54,032 [INFO] cuTensorNet version = 2.5.0
2024-11-25 14:25:54,034 [INFO] Beginning operands parsing...
2024-11-25 14:25:54,036 [INFO] Calling specicialized kernel `cutensornetGateSplit` for contraction and decomposition.
2024-11-25 14:25:54,041 [INFO] The SVDConfig attribute 'abs_cutoff' has been set to 1e-25.
2024-11-25 14:25:54,041 [INFO] The SVDConfig attribute 'rel_cutoff' has been set to 0.0.
2024-11-25 14:25:54,043 [INFO] The SVDConfig attribute 'partition' has been set to 2.
2024-11-25 14:25:54,044 [INFO] The SVDConfig attribute 'normalization' has been set to 0.
2024-11-25 14:25:54,046 [INFO] The SVDConfig attribute 'algorithm' has been set to 0.
2024-11-25 14:25:54,046 [INFO] The SVDConfig attribute 'discarded_weight_cutoff' has been set to 0.0.
2024-11-25 14:25:54,091 [INFO] Starting contract-decompose (gate split)...
2024-11-25 14:25:54,092 [INFO] This call is blocking and will return only after the operation is complete.
2024-11-25 14:25:54,267 [INFO] The c

None
None
None
Shapes of the final MPS tensors:
Site 0, shape: (1, 2, 1)
Site 1, shape: (1, 2, 1)
Site 2, shape: (1, 2, 1)
Site 3, shape: (1, 2, 1)
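
The gate-split step can be followed in plain NumPy: contract the two site tensors with the gate, reshape the result into a matrix, take an SVD, drop small singular values, and absorb the remaining ones into the right tensor. The sketch below is an illustration under these assumptions, not the cuTensorNet kernel. The helper `gate_split` is hypothetical, and the CNOT is built with index order [out_control, out_target, in_control, in_target] to match `rspq` in the expression.

```python
import numpy as np

def gate_split(a, b, gate, abs_cutoff=1e-12):
    """Apply a two-qubit gate to adjacent MPS sites a (i,p,j) and b (j,q,k)."""
    theta = np.einsum('ipj,jqk,rspq->irsk', a, b, gate)
    i, r, s, k = theta.shape
    u, sv, vh = np.linalg.svd(theta.reshape(i * r, s * k), full_matrices=False)
    # Keep singular values above the cutoff (at least one).
    keep = max(1, int(np.count_nonzero(sv > abs_cutoff)))
    left = u[:, :keep].reshape(i, r, keep)
    # Absorb the singular values into the right tensor.
    right = (sv[:keep, None] * vh[:keep]).reshape(keep, s, k)
    return left, right

# CNOT: the target bit is flipped when the control bit is 1.
cnot = np.zeros((2, 2, 2, 2))
for c in range(2):
    for t in range(2):
        cnot[c, t ^ c, c, t] = 1

zero = np.array([1.0, 0.0]).reshape(1, 2, 1)
plus = (np.array([1.0, 1.0]) / np.sqrt(2)).reshape(1, 2, 1)

l0, r0 = gate_split(zero, zero, cnot)   # CNOT|00> = |00>, a product state
l1, r1 = gate_split(plus, zero, cnot)   # CNOT|+0> is a Bell state
bell = np.einsum('irj,jsk->rs', l1, r1)

print(l0.shape, r0.shape)  # (1, 2, 1) (1, 2, 1)
print(l1.shape, r1.shape)  # (1, 2, 2) (2, 2, 1)
```

Starting from |00>, the bond dimension stays 1, which is why every site in the output above keeps the shape (1, 2, 1). An entangling input such as |+0> raises the bond dimension to 2.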


![jupyter](./2.png)

In [11]:
import numpy as np
import logging
from cuquantum.cutensornet.tensor import decompose, SVDMethod

def svd_decompose_with_options(expr, tensor_data, abs_cutoff=0.0, rel_cutoff=0.0, discarded_weight_cutoff=0.0):
    """
    Decompose and truncate a tensor with cuTensorNet's SVD method, with several truncation options.

    Args:
        expr (str): decomposition expression.
        tensor_data (ndarray): input tensor.
        abs_cutoff (float): absolute cutoff.
        rel_cutoff (float): relative cutoff.
        discarded_weight_cutoff (float): discarded-weight cutoff.

    Returns:
        tuple: the decomposed tensors U, S, V. With partition='V' the singular
        values are absorbed into V.
    """
    # Configure logging
    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

    # Define the SVD method
    svd_method = SVDMethod(
        abs_cutoff=abs_cutoff,
        rel_cutoff=rel_cutoff,
        discarded_weight_cutoff=discarded_weight_cutoff,
        partition='V'
    )

    # Run the decomposition
    u, s, v = decompose(expr, tensor_data, method=svd_method)

    return u, s, v


def main():
    # Create the tensor
    t = np.random.rand(4, 3, 3, 3)

    # Decomposition expression
    svd_expr = "ijab->ijm,mab"

    # Run the SVD decomposition
    u, s, v = svd_decompose_with_options(
        svd_expr,
        t,
        abs_cutoff=1e-6,
        rel_cutoff=0.01,
        discarded_weight_cutoff=0.1
    )

    # Print the result
    print("Shape of U:", u.shape)
    print("Shape of V:", v.shape)


if __name__ == "__main__":
    main()


2024-11-25 14:29:16,765 [INFO] CUDA runtime version = 12020
2024-11-25 14:29:16,768 [INFO] cuTensorNet version = 2.5.0
2024-11-25 14:29:16,769 [INFO] Beginning operands parsing...
2024-11-25 14:29:16,771 [INFO] Begin transferring input data from host to device 0
2024-11-25 14:29:16,773 [INFO] Input data transfer finished
2024-11-25 14:29:16,774 [INFO] The SVDConfig attribute 'abs_cutoff' has been set to 1e-06.
2024-11-25 14:29:16,775 [INFO] The SVDConfig attribute 'rel_cutoff' has been set to 0.01.
2024-11-25 14:29:16,776 [INFO] The SVDConfig attribute 'partition' has been set to 2.
2024-11-25 14:29:16,777 [INFO] The SVDConfig attribute 'normalization' has been set to 0.
2024-11-25 14:29:16,778 [INFO] The SVDConfig attribute 'algorithm' has been set to 0.
2024-11-25 14:29:16,778 [INFO] The SVDConfig attribute 'discarded_weight_cutoff' has been set to 0.1.
2024-11-25 14:29:17,497 [INFO] Starting tensor decomposition...
2024-11-25 14:29:17,498 [INFO] This call is blocking and will return

Shape of U: (4, 3, 3)
Shape of V: (3, 3, 3)
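
The decomposition `ijab->ijm,mab` can be mimicked in NumPy by reshaping the tensor into a matrix with rows (i, j) and columns (a, b) and truncating its SVD. The sketch below covers only the absolute and relative cutoffs. The discarded-weight cutoff is left out, and taking the larger of the two thresholds is an assumption of this sketch rather than a statement about cuTensorNet. The helper `svd_split` is hypothetical, and the input is built with rank 3 so that the outcome is predictable.

```python
import numpy as np

def svd_split(t, abs_cutoff=0.0, rel_cutoff=0.0):
    """NumPy analogue of 'ijab->ijm,mab' with singular values absorbed into the right tensor."""
    i, j, a, b = t.shape
    u, sv, vh = np.linalg.svd(t.reshape(i * j, a * b), full_matrices=False)
    # rel_cutoff is measured relative to the largest singular value.
    threshold = max(abs_cutoff, rel_cutoff * sv[0])
    keep = max(1, int(np.count_nonzero(sv > threshold)))
    u_t = u[:, :keep].reshape(i, j, keep)
    v_t = (sv[:keep, None] * vh[:keep]).reshape(keep, a, b)
    return u_t, v_t

rng = np.random.default_rng(0)
# Build a tensor whose (ij | ab) matricization has rank 3.
x, y = rng.random((4, 3, 3)), rng.random((3, 3, 3))
t = np.einsum('ijm,mab->ijab', x, y)

u_t, v_t = svd_split(t, rel_cutoff=1e-8)
reconstructed = np.einsum('ijm,mab->ijab', u_t, v_t)
print(u_t.shape, v_t.shape)  # (4, 3, 3) (3, 3, 3)
```

Here the truncation discards only numerically zero singular values, so the two factors reproduce the input. With a generic random tensor and looser cutoffs, as in the cell above, the truncation discards real weight and the product of the factors is only an approximation.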
