# Demo Tensors Dataset (HDF5)

This notebook demonstrates how to generate and load the Demo Tensors dataset, which contains tensors ranging from 0D to 5D.

The dataset is generated by `scripts/generate_tensors.py`, which writes an HDF5 file (`.h5`) to `data/synthetic/demo_tensors/demo_tensors.h5` (relative to the repository root).

**Note**: To run this notebook, you will need the `h5py` library. If you haven't installed it yet, you can do so by running:
```bash
pip install h5py
```

## 1. Generate the Dataset

The following cell executes the script `scripts/generate_tensors.py`.
The script generates eight tensors and saves them into `demo_tensors.h5`:
`tensor_a`, `tensor_b`, `scalar_data`, `vector_data`, `image_grayscale_data`, `image_rgb_data`, `video_frames_data`, and `simulation_data`.

Because this notebook lives in the `notebooks/` directory, the paths to the script and the output directory are prefixed with `../`.
This command will create (or overwrite) `../data/synthetic/demo_tensors/demo_tensors.h5`.

In [1]:
!python ../scripts/generate_tensors.py ../data/synthetic/demo_tensors

Generated demo_tensors.h5 in /app/data/synthetic/demo_tensors
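
The source of `scripts/generate_tensors.py` is not reproduced in this notebook. As a rough idea of what such a script might do, the sketch below writes eight datasets to an HDF5 file with `h5py`. Only the dataset names, the shapes, and the scalar value 42.0 are taken from this notebook (see the output in Section 2); the function name `write_demo_tensors`, the random contents, the seed, and the temporary output directory are assumptions made for illustration, not the real script's behavior.

```python
import os
import tempfile

import h5py
import numpy as np

# Dataset names and shapes as reported by the inspection cell in Section 2.
SHAPES = {
    "tensor_a": (100, 3),
    "tensor_b": (50, 10, 3),
    "vector_data": (150,),
    "image_grayscale_data": (32, 32),
    "image_rgb_data": (16, 16, 3),
    "video_frames_data": (10, 8, 8, 1),
    "simulation_data": (5, 6, 6, 3, 2),
}


def write_demo_tensors(output_dir, seed=0):
    """Hypothetical stand-in for the real script: write random tensors to HDF5."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "demo_tensors.h5")
    rng = np.random.default_rng(seed)
    # Mode "w" creates the file, or truncates it if it already exists.
    with h5py.File(path, "w") as hf:
        hf.create_dataset("scalar_data", data=np.float64(42.0))  # 0D dataset
        for name, shape in SHAPES.items():
            # Assumption: random float64 values in [0, 1) as placeholder content.
            hf.create_dataset(name, data=rng.random(shape))
    return path


# Write to a temporary directory so the real dataset is left untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = write_demo_tensors(tmp_dir)
    with h5py.File(demo_path, "r") as hf:
        written_shapes = {name: hf[name].shape for name in hf}
        scalar_value = float(hf["scalar_data"][()])
```

Opening the file in `"w"` mode is what gives the "create (or overwrite)" behavior described above: an existing file at the same path is truncated before the new datasets are written.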


## 2. Load and Inspect the Data from HDF5 File

Now, we'll load the generated `demo_tensors.h5` file using the `h5py` library.
We will then list all datasets (tensors) stored in the file and inspect each one, showing its shape, its data type (dtype), and, for the scalar, its value.
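
One detail worth keeping in mind while reading the next cell: an `h5py` dataset object is a handle to data in the file, and values are read only when the dataset is indexed. The sketch below illustrates this on a small in-memory file. The file name `scratch.h5` and its contents are made up for the example, and nothing is written to disk. Indexing with `[()]` reads a 0D dataset as a scalar, while a slice reads only the requested part of a larger array.

```python
import h5py
import numpy as np

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as hf:
    hf.create_dataset("scalar_data", data=42.0)
    hf.create_dataset(
        "tensor_a", data=np.arange(300, dtype=np.float64).reshape(100, 3)
    )

    scalar = hf["scalar_data"][()]      # 0D dataset read as a NumPy scalar
    first_rows = hf["tensor_a"][:2]     # reads only the first two rows
    full_shape = hf["tensor_a"].shape   # metadata lookup, no array data read

print(scalar, first_rows.shape, full_shape)
```

This is why the inspection cell can print `shape` and `dtype` for every dataset without loading the larger arrays into memory.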

In [2]:
import h5py
import numpy as np  # Not used in this overview; handy for follow-up work on the loaded arrays

# Define the path to the data file (relative to this notebook)
data_path = '../data/synthetic/demo_tensors/demo_tensors.h5'

print(f"Attempting to load data from: {data_path}\n")

with h5py.File(data_path, 'r') as hf:
    print(f"Available datasets in HDF5: {sorted(hf.keys())}\n")

    # Define the tensors we expect to find (and their descriptions for printing)
    tensor_info = {
        "tensor_a": "Original general 2D tensor",
        "tensor_b": "Original general 3D tensor",
        "scalar_data": "0D tensor (scalar)",
        "vector_data": "1D tensor (vector)",
        "image_grayscale_data": "2D tensor (grayscale image)",
        "image_rgb_data": "3D tensor (RGB image)",
        "video_frames_data": "4D tensor (sequence of grayscale frames)",
        "simulation_data": "5D tensor (higher-dimensional data)"
    }

    for name, description in tensor_info.items():
        if name in hf:
            dataset = hf[name] # Access the HDF5 dataset object
            print(f"Dataset: '{name}' ({description})")
            print(f"  Shape: {dataset.shape}")
            print(f"  dtype: {dataset.dtype}")
            
            # Load data into memory as NumPy array to work with it
            if name == 'scalar_data':
                value = dataset[()] # Use [()] for scalars
                print(f"  Value: {value}")
            else:
                # For other tensors, you might load them fully using [:]
                # array_data = dataset[:] 
                # print(f"  First element if loaded: {array_data.flatten()[0] if array_data.size > 0 else 'N/A'}")
                pass # Avoid loading large arrays fully in this overview
            print("---")
        else:
            print(f"Dataset: '{name}' was not found in the file.")
            print("---")

# The file hf is automatically closed when exiting the 'with' block.

Attempting to load data from: ../data/synthetic/demo_tensors/demo_tensors.h5

Available datasets in HDF5: ['image_grayscale_data', 'image_rgb_data', 'scalar_data', 'simulation_data', 'tensor_a', 'tensor_b', 'vector_data', 'video_frames_data']

Dataset: 'tensor_a' (Original general 2D tensor)
  Shape: (100, 3)
  dtype: float64
---
Dataset: 'tensor_b' (Original general 3D tensor)
  Shape: (50, 10, 3)
  dtype: float64
---
Dataset: 'scalar_data' (0D tensor (scalar))
  Shape: ()
  dtype: float64
  Value: 42.0
---
Dataset: 'vector_data' (1D tensor (vector))
  Shape: (150,)
  dtype: float64
---
Dataset: 'image_grayscale_data' (2D tensor (grayscale image))
  Shape: (32, 32)
  dtype: float64
---
Dataset: 'image_rgb_data' (3D tensor (RGB image))
  Shape: (16, 16, 3)
  dtype: float64
---
Dataset: 'video_frames_data' (4D tensor (sequence of grayscale frames))
  Shape: (10, 8, 8, 1)
  dtype: float64
---
Dataset: 'simulation_data' (5D tensor (higher-dimensional data))
  Shape: (5, 6, 6, 3, 2)
  dtyp