---
title: The "delta" feature for audio signals
date: "2024-05-20"
categories: ["audio","ml"]
---

Apparently there is an audio feature called `delta`.
[Practical Cryptography](http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/#deltas-and-delta-deltas)

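I don't fully understand it yet, but the idea seems to be to use the trajectory of each frequency band over time as a feature.
Judging from the code further down, each frequency band is convolved along the time axis with the weights `[-n, ..., -1, 0, 1, ..., n]`, and the result, divided by a normalizing constant, is used as the feature.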
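To make the idea concrete, here is a minimal NumPy sketch of a delta computed on a single 1-D sequence. The helper name `delta_1d` is made up for illustration, and the window length of 5 and the edge padding are assumptions chosen to mirror the defaults of the `compute_deltas` function shown later in this post.

```python
import numpy as np


def delta_1d(x: np.ndarray, win_length: int = 5) -> np.ndarray:
    """Regression-style delta of a 1-D sequence (hypothetical helper)."""
    n = (win_length - 1) // 2
    # Weights [-n, ..., -1, 0, 1, ..., n]
    weights = np.arange(-n, n + 1, dtype=np.float64)
    # Normalizer: 2 * (1^2 + 2^2 + ... + n^2)
    denom = 2.0 * np.sum(np.arange(1, n + 1) ** 2)
    # Repeat the first and last values so the output keeps the input length
    padded = np.pad(x.astype(np.float64), n, mode="edge")
    # np.correlate slides the weights over the signal without flipping them
    return np.correlate(padded, weights, mode="valid") / denom


ramp = np.arange(10)  # rises by exactly 1 per frame
print(delta_1d(ramp))
```

For this ramp, the frames away from the edges get a delta of 1.0, which matches the slope of the input. The first and last two frames come out smaller because the padding flattens the signal there.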

It also seems to have been used in top solutions to past Kaggle competitions: [[4-th place solution] Inference and Training tips](https://www.kaggle.com/code/vladimirsydor/4-th-place-solution-inference-and-training-tips)  
It is implemented in torchaudio as well: [torchaudio.functional.compute\_deltas — Torchaudio 2.2.0.dev20240520 documentation](https://pytorch.org/audio/main/generated/torchaudio.functional.compute_deltas.html)


Implementation example

```python
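import torch
from torchaudio.functional import compute_deltas
spectrogram = torch.rand(1, 1, 128, 100)  # dummy input; (batch, channel, freq, time) layout is assumed
delta_1 = compute_deltas(spectrogram)  # first-order delta
delta_2 = compute_deltas(delta_1)  # delta of the delta
x = torch.cat([spectrogram, delta_1, delta_2], dim=1)  # stack along dim 1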
```

## Checking the computation, just to be sure

In [1]:
```python
import numpy as np
import torch
```

In [2]:
```python
# From the Kaggle solution notebook
def compute_deltas(
    specgram: torch.Tensor, win_length: int = 5, mode: str = "replicate"
) -> torch.Tensor:
    device = specgram.device
    dtype = specgram.dtype

    # pack batch: flatten every leading dim into channels, keep time last
    shape = specgram.size()
    specgram = specgram.reshape(1, -1, shape[-1])

    assert win_length >= 3

    # number of frames looked at on each side of the current frame
    n = (win_length - 1) // 2

    # twice sum of integer squared
    denom = n * (n + 1) * (2 * n + 1) / 3

    # pad the time axis so the output keeps the input length
    specgram = torch.nn.functional.pad(specgram, (n, n), mode=mode)

    # one [-n, ..., n] kernel per channel
    kernel = torch.arange(-n, n + 1, 1, device=device, dtype=dtype).repeat(
        specgram.shape[1], 1, 1
    )
    # groups=channels applies the kernel to each channel independently
    output = (
        torch.nn.functional.conv1d(specgram, kernel, groups=specgram.shape[1]) / denom
    )

    # unpack batch
    output = output.reshape(shape)

    return output
```
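The `denom` expression relies on the closed form for a sum of squares. As a quick sanity check, the sketch below compares it with the term-by-term sum for a few values of `n`; both function names are made up for illustration.

```python
def twice_sum_of_squares(n: int) -> int:
    """2 * (1^2 + 2^2 + ... + n^2), computed term by term."""
    return 2 * sum(i * i for i in range(1, n + 1))


def closed_form(n: int) -> float:
    """The expression used for `denom` in compute_deltas."""
    return n * (n + 1) * (2 * n + 1) / 3


for n in range(1, 6):
    assert closed_form(n) == twice_sum_of_squares(n)

print(closed_form(2))  # 10.0, the value used when win_length = 5
```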

In [3]:
```python
x = torch.rand([1, 10])  # dims: (freq, time)
print(x.shape)
print(x)
```
```
torch.Size([1, 10])
tensor([[0.6760, 0.4193, 0.8303, 0.1316, 0.1804, 0.4828, 0.7212, 0.0631, 0.1529,
         0.5410]])
```


In [4]:
```python
delta = compute_deltas(x)
print(delta.shape)
print(delta)
```
```
torch.Size([1, 10])
tensor([[ 0.0052, -0.0934, -0.1279, -0.0523,  0.0133,  0.0404, -0.0475, -0.0452,
          0.0117,  0.1344]])
```


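In [5]:
```python
# Redo the calculation by hand with the same padding and denominator (win_length = 5)
n = 2
tmp = torch.nn.functional.pad(x, (n, n), mode='replicate')
denom = n * (n + 1) * (2 * n + 1) / 3
```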

In [6]:
```python
# p indexes the padded tensor; frame t of x sits at p = t + n
for p in range(n, n + 10):
    sm = 0.0
    for i in range(1, n + 1):
        sm += (-tmp[0][p - i] + tmp[0][p + i]) * i
    print(sm / denom)
```
```
tensor(0.0052)
tensor(-0.0934)
tensor(-0.1279)
tensor(-0.0523)
tensor(0.0133)
tensor(0.0404)
tensor(-0.0475)
tensor(-0.0452)
tensor(0.0117)
tensor(0.1344)
```
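The same numbers can also be reproduced without PyTorch. The sketch below copies the printed values of `x` and `delta` from above (rounded to 4 decimals, so the comparison uses a small tolerance) and recomputes the delta with NumPy slices; the tolerance of `2e-4` is my own choice to absorb that rounding.

```python
import numpy as np

# Values copied from the printed tensors above (rounded to 4 decimals)
x = np.array([0.6760, 0.4193, 0.8303, 0.1316, 0.1804,
              0.4828, 0.7212, 0.0631, 0.1529, 0.5410])
expected = np.array([0.0052, -0.0934, -0.1279, -0.0523, 0.0133,
                     0.0404, -0.0475, -0.0452, 0.0117, 0.1344])

n = 2  # win_length = 5
denom = n * (n + 1) * (2 * n + 1) / 3
padded = np.pad(x, n, mode="edge")  # same idea as mode="replicate"
t = len(x)

# Sum of i * (x[t + i] - x[t - i]) over i = 1..n, for all frames at once
delta = sum(
    i * (padded[n + i : n + i + t] - padded[n - i : n - i + t])
    for i in range(1, n + 1)
) / denom

print(np.round(delta, 4))
print(np.allclose(delta, expected, atol=2e-4))
```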
