# About PyNeuralFx

PyNeuralFx is an open-source toolkit for training, comparing, and visualizing neural audio effect modeling systems.


## In this tutorial

In this tutorial, we will cover:

*   Inference with pretrained models
*   The workflow of the framework, demonstrated with an example (**snapshot modeling**)

Training in the **full modeling scenario** is covered in other tutorials.



# Install PyNeuralFx

In [1]:
!pip install pyneuralfx # final version

Collecting pyneuralfx
  Downloading pyneuralfx-0.1.1-py3-none-any.whl.metadata (9.8 kB)
Collecting einops<0.8.0,>=0.7.0 (from pyneuralfx)
  Downloading einops-0.7.0-py3-none-any.whl.metadata (13 kB)
Collecting pyloudnorm<0.2.0,>=0.1.1 (from pyneuralfx)
  Downloading pyloudnorm-0.1.1-py3-none-any.whl.metadata (5.6 kB)
Collecting torch-dct<0.2.0,>=0.1.6 (from pyneuralfx)
  Downloading torch_dct-0.1.6-py3-none-any.whl.metadata (2.7 kB)
Collecting future>=0.16.0 (from pyloudnorm<0.2.0,>=0.1.1->pyneuralfx)
  Downloading future-1.0.0-py3-none-any.whl.metadata (4.0 kB)
Downloading pyneuralfx-0.1.1-py3-none-any.whl (49 kB)
Downloading einops-0.7.0-py3-none-any.whl (44 kB)
Downloading pyloudnorm-0.1.1-py3-none-any.whl (9.6 kB)
Downloading torch_dct-

In [2]:
!git clone https://github.com/ytsrt66589/pyneuralfx.git

Cloning into 'pyneuralfx'...
remote: Enumerating objects: 263, done.
remote: Total 263 (delta 0), reused 0 (delta 0), pack-reused 263
Receiving objects: 100% (263/263), 92.77 MiB | 47.64 MiB/s, done.
Resolving deltas: 100% (106/106), done.


In [3]:
%cd pyneuralfx/frame_work/

/content/pyneuralfx/frame_work


# Inference with a pre-trained model (pre-trained on the Boss OD-3 overdrive)


We demonstrate how to run inference with the provided pre-trained model so that users can quickly hear the result.

In [4]:
import librosa
import soundfile as sf
import torch  # used below for torch.from_numpy and torch.cuda.is_available
import IPython.display as ipd

import utils
from pyneuralfx.models.rnn.gru import *

In [5]:
# Setting the model
cmd = {
    'config': './pre_trained/statichyper_gru_32/statichyper_gru.yml'
}

args = utils.load_config(cmd['config'])

nn_model = None
nn_model = utils.setup_models(args)
nn_model = utils.load_model(
                './pre_trained/statichyper_gru_32',
                nn_model,
                device='cpu',
                name='best_params.pt')

 [*] restoring model from ./pre_trained/statichyper_gru_32/best_params.pt


In [6]:
wav_x, sr = librosa.load('./example_wavs/example.wav', sr=nn_model.sample_rate, mono=True)
display_wav_x = wav_x
wav_x = torch.from_numpy(wav_x).unsqueeze(0).unsqueeze(0)

# The Boss OD-3 has two control parameters: distortion and tone.
# During training, each condition value is normalized to the range [-1, 1].
# In this example, [0, 0] sets both distortion and tone to 0,
# the midpoint of the condition range.
cond = [0, 0]  # each value lies in [-1, 1]


device = 'cuda' if torch.cuda.is_available() else 'cpu'

# We use the provided forward_func for convenience, but users are free to write their own feed-forward function
wav_y_pred = utils.forward_func(wav_x, cond, nn_model, args.model.arch, device)


wav_y_pred = utils.convert_tensor_to_numpy(wav_y_pred, is_squeeze=True)

In [7]:
print('Input x')
ipd.display(ipd.Audio(data=display_wav_x, rate=args.data.sampling_rate, normalize=False))
print(f'Output y with condition {cond}')
ipd.display(ipd.Audio(data=wav_y_pred, rate=args.data.sampling_rate, normalize=False))

Input x


Output y with condition [0.5, 0.5]
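
The normalization of condition values to [-1, 1] described in the code comments above amounts to a linear rescaling. Here is a minimal sketch, assuming a hypothetical knob scale of 0 to 10. `normalize_condition` is an illustrative helper, not part of PyNeuralFx, and the actual ranges used during training may differ.

```python
def normalize_condition(value, lo, hi):
    """Linearly map a raw knob value in [lo, hi] to the range [-1, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return 2.0 * (value - lo) / (hi - lo) - 1.0


# Hypothetical knob scale of 0..10 (an assumption for illustration only).
knob_min, knob_max = 0.0, 10.0
cond_example = [
    normalize_condition(5.0, knob_min, knob_max),   # midpoint of the range
    normalize_condition(10.0, knob_min, knob_max),  # maximum of the range
]
print(cond_example)  # [0.0, 1.0]
```

With this mapping, the midpoint of a knob corresponds to a condition value of 0, which matches the `[0, 0]` setting used in this section.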


# Snapshot modeling training

To show the simplest case, we start with the **snapshot modeling** scenario.

We provide an input-output audio pair in `example_wavs/snapshot_examples`.


The input audio file contains the dry signal, while the output audio file contains the wet signal. The goal of neural audio effect modeling is to train a neural network to replicate the implicit behavior that transforms the dry signal into the wet signal.

We can start by listening to the input and output files.

In [None]:

# Listen to the first 5 seconds of the dry signal and of the output signal we aim to emulate
wav_x, sr_x = librosa.load('./example_wavs/snapshot_examples/input.wav', sr=None, mono=True)
wav_y, sr_y = librosa.load('./example_wavs/snapshot_examples/output.wav', sr=None, mono=True)

assert sr_x == sr_y

print('Input x')
ipd.display(ipd.Audio(data=wav_x[:sr_x*5], rate=sr_x, normalize=False))
print('Output y')
ipd.display(ipd.Audio(data=wav_y[:sr_y*5], rate=sr_y, normalize=False))

Input x


Output y
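
Since training compares the model output against the wet signal, it can be worth confirming that the two recordings cover the same number of samples. Here is a minimal sketch, assuming the pair is already time-aligned and differs only in trailing length. `trim_to_common_length` is a hypothetical helper, not a PyNeuralFx function.

```python
def trim_to_common_length(x, y):
    """Trim two sequences (lists or NumPy arrays) to the shorter length."""
    n = min(len(x), len(y))
    return x[:n], y[:n]


# Toy stand-ins for wav_x and wav_y; real audio arrays work the same way.
dry = [0.0, 0.1, 0.2, 0.3]
wet = [0.0, 0.3, 0.5]
dry_trimmed, wet_trimmed = trim_to_common_length(dry, wet)
print(len(dry_trimmed), len(wet_trimmed))  # 3 3
```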


### Preprocess data to match the template

As in the workflow described in the README.md and the paper, we first need to preprocess the data to match the template expected by PyNeuralFx.
For the snapshot modeling scenario, we can use `preprocess/preproc_snapshot.py` directly. In this example, it has already been set up for you.


However, if you're using this for your own task, remember to modify `path_to_x`, `path_to_y`, and `path_to_save` in `preprocess/preproc_snapshot.py`.
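
When pointing the script at your own recordings, a quick existence check can catch path typos before preprocessing starts. Here is a minimal sketch, assuming `path_to_x` and `path_to_y` each refer to a single audio file as in this example. `find_missing_files` is a hypothetical helper and is not part of `preproc_snapshot.py`.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the entries of `paths` that do not point to existing files."""
    return [p for p in paths if not Path(p).is_file()]


# Paths used in this tutorial; replace them with your own recordings.
path_to_x = './example_wavs/snapshot_examples/input.wav'
path_to_y = './example_wavs/snapshot_examples/output.wav'

missing = find_missing_files([path_to_x, path_to_y])
if missing:
    print('Missing input files:', missing)
else:
    print('All input files found.')
```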

In [9]:
!python3 preprocess/preproc_snapshot.py



### Set the configuration file

Go to the `configs/` directory and choose a configuration file.

In this example, we choose `configs/cnn/gcn/snapshot_gcn.yml`.

You don't have to modify anything in this tutorial, because the defaults are already set properly. For your own task, however, make sure you update the configuration to match your setup.
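
Before launching a run on your own data, it can help to review every setting in the chosen file at a glance. Here is a minimal sketch, assuming the YAML file has already been loaded into a nested dict (for example with PyYAML's `yaml.safe_load`). `flatten_config` is a hypothetical helper and the values below are placeholders; only the names `data.sampling_rate` and `model.arch` appear elsewhere in this tutorial.

```python
def flatten_config(cfg, prefix=''):
    """Flatten a nested dict into {'a.b.c': value} form."""
    flat = {}
    for key, value in cfg.items():
        name = f'{prefix}.{key}' if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_config(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical config contents for illustration; the real file has more fields.
example_cfg = {
    'data': {'sampling_rate': 48000},
    'model': {'arch': 'gcn'},
}
for name, value in flatten_config(example_cfg).items():
    print(f'{name}: {value}')
```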

### Training

In this example, you also don't have to modify anything. 

In your own task, remember to at least change the path to the configuration file.

In [11]:
!python3 main_snapshot.py

 > config: ./configs/cnn/gcn/snapshot_gcn.yml
> [Loss] --- Hybrid Trans Loss ---
> [Loss] --- Temporal L1 Loss ---
> [Loss] --- STFT Complex Loss ---
> [Loss] is complex: True
> [Loss] win_len: 2048, overlap: 0.75
> [Loss] --- Hybrid Trans Loss ---
> [Loss] --- Temporal L1 Loss ---
> [Loss] --- STFT Complex Loss ---
> [Loss] is complex: True
> [Loss] win_len: 2048, overlap: 0.75
EXP DIR:  exp/snapshot_example
 >>>>> training
> train dataset ready ...........
> valid dataset ready ...........
 [!] saver created!
 > params amount: 17,280 | trainable: 17,280 |  bs: 25  
epoch: 0/10 (  0/ 51) | exp/snapshot_example | t: 0.83 | loss: 0.106907 | time: 0:00:00.8 | counter: 0
pred: max:0.049570, min:0.005716, mean:0.027113
anno: max:0.471922, min:-0.450387, mean:0.000041
 [*] run validation...
 > validation loss: 0.125758 | counter: 0
 [!] --- best model updated ---
 [*] saving model to exp/snapshot_example, name: best
 [*] saving model to exp/snapshot_example, name: latest
epoch: 0/10 ( 10/ 5
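
The STFT settings printed in the log determine how far consecutive analysis frames are spaced. Here is a small worked example, assuming `overlap` denotes the fraction of the window shared between consecutive frames (the log does not state this explicitly):

```python
win_len = 2048   # from the training log
overlap = 0.75   # from the training log; assumed to be a fraction of win_len

# Each new frame advances by the non-overlapping part of the window.
hop_len = int(win_len * (1 - overlap))
print(hop_len)  # 512
```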

### Evaluation

Always remember to evaluate your models.

Choose the evaluation metrics appropriate for your models; please refer to the README.md for details.

In [13]:
!python3 evaluation.py

> [Loss] --- Multi-resolution STFT Loss ---
> [Loss] --- ESR Loss ---
> [Metrics] --- Transient evaluation based on STN separation---
> [Loss] --- ESR Loss ---
> exp name:  snapshot_example
> total audio clips:  1
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:873.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
  x_s = torch.istft(X_s, n_fft=self.n_fft, hop_length=self.hop_length)
100% 1/1 [00:03<00:00,  3.91s/it]
##################################################
#####    evaluation report   ('valid_gen', 'snapshot_example')  #########
##################################################
> metric MRSTFTLoss:  1.2333221435546875
> metric ESRLoss:  0.16281643509864807
> metric Transientv2:  0.9376843571662903
########################
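
For reference, the error-to-signal ratio (ESR) reported above is commonly defined as the energy of the prediction error divided by the energy of the target. Below is a minimal NumPy sketch of that textbook definition. `esr` is an illustrative function, and PyNeuralFx's `ESRLoss` may differ in details such as pre-emphasis filtering or numerical stabilization.

```python
import numpy as np


def esr(target, pred, eps=1e-8):
    """Error-to-signal ratio: sum((target - pred)^2) / sum(target^2)."""
    target = np.asarray(target, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    error_energy = np.sum((target - pred) ** 2)
    signal_energy = np.sum(target ** 2)
    # eps (an assumption here) guards against division by zero on silent targets.
    return float(error_energy / (signal_energy + eps))


target = np.array([1.0, -1.0, 1.0, -1.0])
print(esr(target, target))                 # identical signals give 0.0
print(esr(target, np.zeros_like(target)))  # all-zero prediction gives about 1.0
```

Lower values indicate a closer match; under this definition, a model that outputs silence scores roughly 1.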

### Final Output Comparison

In [15]:
target_y, sr_x = librosa.load('./exp/snapshot_example/valid_gen/anno/output0.wav', sr=None, mono=True)
pred_y, sr_y = librosa.load('./exp/snapshot_example/valid_gen/pred/output0.wav', sr=None, mono=True)

assert sr_x == sr_y

print('Target')
ipd.display(ipd.Audio(data=target_y[:sr_x*5], rate=sr_x, normalize=False))
print('Predict')
ipd.display(ipd.Audio(data=pred_y[:sr_y*5], rate=sr_y, normalize=False))

Target


Predict
