Demo track 9 - Frequency cutoff at 11 kHz #3

JeffreyCA · 2020-12-22T02:38:20Z

This looks promising, very nice work!
It looks like the separated parts for the footprint track cut off at 11 kHz, similar to Spleeter. Do you know what's going on here? The other tracks don't have this cut off.

ws-choi · 2020-12-22T02:53:52Z

Hi JeffreyCA,
Is this image the output of our model?
Can you share the separated outputs in the wav or mp3 format?

To clarify,
- our model takes an audio file (44100 Hz)
- it applies STFT (n_fft: 2048 or 4096) to the audio file to obtain the complex-valued spectrogram
- it does not cut off any frequencies
- it estimates the complex-valued spectrogram of the target source
- it reconstructs the signal by applying the inverse STFT.

TL;DR: our models use the full frequency axis of the given spectrogram.
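As a rough illustration of what "full frequency axis" means here, the sketch below lists the bin center frequencies of a one-sided STFT frame for the sample rate and n_fft values mentioned above. The helper name `stft_bin_frequencies` is made up for this example and is not part of the model's code.

```python
def stft_bin_frequencies(sample_rate, n_fft):
    """Center frequency in Hz of each bin of a one-sided STFT frame."""
    # A real-valued frame of length n_fft yields n_fft // 2 + 1 distinct bins.
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


for n_fft in (2048, 4096):
    freqs = stft_bin_frequencies(44100, n_fft)
    # The last bin sits at half the sample rate (the Nyquist frequency).
    print(n_fft, len(freqs), freqs[-1])
```

With a 44100 Hz input, the top bin lands at 22050 Hz for either n_fft, so an 11 kHz ceiling would not come from the STFT settings alone.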

JeffreyCA · 2020-12-22T03:02:25Z

Hi, I was looking at the output files on your demo page. The outputs for Track 9 (footprint) are cut off at 11k but the other songs are not.

https://lasaft.github.io/audios/footprint.mp3
https://lasaft.github.io/audios/footprint-vocals.wav
https://lasaft.github.io/audios/footprint-bass.wav
https://lasaft.github.io/audios/footprint-drums.wav
https://lasaft.github.io/audios/footprint-other.wav

How come it's only this track that is cut off?

JeffreyCA · 2020-12-22T03:05:11Z

Btw, I also noticed that on the demo page under Track Information, track 8 is "Footprints - Woosung Choi", but in the table track 9 is "Footprints".

ws-choi · 2020-12-22T03:24:07Z

Thank you, I'll revise it later 👍
This is a very interesting result, since I used the same script to separate the sources of footprint.

I guess this is because the original 'footprint' track is the only 'mp3' file, with a sample rate of 22050.
The other tracks are wav files.

-- edited--

I'm sorry, but I think it needs more investigation.
It turns out that all demo files have the same sample rate.

JeffreyCA · 2020-12-22T03:34:13Z

I checked the sample rate of footprint.mp3 on my local machine and it says it's 44.1 kHz. I think when you call librosa.load without specifying the sr parameter, it defaults to 22050, see this.

However, the sample rates of all the output .wavs are 22050...
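A small sketch of why a 22050 Hz load would line up with a ceiling near 11 kHz: a signal sampled at rate sr can only carry frequencies up to sr / 2, and resampling upward afterwards does not bring the discarded content back. The function names are hypothetical, the model assumes idealized resampling, and the thread does not establish that this is what happened to the demo files.

```python
def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


def bandwidth_after_resampling(load_rate, target_rate):
    """Highest frequency that can survive loading at load_rate and then
    resampling to target_rate (idealized: no filter roll-off modeled)."""
    return min(nyquist_hz(load_rate), nyquist_hz(target_rate))


# Loaded at the librosa default, then upsampled to 44100 Hz.
print(bandwidth_after_resampling(22050, 44100))
# Loaded directly at 44100 Hz.
print(bandwidth_after_resampling(44100, 44100))
```

Under these assumptions the first case tops out at 11025 Hz and the second at 22050 Hz.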

ws-choi · 2020-12-22T03:39:31Z

Yes I was wrong. I also checked it.
Below is the script for separating them.
I put all tracks in the demo directory and ran the script below:

import os
import librosa
import soundfile

directory = 'demo'
for filename in os.listdir(directory):
    # Only process audio files; anything else is skipped.
    if filename.endswith(('.wav', '.mp3')):
        path = os.path.join(directory, filename)
        print(path)
        # No sr argument is given, so librosa.load uses its default rate.
        data, sr = librosa.load(path, mono=False)
        data = librosa.resample(data, orig_sr=sr, target_sr=44100)

        for target in ['vocals', 'drums', 'bass', 'other']:
            model.separate_track(data.T, target)
            # Move the temp.wav output to result/<name>_<target>.wav.
            os.rename('temp.wav', os.path.join('result', filename[:-4] + '_' + target + '.wav'))

I'll find out what happened sooner or later.

ws-choi · 2020-12-22T03:47:29Z

Btw, how did you create the spectrogram image above?

JeffreyCA · 2020-12-22T04:11:42Z

> Btw, how did you create the spectrogram image above?

I used Audacity (reference)

JeffreyCA · 2020-12-23T01:05:05Z

I just ran "footprints" through your colab notebook and the higher frequencies are all there.

ws-choi · 2020-12-23T01:47:42Z

> I just ran "footprints" through your colab notebook and the higher frequencies are all there.

Did you run the script for the original only, or for both the original and all the separated files? And can you share the script you used with me?

JeffreyCA · 2020-12-23T01:52:18Z

I did not run the script above. I ran the following in your colab notebook:

!wget https://lasaft.github.io/audios/footprint.mp3
# ...
# load model, etc
# ...
audio, rate = librosa.load('footprint.mp3', sr=44100, mono=False)
separated = model.separate_track(audio.T, 'drums')

Then I downloaded the temp.wav to my computer and checked the spectrogram.

ws-choi · 2020-12-23T03:02:53Z

I think I finally got a clue!

For short-duration files (<15 secs), it seems that the spectrogram of the separated file has high freqs when you use AUDACITY for spectrogram analysis.

I installed audacity and used spec view with max freq: 22050 and window size: 1024-(default).

Try this file: https://github.com/lasaft/lasaft.github.io/blob/master/audios/shortprint.wav
The file is a short version of the 'footprint' track.

I guess this issue was caused by very complex reasons such as AUDACITY's visualization methods.

BTW, our models do not exploit an explicit 'Frequency cutoff'.
Regardless of the length of given input audio, it applies the same script for separation.

TLDR; for short duration files (<15 secs), it seems that the spectrogram of the separated file has high freqs when you use AUDACITY for spectrogram analysis. Our models do not exploit an explicit 'Frequency cutoff'.

JeffreyCA · 2020-12-23T04:09:19Z

Sorry, but I'm not sure I understand...

To make it more clear, I uploaded two files here: https://github.com/JeffreyCA/footprint/

JeffreyCA-footprint-vocals.wav is what I generated through your Colab notebook to isolate the vocals of footprint.mp3.
original-footprint-vocals.wav is the same file as https://lasaft.github.io/audios/footprint-vocals.wav.

If you use Audacity or any other spectrogram tool to compare the two, they are clearly different:

In original-footprint-vocals.wav, there's nothing above 11 kHz (which shouldn't be happening), but in JeffreyCA-footprint-vocals.wav those frequencies are there (this is expected).

JeffreyCA-footprint-vocals.wav:

original-footprint-vocals.wav:

ws-choi · 2020-12-23T04:19:52Z

Have you ever tried this?

It seems AUDACITY automatically filters out relatively 'noisy' freqs, for better visualization.

ws-choi · 2020-12-23T04:23:51Z

spec analysis of separated 'vocals' file of 'shortprint.wav'

JeffreyCA · 2020-12-23T04:27:53Z

Yes I tried what you suggested but for the original footprint-vocals.wav the high frequencies are not there.

To clarify, all the other demo tracks are fine. It's just the footprint track that's weird.
Here's a different tool: https://academo.org/demos/spectrum-analyzer/. If you upload the footprint-vocals.wav and play it, you see no high frequencies. If you upload my version, they are there.
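A check that does not depend on any particular viewer is to measure how much spectral energy lies above the suspected cutoff. The sketch below does this with a naive DFT on a short frame, using synthetic tones in place of the real files. `high_band_energy_ratio` and `tone` are hypothetical helpers, and the 10 ms frame length is an arbitrary choice for illustration.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Fraction of one-sided spectral energy at or above cutoff_hz.

    Uses a naive O(n^2) DFT, so it is only practical for short frames.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    # For a real-valued signal, bins 0..n//2 cover 0 Hz up to Nyquist.
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


def tone(freq_hz, sample_rate, n):
    """n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


sr = 44100
n = 441  # 10 ms frame, so bins are spaced 100 Hz apart
low_only = tone(5000, sr, n)
with_highs = [a + b for a, b in zip(tone(5000, sr, n), tone(15000, sr, n))]

ratio_low = high_band_energy_ratio(low_only, sr, 11000)
ratio_both = high_band_energy_ratio(with_highs, sr, 11000)
print(ratio_low, ratio_both)
```

A frame with no content above 11 kHz gives a ratio near zero, while the frame with an added 15 kHz tone gives roughly one half. Applied to frames read from the two vocals files, the same measure would show whether the difference is in the audio itself or only in the visualization.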

ws-choi · 2020-12-23T04:58:15Z

Oh, it's very weird. You are right.
I'm very sorry for my misunderstanding.
I'm re-generating the files for footprints, and I'll re-post them when it's finished.
It seems these files have high frequencies.
Thank you very much!

The separated results of footprints in the updated demo page do not have this issue.
Another demo page that we created two months ago also does not have this issue, making me more confused.

Anyway, thank you very much again! Also please let me know if there are further issues.

JeffreyCA · 2020-12-23T16:03:24Z

Thank you, the new tracks look good!

Another thing, how come if you add up all the individual sources (vocals + other + bass + drums), it doesn't sound the same as the original? If you focus on the higher frequencies of the vocals or hi-hats they sound muffled compared to the original. It's not just this model, it seems like a common observation I've seen across other source separation models as well.

ws-choi · 2020-12-23T16:26:27Z

That's a good point. I recommend reading the section "Energy Preserved Wasserstein Learning" of this paper [1].

As mentioned in the paper:

the loss function involves i) the energy preservation term to restrict the separated sources's total energy is close to the mixed one

Trained with this auxiliary loss function, a source separation model can produce better results. The sum of its separated results would be closer to the original.

We have not exploited the auxiliary loss function for training our models for the sake of simplicity.
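One simple way to quantify how far the summed sources are from the original is the energy of the residual relative to the energy of the mixture. The sketch below is only an illustration of that measurement: `residual_energy_ratio` is a hypothetical helper, signals are plain lists of samples, and this is not the loss term defined in [1].

```python
def residual_energy_ratio(mixture, stems):
    """Energy of (mixture - sum of stems) divided by the mixture energy.

    0.0 means the stems add up to the mixture exactly; larger values
    mean more of the mixture is missing from (or added by) the stems.
    """
    summed = [sum(values) for values in zip(*stems)]
    residual = [m - s for m, s in zip(mixture, summed)]
    mix_energy = sum(m * m for m in mixture)
    if mix_energy == 0:
        return 0.0
    return sum(r * r for r in residual) / mix_energy


mixture = [1.0, -2.0, 3.0]
perfect_stems = [[0.5, -1.0, 1.0], [0.5, -1.0, 2.0]]
attenuated_stems = [[0.5, -1.0, 1.5]]  # only half of the mixture is recovered

print(residual_energy_ratio(mixture, perfect_stems))
print(residual_energy_ratio(mixture, attenuated_stems))
```

Computing the same ratio on a band-limited version of the signals (for example, only the content above a few kHz) would be one way to check the "muffled highs" impression numerically.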

[1] Zhang, Ning, Junchi Yan, and Yuchen Zhou. "Weakly supervised audio source separation via spectrum energy preserved wasserstein learning." arXiv preprint arXiv:1711.04121 (2017).

JeffreyCA · 2020-12-23T17:16:35Z

Thanks for your help! I'll go ahead and close this issue now.

JeffreyCA changed the title ~~Separating above 11 kHz~~ Demo track 9 - Frequency cutoff at 11 kHz Dec 22, 2020

JeffreyCA closed this as completed Dec 23, 2020

JeffreyCA mentioned this issue Dec 23, 2020

Support additional source separation models JeffreyCA/spleeter-web#19

Open
