Demo track 9 - Frequency cutoff at 11 kHz #3

Closed
JeffreyCA opened this issue Dec 22, 2020 · 20 comments
Comments

@JeffreyCA

JeffreyCA commented Dec 22, 2020

This looks promising, very nice work!
It looks like the separated parts for the footprint track cut off at 11 kHz, similar to Spleeter. Do you know what's going on here? The other tracks don't have this cutoff.

Screen Shot 2020-12-21 at 9 39 27 PM

@ws-choi
Owner

ws-choi commented Dec 22, 2020

Hi JeffreyCA,
Is this image the output of our model?
Can you share the separated outputs in the wav or mp3 format?

  • To clarify,
    • our model takes an audio file (44100 Hz)
    • it applies an STFT (n_fft: 2048 or 4096) to the audio file to obtain the complex-valued spectrogram
    • it does not cut off any frequencies
    • it estimates the complex-valued spectrogram of the target source
    • it reconstructs the signal by applying the inverse STFT.

TL;DR: our models use the full frequency axis of the given spectrogram.
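
As a quick sanity check on the claim above, the frequency axis of an STFT depends only on the sample rate and n_fft. The following is a minimal sketch using NumPy only (it is not the project's code, and the helper name `stft_bin_frequencies` is made up for illustration), assuming the 44100 Hz input and the n_fft values listed above:

```python
import numpy as np

SAMPLE_RATE = 44100  # input sample rate stated in the comment above


def stft_bin_frequencies(n_fft, sample_rate=SAMPLE_RATE):
    """Center frequency in Hz of each bin of a one-sided STFT frame."""
    return np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)


for n_fft in (2048, 4096):
    freqs = stft_bin_frequencies(n_fft)
    # Number of bins is n_fft // 2 + 1; the last bin sits at sample_rate / 2.
    print(n_fft, len(freqs), freqs[-1])
```

With a 44100 Hz input, the top bin sits at the Nyquist frequency of about 22050 Hz for either n_fft, so a cutoff near 11 kHz would have to come from somewhere other than the spectrogram layout itself.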

@JeffreyCA
Author

Hi, I was looking at the output files on your demo page. The outputs for Track 9 (footprint) are cut off at 11 kHz, but the other songs are not.

https://lasaft.github.io/audios/footprint.mp3
https://lasaft.github.io/audios/footprint-vocals.wav
https://lasaft.github.io/audios/footprint-bass.wav
https://lasaft.github.io/audios/footprint-drums.wav
https://lasaft.github.io/audios/footprint-other.wav

How come it's only this track that is cut off?

@JeffreyCA
Author

Btw I also noticed on the demo page under Track Infomation, track 8 is "Footprints - Woosung Choi" but in the table track 9 is "Footprints".

JeffreyCA changed the title from "Separating above 11 kHz" to "Demo track 9 - Frequency cutoff at 11 kHz" on Dec 22, 2020
@ws-choi
Owner

ws-choi commented Dec 22, 2020

Thank you, I'll revise it later 👍
This is a very interesting result, since I used the same script to separate the sources of footprint as for the other tracks.

I guess this is because the original 'footprint' track is the only mp3 file, with a sample rate of 22050 Hz.
The other tracks are wav files.

-- edited --

I'm sorry, but I think it needs more investigation.
It turns out that all demo files have the same sample rate.

@JeffreyCA
Author

JeffreyCA commented Dec 22, 2020

I checked the sample rate of footprint.mp3 on my local machine and it says it's 44.1 kHz. I think when you call librosa.load without specifying the sr parameter, it defaults to 22050; see this.

However, the sample rates of all the output .wavs are 22050...
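
For reference, a signal sampled at 22050 Hz cannot carry content above its Nyquist frequency, which would be consistent with a cutoff near 11 kHz. A minimal sketch of that arithmetic (the helper name `nyquist_hz` is made up for illustration):

```python
LIBROSA_DEFAULT_SR = 22050  # librosa.load's default when `sr` is not given


def nyquist_hz(sample_rate):
    """Highest frequency in Hz that a signal at `sample_rate` can represent."""
    return sample_rate / 2


print(nyquist_hz(LIBROSA_DEFAULT_SR))  # 11025.0
print(nyquist_hz(44100))               # 22050.0
```

Passing sr=None to librosa.load keeps the file's native sample rate, and passing sr=44100 resamples to 44.1 kHz; either should avoid the implicit 22050 Hz step.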

@ws-choi
Owner

ws-choi commented Dec 22, 2020

Yes I was wrong. I also checked it.
Below is the script for separating them.
I put all tracks in the demo directory and ran the script below:

import os
import librosa
import soundfile

# `model` is the separation model, loaded earlier (not shown here).
directory = 'demo'
for filename in os.listdir(directory):
    if filename.endswith(('.wav', '.mp3')):
        path = os.path.join(directory, filename)
        print(path)
        # Note: without sr=..., librosa.load resamples to its default of 22050 Hz.
        data, sr = librosa.load(path, mono=False)
        data = librosa.resample(data, orig_sr=sr, target_sr=44100)

        for target in ['vocals', 'drums', 'bass', 'other']:
            # The separated result is picked up from temp.wav and moved to result/.
            model.separate_track(data.T, target)
            os.rename('temp.wav', os.path.join('result', filename[:-4] + '_' + target + '.wav'))

I'll find out what happened sooner or later.
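
One thing that may be worth checking in the script above (a possible explanation, not a confirmed diagnosis): loading at 22050 Hz and then resampling to 44100 Hz restores the sample rate but not the content above 11025 Hz. The sketch below illustrates this with a synthetic signal and a simple FFT-based resampler written for this example; `fft_resample` and `band_energy` are hypothetical helpers, not librosa functions, and librosa's own resampler will differ in detail.

```python
import numpy as np


def fft_resample(x, num):
    """Resample x to `num` samples by truncating or zero-padding its spectrum."""
    spectrum = np.fft.rfft(x)
    new_bins = num // 2 + 1
    resized = np.zeros(new_bins, dtype=complex)
    keep = min(len(spectrum), new_bins)
    resized[:keep] = spectrum[:keep]
    # Rescale so that amplitudes are preserved after the length change.
    return np.fft.irfft(resized, n=num) * (num / len(x))


def band_energy(x, sr, lo_hz, hi_hz):
    """Spectral energy of x in the band [lo_hz, hi_hz)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return power[(freqs >= lo_hz) & (freqs < hi_hz)].sum()


sr_full, sr_half = 44100, 22050
t = np.arange(sr_full) / sr_full                  # one second of audio
original = (np.sin(2 * np.pi * 5000 * t)          # tone below 11025 Hz
            + np.sin(2 * np.pi * 15000 * t))      # tone above 11025 Hz

downsampled = fft_resample(original, sr_half)     # stands in for loading at 22050 Hz
round_trip = fft_resample(downsampled, sr_full)   # stands in for resampling to 44100 Hz

low_before = band_energy(original, sr_full, 0, 11025)
low_after = band_energy(round_trip, sr_full, 0, 11025)
high_before = band_energy(original, sr_full, 11025, 22051)
high_after = band_energy(round_trip, sr_full, 11025, 22051)
print(low_before, low_after)    # the low band survives the round trip
print(high_before, high_after)  # the high band does not
```

If this is what happened, loading with an explicit rate, for example librosa.load(path, sr=44100, mono=False), would skip the lossy step.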

@ws-choi
Owner

ws-choi commented Dec 22, 2020

btw, how did you create the spectrogram image above?

@JeffreyCA
Author

btw, how did you create the spectrogram image above?

I used Audacity (reference)

@JeffreyCA
Author

I just ran "footprints" through your colab notebook and the higher frequencies are all there.

@ws-choi
Owner

ws-choi commented Dec 23, 2020

I just ran "footprints" through your colab notebook and the higher frequencies are all there.

Did you run the script on the original only, or on both the original and all the separated files? And can you share the script you used with me?

@JeffreyCA
Author

I did not run the script above. I ran the following in your colab notebook:

!wget https://lasaft.github.io/audios/footprint.mp3
# ...
# load model, etc
# ...
audio, rate = librosa.load('footprint.mp3', sr=44100, mono=False)
separated = model.separate_track(audio.T, 'drums')

Then I downloaded the temp.wav to my computer and checked the spectrogram.

@ws-choi
Owner

ws-choi commented Dec 23, 2020

I think I finally got a clue!

For short-duration files (<15 secs), it seems that the spectrogram of the separated file has high freqs when you use AUDACITY for spectrogram analysis.

I installed Audacity and used the spectrogram view with max freq: 22050 and window size: 1024 (default).

[Screenshot: spec]

Try this file: https://github.com/lasaft/lasaft.github.io/blob/master/audios/shortprint.wav
The file is a short version of the 'footprint' track.


I guess this issue has a fairly complex cause, such as Audacity's visualization methods.

BTW, our models do not apply an explicit frequency cutoff.
Regardless of the length of the input audio, the same script is used for separation.


TL;DR: for short-duration files (<15 secs), it seems that the spectrogram of the separated file has high freqs when you use Audacity for spectrogram analysis. Our models do not apply an explicit frequency cutoff.

@JeffreyCA
Author

JeffreyCA commented Dec 23, 2020

Sorry, but I'm not sure I understand...

To make it clearer, I uploaded two files here: https://github.com/JeffreyCA/footprint/

JeffreyCA-footprint-vocals.wav is what I generated with your Colab notebook to isolate the vocals of footprint.mp3.
original-footprint-vocals.wav is the same file as https://lasaft.github.io/audios/footprint-vocals.wav.

If you use Audacity or any other spectrogram tool to compare the two, they are clearly different:
Screen Shot 2020-12-22 at 11 05 27 PM

In original-footprint-vocals.wav, there's nothing above 11 kHz (which shouldn't be happening), but in JeffreyCA-footprint-vocals.wav those frequencies are there (this is expected).

JeffreyCA-footprint-vocals.wav:
Screen Shot 2020-12-22 at 11 07 52 PM

original-footprint-vocals.wav:
Screen Shot 2020-12-22 at 11 08 15 PM
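
The comparison above can also be done numerically instead of by eye. The following is a minimal sketch using NumPy only; `high_band_ratio` is a hypothetical helper written for this example, and the two signals are synthetic stand-ins for the two wav files, not the actual audio.

```python
import numpy as np


def high_band_ratio(signal, sample_rate, cutoff_hz=11025.0):
    """Fraction of the total spectral energy at or above `cutoff_hz` (1-D input)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[freqs >= cutoff_hz].sum() / total)


sr = 44100
rng = np.random.default_rng(0)
full_band = rng.standard_normal(sr)  # stand-in for a file with high frequencies

# Stand-in for a file with nothing above 11025 Hz: zero the upper bins.
spectrum = np.fft.rfft(full_band)
freqs = np.fft.rfftfreq(len(full_band), d=1.0 / sr)
spectrum[freqs >= 11025.0] = 0
band_limited = np.fft.irfft(spectrum, n=len(full_band))

full_ratio = high_band_ratio(full_band, sr)
limited_ratio = high_band_ratio(band_limited, sr)
print(full_ratio, limited_ratio)
```

For real files, the samples and sample rate could be read with soundfile.read(path) and one channel passed to the helper; a ratio near zero for one file and clearly above zero for the other would confirm what the screenshots show.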

@ws-choi
Owner

ws-choi commented Dec 23, 2020

Have you ever tried this?
[Screenshots 1 and 2]

It seems AUDACITY automatically filters out relatively 'noisy' freqs, for better visualization.

@ws-choi
Owner

ws-choi commented Dec 23, 2020

[Screenshot 3]

spec analysis of separated 'vocals' file of 'shortprint.wav'

@JeffreyCA
Author

JeffreyCA commented Dec 23, 2020

Yes, I tried what you suggested, but for the original footprint-vocals.wav the high frequencies are not there.

To clarify, all the other demo tracks are fine. It's just the footprint track that's weird.
Here's a different tool: https://academo.org/demos/spectrum-analyzer/. If you upload the footprint-vocals.wav and play it, you see no high frequencies. If you upload my version, they are there.

@ws-choi
Owner

ws-choi commented Dec 23, 2020

Oh, it's very weird. You are right.
I'm very sorry for my misunderstanding.
I'm re-generating the files for footprints, and I'll re-post them when it's finished.
It seems the regenerated files do have the high frequencies.
Thank you very much!


The separated results of footprints in the updated demo page do not have this issue.
Another demo page that we created two months ago also does not have this issue, making me more confused.

Anyway, thank you very much again! Also please let me know if there are further issues.

@JeffreyCA
Author

Thank you, the new tracks look good!

Another thing, how come if you add up all the individual sources (vocals + other + bass + drums), it doesn't sound the same as the original? If you focus on the higher frequencies of the vocals or hi-hats they sound muffled compared to the original. It's not just this model, it seems like a common observation I've seen across other source separation models as well.

@ws-choi
Owner

ws-choi commented Dec 23, 2020

It's a good point. I recommend reading the section "Energy Preserved Wasserstein Learning" of this paper [1].

As mentioned in the paper:

the loss function involves i) the energy preservation term to restrict the separated sources's total energy is close to the mixed one

Trained with this auxiliary loss function, a source separation model can produce better results. The sum of its separated results would be closer to the original.

For the sake of simplicity, we did not use this auxiliary loss function when training our models.

[1] Zhang, Ning, Junchi Yan, and Yuchen Zhou. "Weakly supervised audio source separation via spectrum energy preserved wasserstein learning." arXiv preprint arXiv:1711.04121 (2017).
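
To put a number on how far the summed sources are from the original, one simple diagnostic is the energy of the residual relative to the mixture. The sketch below is only an illustration with synthetic data; `residual_energy_ratio` is a hypothetical helper, and it is not the energy preservation term from [1].

```python
import numpy as np


def residual_energy_ratio(mixture, sources):
    """Energy of (mixture - sum of sources) relative to the mixture energy."""
    residual = mixture - np.sum(sources, axis=0)
    return float(np.sum(residual ** 2) / np.sum(mixture ** 2))


rng = np.random.default_rng(0)
true_sources = rng.standard_normal((4, 44100))  # stand-ins for vocals, drums, bass, other
mixture = true_sources.sum(axis=0)

perfect = residual_energy_ratio(mixture, true_sources)
# Estimates that each lose 20% of their amplitude leave 0.2 * mixture as residual.
attenuated = residual_energy_ratio(mixture, 0.8 * true_sources)
print(perfect, attenuated)
```

A ratio of zero means the sources add back up to the mixture exactly; the more energy the estimates lose (for example in the high frequencies), the larger the ratio becomes.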

@JeffreyCA
Author

Thanks for your help! I'll go ahead and close this issue now.
