Demo track 9 - Frequency cutoff at 11 kHz #3
Comments
Hi JeffreyCA,
TL;DR: our models use the full frequency axis of the given spectrogram. |
Hi, I was looking at the output files on your demo page. The outputs for Track 9 (footprint) are cut off at 11k but the other songs are not. https://lasaft.github.io/audios/footprint.mp3 How come it's only this track that is cut off? |
Btw I also noticed on the demo page under Track Infomation, track 8 is "Footprints - Woosung Choi" but in the table track 9 is "Footprints". |
Thank you, I'll revise it later 👍 I guess this is because the original 'footprint' track is the only 'mp3' file with a sample rate of 22050. -- edited -- I'm sorry, but I think it needs more investigation. |
I checked the sample rate of However, the sample rates of all the output .wavs are 22050... |
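For anyone who wants to repeat the sample-rate check, here is a minimal sketch. It is not the script used in this thread: it relies only on Python's standard `wave` module, so it assumes PCM `.wav` files (mp3 and float-encoded WAV files would need another reader), and the function name `wav_sample_rates` is made up for illustration.

```python
import wave
from pathlib import Path


def wav_sample_rates(directory):
    """Return {filename: sample_rate_hz} for every .wav file in `directory`."""
    rates = {}
    for path in sorted(Path(directory).glob("*.wav")):
        # Only the header is read here, so no resampling can take place.
        with wave.open(str(path), "rb") as wav_file:
            rates[path.name] = wav_file.getframerate()
    return rates
```

Calling `wav_sample_rates("demo")` would list the native rate of each file, which is the number that matters here: a file stored at 22050 Hz cannot contain anything above 11025 Hz.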
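Yes, I was wrong. I also checked it.

    import os
    import librosa
    import soundfile

    # `model` is assumed to be the pretrained separator loaded earlier (not shown).
    directory = 'demo'
    for filename in os.listdir(directory):
        if filename.endswith(".wav") or filename.endswith(".mp3"):
            path = os.path.join(directory, filename)
            print(path)
            # Note: librosa.load resamples to 22050 Hz unless sr=None is passed.
            data, sr = librosa.load(path, mono=False)
            # orig_sr/target_sr are keyword-only in recent librosa releases.
            data = librosa.resample(data, orig_sr=sr, target_sr=44100)
            for target in ['vocals', 'drums', 'bass', 'other']:
                # The result is presumably written to temp.wav, then renamed.
                model.separate_track(data.T, target)
                os.rename('temp.wav', os.path.join('result', filename[:-4] + '_' + target + '.wav'))
        else:
            continue

I'll find out what happened sooner or later. |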
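One detail of the script above is worth illustrating: `librosa.load` resamples to 22050 Hz unless `sr=None` is passed, and upsampling back to 44100 Hz afterwards cannot restore content above 11025 Hz. Whether this explains the demo track is not confirmed in this thread. The sketch below shows the effect with NumPy only. The helper `fft_resample` is hypothetical and uses plain spectrum truncation and zero-padding, which is not the resampler librosa uses internally.

```python
import numpy as np


def fft_resample(signal, orig_sr, target_sr):
    """Band-limited resampling of a 1-D signal via its real FFT."""
    n_out = int(round(len(signal) * target_sr / orig_sr))
    spectrum = np.fft.rfft(signal)
    resized = np.zeros(n_out // 2 + 1, dtype=complex)
    n_keep = min(len(spectrum), len(resized))
    # Downsampling drops the bins above the new Nyquist frequency;
    # upsampling leaves the new high bins at zero.
    resized[:n_keep] = spectrum[:n_keep]
    # Rescale so amplitudes survive the change in length.
    return np.fft.irfft(resized, n=n_out) * (n_out / len(signal))


sr = 44100
t = np.arange(sr) / sr  # one second, so FFT bin k corresponds to k Hz
tone_mix = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 15000 * t)

# 44100 -> 22050 -> 44100, mirroring load-then-resample in the script.
round_trip = fft_resample(fft_resample(tone_mix, sr, 22050), 22050, sr)

magnitudes = np.abs(np.fft.rfft(round_trip))
print("magnitude at 440 Hz:  ", magnitudes[440])
print("magnitude at 15000 Hz:", magnitudes[15000])
```

The 15 kHz component is removed at the downsampling step, and the upsampling step has nothing to rebuild it from.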
Btw, how did you create the spectrogram image above? |
I used Audacity (reference) |
I just ran "footprints" through your colab notebook and the higher frequencies are all there. |
Did you run the script for the original only, or for both the original and all the separated files? And can you share the script you used with me? |
I did not run the script above. I ran the following in your colab notebook:
Then I downloaded the |
I think I finally got a clue! For short files (<15 secs), the spectrogram of the separated file does show high frequencies when you analyze it in Audacity. I installed Audacity and used the spectrogram view with max frequency 22050 and window size 1024 (the default). Try this file: https://github.com/lasaft/lasaft.github.io/blob/master/audios/shortprint.wav I guess this issue has fairly complex causes, such as the way Audacity visualizes spectrograms. BTW, our models do not apply an explicit frequency cutoff. TL;DR: for short files (<15 secs), the spectrogram of the separated file has high frequencies when you analyze it in Audacity. Our models do not apply an explicit frequency cutoff. |
Sorry, but I'm not sure I understand... To make it clearer, I uploaded two files here: https://github.com/JeffreyCA/footprint/ JeffreyCA-footprint-vocals.wav is what I generated through your Colab to isolate the vocals of If you use Audacity or any other spectrogram tool to compare the two, they are clearly different: in original-footprint-vocals.wav there is nothing above 11 kHz (which shouldn't be happening), but in JeffreyCA-footprint-vocals.wav those frequencies are there (this is expected). |
Yes I tried what you suggested but for the original footprint-vocals.wav the high frequencies are not there. To clarify, all the other demo tracks are fine. It's just the footprint track that's weird. |
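The comparison can also be made numerically, which takes the spectrogram viewer out of the picture. The sketch below is an assumption-laden illustration, not something used in this thread: `high_band_energy_ratio` is a hypothetical helper that expects a mono NumPy array plus its sample rate, and the 11000 Hz default simply mirrors the cutoff discussed here.

```python
import numpy as np


def high_band_energy_ratio(signal, sr, cutoff_hz=11000.0):
    """Fraction of spectral energy at or above `cutoff_hz` for a mono signal."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    total = power.sum()
    if total == 0:
        return 0.0  # silent input: report no high-band energy
    return float(power[freqs >= cutoff_hz].sum() / total)
```

After loading the two vocal files at their native sample rates and averaging the channels to mono, a ratio close to zero for one file and a clearly larger ratio for the other would confirm the difference seen in the spectrograms.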
Oh, it's very weird. You are right. The separated results of footprints in the updated demo page do not have this issue. Anyway, thank you very much again! Also please let me know if there are further issues. |
Thank you, the new tracks look good! Another thing, how come if you add up all the individual sources (vocals + other + bass + drums), it doesn't sound the same as the original? If you focus on the higher frequencies of the vocals or hi-hats they sound muffled compared to the original. It's not just this model, it seems like a common observation I've seen across other source separation models as well. |
That's a good point. I recommend reading the section "Energy Preserved Wasserstein Learning" of this paper [1]. As mentioned in the paper:
Trained with this auxiliary loss function, a source separation model can produce better results. The sum of its separated results would be closer to the original. We have not exploited the auxiliary loss function for training our models for the sake of simplicity. [1] Zhang, Ning, Junchi Yan, and Yuchen Zhou. "Weakly supervised audio source separation via spectrum energy preserved wasserstein learning." arXiv preprint arXiv:1711.04121 (2017). |
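To put a number on how far the summed sources are from the original, one option is to measure the energy of the residual. This is only a rough diagnostic under stated assumptions, not the loss from the paper [1]: `reconstruction_error_db` is a hypothetical helper, and it assumes the mixture and all stems are NumPy arrays of identical shape and sample rate.

```python
import numpy as np


def reconstruction_error_db(mixture, stems):
    """Energy of (mixture - sum of stems) relative to the mixture, in dB."""
    estimate = np.sum(stems, axis=0)  # vocals + drums + bass + other
    residual = mixture - estimate
    eps = 1e-12  # avoids log(0) for a perfect reconstruction or silence
    ratio = (np.sum(residual ** 2) + eps) / (np.sum(mixture ** 2) + eps)
    return 10.0 * np.log10(ratio)
```

A strongly negative value means the stems add up to nearly the original. Values closer to 0 dB mean that a noticeable part of the mixture, for example high-frequency detail, is missing from the sum.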
Thanks for your help! I'll go ahead and close this issue now. |
This looks promising, very nice work!
It looks like the separated parts for the footprint track cut off at 11 kHz, similar to Spleeter. Do you know what's going on here? The other tracks don't have this cut off.