# MVSep-MDX23 Colab Fork v2.5
Adaptation of the MVSep-MDX23 algorithm for Colab, with a few tweaks:

https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb  
<br>  

Recent changes:  


**v2.5**
* Kim's MelBand-Roformer model added  


**v2.4**
* BS-Roformer models from viperx added
* MDX-InstHQ4 model added as optional
* FLAC output
* Control input volume gain
* Filter vocals below 50Hz option
* Better chunking algo (no clicks)
* Some code cleaning

</font>
<br>

<details>
    <summary>Full changelog:</summary>
<br>
<font size=2>
<br>

[**v2.3**](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/v2.3)
* HQ3-Instr model replaced by VitLarge23 (thanks to MVSep)
* Improved MDXv2 processing (thanks to Anjok)
* Improved BigShifts algo (v2)
* BigShifts processing added to MDXv3 & VitLarge
* Faster folder batch processing

[**v2.2.2**](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/v2.2)
* Improved MDXv3 chunking code (thanks to HymnStudio)
* D1581 demo model replaced by new InstVocHQ MDXv3 model.
<br>

**v2.2.1**
* Added custom weights feature
* Fixed some bugs
* Fixed input: a file or a folder can now be used as input
<br>

**v2.2**
* Added MDXv3 compatibility
* Added MDXv3 demo model D1581 to the vocals stem multiband ensemble.
* Added VOC-FT Fullband SRS in place of UVR-MDX-Instr-HQ3.
* Added 2-stems feature: output only vocals/instrumental (faster processing)
* Added 16bit output format option
* Added "BigShift trick" for MDX models
* Added separate overlap values for MDX, MDXv3 and Demucs
* Fixed volume compensation fine-tuning for MDX-VOC-FT
<br>

[**v2.1 (by deton24)**](https://github.com/deton24/MVSEP-MDX23-Colab_v2.1)
* Updated with MDX-VOC-FT instead of Kim Vocal 2
<br>

[**v2.0**](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/2.0)
* Updated with new Kim Vocal 2 & UVR-MDX-Instr-HQ3 models
* Folder batch processing
* Fixed high frequency bleed in vocals
* Fixed volume compensation for MDX models
<br>
</font>
</details>
<br>

Credits:
* [ZFTurbo/MVSep](https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model)
* Models by [Demucs](https://github.com/facebookresearch/demucs), [Anjok](https://github.com/Anjok07/ultimatevocalremovergui), [Kimberley Jensen](https://github.com/KimberleyJensen), [aufr33](https://github.com/aufr33) & viperx
* Adaptation & tweaks by [jarredou](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/)
</font>

In [None]:
#@markdown #Installation
#@markdown *Run this cell to install MVSep-MDX23*
print('Installing... This will take between 1 and 15 minutes, depending on how crappy Colab currently is...')
%cd /content
!git clone -b v2.5 https://github.com/jarredou/MVSEP-MDX23-Colab_v2  &> /dev/null
%cd /content/MVSEP-MDX23-Colab_v2
print('Installing dependencies...')
!pip install -r requirements.txt &> /dev/null
print('Installation done!')

In [None]:
#@markdown #Gdrive connection
from google.colab import drive
drive.mount('/content/drive')

### About settings:


<font size=2>

* **BigShifts:** Values between 3 and 11 give the best quality/speed trade-off, **BUT** 11 doesn't always give the best results. Think of it like a seed: different values give slightly different results.<br>
Higher values = longer processing.
</font>
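The shift idea behind BigShifts can be sketched generically in NumPy: the mix is circularly shifted several times, each shifted copy is separated, each result is shifted back, and the results are averaged. This is a minimal sketch, not the code in `inference.py`: the function name, the `shift_seconds` spacing, the use of `np.roll` and the dummy `separate` callable standing in for a model are all assumptions.

```python
import numpy as np

def separate_with_big_shifts(audio, separate, big_shifts, shift_seconds=1.0, sr=44100):
    """Average `separate` over several circularly shifted copies of `audio`.

    audio: array of shape (channels, samples)
    separate: callable returning a same-shaped array (hypothetical stand-in for a model)
    big_shifts: number of passes; 1 means a single unshifted pass
    """
    n_samples = audio.shape[-1]
    step = int(shift_seconds * sr)  # assumed spacing between successive shifts
    acc = np.zeros_like(audio, dtype=np.float64)
    for i in range(big_shifts):
        shift = (i * step) % n_samples
        shifted = np.roll(audio, shift, axis=-1)   # move the audio by `shift` samples
        out = separate(shifted)                    # one full model pass per shift
        acc += np.roll(out, -shift, axis=-1)       # undo the shift before averaging
    return acc / big_shifts

# Usage with a dummy "model" that just halves the signal
rng = np.random.default_rng(0)
mix = rng.standard_normal((2, 44100 * 5))
out = separate_with_big_shifts(mix, lambda x: 0.5 * x, big_shifts=3)
print(out.shape)
```

With `big_shifts=1` the loop runs once with a zero shift, which matches the note that BigShifts=1 disables the feature. Each extra shift is another full model pass, hence the longer processing.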

<font size=2>

* **Overlap InstVoc/VitLarge:** There is little benefit in using high values when BigShifts is already high. If you use BigShifts=1 (regular processing), you can use higher values like 8 or even 16.<br>
Higher values = longer processing.<br>
 *The same goes for overlap_VOCFT, but with values between 0 and 0.95.*
</font>
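To see why higher overlap means longer processing, here is a small sketch that counts how many chunk windows a track needs. It assumes two conventions that are common in separation code but not confirmed by this notebook: an integer overlap `N` (as for InstVoc/VitLarge) gives a hop of `chunk_size // N`, and a fractional overlap (as for VOCFT) gives a hop of `chunk_size * (1 - overlap)`. The function names and the 8-second chunk size are hypothetical.

```python
import math

def chunk_starts(n_samples, chunk_size, step):
    """Start offsets of the windows needed to cover n_samples (last one assumed zero-padded)."""
    if n_samples <= chunk_size:
        return [0]
    n = math.ceil((n_samples - chunk_size) / step) + 1
    return [i * step for i in range(n)]

def step_from_integer_overlap(chunk_size, overlap):
    # e.g. overlap=8: each sample is covered by about 8 windows
    return max(1, chunk_size // overlap)

def step_from_fractional_overlap(chunk_size, overlap):
    # e.g. overlap=0.1: neighbouring windows share 10% of their length
    return max(1, int(chunk_size * (1 - overlap)))

sr = 44100
n_samples = 180 * sr      # a 3-minute track
chunk_size = 8 * sr       # hypothetical chunk length (8 s)
for overlap in (2, 8, 16):
    step = step_from_integer_overlap(chunk_size, overlap)
    print("integer overlap", overlap, "->", len(chunk_starts(n_samples, chunk_size, step)), "windows")
for overlap in (0.1, 0.5, 0.95):
    step = step_from_fractional_overlap(chunk_size, overlap)
    print("fractional overlap", overlap, "->", len(chunk_starts(n_samples, chunk_size, step)), "windows")
```

With these assumed sizes, an integer overlap of 2 needs 44 windows while 8 needs 173, so the number of model passes grows roughly in proportion to the overlap value.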

<font size=2>

* **Weights:** How much importance the given model's result has in the final result.
</font>
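A weighted ensemble can be sketched as a normalized weighted average of each model's estimate of the same stem. This assumes the weights are divided by their sum, so only their ratios matter; the exact combination rule in `inference.py` may differ. The function name is hypothetical and the random arrays stand in for model outputs.

```python
import numpy as np

def weighted_ensemble(estimates, weights):
    """Normalized weighted average of same-shaped stem estimates from several models."""
    if len(estimates) != len(weights):
        raise ValueError("one weight per estimate is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    out = np.zeros_like(estimates[0], dtype=np.float64)
    for est, w in zip(estimates, weights):
        out += w * est          # each model contributes in proportion to its weight
    return out / total          # normalize so the weights sum to 1

# Usage with the notebook's default weights for BSRoformer, Kim MelRoformer and InstVoc
rng = np.random.default_rng(0)
shape = (2, 44100)  # stereo, 1 second
bs_roformer, mel_roformer, inst_voc = (rng.standard_normal(shape) for _ in range(3))
vocals = weighted_ensemble([bs_roformer, mel_roformer, inst_voc], [9.18, 10, 3.39])
print(vocals.shape)  # (2, 44100)
```

Under this assumption, a weight of 0 drops a model from the mix, and scaling all weights by the same factor changes nothing.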


In [None]:
#@markdown #Separation
from pathlib import Path
import glob

%cd /content/MVSEP-MDX23-Colab_v2

#@markdown ---
#@markdown #### separation config:
input = '/content/drive/MyDrive/input' #@param {type:"string"}
output_folder = '/content/drive/MyDrive/output' #@param {type:"string"}

output_format = 'FLAC' #@param ["PCM_16", "FLOAT", "FLAC"]
Separation_mode = 'Vocals/Instrumental' #@param ["Vocals/Instrumental", "4-STEMS"]
input_gain = 0 #@param [0, -3, -6] {type:"raw"}
restore_gain_after_separation = False #@param {type:"boolean"}
filter_vocals_below_50hz = False #@param {type:"boolean"}
#@markdown ___
##@markdown

#@markdown  ### Model config:

#@markdown  *Set BigShifts=1 to disable that feature*
BigShifts = 3 #@param {type:"slider", min:1, max:41, step:1}
#@markdown ---
BSRoformer_model = 'ep_368_1296' #@param ["ep_317_1297", "ep_368_1296"]
weight_BSRoformer = 9.18 #@param {type:"slider", min:0, max:10, step:0.1}
weight_Kim_MelRoformer = 10 #@param {type:"slider", min:0, max:10, step:0.1}
weight_InstVoc = 3.39 #@param {type:"slider", min:0, max:10, step:0.1}
#@markdown ---
use_VitLarge = False #@param {type:"boolean"}
weight_VitLarge = 1 #@param {type:"slider", min:0, max:10, step:0.1}
#@markdown ---
use_InstHQ4 = False #@param {type:"boolean"}
weight_InstHQ4 = 2 #@param {type:"slider", min:0, max:10, step:0.1}
overlap_InstHQ4 = 0.1 #@param {type:"slider", min:0, max:0.95, step:0.05}
#@markdown ---
use_VOCFT = False #@param {type:"boolean"}
weight_VOCFT = 2 #@param {type:"slider", min:0, max:10, step:0.1}
overlap_VOCFT = 0.1 #@param {type:"slider", min:0, max:0.95, step:0.05}
#@markdown ---
#@markdown  *Demucs is only used in 4-STEMS mode.*
overlap_demucs = 0.6 #@param {type:"slider", min:0, max:0.95, step:0.05}

use_InstVoc_ = '--use_InstVoc' #forced use
use_BSRoformer_ =  '--use_BSRoformer' #forced use
use_Kim_MelRoformer_ =  '--use_Kim_MelRoformer' #forced use

use_VOCFT_ = '--use_VOCFT' if use_VOCFT else ''
use_VitLarge_ = '--use_VitLarge' if use_VitLarge else ''
use_InstHQ4_ = '--use_InstHQ4' if use_InstHQ4 else ''
restore_gain = '--restore_gain' if restore_gain_after_separation else ''
vocals_only = '--vocals_only' if Separation_mode == 'Vocals/Instrumental' else ''
filter_vocals = '--filter_vocals' if filter_vocals_below_50hz else ''

# Collect the input files: a single file, or every file directly inside the input folder
if Path(input).is_file():
  file_paths = [input]
else:
  # Skip subfolders so that only files are passed to inference.py
  file_paths = sorted(p for p in glob.glob(input + "/*") if Path(p).is_file())

if not file_paths:
  raise FileNotFoundError(f'No input file found at "{input}"')

# Quote each path so that names containing spaces survive the shell call
input_audio_args = ' '.join([f'"{path}"' for path in file_paths])
Path(output_folder).mkdir(parents=True, exist_ok=True)

!python inference.py \
        --input_audio {input_audio_args} \
        --BSRoformer_model {BSRoformer_model} \
        --weight_BSRoformer {weight_BSRoformer} \
        --weight_Kim_MelRoformer {weight_Kim_MelRoformer} \
        --weight_InstVoc {weight_InstVoc} \
        --weight_InstHQ4 {weight_InstHQ4} \
        --weight_VOCFT {weight_VOCFT} \
        --weight_VitLarge {weight_VitLarge} \
        --overlap_demucs {overlap_demucs} \
        --overlap_VOCFT {overlap_VOCFT} \
        --overlap_InstHQ4 {overlap_InstHQ4} \
        --output_format {output_format} \
        --BigShifts {BigShifts} \
        --output_folder "{output_folder}" \
        --input_gain {input_gain} \
        {filter_vocals} \
        {restore_gain} \
        {vocals_only} \
        {use_VitLarge_} \
        {use_VOCFT_} \
        {use_InstHQ4_} \
        {use_InstVoc_} \
        {use_BSRoformer_} \
        {use_Kim_MelRoformer_}