# The Art of Music: 
## Generating art from emotions in music

This notebook allows a user to input a song of choice and as output the system returns an artwork genrated based on the emotion recognised in the musical piece. 

### Steps to get emotions from music: 
1. Upload song file into a folder with the name of the song into the new_songs folder
2. Optional but recommended: Clip songs into 30s clip using the clip.py script
3. If song file is .mp3 transform it to a .wav file using the transformat.sh script
4. Extract features and predict emotions on 0.5s segments using the predict script

For static images, follow the instructions for static artwork, for dynamic proceed to dynamic artworks

### To get static art:
1. Average the emotion from the prediction csv file outputted from the previous step
2. Use the emotional mapping function to get the mapping from a VA point to a class label that cna be used with the conditional Style-GAN
3. Use the given conditional generate script to generate an image or more based on the emotion


## Setup

In [1]:
!nvidia-smi

Thu Jun  1 10:19:47 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    23W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Install repos

In [3]:
import os
!pip install gdown --upgrade


if os.path.isdir("/content/drive/MyDrive/AoM/art-of-music"):
    %cd "/content/drive/MyDrive/AoM/art-of-music"
elif os.path.isdir("/content/drive/"):
    #install script
    %cd "/content/drive/MyDrive/"
    !mkdir AoM
    %cd AoM
    !git clone https://github.com/williamostensen98/art-of-music.git
    %cd art-of-music

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gdown
  Downloading gdown-4.7.1-py3-none-any.whl (15 kB)
Installing collected packages: gdown
  Attempting uninstall: gdown
    Found existing installation: gdown 4.6.6
    Uninstalling gdown-4.6.6:
      Successfully uninstalled gdown-4.6.6
Successfully installed gdown-4.7.1
/content/drive/MyDrive/AoM/art-of-music


In [4]:
 if os.path.isdir("/content/drive/MyDrive/dynamic_mer"):
    %cd ../
    %cd "/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition"
elif os.path.isdir("/content/drive/"):
    #install script
    %cd "/content/drive/MyDrive/"
    !mkdir dynamic_mer
    %cd dynamic_mer 
    !git clone https://github.com/williamostensen98/Dynamic_Music_Emotion_Recognition.git
    %cd Dynamic_Music_Emotion_Recognition
    !mkdir new_songs
    
else:
    print("Mount drive before installing repo")

/content/drive/MyDrive/AoM
/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition


In [5]:
!git fetch origin
!git pull

Already up to date.


# Code

In [6]:
%cd /content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition

from emotional_mapping import * 

%cd /content/drive/MyDrive/AoM/art-of-music/

import torch.nn.functional as F
import scipy.interpolate
import imageio
from tqdm import tqdm
from typing import List, Tuple, Union, Optional
import moviepy.editor

import pandas as pd 
from functools import reduce
import os
import re

import dnnlib
import numpy
import PIL.Image
import torch
import random

from functools import reduce

import legacy

def size_range(s: str) -> List[int]:
    '''Accept a range 'a-c' and return as a list of 2 ints.'''
    return [int(v) for v in s.split('-')][::-1]

def load_gen(network_pkl, size=None, scale_type="symm"):

  if(size): 
      print('render custom size: ',size)
      print('padding method:', scale_type )
      size = size_range(size)
      custom = True
  else:
      custom = False

  G_kwargs = dnnlib.EasyDict()
  G_kwargs.size = size 
  G_kwargs.scale_type = scale_type

  print('Loading networks from "%s"...' % network_pkl)
  device = torch.device('cuda')
  with dnnlib.util.open_url(network_pkl) as f:
    G = legacy.load_network_pkl(f, custom=custom, **G_kwargs)['G_ema'].to(device) # type: ignore
  
  return G

def get_emotion_times(emotion_list):
  e = []
  count = 1
  for idx in range(len(emotion_list)):
     # getting Consecutive elements
     
     if idx == len(emotion_list) - 1:
       if emotion_list[idx] == emotion_list[idx - 1]:
         e.append((emotion_list[idx], count))
         count = 1
       else:
         e.append((emotion_list[idx], count))
      
     else:
       if emotion_list[idx] == emotion_list[idx + 1]:
           count += 1
       else:
           e.append((emotion_list[idx], count))
           count = 1
  return e

def generateFromCategoryDist(G, dist, seeds, truncation, noise_mode, outdir, device=torch.device("cuda"), conditional_truncation=False):
  assert len(dist) == G.c_dim, "Distribution must be equal to the conditonal dimansion"
  l_ndarray = numpy.array(dist)
  # make sure our output directory exists
  os.makedirs(outdir, exist_ok=True)

  label = torch.as_tensor([l_ndarray], device=device)
  for seed in seeds:
    z = numpy.random.RandomState(seed).randn(1, G.z_dim)
    z = torch.from_numpy(z).to(device) 
    img = G(z, label, truncation_psi=truncation, noise_mode=noise_mode, conditional_truncation=conditional_truncation)
    #print(img)
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    #print(img)
    PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(f'{outdir}/seed1{seed:04d}_class.png')


def get_w_from_seed(G, batch_sz, device, truncation_psi=1.0, seed=None, centroids_path=None, class_idx=None):
    """Get the dlatent from a list of random seeds, using the truncation trick (this could be optional)"""

    if G.c_dim != 0:
        # sample random labels if no class idx is given
        if class_idx is None:
            class_indices = numpy.random.RandomState(seed).randint(low=0, high=G.c_dim, size=(batch_sz))
            class_indices = torch.from_numpy(class_indices).to(device)
            w_avg = G.mapping.w_avg.index_select(0, class_indices)
        else:
            w_avg = G.mapping.w_avg[class_idx].unsqueeze(0).repeat(batch_sz, 1)
            class_indices = torch.full((batch_sz,), class_idx).to(device)

        labels = F.one_hot(class_indices, G.c_dim)

    else:
        w_avg = G.mapping.w_avg.unsqueeze(0)
        labels = None
        if class_idx is not None:
            print('Warning: --class is ignored when running an unconditional network')

    z = numpy.random.RandomState(seed).randn(batch_sz, G.z_dim)
    z = torch.from_numpy(z).to(device)
    w = G.mapping(z, labels, truncation_psi=truncation_psi, conditional_truncation=True)

    # multimodal truncation
    if centroids_path is not None:

        with dnnlib.util.open_url(centroids_path, verbose=False) as f:
            w_centroids = numpy.load(f)
        w_centroids = torch.from_numpy(w_centroids).to(device)
        w_centroids = w_centroids[None].repeat(batch_sz, 1, 1)

        # measure distances
        dist = torch.norm(w_centroids - w[:, :1], dim=2, p=2)
        w_avg = w_centroids[0].index_select(0, dist.argmin(1))

    w_avg = w_avg.unsqueeze(1).repeat(1, G.mapping.num_ws, 1)
    w = w_avg + (w - w_avg) * truncation_psi


    return w

def layout_grid(img, grid_w=None, grid_h=1, float_to_uint8=True, chw_to_hwc=True, to_numpy=True):
    batch_size, channels, img_h, img_w = img.shape
    if grid_w is None:
        grid_w = batch_size // grid_h
    assert batch_size == grid_w * grid_h
    if float_to_uint8:
        img = (img * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    img = img.reshape(grid_h, grid_w, channels, img_h, img_w)
    img = img.permute(2, 0, 3, 1, 4)
    img = img.reshape(channels, grid_h * img_h, grid_w * img_w)
    if chw_to_hwc:
        img = img.permute(1, 2, 0)
    if to_numpy:
        img = img.cpu().numpy()
    return img

def slerp(t, v0, v1, DOT_THRESHOLD=0.9995):
    '''
    Spherical linear interpolation
    Args:
        t (float/np.ndarray): Float value between 0.0 and 1.0
        v0 (np.ndarray): Starting vector
        v1 (np.ndarray): Final vector
        DOT_THRESHOLD (float): Threshold for considering the two vectors as
                               colineal. Not recommended to alter this.
    Returns:
        v2 (np.ndarray): Interpolation vector between v0 and v1
    '''
    #v0 = v0.cpu().detach().numpy()
    #v1 = v1.cpu().detach().numpy()
    # Copy the vectors to reuse them later
    v0_copy = numpy.copy(v0)
    v1_copy = numpy.copy(v1)
    # Normalize the vectors to get the directions and angles
    v0 = v0 / numpy.linalg.norm(v0)
    v1 = v1 / numpy.linalg.norm(v1)
    # Dot product with the normalized vectors (can't use np.dot in W)
    dot = numpy.sum(v0 * v1)
    # If absolute value of dot product is almost 1, vectors are ~colineal, so use lerp
    if numpy.abs(dot) > DOT_THRESHOLD:
        return lerp(t, v0_copy, v1_copy)
    # Calculate initial angle between v0 and v1
    theta_0 = numpy.arccos(dot)
    sin_theta_0 = numpy.sin(theta_0)
    # Angle at timestep t
    theta_t = theta_0 * t
    sin_theta_t = numpy.sin(theta_t)
    # Finish the slerp algorithm
    s0 = numpy.sin(theta_0 - theta_t) / sin_theta_0
    s1 = sin_theta_t / sin_theta_0
    v2 = s0 * v0_copy + s1 * v1_copy
    return torch.from_numpy(v2).to("cuda")

def slerp_interpolate(zs, steps):
    out = []
    for i in range(len(zs)-1):
        for index in range(steps):
            fraction = index/float(steps)
            out.append(slerp(fraction,zs[i],zs[i+1]))
    return out

def lerp(zs, steps):
    out = []
    for i in range(len(zs)-1):
        for index in range(steps):
            t = index/float(steps)
            v = zs[i+1]*t + zs[i]*(1-t)
            out.append(v)
    return out


def gen_interp_video(G, mp4: str, seed, shuffle_seed=None, w_frames=60*4, kind='linear', grid_dims=(1,1), num_keyframes=None, wraps=2, truncation_psi=1, device=torch.device('cuda'), centroids_path=None, class_idx=None, **video_kwargs):
    grid_w = grid_dims[0]
    grid_h = grid_dims[1]

    seeds = [seed]*len(class_idx)

    if num_keyframes is None:
        if len(seeds) % (grid_w*grid_h) != 0:
            raise ValueError('Number of input seeds must be divisible by grid W*H')
        num_keyframes = len(seeds) // (grid_w*grid_h)

    all_seeds = numpy.zeros(num_keyframes*grid_h*grid_w, dtype=numpy.int64)
    for idx in range(num_keyframes*grid_h*grid_w):
        all_seeds[idx] = seeds[idx % len(seeds)]

    if shuffle_seed is not None:
        rng = numpy.random.RandomState(seed=shuffle_seed)
        rng.shuffle(all_seeds)

    if class_idx is None:
        class_idx = [None] * len(seeds)
    elif len(class_idx) == 1:
        class_idx = [class_idx] * len(seeds)
    assert len(all_seeds) == len(class_idx), "Seeds and class-idx should have the same length"

    ws = []
    for seed, cls in zip(all_seeds, class_idx):
        ws.append(
            get_w_from_seed(G, 1, device, truncation_psi, seed=seed,
                                      centroids_path=centroids_path, class_idx=cls)
        )
    ws = torch.cat(ws)

    _ = G.synthesis(ws[:1]) # warm up
    ws = ws.reshape(grid_h, grid_w, num_keyframes, *ws.shape[1:])

    # Interpolation.
    grid = []
    for yi in range(grid_h):
        row = []
        for xi in range(grid_w):
            x = numpy.arange(-num_keyframes * wraps, num_keyframes * (wraps + 1))
            y = numpy.tile(ws[yi][xi].cpu().numpy(), [wraps * 2 + 1, 1, 1])
            interp = scipy.interpolate.interp1d(x, y, kind=kind, axis=0)
            row.append(interp)
        grid.append(row)

    # Render video.
    video_out = imageio.get_writer(mp4, mode='I', fps=60, codec='libx264', **video_kwargs)
    for frame_idx in tqdm(range(num_keyframes * w_frames)):
        imgs = []
        for yi in range(grid_h):
            for xi in range(grid_w):
                interp = grid[yi][xi]
                w = torch.from_numpy(interp(frame_idx / w_frames)).to(device)
                img = G.synthesis(ws=w.unsqueeze(0), noise_mode='const')[0]
                imgs.append(img)
        video_out.append_data(layout_grid(torch.stack(imgs), grid_w=grid_w, grid_h=grid_h))
    video_out.close()



def w_to_img(G, dlatents: Union[List[torch.Tensor], torch.Tensor], noise_mode: str = 'const', to_np: bool = True) -> np.ndarray:
    """
    Get an image/np.ndarray from a dlatent W using G and the selected noise_mode. The final shape of the
    returned image will be [len(dlatents), G.img_resolution, G.img_resolution, G.img_channels].
    """
    assert isinstance(dlatents, torch.Tensor), f'dlatents should be a torch.Tensor!: "{type(dlatents)}"'
    if len(dlatents.shape) == 2:
        dlatents = dlatents.unsqueeze(0)  # An individual dlatent => [1, G.mapping.num_ws, G.mapping.w_dim]

    synth_image = G.synthesis(dlatents, noise_mode=noise_mode)
    synth_image = (synth_image + 1) * 255/2  # [-1.0, 1.0] -> [0.0, 255.0]
    if to_np:
        synth_image = synth_image.permute(0, 2, 3, 1).clamp(0, 255).to(torch.uint8).cpu().numpy()  # NCWH => NWHC
    return synth_image

def combine_audio(audio_file, video_clip, mp4_filename, fps=24, codec='libx264', audio_codec='aac', bitrate='15M'):
  #mp4_filename = '/content/drive/MyDrive/ballad_audio_695.mp4'
  # video_clip = moviepy.editor.VideoClip(render_frame, duration=duration)
  video_clip = moviepy.editor.VideoFileClip(video_clip)
  audio_clip_i = moviepy.editor.AudioFileClip(audio_file)
  video_clip = video_clip.set_audio(audio_clip_i)
  video_clip.write_videofile(mp4_filename, fps=24, codec=codec, audio_codec=audio_codec, bitrate=bitrate)


def gen_single_interp_video(G, seed, class_idx, time, frame_outdir, centroids_path=None, truncation_psi=1, device=torch.device('cuda')):
  
    seeds = [seed]*len(class_idx)
    outdir = "/content/output"
  
    if not os.path.exists(f'{outdir}'):
        os.makedirs(f'{outdir}') 
    if not os.path.exists(f'{outdir}/{frame_outdir}'):
      os.makedirs(f'{outdir}/{frame_outdir}') 
    

    if class_idx is None:
        class_idx = [None] * len(seeds)
    elif len(class_idx) == 1:
        class_idx = [class_idx] * len(seeds)

    # Create interpolated w vectors
    inter_ws = []
    for seed, c in zip(seeds, class_idx):
        ws = get_w_from_seed(G, 1, device, truncation_psi, seed=seed,
                                      centroids_path=centroids_path, class_idx=c)
        
        #w = np.squeeze(ws, axis=0)  # [L, C]
        inter_ws.append(ws)
    
    ## interpolation between classes
    #interpol = slerp_interpolate(inter_ws, num_interpolations)
    
    interpol = lerp(inter_ws, 24 * time)
    
    for idx, w in enumerate(interpol): 
        img = G.synthesis(w, noise_mode=noise_mode, force_fp32=True)
        img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
        PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(f'{outdir}/{frame_outdir}/frame-{idx:04d}.png')

/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
/content/drive/MyDrive/AoM/art-of-music


In [7]:
!pip install audioclipextractor
!pip install opensimplex
!pip install ninja
!pip install audiofile
!pip install opensmile
!pip install imageio-ffmpeg

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting audioclipextractor
  Downloading audioclipextractor-0.3.0-py2.py3-none-any.whl (14 kB)
Installing collected packages: audioclipextractor
Successfully installed audioclipextractor-0.3.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting opensimplex
  Downloading opensimplex-0.4.4-py3-none-any.whl (19 kB)
Installing collected packages: opensimplex
Successfully installed opensimplex-0.4.4
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ninja
  Downloading ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m146.0/146.0 kB[0m [31m7.7 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: ninja
Successfully installed ninja-1.11.1
Looking in indexes: https://py

# CONFIG


In [8]:
start = 0
end = 60
song_name = "jimbo"
DATASET_DIR = "new_songs"
song_path = DATASET_DIR + f'/{song_name}/'
song_dir = f'{song_name}/'

num_images = 20
seeds = random.sample(range(1, 1000), num_images)
input_seeds = str(seeds)[1:-1]
input_seeds = input_seeds.replace(" ", "")
print(input_seeds)
truncation = 1
padding_method = "symm"
noise_mode = 'const' # 'const', 'random', 'none'
outdir_avg = f'/content/out/avg/{song_name}/'
outdir_dist = f'/content/out/dist/{song_name}/'
network_pkl = "/content/drive/MyDrive/pkls/g50network-snapshot-004800.pkl"
size = "1920-1360"

571,950,971,649,137,969,9,689,653,91,490,405,684,286,409,313,735,453,899,605


In [None]:
song_name = "2025"
DATASET_DIR = "new_songs"
song_path = DATASET_DIR + f'/{song_name}/'
song_dir = f'{song_name}/'

num_images = 20
seeds = random.sample(range(1, 1000), num_images)
input_seeds = str(seeds)[1:-1]
input_seeds = input_seeds.replace(" ", "")
print(input_seeds)
truncation = 1
padding_method = "symm"
noise_mode = 'const' # 'const', 'random', 'none'
outdir_avg = f'/content/out/avg/{song_name}/'
outdir_dist = f'/content/out/dist/{song_name}/'
network_pkl = "/content/drive/MyDrive/pkls/g50network-snapshot-004800.pkl"
size = "1360-1920"

389,936,243,834,340,752,824,425,210,161,755,852,390,573,258,533,400,177,645,764


# CLIP AUDIO AND TRANSFORM TO WAV

In [None]:
%cd /content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
!python clip.py --song_path=$song_path --output_dir=$song_path --start=$start --end=$end

/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition


In [None]:
%cd /content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
!bash transformat.sh

/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
Please Enter the MP3Path -> 
new_songs/2025/
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr -

# RUN WHOLE SCRIPT

In [None]:
%cd /content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
print('PROCESS: FEATURE EXTRACTION')
print("Predicting arousal and valence")
!python predict.py --input_folder=$song_dir

arousal_pred = pd.read_csv(f'output/{song_name}_regression_results_Arousal.csv')
valence_pred = pd.read_csv(f'output/{song_name}_regression_results_Valence.csv')

arousal_list = arousal_pred['prediction'].values.tolist()
valence_list = valence_pred['prediction'].values.tolist()

arousal_avg = reduce(lambda a, b: a + b, arousal_list) / len(arousal_list)
valence_avg = reduce(lambda a, b: a + b, valence_list) / len(valence_list)

print("Arousal Values")
print(arousal_list)
print(len(arousal_list))

print("Valence Values")
print(valence_list)
print(len(valence_list))

print("Average Arousal value:")
print(arousal_avg)

print("Average Valence value:")
print(valence_avg)

emotional_class = getEmotionFromPoint(valence_avg, arousal_avg)
emotional_category = getEmotionFromPoint(valence_avg, arousal_avg, 1)

# Add from arousal list to class condition list [0.8, 0.54, 0.54] -> [0,2,2]
emotional_list = getEmotionListFromPointList(arousal_list, valence_list)
emotional_list = emotional_list[::2]



#TODO
# Add from arousal list to prrobability distribtution
dist = getEmotionDistribution(emotional_list, normalize=1)

print("Emotional Class: ", emotional_class)
print("Emotional Category: ", emotional_category)
print("Emotional Category list per 0.5s", emotional_list)
print("Emotional Distibution", dist)

%cd ../..
%cd colab-sg2-ada-pytorch
%cd stylegan2-ada-pytorch

print("GENERATING AVERAGE IMAGES")

!python generate.py --outdir=$outdir_avg --trunc=$truncation --class=$emotional_class --seeds=$input_seeds --size=$size --scale-type=$padding_method --network=$network_pkl

print("GENERATING DISTRIBUTED IMAGES")
G = load_gen(network_pkl, size)
generateFromCategoryDist(G, dist, seeds, truncation, noise_mode, outdir_dist)

print("ZIPPING BOTH FOLDERS")
avg_loc = f'/content/drive/MyDrive/out_avg_{song_name}.zip'
dist_loc = f'/content/drive/MyDrive/out_dist_{song_name}.zip'

!zip -r $avg_loc Soutdir_avg

!zip -r $dist_loc $outdir_dist

print("PLEASE PICK SEED TO GENERATE VIDEO")
seed = input("SEED: ")

mp4_emotion = f'/content/drive/MyDrive/{song_name}_{seed}_emotion.mp4'
mp4_audio = f'/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition/new_songs/{song_name}/wav/1-{song_name}.wav'
mp4_combined = f'/content/drive/MyDrive/{song_name}_{seed}_audio.mp4'

print("GENERATING VIDEO")

gen_interp_video(G=G, mp4=mp4_emotion, bitrate='12M', seed=seed, shuffle_seed=False,w_frames=60, class_idx=emotional_list)

print("GENERATING COMBINED AUDIO FILE")

combine_audio(mp4_audio, mp4_emotion, mp4_combined)

/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
PROCESS: FEATURE EXTRACTION
Predicting arousal and valence
Collecting features for song...
num 50
Loading regressor SVRs for Valence and Arousal...
Predicting in Arousal dimension...
Predicting in Valence dimension...
Arousal Values
[-0.2961807802436806, -0.3169495559628302, -0.42841588664721, -0.3830536137040166, -0.3716405946805355, -0.3227533391462493, -0.3519734965146278, -0.4286648591838384, -0.3621630551833957, -0.3625314309817374, -0.299849213114624, -0.2564650441719622, -0.2012971302475386, -0.2387968572767234, -0.2339480890467778, -0.2358296874676961, -0.1876854414861045, -0.0763690579566183, -0.2001195917880547, -0.2765483959657175, -0.3852317089949185, -0.2542352551814599, -0.1163372365612019, -0.1155574723167989, -0.2123043449211297, -0.2188728357426915, -0.2613151720145364, -0.3722556875195061, -0.3387015802479391, -0.2667512969289118, -0.2529991656108908, -0.2350165132602936, -0.3366101592098377, -0.3747

  label = torch.as_tensor([l_ndarray], device=device)



Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
ZIPPING BOTH FOLDERS

zip error: Nothing to do! (try: zip -r /content/drive/MyDrive/out_avg_ribbons.zip . -i Soutdir_avg)
  adding: content/out/dist/ribbons/ (stored 0%)
  adding: content/out/dist/ribbons/seed10998_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10576_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10488_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10529_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10805_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10755_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10712_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10760_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10217_class.png (deflated 0%)
  adding: content/out/dist/ribbons/seed10378_class.png (deflated 0%)
  adding: content/out/dist/ribbon

100%|██████████| 6600/6600 [27:35<00:00,  3.99it/s]


GENERATING COMBINED AUDIO FILE
Moviepy - Building video /content/drive/MyDrive/ribbons_576_audio.mp4.
MoviePy - Writing audio in ribbons_576_audioTEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video /content/drive/MyDrive/ribbons_576_audio.mp4



t:  77%|███████▋  | 2039/2640 [06:37<05:09,  1.94it/s, now=None]

# Emotion Recognition

In [9]:
%cd /content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
print('PROCESS: FEATURE EXTRACTION')
print("Predicting arousal and valence")
!python predict.py --input_folder=$song_dir

arousal_pred = pd.read_csv(f'output/{song_name}_regression_results_Arousal.csv')
valence_pred = pd.read_csv(f'output/{song_name}_regression_results_Valence.csv')

arousal_list = arousal_pred['prediction'].values.tolist()
valence_list = valence_pred['prediction'].values.tolist()

arousal_avg = reduce(lambda a, b: a + b, arousal_list) / len(arousal_list)
valence_avg = reduce(lambda a, b: a + b, valence_list) / len(valence_list)

print("Arousal Values")
print(arousal_list)
print(len(arousal_list))

print("Valence Values")
print(valence_list)
print(len(valence_list))

print("Average Arousal value:")
print(arousal_avg)

print("Average Valence value:")
print(valence_avg)

emotional_class = getEmotionFromPoint(valence_avg, arousal_avg)
emotional_category = getEmotionFromPoint(valence_avg, arousal_avg, 1)

# Add from arousal list to class condition list [0.8, 0.54, 0.54] -> [0,2,2]
emotional_list = getEmotionListFromPointList(arousal_list, valence_list)

def round_decimals_down(number:float, decimals:int=2):
    """
    Returns a value rounded down to a specific number of decimal places.
    """
    if not isinstance(decimals, int):
        raise TypeError("decimal places must be an integer")
    elif decimals < 0:
        raise ValueError("decimal places has to be 0 or more")
    elif decimals == 0:
        return math.floor(number)

    factor = 10 ** decimals
    return math.floor(number * factor) / factor

#TODO
# Add from arousal list to prrobability distribtution
dist = getEmotionDistribution(emotional_list, normalize=0)
dist = [round_decimals_down(x,3) for x in dist]

print("Emotional Class: ", emotional_class)
print("Emotional Category: ", emotional_category)
emotional_list = emotional_list[::2]
print("Emotional Category list per 1s", emotional_list)
print("Emotional Distibution", dist)

/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition
PROCESS: FEATURE EXTRACTION
Predicting arousal and valence
Collecting features for song...
Loading regressor SVRs for Valence and Arousal...
Predicting in Arousal dimension...
Predicting in Valence dimension...
Arousal Values
[-0.0157685639435321, -0.032710061978571, -0.214573374279537, -0.2449112162619176, -0.2649119557936679, -0.2321775708018291, 0.0307640898715714, -0.031315920399493, -0.0146720460781063, -0.0669876263125953, -0.0440054700781453, -0.1531680345159419, -0.08571547962268, -0.1437278267264561, -0.1128754257289602, -0.0438027647301313, -0.0345710329579066, -0.0250679397003713, -0.1532111862602411, -0.1558349339995204, -0.09263434797425, -0.3764576441316799, -0.315212295831104, -0.0843956617891669, -0.0501483521081176, -0.2193085958790399, -0.1174051687357007, -0.2161808813243506, -0.2421424306158618, -0.0228402656836406, -0.0577769006378532, -0.0764156342831481, -0.2442283128144512, -0.034072503798168, 

# Generate samples from all classes

In [None]:
%cd /content/drive/MyDrive/AoM/art-of-music/
outdir = "/content/output/"
G = load_gen(network_pkl, size)
for index in range(8):
  label = [0,0,0,0,0,0,0,0]
  out = outdir + str(index) + "/"
  label[index] = 1
  num_images = 50
  seeds = random.sample(range(1, 1000), num_images)
  generateFromCategoryDist(G, label, seeds, 1, "const", out)

In [None]:
!zip -r /content/drive/MyDrive/out.zip /content/output

In [None]:
from google.colab import files
files.download("/content/file.zip")

# Samples from category dist

In [10]:
%cd /content/drive/MyDrive/AoM/art-of-music/

print("GENERATING DISTRIBUTED IMAGES")
G = load_gen(network_pkl, size)
generateFromCategoryDist(G, dist, seeds, 1, noise_mode, outdir_dist)

print("ZIPPING BOTH FOLDERS")
dist_loc = f'/content/drive/MyDrive/out_{song_name}.zip'

!zip -r $dist_loc $outdir_dist

/content/drive/MyDrive/AoM/art-of-music
GENERATING DISTRIBUTED IMAGES
render custom size:  1920-1360
padding method: symm
Loading networks from "/content/drive/MyDrive/pkls/g50network-snapshot-004800.pkl"...


  label = torch.as_tensor([l_ndarray], device=device)



Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
ZIPPING BOTH FOLDERS
  adding: content/out/dist/jimbo/ (stored 0%)
  adding: content/out/dist/jimbo/seed10689_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10009_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10409_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10899_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10490_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10950_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10313_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10405_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10091_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10735_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10605_class.png (deflated 0%)
  adding: content/out/dist/jimbo/seed10653_class.png (deflated 0%)
  adding: content/out/

## For musical match survey

In [None]:
%cd /content/drive/MyDrive/AoM/art-of-music/

print("GENERATING AVERAGE IMAGES")

!python generate.py --outdir=$outdir_avg --trunc=$truncation --class=$emotional_class --seeds=$input_seeds --size=$size --scale-type=$padding_method --network=$network_pkl

print("GENERATING DISTRIBUTED IMAGES")
G = load_gen(network_pkl, size)
generateFromCategoryDist(G, dist, seeds, truncation, noise_mode, outdir_dist)

print("ZIPPING BOTH FOLDERS")
avg_loc = f'/content/drive/MyDrive/out_avg_{song_name}.zip'
dist_loc = f'/content/drive/MyDrive/out_dist_{song_name}.zip'

!zip -r $avg_loc $outdir_avg

!zip -r $dist_loc $outdir_dist

/content/drive/MyDrive/AoM/art-of-music
GENERATING AVERAGE IMAGES
  elif(len(seeds) is not 3):
render custom size:  [1920, 1360]
padding method: symm
Loading networks from "/content/drive/MyDrive/pkls/g50network-snapshot-004800.pkl"...
Generating image for seed 389 (0/20) ...
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Generating image for seed 936 (1/20) ...
Generating image for seed 243 (2/20) ...
Generating image for seed 834 (3/20) ...
Generating image for seed 340 (4/20) ...
Generating image for seed 752 (5/20) ...
Generating image for seed 824 (6/20) ...
Generating image for seed 425 (7/20) ...
Generating image for seed 210 (8/20) ...
Generating image for seed 161 (9/20) ...
Generating image for seed 755 (10/20) ...
Generating image for seed 852 (11/20) ...
Generating image for seed 390 (12/20) ...
Generating image for seed 573 (13/20) ...
Generating image for seed 258 (14/20) ...
Generating image for seed 533 (15/20)

# Generate Interpolation video - Alt 1

In [None]:
%cd /content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch
print("PLEASE PICK SEED TO GENERATE VIDEO")
seed = input("SEED: ")
G = load_gen(network_pkl, size)
mp4_emotion = f'/content/drive/MyDrive/{song_name}_{seed}_emotion2.mp4'
mp4_audio = f'/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition/new_songs/{song_name}/wav/1-{song_name}.wav'
mp4_combined = f'/content/drive/MyDrive/{song_name}_{seed}_audio2.mp4'

print("GENERATING VIDEO")

gen_interp_video(G=G, mp4=mp4_emotion, bitrate='12M', seed=seed, shuffle_seed=False,w_frames=60, class_idx=emotional_list)

print("GENERATING COMBINED AUDIO FILE")

combine_audio(mp4_audio, mp4_emotion, mp4_combined)

/content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch
PLEASE PICK SEED TO GENERATE VIDEO
SEED: 253
render custom size:  1920-1280
padding method: symm
Loading networks from "/content/drive/MyDrive/network-snapshot-004640.pkl"...
GENERATING VIDEO


100%|██████████| 1920/1920 [09:52<00:00,  3.24it/s]


GENERATING COMBINED AUDIO FILE
Moviepy - Building video /content/drive/MyDrive/lambs_wool_253_audio2.mp4.
MoviePy - Writing audio in lambs_wool_253_audio2TEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video /content/drive/MyDrive/lambs_wool_253_audio2.mp4





Moviepy - Done !
Moviepy - video ready /content/drive/MyDrive/lambs_wool_253_audio2.mp4


# Interpolation vide alt 2 

In [11]:
e = get_emotion_times(emotional_list)
print(e)
print("number of parts :", len(e))

[(4, 2), (3, 1), (5, 1), (3, 1), (4, 1), (3, 1), (4, 2), (3, 1), (4, 1), (3, 1), (4, 2), (3, 1), (4, 1), (3, 1), (4, 1), (3, 1), (0, 1), (3, 2), (4, 2), (6, 1), (0, 1), (1, 1), (4, 2), (1, 1), (4, 2), (0, 2), (1, 1), (0, 1), (1, 1), (0, 1), (2, 1), (0, 1), (1, 1), (0, 1), (3, 1), (6, 1), (4, 1), (0, 1), (1, 1), (0, 1), (2, 1), (0, 1), (5, 2), (7, 1), (6, 1), (1, 1), (4, 1), (1, 1), (0, 2), (1, 1)]
number of parts : 50


In [12]:
#e = get_emotion_times(emotional_list)
%cd /content/drive/MyDrive/AoM/art-of-music/
!mkdir /content/output/
!mkdir /content/video_parts/


seed = 950

G = load_gen(network_pkl, size)

for i in range(len(e)-1):
  if i == len(e) - 2:
    gen_single_interp_video(G,seed, [e[i][0], e[i+1][0]],(e[i][1]+e[i+1][1]),f'video_part_{i}', truncation_psi=1.0)
    break
    #gen_single_interp_video(G,seed,[e[-1][0], e[-1][0]], e[-1][1], f'video_part_{i+1}')

  gen_single_interp_video(G,seed, [e[i][0], e[i+1][0]],e[i][1], f'video_part_{i}',truncation_psi=1.0)


/content/drive/MyDrive/AoM/art-of-music
render custom size:  1920-1360
padding method: symm
Loading networks from "/content/drive/MyDrive/pkls/g50network-snapshot-004800.pkl"...


In [13]:

import subprocess
import shlex

num_parts = len(next(os.walk('/content/output/'))[1])
print(num_parts)
for i in range(num_parts):
  cmd = f'ffmpeg -i /content/output/video_part_{i}/frame-%04d.png -r 24 -vcodec libx264 -pix_fmt yuv420p /content/video_parts/file{i}.mp4'
  print("Generating video part", i, "out of", num_parts)
  try:
    !{cmd}
    
  except:
    print("Failed to create interpolation video")

for i in range(num_parts):
  arrows = ">" if i == 0 else ">>"
  video = f'echo file /content/video_parts/file{i}.mp4 {arrows} /content/videolist.txt'
  try:
    !{video}
  except:
    print("Failed to create interpolation video")



49
Generating video part 0 out of 49
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --ena

In [14]:
!ffmpeg -f concat -safe 0 -i /content/videolist.txt -c copy /content/output.mp4

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --e

In [15]:
mp4_combined_alt = f'/content/drive/MyDrive/{song_name}_{seed}_audio_alt.mp4'
mp4_audio = f'/content/drive/MyDrive/dynamic_mer/Dynamic_Music_Emotion_Recognition/new_songs/{song_name}/wav/1-{song_name}.wav'
combine_audio(mp4_audio, "/content/output.mp4", mp4_combined_alt)

Moviepy - Building video /content/drive/MyDrive/jimbo_950_audio_alt.mp4.
MoviePy - Writing audio in jimbo_950_audio_altTEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video /content/drive/MyDrive/jimbo_950_audio_alt.mp4






Moviepy - Done !
Moviepy - video ready /content/drive/MyDrive/jimbo_950_audio_alt.mp4


# Replicate sr-gan


In [None]:
!pip install replicate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting replicate
  Downloading replicate-0.8.1.tar.gz (22 kB)
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Building wheels for collected packages: replicate
  Building wheel for replicate (pyproject.toml) ... [?25l[?25hdone
  Created wheel for replicate: filename=replicate-0.8.1-py3-none-any.whl size=21114 sha256=e6b174eb7ed32e4ad01277b311b5c2e7193b3d2435b36c299d569ae4801f186e
  Stored in directory: /root/.cache/pip/wheels/7f/de/2f/7f55f2dcb401baa19b90823c42391b55b09a372f5751356b02
Successfully built replicate
Installing collected packages: replicate
Successfully installed replicate-0.8.1


In [None]:
import os
os.environ['REPLICATE_API_TOKEN'] = 'r8_7Ymank6Yv2QQPX27H538NRTKn95Krp34HTdwG'

In [None]:
import replicate
from urllib.request import urlretrieve

output = replicate.run(
    "nightmareai/real-esrgan:42fed1c4974146d4d2414e2be2c5277c7fcf05fcc3a73abf41610695738c1d7b",
    input={"image": open("/content/out/dist/mental/seed10909_class.png", "rb")},
    scale=2
)
print(output)

ModelError: ignored