This notebook blends 250 artwork images per genre and per decade group into a single image each, as a proof of concept.

### Read in Data

In [11]:
#load in packages
import pandas as pd
import numpy as np
import graphlab as gl
import shutil
import random

In [4]:
#load in data

final = pd.read_csv("final_pull.csv")
del final["Unnamed: 0"]
final.head()

,artistName,artworkUrl100,releaseDate,collectionName,decade,updated_genre,target
0,Admiral Bailey,http://is5.mzstatic.com/image/thumb/Music/v4/0...,1987-06-09T07:00:00Z,The Best of Admiral Bailey,1980.0,Reggae,Reggae-1980.0
1,Admiral Bailey,http://is3.mzstatic.com/image/thumb/Music/v4/0...,1988-01-01T08:00:00Z,Big Belly,1980.0,Reggae,Reggae-1980.0
2,Admiral Bailey,http://is4.mzstatic.com/image/thumb/Music/v4/3...,1987-06-09T07:00:00Z,Best of Admiral Bailey,1980.0,Reggae,Reggae-1980.0
3,Admiral Bailey,http://is5.mzstatic.com/image/thumb/Music/v4/2...,2006-06-24T07:00:00Z,Admiral Bailey's Turn Off the Heat - EP,2000.0,World,World-2000.0
4,Admiral Bailey,http://is5.mzstatic.com/image/thumb/Music/v4/8...,2011-06-20T07:00:00Z,Dela Move,2010.0,Reggae,Reggae-2010.0


In [5]:
#load in pivot to identify sample sizes

genre_decade = pd.pivot_table(final, index = ["updated_genre", "decade"], values = ["artistName"], aggfunc = len).unstack()
genre_decade

Count of artistName by updated_genre (rows) and decade (columns)
updated_genre,1980.0,1990.0,2000.0,2010.0
Alternative,401,1527,6718,7961
Blues,13,112,545,406
Country,202,1023,4098,4931
Dance,91,686,7278,18717
Electronic,71,966,5275,8165
HipHop,21,266,2444,4441
Jazz,83,329,1481,1564
Pop,277,889,4718,7438
RB Soul,62,267,1167,1523
Reggae,185,633,3233,6821


### Begin to split data by genre and by decade

In [6]:
#pull a random sample of 800 rows per genre and 1000 rows per decade

genre_list = list(final["updated_genre"].unique())

genre_samps = []
for i in genre_list:
    genre = final[final["updated_genre"] == i].sample(n=800, random_state = 77)
    genre_samps.append(genre)
genre_sample = pd.concat(genre_samps)

decade_list = list(final["decade"].unique())

decade_samps = []
for i in decade_list:
    decade = final[final["decade"] == i].sample(n=1000, random_state = 77)
    decade_samps.append(decade)
    
decade_sample = pd.concat(decade_samps)
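
The two loops above could also be written as one grouped sample. This is a minimal sketch, assuming a pandas version that provides `DataFrameGroupBy.sample` (1.1 or later); the helper name `sample_per_group` and the toy frame are hypothetical, and this is not the code that was run for this notebook. The original row index is preserved, which matters because the image paths further down are built from it.

```python
import pandas as pd


def sample_per_group(df, column, n, seed=77):
    """Draw n rows without replacement from every group of `column`."""
    # Sampling fails if any group has fewer than n rows.
    return df.groupby(column).sample(n=n, random_state=seed)


# Toy stand-in for final_pull.csv (hypothetical values)
toy = pd.DataFrame({
    "updated_genre": ["Reggae"] * 5 + ["Jazz"] * 4,
    "decade": [1980.0, 1990.0, 2000.0, 2010.0, 2010.0,
               1980.0, 1990.0, 2000.0, 2010.0],
})
toy_sample = sample_per_group(toy, "updated_genre", n=3)
```

With the real data this would be called as `sample_per_group(final, "updated_genre", 800)` and `sample_per_group(final, "decade", 1000)`.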

In [7]:
y_genre = genre_sample[["artistName","updated_genre"]].copy()
y_decade = decade_sample[["artistName", "decade"]].copy()

In [8]:
#change the target for decade to be binary (1980s-1990s vs. 2000s onward)
#y_decade is an explicit copy, so this assignment does not raise SettingWithCopyWarning
y_decade["decade"] = y_decade["decade"].apply(lambda x: "1980 to 2000" if x == 1980 or x == 1990 else "2000 to Present")

In [9]:
#split up lists by target variable

pre2000_ind = y_decade[y_decade["decade"] == "1980 to 2000"]
post2000_ind = y_decade[y_decade["decade"] == "2000 to Present"]

In [12]:
#build image paths from the row index (row i maps to ../image/<i+1>.jpg)

pre2000_paths = ["../image/"+ str(i+1)+".jpg" for i in pre2000_ind.index]
post2000_paths = ["../image/"+ str(i+1)+".jpg" for i in post2000_ind.index]

random.shuffle(pre2000_paths)
random.shuffle(post2000_paths)

In [13]:
#do the same for genre

reggae_ind = y_genre[y_genre["updated_genre"] == "Reggae"]
world_ind = y_genre[y_genre["updated_genre"] == "World"]
dance_ind = y_genre[y_genre["updated_genre"] == "Dance"]
rock_ind = y_genre[y_genre["updated_genre"] == "Rock"]
alt_ind = y_genre[y_genre["updated_genre"] == "Alternative"]
pop_ind = y_genre[y_genre["updated_genre"] == "Pop"]
hiphop_ind = y_genre[y_genre["updated_genre"] == "HipHop"]
jazz_ind = y_genre[y_genre["updated_genre"] == "Jazz"]
rb_ind = y_genre[y_genre["updated_genre"] == "RB Soul"]
electronic_ind = y_genre[y_genre["updated_genre"] == "Electronic"]
country_ind = y_genre[y_genre["updated_genre"] == "Country"]
sing_ind = y_genre[y_genre["updated_genre"] == "Singer/Songwriter"]
blues_ind = y_genre[y_genre["updated_genre"] == "Blues"]

In [14]:
#pull paths for genre

reggae_paths = ["../image/"+ str(i+1)+".jpg" for i in reggae_ind.index]
world_paths = ["../image/"+ str(i+1)+".jpg" for i in world_ind.index]
dance_paths = ["../image/"+ str(i+1)+".jpg" for i in dance_ind.index]
rock_paths = ["../image/"+ str(i+1)+".jpg" for i in rock_ind.index]
alt_paths = ["../image/"+ str(i+1)+".jpg" for i in alt_ind.index]
pop_paths = ["../image/"+ str(i+1)+".jpg" for i in pop_ind.index]
hiphop_paths = ["../image/"+ str(i+1)+".jpg" for i in hiphop_ind.index]
jazz_paths = ["../image/"+ str(i+1)+".jpg" for i in jazz_ind.index]
rb_paths = ["../image/"+ str(i+1)+".jpg" for i in rb_ind.index]
electronic_paths = ["../image/"+ str(i+1)+".jpg" for i in electronic_ind.index]
country_paths = ["../image/"+ str(i+1)+".jpg" for i in country_ind.index]
sing_paths = ["../image/"+ str(i+1)+".jpg" for i in sing_ind.index]
blues_paths = ["../image/"+ str(i+1)+".jpg" for i in blues_ind.index]

In [15]:
#shuffle each genre's path list in place

random.shuffle(reggae_paths)
random.shuffle(world_paths)
random.shuffle(dance_paths)
random.shuffle(rock_paths)
random.shuffle(alt_paths)
random.shuffle(pop_paths)
random.shuffle(hiphop_paths)
random.shuffle(jazz_paths)
random.shuffle(rb_paths)
random.shuffle(electronic_paths)
random.shuffle(country_paths)
random.shuffle(sing_paths)
random.shuffle(blues_paths)
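
The three cells above repeat the same filter, path-building and shuffle steps for every genre. A possible consolidation is sketched below; the helper `paths_by_label`, its `seed` parameter and the toy frame are hypothetical and not part of the original notebook. It uses a local `random.Random` so that the shuffle is repeatable, which the unseeded `random.shuffle` calls above are not.

```python
import random

import pandas as pd


def paths_by_label(df, column, image_dir="../image/", seed=77):
    """Map each label in `column` to a shuffled list of image paths."""
    rng = random.Random(seed)  # local generator, so the shuffle is repeatable
    result = {}
    for label, rows in df.groupby(column):
        # Same naming rule as above: row index i -> <image_dir><i+1>.jpg
        paths = [image_dir + str(i + 1) + ".jpg" for i in rows.index]
        rng.shuffle(paths)
        result[label] = paths
    return result


# Toy stand-in for y_genre (hypothetical values)
toy = pd.DataFrame({"updated_genre": ["Reggae", "Jazz", "Reggae", "Jazz"]})
toy_paths = paths_by_label(toy, "updated_genre")
```

With the real data, `paths_by_label(y_genre, "updated_genre")["Reggae"]` would play the role of `reggae_paths`.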

### Create blended images of 250

Images are blended with this Python image-averaging tool: https://github.com/mexitek/python-image-averaging

The method works by placing all the files to blend in one folder and writing the output to another folder.

The following code copies 250 random images into the source folder for one target at a time, genre by genre and decade by decade.
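
For intuition, a blend of this kind can be thought of as a pixel-wise mean over the images. The sketch below shows that idea on NumPy arrays; it is an illustration under the assumption that all images have the same shape, and it is not taken from the linked repository, whose implementation may differ. The function name `average_images` is hypothetical, and loading the JPEG files into arrays is left out.

```python
import numpy as np


def average_images(images):
    """Return the pixel-wise mean of equally sized uint8 image arrays."""
    # Accumulate in float64 so sums of many uint8 values cannot overflow.
    total = np.zeros(images[0].shape, dtype=np.float64)
    for img in images:
        total += img
    return np.round(total / len(images)).astype(np.uint8)


# Two tiny 2x2 RGB "images": one black, one grey
black = np.zeros((2, 2, 3), dtype=np.uint8)
grey = np.full((2, 2, 3), 100, dtype=np.uint8)
blend = average_images([black, grey])
```

Every pixel of `blend` sits halfway between the two inputs; with 250 inputs, each image contributes 1/250 of the final pixel value.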

In [348]:
for path in pre2000_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [349]:
for path in post2000_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [350]:
for path in reggae_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [351]:
for path in world_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [333]:
for path in dance_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [334]:
for path in rock_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [335]:
for path in alt_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [336]:
for path in pop_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [337]:
for path in hiphop_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [338]:
for path in jazz_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [339]:
for path in rb_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [340]:
for path in electronic_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [341]:
for path in country_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [342]:
for path in sing_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")

In [343]:
for path in blues_paths[:250]:
    shutil.copy2(path, "../../../python-image-averaging/source/")
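
All of the cells above copy into the same source folder, so the folder presumably has to be emptied (and the blending tool run) between targets; the notebook does not show that step. The helper below is a sketch of one way to stage a single target under that assumption. `stage_images` is a hypothetical name, and the demo runs in a temporary directory rather than the real folders.

```python
import shutil
import tempfile
from pathlib import Path


def stage_images(paths, source_dir, n=250):
    """Empty source_dir, then copy the first n existing files into it."""
    source = Path(source_dir)
    source.mkdir(parents=True, exist_ok=True)
    for old in source.iterdir():
        if old.is_file():
            old.unlink()  # remove leftovers from the previous target
    copied = 0
    for path in paths[:n]:
        if Path(path).is_file():  # skip paths that do not exist on disk
            shutil.copy2(path, source)
            copied += 1
    return copied


# Demo in a throwaway directory instead of the real folders
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "image"
    images.mkdir()
    demo_paths = []
    for i in range(3):
        p = images / f"{i + 1}.jpg"
        p.write_bytes(b"fake")
        demo_paths.append(str(p))
    demo_paths.append(str(images / "missing.jpg"))  # deliberately absent
    staged = stage_images(demo_paths, Path(tmp) / "source")
    staged_names = sorted(f.name for f in (Path(tmp) / "source").iterdir())
```

The return value reports how many files were actually copied, which makes it easy to notice when fewer than 250 images end up in a blend.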

### Final Images

The final blended images are located in the "blended_images" folder.