* In neural style transfer the parameters are already set: they are the parameters of a pre-trained neural network such as VGG.
* This is why transfer learning, pre-training and the reuse of existing models become our friends. In style transfer, w (or theta, if you prefer) is not learned. It has already been found on another task, such as classification on ImageNet.


### Artistic style transfer (aka neural style transfer)
* It enables us to transform ordinary images into masterpieces.
* It combines several deep learning techniques, such as convolutional neural networks, transfer learning and auto-encoders.
* Its theoretical background is somewhat hard to grasp, so its implementations can be a little complex to understand. In this lecture, we will cover the background of style transfer and implement it from scratch.

* This deep learning technique is not a typical neural network operation.
* We know that typical neural networks tune their weights based on input and output pairs.
* Here, though, we will use a pre-trained network and never update its weights.
* We will update the input instead.

* We will use the VGG model as the pre-trained network.

### What is a Pre-trained Model?
* A pre-trained model is a model created by someone else to solve a problem. Instead of building a model from scratch to solve a similar problem, you use the model trained on the other problem as a starting point.

* A pre-trained model may not be 100% accurate in your application, but it saves the huge effort of reinventing the wheel, and it saves time. Let me show you this with an example.

* I will be using the VGG16 model, which is pre-trained on the ImageNet dataset and shipped with the Keras library.

In [None]:
'''
Optional parameters:

--iter, The number of iterations the style transfer runs for (Default is 10)
--content_weight, The weight given to the content loss (Default is 0.025)
--style_weight, The weight given to the style loss (Default is 1.0)
--tv_weight, The weight given to the total variation loss (Default is 1.0)
'''

In [None]:
"""
- The total variation loss imposes local spatial continuity between
the pixels of the combination image, giving it visual coherence.

- The style loss is where the deep learning comes in: it is defined
using a deep convolutional neural network.
Precisely, it consists in a sum of
L2 distances between the Gram matrices of the representations of
the base image and the style reference image, extracted from
different layers of a convnet (trained on ImageNet). 

The general idea is to capture color/texture information at different spatial
scales (fairly large scales --defined by the depth of the layer considered).

- The content loss is a L2 distance between the features of the base
image (extracted from a deep layer) and the features of the combination image,
keeping the generated image close enough to the original one.
"""
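
The total variation loss described above is not computed by the code in this notebook, so here is a minimal NumPy sketch of the idea. It is a simplified variant (a plain sum of squared neighbour differences); the function name and the toy 4x4 image are assumptions made for illustration only.

```python
import numpy as np

def total_variation_loss(img):
    """Sum of squared differences between neighbouring pixels.

    img: array of shape (height, width, channels).
    """
    # differences between each pixel and the pixel directly below it
    dh = img[1:, :, :] - img[:-1, :, :]
    # differences between each pixel and the pixel directly to its right
    dw = img[:, 1:, :] - img[:, :-1, :]
    return float(np.sum(dh ** 2) + np.sum(dw ** 2))

# A perfectly flat image has no variation at all.
flat = np.full((4, 4, 3), 7.0)

# Bumping one interior pixel by 1 breaks continuity with its 4 neighbours.
noisy = flat.copy()
noisy[1, 2, 0] += 1.0

print(total_variation_loss(flat), total_variation_loss(noisy))
```

A smooth image scores low and a noisy one scores high, which is why adding this term to the total loss pushes the generated image towards local coherence.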

In [2]:
from keras.preprocessing.image import load_img, save_img, img_to_array
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
import time
from keras.applications import vgg16
from keras.applications.imagenet_utils import preprocess_input
from keras import backend as K
import tensorflow as tf
import keras


### Images
* We are going to transfer the style of one image to another.
* The image we would like to transform is called the content image, whereas the image whose style we would like to transfer is called the style image.
* The style image's brush strokes are then reflected onto the content image, and the resulting new image is called the generated image.

In [3]:
base_image_path = 'katy_tailor.jpg'
style_reference_image_path = 'style.jpg'

iterations = 2  # number of optimization rounds (two epochs are printed at the end)

* We provide the content and style images.
* You might remember that in neural networks we must initialize the weights randomly.
* Here, though, the generated image is initialized randomly instead of the weights.
* By now you can see that this application is not a typical neural network.
* Let's write the code that reads the content and style images and generates a random image for the generated image.

In [4]:
# dimensions of the generated picture
img_nrows = 400
img_ncols = 400
# util function to open, resize and format pictures into appropriate tensors

In [5]:
def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_nrows, img_ncols))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocess_input(img)
    return img

In [6]:
# util function to convert a tensor into a valid image
def deprocess_image(x):
    x = x.reshape((img_nrows, img_ncols, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x


In [7]:
# get tensor representations of our images
base_image = K.variable(preprocess_image(base_image_path))

x = K.variable(preprocess_image(base_image_path))
style_reference_image = K.variable(preprocess_image(style_reference_image_path))


* We need to work on 2D matrices to calculate the Gram matrix.
* The batch flatten command transforms an n-dimensional tensor into a 2-dimensional one.
* Note the structure of the VGG network. For instance, with the standard 224×224 input, the output of the 3rd convolution block has size (56×56)×256 (with our 400×400 images it is (100×100)×256).
* Here, 256 is the number of filters in that layer. If the shape is transformed to 256×56×56, the 56×56 matrices are laid out one after another, one per filter. The permute dimensions function helps us arrange the matrices this way before flattening.
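
To see what permuting and flattening do to the layout, here is a minimal NumPy sketch of the same two steps. The tiny 2x3x4 feature map is an assumption chosen so the result is easy to read, and `np.transpose` plus `reshape` stand in for the Keras backend calls.

```python
import numpy as np

# Toy feature map: height 2, width 3, 4 channels (hypothetical sizes).
h, w, c = 2, 3, 4
feature_map = np.arange(h * w * c, dtype=np.float64).reshape(h, w, c)

# Move channels to the front: (h, w, c) -> (c, h, w),
# the NumPy counterpart of K.permute_dimensions(x, (2, 0, 1)).
channels_first = np.transpose(feature_map, (2, 0, 1))

# Flatten everything after the first axis: (c, h, w) -> (c, h * w),
# which is what K.batch_flatten does to a 3D tensor.
features = channels_first.reshape(c, -1)

print(features.shape)  # (4, 6)
```

Each row of `features` now holds all spatial positions of one filter, which is the 2D shape the Gram matrix calculation needs.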

In [8]:
random_pixels = np.random.randint(256, size=(img_nrows, img_ncols, 3))
combination_image = preprocess_input(np.expand_dims(random_pixels, axis=0))
# this will contain our generated image
combination_image = K.variable(combination_image)

* Normally, Python stores an image as a 3D NumPy array (height, width, and one dimension for the RGB channels).
* However, the VGG network is designed to work with 4D inputs. If you pass a 3D NumPy array as its input, you'll run into an exception like "block1_conv1: expected ndim=4, found ndim=3".
* That's why we added the expand dimensions command in the preprocessing step. It adds a dummy (batch) dimension to avoid this error.
* Additionally, the input we supply to the VGG network is 400x400x3. That is why the content, style and generated images are all 400 by 400.
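
The effect of the expand dimensions command can be checked on its own. This is a minimal sketch with a blank array standing in for a loaded image (an assumption, so that no image file is needed).

```python
import numpy as np

# A blank 400x400 RGB "image" standing in for a loaded picture.
image = np.zeros((400, 400, 3), dtype=np.float32)

# Add a leading batch axis so the array becomes a batch of one image.
batch = np.expand_dims(image, axis=0)

print(image.ndim, batch.shape)  # 3 (1, 400, 400, 3)
```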

### Network
* Now we are going to pass those images to the VGG network as input features.
* However, we need the outputs of some intermediate layers rather than the output of the network.
* Fortunately, Keras offers award-winning CNN models as out-of-the-box functions.

In [9]:
content_model = vgg16.VGG16(input_tensor=base_image, weights='imagenet', include_top=False)
style_model = vgg16.VGG16(input_tensor=style_reference_image, weights='imagenet', include_top=False)

# build the VGG16 network with our 3 images as input
# the model will be loaded with pre-trained ImageNet weights
generated_model = vgg16.VGG16(input_tensor=combination_image, weights='imagenet', include_top=False)



#### Loss
* We will store two loss values, one for content and one for style. In typical neural networks, the loss is calculated by comparing the actual output with the model output (prediction).
* Here, we will compare compressed representations of the auto-encoded images. Remember that these compressed representations are simply the outputs of some middle layers. Let's store each layer's name and output once the network is built.

In [10]:
# get the symbolic outputs of each "key" layer (we gave them unique names).
content_outputs = dict([(layer.name, layer.output) for layer in content_model.layers])
style_outputs = dict([(layer.name, layer.output) for layer in style_model.layers])
# same mapping for the network fed with the generated (combination) image
generated_outputs = dict([(layer.name, layer.output) for layer in generated_model.layers])


### Content loss
* We'll pass the randomly generated image and the content image through the same VGG network.
* This code uses the 5th block's 2nd convolution layer (block5_conv2) to calculate the content loss.
* This is not a must; you might use a different layer to compress the images in your work.

In [11]:
# an auxiliary loss function
# designed to maintain the "content" of the
# base image in the generated image
def content_loss(base, combination):
    return K.sum(K.square(combination - base))
# combine these loss functions into a single scalar
loss = K.variable(0)

base_image_features = content_outputs['block5_conv2'][0]
combination_features = generated_outputs['block5_conv2'][0]
contentloss = content_loss(base_image_features, combination_features)

feature_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']


* We have already passed both the content and the generated images through the VGG network in the previous step.
* We can calculate the content loss as the sum of squared differences between the outputs of the same layer for the content image and the generated one.
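
The squared-difference content loss can be tried out on small arrays. This is a minimal NumPy sketch, where the made-up 2x2 arrays are assumptions standing in for the layer outputs of the two images.

```python
import numpy as np

def content_loss(base, combination):
    # sum of squared element-wise differences between the two feature maps
    return float(np.sum((combination - base) ** 2))

# Hypothetical layer outputs for the content image and the generated image.
base_features = np.array([[1.0, 2.0],
                          [3.0, 4.0]])
combination_features = np.array([[1.0, 0.0],
                                 [3.0, 6.0]])

# Two entries differ by 2, so the loss is 2**2 + 2**2.
print(content_loss(base_features, combination_features))  # 8.0
```

The loss is zero only when the generated image produces exactly the same features as the content image at that layer.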

* For the style loss, we need distances between Gram matrices. A Gram matrix can be calculated by multiplying a matrix by its transpose.
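
As a small worked example of "a matrix times its transpose", here is a NumPy sketch. The 2x3 feature matrix (2 channels, 3 spatial positions) is an assumption chosen for readability.

```python
import numpy as np

# Hypothetical flattened features: one row per channel.
features = np.array([[1.0, 2.0, 3.0],
                     [0.0, 1.0, 0.0]])

# Entry (i, j) is the dot product of channel i with channel j,
# so the result is a symmetric channels-by-channels matrix.
gram = features @ features.T

print(gram)
```

The diagonal tells us how strongly each channel fires, and the off-diagonal entries tell us which channels fire together, regardless of where in the image that happens.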

In [12]:
# the gram matrix of an image tensor (feature-wise outer product)
def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram

### Style loss
* This loss is a little harder to calculate. First, we compare the outputs of five layers: the first convolution layer of each block, as listed in feature_layers.
* Now we can calculate the style loss.
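
Before the Keras version, here is a minimal NumPy sketch of the style loss for a single layer. It assumes the features are already flattened to one row per channel, and it takes `channels` and `size` as explicit arguments for the normalization, which is a choice made for this sketch only.

```python
import numpy as np

def style_loss(style, combination, channels, size):
    # Gram matrices of the style features and the generated features
    s = style @ style.T
    c = combination @ combination.T
    # squared distance between the Gram matrices, with a normalizing factor
    return float(np.sum((s - c) ** 2) / (4.0 * channels ** 2 * size ** 2))

# Hypothetical features: 2 channels, 2 spatial positions each.
style_features = np.array([[1.0, 0.0],
                           [0.0, 1.0]])
combination_features = np.array([[2.0, 0.0],
                                 [0.0, 1.0]])

# The Gram matrices differ only in entry (0, 0): 1 vs 4, so the sum is 9.
print(style_loss(style_features, combination_features, channels=2, size=2))  # 0.140625
```

In the full method this value is computed for each chosen layer and the results are added up.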

In [13]:
# the "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return K.sum(K.square(S - C)) / (4. * (pow(channels,2)) * (pow(size,2)))


styleloss = K.variable(0)

for layer_name in feature_layers:
    style_reference_features = style_outputs[layer_name][0]
    combination_features = generated_outputs[layer_name][0]
    styleloss = styleloss + style_loss(style_reference_features, combination_features)


### Total loss
* We have calculated both the content and the style loss. We can now calculate the total loss as their weighted sum.

In [14]:
alpha = 0.025; beta = 0.2
loss = alpha * contentloss + beta * styleloss

### Gradient Descent
* In the backpropagation algorithm, the total loss is propagated backwards to all the weights.
* In the usual neural network learning procedure, the derivative of the total error with respect to each weight is calculated.
* This calculation is also called gradient calculation.
* In style transfer, we need the gradients with respect to the input instead of the weights.
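
The idea of keeping the weights fixed and moving the input can be shown on a toy problem. This is a minimal NumPy sketch, assuming a single linear layer `W` in place of VGG and plain gradient descent in place of the optimizer used later; all names and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # fixed "pre-trained" weights, never updated
target = rng.normal(size=3)   # features we want the input to reproduce
x = np.zeros(5)               # the input, which is what we optimize

def loss_and_grad(x):
    residual = W @ x - target
    loss = float(residual @ residual)
    grad = 2.0 * W.T @ residual   # derivative of the loss with respect to x
    return loss, grad

W_before = W.copy()
initial_loss, _ = loss_and_grad(x)

# A conservative step size derived from the size of W keeps every step a descent step.
step = 0.5 / np.sum(W ** 2)
for _ in range(200):
    _, grad = loss_and_grad(x)
    x = x - step * grad

final_loss, _ = loss_and_grad(x)
print(initial_loss, final_loss)
```

The loss goes down although `W` never changes, because all of the learning happens in `x`.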

In [15]:
# get the gradients of the generated image wrt the loss
grads = K.gradients(loss, combination_image)

outputs = [loss]
outputs += grads


* This way, the gradients are computed as a tensor of shape (1, 400, 400, 3), just like our images.
* Now we will update the input, i.e. the generated image, instead of the weights.

In [16]:
f_outputs = K.function([combination_image], outputs)

def eval_loss_and_grads(x):
    x = x.reshape((1, img_nrows, img_ncols, 3))
    outs = f_outputs([x])
    loss_value = outs[0]
    if len(outs[1:]) == 1:
        grad_values = outs[1].flatten().astype('float64')
    else:
        grad_values = np.array(outs[1:]).flatten().astype('float64')
    return loss_value, grad_values


In [17]:

# this Evaluator class makes it possible
# to compute loss and gradients in one pass
# while retrieving them via two separate functions,
# "loss" and "grads". This is done because scipy.optimize
# requires separate functions for loss and gradients,
# but computing them separately would be inefficient.
class Evaluator(object):

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values
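
The same caching pattern can be tried on a problem small enough to check by hand. This is a minimal sketch, assuming a simple quadratic in place of the style transfer loss; `CachingEvaluator` and `quadratic` are hypothetical names used only here.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

class CachingEvaluator:
    """Computes loss and gradient together and serves them via two methods."""

    def __init__(self, loss_and_grad):
        self.loss_and_grad = loss_and_grad
        self.grad_values = None

    def loss(self, x):
        # one pass computes both values; the gradient is kept for grads()
        loss_value, self.grad_values = self.loss_and_grad(x)
        return loss_value

    def grads(self, x):
        # returns the gradient cached by the preceding loss() call
        return np.copy(self.grad_values)

target = np.array([1.0, -2.0, 3.0])

def quadratic(x):
    diff = x - target
    return float(diff @ diff), 2.0 * diff

toy = CachingEvaluator(quadratic)
x_opt, min_val, info = fmin_l_bfgs_b(toy.loss, np.zeros(3), fprime=toy.grads)
print(x_opt, min_val)
```

The optimizer should end up close to `target`. The pattern relies on the optimizer asking for the loss before the gradient at each point, which is also what the Evaluator above assumes.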


In [18]:
evaluator = Evaluator()


In [19]:

# run scipy-based optimization (L-BFGS) over the pixels of the generated image
# so as to minimize the neural style loss
x = preprocess_image(base_image_path)
# this runs the optimizer for the number of iterations set previously; it is said to work
# well with 10 iterations, but I am only running it with 2 for now
for i in range(iterations):
    print("epoch ", i)
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(), fprime=evaluator.grads, maxfun=20)
    img = deprocess_image(x.copy())
    fname = 'generated_%d.png' % i
    save_img(fname, img)  # save current generated image

epoch  0
epoch  1
