To keep TensorLayer simple, we minimize the number of layer classes as much as we can, and we encourage you to use TensorFlow's own functions directly. For example, although we provide a layer for local response normalization, we still suggest applying tf.nn.lrn on network.outputs. More functions can be found in the TensorFlow API.
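A minimal sketch (assuming network.outputs is the 4-D output of a convolutional layer, since tf.nn.lrn expects a [batch, height, width, channels] tensor; the hyper-parameter values are illustrative):
# apply a plain TensorFlow op to the outputs of a TensorLayer network
network.outputs = tf.nn.lrn(network.outputs, depth_radius=5, bias=1.0,
                            alpha=0.0001, beta=0.75, name='lrn1')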
All TensorLayer layers have a number of properties in common:
layer.outputs : a Tensor, the outputs of the current layer.
layer.all_params : a list of Tensor, all network variables in order.
layer.all_layers : a list of Tensor, all network outputs in order.
layer.all_drop : a dictionary of {placeholder : float}, all keeping probabilities of the noise layers.
All TensorLayer layers have a number of methods in common:
layer.print_params() : print the network variables information in order (after tl.layers.initialize_global_variables(sess)). Alternatively, print all variables by tl.layers.print_all_variables().
layer.print_layers() : print the network layers information in order.
layer.count_params() : print the number of parameters in the network.
The initialization of a network is done by the input layer; we can then stack layers as follows. A network is a Layer class. The most important properties of a network are network.all_params, network.all_layers and network.all_drop. The all_params is a list which stores the pointers of all network parameters in order. The following script defines a 3-layer network, for which:
all_params = [W1, b1, W2, b2, W_out, b_out]
To get specific variables, you can use network.all_params[2:3] or get_variables_with_name() (see the sketch after the script below). The all_layers is a list which stores the pointers of the outputs of all layers; in the following network:
all_layers = [drop(?,784), relu(?,800), drop(?,800), relu(?,800), drop(?,800), identity(?,10)]
where ? reflects any batch size. You can print the layer information and parameter information by using network.print_layers() and network.print_params(). To count the number of parameters in a network, run network.count_params().
import numpy as np
import tensorflow as tf
import tensorlayer as tl

learning_rate = 0.0001     # illustrative value

sess = tf.InteractiveSession()
# placeholders for the MNIST images and labels
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
y_ = tf.placeholder(tf.int64, shape=[None, ], name='y_')
# define the network
network = tl.layers.InputLayer(x, name='input_layer')
network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
network = tl.layers.DenseLayer(network, n_units=800,
                               act=tf.nn.relu, name='relu1')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop2')
network = tl.layers.DenseLayer(network, n_units=800,
                               act=tf.nn.relu, name='relu2')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop3')
network = tl.layers.DenseLayer(network, n_units=10,
                               act=tl.activation.identity,
                               name='output_layer')
# define the cost function and the training op
y = network.outputs
y_op = tf.argmax(tf.nn.softmax(y), 1)
cost = tl.cost.cross_entropy(y, y_)
train_params = network.all_params
train_op = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999,
                                  epsilon=1e-08, use_locking=False).minimize(cost, var_list=train_params)
# initialize variables and print the network information
tl.layers.initialize_global_variables(sess)
network.print_params()
network.print_layers()
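As mentioned above, besides slicing network.all_params you can also fetch parameters by name. A minimal sketch (the name 'relu1' refers to the layer defined in the script above; the flag values are illustrative):
# get the variables whose names contain 'relu1'
relu1_params = tl.layers.get_variables_with_name('relu1', train_only=True, printable=False)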
In addition, network.all_drop is a dictionary which stores the keeping probabilities of all noise layers. In the above network, they are the keeping probabilities of the dropout layers. So for training, enable all dropout layers as follows.
feed_dict = {x: X_train_a, y_: y_train_a}
feed_dict.update( network.all_drop )      # enable noise layers
loss, _ = sess.run([cost, train_op], feed_dict=feed_dict)
For evaluating and testing, disable all dropout layers as follows.
dp_dict = tl.utils.dict_to_one( network.all_drop )   # set all keeping probabilities to 1
feed_dict = {x: X_val, y_: y_val}
feed_dict.update(dp_dict)
print(" val loss: %f" % sess.run(cost, feed_dict=feed_dict))
print(" val acc: %f" % np.mean(y_val ==
sess.run(y_op, feed_dict=feed_dict)))
For more details, please read the MNIST examples on Github.
Before creating your own TensorLayer layer, let's have a look at the Dense layer. It creates a weight matrix and a bias vector if they do not exist, then implements the output expression. At the end, since it is a layer with parameters, we also need to append the parameters into all_params.
class DenseLayer(Layer):
"""
The :class:`DenseLayer` class is a fully connected layer.
Parameters
----------
layer : a :class:`Layer` instance
The `Layer` class feeding into this layer.
n_units : int
The number of units of the layer.
act : activation function
The function that is applied to the layer activations.
W_init : weights initializer
The initializer for initializing the weight matrix.
b_init : biases initializer
The initializer for initializing the bias vector. If None, skip biases.
W_init_args : dictionary
The arguments for the weights tf.get_variable.
b_init_args : dictionary
The arguments for the biases tf.get_variable.
name : a string or None
An optional name to attach to this layer.
"""
def __init__(
self,
layer = None,
n_units = 100,
act = tf.nn.relu,
W_init = tf.truncated_normal_initializer(stddev=0.1),
b_init = tf.constant_initializer(value=0.0),
W_init_args = {},
b_init_args = {},
name ='dense_layer',
):
Layer.__init__(self, name=name)
self.inputs = layer.outputs
if self.inputs.get_shape().ndims != 2:
raise Exception("The input dimension must be rank 2")
n_in = int(self.inputs._shape[-1])
self.n_units = n_units
print(" tensorlayer:Instantiate DenseLayer %s: %d, %s" % (self.name, self.n_units, act))
with tf.variable_scope(name) as vs:
W = tf.get_variable(name='W', shape=(n_in, n_units), initializer=W_init, **W_init_args )
if b_init:
b = tf.get_variable(name='b', shape=(n_units), initializer=b_init, **b_init_args )
self.outputs = act(tf.matmul(self.inputs, W) + b)#, name=name)
else:
self.outputs = act(tf.matmul(self.inputs, W))
        # Hint : list() and dict() make shallow copies, so extending them does not modify the previous layer
self.all_layers = list(layer.all_layers)
self.all_params = list(layer.all_params)
self.all_drop = dict(layer.all_drop)
self.all_layers.extend( [self.outputs] )
if b_init:
self.all_params.extend( [W, b] )
else:
self.all_params.extend( [W] )
To implement a custom layer in TensorLayer, you will have to write a Python class that subclasses Layer and implements the outputs expression.
The following is an example implementation of a layer that multiplies its input by 2:
class DoubleLayer(Layer):
def __init__(
self,
layer = None,
name ='double_layer',
):
Layer.__init__(self, name=name)
self.inputs = layer.outputs
self.outputs = self.inputs * 2
self.all_layers = list(layer.all_layers)
self.all_params = list(layer.all_params)
self.all_drop = dict(layer.all_drop)
self.all_layers.extend( [self.outputs] )
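A minimal usage sketch (the placeholder x and the layer names are illustrative): the custom layer is stacked like any built-in layer.
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
network = tl.layers.InputLayer(x, name='input')
network = DoubleLayer(network, name='double')    # outputs = inputs * 2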
Greedy layer-wise pretraining is an important task for deep neural network initialization, and there are many kinds of pre-training methods for different network architectures and applications.
For example, the pre-training of a Vanilla Sparse Autoencoder can be implemented by using KL divergence (for sigmoid activations) as in the following code, while for a Deep Rectifier Network, the sparsity can be implemented by using L1 regularization of the activation output (see the sketch after the code below).
# Vanilla Sparse Autoencoder: KL divergence sparsity penalty
# activation_out is the (sigmoid) activation output of the hidden layer
beta = 4                 # weight of the sparsity penalty
rho = 0.15               # desired average activation
p_hat = tf.reduce_mean(activation_out, axis=0)    # average activation over the batch
KLD = beta * tf.reduce_sum( rho * tf.log(tf.div(rho, p_hat))
        + (1 - rho) * tf.log((1 - rho) / tf.subtract(1.0, p_hat)) )
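For a Deep Rectifier Network, a minimal sketch of the L1 alternative (here activation_out is assumed to be the ReLU output of the hidden layer, and the weight 0.001 is illustrative):
# Deep Rectifier Network: sparsity via L1 regularization of the activation output
L1_a = 0.001 * tf.reduce_mean(tf.reduce_sum(tf.abs(activation_out), axis=1))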
Because there are many pre-training methods, TensorLayer provides a simple way to modify or design your own. For an Autoencoder, TensorLayer uses ReconLayer.__init__() to define the reconstruction layer and the cost function; to define your own cost function, simply modify self.cost in ReconLayer.__init__(). To create your own cost expression, please read TensorFlow Math. By default, ReconLayer only updates the weights and biases of the previous layer by using self.train_params = self.all_params[-4:], where the 4 parameters are [W_encoder, b_encoder, W_decoder, b_decoder]: W_encoder and b_encoder belong to the previous DenseLayer, while W_decoder and b_decoder belong to this ReconLayer. In addition, if you want to update the parameters of the previous 2 layers at the same time, simply change [-4:] to [-6:].
ReconLayer.__init__(...):
...
self.train_params = self.all_params[-4:]
...
self.cost = mse + L1_a + L2_w
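As a usage sketch (the layer names and hyper-parameter values are illustrative; pretrain() follows the TensorLayer 1.x API), an autoencoder built on top of a DenseLayer can be pre-trained as follows:
# one hidden DenseLayer plus a reconstruction layer on top of it
network = tl.layers.InputLayer(x, name='input')
network = tl.layers.DenseLayer(network, n_units=196, act=tf.nn.sigmoid, name='dense1')
recon_layer1 = tl.layers.ReconLayer(network, x_recon=x, n_units=784,
                                    act=tf.nn.sigmoid, name='recon_layer1')
# greedy layer-wise pre-training of dense1 via the reconstruction cost
recon_layer1.pretrain(sess, x=x, X_train=X_train, X_val=X_val,
                      denoise_name=None, n_epoch=100, batch_size=128, print_freq=10)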
tensorlayer.layers
get_variables_with_name get_layers_with_name set_name_reuse print_all_variables initialize_global_variables
Layer
InputLayer OneHotInputLayer Word2vecEmbeddingInputlayer EmbeddingInputlayer
DenseLayer ReconLayer DropoutLayer GaussianNoiseLayer DropconnectDenseLayer
Conv1dLayer Conv2dLayer DeConv2dLayer Conv3dLayer DeConv3dLayer PoolLayer PadLayer UpSampling2dLayer DownSampling2dLayer AtrousConv2dLayer SeparableConv2dLayer
Conv1d Conv2d DeConv2d
MaxPool1d MeanPool1d MaxPool2d MeanPool2d MaxPool3d MeanPool3d
BatchNormLayer LocalResponseNormLayer
TimeDistributedLayer
RNNLayer BiRNNLayer advanced_indexing_op retrieve_seq_length_op retrieve_seq_length_op2 DynamicRNNLayer BiDynamicRNNLayer
Seq2Seq PeekySeq2Seq AttentionSeq2Seq
FlattenLayer ReshapeLayer LambdaLayer
ConcatLayer ElementwiseLayer
ExpandDimsLayer TileLayer
EstimatorLayer SlimNetsLayer KerasLayer
PReluLayer
MultiplexerLayer
EmbeddingAttentionSeq2seqWrapper
flatten_reshape clear_layers_name initialize_rnn_state list_remove_repeat
These functions help you reuse parameters across different inferences (graphs), and get a list of parameters by a given name. For more about TensorFlow parameter sharing, click here.
get_variables_with_name
get_layers_with_name
set_name_reuse
print_all_variables
initialize_global_variables
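A minimal sketch of reusing parameters between two graphs (the scope name, the layer names and the mlp helper are illustrative; set_name_reuse follows the TensorLayer 1.x API):
def mlp(x, reuse=False):
    with tf.variable_scope("MLP", reuse=reuse):
        tl.layers.set_name_reuse(reuse)    # allow the same layer names to be reused
        net = tl.layers.InputLayer(x, name='input')
        net = tl.layers.DenseLayer(net, n_units=800, act=tf.nn.relu, name='relu1')
        net = tl.layers.DenseLayer(net, n_units=10, name='output')
    return net

# x and x_test are placeholders defined elsewhere
net_train = mlp(x, reuse=False)        # creates the variables
net_test = mlp(x_test, reuse=True)     # shares the variables of net_train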
Layer
InputLayer
OneHotInputLayer
Word2vecEmbeddingInputlayer
EmbeddingInputlayer
DenseLayer
ReconLayer
DropoutLayer
GaussianNoiseLayer
DropconnectDenseLayer
Conv1dLayer
Conv2dLayer
DeConv2dLayer
Conv3dLayer
DeConv3dLayer
UpSampling2dLayer
DownSampling2dLayer
AtrousConv2dLayer
SeparableConv2dLayer
For users who are not familiar with TensorFlow, the following simplified functions may be easier for you (see the sketch after this list). We will provide more simplified functions later, but if you are comfortable with TensorFlow, the professional APIs may be better for you.
Conv1d
Conv2d
DeConv2d
MaxPool1d
MeanPool1d
MaxPool2d
MeanPool2d
MaxPool3d
MeanPool3d
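A minimal sketch using the simplified APIs (assuming network carries a 4-D image tensor, e.g. from an InputLayer over [batch, height, width, channels] data; the filter sizes and names are illustrative):
network = tl.layers.Conv2d(network, n_filter=32, filter_size=(5, 5), strides=(1, 1),
                           act=tf.nn.relu, padding='SAME', name='conv1')
network = tl.layers.MaxPool2d(network, filter_size=(2, 2), strides=(2, 2),
                              padding='SAME', name='pool1')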
Pooling layer for any dimensions and any pooling functions.
PoolLayer
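A minimal sketch (the professional PoolLayer takes the TensorFlow pooling function directly, so tf.nn.max_pool can be swapped for tf.nn.avg_pool; the sizes are illustrative):
network = tl.layers.PoolLayer(network, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding='SAME', pool=tf.nn.max_pool, name='pool1')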
Padding layer for any modes.
PadLayer
Local response normalization does not have any trainable weights, so you can also apply tf.nn.lrn directly on network.outputs.
BatchNormLayer
LocalResponseNormLayer
TimeDistributedLayer
All recurrent layers can implement any type of RNN cell by feeding in a different cell function (LSTM, GRU, etc.), as shown in the sketch below.
RNNLayer
BiRNNLayer
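A minimal sketch (assuming network carries a 3-D [batch, n_steps, n_features] tensor, e.g. from an embedding layer; num_steps and the hidden size are illustrative):
# the same RNNLayer wraps an LSTM cell here; swap cell_fn for tf.contrib.rnn.GRUCell to use GRU
network = tl.layers.RNNLayer(network,
                             cell_fn=tf.contrib.rnn.BasicLSTMCell,
                             cell_init_args={'forget_bias': 1.0},
                             n_hidden=200,
                             n_steps=num_steps,
                             return_last=False,
                             name='lstm1')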
These operations are usually used inside the dynamic RNN layers; they compute the sequence lengths for different situations and get the last RNN outputs by indexing.
advanced_indexing_op
retrieve_seq_length_op
retrieve_seq_length_op2
DynamicRNNLayer
BiDynamicRNNLayer
Seq2Seq
PeekySeq2Seq
AttentionSeq2Seq
FlattenLayer
ReshapeLayer
LambdaLayer
ConcatLayer
ElementwiseLayer
ExpandDimsLayer
TileLayer
EstimatorLayer
Yes! TF-Slim models can be connected into TensorLayer, and all of Google's pre-trained models can be used easily; see Slim-model.
SlimNetsLayer
Yes! Keras models can be connected into TensorLayer; see tutorial_keras.py.
KerasLayer
PReluLayer
MultiplexerLayer
EmbeddingAttentionSeq2seqWrapper
flatten_reshape
clear_layers_name
initialize_rnn_state
list_remove_repeat