About Keras models

There are two types of models available in Keras: the Sequential model and the Model class used with the functional API.

These models have a number of methods in common:

model.summary(): prints a summary representation of your model.
model.get_config(): returns a dictionary containing the configuration of the model. The model can be reinstantiated from its config via:
config = model.get_config()
model = Model.from_config(config)
# or, for Sequential:
model = Sequential.from_config(config)
model.get_weights(): returns a list of all weight tensors in the model, as Numpy arrays.
model.set_weights(weights): sets the values of the weights of the model, from a list of Numpy arrays. The arrays in the list should have the same shape as those returned by get_weights().
model.to_json(): returns a representation of the model as a JSON string. Note that the representation does not include the weights, only the architecture. You can reinstantiate the same model (with reinitialized weights) from the JSON string via:
from keras.models import model_from_json

json_string = model.to_json()
model = model_from_json(json_string)
model.to_yaml(): returns a representation of the model as a YAML string. Note that the representation does not include the weights, only the architecture. You can reinstantiate the same model (with reinitialized weights) from the YAML string via:
from keras.models import model_from_yaml

yaml_string = model.to_yaml()
model = model_from_yaml(yaml_string)
model.save_weights(filepath): saves the weights of the model as an HDF5 file.
model.load_weights(filepath, by_name=False): loads the weights of the model from an HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use by_name=True to load only those layers with the same name.
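
The effect of by_name=True can be pictured with plain dictionaries. The sketch below is not Keras code: load_by_name and the layer names are hypothetical, and each "model" is just a dict mapping a layer name to stand-in weight lists.

```python
# Toy illustration of name-based weight loading (not the Keras implementation).
# Each "model" is a dict mapping a layer name to its list of weight arrays.

def load_by_name(target, saved):
    """Copy weights from `saved` into `target` for layers whose names match."""
    loaded = []
    for name in target:
        if name in saved:
            target[name] = [list(w) for w in saved[name]]
            loaded.append(name)
    return loaded

saved = {"dense_1": [[0.1, 0.2]], "dense_2": [[0.3]]}
new_model = {"dense_1": [[0.0, 0.0]], "new_output": [[0.0]]}

loaded = load_by_name(new_model, saved)
print(loaded)  # ['dense_1']
```

Layers that exist only in the new architecture (here new_output) keep their freshly initialized weights.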

About Keras layers

All Keras layers have a number of methods in common:

layer.get_weights(): returns the weights of the layer as a list of Numpy arrays.
layer.set_weights(weights): sets the weights of the layer from a list of Numpy arrays (with the same shapes as the output of get_weights).
layer.get_config(): returns a dictionary containing the configuration of the layer. The layer can be reinstantiated from its config via:
from keras.utils.layer_utils import layer_from_config

config = layer.get_config()
layer = layer_from_config(config)
If a layer has a single node (i.e. if it isn't a shared layer), you can get its input tensor, output tensor, input shape and output shape via:

layer.input
layer.output
layer.input_shape
layer.output_shape
If the layer has multiple nodes (see: the concept of layer node and shared layers), you can use the following methods:

layer.get_input_at(node_index)
layer.get_output_at(node_index)
layer.get_input_shape_at(node_index)
layer.get_output_shape_at(node_index)

Dense

keras.layers.core.Dense(output_dim, init='glorot_uniform', activation=None, weights=None, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None)
Just your regular fully connected NN layer.

Example

# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_dim=16))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)

# this is equivalent to the above:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))

# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
Arguments

output_dim: int > 0.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of Numpy arrays to set as initial weights. The list should have 2 elements, of shape (input_dim, output_dim) and (output_dim,) for weights and biases respectively.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
Input shape

nD tensor with shape: (nb_samples, ..., input_dim). The most common situation would be a 2D input with shape (nb_samples, input_dim).

Output shape

nD tensor with shape: (nb_samples, ..., output_dim). For instance, for a 2D input with shape (nb_samples, input_dim), the output would have shape (nb_samples, output_dim).
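
To make the shapes concrete, here is a minimal pure-Python sketch of what a fully connected layer computes for one sample: activation(x . W + b), with W of shape (input_dim, output_dim) and b of shape (output_dim,). The helper name dense_forward is hypothetical and this is not the Keras implementation.

```python
def dense_forward(x, W, b, activation=None):
    """Compute activation(x . W + b) for one input vector x."""
    input_dim, output_dim = len(W), len(W[0])
    assert len(x) == input_dim and len(b) == output_dim
    out = [sum(x[i] * W[i][j] for i in range(input_dim)) + b[j]
           for j in range(output_dim)]
    return out if activation is None else [activation(v) for v in out]

# input_dim = 2, output_dim = 3
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
b = [0.5, 0.5, 0.0]
y = dense_forward([3.0, 4.0], W, b)
print(y)  # [3.5, 4.5, 2.0]
```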


Activation

keras.layers.core.Activation(activation)
Applies an activation function to an output.

Arguments

activation: name of activation function to use (see: activations), or alternatively, a Theano or TensorFlow operation.
Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as input.


Dropout

keras.layers.core.Dropout(p)
Applies Dropout to the input. Dropout consists of randomly setting a fraction p of input units to 0 at each update during training time, which helps prevent overfitting.

Arguments

p: float between 0 and 1. Fraction of the input units to drop.
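
A rough sketch of the training-time behaviour, in pure Python. The text above only specifies zeroing a fraction p of the units; rescaling the surviving units by 1 / (1 - p) (the common "inverted dropout" convention) is an assumption of this sketch, as is the helper name dropout. The sketch excludes p = 1 to avoid dividing by zero.

```python
import random

def dropout(values, p, rng):
    """Zero each value with probability p; rescale survivors by 1 / (1 - p)."""
    assert 0.0 <= p < 1.0
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]

rng = random.Random(0)  # seeded only to make the run repeatable
x = [1.0] * 8
dropped = dropout(x, 0.5, rng)
print(dropped)  # every entry is either 0.0 or 2.0
```

At test time no units are dropped, which corresponds to calling the sketch with p = 0.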
References

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Flatten

keras.layers.core.Flatten()
Flattens the input. Does not affect the batch size.

Example

model = Sequential()
model.add(Convolution2D(64, 3, 3,
            border_mode='same',
            input_shape=(3, 32, 32)))
# now: model.output_shape == (None, 64, 32, 32)

model.add(Flatten())
# now: model.output_shape == (None, 65536)
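
The 65536 above is simply 64 * 32 * 32. A small helper (hypothetical name flatten_output_shape, not part of Keras) shows the shape rule:

```python
def flatten_output_shape(input_shape):
    """Collapse every non-batch dimension into one; keep the batch axis."""
    size = 1
    for dim in input_shape[1:]:
        size *= dim
    return (input_shape[0], size)

print(flatten_output_shape((None, 64, 32, 32)))  # (None, 65536)
```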

Reshape

keras.layers.core.Reshape(target_shape)
Reshapes an output to a certain shape.

Arguments

target_shape: target shape. Tuple of integers, does not include the samples dimension (batch size).
Input shape

Arbitrary, although all dimensions in the input shape must be fixed. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

(batch_size,) + target_shape

Example

# as first layer in a Sequential model
model = Sequential()
model.add(Reshape((3, 4), input_shape=(12,)))
# now: model.output_shape == (None, 3, 4)
# note: `None` is the batch dimension

# as intermediate layer in a Sequential model
model.add(Reshape((6, 2)))
# now: model.output_shape == (None, 6, 2)
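
Both reshapes above work because 12 = 3 * 4 = 6 * 2: a reshape must preserve the number of elements per sample. The helper below is a hypothetical sketch of that check, not Keras code.

```python
def reshape_output_shape(input_shape, target_shape):
    """Return (batch,) + target_shape after checking the element counts match."""
    def count(dims):
        total = 1
        for d in dims:
            total *= d
        return total
    if count(input_shape[1:]) != count(target_shape):
        raise ValueError("total size of new array must be unchanged")
    return (input_shape[0],) + tuple(target_shape)

shape = reshape_output_shape((None, 12), (3, 4))
print(shape)                                # (None, 3, 4)
print(reshape_output_shape(shape, (6, 2)))  # (None, 6, 2)
```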

Permute

keras.layers.core.Permute(dims)
Permutes the dimensions of the input according to a given pattern.

Useful for e.g. connecting RNNs and convnets together.

Example

model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)
# note: `None` is the batch dimension
Arguments

dims: Tuple of integers. Permutation pattern, does not include the samples dimension. Indexing starts at 1. For instance, (2, 1) permutes the first and second dimension of the input.
Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same as the input shape, but with the dimensions re-ordered according to the specified pattern.
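
The 1-based indexing of dims can be sketched as follows; permute_output_shape is a hypothetical helper that reproduces only the shape rule described above.

```python
def permute_output_shape(input_shape, dims):
    """Reorder non-batch axes; `dims` is 1-indexed, axis 0 (batch) stays put."""
    assert sorted(dims) == list(range(1, len(input_shape)))
    return (input_shape[0],) + tuple(input_shape[d] for d in dims)

print(permute_output_shape((None, 10, 64), (2, 1)))  # (None, 64, 10)
```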


RepeatVector

keras.layers.core.RepeatVector(n)
Repeats the input n times.

Example

model = Sequential()
model.add(Dense(32, input_dim=32))
# now: model.output_shape == (None, 32)
# note: `None` is the batch dimension

model.add(RepeatVector(3))
# now: model.output_shape == (None, 3, 32)
Arguments

n: integer, repetition factor.
Input shape

2D tensor of shape (nb_samples, features).

Output shape

3D tensor of shape (nb_samples, n, features).
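
In terms of nested lists, the operation looks like this (repeat_vector is a hypothetical stand-in, not the Keras layer):

```python
def repeat_vector(batch, n):
    """Turn (nb_samples, features) into (nb_samples, n, features)."""
    return [[list(sample) for _ in range(n)] for sample in batch]

out = repeat_vector([[1, 2], [3, 4]], 3)
print(out[0])  # [[1, 2], [1, 2], [1, 2]]
```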


Merge

keras.engine.topology.Merge(layers=None, mode='sum', concat_axis=-1, dot_axes=-1, output_shape=None, output_mask=None, arguments=None, node_indices=None, tensor_indices=None, name=None)
A Merge layer can be used to merge a list of tensors into a single tensor, following some merge mode.

Example

model1 = Sequential()
model1.add(Dense(32, input_dim=32))

model2 = Sequential()
model2.add(Dense(32, input_dim=32))

merged_model = Sequential()
merged_model.add(Merge([model1, model2], mode='concat', concat_axis=1))
Arguments

layers: Can be a list of Keras tensors or a list of layer instances. Must be more than one layer/tensor.
mode: String or lambda/function. If string, must be one of: 'sum', 'mul', 'concat', 'ave', 'cos', 'dot', 'max'. If lambda/function, it should take as input a list of tensors and return a single tensor.
concat_axis: Integer, axis to use in mode concat.
dot_axes: Integer or tuple of integers, axes to use in mode dot or cos.
output_shape: Either a shape tuple (tuple of integers), or a lambda/function to compute output_shape (only if merge mode is a lambda/function). If the argument is a tuple, it should be the expected output shape, not including the batch size (same convention as the input_shape argument in layers). If the argument is callable, it should take as input a list of shape tuples (1:1 mapping to input tensors) and return a single shape tuple, including the batch size (same convention as the get_output_shape_for method of layers).
node_indices: Optional list of integers containing the output node index for each input layer (in case some input layers have multiple output nodes). Defaults to an array of 0s if not provided.
tensor_indices: Optional list of indices of output tensors to consider for merging (in case some input layer node returns multiple tensors).
output_mask: Mask or lambda/function to compute the output mask (only if merge mode is a lambda/function). In the latter case, it should take as input a list of masks and return a single mask.
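
For intuition, the element-wise modes and 'concat' can be sketched on single feature vectors in pure Python. The function merge below is hypothetical and deliberately omits the 'cos' and 'dot' modes; it is an illustration, not the Keras implementation.

```python
def merge(vectors, mode):
    """Combine equal-length vectors; 'concat' joins them along the feature axis."""
    if mode == 'concat':
        return [v for vec in vectors for v in vec]
    columns = list(zip(*vectors))
    if mode == 'sum':
        return [sum(c) for c in columns]
    if mode == 'mul':
        out = []
        for c in columns:
            prod = 1
            for v in c:
                prod *= v
            out.append(prod)
        return out
    if mode == 'ave':
        return [sum(c) / len(c) for c in columns]
    if mode == 'max':
        return [max(c) for c in columns]
    raise ValueError('unsupported mode: %r' % mode)

a, b = [1.0, 2.0], [3.0, 0.5]
print(merge([a, b], 'sum'))     # [4.0, 2.5]
print(merge([a, b], 'concat'))  # [1.0, 2.0, 3.0, 0.5]
```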

Lambda

keras.layers.core.Lambda(function, output_shape=None, arguments=None)
Used for evaluating an arbitrary Theano / TensorFlow expression on the output of the previous layer.

Examples

from keras import backend as K
from keras.layers.core import Lambda

# `model` is assumed to be an existing Sequential model

# add a x -> x^2 layer
model.add(Lambda(lambda x: x ** 2))

# add a layer that returns the concatenation
# of the positive part of the input and
# the opposite of the negative part

def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 2  # only valid for 2D tensors
    shape[-1] *= 2
    return tuple(shape)

model.add(Lambda(antirectifier,
         output_shape=antirectifier_output_shape))
Arguments

function: The function to be evaluated. Takes input tensor as first argument.
output_shape: Expected output shape from function. Can be a tuple or function. If a tuple, it only specifies the first dimension onward; the sample dimension is assumed to be either the same as the input, output_shape = (input_shape[0], ) + output_shape, or, if the input is None, also None: output_shape = (None, ) + output_shape. If a function, it specifies the entire shape as a function of the input shape: output_shape = f(input_shape).
arguments: optional dictionary of keyword arguments to be passed to the function.
Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Specified by output_shape argument.
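
The tuple-versus-function rule for output_shape can be sketched as below. resolve_output_shape and doubled_last_axis are hypothetical helpers written for illustration; they mirror the rule as described in the Arguments section rather than the library's internals.

```python
def resolve_output_shape(input_shape, output_shape):
    """Mimic the tuple-vs-function rule for Lambda's output_shape argument."""
    if callable(output_shape):
        # Function form: it receives and returns the full shape, batch included.
        return tuple(output_shape(input_shape))
    # Tuple form: prepend the sample dimension taken from the input.
    return (input_shape[0],) + tuple(output_shape)

def doubled_last_axis(input_shape):
    shape = list(input_shape)
    shape[-1] *= 2
    return tuple(shape)

print(resolve_output_shape((None, 16), (8,)))             # (None, 8)
print(resolve_output_shape((32, 16), doubled_last_axis))  # (32, 32)
```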


ActivityRegularization

keras.layers.core.ActivityRegularization(l1=0.0, l2=0.0)
Layer that returns its input unchanged, but applies an update to the cost function based on the activity of the input.

Arguments

l1: L1 regularization factor (positive float).
l2: L2 regularization factor (positive float).
Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as input.
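
As a rough illustration of the penalty added to the cost, the sketch below computes l1 * sum(|a|) + l2 * sum(a^2) for the activations of a single sample. How the library reduces this over a batch is not stated here, so the per-sample form and the helper name activity_penalty are assumptions.

```python
def activity_penalty(activations, l1=0.0, l2=0.0):
    """l1 * sum(|a|) + l2 * sum(a^2) over one sample's activations."""
    return (l1 * sum(abs(a) for a in activations)
            + l2 * sum(a * a for a in activations))

print(activity_penalty([1.0, -2.0], l1=0.5, l2=0.25))  # 2.75
```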


Masking

keras.layers.core.Masking(mask_value=0.0)
Masks an input sequence by using a mask value to identify timesteps to be skipped.

For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).

If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.

Example

Consider a Numpy data array x of shape (samples, timesteps, features), to be fed to an LSTM layer. You want to mask timesteps #3 and #5 because you lack data for these timesteps. You can:

set x[:, 3, :] = 0. and x[:, 5, :] = 0.
insert a Masking layer with mask_value=0. before the LSTM layer:
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(32))
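
The rule "a timestep is masked only if all of its values equal mask_value" can be sketched in pure Python. compute_mask below is a hypothetical helper operating on nested lists, not the Keras method.

```python
def compute_mask(x, mask_value=0.0):
    """Return, per sample, a list of booleans: True = keep, False = skip."""
    return [[any(v != mask_value for v in timestep) for timestep in sample]
            for sample in x]

x = [[[1.0, 2.0], [0.0, 0.0], [0.0, 3.0]]]  # 1 sample, 3 timesteps, 2 features
print(compute_mask(x))  # [[True, False, True]]
```

Note that the third timestep is kept even though one of its features is 0: only timesteps whose features are all equal to mask_value are skipped.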

Highway

keras.layers.core.Highway(init='glorot_uniform', activation=None, weights=None, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None)
Densely connected highway network, a natural extension of LSTMs to feedforward networks.

Arguments

init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of Numpy arrays to set as initial weights. The list should have 2 elements, of shape (input_dim, output_dim) and (output_dim,) for weights and biases respectively.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
Input shape

2D tensor with shape: (nb_samples, input_dim).

Output shape

2D tensor with shape: (nb_samples, input_dim).
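
The output has the same size as the input because a highway layer blends a transformed input with the untouched input: y = T * H(x) + (1 - T) * x, where T = sigmoid(x . W_t + b_t) is a learned gate. The sketch below follows that formula from the Highway Networks paper; the names highway_forward, W_t and b_t are hypothetical and this is not the Keras source.

```python
import math

def highway_forward(x, W, b, W_t, b_t, activation=lambda v: v):
    """y = T * H + (1 - T) * x, with T = sigmoid(x . W_t + b_t)."""
    n = len(x)

    def affine(weights, bias):
        return [sum(x[i] * weights[i][j] for i in range(n)) + bias[j]
                for j in range(n)]

    gate = [1.0 / (1.0 + math.exp(-z)) for z in affine(W_t, b_t)]
    hidden = [activation(z) for z in affine(W, b)]
    return [t * h + (1.0 - t) * xi for t, h, xi in zip(gate, hidden, x)]

zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights the gate is 0.5 and H(x) is 0, so y = 0.5 * x.
y = highway_forward([2.0, 4.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0])
print(y)  # [1.0, 2.0]
```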

References

Highway Networks

MaxoutDense

keras.layers.core.MaxoutDense(output_dim, nb_feature=4, init='glorot_uniform', weights=None, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None)
A dense maxout layer.

A MaxoutDense layer takes the element-wise maximum of nb_feature Dense(input_dim, output_dim) linear layers. This allows the layer to learn a convex, piecewise linear activation function over the inputs.

Note that this is a linear layer; if you wish to apply an activation function (you shouldn't need to, since maxout layers are universal function approximators), an Activation layer must be added after it.

Arguments

output_dim: int > 0.
nb_feature: number of Dense layers to use internally.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
weights: list of Numpy arrays to set as initial weights. The list should have 2 elements, of shape (input_dim, output_dim) and (output_dim,) for weights and biases respectively.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
Input shape

2D tensor with shape: (nb_samples, input_dim).

Output shape

2D tensor with shape: (nb_samples, output_dim).
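
A minimal sketch of the element-wise maximum over nb_feature linear maps, for one sample. maxout_dense is a hypothetical helper; weights holds one (input_dim, output_dim) matrix per feature and biases one (output_dim,) vector per feature.

```python
def maxout_dense(x, weights, biases):
    """Element-wise max over nb_feature linear maps x . W_k + b_k."""
    outputs = []
    for W, b in zip(weights, biases):
        outputs.append([sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
                        for j in range(len(b))])
    return [max(column) for column in zip(*outputs)]

# nb_feature = 2, input_dim = 1, output_dim = 1: max(x, -x) == |x|
weights = [[[1.0]], [[-1.0]]]
biases = [[0.0], [0.0]]
print(maxout_dense([-3.0], weights, biases))  # [3.0]
```

With these two linear pieces the layer reproduces the absolute value, a simple convex, piecewise linear function.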

References

Maxout Networks


Convolution1D

keras.layers.convolutional.Convolution1D(nb_filter, filter_length, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample_length=1, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None, input_length=None)
Convolution operator for filtering neighborhoods of one-dimensional inputs. When using this layer as the first layer in a model, either provide the keyword argument input_dim (int, e.g. 128 for sequences of 128-dimensional vectors), or input_shape (tuple of integers, e.g. (10, 128) for sequences of 10 vectors, each 128-dimensional).

Example

# apply a convolution 1d of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(Convolution1D(64, 3, border_mode='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)

# add a new conv1d on top
model.add(Convolution1D(32, 3, border_mode='same'))
# now model.output_shape == (None, 10, 32)
Arguments

nb_filter: Number of convolution kernels to use (dimensionality of the output).
filter_length: The extension (spatial or temporal) of each filter.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
subsample_length: factor by which to subsample output.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
input_dim: Number of channels/dimensions in the input. Either this argument or the keyword argument input_shape must be provided when using this layer as the first layer in a model.
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers after this layer (without it, the shape of the dense outputs cannot be computed).
Input shape

3D tensor with shape: (samples, steps, input_dim).

Output shape

3D tensor with shape: (samples, new_steps, nb_filter). steps value might have changed due to padding.
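
How new_steps depends on border_mode and subsample_length can be sketched with the usual convolution arithmetic. The helper conv_output_length below is written for illustration under that assumption and is not copied from the library.

```python
def conv_output_length(input_length, filter_length, border_mode,
                       subsample_length=1):
    """Number of output steps for a 1D convolution, or None if unknown."""
    if input_length is None:
        return None
    if border_mode == 'same':
        length = input_length
    elif border_mode == 'valid':
        length = input_length - filter_length + 1
    elif border_mode == 'full':
        length = input_length + filter_length - 1
    else:
        raise ValueError('unknown border_mode: %r' % border_mode)
    # Ceiling division accounts for the stride.
    return (length + subsample_length - 1) // subsample_length

print(conv_output_length(10, 3, 'same'))   # 10
print(conv_output_length(10, 3, 'valid'))  # 8
```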


AtrousConvolution1D

keras.layers.convolutional.AtrousConvolution1D(nb_filter, filter_length, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample_length=1, atrous_rate=1, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
Atrous Convolution operator for filtering neighborhoods of one-dimensional inputs. A.k.a. dilated convolution or convolution with holes. When using this layer as the first layer in a model, either provide the keyword argument input_dim (int, e.g. 128 for sequences of 128-dimensional vectors), or input_shape (tuple of integers, e.g. (10, 128) for sequences of 10 vectors, each 128-dimensional).

Example

# apply an atrous convolution 1d with atrous rate 2 of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(AtrousConvolution1D(64, 3, atrous_rate=2, border_mode='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)

# add a new atrous conv1d on top
model.add(AtrousConvolution1D(32, 3, atrous_rate=2, border_mode='same'))
# now model.output_shape == (None, 10, 32)
Arguments

nb_filter: Number of convolution kernels to use (dimensionality of the output).
filter_length: The extension (spatial or temporal) of each filter.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
subsample_length: factor by which to subsample output.
atrous_rate: Factor for kernel dilation. Also called filter_dilation elsewhere.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
input_dim: Number of channels/dimensions in the input. Either this argument or the keyword argument input_shape must be provided when using this layer as the first layer in a model.
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers after this layer (without it, the shape of the dense outputs cannot be computed).
Input shape

3D tensor with shape: (samples, steps, input_dim).

Output shape

3D tensor with shape: (samples, new_steps, nb_filter). steps value might have changed due to padding.
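
What "convolution with holes" means can be shown on a single-channel signal. The sketch below is a hypothetical pure-Python helper using 'valid' borders and no kernel flipping (i.e. cross-correlation, an assumption of this sketch): the kernel taps are spaced atrous_rate steps apart, so a kernel of length k covers k + (k - 1) * (atrous_rate - 1) input steps.

```python
def atrous_conv1d_valid(signal, kernel, atrous_rate=1):
    """1D 'valid' cross-correlation with kernel taps spaced atrous_rate apart."""
    # Effective length covered by the dilated kernel.
    span = len(kernel) + (len(kernel) - 1) * (atrous_rate - 1)
    return [sum(kernel[k] * signal[start + k * atrous_rate]
                for k in range(len(kernel)))
            for start in range(len(signal) - span + 1)]

signal = [1, 2, 3, 4, 5, 6, 7]
print(atrous_conv1d_valid(signal, [1, 1, 1], atrous_rate=2))  # [9, 12, 15]
```

With atrous_rate=1 the same function reduces to an ordinary 'valid' convolution with five outputs instead of three.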


Convolution2D

keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
Convolution operator for filtering windows of two-dimensional inputs. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(3, 128, 128) for 128x128 RGB pictures.

Examples

# apply a 3x3 convolution with 64 output filters on a 256x256 image:
model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
# now model.output_shape == (None, 64, 256, 256)

# add a 3x3 convolution on top, with 32 output filters:
model.add(Convolution2D(32, 3, 3, border_mode='same'))
# now model.output_shape == (None, 32, 256, 256)
Arguments

nb_filter: Number of convolution filters to use.
nb_row: Number of rows in the convolution kernel.
nb_col: Number of columns in the convolution kernel.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
bias: whether to include a bias (i.e. make the layer affine rather than linear).
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, nb_filter, new_rows, new_cols) if dim_ordering='th' or 4D tensor with shape: (samples, new_rows, new_cols, nb_filter) if dim_ordering='tf'. rows and cols values might have changed due to padding.
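
The two layouts differ only in where the channels axis sits. The hypothetical helper below sketches the output shape for both orderings, assuming subsample=(1, 1) and handling only the 'valid' and 'same' border modes.

```python
def conv2d_output_shape(input_shape, nb_filter, nb_row, nb_col,
                        border_mode='valid', dim_ordering='th'):
    """Output shape of a stride-1 2D convolution for either dim_ordering."""
    if dim_ordering == 'th':
        samples, _, rows, cols = input_shape
    elif dim_ordering == 'tf':
        samples, rows, cols, _ = input_shape
    else:
        raise ValueError("dim_ordering must be 'th' or 'tf'")
    if border_mode == 'valid':
        rows, cols = rows - nb_row + 1, cols - nb_col + 1
    elif border_mode != 'same':
        raise ValueError("this sketch handles only 'valid' and 'same'")
    if dim_ordering == 'th':
        return (samples, nb_filter, rows, cols)
    return (samples, rows, cols, nb_filter)

print(conv2d_output_shape((None, 3, 256, 256), 64, 3, 3, 'same', 'th'))
# (None, 64, 256, 256)
```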


AtrousConvolution2D

keras.layers.convolutional.AtrousConvolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), atrous_rate=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
Atrous Convolution operator for filtering windows of two-dimensional inputs. A.k.a. dilated convolution or convolution with holes. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(3, 128, 128) for 128x128 RGB pictures.

Examples

# apply a 3x3 convolution with atrous rate 2x2 and 64 output filters on a 256x256 image:
model = Sequential()
model.add(AtrousConvolution2D(64, 3, 3, atrous_rate=(2,2), border_mode='valid', input_shape=(3, 256, 256)))
# now the actual kernel size is dilated from 3x3 to 5x5 (3+(3-1)*(2-1)=5)
# thus model.output_shape == (None, 64, 252, 252)
Arguments

nb_filter: Number of convolution filters to use.
nb_row: Number of rows in the convolution kernel.
nb_col: Number of columns in the convolution kernel.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere.
atrous_rate: tuple of length 2. Factor for kernel dilation. Also called filter_dilation elsewhere.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
bias: whether to include a bias (i.e. make the layer affine rather than linear).
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, nb_filter, new_rows, new_cols) if dim_ordering='th' or 4D tensor with shape: (samples, new_rows, new_cols, nb_filter) if dim_ordering='tf'. rows and cols values might have changed due to padding.

References

Multi-Scale Context Aggregation by Dilated Convolutions

SeparableConvolution2D

keras.layers.convolutional.SeparableConvolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), depth_multiplier=1, dim_ordering='default', depthwise_regularizer=None, pointwise_regularizer=None, b_regularizer=None, activity_regularizer=None, depthwise_constraint=None, pointwise_constraint=None, b_constraint=None, bias=True)
Separable convolution operator for 2D inputs.

Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes together the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step.

Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block.

When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(3, 128, 128) for 128x128 RGB pictures.

Theano warning

This layer is only available with the TensorFlow backend for the time being.

Arguments

nb_filter: Number of convolution filters to use.
nb_row: Number of rows in the convolution kernel.
nb_col: Number of columns in the convolution kernel.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: 'valid' or 'same'.
subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere.
depth_multiplier: how many output channels to use per input channel for the depthwise convolution step.
depthwise_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the depthwise weights matrix.
pointwise_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the pointwise weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
depthwise_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the depthwise weights matrix.
pointwise_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the pointwise weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
bias: whether to include a bias (i.e. make the layer affine rather than linear).
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, nb_filter, new_rows, new_cols) if dim_ordering='th' or 4D tensor with shape: (samples, new_rows, new_cols, nb_filter) if dim_ordering='tf'. rows and cols values might have changed due to padding.

[source]

Deconvolution2D

keras.layers.convolutional.Deconvolution2D(nb_filter, nb_row, nb_col, output_shape, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
Transposed convolution operator for filtering windows of two-dimensional inputs. The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. [1]

When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(3, 128, 128) for 128x128 RGB pictures.

To pass the correct output_shape to this layer, one could use a test model to predict and observe the actual output shape.

Examples

# apply a 3x3 transposed convolution with stride 1x1 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14), border_mode='valid', input_shape=(3, 12, 12)))
# Note that you will have to change the output_shape depending on the backend used.

# we can predict with the model and print the shape of the array.
dummy_input = np.ones((32, 3, 12, 12))
# For TensorFlow dummy_input = np.ones((32, 12, 12, 3))
preds = model.predict(dummy_input)
print(preds.shape)
# Theano GPU: (None, 3, 13, 13)
# Theano CPU: (None, 3, 14, 14)
# TensorFlow: (None, 14, 14, 3)

# apply a 3x3 transposed convolution with stride 2x2 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 25, 25), subsample=(2, 2), border_mode='valid', input_shape=(3, 12, 12)))
model.summary()

# we can predict with the model and print the shape of the array.
dummy_input = np.ones((32, 3, 12, 12))
# For TensorFlow dummy_input = np.ones((32, 12, 12, 3))
preds = model.predict(dummy_input)
print(preds.shape)
# Theano GPU: (None, 3, 25, 25)
# Theano CPU: (None, 3, 25, 25)
# TensorFlow: (None, 25, 25, 3)
Arguments

nb_filter: Number of transposed convolution filters to use.
nb_row: Number of rows in the transposed convolution kernel.
nb_col: Number of columns in the transposed convolution kernel.
output_shape: Output shape of the transposed convolution operation, as a tuple of integers (nb_samples, nb_filter, nb_output_rows, nb_output_cols). Formula for calculation of the output size along each spatial axis [1], [2]: o = s (i - 1) + a + k - 2p, \quad a \in \{0, \ldots, s - 1\}
where: i - input size (rows or cols), k - kernel size (nb_row or nb_col respectively), s - stride (subsample for rows or cols respectively), p - padding size, a - user-specified quantity used to distinguish between the s different possible output sizes. Because a is not specified explicitly and Theano and TensorFlow use different values, it is better to use a dummy input and observe the actual output shape of a layer, as shown in the examples.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano/TensorFlow function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
subsample: tuple of length 2. Factor by which to oversample output. Also called strides elsewhere.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
bias: whether to include a bias (i.e. make the layer affine rather than linear).
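The output-size formula given for output_shape can be checked with a few lines of arithmetic. This is a minimal sketch: deconv_output_length is a hypothetical helper, not a Keras function, and it assumes p = 0 for border_mode='valid' and a = 0 unless stated otherwise.

```python
def deconv_output_length(i, k, s=1, p=0, a=0):
    """Evaluate o = s * (i - 1) + a + k - 2 * p, with a in {0, ..., s - 1}."""
    if not 0 <= a < s:
        raise ValueError("a must satisfy 0 <= a < s")
    return s * (i - 1) + a + k - 2 * p


# 12x12 input, 3x3 kernel, stride 1 and stride 2 (the two examples above).
print(deconv_output_length(12, 3, s=1))  # 14
print(deconv_output_length(12, 3, s=2))  # 25
# With stride 2 there are two admissible output sizes, one per value of a.
print([deconv_output_length(12, 3, s=2, a=a) for a in range(2)])  # [25, 26]
```

With a = 0 the results agree with the 14x14 and 25x25 shapes reported for Theano CPU and TensorFlow in the examples; because backends may pick a differently, predicting on a dummy input remains the safer check.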
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, nb_filter, new_rows, new_cols) if dim_ordering='th' or 4D tensor with shape: (samples, new_rows, new_cols, nb_filter) if dim_ordering='tf'. rows and cols values might have changed due to padding.

References

[1] A guide to convolution arithmetic for deep learning
[2] Transposed convolution arithmetic
[3] Deconvolutional Networks

[source]

Convolution3D

keras.layers.convolutional.Convolution3D(nb_filter, kernel_dim1, kernel_dim2, kernel_dim3, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
Convolution operator for filtering windows of three-dimensional inputs. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(3, 10, 128, 128) for 10 frames of 128x128 RGB pictures.

Arguments

nb_filter: Number of convolution filters to use.
kernel_dim1: Length of the first dimension in the convolution kernel.
kernel_dim2: Length of the second dimension in the convolution kernel.
kernel_dim3: Length of the third dimension in the convolution kernel.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of Numpy arrays to set as initial weights.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
subsample: tuple of length 3. Factor by which to subsample output. Also called strides elsewhere.
Note: 'subsample' is implemented by slicing the output of conv3d with strides=(1,1,1).
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 4. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
bias: whether to include a bias (i.e. make the layer affine rather than linear).
Input shape

5D tensor with shape: (samples, channels, conv_dim1, conv_dim2, conv_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, conv_dim1, conv_dim2, conv_dim3, channels) if dim_ordering='tf'.

Output shape

5D tensor with shape: (samples, nb_filter, new_conv_dim1, new_conv_dim2, new_conv_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, nb_filter) if dim_ordering='tf'. new_conv_dim1, new_conv_dim2 and new_conv_dim3 values might have changed due to padding.
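How new_conv_dim1, new_conv_dim2 and new_conv_dim3 follow from border_mode and subsample can be sketched with the common convolution-arithmetic convention below. Both functions are hypothetical helpers written for illustration; they are assumed to match what the backends compute, not taken from the layer's source.

```python
def conv_output_length(input_length, filter_size, border_mode="valid", stride=1):
    """Output length along one axis under the usual convention (assumed)."""
    if border_mode == "same":
        length = input_length
    elif border_mode == "valid":
        length = input_length - filter_size + 1
    elif border_mode == "full":
        length = input_length + filter_size - 1
    else:
        raise ValueError("unknown border_mode: %r" % (border_mode,))
    # Ceiling division by the stride.
    return (length + stride - 1) // stride


def conv3d_output_dims(conv_dims, kernel_dims, border_mode="valid",
                       subsample=(1, 1, 1)):
    """Apply conv_output_length to each of the three convolved axes."""
    return tuple(conv_output_length(i, k, border_mode, s)
                 for i, k, s in zip(conv_dims, kernel_dims, subsample))


# 10 frames of 128x128 pictures, 3x3x3 kernel.
print(conv3d_output_dims((10, 128, 128), (3, 3, 3)))  # (8, 126, 126)
print(conv3d_output_dims((10, 128, 128), (3, 3, 3), "same", (1, 2, 2)))  # (10, 64, 64)
```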

[source]

Cropping1D

keras.layers.convolutional.Cropping1D(cropping=(1, 1))
Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1).

Arguments

cropping: tuple of int (length 2). How many units should be trimmed off at the beginning and end of the cropping dimension (axis 1).
Input shape

3D tensor with shape (samples, axis_to_crop, features)

Output shape

3D tensor with shape (samples, cropped_axis, features)

[source]

Cropping2D

keras.layers.convolutional.Cropping2D(cropping=((0, 0), (0, 0)), dim_ordering='default')
Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. width and height.

Arguments

cropping: tuple of tuple of int (length 2). How many units should be trimmed off at the beginning and end of the 2 cropping dimensions (width, height).
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, depth, first_axis_to_crop, second_axis_to_crop)

Output shape

4D tensor with shape: (samples, depth, first_cropped_axis, second_cropped_axis)

Examples

# Crop the input 2D images or feature maps
model = Sequential()
model.add(Cropping2D(cropping=((2, 2), (4, 4)), input_shape=(3, 28, 28)))
# now model.output_shape == (None, 3, 24, 20)
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Cropping2D(cropping=((2, 2), (2, 2))))
# now model.output_shape == (None, 64, 20, 16)
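The effect of the first Cropping2D call above can be reproduced with plain NumPy slicing. This is an illustrative sketch for 'th' ordering only; crop2d_th is a hypothetical helper, not the layer's implementation.

```python
import numpy as np


def crop2d_th(x, cropping):
    """Trim a (samples, depth, axis1, axis2) array on its last two axes."""
    (begin1, end1), (begin2, end2) = cropping
    size1, size2 = x.shape[2], x.shape[3]
    # Explicit end indices keep the slice correct when a crop amount is 0.
    return x[:, :, begin1:size1 - end1, begin2:size2 - end2]


x = np.zeros((1, 3, 28, 28))
y = crop2d_th(x, ((2, 2), (4, 4)))
print(y.shape)  # (1, 3, 24, 20)
```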

[source]

Cropping3D

keras.layers.convolutional.Cropping3D(cropping=((1, 1), (1, 1), (1, 1)), dim_ordering='default')
Cropping layer for 3D data (e.g. spatial or spatio-temporal).

Arguments

cropping: tuple of tuple of int (length 3). How many units should be trimmed off at the beginning and end of the 3 cropping dimensions (kernel_dim1, kernel_dim2, kernel_dim3).
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 4. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

5D tensor with shape: (samples, depth, first_axis_to_crop, second_axis_to_crop, third_axis_to_crop)

Output shape

5D tensor with shape: (samples, depth, first_cropped_axis, second_cropped_axis, third_cropped_axis)

[source]

UpSampling1D

keras.layers.convolutional.UpSampling1D(length=2)
Repeat each temporal step length times along the time axis.

Arguments

length: integer. Upsampling factor.
Input shape

3D tensor with shape: (samples, steps, features).

Output shape

3D tensor with shape: (samples, upsampled_steps, features).
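Repeating each temporal step corresponds to np.repeat along the time axis. The snippet below is a NumPy illustration of that behavior with length=2; it is not the layer's own code.

```python
import numpy as np

# One sample, 3 steps, 2 features.
x = np.arange(6).reshape((1, 3, 2))
# Repeat every step twice along the time axis (axis 1).
y = np.repeat(x, 2, axis=1)

print(y.shape)              # (1, 6, 2)
print(y[0, :, 0].tolist())  # [0, 0, 2, 2, 4, 4]
```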

[source]

UpSampling2D

keras.layers.convolutional.UpSampling2D(size=(2, 2), dim_ordering='default')
Repeat the rows and columns of the data by size[0] and size[1] respectively.

Arguments

size: tuple of 2 integers. The upsampling factors for rows and columns.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, channels, upsampled_rows, upsampled_cols) if dim_ordering='th' or 4D tensor with shape: (samples, upsampled_rows, upsampled_cols, channels) if dim_ordering='tf'.
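For 2D data the same idea applies once per spatial axis. The sketch below assumes 'th' ordering and uses a deliberately asymmetric size=(2, 3) so that the row and column factors are easy to tell apart; upsample2d_th is a hypothetical helper.

```python
import numpy as np


def upsample2d_th(x, size=(2, 2)):
    """Repeat rows size[0] times and columns size[1] times ('th' ordering)."""
    return x.repeat(size[0], axis=2).repeat(size[1], axis=3)


x = np.array([[1, 2], [3, 4]]).reshape((1, 1, 2, 2))
y = upsample2d_th(x, size=(2, 3))

print(y.shape)              # (1, 1, 4, 6)
print(y[0, 0, 0].tolist())  # [1, 1, 1, 2, 2, 2]
```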

[source]

UpSampling3D

keras.layers.convolutional.UpSampling3D(size=(2, 2, 2), dim_ordering='default')
Repeat the first, second and third dimension of the data by size[0], size[1] and size[2] respectively.

Arguments

size: tuple of 3 integers. The upsampling factors for dim1, dim2 and dim3.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 4. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

5D tensor with shape: (samples, channels, dim1, dim2, dim3) if dim_ordering='th' or 5D tensor with shape: (samples, dim1, dim2, dim3, channels) if dim_ordering='tf'.

Output shape

5D tensor with shape: (samples, channels, upsampled_dim1, upsampled_dim2, upsampled_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, upsampled_dim1, upsampled_dim2, upsampled_dim3, channels) if dim_ordering='tf'.

[source]

ZeroPadding1D

keras.layers.convolutional.ZeroPadding1D(padding=1)
Zero-padding layer for 1D input (e.g. temporal sequence).

Arguments

padding: int, or tuple of int (length 2), or dictionary.
If int: How many zeros to add at the beginning and end of the padding dimension (axis 1).
If tuple of int (length 2): How many zeros to add at the beginning and at the end of the padding dimension, in order '(left_pad, right_pad)'.
If dictionary: should contain the keys {'left_pad', 'right_pad'}. If any key is missing, default value of 0 will be used for the missing key.
Input shape

3D tensor with shape (samples, axis_to_pad, features)

Output shape

3D tensor with shape (samples, padded_axis, features)
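The three accepted forms of the padding argument all reduce to a (left_pad, right_pad) pair. The sketch below shows one way to normalize them and derive padded_axis; both functions are hypothetical helpers written from the argument description above.

```python
def normalize_padding_1d(padding):
    """Turn an int, a length-2 tuple or a dict into (left_pad, right_pad)."""
    if isinstance(padding, int):
        return padding, padding
    if isinstance(padding, dict):
        # Missing keys fall back to 0, as described for the dictionary form.
        return padding.get("left_pad", 0), padding.get("right_pad", 0)
    left_pad, right_pad = padding
    return left_pad, right_pad


def padded_length(axis_to_pad, padding):
    """Length of the padded axis after zero-padding."""
    left_pad, right_pad = normalize_padding_1d(padding)
    return left_pad + axis_to_pad + right_pad


print(padded_length(10, 1))                 # 12
print(padded_length(10, (0, 3)))            # 13
print(padded_length(10, {"right_pad": 2}))  # 12
```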

[source]

ZeroPadding2D

keras.layers.convolutional.ZeroPadding2D(padding=(1, 1), dim_ordering='default')
Zero-padding layer for 2D input (e.g. picture).

Arguments

padding: tuple of int (length 2), or tuple of int (length 4), or dictionary.
If tuple of int (length 2): How many zeros to add at the beginning and end of the 2 padding dimensions (rows and cols).
If tuple of int (length 4): How many zeros to add at the beginning and at the end of the 2 padding dimensions (rows and cols), in the order '(top_pad, bottom_pad, left_pad, right_pad)'.
If dictionary: should contain the keys {'top_pad', 'bottom_pad', 'left_pad', 'right_pad'}. If any key is missing, default value of 0 will be used for the missing key.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, channels, padded_rows, padded_cols) if dim_ordering='th' or 4D tensor with shape: (samples, padded_rows, padded_cols, channels) if dim_ordering='tf'.
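The symmetric and asymmetric tuple forms can be illustrated with np.pad. This sketch assumes 'th' ordering and covers only the two tuple forms; zero_pad2d_th is a hypothetical helper, not the layer's implementation.

```python
import numpy as np


def zero_pad2d_th(x, padding=(1, 1)):
    """Zero-pad the rows and cols of a (samples, channels, rows, cols) array."""
    if len(padding) == 2:
        top = bottom = padding[0]
        left = right = padding[1]
    else:
        top, bottom, left, right = padding
    pad_width = ((0, 0), (0, 0), (top, bottom), (left, right))
    return np.pad(x, pad_width, mode="constant")  # constant value defaults to 0


x = np.ones((1, 3, 4, 4))
print(zero_pad2d_th(x, (1, 1)).shape)        # (1, 3, 6, 6)
print(zero_pad2d_th(x, (1, 0, 2, 3)).shape)  # (1, 3, 5, 9)
```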

[source]

ZeroPadding3D

keras.layers.convolutional.ZeroPadding3D(padding=(1, 1, 1), dim_ordering='default')
Zero-padding layer for 3D data (spatial or spatio-temporal).

Arguments

padding: tuple of int (length 3). How many zeros to add at the beginning and end of the 3 padding dimensions (axis 3, 4 and 5). Currently only symmetric padding is supported.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 4. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

5D tensor with shape: (samples, depth, first_axis_to_pad, second_axis_to_pad, third_axis_to_pad)

Output shape

5D tensor with shape: (samples, depth, first_padded_axis, second_padded_axis, third_padded_axis)

Pooling Layers
[source]

MaxPooling1D

keras.layers.pooling.MaxPooling1D(pool_length=2, stride=None, border_mode='valid')
Max pooling operation for temporal data.

Input shape

3D tensor with shape: (samples, steps, features).

Output shape

3D tensor with shape: (samples, downsampled_steps, features).

Arguments

pool_length: size of the region to which max pooling is applied
stride: integer, or None. factor by which to downscale. 2 will halve the input. If None, it will default to pool_length.
border_mode: 'valid' or 'same'.
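How pool_length and stride interact is easiest to see on a single feature. The sketch below is a pure-Python illustration for border_mode='valid' only; max_pool_1d is a hypothetical helper and handles one feature at a time.

```python
def max_pool_1d(steps, pool_length=2, stride=None):
    """Max-pool a list of values with 'valid' borders (single feature)."""
    if stride is None:
        stride = pool_length  # same default as described for the layer
    last_start = len(steps) - pool_length
    return [max(steps[start:start + pool_length])
            for start in range(0, last_start + 1, stride)]


print(max_pool_1d([1, 5, 2, 4, 3, 0]))                           # [5, 4, 3]
print(max_pool_1d([1, 5, 2, 4, 3, 0], pool_length=3, stride=1))  # [5, 5, 4, 4]
```

With the default stride the windows do not overlap and the sequence is halved; with stride=1 the windows overlap and only pool_length - 1 steps are lost.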
[source]

MaxPooling2D

keras.layers.pooling.MaxPooling2D(pool_size=(2, 2), strides=None, border_mode='valid', dim_ordering='default')
Max pooling operation for spatial data.

Arguments

pool_size: tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the image in each dimension.
strides: tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
border_mode: 'valid' or 'same'.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, channels, pooled_rows, pooled_cols) if dim_ordering='th' or 4D tensor with shape: (samples, pooled_rows, pooled_cols, channels) if dim_ordering='tf'.

[source]

MaxPooling3D

keras.layers.pooling.MaxPooling3D(pool_size=(2, 2, 2), strides=None, border_mode='valid', dim_ordering='default')
Max pooling operation for 3D data (spatial or spatio-temporal).

Arguments

pool_size: tuple of 3 integers, factors by which to downscale (dim1, dim2, dim3). (2, 2, 2) will halve the size of the 3D input in each dimension.
strides: tuple of 3 integers, or None. Strides values. If None, it will default to pool_size.
border_mode: 'valid' or 'same'.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 4. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

5D tensor with shape: (samples, channels, len_pool_dim1, len_pool_dim2, len_pool_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, len_pool_dim1, len_pool_dim2, len_pool_dim3, channels) if dim_ordering='tf'.

Output shape

5D tensor with shape: (samples, channels, pooled_dim1, pooled_dim2, pooled_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, pooled_dim1, pooled_dim2, pooled_dim3, channels) if dim_ordering='tf'.

[source]

AveragePooling1D

keras.layers.pooling.AveragePooling1D(pool_length=2, stride=None, border_mode='valid')
Average pooling for temporal data.

Arguments

pool_length: factor by which to downscale. 2 will halve the input.
stride: integer, or None. Stride value. If None, it will default to pool_length.
border_mode: 'valid' or 'same'.
Input shape

3D tensor with shape: (samples, steps, features).

Output shape

3D tensor with shape: (samples, downsampled_steps, features).

[source]

AveragePooling2D

keras.layers.pooling.AveragePooling2D(pool_size=(2, 2), strides=None, border_mode='valid', dim_ordering='default')
Average pooling operation for spatial data.

Arguments

pool_size: tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the image in each dimension.
strides: tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
border_mode: 'valid' or 'same'.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, channels, pooled_rows, pooled_cols) if dim_ordering='th' or 4D tensor with shape: (samples, pooled_rows, pooled_cols, channels) if dim_ordering='tf'.
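For the default pool_size=(2, 2) with non-overlapping windows, average pooling can be written as a reshape followed by a mean. This NumPy sketch assumes 'th' ordering and border_mode='valid'; average_pool_2x2_th is a hypothetical helper, not how the backends implement the operation.

```python
import numpy as np


def average_pool_2x2_th(x):
    """2x2 average pooling with stride 2 on (samples, channels, rows, cols)."""
    samples, channels, rows, cols = x.shape
    # 'valid' borders: drop a trailing odd row or column.
    x = x[:, :, :rows - rows % 2, :cols - cols % 2]
    blocks = x.reshape(samples, channels, rows // 2, 2, cols // 2, 2)
    # Average over the two within-block axes.
    return blocks.mean(axis=(3, 5))


x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
y = average_pool_2x2_th(x)
print(y.shape)           # (1, 1, 2, 2)
print(y[0, 0].tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```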

[source]

AveragePooling3D

keras.layers.pooling.AveragePooling3D(pool_size=(2, 2, 2), strides=None, border_mode='valid', dim_ordering='default')
Average pooling operation for 3D data (spatial or spatio-temporal).

Arguments

pool_size: tuple of 3 integers, factors by which to downscale (dim1, dim2, dim3). (2, 2, 2) will halve the size of the 3D input in each dimension.
strides: tuple of 3 integers, or None. Strides values. If None, it will default to pool_size.
border_mode: 'valid' or 'same'.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 4. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

5D tensor with shape: (samples, channels, len_pool_dim1, len_pool_dim2, len_pool_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, len_pool_dim1, len_pool_dim2, len_pool_dim3, channels) if dim_ordering='tf'.

Output shape

5D tensor with shape: (samples, channels, pooled_dim1, pooled_dim2, pooled_dim3) if dim_ordering='th' or 5D tensor with shape: (samples, pooled_dim1, pooled_dim2, pooled_dim3, channels) if dim_ordering='tf'.

[source]

GlobalMaxPooling1D

keras.layers.pooling.GlobalMaxPooling1D()
Global max pooling operation for temporal data.

Input shape

3D tensor with shape: (samples, steps, features).

Output shape

2D tensor with shape: (samples, features).

[source]

GlobalAveragePooling1D

keras.layers.pooling.GlobalAveragePooling1D()
Global average pooling operation for temporal data.

Input shape

3D tensor with shape: (samples, steps, features).

Output shape

2D tensor with shape: (samples, features).
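Global pooling over temporal data collapses the steps axis entirely. In NumPy terms, GlobalMaxPooling1D and GlobalAveragePooling1D correspond to a max and a mean over axis 1, as this small illustration shows.

```python
import numpy as np

# One sample, 3 steps, 2 features.
x = np.array([[[1.0, 8.0],
               [3.0, 2.0],
               [5.0, 5.0]]])

global_max = x.max(axis=1)   # reduce over the steps axis
global_avg = x.mean(axis=1)

print(global_max.tolist())  # [[5.0, 8.0]]
print(global_avg.tolist())  # [[3.0, 5.0]]
```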

[source]

GlobalMaxPooling2D

keras.layers.pooling.GlobalMaxPooling2D(dim_ordering='default')
Global max pooling operation for spatial data.

Arguments

dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

2D tensor with shape: (samples, channels)

[source]

GlobalAveragePooling2D

keras.layers.pooling.GlobalAveragePooling2D(dim_ordering='default')
Global average pooling operation for spatial data.

Arguments

dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3. It defaults to the image_dim_ordering value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "tf".
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

2D tensor with shape: (samples, channels)
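The only difference between the two orderings is which axes hold the spatial dimensions. The sketch below averages over rows and cols for either layout; global_average_pool_2d is a hypothetical NumPy helper used purely for illustration.

```python
import numpy as np


def global_average_pool_2d(x, dim_ordering="tf"):
    """Average over the spatial axes, leaving (samples, channels)."""
    # 'th': (samples, channels, rows, cols); 'tf': (samples, rows, cols, channels)
    axes = (2, 3) if dim_ordering == "th" else (1, 2)
    return x.mean(axis=axes)


x_th = np.ones((8, 3, 32, 32))
x_tf = np.ones((8, 32, 32, 3))
print(global_average_pool_2d(x_th, "th").shape)  # (8, 3)
print(global_average_pool_2d(x_tf, "tf").shape)  # (8, 3)
```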

Locally-connected Layers
[source]

LocallyConnected1D

keras.layers.local.LocallyConnected1D(nb_filter, filter_length, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample_length=1, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None, input_length=None)
The LocallyConnected1D layer works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. When using this layer as the first layer in a model, either provide the keyword argument input_dim (int, e.g. 128 for sequences of 128-dimensional vectors), or input_shape (tuple of integers, e.g. input_shape=(10, 128) for sequences of 10 vectors, each 128-dimensional). Also, note that this layer can only be used with a fully-specified input shape (None dimensions not allowed).

Example

# apply an unshared-weight 1D convolution of length 3 to a sequence with
# 10 timesteps, with 64 output filters
model = Sequential()
model.add(LocallyConnected1D(64, 3, input_shape=(10, 32)))
# now model.output_shape == (None, 8, 64)
# add a new conv1d on top
model.add(LocallyConnected1D(32, 3))
# now model.output_shape == (None, 6, 32)
Arguments

nb_filter: Dimensionality of the output.
filter_length: The extension (spatial or temporal) of each filter.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: Only 'valid' is supported. Apply ZeroPadding1D beforehand if the output must keep the same length as the input.
subsample_length: factor by which to subsample output.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
input_dim: Number of channels/dimensions in the input. Either this argument or the keyword argument input_shape must be provided when using this layer as the first layer in a model.
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers downstream (without it, the shape of the dense outputs cannot be computed).
Input shape

3D tensor with shape: (samples, steps, input_dim).

Output shape

3D tensor with shape: (samples, new_steps, nb_filter). steps value might have changed due to padding.
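Because the weights are unshared, the parameter count grows with the number of output positions. The sketch below compares it with a shared-weight 1D convolution for the example above (10 steps, 32 input features, 64 filters of length 3). Both functions are hypothetical helpers, and the count assumes one independent filter bank and one bias vector per output step.

```python
def conv1d_params(input_dim, nb_filter, filter_length):
    """Shared-weight 1D convolution: one filter bank plus one bias per filter."""
    return filter_length * input_dim * nb_filter + nb_filter


def locally_connected1d_params(steps, input_dim, nb_filter, filter_length,
                               subsample_length=1):
    """Unshared weights: one filter bank and one bias vector per output step."""
    new_steps = (steps - filter_length) // subsample_length + 1  # 'valid' only
    weights = new_steps * filter_length * input_dim * nb_filter
    biases = new_steps * nb_filter
    return new_steps, weights + biases


print(conv1d_params(32, 64, 3))                   # 6208
print(locally_connected1d_params(10, 32, 64, 3))  # (8, 49664)
```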

[source]

LocallyConnected2D

keras.layers.local.LocallyConnected2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
The LocallyConnected2D layer works similarly to the Convolution2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(3, 128, 128) for 128x128 RGB pictures. Also, note that this layer can only be used with a fully-specified input shape (None dimensions not allowed).

Examples

# apply a 3x3 unshared weights convolution with 64 output filters on a 32x32 image:
model = Sequential()
model.add(LocallyConnected2D(64, 3, 3, input_shape=(3, 32, 32)))
# now model.output_shape == (None, 64, 30, 30)
# notice that this layer will consume (30*30)*(3*3*3*64) + (30*30)*64 parameters

# add a 3x3 unshared weights convolution on top, with 32 output filters:
model.add(LocallyConnected2D(32, 3, 3))
# now model.output_shape == (None, 32, 28, 28)
Arguments

nb_filter: Number of convolution filters to use.
nb_row: Number of rows in the convolution kernel.
nb_col: Number of columns in the convolution kernel.
init: name of initialization function for the weights of the layer (see initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
activation: name of activation function to use (see activations), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
border_mode: Only 'valid' is supported. Apply ZeroPadding2D beforehand if the output must keep the same shape as the input.
subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of WeightRegularizer, applied to the bias.
activity_regularizer: instance of ActivityRegularizer, applied to the network output.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the constraints module, applied to the bias.
dim_ordering: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3.
bias: whether to include a bias (i.e. make the layer affine rather than linear).
Input shape

4D tensor with shape: (samples, channels, rows, cols) if dim_ordering='th' or 4D tensor with shape: (samples, rows, cols, channels) if dim_ordering='tf'.

Output shape

4D tensor with shape: (samples, nb_filter, new_rows, new_cols) if dim_ordering='th' or 4D tensor with shape: (samples, new_rows, new_cols, nb_filter) if dim_ordering='tf'. rows and cols values might have changed due to padding.
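The parameter count quoted in the example, (30*30)*(3*3*3*64) + (30*30)*64, generalizes as follows. This is a minimal sketch: locally_connected2d_params is a hypothetical helper and assumes border_mode='valid' with subsample=(1, 1).

```python
def locally_connected2d_params(channels, rows, cols, nb_filter, nb_row, nb_col):
    """Output grid and parameter count for unshared 2D filters ('valid')."""
    new_rows = rows - nb_row + 1
    new_cols = cols - nb_col + 1
    positions = new_rows * new_cols
    # One (nb_row x nb_col x channels x nb_filter) kernel per output position.
    weights = positions * (nb_row * nb_col * channels * nb_filter)
    biases = positions * nb_filter
    return (new_rows, new_cols), weights + biases


print(locally_connected2d_params(3, 32, 32, 64, 3, 3))  # ((30, 30), 1612800)
```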

Recurrent Layers
[source]

Recurrent

keras.layers.recurrent.Recurrent(weights=None, return_sequences=False, go_backwards=False, stateful=False, unroll=False, consume_less='cpu', input_dim=None, input_length=None)
Abstract base class for recurrent layers. Do not use in a model -- it's not a valid layer! Use its child classes LSTM, GRU and SimpleRNN instead.

All recurrent layers (LSTM, GRU, SimpleRNN) also follow the specifications of this class and accept the keyword arguments listed below.

Example

# as the first layer in a Sequential model
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
# now model.output_shape == (None, 32)
# note: `None` is the batch dimension.

# the following is identical:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))

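# for subsequent layers, no need to specify the input size.
# note: stacking requires the preceding recurrent layer to output the full
# sequence, so the first layer is rebuilt here with return_sequences=True:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(10, 64)))
model.add(LSTM(16))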
Arguments

weights: list of Numpy arrays to set as initial weights. The list should have 3 elements, of shapes: [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
return_sequences: Boolean (default False). If False, return only the last output in the output sequence; if True, return the full sequence.
go_backwards: Boolean (default False). If True, process the input sequence backwards.
stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. When using TensorFlow, the network is always unrolled, so this argument does not do anything. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
consume_less: one of "cpu", "mem", or "gpu" (LSTM/GRU only). If set to "cpu", the RNN will use an implementation that uses fewer, larger matrix products, thus running faster on CPU but consuming more memory. If set to "mem", the RNN will use more matrix products, but smaller ones, thus running slower (may actually be faster on GPU) while consuming less memory. If set to "gpu", the RNN will combine the input gate, the forget gate and the output gate into a single matrix, enabling more time-efficient parallelization on the GPU. Note: in "gpu" mode, RNN dropout must be shared for all gates, resulting in a slightly reduced regularization.
input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
input_length: Length of input sequences, to be specified when it is constant. This argument is required if you are going to connect Flatten then Dense layers after this layer (without it, the shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first layer in your model, you would need to specify the input length at the level of the first layer (e.g. via the input_shape argument).
Input shape

3D tensor with shape (nb_samples, timesteps, input_dim).

Output shape

If return_sequences is True: 3D tensor with shape (nb_samples, timesteps, output_dim).
Otherwise: 2D tensor with shape (nb_samples, output_dim).
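
The two output shapes can be illustrated with a minimal NumPy sketch of a simple recurrent forward pass. This is not the Keras implementation; the function name simple_rnn_forward and the recurrence h_t = tanh(x_t W + h_{t-1} U + b) are assumptions chosen to match the weight shapes listed under Arguments.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """Run h_t = tanh(x_t W + h_{t-1} U + b) over the time axis (illustrative only)."""
    nb_samples, timesteps, _ = x.shape
    output_dim = U.shape[0]
    h = np.zeros((nb_samples, output_dim))  # initial state
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :].dot(W) + h.dot(U) + b)
        outputs.append(h)
    if return_sequences:
        # full sequence: (nb_samples, timesteps, output_dim)
        return np.stack(outputs, axis=1)
    # last output only: (nb_samples, output_dim)
    return h

rng = np.random.RandomState(0)
input_dim, output_dim = 64, 32
W = rng.normal(scale=0.1, size=(input_dim, output_dim))
U = rng.normal(scale=0.1, size=(output_dim, output_dim))
b = np.zeros(output_dim)
x = rng.normal(size=(4, 10, input_dim))  # (nb_samples, timesteps, input_dim)

last = simple_rnn_forward(x, W, U, b)
full = simple_rnn_forward(x, W, U, b, return_sequences=True)
```

Here last has shape (4, 32) and full has shape (4, 10, 32); the final timestep of full equals last.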
Masking

This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.

Note on performance

You are likely to see better performance with RNNs in Theano compared to TensorFlow. Additionally, when using TensorFlow, it is often preferable to set unroll=True for better performance.

Note on using statefulness in RNNs

You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches.

To enable statefulness:
- specify stateful=True in the layer constructor.
- specify a fixed batch size for your model. For a Sequential model, pass batch_input_shape=(...) to the first layer in your model. For a functional model with one or more Input layers, pass batch_shape=(...) to all the first layers in your model. This is the expected shape of your inputs, including the batch size. It should be a tuple of integers, e.g. (32, 10, 100).

To reset the states of your model, call .reset_states() on either a specific layer, or on your entire model.
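
The idea behind statefulness can be sketched in plain NumPy. The class StatefulRNNSketch below is hypothetical and only mimics the behaviour described above (state carried from one batch to the next, fixed batch size, explicit reset); it is not how Keras implements it.

```python
import numpy as np

class StatefulRNNSketch(object):
    """Hypothetical helper: keeps the last hidden state between calls."""

    def __init__(self, W, U, b, batch_size):
        self.W, self.U, self.b = W, U, b
        self.batch_size = batch_size
        self.reset_states()

    def reset_states(self):
        # one state row per sample index in the batch
        self.state = np.zeros((self.batch_size, self.U.shape[0]))

    def forward(self, x):
        # the batch size must be fixed: row i of the state belongs to sample i
        assert x.shape[0] == self.batch_size
        h = self.state
        for t in range(x.shape[1]):
            h = np.tanh(x[:, t, :].dot(self.W) + h.dot(self.U) + self.b)
        self.state = h  # reused as the initial state of the next batch
        return h

rng = np.random.RandomState(0)
W = rng.normal(scale=0.1, size=(100, 8))
U = rng.normal(scale=0.1, size=(8, 8))
b = np.zeros(8)
long_batch = rng.normal(size=(32, 20, 100))

rnn = StatefulRNNSketch(W, U, b, batch_size=32)
rnn.forward(long_batch[:, :10, :])              # first 10 timesteps
split_out = rnn.forward(long_batch[:, 10:, :])  # continues from the saved state

rnn.reset_states()
whole_out = rnn.forward(long_batch)             # all 20 timesteps at once
```

Feeding the sequence in two successive batches gives the same final output as feeding it whole, which is the one-to-one mapping between samples that statefulness relies on.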

SimpleRNN

keras.layers.recurrent.SimpleRNN(output_dim, init='glorot_uniform', inner_init='orthogonal', activation='tanh', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)
Fully-connected RNN where the output is to be fed back to input.

Arguments

output_dim: dimension of the internal projections and the final output.
init: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: initializations).
inner_init: initialization function of the inner cells.
activation: activation function. Can be the name of an existing function (str), or a Theano function (see: activations).
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the input weights matrices.
U_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
b_regularizer: instance of WeightRegularizer, applied to the bias.
dropout_W: float between 0 and 1. Fraction of the input units to drop for input gates.
dropout_U: float between 0 and 1. Fraction of the input units to drop for recurrent connections.
References

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

GRU

keras.layers.recurrent.GRU(output_dim, init='glorot_uniform', inner_init='orthogonal', activation='tanh', inner_activation='hard_sigmoid', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)
Gated Recurrent Unit - Cho et al. 2014.

Arguments

output_dim: dimension of the internal projections and the final output.
init: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: initializations).
inner_init: initialization function of the inner cells.
activation: activation function. Can be the name of an existing function (str), or a Theano function (see: activations).
inner_activation: activation function for the inner cells.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the input weights matrices.
U_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
b_regularizer: instance of WeightRegularizer, applied to the bias.
dropout_W: float between 0 and 1. Fraction of the input units to drop for input gates.
dropout_U: float between 0 and 1. Fraction of the input units to drop for recurrent connections.
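
As a rough illustration of how the arguments above fit together, here is a single GRU step in NumPy. It is a sketch under assumptions, not the Keras source: the helper names, the dict layout of the weights, the hard_sigmoid form clip(0.2 * x + 0.5, 0, 1), and the convention that the update gate z weights the previous state are all choices made for this example. Dropout is omitted.

```python
import numpy as np

def hard_sigmoid(x):
    # assumed piecewise-linear approximation of the sigmoid
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. W, U, b are dicts keyed by 'z' (update), 'r' (reset), 'h' (candidate)."""
    z = hard_sigmoid(x_t.dot(W['z']) + h_prev.dot(U['z']) + b['z'])  # inner_activation
    r = hard_sigmoid(x_t.dot(W['r']) + h_prev.dot(U['r']) + b['r'])  # inner_activation
    h_cand = np.tanh(x_t.dot(W['h']) + (r * h_prev).dot(U['h']) + b['h'])  # activation
    # assumed convention: z close to 1 keeps the previous state
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.RandomState(0)
input_dim, output_dim = 5, 3
W = {k: rng.normal(scale=0.1, size=(input_dim, output_dim)) for k in 'zrh'}   # input weights
U = {k: rng.normal(scale=0.1, size=(output_dim, output_dim)) for k in 'zrh'}  # recurrent weights
b = {k: np.zeros(output_dim) for k in 'zrh'}

x_t = rng.normal(size=(2, input_dim))
h_prev = rng.normal(size=(2, output_dim))
h_next = gru_step(x_t, h_prev, W, U, b)
```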
References

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

LSTM

keras.layers.recurrent.LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one', activation='tanh', inner_activation='hard_sigmoid', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)
Long Short-Term Memory unit - Hochreiter 1997.

For a step-by-step description of the algorithm, see this tutorial.

Arguments

output_dim: dimension of the internal projections and the final output.
init: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: initializations).
inner_init: initialization function of the inner cells.
forget_bias_init: initialization function for the bias of the forget gate. Jozefowicz et al. recommend initializing with ones.
activation: activation function. Can be the name of an existing function (str), or a Theano function (see: activations).
inner_activation: activation function for the inner cells.
W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the input weights matrices.
U_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
b_regularizer: instance of WeightRegularizer, applied to the bias.
dropout_W: float between 0 and 1. Fraction of the input units to drop for input gates.
dropout_U: float between 0 and 1. Fraction of the input units to drop for recurrent connections.
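
The role of the gates and of forget_bias_init can be sketched with a single LSTM step in NumPy. This follows the commonly used LSTM formulation and is not taken from the Keras source; the helper names, the dict layout, and the hard_sigmoid form are assumptions, and dropout is omitted.

```python
import numpy as np

def hard_sigmoid(x):
    # assumed piecewise-linear approximation of the sigmoid
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b hold the four blocks 'i', 'f', 'c', 'o'."""
    i = hard_sigmoid(x_t.dot(W['i']) + h_prev.dot(U['i']) + b['i'])  # input gate
    f = hard_sigmoid(x_t.dot(W['f']) + h_prev.dot(U['f']) + b['f'])  # forget gate
    c = f * c_prev + i * np.tanh(x_t.dot(W['c']) + h_prev.dot(U['c']) + b['c'])
    o = hard_sigmoid(x_t.dot(W['o']) + h_prev.dot(U['o']) + b['o'])  # output gate
    h = o * np.tanh(c)
    return h, c

rng = np.random.RandomState(0)
input_dim, output_dim = 5, 3
gates = ['i', 'f', 'c', 'o']
W = {g: rng.normal(scale=0.1, size=(input_dim, output_dim)) for g in gates}
U = {g: rng.normal(scale=0.1, size=(output_dim, output_dim)) for g in gates}
b = {g: np.zeros(output_dim) for g in gates}
b['f'] = np.ones(output_dim)  # mirrors forget_bias_init='one'

x_t = rng.normal(size=(2, input_dim))
h_prev = np.zeros((2, output_dim))
c_prev = rng.normal(size=(2, output_dim))
h, c = lstm_step(x_t, h_prev, c_prev, W, U, b)
```

Starting the forget-gate bias at one pushes f above 0.5 at initialization, so the cell state is retained more strongly early in training.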
References

Long short-term memory (original 1997 paper)
Learning to forget: Continual prediction with LSTM
Supervised sequence labeling with recurrent neural networks
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Embedding Layers

Embedding

keras.layers.embeddings.Embedding(input_dim, output_dim, init='uniform', input_length=None, W_regularizer=None, activity_regularizer=None, W_constraint=None, mask_zero=False, weights=None, dropout=0.0)
Turns positive integers (indexes) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]].

This layer can only be used as the first layer in a model.

Example

  import numpy as np
  from keras.models import Sequential
  from keras.layers.embeddings import Embedding

  model = Sequential()
  model.add(Embedding(1000, 64, input_length=10))
  # the model will take as input an integer matrix of size (batch, input_length).
  # the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size minus 1).
  # now model.output_shape == (None, 10, 64), where None is the batch dimension.

  input_array = np.random.randint(1000, size=(32, 10))

  model.compile('rmsprop', 'mse')
  output_array = model.predict(input_array)
  assert output_array.shape == (32, 10, 64)
Arguments

input_dim: int > 0. Size of the vocabulary, i.e. 1 + maximum integer index occurring in the input data.
output_dim: int >= 0. Dimension of the dense embedding.
init: name of initialization function for the weights of the layer (see: initializations), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
weights: list of Numpy arrays to set as initial weights. The list should have 1 element, of shape (input_dim, output_dim).
W_regularizer: instance of the regularizers module (eg. L1 or L2 regularization), applied to the embedding matrix.
W_constraint: instance of the constraints module (eg. maxnorm, nonneg), applied to the embedding matrix.
mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful for recurrent layers which may take variable length input. If this is True then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, index 0 cannot be used in the vocabulary, so input_dim should equal |vocabulary| + 1 (consistent with input_dim being 1 + the maximum index).
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers after this layer (without it, the shape of the dense outputs cannot be computed).
dropout: float between 0 and 1. Fraction of the embeddings to drop.
Input shape

2D tensor with shape: (nb_samples, sequence_length).

Output shape

3D tensor with shape: (nb_samples, sequence_length, output_dim).
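
Conceptually, the mapping from the 2D input to the 3D output is a row lookup in the embedding matrix. The NumPy sketch below illustrates this; the variable names and the uniform initialization range are assumptions, and the mask line only illustrates what mask_zero=True refers to.

```python
import numpy as np

rng = np.random.RandomState(0)
input_dim, output_dim = 1000, 64  # vocabulary size, embedding size
# embedding matrix of shape (input_dim, output_dim); the init range is illustrative
W = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim))

# integer indexes with shape (nb_samples, sequence_length)
input_array = rng.randint(input_dim, size=(32, 10))

# integer-array indexing picks one row of W per index
output_array = W[input_array]  # (nb_samples, sequence_length, output_dim)

# with mask_zero=True, positions holding index 0 would be treated as padding
mask = input_array != 0
```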

References

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Advanced Activations Layers

LeakyReLU

keras.layers.advanced_activations.LeakyReLU(alpha=0.3)
Special version of a Rectified Linear Unit that allows a small gradient when the unit is not active: f(x) = alpha * x for x < 0, f(x) = x for x >= 0.

Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as the input.

Arguments

alpha: float >= 0. Negative slope coefficient.
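
The formula above can be checked with a small NumPy sketch; the function name leaky_relu is chosen for this example and is not part of the Keras API.

```python
import numpy as np

def leaky_relu(x, alpha=0.3):
    """f(x) = alpha * x for x < 0, f(x) = x for x >= 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, alpha * x)

y = leaky_relu([-2.0, -0.5, 0.0, 1.5])
# y is approximately [-0.6, -0.15, 0.0, 1.5]
```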

PReLU

keras.layers.advanced_activations.PReLU(init='zero', weights=None, shared_axes=None)
Parametric Rectified Linear Unit: f(x) = alphas * x for x < 0, f(x) = x for x >= 0, where alphas is a learned array with the same shape as x.

Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as the input.

Arguments

init: initialization function for the weights.
weights: initial weights, as a list of a single Numpy array.
shared_axes: the axes along which to share learnable parameters for the activation function. For example, if the incoming feature maps are from a 2D convolution with output shape (batch, height, width, channels), and you wish to share parameters across space so that each filter only has one set of parameters, set shared_axes=[1, 2].
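
One way to picture shared_axes is as setting the corresponding dimensions of the alphas array to 1, so that broadcasting reuses the same parameter along those axes. The sketch below assumes that interpretation; prelu_alpha_shape and prelu are hypothetical helpers, not Keras functions.

```python
import numpy as np

def prelu_alpha_shape(input_shape, shared_axes=None):
    """Shape of the alphas array for a given input shape (batch axis is index 0)."""
    shape = list(input_shape[1:])  # drop the batch axis
    for axis in (shared_axes or []):
        shape[axis - 1] = 1        # a size-1 dimension is shared via broadcasting
    return tuple(shape)

def prelu(x, alphas):
    """f(x) = alphas * x for x < 0, f(x) = x for x >= 0."""
    return np.where(x >= 0, x, alphas * x)

x = np.random.RandomState(0).normal(size=(8, 5, 5, 16))  # (batch, height, width, channels)
shape = prelu_alpha_shape(x.shape, shared_axes=[1, 2])   # one alpha per channel
alphas = np.full(shape, 0.25)
y = prelu(x, alphas)
```

With shared_axes=[1, 2] the alphas array has shape (1, 1, 16) instead of (5, 5, 16), so each filter has a single learnable slope.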
References

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

ELU

keras.layers.advanced_activations.ELU(alpha=1.0)
Exponential Linear Unit: f(x) =  alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0.

Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as the input.

Arguments

alpha: scale for the negative factor.
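
A minimal NumPy sketch of the formula above, with the function name elu chosen for illustration:

```python
import numpy as np

def elu(x, alpha=1.0):
    """f(x) = alpha * (exp(x) - 1) for x < 0, f(x) = x for x >= 0."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps exp() from overflowing on large positive inputs,
    # since np.where evaluates both branches
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

y = elu([-1.0, 0.0, 2.0])
```

For very negative inputs the output saturates at -alpha.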
References

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

ParametricSoftplus

keras.layers.advanced_activations.ParametricSoftplus(alpha_init=0.2, beta_init=5.0, weights=None, shared_axes=None)
Parametric Softplus: alpha * log(1 + exp(beta * x))

Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as the input.

Arguments

alpha_init: float. Initial value of the alpha weights.
beta_init: float. Initial values of the beta weights.
weights: initial weights, as a list of 2 numpy arrays.
shared_axes: the axes along which to share learnable parameters for the activation function. For example, if the incoming feature maps are from a 2D convolution with output shape (batch, height, width, channels), and you wish to share parameters across space so that each filter only has one set of parameters, set shared_axes=[1, 2].
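
The function itself can be sketched in NumPy as follows. Scalar alpha and beta are used here for simplicity, whereas the layer learns arrays of them; the function name is chosen for this example.

```python
import numpy as np

def parametric_softplus(x, alpha=0.2, beta=5.0):
    """alpha * log(1 + exp(beta * x))"""
    x = np.asarray(x, dtype=float)
    # np.logaddexp(0, z) computes log(1 + exp(z)) without overflow
    return alpha * np.logaddexp(0.0, beta * x)

y0 = parametric_softplus(0.0)    # alpha * log(2)
y10 = parametric_softplus(10.0)  # close to alpha * beta * 10 for large inputs
```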
References

Inferring Nonlinear Neuronal Computation Based on Physiologically Plausible Inputs

ThresholdedReLU

keras.layers.advanced_activations.ThresholdedReLU(theta=1.0)
Thresholded Rectified Linear Unit: f(x) = x for x > theta, f(x) = 0 otherwise.

Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as the input.

Arguments

theta: float >= 0. Threshold location of activation.
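
A short NumPy sketch of this definition, with the function name chosen for illustration. Note that the comparison is strict, so an input exactly equal to theta maps to 0.

```python
import numpy as np

def thresholded_relu(x, theta=1.0):
    """f(x) = x for x > theta, f(x) = 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > theta, x, 0.0)

y = thresholded_relu([-3.0, 0.5, 1.0, 1.5])
# y is [0.0, 0.0, 0.0, 1.5]
```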
References

Zero-Bias Autoencoders and the Benefits of Co-Adapting Features

SReLU

keras.layers.advanced_activations.SReLU(t_left_init='zero', a_left_init='glorot_uniform', t_right_init='glorot_uniform', a_right_init='one', shared_axes=None)
S-shaped Rectified Linear Unit.

Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as the input.

Arguments

t_left_init: initialization function for the left part intercept
a_left_init: initialization function for the left part slope
t_right_init: initialization function for the right part intercept
a_right_init: initialization function for the right part slope
shared_axes: the axes along which to share learnable parameters for the activation function. For example, if the incoming feature maps are from a 2D convolution with output shape (batch, height, width, channels), and you wish to share parameters across space so that each filter only has one set of parameters, set shared_axes=[1, 2].
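
The text above does not give the formula, so the sketch below assumes the piecewise-linear form from the referenced paper: identity between the two thresholds, and a line of slope a_left (resp. a_right) below t_left (resp. above t_right). Scalar parameters are used for simplicity, and the Keras layer may parameterize the thresholds differently.

```python
import numpy as np

def srelu(x, t_left, a_left, t_right, a_right):
    """S-shaped ReLU (assumed form): linear pieces outside [t_left, t_right], identity inside."""
    x = np.asarray(x, dtype=float)
    left = t_left + a_left * (x - t_left)      # used where x <= t_left
    right = t_right + a_right * (x - t_right)  # used where x >= t_right
    return np.where(x <= t_left, left, np.where(x >= t_right, right, x))

y = srelu([-1.0, 0.5, 2.0], t_left=0.0, a_left=0.1, t_right=1.0, a_right=2.0)
# y is approximately [-0.1, 0.5, 3.0]
```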
References

Deep Learning with S-shaped Rectified Linear Activation Units

Normalization Layers

BatchNormalization

keras.layers.normalization.BatchNormalization(epsilon=0.001, mode=0, axis=-1, momentum=0.99, weights=None, beta_init='zero', gamma_init='one', gamma_regularizer=None, beta_regularizer=None)
Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.

Arguments

epsilon: small float > 0. Fuzz parameter. Theano expects epsilon >= 1e-5.
mode: integer, 0, 1 or 2.
0: feature-wise normalization. Each feature map in the input will be normalized separately. The axis on which to normalize is specified by the axis argument. Note that if the input is a 4D image tensor using Theano conventions (samples, channels, rows, cols) then you should set axis to 1 to normalize along the channels axis. During training we use per-batch statistics to normalize the data, and during testing we use running averages computed during the training phase.
1: sample-wise normalization. This mode assumes a 2D input.
2: feature-wise normalization, like mode 0, but using per-batch statistics to normalize the data during both testing and training.
axis: integer, axis along which to normalize in mode 0. For instance, if your input tensor has shape (samples, channels, rows, cols), set axis to 1 to normalize per feature map (channels axis).
momentum: momentum in the computation of the exponential average of the mean and standard deviation of the data, for feature-wise normalization.
weights: Initialization weights. List of 2 Numpy arrays, with shapes: [(input_shape,), (input_shape,)]. Note that the order of this list is [gamma, beta, mean, std].
beta_init: name of initialization function for shift parameter (see initializations), or alternatively, Theano/TensorFlow function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
gamma_init: name of initialization function for scale parameter (see initializations), or alternatively, Theano/TensorFlow function to use for weights initialization. This parameter is only relevant if you don't pass a weights argument.
gamma_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the gamma vector.
beta_regularizer: instance of WeightRegularizer, applied to the beta vector.
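
The difference between training and testing in mode 0 can be sketched in NumPy. This is an illustration under assumptions, not the Keras implementation: the function names are hypothetical, epsilon is added to the variance before the square root, and the running averages are updated as momentum * old + (1 - momentum) * batch_value.

```python
import numpy as np

def _stats_shape(x, axis):
    # shape that broadcasts per-feature statistics against x
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    return shape

def batchnorm_train(x, gamma, beta, running_mean, running_std,
                    axis=-1, epsilon=1e-3, momentum=0.99):
    """Normalize with per-batch statistics and update the running averages."""
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis % x.ndim)
    shape = _stats_shape(x, axis)
    mean = x.mean(axis=reduce_axes)
    std = np.sqrt(x.var(axis=reduce_axes) + epsilon)
    x_hat = (x - mean.reshape(shape)) / std.reshape(shape)
    out = gamma.reshape(shape) * x_hat + beta.reshape(shape)
    running_mean = momentum * running_mean + (1.0 - momentum) * mean
    running_std = momentum * running_std + (1.0 - momentum) * std
    return out, running_mean, running_std

def batchnorm_test(x, gamma, beta, running_mean, running_std, axis=-1):
    """Normalize with the running averages collected during training."""
    shape = _stats_shape(x, axis)
    x_hat = (x - running_mean.reshape(shape)) / running_std.reshape(shape)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)

rng = np.random.RandomState(0)
x = rng.normal(loc=3.0, scale=2.0, size=(16, 4, 5, 5))  # (samples, channels, rows, cols)
gamma, beta = np.ones(4), np.zeros(4)

# axis=1 normalizes per feature map for Theano-style inputs
out, run_mean, run_std = batchnorm_train(x, gamma, beta, np.zeros(4), np.ones(4), axis=1)
test_out = batchnorm_test(x, gamma, beta, run_mean, run_std, axis=1)
```

After the training call, each channel of out has a mean close to 0 and a standard deviation close to 1, while the running averages have moved only slightly toward the batch statistics because momentum is 0.99.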
Input shape

Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape

Same shape as input.

References

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift