CNN Input shape for Deep Learning Frameworks

In deep learning use cases, color images are represented as 3-D tensors and gray scale images as 2-D tensors. An image has three main attributes: height, width and channels. Different deep learning frameworks expect these attributes to be specified in different orders. In this post we discuss the order in which popular deep learning frameworks expect them.

Many deep learning frameworks are currently available, such as:

1. Keras (for Python): Keras is a deep learning framework for Python. It wraps numerical computing libraries to give the user an easy interface for coding deep learning networks. As a backend, Keras can use:
a. TensorFlow
b. Theano

2. DL4J (for Java/Scala): DL4J is a deep learning framework written in Java. It can be used from Java as well as Scala.

Image Structure in CNN:

CNNs (Convolutional Neural Networks) are neural network architectures mainly used for image-based tasks. In a CNN, a color image is represented as:


Fig 1. Image structure in CNN.

The figure above shows the structure of an image. A color image has three attributes:
    a. Height of the image (in pixels)
    b. Width of the image (in pixels)
    c. Channels
Height and width are self-explanatory. Channels is the number of color channels in the image. It can take the following values:
    a. 3 : If the image is colored, it has 3 channels (Red, Green, Blue).
    b. 1 : If the image is gray scale, it has 1 channel.
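
As a rough illustration of these attributes, the sketch below builds two tiny images as nested Python lists and reads off their shapes. The helper name shape_of is made up for this example; in practice a library such as NumPy would report the same information through an array's shape attribute.

```python
def shape_of(nested):
    """Return the dimensions of a regularly nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


height, width = 2, 4

# Color image: every pixel holds 3 values (Red, Green, Blue).
color_image = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]

# Gray scale image: every pixel is a single intensity value.
gray_image = [[0 for _ in range(width)] for _ in range(height)]

print(shape_of(color_image))  # (2, 4, 3)
print(shape_of(gray_image))   # (2, 4)
```

Note that when a gray scale image is passed to a CNN layer, the single channel is normally still written out explicitly, so its shape carries a trailing (or leading) 1 rather than being purely 2-D.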

Input shape for CNN layer in Deep Learning Frameworks:

Different frameworks expect the 3 attributes of an image in different orders, which can be confusing for developers. This post summarizes the attribute ordering for each framework:

1. Keras (with TensorFlow backend): If we are using Keras with TensorFlow as the backend, the shape of the input image (the "channels_last" data format) is:

(height, width, channels)

#(Python code)

from keras.models import Sequential
from keras.layers import Conv2D
from keras.constraints import maxnorm  # alias in older Keras; newer releases spell it max_norm / MaxNorm

# Create the model
model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape=(height, width, channels),
                 padding='same', activation='relu',
                 kernel_constraint=maxnorm(3)))

The input_shape argument shows the order in which the image attributes are listed in Keras (with TensorFlow backend).

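2. Keras (with Theano backend): If we are using Keras with Theano as the backend, the shape of the input image is:

(channels, height, width)

Strictly speaking, Keras takes this order from its image_data_format setting, which must be "channels_first" here. That is the convention traditionally used with Theano, so it is worth checking the setting rather than relying on the backend alone.

#(Python code)

from keras.models import Sequential
from keras.layers import Conv2D
from keras.constraints import maxnorm  # alias in older Keras; newer releases spell it max_norm / MaxNorm

# Create the model
model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape=(channels, height, width),
                 padding='same', activation='relu',
                 kernel_constraint=maxnorm(3)))

The input_shape argument shows the order in which the image attributes are listed in Keras (with Theano backend).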
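
Since the two Keras orderings differ only in where the channel axis sits, image data prepared for one can be rearranged for the other. The following is a minimal sketch in plain Python, assuming the image is stored as nested lists in (height, width, channels) order; the function name to_channels_first is hypothetical and not part of Keras.

```python
def to_channels_first(image):
    """Reorder a nested-list image from (height, width, channels)
    to (channels, height, width)."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 color image: one red pixel followed by one blue pixel.
hwc = [[[255, 0, 0], [0, 0, 255]]]
chw = to_channels_first(hwc)

print(chw)  # [[[255, 0]], [[0, 0]], [[0, 255]]]
```

With NumPy arrays the same reordering is usually written as np.transpose(image, (2, 0, 1)).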

3. DL4J: In DL4J, while creating a MultiLayerConfiguration, the input image is specified as:

(height, width, channels)

// (Java code)

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    ........
    .backprop(true).pretrain(false)
    // The image attributes are passed in the order height, width, channels
    .setInputType(InputType.convolutional(height, width, channels))
    .build();

The InputType.convolutional(...) call shows the order in which the image attributes are listed in DL4J.
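
The three orderings above can be collected into a small lookup table. This is only a sketch: the dictionary keys and the function name input_shape are invented for illustration and do not correspond to any framework API.

```python
# Attribute order per framework, as summarized in this post.
INPUT_ORDER = {
    "keras-tensorflow": ("height", "width", "channels"),
    "keras-theano": ("channels", "height", "width"),
    "dl4j": ("height", "width", "channels"),
}


def input_shape(framework, height, width, channels):
    """Arrange the three image attributes in the order the framework expects."""
    values = {"height": height, "width": width, "channels": channels}
    return tuple(values[name] for name in INPUT_ORDER[framework])


print(input_shape("keras-tensorflow", 32, 64, 3))  # (32, 64, 3)
print(input_shape("keras-theano", 32, 64, 3))      # (3, 32, 64)
print(input_shape("dl4j", 32, 64, 3))              # (32, 64, 3)
```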
