Mask R-CNN for TPU on Google Colab


We are trying to build an image segmentation deep learning model on a Google Colab TPU. Our model is Mask R-CNN.

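    import os
    import tensorflow as tf

    # Address of the TPU worker that Colab exposes to the notebook
    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

    # Convert the Keras model wrapped by Mask R-CNN into a TPU model
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        model.keras_model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))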

However, converting our Mask R-CNN model to a TPU model fails with the error pasted below.

    ValueError: Layer <keras.engine.topology.InputLayer object at 0x7f58574f1940> has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.
    You may have to specify `input_length` for RNN/TimeDistributed layers.
    Layer: <keras.engine.topology.InputLayer object at 0x7f58574f1940>
    Input shape: (None, None, None, 3)
    Output shape: (None, None, None, 3)

Appreciate any help.


Google recently released a tutorial on getting Mask R-CNN running on their TPUs. It uses an experimental Mask R-CNN model from Google's TPU GitHub repository (under models/experimental/mask_rcnn). Looking through the code, it appears they define the model with a fixed input size, which avoids the error you are seeing.
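To illustrate what a fixed input size means in practice, here is a minimal sketch that zero-pads every image to one constant shape before it reaches the model. This is not the code from the TPU repository. The 1024x1024 target size and the helper name pad_to_fixed_size are assumptions made up for the example, and it uses NumPy only.

```python
import numpy as np

# Hypothetical target size; pick whatever your model is configured for.
TARGET_HEIGHT = 1024
TARGET_WIDTH = 1024


def pad_to_fixed_size(image, height=TARGET_HEIGHT, width=TARGET_WIDTH):
    """Zero-pad an (H, W, C) image on the bottom/right to (height, width, C)."""
    h, w, channels = image.shape
    if h > height or w > width:
        # Larger images would need resizing or cropping first.
        raise ValueError(
            "image of size %dx%d does not fit into %dx%d" % (h, w, height, width))
    padded = np.zeros((height, width, channels), dtype=image.dtype)
    padded[:h, :w, :] = image  # original pixels stay in the top-left corner
    return padded


image = np.ones((600, 800, 3), dtype=np.uint8)
padded = pad_to_fixed_size(image)
print(padded.shape)
```

With every image padded this way, the model's input layer can be declared with the constant shape (1024, 1024, 3) instead of (None, None, 3). Ground-truth masks and boxes would need the same treatment so they stay aligned with the padded image.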

See below for more explanation:

As @aman2930 points out, the shape of your input tensor is not static. This won't work because TensorFlow compiles models with XLA to run them on a TPU, and XLA must have all tensor shapes defined at compile time. In the link above, the website specifically calls this out:


Static shapes

During regular usage TensorFlow attempts to determine the shapes of each tf.Tensor during graph construction. During execution any unknown shape dimensions are determined dynamically, see Tensor Shapes for more details.

To run on Cloud TPUs TensorFlow models are compiled using XLA. XLA uses a similar system for determining shapes at compile time. XLA requires that all tensor dimensions be statically defined at compile time. All shapes must evaluate to a constant, and not depend on external data, or stateful operations like variables or a random number generator.


That said, further down the document they mention that the input function runs on the CPU, so it isn't limited by XLA's static shape requirement. They point to batch size being the issue, not image size:


Static shapes and batch size

The input pipeline generated by your input_fn is run on CPU. So it is mostly free from the strict static shape requirements imposed by the XLA/TPU environment. The one requirement is that the batches of data fed from your input pipeline to the TPU have a static shape, as determined by the standard TensorFlow shape inference algorithm. Intermediate tensors are free to have a dynamic shapes. If shape inference has failed, but the shape is known it is possible to impose the correct shape using tf.set_shape().


So you could fix this by reformulating your model to have a fixed batch size, or by using tf.contrib.data.batch_and_drop_remainder as they suggest.
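The idea behind dropping the remainder can be sketched in plain Python. This is only an illustration of the technique, not TensorFlow's implementation, and the function name fixed_size_batches is hypothetical. Every batch it yields has exactly batch_size elements, and a trailing partial batch is discarded so the batch dimension never varies.

```python
def fixed_size_batches(samples, batch_size):
    """Yield lists of exactly batch_size samples, dropping any leftover."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Any samples still in `batch` here form a partial batch and are
    # intentionally dropped so that every batch has the same size.


batches = list(fixed_size_batches(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In later TensorFlow versions the same behavior is available directly on tf.data through Dataset.batch(batch_size, drop_remainder=True).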


Could you please share the input data function? It is hard to tell the exact issue, but it seems that the shape of the tensor representing an input sample is not static.


