khanhlvg and Copybara-Service Improved UI/UX for smart reply sample.
PiperOrigin-RevId: 276798418
Latest commit 008a5be Oct 26, 2019

Smart Reply Model

What is On-Device Smart Reply Model?

Smart Replies are contextually relevant, one-touch responses that help the user to reply to an incoming text message (or email) efficiently and effortlessly. Smart Replies have been highly successful across several Google products including Gmail, Inbox and Allo.

The On-Device Smart Reply model targets text chat use cases. Its architecture is completely different from that of its cloud-based counterparts, and it is built specifically for memory-constrained devices such as phones and watches. It has been used successfully to provide Smart Replies on Android Wear to all first- and third-party apps.

The on-device model comes with several benefits. It is:

  • Faster: The model resides on the device and does not require internet connectivity. Thus, the inference is very fast and has an average latency of only a few milliseconds.
  • Resource efficient: The model has a small memory footprint on the device.
  • Privacy-friendly: User data never leaves the device, which avoids the privacy concerns of sending messages to a server.

A caveat, though, is that the on-device model has a lower triggering rate than its cloud counterparts (the triggering rate is the percentage of incoming messages for which the model suggests a response).

When to use this Model?

The On-Device Smart Reply model is aimed towards improving the messaging experience for day-to-day conversational chat messages. We recommend using this model for similar use cases. Some sample messages on which the model does well are provided in this tsv file for reference. The file format is:

   {incoming_message  smart_reply1   [smart_reply2]   [smart_reply3]}

For the current model, we see a triggering rate of about 30-40% for messages which are similar to those provided in the tsv file above.
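The file format above can be read with a few lines of code. The following is a minimal sketch, assuming each line holds one tab-separated record with the incoming message first and one to three replies after it; the names `SampleRow` and `parse_sample_line` and the sample text are hypothetical and are not taken from the tsv file.

```python
from typing import List, NamedTuple


class SampleRow(NamedTuple):
    incoming_message: str
    smart_replies: List[str]


def parse_sample_line(line: str) -> SampleRow:
    """Split one tab-separated line into the message and its 1-3 replies."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    fields = [field for field in fields if field]  # ignore empty columns
    if len(fields) < 2:
        raise ValueError("expected a message and at least one reply")
    if len(fields) > 4:
        raise ValueError("expected at most three replies")
    return SampleRow(incoming_message=fields[0], smart_replies=fields[1:])


# Hypothetical sample line, made up for illustration.
row = parse_sample_line("How are you?\tGood, you?\tFine\n")
print(row.incoming_message, row.smart_replies)
```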

In case the model does not trigger any response, the system falls back to suggesting replies from a fixed back-off set that was compiled from popular response intents observed in chat conversations. Some of the fallback responses are Ok, Yes, No, 👍, ☺.
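The fallback behavior can be sketched as below. This is only an illustration: the function name `suggest` and the limit of three replies are assumptions, and the text does not say how many fallback replies the real system shows or in what order.

```python
from typing import List

# Fallback responses named in the text above.
BACKOFF_REPLIES = ["Ok", "Yes", "No", "👍", "☺"]


def suggest(model_replies: List[str], max_replies: int = 3) -> List[str]:
    """Return the model's replies, or the back-off set if it did not trigger."""
    if model_replies:
        return model_replies[:max_replies]
    # Assumption: take the first few fallback replies in the listed order.
    return BACKOFF_REPLIES[:max_replies]
```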

The model can only be used for inference at this time (i.e., it cannot be custom trained). If you would like to know how the model was trained, please refer to this blog post and research paper.

How to Use This Model?

We have provided a pre-built demo APK that you can download, install and test on your phone (demo APK here).

The On-Device Smart Reply demo App works in the following way:

  1. The Android app links to the JNI binary with a predictor library.

  2. In the predictor library, GetSegmentPredictions is called with a list of input strings.

    2.1 The input is a string vector holding the 1-3 most recent messages of the conversation. The model runs on these input sentences and provides Smart Replies corresponding to them.

    2.2 The function performs some preprocessing on input data which includes:

    • Sentence splitting: The input message is split into sentences if it contains more than one sentence. For example, a message like “How are you? Want to grab lunch?” is broken down into 2 separate sentences.

    • Normalization: The individual sentences are normalized by converting them to lower case, removing unnecessary punctuation, etc. For example, “how are you????” is converted to “how are you?” (refer to the NORMALIZE op for more details).

      The input string content will be converted to tensors.

    2.3 The function then runs the prediction model on the input tensors.

    2.4 The function also performs some post-processing which includes aggregating the model predictions for the input sentences from 2.2 and returning the appropriate responses.

  3. Finally, it gets the response(s) from std::vector<PredictorResponse> and returns them to the Android app. Responses are sorted in descending order of confidence score.
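The flow in steps 2 and 3 can be sketched in a few lines. This is not the predictor library itself. It assumes a naive split on sentence-ending punctuation, and it assumes that aggregation means summing the per-sentence scores of each reply, which the text does not specify. The names `split_sentences`, `predict_replies` and `toy_scorer` are hypothetical, and `toy_scorer` stands in for the real model.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def split_sentences(message: str) -> List[str]:
    """Naively split a message after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", message.strip())
    return [part for part in parts if part]


def predict_replies(
    messages: List[str],
    score_sentence: Callable[[str], Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Score each sentence, sum scores per reply, sort by descending score."""
    totals: Dict[str, float] = defaultdict(float)
    for message in messages[-3:]:  # only the 1-3 most recent messages
        for sentence in split_sentences(message):
            for reply, score in score_sentence(sentence).items():
                totals[reply] += score  # assumption: aggregate by summing
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def toy_scorer(sentence: str) -> Dict[str, float]:
    """Hypothetical stand-in for the model: returns made-up scores."""
    if "lunch" in sentence.lower():
        return {"Sure": 0.625, "Good": 0.25}
    return {"Good": 0.5}


print(predict_replies(["How are you? Want to grab lunch?"], toy_scorer))
```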

Ops and Functionality Supported

The following ops are supported for use with the On-Device Smart Reply model:


  • NORMALIZE

    This is a custom op which normalizes the sentences by:

    • Converting all sentences to lower case.
    • Removing unnecessary punctuation (e.g., “how are you????” → “how are you?”).
    • Expanding sentences wherever necessary (e.g., “I’m home” → “I am home”).
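A rough sketch of these three normalization steps follows. It is not the custom op's implementation: the contraction table is a small hypothetical sample, and collapsing repeated punctuation marks is one assumed reading of "removing unnecessary punctuation".

```python
import re

# Hypothetical, deliberately tiny contraction table for illustration.
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "can't": "cannot"}


def normalize(sentence: str) -> str:
    """Lower-case, collapse repeated punctuation, expand known contractions."""
    text = sentence.strip().lower().replace("’", "'")
    # Collapse runs of the same punctuation mark: "????" -> "?".
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    words = [CONTRACTIONS.get(word, word) for word in text.split()]
    return " ".join(words)


print(normalize("how are you????"))
print(normalize("I’m home"))
```

Because lower-casing is applied together with expansion, this sketch returns "i am home" for the last example rather than "I am home".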

    This is an op inside TensorFlow Lite that converts sentences into a list of skip-grams. The configurable parameters are ngram_size and max_skip_size. For the model provided, these parameters are set to 3 and 2 respectively.
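To make the two parameters concrete, here is a simplified sketch of skip-gram generation. It assumes that every gram has exactly ngram_size words and that max_skip_size bounds the number of words skipped between two neighboring words of a gram. The TensorFlow Lite op may differ in details such as output order or whether shorter grams are included.

```python
from itertools import combinations
from typing import List


def skip_grams(tokens: List[str], ngram_size: int = 3, max_skip_size: int = 2) -> List[str]:
    """Return all grams of ngram_size tokens with bounded gaps between tokens."""
    grams = []
    for indices in combinations(range(len(tokens)), ngram_size):
        gaps = [later - earlier - 1 for earlier, later in zip(indices, indices[1:])]
        if all(gap <= max_skip_size for gap in gaps):
            grams.append(" ".join(tokens[i] for i in indices))
    return grams


print(skip_grams(["the", "quick", "brown", "fox"], ngram_size=2, max_skip_size=1))
```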


    This is a custom op that hashes skip grams to features represented as integers. Longer skip-grams are allocated higher weights.
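The idea of hashing skip-grams to weighted integer features can be illustrated as follows. Everything specific here is an assumption: the CRC32 hash, the bucket count, and using the word count as the weight are placeholders chosen for the sketch, not the custom op's actual scheme.

```python
import zlib
from typing import Dict, List


def extract_features(skip_grams: List[str], num_buckets: int = 1 << 20) -> Dict[int, float]:
    """Map each skip-gram to an integer feature id with a length-based weight."""
    features: Dict[int, float] = {}
    for gram in skip_grams:
        # Assumption: CRC32 modulo a bucket count stands in for the real hash.
        feature_id = zlib.crc32(gram.encode("utf-8")) % num_buckets
        # Assumption: longer skip-grams get higher weight via their word count.
        weight = float(len(gram.split()))
        features[feature_id] = max(features.get(feature_id, 0.0), weight)
    return features


print(extract_features(["how are", "how are you"]))
```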


    This is an op inside TensorFlow Lite that projects input features to a corresponding bit vector space using Locality Sensitive Hashing (LSH).
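The sketch below shows one common form of LSH, sign random projection, applied to weighted integer features. It illustrates the general technique only: the MD5-based pseudo-random values, the default of 16 bits, and the function names are assumptions and do not describe the TensorFlow Lite op's internals.

```python
import hashlib
from typing import Dict, List


def _signed_unit(seed: int, feature_id: int) -> float:
    """Deterministic pseudo-random value in [-1, 1) for a (seed, feature) pair."""
    digest = hashlib.md5(f"{seed}:{feature_id}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") / float(1 << 64)
    return 2.0 * value - 1.0


def lsh_project(features: Dict[int, float], num_bits: int = 16) -> List[int]:
    """Project weighted features to a bit vector, one hashed hyperplane per bit."""
    bits = []
    for bit_index in range(num_bits):
        total = sum(
            weight * _signed_unit(bit_index, feature_id)
            for feature_id, weight in features.items()
        )
        bits.append(1 if total > 0.0 else 0)
    return bits


print(lsh_project({1: 2.0, 7: 3.0}))
```

Each bit depends only on the sign of a weighted sum, so inputs that share most of their features tend to agree on most bits.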


    This is a custom op that runs the input features through the projection model (details here), computes the appropriate response labels along with weights for the projected features, and aggregates the response labels and weights together.


    This is an op inside TensorFlow Lite that takes the label id produced by the predict op and looks up the corresponding response text.
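Conceptually, this lookup is a table from label id to response text. A minimal sketch, assuming predictions arrive as (label id, score) pairs and that unknown ids are simply skipped; the table contents and the name `lookup_responses` are hypothetical.

```python
from typing import Dict, List, Tuple


def lookup_responses(
    label_scores: List[Tuple[int, float]],
    label_to_text: Dict[int, str],
) -> List[Tuple[str, float]]:
    """Replace each known label id with its response text, keeping the score."""
    responses = []
    for label_id, score in label_scores:
        text = label_to_text.get(label_id)
        if text is not None:  # assumption: ids missing from the table are dropped
            responses.append((text, score))
    return responses


# Hypothetical table and predictions, made up for illustration.
table = {0: "Ok", 1: "Yes"}
print(lookup_responses([(1, 0.9), (5, 0.4), (0, 0.2)], table))
```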

How to Run the App?

We release a pre-built demo APK (here) for you to download, install and test on your phone.

If you want to build the app yourself, please refer to

Further Information

  • Open source code here.