diff --git a/tensorflow_model_optimization/g3doc/guide/clustering/index.md b/tensorflow_model_optimization/g3doc/guide/clustering/index.md
new file mode 100644
index 000000000..d1398ad70
--- /dev/null
+++ b/tensorflow_model_optimization/g3doc/guide/clustering/index.md
@@ -0,0 +1,125 @@
+# Weight clustering
+
+This document provides an overview of weight clustering to help you determine how it fits with your use case.
+
+- To dive right into an end-to-end example, see the [weight clustering example](clustering_example.ipynb).
+- To quickly find the APIs you need for your use case, see the [weight clustering comprehensive guide](clustering_comprehensive_guide.ipynb).
+
+## Overview
+
+Clustering, or weight sharing, reduces the number of unique weight values in a model, leading to benefits for deployment. It first groups the weights of each layer into *N* clusters, then shares the cluster's centroid value for all the weights belonging to the cluster.
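+
+As a concrete illustration, here is a minimal NumPy sketch of the weight-sharing idea (illustrative only; the clustering API covered in the comprehensive guide operates on whole Keras layers rather than raw arrays):
+
+```python
+import numpy as np
+
+def share_weights(w, n_clusters, n_iters=10):
+    """Illustrative k-means weight sharing: every weight is replaced
+    by the centroid of the cluster it is assigned to."""
+    # Initialize centroids linearly between the smallest and largest weight.
+    centroids = np.linspace(w.min(), w.max(), n_clusters)
+    flat = w.reshape(-1)
+    for _ in range(n_iters):
+        # Assign each weight to its nearest centroid.
+        assignments = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
+        # Move each centroid to the mean of its assigned weights.
+        for k in range(n_clusters):
+            members = flat[assignments == k]
+            if members.size:
+                centroids[k] = members.mean()
+    # Share each cluster's centroid value across all of its weights.
+    return centroids[assignments].reshape(w.shape)
+
+w = np.random.randn(64, 64)
+w_shared = share_weights(w, n_clusters=16)
+print(np.unique(w_shared).size)  # at most 16 unique values remain
+```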
+
+This technique brings improvements via model compression. Future framework support can unlock memory footprint reductions that can make a crucial difference for deploying deep learning models on embedded systems with limited resources.
+
+We have experimented with clustering across vision and speech tasks. We've seen up to 5x improvements in model compression with minimal loss of accuracy, as demonstrated by the [results](#results) presented below.
+
+Note that clustering provides reduced benefits for convolution and dense layers that precede a batch normalization layer, as well as when combined with per-axis post-training quantization.
+
+### API compatibility matrix
+
+Users can apply clustering with the following APIs:
+
+* Model building: `tf.keras` with only Sequential and Functional models
+* TensorFlow versions: TF 1.x for versions 1.14+ and 2.x.
+ * `tf.compat.v1` with a TF 2.X package and `tf.compat.v2` with a TF 1.X
+ package are not supported.
+* TensorFlow execution mode: both graph and eager
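+
+For instance, applying clustering to a compatible Sequential model looks like the following sketch (the model architecture and hyperparameter values are illustrative; see the comprehensive guide for details):
+
+```python
+import tensorflow as tf
+import tensorflow_model_optimization as tfmot
+
+cluster_weights = tfmot.clustering.keras.cluster_weights
+CentroidInitialization = tfmot.clustering.keras.CentroidInitialization
+
+model = tf.keras.Sequential([
+    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
+    tf.keras.layers.Dense(10),
+])
+
+# Wrap the model so that each layer's weights are grouped into 16 clusters.
+clustered_model = cluster_weights(
+    model,
+    number_of_clusters=16,
+    cluster_centroids_init=CentroidInitialization.LINEAR)
+
+# Fine-tune as usual, then strip the clustering wrappers for deployment.
+clustered_model.compile(
+    optimizer='adam',
+    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+    metrics=['accuracy'])
+final_model = tfmot.clustering.keras.strip_clustering(clustered_model)
+```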
+
+## Results
+
+### Image classification
+
+| Model       | Configuration                    | # of clusters | Top-1 accuracy (%) | Size of compressed .tflite (MB) |
+|-------------|----------------------------------|---------------|--------------------|---------------------------------|
+| MobileNetV1 | Original                         | N/A           | 71.02              | 14.96                           |
+|             | Selective (last 3 Conv2D layers) | 256, 256, 32  | 70.62              | 8.42                            |
+|             | Full (all Conv2D layers)         | 64            | 66.07              | 2.98                            |
+| MobileNetV2 | Original                         | N/A           | 72.29              | 12.90                           |
+|             | Selective (last 3 Conv2D layers) | 256, 256, 32  | 72.31              | 7.00                            |
+|             | Full (all Conv2D layers)         | 32            | 69.33              | 2.60                            |
+
+The models were trained and tested on ImageNet.
+
+### Keyword spotting
+
+| Model    | Configuration | # of clusters | Top-1 accuracy (%) | Size of compressed .tflite (MB) |
+|----------|---------------|---------------|--------------------|---------------------------------|
+| DS-CNN-L | Original      | N/A           | 95.03              | 1.5                             |
+|          | Full          | 32            | 94.71              | 0.3                             |
+
+The models were trained and tested on SpeechCommands v0.02.
+
+NOTE: *Size of compressed .tflite* refers to the size of the zipped .tflite file obtained from the model through the following process (sketched in code below):
+1. Serialize the Keras model into an .h5 file
+2. Convert the .h5 file into a .tflite file using `TFLiteConverter.from_keras_model_file()`
+3. Compress the .tflite file into a zip
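+
+A minimal sketch of that measurement process, assuming a TF 1.x environment (matching the converter API named above) and a fine-tuned, stripped clustered Keras `model`:
+
+```python
+import os
+import zipfile
+
+import tensorflow as tf
+
+# 1. Serialize the Keras model into an .h5 file.
+keras_file = 'clustered_model.h5'
+tf.keras.models.save_model(model, keras_file, include_optimizer=False)
+
+# 2. Convert the .h5 file into a .tflite file.
+converter = tf.lite.TFLiteConverter.from_keras_model_file(keras_file)
+tflite_file = 'clustered_model.tflite'
+with open(tflite_file, 'wb') as f:
+    f.write(converter.convert())
+
+# 3. Compress the .tflite file into a zip and report its size.
+zip_file = 'clustered_model.zip'
+with zipfile.ZipFile(zip_file, 'w', compression=zipfile.ZIP_DEFLATED) as z:
+    z.write(tflite_file)
+print('Size of compressed .tflite: %.2f MB'
+      % (os.path.getsize(zip_file) / float(2**20)))
+```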
+
+## Examples
+
+In addition to the [Weight clustering in Keras example](clustering_example.ipynb), see the following examples:
+
+* Cluster the weights of a CNN model trained on the MNIST handwritten digit classification dataset:
+[code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/clustering/keras/mnist/mnist_cnn.py)
+
+The weight clustering implementation is based on the *Deep Compression: Compressing Deep Neural Networks With Pruning, Trained Quantization and Huffman Coding* [paper](https://arxiv.org/abs/1510.00149). See chapter 3, titled *Trained Quantization and Weight Sharing*.
\ No newline at end of file