How tf.image.extract_image_patches works ? #29857

Khoa-NT · 2019-06-17T02:00:08Z


System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 16.04
TensorFlow installed from (source or binary): Anaconda for 1.13.1 and Pip for TF 2 Beta
TensorFlow version (use command below): TF 1.13.1 and TF 2 Beta
Python version: 3.6.8
CUDA/cuDNN version: CUDA 10 / cuDNN 7.6
GPU model and memory: GTX 1080 , 11GB

Describe the current behavior
I want to split a large grayscale image (1250 x 1250) into patches of 512 x 512. I ran tf.image.extract_image_patches on both TF 1.13.1 and TF 2 Beta, following this tutorial.
My parameters:

import tensorflow as tf
import matplotlib.pyplot as plt

# my_input_image has shape (batch, size_x, size_y)
my_input_image = tf.expand_dims(my_input_image, -1)  # add a "depth" channel as the last axis
ksizes = [1, 512, 512, 1]  # size of each output patch
strides = [1, 256, 256, 1]  # stride between patch origins
rates = [1, 1, 1, 1]  # no dilation
padding = 'SAME'  # zero-pad where a patch extends beyond my_input_image

image_patches = tf.image.extract_patches(my_input_image, ksizes, strides, rates, padding)

image_patches.shape  # => TensorShape([125, 5, 5, 262144]). Why are there 5 patches per row?

patch1 = image_patches[0, 0, 0, ]  # get the first patch of the first image
patch1 = tf.reshape(patch1, [512, 512, 1])  # restore the patch's spatial shape
patch1 = tf.squeeze(patch1)  # remove the depth channel
plt.imshow(patch1)

tf.image.extract_patches outputs a 5 x 5 grid of image patches.
The zero padding is 143 on each edge.
I don't understand why there are 5 patches per row.
How does tf.image.extract_patches work?

Describe the expected behavior
It should be a 4 x 4 grid of image patches with zero padding of 15 on each edge.
Let n denote the number of stride steps.
The number of image patches per row = n + 1

Input_size = 1250
Output_size= 512
Stride = 256
Padding:

We have an equation:
2*Padding + Input_size = n*Stride + Output_size (1)
We don't know how much zero padding we need, so:
Input_size <= n*Stride + Output_size
1250 <= n*256 + 512
Then 2.88 <= n.
We choose the smallest integer that satisfies this, n = 3.
(1) => zero padding = 15

We will then have a 4 x 4 grid of image patches with zero padding of 15 on each edge.

Please correct me if my calculation is wrong.
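The manual calculation above can be checked with a few lines of plain Python. This is only a sketch of the "smallest covering" rule described in this post; the function name covering_patch_grid is hypothetical, and the rule is not claimed to be what TensorFlow implements.

```python
import math

def covering_patch_grid(input_size, patch_size, stride):
    """Solve equation (1) for the smallest n whose patches cover the input.

    Equation (1): 2*padding + input_size = n*stride + patch_size
    Returns (patches_per_row, padding_per_edge).
    """
    # Smallest number of stride steps such that input_size <= n*stride + patch_size.
    n = max(math.ceil((input_size - patch_size) / stride), 0)
    total_padding = n * stride + patch_size - input_size
    return n + 1, total_padding / 2

patches_per_row, padding_per_edge = covering_patch_grid(1250, 512, 256)
print(patches_per_row, padding_per_edge)  # 4 15.0
```

Under this rule the answer is indeed 4 patches per row with 15 pixels of padding per edge, so the arithmetic is consistent; the open question is whether 'SAME' padding follows this rule.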


martinwicke · 2019-06-22T22:06:06Z

Please see this excellent explanation on StackOverflow: https://stackoverflow.com/questions/40731433/understanding-tf-extract-image-patches-for-extracting-patches-from-an-image

I don't think there's a bug here, so I will close this issue.

martinwicke · 2019-06-22T22:06:27Z

(That said, I am adding a proper docstring to tf.image.extract_patches)

Khoa-NT · 2019-06-23T05:53:40Z

@martinwicke : Hi, I've read it.
But I would like to know why the manual calculation and the tf.image.extract_patches result differ.

martinwicke · 2019-06-23T09:00:08Z

1250/256 = 4.88..., and because you use 'SAME' padding, you round up to 5.

The patch size (512) is not relevant to how many patches there are, only to how big they are (the patches can overlap).
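As a rough illustration of this rule, the sketch below reproduces the arithmetic in plain Python, assuming rates of 1 and the padding convention TensorFlow documents for 'SAME' and 'VALID'. The helper names same_padding and valid_output_size are hypothetical and are not TensorFlow functions.

```python
import math

def same_padding(input_size, patch_size, stride):
    """Return (patches_per_row, pad_before, pad_after) for 'SAME' padding."""
    # The patch count depends only on the input size and the stride.
    out = math.ceil(input_size / stride)
    # The padding is whatever is needed so that the last patch still fits.
    total = max((out - 1) * stride + patch_size - input_size, 0)
    before = total // 2
    after = total - before
    return out, before, after

def valid_output_size(input_size, patch_size, stride):
    """Return patches_per_row for 'VALID' padding (no padding is added)."""
    return math.ceil((input_size - patch_size + 1) / stride)

print(same_padding(1250, 512, 256))       # (5, 143, 143)
print(valid_output_size(1250, 512, 256))  # 3
```

With the values from this issue, 'SAME' gives 5 patches per row and 143 pixels of padding on each edge, which matches the reported output shape and padding.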


Khoa-NT · 2019-06-25T10:30:03Z

Example:
Input_picture = np.zeros((1, 1250, 1250, 1))
Using tf.extract_image_patches:

def extract_patches(x):
    return tf.extract_image_patches(
        x,
        (1, 512, 512, 1),
        (1, 256, 256, 1),
        (1, 1, 1, 1),
        padding="SAME"
        )
Output_extracted = extract_patches(Input_picture)

Output_extracted will have shape:
TensorShape([Dimension(1), Dimension(5), Dimension(5), Dimension(262144)])
( 1, 5, 5, 262144 )
It means there are 5 x 5 = 25 patches.

But if we calculate by hand, following your explanation:

1250/256 = 4.8.., because you use 'same' padding, you round up.

The sizes (512) is not relevant to determine how many patches there are, only how big they are (they can overlap).

4.8 rounds up to 5
256*5 = 1280
This means we can add a total zero padding of 1280 - 1250 = 30
=> The padded image size is 1280 x 1280
Then we extract 4 x 4 = 16 patches:
1st patch: 0----------->512
2nd patch: 256------------>768
3rd patch: 512-------------->1024
4th patch: 768---------------->1280

So the correct answer should be (1, 4, 4, 262144)
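To compare the two layouts along one axis, the sketch below lists the pixel span of each patch. It assumes rates of 1 and the documented 'SAME'/'VALID' padding convention; the helper names and the manual_pad variable are hypothetical. Negative coordinates and coordinates past 1250 fall in the zero padding.

```python
import math

def same_window_spans(input_size, patch_size, stride):
    """(start, end) of each patch along one axis under 'SAME' padding,
    in coordinates of the unpadded image."""
    out = math.ceil(input_size / stride)
    total = max((out - 1) * stride + patch_size - input_size, 0)
    before = total // 2
    return [(i * stride - before, i * stride - before + patch_size)
            for i in range(out)]

def valid_window_spans(input_size, patch_size, stride):
    """(start, end) of each patch along one axis under 'VALID' padding."""
    out = (input_size - patch_size) // stride + 1
    return [(i * stride, i * stride + patch_size) for i in range(out)]

# 'SAME' on the 1250-pixel axis: 5 patches, 143 pixels of padding per edge.
print(same_window_spans(1250, 512, 256))
# [(-143, 369), (113, 625), (369, 881), (625, 1137), (881, 1393)]

# Padding 15 pixels per edge by hand (assumption), then using 'VALID':
manual_pad = 15
print(valid_window_spans(1250 + 2 * manual_pad, 512, 256))
# [(0, 512), (256, 768), (512, 1024), (768, 1280)]
```

If the 4 x 4 layout is what is needed, one possible approach (not confirmed in this thread) is to zero-pad the image by 15 pixels per edge first, for example with tf.pad, and then extract patches with padding='VALID'.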

gadagashwini-zz self-assigned this Jun 18, 2019

gadagashwini-zz added comp:ops OPs related issues type:support Support issues 2.0.0-beta0 labels Jun 18, 2019

gadagashwini-zz assigned jvishnuvardhan and unassigned gadagashwini-zz Jun 18, 2019

martinwicke closed this as completed Jun 22, 2019

tensorflow-copybara pushed a commit that referenced this issue Jun 24, 2019

Give images.extract_patches a proper docstring.

f9abdc0

Fixes #29857. PiperOrigin-RevId: 254797260

lvenugopalan added the TF 2.0 Issues relating to TensorFlow 2.0 label Apr 29, 2020
