
tf.vectorized_maps resolve fallbacks #55639

Open
bhack opened this issue Apr 15, 2022 · 32 comments
Labels: comp:ops (OPs related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:feature (Feature requests)

Comments

@bhack (Contributor) commented Apr 15, 2022

Mirroring keras-team/keras-cv#291 for the TF core component.

System information

  • TensorFlow version (you are using):
    master
  • Are you willing to contribute it (Yes/No):
    I don't know. If I have a clear contribution path/pointer, probably.

Describe the feature and the current behavior/state.
Missing pfor converters, and eventually mitigating the other fallbacks.

Will this change the current API? How?
No

Who will benefit from this feature?
Anyone hitting the performance cost of these fallbacks.

Any Other info.

/cc @wangpengmit

@bhack (Contributor, Author) commented Apr 18, 2022

On the TF side I'm also adding keras-team/keras-cv#258 (comment), as it will be the next one.

@sachinprasadhs added the stat:awaiting tensorflower (Status - Awaiting response from tensorflower) label on Apr 20, 2022
@bhack (Contributor, Author) commented Apr 27, 2022

Any feedback on what to do next?

@bhack (Contributor, Author) commented Apr 28, 2022

I've just updated the fallback list from the current Keras-cv master:

WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting AdjustContrastv2 cause Input "contrast_factor" of op 'AdjustContrastv2' expected to be loop invariant.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting AdjustHue cause Input "delta" of op 'AdjustHue' expected to be loop invariant.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting AdjustSaturation cause Input "scale" of op 'AdjustSaturation' expected to be loop invariant.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting DepthwiseConv2dNative cause Input "filter" of op 'DepthwiseConv2dNative' expected to be not loop invariant.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting DynamicPartition cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting DynamicStitch cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting HistogramFixedWidth cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting RandomShuffle cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING  tensorflow:pfor.py:1082 Using a while_loop for converting StridedSlice cause Input "input" of op 'StridedSlice' expected to be not loop invariant.
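
For reference, a minimal way to reproduce one of these warnings outside Keras-cv (a rough sketch, assuming TF 2.x; shapes and factor range are arbitrary): passing a per-image contrast_factor through vectorized_map makes it non-loop-invariant, which is exactly the AdjustContrastv2 fallback logged above.

import tensorflow as tf

images = tf.random.uniform([8, 32, 32, 3])
factors = tf.random.uniform([8], 0.5, 1.5)  # one factor per image

def augment(args):
    image, factor = args
    # contrast_factor varies per pfor iteration, so it cannot be hoisted out
    # of the loop and pfor falls back to a while_loop (the warning above).
    return tf.image.adjust_contrast(image, factor)

out = tf.vectorized_map(augment, (images, factors))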

@wangpengmit (Member) commented

We discussed it in our meeting but still don't have an owner for tf.vectorized_map yet. So the only feedback we can give at the moment is that contributions are welcomed. (CCing @rohan100jain)

@bhack (Contributor, Author) commented Apr 28, 2022

@wangpengmit Thank you, but I need to wait until a resource is allocated to tf.vectorized_map before discussing the next steps in a bit more detail.

I also want to minimize the risk that at some point I find out it is no longer actively supported, or has been superseded by something else, since we don't have public visibility into TF roadmaps.

Please notify me on this ticket when you have a new resource allocated, and I can help.

@wangpengmit (Member) commented

CCing @ishark.

@ishark (Member) commented Aug 31, 2022

Sorry for the late reply on this. Could you please help prioritize which ops you would like to have converters for first?

@bhack (Contributor, Author) commented Aug 31, 2022

I think we need to understand a few things before looking at converter priorities:

I suppose the "no registered converter" cases have to be considered the fail-fast fallback cases for vectorized_map.

As you know, these originate from Keras-cv, so I'm more worried about the "expected to be loop invariant" cases, because I suspect that, just past the missing converters, we will hit the same kind of issue on the other ops as well.

And my guess (it could be wrong) is that we are hitting a hard limit of vectorized_map that is tied, under the hood, to the limits of the underlying native ops/kernels.

E.g. see:
keras-team/keras-cv#291 (comment)
keras-team/keras-cv#581 (comment)

@ishark What do you think?

/cc @LukeWood @qlzh727

@ishark (Member) commented Aug 31, 2022

Thanks for the prompt reply @bhack and for providing the context!

Agreed, it makes sense to understand the whole context before diving into a solution.

The way I understand it, there are two main issues:

  1. Not all ops have vectorization support implemented in vectorized_map. The design of vectorized_map requires that we register each op converter explicitly, so there are quite a few missing ops.
  2. There may be restrictions on which inputs are expected to be invariant. For example, in the AdjustHue op case, delta is expected to be a constant for all of the calls while the first input is vectorized. How the inputs should be treated might depend on each use case; perhaps vectorized_map should provide a flexible mechanism for this.

In my opinion, 1 has a more straightforward solution, unless the op is simply not vectorizable. For 2, on the other hand, things can be a bit tricky and might depend on the use case. One possible approach might be to let users extend the vectorization converters themselves, if a general solution is not suitable. Can you confirm that in your case delta needs to be variant and actually has a different value for each call? Or is that coming from the Keras CV implementation?
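
To illustrate the invariant case in point 2, a minimal sketch (assuming TF 2.x; shapes arbitrary): when delta is the same tensor for every iteration, the registered AdjustHue converter applies and no while_loop fallback is logged.

import tensorflow as tf

images = tf.random.uniform([8, 32, 32, 3])
delta = tf.constant(0.1)  # loop-invariant: the same hue shift for every image

# delta is captured from outside the mapped function, so pfor treats it as
# unstacked and can keep a single batched AdjustHue op.
out = tf.vectorized_map(lambda image: tf.image.adjust_hue(image, delta), images)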

@bhack (Contributor, Author) commented Aug 31, 2022

Obviously 1 is always necessary as a prerequisite, but in this case I think we will end up at 2 by design.

Our variant parameters are mainly inherited from design choices made in the early days of the Keras-cv repo, which adopted within-the-batch randomization of the TF op parameters.

For more context, please check the original thread, in the few messages around keras-team/keras-cv#146 (comment).

See also my last checklist in:
keras-team/keras-cv#581 (comment)

@bhack (Contributor, Author) commented Sep 1, 2022

Just to clarify, with another example, the within-the-batch approach vs. the nature of the TF native op:
The Keras-cv policy is going to ask vectorized_map for a vectorization over a random/variable contrast_factor for every image in the images argument (the batch).
The TF native operator/API, by its nature, currently handles only a constant contrast_factor shared by all the images:

@RegisterPFor("AdjustContrastv2")
def _convert_adjust_contrastv2(pfor_input):
  images = pfor_input.stacked_input(0)
  contrast_factor = pfor_input.unstacked_input(1)
  return wrap(gen_image_ops.adjust_contrastv2(images, contrast_factor), True)

@davidanoel commented

The same is happening for RandAugment. Without RandAugment each epoch takes around 35s on my machine. With RandAugment it takes about 2mins 25 seconds. Any resolution on the roadmap?

@LukeWood (Contributor) commented Sep 30, 2022

The same is happening for RandAugment. Without RandAugment each epoch takes around 35s on my machine. With RandAugment it takes about 2mins 25 seconds. Any resolution on the roadmap

hello!

Please provide repro code. Are you using it in a Sequential model? You should use it in the tf.data pipeline.

@davidanoel commented Sep 30, 2022

@LukeWood I am using it within the tf.data pipeline. See snippets below.

import tensorflow as tf
import tensorflow_datasets as tfds
import keras_cv
from keras_cv.layers import RandAugment

(train_set, valid_set, test_set), info = tfds.load(
    "mnist", split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True, with_info=True)

augmenter = keras_cv.layers.RandAugment(
    value_range=(0, 255),
    augmentations_per_image=3,
    magnitude=0.9,
    magnitude_stddev=0.2,
    rate=0.5,
)

def resize_and_rescale(image, label):
    # convert MNIST images to RGB
    image = tf.image.resize_with_pad(image, RESIZE_TO, RESIZE_TO)
    image = tf.tile(image, [1, 1, 3])
    image = tf.cast(image, tf.float32)
    label = tf.one_hot(label, num_classes)
    return image, label

def prepare_dataset(dataset, shuffle=False, augment=False):
    dataset = dataset.map(resize_and_rescale, num_parallel_calls=AUTO)
    if shuffle:
        dataset = dataset.shuffle(10 * BATCH_SIZE)
    dataset = dataset.batch(BATCH_SIZE)
    if augment:
        dataset = dataset.map(lambda x,y:(augmenter(x),y),num_parallel_calls=AUTO)
    return dataset.prefetch(AUTO)

train_dataset_aug = prepare_dataset(train_set,shuffle=True, augment=True)

@LukeWood (Contributor) commented Oct 1, 2022

@LukeWood I am using it within the tf.data pipeline. See snippets below.


I see. It’s not clear what the expected performance is of RandAugment; this might not be a bug but rather the cost of doing more computation - especially if your neural network is tiny. What’s your model code?

@bhack (Contributor, Author) commented Oct 1, 2022

this might not be a bug but rather the cost of doing more computation - especially if your neural network is tiny. What’s your model code?

Yes, it could still be that a too-small input/network makes the forward and backward pass faster than the augmentation task, so the augmentation becomes the bottleneck. But we have also reported quite heavy overhead with just a single augmentation, RandomContrast, on EfficientNetB0 with CIFAR (admittedly not a huge input size) at keras-team/keras-cv#581.

The main problem is that in Keras-cv we have never benchmarked the performance drop of these vectorized_map fallbacks, and our original choice/policy of randomizing within the batch is one of the root causes of these fallbacks. I still don't have any specific reference to a paper that supports why this training regime gains on loss per epoch. What I can claim for sure is that we have a performance drop that needs to be benchmarked.

Then with @ishark we could check whether there is any action item, or whether we don't have many practical solutions, if we want to keep the within-the-batch augmentation policy, other than changing the tf.image ops to support batched args. But besides being an API break, that would require rewriting kernels and XLA lowerings/bridge (/cc @paynecl, right?).

All this with tf.image API ownership currently orphaned (tensorflow/community#412 (comment)).

@paynecl (Contributor) commented Oct 3, 2022

'But besides being an API break, that would require rewriting kernels and XLA lowerings/bridge.' Possibly. It depends on how the low-level op(s) in question is (are) affected.

@ishark (Member) commented Oct 4, 2022

Thanks @atlanticstarr1 and @bhack for adding details to the bug.

Regarding the code snippet for RandAugment, I did not see vectorized_map being used in the snippet. Tracing through the dataset.map method briefly, it seems like it uses custom dataset ops and not vectorized_map under the hood. @atlanticstarr1, could you please confirm the slowdown is related to vectorized_map? Otherwise we might want to track it as a separate issue.

I will look into a solution for the invariant issue for pfor this quarter and update. Thanks!

@bhack (Contributor, Author) commented Oct 4, 2022

@ishark You can find it if you look at the base class of the image preprocessing layers:

https://github.com/keras-team/keras/blob/master/keras/layers/preprocessing/image_preprocessing.py#L278:L281
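
Roughly, what that base class does is the following (a paraphrased, standalone sketch, not a verbatim copy of the linked Keras source): when auto_vectorize is enabled, the per-image augmentation goes through tf.vectorized_map, which is where the pfor fallback warnings in this issue come from.

import tensorflow as tf

def batch_augment(augment_fn, images, auto_vectorize=True):
    # When auto_vectorize is on, the per-image function runs under pfor
    # (tf.vectorized_map); otherwise it runs in a plain tf.map_fn loop.
    map_fn = tf.vectorized_map if auto_vectorize else tf.map_fn
    return map_fn(augment_fn, images)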

@bhack (Contributor, Author) commented Oct 4, 2022

I will look into a solution for the invariant issue for pfor this quarter and update. Thanks!

With the within-the-batch augmentation policy we have (IMHO wrongly) tried to use vectorized_map to work around the limits of the image ops, which work on an images (batch) input but with a single, fixed transformation argument for the whole batch.

If we are still only talking about covering the missing converters with invariant inputs, I don't think we will solve these cases.

@ishark (Member) commented Oct 4, 2022

Thanks for the link to keras preprocessing layer @bhack.

I meant that I would investigate the original issue of the tf.vectorized_map fallbacks that occur in a few of the Keras CV preprocessing layers and check which ones we can resolve in the short term. It may not solve all the issues related to Keras CV + pfor usage; however, hopefully it will move us further along.

@bhack (Contributor, Author) commented Oct 4, 2022

I meant that I would investigate the original issue of the tf.vectorized_map fallbacks that occur in a few of the Keras CV preprocessing layers and check which ones we can resolve in the short term.

Hopefully we can discuss it publicly as soon as you have an idea. My hypothesis is obviously that in this specific case the lack of converters is just a "facade" problem.

@bhack (Contributor, Author) commented Oct 7, 2022

The main problem is that in Keras-cv we have never benchmarked the performance drop of these vectorized_map fallbacks, and our original choice/policy of randomizing within the batch is one of the root causes of these fallbacks. I still don't have any specific reference to a paper that supports why this training regime gains on loss per epoch. What I can claim for sure is that we have a performance drop that needs to be benchmarked.

Please check my Colab performance test at keras-team/keras-cv#581 (comment)
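
For anyone without access to the Colab, a rough local timing sketch in the same spirit (not the linked benchmark; the shapes, factor range, and iteration count are arbitrary assumptions): it compares a per-image contrast adjustment run through tf.vectorized_map (which falls back as shown above) against an explicit tf.map_fn loop.

import time
import tensorflow as tf

images = tf.random.uniform([128, 224, 224, 3])
factors = tf.random.uniform([128], 0.5, 1.5)  # per-image factor, so pfor falls back

@tf.function
def via_vectorized_map(imgs, f):
    return tf.vectorized_map(lambda a: tf.image.adjust_contrast(a[0], a[1]), (imgs, f))

@tf.function
def via_map_fn(imgs, f):
    return tf.map_fn(lambda a: tf.image.adjust_contrast(a[0], a[1]), (imgs, f),
                     fn_output_signature=tf.float32)

for name, fn in [("vectorized_map", via_vectorized_map), ("map_fn", via_map_fn)]:
    fn(images, factors)  # warm-up / trace once before timing
    start = time.perf_counter()
    for _ in range(10):
        fn(images, factors)
    print(name, time.perf_counter() - start)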

@bhack (Contributor, Author) commented Oct 7, 2022

/cc @martin-gorner

@ishark (Member) commented Oct 7, 2022

Thank you @bhack! This is very helpful.

@martin-gorner commented

some design choices made in the early days of the Keras-cv repo, which adopted within-the-batch randomization

I confirm this is a deliberate product decision: data augmentation techniques that augment an entire batch of images in a similar way are potentially harmful. Many models train better with good stochasticity of the training data. Even inadequate shuffling of data within a batch can show up on learning curves. Data augmentation is supposed to increase the variance of input images. But having an entire batch augmented in the same way works against that.

These effects on training can be more or less visible, sometimes very subtle and therefore very difficult to pinpoint and debug. That's why we want to make sure this is a problem users of KerasCV augmentation layer do not have to worry about.

This discussion is not so much about the numerical impact (of batches augmented in a similar way vs. with good intra-batch augmentation stochasticity) on this or that model. There might be evidence that this does or does not matter for some models. Models vary and the evidence will always be debated.

This product decision is about making this debate a moot point by offering a design that is always safe out of the box.

@bhack (Contributor, Author) commented Nov 10, 2022

Yes, I think we all understand this, but we also know that if in the design phase we don't plan the resources to rewrite the required TF ops and kernels (+XLA), the performance gap is assured, as we need to launch a kernel for every image in the batch.

Also, under some specific known conditions it is hard to plan any refactoring of these specific APIs at all (see tensorflow/community#412 (comment)).

In the early design days, as mentioned again in keras-team/keras-cv#372 (comment), I tried to argue, in case we didn't plan to rewrite these ops/kernels, for a sort of compromise between stochasticity and the time required for a training step:

Can we interpret the mentioned paper as randomizing the augmentation params over sub-batches that repeat the same set of images? In this way we could populate the batch while reducing the number of launched kernels and also reducing the I/O traffic on the filesystem (a minimal sketch follows at the end of this comment).

Maybe you have discussed this internally but in the "community" this has never been addressed.

On the practical side, if this idea does not convince the team, what other solution do we have in hand besides rewriting the ops in TF?
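
For concreteness, a minimal sketch of that sub-batch compromise (my assumption of how it could look, not an agreed design: the batch size must divide evenly by the number of sub-batches, and the existing scalar-factor tf.image op is reused, so there is one kernel launch per sub-batch instead of one per image):

import tensorflow as tf

def subbatch_random_contrast(images, num_subbatches, lower=0.5, upper=1.5):
    # One random factor per sub-batch; every image inside a sub-batch shares it.
    factors = tf.random.uniform([num_subbatches], lower, upper)
    subbatches = tf.split(images, num_subbatches, axis=0)
    augmented = [tf.image.adjust_contrast(sb, factors[i])
                 for i, sb in enumerate(subbatches)]
    return tf.concat(augmented, axis=0)

batch = tf.random.uniform([32, 64, 64, 3])
out = subbatch_random_contrast(batch, num_subbatches=4)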

@martin-gorner commented

I'm hearing you. Performance is an important consideration too. We will be discussing the tradeoff.

@bhack (Contributor, Author) commented Nov 10, 2022

@martin-gorner Thanks, I hope you can keep the discussion open in the community, even if I understand it can sometimes be slower than an internal thread.

Currently I don't have the cloud resources available to compile TF again and prepare some testing PRs.
E.g., just to make a concrete example, even though I haven't looked at the full implementation in detail, judging from the CPU kernel in RandomContrast/AdjustContrast (see keras-team/keras-cv#581 (comment)) I think it would not be too tricky to use a Tensor instead of a scalar to meet the requirements of a within-the-batch RandomContrast augmentation:

const float factor_value = factor();
float* p = output.data();
const float* q = input.data();
for (int64_t n = 0; n < input.size(); ++n) {
  p[n] += factor_value * (q[n] - p[n]);
}
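
Expressed in plain TF ops, the per-image-factor semantics such a change would enable is just the documented AdjustContrast formula, (x - mean) * factor + mean, with a [B]-shaped factor broadcast over the batch (a sketch of the intended semantics, not a proposed kernel):

import tensorflow as tf

def adjust_contrast_per_image(images, factors):
    # images: [B, H, W, C]; factors: [B], one contrast factor per image.
    # Per-image, per-channel mean over the spatial dimensions, as in the kernel.
    means = tf.reduce_mean(images, axis=[1, 2], keepdims=True)
    factors = tf.reshape(factors, [-1, 1, 1, 1])
    return means + factors * (images - means)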

Always assuming that we are not going to accumulate a combinatorial explosion of TF ops to modify, especially with the new 3D augmentation API, and that we have at least enough resources to review PRs.

The missing converters, instead, I think are a minor problem that can be handled more safely by the vectorized_map team without too many issues (or by the community with PRs, if we have enough reviewers).

@bhack (Contributor, Author) commented Nov 12, 2022

@martin-gorner In case it could help: https://arxiv.org/abs/2210.06441

@bhack (Contributor, Author) commented Jan 30, 2023

@martin-gorner What do you think about keras-team/keras-cv#1331 ?

@johnlundberg commented

I'm seeing the same issue with PopulationCount
