
Shifted predictions when predicting on videos with different aspect ratio #596

Closed
talmo opened this issue Oct 20, 2021 · 13 comments
Labels: bug (Something isn't working)

talmo commented Oct 20, 2021

It looks like the predictions are slightly off when predicting on videos that have a different aspect ratio than those used for training.

This was supposed to be fixed in v1.1.3 (in #524), but it seems to still be happening in some(?) cases.

Workaround:
Disable this functionality by editing training_config.json for each model used for inference and setting: "resize_and_pad_to_target": false.
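If you have several models, this edit can be scripted. A minimal sketch, assuming the setting lives under data → preprocessing in training_config.json (matching TrainingJobConfig.data.preprocessing in the SLEAP source); the path in the usage comment is hypothetical:

```python
import json
from pathlib import Path


def disable_resize_and_pad(config_path):
    """Set resize_and_pad_to_target to False in a training_config.json.

    Assumes the key path data -> preprocessing -> resize_and_pad_to_target,
    mirroring TrainingJobConfig.data.preprocessing.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["data"]["preprocessing"]["resize_and_pad_to_target"] = False
    path.write_text(json.dumps(config, indent=4))


# Hypothetical model directory:
# disable_resize_and_pad("models/my_model/training_config.json")
```

Run it once per model directory used for inference, then predict again.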

Related issues: #516, #584

@talmo talmo added the bug Something isn't working label Oct 20, 2021
catubc commented Oct 25, 2021

OK, we will try the workaround and report back.

talmo (Collaborator, Author) commented May 12, 2022

It is not clear whether this is still an issue, but at a minimum we should add some tests here:

  • With and without resize_and_pad_to_target enabled
  • With sleap.Videos from both MediaVideo and SingleImageVideo sources (e.g., DLC imports)
  • Integration test so we actually run a compiled pipeline (pulling from tf.data.Dataset examples) and not just the standalone transformer (SizeMatcher)

leonardolv commented
Hi,

I'm running SLEAP from source, and I have the problem described above.
My model was trained on videos with a different aspect ratio than the ones I'm predicting on.
When I predicted on several videos, the predictions were shifted, as you can see in the attached picture. One of the markers (tail base) is also not in the correct position...

Do I need to predict again with the solution above, or can I just apply it after the analysis?

(screenshot: RAT SLEAP)

SLEAP is very useful BTW, thanks!

Leonardo

roomrys (Collaborator) commented May 30, 2022

Hi @leonardolv,

Thanks for confirming that this remains an issue! We will work on providing a more permanent fix.

The training_config.json is used during inference, so it needs to be edited before making predictions. Unfortunately, you will need to make the change above in each training_config.json used during inference and then predict again.

Thanks,
Liezl

leonardolv commented
Hi @roomrys,

I edited the training_config.json as @talmo suggested, but it does not work.
An error now prevents inference on every video I've tried...
This is the error:

(screenshot: ERROR SLEAP)

Please let me know if you need more info about the bug.

Is there any other solution to this shift in predictions?

Thanks,

Leonardo

roomrys (Collaborator) commented Jun 2, 2022

Hi @leonardolv,

Could we ask you for a sample video and model, please?

Thanks,
Liezl

leonardolv commented Jul 4, 2022

Hi @roomrys,

Sorry for the late response.
Unfortunately, I deleted all the videos with different sizes and started over from scratch. That way I avoided the error described above.

Thanks for your answer.

Best,

Leonardo

roomrys (Collaborator) commented Jul 19, 2022

Update: bug located - creating PR + tests

The problem was a leftover line in SizeMatcher that effectively applied scaling twice when wratio >= hratio. Scaling is applied here (only when wratio >= hratio):

example[self.scale_key] = example[self.scale_key] * hratio

and again for any wratio/hratio:

example[self.scale_key] = example[self.scale_key] * effective_scaling_ratio

When that too-large/too-small scale is later reversed, the instances come out too big on higher-resolution videos and too small on lower-resolution ones.
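The effect of that leftover line can be illustrated with plain Python - a sketch of the scale bookkeeping only, not the actual SizeMatcher code:

```python
def effective_ratio(hratio, wratio):
    # Scale to fit the limiting dimension first, then pad the other.
    return wratio if hratio > wratio else hratio


def buggy_scale(hratio, wratio):
    scale = 1.0
    if not hratio > wratio:
        # Leftover line: scaling applied once here (when wratio >= hratio)...
        scale *= hratio
    # ...and again for any wratio/hratio.
    scale *= effective_ratio(hratio, wratio)
    return scale


def fixed_scale(hratio, wratio):
    # Scaling should be applied exactly once.
    return effective_ratio(hratio, wratio)


# E.g. a model trained at 1024x1024 predicting on a 768x768 video:
# hratio = wratio = 1024 / 768, so the buggy path squares the scale.
```

Reversing the squared scale at the end of inference over-corrects, which matches the too-big/too-small instances seen in the tests below.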

Test1 (Crop to different aspect ratio, instance size remains same)

  • I trained a model on a 1024 X 1024 video then cropped a copy of the video to 768 X 1024 with an x-offset of 128. The model predicted fine on both videos.
  • I then repeated the crop in the y-direction (w/ y-offset as well). The predictions were fine.
  • I predicted through sleap-track and through the GUI (loading both cropped and uncropped, then again with just the cropped) - both methods resulted in good predictions.

Test2 (Pad to different aspect ratio, instance size remains same)

  • I added padding to a 1024 X 1024 video to create a 1280 X 1024 video with no offset (top left corner at 0, 0).
  • I predicted on the padded video using the original model. The results were fine.

Test3 (Up-sample keeping same aspect ratio, instances are scaled up)

  • I scaled the 1024 X 1024 video to 1280 X 1280 - predictions were too large and offset down and to the right (in positive direction) as seen below.

Fig 1: Shifted predictions when instances are scaled up.

Fig 2: Oversized predictions when instances are scaled up.


Test4 (Down-sample keeping aspect ratio, instances are scaled down)

  • I scaled the 1024 X 1024 video to 768 X 768 - predictions were too small and offset up and to the left (in negative direction) as seen below.

Fig 3: Shifted predictions when instances are scaled down.

Fig 4: Undersized predictions when instances are scaled down.

Ideas:

  1. Perhaps the scaling formula was meant to be written as scale * (x - offset) but was actually written as scale * x - offset (on top of the incorrect scale)
  2. The scale may have been miscalculated (too large)
  3. The offset may have been miscalculated (too small)
  4. The bug originates in loading the model for inference (TrainingJobConfig.data.preprocessing.target_height/target_width)
  5. A bug in SizeMatcher?
  6. The predicted coordinates need rescaling - it seems we do this, just incorrectly (up-sampled predictions get bigger, down-sampled get smaller). There is a PointsRescaler transformer available that is never used.
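Idea 1 is easy to check numerically. With hypothetical values scale = 0.75, offset = 128, and x = 512, the two groupings differ by a constant offset * (1 - scale) = 32 px:

```python
scale, offset, x = 0.75, 128.0, 512.0

intended = scale * (x - offset)    # remove the offset, then scale
as_written = scale * x - offset    # scale first, then remove the offset

print(intended, as_written)  # 288.0 256.0 -> a constant 32 px shift
```

A constant shift like this would be consistent with the offset pattern seen in Figs 1 and 3.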

Observations

It always predicts correctly on the largest video (that has a labeled frame and was used in training). This is likely because SizeMatcher only scales smaller videos, using the largest video as the reference.
  • Qt coordinates in viewer update to reflect video resolution (different coordinates for same mouse position on different sized videos)

Relevant Code

  1. Make the provider from CLI args

    sleap/sleap/nn/inference.py

    Lines 4284 to 4285 in 44e4661

    # Setup data loader.
    provider, data_path = _make_provider_from_cli(args)

  2. Run inference

    sleap/sleap/nn/inference.py

    Lines 4294 to 4295 in 44e4661

    # Run inference!
    labels_pr = predictor.predict(provider)

  3. Generate predictions

    sleap/sleap/nn/inference.py

    Lines 430 to 431 in 44e4661

    # Initialize inference loop generator.
    generator = self._predict_generator(data)

  4. Make the data pipeline from the data_provider inside _predict_generator

    sleap/sleap/nn/inference.py

    Lines 305 to 306 in 44e4661

    # Initialize data pipeline and inference model if needed.
    self.make_pipeline(data_provider)

  5. Add SizeMatcher to pipeline inside make_pipeline.

    sleap/sleap/nn/inference.py

    Lines 259 to 267 in 44e4661

    if self.data_config.preprocessing.resize_and_pad_to_target:
        points_key = None
        if data_provider is not None and "instances" in data_provider.output_keys:
            points_key = "instances"
        pipeline += SizeMatcher.from_config(
            config=self.data_config.preprocessing,
            provider=data_provider,
            points_key=points_key,
        )

5a. Predictor.data_config is an abstract property whose value depends on the type of predictor used, but it always contains a DataConfig from TrainingJobConfig.data. E.g., for TopDownPredictor:

sleap/sleap/nn/inference.py

Lines 1952 to 1958 in 44e4661

@property
def data_config(self) -> DataConfig:
    return (
        self.centroid_config.data
        if self.centroid_config
        else self.confmap_config.data
    )

  6. Process batch for each item in the dataset

    sleap/sleap/nn/inference.py

    Lines 402 to 403 in 44e4661

    for ex in self.pipeline.make_dataset():
        yield process_batch(ex)

  7. Set scale key in SizeMatcher. Notice that L406 is not applied when hratio > wratio

    # Only apply this transform if image shape differs from target.
    if (
        current_shape[-3] != self.max_image_height
        or current_shape[-2] != self.max_image_width
    ):
        # Calculate target height and width for resizing the image
        # (no padding yet).
        hratio = self.max_image_height / tf.cast(current_shape[-3], tf.float32)
        wratio = self.max_image_width / tf.cast(current_shape[-2], tf.float32)
        if hratio > wratio:
            # The bottleneck is width, scale to fit width first then pad
            # to height.
            effective_scaling_ratio = wratio
            target_height = tf.cast(
                tf.cast(current_shape[-3], tf.float32) * wratio, tf.int32
            )
            target_width = self.max_image_width
        else:
            # The bottleneck is height, scale to fit height first then pad
            # to width.
            effective_scaling_ratio = hratio
            target_height = self.max_image_height
            target_width = tf.cast(
                tf.cast(current_shape[-2], tf.float32) * hratio, tf.int32
            )
            # Leftover line (the bug): only runs when wratio >= hratio.
            example[self.scale_key] = example[self.scale_key] * hratio

  8. Adjust points based on scale set by SizeMatcher inside process_batch

    sleap/sleap/nn/inference.py

    Lines 323 to 326 in 44e4661

    # Adjust for potential SizeMatcher scaling.
    ex["instance_peaks"] /= np.expand_dims(
        np.expand_dims(ex["scale"], axis=1), axis=1
    )

  9. Add points to labeled frame inside _make_labeled_frames_from_generator

    sleap/sleap/nn/inference.py

    Lines 1306 to 1321 in 44e4661

    # Loop over frames.
    for video_ind, frame_ind, points, confidences in zip(
        ex["video_ind"],
        ex["frame_ind"],
        ex["instance_peaks"],
        ex["instance_peak_vals"],
    ):
        # Loop over instances.
        predicted_instances = [
            sleap.instance.PredictedInstance.from_arrays(
                points=points[0],
                point_confidences=confidences[0],
                instance_score=np.nansum(confidences[0]),
                skeleton=skeleton,
            )
        ]
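The "Adjust for potential SizeMatcher scaling" division above relies on NumPy broadcasting. A standalone sketch with assumed shapes - (batch, n_instances, n_nodes, 2) for the peaks and (batch, 2) for the per-image (x, y) scale set by SizeMatcher:

```python
import numpy as np

# Assumed shapes: 2 images, 3 instances, 4 nodes, (x, y) coordinates.
instance_peaks = np.full((2, 3, 4, 2), 100.0)
scale = np.array([[0.5, 0.5], [2.0, 2.0]])  # per-image scale from SizeMatcher

# (batch, 2) -> (batch, 1, 1, 2) so it broadcasts over instances and nodes.
adjusted = instance_peaks / np.expand_dims(np.expand_dims(scale, axis=1), axis=1)

print(adjusted[0, 0, 0], adjusted[1, 0, 0])  # [200. 200.] [50. 50.]
```

If the scale stored in the example is wrong (e.g., squared by the leftover line), this division over- or under-corrects every point by the same factor, matching the observed shift.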

anas-masood commented Aug 4, 2022

Hey, any update on this? We are facing the same issue: we trained the network on 1248x1248 videos but are now predicting on videos of a different size (1408x1408), because we were losing some corners.

I also tried rescaling my prediction videos to 1248x1248 with ffmpeg, but the way this scaling is done still causes a constant offset on all instances (even when predicting a sequence of frames using flow). It would be nice if we didn't have to retrain the network. Also, any advice on what would happen if we trained the same network on images/videos of different sizes?

roomrys (Collaborator) commented Aug 8, 2022

Hi @anas-masood,

I just ran a quick test: training the network with images/videos of different sizes still results in offset predictions on the videos where SLEAP tries to automatically rescale the image/video.

I have shifted my focus back to this PR and will post updates here.

Thanks,
Liezl

anas-masood commented

@roomrys, thanks so much. For now we have trained another network at our desired resolution, but reading (superficially) through the docs I found that the tracker and network both crop and resize for data augmentation, meaning the predictions should be robust to this. The bug must be happening when the node values are rescaled back to the input image, though I don't know if I'm totally correct. I would love to start contributing and testing, but I have loads on my plate. Bonne chance, and let me know if you find a solution.

@roomrys roomrys added the fixed in future release Fix or feature is merged into develop and will be available in future release. label Aug 10, 2022
roomrys (Collaborator) commented Aug 10, 2022

Hi @anas-masood, @leonardolv,

The fix has been merged into develop and will be available in the next release. Alternatively, you can install SLEAP from source to get the fix now.

Thanks,
Liezl

@roomrys roomrys self-assigned this Aug 16, 2022
roomrys (Collaborator) commented Sep 12, 2022

This issue has been resolved in the new release - install here.

@roomrys roomrys closed this as completed Sep 12, 2022
@roomrys roomrys removed the fixed in future release Fix or feature is merged into develop and will be available in future release. label Sep 12, 2022