@Witiko
Contributor

@Witiko Witiko commented Aug 11, 2020

This pull request aims to make video processing faster by increasing the accuracy and the speed of scene detection:

  1. We now use the mean squared error (MSE) of pixel values, which is differentiable and places emphasis on large errors, rather than the non-differentiable mean absolute error (MAE). This change should improve accuracy.
  2. The frame image pixels are now denoised and sampled before comparison. This change should improve speed and accuracy.
  3. The error of pixel values is computed in the CIE L*a*b* color space, where the Euclidean distance corresponds to the perceptual color distance, instead of sRGB. This change should improve accuracy.
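A minimal sketch of the comparison described in the list above, assuming frames arrive as float arrays of sRGB values in [0, 1]. The names `srgb_to_lab`, `downsample`, and `frame_distance` are hypothetical and not taken from this branch. Block averaging stands in for whatever denoising and sampling the branch actually uses, and the resulting numbers are on a different scale from the thresholds quoted in this pull request.

```python
import numpy as np

# Standard linear-sRGB -> XYZ matrix and D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert an (H, W, 3) float array of sRGB values in [0, 1] to CIE L*a*b*."""
    # Undo the sRGB gamma curve to obtain linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Convert to XYZ and normalize by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # Piecewise cube root from the CIE L*a*b* definition.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def downsample(frame, block=8):
    """Average non-overlapping block x block tiles (a crude denoise-and-sample step)."""
    height, width, channels = frame.shape
    height -= height % block  # drop rows and columns that do not fill a whole tile
    width -= width % block
    tiles = frame[:height, :width].reshape(
        height // block, block, width // block, block, channels
    )
    return tiles.mean(axis=(1, 3))


def frame_distance(frame_a, frame_b, block=8):
    """Mean squared error between two frames, computed in CIE L*a*b*."""
    lab_a = srgb_to_lab(downsample(frame_a, block))
    lab_b = srgb_to_lab(downsample(frame_b, block))
    return float(np.mean((lab_a - lab_b) ** 2))


black = np.zeros((16, 16, 3))
white = np.ones((16, 16, 3))
print(frame_distance(black, black))  # identical frames have zero distance
print(frame_distance(black, white))  # L* differs by 100; a* and b* barely differ
```

Downsampling first means that the color conversion and the error are computed on far fewer pixels, which is where the speed benefit of the second item would come from in this sketch.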

As a result of these changes, we were able to increase the threshold from 0.12 MAE to 0.22 MSE while retaining 100% accuracy on the training set. Replacing MAE with MSE alone decreases the threshold from 0.12 MAE to 0.1 MSE, i.e. the increase is due to the denoising and the perceptual color distance, which improve separability. For a further speed improvement, the threshold could be increased to 0.25 MSE while keeping ≥ 95% accuracy on the training set.
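To illustrate why MAE and MSE thresholds are not interchangeable, the following sketch compares both errors on made-up pixel values in [0, 1]. The lists are hypothetical and are not data from the training set. With the same MAE, a change concentrated in one pixel yields a much larger MSE than a small change spread over all pixels.

```python
def mean_absolute_error(a, b):
    """Average of the absolute per-pixel differences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_squared_error(a, b):
    """Average of the squared per-pixel differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


reference = [0.0] * 10
uniform_noise = [0.1] * 10        # every pixel changes slightly
local_change = [1.0] + [0.0] * 9  # a single pixel changes completely

for name, frame in [("uniform noise", uniform_noise), ("local change", local_change)]:
    mae = mean_absolute_error(reference, frame)
    mse = mean_squared_error(reference, frame)
    print(f"{name}: MAE={mae:.3f} MSE={mse:.3f}")
```

Both frames have an MAE of about 0.1, but the MSE is about 0.01 for the uniform noise and 0.1 for the local change, so an MSE threshold separates the two cases where an MAE threshold cannot.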

@xbankov, when you are testing new models, can you please write new code on top of the speed-up-scene-detection branch to see if this helps the conversion speed? If so, I will merge this and close #2.

@Witiko Witiko changed the title from "Speed up scene detection" to "Make scene detection faster and more accurate" Aug 11, 2020
Change mean absolute error to MSE, downscale frame images, use CIE LAB
@Witiko Witiko force-pushed the speed-up-scene-detection branch from e9d222e to a04d0a9 Compare August 11, 2020 16:19
@xbankov
Collaborator

xbankov commented Aug 12, 2020

Looks great! I will use the speed-up-scene-detection branch for testing the conversion speed.

@Witiko
Contributor Author

Witiko commented Aug 13, 2020

Evaluation on the withheld IA067-D2-20191112.mp4 recording shows that the updated scene detector significantly improves speed (except with the annotated page detector, which does no computation) with no loss of accuracy; see the table below and the experimental notebook. The effect is largest with vgg16, which took 2:30 hours to finish without scene detection and only 12 minutes with scene detection on my machine. This is a major leap forward, although the low accuracies are worrying: we need to replace vgg16 with something better before the system is practically useful, see #5.
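As a quick check of the vgg16 numbers above, the speed-up can be worked out as follows. This sketch assumes that "2:30 hours" means 2 hours and 30 minutes; the variable names are illustrative.

```python
from datetime import timedelta

without_scene_detection = timedelta(hours=2, minutes=30)  # assumed reading of "2:30 hours"
with_scene_detection = timedelta(minutes=12)

# Dividing one timedelta by another gives a plain float ratio.
speedup = without_scene_detection / with_scene_detection
time_saved = 1 - with_scene_detection / without_scene_detection

print(f"speed-up: {speedup:.1f}x")      # speed-up: 12.5x
print(f"time saved: {time_saved:.0%}")  # time saved: 92%
```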

[Image: Screenshot_20200813_230451-2x]

@xbankov, if you remove the line that says # fastai is too large to load and rerun the experimental notebook, you should get a larger table that also contains the measurements for your screen detector. With the updated scene detector, we should finally see some real-time performance. Feel free to push the notebook with the updated table to the speed-up-scene-detection branch. I am interested to see what the results will be.

@Witiko Witiko mentioned this pull request Aug 13, 2020
10 tasks
@xbankov
Collaborator

xbankov commented Aug 14, 2020

[Image: 90272110-2efed980-de5d-11ea-9b8e-bfab33f38f6b]

fastai looks "more" real-time with distance; however, the accuracies are a little bit worrying.
Why are page detectors better with a fastai screen detector than with annotated?

@Witiko
Contributor Author

Witiko commented Aug 14, 2020

> Why are page detectors better with a fastai screen detector than with annotated?

The recording was captured in 2019, but the last annotations for screen positions are from 2016. I would expect that the cameras have moved since then and the annotations are therefore imperfect.

> vgg16: 57.89% | 63.16%

Switching the pages may cause overexposure and it takes a few seconds for the camera to readjust. The training dataset does not contain these moments. Since the scene detector only uses the screen detector during these moments, it may be that the screen detector is unable to detect the screens correctly.

> annotated: 25.00% | 30.26%

If fastai with the annotated page detector received < 100% accuracy, then it could be because fastai does not detect any screens or detects too many screens. However, I am baffled as to why fastai with the annotated page detector received less than fastai with vgg16. If you could delete the file docs/notebooks/__main__/speed_and_accuracy_outputs/IA067-D2-20191112-annotated-fastai-distance.accuracy, apply the following patch and rerun the notebook, then we should be able to tell.

diff --git a/docs/notebooks/__main__/annotated.py b/docs/notebooks/__main__/annotated.py
index 0dd2652..5c21d08 100644
--- a/docs/notebooks/__main__/annotated.py
+++ b/docs/notebooks/__main__/annotated.py
@@ -141,10 +141,14 @@ def evaluate_event_detector(event_detector):
             if page_number is None:
                 if not detected_page_dict:
                     num_successes += 1
+                else:
+                    print(f'Frame {frame_number}: Expected no pages, but detected {detected_page_dict}')                                                                                                        
             else:
                 detected_page_numbers = set(page.number for page in detected_page_dict.values())
                 if len(detected_page_dict) <= 2 and detected_page_numbers == set([page_number]):
                     num_successes += 1
+                else:
+                    print(f'Frame {frame_number}: Expected at most 2 screens with page {page_number}, but detected {len(detected_page_dict)} screens with pages {detected_page_numbers}')                       

         if isinstance(event, (ScreenAppearedEvent, ScreenChangedContentEvent)):
             detected_page_dict[event.screen_id] = event.page

The notebook is written so that only the table cell of fastai with the annotated page detector will be rerun.
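The rerun behaviour described above can be pictured as a file-based cache: a result is recomputed only when its output file is missing. The sketch below is an assumption about how such a cache might look, not the notebook's actual code; `load_or_compute`, the file name, and the one-number file format are all hypothetical, and 0.75 is a placeholder value.

```python
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the cached value stored at `path`, or compute and store it."""
    path = Path(path)
    if path.exists():
        return float(path.read_text())
    value = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(repr(value))
    return value


calls = []


def expensive_evaluation():
    """Stand-in for a slow evaluation; records each call and returns a placeholder."""
    calls.append(1)
    return 0.75


with tempfile.TemporaryDirectory() as directory:
    cache = Path(directory) / "example.accuracy"
    first = load_or_compute(cache, expensive_evaluation)   # computed and stored
    second = load_or_compute(cache, expensive_evaluation)  # read back from the file
    cache.unlink()                                         # deleting the file ...
    third = load_or_compute(cache, expensive_evaluation)   # ... forces a recomputation

print(len(calls))  # 2
```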

@xbankov
Collaborator

xbankov commented Aug 19, 2020

The output of the print statements looks like the following.
That means the screen detector does not work that well, am I right?

Frame 346: Expected at most 2 screens with page 1, but detected 0 screens with pages set()
Frame 346: Expected at most 2 screens with page 1, but detected 3 screens with pages {1}
Frame 346: Expected at most 2 screens with page 1, but detected 3 screens with pages {1}
Frame 346: Expected at most 2 screens with page 1, but detected 3 screens with pages {1}
Frame 346: Expected at most 2 screens with page 1, but detected 3 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
...
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1, 2}
Frame 1077: Expected at most 2 screens with page 2, but detected 2 screens with pages {1, 2}
Frame 1077: Expected at most 2 screens with page 2, but detected 3 screens with pages {2}
Frame 1077: Expected at most 2 screens with page 2, but detected 3 screens with pages {2}
Frame 1077: Expected at most 2 screens with page 2, but detected 3 screens with pages {2}
...
Frame 1077: Expected at most 2 screens with page 2, but detected 3 screens with pages {2}
Frame 65925: Expected no pages, but detected {'screen-69': <PDFDocumentPage, page #65>, 'screen-130': <PDFDocumentPage, page #65>, 'screen-131': <PDFDocumentPage, page #65>}
Frame 65925: Expected no pages, but detected {'screen-69': <PDFDocumentPage, page #65>, 'screen-130': <PDFDocumentPage, page #65>, 'screen-131': <PDFDocumentPage, page #65>}

@Witiko
Contributor Author

Witiko commented Aug 19, 2020

@xbankov It is unexpected that the frame numbers should repeat. Is this the raw output you are getting? If so, this indicates errors in the evaluation code. I will need to troubleshoot this further, thank you for investigating.

What we can say for sure is that the screen detector seems to be detecting three screens instead of two in the first two cases. At the moment, this is not too troubling and we can relax the conditions for success to all screens showing the expected page, no matter the number of detected screens.
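A sketch of the strict and the relaxed success condition, assuming for simplicity that detected pages are plain page numbers keyed by screen id. The function `is_success` and its `max_screens` parameter are hypothetical; the strict branch mirrors the check in the patch above.

```python
def is_success(expected_page, detected_pages, max_screens=2):
    """Decide whether a detection counts as a success.

    `detected_pages` maps a screen id to the detected page number.
    Pass `max_screens=None` to ignore the number of detected screens.
    """
    if expected_page is None:
        # No page is shown, so no screen may report one.
        return not detected_pages
    if max_screens is not None and len(detected_pages) > max_screens:
        return False
    # Every detected screen must show the expected page.
    return set(detected_pages.values()) == {expected_page}


three_screens = {"screen-1": 3, "screen-2": 3, "screen-3": 3}
print(is_success(3, three_screens))                    # False: too many screens
print(is_success(3, three_screens, max_screens=None))  # True: all screens show page 3
print(is_success(3, {}, max_screens=None))             # False: nothing was detected
```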

@Witiko
Contributor Author

Witiko commented Aug 19, 2020

I received a very different output and an accuracy of 75% (instead of 25%) after removing the file IA067-D2-20191112-annotated-fastai-distance.accuracy and rerunning the notebook:

Frame 1476: Expected at most 2 screens with page 3, but detected 3 screens with pages {3}
Frame 5203: Expected at most 2 screens with page 11, but detected 3 screens with pages {11}
Frame 9144: Expected at most 2 screens with page 17, but detected 3 screens with pages {17}
Frame 14923: Expected at most 2 screens with page 26, but detected 3 screens with pages {26}
Frame 15365: Expected at most 2 screens with page 27, but detected 3 screens with pages {27}
Frame 15659: Expected at most 2 screens with page 28, but detected 3 screens with pages {28}
Frame 15904: Expected at most 2 screens with page 29, but detected 3 screens with pages {29}
Frame 16904: Expected at most 2 screens with page 30, but detected 3 screens with pages {30}
Frame 19077: Expected at most 2 screens with page 33, but detected 3 screens with pages {33}
Frame 25177: Expected at most 2 screens with page 45, but detected 3 screens with pages {45}
Frame 25851: Expected at most 2 screens with page 46, but detected 3 screens with pages {46}
Frame 27114: Expected at most 2 screens with page 47, but detected 3 screens with pages {47}
Frame 27885: Expected at most 2 screens with page 48, but detected 3 screens with pages {48}
Frame 42657: Expected at most 2 screens with page 50, but detected 3 screens with pages {50}
Frame 43460: Expected at most 2 screens with page 51, but detected 3 screens with pages {51}
Frame 44916: Expected at most 2 screens with page 53, but detected 3 screens with pages {53}
Frame 50477: Expected at most 2 screens with page 57, but detected 3 screens with pages {57}
Frame 50572: Expected at most 2 screens with page 58, but detected 3 screens with pages {58}

It seems that the only issue is an extra detected screen:

[Image]

Do you have any idea why such output would be produced, @xbankov?
I will rerun the accuracy part of the evaluation, ignoring errors in the number of screens, and report the results tomorrow.

@Witiko
Contributor Author

Witiko commented Aug 21, 2020

@xbankov Below are the updated accuracies:

[Image: results]

It seems that using the scene detector leads to some loss of accuracy after all, although it is not clear whether it is the page detector or the screen detector that is responsible for this. We should be able to tell after I have fixed the faulty screen annotations and recomputed the accuracies.

@Witiko
Contributor Author

Witiko commented Aug 23, 2020

Here are the accuracies after I have fixed the screen annotations:

[Image: 90902566-f2872c80-e3cc-11ea-9c9f-ed96904e7903]

With imagehash and vgg16 page detectors, fastai achieves up to 10% worse accuracy compared to annotated, because it detects three screens instead of two, as discussed above. The siamese page detector is a great mystery, since it seems to benefit from fastai: this needs further investigation.

The distance scene detector decreases accuracy with both the fastai and the annotated screen detectors, but fastai is hit harder. My hypothesis is that the scene detector will often detect a scene transition when a crossfade between two presentation slides has not yet finished. Neither the fastai screen detector nor the page detectors have been trained with cross-faded images.
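The crossfade hypothesis can be made concrete under two assumptions that are not confirmed here: the crossfade is a linear blend of pixel values, and the distance is measured against the frame from before the transition. The MSE between slide A and the blend (1 - α)·A + α·B is then α² times the MSE between A and B, so the threshold is crossed at α = sqrt(threshold / MSE(A, B)), which is before the crossfade ends whenever the two slides differ by more than the threshold. The pixel values below are made up, and in CIE L*a*b* the relation would hold only approximately.

```python
import math


def mse(a, b):
    """Mean squared error between two equally long lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def crossfade(a, b, alpha):
    """Linear blend of two frames: alpha=0 gives `a`, alpha=1 gives `b`."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


def trigger_alpha(threshold, full_distance):
    """Blend position at which the distance to the old slide reaches the threshold."""
    if full_distance <= threshold:
        return 1.0  # the threshold is reached only at the very end, if at all
    return math.sqrt(threshold / full_distance)


slide_a = [0.0, 0.2, 0.4, 0.6]  # hypothetical pixel values
slide_b = [1.0, 0.8, 0.6, 0.4]
full = mse(slide_a, slide_b)

# Halfway through the crossfade, the distance is a quarter of the full distance.
halfway = mse(slide_a, crossfade(slide_a, slide_b, 0.5))
print(math.isclose(halfway, 0.25 * full))  # True

# With a threshold of 0.22, the transition fires partway through the crossfade.
alpha = trigger_alpha(0.22, full)
print(0.0 < alpha < 1.0)  # True
```

In this toy example, the frame handed to the screen and page detectors would still be a blend of both slides, which matches the concern about cross-faded images above.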

@Witiko
Contributor Author

Witiko commented Aug 23, 2020

> The siamese page detector is a great mystery, since it seems to benefit from fastai: this needs further investigation.

Below is the output of the annotated screen detector:

Frame 346: Expected at most 2 screens with page 1, but detected 0 screens
Frame 1077: Expected at most 2 screens with page 2, but detected 0 screens
Frame 1476: Expected at most 2 screens with page 3, but detected 0 screens
Frame 1623: Expected at most 2 screens with page 4, but detected 0 screens
Frame 2063: Expected at most 2 screens with page 5, but detected 0 screens
Frame 2642: Expected at most 2 screens with page 6, but detected 0 screens
Frame 3543: Expected at most 2 screens with page 7, but detected 0 screens
Frame 4202: Expected at most 2 screens with page 8, but detected 0 screens
Frame 4414: Successfully detected page 9
Frame 4828: Expected at most 2 screens with page 10, but detected 0 screens
Frame 5203: Expected at most 2 screens with page 11, but detected 0 screens
...

Below is the output of the fastai screen detector:

Frame 346: Successfully detected page 1
Frame 1077: Successfully detected page 2
Frame 1476: Expected at most 2 screens with page 3, but detected 0 screens
Frame 1623: Successfully detected page 4
Frame 2063: Successfully detected page 5
Frame 2642: Expected at most 2 screens with page 6, but detected 0 screens
Frame 3543: Expected at most 2 screens with page 7, but detected 0 screens
Frame 4202: Expected at most 2 screens with page 8, but detected 0 screens
Frame 4414: Successfully detected page 9
Frame 4828: Expected at most 2 screens with page 10, but detected 0 screens
Frame 5203: Expected at most 2 screens with page 11, but detected 0 screens
...

Comparing this with the earlier output of the fastai screen detector with the annotated page detector, the fastai screen detector with the siamese page detector fails where fastai itself fails (frames 1476, 5203, ...). This indicates that siamese benefits from the different coordinates of the screens detected by fastai, not from fastai's failure to detect the correct number of screens.

Below, you can see the detected screens from frames 346, 1077, and 1623, where fastai+siamese succeeds (green), but annotated+siamese fails (red):

[Image: frame 346]
[Image: frame 1077]
[Image: frame 1623]

As you can see, the screens detected by fastai do not seem better than annotated, i.e. this seems to be just an idiosyncrasy of siamese. It could be that siamese cannot cope with the annotated screens that extend beyond the boundaries of a screen.

@Witiko Witiko merged commit 918378d into master Aug 23, 2020
@xbankov xbankov deleted the speed-up-scene-detection branch February 6, 2021 16:29

Development

Successfully merging this pull request may close these issues.

MeanSquaredErrorSceneDetector does not improve speed

3 participants