Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOD implementation #61

Merged
merged 9 commits into from
Mar 18, 2019
Merged

SOD implementation #61

merged 9 commits into from
Mar 18, 2019

Conversation

John-Almardeny
Copy link
Contributor

@John-Almardeny John-Almardeny commented Mar 11, 2019

All Submissions Basics:

#60

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you checked all Issues to tie the PR to a specific one?

All Submissions Cores:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

New Model Submissions:

  • Have you created a .py in ~/pyod/models/?
  • Have you created a test_.py in ~/pyod/test/?
  • Have you lint your code locally prior to submission?

@coveralls
Copy link

coveralls commented Mar 11, 2019

Pull Request Test Coverage Report for Build 699

  • 57 of 59 (96.61%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 95.496%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pyod/models/sod.py 49 51 96.08%
Totals Coverage Status
Change from base Build 682: 0.02%
Covered Lines: 3583
Relevant Lines: 3752

💛 - Coveralls

@yzhao062
Copy link
Owner

Hi there,

Thanks a lot for the PR. Could you only commit "sod.py", "test_sod.py", and "sod_example.py"? The xml and pyc files are not needed.
Cheers,

Yue

@yzhao062 yzhao062 self-requested a review March 11, 2019 14:26
@John-Almardeny
Copy link
Contributor Author

John-Almardeny commented Mar 11, 2019

Hi @yzhao062

No problem, I just deleted them.
Also I optimized the performance, now it is faster in roughly 150% than before by caching existing results during SNN process.

Thanks,
Yahya.

@John-Almardeny
Copy link
Contributor Author

Hi @yzhao062

For large datasets, most likely memory cannot handle them in SOD, thus I improved this aspect so it can handle datasets up to 65535 features with less consumption of memory.

It's worth mentioning that SOD is slow by nature since it performs peer-to-peer comparison along the observations. The original research paper did not mention any suggestion to do partial comparison (i.e. with a subset of observations), nor did find any study/suggestion to do that in SOD in particular!

To avoid any inconvenience, I'd suggest to leave it as it's been originally proposed.

Thanks,
Yahya.

@John-Almardeny
Copy link
Contributor Author

Hi @yzhao062

I could avail of numpy isin function to speed up the execution and the performance now is extremely faster than before.

I am very happy with it now. Thus I believe no more commits will be from my side for SOD.

I will leave it for you for review.

Thanks,
Yahya.

@yzhao062
Copy link
Owner

yzhao062 commented Mar 12, 2019

Hi Yahya,

Thanks a lot for the work! I will review the code, make necessary changes, and add some additional parts (example, the content of the test file, and link it with the documentation).

This will take some time as I am currently busy with (branch [jmlr]). The PyOD accompanying paper just gets accepted by JMLR and I need to address the concerns from the reviewers to make the paper published. We are aiming to finish this in 1-2 weeks and release a new version 0.6.9.

Your SOD implementation will be scheduled to release in v 0.7.0 as an exciting new detection model. It should be out by either late this month or early next month. I will keep you posted if there is anything needed.

Please feel free to reach out if you have any questions.
Thanks again.

Yue

@John-Almardeny
Copy link
Contributor Author

Hi Yue,

Cheers. Great work ya did indeed. Thanks very much.
Looking forward to seeing the new version of PyOD.

Kind Regards,
Yahya.

@yzhao062 yzhao062 merged commit 27bf057 into yzhao062:development Mar 18, 2019
@yzhao062
Copy link
Owner

Hello Yahya,

I finally got the chance to review the code (as we just finished a big update in jmlr branch). I think some changes are still required for SOD model and I will have to revert the PR for now. I am looking forward to seeing the changes and SOD could be a very important component of PyOD.

  1. X_train is saved in fit function. This should be prevented by every effort because it may lead to a serious memory issue.
    self.X_train_ = X
    I would suggest saving the trained models only. For example, save KNN instance or KD-tree, not X itself.
  2. decision_function does not actually predict on unseen dataset X:
    return self._sod() Here X is not passed in.
    which means when we pass in a new data set X_test to decision_function, its score will not be generated.

I wrote a quick example (http://www.andrew.cmu.edu/user/yuezhao2/shared/SOD_example.py) to test SOD, see error message below.
image

The error is due to decision_function doest not actually predict on the new data (X_test) but the train data. This should be fixed before it could be merged.

Again thanks a lot for doing this, and I am happy to discuss how to fix this and have it in PyOD.

@John-Almardeny
Copy link
Contributor Author

Hi Yue, @yzhao062

Regarding issue number 1: I followed the implementation you have already in ABOD and XGBOD where X is assigned to the x_train class attribute. However, I will try to understand what you want in this and fix it.

Point number 2 is truly valid: indeed it does not predict on unseen data, I am already using it in my research thus I missed this point since I do not need it to predict on unseen data rather only on the whole data set.
I will work on fixing it.
Thanks.

@John-Almardeny John-Almardeny mentioned this pull request Apr 15, 2019
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants