Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rogue node detection. #27

Merged
merged 34 commits into from Aug 21, 2016

Conversation

Projects
None yet
2 participants
@CdavM
Copy link
Contributor

CdavM commented Jul 25, 2016

This PR should be used to discuss an implementation of the rogue node algorithm implementation.

@mitar

This comment has been minimized.

Copy link
Member

mitar commented Jul 26, 2016

So, why are tests failing here?


from nodewatcher import celery

from nodewatcher.modules.monitor.http.survey.management.commands.export_survey_data import extract_survey_graph

This comment has been minimized.

Copy link
@mitar

mitar Jul 26, 2016

Member

I think it would be better if this particular function would be somewhere else, and then both the management command and this task would import it.

@@ -0,0 +1,81 @@
import datetime

from django.core.mail import mail_admins

This comment has been minimized.

Copy link
@mitar

mitar Jul 26, 2016

Member

Import only modules, never methods/functions directly.

import datetime

from django.core.mail import mail_admins

This comment has been minimized.

Copy link
@mitar

mitar Jul 26, 2016

Member

No newlines between imports of the same top-level package.

@CdavM

This comment has been minimized.

Copy link
Contributor Author

CdavM commented Jul 27, 2016

It's no longer failing. Had to rename a couple of modules.

graph=input_graph['graph'],
friendly_nodes=input_graph['friendly_nodes'],
)
# append -results to the end of the filename

This comment has been minimized.

Copy link
@mitar

mitar Jul 27, 2016

Member

Comment style.

from nodewatcher.modules.analysis.rogue_nodes.tasks import rogue_node_detection_algorithm

import json
import io

This comment has been minimized.

Copy link
@mitar

mitar Jul 27, 2016

Member

Imports order.

@mitar

This comment has been minimized.

Copy link
Member

mitar commented Jul 27, 2016

Hm, do you think it is useful to store whole results for your tests? Maybe. But isn't the idea that you detect rogue nodes, but slight changes in values might not be so problematic? So maybe more important is that class rogue/not does not change?

"""

nx_graph = nx.Graph()
nx_graph.add_nodes_from([(node['i'], {'b': node['b']}) if 'b' in node else node['i'] for node in graph["v"]])

This comment has been minimized.

Copy link
@mitar

mitar Jul 27, 2016

Member

Don't use ".

@@ -0,0 +1,37 @@
import datetime
from django.core import mail

This comment has been minimized.

Copy link
@mitar

mitar Jul 27, 2016

Member

Newline before this import.

@CdavM

This comment has been minimized.

Copy link
Contributor Author

CdavM commented Jul 27, 2016

When you say "results", you mean the output of the detection algorithm? Meaning a bunch of nodes and the probability of each being rogue?
I think we need to store all the data because that should be the output of an algorithm. An alternative is to filter the results before the algorithm outputs them, but I prefer it the way it is now.
But it is true that I am assuming that there is only one rogue node in these test cases and we are not sensitive at all to perturbations.

So what should we do?

@CdavM

This comment has been minimized.

Copy link
Contributor Author

CdavM commented Jul 31, 2016

Yes, I implemented a custom test suite. Please do a code review.

@CdavM

This comment has been minimized.

Copy link
Contributor Author

CdavM commented Jul 31, 2016

We could also try to ensure that the numbers are in a certain range, as you proposed, but it is not within the scope of the algorithm to determine the "classification" of a rogue node. That is done by the maintainers. So we might be breaking abstraction barriers.

But more importantly, I don't foresee anyone tampering with this algorithm in the near future so I wouldn't spend too much time trying to predict how they will work on it.

"""

nx_graph = nx.Graph()
nx_graph.add_nodes_from([(node['i'], {'b': node['b']}) if 'b' in node else node['i'] for node in graph['v']])

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

I do not get it here. Nodes are or tuple of (ID, {b: BSSID}) or just ID? This hurts my mental type-checking. :-)

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 11, 2016

Author Contributor

Yes, that's how you create nodes in nx:
http://networkx.readthedocs.io/en/latest/reference/generated/networkx.Graph.add_node.html#networkx.Graph.add_node

It has to be a tuple in case you're storing additional information.

},
{
"name": "C4:6E:1F:C5:76:84",
"probability_being_rogue": 1,

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

Probability is not a float?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 11, 2016

Author Contributor

This got fixed and it's now a float. Must be an outdated json file.

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

Then fix it in the JSON.

},
{
"name": "C4:6E:1F:C5:76:84",
"probability_being_rogue": 0.5,

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

So in this case it did not detect it with our 0.9 threshold?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 11, 2016

Author Contributor

Correct.

"JSTJEsuits",
null,
"Allenoke-2.4",
"TP-LINK_C57684"

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

???

You added fake SSIDs to it?

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

Is this another node? But why it looks registered/known? But probability to be rogue is 1? How can a known node (with UUID) have 1 as probability?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 11, 2016

Author Contributor

I think this was the result of a bug. I should re-do this test case.

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

Yes, re-do these bad cases.

},
{
"name": "C4:6E:1F:C5:76:84",
"probability_being_rogue": 0.5,

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

So this is the real rogue with < 0.9?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 11, 2016

Author Contributor

Yes, I think so. I should re-do this test case.

]
},
{
"name": "df93387a-6180-4cde-a854-7cba043c385f",

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

You registered a node?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 11, 2016

Author Contributor

Bug. 🚨

# append -results to the end of the filename
results_filename = '{0}-results.json'.format(filename[:-5])
asserted_output = json.load(io.open(os.path.join(path, results_filename), encoding='utf-8'))
test_cases.addTest(RogueNodeTestCase('run_test', algorithm_output, asserted_output))

This comment has been minimized.

Copy link
@mitar

mitar Aug 11, 2016

Member

Lol. You cannot run tests when creating a suite. You should be only doing definition of tests during this stage and run them later on when tests run.

@mitar

This comment has been minimized.

Copy link
Member

mitar commented Aug 11, 2016

I reviewed the pull request. Please answer questions I made. Also see changes I made. See if everything still works.

@CdavM

This comment has been minimized.

Copy link
Contributor Author

CdavM commented Aug 18, 2016

Hi, not all tests are running with your code. Django only reports running one test: comparing 'behind-couch-....' with its results. How do we fix this?

for filename in files:
# Test every JSON file that does not contain "results" in the filename.
if os.path.splitext(filename)[1] == '.json' and 'results' not in filename:
test_cases.addTest(RogueNodeTestCase('run_test', os.path.join(path, filename)))

This comment has been minimized.

Copy link
@mitar

mitar Aug 18, 2016

Member

Can you check if this is called for all files?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 18, 2016

Author Contributor

Before you modified the code, I was able to use print statements to debug. It seems that stdout is now being redirected. I also can't use pdb, it seems to skip over the whole thing. How do I verify that this is called?

This comment has been minimized.

Copy link
@mitar

mitar Aug 18, 2016

Member

Print statements in load_tests should work, because that is called when tests are being loaded (the same as before in your code). So does for loop work correctly?

Inside tests stdout is collected and you get it to output only if there is an error. So one simple way is to throw an exception with your message. ;-)

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 18, 2016

Author Contributor

OK, the for loop works correctly and finds all the test cases. test_cases is an array of 7 elements, one for every test we have.
But run_test only runs once in total (only with the first test_filename).

This comment has been minimized.

Copy link
@mitar

mitar Aug 18, 2016

Member

Not sure. Research. Maybe the issue is that we have the same name for tests all the time. So id returns the same. You should probably override id and get it to return module name + class name + test_filename or something.

This comment has been minimized.

Copy link
@mitar

mitar Aug 18, 2016

Member

See how it is officially defined. You could also just call super and append fest_filename to the id.

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 19, 2016

Author Contributor

I decided it would be easier to construct a test_case with a directory and the test would go into the directory and check each entry. This also actually works.

This comment has been minimized.

Copy link
@mitar

mitar Aug 19, 2016

Member

Have you tried id method approach?

Are you sure it works? That tests don't now stop on first error? That you run all tests and allow some of them to fail?

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 19, 2016

Author Contributor

Hi,

the overriding id doesn't work.
Tests probably do stop on the first error, but at least all of them run.

This comment has been minimized.

Copy link
@CdavM

CdavM Aug 19, 2016

Author Contributor

ok. I found out that I have to override the hashing function. It works now. will push new code shortly.

with io.open(os.path.join(path, results_filename), encoding='utf-8') as asserted_output_file:
asserted_output = json.load(asserted_output_file)

self.assertEqual(algorithm_output, asserted_output)

This comment has been minimized.

Copy link
@mitar

mitar Aug 19, 2016

Member

If any test fails now, other tests are not run.


def load_tests(loader, tests, pattern):
test_cases = unittest.TestSuite()
test_cases.addTest(RogueNodeTestCase('run_test', os.path.join(os.path.dirname(__file__), 'test_json_files')))

This comment has been minimized.

Copy link
@mitar

mitar Aug 19, 2016

Member

You do not need that then. You just create a test file with test case and it is found automatically. load_tests is used only when you want custom loading.

I find this approach now bad. The one before was better, but you have to learn to read documentation and how to fix it. Are you saying that making id unique did not work?

@CdavM

This comment has been minimized.

Copy link
Contributor Author

CdavM commented Aug 21, 2016

@mitar Can you do a code review of this module?

@mitar

This comment has been minimized.

Copy link
Member

mitar commented Aug 21, 2016

Perfect. Merging!

@mitar mitar merged commit 672b8a2 into development Aug 21, 2016

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@mitar mitar deleted the rogue-node-detection branch Aug 21, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.