Unittest: Find similar desc/fix #10

Closed
andresriancho opened this issue Mar 28, 2015 · 1 comment

@andresriancho

Create a unittest that finds descriptions that are duplicated or very similar between two files. I'm worried about some of the data we imported from Arachni, namely all the xss_* entries we have at https://github.com/vulndb/data/tree/master/db . If they are duplicated we should remove them, and the unittest will also help us avoid similar issues in the future.

@andresriancho

from difflib import SequenceMatcher
from tests.vulndb_test import VulnDBTest


class TestSimilarTexts(VulnDBTest):

    # Pairs of texts with a similarity ratio above this value are reported
    MAX_RATE = 0.8

    def get_rate(self, a, b):
        # SequenceMatcher.ratio() returns a float in [0, 1]; 1.0 means identical
        return SequenceMatcher(None, a, b).ratio()

    def test_similar_texts(self):
        # Collected as (file_1, file_2, field) tuples
        invalid = []

        # Compare every file against every other file
        for _file_1, db_data_1 in self.get_all_json():
            for _file_2, db_data_2 in self.get_all_json():

                # Skip comparing a file with itself
                if _file_1 == _file_2:
                    continue

                description_1 = self.to_string(db_data_1['description'])
                description_2 = self.to_string(db_data_2['description'])
                if self.get_rate(description_1, description_2) > self.MAX_RATE:
                    invalid.append((_file_1, _file_2, 'description'))

                fix_1 = self.to_string(db_data_1['fix']['guidance'])
                fix_2 = self.to_string(db_data_2['fix']['guidance'])
                if self.get_rate(fix_1, fix_2) > self.MAX_RATE:
                    invalid.append((_file_1, _file_2, 'fix_guidance'))

        # The test passes only when no pair exceeded MAX_RATE
        self.assertEqual(invalid, [])

Well... that outputs A LOT of similarities...
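
Part of that volume probably comes from the nested loop itself: it visits every pair in both orders, so each match is reported twice. Below is a minimal sketch of one way to trim the output, not a drop-in replacement for the test. It compares each unordered pair once with `itertools.combinations` and sorts the matches so the closest duplicates come first. The `find_similar` helper, the `texts` mapping and the sample strings are all made up for illustration; in the real test the texts would come from `get_all_json()`.

```python
from difflib import SequenceMatcher
from itertools import combinations

MAX_RATE = 0.8  # same threshold as in the test above


def find_similar(texts, max_rate=MAX_RATE):
    """Return (name_1, name_2, ratio) for each unordered pair above max_rate.

    `texts` is a hypothetical {file name: text} mapping that stands in for
    the data the real test reads through get_all_json().
    """
    similar = []
    # combinations() yields each unordered pair exactly once, so a match
    # between a.json and b.json is not reported again as b.json vs a.json
    for (name_1, text_1), (name_2, text_2) in combinations(sorted(texts.items()), 2):
        ratio = SequenceMatcher(None, text_1, text_2).ratio()
        if ratio > max_rate:
            similar.append((name_1, name_2, ratio))
    # Most similar pairs first, so the likeliest duplicates are reviewed first
    return sorted(similar, key=lambda item: item[2], reverse=True)


# Made-up sample data, for illustration only
sample = {
    'a.json': 'Cross-site scripting allows an attacker to inject scripts.',
    'b.json': 'Cross-site scripting allows an attacker to inject script.',
    'c.json': 'SQL injection lets an attacker modify database queries.',
}

for name_1, name_2, ratio in find_similar(sample):
    print('%s <-> %s: %.2f' % (name_1, name_2, ratio))
```

Keeping the ratio in the result also makes it easier to judge whether 0.8 is too low a threshold for this data set.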
