Display hyperlinks #79

Merged
merged 22 commits into from
Jul 15, 2020

@hhc97 (Collaborator) commented Jul 10, 2020

Displays links as clickable hyperlinks within clues and within student comments and replies.

Also sanitizes incoming text before displaying links, to protect against HTML injection attacks.
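The ordering matters: text is escaped first and only then are URLs turned into links, so no user-supplied markup survives. A minimal sketch of that ordering, assuming only http(s) URLs are linked; the function name linkify and the URL pattern are hypothetical, not the PR's actual code.

```python
import html
import re

# Hypothetical pattern: http(s) URLs up to the next whitespace character.
# Trailing punctuation (e.g. a sentence-ending period) is not trimmed.
URL_RE = re.compile(r"https?://[^\s<]+")


def linkify(text):
    """Escape all HTML in text, then wrap bare http(s) URLs in anchor tags."""
    # Escaping first turns '<', '>', '&' and quotes into entities, so the
    # URL match below can never contain a raw quote that breaks out of href.
    escaped = html.escape(text, quote=True)

    def to_anchor(match):
        url = match.group(0)
        return f'<a href="{url}">{url}</a>'

    return URL_RE.sub(to_anchor, escaped)
```

For example, linkify("see https://example.com <b>x</b>") links the URL but renders the b tag as literal text.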

If a potentially malicious comment is found, it is logged to the debug logger at WARNING level in the following format:

10/07/2020 03:37:16 [WARNING] - HTML detected in comment or reply (student): {'text': '<badtag> test text </badtag>', 'owner': 15, 'instance': 4, 'release': 3}
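A log line in that shape can be produced with the standard logging module. This is a sketch that reproduces the format shown above; the logger name "debug_sketch" and the helper names are hypothetical, and the project's real logger configuration is not shown in the PR.

```python
import io
import logging

# Day/month/year timestamp, then level, then the message, as in the line above.
FORMAT = "%(asctime)s [%(levelname)s] - %(message)s"
DATEFMT = "%d/%m/%Y %H:%M:%S"


def make_debug_logger(stream):
    """Return a logger (hypothetical name) that writes formatted lines to stream."""
    logger = logging.getLogger("debug_sketch")
    logger.setLevel(logging.DEBUG)
    for old in list(logger.handlers):
        logger.removeHandler(old)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT, datefmt=DATEFMT))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_html_detected(logger, role, comment):
    """Log a suspicious comment dict, e.g. {'text': ..., 'owner': ..., ...}."""
    logger.warning("HTML detected in comment or reply (%s): %s", role, comment)


if __name__ == "__main__":
    buf = io.StringIO()
    log_html_detected(make_debug_logger(buf), "student",
                      {"text": "<badtag> test text </badtag>", "owner": 15})
    print(buf.getvalue(), end="")
```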

Specific tags can be allowed through by adding them to the self.allowed attribute of the new HTMLSanitizer class.
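An allow-list sanitizer on top of the built-in HTMLParser could look roughly like the sketch below. Only the class name HTMLSanitizer and the self.allowed attribute come from the description above; the sanitize() method, the found_html flag, and dropping attributes from allowed tags are assumptions for illustration, not the PR's implementation.

```python
import html
from html.parser import HTMLParser


class HTMLSanitizer(HTMLParser):
    """Sketch of an allow-list sanitizer built on the stdlib HTMLParser.

    Tags in self.allowed are re-emitted without attributes; every other tag
    is escaped so it displays as text. found_html (hypothetical) records
    whether any disallowed markup was seen, e.g. to decide whether to log.
    """

    def __init__(self, allowed=()):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed)  # e.g. {"em", "strong"}
        self.found_html = False
        self._out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self._out.append(f"<{tag}>")  # attributes deliberately dropped
        else:
            self.found_html = True
            self._out.append(html.escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> arrive here; treat them as start tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self._out.append(f"</{tag}>")
        else:
            self.found_html = True
            self._out.append(html.escape(f"</{tag}>"))

    def handle_data(self, data):
        self._out.append(html.escape(data))

    def handle_comment(self, data):
        # Comments, declarations and processing instructions are dropped
        # from the output but still count as detected markup.
        self.found_html = True

    handle_decl = handle_pi = handle_comment

    def sanitize(self, text):
        """Return text with disallowed markup escaped; state resets per call."""
        self.reset()
        self.found_html = False
        self._out = []
        self.feed(text)
        self.close()
        return "".join(self._out)
```

Because HTMLParser normalizes tag names, an input of "<em >" with "em" allowed comes back as "<em>" and does not set found_html.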

@hhc97 added the enhancement (New feature or request) label Jul 10, 2020
@hhc97 requested a review from utmandrew July 10, 2020 03:50
@hhc97 linked an issue Jul 10, 2020 that may be closed by this pull request
@utmandrew (Owner) left a comment

Consider the use of bleach (https://pypi.org/project/bleach/) instead of a custom Sanitizer class in views.py. The argument: sanitization is a security issue, and a failure to sanitize properly will lead to a security vulnerability.

Your code does look clean -- and I can't see a bug in it -- but that isn't a guarantee. :-)

@hhc97 (Collaborator, Author) commented Jul 14, 2020

For the purposes of logging bad comments, bleach doesn't seem to report whether it found any unwanted tags in the input string. We also can't just check if bleached_text != original_text, because "<em >" gets bleached to "<em>" (note the removed space): they are the same tag and should not be logged. We could perhaps count the number of '<' characters, but again I'm not sure that's reliable.

Lastly, I'd argue that my custom class builds on the built-in HTMLParser, so if we rely on the parent class to find all HTML tags, I feel that would be secure enough? It's also a plus that it's built in (one less dependency) ;)
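One way around the "<em >" versus "<em>" comparison problem raised above is to keep detection separate from cleaning: a small parser only reports disallowed tag names (which HTMLParser normalizes), while bleach still does the actual sanitizing. This is a hypothetical sketch; the class and function names and the default allow-list are illustrative assumptions.

```python
from html.parser import HTMLParser


class DisallowedTagFinder(HTMLParser):
    """Detection-only pass: records tag names that are not allow-listed.

    HTMLParser lowercases tag names and ignores whitespace inside the tag,
    so '<em >', '<EM>' and '<em>' are all seen as 'em'.
    """

    def __init__(self, allowed):
        super().__init__()
        self.allowed = set(allowed)
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            self.found.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.allowed:
            self.found.append("/" + tag)


def disallowed_tags(text, allowed=("em", "strong")):
    """Return the disallowed tags found in text, in order of appearance."""
    finder = DisallowedTagFinder(allowed)
    finder.feed(text)
    finder.close()
    return finder.found
```

A caller could log whenever the returned list is non-empty and pass the original text to bleach for cleaning either way.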

@utmandrew (Owner) commented

It would be sad to lose precise logging.

I don't think reducing dependencies is an argument here. This is a security-related function, and I would prefer that updates come from someone who is likely to be paying close attention to threats. That's actually a benefit.

I don't have an absolute preference on this since you've already completed the work, but if I were doing this, I would have used bleach. Writing code to do something that an already-maintained package already does is (a) slower and (b) a risk.

@hhc97 linked an issue Jul 15, 2020 that may be closed by this pull request
@hhc97 mentioned this pull request Jul 15, 2020
@hhc97 (Collaborator, Author) commented Jul 15, 2020

Agreed on that point: bleach will be maintained for longer than I will be around to maintain this, so I've switched to bleach.

On the point about 'faster': yes, it's faster to implement with bleach, but in my testing bleach was consistently 10-20x slower than the custom parser class at sanitizing. The good news is that it only needs to run once, and as long as thousands of comments don't suddenly flood the server it should be fine; we're not at a point where that would be a bottleneck.
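A comparison like the 10-20x measurement mentioned above could be rerun with the standard timeit module. This harness is a hypothetical sketch, not the test that was actually run; it uses html.escape as a stand-in, and bleach.clean (which takes the text as its first argument) or a custom sanitizer's method could be added to the same dict.

```python
import html
import timeit


def time_sanitizers(sanitizers, sample, number=1000):
    """Return the average seconds per call for each named sanitizer.

    sanitizers maps a label to a callable taking one string,
    e.g. {"html.escape": html.escape}.
    """
    results = {}
    for name, func in sanitizers.items():
        # The lambda is timed within the same iteration, so it sees this func.
        total = timeit.timeit(lambda: func(sample), number=number)
        results[name] = total / number
    return results


if __name__ == "__main__":
    sample = "<badtag> test text </badtag> see https://example.com " * 50
    for name, secs in time_sanitizers({"html.escape": html.escape}, sample).items():
        print(f"{name}: {secs * 1e6:.1f} microseconds per call")
```

Using the same sample and call count for every entry keeps the ratios comparable, which matters more here than the absolute numbers.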

@hhc97 requested a review from utmandrew July 15, 2020 15:04
@utmandrew (Owner) left a comment

Note: the whitespace fixes produce paragraphs but do not fix indentation. That is being left for markdown.

@utmandrew merged commit 6309c12 into master Jul 15, 2020
@hhc97 deleted the display-hyperlinks branch July 24, 2020 20:51
Development: successfully merging this pull request may close these issues:
Hyperlinks - show as links
Confirm space visibility
2 participants