Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make validation of block html tags and attributes case insensitive #19207

Merged
merged 4 commits into from Dec 20, 2019

Conversation

@glendaviesnz
Copy link
Contributor

glendaviesnz commented Dec 17, 2019

Description

Fixes #18666 by lower casing html tags and attributes before comparison is done.

How has this been tested?

Have tested manually by modifying html tag and attribute casing in the block editor 'code editor' view and then switching back to 'visual editor' view. Have also updated the existing unit test to cover case differences

Types of changes

Adds lower casing of strings before comparison checks are done.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code has proper inline documentation.
attribute[ 0 ] = attribute[ 0 ].toLowerCase();
return attribute;
} ) )
.map( fromPairs );

This comment has been minimized.

Copy link
@aduth

aduth Dec 18, 2019

Member

I am hyper-conscious to the performance impact we might have in these changes, considering that this validation runs over every set of attributes for the markup of every block in a (potentially very long) post.

A couple thoughts:

  • Can we flatten any of these Array#map into a single mapping function?
  • Could we defer an "expensive" case-insensitive comparison to occur only when an initial failure is detected (i.e. optimize a mismatch as the rare edge case)

This comment has been minimized.

Copy link
@glendaviesnz

glendaviesnz Dec 18, 2019

Author Contributor

good point, will take another look at it from the optimisation angle.

This comment has been minimized.

Copy link
@glendaviesnz

glendaviesnz Dec 18, 2019

Author Contributor

A proxy would be a nice way of deferring the case sensitivity check to only failure cases, eg.

const expectedAttributesProxy = new Proxy(expectedAttributes, {
	get( target, property ) {
		if ( target[ property ] ) {
			return target[ property ];
		}
		const caseInsensitiveMatch = target[ Object.keys( target ).find( key => key.toLowerCase() === property.toLowerCase() ) ];
		if ( caseInsensitiveMatch ) {
			return caseInsensitiveMatch;
		}
		return;
	},
	getOwnPropertyDescriptor( target, property ) {
		if( target.hasOwnProperty( property ) ) {
			return { configurable: true, enumerable: true };
		}
                 // do a case insensitive match
		if ( Object.keys( target ).find( key => key.toLowerCase() === property.toLowerCase() ) ){
			return { configurable: true, enumerable: true };
		}
	}
});

But it doesn't look like the getOwnPropertyDescriptor trap can be polyfilled in IE11. Will look for a more old school approach :-)

This comment has been minimized.

Copy link
@glendaviesnz

glendaviesnz Dec 19, 2019

Author Contributor

After a bit of a play around with this, running fromPairs within the lowercasing map made it slower, as did using a single reduce instead of the maps, but a good old fashioned for loop is quite a bit faster in this instance, eg. with a page with 20 paragraph blocks with class and style attributes:

Original case sensitive check - 0.02 milliseconds
Case insensitive check with extra maps (current PR) - 0.045 milliseconds
Single reducer instead of multiple maps - 0.07 milliseconds
Single for loop instead of maps - 0.005 milliseconds

So we can make a case insensitive check faster than the current case sensitive check if we optimise with a for loop for creating the attribute lookup instead of a functional approach, eg.

const expectedAttributes = {};
for (let i = 0; i < expected.length; i++) {
	expectedAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[i][ 1 ];
}
const actualAttributes = {};
for (let i = 0; i < expected.length; i++) {
	actualAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[i][ 1 ];
}

What are your thoughts on this?

This comment has been minimized.

Copy link
@aduth

aduth Dec 19, 2019

Member

After a bit of a play around with this, running fromPairs within the lowercasing map made it slower,

By this, do you mean something like:

const [ actualAttributes, expectedAttributes ] = [ actual, expected ]
	.map( ( attributes ) => fromPairs( attributes.map( ( attribute ) => {
		attribute[ 0 ] = attribute[ 0 ].toLowerCase();
		return attribute;
	} ) ) );

This sort of flattening was the first thought I had in looking at this. I can't see how that could possibly be slower than two separate Array#map.

but a good old fashioned for loop is quite a bit faster in this instance

It doesn't surprise me a whole lot, since for loops tend to be optimized quite well in browsers. The Array#map tends to be a bit more expressive, and I could imagine future browser optimizations to bring the performance characteristics closer together. I'm fine with either approach, to be honest. Optimizing using a for loop could be a nice way to offset the potential impact of these changes.

Could we defer an "expensive" case-insensitive comparison to occur only when an initial failure is detected (i.e. optimize a mismatch as the rare edge case)

By defer, my original thought was more on the lines of: Only do the case-insensitive mapping / comparison if we reach a point where the function would otherwise return false;. In other words, optimize for the majority case that the block is valid and that we don't need to do a case-insensitive matching.

If it turns out that this is too cumbersome to implement, I don't think we need to worry about it too much. But it could be a nice optimization if it turned out to be simple to implement.

This comment has been minimized.

Copy link
@glendaviesnz

glendaviesnz Dec 19, 2019

Author Contributor

This sort of flattening was the first thought I had in looking at this. I can't see how that could possibly be slower than two separate Array#map.

yeh, I was a bit mystified as to why it was slower, maybe I was just made some error with the timers as I was chopping and changing chunks of code a bit, I might give it another go out of interest.

I will also take another look at possible ways to defer the case sensitivity check as you mention.

I am away over the holiday period so probably won't get to it until sometime after 6 Jan. Anyone else is welcome to take over the PR and do some more work on this in the mean time.

This comment has been minimized.

Copy link
@mmtr

mmtr Dec 20, 2019

Contributor

I took over this and tried to optimize the checks by using a for loop and deferring the case-insensitive checks for the minority case that block is invalid: d57882d

It has been a bit cumbersome, but comments should help to understand the intention. I'm happy to simplify that if we prefer easier code to read/maintain over optimizations.

@gwwar

This comment has been minimized.

Copy link
Contributor

gwwar commented Dec 19, 2019

Thanks, this tests well for me manually @glendaviesnz The blocker here is seeing if this is performant enough for us to include the more "moderate" check.

Before After
Screen Shot 2019-12-19 at 2 26 12 PM Screen Shot 2019-12-19 at 2 33 43 PM

With the following markup:

<!-- wp:paragraph -->
<P>A P TAG</P>
<!-- /wp:paragraph -->
@aduth
aduth approved these changes Dec 20, 2019
for ( let i = 0; i < actual.length; i++ ) {
actualAttributes[ actual[ i ][ 0 ] ] = actual[ i ][ 1 ];
expectedAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[ i ][ 1 ];
}

for ( const name in actualAttributes ) {
Comment on lines 414 to 419

This comment has been minimized.

Copy link
@aduth

aduth Dec 20, 2019

Member

I don't want to let this get carried away, but with the revised implementation, it seems like we could easily combine these two loops into a single loop, since they're both effectively iterating over the same set of actual attributes.

This comment has been minimized.

Copy link
@aduth

aduth Dec 20, 2019

Member

I don't want to let this get carried away, but with the revised implementation, it seems like we could easily combine these two loops into a single loop, since they're both effectively iterating over the same set of actual attributes.

On further review, this isn't possible, because the reason these objects are created separate from the subsequent loop is to support the case that attributes can occur in different orders. However, there's a middle-ground option here, in that only the "expected" comparison needs to be converted to an object. We can avoid creating an actualAttributes object, since we'll loop over the actual set anyways in the next loop.

See 57effe9 .

There's probably not much gained through this, but it's a decent simplification to avoid an unnecessary object.

For me, it's enough to offset a "cost" of doing the lower-casing in all code paths. As noted below, I have some concerns where trying to use it as a fallback could leave potential for unexpected behavior.

if ( ! expectedAttributes.hasOwnProperty( name ) ) {
if (
! expectedAttributes.hasOwnProperty( name ) &&
// Optimization: case-insensitive check deferred for the minority case that the block is invalid.

This comment has been minimized.

Copy link
@aduth

aduth Dec 20, 2019

Member

I realize this is part of the blocks module, but this utility was meant to be a very generic "HTML equivalence" utility, one which I have some hopes to pull out to a standalone module at some point. References to "block validity", at least within the generic bits (i.e. not isValidBlockContent, etc), would ideally be avoided.

const actualValue = actualAttributes[ name ];
const expectedValue = expectedAttributes[ name ];
const expectedValue = expectedAttributes[ name ] || expectedAttributes[ name.toLowerCase() ];

This comment has been minimized.

Copy link
@aduth

aduth Dec 20, 2019

Member

One thing that leaves me a bit uncomfortable with this is in whether one or these would be a falsey value, where if the expectedAttributes[ name ] value was explicitly an empty string, we should not want it to consider the fallback || here. In trying to come up with a unit test which would fail, I'm unable to. But for the purpose of keeping this maintainable, I'm thinking we should just simplify how we normalize the case. At the very least, I think we should not use a blanket "falsey" check here, but rather condition against expectedAttributes.hasOwnProperty.

@mmtr mmtr merged commit ddc98cb into master Dec 20, 2019
2 checks passed
2 checks passed
pull-request-automation
Details
Travis CI - Pull Request Build Passed
Details
@mmtr mmtr deleted the update/block-validation-case-sensitivity branch Dec 20, 2019
@youknowriad youknowriad added this to the Gutenberg 7.2 milestone Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.