Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Underscores in tag names not matched #35

Closed
rossriley opened this issue Feb 4, 2016 · 10 comments
Closed

Underscores in tag names not matched #35

rossriley opened this issue Feb 4, 2016 · 10 comments
Assignees
Labels
Milestone

Comments

@rossriley
Copy link

Wasn't sure if this was a deliberate decision but in parsing tags containing underscores aren't matched.

eg: [testtag] works but [test_tag] does not.

Both the regular parser here:
https://github.com/thunderer/Shortcode/blob/master/src/Parser/RegularParser.php#L81
and the Wordpress parser here:
https://github.com/thunderer/Shortcode/blob/master/src/Parser/WordpressParser.php#L27

use a slightly different match syntax but neither contain the _ character.

Reason I ask is because I'm trying to use this library to make a content importer from Wordpress and Wordpress does seem to parse them correctly.

@thunderer
Copy link
Owner

Hi, I recently fixed inconsistent behavior between RegexParser and RegularParser and removed support for underscores _ in shortcode names. Those changes were introduced in a22cd63 and 3a8d537 . I'm curious about WordpressParser because it contains verbatim copy of WordPress' code that is exactly meant to work like in WordPress. There is a test that happily passes on all parsers and catches no shortcode. Does your WordPress parse those shortcodes with underscores? Can you show me the sample input and the desired output?

@thunderer
Copy link
Owner

Just to clarify - I can add support for underscores back in RegexParser and RegularParser, but I apply fixes to WordpressParser only if its behavior differs from WordPress. That is explained in a disclaimer at the top of that class.

@rossriley
Copy link
Author

Sorry, my mistake, the wordpress parser was failing on something else to do with the body of text I was processing.. When I took it out to a simple string the WordpressParser does work correctly.

So, would be nice eventually to have underscore support in the main parser, but for now I can use the Wordpress parser.

thanks

@thunderer
Copy link
Owner

Can you tell me what was the problem? Is it something I can fix on my side? I took a note to add the underscore _ back, I'll also explicitly state in the new docs in #34 what names are allowed. I hope to finish work on outstanding PRs and tag v0.6.0 with all planned features.

BTW In general you should use WordpressParser only if you want exact same behavior as in WP. If you're migrating data from WP to other system, I recommend to use at least RegexParser as it does not have the shortcomings of the weak WP regex.

@rossriley
Copy link
Author

@thunderer no, it was my fault, I think it was caused by the shortcode being right next to a CDATA section which came from the WXR file, once I pulled it out as plain text properly the Wordpress parser picked it up.

Because the underscore wasn't picked up on the standard parser I had assumed it was the same problem but it wasn't.

@thunderer
Copy link
Owner

Yeah, XML may be conflicting with shortcode syntax as <![CDATA[ this is a text ]]> is a valid shortcode this. :)

I think I may get what is going on here. WordPress' regex catches only shortcodes with registered handlers, not the generic name pattern. To do that it gets the names from the global $shortcode_tags variable. To be truly compliant I should pass a HandlerContainer to WordpressParser constructor so that it can create its regex accordingly. That would be of course a BC break, so I need to make it work both ways for now and clean it later.

@thunderer thunderer reopened this Feb 5, 2016
@thunderer thunderer added the bug label Feb 5, 2016
@thunderer thunderer self-assigned this Feb 5, 2016
@thunderer
Copy link
Owner

I added optional HandlerContainer instance in WordpressParser in ed246fe. If you pass it there, that parser will construct its regex just like WordPress does it internally and it should be completely compliant now, it will also support any shortcode name without validation, just as WordPress does. Could you please check latest dev-master and confirm that it works for you?

@thunderer
Copy link
Owner

Sorry, my previous comment was too hasty, I changed the constructor to named constructors createFromHandlers() and createFromNames() and I forgot about the call to preg_quote().

@thunderer thunderer added this to the 0.6 milestone Feb 5, 2016
@thunderer
Copy link
Owner

@rossriley Underscores are back in the allowed characters in 5265cb5 (I hope to tag release soon), I also standardized their validation so that they will be changed simultaneously in the future to avoid accidental differences.

Would you like me to do anything more or can this issue be closed? Yes, I know that I reopened it, but still, thanks for letting me know. :)

@rossriley
Copy link
Author

@thunderer No, that's great. Thanks for doing that so quickly.

@thunderer thunderer mentioned this issue Feb 11, 2016
35 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants