Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid Parsing of Filenames with multiple period characters in 0.19.3 #517

Closed
odenzo opened this issue Sep 6, 2023 · 4 comments · Fixed by #526
Closed

Invalid Parsing of Filenames with multiple period characters in 0.19.3 #517

odenzo opened this issue Sep 6, 2023 · 4 comments · Fixed by #526
Labels
Milestone

Comments

@odenzo
Copy link

odenzo commented Sep 6, 2023

What Happens: When using a filename with multiple . the parsing for file extension starts with the first . instead of the last.
Expected: .0.1.md correctly parses extension as md
Version: 0.19.3

	at laika.io.runtime.ParserRuntime$.$anonfun$run$21(ParserRuntime.scala:108)
	at cats.syntax.EitherOps$.leftMap$extension(either.scala:193)
	at laika.io.runtime.ParserRuntime$.parseAll$1(ParserRuntime.scala:108)
	at laika.io.runtime.ParserRuntime$.$anonfun$run$36(ParserRuntime.scala:167)
	at flatMap @ laika.io.runtime.ParserRuntime$.parseAll$1(ParserRuntime.scala:152)
	at map @ laika.io.runtime.ParserRuntime$.$anonfun$run$36(ParserRuntime.scala:167)
	at map @ laika.io.runtime.ParserRuntime$.$anonfun$run$2(ParserRuntime.scala:59)
	at flatMap @ laika.io.runtime.ParserRuntime$.mergeInputs$1(ParserRuntime.scala:58)
	at flatMap @ laika.io.runtime.ParserRuntime$.$anonfun$run$35(ParserRuntime.scala:166)
	at delay @ fs2.io.file.FilesCompanionPlatform$AsyncFiles.isDirectory(FilesPlatform.scala:262)
	at delay @ laika.sbt.Settings$$anon$1.filter(Settings.scala:46)
	at delay @ fs2.io.file.FilesCompanionPlatform$AsyncFiles.isDirectory(FilesPlatform.scala:262)
	at delay @ laika.sbt.Settings$$anon$1.filter(Settings.scala:46)
	at traverse @ laika.helium.generate.MergedCSSGenerator$.merge(MergedCSSGenerator.scala:28)
@odenzo
Copy link
Author

odenzo commented Sep 6, 2023

Also occurs in 1.0.0-M3

@jenshalm
Copy link
Contributor

jenshalm commented Sep 6, 2023

Hello and thanks for reporting.

My main question is: what is your main concern? Is it how the Path API works (e.g. what Path.suffix is reporting) or is your main concern simply the consequence of this behaviour, e.g. that some .md files are not included in the transformation?

The reason I'm asking is because the current behaviour is semi-intended (the way Path.suffix works is intentional, but some .md files being excluded and causing the transformation to fail is not). The API was designed to be working with compound suffixes, e.g. .tar.gz without additional parsing, partially because compound suffixes are heavily used internally (e.g. using .epub.css for styles which are only intended for EPUB output). And for those types of file extensions reporting baseName as the name until the last dot is not really accurate either, so while the behaviour is certainly different than in most other APIs I'm not sure it is really "invalid" given the existence of compound suffixes. Therefore, my preferred way of fixing this would be to prevent the observed issues without changing suffix parsing to "last dot only".

An API-breaking way to do this would be to turn the Path.suffix type from String to a richer type Suffix, which could come with a .last property to give you the expected (more traditional) file extension. Which means this would be possible for an upcoming 1.0 milestone, since we break some APIs anyway.

For 0.19.x what Path.suffix reports has to remain, but the logic around it (meaning how file types are determined), can be fixed by examining the reported suffix more closely.

@jenshalm
Copy link
Contributor

jenshalm commented Sep 7, 2023

Another option that is somewhat of a hack, but otherwise a pragmatic, low-risk improvement for 0.19.x at least would be to only parse compound suffixes known internally (e.g. .template.html or epub.css) and otherwise fall back to single-segment suffixes. This would not mess as much with user expectations and would fix the issues you ran into.

There'd still be room then to come up with something more polished for 1.x (which is still a few weeks away).

@jenshalm jenshalm added the bug label Sep 7, 2023
@jenshalm jenshalm added this to the 1.0.0-M5 milestone Sep 7, 2023
@jenshalm
Copy link
Contributor

jenshalm commented Sep 7, 2023

Deferring this fix to 1.0.0-M5 as the changes required to address this introduce a high risk for regressions for 0.19.x which will see its final (planned) release in a few days - and 1.0.0-M4 is also nearly ready for release.

I also cannot reproduce anything that produces a stacktrace like the one you posted, do you have any additional detail on how you got that? A transformation with markdown documents containing multiple dots does succeed, the only issue is that the generated output names strip part of the name (e.g. name.1.2.3.md becomes name.html in the output).

There are clean ways to address this in 1.x, partly because we can reduce the need for compound suffixes and then safely change the behaviour of the path parser afterwards. It's also not highest priority, as using names with dots appears to be extremely rare (this is the first report in 11 years).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants