Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change PageRangeFormat::format to follow Pandoc #7

Merged
merged 10 commits into from
May 8, 2024

Conversation

DerDrodt
Copy link
Contributor

@DerDrodt DerDrodt commented Jan 12, 2024

(This is a big change; I apologize.)

This PR changes formatting of page ranges entirely. In summary

  1. PageRangeFormat::format no longer accepts start and end parameters of type i32 but &str to handle prefixes better.
  2. A range may now have a prefix. E.g., 8n11564-8n1568 is formatted to 8n11564–68 by Chicago15. (The prefix must be identical for start and end of the range!)

I followed the Haskell implementation of pandoc pretty closely, while trying to be as efficient as possible and writing it like one would in Rust. I would not be surprised if it can be optimized further.

Three points should be cleared up before merging:

  • Should we follow the citeproc test suite exactly? Prefixes, while important for many texts, are not mentioned in the CSL spec. In particular, the test suite demands that 123N110 - N5 is turned to 123N110-N5 by Expanded (note that no en-dash is used!). I would prefer the given separator instead of the normal dash. This is a matter of taste. Answer: We should use the given separator.
  • Should the parsing of the ranges be done by Citationberg while rendering or should Citationberg have a more complex data structure for ranges that is created by hayagriva and then only rendered by Citationberg? This is a question of where we want this to be handled: Should accessing a hayagriva Library expose the structure of the ranges without rendering using CSL or should it be seen as part of CSL? Answer: See comment below.
  • Should we no longer have a default for page-range-format? Answer: no.

This would be the second API change to citationberg. If merged, we should have a new version.

@reknih
Copy link
Member

reknih commented Jan 24, 2024

I share your preference to keep the user-specified separator. With Hayagriva's new test allow-list, we can explicitly opt out of some of citeproc's tests.

I see the boundary between citationberg and hayagriva as follows: The former implements CSL features in a way that does not make assumptions about the data we want to format while hayagriva comes with its own data model and the implementation needed to format it. I think that parsing our user input is still format-specific and should be handled in hayagriva. Do you agree?

@DerDrodt
Copy link
Contributor Author

I share your preference to keep the user-specified separator. With Hayagriva's new test allow-list, we can explicitly opt out of some of citeproc's tests.

That makes sense. In that case, we should have our own tests in the same format to test the new behavior.

I also found that according to the CSL docs we should only replace hyphens if the attribute was set. There also is no mention of a default for page-range-format. Should we remove our default?

I see the boundary between citationberg and hayagriva as follows: The former implements CSL features in a way that does not make assumptions about the data we want to format while hayagriva comes with its own data model and the implementation needed to format it. I think that parsing our user input is still format-specific and should be handled in hayagriva. Do you agree?

I agree. My proposal is this:

  1. Hayagriva handles splitting the input into multiple ranges or single pages (e.g., 1-5, 6-12 & 18 is turned to [1-5, 6-12, 18].
  2. A range in hayagriva consists of a start and end page, with each page having an optional prefix.
  3. Citationberg handles the formatting of a single range or page like in this PR.

That seems to be the desired behavior.

@reknih
Copy link
Member

reknih commented Feb 20, 2024

I apologize for my late answer! If the CSL spec does not want us to remove hyphens if the attribute isn't set we shall not. I think that the page-range-format default, however, should be kept as styles may rely on it given that citeproc.js is the biggest CSL consumer.

Your proposal looks good to me!

@DerDrodt
Copy link
Contributor Author

I updated this PR. I am preparing a PR for hayagriva. When this is merged, I can use it in hayagriva to incorporate the things we discussed.

@DerDrodt DerDrodt marked this pull request as ready for review April 23, 2024 15:06
@reknih reknih merged commit ff1b135 into typst:main May 8, 2024
2 checks passed
@reknih
Copy link
Member

reknih commented May 8, 2024

Thank you! Do you need a breaking citationberg release right away or do we expect more changes?

@Drodt Drodt deleted the csl-page branch May 10, 2024 07:04
@DerDrodt
Copy link
Contributor Author

No need for a new release. It is already set to use the git repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants