sentence-case #26

retorquere · 2023-08-05T18:33:10Z

Simpler sentence-caser. I've tried my best to make sure it is in line with the coding style for Zotero, but utilities does not have an eslint config.

dstillman · 2023-08-05T21:45:53Z

Can we add a test to https://github.com/zotero/utilities/blob/master/test/tests/utilitiesTest.js? Assuming you have the strings separately, we can just put a JSON file with the pairs in https://github.com/zotero/utilities/tree/master/test/data and load with loadSampleData().

Once you've run npm i once, you can run tests in this repo with npm test.
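
For illustration, a data-driven check over such input/expected pairs could be sketched as below. The `pairs` object stands in for the JSON file, and `sentenceCase` here is a trivial placeholder rather than the function from this PR; both names are assumptions. The repository's `loadSampleData()` helper is deliberately not called, since its signature is not shown in this thread.

```javascript
// Stand-in for the JSON file of "input": "expected" pairs (hypothetical data).
const pairs = {
	'A Simple Title': 'A simple title',
	'Already in sentence case': 'Already in sentence case',
};

// Placeholder sentence-caser, NOT the implementation from this PR:
// keep the first character, lowercase the rest.
function sentenceCase(text) {
	return text.charAt(0) + text.slice(1).toLowerCase();
}

// Return every [input, expected] pair for which fn(input) differs from expected.
function failingPairs(fn, samples) {
	return Object.entries(samples).filter(([input, expected]) => fn(input) !== expected);
}

console.log(failingPairs(sentenceCase, pairs).length); // 0
```

In a real test the loop would feed each pair to the actual sentence-caser and assert that no pair fails.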

retorquere · 2023-08-05T22:03:37Z

According to the README, npm i && npm test should run the tests, but I get

no such file or directory, open 'test/../resource/schema/global/schema.json'

retorquere · 2023-08-05T22:09:27Z

I've dumped my strings at https://gist.github.com/retorquere/8fb5a14a0b0f0a60db3df5313a258d5c . I must admit I haven't inspected every one of them; they have accrued from testing over the years, based on user reports. I'd be happy to add tests.

dstillman · 2023-08-05T22:16:29Z

You probably didn't do a recursive clone. (Almost all Zotero repos require recursive clones.)

You can use git submodule update --init --recursive to initialize the submodules.

retorquere · 2023-08-05T22:35:21Z

great, thanks. But when we're talking about loadSampleData, ~~shouldn't I be adding them to test/tests/utilities_itemTest.js?~~ never mind, I see the pattern.

retorquere · 2023-08-05T22:59:05Z

tests have been added.

dstillman · 2023-08-06T01:36:25Z

utilities.js

+		let masked = text.replace(/<[^>]+>/g, (match, i) => {
+			preserve.push({ start: i, end: i + match.length, description: 'markup' });
+			return '\uFFFD'.repeat(match.length);
+		});


The above two sections don't seem to be doing anything, at least with the sample input. Do we need them? If so, we should add some test data for them.

retorquere · 2023-08-06

The (sub-)sentence-start ones? One of them did fail a test when I removed them, but I've added two new samples. In the mail I received you seemed to be pointing towards the (sub-)sentence-start handlers, whereas here on the site it looks like you're referring to the markup handler. I'll look into the markup handler tonight -- certainly that should hit on something.

dstillman · 2023-08-06

The two blocks above my comment here: protect nocase and mask html tags with characters…. Nothing failed when I removed those (but still setting masked properly).

retorquere · 2023-08-06

Well that's just weird. I'll look into it tonight - I have a 6 hour drive in front of me right now.

It captures in-word markup. I will grant that I do not have a non-synthetic case at hand, but I've added a synthetic case.
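
As a rough sketch of what the markup-masking block in the diff is for: replacing each tag with the same number of U+FFFD characters hides the markup from later regexes while keeping every offset aligned with the original string. The helper names `maskMarkup` and `restore` and the sample title are hypothetical, and the lowercasing step is a stand-in for the real case conversion.

```javascript
// Mask every tag with U+FFFD, one placeholder per character, and record
// where the tags were. Modelled on the diff; the wrapper is an assumption.
function maskMarkup(text) {
	const preserve = [];
	const masked = text.replace(/<[^>]+>/g, (match, offset) => {
		preserve.push({ start: offset, end: offset + match.length, description: 'markup' });
		return '\uFFFD'.repeat(match.length);
	});
	return { masked, preserve };
}

// Copy the preserved ranges from the original text back over a converted
// string. This only works if the conversion did not change the length.
function restore(original, converted, preserve) {
	let result = converted;
	for (const { start, end } of preserve) {
		result = result.slice(0, start) + original.slice(start, end) + result.slice(end);
	}
	return result;
}

const title = 'The <i>In Vitro</i> Study';
const { masked, preserve } = maskMarkup(title);
// Stand-in for the case conversion: keep the first character, lowercase the rest.
const lowered = masked.charAt(0) + masked.slice(1).toLowerCase();
console.log(restore(title, lowered, preserve)); // The <i>in vitro</i> study
```

Because the masked string has the same length as the input, the recorded `start`/`end` offsets remain valid after conversion.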

dstillman · 2023-08-06T23:46:02Z

What about the nocase?

retorquere · 2023-08-07T00:13:40Z

I missed that testcase, added it now.

dstillman · 2023-08-07T02:27:14Z

"How to Get an A Without Trying": "How to get an A without trying",

"Effects of Open- and Closed-System Temperature Changes on Blood O₂-Binding Characteristics of Atlantic Bluefin Tuna (Thunnus thynnus)": "Effects of open- and closed-system temperature changes on blood O₂-binding characteristics of Atlantic bluefin tuna (Thunnus thynnus)",

I don't think either of these is really appropriate as an example. CSL styles don't sentence-case, so despite the name, the sole point of nocase is to avoid automatic title-casing of foreign words, scientific terms, and mixed-case words like iPhone (though citeproc-js should arguably just avoid capitalizing the last one). So other than Thunnus thynnus, which maybe would be protected before sentence-casing, none of these other strings would/should be. I know we're only testing the sentence-caser here, but I don't think we want to imply anywhere (even to developers looking at the code) that someone should be putting nocase around A, Atlantic, or O. Short of perhaps AI, there's no way to avoid manually recapitalizing proper nouns as a human being after sentence-casing a string, and that's not going to change with this better sentence-caser.

retorquere · 2023-08-07T08:57:59Z

I'd have no issue with Zotero making that call. For BBT it's a live case, but for Zotero it doesn't need to be.

dstillman · 2023-08-07T09:03:54Z

I'm just suggesting we go with something like this:

"Migration and the Origins of Homo sapiens": "Migration and the origins of Homo sapiens",

Which better reflects how nocase is meant to be used.

dstillman · 2023-08-07T09:06:05Z

(Which actually makes me wonder if citeproc-js actually looks for  or just class="nocase". I.e., does  work?)

retorquere · 2023-08-07T09:18:34Z

> I'm just suggesting we go with something like this:
>
> "Migration and the Origins of Homo sapiens": "Migration and the origins of Homo sapiens",
>
> Which better reflects how nocase is meant to be used.

Ah I see, I hadn't scrutinized the sample, this is how I got it from a user. I can update this tonight.

retorquere · 2023-08-07T09:21:18Z

> (Which actually makes me wonder if citeproc-js actually looks for  or just class="nocase". I.e., does  work?)

It wouldn't be hard to add that; the BBT html-to-latex supports it but I haven't documented that. Do you know what markup citeproc supports beyond b/i/sup/sub BTW?

retorquere · 2023-08-07T22:04:58Z

I've replaced the nocase tests with the proposed sample
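
To illustrate the behaviour the proposed sample exercises, here is a minimal sketch of leaving nocase-protected text untouched while lowercasing everything else. It assumes the `<span class="nocase">…</span>` form of the markup (the thread itself leaves open which forms citeproc-js accepts), and `naiveSentenceCase` is a hypothetical stand-in, far simpler than the sentence-caser in this PR.

```javascript
// Assumed markup form for protected text.
const NOCASE = /<span class="nocase">.*?<\/span>/g;

// Lowercase everything outside nocase spans, then capitalize the first character.
function naiveSentenceCase(text) {
	let out = '';
	let cursor = 0;
	for (const m of text.matchAll(NOCASE)) {
		// Lowercase the unprotected stretch, then copy the protected span verbatim.
		out += text.slice(cursor, m.index).toLowerCase() + m[0];
		cursor = m.index + m[0].length;
	}
	out += text.slice(cursor).toLowerCase();
	return out.charAt(0).toUpperCase() + out.slice(1);
}

const input = 'Migration and the Origins of <span class="nocase">Homo sapiens</span>';
console.log(naiveSentenceCase(input));
// Migration and the origins of <span class="nocase">Homo sapiens</span>
```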

northword · 2023-08-08T12:10:40Z

> Maybe some of these ideas can carry over? But I'd suggest e.g. testing for \p{Lu} rather than A-Z, and I think string.charAt would have problems with multibyte characters. I've opened a new PR at #26.

It looks great!
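
The two points in the quoted suggestion can be shown with a short sketch. The helper names are hypothetical, and reading "multibyte characters" as characters outside the Basic Multilingual Plane (stored as two UTF-16 code units) is an assumption about what was meant.

```javascript
// ASCII-only test versus the Unicode "uppercase letter" category.
function startsUpperAscii(s) {
	return /^[A-Z]/.test(s);
}
function startsUpperUnicode(s) {
	return /^\p{Lu}/u.test(s);
}

// charAt(0) returns a single UTF-16 code unit; codePointAt(0) reads the whole character.
function firstChar(s) {
	return s === '' ? '' : String.fromCodePoint(s.codePointAt(0));
}

const accented = '\u00C9mile';  // starts with É
const deseret = '\u{10400}bc';  // starts with U+10400 DESERET CAPITAL LETTER LONG I

console.log(startsUpperAscii(accented));      // false
console.log(startsUpperUnicode(accented));    // true
console.log(deseret.charAt(0) === '\uD801');  // true: only the high surrogate
console.log(firstChar(deseret) === '\u{10400}'); // true
console.log(startsUpperUnicode(deseret));     // true
```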

> I've tried adding chem elements detection to the sentencecaser, but it flags "No" as an element instead of as the adverb/adjective/noun. edit: also: "Be", "Au" (French), "Na" (Portuguese).

Oh! Sorry, I hadn't thought about this at all. Perhaps we can apply chemical-element detection only when the title is in English, or exclude element symbols that are function words in other languages.
This seems to be a bit of a hassle; maybe it can be done with a plugin patch as well.

retorquere · 2023-08-09T09:29:40Z

quote-protection has been removed.

dstillman · 2023-08-09T11:43:52Z

Great! Thank you!


retorquere · 2023-08-09T12:27:42Z

Super. This will show up in the next beta I take it? If so, when could we expect the next beta?

dstillman · 2023-08-09T12:37:34Z

Beta 32, out now


northword · 2023-08-09T12:52:14Z

The problem is that when the text is in all caps, the result is still in all caps, e.g. "NITROUS-OXIDE EMISSIONS FROM VEHICLES". Such a title obviously has no specific word that needs to stay capitalized, so I think we can convert every word to lowercase when the whole text is identical to its uppercased form, i.e. when it contains no lowercase letters at all.

retorquere · 2023-08-09T12:53:12Z

That wouldn't be hard to add.
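
One way the all-caps check proposed above could look, as a sketch only: the helper names are hypothetical and this is not necessarily how the follow-up change implements it. The idea is to lowercase the input before sentence-casing whenever it has cased letters but none of them are lowercase.

```javascript
// True when the text equals its own uppercased form and contains at least
// one cased letter (so digits or punctuation alone do not count).
function isAllCaps(text) {
	return text === text.toUpperCase() && text !== text.toLowerCase();
}

// Pre-processing step: drop everything to lowercase only for all-caps input,
// leaving mixed-case titles for the sentence-caser to handle.
function lowercaseIfAllCaps(text) {
	return isAllCaps(text) ? text.toLowerCase() : text;
}

console.log(lowercaseIfAllCaps('NITROUS-OXIDE EMISSIONS FROM VEHICLES'));
// nitrous-oxide emissions from vehicles
console.log(lowercaseIfAllCaps('Migration and the Origins of Homo sapiens'));
// Migration and the Origins of Homo sapiens
```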

dstillman · 2023-08-09T13:25:53Z

Oh, yes, that's a fairly significant regression.

retorquere · 2023-08-09T13:40:38Z

#27

sentence-case

099fc12

retorquere mentioned this pull request Aug 5, 2023

sentence-caser zotero/zotero#3251

Closed

retorquere added 5 commits August 5, 2023 20:41

typo

871b52d

don't add spurious inner caps

e769f20

single-pass masking, dash-seperated words, and sentence-leading markup

67d7d45

comment

c424dc6

U.S.A.

e2b439b

invalid regex

d45de28

retorquere added 3 commits August 6, 2023 00:48

lead capital can have markup

598f5ce

capturing groups

f6e8ee2

test

a7b1521

dstillman reviewed Aug 6, 2023


retorquere added 3 commits August 6, 2023 09:19

leading markup

6f647be

samples for sentence case

81f1ecd

inner markup

0333bb9

nocase test

4ed1bde

nocase test

bbc9a27

nocase test

101a990

quotes are nothing special

9a29c87

dstillman merged commit 27c9818 into zotero:master Aug 9, 2023
1 check passed

dstillman added a commit to zotero/zotero that referenced this pull request Aug 9, 2023

Switch to better sentence-caser from zotero/utilities#26

5e27a75

Closes #293

northword added a commit to northword/zotero-format-metadata that referenced this pull request Aug 9, 2023

fix: to sentence case

ee987c2

close #35, #27, #18 related: zotero/utilities#26

northword mentioned this pull request Jan 2, 2024

[Feature Request] Title case and sentence case northword/zotero-format-metadata#18

Closed

5 tasks

retorquere mentioned this pull request Mar 3, 2024

First character after opening quotation mark should remain capitalized when converting to sentence case zotero/zotero#3787

Open
