CLDR-17202 kbd: add Bengali (bn) keyboard #3368

Merged
srl295 merged 8 commits into unicode-org:main from kbd/cldr-17202/bengali
Jan 12, 2024

srl295
Member

@srl295 srl295 commented Oct 27, 2023

CLDR-17202

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@srl295 srl295 self-assigned this Oct 27, 2023
@srl295 srl295 requested a review from a team October 27, 2023 19:20
@miloush
Contributor

miloush commented Oct 27, 2023

Where are the key ids from? They seem to be in some sort of transliteration scheme but then there is ch, sh, w etc.

@srl295
Member Author

srl295 commented Oct 27, 2023

> Where are the key ids from? They seem to be in some sort of transliteration scheme but then there is ch, sh, w etc.

Wikipedia.

@miloush
Contributor

miloush commented Oct 27, 2023

Heh, that page is tagged with this issue; also, from the Talk page:

> this article needs serious cleanup, it appears to be using some sort of home-grown transliteration scheme at present.

I would be quite uncomfortable with this scheme. Could we use one of the established ones? (ISO 15919 if it were up to me, or I could live with ALA-LC too.)

@srl295
Member Author

srl295 commented Oct 27, 2023

> heh that page is tagged with this issue, also from the Talk page:
>
>> this article needs serious cleanup, it appears to be using some sort of home-grown transliteration scheme at present.
>
> I would be quite uncomfortable having this scheme, could we use one of the established ones? (ISO15919 if it was up to me, or I could live with ALA-LC too)

OK. I might use ALA-LC, as I have worked with that one.

@srl295
Member Author

srl295 commented Oct 27, 2023

I'll use the ALA-LC table at https://www.loc.gov/catdir/cpso/romanization/bengali.pdf

One of the test failures seems to be a CLDR bug, tracking at https://unicode-org.atlassian.net/browse/CLDR-17204

@miloush
Contributor

miloush commented Oct 27, 2023

I checked, and the only substantial difference between ALA-LC and ISO 15919:2001 is sha in ALA-LC versus ṣa in ISO 15919.

EDIT: the other difference is ṁ vs. ṃ for anusvara.

@srl295
Member Author

srl295 commented Oct 27, 2023

Will need to borrow this one from the Assamese table:

<key id="wa" output="ৱ" /> <!-- Assamese-->

@miloush
Contributor

miloush commented Oct 27, 2023

ISO 15919 has this one:
[image]

@srl295
Member Author

srl295 commented Oct 27, 2023

[image]

Chart so far

@srl295 srl295 mentioned this pull request Oct 27, 2023

<!-- UNSHIFTED KEYS -->
<!-- E: (top) row -->
<key id="n̐" output="\u{0981}" /> <!-- candrabindu-->
Contributor

Could just be id="candrabindu" and avoid the comment.

Member Author

This is per the romanization table.

<key id="au" output="\u{09CC}" />
<key id="pha" output="ফ" />

<key id="au-length" output="\u{09D7}" /> <!-- TODO: better name? -->
Contributor

Just wanted to double-check: do we want to encourage people to enter E + AU LENGTH MARK rather than VOWEL SIGN AU? (The spec says the length mark exists for compatibility and has no meaning on its own.)

Member Author

I wasn't able to verify this one yet.

Member Author

You're right. Not sure what that one was doing here; I'll remove it.
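The compatibility point in this thread can be checked with Python's standard `unicodedata` module: VOWEL SIGN E (U+09C7) followed by AU LENGTH MARK (U+09D7) is canonically equivalent to VOWEL SIGN AU (U+09CC), so a separate length-mark key adds nothing that the AU key does not already produce. This is only an illustration; `same_under_nfc` is a hypothetical helper, not part of the keyboard or its tests.

```python
import unicodedata

SIGN_E = "\u09C7"          # BENGALI VOWEL SIGN E
AU_LENGTH_MARK = "\u09D7"  # BENGALI AU LENGTH MARK
SIGN_AU = "\u09CC"         # BENGALI VOWEL SIGN AU


def same_under_nfc(a: str, b: str) -> bool:
    """True if the two strings are canonically equivalent (equal after NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    two_part = SIGN_E + AU_LENGTH_MARK
    print(unicodedata.normalize("NFC", two_part) == SIGN_AU)  # True: E + length mark composes to AU
    print(unicodedata.normalize("NFD", SIGN_AU) == two_part)  # True: AU decomposes back
    print(same_under_nfc(two_part, SIGN_AU))                  # True
```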

<transform from="\m{q}\u{09C8}" to="ঐ" />
<transform from="\m{q}\u{09CB}" to="ও" />
<transform from="\m{q}\u{09CC}" to="ঔ" />
<transform from="\m{q}\u{09D7}" to="আ" />
Contributor

Just to make it clear, the situation is:
A -> SIGN AA
SHIFT+A -> AU MARK

Q, A -> LETTER A
Q, SHIFT+A -> LETTER AA

(Notably, Q with the long sign produces the short letter, unlike Q in combination with everything else.)

Member Author

This does match the upstream keyboard:
[image]
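To make the Q behaviour concrete, here is a minimal Python simulation of the four `<transform>` rules quoted above: the Q key leaves an invisible marker, and a following dependent vowel sign is replaced by the matching independent letter. The sentinel `MARKER_Q` and the function `apply_q_transforms` are hypothetical stand-ins for the keyboard's marker and transform engine; only the vowel-sign mappings come from the rules in this PR.

```python
# Hypothetical stand-in for the marker emitted by the Q key.
# U+FDD0 is a Unicode noncharacter, so it cannot collide with real text.
MARKER_Q = "\uFDD0"

# Mappings taken from the quoted <transform> rules:
# marker + dependent vowel sign -> independent vowel letter.
Q_TRANSFORMS = {
    MARKER_Q + "\u09C8": "\u0990",  # VOWEL SIGN AI  -> LETTER AI (ঐ)
    MARKER_Q + "\u09CB": "\u0993",  # VOWEL SIGN O   -> LETTER O  (ও)
    MARKER_Q + "\u09CC": "\u0994",  # VOWEL SIGN AU  -> LETTER AU (ঔ)
    MARKER_Q + "\u09D7": "\u0986",  # AU LENGTH MARK -> LETTER AA (আ)
}


def apply_q_transforms(buffer: str) -> str:
    """Replace every marker + vowel-sign pair with its independent letter."""
    for source, target in Q_TRANSFORMS.items():
        buffer = buffer.replace(source, target)
    return buffer
```

For example, `apply_q_transforms(MARKER_Q + "\u09D7")` yields আ, while text without the marker passes through unchanged. The sketch matches raw code points and ignores the NFD matching discussed later in this thread.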

@miloush
Contributor

miloush commented Oct 28, 2023

OK, I checked the assignments. It's a bit difficult to review, though: why order the keys in the bag in layout order rather than [Brahmic] alphabetical order?

This probably should have a rupee sign.

@srl295
Member Author

srl295 commented Oct 28, 2023

Still to do:

  • Tests
  • Add reorder rules

@srl295
Member Author

srl295 commented Oct 28, 2023

> OK I checked the assignments. It's a bit difficult to review though, why would you order the keys in bag in layout order rather than [Brahmic] alphabetical order?

It was for ease of entry. I can sort them now though.

> This probably should have a rupee sign.

As with a number of other comments: this is a port of an existing keyboard file (see the link in the XML). I'm fine with improving it; just noting that step 1 is a port.

Thanks for the comments and thorough review. I'll keep working on it.
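Unicode encodes the Bengali block largely in traditional Brahmic order (vowels, then consonants by place of articulation), so sorting `<key>` elements by the code points of their `output` gives a rough approximation of the alphabetical ordering requested above. This sketch assumes outputs are either literal characters or `\u{XXXX}` escapes, as in the snippets in this PR; `sort_key_ids` and `decode_output` are hypothetical helpers, and code point order is only an approximation (for example, candrabindu sorts ahead of the vowels).

```python
import re
import xml.etree.ElementTree as ET

# Matches the \u{XXXX} escapes used in the keyboard's output attributes.
ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]+)\}")


def decode_output(value: str) -> str:
    """Expand \\u{XXXX} escapes into the characters they denote."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def sort_key_ids(keys_xml: str) -> list[str]:
    """Return key ids ordered by the code points of their decoded output."""
    root = ET.fromstring(keys_xml)
    keys = root.findall("key")
    keys.sort(key=lambda k: [ord(c) for c in decode_output(k.get("output", ""))])
    return [k.get("id") for k in keys]


if __name__ == "__main__":
    sample = r'''<keys>
      <key id="au" output="\u{09CC}" />
      <key id="pha" output="ফ" />
      <key id="ka" output="ক" />
    </keys>'''
    print(sort_key_ids(sample))  # ['ka', 'pha', 'au']
```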

@srl295
Member Author

srl295 commented Nov 1, 2023

[image]

Here's a screenshot of a Windows system typing with this keyboard.

@srl295 srl295 force-pushed the kbd/cldr-17202/bengali branch from 348d80c to ff24ba3 Compare November 3, 2023 22:26
@jira-pull-request-webhook

This comment was marked as outdated.

@srl295 srl295 changed the base branch from maint/maint-44 to main November 3, 2023 22:26
@srl295 srl295 marked this pull request as ready for review November 3, 2023 22:36
@srl295 srl295 requested a review from miloush November 6, 2023 15:52
@srl295 srl295 force-pushed the kbd/cldr-17202/bengali branch from f246d4c to ad6ad45 Compare November 6, 2023 16:45
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295
Member Author

srl295 commented Nov 6, 2023

Rebased, and it should build clean. @miloush or anyone, can I get an approval on it?

@srl295
Member Author

srl295 commented Nov 28, 2023

  • comment transforms / reorder

- rename lengthener to au-lengthener
- add displays for 3 keys
- fix XML order to match spec
- spec also allows <startContext> to be optional in test files
- reorder keys for review
- fix ya/sha confusion
@srl295 srl295 force-pushed the kbd/cldr-17202/bengali branch from 183f086 to 44bb8f0 Compare November 30, 2023 01:18
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295
Member Author

srl295 commented Nov 30, 2023

@miloush and @mhosken, I've attempted to document the reorder rules.

- document the reorders
@srl295 srl295 force-pushed the kbd/cldr-17202/bengali branch from 8808917 to c6dd369 Compare November 30, 2023 23:17
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • keyboards/3.0/bn.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 force-pushed the kbd/cldr-17202/bengali branch from d86e6d0 to c6dd369 Compare November 30, 2023 23:47
@srl295
Member Author

srl295 commented Nov 30, 2023

I'm tracking a build issue that prevents kbd-check from working in keymanapp/keyman#10111.

@miloush
Contributor

miloush commented Jan 8, 2024

Did we say the processing turns everything into NFD? Are the reordering matches done before or after that? Or are the match expressions also normalized? I ask because U+09CB O and U+09CC AU, which are listed under the right-side vowels, decompose into a left-side plus a right-side vowel; in the AU case the right-side part is U+09D7 AU LENGTH MARK, which is not part of the rules.

We don't care about characters not entered through the keyboard (like using Alt+Numpad), correct? (especially thinking about Vedic marks)

@srl295
Member Author

srl295 commented Jan 8, 2024

I'm working on a document for review on normalization https://unicode-org.atlassian.net/browse/CLDR-17192

> Did we say the processing turns everything into NFD?

We will say that the match acts as if it's in NFD.

> Are the reordering matches done before or after that?

Matches will be in NFD.

> Or are the match expressions also normalized?

And the expressions will be normalized as well.

> Because 9CB O/9CC AU which are under right side vowels decompose into left+right side vowels, in the latter case the right one being 09D7 AU LENGTH MARK which is not part of the rules.

I think I removed 09D7 earlier because of normalization, but I should re-add it for this reason.
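The decomposition described above can be confirmed with Python's standard `unicodedata` module: under NFD, VOWEL SIGN O (U+09CB) becomes VOWEL SIGN E + VOWEL SIGN AA, and VOWEL SIGN AU (U+09CC) becomes VOWEL SIGN E + AU LENGTH MARK. If matching happens in NFD, as proposed here, reorder rules will see U+09D7 even when no key emits it directly. `nfd_code_points` is a hypothetical helper for display only.

```python
import unicodedata


def nfd_code_points(text: str) -> list[str]:
    """List the code points of text after NFD, formatted as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    for name, sign in [("VOWEL SIGN O", "\u09CB"), ("VOWEL SIGN AU", "\u09CC")]:
        print(name, "->", nfd_code_points(sign))
    # VOWEL SIGN O -> ['U+09C7', 'U+09BE']
    # VOWEL SIGN AU -> ['U+09C7', 'U+09D7']
```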

> We don't care about characters not entered through the keyboard (like using Alt+Numpad), correct? (especially thinking about Vedic marks)

The reorder rules (being script-specific) are eventually supposed to be imported data, separate from the keyboard proper. Beyond that, someone might enter something with Alt+Numpad and then click and type, just as I could type e, click after it, and then type a combining grave. I don't think the transform rules per se need to handle everything in the script, but they should handle everything the language is likely to encounter. The reorder rules should (though perhaps not in this keyboard) handle everything in the script.

@srl295
Member Author

srl295 commented Jan 8, 2024

@miloush PTAL, U+09D7 is now included in the reorders
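To see why U+09D7 needs to be listed, here is a toy reorder pass in Python: it normalizes to NFD, then stable-sorts each run of listed marks by weight. The weights in `ORDER` are hypothetical (they are not the values in bn.xml), and this is far simpler than the LDML `<reorder>` mechanism. It only illustrates that a mark missing from the table ends the run and gets stranded after a candrabindu typed before the vowel.

```python
import unicodedata

# Hypothetical weights: dependent vowel signs sort before candrabindu.
# U+09CB and U+09CC are omitted because NFD always decomposes them.
VOWEL_SIGNS = "\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8"
ORDER = {ch: 1 for ch in VOWEL_SIGNS}
ORDER["\u0981"] = 2  # SIGN CANDRABINDU


def reorder(text: str, order: dict[str, int]) -> str:
    """Stable-sort each run of characters found in `order` by their weight."""
    out, run = [], []
    for ch in unicodedata.normalize("NFD", text):
        if ch in order:
            run.append(ch)
        else:
            # An unlisted character ends the current run of marks.
            out.extend(sorted(run, key=order.__getitem__))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=order.__getitem__))
    return "".join(out)


if __name__ == "__main__":
    typed = "\u0995\u0981\u09CC"  # KA, CANDRABINDU, then VOWEL SIGN AU
    without = reorder(typed, ORDER)
    with_mark = reorder(typed, {**ORDER, "\u09D7": 1})  # list AU LENGTH MARK too
    print([f"U+{ord(c):04X}" for c in without])
    # ['U+0995', 'U+09C7', 'U+0981', 'U+09D7']  (length mark stranded)
    print([f"U+{ord(c):04X}" for c in with_mark])
    # ['U+0995', 'U+09C7', 'U+09D7', 'U+0981']
```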

@srl295
Member Author

srl295 commented Jan 12, 2024

@miloush OK to merge?

@srl295 srl295 merged commit 56267a4 into unicode-org:main Jan 12, 2024
@srl295 srl295 deleted the kbd/cldr-17202/bengali branch January 12, 2024 16:20
@srl295
Member Author

srl295 commented Jan 12, 2024

@miloush thanks!
