
Convert translation maps to UTF-8. #2942

Merged
demiankatz merged 1 commit into vufind-org:dev on Jun 21, 2023


demiankatz (Member) commented Jun 15, 2023

The recent upgrade to SolrMarc changed its expectations: translation maps are now interpreted as UTF-8 instead of Latin-1. This PR converts our Latin-1 translation maps to UTF-8. Several of our existing translation maps were already UTF-8, which likely means that the older version of SolrMarc was interpreting them incorrectly. This is clearly a step in the right direction, since we use UTF-8 everywhere else.
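To illustrate why maps that were already UTF-8 were likely misread, here is a small Python sketch (an illustration only, not part of this PR) that encodes a name containing a non-ASCII character both ways and decodes each with the other charset. UTF-8 bytes read as Latin-1 never raise an error but silently turn into mojibake, while Latin-1 bytes read as UTF-8 fail to decode.

```python
# Illustrative value only; not copied from a specific VuFind translation map.
name = "Côte d'Ivoire"

utf8_bytes = name.encode("utf-8")      # how a UTF-8 map stores the name
latin1_bytes = name.encode("latin-1")  # how a Latin-1 map stores the name

# A UTF-8 file read as Latin-1 never fails, but the accented letter
# becomes two unrelated characters (mojibake).
misread_as_latin1 = utf8_bytes.decode("latin-1")

# A Latin-1 file read as UTF-8 fails outright on the non-ASCII byte.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(misread_as_latin1)     # CÃ´te d'Ivoire
print(latin1_is_valid_utf8)  # False
```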

TODO

  • Add a changelog note reminding users to convert the encoding of their local translation maps to UTF-8.
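A minimal sketch of the local conversion that the changelog note could describe, in Python, assuming the local maps are `.properties` files collected in a single directory. The function names and glob pattern are hypothetical, not VuFind or SolrMarc tooling. It leaves any file that already decodes as UTF-8 untouched and re-encodes the rest from Latin-1.

```python
from pathlib import Path


def convert_map_to_utf8(path: Path) -> bool:
    """Re-encode one translation map from Latin-1 to UTF-8.

    Returns True if the file was rewritten, or False if it already decoded
    cleanly as UTF-8 (pure ASCII files also pass) and was left untouched.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Latin-1 assigns a character to every byte value, so this never fails.
    text = raw.decode("latin-1")
    # Work with raw bytes on both sides so line endings are preserved as-is.
    path.write_bytes(text.encode("utf-8"))
    return True


def convert_directory(directory: Path, pattern: str = "*.properties") -> list[Path]:
    """Convert every file matching `pattern`; return the paths that changed."""
    return [p for p in sorted(directory.glob(pattern)) if convert_map_to_utf8(p)]
```

Point `convert_directory` at whatever directory holds your local maps. The UTF-8 check is only a heuristic: a Latin-1 file whose accented bytes happen to form valid UTF-8 sequences would be skipped, so spot-checking non-ASCII entries after conversion is still worthwhile.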

mtrojan-ub (Contributor) commented

I think the encoding change itself is alright, but if most of the countries are listed by their English names, wouldn't it be more appropriate to also list countries like Côte d'Ivoire by their English name (Ivory Coast)? (This is just a minor question, though; feel free to merge anyway if you disagree.)

demiankatz (Member, Author) commented

@mtrojan-ub, I'll merge this as-is, since I want to keep the encoding changes in a separate commit from any content changes to avoid confusion, but I'm open to a follow-up PR if we can agree on an approach.

I'm not sure where the country names were originally sourced, or what rules or patterns were applied to them. I'd rather follow an established standard than make arbitrary decisions, so that it's easy to know what to do in the future. As you may have seen, #2933 proposes aligning our language names with the MARC standard. I haven't looked into it in detail, but I wonder if we could do the same for country codes, for consistency. If no MARC standard applies here, we might be able to find an official ISO list and compare it against what we currently have. I'm curious if @damien-git has any opinions about this!
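Comparing the current country map against a reference list (MARC or ISO) could be sketched in Python as below. This assumes the reference list has already been loaded into a dict of code to name; the parser is deliberately simplified and handles only plain `key = value` lines, and both function names are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Minimal `key = value` parser; skips blank lines and # or ! comments.

    Simplification: ignores line continuations, escapes, and the ':' or
    whitespace separators that the full .properties format also allows.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def compare_to_reference(local: dict[str, str], reference: dict[str, str]) -> dict[str, list]:
    """Report codes whose names differ, plus codes present on only one side."""
    return {
        "different": sorted(
            (code, local[code], reference[code])
            for code in local.keys() & reference.keys()
            if local[code] != reference[code]
        ),
        "missing_locally": sorted(reference.keys() - local.keys()),
        "not_in_reference": sorted(local.keys() - reference.keys()),
    }
```

For example, with placeholder codes (not real MARC or ISO codes), a local entry `x1 = Ivory Coast` checked against a reference entry `"x1": "Côte d'Ivoire"` would show up under `"different"`, making each deviation from the standard an explicit, reviewable item.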

demiankatz merged commit 0b7441b into vufind-org:dev on Jun 21, 2023 (7 checks passed)
demiankatz deleted the utf8-translation-maps branch on Jun 21, 2023, 10:52
damien-git (Contributor) commented Jun 28, 2023

@demiankatz I think it makes sense to use the titles in the MARC standard for country codes. But it does look inconsistent...

demiankatz (Member, Author) commented

Thanks, @damien-git. I agree that for now we should stick to the standard unless there's a really strong reason to deviate. Nothing prevents local customization if preferences differ, of course, but sticking with the standard simplifies long-term maintenance by saving us from repeating the same decisions and discussions whenever an update comes along.

bpalme pushed a commit to bpalme/vufind that referenced this pull request Aug 5, 2023
4 participants