Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for exporting and importing dictionaries database. #187

Merged
merged 1 commit into from
Aug 15, 2023

Conversation

forsakeninfinity
Copy link

@forsakeninfinity forsakeninfinity commented Jul 8, 2023

It's super annoying to have to import dictionaries one at a time every time you move across browsers or devices. This change adds an experimental mechanism to export and import the entire database of dictionaries so that users have to deal with only one source instead of tracking tens of different dictionaries separately when migrating.

This is related to, but not exactly the same as the open pull request: #115

I see value in keeping the approach of having the source extension send messages to the Yomitan extension in that pull request as a separate option as well though that may be just confusing if we merge this change (and I think the user experience is superior with this approach). However, this change takes on an additional dependency which may or not be up your alley.

It sort of addresses #128 but not entirely. I think there's value to having the actual settings and the dictionaries database exports remain different. The major annoyance for migration is in having to import each dictionary one-at-a-time. Otherwise it is actually useful to be able to import configurations separately from the dictionary data.

Lastly, no schema check is performed with this change either. I am not sure what exactly you have in mind in terms of a schema check. It is relatively easy to ensure that if a database does exist then whatever is imported MUST have the same database schema. I envision the primary use case being to recreate the database entirely, or to do an initial import so I am not sure how useful such a check actually is (might even get annoying if we change the schema in the future and people get blocked from importing newer exports). If you want to parse a schema from elsewhere then it'll be more work. Either way, this change doesn't address it. I can follow-up with a change later in the future if schema validation is deemed important enough.

See images for how it looks in practice. I prefer actual numbers over the progress bar thing (which is also clunky in the codebase) so I just picked a black background with readable text colors instead. (The import error occurs if the file being imported gets deleted from the system before the import is complete.)

import_progress
import_complete
export_progress
import_error

@djahandarie
Copy link
Collaborator

Nice, thanks for the PR. What would be the envisioned workflow if someone wanted to migration from yomichan (which doesn't have any of these changes) to yomitan?

Regarding schema validation, I was trying to refer to the fact that we need to figure out what to do if someone imports/exports across different version of the extension,when something has changed in the DB representation (either the schema itself or simply something about the data being stored in it). Probably some versioning + warnings, or versioning + auto-migration (upgrade/downgrade) logic might work

@forsakeninfinity
Copy link
Author

So, I added instructions for how to export the database from Yomichan in a different repo that I actually link to from the settings page there. We could add it in a README too, and make it part of the welcome page.

The instructions in that repository are pretty comprehensive and provide a one-liner that people can copy-paste into their console from the Yomichan settings page readily. It's a bit nastier for chrome (has to be really a really long "one-liner") because Yomichan just doesn't have the permissions to inject external code AT ALL for Chrome.

See https://github.com/forsakeninfinity/yomichan-data-exporter

@github-actions
Copy link

github-actions bot commented Jul 8, 2023

⚠️ Visual differences introduced by this PR; please validate if they are desirable.

View Playwright Report (note: open the "playwright-report" artifact)

@forsakeninfinity
Copy link
Author

That update should have fixed the linting errors fwiw. I find a couple of them, especially not allowing anonymous functions at all, pretty silly but nvm.

Regarding the schema validation, just blocking an import and raising an error is straightforward. Actual data migration is going to be much more complicated.

@djahandarie
Copy link
Collaborator

@forsakeninfinity Wow, nice. Thanks for all the work to make it a full package!

I'm a little worried about instructing people to execute unauditable remote code or one-liners in their consoles (which is why I tried to make it short and auditable in my version). If we decide to go with the Dixie version I'd like to at least make it auditable for people what is going on.

Could I request the following things?

  • [Security] Move that repo into themoeway org so that it's clear to people it's part of the project, and set some strict branch protections and other rules on it to make sure there is no chance it gets compromised (if you ask @jamesmaa to make you a member of the themoeway org, I think you should be able to transfer that repo)
  • [Security] Could you remove all the minified code committed to the yomichan-data-exporter repo, and instead make the GitHub Actions CI generate the minified code and upload it in a separate branch, so that it's more auditable? And maybe make fetching dependencies remotely (with pins) be part of the build process instead of having them vendored so we know it's the upstream version rather than a modified version. It might be good to introduce npm for the dependency tracking part of the problem.
  • [UX] Include something in the welcome page and README so it's easy for people to know
  • [UX] (Optional) Add margin: -5px or something like that on the black boxes you added, so the black box isn't so tightly stuck against the text. And honestly, I think the contrast is a bit too intense with the black box + neon colors. In general text-weight: bold and color: darkred or something like that should be sufficient for something to visually stick out, especially if it's also moving/changing like these numbers would be.

Please let me know if you have any questions or would like more justification of why any of these suggestions are important. 🙇

@forsakeninfinity
Copy link
Author

[Security] Could you remove all the minified code committed to the yomichan-data-exporter repo, and instead make the GitHub Actions CI generate the minified code and upload it in a separate branch, so that it's more auditable? And maybe make fetching dependencies remotely (with pins) be part of the build process instead of having them vendored so we know it's the upstream version rather than a modified version. It might be good to introduce npm for the dependency tracking part of the problem.

The former, sure.

For the latter part of it, I actually think it's far more secure to vendor those files on our end right? Upstream author could change what's hosted on remote on us (pulling the rug) but once we have audited a specific version and verified that we are happy with it, there is no particular reason to actually fetch it from remote where the data could change instead of having what we audited committed locally, no?

[Security] Move that repo into themoeway org so that it's clear to people it's part of the project, and set some strict branch protections and other rules on it to make sure there is no chance it gets compromised (if you ask @jamesmaa to make you a member of the themoeway org, I think you should be able to transfer that repo)

Okay. I will work with James to get the membership to the org. What sort of branch protections would you like specifically?

[UX] Include something in the welcome page and README so it's easy for people to know

Okay.

[UX] (Optional) Add margin: -5px or something like that on the black boxes you added, so the black box isn't so tightly stuck against the text. And honestly, I think the contrast is a bit too intense with the black box + neon colors. In general text-weight: bold and color: darkred or something like that should be sufficient for something to visually stick out, especially if it's also moving/changing like these numbers would be.

Okay. I wanted to use different colors for the different types of messages as a sort of hint to how things are going. I will see if I can find colors that are not as intense.


Will also address the linting issue. Is there anyway I can run this exact linter locally btw?

@Aquafina-water-bottle
Copy link

Is there anyway I can run this exact linter locally btw?

It should be specified in .github/workflows/ci.yml:

      - name: Lint
        run: npm run test-lint
        env:
          CI: true

      - name: Lint CSS
        run: npm run test-lint-css
        env:
          CI: true

      - name: Lint HTML
        run: npm run test-lint-html
        env:
          CI: true

I haven't looked too hard into the PR yet but it looks pretty solid. Thanks for all your work on it! At a glance, I also agree with Darius that the black background is a bit too much contrast, and it would be nice to stick to the existing color palette if possible.

@djahandarie
Copy link
Collaborator

For the latter part of it, I actually think it's far more secure to vendor those files on our end right? Upstream author could change what's hosted on remote on us (pulling the rug) but once we have audited a specific version and verified that we are happy with it, there is no particular reason to actually fetch it from remote where the data could change instead of having what we audited committed locally, no?

Nope. There's two stages auditing need to happen at:

  1. A user (and surrounding community) audit that yomichan-data-exporter doesn't have any exploits
  2. We (and surrounding community) audit that our upstream dependencies don't have any exploits, and ensure that fixed version is what is used

If you vendor, that makes 1 extremely hard to do because it introduces a ton of extra code that could have a subtle exploit in it, with likely no where near enough eyes actually checking it.

Of course, like you were saying, 2 is also required, but that is achievable by pinning the dependency at a specific hash of our choice using a lock file. That gives us the guarantee that it's what we audited, and gives our users the guarantee that it is not a modified version of the dependency.

What sort of branch protections would you like specifically?

It's a bit complicated so I can set them, just make sure to mark me as an admin of that repo.

@forsakeninfinity
Copy link
Author

forsakeninfinity commented Jul 15, 2023

Frankly I haven't really kept up with all the different things npmjs may have done recently for provenance but it used to be pretty awful (leftpad etc.).

Nevertheless, I did as you asked and added a workflow to autobuild a minified+combined file off of pinned dependencies and commit it to a release branch. I also transferred the repo to the TMW org but I think I don't actually have permissions to set up branch protections (although apparently the permissions for the workflows transfer over so it does autobuild and commit as expected). You may want to take a look and lock down permissions to the main branch: https://github.com/themoeway/yomichan-data-exporter/blob/main/.github/workflows/build.yml

I am making the changes to the commit in this PR itself as requested rn, will update in a bit.

@forsakeninfinity
Copy link
Author

I got rid of the black box and switched the colors for the messages to readable colors on a white background with bold weight as suggested.

Added a section in the README for migrating from Yomichan, and another section that has steps for importing and exporting dictionaries in general.

Added a section in the welcome page that talks about importing collection of dictionaries and migrating from Yomichan.

Links should all refer to TMW's repos now.

See pics.
yomichan-migration-notice
export-progress
import-progress
import-complete
import-error

It's super annoying to have to import dictionaries one at a time every
time you move across browsers or devices. This change adds an
experimental mechanism to export and import the entire database of
dictionaries so that users have to deal with only one source instead of
tracking tens of different dictionaries separately when migrating.
@forsakeninfinity
Copy link
Author

forsakeninfinity commented Jul 15, 2023

Interesting... github closes PRs if you rebase the commits lol. Should be a clean merge now with history fwiw.

Copy link
Collaborator

@djahandarie djahandarie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. And sorry for the delay in getting this merged.

@djahandarie djahandarie merged commit 83be1d6 into themoeway:master Aug 15, 2023
8 checks passed
@djahandarie djahandarie added the kind/enhancement The issue or PR is a new feature or request label Aug 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement The issue or PR is a new feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants