Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace codespell with pyspelling #663

Merged
merged 17 commits into from Jun 1, 2022

Conversation

strickvl
Copy link
Contributor

@strickvl strickvl commented May 30, 2022

TL;DR Quick Summary:

  • Our old automated spellchecker wasn’t actually checking our spelling, so I switched it out and now we’re using pyspelling
  • It’s a bit more aggressive than codespell
  • If it’s claiming a word is an error but it’s a legitimate (correctly spelled) word, then you can add that word as a new line to our canonical list of ignored words: .pyspelling-ignore-words

We use pyspelling at ZenML to guard against typos and other non-standard uses of words being used in our codebase. (We previously used codespell for this but it only checked against a known list of typos and therefore missed lots of errors).

It takes one or two minutes to run apyspelling check over the entire codebase, unfortunately, so this spellcheck only runs at the Github Action CI level by default. (In other words, it is no longer part of our pre-commit checks.)

If you wish to run the spellchecks manually on your local machine, first install aspell (using brew or apt-get), run poetry update to get the relevant dependency and then simply use:

poetry run bash scripts/check-spelling.sh

Note that there are some quirks / less-than-ideal realities of working with pyspelling:

  • it is a LOT more aggressive about finding typos than codespell was
  • it isn’t particularly smart about knowing whether a particular term is misspelled vs it just hasn’t seen it before. In particular, referencing ZenML specific class or variable names often causes it to detect failures.
  • it is currently set up to check
    • all our Python files (mainly checking comments and docstring text)
    • all our .md markdown files
  • Technically it is supposed to ignore URLs and code inside triple ``` backticks, but this doesn't always work perfectly, it seems.

⚠️ What do do when spelling errors are found?

When spelling errors are detected, it won’t suggest an alternative (as codespell did) so you’ll have to examine your text to see whether it’s a legitimate failure or not. If you find a word that is legitimate and not a typo, you will need to add it to our list of ignored words: .pyspelling-ignore-words. As you’ll note, this file is already fairly long. Just add your word onto a new line anywhere. If you want to sort it at the end, you can run sort -o .pyspelling-ignore-words .pyspelling-ignore-words on a *nix machine but no need to do that for pyspelling to work.

June 2022 UPDATE: pyspelling is currently only added on the zenml repo, but I’ll likely add it to all the other key repos (ZenBytes, ZenFiles, Blog) in due course once we’ve ironed out any issues.

(This note is currently captured in Notion as well, for future reference).

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table.
  • I have added tests to cover my changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@strickvl strickvl marked this pull request as ready for review May 30, 2022 13:30
@github-actions github-actions bot added internal To filter out internal PRs and issues bug Something isn't working labels May 30, 2022
@zenml-io zenml-io deleted a comment from review-notebook-app bot May 31, 2022
Copy link
Collaborator

@schustmi schustmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@AlexejPenner AlexejPenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Luuks Gud tu mi!

@strickvl strickvl merged commit 8f91f5f into develop Jun 1, 2022
@strickvl strickvl deleted the bugfix/ENG-777-spellcheck-codespell branch June 1, 2022 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working internal To filter out internal PRs and issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants