Skip to content

[CI] fix spelling mistakes in the codebase by adding codespell #2550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jun 18, 2025

Conversation

icfaust
Copy link
Contributor

@icfaust icfaust commented Jun 16, 2025

Description

Root out all the misspellings in the codebase by using codespell (https://github.com/codespell-project/codespell) a tool currently also in use in oneTBB. This can be simply added as a pre-commit hook along with our other linters. This should improve the quality of the codebase and docs at little cost (5 seconds or less). This will improve the overall professional quality we show externally.

This PR includes fixes for all hits which were found (73 files).

No performance benchmarks necessary as it is a lint.


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

Copy link

codecov bot commented Jun 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Flag Coverage Δ
azure 79.84% <ø> (ø)
github 73.60% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
onedal/cluster/dbscan.cpp 39.13% <ø> (ø)
onedal/datatypes/_data_conversion.py 90.62% <ø> (ø)
onedal/datatypes/dlpack/dtype_conversion.cpp 79.66% <100.00%> (ø)
onedal/datatypes/sycl_usm/data_conversion.cpp 41.86% <ø> (ø)
onedal/datatypes/sycl_usm/sycl_usm_utils.cpp 48.35% <ø> (ø)
onedal/datatypes/tests/common.py 92.15% <ø> (ø)
onedal/ensemble/forest.py 73.49% <ø> (ø)
sklearnex/_device_offload.py 78.75% <ø> (ø)
sklearnex/base.py 75.75% <ø> (ø)
sklearnex/linear_model/incremental_ridge.py 86.66% <ø> (ø)
... and 2 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@icfaust icfaust marked this pull request as ready for review June 16, 2025 08:00
@icfaust icfaust changed the title [test, CI] fix spelling mistakes in codebase [CI] fix spelling mistakes in codebase by adding codespell Jun 16, 2025
@icfaust icfaust changed the title [CI] fix spelling mistakes in codebase by adding codespell [CI] fix spelling mistakes in the codebase by adding codespell Jun 16, 2025
alg = daal4py.dbscan(
method="defaultDense",
fptype=fpt,
fptype=getFPType(XX),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icfaust Could this be made as an optional check producing some report like the code coverage ones? Otherwise, I don't think it'd be helpful to require changes like this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could only find this: https://stackoverflow.com/questions/75500530/how-to-avoid-disabling-a-pre-commit-hook so it looks to not be possible. Let me know why you think not fixing these issues is considered better (given this will reduce review burden). If something is truly desired there are ways to in a case-by-case basis to disable the check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this specific case of 'fpt' disable the check as an example here? Could be a good test for the CI logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a file under .github/.codespellrc which can deselect words for the whole codebase, the whole list is currently : als,coo,inout,nd,ons. I have changed that specific circumstance to an inline ignore to show the case-by-case basis as was desired.

@david-cortes-intel
Copy link
Contributor

@icfaust Since this appears to take a lot longer to run than the usual linting, could it be moved into a separate job?

@icfaust
Copy link
Contributor Author

icfaust commented Jun 16, 2025

@david-cortes-intel
Copy link
Contributor

@icfaust Since this appears to take a lot longer to run than the usual linting, could it be moved into a separate job?

The last 5 merges to main #2544 58s (https://dev.azure.com/daal/daal4py/_build/results?buildId=47635&view=results) #2547 54s (https://dev.azure.com/daal/daal4py/_build/results?buildId=47744&view=results) #2514 1m 0s (https://dev.azure.com/daal/daal4py/_build/results?buildId=47746&view=results) #2530 1m 2s (https://dev.azure.com/daal/daal4py/_build/results?buildId=47747&view=results) #2521 1m 1s (https://dev.azure.com/daal/daal4py/_build/results?buildId=47750&view=results)

The run for this PR: 54s https://dev.azure.com/daal/daal4py/_build/results?buildId=47748&view=results

I don't understand? Could you link me to where you were looking?

Yes, seems it's fine - I might have been looking at the waiting time which was longer.

@icfaust
Copy link
Contributor Author

icfaust commented Jun 16, 2025

gotcha. it seems that a bunch of PRs were merged today to oneDAL and sklearnex all at once so wait times for runners went up quite a lot

@napetrov napetrov requested a review from Copilot June 16, 2025 15:40
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR integrates codespell into the repository to automatically fix spelling mistakes across the codebase and documentation. The fixes improve text clarity and enhance the professional quality of the project.

  • Corrects various misspellings in comments and documentation.
  • Adds a pre-commit hook configuration for codespell.
  • Provides a configuration file for codespell.

Reviewed Changes

Copilot reviewed 75 out of 75 changed files in this pull request and generated no comments.

Show a summary per file
File Description
examples/daal4py/decision_forest_classification_default_dense.py Corrects spelling in comments (e.g. from "Traiing" to "Trailing")
examples/daal4py/covariance.py Fixes comment typo ("ourselfs" to "ourselves").
examples/daal4py/association_rules.py Corrects multiple misspellings in comments.
doc/sources/quick-start.rst Fixes typo in documentation text.
doc/sources/kaggle/regression.rst Corrects misspellings ("paramters" to "parameters").
doc/sources/kaggle/classification.rst Corrects multiple spelling errors in steps description.
doc/sources/kaggle/automl.rst Fixes spelling mistakes in the kernel example documentation.
doc/sources/ideas.rst Corrects typos and improves phrasing in project ideas.
doc/sources/daal4py.rst Fixes comment typos in code example references.
doc/sources/array_api.rst Corrects typo ("witout" to "without") in documentation.
doc/sources/about_daal4py.rst Fixes spelling mistakes in several commentary sections.
deselected_tests.yaml Corrects comment typo regarding tree count behavior.
daal4py/sklearn/utils/validation.py Fixes typos in comment descriptions.
daal4py/sklearn/tree/decision_tree.py Fixes a comment typo ("corrsponding" to "corresponding").
daal4py/sklearn/svm/svm.py Corrects spelling in error message text.
daal4py/sklearn/manifold/_t_sne.py Fixes typo in exception message ("suported" to "supported").
daal4py/sklearn/cluster/dbscan.py Adds codespell ignore comments for variable naming.
.pre-commit-config.yaml Adds codespell hook configuration.
.github/scripts/doc_release.sh Corrects typo in comment ("verison" to "version").
.github/.codespellrc New configuration file for codespell.
Comments suppressed due to low confidence (1)

examples/daal4py/decision_forest_classification_default_dense.py:50

  • The comment now uses 'Trailing' which might be a typo. If this comment is intended to refer to the result produced by a training algorithm, consider changing 'Trailing' to 'Training' for better clarity.
# Trailing result provides (depending on parameters) model,

@napetrov
Copy link
Contributor

/intelci: run

Copy link
Contributor

@ethanglaser ethanglaser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to add this, thanks for doing it! On commented stuff - excellent. On variable names, I am fine to leave or keep as is. On functions and args - we should be careful but looks like theyre used internally here. But I don't think changing inpt->inp is necessary anyway.

@@ -972,7 +972,7 @@ def prepare_hlwrapper(self, ns, mode, func, no_dist, no_stream):
itype,
"const",
dflt,
inpt=True,
inp=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so sure about this one

@icfaust
Copy link
Contributor Author

icfaust commented Jun 18, 2025

/intelci: run

@icfaust icfaust merged commit ed954c1 into uxlfoundation:main Jun 18, 2025
31 of 32 checks passed
@icfaust icfaust deleted the codespell branch June 18, 2025 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants