Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark refactor - argparse CLI #533

merged 9 commits into from Apr 25, 2022


Copy link

@Erotemic Erotemic commented Apr 21, 2022

@bwoodsend This is a paired down version of #532 that is ready for review / merging. There are two main features added here:

  1. Uses the standard library's cpuinfo module to print CPU details when reporting the results of the benchmarks. This will help ensure that users don't compare incomparable data and can serve as a reminder for what benchmarks can be compared against each other. (whoops)

  2. Adds an argparse CLI so the user can specify specific modules to disable. I personally don't care about the other json modules that aren't a drop-in replacement for Python's json module. Also the user can specify a --factor as a fraction to speed up the benchmarks for easier hacking.

To add these two features I had to do a little bit of refactoring. I wanted to make it easy to add / remove any new implementation that might come onto the scene, so I registered each benchmark with it's associated library name using a decorator and am passing the list of library names that the user requested to test to the benchmark functions. (I was very tempted to factor out the global variables but I restrained myself).

Speaking of global variables, I made it such that the external libraries aren't imported until the script knows they are needed. I use importlib to implement this as a loop, and then I set those modules as global variables so the rest of the benchmark structure can work without modification.

Lastly, I added a note about what units the benchmarks were in, because I always confuse myself with that.

Copy link

codecov-commenter commented Apr 21, 2022

Codecov Report

Merging #533 (052add4) into main (b3f8754) will not change coverage.
The diff coverage is n/a.

Current head 052add4 differs from pull request most recent head 3d25fb4. Consider uploading reports for the commit 3d25fb4 to get more accurate results

@@           Coverage Diff           @@
##             main     #533   +/-   ##
  Coverage   91.75%   91.75%           
  Files           6        6           
  Lines        1819     1819           
  Hits         1669     1669           
  Misses        150      150           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3f8754...3d25fb4. Read the comment docs.

Copy link

Uses the standard library's cpuinfo module to print CPU details when reporting the results of the benchmarks.

> python -VV
Python 3.10.1
> python -c 'import cpuinfo'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cpuinfo'

Are you sure it's standard lib?

Copy link
Contributor Author

I suppose I was mistaken. It looks like cpuinfo is coming from torch. It's not important enough to add a dependency. I'll remove that.

@Erotemic Erotemic changed the title Benchmark refactor - CPU info and argparse CLI Benchmark refactor - argparse CLI Apr 21, 2022
Copy link

@bwoodsend bwoodsend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a marginally nicer way to eliminate the global LIBRARIES variable would be to have a class which takes libraries as a parameter and stores it as an attribute, then make all the functions that now take a libraries parameter methods of that class instead. But only marginally nicer. Honestly, if I were that allergic to OOP-destroying global variables then I would have abandoned Python altogether in favour pedantically object oriented, but otherwise underwhelming at every turn, Java 🙃 .

tests/ Outdated Show resolved Hide resolved
Copy link
Contributor Author

I was considering a class that stored the parameters as an alternative. But as marginally nice as that is, this script isn't run as part of a larger system, so the globals only bother me on an aesthetic level, they don't have any major down side that I can see.

When I get around to working on #532 go get nice graphs for the benchmarks (and hopefully t-tests to indicate when there is a high probability regression), I'll be rewriting most of this, so I figured I'll just keep the changes here minimal.

@bwoodsend bwoodsend requested a review from hugovk April 23, 2022 10:13
hugovk approved these changes Apr 25, 2022
Copy link

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


@hugovk hugovk added the changelog: Changed For changes in existing functionality label Apr 25, 2022
@hugovk hugovk merged commit a900e46 into ultrajson:main Apr 25, 2022
24 checks passed
Copy link

hugovk commented Apr 25, 2022

Oops, this failed on the CI:

Please could you check it?

Let's also add this to run the benchmark on PRs changing the benchmark script or workflow file:

      - main
      - ".github/workflows/benchmark.yml"
      - "tests/"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
changelog: Changed For changes in existing functionality
None yet

Successfully merging this pull request may close these issues.

None yet

4 participants