Refactor code for performance #46

Closed · wants to merge 1 commit

Conversation

@wiseaidev commented Apr 1, 2023

  • Refactor and remove unnecessary keys(), list(), and dict() calls.

Note: I haven't tested these changes, but things should work as expected.

@@ -61,7 +61,7 @@ def __init__(
 ):
     self.optimizer = optimizer
     self.lr_dict = lr_dict
-    self.group_names = list(self.lr_dict.keys())
+    self.group_names = [* self.lr_dict]
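(For context, both forms build the same list of keys; a minimal standalone illustration, using a hypothetical lr_dict:)

>>> lr_dict = {"backbone": 1e-4, "head": 1e-3}   # hypothetical example dict
>>> list(lr_dict.keys())
['backbone', 'head']
>>> [*lr_dict]
['backbone', 'head']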

This way compromises the readability

@wiseaidev (Author)

Yeah, there is a tradeoff between readability and performance, and it is up to the team to decide which to maximize. You may argue that you would prefer readability and maintainability over that little performance gain, and that's totally fine.

@KristinnVikarJ

There's no practical performance benefit from doing it this way.
It's trivial to benchmark the difference:

In [4]: def test1():
   ...:     return list(a.keys())
   ...: 

In [5]: def test2():
   ...:     return [*a]
   ...: 

In [6]: %timeit -n 10_000_000 test2
23.1 ns ± 0.214 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit -n 10_000_000 test1
23.1 ns ± 0.246 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

using IPython
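(For reference, a standalone version of the same comparison outside IPython, with the sample dict assumed from later in the thread and timeit invoking the functions directly, might look like:)

from timeit import timeit

a = {"a": 5, "k": 3, "asdf": 53}

def test1():
    return list(a.keys())

def test2():
    return [*a]

# timeit accepts a callable: each function is actually invoked `number` times.
print("list(a.keys()):", timeit(test1, number=1_000_000))
print("[*a]:          ", timeit(test2, number=1_000_000))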

@wiseaidev (Author) Apr 1, 2023

>>> from timeit import timeit
>>> timeit("[]")
0.09845466999831842
>>> timeit("list()")
0.22419986899876676

Also, a smaller bytecode footprint:

>>> import dis
>>> dis.dis("[]")
  1           0 BUILD_LIST               0
              2 RETURN_VALUE
>>> dis.dis("list()")
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

@KristinnVikarJ Apr 1, 2023

That is simply not the same code at all; you're comparing apples to oranges.
My benchmark shows the exact change you made, and it shows no performance improvement in the slightest.

In fact, if you run the benchmark using Python 3.11, the older method is consistently faster.

Python 3.11.2 (main, Feb  8 2023, 14:49:25) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: a = {"a": 5, "k": 3, "asdf": 53}

In [2]: def test1():
   ...:     return list(a.keys())
   ...: 

In [3]: def test2():
   ...:     return [*a]
   ...: 

In [4]: %timeit -n 10_000_000 test2
13.7 ns ± 1.16 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: %timeit -n 10_000_000 test1
13 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [6]: %timeit -n 10_000_000 test1
13 ns ± 1.22 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [7]: %timeit -n 10_000_000 test2
14.1 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [12]: dis.dis(test1)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + list)
             14 LOAD_GLOBAL              2 (a)
             26 LOAD_METHOD              2 (keys)
             48 PRECALL                  0
             52 CALL                     0
             62 PRECALL                  1
             66 CALL                     1
             76 RETURN_VALUE

In [13]: dis.dis(test2)
  1           0 RESUME                   0

  2           2 BUILD_LIST               0
              4 LOAD_GLOBAL              0 (a)
             16 LIST_EXTEND              1
             18 RETURN_VALUE

While test2 has a smaller bytecode output, it is still slower than test1. Not all opcodes are equally fast.
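(The disassembly above is from Python 3.11; the same comparison can be reproduced on any interpreter with a short sketch like the one below, where the output will differ by version:)

import dis

a = {"a": 5, "k": 3, "asdf": 53}

# Disassemble both forms under the running interpreter; opcode names and
# counts differ between versions (e.g. 3.9 vs 3.11).
dis.dis(lambda: list(a.keys()))
dis.dis(lambda: [*a])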

@wiseaidev (Author) Apr 1, 2023

It seems like Python 3.11 introduced a lot of optimizations that now apply by default under the hood. On Python 3.9, I am getting these results:

>>> timeit('[* {"a": 5, "k": 3, "asdf": 53}]')
0.7558078520014533
>>> timeit('list({"a": 5, "k": 3, "asdf": 53}.keys())')
1.142674779999652
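(For comparison, the dict can also be built once in timeit's setup argument so that only the key extraction itself is timed; a minimal sketch:)

from timeit import timeit

# Build the dict once in setup so only the key extraction is measured.
setup = 'd = {"a": 5, "k": 3, "asdf": 53}'

print("[*d]:          ", timeit("[*d]", setup=setup))
print("list(d.keys()):", timeit("list(d.keys())", setup=setup))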

@KristinnVikarJ Apr 1, 2023

Even if the change did have a minor performance benefit, none of these code paths seem to be hot, so these nano-optimizations gain nothing in practice.

@jjh42 commented Apr 1, 2023

Appreciate the interest and contribution. As we are in the process of moving to Python 3.11, I don't think this will bring any performance benefit, so I'm closing.

@jjh42 closed this Apr 1, 2023
@no-identd mentioned this pull request Apr 1, 2023