
vmprofshow breaks on multiline dictionary comprehensions #118

Closed
jogo opened this issue Feb 17, 2017 · 5 comments · Fixed by #119

Comments

jogo (Contributor) commented Feb 17, 2017

Using CPython 2.7

test.py:

def foo():
    a = {
        i: i**i
        for i in range(10000)}
    print "END"

foo()
$ python -m vmprof --lines -o test.prof test.py
END
$ vmprofshow --lines test.prof
Total hits: 1480 s
File: test.py
Function: <dictcomp> at line 3
Traceback (most recent call last):
  File "/usr/local/bin/vmprofshow", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 159, in main
    pp.show(args.profile)
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 182, in show
    self.show_func(filename, funline, funname, line_stats)
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 234, in show_func
    sublines = inspect.getblock(all_lines[start_lineno-1:])
  File "/usr/lib/python2.7/inspect.py", line 677, in getblock
    tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
  File "/usr/lib/python2.7/tokenize.py", line 169, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/usr/lib/python2.7/tokenize.py", line 175, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/usr/lib/python2.7/tokenize.py", line 357, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (6, 0))

Specifically, the line i: i**i takes the most time. But when the tokenize error is handled the same way as a missing file, I get:

Total hits: 1480 s
File: test.py
Function: <dictcomp> at line 3

Line #     Hits   % Hits  Line Contents
=======================================
     3        1      0.1
     4     1479     99.9

Total hits: 1481 s
File: test.py
Function: <module> at line 1

Line #     Hits   % Hits  Line Contents
=======================================
     1                    def foo():
     2                        a = {
     3                            i: i**i
     4                            for i in range(10000)}
     5                        print "END"

Total hits: 2 s

Could not find file -
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #     Hits   % Hits  Line Contents
=======================================
     0        2    100.0

Total hits: 1480 s
File: test.py
Function: foo at line 1

Line #     Hits   % Hits  Line Contents
=======================================
     1                    def foo():
     2                        a = {
     3                            i: i**i
     4     1480    100.0          for i in range(10000)}
     5                        print "END"

Total hits: 1 s

I tried playing around with inspect.getblock on the dictionary comprehension by hand and was unable to get it to work properly.

So I am not sure that multiline dictionary comprehensions are measured correctly per line.

At the very least, we can prevent vmprofshow --lines test.prof from crashing by handling the tokenize error in vmprof.show.
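
Roughly, the guard around the inspect.getblock call in LinesPrinter.show_func could look like this (a sketch only, not the exact change):

import inspect
import tokenize

# Inside show_func; all_lines and start_lineno come from the surrounding code.
try:
    sublines = inspect.getblock(all_lines[start_lineno - 1:])
except tokenize.TokenError:
    # e.g. a <dictcomp> whose opening "{" sits on an earlier line
    sublines = None  # fall back, as in the missing-file case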

jogo (Contributor, Author) commented Feb 22, 2017

diff --git a/vmprof/show.py b/vmprof/show.py
index 68d1701..e35a456 100644
--- a/vmprof/show.py
+++ b/vmprof/show.py
@@ -224,7 +224,7 @@ class LinesPrinter(object):
             return

         stream.write("Total hits: %g s\n" % total_hits)
-        if os.path.exists(filename) or filename.startswith("<ipython-input-"):
+        if (os.path.exists(filename) or filename.startswith("<ipython-input-")) and func_name != "<dictcomp>":
             stream.write("File: %s\n" % filename)
             stream.write("Function: %s at line %s\n" % (func_name, start_lineno))
             if os.path.exists(filename):

Fixes the issue for me, although there may be a better way.

planrich (Contributor) commented

I think your patch just filters out dict comprehensions; the error points to the following issue:

inspect.getblock rightfully raises a tokenize error because, in the example you provided, it cannot parse a full Python block: it is missing the opening bracket { of the dict comprehension.

This means that multiline list comprehensions are also broken.

I think a better fix would be to parse the whole file, iterate over each syntax element, and check whether startline <= line <= endline, where line is the line you want to show.
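
Sketched on Python 3.8+, where ast statement nodes carry both lineno and end_lineno (they did not at the time of this issue), the idea would look roughly like this; enclosing_block and SRC are illustrative names:

import ast

SRC = '''\
def foo():
    a = {
        i: i**i
        for i in range(10000)}
    print("END")
'''

def enclosing_block(source, line):
    # Smallest statement whose (lineno, end_lineno) span contains `line`.
    best = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt) and node.lineno <= line <= node.end_lineno:
            if best is None or node.end_lineno - node.lineno < best[1] - best[0]:
                best = (node.lineno, node.end_lineno)
    return best

print(enclosing_block(SRC, 3))  # (2, 4): the whole dict-comprehension statement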

jogo (Contributor, Author) commented Feb 22, 2017

I tested a multi-line list comprehension and it appears to work:

Total hits: 747 s
File: test.py
Function: foo at line 7

Line #     Hits   % Hits  Line Contents
=======================================
     7                    def foo():
     8                        a = [
     9                            i**i
    10      747    100.0          for i in range(10000)]
    11                        print "END"

And poking at inspect.getblock by hand on the dict-comprehension case:

>>> import linecache, inspect
>>> all_lines = linecache.getlines('test.py')
>>> all_lines[5:]
['\n', 'def foo():\n', '    a = {\n', '        i: i**i\n', '        for i in range(10000)}\n', '    print "END"\n', '\n', 'foo()\n', 'bar()\n']
>>> all_lines[7:]
['    a = {\n', '        i: i**i\n', '        for i in range(10000)}\n', '    print "END"\n', '\n', 'foo()\n', 'bar()\n']
>>> inspect.getblock(all_lines[7:])
['    a = {\n']
>>> inspect.getblock(all_lines[8:])
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    inspect.getblock(all_lines[8:])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 677, in getblock
    tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 170, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 176, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 363, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
TokenError: ('EOF in multi-line statement', (7, 0))

I am not sure how to get the endline, but here is a pull request that I think is close to what you describe: #119
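
For what it's worth, one rough way to approximate an end line, even on Python 2, is to take the max lineno among a node's descendants. approx_end_lineno below is only an illustrative sketch (not what #119 does), and it misses closing brackets that sit alone on a line:

import ast

def approx_end_lineno(node):
    # ast.walk yields the node itself first, then all of its descendants.
    return max(n.lineno for n in ast.walk(node) if hasattr(n, 'lineno'))

# For the Assign statement at line 2 of test.py, the comprehension's
# "for i in range(10000)" puts nodes on line 4, so this returns 4.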

planrich (Contributor) commented

Yes, that looks better. I tried to use the ast module to find the lines by walking the graph nodes. The reason I failed is that there is no endlineno attribute on those nodes, and I don't see an easy way to compute those.

As I see it now, this gives a better result than before, so let's merge it.

jogo (Contributor, Author) commented Feb 23, 2017

@planrich thanks!
