Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PuppetManifest: ctags command never finishes on long puppet file #1912

Closed
ahakanbaba opened this issue Oct 8, 2018 · 17 comments
Closed

PuppetManifest: ctags command never finishes on long puppet file #1912

ahakanbaba opened this issue Oct 8, 2018 · 17 comments
Assignees

Comments

@ahakanbaba
Copy link
Contributor

I have a ~40K line long puppet file. ctags never finishes processing it. (I left it at lest 3-4 hours). Ctags is stuck at 100% cpu.

The example.pp file is attached.
example.pp.zip

The file looks like this


class some::name::path::segment::list {

  ::some::custom::resource { 'some_name':
    param_1              => 'cccccccccccccccccccc',
    param_2              => "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd",
    param_3              => 'eeeeeeeeeeeeeeeeeeeeeeeeeeeee',
    param_4              => 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk',
    param_5              => 'z',
    param_6              => true,
    param_7              => '"pppppppppppppppppppppppppp"',
    param_8              => 'jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj',
  }

........ repeat enough times to have 40000 lines 

  }
}

A half certain observation: If I remove all the params from all the declarations (param_1 to param_8) ctags does indeed complete. ( The owner of the issue needs to reproduce this. I think I am correct but It is worth to retry this one more time)

The ctags process looks like this in my mac's top -o cpu` output

PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT MEM    PURG   CMPRS  PGRP  PPID  STATE    BOOSTS          %CPU_ME %CPU_OTHRS UID       FAULTS    COW     MSGSENT    MSGRECV    SYSBSD      SYSMACH    CSW        PAGEINS IDLEW    POWE
44256  ctags        99.4 08:01.77 1/1   0    12    3960K  0B     0B     44256 23487 running  *0[1]           0.00000 0.00000    980771313 2090      117     66         25         702         95         139624+    128     0        99.4

example.pp is indeed valid puppet file

$ /opt/puppetlabs/bin/puppet apply --noop example.pp
Notice: Compiled catalog for mbp-005063.ad.whirl.net in environment production in 3.31 seconds
Notice: Applied catalog in 0.14 seconds

The name of the parser: puppetManifest

The command line you used to run ctags:

./ctags --options=NONE  ~/Downloads/example.pp

The content of input file:
see above

The tags output you are not satisfied with:
no tags generated. ctags stuck

The tags output you expect:
Do not know at this point

The version of ctags:

$ ctags --version
Universal Ctags 0.0.0(0c50a745), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Oct  6 2018, 13:03:14
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +yaml, +case-insensitive-filenames

How do you get ctags binary:

Built it from source.

@ahakanbaba
Copy link
Contributor Author

Attached to the process with lldb and this is the stack trace I got repeatedly couple of times

(lldb) bt
* thread #1: tid = 0x1dfa2c, 0x00007fff9570c11b libsystem_c.dylib`tre_tnfa_run_parallel + 1875, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff9570c11b libsystem_c.dylib`tre_tnfa_run_parallel + 1875
    frame #1: 0x00007fff9570450c libsystem_c.dylib`tre_match + 291
    frame #2: 0x00007fff957043dc libsystem_c.dylib`regnexec + 64
    frame #3: 0x000000010dea4ed9 ctags`matchMultitableRegex + 531 at lregex.c:2014 [opt]
    frame #4: 0x000000010dea4cc6 ctags`matchMultitableRegex(lcb=<unavailable>, allLines=0x00007fdf89458960) + 454 at lregex.c:2227 [opt]
    frame #5: 0x000000010deb2baa ctags`matchLanguageMultilineRegexCommon(language=<unavailable>, func=(ctags`matchMultitableRegex at lregex.c:2213), allLines=0x00007fdf89458960) + 42 at parse.c:3412 [opt]
    frame #6: 0x000000010deb768c ctags`iFileGetLine + 284 at read.c:874 [opt]
    frame #7: 0x000000010deb7afb ctags`readLineFromInputFile + 11 at read.c:953 [opt]
    frame #8: 0x000000010dea3205 ctags`findRegexTags [inlined] fileReadLineDriver + 21 at lregex.c:1478 [opt]
    frame #9: 0x000000010dea3200 ctags`findRegexTags [inlined] findRegexTagsMainloop at lregex.c:1472 [opt]
    frame #10: 0x000000010dea3200 ctags`findRegexTags + 16 at lregex.c:1483 [opt]
    frame #11: 0x000000010deb2607 ctags`createTagsWithFallback1 [inlined] createTagsForFile(language=63, passCount=1) + 52 at parse.c:3005 [opt]
    frame #12: 0x000000010deb25d3 ctags`createTagsWithFallback1(language=<unavailable>, exclusive_subparser=<unavailable>) + 355 at parse.c:3109 [opt]
    frame #13: 0x000000010deb2ab5 ctags`parseFileWithMio [inlined] createTagsWithFallback(fileName="/Users/hbaba/Downloads/example.pp", language=63, mio=0x0000000000000000) + 35 at parse.c:3194 [opt]
    frame #14: 0x000000010deb2a92 ctags`parseFileWithMio(fileName="/Users/hbaba/Downloads/example.pp", mio=<unavailable>) + 290 at parse.c:3380 [opt]
    frame #15: 0x000000010dea75d5 ctags`createTagsForEntry(entryName="/Users/hbaba/Downloads/example.pp") + 181 at main.c:258 [opt]
    frame #16: 0x000000010dea6f80 ctags`batchMakeTags + 23 at main.c:303 [opt]
    frame #17: 0x000000010dea6f69 ctags`batchMakeTags(args=<unavailable>, user=<unavailable>) + 217 at main.c:460 [opt]
    frame #18: 0x000000010dea6ded ctags`main [inlined] runMainLoop(args=0x00007fdf89403980) + 221 at main.c:432 [opt]
    frame #19: 0x000000010dea6ddd ctags`main(argc=<unavailable>, argv=<unavailable>) + 205 at main.c:673 [opt]
    frame #20: 0x00007fff96a645ad libdyld.dylib`start + 1
    frame #21: 0x00007fff96a645ad libdyld.dylib`start + 1
(lldb)

@ahakanbaba
Copy link
Contributor Author

To further debug, I think I need debug-info. If there is a build flag that will generate the debug info, I would appreciate any help to set that up.

@masatake
Copy link
Member

masatake commented Oct 8, 2018

C source level debugger doesn't help for optimizing the mtable parser.
--verbose output may help us.

masatake added a commit to masatake/ctags that referenced this issue Oct 8, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  142.32s user 21.10s system 99% cpu 2:43.67 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  73.08s user 17.89s system 99% cpu 1:31.16 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Oct 8, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  73.08s user 17.89s system 99% cpu 1:31.16 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  49.81s user 15.95s system 99% cpu 1:05.91 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Oct 8, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  49.81s user 15.95s system 99% cpu 1:05.91 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  46.45s user 15.58s system 99% cpu 1:02.12 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Copy link
Member

masatake commented Oct 8, 2018

Finally, I myself found --verbose option is very useful for optimization :-). Thank you, @hadrielk.

masatake added a commit to masatake/ctags that referenced this issue Oct 8, 2018
…erals

Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  46.45s user 15.58s system 99% cpu 1:02.12 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  4.48s user 1.01s system 99% cpu 5.498 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Copy link
Member

masatake commented Oct 8, 2018

About 30 times faster.

@masatake
Copy link
Member

masatake commented Oct 8, 2018

Please, try #1913.

@masatake masatake self-assigned this Oct 8, 2018
@ahakanbaba
Copy link
Contributor Author

I have checked out masatake/puppetManifest-optimization, cleaned and re-compiled. I still see that ctags is stuck at 100% cpu over several minutes.
Did you run on the ~40k line long example.pp ? Did it complete in your machine ?
I am using a 2016 MacBook Pro and after 10 minutes it is still running

PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT MEM    PURG  CMPRS  PGRP  PPID  STATE    BOOSTS          %CPU_ME %CPU_OTHRS UID       FAULTS    COW     MSGSENT    MSGRECV    SYSBSD      SYSMACH    CSW        PAGEINS IDLEW    POWE
25300  ctags        99.0 10:06.85 1/1   0    12    5588K  0B    0B     25300 22906 running  *0[1]           0.00000 0.00000    980771313 2478      121     80         25         690         103        278654+    1       0        99.0

@masatake
Copy link
Member

masatake commented Oct 8, 2018

I run example.pp. The new code is about 30 times faster than before. However, I don't say it completes in 10 minutes. If it took a day on your pc, It will take an hour with the new code.

What happens if you make example.pp smaller?
If ctags enters an infinite loop, what you see is not a performance issue. It is a bug. In that case, a long input like example.pp is not suitable for reporting an issue. I would like you to find minimum input to reproduce the bug.

@ahakanbaba
Copy link
Contributor Author

ahakanbaba commented Oct 9, 2018

OK Fair enough.

I think your changes addressing this issue are extremely useful. Thanks for your help here, again.

I ran the following experiment and the 40K line input did complete. The files are named example_<lineCount>.pp

$ time ctags  example_10.000.pp
real	0m38.921s
user	0m38.755s
sys	0m0.069s

$ time ctags  example_20.000.pp
real	2m34.463s
user	2m34.225s
sys	0m0.179s

$ time ctags  example_30.000.pp
real	6m2.762s
user	6m0.463s
sys	0m1.122s

$ time ctags  example_40.000.pp
real	11m11.400s
user	11m8.746s
sys	0m1.497s

Having said that, I see a problematic tendency here.
Doubling the input size multiplies the runtime with about ~4x.
Does this look reasonable to you with knowing the implementation? Are there justifiably O(n^2) algorithms in ctags ?

As a user, I must say this is surprising. Again, not knowing about the implementation, sorry if I make wrong assumptions here. If I recall correctly, other source code parser's (like compilers) runtime does not increase quadratically with increasing source code size. Sorry if I am not doing a just comparison here.

@masatake
Copy link
Member

masatake commented Oct 9, 2018

Could you upload example_<lineCount>.pp.zip to somewhere?
I would like to run ctags with them on my pc.

@masatake
Copy link
Member

masatake commented Oct 9, 2018

Reproduced. I expected O(n). I cannot find a good explanation for O(n^2).

masatake added a commit to masatake/ctags that referenced this issue Oct 9, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  142.32s user 21.10s system 99% cpu 2:43.67 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  73.08s user 17.89s system 99% cpu 1:31.16 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Oct 9, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  73.08s user 17.89s system 99% cpu 1:31.16 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  49.81s user 15.95s system 99% cpu 1:05.91 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Oct 9, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  49.81s user 15.95s system 99% cpu 1:05.91 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  46.45s user 15.58s system 99% cpu 1:02.12 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Oct 9, 2018
…erals

Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  46.45s user 15.58s system 99% cpu 1:02.12 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  4.48s user 1.01s system 99% cpu 5.498 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Oct 9, 2018
Releated to universal-ctags#1912.
example.pp.zip submitted to universal-ctags#1921 is used to take a benchmark.

Before:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  4.48s user 1.01s system 99% cpu 5.498 total

After:
[yamato@master]~/var/ctags-github% time ./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1
./ctags --verbose /tmp/example.pp > /tmp/LOG 2>&1  3.46s user 0.84s system 99% cpu 4.312 total

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@ahakanbaba
Copy link
Contributor Author

Could you upload example_<lineCount>.pp.zip to somewhere?
I would like to run ctags with them on my pc.

Sorry for the delay.
I'll still upload the example_<lineCount>.pp files for the record.
examples.zip

@ahakanbaba
Copy link
Contributor Author

Reproduced. I expected O(n). I cannot find a good explanation for O(n^2).

Ok Great. This is good news.
I have debugged similar performance problems with valgrind before. In the makefiles, there looks like support for valgrind.

In my experience, setting up valgrind was time consuming and tedious. But once that is done, you generate a call graph. There you can see which functions are called N times and which ones N^2 times. That information helps with the diagnosis significantly.

Sorry If I am stating the obvious here.

@ahakanbaba
Copy link
Contributor Author

ahakanbaba commented Oct 9, 2018

What do you say we close this issue with the improvements you have done and merge them? Because technically the 40K line file's processing does not take indefinitely anymore.

Then we can create another issue for the O(N^2) behavior.

@masatake
Copy link
Member

masatake commented Oct 9, 2018

Conceptually, the puppetManifestParser is not written in C.
The most of the part is written in the set of regex patterns.
If you are interested in improving the parser, please, consider to learn _mtable-regex option.

I should know the reason of O(n^2). However, I must work on another item about Go parser.

@masatake
Copy link
Member

masatake commented Oct 9, 2018

O.k. I will merge the pull request associated with this issue. Then I will clsoe this.
Feel free open a new issue. However, I cannot promise I will work on the item near future.

masatake added a commit that referenced this issue Oct 9, 2018
@masatake masatake closed this as completed Oct 9, 2018
@ahakanbaba
Copy link
Contributor Author

ahakanbaba commented Oct 9, 2018

O.k. I will merge the pull request associated with this issue. Then I will clsoe this.
Feel free open a new issue. However, I cannot promise I will work on the item near future.

Thank you very much. I will create a new issue. Even if it will not be fixed soon, it is important to keep track of it.
Created #1914

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants