Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance #1837

Closed
muglug opened this issue Jun 24, 2019 · 6 comments
Closed

Improve performance #1837

muglug opened this issue Jun 24, 2019 · 6 comments

Comments

@muglug
Copy link
Collaborator

muglug commented Jun 24, 2019

Running in a single process* (and using Phan's fallback parser), Phan is 20% faster than Psalm. Part of that is maybe due to Psalm doing some slightly deeper analysis, but Phan is also architected to be faster in a number of ways.

Cloning var types when cloning context

A significant (~25%) speedup can be gained by not cloning Context::$vars_in_scope (which contains the known types of variables and properties in a given scope) when branching (e.g. in if statements, foreach loops).

Turning off this cloning introduces a numbebr of bugs, but it's likely a surmountable problem.

PHP-Parser node types

Psalm stores them on PHP Parser nodes (which uses a "feature" of PHP allowing arbitrary properties set on nodes), but Phan stores them in its Context, which is a little faster.

Example here: https://gist.github.com/muglug/e3b8bb7dbbec4431c71ae541f61476e6

cc @TysonAndre

*when checking for dead code, Psalm is faster because it can do the analysis in multiple threads

@TysonAndre
Copy link
Contributor

phan --quick (plus unused variable detection and redundant condition detection) might be a better comparison to Psalm, since psalm doesn't recurse when union types aren't provided (after the extra checks, it'd be slower). It wouldn't make much of a difference in timing if the codebase was already fully qualified.

Phan does set dynamic properties on ast\Node when there was no other choice (search the codebase for suppress.*PhanUndeclaredProperty). It might be faster if I reused ast\Node->children - I haven't checked.

<?php

function example($x) {
    echo strlen($x);
}
example([]);

Dead code detection on multiple processes was something I thought about for Phan but haven't gotten around to trying to implement.

https://gist.github.com/muglug/e3b8bb7dbbec4431c71ae541f61476e6 - Phan stores inferred types in the Context because Phan analyzes the same function more than once when parameter types are missing. Caching it permanently in the Node would do the wrong thing the second time the node was analyzed and make it harder to free up that memory.


Suggestions for Optimizations

  1. Fully qualify global constants and global function calls, especially frequently used ones - e.g. is_string in ExpressionAnalyzer, strtolower in Reconciler, etc. (maybe a ~3% speedup if everything was fully qualified)

    ./phan --automatic-fix -P NotFullyQualifiedUsagePlugin can do that automatically - there's no way to specify the issue types to fix, but the issue type whitelist in .phan/config.php would help.

    Otherwise, the php interpreter has to check both the current namespace and the global namespace, and can't replace some calls (is_string, etc.) with specialized opcodes

  2. Consider immutable types and caching the union type instances for fully qualified union types such as those from CallMap (haven't checked what this does now)

  3. Consider storing only the variables that were added or changed in the branch and referring to the parent scope to avoid cloning hundreds of variables- see https://github.com/phan/phan/blob/master/src/Phan/Language/Scope/BranchScope.php#L29-L43 (for if, ?:, loops, etc.)

    Creating a BranchScope of the GlobalScope in each file is also useful, to avoid cloning hundreds of global variables (not sure what Psalm does)

  4. Benchmark using xdebug (outputting a call log) and kcachegrind to identify frequently used methods to focus on

  5. Benchmark using phpspy (The reversed call graph is useful - /path/to/phpspy/vendor/flamegraph.pl -r )

@muglug
Copy link
Collaborator Author

muglug commented Jun 25, 2019

Fully qualify global constants and global function calls

Great idea, will do that

Consider immutable types
Consider storing only the variables that were added or changed in the branch

Yeah, I have a slightly buggy branch that is working towards that (with the ~20% speedup mentioned above). It doesn't clone - just copies the type array, and tries to clone whenever the type is changed.

and caching the union type instances for fully qualified union types such as those from CallMap

this is done currently (at least for the CallMap)

Benchmark using xdebug

I do this occasionally, was great for some early gains, but has become less so as the project has done more and more

Benchmark using phpspy

Definitely going to try this, thanks!

muglug added a commit that referenced this issue Jun 26, 2019
LeSuisse added a commit to LeSuisse/psalm that referenced this issue Jun 26, 2019
This should give a small performance boost.
Part of vimeo#1837.

The change is enforced via phpcs and can be autofixed
with phpcbf.
muglug pushed a commit that referenced this issue Jun 26, 2019
This should give a small performance boost.
Part of #1837.

The change is enforced via phpcs and can be autofixed
with phpcbf.
muglug added a commit that referenced this issue Nov 26, 2019
@muglug
Copy link
Collaborator Author

muglug commented Jan 14, 2020

Another, more drastic change would be to transpile to Go.

VK's Go-powered noverify is roughly 4 times faster running on Psalm's own codebase (though, due to a bug, there's no speedup when running on Vimeo's codebase).

There'd be numerous challenges with such a transpilation – no plugins would work unless they, too, were transpiled.

@muglug
Copy link
Collaborator Author

muglug commented Jan 14, 2020

Closing for now as there's not much more that can be done on this front

@muglug muglug closed this as completed Jan 14, 2020
@muglug
Copy link
Collaborator Author

muglug commented Jan 14, 2020

But @TysonAndre would you mind running phpspy on Psalm's codebase and letting me know if there's anything interesting? It's Linux-only, and I'm Apple all the way.

@muglug
Copy link
Collaborator Author

muglug commented Jan 14, 2020

Nevermind, I figured out how to get it running. The only real feedback is that Algebra::groupImpossibilities is a performance hog, which I knew already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants