Speed up the generation of no-member suggestions #10277

correctmost · 2025-03-15T21:37:43Z

Type of Changes

	Type
✓	🔨 Refactoring

Description

Previously, edit distances were calculated for strings that had length differences greater than the maximum suggestion threshold.

Related PR:

Always use the suggestion mode for no-member / c-extension-no-member #9962

Stats

I profiled yt-dlp (commit bd0a6681) with the following .pylintrc file:

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=no-member

[REPORTS]
reports=no
score=no

Before

Command	Mean [s]	Min [s]	Max [s]	Relative
`pylint --recursive=y .`	39.161 ± 0.175	38.929	39.548	1.00

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   103292    3.706    0.000    5.653    0.000 pylint/checkers/typecheck.py:153(_string_distance)
 19623350    1.938    0.000    1.938    0.000 {built-in method builtins.min}
      245    0.039    0.000    5.747    0.023 pylint/checkers/typecheck.py:171(_similar_names)

After

Command	Mean [s]	Min [s]	Max [s]	Relative
`pylint --recursive=y .`	37.009 ± 0.249	36.650	37.333	1.00

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    15159    0.460    0.000    0.704    0.000 pylint/checkers/typecheck.py:153(_string_distance)
  2429889    0.244    0.000    0.244    0.000 {built-in method builtins.min}
      245    0.035    0.000    0.807    0.003 pylint/checkers/typecheck.py:175(_similar_names)

Previously, edit distances were calculated for strings that had length differences greater than the maximum suggestion threshold.

codecov · 2025-03-15T21:45:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.89%. Comparing base (3457433) to head (ab59393).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #10277   +/-   ##
=======================================
  Coverage   95.89%   95.89%           
=======================================
  Files         175      175           
  Lines       19073    19082    +9     
=======================================
+ Hits        18290    18299    +9     
  Misses        783      783

Files with missing lines	Coverage Δ
pylint/checkers/typecheck.py	`96.67% <100.00%> (+0.03%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Pierre-Sassoulas

Wow, a 5+% improvement ! That's nice.

It seems like a well known problems that we should not have to optimize ourselves though.

Would you mind comparing this with using difflib.SequenceMatcher ? I suggested something but it might not directly work. And I don't like adding dependencies but maybe using https://pypi.org/project/edlib/ would be worth it (it seems it's faster than difflib)..

pylint/checkers/typecheck.py

correctmost · 2025-03-18T00:38:57Z

Would you mind comparing this with using difflib.SequenceMatcher ?

difflib.get_close_matches is about 200ms faster than the PR code.

One drawback is that it may be hard to map existing missing-member-hint-distance values to cutoff ratios because of the different distance algorithms and because of floating point math. For example, I would expect "buff" not to match "buffer" in the first two cases:

import difflib

difflib.get_close_matches("buff", ["buffer"], n=3, cutoff=0.75)
-> ['buffer']

difflib.get_close_matches("buff", ["buffer"], n=3, cutoff=0.80)
-> ['buffer']

difflib.get_close_matches("buff", ["buffer"], n=3, cutoff=0.80000001)
-> []

Here's a separate case where I would expect a match but don't see one:

import difflib

difflib.get_close_matches("x", ["z"], n=3, cutoff=0.00001)
-> []

Python's own attribute suggestion code does not seem to use difflib:
https://github.com/python/cpython/blob/a931a8b32415f311008dbb3f09079aae1e6d7a3d/Lib/traceback.py#L1538-L1545

Here's the patch I used:

diff --git a/pylint/checkers/typecheck.py b/pylint/checkers/typecheck.py
index edef5188b..ab280e38b 100644
--- a/pylint/checkers/typecheck.py
+++ b/pylint/checkers/typecheck.py
@@ -6,9 +6,8 @@
 
 from __future__ import annotations
 
-import heapq
+import difflib
 import itertools
-import operator
 import re
 import shlex
 import sys
@@ -170,7 +169,7 @@ def _string_distance(seq1: str, seq2: str) -> int:
 
 def _similar_names(
     owner: SuccessfulInferenceResult,
-    attrname: str | None,
+    attrname: str,
     distance_threshold: int,
     max_choices: int,
 ) -> list[str]:
@@ -179,31 +178,24 @@ def _similar_names(
     The similar names are searched given a distance metric and only
     a given number of choices will be returned.
     """
-    possible_names: list[tuple[str, int]] = []
-    names = _node_names(owner)
+    if not attrname:
+        return []
 
-    for name in names:
-        if name == attrname:
-            continue
+    names = _node_names(owner)
+    possible_names = [name for name in names if name != attrname]
 
-        distance = _string_distance(attrname or "", name)
-        if distance <= distance_threshold:
-            possible_names.append((name, distance))
+    ratio = distance_threshold / len(attrname)
+    if distance_threshold == 1 and ratio == 1:
+        ratio = 0.5
 
-    # Now get back the values with a minimum, up to the given
-    # limit or choices.
-    picked = [
-        name
-        for (name, _) in heapq.nsmallest(
-            max_choices, possible_names, key=operator.itemgetter(1)
-        )
-    ]
-    return sorted(picked)
+    threshold = 1 - ratio - 0.0001
+    choices = difflib.get_close_matches(attrname, possible_names, max_choices, threshold)
+    return sorted(choices)
 
 
 def _missing_member_hint(
     owner: SuccessfulInferenceResult,
-    attrname: str | None,
+    attrname: str,
     distance_threshold: int,
     max_choices: int,
 ) -> str:
@@ -1209,7 +1201,7 @@ accessed. Python regular expressions are accepted.",
             if self.linter.config.missing_member_hint:
                 hint = _missing_member_hint(
                     owner,
-                    node.attrname,
+                    node.attrname or "",
                     self.linter.config.missing_member_hint_distance,
                     self.linter.config.missing_member_max_choices,
                 )

Pierre-Sassoulas · 2025-03-18T12:47:25Z

Neat.

It seems traceback use levenshtein distance while difflib use an algo that "does not yield minimal edit sequences, but does tend to yield matches that “look right” to people." (can't link the part of the doc specifically; https://docs.python.org/3/library/difflib.html)

Which would make https://pypi.org/project/edlib/ a better candidate as it's an optimized levenshtein distance ? Or we can copy paste the traceback code and not deal with a dependency if there isn't a big difference in performance ?

correctmost · 2025-03-23T22:36:54Z

I re-ran the benchmarks on an updated system (which is a bit slower now):

Baseline (main)

Command	Mean [s]	Min [s]	Max [s]	Relative
`pylint --recursive=y .`	41.209 ± 0.157	40.921	41.426	1.00

Current PR

Command	Mean [s]	Min [s]	Max [s]	Relative
`pylint --recursive=y .`	39.484 ± 0.194	39.195	39.816	1.00

traceback._levenshtein_distance

Command	Mean [s]	Min [s]	Max [s]	Relative
`pylint --recursive=y .`	39.363 ± 0.195	39.168	39.831	1.00

edlib

Command	Mean [s]	Min [s]	Max [s]	Relative
`pylint --recursive=y .`	39.296 ± 0.149	39.045	39.481	1.00

My preference is to move forward with the PR branch for these reasons:

I don't think the additional speed-up is worth adding a dependency on edlib. Also, the performance gains would be greatest for codebases that don't actively fix Pylint warnings :) (i.e., codebases with lots of no-member errors).
The traceback._levenshtein_distance implementation assigns different substitution costs for moves (cost of 2) and case differences (cost of 1). This makes it tricky to honor existing missing-member-hint-distance values in a backward-compatible manner.
Copying the traceback._levenshtein_distance code into Pylint would require reading and updating license text, which is a bit tedious.

Patches tested

traceback._levenshtein_distance patch

diff --git a/pylint/checkers/typecheck.py b/pylint/checkers/typecheck.py
index edef5188b..dddfa7da0 100644
--- a/pylint/checkers/typecheck.py
+++ b/pylint/checkers/typecheck.py
@@ -12,6 +12,7 @@ import operator
 import re
 import shlex
 import sys
+import traceback
 from collections.abc import Callable, Iterable
 from functools import cached_property, singledispatch
 from re import Pattern
@@ -170,7 +171,7 @@ def _string_distance(seq1: str, seq2: str) -> int:
 
 def _similar_names(
     owner: SuccessfulInferenceResult,
-    attrname: str | None,
+    attrname: str,
     distance_threshold: int,
     max_choices: int,
 ) -> list[str]:
@@ -182,12 +183,15 @@ def _similar_names(
     possible_names: list[tuple[str, int]] = []
     names = _node_names(owner)
 
+    # Multiply by two because _levenshtein_distance considers move changes to have a cost of 2
+    max_cost = distance_threshold * 2
+
     for name in names:
         if name == attrname:
             continue
 
-        distance = _string_distance(attrname or "", name)
-        if distance <= distance_threshold:
+        distance = traceback._levenshtein_distance(attrname, name, max_cost)
+        if distance <= max_cost:
             possible_names.append((name, distance))
 
     # Now get back the values with a minimum, up to the given
@@ -203,7 +207,7 @@ def _similar_names(
 
 def _missing_member_hint(
     owner: SuccessfulInferenceResult,
-    attrname: str | None,
+    attrname: str,
     distance_threshold: int,
     max_choices: int,
 ) -> str:
@@ -1209,7 +1213,7 @@ accessed. Python regular expressions are accepted.",
             if self.linter.config.missing_member_hint:
                 hint = _missing_member_hint(
                     owner,
-                    node.attrname,
+                    node.attrname or "",
                     self.linter.config.missing_member_hint_distance,
                     self.linter.config.missing_member_max_choices,
                 )

edlib patch

diff --git a/pylint/checkers/typecheck.py b/pylint/checkers/typecheck.py
index edef5188b..594d16b56 100644
--- a/pylint/checkers/typecheck.py
+++ b/pylint/checkers/typecheck.py
@@ -20,6 +20,7 @@ from typing import TYPE_CHECKING, Any, Literal, Union
 import astroid
 import astroid.exceptions
 import astroid.helpers
+import edlib
 from astroid import arguments, bases, nodes, util
 from astroid.nodes import _base_nodes
 from astroid.typing import InferenceResult, SuccessfulInferenceResult
@@ -170,7 +171,7 @@ def _string_distance(seq1: str, seq2: str) -> int:
 
 def _similar_names(
     owner: SuccessfulInferenceResult,
-    attrname: str | None,
+    attrname: str,
     distance_threshold: int,
     max_choices: int,
 ) -> list[str]:
@@ -186,7 +187,7 @@ def _similar_names(
         if name == attrname:
             continue
 
-        distance = _string_distance(attrname or "", name)
+        distance = edlib.align(attrname, name)["editDistance"]
         if distance <= distance_threshold:
             possible_names.append((name, distance))
 
@@ -203,7 +204,7 @@ def _similar_names(
 
 def _missing_member_hint(
     owner: SuccessfulInferenceResult,
-    attrname: str | None,
+    attrname: str,
     distance_threshold: int,
     max_choices: int,
 ) -> str:
@@ -1209,7 +1210,7 @@ accessed. Python regular expressions are accepted.",
             if self.linter.config.missing_member_hint:
                 hint = _missing_member_hint(
                     owner,
-                    node.attrname,
+                    node.attrname or "",
                     self.linter.config.missing_member_hint_distance,
                     self.linter.config.missing_member_max_choices,
                 )

Pierre-Sassoulas

Thank you for the benchmarks ! I agree with your analysis.

Previously, edit distances were calculated for strings that had length differences greater than the maximum suggestion threshold. Remove handling of missing-member-hint-distance = 0 Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com> (cherry picked from commit 9a6168d)

github-actions · 2025-03-25T12:50:31Z

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit ab59393

Previously, edit distances were calculated for strings that had length differences greater than the maximum suggestion threshold. Remove handling of missing-member-hint-distance = 0 Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com> (cherry picked from commit 9a6168d) Co-authored-by: correctmost <134317971+correctmost@users.noreply.github.com>

Speed up the generation of no-member suggestions

a43a797

Previously, edit distances were calculated for strings that had length differences greater than the maximum suggestion threshold.

This comment has been minimized.

Sign in to view

Remove handling of missing-member-hint-distance = 0

c73bc96

This comment has been minimized.

Sign in to view

Pierre-Sassoulas added the performance label Mar 16, 2025

Pierre-Sassoulas reviewed Mar 16, 2025

View reviewed changes

pylint/checkers/typecheck.py Show resolved Hide resolved

Pierre-Sassoulas added this to the 3.3.6 milestone Mar 16, 2025

Pierre-Sassoulas added the backport maintenance/3.3.x label Mar 16, 2025

Pierre-Sassoulas modified the milestones: 3.3.6, 3.3.7 Mar 20, 2025

Pierre-Sassoulas previously approved these changes Mar 24, 2025

View reviewed changes

Pierre-Sassoulas requested a review from DanielNoord March 24, 2025 06:03

DanielNoord previously approved these changes Mar 24, 2025

View reviewed changes

Merge remote-tracking branch 'origin/main' into cm/speed-up-no-member

12582d7

This comment has been minimized.

Sign in to view

Add fragment

ab59393

Pierre-Sassoulas dismissed stale reviews from DanielNoord and themself via ab59393 March 25, 2025 12:27

Pierre-Sassoulas approved these changes Mar 25, 2025

View reviewed changes

Pierre-Sassoulas enabled auto-merge (squash) March 25, 2025 12:28

Pierre-Sassoulas merged commit 9a6168d into pylint-dev:main Mar 25, 2025
45 checks passed

github-actions bot mentioned this pull request Mar 25, 2025

[Backport maintenance/3.3.x] Speed up the generation of no-member suggestions #10311

Merged

correctmost deleted the cm/speed-up-no-member branch March 25, 2025 19:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up the generation of no-member suggestions #10277

Speed up the generation of no-member suggestions #10277

correctmost commented Mar 15, 2025

codecov bot commented Mar 15, 2025 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

Pierre-Sassoulas left a comment

correctmost commented Mar 18, 2025

Pierre-Sassoulas commented Mar 18, 2025

correctmost commented Mar 23, 2025

Pierre-Sassoulas left a comment

This comment has been minimized.

github-actions bot commented Mar 25, 2025

Speed up the generation of no-member suggestions #10277

Speed up the generation of no-member suggestions #10277

Conversation

correctmost commented Mar 15, 2025

Type of Changes

Description

Stats

Before

After

codecov bot commented Mar 15, 2025 • edited Loading

Codecov Report

This comment has been minimized.

This comment has been minimized.

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

correctmost commented Mar 18, 2025

Pierre-Sassoulas commented Mar 18, 2025

correctmost commented Mar 23, 2025

Baseline (main)

Current PR

traceback._levenshtein_distance

edlib

Patches tested

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

This comment has been minimized.

github-actions bot commented Mar 25, 2025

codecov bot commented Mar 15, 2025 •

edited

Loading