Adds --baseline #1276

sobolevn · 2020-03-23T20:57:08Z

coveralls · 2020-03-24T00:06:34Z

Pull Request Test Coverage Report for Build 2864

87 of 87 (100.0%) changed or added relevant lines in 6 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 2847:	0.0%
Covered Lines:	5525
Relevant Lines:	5525

💛 - Coveralls

coveralls · 2020-03-24T00:06:35Z

Pull Request Test Coverage Report for Build 2872

87 of 87 (100.0%) changed or added relevant lines in 6 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 2847:	0.0%
Covered Lines:	5525
Relevant Lines:	5525

💛 - Coveralls

asottile · 2020-03-24T15:49:33Z

wemake_python_styleguide/logic/baseline.py

+import json
+import os
+from collections import Counter, defaultdict
+from hashlib import md5


some distributions compile out md5 so you'll probably want a stronger hash function

Sure! Any suggestions based on your experience?

sha256 should do
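
A minimal sketch of what hashing a baseline entry with sha256 could look like. The helper name `violation_digest` and the choice of hashing the error code together with the message are assumptions made for illustration; the PR's actual key layout is not shown in this thread.

```python
import hashlib


def violation_digest(error_code: str, text: str) -> str:
    """Return a stable hex digest for one violation (hypothetical helper)."""
    # Assumption: the entry is identified by its code and message only.
    payload = '{0} {1}'.format(error_code, text).encode('utf-8')
    return hashlib.sha256(payload).hexdigest()


digest = violation_digest('WPS305', 'Found `f` string')
```

`hashlib.sha256` is one of the algorithms that `hashlib` guarantees on every platform, which is the point of the suggestion above.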

sobolevn · 2020-03-25T09:58:18Z

@lk-geimfari do you have any ideas about this PR? I would love to hear them 🙂

sobolevn · 2020-03-25T10:07:42Z

Ok, here's the problem I am facing:

I have this code in my file: x1 = f''
I record a baseline from it: it contains just the hash of this one violation, WPS305 Found f string
Next I refactor my code to be x0 = f''; x1 = f''
The baseline run reports x1 = f'' and not x0 = f'', because the error message is the same and the first match wins

Is it really a problem? On one hand, I am reporting the wrong violation. On the other hand, I am reporting the exact same violation, just in a different place. I am not sure.

sobolevn · 2020-03-25T10:17:57Z

I guess it is, because users won't even be able to find the new violation. And sometimes it is really hard to fix existing ones. Consider this example:

A user has a line with a Jones complexity score that is too high. It cannot be refactored easily, mostly for legacy reasons. Then the user writes a new complex line in a new function before the existing one. What happens? The existing one is reported. But the user cannot touch it: it is complex and important. And they cannot even identify where the new complex line is.

I don't want to be in a situation like this! I need to refactor the existing solution to fix cases like this one. We should also make a decision: which violations we report and which ones are fine.

I will refactor this part:

        for (error_code, line_number, column, text, physical_line) in results:
            # --- patch start
            # Here we ignore violations present in the baseline.
            if self.baseline and self.baseline.has(filename, error_code, text):
                continue

Into something like this:

for (error_code, line_number, column, text, physical_line) in baseline.only_new_violations(results):

I would also change the baseline format: I would need locations and physical_line too.
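
A rough sketch of how `only_new_violations` might behave once the baseline also stores the physical line. The `Baseline` class, its key layout, and treating `x0` and `x1` as two separate lines are all assumptions for illustration, not the code from this PR.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# (error_code, line_number, column, text, physical_line), as in the loop above.
Violation = Tuple[str, int, int, str, str]


class Baseline:
    """Hypothetical baseline keyed on code, message, and source line."""

    def __init__(self, recorded: Iterable[Violation]) -> None:
        self._recorded = Counter(self._key(item) for item in recorded)

    @staticmethod
    def _key(violation: Violation) -> Tuple[str, str, str]:
        error_code, _line, _column, text, physical_line = violation
        return error_code, text, physical_line.strip()

    def only_new_violations(
        self, results: Iterable[Violation],
    ) -> Iterator[Violation]:
        remaining = self._recorded.copy()
        for violation in results:
            key = self._key(violation)
            if remaining[key] > 0:
                remaining[key] -= 1  # consume one baselined occurrence
                continue
            yield violation


baseline = Baseline([('WPS305', 1, 5, 'Found `f` string', "x1 = f''")])
results = [
    ('WPS305', 1, 5, 'Found `f` string', "x0 = f''"),
    ('WPS305', 2, 5, 'Found `f` string', "x1 = f''"),
]
new = list(baseline.only_new_violations(results))
```

Because the key includes the source line, the new `x0` line is the one reported here, rather than the already baselined `x1` line.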

sobolevn · 2020-03-25T12:22:31Z

docs/pages/usage/integrations/legacy.rst

+Baseline
+--------
+
+You can start using it with just a single command!


You can start using it with just a single command!

Better:

You can start using our linter with just a single command!

* fix: check wrong variable name as case insensitive Issue #1275 * test: add test_wrong_variable_name_case_insensitive Issue #1275 * docs: add bugfix changelog entry in 0.15.0 Issue: #1275 * test: add integration test for uppercase wrong name Issue: #1275 * lint: fix import sorting and empty lines separation * refactor: pass ignored_types as a tuple * test: add class_name fixture template

Bumps [importlib-metadata](http://importlib-metadata.readthedocs.io/) from 1.5.0 to 1.5.2. Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>

skarzi

really helpful feature 👍

tests/plugins/files.py

skarzi · 2020-03-26T19:12:47Z

wemake_python_styleguide/patches/baseline.py

+
+        # --- patch start
+        if self.baseline is None:
+            self.baseline = baseline.save_to_file(self.saved_reports)


Maybe some message should be logged that baseline file is missing and it is going to be created?

sobolevn · 2020-03-29T10:28:05Z

Based on the feedback I am going to change some details about how this works:

I am going to store metadata in the baseline file. Some people said that they wanted to see when the baseline was last modified and when it was originally created
I am going to add a --baseline-refactoring flag that will force us to remove at least one violation from the baseline
I am going to make the baseline mutable: whenever we remove any violations from the source, the baseline needs to shrink accordingly
I am going to report when a new baseline is created
I am going to store human-readable information about all violations, not hashes
I am going to change how violations are reported. I will try to make an intelligent guess here based on line number, column, line text, error code, and message

Bumps [importlib-metadata](http://importlib-metadata.readthedocs.io/) from 1.5.2 to 1.6.0. Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>

sobolevn · 2020-05-11T10:00:46Z

Results:

The baseline stays immutable. Making it mutable was a bad idea: a lot of tools run flake8 frequently, like the VS Code integration, pre-commit hooks, etc. It would be hell to debug what went wrong and why a given violation is now reported
--baseline-refactoring is very hard to implement, so I will just leave it as is for now

sobolevn · 2020-05-12T19:39:23Z

So, I have news!

I wrote a fuzzy matcher for baseline entries. It even includes several steps of backward and forward lookups over the surrounding lines to determine whether this code is new or not.

But!

It is a very hard thing to explain. I ended up not understanding the rules myself. It is hell to test! So I have decided to try exactly the opposite solution this time: minimalism.

We can have two states of code: touched or untouched.
It is pretty clear what "untouched" code is: when it is equal to what we have in the baseline.
"Touched" is way more interesting. Let me explain.

First of all, let's say we have this python file (which is baselined):

– – – – – –
– –
– – – – –
– – – – – –

One can add a new line on top of the file:

x x
– – – – – –
– –
– – – – –
– – – – – –

Which will make all other lines "touched". Or one can add a line in the middle:

– – – – – –
– –
x x
– – – – –
– – – – – –

This will make only part of the lines touched.

The same is applied when we are talking about removing a line:

– –
– – – – –
– – – – – –

And that's fine. Sometimes people do have to insert new lines and remove old ones within legacy code.
We can handle these two cases by making only one change in our algorithm: we can ignore the line number.

If you touch a line in any other manner, then sorry: you have to refactor it or create a new baseline.

This way we can achieve correct and still useful behaviour. The first round eliminates all exact matches, the second one ignores line numbers. All other violations are reported as new ones.
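
The two rounds described above can be sketched as follows. This is an illustrative reading of the description, not the code from the branch; the `Violation` tuple and the `new_violations` function are hypothetical names.

```python
from collections import Counter
from typing import List, NamedTuple


class Violation(NamedTuple):
    error_code: str
    line_number: int
    column: int
    text: str
    physical_line: str


def new_violations(
    baseline: List[Violation], current: List[Violation],
) -> List[Violation]:
    # Round 1: drop exact matches, line number included ("untouched" code).
    exact = Counter(baseline)
    unmatched = []
    for violation in current:
        if exact[violation] > 0:
            exact[violation] -= 1
        else:
            unmatched.append(violation)

    # Round 2: match the leftovers while ignoring the line number,
    # which covers lines shifted by an insertion or removal elsewhere.
    loose = Counter(
        entry._replace(line_number=0) for entry in exact.elements()
    )
    reported = []
    for violation in unmatched:
        key = violation._replace(line_number=0)
        if loose[key] > 0:
            loose[key] -= 1
        else:
            reported.append(violation)
    return reported


old = Violation('WPS305', 1, 5, 'Found `f` string', "x1 = f''")
shifted = old._replace(line_number=2)
added = Violation('WPS305', 1, 5, 'Found `f` string', "x0 = f''")

after_blank_line_on_top = new_violations([old], [shifted])
after_new_violation_on_top = new_violations([old], [added, shifted])
```

A line that only moved is matched in the second round and stays silent, while a line whose text differs from every baseline entry is reported as new.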

Dreamsorcerer · 2020-05-30T15:48:46Z

docs/pages/usage/integrations/legacy.rst

+Futher runs won't report any of the violations inside the baseline.
+
+If you already have ``.flake8-baseline.json`` file,
+than your baselined violations will be ignored.


Suggested change

than your baselined violations will be ignored.

then your baselined violations will be ignored.

Dreamsorcerer · 2020-05-30T16:23:04Z

Looking forward to this, I was just starting to implement this myself after finding that flakehell is fairly broken for our code base.

The documentation only talks about a CLI argument. Will it work with the config file too? If I set up a baseline, I'm not going to want to run flake8 without the baseline.
I imagined having a baseline = file_name in the config, and then using an explicit --create-baseline or similar argument for creating/overwriting the baseline.

I think being able to conveniently overwrite the baseline in a single step is also helpful. As an example, as I'm trying to roll this out into our project, not everyone will be running flake8 immediately (this gives us a chance to test it out and find the right balance in our flake8 config). This means those of us using it will likely need to create new baselines frequently to accommodate other people's changes until we have everyone running flake8 and can enforce it in CI.

Is this likely to make it into flake8 directly in future? Seems like something that should be useful for flake8 in general.

sobolevn · 2020-05-31T08:57:10Z

Hi, @Dreamsorcerer! Thanks a lot for your interest!

The documentation only talks about a CLI argument. Will it work with the config file too?

Yes, it surely will! This is just how flake8 works, so I didn't specify this explicitly.

and then using an explicit --create-baseline or similar argument for creating/overwriting the baseline

Can you please help me in researching this approach vs deleting a baseline manually and creating a new one with --baseline turned on?

I was thinking that manual deletion makes it less magical. And you control this process with very understandable and clear actions. But, I would love to hear other arguments!

Is this likely to make it into flake8 directly in future? Seems like something that should be useful for flake8 in general.

Yes, we had a discussion about it in https://gitlab.com/pycqa/flake8/-/issues/602
I surely want to collaborate and provide my implementation as a starting point.

But, this is a very complex feature. So, I am not sure about its future in the core. You can ask @asottile about his opinion on this.

I am fine with both outcomes:

Core feature will be more tested, native, documented, and more well-known
Our own implementation will attract more people to wemake-python-styleguide directly. It is a cool unique feature to have!

I was just starting to implement this myself after finding that flakehell is fairly broken for our code base.

I would love to see your solution. Also, if you wish we can collaborate on this PR together. Because, I am struggling to find some free time to work on this. (Currently I am too busy writing returns)

Dreamsorcerer · 2020-05-31T10:40:26Z

and then using an explicit --create-baseline or similar argument for creating/overwriting the baseline

Can you please help me in researching this approach vs deleting a baseline manually and creating a new one with --baseline turned on?

I was thinking that manual deletion makes it less magical. And you control this process with very understandable and clear actions. But, I would love to hear other arguments!

Ha, I was thinking the opposite. :P
It seems to me that the behaviour would be inconsistent with normal flake8 usage, if running the same command twice in a row produces 2 different sets of output. If the user fails to read the documentation, I would expect some confusion here.

I think that an explicit create option could avoid any confusion, and slightly reduce friction in our use case where we expect to recreate it often. It could also potentially catch some unexpected effects, such as if the file were moved/renamed accidentally, then flake8 could give an error saying that the baseline file could not be found. Without that, they would suddenly get huge amounts of violations, and possibly not understand why (especially if they are not the person who created the baseline in the first place).

I was just starting to implement this myself after finding that flakehell is fairly broken for our code base.

I would love to see your solution. Also, if you wish we can collaborate on this PR together. Because, I am struggling to find some free time to work on this. (Currently I am too busy writing returns)

I was just extracting the baseline code from flakehell, and stripping out all the other functionality (it's all that other functionality which was breaking things for our project). So, I presume you've already seen that code. I had only just made a start on it, when I was pointed here.

Happy to help out where I can (we have a hack day in a couple of weeks, so I can probably spend a full day working on it then). I'll try and get your branch working locally and test it out sometime this week.

Dreamsorcerer · 2020-05-31T11:12:46Z

I was thinking that manual deletion makes it less magical. And you control this process with very understandable and clear actions. But, I would love to hear other arguments!

Just thinking a little more about being explicit. I think running rm filename and flake8 --create-baseline are both clear, explicit actions to the user. However, not just if the baseline gets moved by accident, but also if a flake8 config gets copied into another project, then your approach would result in the baseline being created implicitly with no clear user action. In the case of copying a config to another project, there's a good chance that was not the desired behaviour. --create-baseline would never be run implicitly.
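
To make the difference concrete, here is a small sketch of the explicit behaviour argued for above: the file is written only on request, and a missing file is an error rather than a trigger for silent creation. `resolve_baseline`, `BaselineMissingError`, and the JSON list layout are hypothetical; `create` stands in for the proposed --create-baseline flag.

```python
import json
import tempfile
from pathlib import Path
from typing import List


class BaselineMissingError(Exception):
    """Raised when a baseline is configured but the file is absent."""


def resolve_baseline(
    path: Path, create: bool, violations: List[str],
) -> List[str]:
    if create:
        # Only an explicit request ever writes the file.
        entries = sorted(violations)
        path.write_text(json.dumps(entries), encoding='utf-8')
        return entries
    if not path.exists():
        # A moved file or a copied config fails loudly instead of
        # quietly baselining everything.
        raise BaselineMissingError(
            'Baseline file not found: {0}'.format(path),
        )
    return json.loads(path.read_text(encoding='utf-8'))


with tempfile.TemporaryDirectory() as tmp:
    baseline_path = Path(tmp) / '.flake8-baseline.json'
    try:
        resolve_baseline(baseline_path, create=False, violations=[])
    except BaselineMissingError as error:
        missing_message = str(error)
    created = resolve_baseline(
        baseline_path, create=True, violations=['WPS305 Found `f` string'],
    )
    loaded = resolve_baseline(baseline_path, create=False, violations=[])
```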

sobolevn · 2020-05-31T11:20:28Z

I think running rm filename and flake8 --create-baseline are both clear, explicit actions to the user

@Dreamsorcerer yes, I like this idea! This will indeed make the process even more clear.

Feel welcome to contribute 🙂

Dreamsorcerer · 2020-06-01T17:23:35Z

@sobolevn I've got the branch up and running. Are your latest changes on this branch? You mentioned earlier about fuzzy matching instead of hashes, but I'm still seeing hashes in the baseline file. It's also not using the baseline file, it creates the file just fine, but still gives me all the violations when I run it again.

sobolevn · 2020-06-01T18:06:23Z

@Dreamsorcerer sorry, updated the branch with the latest local changes.

Dreamsorcerer · 2020-06-02T13:35:50Z

Thanks. It still doesn't seem to ignore the violations successfully, but it is definitely doing something now. On the second run, it produces a load of try, ignored, violations and candidates lines, which I'm assuming is some debug output, but still exits 1. Also, it doesn't seem to run when used as a git hook.

So, looks like there's a couple of tasks I should start by looking at.

Dreamsorcerer · 2020-06-11T13:47:55Z

I'm going to have a go at this next week. I'm going to look at:

--create-baseline (I'm finding plenty more edge cases where the implicit baseline doesn't work)
Remove unused violations from baseline.
Ensuring everything still works well when passing in specific files (rather than running on the entire project).
Try to catch file renames.
Remove or suppress the debug output.
Look at a way for report generation to produce reports of the baseline.

Dreamsorcerer · 2020-06-18T17:31:03Z

OK, the fuzzy matching algorithm needs some work. I added a line into a file and 2 violations 100+ lines away reappeared. The only change is that the line number moved down by 1.

Will try and debug that one tomorrow.

sobolevn · 2020-06-18T18:30:28Z

@Dreamsorcerer thanks a lot for your time and effort! 👍

Dreamsorcerer · 2020-06-19T16:59:53Z

OK, opened up a PR with all the work done today and yesterday: #1471

I'll try and start using this for our project on a daily basis next week, so I can see if there are any issues that come up.

Some ideas for improvements (but shouldn't be considered blocking this feature):

Looser fuzzy matching. e.g. If line number is the same and column number is <10 characters different and/or if physical line is 50% similar. Then we would be able to catch multiple small changes at once (e.g. a variable name being changed at the same time the line number is moved due to other code being added above).
If a file is deleted, it won't get removed from the baseline currently, would be nice to clean that up.
Support displaying the baseline, which can then be used with --statistics or flake8-html. This would involve overriding the run checks and passing the stored baseline directly to the report generation.
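
A hedged sketch of the looser matching idea from the first item, using `difflib.SequenceMatcher` from the standard library to score line similarity. The thresholds come from the numbers mentioned above; combining the two conditions with "or" is an assumption, since the comment leaves "and/or" open, and `loosely_matches` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Violation(NamedTuple):
    error_code: str
    line_number: int
    column: int
    text: str
    physical_line: str


def loosely_matches(
    old: Violation,
    new: Violation,
    max_column_shift: int = 10,
    min_similarity: float = 0.5,
) -> bool:
    if old.error_code != new.error_code:
        return False
    # Same line, column moved by fewer than `max_column_shift` characters.
    same_spot = (
        old.line_number == new.line_number
        and abs(old.column - new.column) < max_column_shift
    )
    # Ratio in [0, 1]: 1.0 means the two source lines are identical.
    similarity = SequenceMatcher(
        None, old.physical_line, new.physical_line,
    ).ratio()
    return same_spot or similarity >= min_similarity


old = Violation('WPS305', 10, 5, 'Found `f` string', "x1 = f''")
moved = old._replace(line_number=42)
rewritten = old._replace(column=9, physical_line='abc')
unrelated = old._replace(line_number=42, physical_line='abc')
```

With these defaults, `moved` and `rewritten` both match `old`, while `unrelated` does not; a real implementation would need to decide how strict the combination should be.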

sobolevn · 2020-06-23T10:27:23Z

@Dreamsorcerer thanks a lot! I really appreciate you working on a release-blocking feature. 🙏

We can work on fuzzy-matcher during or after we implement the core features. As you wish.
Personally, I would prefer the second way 🙂

ma-sadeghi · 2021-11-17T23:26:39Z

Is this PR being actively worked on? Would be great to have baseline...

Dreamsorcerer · 2021-11-18T19:01:03Z

See #1817 (comment)

I think at this point, somebody would have to volunteer to convert the code to a dedicated flake8 plugin. You'll want to look at using this branch though: #1471
It's in a good enough situation to release.

sobolevn · 2021-12-13T16:50:27Z

No, --baseline won't be added with this PR. Reasons:

It does not detect code movements
It heavily depends on flake8's internals
It is hacky

sobolevn added 2 commits March 22, 2020 20:38

Changes wps action

36ba28f

Adds --baseline

5e4d8e3

sobolevn requested a review from orsinium March 23, 2020 20:57

sobolevn added 6 commits March 24, 2020 00:47

Adds --baseline

eac9315

Adds --baseline

5b2bdc8

Adds --baseline

b6790ec

Adds --baseline

505837c

Adds --baseline

bc02dfc

Adds --baseline

4f85fed

sobolevn added 2 commits March 24, 2020 11:16

Adds --baseline

d4e1f70

Adds --baseline docs

822b790

asottile reviewed Mar 24, 2020

sobolevn requested a review from lk-geimfari March 25, 2020 09:58

sobolevn mentioned this pull request Mar 25, 2020

Offline mode andreoliwa/nitpick#129

Merged

sobolevn commented Mar 25, 2020

sobolevn and others added 5 commits March 25, 2020 15:35

Improves docs with screenshots and examples

a253f37

Bump docutils from 0.13.1 to 0.16 (#1289)

d623935

Bumps [docutils](http://docutils.sourceforge.net/) from 0.13.1 to 0.16. Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>

Updates dependencies

2cdf7f3

skarzi reviewed Mar 26, 2020

sobolevn and others added 2 commits March 26, 2020 23:01

Update tests/plugins/files.py

1be691b

Co-Authored-By: Łukasz Skarżyński <skarzynski_lukasz@protonmail.com>

Update tests/plugins/files.py

2d58dce

Co-Authored-By: Łukasz Skarżyński <skarzynski_lukasz@protonmail.com>

sobolevn mentioned this pull request May 22, 2020

disable module for run? #1412

Closed

orsinium mentioned this pull request May 30, 2020

Where has the bug tracker gone? life4/flakehell#80

Merged

Dreamsorcerer reviewed May 30, 2020

sobolevn added 2 commits June 1, 2020 21:02

WIP: latest changes

ad76b9b

WIP: latest changes

0bb95aa

sobolevn mentioned this pull request Jun 10, 2020

Disallow to ignore some violations on line level #1248

Closed

webknjaz mentioned this pull request Jun 12, 2020

Adopting use of flake8 plugins pypa/pip#5537

Closed

sobolevn force-pushed the master branch from 4b1a202 to a0a009d Compare July 29, 2020 16:44

Dreamsorcerer mentioned this pull request Feb 4, 2021

FlakeHell was archived #1817

Open

asottile mentioned this pull request Apr 3, 2021

Feature proposal: baseline for legacy projects PyCQA/flake8#330

Closed

sobolevn closed this Dec 13, 2021

sobolevn deleted the issue-1274 branch December 13, 2021 20:46
