Adds --baseline #1276
Pull Request Test Coverage Report for Build 2864
💛 - Coveralls
Pull Request Test Coverage Report for Build 2872
💛 - Coveralls
import json
import os
from collections import Counter, defaultdict
from hashlib import md5
some distributions compile out md5
so you'll probably want a stronger hash function
Sure! Any suggestions based on your experience?
sha256 should do
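A minimal sketch of what that swap might look like, assuming a baseline entry is keyed by its error code and message text. The `violation_digest` helper and its arguments are hypothetical and not taken from this PR; only `hashlib.sha256` is a real standard-library call.

```python
from hashlib import sha256


def violation_digest(error_code: str, text: str) -> str:
    """Return a stable hex digest for one violation (hypothetical helper)."""
    # A newline cannot appear inside an error code, so using it as the
    # separator keeps ('A', 'BC') and ('AB', 'C') from producing the same key.
    payload = '{0}\n{1}'.format(error_code, text)
    return sha256(payload.encode('utf-8')).hexdigest()
```

Unlike `md5`, `sha256` is not among the algorithms that hardened Python builds tend to disable, so the import should keep working there.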
@lk-geimfari do you have any ideas about this PR? I would love to hear them 🙂
Ok, here's the problem I am facing:
Is it really a problem? On one hand, I am reporting an invalid violation. On the other hand, I am reporting the exact same violation, just in a different place. I am not sure.
I guess it is: because users won't be able to even find the new violation. And sometimes it is really hard to fix existing ones. Consider this example:
I don't want to be in a situation like this! I need to refactor the existing solution to fix cases like this one. We should also make a decision: which violations we report and which ones are fine. I will refactor this part:

    for (error_code, line_number, column, text, physical_line) in results:
        # --- patch start
        # Here we ignore violations present in the baseline.
        if self.baseline and self.baseline.has(filename, error_code, text):
            continue

Into something like this:

    for (error_code, line_number, column, text, physical_line) in baseline.only_new_violations(results):

I would also change the baseline format: I would need locations and
Baseline
--------

You can start using it with just a single command!
You can start using it with just a single command!
Better:
You can start using our linter with just a single command!
* fix: check wrong variable name as case insensitive Issue #1275
* test: add test_wrong_variable_name_case_insensitive Issue #1275
* docs: add bugfix changelog entry in 0.15.0 Issue: #1275
* test: add integration test for uppercase wrong name Issue: #1275
* lint: fix import sorting and empty lines separation
* refactor: pass ignored_types as a tuple
* test: add class_name fixture template
Bumps [docutils](http://docutils.sourceforge.net/) from 0.13.1 to 0.16. Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Bumps [importlib-metadata](http://importlib-metadata.readthedocs.io/) from 1.5.0 to 1.5.2. Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
really helpful feature 👍
# --- patch start
if self.baseline is None:
    self.baseline = baseline.save_to_file(self.saved_reports)
Maybe some message should be logged that baseline file is missing and it is going to be created?
Sure 👍
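One way the suggested message could be wired in, as a rough sketch only. `load_or_create_baseline`, its arguments, and the JSON layout are assumptions made for illustration; the PR's own `baseline.save_to_file` may behave differently.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def load_or_create_baseline(path, violations):
    """Load the baseline at ``path``, or create it from ``violations``.

    Hypothetical helper: name, arguments, and file layout are assumptions.
    """
    if os.path.exists(path):
        with open(path, encoding='utf-8') as baseline_file:
            return json.load(baseline_file)

    # Tell the user why nothing is filtered out on this run.
    logger.warning(
        'Baseline file %s is missing, creating it from %d violations.',
        path,
        len(violations),
    )
    with open(path, 'w', encoding='utf-8') as baseline_file:
        json.dump(violations, baseline_file, indent=2)
    return violations
```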
Co-Authored-By: Łukasz Skarżyński <skarzynski_lukasz@protonmail.com>
Based on the feedback I am going to change some details about how this works:
Bumps [importlib-metadata](http://importlib-metadata.readthedocs.io/) from 1.5.2 to 1.6.0. Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Results:
So, I have news! I wrote a fuzzy matcher for baseline entries. It even includes several steps of negative and forward lookups on what lines are there to determine if this code is new or not. But! It is a very hard thing to explain. I ended up not understanding the rules myself. It is hell to test! And I have decided to try exactly the opposite solution this time: minimalism. We can have two states of code: touched or untouched. First of all, let's say we have this Python file (which is baselined):
One can add a new line on top of the file:
Which will make all other lines "touched". Or one can add a line in the middle:
This will make only part of the lines touched. The same applies when we are talking about removing a line:
And that's fine. Sometimes people do have to insert new lines and remove old ones within legacy code. If one touches a line in any other manner, then sorry: you have to refactor it or create a new baseline. This way we can achieve correct and still useful behaviour. The first round eliminates all exact matches, the second one ignores line numbers. All other violations are reported as new ones.
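The two rounds described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code on this branch: the `Violation` record, the `_without_line` helper, and the signature given to `only_new_violations` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Violation(NamedTuple):
    """Hypothetical record; the PR's real baseline format may differ."""

    filename: str
    error_code: str
    line_number: int
    text: str


def _without_line(violation: Violation) -> tuple:
    """Key that identifies a violation regardless of its line number."""
    return (violation.filename, violation.error_code, violation.text)


def only_new_violations(
    baseline: Iterable[Violation],
    results: Iterable[Violation],
) -> Iterator[Violation]:
    """Yield the results that neither round can pair with a baseline entry."""
    # Round one: drop results that match a baseline entry exactly,
    # line number included. Each baseline entry is consumed only once.
    remaining = Counter(baseline)
    leftovers = []
    for violation in results:
        if remaining[violation] > 0:
            remaining[violation] -= 1
        else:
            leftovers.append(violation)

    # Round two: match what is left while ignoring line numbers,
    # which covers code that merely shifted up or down.
    loose = Counter(_without_line(entry) for entry in remaining.elements())
    for violation in leftovers:
        key = _without_line(violation)
        if loose[key] > 0:
            loose[key] -= 1
        else:
            yield violation
```

Counting entries instead of using a plain set means that adding a second copy of an already baselined violation is still reported as new.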
Futher runs won't report any of the violations inside the baseline.

If you already have ``.flake8-baseline.json`` file,
than your baselined violations will be ignored.
than your baselined violations will be ignored. | |
then your baselined violations will be ignored. |
Looking forward to this, I was just starting to implement this myself after finding that flakehell is fairly broken for our code base.

The documentation only talks about a CLI argument. Will it work with the config file too? If I set up a baseline, I'm not going to want to run flake8 without the baseline.

I think being able to conveniently overwrite the baseline in a single step is also helpful. As an example, as I'm trying to roll this out into our project, not everyone will be running flake8 immediately (this gives us a chance to test it out and find the right balance in our flake8 config). This means those of us using it will likely need to create new baselines frequently to accommodate other people's changes until we have everyone running flake8 and can enforce it in CI.

Is this likely to make it into flake8 directly in future? Seems like something that should be useful for flake8 in general.
Hi, @Dreamsorcerer! Thanks a lot for your interest!
Yes, it surely will! This is just how
Can you please help me in researching this approach vs deleting a baseline manually and creating a new one with I was thinking that manual deletion makes it less magical. And you control this process with very understandable and clear actions. But, I would love to hear other arguments!
Yes, we had a discussion about it in https://gitlab.com/pycqa/flake8/-/issues/602 But, this is a very complex feature. So, I am not sure about its future in the core. You can ask @asottile about his opinion on this. I am fine with both outcomes:
I would love to see your solution. Also, if you wish we can collaborate on this PR together. Because, I am struggling to find some free time to work on this. (Currently I am too busy writing
Ha, I was thinking the opposite. :P I think that an explicit create option could avoid any confusion, and slightly reduce friction in our use case where we expect to recreate it often. It could also potentially catch some unexpected effects, such as if the file were moved/renamed accidentally, then flake8 could give an error saying that the baseline file could not be found. Without that, they would suddenly get huge amounts of violations, and possibly not understand why (especially if they are not the person who created the baseline in the first place).
I was just extracting the baseline code from flakehell, and stripping out all the other functionality (it's all that other functionality which was breaking things for our project). So, I presume you've already seen that code. I had only just made a start on it, when I was pointed here. Happy to help out where I can (we have a hack day in a couple of weeks, so I can probably spend a full day working on it then). I'll try and get your branch working locally and test it out sometime this week.
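To illustrate the explicit variant argued for above, where a missing baseline is an error instead of a silent re-creation, here is a small sketch. `BaselineNotFoundError` and `load_existing_baseline` are hypothetical names, and treating the file as plain JSON is an assumption; only the `.flake8-baseline.json` filename comes from the PR's docs.

```python
import json
import os


class BaselineNotFoundError(FileNotFoundError):
    """Raised when an expected baseline file is absent (hypothetical)."""


def load_existing_baseline(path='.flake8-baseline.json'):
    """Load a baseline that must already exist; never create one implicitly."""
    if not os.path.isfile(path):
        # Failing loudly beats printing every legacy violation with no hint
        # that the baseline was moved or renamed.
        raise BaselineNotFoundError(
            'Baseline file {0!r} could not be found. '
            'Was it moved or renamed?'.format(path),
        )
    with open(path, encoding='utf-8') as baseline_file:
        return json.load(baseline_file)
```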
Just thinking a little more about being explicit. I think running
@Dreamsorcerer yes, I like this idea! This will indeed make the process even more clear. Feel welcome to contribute 🙂
@sobolevn I've got the branch up and running. Are your latest changes on this branch? You mentioned earlier about fuzzy matching instead of hashes, but I'm still seeing hashes in the baseline file. It's also not using the baseline file: it creates the file just fine, but still gives me all the violations when I run it again.
@Dreamsorcerer sorry, updated the branch with the latest local changes.
Thanks. Still doesn't seem to ignore the violations successfully, but definitely doing something now. On second run, it produces a load of try, ignored, violations and candidates lines, which I'm assuming is some debug output, but still exits 1. Also, doesn't seem to run when used as a git hook. So, looks like there are a couple of tasks I should start by looking at.
I'm going to have a go at this next week. I'm going to look at:
OK, the fuzzy matching algorithm needs some work. I added a line into a file and 2 violations 100+ lines away reappeared. The only change is that the line number moved down by 1. Will try and debug that one tomorrow.
@Dreamsorcerer thanks a lot for your time and effort! 👍
OK, opened up a PR with all the work done today and yesterday: #1471 I'll try and start using this for our project on a daily basis next week, so I can see if there are any issues that come up. Some ideas for improvements (but shouldn't be considered blocking this feature):
@Dreamsorcerer thanks a lot! I really appreciate you working on a release-blocking feature. 🙏 We can work on fuzzy-matcher during or after we implement the core features. As you wish.
Is this PR being actively worked on? Would be great to have baseline...
See #1817 (comment) I think at this point, somebody would have to volunteer to convert the code to a dedicated flake8 plugin. You'll want to look at using this branch though: #1471
No,
Closes #1274