Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for regexes in url ignore list #80

Merged
merged 4 commits into from
Sep 30, 2022

Conversation

cyberw
Copy link
Contributor

@cyberw cyberw commented Sep 12, 2022

Fixes #81.

Description

I didnt find an easy way to ignore requests entirely by using plugins, and just using string matching is not flexible enough. Could be a breaking change in rare cases, but should work in most cases.

Types of Changes

What types of changes does your code introduce? Keep the ones that apply:

  • New feature (non-breaking change which adds functionality)

TODO

List of tasks you will do to complete the PR:

  • Add example and update documentation (will do this if you think the change is a good idea)

@cyberw cyberw requested a review from thilp as a code owner September 12, 2022 08:01
@cyberw
Copy link
Contributor Author

cyberw commented Sep 12, 2022

Not that it matters, but I am the maintainer of Locust itself :)

@thilp
Copy link
Member

thilp commented Sep 16, 2022

Hi @cyberw, thanks for contributing, I like the idea of giving more power to users via regexes! Do you think we could avoid making this a breaking change by enabling regex-matching with a CLI flag or an environment variable?

@cyberw
Copy link
Contributor Author

cyberw commented Sep 16, 2022

We could make it a flag, but it will only very rarely break anything, so I dont think it is worth it.

A regex . matches a string-match a literal ., so google.com in the ignore file still ignores google.com. The only breaking case should be if someone has a urlignore similar to "goo.le", which would start to ignore "google.com".

I kind of made this PR thinking we were matching URL:s, not just hostnames. Perhaps we can have two ignore files? One that matches entire URL:s (using regex) and one that string-matches only hostnames (preserving the old behaviour).

The only problem with that is that the old file is called urlignore (when it should actually be called hostignore)
Maybe we could deprecate .urlignore (but still support) and preserve that behaviour in .hostignore, and have a new file just called .ignore (or something), that matches the entire URL, using regexes. No need for additional parameters that way...

(I realize now that this was more of a train of thought, feel free to ignore the first two paragraphs :)

@thilp
Copy link
Member

thilp commented Sep 17, 2022

I like the idea of deprecating .urlignore and replacing it with .ignore. I think we can even dispense with keeping .hostignore since there’s nothing it can achieve that .ignore can’t. So we can publish a minor version adding .ignore and where .urlignore is still supported but with a warning, and after some time a major version will come and just keep .ignore.

I would also add an environment variable (TRANSFORMER_URL_IGNORE_PATH?) to allow users to override the default .ignore name that could be too general and conflict with other tools.

How does that sound? Do you feel like trying it? 😊 Meanwhile I would work on the GitHub Action setup that needs some love… 😅

@thilp
Copy link
Member

thilp commented Sep 17, 2022

Or we can just make a major version now, and drop the .urlignore support already (still with a warning message if it’s found, just not supported anymore). I want to drop py3.6 support too, which is technically a breaking change, so directly a major version would just make things easier.

@cyberw
Copy link
Contributor Author

cyberw commented Sep 17, 2022

Cool! I'll have a look next week!

@cyberw
Copy link
Contributor Author

cyberw commented Sep 18, 2022

I didnt implement the env var, but that could be done at a later point if needed.

For a new major version I would love to drop Locust 0.x support and maybe put everything in one task (insted of @task:s in a SequentialTaskSet - task sets are really an advanced feature, and a lot of people get confused when using them :)

And maybe optionally add catch_response=True and with-blocks.

Right now I post-process everything like this (slightly simplified from my actual script), and parts of it feel really dirty :)


text = re.sub(
    r".*TaskSequence\):",
    """from locust import task, events, FastHttpUser
from locust_plugins import listeners, run_single_user

class MyUser(FastHttpUser):
    @task
    def t(self):""",
    text,
    flags=re.DOTALL,
)
text = re.sub(r"class .*\(HttpLocust\):[\s\S]*", "", text)
text = re.sub(r"@seq_task\(.*\)\n    ", "", text)
text = re.sub(r"def .{10,}\(self\):\n    ", "", text)
text = re.sub(r"response = self", "self", text)
text = re.sub(r".*self.client.options\(.*\n", "", text)
text = re.sub(r", name='[^']*', timeout=30", "", text)
text = re.sub(
    r"self\.client\.post\(url='https://api\.spela\.svenskaspel\.se([^']*)'(.*)",
    r"with self.client.post('\1', catch_response=True \2 as resp:\n            pass",
    text,
)

@cyberw
Copy link
Contributor Author

cyberw commented Sep 18, 2022

Also, I had an idea: What if we were to add transformer to locust core? We could at least add it as an optional installation (using pip's optional dependencies, e.g. pip install locust[transformer]) and a page in the documentation (and also link to the proper transformer documentation). I would love to make it easier for people to find.

@cyberw
Copy link
Contributor Author

cyberw commented Sep 25, 2022

@thilp Hi! Did you have time to take a look?

Copy link
Member

@thilp thilp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @cyberw!

transformer/denylist.py Outdated Show resolved Hide resolved
transformer/denylist.py Outdated Show resolved Hide resolved
@zincr
Copy link

zincr bot commented Sep 27, 2022

🤖 zincr found 0 problems , 0 warnings

✅ Large Commits
✅ Approvals
✅ Specification
✅ Dependency Licensing

@thilp
Copy link
Member

thilp commented Sep 27, 2022

I didnt implement the env var, but that could be done at a later point if needed.

Works for me.

For a new major version I would love to drop Locust 0.x support and maybe put everything in one task (insted of @task:s in a SequentialTaskSet - task sets are really an advanced feature, and a lot of people get confused when using them :)

And maybe optionally add catch_response=True and with-blocks.

I must admit I’ve not kept myself up to date with Locust best practices, so I take your word for it. If I remember correctly, we are using SequentialTaskSet to attribute different weights to scenarios. How would you achieve the same flexibility with a single task, have Transformer hardcode the load-balancing code directly inside the task? Or do you simply not use weights?

Perhaps we can also continue this discussion in a dedicated issue, since this will likely outlive this PR 🙂

cyberw and others added 2 commits September 27, 2022 15:02
@thilp
Copy link
Member

thilp commented Sep 27, 2022

I am happy with this PR. Because of the compliance process in place for open-source at Zalando (who owns Transformer), please give me some time to get another colleague to approve it as well. This shouldn’t take long 🙂

Copy link
Member

@szuecs szuecs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR

@cyberw
Copy link
Contributor Author

cyberw commented Sep 28, 2022

For a new major version I would love to drop Locust 0.x support and maybe put everything in one task (insted of @task:s in a SequentialTaskSet - task sets are really an advanced feature, and a lot of people get confused when using them :)
And maybe optionally add catch_response=True and with-blocks.

I must admit I’ve not kept myself up to date with Locust best practices, so I take your word for it. If I remember correctly, we are using SequentialTaskSet to attribute different weights to scenarios. How would you achieve the same flexibility with a single task, have Transformer hardcode the load-balancing code directly inside the task? Or do you simply not use weights?

Perhaps we can also continue this discussion in a dedicated issue, since this will likely outlive this PR 🙂

Yes, lets! I can raise a new ticket.

@thilp
Copy link
Member

thilp commented Sep 30, 2022

I will now merge this PR and create a new release (associated with a new major version).

@thilp thilp merged commit fa68c50 into zalando-incubator:master Sep 30, 2022
@cyberw
Copy link
Contributor Author

cyberw commented Sep 30, 2022

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make .urlignore ignore based on whole URL instead of just hostname
3 participants