feat(stdlib): add punycode encoding functions #672
This adds `encode_punycode` and `decode_punycode` functions. It also adds tests to confirm how the `parse_url` function handles punycode. Fixes: vectordotdev#659
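The encode/decode round trip can be illustrated outside VRL. The snippet below is a stand-in under stated assumptions, not the VRL implementation: it uses Python's built-in `idna` codec (IDNA 2003, RFC 3490), whose normalization rules may differ from VRL's, and `encode_host`/`decode_host` are hypothetical helper names. It extracts the host from a URL first, loosely analogous to pairing these functions with `parse_url`.

```python
from urllib.parse import urlsplit


def encode_host(host: str) -> str:
    """Hypothetical helper: convert a Unicode hostname to its ASCII form."""
    # Python's "idna" codec implements IDNA 2003 (RFC 3490) and applies
    # punycode (RFC 3492) to each label that contains non-ASCII characters.
    return host.encode("idna").decode("ascii")


def decode_host(host: str) -> str:
    """Hypothetical helper: convert "xn--" labels back to Unicode."""
    return host.encode("ascii").decode("idna")


url = "https://www.café.com/menu?lang=fr"
host = urlsplit(url).hostname   # "www.café.com"
ascii_host = encode_host(host)
print(ascii_host)               # www.xn--caf-dma.com
print(decode_host(ascii_host))  # www.café.com
```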
This adds punycode-related functions. I have also realised that …
I haven't tested this patch, so forgive me if trying it would have answered this lazy question: what happens if a string containing only ASCII characters is run through `encode_punycode`? Or if a non-punycode string is run through `decode_punycode`? If such strings pass through both directions unchanged, then a separate way to detect whether a string contains non-ASCII characters is unnecessary, since simple string comparisons can make that determination.
Right, I should have added some tests and examples to demonstrate that behavior. To answer your question: it only encodes the parts that require encoding, which is why the example (www.café.com) encodes only the café part. I will add tests and examples with fully ASCII domains to make this clearer.
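"Only encode what requires encoding" works label by label. A minimal sketch, assuming simple `.`-separated labels and skipping the nameprep/UTS 46 mapping a real implementation performs; `encode_labels` and `decode_labels` are hypothetical names, not VRL functions. ASCII labels have to be skipped explicitly, because Python's raw `punycode` codec would turn `www` into `www-`.

```python
def encode_labels(domain: str) -> str:
    """Hypothetical helper: punycode-encode only labels that need it."""
    out = []
    for label in domain.split("."):
        if label.isascii():
            # Nothing to encode: keep the label exactly as written.
            out.append(label)
        else:
            # Raw punycode (RFC 3492) plus the IDNA "xn--" ACE prefix.
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


def decode_labels(domain: str) -> str:
    """Hypothetical helper: decode only labels carrying the "xn--" prefix."""
    out = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            out.append(label[4:].encode("ascii").decode("punycode"))
        else:
            out.append(label)
    return ".".join(out)


print(encode_labels("www.café.com"))  # www.xn--caf-dma.com
print(encode_labels("vector.dev"))    # vector.dev
print(decode_labels("vector.dev"))    # vector.dev
```

Fully ASCII domains come back unchanged in both directions, which is the property the question above asks about.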
Thanks, more test cases are always good. Very happy to see this feature. I suspect it would have a better chance of approval if it also included documentation in the main docs repository as part of the PR.
Yes, I have added them now, covering both cases with nothing to encode/decode and cases with IDN strings. They confirm that inputs needing no encoding are processed faster (an order of magnitude faster on my machine).
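One plausible reason the no-op case is much cheaper is a short-circuit before any encoding work. The sketch below assumes such a fast path; the `encode_domain`/`decode_domain` names and their checks are hypothetical and not taken from the VRL source, and Python's `idna` codec stands in for the real encoder. It times both inputs with `timeit`; no expected numbers are given because timings depend on the machine.

```python
import timeit


def encode_domain(domain: str) -> str:
    # Hypothetical fast path: an all-ASCII input has nothing to encode,
    # so return it untouched without running the IDNA machinery.
    if domain.isascii():
        return domain
    return domain.encode("idna").decode("ascii")


def decode_domain(domain: str) -> str:
    # Hypothetical fast path: with no "xn--" label there is nothing to decode.
    if "xn--" not in domain.lower():
        return domain
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ("example.com", "www.café.com"):
        seconds = timeit.timeit(lambda: encode_domain(name), number=10_000)
        print(f"encode {name!r}: {seconds:.4f}s per 10k calls")
```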
I have added VRL tests as well, demonstrating encoding, decoding, and combination with …
Thank you for adding the tests!
* docs(vrl): add documentation for punycode encoding functions
  Related: vectordotdev/vrl#672
* Allow IDN and punycode in spellchecker
* Change IDN allow entry into lowercase
* chore: expose component test utils (#19826)
* chore(deps): Bump VRL to 0.11.0 (#19827)
  Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
* chore(ci): Bump aws-actions/configure-aws-credentials from 4.0.1 to 4.0.2 (#19823)
  chore(ci): Bump aws-actions/configure-aws-credentials
  Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.1 to 4.0.2.
  - [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
  - [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
  - [Commits](aws-actions/configure-aws-credentials@v4.0.1...v4.0.2)
  ---
  updated-dependencies:
  - dependency-name: aws-actions/configure-aws-credentials
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): Bump the prost group with 1 update (#19830)
  Bumps the prost group with 1 update: [prost-reflect](https://github.com/andrewhickman/prost-reflect).
  Updates `prost-reflect` from 0.12.0 to 0.13.0
  - [Changelog](https://github.com/andrewhickman/prost-reflect/blob/main/CHANGELOG.md)
  - [Commits](andrewhickman/prost-reflect@0.12.0...0.13.0)
  ---
  updated-dependencies:
  - dependency-name: prost-reflect
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: prost
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update punycode encoding to be fallible in docs
* Add failure reasons for punycode encoding
* Fix typo in decode_punycode docs
* Simplify error descriptions for punycode_encoding
* Fix formatting of punycode_encoding cue files

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>