Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: jstrdecode should have a command line option to ignore newlines #21

Closed
1 task done
lcn2 opened this issue Oct 10, 2024 · 35 comments
Closed
1 task done
Labels
enhancement New feature or request

Comments

@lcn2
Copy link
Contributor

lcn2 commented Oct 10, 2024

Is there an existing issue for this?

  • I have searched for existing issues and did not find anything like this

Describe the enhancement

When jstedecode(1) processes data from stdin, it should have a command line option to ignore newlines.

Consider:

echo 'å∫ç∂' | jstrdecode

This will generate the following unhelpful error:

Warning: json_decode: found non-\-escaped char: 0x0a
Warning: jstrdecode_stream: error while decoding stdin buffer
Warning: main: failed to decode string from stdin

Whereas this works:

echo -n 'å∫ç∂' | jstrdecode

and produces:

å∫ç∂

The same problem happens when you use the - (dash character) arg:

echo 'å∫ç∂' | jstrdecode -

IMPORTANT NOTE: Objecting to a "naked" newline in a JSON string IS CORRECT BEHAVIOR as the so-called JSON spec does NOT allow for "naked" newlines in JSON strings.

Relevant images, screenshots or other files

echo Hello, World! | jstrdecode :-)

Relevant links

🔗⛓️

Anything else?

This is NOT a priority. The bin tools on that "other" repo supply a -T to trim outgoing newlines before feeding them into jstrdecode -. So there is no need for an emergency fix. We held off on making this change until after the code freeze.

"Technically and pedantically speaking" (as the expression goes), the current behavior of jstrdecode(1) is correct. The tool does "decode JSON encoded strings" and in the so-called JSON spec, newlines characters are NOT allowed. The JSON parser jparse(1) and its friendly library should NOT be relaxed to allow for naked newline characters in JSON documents.

Perhaps jstrdecode(1) should jstrdecode(1) have an option to ignore any final newline?

We observe that its companion jstrencode(1) tool has -n:

	-n		do not output newline after encode output (def: print final newline)

As does jstrdecode(1):

	-n		do not output newline after decode output

Perhaps a new option such as -N should tell jstrdecode to ignore a final newline, such that this would work:

echo 'å∫ç∂' | jstrdecode -N

Note that internal newlines would still cause this to fail:

echo -n 'å∫ç∂
´ƒ©˙' | jstrdecode -N

This is NOT a priory enhancement request.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

I like this thought.

If I understand it correctly this would skip past the byte 0x0a when decoding, if it's the final byte? If so that is a good idea and would help with typing the commands. I don't think it would break JSON validity either since it's only for inputting commands, not JSON itself.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

With -N one might consider that when reading internal newlines from stdin, that they each be treated as a separate argument so that:

echo -n 'å∫ç∂' 'ƒ©˙' | jstrdecode

and:

echo 'å∫ç∂
ƒ©˙' | jstrdecode -N

would produce the same thing?

UPDATE 0

See also GH-issuecomment-2405771581 ᐤ.

UPDATE 1b

Regarding GH-issuecomment-2405775932), see Unicode control points such as U+1424 and U+2218.

Perhaps a short scan of a List of Unicode Symbols would find better examples.

Other candidates could be U+07CB, U+09F9, U+0A66, U+0C66, U+0F1A, U+20DD, U+25CB, U+25CE, U+25EF, u+26AC, U+2B55, U+2D54, and U+FFEE.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

With -N one might consider that when reading internal newlines from stdin, that they each be treated as a separate argument so that:

echo -n 'å∫ç∂' 'ƒ©˙' | jstrdecode

and:

echo 'å∫ç∂
ƒ©˙' | jstrdecode -N

would produce the same thing?

I wonder. I was just doing it so it would skip the final newline which seems reasonable. But the question is if one uses it this way what does a final newline do? Meaning a trailing newline?

UPDATE 0

See also GH-issuecomment-2405771581 ᐤ.

I replied with a question on that.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

I just implemented the way I interpreted it. I can commit. If you wish for more then let me know. I have to check one thing. After this I then must do other things. (Okay did it in decode only: have to look at encode also.)

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

Regarding GH-issuecomment-2405774413, see the suggestion for processing internal newlines with -N as well.

The -N option would be limited to the internals of jstrdecode(1) only, and would not impact the JSON parser code.

We suggest that with -N, that jstrdecode(1) buffer stdin into memory, scan it removing newlines, before passing it on to the JSON string decoder.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

I just implemented the way I interpreted it. I can commit. If you wish for more then let me know. I have to check one thing. After this I then must do other things. (Okay did it in decode only: have to look at encode also.)

Please do commit, and feel free to add it to the "other repo".

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

I just implemented the way I interpreted it. I can commit. If you wish for more then let me know. I have to check one thing. After this I then must do other things. (Okay did it in decode only: have to look at encode also.)

Please do commit, and feel free to add it to the "other repo".

I'll do that and the other part can happen later I guess as I have another question about that way.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Regarding GH-issuecomment-2405774413, see the suggestion for processing internal newlines with -N as well.

The -N option would be limited to the internals of jstrdecode(1) only, and would not impact the JSON parser code.

We suggest that with -N, that jstrdecode(1) buffer stdin into memory, scan it removing newlines, before passing it on to the JSON string decoder.

Okay assuming I remove internal \n. What would it do? How would it be removed? If I changed to a NUL byte that could be a problem. Changing it to space might be wrong too.

I guess if it's the final byte it would be like I am doing now?

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

Regarding GH-issuecomment-2405774413, see the suggestion for processing internal newlines with -N as well.
The -N option would be limited to the internals of jstrdecode(1) only, and would not impact the JSON parser code.
We suggest that with -N, that jstrdecode(1) buffer stdin into memory, scan it removing newlines, before passing it on to the JSON string decoder.

Okay assuming I remove internal \n. What would it do? How would it be removed? If I changed to a NUL byte that could be a problem. Changing it to space might be wrong too.

I guess if it's the final byte it would be like I am doing now?

We suggest you delete it from the buffer.

The command:

jstrdecode 'foo' 'bar'

concatenates both "foo" and "bar".

So think of:

(echo "foo"; echo "bar") | jstrdecode -N

as removing both newlines and concatenating the results.

Think of each line on stdin as a separate argument where -N trims the newlines on each line if they are found.

UPDATE 0

With -N, read stdin into memory, then copy to another memory block all bytes that are not a newline character, then free the 1st block of memory and decode the 2nd block with the newlines removed.

BTW: This should also work:

jstrdecode -N "hi
hello
there"

Just as if we did:

echo "hi
hello
there" | jstrdecode -N

and just as if we did:

jstrdecode hi hello there

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

About to do a pull request. If you wish to explain your further idea with -N please do so.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

About to do a pull request. If you wish to explain your further idea with -N please do so.

With -N, the jstrdecode command would ignore all newlines found.

    -N		ignore all newlines (def: treat newline characters as JSON string errors)

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Okay I understand now. Not sure I can do this today but the final newline is at least removed now.

Perhaps you would like to do this quickly before the freeze over there? That would be appreciated if you have the time. Otherwise I can do it tomorrow I guess (or later today if I have time - but I doubt it).

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

Okay I understand now. Not sure I can do this today but the final newline is at least removed now.

Perhaps you would like to do this quickly before the freeze over there? That would be appreciated if you have the time. Otherwise I can do it tomorrow I guess (or later today if I have time - but I doubt it).

We have a tested PR that we will issue to the jparse repo. Then we will fold that PR into this repo after we complete the existing PR processing to be sure we stay in sync.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

If you think your updates resolves this issue then please feel free to close this, or let me know and I'll close it. If not then please let me know what else you think needs to be done.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Okay I understand now. Not sure I can do this today but the final newline is at least removed now.
Perhaps you would like to do this quickly before the freeze over there? That would be appreciated if you have the time. Otherwise I can do it tomorrow I guess (or later today if I have time - but I doubt it).

We have a tested PR that we will issue to the jparse repo. Then we will fold that PR into this repo after we complete the existing PR processing to be sure we stay in sync.

Thanks for taking the time doing that! Really appreciate it.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

Sure. Please make sure to update the man page too. You can then sync to mkiocccentry when you're all done. I should be able to merge it today but if I don't get to it today I'll certainly do it in the morning (when I'm awake enough to look at it).

I guess that won't matter for a code freeze over in the mkiocccentry repo. Even so I'll be awake for many more hours just not as many as you.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

Sure. Please make sure to update the man page too. You can then sync to mkiocccentry when you're all done. I should be able to merge it today but if I don't get to it today I'll certainly do it in the morning (when I'm awake enough to look at it).

I guess that won't matter for a code freeze over in the mkiocccentry repo. Even so I'll be awake for many more hours just not as many as you.

PR is own the way soon, after man page edits.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

Sure. Please make sure to update the man page too. You can then sync to mkiocccentry when you're all done. I should be able to merge it today but if I don't get to it today I'll certainly do it in the morning (when I'm awake enough to look at it).
I guess that won't matter for a code freeze over in the mkiocccentry repo. Even so I'll be awake for many more hours just not as many as you.

PR is own the way soon, after man page edits.

Sounds good. Thanks!

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Feel free to do the update you were thinking of with -t too. That'd be great as I'm not sure what code point you're referring to and it would also be fun to see what you have in mind.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

Double free bug .. need to fix.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Double free bug .. need to fix.

Forgot to set to NULL? Or did I do something when I did the original addition very quickly ? Well anyway no problem. I will be going afk shortly but I'll have my phone with me.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

The jstrencode_stream() function needs fixing ...

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

The jstrencode_stream() function needs fixing ...

I hope you're a strong swimmer!

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

The jstrencode_stream() function needs fixing ...

I hope you're a strong swimmer!

Fixed.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

Doing the man page change now.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

The jstrencode_stream() function needs fixing ...

I hope you're a strong swimmer!

Fixed.

Great.

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Doing the man page change now.

Thanks!

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

Done. Just a bit of hand testing now ...

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

building the commit ...

@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

Sounds good.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

creating the PR ...

Watching the GitHub test ...

debug[3]: \ud83d\ude4f\uD83D\uDD25\uD83D\uDC09: 🙏🔥🐉 == 🙏🔥🐉: true
debug[3]: \uD83D\uDC8D\uD83C\uDF0B: 💍🌋 == 💍🌋: true

🤓

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

See PR #23

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 10, 2024

We believe this issue has been completed.

@lcn2 lcn2 closed this as completed Oct 10, 2024
@xexyl
Copy link
Owner

xexyl commented Oct 10, 2024

creating the PR ...

Watching the GitHub test ...


debug[3]: \ud83d\ude4f\uD83D\uDD25\uD83D\uDC09: 🙏🔥🐉 == 🙏🔥🐉: true

debug[3]: \uD83D\uDC8D\uD83C\uDF0B: 💍🌋 == 💍🌋: true

🤓

Maybe I should have done the other one as well but the joke of course is that you have this person praying and yet a dragon is consuming them in fire!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants