Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix(tokenizer): don't increment array cursor by 2 on CRLF #3204

Merged
merged 2 commits into from Mar 22, 2024

Conversation

georgesittas
Copy link
Collaborator

@georgesittas georgesittas commented Mar 22, 2024

Fixes #3200

We used to increment i by 2 inside _advance, in order to basically skip the \r\n sequence and have the line attribute be automatically updated to the right value. However, this didn't take into account an edge case where the SQL string can end in \r\n and so it resulted in an off-by-one index error, as reported in the linked issue.

Since the original motivation for this change was to fix the line attribute, which was getting incremented twice for every \r\n sequence, I changed our approach to simply not increment it when we find the \r in the sequence and only do it on the subsequent \n.

@mr-miles
Copy link

mr-miles commented Mar 22, 2024

It's totally immaterial but I noticed the column gets set to 1 but the line number doesnt change til the next iteration ... how about

            if not(self._char == "\r" and self._peek == "\n"):
               self._col = 1
               self._line += 1

anyhow thanks for the swift attention!

@georgesittas
Copy link
Collaborator Author

georgesittas commented Mar 22, 2024

It's totally immaterial but I noticed the column gets set to 1 but the line number doesnt change til the next iteration ... how about

            if not(self._char == "\r" and self._peek == "\n"):
               self._col = 1
               self._line += 1

anyhow thanks for the swift attention!

Done, thanks! I liked this approach more, I think it simplifies the logic 👍

@tobymao tobymao merged commit 73fc807 into main Mar 22, 2024
5 checks passed
@tobymao tobymao deleted the jo/fix_tokenizer_crlf branch March 22, 2024 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows line endings cause IndexOutOfRange during tokenization
4 participants