Advance scanner position by byte length while searching for line #3612

vinistock · 2025-06-17T17:24:11Z

Motivation

Fixes #3494 and #2446. Second time is the charm 😅

In #3583, we fixed bytesize counting for UTF-8, but didn't advance @pos based on the byte size when finding the correct line, which meant the document would still get corrupted.
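To illustrate the failure mode, here is a minimal sketch that assumes the scanner tracks a byte position. `line_start_byte_offset` is a hypothetical helper, not the code in `lib/ruby_lsp/document.rb`. It shows that the offset must advance by each line's byte size rather than its character count.

```ruby
# Hypothetical helper (not the ruby-lsp implementation): returns the byte
# offset at which the zero-based `target_line` starts.
def line_start_byte_offset(source, target_line)
  pos = 0

  source.each_line.with_index do |line, index|
    return pos if index == target_line

    # Advance by bytes. Using line.length here would undercount any line
    # that contains multibyte characters.
    pos += line.bytesize
  end

  pos
end

source = "a = \"é\"\nb = 1\n"

# The first line is 8 characters long but 9 bytes long, because "é" takes
# 2 bytes in UTF-8.
offset = line_start_byte_offset(source, 1)
puts offset                      # 9
puts source.byteslice(offset, 5) # b = 1
```

Advancing by the character count instead would give 8, which points at the newline that ends the first line rather than at the start of the second one.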

Implementation

I split scanning into 3 subclasses since I found the logic for each encoding to be sufficiently different. We also don't want to pay the price of checking the encoding inside the many loops.

For posterity, the spec explains that positions use code unit lengths. For each encoding, that means a different thing:

UTF-8: code units are equivalent to bytes. We simply work directly with byte sizes
UTF-16: code units are almost equivalent to code points. The main difference is that code points above U+FFFF are encoded as a surrogate pair and count as length 2, while everything else counts as length 1
UTF-32: code units are the same as code points
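As a rough sketch of the three rules above: `code_unit_length`, `code_units` and the `:utf8` / `:utf16` / `:utf32` symbols are hypothetical names used only for illustration, not the API introduced in this PR.

```ruby
# Hypothetical helper: how many code units one character occupies under each
# position encoding. Expects a single-character UTF-8 Ruby string.
def code_unit_length(char, encoding)
  case encoding
  when :utf8
    char.bytesize               # code units are bytes
  when :utf16
    char.ord > 0xFFFF ? 2 : 1   # surrogate pairs count as 2
  when :utf32
    1                           # code units are code points
  else
    raise ArgumentError, "unknown encoding: #{encoding.inspect}"
  end
end

# Hypothetical helper: total code units in a string for a given encoding.
def code_units(string, encoding)
  string.each_char.sum { |char| code_unit_length(char, encoding) }
end

puts code_units("a🙂b", :utf8)  # 6
puts code_units("a🙂b", :utf16) # 4
puts code_units("a🙂b", :utf32) # 3
```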

I implemented the logic for each in subclasses.

Automated Tests

Added a bunch of tests for each encoding and some edge cases, which should hopefully help us prevent further regressions.

Manual Tests

Tested on VS Code and Neovim (UTF-16 and UTF-8).

ChallaHalla · 2025-06-17T20:56:54Z

Tested in Neovim with UTF-32 and these changes seem to fix the problem!

Seems to be good for UTF-16 and UTF-8 as well.

kddnewton · 2025-06-17T23:18:43Z

I thought this functionality was largely taken care of inside of Prism::Location. Is there something missing?

vinistock · 2025-06-18T13:46:02Z

@kddnewton Prism::Location fully handles the other direction: returning the right code units for a given AST node. This one is the reverse: we need to receive the code unit positions from the editor and ensure that we're turning them into the right string indices to update our source representation.
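For readers following along, the editor-to-string direction can be sketched roughly as below for a UTF-16 client. `utf16_column_to_char_index` is a hypothetical name for illustration only and is not the scanner code from this PR.

```ruby
# Hypothetical helper: converts a column expressed in UTF-16 code units (what
# the editor sends) into a character index within a single line.
def utf16_column_to_char_index(line, column)
  units = 0

  line.each_char.with_index do |char, index|
    return index if units >= column

    # Characters above U+FFFF take a surrogate pair, so they count twice
    units += char.ord > 0xFFFF ? 2 : 1
  end

  line.length
end

line = 'x = "🙂" + y'

# The editor reports `y` at UTF-16 column 11, but it is the 11th character
# (index 10) of the Ruby string because the emoji counts as two code units.
index = utf16_column_to_char_index(line, 11)
puts index       # 10
puts line[index] # y
```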

alexcrocha

LGTM 🚀

Just one small change, otherwise good to 🚢
(feel free to ignore the nits)

lib/ruby_lsp/document.rb

And refactor each encoding scanner into a subclass

vinistock self-assigned this Jun 17, 2025

vinistock added the bugfix This PR will fix an existing bug label Jun 17, 2025 — with Graphite App

vinistock added the server This pull request should be included in the server gem's release notes label Jun 17, 2025 — with Graphite App

vinistock requested review from alexcrocha, amomchilov and ChallaHalla June 17, 2025 17:24

vinistock marked this pull request as ready for review June 17, 2025 17:29

vinistock requested a review from a team as a code owner June 17, 2025 17:29

vinistock force-pushed the 06-17-advance_scanner_position_by_byte_length_while_searching_for_line branch from 1a368c4 to 7b99851 Compare June 17, 2025 17:32

vinistock requested a review from st0012 June 17, 2025 18:12

This was referenced Jun 17, 2025

String with emoji in it throws off semantic highlighting #1162

Closed

100% CPU usage #2446

Closed

neovim-lsp: Weird behavior after inserting non-ascii characters #3494

Closed

alexcrocha approved these changes Jun 18, 2025

vinistock force-pushed the 06-17-advance_scanner_position_by_byte_length_while_searching_for_line branch from 7b99851 to ab7bcf5 Compare June 18, 2025 18:26

Advance scanner position by byte length while searching for line

f930333

And refactor each encoding scanner into a subclass

vinistock force-pushed the 06-17-advance_scanner_position_by_byte_length_while_searching_for_line branch from ab7bcf5 to f930333 Compare June 18, 2025 18:43

vinistock enabled auto-merge (squash) June 18, 2025 18:43

vinistock merged commit 6b10f30 into main Jun 18, 2025
36 checks passed

vinistock deleted the 06-17-advance_scanner_position_by_byte_length_while_searching_for_line branch June 18, 2025 19:11

BrewTestBot mentioned this pull request Jun 19, 2025

ruby-lsp 0.24.2 Homebrew/homebrew-core#227429

Merged
