Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vlib: refactor empty string checks #21300

Merged
merged 1 commit into from
Apr 17, 2024

Conversation

ttytm
Copy link
Member

@ttytm ttytm commented Apr 17, 2024

Instead of checking empty strings via str_var.len > 0, str_var.len == 0, str_var.len < 1
the PR moves towards using str_var != '' and str_var == ''.

The former is longer to write, has no performance benefit, and, being less expressive, it comes with more mental overhead and requires more context to comprehend what (type) the condition checks. So imho making it unnecessary hard.

If a PR for v vet has a chance, I'd give it a shot implementing it. Imho it would be fitting to notify about this.

@JalonSolov
Copy link
Contributor

The problem with this is that string comparisons are much slower than int comparisons. This check is very quick, but will still be slower with the change.

And lots of little slowdowns can add up to a large slowdown overall.

@ttytm
Copy link
Member Author

ttytm commented Apr 17, 2024

I couldn't find the check for '' being slower.

@ttytm
Copy link
Member Author

ttytm commented Apr 17, 2024

No doubt it makes sense when the string needs to be a certain length to check for then len. But for empty strings?

@JalonSolov
Copy link
Contributor

JalonSolov commented Apr 17, 2024

As I said, it's a very small difference, but it is there. Just look at line 732 in builtin/string.v to see the code that's executed for == for strings.

First, it checks to see if the str field == 0. Then it compares the lengths of the strings, then it checks to see if the length is > 0. Then it calls vmemcmp. Worse, it will always call vmemcmp (since none of the other tests will fail), even for empty strings.

All that, to compare 2 empty strings.

Whereas a comparison of 2 ints is... simple and straight-forward. A single CPU instruction on most systems.

@ttytm
Copy link
Member Author

ttytm commented Apr 17, 2024

As I said, it's a very small difference, but it is there. Just look at line 732 in builtin/string.v to see the code that's executed for == for strings.

As far as I can interpret it's made sure that the string len is compared when using the == on strings, so it should make no difference in performance.
From the check below it varies which on is faster and differences between runs are in a non significant range. It has to be measurable if there is a performance diff.

import time

str := ''
non_empty := 'This is my non empty string'

sw := time.new_stopwatch()

for _ in 0 .. 1_000_000 {
	if str.len == 0 {
		assert true
	}
	if non_empty.len != 0 {
		assert true
	}
}

dump(sw.elapsed())

sw2 := time.new_stopwatch()

for _ in 0 .. 1_000_000 {
	if str == '' {
		assert true
	}
	if non_empty != '' {
		assert true
	}
}

dump(sw2.elapsed())

@ttytm
Copy link
Member Author

ttytm commented Apr 17, 2024

It would require an extended test to tell if there really is significance to it. The one I wrote is to simple. If there is performance difference and it is just some nano seconds the benefits of string comparison might still outweigh it.

@JalonSolov
Copy link
Contributor

A decent optimizer will reduce both checks to almost nothing... if possible.

It still might be useful to do the same checks with -prod to see what happens. Also not sure I trust stopwatch...

@ttytm
Copy link
Member Author

ttytm commented Apr 17, 2024

The conditions generate the same C 0 len check... There can't be a difference.

@JalonSolov
Copy link
Contributor

How about other backends?

@ttytm
Copy link
Member Author

ttytm commented Apr 17, 2024

Need to check. If the optimal is not generated for other backends what I already think is that we should address it in the backends.

@spytheman
Copy link
Member

The s == '' optimization was implemented in the C backend in #6527 in 2020-10-01.

I agree, that if it is not, it should be done for the other backends too.

Another alternative, is that it can be reimplemented in the transformer stage, since it is a very mechanical and easy to automate rule. Then all backends will get it.

Copy link
Member

@spytheman spytheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work.

@spytheman spytheman merged commit 8aa9314 into vlang:master Apr 17, 2024
56 checks passed
@spytheman
Copy link
Member

@medvednikov what do you think of having a vet rule, for finding s.len == 0 conditions, and suggesting s == '' instead?

@ttytm ttytm deleted the vlib/refactor-empty-string-checking branch April 18, 2024 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants