Add performance numbers #2

Merged
zhiltsov-max merged 7 commits into develop from zm/test-perf
Feb 7, 2026

Conversation

@zhiltsov-max
Owner

  • Added performance test
  • Added performance numbers in Readme

Copilot AI left a comment

Pull request overview

Adds a Criterion benchmark and publishes initial benchmark results in the README to document string_distance::levenshtein() performance.

Changes:

  • Added a Criterion benchmark (cargo bench) to measure levenshtein() over multiple string lengths/charsets.
  • Documented benchmark results and benchmarking instructions in README.md.
  • Added benchmark-related dev dependencies and bench configuration in Cargo.toml (and updated Cargo.lock).
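
For readers unfamiliar with the function under test, here is a minimal reference sketch of Levenshtein distance. It is an assumption-laden illustration, not the string_distance implementation: the name levenshtein_sketch is hypothetical, and it compares Unicode scalar values (char), which may differ from the units the crate actually processes.

```rust
/// Two-row dynamic-programming Levenshtein distance over `char`s.
/// Illustrative only: this is NOT the `string_distance` implementation.
pub fn levenshtein_sketch(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between the already-processed prefix of `a`
    // and the first `j` chars of `b`. Initially the prefix of `a` is empty.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    let mut curr = vec![0usize; b_chars.len() + 1];

    for (i, ca) in a.chars().enumerate() {
        // Turning the first i + 1 chars of `a` into an empty string
        // takes i + 1 deletions.
        curr[0] = i + 1;
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        // The row just computed becomes the previous row for the next char.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b_chars.len()]
}
```

The nested loops make the cost grow with the product of the two input lengths, which is why measuring several string lengths is informative.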

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| benches/my_benchmark.rs | New Criterion benchmark for levenshtein() over ASCII and UTF-8 inputs. |
| README.md | Adds a Performance section with reported timings and a Benchmarking section. |
| Cargo.toml | Adds criterion + rand dev-dependencies and declares the benchmark target. |
| Cargo.lock | Locks new dev-dependency transitive graph for benchmarking. |

}

impl rand::distr::Distribution<char> for AlphanumericChar {
    fn sample<R: rand::Rng + ?std::marker::Sized>(&self, rng: &mut R) -> char {

Copilot AI Feb 7, 2026

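The fully qualified bound ?std::marker::Sized compiles, but Sized is in the prelude, so ?Sized is the idiomatic spelling. Keep the relaxed bound rather than dropping it: rand's Distribution::sample is declared with R: Rng + ?Sized, and an impl with a plain R: Rng bound would have stricter requirements than the trait and be rejected by the compiler.

Suggested change
-    fn sample<R: rand::Rng + ?std::marker::Sized>(&self, rng: &mut R) -> char {
+    fn sample<R: rand::Rng + ?Sized>(&self, rng: &mut R) -> char {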
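
As background on what a ?Sized bound on this kind of method permits, here is a dependency-free sketch. NumberSource, Sampler, and Counter are hypothetical stand-ins for rand's traits, introduced only for illustration. Only AlphanumericChar mirrors a name from the benchmark, and its body here is an assumption.

```rust
/// Hypothetical stand-in for a random number source (not rand's `Rng`).
pub trait NumberSource {
    fn next_u32(&mut self) -> u32;
}

/// Hypothetical stand-in for a distribution trait.
pub trait Sampler<T> {
    // `?Sized` lifts the implicit `Sized` requirement on `R`,
    // so `R` may be an unsized type such as `dyn NumberSource`.
    fn sample<R: NumberSource + ?Sized>(&self, rng: &mut R) -> T;
}

/// Deterministic source used for the demonstration.
pub struct Counter(pub u32);

impl NumberSource for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}

pub struct AlphanumericChar;

impl Sampler<char> for AlphanumericChar {
    // The impl repeats the trait's bounds exactly; an impl may not
    // demand more of `R` than the trait declaration does.
    fn sample<R: NumberSource + ?Sized>(&self, rng: &mut R) -> char {
        const CHARS: &[u8] =
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        CHARS[(rng.next_u32() as usize) % CHARS.len()] as char
    }
}

/// Works only because `R` may be unsized: here `R = dyn NumberSource`.
pub fn sample_via_trait_object(rng: &mut dyn NumberSource) -> char {
    AlphanumericChar.sample(rng)
}
```

With the bound in place, the same sample method serves both concrete generators and trait objects.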

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Comment on lines +2 to +4
use rand;
use std::hint::black_box;
use string_distance;

Copilot AI Feb 7, 2026

use rand; and use string_distance; are redundant here (crate names are already in the extern prelude, and you reference them via rand::... / string_distance::...). Depending on the toolchain and lint settings, this can trigger redundant-import warnings when running cargo bench or Clippy (for example, clippy::single_component_path_imports). Remove these imports or import the specific items you need (e.g., rand::Rng).

Suggested change
-use rand;
-use std::hint::black_box;
-use string_distance;
+use std::hint::black_box;

Comment on lines +35 to +76
    let mut rng = rand::rng();

    for length in lengths {
        c.bench_function(
            &format!(
                "levenshtein for {charset} strings of length {length}",
                charset = "ASCII",
                length = length
            ),
            |b| {
                b.iter_batched(
                    || {
                        return (
                            generate_random_string(length, &mut rng, &ascii_chars),
                            generate_random_string(length, &mut rng, &ascii_chars),
                        );
                    },
                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
                    BatchSize::SmallInput,
                )
            },
        );
    }

    for length in lengths {
        c.bench_function(
            &format!(
                "levenshtein for {charset} strings of length {length}",
                charset = "UTF8",
                length = length
            ),
            |b| {
                b.iter_batched(
                    || {
                        return (
                            generate_random_string(length, &mut rng, &utf8_chars),
                            generate_random_string(length, &mut rng, &utf8_chars),
                        );
                    },
                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
                    BatchSize::SmallInput,
                )

Copilot AI Feb 7, 2026

This uses iter_batched with random string generation in the setup closure. Criterion runs the setup closure outside the timed region, so string generation is not included in the reported time, but every batch still allocates fresh strings and every run sees different inputs. If the goal is to benchmark the distance algorithm on a fixed, repeatable workload, pre-generate inputs per length (or use a pool and clone/select) and benchmark only the levenshtein call.

Suggested change
-    let mut rng = rand::rng();
-    for length in lengths {
-        c.bench_function(
-            &format!(
-                "levenshtein for {charset} strings of length {length}",
-                charset = "ASCII",
-                length = length
-            ),
-            |b| {
-                b.iter_batched(
-                    || {
-                        return (
-                            generate_random_string(length, &mut rng, &ascii_chars),
-                            generate_random_string(length, &mut rng, &ascii_chars),
-                        );
-                    },
-                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
-                    BatchSize::SmallInput,
-                )
-            },
-        );
-    }
-    for length in lengths {
-        c.bench_function(
-            &format!(
-                "levenshtein for {charset} strings of length {length}",
-                charset = "UTF8",
-                length = length
-            ),
-            |b| {
-                b.iter_batched(
-                    || {
-                        return (
-                            generate_random_string(length, &mut rng, &utf8_chars),
-                            generate_random_string(length, &mut rng, &utf8_chars),
-                        );
-                    },
-                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
-                    BatchSize::SmallInput,
-                )
+    // Benchmark ASCII inputs
+    for length in lengths {
+        // Pre-generate a pool of input pairs so the benchmark measures only levenshtein()
+        let mut rng = rand::rng();
+        let sample_size = 1024usize;
+        let mut samples: Vec<(String, String)> = Vec::with_capacity(sample_size);
+        for _ in 0..sample_size {
+            samples.push((
+                generate_random_string(length, &mut rng, &ascii_chars),
+                generate_random_string(length, &mut rng, &ascii_chars),
+            ));
+        }
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",
+                charset = "ASCII",
+                length = length
+            ),
+            move |b| {
+                let mut idx = 0usize;
+                b.iter(|| {
+                    let pair = &samples[idx];
+                    idx += 1;
+                    if idx == samples.len() {
+                        idx = 0;
+                    }
+                    string_distance::levenshtein(black_box(&pair.0), black_box(&pair.1))
+                })
+            },
+        );
+    }
+    // Benchmark UTF-8 inputs
+    for length in lengths {
+        // Pre-generate a pool of input pairs so the benchmark measures only levenshtein()
+        let mut rng = rand::rng();
+        let sample_size = 1024usize;
+        let mut samples: Vec<(String, String)> = Vec::with_capacity(sample_size);
+        for _ in 0..sample_size {
+            samples.push((
+                generate_random_string(length, &mut rng, &utf8_chars),
+                generate_random_string(length, &mut rng, &utf8_chars),
+            ));
+        }
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",
+                charset = "UTF8",
+                length = length
+            ),
+            move |b| {
+                let mut idx = 0usize;
+                b.iter(|| {
+                    let pair = &samples[idx];
+                    idx += 1;
+                    if idx == samples.len() {
+                        idx = 0;
+                    }
+                    string_distance::levenshtein(black_box(&pair.0), black_box(&pair.1))
+                })
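
The idea behind this suggestion, keeping input generation apart from the timed calls, can be sketched without Criterion or rand. Everything below is an illustrative assumption: XorShift64, random_string, make_pool, and time_routine are hypothetical helpers, and a plain std::time::Instant loop is far cruder than Criterion's statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Tiny xorshift64 generator, a hypothetical stand-in for `rand`.
/// The same seed always yields the same sequence.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift state must never be zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Builds a string of `length` chars drawn from a non-empty `charset`.
pub fn random_string(length: usize, rng: &mut XorShift64, charset: &[char]) -> String {
    (0..length)
        .map(|_| charset[(rng.next_u64() % charset.len() as u64) as usize])
        .collect()
}

/// Pre-generates `count` input pairs before any timing starts.
pub fn make_pool(count: usize, length: usize, seed: u64, charset: &[char]) -> Vec<(String, String)> {
    let mut rng = XorShift64::new(seed);
    (0..count)
        .map(|_| {
            let a = random_string(length, &mut rng, charset);
            let b = random_string(length, &mut rng, charset);
            (a, b)
        })
        .collect()
}

/// Times only the calls to `routine`, cycling through a non-empty pool.
pub fn time_routine<F>(pool: &[(String, String)], iterations: usize, mut routine: F) -> Duration
where
    F: FnMut(&str, &str) -> usize,
{
    let start = Instant::now();
    for i in 0..iterations {
        let (a, b) = &pool[i % pool.len()];
        // black_box discourages the optimizer from eliding the call.
        black_box(routine(black_box(a.as_str()), black_box(b.as_str())));
    }
    start.elapsed()
}
```

A fixed seed makes the workload repeatable between runs, which a pool filled from an unseeded generator does not guarantee.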

    for length in lengths {
        c.bench_function(
            &format!(
                "levenshtein for {charset} strings of length {length}",

Copilot AI Feb 7, 2026

For the "UTF8" benchmarks, length is the number of char scalar values generated, but the library’s UTF path operates on Unicode grapheme clusters (via unicode-segmentation). That means the label "strings of length {length}" may not reflect the actual units processed. Either constrain the generated characters to single-codepoint graphemes or adjust the label/documentation to clarify what "length" means.
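
To make the unit mismatch concrete, here is a small std-only sketch. The helper names are hypothetical, and the combining-mark check covers only one Unicode block (U+0300 to U+036F), so it is a partial heuristic rather than real grapheme segmentation, which needs a crate such as unicode-segmentation.

```rust
/// Number of Unicode scalar values (`char`s) in `s`.
pub fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Partial check: true only for the Combining Diacritical Marks block
/// (U+0300..=U+036F). Many other combining characters exist elsewhere.
pub fn in_combining_diacritical_block(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// True if `s` contains a char from the block above, which means some
/// user-perceived character in `s` spans more than one scalar value.
pub fn has_basic_combining_marks(s: &str) -> bool {
    s.chars().any(in_combining_diacritical_block)
}

/// The same visible letter in precomposed and decomposed form.
pub fn e_acute_forms() -> (&'static str, &'static str) {
    // U+00E9 alone, versus 'e' followed by U+0301 (combining acute accent).
    ("\u{e9}", "e\u{301}")
}
```

Both forms render as one letter, yet the decomposed form counts as two scalar values, so a generator that picks chars independently can produce strings whose grapheme count differs from the requested length.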

Suggested change
-                "levenshtein for {charset} strings of length {length}",
+                "levenshtein for {charset} strings with {length} Unicode scalar values",

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@zhiltsov-max zhiltsov-max merged commit 2573801 into develop Feb 7, 2026
3 checks passed
@zhiltsov-max zhiltsov-max deleted the zm/test-perf branch February 7, 2026 19:12