C optimization: are_int_ish.double() — replace multi-vector arithmetic with single-pass C loop #217

@jonthegeek

Summary

are_int_ish.double() currently uses a chain of vectorized R operations, each of which allocates an intermediate vector (a double vector from floor(), plus several logical vectors):

are_int_ish.double <- function(x, ...) {
}

Profiling with n = 100,000 shows:

Step               Self-time        Memory allocated
floor()            66.7% (80 ms)    102 MB
&                  16.7% (20 ms)    16 MB
` `                8.3% (10 ms)
is.na()            8.3% (10 ms)
Total (100 reps)   120 ms           133 MB

A single-pass C loop can check each element in one iteration without allocating any intermediate vectors.

Proposed implementation

Add a C function are_int_ish_dbl(x) that iterates over the REAL(x) array once and returns an LGLSXP:

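#include <math.h>        /* floor() */
#include <R.h>
#include <Rinternals.h>

SEXP are_int_ish_dbl(SEXP x) {
  /* R_xlen_t and XLENGTH() keep the loop correct for long vectors. */
  R_xlen_t n = XLENGTH(x);
  SEXP out = PROTECT(allocVector(LGLSXP, n));
  const double *px = REAL(x);
  int *po = LOGICAL(out);
  for (R_xlen_t i = 0; i < n; i++) {
    double v = px[i];
    /* ISNA() matches only NA_real_; R's is.na() is also TRUE for NaN,
       so use ISNAN() here if NaN should be treated like NA. */
    po[i] = ISNA(v) || (R_FINITE(v) && v == floor(v));
  }
  UNPROTECT(1);
  return out;
}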

Then replace the R method body with a call to .Call(C_are_int_ish_dbl, x).

Benchmark target

  • Baseline: ~1.2 ms per call on n = 100,000 (120 ms / 100 reps); 133 MB allocated across the 100 reps (~1.3 MB per call)
  • Target: < 0.3 ms per call on n = 100,000, < 1 MB allocated (a single LGLSXP allocation, about 0.4 MB at 4 bytes per element)
