Summary
are_int_ish.double() currently uses a chain of vectorized R operations that allocates multiple intermediate logical vectors:
are_int_ish.double <- function(x, ...) {
}
Profiling with n = 100,000 shows:
| Step | Self-time | Memory allocated |
|---|---|---|
| `floor()` | 66.7% (80 ms) | 102 MB |
| `&` | 16.7% (20 ms) | 16 MB |
| `\|` | 8.3% (10 ms) | |
| `is.na()` | 8.3% (10 ms) | — |
| Total (100 reps) | 120 ms | 133 MB |
A single-pass C loop can check each element in one iteration without allocating any intermediate vectors.
Proposed implementation
Add a C function are_int_ish_dbl(x) that iterates over the REAL(x) array and returns an LGLSXP in one pass:
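    #include <math.h>        /* floor() */
    #include <R.h>
    #include <Rinternals.h>

    SEXP are_int_ish_dbl(SEXP x) {
        R_xlen_t n = XLENGTH(x);  /* R_xlen_t also covers long vectors */
        SEXP out = PROTECT(allocVector(LGLSXP, n));
        const double *px = REAL(x);
        int *po = LOGICAL(out);
        for (R_xlen_t i = 0; i < n; i++) {
            double v = px[i];
            /* ISNAN() is true for both NA and NaN, matching is.na() in R */
            po[i] = ISNAN(v) || (R_FINITE(v) && v == floor(v));
        }
        UNPROTECT(1);
        return out;
    }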
Then replace the R method body with a call to .Call(C_are_int_ish_dbl, x).
Benchmark target
- Baseline: ~1.2 ms per call on n = 100,000 (120 ms / 100 reps), 133 MB allocated across the 100 reps (about 1.3 MB per call)
- Target: < 0.3 ms per call on n = 100,000, < 1 MB allocated (single LGLSXP allocation)
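
As a rough sanity check on the allocation target, the payload of the single result vector can be worked out directly. The sketch assumes one `int` per logical element, which is how R stores LGLSXP data, and ignores the small vector header; the helper name `lgl_payload_bytes` is hypothetical.

```c
#include <stddef.h>

/* Payload bytes of a logical result of length n, assuming one int per
 * element; the vector header is not counted. */
static size_t lgl_payload_bytes(size_t n) {
    return n * sizeof(int);
}
```

With a 4-byte `int`, n = 100,000 gives 400,000 bytes, which is under the 1 MB target.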