Tokenizer needs to handle nulls #202
Comments
Can you upload the file somewhere and link to it from this issue please? I've had a quick look and I'm not sure why the file handle isn't being freed. It's happening in C++, so it probably means I've missed a destructor somewhere. You only see the problem on Windows because other platforms don't have such insane file locking strategies. |
How about code to create the file? :-)
r <- charToRaw('abqc,def\nabc,def')
r[3] <- raw(1)
writeBin(r, 'abc.txt')
The file that triggered my initial investigation is not relevant (and way too big), and this code creates the exact file content I used for testing. It's about as "minimal working example" as I could create without going code-golf on it. |
Is it possible that an R error is causing a longjmp that then skips |
@kevinushey oooooooh, maybe |
Ah yes, it looks like that error is thrown by |
You can avoid R errors Here's an example:
#include <Rcpp.h>
using namespace Rcpp;
namespace safe {
struct MkCharLenCEData {
const char* data;
int n;
cetype_t enc;
SEXP result;
};
namespace internal {
void mkCharLenCE(void* pData) {
MkCharLenCEData* pCall = (MkCharLenCEData*) pData;
pCall->result = Rf_mkCharLenCE(
pCall->data,
pCall->n,
pCall->enc
);
}
} // namespace internal
SEXP mkCharLenCE(const char* data, int n, cetype_t enc)
{
MkCharLenCEData callData;
callData.data = data;
callData.n = n;
callData.enc = enc;
callData.result = Rf_mkChar("");
Rboolean ok = R_ToplevelExec(
internal::mkCharLenCE,
(void*) &callData
);
if (!ok)
Rprintf("Failed to get character length\n");
return callData.result;
}
} // namespace safe
// [[Rcpp::export]]
CharacterVector test() {
CharacterVector result(1);
result[0] = safe::mkCharLenCE("foo\0bar", 7, CE_UTF8);
Rprintf("Still got here!\n");
return result;
}
/*** R
test()
*/
gives me:
> test()
Error: embedded nul in string: 'foo\0bar'
Failed to get character length
Still got here!
[1] ""
It seems like adding something like this in Rcpp (e.g. a generic template function that provides a means of executing some R API function 'safely') would be useful; it just might be tricky to make generic enough. |
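One way the "generic template function" idea could look is sketched below. This is only an illustration, not Rcpp's API: `ExecFn`, `directExec`, and `safeCall` are hypothetical names, and `directExec` is a stand-in with the same callback-plus-data shape as `R_ToplevelExec` so that the sketch compiles without R headers. In a real package one would presumably pass a small adapter around `R_ToplevelExec` instead, and the callable must not let C++ exceptions escape through the C frames.

```cpp
// Signature shape borrowed from R_ToplevelExec: a plain C callback plus an
// opaque data pointer. ExecFn is a hypothetical alias for this sketch.
using ExecFn = bool (*)(void (*)(void*), void*);

// Hypothetical stand-in for the real executor. It simply runs the callback
// and reports success; it does NOT catch R errors the way R_ToplevelExec does.
inline bool directExec(void (*fun)(void*), void* data) {
  fun(data);
  return true;
}

// Hypothetical generic wrapper: runs any zero-argument callable through a
// C-style (function pointer, void*) executor.
template <typename Callable>
bool safeCall(ExecFn exec, Callable& callable) {
  // A captureless lambda converts to a plain function pointer, so it can
  // serve as the trampoline that recovers the callable's real type.
  void (*trampoline)(void*) = [](void* p) {
    (*static_cast<Callable*>(p))();
  };
  return exec(trampoline, static_cast<void*>(&callable));
}
```

The callable can capture its inputs and write its result into a captured variable, which avoids hand-writing a struct like `MkCharLenCEData` for every wrapped function.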
@kevinushey did we not do precisely this in Rcpp11, although perhaps with a bit of forbidden stuff? What we need is access to contexts and some documentation on how to use them ... but hey, not sure this will happen until Rcpp* is under 50% of CRAN packages ... |
@romainfrancois you're right; that was all the stuff dragging contexts out to power |
Hmmmm, seems like a lot of work for this case. Might be possible to tackle by checking the string myself |
@kevinushey has anything changed in Rcpp that would affect this? |
@hadley not directly, no -- I think you'll have to just check the string yourself. |
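As a rough sketch of what "checking the string yourself" could mean, the tokenizer might scan each field's bytes for an embedded NUL before handing them to `Rf_mkCharLenCE`, so the R error (and the longjmp) never happens. The helper names `hasEmbeddedNul` and `stripNuls` are hypothetical and are not taken from readr; whether to drop the NULs, warn, or fail is a design choice this sketch does not settle.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true when the byte range [begin, begin + n)
// contains at least one NUL byte.
bool hasEmbeddedNul(const char* begin, std::size_t n) {
  return std::memchr(begin, '\0', n) != nullptr;
}

// Hypothetical helper: copies the field while dropping every NUL byte,
// so the result is safe to pass on as an ordinary string.
std::string stripNuls(const char* begin, std::size_t n) {
  std::string out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    if (begin[i] != '\0') {
      out.push_back(begin[i]);
    }
  }
  return out;
}
```

For the four bytes `a`, `b`, NUL, `c` (the start of the test file above), `hasEmbeddedNul` reports true and `stripNuls` yields `abc`.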
Still needs to be plumbed into warning system.
After an attempt to read a file with nulls fails (different issue), the datafile is locked. It is not shown as an open connection by showConnections(all=TRUE), and I know of no other place to check for open/locked connections.
The file, call it abc.txt:
(That's a null character, not the literal characters ... I don't know another way to depict it in GH.)
The fact that it cannot read a null is not the issue here (though to some people it may be a problem). More to the point, the file is locked and cannot be changed or overwritten by anything. (By "anything", I tested emacs/ESS, the RStudio text editor, notepad++, and even some of R's base file-writing functions.)
Tested combinations of read_delim arguments (col_names, col_types), plus the row of the datafile containing the null character:
When it fails, subsequent calls with one of the two "good" combinations above work fine, though the file is still locked to outside editors. Closing the R session is the only way I was able to unlock the file so that editors could save or overwrite it.
Done on win81_64 using emacs/ESS and RStudio. I also tried it on Linux (Ubuntu 14.04.2 with R-3.2.0), and none of these argument combinations resulted in a locked file. Not tested on Mac.