Split on sentence and other boundaries #58
@lmullen that's right.
@@ -109,6 +109,9 @@ regex <- function(pattern, ignore_case = FALSE, multiline = FALSE,
boundary <- function(type = c("character", "line_break", "sentence", "word"),
                     skip_word_none = TRUE, ...) {
  type <- match.arg(type)

  if (type != "word" & missingArg(skip_word_none)) skip_word_none <- FALSE
Maybe it would be better to make the default value of skip_word_none NA, and then do:
if (identical(skip_word_none, NA)) {
  skip_word_none <- type == "word"
}
This would also need some doc updates.
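The NA-default suggestion can be sketched in isolation as below. The helper name resolve_boundary_opts() is hypothetical and is not stringr's boundary(); it only shows how the two arguments would be resolved, assuming the NA sentinel described in the comment above.

```r
# Hypothetical helper (not part of stringr) illustrating the NA-default idea.
resolve_boundary_opts <- function(type = c("character", "line_break", "sentence", "word"),
                                  skip_word_none = NA) {
  type <- match.arg(type)
  # NA means "the caller did not choose": skip "none" tokens only for words
  if (identical(skip_word_none, NA)) {
    skip_word_none <- type == "word"
  }
  list(type = type, skip_word_none = skip_word_none)
}

resolve_boundary_opts("sentence")$skip_word_none        # FALSE
resolve_boundary_opts("word")$skip_word_none            # TRUE
resolve_boundary_opts("sentence", TRUE)$skip_word_none  # TRUE (explicit choice kept)
```

One advantage over a missing()-based check is that the type-dependent default is visible in the function signature, and a caller can request it explicitly by passing NA.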
When will this be merged?
@lmullen do you want to finish this off? It also needs a bullet point in NEWS.
This bug causes warnings if options(warnPartialMatchArgs=TRUE).
…e to use a formula and hyphens to break long lines in the source code.
…h no placeholders.
@hadley Sorry, I screwed up squashing the pull request. Mind if I resubmit this as a new, clean PR?
Yeah, sure.
This pull request fixes a problem with splitting on boundaries other than words. Currently, splitting on sentence boundaries returns a list with an empty character vector:
The problem is that boundary() sets skip_word_none = TRUE by default. But if stringi::stri_split_boundaries() is called for any boundary other than word boundaries while skip_word_none is TRUE, it returns an empty character vector. For non-word boundaries, this fix sets skip_word_none to FALSE unless the user has deliberately chosen otherwise. The PR also adds tests for sentence splitting.