str_interp() now renders lists consistently, independent of the presence of additional placeholders (@amhrasmussen).
str_replace_all() with a named vector now respects modifier functions (#207).
str_trunc() is once again vectorised correctly (#203, @austin3dickey).
NA values more gracefully (#217). I've also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232).
- During package build, you may see Error : object ‘ignore.case’ is not exported by 'namespace:stringr'. This is because the long deprecated
perl() have now been removed.
str_glue_data() provide convenient wrappers around glue_data() from the glue package (#157).
str_flatten() is a wrapper around stri_flatten() and clearly conveys flattening a character vector into a single string (#186).
str_remove_all() functions. These wrap str_replace_all() to remove patterns from strings. (@Shians, #178)
str_squish() removes spaces from both the left and right side of strings, and also converts multiple spaces (or space-like characters) to a single space within strings (@stephlocke, #197).
omit_na argument for ignoring
NAs and keeps the original strings. (@yutannihilation, #164)
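As a rough illustration of the helpers listed above (str_flatten(), str_remove_all() and str_squish()), here is a minimal sketch. The input strings are made up for the example, and it assumes a stringr version that already includes these functions.

```r
library(stringr)

# Trim both ends and collapse internal runs of whitespace to one space
squished <- str_squish("  too   many    spaces  ")

# Flatten a character vector into a single string with a separator
flat <- str_flatten(c("a", "b", "c"), collapse = "-")

# Remove every match of a pattern (here: runs of digits)
no_digits <- str_remove_all("a1b22c333", "[0-9]+")

print(squished)   # "too many spaces"
print(flat)       # "a-b-c"
print(no_digits)  # "abc"
```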
Bug fixes and minor improvements
str_trunc() now preserves NAs (@ClaytonJY, #162)
str_trunc() now throws an error when width is shorter than
perl() have now been removed.
str_match_all() now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with str_match() and other match failures (#134).
replacement can now be a function that is called once for each match and whose return value is used to replace the match.
A new vignette (vignette("regular-expressions")) describes the details of the regular expressions supported by stringr. The main vignette (vignette("stringr")) has been updated to give a high-level overview of the package.
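The function-valued replacement mentioned in this release can be sketched as follows. This is only an illustration: it assumes the feature is used through str_replace() and str_replace_all(), and the input string is invented for the example.

```r
library(stringr)

x <- "one two three"

# The function receives the matched text and returns its replacement.
# toupper() is vectorised, so it works whether it is handed one match
# or several at once.
all_upper <- str_replace_all(x, "[a-z]+", toupper)

# With str_replace(), only the first match is passed through the function
first_upper <- str_replace(x, "[a-z]+", toupper)

print(all_upper)    # "ONE TWO THREE"
print(first_upper)  # "ONE two three"
```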
Minor improvements and bug fixes
numeric argument for sorting mixed numbers and strings.
str_replace_all() now throws an error if replacement is not a character vector. If replacement is NA_character_, it replaces the complete string with
All functions that take a locale (e.g. str_sort()) default to "en" (English) to ensure that the default is consistent across platforms.
Add sample datasets:
coll() now throw an error if you use them with anything other than a plain string (#60). I've clarified that the replacement for
boundary() has improved defaults when splitting on non-word boundaries (#58, @lmullen).
str_detect() can now detect boundaries (by checking for a str_count() > 0) (#120).
str_extract_all() now work with boundary(). This is particularly useful if you want to extract logical constructs like words or sentences.
simplify argument when used with
str_subset() now respects custom options for fixed() patterns (#79, @gagolews).
str_replace_all() now behave correctly when a replacement string contains \\\\1, etc. (#83, #99).
simplify argument to match
str_view_all() create HTML widgets that display regular expression matches (#96).
NA for indexes greater than the number of words (#112).
stringr is now powered by stringi instead of base R regular expressions. This improves Unicode support and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail.
stringr gains a vignette, currently a straightforward update of the article that appeared in the R Journal.
str_c() now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using str_c("x", NA) now yields NA. If you want
str_replace_na() on the inputs.
str_replace_all() gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:
input <- c("abc", "def")
str_replace_all(input, c("[ad]" = "!", "[cf]" = "?"))
str_match() now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with str_extract() and other match failures.
str_subset() keeps values that match a pattern. It's a convenient wrapper for
str_sort() allow you to sort and order strings in a specified locale.
str_conv() to convert strings from specified encoding to UTF-8.
boundary() allows you to count, locate and split by character, word, line and sentence boundaries.
The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need.
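To make a few of the additions above concrete, here is a small sketch of boundary(), str_subset() and the new NA behaviour of str_match() and str_c(). The inputs are invented for the example, and it assumes a stringi-backed version of stringr.

```r
library(stringr)

sentence <- "The quick brown fox"

# Count and extract words using word boundaries
n_words <- str_count(sentence, boundary("word"))
words <- str_extract_all(sentence, boundary("word"))[[1]]

# Keep only the values that match a pattern
kept <- str_subset(c("apple", "banana", "pear"), "an")

# An optional group that does not participate in the match yields NA
m <- str_match("ab", "a(x)?b")

# A missing input makes the combined string missing as well
combined <- str_c("x", NA)

print(n_words)
print(words)
print(kept)
print(m)
print(combined)
```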
ignore.case(x) has been deprecated in favour of fixed|regex|coll(x, ignore.case = TRUE), perl(x) has been deprecated in favour of
str_join() is deprecated, please use
fixed path in str_wrap example so it works for more R installations.
remove dependency on plyr
Zero input to str_split_fixed returns 0 row matrix with
perl that switches to Perl regular expressions
str_match now uses new base function regmatches to extract matches - this should hopefully be faster than my previous pure R algorithm
str_wrap function which gives strwrap output in a more convenient format
word function extracts words from a string given a user-defined separator (thanks to suggestion by David Cooper)
str_locate now returns consistent type when matching empty string (thanks to Stavros Macrakis)
str_count counts number of matches in a string.
str_trim receive performance tweaks - for large vectors this should give at least a two order of magnitude speed up
str_length returns NA for invalid multibyte strings
fix small bug in internal
- all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters
- fixed() function now tells stringr functions to use fixed matching, rather than escaping the regular expression. Should improve performance for large vectors.
- new ignore.case() modifier tells stringr functions to ignore case of pattern.
- str_replace renamed to str_replace_all and new str_replace function added. This makes str_replace consistent with all functions.
- new str_sub<- function (analogous to substring<-) for substring replacement
- str_sub now understands negative positions as a position from the end of the string. -1 replaces Inf as indicator for string end.
- str_pad side argument can be left, right, or both (instead of center)
- str_trim gains side argument to better match str_pad
- stringr now has a namespace and imports plyr (rather than requiring it)
- fixed() now also escapes |
- str_join() renamed to str_c()
- all functions more carefully check input and return informative error messages if not as expected.
- add invert_match() function to convert a matrix of location of matches to locations of non-matches
- add fixed() function to allow matching of fixed strings.
- str_length now returns correct results when used with factors
- str_sub now correctly replaces Inf in end argument with length of string
- new function str_split_fixed returns fixed number of splits in a character matrix
- str_split no longer uses strsplit to preserve trailing breaks
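Several of the early changes listed above (negative positions in str_sub, the str_sub<- replacement form, str_split_fixed, and the side argument of str_pad) can be sketched as follows. The inputs are invented for the example, and the calls are written against a current stringr, so details may differ from the releases being described.

```r
library(stringr)

# Negative positions count from the end of the string; -1 is the last character
last_three <- str_sub("stringr", -3, -1)

# str_sub<- replaces a substring in place
x <- "stringr"
str_sub(x, 1, 3) <- "STR"

# str_split_fixed returns a character matrix with a fixed number of columns
parts <- str_split_fixed("a-b-c", "-", n = 2)

# side can be "left", "right" or "both"
padded <- str_pad("x", width = 3, side = "both")

print(last_three)  # "ngr"
print(x)           # "STRingr"
print(parts)
print(padded)      # " x "
```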