Releases will be numbered with the following semantic versioning format: <major>.<minor>.<patch>
And constructed with the following guidelines:
- Breaking backward compatibility bumps the major (and resets the minor and patch)
- New additions without breaking backward compatibility bump the minor (and reset the patch)
- Bug fixes and misc changes bump the patch
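The numbering guidelines above can be sketched in a few lines. This is only an illustration written in Python (the package itself is R); the function name `bump` and the change labels are hypothetical and are not part of qdapRegex.

```python
def bump(version, change):
    """Return the next version string for a given kind of change.

    `change` is one of the hypothetical labels "breaking", "feature", or "fix".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Breaking backward compatibility: bump major, reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Backward-compatible addition: bump minor, reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Bug fix or misc change: bump patch only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, under these rules a bug fix applied to 0.7.1 would yield 0.7.2, while a new feature would yield 0.8.0.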
qdapRegex 0.7.0 - 0.7.2
rm_dollar's regex now allows for commas in the dollar portion.
as_count added to convert ex_citation into counts of citations.
ex_ added to complement the
grab and functions that use @rm_xxx now work on
explain is fully functional again now that http://rick.measham.id.au/paste/explain is working again.
rm_ prefixed functions get an extraction counterpart prefixed with ex_. This means users can use ex_ functions directly without using the rm_ form in the less convenient form of rm_xxx(extract = TRUE).
rm_number did not correctly handle multiple comma-separated digits (see issue #17). This behavior has been fixed and a unit test added to ensure proper handling.
rm_between did not handle single quotation marks (') as both the left and right boundary when extract = TRUE. Related to issue #13.
rm_transcript_time added to remove the transcript-specific style of time stamp tagging. See http://help-nv10mac.qsrinternational.com/desktop/procedures/import_audio_or_video_transcripts.htm for details.
as_time2 added for use with
These convert to the standard HH:MM:SS.OS format and optionally convert to as.POSIXlt. The former outputs a list of vectors of times while the latter wraps
regex_supplement dictionary to provide a means to remove all occurrences of a character except the first appearance. Regex from: http://stackoverflow.com/a/31458261/1000343
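The behavior described in this entry (keep the first appearance of a character, drop every later one) can be sketched as follows. This Python snippet uses plain string operations rather than the regular expression stored in the dictionary, and the function name `remove_all_but_first` is hypothetical.

```python
def remove_all_but_first(text, char):
    """Drop every occurrence of `char` in `text` except the first one."""
    first = text.find(char)
    if first == -1:
        # Nothing to remove when the character never appears.
        return text
    head, tail = text[: first + 1], text[first + 1 :]
    # Keep everything up to and including the first occurrence,
    # then strip the character from the remainder.
    return head + tail.replace(char, "")
```

Applied to "a.b.c.d" with "." as the character, only the first period survives.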
r_between_multiple pick up a fixed argument. Left and right boundaries containing regular expression special characters were fixed by default (escaped). This did not allow for the powerful use of a regular expression for left/right boundaries. The fixed = TRUE behavior is still the default but users can now set fixed = FALSE to work with regular expression boundaries. This new feature was inspired by @Ronak Shah's StackOverflow question: http://stackoverflow.com/q/31623069/1000343
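The difference between fixed (escaped) and regular expression boundaries can be illustrated with a small sketch. It is written in Python rather than R, and the helper `between` is a hypothetical stand-in that only mimics the idea of a fixed switch; it is not the package's implementation.

```python
import re

def between(text, left, right, fixed=True):
    """Extract the text found between a left and a right boundary.

    With fixed=True the boundaries are escaped and matched literally;
    with fixed=False they are used as regular expressions.
    """
    if fixed:
        left, right = re.escape(left), re.escape(right)
    pattern = left + r"(?P<inner>.*?)" + right
    return [m.group("inner") for m in re.finditer(pattern, text)]

# Literal boundaries: the parentheses are escaped automatically.
literal = between("x (a) y (b)", "(", ")")

# Regex boundaries: one pattern covers both "<<...>>" and "[[...]]".
flexible = between("x <<a>> y [[b]]", r"[<\[]{2}", r"[>\]]{2}", fixed=False)
```

Keeping the literal behavior as the default means boundaries such as "(" or "[" work without manual escaping, while the regex mode is opt-in.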
word_boundary_right regexes in the regex_supplement dictionary did not include apostrophes as a viable word character. Apostrophes are now included as a word character.
explain no longer prints the regular expression explanation to the command line. Instead, the link to http://www.regexper.com is printed. This change was made because http://rick.measham.id.au/paste/explain no longer appears to be working. The text explanation functionality will return if the website becomes operational again or if a suitable substitute can be found.
rm_number split consecutive digits that aren't comma separated into multiple strings instead of extracting them as one number. For example, "12345" became "123" "45". Also, 444,44 will not be removed/extracted as it is not a valid comma-separated number. These behaviors have been corrected and the unit tests now include these cases. Thanks to Jason Gray for the rework of the regex. It is simpler and more accurate.
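The cases in this entry can be reproduced with a small sketch. The patterns below are assumptions written for illustration in Python; they are not the regular expression that rm_number uses, and the name `extract_numbers` is hypothetical.

```python
import re

# Any run of digits, optionally joined by commas, with optional decimals.
CANDIDATE = re.compile(r"\d+(?:,\d+)*(?:\.\d+)?")

# A valid number is either plain digits or groups of three after each comma.
VALID = re.compile(r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")

def extract_numbers(text):
    """Return candidate numbers whose comma grouping is valid."""
    return [
        m.group()
        for m in CANDIDATE.finditer(text)
        if VALID.fullmatch(m.group())
    ]
```

With these patterns "12345" is kept as a single string, "1,234,567" is accepted, and "444,44" is rejected because the group after the comma does not have three digits.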
rm_between did not handle quotation marks (") as both the left and right boundary when extract = TRUE. Bug reported by Tori Shannon, http://stackoverflow.com/q/31119989/1000343, and addressed by Jason Gray. See issue #13.
as_numeric2 added for use with rm_number. These are wrappers for as.numeric(gsub(",", "", x)). The former removes commas and converts a list of vectors of strings to numeric. The latter wraps
rm_non_words added to remove any character that isn't a letter, apostrophe, or single space.
extracted has been added and is the output of a
extract = TRUE. This allows for the c.extracted function to easily turn the list output into a character vector.
c.extracted added to provide a quick unlist method for lists of class extracted. This is less typing than unlist for an approach that is used often.
bind_or added as a means of quickly wrapping multiple sub-expression elements with left/right boundaries and then concatenating/joining the grouped strings with the regular expression or operator ("|").
regex_supplement dictionary for easy negation of
qdapRegex 0.2.1 - 0.3.2
message to print to the console.
explain now returns an object of the class explain with its own print method which uses message. Additionally, the characters
& were not handled correctly; this has been corrected.
TC there is an incomplete sentence. It is as follows: "TC utilizes additional rules for capitalization beyond stri_trans_totitle that includes..." (found by rmsharp). This has been corrected. See issue #8.
regex_cheat dictionary) contained misspellings in the words greedy and beginning. This has been corrected.
rm_number incorrectly handled numbers containing leading or trailing zeros. See issue #9.
rm_caps_phrases could only extract/remove up to two "words" worth of capital letter phrases at a time. See issue #11.
%+% binary operator version of pastex(x, y, sep = "") added to join regular expressions together.
group_or added as a means of quickly wrapping multiple sub-expression elements with grouping parentheses and then concatenating/joining the grouped strings with the regular expression or operator ("|").
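The wrap-then-join idea behind this entry can be sketched as follows. The snippet is Python, not R, and this `group_or` is a hypothetical re-creation based only on the description above; the output of the package's own function may differ.

```python
import re

def group_or(*patterns):
    """Wrap each sub-expression in parentheses and join them with "|"."""
    return "|".join(f"({p})" for p in patterns)

# Build one alternation out of three sub-expressions.
pattern = group_or(r"\d+", "them", "those")

# The combined pattern matches text that satisfies any one alternative.
found = re.search(pattern, "give it to them") is not None
```

Grouping each element before joining keeps the alternation from leaking into neighboring parts of a longer expression.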
rm_repeated_characters added for removing/extracting/replacing words with repeated characters (each repeated > 2 times). Regex pattern comes from: StackOverflow's vks (http://stackoverflow.com/a/29438461/1000343).
rm_repeated_phrases added for removing/extracting/replacing repeating phrases (> 2 times). Regex pattern comes from: StackOverflow's BrodieG (http://stackoverflow.com/a/28786617/1000343).
rm_repeated_words added for removing/extracting/replacing repeating words (> 2 times).
run_split regex added to the regex_supplement dictionary to split runs into chunks.
Regular Expression Dictionaries (e.g., regex_supplement) are now managed with the regexr package. This enables cleaner updating of the regular expressions with an easier-to-read structure. Longer files will be stored in this format. Files are located at: https://github.com/trinker/qdapRegex/tree/master/inst/regex_scripts
rm_caps_phrase has a new regular expression that is more accurate and does not pull trailing white space.
qdapRegex 0.1.3 - 0.2.0
pastex would throw a warning on a vector (e.g., pastex(letters)). This has been fixed.
youtube_id was documented under qdap_supplement and contained an invalid hyperlink. This has been fixed.
rm_citation contained a bug that prevented it from operating on citations with a comma in multiple authors before the and/& sign. See issue #4.
is.regex added as a logical check of a regular expression's validity (conforms to R's regular expression rules).
rm_postal_code added for removing/extracting/replacing U.S. postal codes.
Case wrapper functions, U (upper case), and L (lower case) added for convenient case manipulation.
group function added to allow for convenient wrapping of grouping parentheses around regular expressions.
rm_citation_tex added to remove/extract/replace bibkey citations from a .tex (LaTeX) file.
regex_cheat data set and cheat function added to act as a quick reference for common regex task operations such as lookaheads.
rm_caps_phrase added to supplement rm_caps, extending the search to phrases.
explain added to view a visual representation of a regular expression using http://www.regexper.com and http://rick.measham.id.au/paste/explain. Also takes named regular expressions from the regex_usa or other supplied dictionary.
last_occurrence regex added to the regex_supplement dictionary to find the last occurrence of a delimiter.
regex_supplement dictionary to provide a true word boundary. Regexes adapted from: http://www.rexegg.com/regex-boundaries.html#real-word-boundary
rm_time2 regex added to the regex_usa dictionary to find time + AM/PM.
regex_usa dictionary regular expressions:
rm_between pick up grouping that allows for replacement of individual sections of the substring. See
pastex picks up a sep argument to allow the user to choose what string is used to separate the collapsed expressions.
rm_citation3 now attempt to include last names that contain the lower case particles: von, van, de, da, and du.
CRAN fix for oldrel Windows. Updated to R version 3.1.0 in Description file.
bind added as a convenience function to add a left and right boundary to each element of a character vector.
First CRAN Release
rm_citation added for removing/extracting/replacing APA 6 style in-text citations.
rm_white and accompanying family of rm_white functions added to remove white space.
rm_non_ascii added to remove non-ASCII characters from a string.
around_ added to extract word(s) around a given point.
pages2 added to the regex_supplement data set for removing/extracting/validating page numbers.
rm_XXX family of functions now use stringi::stri_extract_all_regex as this approach is much faster than the regmatches(text.var, gregexpr(pattern, text.var, perl = TRUE)) approach.
qdapRegex 0.0.1 - 0.2.0
This package is a collection of regex tools associated with the qdap package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, person tags, phone numbers, times, and zip codes.