NEWS
====
Versioning
----------
Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
And constructed with the following guidelines:
* Breaking backward compatibility bumps the major (and resets the minor
and patch)
* New additions without breaking backward compatibility bump the minor
(and reset the patch)
* Bug fixes and misc. changes bump the patch
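
As a minimal sketch of the numbering rules above in base R (the helper `bump_version` is hypothetical and is not part of qdap), each kind of change maps to a version bump roughly as follows:

```r
# Hypothetical helper illustrating the numbering rules above; not a qdap function.
bump_version <- function(version, type = c("patch", "minor", "major")) {
    type <- match.arg(type)
    # Split "<major>.<minor>.<patch>" into three integers
    parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
    stopifnot(length(parts) == 3L, !anyNA(parts))
    if (type == "major") {
        parts <- c(parts[1] + 1L, 0L, 0L)        # resets minor and patch
    } else if (type == "minor") {
        parts <- c(parts[1], parts[2] + 1L, 0L)  # resets patch
    } else {
        parts[3] <- parts[3] + 1L                # bug fixes and misc. changes
    }
    paste(parts, collapse = ".")
}

bump_version("2.2.4", "patch")
bump_version("2.2.4", "minor")
bump_version("2.2.4", "major")
```
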
CHANGES IN qdap VERSION 2.2.5-
----------------------------------------------------------------
BUG FIXES
* `check_spelling` and other spell checkers threw an error with a custom
dictionary that did not have at least one word beginning with all 26 letters
of the alphabet. The dictionary automatically uses `assume.first.correct=FALSE`
if this occurs. Reported by @CallumH of StackOverflow:
http://stackoverflow.com/q/33516466/1000343 See issue #217 for details.
* `check_spelling_interactive` replaced substrings rather than bounded words.
This was caught by @chrisjacques. See issue #221.
* `replace_abbreviation` threw an error because `data.frame` converts character
to factor by default and `nchar` no longer works on factor. This was caught
by @karilint. See issue #225.
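
The difference between substring and bounded-word replacement noted in the `check_spelling_interactive` entry can be sketched in base R. The example text and patterns are illustrative assumptions, not the code used inside qdap:

```r
# Base R sketch: replacing a misspelling as a substring vs. as a bounded word.
x <- "teh author said tehran is far"

# Substring replacement also alters "tehran"
substring_fix <- gsub("teh", "the", x, fixed = TRUE)

# Word boundaries (\\b) restrict the match to the whole word "teh"
bounded_fix <- gsub("\\bteh\\b", "the", x)

substring_fix
bounded_fix
```
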
NEW FEATURES
MINOR FEATURES
IMPROVEMENTS
CHANGES
CHANGES IN qdap VERSION 2.2.4
----------------------------------------------------------------
NEW FEATURES
* `add_s` added to add -s, -es, or -ies to word endings.
MINOR FEATURES
IMPROVEMENTS
* `common` now returns `NULL` invisibly with a message rather than an error if
no groups meet the parameters. Suggested by @bitanshu via issue #213.
* `word_cor`'s default `group.var` is no longer `NULL` but set to use `1:nrow`
via `qdapTools::id(text.var)`. Thanks to Drew Schmidt for bringing this issue
to attention. The documentation and the error for `group.var = NULL` have been
updated to add clarity.
CHANGES
CHANGES IN qdap VERSION 2.2.2
----------------------------------------------------------------
BUG FIXES
* `type_token_ratio` was misnamed as `type_text_ratio`; this has been corrected.
The plot for this class also contained a misspelling "type-toke ratio" which
has been corrected as well.
NEW FEATURES
* `inspect_text` added to allow for pretty printed viewing of text strings and
**tm** `Corpus`es.
CHANGES
* The following functions had been previously deprecated and now have been
removed: `df2tm_corpus`, `tm2qdap`, `tm_corpus2wfm`, `tm_corpus2df`, `tdm`,
`dtm`, and `polarity_frame`.
CHANGES IN qdap VERSION 2.2.1
----------------------------------------------------------------
BUG FIXES
* The internal vignette "An Introduction to qdap" produced errors when compiled
by `build_qdap_vignette`. This behavior has been fixed by using static
reporting. The root of the behavior is the ability of `cm_` functions to
grab data from the global environment, which may not be the case in a `knitr`/
`rmarkdown` generated environment.
* `polarity` no longer handled phrases (words + spaces) for `polarity.frame`.
This behavior was caught by @Benasso http://stackoverflow.com/q/27156834/1000343.
This bug is a result of the changes made to `bag_o_words` earlier this year.
The bug has been fixed and a unit test put in place to ensure the bug is not
reintroduced.
* `Network.formality` did not include edge width handling. This has been
corrected.
* `word_stats` gave an incorrect warning message for missing endmarks:
"Some sentences not have standard qdap punctuation endmarks." The "do" has
been added: "Some sentences do not have standard qdap punctuation endmarks."
* `pres_debates2012` data set contained missplits in lines: 544, 1054. These
have been corrected (GitHub issue #205).
* `pos` threw an error if only one word was passed to `text.var`. Fix:
`drop = FALSE` has been added to data frame indexing. Caught by
StackOverflow user G_1991 http://stackoverflow.com/q/29896488/1000343.
* `as.tdm.wfm` would error if no grouping variable was supplied. This behavior
has been corrected.
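
The `drop = FALSE` fix mentioned for `pos` relies on standard data frame indexing behavior. A minimal base R sketch follows; the data frame `tagged` is a made-up example, and the actual indexing inside `pos` is not shown in this file:

```r
# Base R sketch of why `drop = FALSE` matters when indexing a data frame.
tagged <- data.frame(word = "hello", stringsAsFactors = FALSE)

# Selecting a single column with `[` simplifies the result to a vector by default
dropped <- tagged[, "word"]

# `drop = FALSE` keeps the data frame structure, so later code that expects
# rows and columns still works when only one word is supplied
kept <- tagged[, "word", drop = FALSE]

class(dropped)
class(kept)
```
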
NEW FEATURES
* `word_length` function added to give counts of word length usage by grouping
variable. See `?word_length` for details.
* `word_position` function added to give counts of the position of words within
a sentence.
* `sent_detect_nlp` added in the `sentSplit` family to wrap **NLP** package
functionality into a convenient function.
* `lexical_classification` provides a means of assessing content vs. functional
word usage at the grouping variable and sentence level. The class comes with
generic methods for `preprocessed`, `scores` (and plots of these methods),
`Animated`, `Network`, `cumulative` and `Animate.cumulative`.
* `Animate.character` added as a generic method that allows for the animation of
text. This is useful in conjunction with other `Animate` objects to
create complex animations with accompanying text.
* `add_incomplete` added to replace sentences with missing endmarks with a `|`
to indicate an incomplete sentence.
* `type_token_ratio` added to determine type-token ratio per grouping variable.
IMPROVEMENTS
* `polarity` takes `polarity.frame` with phrases (words with spaces).
* The `Animate` method for the classes: `polarity` & `formality` gains the
ability to print corresponding animated text for combined use with other
`Animated` methods.
* `multigsub`/`mgsub` get a speed boost through better programming choices. See
issue #201 for details. Thank you to @Alexey Ferapontov for his critical post
http://stackoverflow.com/q/27367914/1000343 that inspired the changes.
* `formality` and `pos` now have minimal unit tests.
* `trans_context` used `message` to print to the console. This results in
truncated output. `message` has been replaced with `cat`.
* `strip` gets a speed boost (~10x) by using better regex algorithms,
consolidating code/function calls, and by creating a generic `strip` method
for different classes. Additionally, multiple white spaces are now condensed
to a single white space.
* `scrubber` would automatically take a space and a single last character and
remove the space. This was to remove spaces before ending punctuation. `scrubber`
used `substring` rather than a more controlled regular expression.
This has been corrected. Report thanks to @Fabrizio Maccallini. See issue #207
for more information.
* `pres_debates2012` picks up a `role` column to make filtering out the
candidates easier. The variable order has also changed to put the `dialogue`
last.
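
The whitespace and end-punctuation cleanups described for `strip` and `scrubber` can be approximated with two regular expressions. This is a sketch only; the patterns are assumptions chosen for illustration and are not taken from qdap's source:

```r
# Illustrative patterns only; these are not taken from qdap's source.
x <- "I like   it .  Do you ?  Plan B"

# Condense runs of white space to a single space
condensed <- gsub("\\s+", " ", x)

# Remove a space only when it directly precedes ending punctuation,
# leaving a trailing "space + letter" (e.g. "Plan B") untouched
cleaned <- gsub("\\s+([.?!|])", "\\1", condensed)

cleaned
```

A positional approach that drops the space before any final character would turn "Plan B" into "PlanB", which is the kind of side effect a targeted pattern avoids.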
CHANGES
* The **ggplot2** package is no longer in Depends. This means the user will
have to manually load the package to use additional ggplot2 features. See
GitHub issue #199 for more.
* `pos` now treats contractions as 2 words. For example, the word count on
`what's` is 2 (what + is). The previous behavior was to strip out the
apostrophes. This was undesirable as the sentence "She's cool" would have no
verb in the `pos` output. This change affects `pos_by` and `formality` as
well.
CHANGES IN qdap VERSION 2.2.0
----------------------------------------------------------------
BUG FIXES
* `bag_o_words` did not make use of the `bag_o_words2` helper function that has
finer grained control of the output. `...` were ignored but now are respected.
* `fry` threw an error if a group contained < 300 words but had enough text to
generate 2 text chunks of 100 words each; caught by S. Enrico P. Indiogine.
The bug has been fixed: these groups are now dropped and a warning is given.
* `phrase_net` threw an error caused by **dplyr**'s (0.3) approach to subsetting
columns. Previously a vector was returned, now a `tbl_df` object is returned:
https://github.com/hadley/dplyr/issues/587. This was addressed by using
explicit `df[[index]]` rather than `df[, index]`.
NEW FEATURES
* `chunker` added to break text, optionally by grouping variables, into equal
chunks. The chunk size can be specified by giving number of words to be in
each chunk or the number of chunks.
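
The two ways of specifying chunk size described for `chunker` can be sketched in base R. The helper `chunk_words` and its argument names are hypothetical; this is not qdap's implementation and it ignores grouping variables:

```r
# Base R sketch of chunking by words; `chunk_words` is a hypothetical helper,
# not qdap's `chunker` implementation.
chunk_words <- function(text, n.words = NULL, n.chunks = NULL) {
    stopifnot(!is.null(n.words) || !is.null(n.chunks))
    words <- strsplit(trimws(text), "\\s+")[[1]]
    if (is.null(n.words)) {
        # Derive the chunk size from the requested number of chunks
        n.words <- ceiling(length(words) / n.chunks)
    }
    # Assign each word to a chunk id, then paste each chunk back together
    ids <- ceiling(seq_along(words) / n.words)
    unname(vapply(split(words, ids), paste, character(1), collapse = " "))
}

txt <- "one two three four five six seven"
chunk_words(txt, n.words = 3)
chunk_words(txt, n.chunks = 2)
```
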
IMPROVEMENTS
* `all_words` gains `char.keep` and `char2space` arguments to enable retention
of characters and multi word phrases. These features are passed to
`freq_terms` as well. Suggested by stackoverflow's lawyeR
(http://stackoverflow.com/a/26162401/1000343).
CHANGES
* `rm_url` has been moved into its own canned regex pattern extraction/replacer
package named `qdapRegex`.
* `name2sex` now uses the **gender** package to predict sex. This makes the
function slightly slower but much more accurate than previous versions.
Because of this increased accuracy and dependence on `gender`, the arguments
`pred.sex`, `fuzzy.match`, and `database` are no longer necessary and have
been removed.
CHANGES IN qdap VERSION 2.1.1
----------------------------------------------------------------
BUG FIXES
* `syllable_count` returned the sentence (recycled) in the `words` column of the
output. This behavior has been fixed. See GitHub issue #188 for details.
* `syn` returned antonyms for some words. This was caused by the dictionary
`qdapDictionaries::key.syn`, which contained antonyms and elements that were
error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
* The `pres_debates2012` data set contained three errors in speech attribution.
These have been corrected, along with the turn of talk (`tot`).
* `word_stats` would throw an error if no poly-syllable words existed. This has
been corrected (reported by Nicolas Turenne).
NEW FEATURES
* `qdap_df` and `%&%` added to mimic some of the functionality of `dplyr`'s
`tbl_df` and chaining pipe in a more specific, less flexible, `qdap` oriented
way.
* `Text` added to view and change the `text.var` attribute of a `data.frame` of
the class `qdap_df`.
* `cumulative` generic method added to view cumulative scores over time.
* `formality` picks up a `cumulative` method.
* `polarity` picks up a `cumulative` method.
* `end_mark` picks up a `class` (`end_mark`), `plot` method, and a `cumulative`
method.
* `syllable_sum`, `polysyllable_sum`, and `combo_syllable_sum` pick up a
`class`, `plot` method, and a `cumulative` method.
* `wfm` becomes a generic method currently applied to a `text.var` that is:
`character`, `factor` (coerced to `character`), or `wfdf`.
* `unbag` added as a complement to `bag_o_words` and friends for undoing string
splitting. A convenience wrapper for `paste(collapse = " ")`.
* `as.Corpus.TermDocumentMatrix`, `as.Corpus.DocumentTermMatrix`, and
`as.Corpus.wfm` added to convert a matrix format to a `tm::Corpus`.
* `exclude` becomes a generic method for various classes. Functionality is the
same but with improved code readability.
* `check_spelling_interactive`, `check_spelling`, `which_misspelled`, and
`correct` allow the user to identify potentially misspelled words and
optionally suggest replacements.
* `random_data` & `random_sent` added to generate random sentence data sets and
vectors.
* `comma_spacer` added to ensure strings with commas contain a space after them.
* `check_text` added to identify potential problems in text.
* `replace_ordinal` added to convert ordinal representations of 1 through 100 to
strictly ordinal text (e.g., "1st" becomes "first").
* A vignette: `Cleaning Text & Debugging` was added to assist users with
cleaning and debugging problems in `qdap`.
* `pronoun_type`, `subject_pronoun_type`, and `object_pronoun_type` added to
examine usage of subject/object pronouns by grouping variable.
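
Two of the helpers listed above have simple base R analogues. The `unbag` entry describes a wrapper around `paste(collapse = " ")`, and the `comma_spacer` entry describes ensuring a space after commas. The functions below are hypothetical stand-ins written for illustration, and the comma pattern is an assumption rather than qdap's own:

```r
# Base R sketch; `unbag_sketch` and `comma_space_sketch` are hypothetical
# stand-ins, not the qdap functions themselves.
unbag_sketch <- function(words) paste(words, collapse = " ")

# Insert a space after any comma that is immediately followed by a
# non-space character
comma_space_sketch <- function(x) gsub(",(?=\\S)", ", ", x, perl = TRUE)

words <- strsplit("the quick brown fox", " ", fixed = TRUE)[[1]]
unbag_sketch(words)
comma_space_sketch("red,green, blue")
```
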
MINOR FEATURES
* `dplyr`'s chaining pipe imported for convenience. See
http://www.rdocumentation.org/packages/magrittr/functions/magrittr for details.
IMPROVEMENTS
* `wfm` gains a speed-up through generic classes and `tm` package integration
(`strip` is no longer used in `wfm`).
* `as.tdm.character` and `as.dtm.character` gain a speed boost with a `tm`
package integration.
* Added message to `as.data.frame.Corpus` for missing end-marks suggesting the
use of: `sent.split = FALSE`.
* `as.Corpus` family of functions didn't necessarily respect document names and
sometimes used numeric sequence instead. The introduction of a reader via
`tm::readTabular` has fixed this.
* `sentSplit` now gives warnings for text that may contain anomalies such as:
non-ASCII characters, factors, missing punctuation, empty cells, and no
alphabetic characters found.
* `read.transcript` now gives a warning when reading from a .docx file and the
separator (`sep`) used is still found in the text as this may indicate the
data did not split correctly.
* `dispersion_plot` now takes a named list of vectors of terms as the argument to
`match.terms`. The vectors are combined as a unified theme named with the
names of the list supplied to `match.terms`.
CHANGES
* `as.data.frame.Corpus`'s default value for `sent.split` is now `FALSE`.
* The `state` column in the `qdap::DATA2` data-set is now character (previously
factor).
CHANGES IN qdap VERSION 2.1.0
----------------------------------------------------------------
BUG FIXES
* `new_project` did not copy the .Rprofile over into the new project. This has
been fixed. Reference issue #184.
* `sentiment_frame` coerced words to factor. `stringsAsFactors = FALSE` has
been added to prevent this.
* `polarity` did not work on > 1 grams due to a bug in `sentiment_frame`
converting character to factor (thanks for the find @chewth). See GitHub
issue #185 for details.
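
The character-to-factor coercion behind the `sentiment_frame` bug can be reproduced with a plain `data.frame`. The column names below are made up for the example:

```r
# Base R sketch of the character-to-factor coercion described above.
# `stringsAsFactors = TRUE` is set explicitly because the default differs
# across R versions.
as_factor <- data.frame(words = c("not good", "great"), polarity = c(-1, 1),
    stringsAsFactors = TRUE)
as_char <- data.frame(words = c("not good", "great"), polarity = c(-1, 1),
    stringsAsFactors = FALSE)

class(as_factor$words)
class(as_char$words)

# String operations such as counting characters need character input
nchar(as_char$words)
```
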
NEW FEATURES
* `unique_by` added to allow the user to find terms unique to individual
elements of a grouping variable.
* `build_qdap_vignette` replaces the temporary place holder version of the
*Introduction to qdap vignette*. This function will replace the (1) HTML,
(2) source, & (3) R code found in `browseVignettes(package = 'qdap')`.
MINOR FEATURES
* `sub_holder` picks up an `alpha.type` argument that allows the user to specify
whether alpha or numeric keys should be used.
* `replace_number` picks up a `remove` argument that removes numbers from text.
IMPROVEMENTS
* `qheat` becomes a generic method. This means some of the internal function
class checking has been moved to individual methods for those classes.
Additionally, `qheat` now works with logical matrices/data.frames.
* The `tm` package compatibility functions have been renamed in a more R-ish
way and take the form of generic methods for specific classes. For example,
`df2tm_corpus` becomes `as.Corpus`. Here is a complete list of changes:
- `df2tm_corpus` is now `as.Corpus`
- `tm_corpus2df` is now `as.data.frame`
- `as.wfm` is now a generic method
- `tm_corpus2wfm` is now `as.wfm`
- `tm2qdap` is now `as.wfm`
- `tdm` is now `as.tdm` or `as.TermDocumentMatrix`
- `dtm` is now `as.dtm` or `as.DocumentTermMatrix`
CHANGES
* `colsplit2df` and `colpaste2df` no longer convert character columns to factor.
* `df2tm_corpus` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.Corpus` instead.
* `tm_corpus2df` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.data.frame` instead.
* `tm2qdap` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.wfm` instead.
* `tm_corpus2wfm` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.wfm` instead.
* `tdm` is deprecated. It will be removed in a subsequent version of `qdap`.
Use `as.tdm` or `as.TermDocumentMatrix` instead.
* `dtm` is deprecated. It will be removed in a subsequent version of `qdap`.
Use `as.dtm` or `as.DocumentTermMatrix` instead.
* The *Introduction to qdap* .Rmd vignette has been moved to an internal
directory. The HTML version is not built by default. This saves CRAN space
and time checking the package source. The file has been replaced with a
temporary place holder that contains instructions for building the actual
vignette. The user may also use the `build_qdap_vignette` directly.
* `qdap` incorporates the changes from the `tm` package version: 0.6:
http://cran.r-project.org/web/packages/tm/news.html Reference issue #187.
CHANGES IN qdap VERSION 2.0.0
----------------------------------------------------------------
The `qdapTools` package now houses several former qdap functions. While
`qdapTools` is a Dependency and all of these functions will be accessible to
the qdap user there is a break in backward compatibility if these functions
are included in code. For this reason this release is a major bump of qdap.
BUG FIXES
* `replace_number` did not replace single-digit numbers. Spotted by Ben Bolker.
This behavior has been fixed and unit testing added for this function. See
issue #178.
NEW FEATURES
* `sub_holder` added; this function holds the place for particular character
values, allowing the user to manipulate the vector and then revert the place
holders back to the original values.
* `Network` method added to make network plots of select qdap objects.
* `qtheme`, `theme_nightheat`, `theme_duskheat`, `theme_norah`, `theme_cafe`,
`theme_grayscale`, `theme_badkitchen`, and `theme_hipster` added to style
`Network` plots.
* `polarity` picks up a `Network` method.
* `formality` picks up a `Network` method.
* qdap officially begins utilizing the `testthat` package for unit testing,
though only a few functions have begun the process, more will be added over
time.
MINOR FEATURES
IMPROVEMENTS
CHANGES
* The `qdapTools` package now houses the following former `qdap` functions:
`hash`, `%ha%`, `hash_look`, `hms2sec`, `id`, `lookup`, `%l%`, `%l+%`, `%l*%`,
`repo2github`, `sec2hms`, `text2color`, `url_dl`, `v_outer`, `list2df`,
`matrix2df`, `vect2df`, `list_df2df`, `list_vect2df`, `counts2list`,
`vect2list`, & `mtabulate`. These functions will continue to be available to
qdap users in interactive mode (`qdapTools` is a Dependency and thus these
functions are loaded into the workspace by default). This will allow this
bundle of functions to be used outside of qdap without calling the larger qdap
package per the request of Kirill Muller (see issue #165).
* As scheduled the `dissimilarity` function has been removed from the qdap
package to avoid conflict with the `tm` package. Use `Dissimilarity` function
instead.
CHANGES IN qdap VERSION 1.3.6
----------------------------------------------------------------
MINOR FEATURES
* `polarity` picks up a `constrain` argument that constrains the polarity values
to be between -1 and 1.
IMPROVEMENTS
* `polarity`'s equation now uses primes on the de-amplifiers before they're
confined to be >= -1. This avoids confusion in the indicator function that
took the de-amplifiers variable and returned the same variable.
* `dist_tab`'s frequency columns used a capital F in Freq. This was not
consistent across all column names and has been changed to lower case.
CHANGES
* `polarity_frame` is deprecated and will be removed in a subsequent release.
Please use `sentiment_frame` instead.
CHANGES IN qdap VERSION 1.3.5
----------------------------------------------------------------
BUG FIXES
* The *An Introduction to qdap* vignette contained a broken link in the tm
Package Compatibility section. This has been fixed. Also the reliance on
`Rgraphviz` from the vignette has been removed. This will eliminate CRAN
WARN in CRAN checks (for some OS) but not the note for `tm`'s reliance on
`Rgraphviz`.
* `polarity` reported the incorrect number of words for sentences containing
commas. This has been fixed (Max Ghenis).
NEW FEATURES
* `formality` picks up an `Animate` method.
* `end_mark_by` function added as an aggregated grouping version of `end_mark`.
MINOR FEATURES
* `raj.act.1POS` added. `raj.act.1POS` is a data set for Romeo and Juliet: Act 1
broken into parts of speech.
IMPROVEMENTS
* `discourse_map` picks up a `pause` argument that enables the user to pause
between plots in interactive mode.
CHANGES
CHANGES IN qdap VERSION 1.3.4
----------------------------------------------------------------
BUG FIXES
NEW FEATURES
* `gantt` and `gantt_wrap` (single facet) pick up an `Animate` method.
* `polarity` picks up an `Animate` method.
* `vertex_apply` and `edge_apply` added to make uniform changes to lists of
`igraph` objects.
MINOR FEATURES
IMPROVEMENTS
* `discourse_map` picks up a `condense` argument that allows the user to
condense sequential rows for like grouping variable sub groups.
* `list_df2df` names now use a zero padded numeric portion for default names.
For example `c("L1", "L2", "L3", ... "L10")`, becomes
`c("L01", "L02", "L03", ... "L10")`.
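
Zero-padded default names of the kind described for `list_df2df` can be generated in base R as sketched below. The helper `default_names` is hypothetical and is not the code used by `list_df2df`:

```r
# Base R sketch of zero-padded default names; `default_names` is a
# hypothetical helper, not the code used by `list_df2df`.
default_names <- function(n, prefix = "L") {
    width <- nchar(as.character(n))   # digits needed for the largest index
    paste0(prefix, formatC(seq_len(n), width = width, flag = "0"))
}

nms <- default_names(10)
nms

# Padded names keep their numeric order when sorted as text
sort(nms)
```
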
CHANGES
CHANGES IN qdap VERSION 1.3.3
----------------------------------------------------------------
BUG FIXES
* `colpaste2df` dropped the column name for a single retained column when
`keep.orig = FALSE`. See GitHub issue #157 for more.
* `multigsub` (`mgsub`) would return `NA` for replacement of length 1 after the
addition of the `order.pattern` (used to prevent substrings from
replacing meta-strings) in version 1.3.2.
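
The idea behind `order.pattern`, and the length-1 replacement pitfall noted above, can be sketched in base R. The helper `mgsub_sketch` is hypothetical and is not qdap's `multigsub`; it only illustrates applying longer patterns first:

```r
# Base R sketch of why pattern order matters in sequential replacement;
# `mgsub_sketch` is a hypothetical helper, not qdap's `multigsub`.
mgsub_sketch <- function(pattern, replacement, x, order.pattern = TRUE) {
    # Recycle a length-1 replacement first; reordering it by index would
    # otherwise produce NA values
    replacement <- rep_len(replacement, length(pattern))
    if (order.pattern) {
        # Apply longer patterns first so shorter ones cannot split them
        ord <- order(nchar(pattern), decreasing = TRUE)
        pattern <- pattern[ord]
        replacement <- replacement[ord]
    }
    for (i in seq_along(pattern)) {
        x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
    }
    x
}

mgsub_sketch(c("the", "then"), c("A", "B"), "then the", order.pattern = FALSE)
mgsub_sketch(c("the", "then"), c("A", "B"), "then the")
mgsub_sketch(c("the", "then"), "X", "then the")
```

Without ordering, "the" is replaced inside "then" before the longer pattern has a chance to match.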
NEW FEATURES
* `phrase_net` function provides functionality similar to the Many Eyes
Phrase Net plot.
* `discourse_map` function provides a network mapping of the flow of discourse
between social actors. Function output is `Animate` ready as well. See
`?discourse_map` and http://trinker.github.io/qdap_examples/animation_dialogue
for more.
* `Animate` function added to convert select qdap outputs to an animated
sequence. See `?Animate.discourse_map` for more.
MINOR FEATURES
* `synonyms_frame` (`syn_frame`) added to allow the user to create a synonym
hash for the revamped `synonyms` function.
* `repo2github` function added to send a directory to GitHub upon first commit.
IMPROVEMENTS
* `new_project` has an improved directory structure and works with any version
of the `reports` package.
* `synonyms` function used the `env.syl` hash data from qdapDictionaries
internally. This approach could cause problems if used within other functions
in a package. It also limits the usability of synonyms. The `synonyms`
function picks up a `synonym.frame` argument that allows the user to specify
a synonym hash table. This can be created via the `synonyms_frame` function
(per a request from J. Aravind).
CHANGES
CHANGES IN qdap VERSION 1.3.2
----------------------------------------------------------------
This is a patch release to address the archiving of the `lsa` package.
BUG FIXES
* The **qdap-tm Package Compatibility** Vignette contained errors in the
Feinerer I, Hornik K, Meyer D (2008) reference: the pages (listed as 51-54) have
been corrected to 1-54, and the incorrect journal has been fixed. Caught by Kurt Hornik.
MINOR FEATURES
* `DocumentTermMatrix` and `TermDocumentMatrix` from the tm package pick up a
`Filter` method.
IMPROVEMENTS
* `multigsub` picks up an argument, `order.pattern`, to prevent substrings from
replacing meta-strings.
* The following data sets were added to qdapDictionaries package:
`Fry_1000`, `Leveled_Dolch`, `Dolch`
CHANGES
* The package `lsa` has been removed from the Suggests field in the DESCRIPTION
file, examples, and vignettes.
CHANGES IN qdap VERSION 1.3.1
----------------------------------------------------------------
A version bump necessary for Re-Submission to CRAN.
CHANGES
* `new_project` was reconfigured with the old code that does not require the
newest version of the reports package.
CHANGES IN qdap VERSION 1.3.0
----------------------------------------------------------------
BUG FIXES
* `read.transcript` could leave a QDAP_PLACE_HOLDER behind if a colon was found
in the person column. This behavior has been fixed.
* `word_cor`'s plotting method threw an error if a word did not have any words
above the r threshold. This behavior has been corrected.
* `Filter` overwrote a base R function; this has been fixed per Joshua Ulrich.
* `scores.polarity`'s print method would return an error if columns were not
indexed yet were rounded. For instance, the following threw an error:
`scores(with(sentSplit(DATA, 4), polarity(state, person)))[, 1:4]`
This behavior has been fixed.
NEW FEATURES
* qdap adds an HTML vignette to better explain the intended work flow and
function use for the package. Use `browseVignettes(package = "qdap")` to
open.
* qdap adds a PDF vignette to describe the compatibility and navigation between
qdap and the `tm` packages. Use `browseVignettes(package = "qdap")` to open.
MINOR FEATURES
IMPROVEMENTS
* `apply_as_df` picks up `stopwords` and `filter` arguments that allow the
user to remove stopwords and min/max length words.
* `plot.word_cor` picks up the argument `ncol` that allows the user to specify
the number of columns used. This uses `ggplot2`'s `facet_wrap` rather than
`facet_grid` (which is the default if `ncol = NULL`).
* `name2sex` relied upon having qdapDictionaries loaded. This could be an issue
if the function were used internally. The user now supplies a dictionary of
names and probabilities.
* `df2tm_corpus` gains a `demographics.vars` argument that allows the user to
add demographic information to the resulting corpus `DMetaData`.
* `tm_corpus2df` gains the ability to convert `DMetaData` into demographic
data.frame columns.
CHANGES
CHANGES IN qdap VERSION 1.2.0
----------------------------------------------------------------
BUG FIXES
NEW FEATURES
* `Filter` added to give the ability to provide a range of character
lengths to filter from a `wfm` object.
* `scores` generic method added to view scores from select qdap objects.
* `counts` generic method added to view counts from select qdap objects.
* `proportions` generic method added to view proportions from select qdap
objects.
* `preprocessed` generic method added to view preprocessed data from select qdap
objects.
* `apply_as_df` added to allow the user to apply qdap functions to a Corpus
directly.
MINOR FEATURES
* `tm_corpus2wfm` added to quickly convert from a **tm** package `Corpus` to a qdap
`wfm` object.
* `as.wfm` added as a means to attempt to coerce a matrix to a `wfm` object.
* `%l+%` added as a counterpart to `%l%` that assumes `missing = NULL`.
* `%bs%` added as quick counterpart to `boolean_search` for indexing.
IMPROVEMENTS
* `df2tm_corpus` now sets metaData information for ID and creator (based on
`Sys.info()["user"]`).
* `matrix2df` now accepts a simple_triplet_matrix object as well.
* `word_cor` output that was a list (not a correlation matrix) did not have a
plot method. The plot method for `word_cor` now handles both matrices and the
list of correlations.
* `rm_row` picks up the `contains` argument that allows the user to search for,
and remove rows based on, a term anywhere within the string, not just at the
beginning.
* `read.transcript` now handles multiple character spaces as an argument to
`sep` when `text` argument is used.
CHANGES
* `dissimilarity` has been renamed to `Dissimilarity` to prevent tm package
conflicts. The old version has been deprecated and will be removed in the
next version (minor or major) pushed to CRAN.
CHANGES IN qdap VERSION 1.1.0
----------------------------------------------------------------
A version bump necessary for Re-Submission to CRAN.
CHANGES
* Downgraded the version requirement for the reports package to
reports (>= 0.1.2) in order to upload to CRAN. reports (>= 0.2.0) is not yet
available on CRAN.
CHANGES IN qdap VERSION 1.0.0
----------------------------------------------------------------
The word lists and dictionaries in `qdap` have been moved to `qdapDictionaries`.
Additionally, many functions have been renamed with underscores instead of the
former period separators. These changes break backward compatibility. Thus
this is a **major** release (ver. 1.0.0).
It is the general practice to deprecate functions within a package before
removal, however, the number of necessary changes in light of qdap being
relatively new to CRAN, made these changes sensible at this point.
BUG FIXES
* `qheat`'s argument `by.column = FALSE` resulted in an error. This behavior
has been fixed.
* `question_type` did not work because of changes to `lookup` that did not
accept a two column matrix for `key.match`. See GitHub issue #127 for more.
* `combo_syllable.sum` threw an error if the `text.var` contained a cell with an
all non-character ([a-z]) string. This behavior has been fixed.
* `todo` function created by `new_project` would not report completed tasks if
`report.completed = TRUE`.
* `termco` and `termco.d` threw an error if more than one consecutive regex
special character was passed to `match.list` or `match.string`. See GitHub
issue #128 for more.
* `trans.cloud` threw an error if a single list with a named vector was passed
to `target.words`. This behavior has been fixed.
* `sentSplit` now returns the "tot" column when `text.place = "original"`.
* `all_words` output dataframe FREQ column class has been changed from factor to
numeric. Additionally, the WORDS column prints using `left.just` but retains
traditional character properties (print class added). `all_words` also picks
up `apostrophe.remove` and `ldots` (for `strip`) arguments.
* `gantt_plot` did not handle `fill.vars`, particularly if the fill was nested
within the `grouping.vars`. This behavior has been fixed with corresponding
examples added.
* `url_dl` - Downloaded an empty file when not using a Dropbox key. This
behavior has been fixed.
* The `cm_code.` family of functions had a bug in the output due to
`cm_long2dummy` and `cm_dummy2long`'s handling of stretching spans. This has
been corrected.
* `cm_code.exclude` did not output the correct excluded spans. This behavior
has been corrected.
* The use of `comment` to convey object characteristics has been replaced with
the use of `class`.
* `question_type` did not include question words ending in 'd as part of the
category. For instance "How'd you like it?" was not classified as a how
question.
* `beg2char` would not include the `char` if `include = TRUE` and `noc = 1`.
* `cm_range2long` returned `NA`s for vectors containing multiple single values.
See GitHub issue #144 for more.
* `termco` family of functions did not handle `NA` values. This has been fixed.
(Matt Williamson) See GitHub issue #147 for details.
* `pos` threw an error for vectors of length 1. This has been fixed (Kurt
Hornik). See GitHub issue #150 for details.
* `formality` threw an error for vectors of length 1. This has been fixed. (Kurt
Hornik) See GitHub issue #151 for details.
NEW FEATURES
* The `cm_xxx2long` family of functions (`cm_df2long`, `cm_range2long` and
`cm_time2long`) now have a generic wrapper, `cm_2long`, to generate the long
formats.
* `hash_look` (and `%ha%`) a counterpart to `hash` added to allow quick access
to a hash table. Intended for use within functions or multiple uses of the
same hash table, whereas `lookup` is intended for a single external (non
function) use which is more convenient though could be slower.
* `boolean_search`, a Boolean term search function, added to allow for indexed
searches of Boolean terms.
* `trans_context` is a printing function designed to grab the context (n rows
before and after) an event (an index from a vector of indices). The function
prints the indices around the episode from a transcript to the console or a
.csv, .xlsx, .txt, or .doc file.
* `colpaste2df` is a wrapper for `paste2` that pastes dataframe columns together
and outputs a dataframe.
* `colcomb2class` quickly combines columns for a number of qdap classes including
output from: `termco`, `question_type`, `pos_by`, and `character_table`.
* `lview` a function to unclass a list output that has a special print method
that returns only a portion of the output. `lview` re-classes to "list".
* `word_cor` added to find words within grouping variables that are associated
based on correlation.
* `tm2qdap` a function to convert `"TermDocumentMatrix"` and
`"DocumentTermMatrix"` to a `wfm` added to allow easier integration with the
`tm` package.
* `apply_as_tm` a function to allow functions intended to be used on the `tm`
package's `TermDocumentMatrix` to be applied to a `wfm` object.
* `tm_corpus2df` and `df2tm_corpus` added to convert a tm package corpus to a
dataframe for use in qdap or vice versa.
* `tdm` and `dtm` are now truly compatible with the `tm` package. `tdm` and
`dtm` produce outputs of the class `"TermDocumentMatrix"` and
`"DocumentTermMatrix"` respectively. This change (coupled with the renaming
of `stopwords` to `rm_stopwords`) should make the two packages logical
companions and further extend the qdap package to integrate with the many
packages that already handle `"TermDocumentMatrix"` and
`"DocumentTermMatrix"`.
* `cm_distance` now uses resampling of data from the null model to generate
p-values for the mean code distances. Useful for determining if an association
(small distance) between codes is likely to happen if the null is true.
* `dispersion_plot` added to enable viewing of word dispersion through
discourse.
* `word_proximity` added to complement `dispersion_plot` and `word_cor`
functions. `word_proximity` gives the average distance between words in
the unit of sentences.
MINOR FEATURES
* `url_dl` now takes quoted string urls supplied to ... (no url argument is
supplied)
* `condense` is a function that condenses dataframe columns that are a list of
vectors to a single vector of strings. This outputs a dataframe with
condensed columns that can be written to csv/xlsx.
* `mcsv_w` now uses `condense` to attempt to condense columns that are
lists of vectors to a single vector of strings. This adds flexibility to
`mcsv_w` with more data sets. `mcsv_w` now writes lists of dataframes to
multiple csvs (e.g., the output from `termco` or `polarity`). `mcsv_w` picks
up a dataframes argument, an optional character vector supplied in lieu of
`...` that grabs the dataframes from an environment (default is the global
environment).
* `ngrams` now has an ellipsis (`...`) argument that passes further arguments
to `strip`.
* `dtm` added to complement `tdm`, allowing for easier integration with other R
packages that utilize `tdm`/`dtm`.
* `dir_map` picks up a `use.path` argument that allows the user to specify a
more flexible path to the created pre-formed `read.transcript` scripts based
on something like `file.path(getwd(), )`. This makes the code portable across
different machines.
* `polarity_frame` a function to make a hash environment lookup for use with the
`polarity` function.
* `DATA.SPLIT` a `sentSplit` version of the `DATA` data-set has been added to
qdap.
* `gantt_plot` accepts `NULL` for `grouping.var` and treats "all" rows as a
single grouping variable.
* `replace_number` now handles 10^47 digits compared to 10^14 previously.
* The `new_project` function gains a `github` argument that optionally sends the
repo to a public GitHub account upon creation.
* `qheat`, `polarity.plot` and `formality.plot` pick up the argument `plot`
which optionally suppresses the plotting. This is useful if the user is
operating in **knitr**, **sweave**, etc. and wishes to alter/add onto the plot.
* `lookup` now takes `missing = NULL`. This results in the original values in
`terms` corresponding to the missing elements being retained.
* `cm_time.temp` picks up a `grouping.var` argument that works similarly to
`cm_range.temp`'s `grouping.var`. `cm_time.temp` also takes hour values for
`start` and `end` as in `end = "01:22:03"`.
* `gantt_rep` picks up a generic `plot` method.
* Functions in the `cm_code.xxx` and `cm_xxx2long` families pick up a generic
plot method that utilizes `gantt_wrap` to plot a Gantt plot of the span data.
* Functions in the `cm_code.xxx` and `cm_xxx2long` families pick up a generic
summary method. This summary method has its own plot method that utilizes `qheat` to
plot a heatmap of the summary statistics. The generic print method
(`print.sum_cmspans`) is useful for output intended for publication.
* `qheat` picks up a `facet.vars` argument that allows a character vector of
length 1 or 2 to facet by.
* `question_type` gives the indices of questions via `$inds`.
* `colsplit2df` now splits multiple columns to match the capabilities of
`colpaste2df`.
* `sentSplit` now handles repeated measures and picks up a turn of talk plot
method.
* `tot_plot` now handles repeated measures and `grouping.var` to be nested
within the turn of talk.
* `wfm` now uses `mtabulate` and is ~10x faster.
* `plot.polarity` gains arguments for optional error bars using the standard
error of the mean polarity.
* `exclude` now works with `wfm` and the `tm` package's `DocumentTermMatrix` and
`TermDocumentMatrix` classes.
* `rm_url` removes/replaces URLs in a string(s).
* `matrix2df` added (under `list2df`) to convert `rownames` of matrix to a
dataframe column.
CHANGES
* The dictionaries and word lists for qdap have been moved to their own package,
`qdapDictionaries`. This will allow easier access to these resources beyond
the qdap package as well as reducing the overall size of the qdap package.
Because this is a major change that may break the code of some users, the
major release number has been upped to 1. The following name changes have
occurred:
- `increase.amplification.words` -> became -> `amplification.words`
- The `deamplification.words` wordlist and `env.pol` dictionary were added as
well.
* qdap gains an HTML package vignette to better explain the intended work flow
and function use for the package. This is not currently a part of the build
but can be accessed via:
http://htmlpreview.github.io/?https://github.com/trinker/qdap/blob/master/vignettes/qdap_vignette.html
*Note* that the vignette may include development version functions not yet
available in the current CRAN version.
* `polarity` utilizes a new, unbounded algorithm based on weighting to determine
polarity.
* `gantt_wrap` no longer accepts unquoted strings to the `plot.var` argument.
* `cm_df.temp` loses the logical `csv` argument. `file.name` has been replaced
with `file` to fit conventional R naming schemes.
* The plotting feature of `gantt` has been removed and a `plot` method has been
added. The user can plot the output from `gantt` in `base` or `ggplot2`
graphics.
* `cm_time2long` loses the argument `start.end` to ensure that the `cmspans`
class produced would operate as expected.
* Most exported functions utilizing a period separator have been replaced with
underscore named versions.
* `wf_combine` renamed `wfm_combine` to be consistent.
* `question_type` algorithm improvements including implied do/does/did handling.
* `list2df` and `mtabulate` now exported.
* `stopwords` has been renamed to `rm_stopwords` (`rm_stop` shorthand) to better
fit the action the function performs and to avoid conflicts with the
`tm` package.
* `replace_number`'s `num.paste` becomes logical rather than character input.
This makes use easier as the user doesn't need to remember arguments.
CHANGES IN qdap VERSION 0.2.5
----------------------------------------------------------------
Patch release. This version deals with the changes in the `openNLP` package
that affect qdap. Next major release scheduled after `slidify` package is
pushed to CRAN.
CHANGES IN qdap VERSION 0.2.3
----------------------------------------------------------------
BUG FIXES
* `new_project` placed a report in the CORRESPONDENCE directory rather than
CONTACT_INFO
* `strip` would not allow the characters "/" and "-" to be passed to
`char.keep`. This has been fixed. (Jens Engelmann)
* `beg2char` would only grab the first character of a string after n - 1
occurrences of the character. For example:
`beg2char(c("abc-edw-www", "nmn-ggg", "rer-qqq-fdf"), "-", 2)` resulted in
"abc-e" "nmn-g" "rer-q" rather than "abc-edw" "nmn-ggg" "rer-qqq"
NEW FEATURES
* `names2sex` a function for predicting gender from name.
* Added `NAMES` and `NAMES_SEX` data-sets, based on 1990 U.S. census data.
* `tdm` added as an equivalent to TermDocumentMatrix from the tm package. This
allows for portability across text analysis packages.
MINOR FEATURES
* `mgsub` now gets a `trim` argument that optionally removes trailing and
leading white space.
* `lookup` now takes a list of named vectors for the key.match argument.
CHANGES
* `new_project` directory can now be transferred without breaking paths (i.e.,
`file.path(getwd(), "DIR/file.ext")` is used rather than the full file path).
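
A minimal base R sketch of the idea behind this change, assuming nothing about
how `new_project` writes its scripts. The directory and file names are
hypothetical placeholders:

```r
# Hypothetical directory and file names, used only for illustration.
relative_part <- file.path("DIR", "file.ext")

# Resolve the path against the current working directory at run time,
# instead of storing a machine-specific absolute path.
portable <- file.path(getwd(), relative_part)

# The resolved path always starts at whatever the working directory is.
startsWith(portable, getwd())
```

Because the prefix is computed when the code runs, moving the project
directory only requires setting the working directory to its new location.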
CHANGES IN qdap VERSION 0.2.2
----------------------------------------------------------------
BUG FIXES
* `genXtract` labels returned the word "right" rather than the right edge string.
See http://stackoverflow.com/a/15423439/1000343 for an example of the old
behavior. This behavior has been fixed.
* `gradient_cloud`'s `min.freq` was locked at 1. This has been fixed. (Manuel
Fdez-Moya)
* `termco` would produce an error if single length named vectors were passed to
`match.list` and no multi-length vectors were supplied. Also an error was thrown
if an unnamed multi-length vector was passed to `match.list`. This behavior has
been fixed.
NEW FEATURES
* `tot_plot` a visualizing function that uses a bar graph to visualize patterns
in sentence length and grouping variables by turn of talk.
* `beg2char` and `char2end` functions to grab text from beginning of string to a
character or from a character to the end of a string.
* `ngrams` function to calculate ngrams by grouping variable.
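
As a rough illustration of what `beg2char` and `char2end` are described as
doing, here is a base R sketch. The helpers `text_before` and `text_after` are
hypothetical names and are not qdap's implementation; whether qdap's functions
include or exclude the matched character is an assumption here (excluded):

```r
# Hypothetical base R helpers; not qdap's implementation.

# Keep everything before the n-th occurrence of a literal character.
text_before <- function(x, char, n = 1) {
  vapply(strsplit(x, char, fixed = TRUE), function(parts) {
    # With fewer than n occurrences the whole string is returned.
    paste(head(parts, n), collapse = char)
  }, character(1), USE.NAMES = FALSE)
}

# Keep everything after the n-th occurrence of a literal character.
text_after <- function(x, char, n = 1) {
  vapply(strsplit(x, char, fixed = TRUE), function(parts) {
    # Drop the first n pieces; an empty string results if none remain.
    paste(tail(parts, -n), collapse = char)
  }, character(1), USE.NAMES = FALSE)
}

text_before(c("abc-edw-www", "nmn-ggg", "rer-qqq-fdf"), "-", 2)
# "abc-edw" "nmn-ggg" "rer-qqq"
text_after("abc-edw-www", "-", 1)
# "edw-www"
```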
MINOR FEATURES
* `genX` and `bracketX` gain an extra argument `space.fix` to remove extra
spaces left over from bracket removal.
* Updated out of date Dropbox url download in `url_dl`. `url_dl` also takes the
Dropbox key as well.
CHANGES
* qdap is now compiled for mac users (as `openNLP` now passes CRAN checks with no
Errors on Mac).
CHANGES IN qdap VERSION 0.2.1
----------------------------------------------------------------
BUG FIXES
* `word_associate` colors the word cloud appropriately and deals with the error
caused by a grouping variable not containing any words from 1 or more of the
vectors of a list supplied to match string
* `trans.cloud` produced an error when expand.target was TRUE. This error has
been eliminated.
* `termco` would eliminate > 1 columns matching an identical search.term found
in a second vector of match.list. `termco` now counts repeated terms multiple
times.
* `cm_df.transcript` did not give the correct speaker labels (fixed).
NEW FEATURES
* `gradient_cloud`: Binary gradient Word Cloud - A new plotting function
that plots and colors words for a binary variable based on which group of
the binary variable uses the term more frequently.
* `new_project`: A project template generating function designed to increase
efficiency and standardize work flow. The project comes with a .Rproj file
for easy use with RStudio as well as a .Rprofile that simplifies loading and
sourcing of packages, data and project functions. This function uses the reports
package to generate an extensive reports folder.
MINOR FEATURES
* `stemmer`, `stem2df` and `stem.words` now explicitly have the argument
`char.keep` set to "~~" to enable retaining special characters formerly stripped
away.
* `hms2sec`: A function to convert from h:m:s format to seconds.
* `mcsv_w` now takes a list of data.frames.
* `cm_range.temp` now takes the arguments text.var and grouping.var that will
automatically output these (grouping.var) columns as range coded indices.
* `wfm` gets a speed boost as the code has been re-written to be faster.
* `read.transcript` now reads .txt files as well as text similar to read.table.
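
The h:m:s to seconds conversion that `hms2sec` is described as performing can
be sketched in base R as follows. `hms_to_seconds` is a hypothetical name, and
this is an assumption about the arithmetic, not qdap's code:

```r
# Hypothetical helper; not qdap's implementation of `hms2sec`.
hms_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # Weight each field by a power of 60: hours, minutes, seconds.
    sum(parts * 60^((length(parts) - 1):0))
  }, numeric(1), USE.NAMES = FALSE)
}

hms_to_seconds("01:22:03")
# 1 * 3600 + 22 * 60 + 3 = 4923
```

Using powers of 60 lets the same helper accept "m:s" input as well, since the
weights are derived from the number of fields.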
CHANGES
* `sec2hms` is the new name for `convert`
* `folder` and `delete` have been moved to the reports package which is imported
by qdap. Previously `folder` would not generate a directory with the
time/date stamp if no directory name was given; this has been fixed, though
the function now resides in the reports package.
CHANGES IN qdap VERSION 0.2.0
----------------------------------------------------------------
* The first installation of the qdap package
* Package designed to bridge the gap between qualitative data and quantitative
analysis