LDATS 0.2.0 #131

Merged
merged 20 commits into from
Jul 10, 2019

juniperlsimonis (Member) commented Jul 5, 2019

API updates

  • At the LDA_TS function level, the separate inputs for data tables (document_term_table and document_covariate_table) have been merged into a single input, data, which can be either the document_term_table alone or a list containing the document_term_table and, optionally, a document_covariate_table. If covariates aren't provided, the function now constructs a covariate table assuming equally spaced observations. If a list is used, the function assumes that exactly one element has a name containing the letters "term" and at most one element has a name containing the letters "covariate" (regular expressions are used for matching). (addresses #119, Data API design questions)
  • timename has been moved from within the TS_controls_list to a main argument in all associated functions.
  • The control lists have been made easier to interact with. Primarily, the arguments that previously required LDA_controls_list, TS_controls_list, or LDA_TS_controls_list inputs now take general list inputs (so LDA_TS does not need a nested set of control functions). Each control list is passed through a function (LDA_set_control, TS_control, or LDA_TS_control) that sets any values not supplied to their defaults. This also allows the removal of those controls list class definitions. (addresses #130, set up the controls lists to just be lists)
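
The name-matching rule for the merged data input can be illustrated with a small sketch. This is not the LDATS code: it is written in Python for illustration, `resolve_data` is a hypothetical helper, the matching is assumed to be case-sensitive, and the `time` column name for the constructed covariate table is also an assumption.

```python
import re


def resolve_data(data):
    """Split `data` into (document_term_table, document_covariate_table).

    `data` is either a bare document-term table (a list of rows here) or a
    dict with exactly one key containing "term" and at most one key
    containing "covariate". Hypothetical helper, not part of LDATS.
    """
    if not isinstance(data, dict):
        term_table, covariate_table = data, None
    else:
        term_keys = [k for k in data if re.search("term", k)]
        cov_keys = [k for k in data if re.search("covariate", k)]
        if len(term_keys) != 1:
            raise ValueError('exactly one element name must contain "term"')
        if len(cov_keys) > 1:
            raise ValueError('at most one element name may contain "covariate"')
        term_table = data[term_keys[0]]
        covariate_table = data[cov_keys[0]] if cov_keys else None
    if covariate_table is None:
        # No covariates supplied: assume equally spaced observations and
        # number the documents 1..n (the column name "time" is an assumption).
        covariate_table = {"time": list(range(1, len(term_table) + 1))}
    return term_table, covariate_table
```

Passing a bare table yields a constructed covariate table with one time step per document, while passing a dict with both tables returns them unchanged.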

Fixed and updated example code to improve user experience

Updated calculation of the number of observations in LDA

  • The number of observations for a VEM-fit LDA is now calculated as the number of entries in the document-term matrix (following Hoffman et al. and Buntine; see ?logLik.LDA_VEM for references).
  • Relatedly, we now include a general AICc function that works in this specific case as defined (addresses #129, add AICc functionality back in).
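
For reference, the standard small-sample correction is AICc = AIC + 2k(k + 1) / (n - k - 1), where k is the number of parameters and n is the number of observations. The sketch below is a Python illustration of that formula, not the package's AICc function; the name `aicc` and its argument names are hypothetical.

```python
def aicc(log_lik, n_params, n_obs):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1).

    Hypothetical helper for illustration; argument names are assumptions.
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = -2.0 * log_lik + 2.0 * n_params
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction
```

Under the change described above, `n_obs` for a VEM-fit LDA would be the number of entries in the document-term matrix.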

Fixed bug in plotting across multiple outputs

Renamed functions

  • summarize_TS has been renamed package_TS to align with the other package_ functions that package model output.

Simulate functions

  • Basic simulation functionality has been added for help with generating data sets to analyze. (addresses issue 114)
  • sim_LDA_data simulates an LDA model's document-term-matrix
  • sim_TS_data simulates a TS model's document-topic distribution matrix
  • sim_LDA_TS_data simulates an LDA_TS model's document-term-matrix
  • softmax and logsumexp are added as utility functions
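
As a reminder of what these two utilities conventionally compute, here is a minimal Python sketch of the usual numerically stable definitions. It is an illustration of the general technique, not the LDATS implementation, whose signatures may differ.

```python
import math


def logsumexp(xs):
    """log(sum(exp(x))) computed stably by factoring out the maximum."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax(xs):
    """Map real values to probabilities that sum to 1."""
    lse = logsumexp(xs)
    return [math.exp(x - lse) for x in xs]
```

Subtracting the maximum before exponentiating keeps large inputs from overflowing, which is why softmax is written in terms of logsumexp here.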

Improved pkgdown site

pulling it out of the TS controls list
LDA_TS now takes a single "data" input that can be either a document term table or a list containing at least a document term table and optionally also a covariate table. If a covariate table isn't provided in LDA_TS, the data are now assumed to be equally spaced in time.
also added in some additional tests to fill in coverage
juniperlsimonis (Member, Author) commented

@Renata I tagged you as a reviewer as I'd like to make sure the update to the API works smoothly for your integration with MATSS and to make sure that I've covered all the updates that need to happen with the vignettes. I'm not completely done with this PR yet, but take a look and let me know what you think.

codecov-io commented Jul 5, 2019

Codecov Report

Merging #131 into master will increase coverage by 0.38%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
+ Coverage   97.18%   97.56%   +0.38%     
==========================================
  Files          10       11       +1     
  Lines        1030     1191     +161     
==========================================
+ Hits         1001     1162     +161     
  Misses         29       29

| Impacted Files | Coverage Δ | |
|---|---|---|
| R/simulate.R | 100% <100%> (ø) | |
| R/LDA_TS.R | 100% <100%> (ø) | ⬆️ |
| R/utilities.R | 100% <100%> (ø) | ⬆️ |
| R/multinom_TS.R | 100% <100%> (ø) | ⬆️ |
| R/LDA_TS_plots.R | 94.44% <100%> (ø) | ⬆️ |
| R/TS_plots.R | 88.17% <100%> (+0.12%) | ⬆️ |
| R/TS_on_LDA.R | 99.35% <100%> (+0.04%) | ⬆️ |
| R/TS.R | 98.68% <100%> (+0.1%) | ⬆️ |
| R/LDA_plots.R | 98.86% <100%> (ø) | ⬆️ |
| R/ptMCMC.R | 99.46% <100%> (+0.03%) | ⬆️ |

... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8fe7099...8d079b6.

The control lists have been made easier to interact with. Primarily, the arguments that previously required `LDA_controls_list`, `TS_controls_list`, or `LDA_TS_controls_list` inputs now take general `list` inputs (so  `LDA_TS` does not need to have a nested set of control functions). Each control list is passed through a function (`LDA_set_control`, `TS_control`, or `LDA_TS_control`) to set any non-input values to their defaults. This also allows the removal of those controls list class definitions. ([addresses issue 130](#130))
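
The default-filling pattern described in this commit can be sketched as follows. This is a Python illustration of the general idea, not the LDATS control functions: `set_control`, `HYPOTHETICAL_DEFAULTS`, and every setting name in it are made up for the example.

```python
# Hypothetical defaults; these names are NOT the package's actual settings.
HYPOTHETICAL_DEFAULTS = {"quiet": False, "seed": 1, "n_iterations": 1000}


def set_control(control=None, defaults=HYPOTHETICAL_DEFAULTS):
    """Return a complete control dict: user-supplied values override defaults."""
    merged = dict(defaults)        # copy so the defaults are never mutated
    merged.update(control or {})   # fill in whatever the user provided
    return merged
```

With this pattern a caller passes only the settings they want to change, for example `set_control({"quiet": True})`, and everything else falls back to its default.
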
juniperlsimonis changed the title from "WIP LDATS 0.2.0" to "LDATS 0.2.0" on Jul 8, 2019
juniperlsimonis (Member, Author) commented

@diazrenata i've got all of the updates for v0.2.0 that i wanted to get to in here, so please take a look when you can and let me know what you think or if i need to change anything.
the changes to the api will break pipelines based on v0.1.0, so i want to make sure you're squared away before we merge in

what is actually AIC was being called deviance
replaced AIC with logLik for TS_fit, which allows AIC to work still and give value as before
and strictly enforcing integer or date (and thus integer) timesteps
base simulation functions for data going into an LDA or a TS from parameters
combining the two simulate functions
also adding an input to the sim LDA function for Theta (allowing the simple combination of the two functions but also allowing for logical input to the LDA document by topic and topic by term)
2 participants