Adding support for latent semantic analysis #207

Closed
BobMuenchen opened this issue Mar 8, 2017 · 2 comments

Comments

@BobMuenchen

I think it would be fairly easy to add support for the lsa package to tidytext and broom (posted as issues at both). See example below.

```r
# Put some docs in a vector
library("dplyr")
doc1 <- c("pets dog cat ferret")
doc2 <- c("sandwiches turkey ham")
doc3 <- c("cat ferret cat bird")
doc4 <- c("turkey beef sandwiches")
myvector <- c(doc1, doc2, doc3, doc4)
mydf <- tibble(id = 1:4, text = myvector)

# Create a corpus
library("quanteda")
mycorpus <- corpus(mydf, text_field = "text")
mytokens <- tokens(mycorpus)
mydfm <- dfm(mytokens)

# Perform LSA (lw_logtf(), gw_idf() and lsa() come from the lsa package)
library("lsa")
mytdm <- convert(mydfm, to = "lsa")
mytdm_weighted <- lw_logtf(mytdm) * gw_idf(mytdm)
myLSAspace <- lsa(mytdm_weighted, dims = 2)

# Here's how broom::augment could add
# factor scores back to the original data frame
factor_scores <- as_tibble(as.data.frame(myLSAspace$dk))
(augmented <- bind_cols(mydf, factor_scores))

# Here's how tidytext::tidy could tidy the factor loadings
library("tidyverse")
# as.data.frame() is used to keep the row names (the terms)
# so that rownames_to_column() can turn them into a column
loadings_tidy <- as.data.frame(myLSAspace$tk) %>%
  rownames_to_column(var = "term") %>%
  gather(factor, loading,             # Names of the new key and value columns
         starts_with("V"),            # Stack V1, V2, ... into "loading"
         -term) %>%                   # "term" is not gathered
  arrange(factor, desc(loading)) %>%  # Sort within each factor
  select(factor, term, loading)       # Reorder columns for readability

print(loadings_tidy)
```
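For anyone who wants to see the moving parts without installing lsa or quanteda, here is a minimal base-R sketch of the same idea, under stated assumptions. It treats LSA as a truncated SVD of a weighted term-document matrix and stores the result under the `tk` (term loadings) and `dk` (document scores) names that the example reads from `myLSAspace`, plus `sk` for the singular values (an assumption about lsa's output). The weighting, log(tf + 1) times an idf-style global weight, only approximates `lw_logtf()` * `gw_idf()`. `tidy_lsa_loadings()` and `augment_lsa()` are hypothetical helpers, not functions exported by tidytext or broom.

```r
# Minimal LSA sketch in base R: truncated SVD of a weighted
# term-document matrix. ASSUMPTION: the weighting below approximates,
# but may not exactly match, lsa's lw_logtf() * gw_idf().
docs <- c("pets dog cat ferret", "sandwiches turkey ham",
          "cat ferret cat bird", "turkey beef sandwiches")

# Term-document matrix: terms in rows, documents in columns
doc_tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(doc_tokens)))
tdm <- sapply(doc_tokens, function(tok) table(factor(tok, levels = terms)))
colnames(tdm) <- paste0("doc", seq_along(docs))

# Local weight log(tf + 1) times a global idf-style weight per term;
# the length-nrow vector recycles down each column, i.e. once per row
weighted <- log(tdm + 1) * (log2(ncol(tdm) / rowSums(tdm > 0)) + 1)

# Keep the first k singular triplets (SVD signs are arbitrary, so
# loadings may be sign-flipped relative to lsa::lsa())
k <- 2
s <- svd(weighted)
space <- list(tk = s$u[, 1:k, drop = FALSE],  # term loadings
              dk = s$v[, 1:k, drop = FALSE],  # document scores
              sk = s$d[1:k])                  # singular values
rownames(space$tk) <- rownames(weighted)
rownames(space$dk) <- colnames(weighted)
colnames(space$tk) <- paste0("V", 1:k)
colnames(space$dk) <- paste0("V", 1:k)

# HYPOTHETICAL tidier: one row per (factor, term) loading,
# equivalent to the gather() pipeline but in base R
tidy_lsa_loadings <- function(space) {
  tk <- space$tk
  out <- data.frame(
    factor  = rep(colnames(tk), each = nrow(tk)),   # column-major order
    term    = rep(rownames(tk), times = ncol(tk)),
    loading = as.vector(tk),
    stringsAsFactors = FALSE
  )
  out <- out[order(out$factor, -out$loading), ]
  rownames(out) <- NULL
  out
}

# HYPOTHETICAL augmenter: bind document scores to the original data
augment_lsa <- function(data, space) {
  scores <- space$dk
  rownames(scores) <- NULL
  cbind(data, as.data.frame(scores))
}

docs_df <- data.frame(id = 1:4, text = docs, stringsAsFactors = FALSE)
loadings <- tidy_lsa_loadings(space)
augmented <- augment_lsa(docs_df, space)
print(head(loadings))
print(augmented)
```

The helpers take the whole `space` list rather than a single matrix so that a real S3 method (for example on lsa's `LSAspace` class) could wrap them with the same signature.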
@alexpghayes
Collaborator

At the moment, NLP-related tidiers live in tidytext. It looks like this is on the tidytext docket (juliasilge/tidytext#46), so I'm going to close this issue.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2021