Add option to get PCA from LSA model #12
Comments
Are you thinking about adding a correspondence analysis (CA) option as well? Arguably, CA could be tapping into underlying linguistic properties a bit better than PCA.
I hadn't thought about it. From this paper ( If you want, you can open up an issue for me to look into this. I'll do my On Mon, Mar 21, 2016 at 2:03 PM smikhaylov notifications@github.com wrote:
You can set it up directly on the DFM (document-feature matrix). That's how it's implemented in quanteda's textmodel_ca function, which calls the ca package. Another option is the vegan package, which is widely used in ecology and has more functionality. Btw, quanteda is another higher-level framework implementation.
I will look at quanteda as well. I'm going to benchmark the SVD implementations from irlba, RSpectra, and quanteda, and I'll implement whichever seems fastest and most scalable. At the end of the day, LSA, PCA, and CA all rely on SVD, so it's just a matter of which one works best. It seems that textmineR, text2vec, and quanteda all use the same data type. I am in the process of reworking textmineR to be a higher-level package, built on text2vec. @dselivanov has done an amazing job at creating a framework that works faster and is more scalable than any other I've seen (in any language), at least on a single machine. Maybe the quanteda maintainers would want to do the same? My current plan (not written anywhere on GitHub) is to create wrappers for...
The goal is to have a library that uses similar syntax and returns similar objects to get a wide range of topic models so users don't have to hunt them all down. My personal PhD research focuses on evaluation metrics for topic models. So, textmineR has that functionality as well.
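The point above that LSA, PCA, and CA all rely on SVD can be sketched in base R. This is only a minimal illustration, not how textmineR, text2vec, or quanteda implement any of these: the toy matrix `dtm` and all variable names are made up for the example, and it uses the dense base `svd()`, whereas a real corpus would presumably need a truncated SVD on a sparse matrix (which is what packages like irlba and RSpectra provide).

```r
# Toy document-term matrix (rows = documents, columns = terms).
# The counts are invented purely for illustration.
dtm <- matrix(
  c(2, 1, 0, 0,
    1, 2, 1, 0,
    0, 1, 2, 1,
    0, 0, 1, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("apple", "pear", "bank", "loan"))
)

# LSA: SVD of the matrix itself (raw counts here, no tf-idf weighting).
lsa <- svd(dtm)
lsa_docs <- lsa$u %*% diag(lsa$d)        # document coordinates

# PCA: SVD of the column-centered matrix.
centered <- scale(dtm, center = TRUE, scale = FALSE)
pca <- svd(centered)
pca_scores <- pca$u %*% diag(pca$d)      # principal component scores

# Cross-check against prcomp(); components may differ only in sign.
ref <- prcomp(dtm, center = TRUE, scale. = FALSE)
stopifnot(isTRUE(all.equal(abs(pca_scores[, 1:2]),
                           abs(unname(ref$x[, 1:2])))))

# CA: SVD of the standardized residuals of the correspondence matrix.
P        <- dtm / sum(dtm)               # correspondence matrix
r        <- rowSums(P)                   # row masses
cc       <- colSums(P)                   # column masses
expected <- r %o% cc                     # independence model
S        <- diag(1 / sqrt(r)) %*% (P - expected) %*% diag(1 / sqrt(cc))
ca_fit   <- svd(S)
ca_rows  <- diag(1 / sqrt(r)) %*% ca_fit$u %*% diag(ca_fit$d)  # principal row coordinates
```

The only thing that changes between the three is the preprocessing applied before the decomposition: none for LSA, column centering for PCA, and chi-square style standardization for CA. That is why a single fast SVD backend could serve all three options.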
I think that sounds really good, and the combination with text2vec is great.