
Design for Linear Discriminant Analysis (LDA) #25

Open
sourish-cmi opened this issue Aug 23, 2022 · 9 comments
Labels
enhancement New feature or request


@sourish-cmi
Collaborator

sourish-cmi commented Aug 23, 2022

I am thinking about how we should implement Linear Discriminant Analysis (LDA) in CRRao. I am thinking out loud, so please correct me if I get something wrong. The design I have in mind is as follows:

container = @fitmodel(formula, data, modelClass, ClassificationType, CovarianceType)

Example for binary classification:

container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary, ShrinkageCov)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary, PythonCov)

Example for multi-class classification:

container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi, ShrinkageCov)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi, PythonCov)

The default covariance type would be sample covariance.
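One way the proposed CovarianceType argument could work in Julia is multiple dispatch on small marker types. The sketch below is hypothetical: CovarianceType, SampleCov, ShrinkageCov and estimate_cov are illustrative names, not CRRao's API, and the shrinkage shown (toward a scaled identity with a caller-chosen intensity λ) is one simple choice, not a specific published estimator such as Ledoit-Wolf. PythonCov is left out because its exact definition is still open in this thread.

```julia
using LinearAlgebra, Statistics

# Hypothetical covariance-type hierarchy for the proposed CovarianceType
# argument; these names are illustrative, not part of CRRao's API.
abstract type CovarianceType end

# Proposed default: the ordinary sample covariance.
struct SampleCov <: CovarianceType end

# Linear shrinkage with a caller-chosen intensity λ in [0, 1].
struct ShrinkageCov <: CovarianceType
    λ::Float64
end

# Sample covariance of X, with observations in rows and variables in columns.
estimate_cov(::SampleCov, X::AbstractMatrix{<:Real}) = cov(X)

# Shrink S toward the scaled identity (tr(S)/p) I:
#   (1 - λ) S + λ (tr(S)/p) I
# λ = 0 returns S; λ = 1 returns a diagonal matrix with the same trace as S.
function estimate_cov(c::ShrinkageCov, X::AbstractMatrix{<:Real})
    0 <= c.λ <= 1 || throw(ArgumentError("shrinkage intensity must lie in [0, 1]"))
    S = cov(X)
    p = size(S, 1)
    return (1 - c.λ) * S + (c.λ * tr(S) / p) * I
end

# Example: four observations of two variables.
X = [1.0 2.0; 2.0 3.0; 3.0 5.0; 4.0 4.0]
Σ_sample = estimate_cov(SampleCov(), X)
Σ_shrunk = estimate_cov(ShrinkageCov(0.5), X)
```

Adding another estimator then means adding one struct and one estimate_cov method, and the LDA code only ever calls estimate_cov(covtype, X).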

@ajaynshah
Member

What's PythonCov?

What's the state of Julia packages for robust covariance matrices? Should we let the user pass a function as an argument?

@sourish-cmi
Collaborator Author

  • By PythonCov, I meant the covariance estimation procedure used in Python.
  • As I understand it, Julia packages use the simple (sample) covariance matrix.
  • Regarding your question, "Should we let the user pass a function as an argument?": I am providing CovarianceType as one of the arguments, and there would be a default.

@ajaynshah
Member

I was thinking that there will be a general facility for covariance matrix estimation: simple, multiple robust methods, maybe something that works for sparse matrices in big data, etc. So there should be a default (simple), but the caller should be able to supply a function that computes the covariance matrix. Alternatively, there could be a function compute.cov(X, method), and the caller of LDA should be able to supply the method.
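A minimal sketch of the "caller supplies the covariance function" idea, in plain Julia using only the standard library. fit_lda, predict_lda and the covfun keyword are hypothetical names for illustration, not CRRao's or MultivariateStats.jl's API. The fit computes class means, centers each observation on its own class mean, and hands the stacked within-class residuals to covfun (default Statistics.cov); prediction uses the standard linear discriminant δ_k(x) = x'Σ⁻¹μ_k − ½ μ_k'Σ⁻¹μ_k + log π_k.

```julia
using LinearAlgebra, Statistics

# Hypothetical minimal LDA in which the caller supplies the covariance
# estimator. `fit_lda`, `predict_lda` and `covfun` are illustrative names,
# not an existing CRRao or MultivariateStats.jl API.
function fit_lda(X::AbstractMatrix{<:Real}, y::AbstractVector; covfun = cov)
    classes = sort(unique(y))
    K, p = length(classes), size(X, 2)
    means = zeros(K, p)                  # row k holds the mean of class k
    resid = zeros(size(X))               # within-class residuals
    for (k, c) in enumerate(classes)
        idx = findall(==(c), y)
        μ = vec(mean(X[idx, :], dims = 1))
        means[k, :] = μ
        resid[idx, :] = X[idx, :] .- μ'
    end
    # The covariance estimator only ever sees the within-class residuals.
    # Note: Statistics.cov divides by n - 1, not the textbook pooled n - K.
    Σ = covfun(resid)
    logpriors = [log(count(==(c), y) / length(y)) for c in classes]
    return (classes = classes, means = means,
            Σinv = Matrix(inv(Symmetric(Σ))), logpriors = logpriors)
end

# Assign each row of X to the class with the largest linear discriminant
#   δ_k(x) = x'Σ⁻¹μ_k - ½ μ_k'Σ⁻¹μ_k + log π_k
function predict_lda(m, X::AbstractMatrix{<:Real})
    W = m.means * m.Σinv                             # row k is (Σ⁻¹μ_k)'
    bias = -0.5 .* vec(sum(W .* m.means, dims = 2)) .+ m.logpriors
    scores = X * W' .+ bias'                         # n × K matrix of δ_k(x)
    return [m.classes[argmax(row)] for row in eachrow(scores)]
end

# Example: two well-separated classes in two dimensions.
X = [1.0 2.0; 1.5 1.8; 1.2 2.2; 5.0 6.0; 5.5 5.8; 5.2 6.3]
y = ["a", "a", "a", "b", "b", "b"]
m_default = fit_lda(X, y)                                # sample covariance
m_custom  = fit_lda(X, y; covfun = R -> cov(R) + 0.1I)   # caller-supplied
yhat = predict_lda(m_default, [1.1 2.0; 5.3 6.0])
```

Because the estimator is only a function argument, a robust or shrinkage estimator (or an R-style compute.cov(X, method) wrapper) can be swapped in without touching the discriminant code, while the default stays the simple sample covariance.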

@sourish-cmi
Collaborator Author

Hmm, I like both ideas.

Idea 1) A general facility for covariance matrix estimation: simple, multiple robust methods, shrinkage estimation methods, etc.

Idea 2) A function compute.cov(X, method), where the caller of LDA can supply the method.

I like the second idea, with R's robust method as the default.

@sourish-cmi
Collaborator Author

@ajaynshah @ayushpatnaikgit @codetalker7 @ShouvikGhosh2048

I am struggling to decide: should we use MultivariateStats.jl for LDA, and/or should we use Aman's from-scratch Julia implementation of LDA?

Ayush's point is that if we rely on too many packages, some people will never be able to use CRRao because one of those packages will be broken.

On the other hand, why bother? We are going to rely on lazy loading anyway.

Requesting your comments: I now want to move on to LDA development for CRRao.

@sourish-cmi
Collaborator Author

For now, I am thinking of developing LDA with Aman's code, which is faster than R and Python's sklearn but slower than MultivariateStats.jl.

Once MultivariateStats.jl becomes stable, we can adopt its LDA as a fast option.

sourish-cmi self-assigned this Sep 7, 2022
sourish-cmi added the enhancement (New feature or request) label Sep 7, 2022
@sourish-cmi
Collaborator Author

@ajaynshah @ayushpatnaikgit

We will raise an issue with MultivariateStats.jl reporting that predict is not working. If they provide a fix, we will go ahead and use it in CRRao.jl.

Otherwise, we will contribute the fix to MultivariateStats.jl ourselves.

@ajaynshah
Member

Yes, great. Let's put all our knowledge of LDA to work to make MultivariateStats.jl stronger; then CRRao will simply call that LDA. Let's do the usual hard work:

  • A written explanation of what's wrong with the existing code
  • Test cases that demonstrate the problem
  • A PR that fixes it

so that it gets accepted into the main package quickly.

@sourish-cmi
Collaborator Author

I have opened this issue on MultivariateStats.jl:

JuliaStats/MultivariateStats.jl#204

Basically, I reported that predict for MulticlassLDA is not working.
