trthatcher/DiscriminantAnalysis.jl

Regularized discriminant analysis in Julia. Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information. doc example src test .gitignore .travis.yml LICENSE.md Dec 18, 2015 README.md REQUIRE

Discriminant Analysis

Summary

DiscriminantAnalysis.jl is a Julia package for multiple linear and quadratic regularized discriminant analysis (LDA & QDA respectively). LDA and QDA are distribution-based classifiers with the underlying assumption that data follows a multivariate normal distribution. LDA differs from QDA in the assumption about the class variability; LDA assumes that all classes share the same within-class covariance matrix whereas QDA relaxes that constraint and allows for distinct within-class covariance matrices. This results in LDA being a linear classifier and QDA being a quadratic classifier.

Documentation

Full documentation is available on Read the Docs.

Visualization

When the data is modelled via linear discriminant analysis, the resulting classification boundaries are hyperplanes (lines in two dimensions):  Getting Started

If the package has not yet been installed, run the following line of code:

`Pkg.add("DiscriminantAnalysis")`

Sample code is available in example.jl. To get started, run the following block of code:

```using DataFrames, DiscriminantAnalysis;

pool!(iris_df, [:Species]);  # Ensure species is made a pooled data vector```

Although Dataframes.jl is not required, it will help to stage the data. We require a data matrix (row-major or column-major ordering of the observations). We then define the data matrix `X` and the class reference vector `y` (vector of 1-based class indicators):

```X = convert(Array{Float64}, iris_df[[:PetalLength, :PetalWidth, :SepalLength, :SepalWidth]]);
y = iris_df[:Species].refs;  # Class indices```

A Linear Discriminant model can be created using the `lda` function:

```julia> model1 = lda(X, y)
Linear Discriminant Model

Regularization Parameters:
γ = N/A

Class Priors:
Class 1 Probability: 33.3333%
Class 2 Probability: 33.3333%
Class 3 Probability: 33.3333%

Class Means:
Class 1 Mean: [1.464, 0.244, 5.006, 3.418]
Class 2 Mean: [4.26, 1.326, 5.936, 2.77]
Class 3 Mean: [5.552, 2.026, 6.588, 2.974]```

Once the model is defined, we can use it to classify new data with the `classify` function:

```julia> y_pred1 = classify(model1, X);

julia> accuracy1 = sum(y_pred1 .== y)/length(y)
0.98```

By default, the LDA model assumes the data is in row-major ordering of observations, the class weights are equal and that the regularization parameter gamma is unspecified.

```julia> model2 = lda(X', y, order = Val{:col}, gamma = 0.1, priors = [2/5; 2/5; 1/5])
Linear Discriminant Model

Regularization Parameters:
γ = 0.1

Class Priors:
Class 1 Probability: 40.0%
Class 2 Probability: 40.0%
Class 3 Probability: 20.0%

Class Means:
Class 1 Mean: [1.464, 0.244, 5.006, 3.418]
Class 2 Mean: [4.26, 1.326, 5.936, 2.77]
Class 3 Mean: [5.552, 2.026, 6.588, 2.974]

julia> y_pred2 = classify(model2, X');

julia> accuracy2 = sum(y_pred2 .== y)/length(y)
0.98```

We may also fit a Quadratic Discriminant model using the `qda` function:

```julia> model3 = qda(X, y, lambda = 0.1, gamma = 0.1, priors = [1/3; 1/3; 1/3])

Regularization Parameters:
γ = 0.1
λ = 0.1

Class Priors:
Class 1 Probability: 33.3333%
Class 2 Probability: 33.3333%
Class 3 Probability: 33.3333%

Class Means:
Class 1 Mean: [1.464, 0.244, 5.006, 3.418]
Class 2 Mean: [4.26, 1.326, 5.936, 2.77]
Class 3 Mean: [5.552, 2.026, 6.588, 2.974]

julia> y_pred3 = classify(model3, X);

julia> accuracy3 = sum(y_pred3 .== y)/length(y)
0.9866666666666667```

Finally, if the number of classes is less than or equal to the number of parameters, then a Canonical Discriminant model may be used in place of a Linear Discriminant model to simultaneously reduce the dimensionality of the classifier:

```julia> model4 = cda(X, y, gamma = Nullable{Float64}(), priors = [1/3; 1/3; 1/3])
Canonical Discriminant Model

Regularization Parameters:
γ = N/A

Class Priors:
Class 1 Probability: 33.3333%
Class 2 Probability: 33.3333%
Class 3 Probability: 33.3333%

Class Means:
Class 1 Mean: [1.464, 0.244, 5.006, 3.418]
Class 2 Mean: [4.26, 1.326, 5.936, 2.77]
Class 3 Mean: [5.552, 2.026, 6.588, 2.974]

julia> y_pred4 = classify(model4, X);

julia> accuracy4 = sum(y_pred4 .== y)/length(y)
0.98```
You can’t perform that action at this time.