Simple F# demonstration of text classification
.gitattributes
.gitignore
ADRDemo.fsproj
ADRDemo.sln
App.config
Classifier.fs
LICENSE.md
NGramTokenizer.fs
Program.fs
README.md
TFIDFCalculator.fs
Test.fsx
Text.fs
Types.fs

ADRDemo

A simple F# demonstration of Automated Document Recognition using techniques like text tokenization, n-grams, TF-IDF weighting, CSV parsing, and text classification.
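As a rough illustration of two of the techniques named above, the following sketch builds word n-grams and computes TF-IDF weights. The function names (`tokenize`, `ngrams`, `termFrequency`, `inverseDocumentFrequency`, `tfidf`) are hypothetical and are not taken from `NGramTokenizer.fs` or `TFIDFCalculator.fs`. The unsmoothed `log (N / df)` form of IDF is also an assumption, since the repository may use a different variant.

```fsharp
open System

// Hypothetical helpers for illustration only; the repository's own
// tokenizer and TF-IDF code may be structured differently.

let private separators = [| ' '; '\t'; '\r'; '\n'; '.'; ','; ';'; ':'; '!'; '?' |]

/// Split text into lowercase word tokens.
let tokenize (text: string) : string list =
    text.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries)
    |> List.ofArray

/// Build word n-grams by sliding a window of size n over the tokens.
let ngrams (n: int) (tokens: string list) : string list =
    tokens
    |> List.windowed n
    |> List.map (String.concat " ")

/// Term frequency: occurrences of each term divided by the document length.
let termFrequency (terms: string list) : Map<string, float> =
    let total = float (List.length terms)
    terms
    |> List.countBy id
    |> List.map (fun (term, count) -> term, float count / total)
    |> Map.ofList

/// Inverse document frequency: log (N / number of documents containing the term).
let inverseDocumentFrequency (docs: string list list) : Map<string, float> =
    let n = float (List.length docs)
    docs
    |> List.collect List.distinct
    |> List.countBy id
    |> List.map (fun (term, df) -> term, log (n / float df))
    |> Map.ofList

/// TF-IDF weight of each term in one document; unseen terms get weight 0.
let tfidf (idf: Map<string, float>) (terms: string list) : Map<string, float> =
    termFrequency terms
    |> Map.map (fun term tf ->
        tf * (idf |> Map.tryFind term |> Option.defaultValue 0.0))
```

With this formulation, a term that occurs in every document gets an IDF of `log 1.0`, so its weight is zero, while rarer terms receive positive weights.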

The code expects training data in the form of plain text files, organized into one folder per category:

  • \TrainingData
    • \CategoryA
      • \Sample1.txt
      • \Sample2.txt
    • \CategoryB
      • \SampleA.txt

...and a plain text file to be classified: "unknown.txt".
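A folder layout like the one above could be read with a few lines of `System.IO`. This is a minimal sketch, not the repository's loader: the `Sample` record and the `loadTrainingData` function are hypothetical names, and the sketch assumes every category folder sits directly under the root and contains only `.txt` samples.

```fsharp
open System.IO

/// Hypothetical record for a labelled sample: the category is the
/// name of the folder the file was found in.
type Sample = { Category: string; Text: string }

/// Read every .txt file in each category subfolder of `root`.
let loadTrainingData (root: string) : Sample list =
    Directory.GetDirectories root
    |> Array.toList
    |> List.collect (fun categoryDir ->
        // The last path segment is the category name, e.g. "CategoryA".
        let category = Path.GetFileName categoryDir
        Directory.GetFiles(categoryDir, "*.txt")
        |> Array.toList
        |> List.map (fun file ->
            { Category = category; Text = File.ReadAllText file }))
```

The document to classify can then be read the same way, for example with `File.ReadAllText "unknown.txt"`.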

It also assumes the existence of a word whitelist CSV file, but this can be easily changed to a blacklist ("stopwords") or removed altogether.
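The difference between the whitelist and blacklist modes comes down to one inverted filter, as the sketch below shows. All names here (`parseWordList`, `keepWhitelisted`, `dropStopwords`) are hypothetical, and the parser assumes a simple comma-separated or line-separated word list with no quoted fields, which may not match the CSV format the project actually uses.

```fsharp
open System

/// Parse a simple comma- or line-separated word list into a lowercase set.
/// Assumes no quoted fields or embedded commas.
let parseWordList (csv: string) : Set<string> =
    csv.Split([| ','; '\r'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun w -> w.Trim().ToLowerInvariant())
    |> Array.filter (fun w -> w <> "")
    |> Set.ofArray

/// Whitelist mode: keep only tokens that appear in the word list.
let keepWhitelisted (words: Set<string>) (tokens: string list) : string list =
    tokens |> List.filter (fun t -> Set.contains t words)

/// Blacklist ("stopwords") mode: drop tokens that appear in the word list.
let dropStopwords (words: Set<string>) (tokens: string list) : string list =
    tokens |> List.filter (fun t -> not (Set.contains t words))
```

Removing the filter altogether simply means passing the token list through unchanged.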