Skip to content

thompson-dean/TokenizeAndTag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

TokenizeAndTag

Basic word tokenizer and tagger.

Used this project as means to understand Swift's NaturalLanguage library.

Here are is some sample code to build a tokenizer. I had trouble taking the output from the tokenizer and placing it an array due to the output being of type SubString. After researching on stack overflow I found out you can just wrap the SubString in a String wrapper and you are good to go.

func tokenizeText(text: String) {
        let tokenizer = NLTokenizer(unit: .word)
        tokenizer.string = text
        tokenizer.setLanguage(.english)
        
        tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { (range, attributes) -> Bool in
            let word = String(text[range])
            tokenizedWords.append(word)
            return true
        }
    }

Here are is some sample code to build a tagger. The tagger also tokenizes the words so in the end I ended up only using the tagger.

func tag(text: String) {
        let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
        tagger.string = text
        let range = NSRange(location: 0, length: text.utf16.count)
        let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
        tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
            if let tag = tag {
                let word = (text as NSString).substring(with: tokenRange)
                tokenizedWords.append(word)
                tags.append(tag.rawValue)
                
            }
        }

In terms of UI, it was really fun to add an expandable animated button.

Screenshots

Screenshots

About

Basic word tokenizer and tagger

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages