BASIC SPELL CORRECTION IN JAVA

I've recently come across a popular article by Peter Norvig on basic spelling correction at How to Write a Spelling Corrector

Peter Norvig implemented a clever algorithm in 21 of Python code and there are now implementations in many other languages.

I first noticed that the first Java implementation is embarrassingly large at 372 lines of code. And the second java implementation clocking in at 35 lines of code.

So let's see how this looks like in Java 8.

23 lines of code. ( I assume we are counting non empty lines only and exclude the import statements )

I tried to use the modern concepts like Streams, and except loading the dictionary this looks like it's written in a functional style.

It's also thread-safe, which is nice. Obviously, this is Java and it's object oriented. You could have multiple instances using different dictionaries.

Also, I'm afraid to say, that being Java, this turned out being embarrassingly slow.

Tested on a 2015 MacBook Pro.

Original Python code

cpython 2.7 11s
PyPy 2.7 5s

My Java 8 code

Java 1.8.0_73-b02 21s

Likely due to slow string operations. This could be made much faster in Java, but would also require more code.

/**
 * Java 8 Spelling Corrector.
 * Copyright 2016 Peter Kuhar.
 *
 * Open source code under MIT license: http://www.opensource.org/licenses/mit-license.php
 */

import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class Spelling {
    private Map<String,Integer> dict = new HashMap<>();

    public Spelling(Path dictionaryFile) throws Exception{
        Stream.of(new String(Files.readAllBytes( dictionaryFile )).toLowerCase().replaceAll("[^a-z ]","").split(" ")).forEach( (word) ->{
            dict.compute( word, (k,v) -> v == null ? 1 : v + 1  );
        });
    }

    Stream<String> edits1(final String word){
        Stream<String> deletes    = IntStream.range(0, word.length())  .mapToObj((i) -> word.substring(0, i) + word.substring(i + 1));
        Stream<String> replaces   = IntStream.range(0, word.length())  .mapToObj((i)->i).flatMap( (i) -> "abcdefghijklmnopqrstuvwxyz".chars().mapToObj( (c) ->  word.substring(0,i) + (char)c + word.substring(i+1) )  );
        Stream<String> inserts    = IntStream.range(0, word.length()+1).mapToObj((i)->i).flatMap( (i) -> "abcdefghijklmnopqrstuvwxyz".chars().mapToObj( (c) ->  word.substring(0,i) + (char)c + word.substring(i) )  );
        Stream<String> transposes = IntStream.range(0, word.length()-1).mapToObj((i)-> word.substring(0,i) + word.substring(i+1,i+2) + word.charAt(i) + word.substring(i+2) );
        return Stream.of( deletes,replaces,inserts,transposes ).flatMap((x)->x);
    }

    Stream<String> known(Stream<String> words){
        return words.filter( (word) -> dict.containsKey(word) );
    }

    String correct(String word){
        Optional<String> e1 = known(edits1(word)).max( (a,b) -> dict.get(a) - dict.get(b) );
        Optional<String> e2 = known(edits1(word).map( (w2)->edits1(w2) ).flatMap((x)->x)).max( (a,b) -> dict.get(a) - dict.get(b) );
        return dict.containsKey(word) ? word : ( e1.isPresent() ? e1.get() : (e2.isPresent() ? e2.get() : word));
    }
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.idea		.idea
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BASIC SPELL CORRECTION IN JAVA

About

Releases

Packages

Languages

unrelatedlabs/SpellingCorrector-Java8

Folders and files

Latest commit

History

Repository files navigation

BASIC SPELL CORRECTION IN JAVA

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages