This repository has been archived by the owner on Oct 22, 2023. It is now read-only.
veer66/wordcut-x


wordcut-x

A word segmentation tool for ASEAN languages written in Java

Example

One answer: a single segmentation

package rockers.veer66;

import java.io.IOException;
import java.net.URI;
import java.net.URL;

public class SimpleExample {
	
	public static void main(String[] args) throws IOException {
		// Load the dictionary from a URL.
		URL dixUrl = URI.create("http://file.veer66.rocks/dix/tdict-std.txt").toURL();
		var wc = Wordcut.fromDixUrl(dixUrl);
		// Segment the text and put "|" between the words.
		System.out.println(wc.putDelimiter("กากาก", "|"));
		// Result: กา|กาก
	}
}
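
The example above fetches the dictionary over HTTP on every run. If the word list has already been saved locally, a `file:` URL can be built with the standard library, as in the minimal sketch below. The file name `tdict-std.txt` in the working directory and the class name `LocalDixUrl` are assumptions for illustration. Whether `Wordcut.fromDixUrl` accepts `file:` URLs is not confirmed by this README, so that call is left as a comment.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;

public class LocalDixUrl {
    // Turn a local path into a file: URL (the path is a hypothetical location).
    static URL toDixUrl(String path) throws MalformedURLException {
        return Path.of(path).toAbsolutePath().toUri().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL dixUrl = toDixUrl("tdict-std.txt");
        System.out.println(dixUrl);
        // Assumption: the URL could then be handed to the library, e.g.
        // var wc = Wordcut.fromDixUrl(dixUrl);
    }
}
```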

m answers: candidate segmentations as a directed acyclic graph

package rockers.veer66;

import java.io.IOException;
import java.net.URI;
import java.net.URL;

public class DagExample {
	public static void main(String[] args) throws IOException {
		// Load the dictionary from a URL.
		URL dixUrl = URI.create("http://file.veer66.rocks/dix/tdict-std.txt").toURL();
		var wc = Wordcut.fromDixUrl(dixUrl);
		// Build the DAG of candidate word edges instead of a single delimited string.
		System.out.println(wc.buildDag("กากาก"));
		// Result:
		// [[(DAG-EDGE :s 0 :e 0 :etype INIT)],
		// [(DAG-EDGE :s 0 :e 1 :etype UNK)],
		// [(DAG-EDGE :s 0 :e 2 :etype DICT)],
		// [(DAG-EDGE :s 0 :e 3 :etype DICT)],
		// [(DAG-EDGE :s 2 :e 4 :etype DICT)],
		// [(DAG-EDGE :s 2 :e 5 :etype DICT)]]
	}
	}
}
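
One way to read the output above: entry i holds the edges that end at character offset i, and each edge covers the text from offset `s` to offset `e`. Under that reading, following edges backward from the last offset recovers a segmentation. The sketch below does this with a hypothetical `Edge` record and hard-coded copies of the edges printed above. It does not use the library's own edge type, whose API is not shown in this README, and it simply takes the first edge stored at each offset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DagWalkSketch {
    // Hypothetical stand-in for one printed edge: text.substring(s, e) is a candidate word.
    record Edge(int s, int e, String etype) {}

    // The edges from the example output, indexed by end offset.
    static List<List<Edge>> sampleDag() {
        return List.of(
            List.of(new Edge(0, 0, "INIT")),
            List.of(new Edge(0, 1, "UNK")),
            List.of(new Edge(0, 2, "DICT")),
            List.of(new Edge(0, 3, "DICT")),
            List.of(new Edge(2, 4, "DICT")),
            List.of(new Edge(2, 5, "DICT")));
    }

    // Walk backward from the end of the text, taking the first edge at each offset.
    static List<String> backtrack(String text, List<List<Edge>> dag) {
        List<String> words = new ArrayList<>();
        int pos = text.length();
        while (pos > 0) {
            Edge edge = dag.get(pos).get(0);
            words.add(text.substring(edge.s(), edge.e()));
            pos = edge.s();
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        List<String> words = backtrack("กากาก", sampleDag());
        System.out.println(String.join("|", words));
        // Prints: กา|กาก
    }
}
```

With these edges the walk goes 5 → 2 → 0, which yields the same `กา|กาก` that `putDelimiter` prints in the first example.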
