GitHub - tlwg/swath

SWATH (Smart Word Analysis for THai)
====================================

Thai script has no word delimiter. While it's trivial for human readers to
recognize word boundaries, a machine needs some knowledge to do the same
when wrapping lines, moving the cursor word-wise, etc. Applications normally
need such a feature to support Thai text processing.

Swath is a general-purpose utility to work around the lack of such capability
in applications. It analyzes the given Thai text by consulting a Thai word
list for word boundaries, before outputting the same text with the predefined
word delimiters inserted.

It can read many kinds of input, including plain text and structured documents
like HTML, RTF, LaTeX and Lambda (Unicode version of LaTeX with Omega
typesetter kernel). [See -f option.]

For the known document formats, it inserts the word delimiter commonly used
in each format, and a pipe (|) for plain text. But the user can always
override this with a preferred delimiter. [See -b option.]

Swath can also be configured to use different algorithms for the analysis.
Currently, it supports two schemes: longest (greedy) matching and maximal
(least words) matching. [See -m option.]
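
The difference between the two schemes can be illustrated with a small
Python sketch. This is a simplified illustration, not swath's actual
implementation: the word list is a toy set assumed for the example, the
function names are hypothetical, and unknown characters are simply emitted
one at a time.

```python
# Toy word list, assumed for illustration only; swath consults its own
# (much larger) Thai word list.
WORDS = {"ไป", "หา", "หาม", "มเหสี", "เห", "สี"}


def longest_matching(text, words=WORDS):
    """Greedy: at each position, take the longest listed word that fits."""
    max_len = max(map(len, words))
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                break
        else:
            size = 1  # no listed word starts here: emit one character
        out.append(text[i:i + size])
        i += size
    return out


def maximal_matching(text, words=WORDS):
    """Least words: dynamic programming over every end position."""
    max_len = max(map(len, words))
    n = len(text)
    # best[j] = (token count, tokens) for the cheapest split of text[:j]
    best = [(0, [])] + [None] * n
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            piece = text[i:j]
            # A single unknown character is allowed as a fallback token.
            if piece in words or j - i == 1:
                cand = (best[i][0] + 1, best[i][1] + [piece])
                if best[j] is None or cand[0] < best[j][0]:
                    best[j] = cand
    return best[n][1]


text = "ไปหามเหสี"
print("|".join(longest_matching(text)))  # ไป|หาม|เห|สี   (4 words)
print("|".join(maximal_matching(text)))  # ไป|หา|มเหสี    (3 words)
```

With this toy list, the greedy scheme commits to the longer word "หาม" and
is then forced to split the rest into two short words, while the least-words
scheme picks the shorter "หา" so that "มเหสี" can be kept whole.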

EXAMPLES
========

- For LaTeX (to be used with babel-thai package):

    $ swath -f latex < mydoc.tex > mydoc.ttex
    $ latex mydoc.ttex

  Or if you composed your LaTeX source in UTF-8:

    $ swath -f latex -u u,t < mydoc.tex > mydoc.ttex
    $ latex mydoc.ttex

  This is equivalent to filtering with iconv(1):

    $ iconv -f UTF-8 -t TIS-620 mydoc.tex | swath -f latex > mydoc.ttex
    $ latex mydoc.ttex

- For HTML (to provide web pages to web browsers that cannot wrap Thai lines
  properly, but support the <wbr> tag):

    $ swath -f html < mydoc.html > mydoc-wbr.html
