since then my English improved
yuex committed Aug 8, 2015
1 parent 7f28a48 commit c56a09a
Showing 2 changed files with 35 additions and 25 deletions.
35 changes: 23 additions & 12 deletions README.md
@@ -1,16 +1,32 @@
 # CJK Auto Spacing

-A pelican plugin to insert spaces between Chinese/Japanese/Korean characters and English words.
+A pelican plugin to insert space between Chinese/Japanese/Korean character
+and English word.

 # Why

-For Chinese readers, it's reading for torture rather than pleasure if Chinese characters and English words are put together without spaces. (See [Effects](#effects), there's a comparison)
+For Chinese readers, it's reading for torture instead of pleasure if Chinese
+characters and English words are mingled togther without spaces. (See
+[Effects](#effects) for comparison)

-Moreover, research shows that those who love putting Chinese characters and English words together without space have more troubles in love (see [why space?][] in Chinese). Up to 70% marry one they don't love at 34 years old. And the other 30%, even worse, have nobody to inherit their legacies except cats.
+Moreover, research shows that those who love putting Chinese characters and
+English words together without space have more troubles in love ([why space?][]
+in Chinese). Up to 70% marry one they don't love at 34 years old. And the other
+30%, even worse, have nobody to inherit their legacies except cats.

-So I think it's not very hard to conclude the necessarity of space for Chinese users of pelican.
+So, to Chinese users, it is not hard to see the necessarity of using spaces.

-**Note:** I don't speak or write Japanese or Korean, but I feel we can get the same conclusion. They have lots in common. And perhaps that's why they are called CJK Unified Ideographs, together as a whole, in the unicode standards.
+**Note**: I know nothing about Japanese and Korean, but I feel confidently we
+can get the same conclusion. They have lots in common. And perhaps that's why
+they are called CJK Unified Ideographs together as a whole in the unicode
+standards.

+# Options
+
+By default it will only process the content. To process title, add following
+line to your `pelicanconf.py`
+
+    CJK_AUTO_SPACING_TITLE = True
+
 # Effects

@@ -28,12 +44,7 @@ With CJK Auto Spacing

 ![with spacing](./screenshot1.png)

-If you feel that the first image is fine, then read it again, again and again, until you feel it's not okay.
-
-# Options
-
-By default it will only process the content.
-
-You can set the ``CJK_AUTO_SPACING_TITLE`` parameter to True if you need the title to be processed as well. Default is False.
+If you find nothing wrong in the first picture, re-read it until you find it's
+wrong.

 [why space?]: https://github.com/vinta/paranoid-auto-spacing
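
For orientation, the Options text in the README could translate into a `pelicanconf.py` fragment like the one below. This is only a sketch: `PLUGIN_PATHS` and `PLUGINS` are standard Pelican settings, but the `plugins` directory and the assumption that the plugin loads under the module name `cjk_auto_spacing` (taken from the file name in this commit) are not confirmed by the diff.

```python
# pelicanconf.py (fragment)

# Assumption: the plugin file lives in a local "plugins" directory.
PLUGIN_PATHS = ['plugins']

# Assumption: the plugin is loaded by its file name, cjk_auto_spacing.py.
PLUGINS = ['cjk_auto_spacing']

# From the README: titles are only processed when this is set to True.
# Leaving it out keeps the default behaviour (content only).
CJK_AUTO_SPACING_TITLE = True
```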
25 changes: 12 additions & 13 deletions cjk_auto_spacing.py
@@ -18,9 +18,13 @@
 ]


-def _chinese_auto_spacing(str):
+def _chinese_auto_spacing(text):

     def _with_range(char, check_range):
+        # XXX: actually this kind of searching will see a improvment from O(n)
+        # to O(1) when using patricia instead of list. but for a blog plugin
+        # processing offline, I think it doesn't matter too much. for those who
+        # writes a lot, this improvements may be expected.
         for start, end in check_range:
             if char >= start and char <= end:
                 return True
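
The comment added in this hunk notes that the linear scan in `_with_range` could be made faster. As a rough illustration, and not what the commit does, a binary search over sorted, non-overlapping ranges brings each lookup down to O(log n). This swaps in `bisect` from the standard library rather than the patricia trie the comment mentions. The range table and the name `in_ranges` are hypothetical; the plugin's real table is not visible in this diff.

```python
from bisect import bisect_right

# Hypothetical, abbreviated table; must be sorted by start and non-overlapping.
CJK_RANGES = [
    (u'\u3040', u'\u30ff'),  # Hiragana and Katakana
    (u'\u4e00', u'\u9fff'),  # CJK Unified Ideographs
    (u'\uac00', u'\ud7a3'),  # Hangul Syllables
]

# Precompute the range starts once so each lookup is a single bisect call.
_STARTS = [start for start, _ in CJK_RANGES]


def in_ranges(char, ranges=CJK_RANGES, starts=_STARTS):
    """Return True if char lies inside one of the sorted ranges."""
    # Index of the last range whose start is <= char, or -1 if there is none.
    idx = bisect_right(starts, char) - 1
    return idx >= 0 and char <= ranges[idx][1]
```

With only a handful of ranges the difference is negligible, which matches the comment's point that it hardly matters for an offline blog build.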
@@ -35,7 +39,7 @@ def is_punc(char):
     ret = u''
     prev = None

-    for char in str:
+    for char in text:
         sp = u''
         curr_is_cjk = is_cjk(char)
         curr_is_punc = is_punc(char)
@@ -44,7 +48,7 @@ def is_punc(char):
             prev_is_cjk, prev_is_punc = prev

             if curr_is_punc or prev_is_punc:
-                # do not add space to punctuation
+                # do not add space around punctuation
                 sp = u''

             elif prev_is_cjk != curr_is_cjk:
@@ -53,27 +57,22 @@ def is_punc(char):
         ret = ret + sp + char
         prev = (curr_is_cjk, curr_is_punc)

-    if ret:
-        return ret
-    else:
-        return str
+    return ret


 def process_content(content):
-    if content._content == None:
+    if content._content is None:
         return

     content._content = _chinese_auto_spacing(content._content)


 def process_title(generator, metadata):
-    if 'CJK_AUTO_SPACING_TITLE' not in generator.settings:
+    if ('CJK_AUTO_SPACING_TITLE' not in generator.settings
+        or not generator.settings['CJK_AUTO_SPACING_TITLE']):
         return

-    if not generator.settings['CJK_AUTO_SPACING_TITLE']:
-        return
-
-    if metadata.has_key(u'title'):
+    if u'title' in metadata:
         metadata[u'title'] = _chinese_auto_spacing(metadata[u'title'])

