
[scripts] fix the English wiktionary

There is a duplicate article in the dictionary, so one of the two
entries needs to be renamed.

This script renames the one whose title has a trailing space character
(actually a zero-width or 1-pixel-wide space, which Emacs highlights
as a 1-pixel-wide space).

Signed-off-by: Christopher Hall <hsw@openmoko.com>
1 parent 606a70f commit 0db58e35483805389bd8f20424e51b661cf9f434 @hxw hxw committed Jun 19, 2012
Showing with 18 additions and 0 deletions.
  1. +18 −0 scripts/fix-endict.sh
18 scripts/fix-endict.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+# fix a duplicate entry in dictionary
+
+for file in enwiktionary-*-pages-articles.xml
+do
+ dst="${file}-FIXED"
+ if [ -e "${dst}" ]
+ then
+ echo "already fixed: ${file}"
+ else
+ printf 'fixing: %s ...' "${file}"
+ # there is a zero width or 1 pixel width space just before '</title>'
+ # so replace it with '-DUP'
+ sed 's@<title>ឃើញ​</title>@<title>ឃើញ-DUP</title>@' < "${file}" > "${dst}"
+ echo " wrote: ${dst}"
+ fi
+done
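
The space character before '</title>' is invisible in most editors, so it can help to confirm which lines of a dump actually contain it before or after running the script. The following is a minimal sketch, not part of this commit. It assumes the character is U+200B (ZERO WIDTH SPACE) stored as UTF-8, and the names ZWSP and find_zwsp_titles are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper, not part of the commit: list the lines of a dump
# where a zero-width space sits immediately before '</title>'.

# Assumption: the invisible character is U+200B, whose UTF-8 encoding is
# the three bytes E2 80 8B (octal 342 200 213).
ZWSP=$(printf '\342\200\213')

find_zwsp_titles() {
  # -n prefixes each match with its line number
  # -F treats the pattern as a fixed string, not a regular expression
  # LC_ALL=C makes grep compare raw bytes regardless of the user's locale
  LC_ALL=C grep -n -F "${ZWSP}</title>" "$1"
}
```

Calling find_zwsp_titles on one of the enwiktionary-*-pages-articles.xml files should print the affected title lines with their line numbers. Running it on the matching -FIXED file should print nothing if the rename covered every such title.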
