[scripts] fix the English wiktionary

There is a duplicate article in the dictionary, so one of the two
needs to be renamed.

This script renames the one whose title has trailing whitespace
(actually a zero-width or 1-pixel-wide space character; it is
highlighted as a 1-pixel-wide space in Emacs).

Signed-off-by: Christopher Hall <hsw@openmoko.com>
commit 0db58e35483805389bd8f20424e51b661cf9f434 1 parent 606a70f
Christopher Hall authored June 19, 2012
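
The trailing character described in the commit message is invisible in most editors. The following is a minimal sketch for locating such titles and making their bytes visible. It assumes the character is U+200B ZERO WIDTH SPACE (UTF-8 bytes E2 80 8B), which the commit does not confirm. The helper names `find_zwsp_titles` and `dump_bytes` are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Sketch only. Assumption: the invisible character is U+200B
# (ZERO WIDTH SPACE), encoded in UTF-8 as the bytes E2 80 8B.

# Build the three-byte sequence with octal escapes so that the
# script source itself contains no invisible characters.
zwsp=$(printf '\342\200\213')

# find_zwsp_titles FILE...
# Print (with line numbers) every line whose <title> text ends in U+200B.
find_zwsp_titles() {
  LC_ALL=C grep -n "<title>.*${zwsp}</title>" "$@"
}

# dump_bytes
# Show stdin as hex bytes, which makes the invisible character visible.
dump_bytes() {
  od -An -tx1
}
```

Running `find_zwsp_titles` on a dump lists candidate duplicates, and piping one of the reported lines through `dump_bytes` shows whether `e2 80 8b` really precedes `</title>`.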

Showing 1 changed file with 18 additions and 0 deletions.

18  scripts/fix-endict.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+# fix a duplicate entry in dictionary
+
+for file in enwiktionary-*-pages-articles.xml
+do
+  dst="${file}-FIXED"
+  if [ -e "${dst}" ]
+  then
+    echo "already fixed: ${file}"
+  else
+    printf 'fixing: %s ...' "${file}"
+    # there is a zero width or 1 pixel width space just before '</title>'
+    # so replace it with '-DUP'
+    sed 's@<title>ឃើញ​</title>@<title>ឃើញ-DUP</title>@' < "${file}" > "${dst}"
+    echo " wrote: ${dst}"
+  fi
+done
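
In the script above, a partially written `-FIXED` file left behind by an interrupted run would be reported as already fixed on the next run. The following is a hedged sketch of the same rename that writes to a temporary file first and only keeps the result when `sed` succeeds. The function name `fix_one` and the `.tmp` suffix are hypothetical, and the invisible character is assumed to be U+200B (UTF-8 bytes E2 80 8B).

```shell
#!/bin/sh
# Sketch only, not the committed script.
# Assumption: the character before '</title>' is U+200B (E2 80 8B).
zwsp=$(printf '\342\200\213')

# fix_one SRC
# Write SRC-FIXED with the duplicate title renamed to end in '-DUP'.
# Returns non-zero if SRC is missing or sed fails.
fix_one() {
  src=$1
  dst="${src}-FIXED"
  tmp="${dst}.tmp"

  # An unmatched glob leaves the literal pattern, which does not exist.
  if [ ! -e "${src}" ]
  then
    return 1
  fi

  # Nothing to do when a complete output file is already present.
  if [ -e "${dst}" ]
  then
    return 0
  fi

  # Write to a temporary name and rename only on success.
  if sed "s@<title>ឃើញ${zwsp}</title>@<title>ឃើញ-DUP</title>@" < "${src}" > "${tmp}"
  then
    mv "${tmp}" "${dst}"
  else
    rm -f "${tmp}"
    return 1
  fi
}
```

It could be called from the same loop as the original, for example `for file in enwiktionary-*-pages-articles.xml; do fix_one "${file}"; done`.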
