Fix issue with sed command when running on Mac #4

Closed
tbielawa opened this Issue Sep 23, 2013 · 8 comments

tbielawa commented Sep 23, 2013

Description:
Unable to successfully run generatedeck.sh on Mac OS X.

Expected results:
The XML Mnemosyne deck files in decks/ are regenerated when I delete them and then run generatedeck.sh.

Observed results:
Only the XML wrapper content is regenerated.

Test setup:

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:52:52
$ rm -f decks/{Technician,General,Extra}.xml

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:53:11
$ ls -l decks/*.xml
-rw-r--r--  1 tbielawa  staff  186 Sep 22 20:52 decks/schemas.xml

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:53:22
$ git log -1 | head -n1
commit ca1250f793de10673b9b0a1ed909dcaba76b0cdb

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:53:27
$ uname -a
Darwin skillet.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 i386

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:53:46
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   decks/Extra.xml
#       modified:   decks/General.xml
#       modified:   decks/Technician.xml
#
no changes added to commit (use "git add" and/or "git commit -a")

Test Procedure:

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:53:29
$ ./generatedeck.sh 
sed: 1: "s/#/’/g
": RE error: illegal byte sequence
sed: 1: "s/#/’/g
": RE error: illegal byte sequence
sed: 1: "s/#/’/g
": RE error: illegal byte sequence

<tbielawa>@(skillet)[~/Projects/HamDecks] 08:53:32
$ cat decks/{Technician,General,Extra}.xml
<?xml version="1.0" encoding="utf-8"?>
<mnemosyne core_version="1">
</mnemosyne>
<?xml version="1.0" encoding="utf-8"?>
<mnemosyne core_version="1">
</mnemosyne>
<?xml version="1.0" encoding="utf-8"?>
<mnemosyne core_version="1">
</mnemosyne>

tbielawa commented Sep 23, 2013

A quick search for the phrase os x sed "RE error: illegal byte sequence" suggests that this is a known issue with the implementation of sed in OS X Lion:

  1. Homebrew/homebrew-dupes#21
  2. https://groups.google.com/forum/?fromgroups#!topic/vim_dev/Bb6PAdwOpTc
  3. http://stackoverflow.com/questions/5709540/sed-unable-to-execute-some-commands-on-utf-8-encoded-chars

There is one simple solution offered in reference 3 above:

iconv -f latin1 -t utf-8 sourcefile | sed 's/.*/x/' | iconv -f utf-8 -t latin1

However, I don't think we necessarily need the final conversion from utf-8 back to latin1.

I'm going to give this a shot and see how it goes.

tbielawa commented Sep 23, 2013

Test of using the iconv utility prior to invoking sed was unsuccessful:

Diff:

$ git diff generatedeck.sh 
diff --git a/generatedeck.sh b/generatedeck.sh
index 758a8f3..8a6ad39 100755
--- a/generatedeck.sh
+++ b/generatedeck.sh
@@ -32,7 +32,7 @@ for DECK in Technician General Extra; do
   TMP=tmp/$DECK.txt
   OUT=decks/$DECK.xml
   rm -f $TMP $OUT
-  scripts/stripchars.sh < $IN > $TMP
+  iconv -f latin1 -t utf-8 $IN | scripts/stripchars.sh > $TMP
   echo "<?xml version=\"1.0\" encoding=\"utf-8\"?>" >> $OUT
   echo "<mnemosyne core_version=\"1\">" >> $OUT
   awk -f scripts/parse-categorys.awk $TMP >> $OUT

Test and output:

$ ./generatedeck.sh
sed: 1: "s/�/’/g
": RE error: illegal byte sequence
sed: 1: "s/�/’/g
": RE error: illegal byte sequence
sed: 1: "s/�/’/g
": RE error: illegal byte sequence

tbielawa commented Sep 23, 2013

Also, in reference 3, people suggest using Perl with some extra options. I don't know if I'm ready to start writing Perl, though.

@gwillen are you interested in checking this out?

gwillen commented Sep 23, 2013

Can I get the output of 'env' on your machine? Specifically the LANG and LC_* variables (and anything that looks related).

The purpose of "LANG=C" in the stripchars script is to fix this exact error ("illegal byte sequence"), but I bet you have different locale settings than I do, and maybe there are more locale-oriented variables that have to be set or unset for this to work. (I just copied the first thing I found for "illegal byte sequence", and that was enough to fix it on my mac.)
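
As a rough sketch of the two steps discussed here (checking which locale variables are set, then forcing sed to work byte-wise), something like the following might be worth trying. The function names are hypothetical, and the byte 0x92 (the Windows-1252 right single quote) is only an assumption about what the real substitution in scripts/stripchars.sh targets. LC_ALL is used instead of LANG because it overrides LANG and every other LC_* variable.

```shell
#!/bin/sh

# List only the locale-related variables from the environment.
show_locale_env() {
  env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' || echo "no locale variables set"
}

# Hypothetical stand-in for the substitution in scripts/stripchars.sh.
# Assumption: the offending input byte is 0x92 (octal 222); the real
# script may target a different byte.
fix_quotes() {
  bad_byte=$(printf '\222')
  # LC_ALL=C makes sed treat both its script and its input as plain
  # bytes, so a byte that is not valid UTF-8 is not rejected.
  LC_ALL=C sed "s/${bad_byte}/’/g"
}
```

For example, `show_locale_env` prints the current settings, and `fix_quotes < input.txt > output.txt` applies the substitution, where both file names are placeholders.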

tbielawa commented Sep 23, 2013

Oh crap. I meant to do this before I left for work today. I'll try and remember to get that output during lunch.

tbielawa commented Sep 23, 2013

BACKUPS=/Users/tbielawa/Backups
DEBEMAIL=tbielawa@csee.wvu.edu
DEBFULLNAME=Timothy Bielawa (Shaggy)
DEVEDITOR=emacs -nw
EDITOR=emacs -nw
HISTCONTROL=ignoreboth
HOME=/Users/tbielawa
LANG=en_US.utf8
LOGNAME=tbielawa
MAIL=/var/mail/tbielawa
PATH=/opt/local/bin:/opt/local/sbin:/opt/local/bin:/opt/local/sbin:/Users/tbielawa/bin:/Users/tbielawa/LCSEE/bin:/Users/tbielawa/.bin/:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/X11/bin:/usr/X11/bin:/usr/local/MacGPG2/bin
PRINTER=ps701esb
PWD=/Users/tbielawa
QUILT_PATCHES=debian/patches
SHELL=/bin/bash
SHLVL=1
SSH_AUTH_SOCK=/tmp/ssh-PdP3JMKmNR/agent.254
SSH_CLIENT=192.168.1.30 36448 22
SSH_CONNECTION=192.168.1.30 36448 192.168.1.7 22
SSH_TTY=/dev/ttys000
SVN=https://svn.lcsee.wvu.edu/loud
TERM=xterm-256color
TMPDIR=/var/folders/v7/vxd2gz4d5jj6gv6s4kt31cgr0000gn/T/
USER=tbielawa
_=/usr/bin/env
preseeds=/usr/share/doc/lbmbuilder/examples/preseed

And sed stuff:

<tbielawa>@(skillet)[~] 01:07:08
$ which sed
/usr/bin/sed

gwillen commented Sep 24, 2013

Huh, that's odd; you don't have any other locale-oriented environment vars aside from LANG, which I thought I already took care of.

What version of OS X are you on (just out of curiosity, not because it will tell me anything useful)?

I may take the advice of my friends; when I told them I was using sed to translate between character sets, they pointed out that Python can almost surely do it in one line. I don't know enough Python to know what that line is, but I can figure it out.
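
If the job is purely a character-set translation, iconv alone may already be the one-liner, with no sed or Python involved. The sketch below assumes the input files are Windows-1252 (CP1252), which the thread does not confirm; the function name is hypothetical. Under that assumption, byte 0x92 is converted to the UTF-8 right single quote (U+2019) as part of the conversion.

```shell
#!/bin/sh

# Hypothetical helper: convert text on stdin to UTF-8 on stdout.
# Assumption: the source encoding is CP1252 (Windows-1252).
to_utf8() {
  iconv -f CP1252 -t UTF-8
}
```

In generatedeck.sh this could be used as `to_utf8 < $IN > $TMP`, with `$IN` and `$TMP` being the variables from the diff above.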

gwillen commented Sep 24, 2013

Oh, also, I'm gwillen on freenode; I looked for you today but didn't see you on. Say hi anytime, though my IRC session is inside a screen session, so I may not see it until I reattach.
