MacOS: Umlauts in filenames: cat: ö: No such file or directory #316

Closed
rustikus opened this issue Mar 21, 2017 · 15 comments

@rustikus

rustikus commented Mar 21, 2017

Hi,

I am using encfs on macOS installed via homebrew (Build: encfs version 1.9.1) on top of FUSE 3.5.5.

I am experiencing some very strange issues with umlauts in filenames on an encfs-mounted filesystem. If I only use the terminal, everything works as expected:

echo ä > ä

cat ä
ä

The problem occurs when I try to use a GUI program like Finder or TextEdit to access this file. It is simply not possible to open it.

If I now create a file 'ö' with TextEdit or a folder named 'ö' with Finder, everything works fine as long as I only use these programs. But if I try to access them via Terminal, I get a "No such file or directory" error message.

cat ö
cat: ö: No such file or directory

ls -la
-rw-r--r--@   1 username  staff     2B Mar 21 11:36 ö

Did I miss something in my configuration to make it work on macOS?

I only use the options '-o volname=NAME -o allow_other' for mounting the folder.

Thanks

@benrubson
Contributor

Does it work correctly outside an encfs directory?

@rustikus
Author

rustikus commented Mar 22, 2017

Yes, everything is working fine without encfs.

I just created a new encrypted folder with the default configuration.

$ mkdir .crypt 
$ mkdir crypt 

$ encfs -f .crypt crypt   
Creating new encrypted volume.
Please choose from one of the following options:
 enter "x" for expert configuration mode,
 enter "p" for pre-configured paranoia mode,
 anything else, or an empty line will select standard mode.
?>

Standard configuration selected.

Configuration finished.  The filesystem to be created has
the following properties:
Filesystem cipher: "ssl/aes", version 3:0:2
Filename encoding: "nameio/block32", version 4:0:2
Key Size: 192 bits
Block Size: 1024 bytes
Each file contains 8 byte header with unique IV data.
Filenames encoded using IV chaining mode.
File holes passed through to ciphertext.

Now you will need to enter a password for your filesystem.
You will need to remember this password, as there is absolutely
no recovery mechanism.  However, the password can be changed
later using encfsctl.

New Encfs Password:
Verify Encfs Password:

The same error occurs with the new encfs folder, so it does not seem to be related to my .encfs6.xml configuration.

@rfjakob
Collaborator

rfjakob commented Mar 25, 2017

This may be related to the Unicode normalisation that HFS normally performs, which of course it cannot do when the filename is encrypted. APFS seems to stop doing that: http://mjtsai.com/blog/2017/03/24/apfss-bag-of-bytes-filenames/
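To illustrate why the underlying filesystem cannot help here: once names are encrypted, any normalisation it applies operates on the encrypted name, not on the original one. The sketch below uses a truncated SHA-256 hash as a hypothetical stand-in for EncFS's filename encryption (it is not what EncFS actually does); the only point is that two byte-different spellings of "ö" map to unrelated stored names.

```python
import hashlib
import unicodedata


def encode_name(name: str) -> str:
    # Hypothetical stand-in for EncFS filename encryption: any deterministic
    # transform of the UTF-8 bytes behaves the same way for this argument.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:16]


nfc_name = unicodedata.normalize("NFC", "\u00f6")  # precomposed o-umlaut
nfd_name = unicodedata.normalize("NFD", "\u00f6")  # "o" + COMBINING DIAERESIS

# Visually identical names produce unrelated encoded names, so the lower
# filesystem has nothing it could normalise to make them match.
print(encode_name(nfc_name))
print(encode_name(nfd_name))
print(encode_name(nfc_name) == encode_name(nfd_name))  # False
```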

@benrubson
Contributor

By default, Terminal uses the C locale.
Did you try forcing a UTF-8 locale in Terminal?

Then, when you open a new Terminal window and type locale, you should see:
LC_CTYPE="UTF-8"

Could you then redo your tests?

@rustikus
Author

Thanks for the update.

I have set my locale to UTF-8.

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

I am using iTerm2 as my terminal application, and everything is set to UTF-8.

[Screenshot, 2017-03-26 19:41:17]

Does the APFS change also apply to the HFS filesystem in macOS? I am not sure if I really understand the implications for macOS.

Thanks!

@RokerHRO

RokerHRO commented May 5, 2017

Mac OS' HFS+ stores filenames in UTF-8, but in the quite uncommon NFD normalization.

To check whether this is an encoding / Unicode normalization problem, can you create a file "foo_ö.txt", run ls foo_* | hexdump -C, and look at the output? On my system the output is:

00000000  66 6f 6f 5f 6f cc 88 2e  74 78 74 0a              |foo_o...txt.|
0000000c

and you can see: the "ö" (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) is encoded as UTF-8, normalized in NFD: 6f cc 88. That means it is "decomposed" into U+006F LATIN SMALL LETTER O followed by the 2 octets CC 88, which are the UTF-8 encoding of U+0308 COMBINING DIAERESIS.

That normalization into NFD is done by the HFS+ filesystem driver. Other filesystems do not do this and store the filename "as-is", normally in UTF-8 NFC form (which encodes the "ö" as the 2 octets C3 B6).
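Both byte sequences can be reproduced with Python's standard unicodedata module. This is a small sketch using standard Unicode NFC/NFD; HFS+ actually applies its own, slightly different variant of NFD, so treat it as an approximation of what the filesystem driver does.

```python
import unicodedata

o_umlaut = "\u00f6"  # "ö" as a single precomposed code point

nfc = unicodedata.normalize("NFC", o_umlaut)
nfd = unicodedata.normalize("NFD", o_umlaut)

# Same visible character, different code points and different UTF-8 bytes.
print([f"U+{ord(c):04X}" for c in nfc], nfc.encode("utf-8").hex(" "))  # ['U+00F6'] c3 b6
print([f"U+{ord(c):04X}" for c in nfd], nfd.encode("utf-8").hex(" "))  # ['U+006F', 'U+0308'] 6f cc 88
print(nfc == nfd)  # False
```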

An ls *ö* would not find the file above either.
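Why the glob fails can be sketched with Python's fnmatch, which, like a shell glob, compares code points rather than normalized characters. This assumes the pattern arrives in NFC, which is typical for text typed in a terminal but not guaranteed.

```python
import fnmatch
import unicodedata

# Name as HFS+ returns it from a directory listing: NFD ("o" + U+0308).
stored = unicodedata.normalize("NFD", "foo_\u00f6.txt")
# Pattern as typed in the terminal: assumed NFC (precomposed U+00F6).
pattern = "*\u00f6*"

print(fnmatch.fnmatchcase(stored, pattern))  # False: the code points differ
# Bringing both sides into the same form makes the match succeed.
print(fnmatch.fnmatchcase(unicodedata.normalize("NFC", stored), pattern))  # True
```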

@rustikus
Author

rustikus commented May 8, 2017

There seems to be a difference depending on whether I create the files via Terminal or via a GUI application, and whether the drive is encfs-mounted. It does not seem to be an issue on the normal file system.

I created one file via Terminal (foo_ö.txt) and another with macOS TextEdit (foo_ä.txt). This is what happens when I use hexdump:

ENCFS mount share

$ touch foo_ö.txt

$ ls foo_* | hexdump -C 
ls: foo_ä.txt: No such file or directory
00000000  66 6f 6f 5f c3 b6 2e 74  78 74 0a                 |foo_...txt.|
0000000b

Normal file system

$ touch foo_ö.txt

$ ls foo_* | hexdump -C 
00000000  66 6f 6f 5f c3 a4 2e 74  78 74 0a 66 6f 6f 5f c3  |foo_...txt.foo_.|
00000010  b6 2e 74 78 74 0a                                 |..txt.|
00000016
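As an alternative to reading hexdump output by eye, a small script can report which normalization form each directory entry is in. This is a sketch assuming Python 3.8+ (for unicodedata.is_normalized); the script and its output layout are illustrative, not part of encfs.

```python
import os
import sys
import unicodedata


def normalization_form(name: str) -> str:
    """Report which Unicode normalization form(s) a filename is already in."""
    is_nfc = unicodedata.is_normalized("NFC", name)
    is_nfd = unicodedata.is_normalized("NFD", name)
    if is_nfc and is_nfd:
        return "both"  # e.g. pure ASCII: nothing to compose or decompose
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"  # mixed composed and decomposed characters


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for entry in sorted(os.listdir(directory)):
        # os.fsencode recovers the raw bytes, like hexdump shows them.
        print(f"{normalization_form(entry):8} {os.fsencode(entry).hex(' ')}")
```

Run it as python3 check_names.py /path/to/mount (hypothetical file name); an entry such as foo_ö.txt created on HFS+ would show as NFD with 6f cc 88 in its bytes.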

@rustikus
Author

Just a small update. I somehow "solved" the problem by using a specific mount option (iconv) for fuse.

-o modules=iconv,from_code=UTF8-MAC,to_code=UTF8

With this option I am able to mount the share and create files with both terminal and GUI applications. This is strange, because as I understand it I should not need the conversion, especially since it goes the other way around.

ENCFS mount share and normal file system

$ touch foo_ü.txt

$ ls foo_ü* | hexdump -C
00000000  66 6f 6f 5f c3 bc 2e 74  78 74 0a                 |foo_...txt.|

@benrubson
Contributor

@RokerHRO is right; I faced the same "issue" in another project.
We should do on-the-fly NFC <-> NFD conversion on macOS.
This could be a starting point:
https://stackoverflow.com/questions/15906344/osx-and-c-unicode-conversion-from-nfd-to-nfc

We would, however, have to be really careful with the implementation; for example, I'm not sure reverse mode would need any conversion.
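One way such a conversion could work without rewriting names on disk is a normalization-insensitive lookup: compare names in a common form (NFC here) and return whichever spelling is actually stored. This is a hypothetical Python sketch, not EncFS code; find_entry and its inputs are invented for illustration.

```python
import unicodedata
from typing import List, Optional


def nfc(name: str) -> str:
    return unicodedata.normalize("NFC", name)


def find_entry(entries: List[str], requested: str) -> Optional[str]:
    """Return the stored spelling of `requested`, ignoring NFC/NFD differences.

    Hypothetical helper: `entries` stands in for the decrypted names of one
    directory, `requested` for the name a program asked for.
    """
    if requested in entries:  # an exact match always wins
        return requested
    wanted = nfc(requested)
    matches = [e for e in entries if nfc(e) == wanted]
    # Refuse to guess if several stored names differ only in normalization.
    return matches[0] if len(matches) == 1 else None
```

For example, find_entry(["foo_o\u0308.txt"], "foo_\u00f6.txt") returns the stored NFD spelling. Returning None for ambiguous matches avoids silently opening the wrong file when both spellings exist in the same directory.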

@rfjakob
Copy link
Collaborator

rfjakob commented Jul 20, 2017

I'd rather not have charset conversion inside encfs. The iconv workaround looks good enough.

@benrubson
Contributor

Yep, this is a nice solution too.
Perhaps we could add this workaround to the default options for Mac OS.

@RokerHRO

Beware: the conversion between NFC and NFD is not trivial, and there are some subtle but nasty corner cases! Especially when you get a string from the user that might not be normalized yet: if you convert it to NFD (to pass it to the macOS FS layer) and then back to NFC, you might get a result that differs from the original.
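A short sketch of that round-trip problem using standard NFC/NFD from Python's unicodedata (HFS+'s own decomposition differs in details, so this is illustrative only). Both the unnormalized user input and the singleton U+212B ANGSTROM SIGN come back different from the original.

```python
import unicodedata


def round_trip(s: str) -> str:
    """Convert to NFD (as if handing the name to HFS+) and back to NFC."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", s))


samples = {
    "already NFC": "\u00f6",          # precomposed o-umlaut
    "unnormalized input": "o\u0308",  # o + COMBINING DIAERESIS
    "singleton": "\u212b",            # ANGSTROM SIGN
}

for label, s in samples.items():
    back = round_trip(s)
    before = [hex(ord(c)) for c in s]
    after = [hex(ord(c)) for c in back]
    print(f"{label:20} {before} -> {after} same={back == s}")
```

Only the already-NFC sample survives unchanged; the other two both end up as a single precomposed code point (U+00F6 and U+00C5 respectively).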

@samrocketman
Copy link
Collaborator

What about doing what @benrubson suggests? Detect the platform and, on macOS, check whether the user passed an -o option; if not, fall back to recommended default options for OS X, such as:

-o modules=iconv,from_code=UTF8-MAC,to_code=UTF8
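The proposed policy is simple enough to sketch. This is a hypothetical Python illustration (the function name and argument handling are invented; EncFS itself is C++, and this policy was not adopted). It assumes args holds the arguments after the program name and prepends the defaults so they come before the rootDir and mountPoint arguments.

```python
import sys
from typing import List

# Options proposed in this thread as a macOS default (hypothetical policy).
MACOS_DEFAULT_OPTS = ["-o", "modules=iconv,from_code=UTF8-MAC,to_code=UTF8"]


def with_platform_defaults(args: List[str], platform: str = sys.platform) -> List[str]:
    """Prepend the iconv options on macOS unless the user passed any -o option."""
    if platform != "darwin":
        return list(args)
    user_set_o = any(a.startswith("-o") for a in args)
    return list(args) if user_set_o else MACOS_DEFAULT_OPTS + list(args)
```

Taken literally, this rule would not have helped the original report: that mount already uses -o volname=NAME -o allow_other, so the defaults would be skipped. Checking specifically for a modules= option would be one alternative.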

@rfjakob rfjakob changed the title "Umlauts in filenames" to "MacOS: Umlauts in filenames: cat: ö: No such file or directory" Jul 23, 2017
@rfjakob
Collaborator

rfjakob commented Jul 23, 2017

No, sorry, I don't want to add MacOS hacks to EncFS. The simple reason is that it's a maintenance nightmare: I have no way of testing it against different MacOS versions.

Maybe this is something homebrew could patch in their version.

@rfjakob rfjakob closed this as completed Jul 23, 2017
@samrocketman
Collaborator

Makes sense @rfjakob.

Projects
None yet
Development

No branches or pull requests

5 participants