listdir should take directories into account #27

Open
tbielawa opened this Issue Aug 14, 2014 · 2 comments

@tbielawa
Owner

Directories are like files.... kinda...

listdir should return tuples for those too.

Questions:

  • Return the size of the symbolic link itself when it points to a directory

or

  • Return the size of the dereferenced link target (the directory)

?
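The two options map onto `os.lstat` (size of the link itself) and `os.stat` (size of whatever the link points to). A minimal sketch of the difference, assuming a hypothetical helper named `list_sizes` with a hypothetical `follow_links` flag; neither is the project's actual `listdir`:

```python
import os
import tempfile


def list_sizes(path, follow_links=False):
    """Return (name, size_in_bytes) tuples for every entry in `path`.

    Hypothetical helper for illustration only.
    follow_links=False: a symbolic link reports its own size (os.lstat).
    follow_links=True:  a symbolic link reports its target's size (os.stat);
                        note that os.stat raises on a dangling link.
    """
    results = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        info = os.stat(full) if follow_links else os.lstat(full)
        results.append((name, info.st_size))
    return results


def demo():
    """Build a scratch directory holding one directory and one link to it."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "simpledir"))
        os.symlink("simpledir", os.path.join(tmp, "sd"))
        link_view = list_sizes(tmp, follow_links=False)
        target_view = list_sizes(tmp, follow_links=True)
    return link_view, target_view
```

Calling `demo()` returns both views of the same directory: in the first, `sd` carries the size of the link; in the second, `sd` carries the same size as `simpledir`.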

@tbielawa tbielawa added the bug label Aug 14, 2014
@tbielawa
Owner

Fix this in 1.1.0

@tbielawa tbielawa added this to the 1.1.0 milestone Aug 14, 2014
@tbielawa
Owner

NB: this discussion is probably unrelated, but kind of cool to have discovered.

Just did a quick experiment, comparing the size of links with the length of the path (in characters/bytes) the link points to.

System under test:

$ cat /etc/fedora-release
Fedora release 20 (Heisenbug)

$ mount | grep 'home'
/dev/foo_home on /home type ext4 (rw,relatime,data=ordered)

$ uname -a
Linux foo.com 3.15.6-200.fc20.x86_64 #1 SMP Fri Jul 18 02:36:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ echo $LANG
en_US.utf8

In short:

  • Fedora 20, x86_64
  • ext4 filesystem
  • Linux 3.15.6-200
  • A UTF-8 locale (en_US.utf8) is being used by the console

Tests

stat applied to a symbolic link whose target name contains only characters matching [a-zA-Z]:

$ mkdir simpledir
$ ln -s simpledir sd

$ file simpledir sd
simpledir: directory
sd:        symbolic link to `simpledir'

$ stat -c '%s' simpledir sd
40
9

stat applied to a symbolic link whose target name contains characters matching [a-zA-Z] and a single → character (U+2192 RIGHTWARDS ARROW):

$ mkdir foo→bar  
$ ln -s foo→bar fb

$ file foo→bar fb
foo→bar: directory
fb:      symbolic link to `foo→bar'

$ stat -c '%s' foo→bar fb
40
9

Compare to the following, which swaps the arrow character for a hyphen:

$ mkdir foo-bar                
$ ln -s foo-bar f-b      

$ file foo-bar f-b       
foo-bar: directory
f-b:     symbolic link to `foo-bar'

$ stat -c '%s' foo-bar f-b 
40
7

Results Summary

What this shows us is:

The size of a symbolic link matches the length of the path the link points to, measured in bytes rather than characters. Specifically, it is the number of bytes required to store the link target. This is why link targets made of plain ASCII characters have a link size equal to the number of characters in the target: each ASCII character takes exactly one byte. This is also why links whose target contains non-ASCII characters measure larger than the number of characters in the target name: in UTF-8, those characters take more than one byte each.

  • In the 'foo→bar' example, the size of the symbolic link is 9 bytes.
  • In the 'foo-bar' example, the size of the symbolic link is 7 bytes.

I believe the links to 'foo→bar' and 'foo-bar' differ by exactly 2 bytes because of the space required to store the → character in UTF-8. I cannot find exact literature to support this assertion at this time, but it's what I recall reading previously.
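The three tests can be repeated in one place with a short script. This is a sketch under the assumption that the filesystem reports a symbolic link's size as the byte length of its target, as the ext4 system above does; `link_size_vs_target` is a hypothetical name, and byte paths are used so the result does not depend on the console locale.

```python
import os
import tempfile


def link_size_vs_target(target_name, link_name):
    """Create a directory and a symlink to it in a scratch location.

    Returns a tuple of:
      (size of the link from lstat,
       number of characters in the target name,
       number of UTF-8 bytes in the target name)
    """
    target_bytes = target_name.encode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        base = os.fsencode(tmp)
        os.mkdir(os.path.join(base, target_bytes))
        link_path = os.path.join(base, link_name.encode("utf-8"))
        # The link stores the relative target name, like `ln -s target link`.
        os.symlink(target_bytes, link_path)
        size = os.lstat(link_path).st_size
    return size, len(target_name), len(target_bytes)


if __name__ == "__main__":
    for target, link in [("simpledir", "sd"), ("foo\u2192bar", "fb"), ("foo-bar", "f-b")]:
        print(target, link_size_vs_target(target, link))
```

For each pair, the first and third values should agree, while the second (the character count) only agrees when the target name is pure ASCII.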

Using http://mothereff.in/byte-counter I entered the following two values:

  • a
  • →

The byte-counter says that the former is 1 byte, and the latter is 3 bytes.
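The same byte counts can be checked locally without the web tool. A minimal sketch, where `utf8_len` is a hypothetical helper name:

```python
def utf8_len(text):
    """Return the number of bytes needed to store `text` encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for sample in ["a", "-", "\u2192", "foo-bar", "foo\u2192bar"]:
        # Show the text, its character count, and its UTF-8 byte count.
        print(repr(sample), len(sample), utf8_len(sample))
```

The arrow U+2192 encodes to the three bytes `E2 86 92`, while a hyphen encodes to one byte, which accounts for the 2-byte difference between the two link sizes.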

@tbielawa tbielawa modified the milestone: 1.1.0, hack day Aug 15, 2014
@tbielawa tbielawa modified the milestone: hack day, 1.1.0 Aug 17, 2014