To access a file, the file system needs to translate the file's name to its number. for example
/home/raz/Code
might get translated to 690761
directories map human readable names to these numbers. We want to group directories hierarchically so that users can group related files.
Implementing this is simple. We will use files to store directories. So if the OS has the file number of a directory it can open up the directory file and scan through the filename/number pairs until it finds the right one.
For example
if we are looking for foo.txt
we scan the directory and find it has file number 871
This can be recursive
The file number for the directory /home/raz can be found by looking up raz in the home directory. and the directory home can be found by looking at the root directory.
But recursion needs a base case and that base case is the root directory. The solution is to agree on the file number of the root directory ahead of time. For Unix and Linux systems they use 2 as the file number of the root directory.
So to find /home/raz/Code/cs405/fib.c
we look at the directory at file number 2. scan the directory for raz
go to that file number and repeat.
A QUESTION: since directories are stored as files can we use the same API. The same System Calls to access them (open/close/read/write
)?
No. we don't want applications to modify directories directly. We want those to be under control of the OS. So we have special commands. create/mkdir/link/unlink/rmdir
Many early implementations stored linear lists of file name/ file number in a linked list. (for example Linux ext2
)
Works well, but when you have 1,000s of files in a directory -- not so well.
Many recent file systems store directory contents in a tree. (Linux XFS, Microsoft NTFS, Oracle's ZFS)
For example, we can use a B+ tree.
In Linux XFS, B+ tree. It hashes the filename
Process for finding file number of file name
- first compute hash of name (0x0000c45e)
- start at the B+ tree root search for that hash(root at some well known offset in the directory file
- at each level a tree node contains an array of hash key, file offset pairs that represent a pointer to the child node.
Now we can understand hard and soft links. hard links are multiple file directories entries that map different path names to the same file number. Because the file number can appear in multiple directories, file systems must ensure that a file is deleted when the last hard link is removed. To implement this we maintain a reference count. When the reference count is zero we cn remove the file.
Soft links map one file name to another name
Once a file system has translated a file name to a file number the file system must be able to find the blocks associated with that file.
dates from the late 1970s. The file allocation table is an array of 32 bit entries in a reserved area of the volume. Each file in the file system corresponds to a linked list of FAT entries. Each entry contains a pointer to the next entry or an end-of-file marker.
Suppose we have
- one file foo.txt that has a file number of 9
- another bar.txt that has a file number of 12
- 10GB drive 1KB blocks (answer 10M entries = 40MB)
- 1TB drive 1KB blocks (answer 1B entries = 8GB)
math
10GB = 10,000,000,000 / 1000 = 10,000,000 = 10M entries
2^x = 10M? = 2^24 so I would say 30MB
- sequential file read
- random read
- reliability. (no transactions)
can make block size 256k but wastes a lot of space
uses an asymmetric tree called a multilevel index as illustrated:
- Each file is a tree with fixed sized blocks (e.g., 4k)
- each file's tree is rooted in an inode that contains metadata (file owner, permissions, creation time, modification time, whether directory or not)
- the inode also contains an array of pointers for locating the files data blocks.Typically the inode contains 15 pointers.
- The first 12 are direct pointers., that point directly to the first 12 data blocks of a file.
- The 13th pointer is an indirect pointer. points to a node of the tree called the indirect block.
- The indirect block contains an array of direct pointers. (with 4KB blocks and 4-byte pointers an indirect block can contain 1024 pointers, which allows for files up to 4MB)
- The 14th pointer is a double indirect pointer. Points to the double indirect block containing an array of indirect pointers pointing to indirect blocks. so 1024^2 blocks of data
- The 15th pointer is a triple indirect pointer. 1024^3 blocks of data. (4TB file)
a file's number called an inumber is an index into the inode array.
To open a file,
- in the directory find the inumber associated with the file number
- use the inumber to go to the inode
- find the metadata
Each inode contains
- 12 direct pointers
- 1 indirect
- 1 double indirect
- 1 triple indirect
- 1 quadruple indirect
- assume 4KB blocks and 8-byte pointers .
- What is the maximum file size for UMWFS?
Answer
- 12 direct pointers (48K)
- 1 indirect. 512 pointers per block = 2^9 so 2^9 2^12 blocks = 2^21 = 2MB
- 1 double indirect. 1GB
- 1 triple indirect. 512GB
- 1 quadruple indirect. 256.5TB
tree with extents
similar to the Fast File System. There is an MFT master file table. each entry is 1KB and stores a sequence of variable sized attribute records that can store either data or metadata.
sometimes attribute is small can can be stored directly (resident)
other times too large to fit (non resident) stores a pointer.
this illustrates a large file with multiple extents. a multi-GB file can be stored in a few extents if the blocks are contiguous
for a small file the data can be stored within the MFT entry
3 common metadata attributes include
- standard information file's creation time, owner ID, security specifier, read only etc.
- file name
- attribute list the metadata may be larger than what an MFT record can hold.SO it spans multiple records. the attribute list indicates which attributes are in which record.
when updating an existing file a COW file system will not overwrite the existing data or meta data, instead, they write a new version to a new location.
With non COW systems ---To make a change to a file -- writing a C source code file ---might require 3 changes
- Inode - change file size
- pointer block to add a new block
- finally the new block
- these small writes are expensive. particularly as new tech. faster drives etc
- small writes are expensive on RAID.
- small writes are expensie on SSD - any write requires clearing a large erasureblock.
in a traditional system a file's indirect nodes and data blocks can be anywhere on disk. given a file's inumber we can find its inode in a fixed location on disk
In the COW version, we don't want to overwrite inodes in place so we make them mobile. We store them in a file rather than in a fixed array. But now we need to find them.
One way is to have the root inode in a fixed location. but even the root inode we want copy on write.
keep a small array of root inodes each having a monotonically increasing version number and a checksum. If a crash occurs during the write of one, we use the newest inode with a correct checksum.
when we update a block we write it and all of the blocks on the path from it to the root to new locations.