A distributed file system based on the Google File System, written by Tobiáš Potoček and Janak Dahal as a term project for CSCI 6450 at the University of New Orleans.
- Vaguely based on Google File System architecture
- Written completely in Java
- Fully JUnit tested
- Fault tolerance: able to survive the crash of any single computer within the cluster without data loss.
- One master server that holds the metadata.
- One shadow server that mirrors the master server.
- Several chunk servers that hold the actual data, stored in chunks.
- Synchronized by Apache Zookeeper
- Dedicated clients
- Each file is divided into chunks of a fixed size (currently 16MB).
- Each chunk is stored on at least two chunk servers (to maintain the fault tolerance guarantee).
- Performance-wise, the file system is capable of transferring data on a local 1 Gb network at rates up to 900 Mb/s.
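
To make the chunking figures above concrete, here is a small Java sketch of the arithmetic. It is illustrative only and not taken from the tjfs code base: the class and method names are hypothetical, "16MB" is assumed to mean 16 * 1024 * 1024 bytes, and an empty file is assumed to occupy zero chunks.

```java
// Illustrative sketch only; names are hypothetical and not part of tjfs.
final class ChunkMath {
    // Chunk size from the feature list, assumed to be 16 * 1024 * 1024 bytes.
    static final long CHUNK_SIZE = 16L * 1024 * 1024;
    // Minimum number of chunk servers holding each chunk, from the feature list.
    static final int MIN_REPLICAS = 2;

    /** Number of chunks needed for a file of the given size (ceiling division). */
    static long chunkCount(long fileSize) {
        if (fileSize < 0) {
            throw new IllegalArgumentException("file size must not be negative");
        }
        return (fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    /** Length in bytes of the chunk at a zero-based index; only the last one may be shorter. */
    static long chunkLength(long fileSize, long index) {
        if (index < 0 || index >= chunkCount(fileSize)) {
            throw new IndexOutOfBoundsException("no such chunk: " + index);
        }
        long start = index * CHUNK_SIZE;
        return Math.min(CHUNK_SIZE, fileSize - start);
    }

    /** Minimum number of chunk copies kept across the cluster for one file. */
    static long minStoredCopies(long fileSize) {
        return chunkCount(fileSize) * MIN_REPLICAS;
    }

    public static void main(String[] args) {
        long size = 40L * 1024 * 1024;              // a 40 MB file
        System.out.println(chunkCount(size));       // 3
        System.out.println(chunkLength(size, 2));   // 8388608 (the final 8 MB)
        System.out.println(minStoredCopies(size));  // 6
    }
}
```

Under these assumptions a 40 MB file is split into two full chunks and one 8 MB chunk, and the cluster keeps at least six chunk copies in total.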
Do you want to know more? Check out our final presentation that describes in detail individual components of the filesystem.
The file system is bundled with a simple command line client that exposes basic functionality to the user (writing files to and reading files from the file system).
- Download Zookeeper (tjfs is tested with version 3.4.6).
- Unpack the archive, go to the `conf` folder and rename `zoo_sample.cfg` to `zoo.cfg`.
- Go to the `bin` folder and run `zkServer.sh` as root. You can start `zkCli.sh` as well to verify that the Zookeeper server is up and running. More details about this procedure can be found in the project's documentation.
- Download the tjfs codebase to all machines where you want to run the file system.
- Run `mvn package` to build the code. A Java JDK and Maven are required for that.
- Go to the `bin` folder and run `chunkServer.sh` on every machine where you want to run a chunk server. The first argument is the Zookeeper connection string (ip:port), the second is the port on which this chunk server should run, and the last is the folder where the chunk server should store its data. By default, Zookeeper is expected to run on localhost, the chunk server runs on port `6003`, and the data is stored in the `chunks` folder in the root of the code base. If you're running multiple chunk servers on a single machine, don't forget to define different ports and different folders. Don't forget to create the folder before running the file system.
- Go to the `bin` folder again and run `masterServer.sh` in at least two separate instances. The arguments are exactly the same as for `chunkServer.sh`. The important thing is to provide the correct Zookeeper address. After that the servers will find each other.
- Go to the `bin` folder once again and run `client.sh` with the Zookeeper address as the first argument. You should be successfully connected and you can start using the file system.
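
The positional arguments and defaults described for `chunkServer.sh` can be pictured with the following Java sketch. It is not the actual tjfs argument handling: `ChunkServerArgs` and its methods are hypothetical names, and the default connection string `localhost:2181` is an assumption (the text above only says "localhost"; 2181 is the client port in Zookeeper's sample configuration).

```java
// Hypothetical sketch of the documented arguments; not the real tjfs parser.
final class ChunkServerArgs {
    final String zookeeper; // Zookeeper connection string (ip:port)
    final int port;         // port this chunk server listens on
    final String folder;    // folder where chunks are stored

    ChunkServerArgs(String zookeeper, int port, String folder) {
        this.zookeeper = zookeeper;
        this.port = port;
        this.folder = folder;
    }

    /** Applies the documented defaults to any argument that is missing. */
    static ChunkServerArgs parse(String[] args) {
        // Assumption: default Zookeeper port 2181; the README only states "localhost".
        String zookeeper = args.length > 0 ? args[0] : "localhost:2181";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6003;
        String folder = args.length > 2 ? args[2] : "chunks";
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return new ChunkServerArgs(zookeeper, port, folder);
    }

    /** True if two servers on the same machine would clash on port or folder. */
    boolean conflictsWith(ChunkServerArgs other) {
        return port == other.port || folder.equals(other.folder);
    }

    public static void main(String[] args) {
        ChunkServerArgs first = parse(new String[0]);
        ChunkServerArgs second = parse(new String[] {"localhost:2181", "6004", "chunks2"});
        System.out.println(first.port + " " + first.folder);  // 6003 chunks
        System.out.println(first.conflictsWith(second));      // false
    }
}
```

The `conflictsWith` check mirrors the advice about running several chunk servers on one machine: each needs its own port and its own folder.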
Basic commands:
- `put /path/to/a/local/file /path/to/a/remote/file`: write a file to a remote destination
- `get /path/to/a/remote/file /path/to/a/local/file`: get a file from the file system
- `list /remote/directory`: list all files in a remote directory
- `delete /remote/file`: delete a file from the file system
Folders are supported only virtually, through structured paths; it is neither possible nor required to create or delete folders.
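One way to picture virtual folders is a flat set of file paths from which directory listings are derived on demand. The Java sketch below illustrates that idea only; it is an assumption about how such a scheme can work, not a description of how tjfs stores its metadata, and `VirtualFolders` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: folders exist only as prefixes of stored file paths.
final class VirtualFolders {
    private final SortedSet<String> files = new TreeSet<>();

    void put(String path) {
        files.add(path);
    }

    boolean delete(String path) {
        return files.remove(path);
    }

    /** Lists immediate children of a directory; sub-folders end with "/". */
    List<String> list(String directory) {
        String prefix = directory.endsWith("/") ? directory : directory + "/";
        SortedSet<String> children = new TreeSet<>();
        // All paths sharing the prefix are contiguous in sorted order.
        for (String path : files.tailSet(prefix)) {
            if (!path.startsWith(prefix)) {
                break;
            }
            String rest = path.substring(prefix.length());
            if (rest.isEmpty()) {
                continue;
            }
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? rest : rest.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    public static void main(String[] args) {
        VirtualFolders fs = new VirtualFolders();
        fs.put("/docs/a.txt");
        fs.put("/docs/sub/b.txt");
        fs.put("/other/c.txt");
        System.out.println(fs.list("/docs")); // [a.txt, sub/]
        fs.delete("/docs/sub/b.txt");
        System.out.println(fs.list("/docs")); // [a.txt]
    }
}
```

In this model a folder appears as soon as a file is written under it and disappears when its last file is deleted, which is why no explicit folder operations are needed.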
Disclaimer: This is a proof-of-concept school project. It is not meant to be used in production by any means. Also check out the presentation for the limitations of the current version.
Tjfs provides a nice and clean API that can be used in your own application. The API is completely stream-based, which means that you are not limited to file operations (for example, you can generate data on the fly and immediately write it to tjfs). Sample code:
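```java
// Address of the Zookeeper server that coordinates the cluster
Machine zookeeper = Machine.fromString("127.0.0.1:2181");
TjfsClient tjfsClient = TjfsClient.getInstance(new Config(), zookeeper);
// Open a remote file as a stream
InputStream is = tjfsClient.get(Paths.get("/path/to/remote/file"));
```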
The API is defined and described in the [ITjfsClient](src/main/java/edu/uno/cs/tjfs/client/ITjfsClient.java) interface. The actual implementation is in TjfsClient, and for testing purposes you can use a mocked version, [DummyTjfsClient](src/main/java/edu/uno/cs/tjfs/client/DummyTjfsClient.java), that mimics the whole file system using a simple in-memory storage.
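To illustrate what a stream-based API makes possible, the sketch below generates data on the fly as an `InputStream` and hands it to a stream-accepting store. It uses only the Java standard library. `StreamSketch`, `InMemoryStore` and their methods are hypothetical stand-ins written for this example; they are not the `ITjfsClient` interface or the `DummyTjfsClient` class, whose exact method signatures should be taken from the source files linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the tjfs client API.
final class StreamSketch {
    /** Produces `total` bytes on demand (0, 1, ..., 255, 0, ...) without touching a file. */
    static InputStream generated(final long total) {
        return new InputStream() {
            private long produced = 0;

            @Override
            public int read() {
                if (produced >= total) {
                    return -1; // end of stream
                }
                return (int) (produced++ % 256);
            }
        };
    }

    /** Drains a stream into a byte array, wrapping I/O errors as unchecked. */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] block = new byte[8192];
            int n;
            while ((n = in.read(block)) != -1) {
                buffer.write(block, 0, n);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for a stream-accepting file system client, backed by a map. */
    static final class InMemoryStore {
        private final Map<String, byte[]> files = new HashMap<>();

        void put(String remotePath, InputStream data) {
            files.put(remotePath, readAll(data));
        }

        InputStream get(String remotePath) {
            byte[] content = files.get(remotePath);
            if (content == null) {
                throw new NoSuchElementException(remotePath);
            }
            return new ByteArrayInputStream(content);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("/generated.bin", generated(1000));
        System.out.println(readAll(store.get("/generated.bin")).length); // 1000
    }
}
```

Because the store only sees an `InputStream`, it does not matter whether the bytes come from a local file, a network socket, or a generator like the one above.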