File system abstraction with implementations for GCP GCS, AWS S3, Azure, SMB, HTTP, and Local file systems. Provides atomic primitives enabling multiple readers and writers.
- LocalFileSystem employs content hashing to approximate GCS Object Versioning.
- GoogleCloudFileSystem provides consistent parallel access patterns.
- S3FileSystem provides basic file system primitives.
- SMBFileSystem provides basic file system primitives.
- HTTPFileSystem provides a basic HTTP file system.
Provides file format implementations for:
- Lines
- CSV (via csv)
- JSON, ND-JSON / JSONL (via JSONStream and ndjson)
- Parquet, including the streamingParquet codec and parquetjs
- TFRecord, including tfrecord-stream
Additionally provides sharding & merging utilities.
The FileSystem implementations require peer dependencies:
- AnyFileSystem: None. URL resolution as a FileSystem. Files have URLs and HTTP is a file system.
- AzureBlobStorageFileSystem: @azure/storage-blob and @azure/identity
- AzureFileShareFileSystem: @azure/storage-file-share
- GoogleCloudFileSystem: @google-cloud/storage
- HTTPFileSystem: axios
- LocalFileSystem: fs-ext, glob, and glob-stream
- S3FileSystem: aws-sdk, s3-stream-upload, and athena-express
- SMBFileSystem: @marsaud/smb2
Built with the tree-stream primitives ReadableStreamTree and WritableStreamTree.
The project started to support @wholebuzz/archive, a terabyte-scale archive for GCS. The focus has since expanded to include powering dbcp and @wholebuzz/mapreduce with a collection of file system implementations under a common interface. The atomic primitives are only available for Google Cloud Storage and the local file system.
```typescript
import { AnyFileSystem } from '@wholebuzz/fs/lib/fs'
import { GoogleCloudFileSystem } from '@wholebuzz/fs/lib/gcp'
import { HTTPFileSystem } from '@wholebuzz/fs/lib/http'
import { LocalFileSystem } from '@wholebuzz/fs/lib/local'
import { S3FileSystem } from '@wholebuzz/fs/lib/s3'
import { readJSON, writeJSON } from '@wholebuzz/fs/lib/json'

const httpFileSystem = new HTTPFileSystem()
const fs = new AnyFileSystem([
  { urlPrefix: 'gs://', fs: new GoogleCloudFileSystem() },
  { urlPrefix: 's3://', fs: new S3FileSystem() },
  { urlPrefix: 'http://', fs: httpFileSystem },
  { urlPrefix: 'https://', fs: httpFileSystem },
  { urlPrefix: '', fs: new LocalFileSystem() },
])

await writeJSON(fs, 's3://bucket/file', { foo: 'bar' })
const foobar = await readJSON(fs, 's3://bucket/file')
```
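The prefix table above suggests that a URL is routed to the first entry whose urlPrefix it starts with, with the empty prefix acting as the fallback. The following is a minimal, self-contained sketch of that idea only. The names resolveFileSystem, PrefixEntry, and NamedFileSystem are hypothetical, and first-match ordering is an assumption rather than a statement about how AnyFileSystem is implemented.

```typescript
// Hypothetical stand-in for a file system; only a label is needed here.
interface NamedFileSystem {
  name: string
}

interface PrefixEntry {
  urlPrefix: string
  fs: NamedFileSystem
}

// Returns the first entry whose urlPrefix starts the URL (assumed ordering).
// An empty prefix matches every URL, so it works as a fallback when listed last.
function resolveFileSystem(entries: PrefixEntry[], url: string): NamedFileSystem {
  const match = entries.find((entry) => url.startsWith(entry.urlPrefix))
  if (!match) throw new Error(`No file system registered for ${url}`)
  return match.fs
}

const prefixEntries: PrefixEntry[] = [
  { urlPrefix: 'gs://', fs: { name: 'gcs' } },
  { urlPrefix: 's3://', fs: { name: 's3' } },
  { urlPrefix: 'https://', fs: { name: 'http' } },
  { urlPrefix: '', fs: { name: 'local' } },
]

console.log(resolveFileSystem(prefixEntries, 's3://bucket/file').name) // s3
console.log(resolveFileSystem(prefixEntries, '/tmp/file.json').name) // local
```

Under this assumption the order of entries matters: placing the empty prefix first would capture every URL.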
```shell
node lib/cli.js ls .
node lib/cli.js --help
```
- appendToFile
- copyFile
- createFile
- ensureDirectory
- fileExists
- getFileStatus
- moveFile
- openReadableFile
- openWritableFile
- queueRemoveFile
- readDirectory
- readDirectoryStream
- removeDirectory
- removeFile
- replaceFile
+ new FileSystem(): FileSystem
Returns: FileSystem
▸ Abstract appendToFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, createOptions?: CreateOptions, appendOptions?: AppendOptions): Promise<null | FileStatus>
Appends to the file, safely. Either writeCallback or createCallback is called. For simple appends, the same parameter can be supplied for both writeCallback and createCallback.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to append to. |
writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for appending to the file. |
createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file, if necessary. |
createOptions? | CreateOptions | Initial metadata for initializing the file, if necessary. |
appendOptions? | AppendOptions | - |
Returns: Promise<null | FileStatus>
Defined in: src/fs.ts:209
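The rule that exactly one of writeCallback or createCallback runs can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the library's implementation: appendToFileSketch, memFiles, and the mem:// URLs are hypothetical, string arrays stand in for WritableStreamTree, and the callbacks are synchronous for brevity.

```typescript
type Chunks = string[]
type ChunkCallback = (out: Chunks) => boolean

// Hypothetical in-memory store: URL -> content chunks.
const memFiles = new Map<string, Chunks>()

// Calls createCallback when the file is missing and writeCallback otherwise,
// so exactly one of the two callbacks runs per call.
function appendToFileSketch(
  url: string,
  writeCallback: ChunkCallback,
  createCallback: ChunkCallback
): boolean {
  const existing = memFiles.get(url)
  if (existing === undefined) {
    const created: Chunks = []
    const ok = createCallback(created)
    if (ok) memFiles.set(url, created)
    return ok
  }
  return writeCallback(existing)
}

const appendEntry: ChunkCallback = (out) => {
  out.push('entry\n')
  return true
}

// First call creates the file with a header, second call only appends.
const withHeader: ChunkCallback = (out) => {
  out.push('# header\n')
  return appendEntry(out)
}
appendToFileSketch('mem://log', appendEntry, withHeader)
appendToFileSketch('mem://log', appendEntry, withHeader)

// Simple append: the same callback is supplied for both roles.
appendToFileSketch('mem://plain', appendEntry, appendEntry)

console.log(memFiles.get('mem://log')?.join(''))
```

A separate createCallback is useful when a new file needs a header or other initial content that later appends must not repeat.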
▸ Abstract copyFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Copies the file.
Name | Type | Description |
---|---|---|
sourceUrlText | string | The URL of the source file to copy. |
destUrlText | string | The destination URL to copy the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:178
▸ Abstract createFile(urlText: string, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, options?: CreateOptions): Promise<boolean>
Creates the file, failing if it already exists.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to create. |
createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file. |
options? | CreateOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:155
▸ Abstract ensureDirectory(urlText: string, options?: EnsureDirectoryOptions): Promise<boolean>
Ensures the directory exists.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the directory. |
options? | EnsureDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:109
▸ Abstract fileExists(urlText: string): Promise<boolean>
Returns true if the file exists.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to check for existence. |
Returns: Promise<boolean>
Defined in: src/fs.ts:121
▸ Abstract getFileStatus(urlText: string, options?: GetFileStatusOptions): Promise<FileStatus>
Determines the file status. The file version is used to implement atomic mutations.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to retrieve the status for. |
options? | GetFileStatusOptions | - |
Returns: Promise<FileStatus>
Defined in: src/fs.ts:127
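Using a file version for atomic mutations is a form of optimistic concurrency: read the version, then write only if the version is unchanged. The sketch below models that with a content hash as the version, in the spirit of LocalFileSystem approximating GCS Object Versioning. Everything here is an assumption for illustration: versionedStore, contentVersion, and replaceIfVersion are hypothetical names, and SHA-256 is an arbitrary choice since the hash the library uses is not stated here.

```typescript
import { createHash } from 'crypto'

// Hypothetical in-memory store: URL -> file content.
const versionedStore = new Map<string, string>()

// Derives a version from the content itself (SHA-256 chosen for illustration).
function contentVersion(content: string): string {
  return createHash('sha256').update(content).digest('hex')
}

// Replaces the content only if the caller's expected version is still current.
// The check and the write run synchronously, so nothing can interleave here.
function replaceIfVersion(url: string, expectedVersion: string, next: string): boolean {
  const current = versionedStore.get(url)
  if (current === undefined || contentVersion(current) !== expectedVersion) return false
  versionedStore.set(url, next)
  return true
}

versionedStore.set('mem://counter', '1')
const seenVersion = contentVersion('1')

// Two writers that both read version "1": only the first replacement succeeds.
const firstWrite = replaceIfVersion('mem://counter', seenVersion, '2')
const secondWrite = replaceIfVersion('mem://counter', seenVersion, '3')
console.log(firstWrite, secondWrite, versionedStore.get('mem://counter'))
```

One limitation of a content hash as a version is that a file changed and then changed back to the same bytes is indistinguishable from an untouched file, which a monotonically increasing generation number would detect.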
▸ Abstract moveFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Moves the file.
Name | Type | Description |
---|---|---|
sourceUrlText | string | The URL of the source file to move. |
destUrlText | string | The destination URL to move the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:185
▸ Abstract openReadableFile(url: string, options?: OpenReadableFileOptions): Promise<ReadableStreamTree>
Opens a file for reading. Optional version: fails if the version doesn't match, for GCS URLs.
Name | Type | Description |
---|---|---|
url | string | The URL of the file to read from. |
options? | OpenReadableFileOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:134
▸ Abstract openWritableFile(url: string, options?: OpenWritableFileOptions): Promise<WritableStreamTree>
Opens a file for writing. Optional version: fails if the version doesn't match, for GCS URLs.
Name | Type | Description |
---|---|---|
url | string | The URL of the file to write to. |
options? | OpenWritableFileOptions | - |
Returns: Promise<WritableStreamTree>
Defined in: src/fs.ts:144
▸ Abstract queueRemoveFile(urlText: string): Promise<boolean>
Queues deletion, e.g. after DaysSinceCustomTime.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:171
▸ Abstract readDirectory(urlText: string, options?: ReadDirectoryOptions): Promise<DirectoryEntry[]>
Returns the URLs of the files in a directory.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the directory to list files in. |
options? | ReadDirectoryOptions | - |
Returns: Promise<DirectoryEntry[]>
Defined in: src/fs.ts:94
▸ Abstract readDirectoryStream(urlText: string, options?: ReadDirectoryOptions): Promise<ReadableStreamTree>
Returns a stream of the URLs of the files in a directory.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the directory to list files in. |
options? | ReadDirectoryOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:100
▸ Abstract removeDirectory(urlText: string, options?: RemoveDirectoryOptions): Promise<boolean>
Removes the directory.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the directory. |
options? | RemoveDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:115
▸ Abstract removeFile(urlText: string): Promise<boolean>
Deletes the file.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:165
▸ Abstract replaceFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, options?: ReplaceFileOptions): Promise<boolean>
Replaces the file, failing if the file version doesn't match.
Name | Type | Description |
---|---|---|
urlText | string | The URL of the file to replace. |
writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for replacing the file. |
options? | ReplaceFileOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:194
The json module exports:
- newJSONLinesFormatter
- newJSONLinesParser
- parseJSON
- parseJSONLines
- pipeJSONFormatter
- pipeJSONLinesFormatter
- pipeJSONLinesParser
- pipeJSONParser
- readJSON
- readJSONHashed
- readJSONLines
- serializeJSON
- serializeJSONLines
- writeJSON
- writeJSONLines
- writeShardedJSONLines
• Const JSONStream: any
Defined in: src/json.ts:11
▸ Const newJSONLinesFormatter(): Transform
Returns: Transform
Defined in: src/json.ts:146
▸ Const newJSONLinesParser(): ThroughStream
Returns: ThroughStream
Defined in: src/json.ts:147
▸ parseJSON(stream: ReadableStreamTree): Promise<unknown>
Parses a JSON object from stream. Used to implement readJSON.
Name | Type | Description |
---|---|---|
stream | ReadableStreamTree | The stream to read a JSON object from. |
Returns: Promise<unknown>
Defined in: src/json.ts:72
▸ parseJSONLines(stream: ReadableStreamTree): Promise<unknown[]>
Parses JSON Lines from stream. Used to implement readJSONLines.
Name | Type | Description |
---|---|---|
stream | ReadableStreamTree | The stream to read JSON Lines from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:80
▸ pipeJSONFormatter(stream: WritableStreamTree, isArray: boolean): WritableStreamTree
Creates a JSON formatter stream.
Name | Type | Description |
---|---|---|
stream | WritableStreamTree | - |
isArray | boolean | Accept array objects or property tuples. |
Returns: WritableStreamTree
Defined in: src/json.ts:127
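One way to read the isArray description is that the formatter emits a JSON array when it is true, and builds a single JSON object from [key, value] tuples when it is false. The following sketch shows that reading as a plain function over an in-memory list. It is an interpretation, not the library's code: formatJSONSketch is a hypothetical name, and the real formatter works on streams rather than arrays.

```typescript
// Formats items as a JSON array (isArray = true) or as one JSON object built
// from [key, value] tuples (isArray = false). Interpretation is an assumption.
function formatJSONSketch(items: unknown[], isArray: boolean): string {
  if (isArray) {
    return '[' + items.map((item) => JSON.stringify(item)).join(',') + ']'
  }
  const tuples = items as Array<[string, unknown]>
  const properties = tuples.map(
    ([key, value]) => JSON.stringify(key) + ':' + JSON.stringify(value)
  )
  return '{' + properties.join(',') + '}'
}

console.log(formatJSONSketch([{ a: 1 }, { a: 2 }], true)) // [{"a":1},{"a":2}]
console.log(formatJSONSketch([['foo', 'bar'], ['n', 1]], false)) // {"foo":"bar","n":1}
```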
▸ pipeJSONLinesFormatter(stream: WritableStreamTree): WritableStreamTree
Creates a JSON-lines formatter stream.
Name | Type |
---|---|
stream | WritableStreamTree |
Returns: WritableStreamTree
Defined in: src/json.ts:142
▸ pipeJSONLinesParser(stream: ReadableStreamTree): ReadableStreamTree
Creates a JSON-lines parser stream.
Name | Type |
---|---|
stream | ReadableStreamTree |
Returns: ReadableStreamTree
Defined in: src/json.ts:119
▸ pipeJSONParser(stream: ReadableStreamTree, isArray: boolean): ReadableStreamTree
Creates a JSON parser stream.
Name | Type |
---|---|
stream | ReadableStreamTree |
isArray | boolean |
Returns: ReadableStreamTree
Defined in: src/json.ts:110
▸ readJSON(fileSystem: FileSystem, url: string): Promise<unknown>
Reads a serialized JSON object or array from a file.
Name | Type | Description |
---|---|---|
fileSystem | FileSystem | - |
url | string | The URL of the file to parse a JSON object or array from. |
Returns: Promise<unknown>
Defined in: src/json.ts:17
▸ readJSONHashed(fileSystem: FileSystem, url: string): Promise<[unknown, null | string]>
Reads a serialized JSON object from a file, and also hashes the file.
Name | Type | Description |
---|---|---|
fileSystem | FileSystem | - |
url | string | The URL of the file to parse a JSON object from. |
Returns: Promise<[unknown, null | string]>
Defined in: src/json.ts:25
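Reading and hashing in one pass can be sketched by feeding each chunk to an incremental hasher while accumulating the text to parse. This is only an illustration of the idea: parseJSONHashedSketch is a hypothetical name, chunks stand in for a stream, and MD5 is an assumed algorithm since the hash that readJSONHashed produces is not specified here. The real function may also return null for the hash, which this sketch does not model.

```typescript
import { createHash } from 'crypto'

// Parses JSON from text chunks and returns it with a hex digest of the same
// bytes, computed incrementally as the chunks arrive (MD5 is an assumption).
function parseJSONHashedSketch(chunks: string[]): [unknown, string] {
  const hasher = createHash('md5')
  let text = ''
  for (const chunk of chunks) {
    hasher.update(chunk)
    text += chunk
  }
  return [JSON.parse(text), hasher.digest('hex')]
}

const [parsedValue, parsedHash] = parseJSONHashedSketch(['{"foo":', '"bar"}'])
console.log(parsedValue, parsedHash.length)
```

Because the digest depends only on the concatenated bytes, the chunk boundaries do not affect the result.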
▸ readJSONLines(fileSystem: FileSystem, url: string): Promise<unknown[]>
Reads a serialized JSON-lines array from a file.
Name | Type | Description |
---|---|---|
fileSystem | FileSystem | - |
url | string | The URL of the file to parse a JSON-lines array from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:35
▸ serializeJSON(stream: WritableStreamTree, obj: object | any[]): Promise<boolean>
Serializes a JSON object to stream. Used to implement writeJSON.
Name | Type | Description |
---|---|---|
stream | WritableStreamTree | The stream to write a JSON object to. |
obj | object \| any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:88
▸ serializeJSONLines(stream: WritableStreamTree, obj: any[]): Promise<boolean>
Serializes an array to stream as JSON Lines. Used to implement writeJSONLines.
Name | Type | Description |
---|---|---|
stream | WritableStreamTree | The stream to write JSON Lines to. |
obj | any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:103
▸ writeJSON(fileSystem: FileSystem, url: string, value: object | any[]): Promise<boolean>
Serializes an object or array to a JSON file.
Name | Type | Description |
---|---|---|
fileSystem | FileSystem | - |
url | string | The URL of the file to serialize a JSON object or array to. |
value | object \| any[] | The object or array to serialize. |
Returns: Promise<boolean>
Defined in: src/json.ts:44
▸ writeJSONLines(fileSystem: FileSystem, url: string, obj: object[]): Promise<boolean>
Serializes an array to a JSON Lines file.
Name | Type | Description |
---|---|---|
fileSystem | FileSystem | - |
url | string | The URL of the file to serialize a JSON array to. |
obj | object[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:53
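For readers unfamiliar with the format, JSON Lines stores one JSON value per line, so an array of objects becomes one line per element. The sketch below shows the round trip on in-memory strings. The helper names toJSONLines and fromJSONLines are hypothetical and are not part of this module, which works with streams and files instead.

```typescript
// Serializes each row as one line of JSON, each terminated by a newline.
function toJSONLines(rows: object[]): string {
  return rows.map((row) => JSON.stringify(row) + '\n').join('')
}

// Parses one JSON value per non-empty line.
function fromJSONLines(text: string): unknown[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
}

const jsonLinesText = toJSONLines([{ id: 1, name: 'a' }, { id: 2, name: 'b' }])
console.log(jsonLinesText)
console.log(fromJSONLines(jsonLinesText).length) // 2
```

Because each line is independently parseable, a JSON Lines file can be processed as a stream or split into shards at line boundaries without reading the whole file.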
▸ writeShardedJSONLines(fileSystem: FileSystem, url: string, obj: object[], shards: number, shardFunction?: (x: object, modulus: number) => number): Promise<boolean>
Name | Type |
---|---|
fileSystem | FileSystem |
url | string |
obj | object[] |
shards | number |
shardFunction | (x: object, modulus: number) => number |
Returns: Promise<boolean>
Defined in: src/json.ts:57
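A function matching the shardFunction signature maps each object to a shard index below modulus. The sketch below hashes a key field with FNV-1a and partitions rows in memory. It rests on several assumptions: shardById and partitionRows are hypothetical names, the id field is an invented record key, and nothing here describes how writeShardedJSONLines names or writes its shard files.

```typescript
// 32-bit FNV-1a string hash, returned as an unsigned integer.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Matches the (x: object, modulus: number) => number signature.
// The "id" field is a hypothetical key; a missing id hashes as the empty string.
function shardById(x: object, modulus: number): number {
  const id = String((x as { id?: unknown }).id ?? '')
  return fnv1a(id) % modulus
}

// Groups rows into `shards` buckets using the given shard function.
function partitionRows(
  rows: object[],
  shards: number,
  shardFunction: (x: object, modulus: number) => number
): object[][] {
  const buckets: object[][] = Array.from({ length: shards }, () => [])
  for (const row of rows) buckets[shardFunction(row, shards)].push(row)
  return buckets
}

const sampleRows = [{ id: 'a' }, { id: 'b' }, { id: 'c' }, { id: 'a' }]
const buckets = partitionRows(sampleRows, 3, shardById)
console.log(buckets.map((bucket) => bucket.length))
```

Hashing a stable key keeps all rows with the same key in the same shard, which matters when shards are later merged or joined by key.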