Asynchronous rule-based file system walker. It:
- does things most regexp-based walkers can hardly do;
- uses super simple rule definitions;
- handles most file system errors by default;
- provides a powerful, extensible API;
- runs really fast.
This is what a simple demo app does on my old 2,7 GHz MacBook Pro:
Version 6 differs substantially from its predecessors.
The rest of this document describes the usage, API and version history.
NB: This package needs Node.js v12.12 or higher.
Install with yarn or npm:
yarn add dwalker ## npm i -S dwalker
The following code walks all given directory trees in parallel, gathering basic statistics:
const walker = new (require('dwalker')).Walker()
const dirs = '/dev ..'.split(' ')

Promise.all(dirs.map(dir => walker.walk(dir))).then(res => {
  console.log('Done(%d):', res.length)
}).catch(error => {
  console.log('EXCEPTION!', error)
}).finally(() => {
  console.log(walker.stats)
})
// -> Done(1): { dirs: 8462, entries: 65444, errors: 2472, retries: 0, revoked: 0 }
// -> Elapsed: 1012 ms
The Walker#walk() method recursively walks the directory tree breadth-first.
It scans all directory entries, invoking the handler functions as it goes,
while keeping track of its internal rule tree.
For speed, all this is done asynchronously.
Please have a look at the core concepts, if you haven't done so already.
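To picture what a breadth-first, asynchronous walk involves, here is a minimal sketch built only on Node.js' own fs.promises.opendir() (available since v12.12). It is not dwalker's implementation: the function name walkBreadthFirst, its onEntry callback and the reduced statistics object are illustrative assumptions, loosely modelled on Walker#stats. Each depth level is scanned concurrently before the walk descends to the next one.

```javascript
// Hypothetical sketch of an asynchronous breadth-first walk (not dwalker's code).
const fs = require('fs')
const path = require('path')

async function walkBreadthFirst (startPath, onEntry = () => undefined) {
  const stats = { dirs: 0, entries: 0, errors: 0 }
  let level = [path.resolve(startPath)]       // directories at the current depth

  while (level.length > 0) {
    const next = []                            // directories one level deeper
    // Scan all directories of this level concurrently.
    await Promise.all(level.map(async (dirPath) => {
      try {
        const dir = await fs.promises.opendir(dirPath)
        stats.dirs += 1
        // The async iterator closes `dir` automatically when the loop ends.
        for await (const dirent of dir) {
          stats.entries += 1
          onEntry(dirent, dirPath)
          // isDirectory() is false for symlinks, so links are not followed.
          if (dirent.isDirectory()) next.push(path.join(dirPath, dirent.name))
        }
      } catch (error) {
        stats.errors += 1                      // e.g. EACCES: count it and go on
      }
    }))
    level = next
  }
  return stats
}

// Usage: walkBreadthFirst('..').then(stats => console.log(stats))
```

Running a whole level at once keeps the sketch short, but on large trees it can exhaust file handles; a real walker has to throttle or retry (compare the retries statistic below).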
Contents: package exports, Walker, common helpers, special helpers, rule system
Types referred to below are declared in src/typedefs.js.
Most of the magic happens here. For details, see: methods, properties, class/static API, protected API, and exception handling.
constructor(options : {TWalkerOptions})
Recognized options:
- avoid : string | string[] - the avoid() instance method will be called with this value;
- interval : number= - sets the interval instance property;
- rules : * - rule definitions, or a Ruler instance to be cloned;
- symlinks : boolean= - enables symbolic link checking by the onEntry() handler.
A Walker instance stores all given options (even unrecognized ones) in its private _options property.
See the separate description of the onDir(), onEntry() and onFinal() handler methods.
avoid(...path) : Walker - method
Injects the paths into the visited collection, thus preventing them from being visited.
The arguments must be strings or arrays of strings: absolute or relative paths.
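A visited collection of the kind avoid() feeds can be sketched as a Set of resolved absolute paths. The class VisitedPaths below is hypothetical and only illustrates the idea: relative paths are resolved against process.cwd() with path.resolve(), and one level of arrays is flattened, matching the accepted argument forms.

```javascript
// Hypothetical sketch of a visited-paths collection (not dwalker's code).
const path = require('path')

class VisitedPaths {
  constructor () {
    this.visited = new Set()
  }

  // Accepts strings or arrays of strings, like Walker#avoid(...path).
  avoid (...args) {
    for (const p of args.flat()) {
      if (typeof p !== 'string') throw new TypeError(`not a path: ${p}`)
      this.visited.add(path.resolve(p))       // also drops a trailing separator
    }
    return this                               // chainable
  }

  // A walker would skip a directory when this returns true.
  has (p) {
    return this.visited.has(path.resolve(p))
  }
}
```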
getDataFor(dirPath) : * - method
Provides access to the data in the internal dictionary. Empty entries are created there before the onDir() handler is called. The Walker itself does not use those values.
getOverride(error) : number - method
Returns an overriding action code (if any) for the current exception and its context.
The Walker calls this method internally and assigns its numeric return value to error.context.override before calling its onError() method. A non-numeric return value has no effect. Instead of overriding this method, you can directly modify the overrides export of the package.
onError(error : Error, context : TDirContext) : * - method
Called with a trapped error after error.context has been set up.
The default implementation just returns error.context.override.
The returned action code is checked for special values; a non-numeric return value means an unexpected error, which rejects the walk promise.
The Walker may provide the following context.locus values: 'onDir', 'openDir', 'iterateDir', 'onEntry', 'closeDir', 'onFinal'.
Overriding handlers may define their own locus names.
reset([hard : boolean]) : Walker - method
Resets a possible STC. If hard is true, it resets all internal state properties, including those available via stats.
Calling this method during a walk throws an unrecoverable error.
tick(count : number) - method
Called automatically during a walk. The default does nothing.
Override this for progress monitoring etc.
trace(handlerName, result, context, args) - method
Called right after every handler call. Use this for debugging only!
The default is an empty function.
walk(startPath : string, [options : TWalkOptions]) : Promise - method
Walks the walk. The startPath may be any valid pathname, defaulting to process.cwd().
Via options you can override the trace() method and any handler methods, as well as the data and ruler instance properties.
The promise resolves to data or to a non-numeric value returned by a handler, or rejects with an unexpected error instance.
duration : number - microseconds elapsed from the start of the current walk batch, or the duration of the most recent batch.
failures : Error[] - any exceptions overridden during a walk.
The Error instances in there will have a context : TDirContext property set.
ruler : Ruler - the initial ruler instance for a new walk.
stats : Object r/o - general statistics as an object with numeric properties:
- dirs - number of visited directories;
- entries - number of checked directory entries;
- errors - number of exceptions encountered;
- retries - number of operation retries (e.g. when running out of file handles);
- revoked - number of directories recognized as already visited (may happen with the symlinks option set);
- walks - number of currently active walks.
walks : number r/o - number of currently active walks.
All those are directly available via the package exports.
newRuler(...args) : Ruler - factory method.
overrides : Object - error override rules as a tree: (locus -> error.code -> actionCode).
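A lookup in a tree shaped like locus -> error.code -> actionCode takes two property accesses. The sketch below is an illustration under assumptions: the table contents, the numeric DO_SKIP value and the name findOverride are made up, while the locus names come from the onError() description. As with getOverride(), only a numeric result counts as an override.

```javascript
// Hypothetical override table: locus -> error.code -> actionCode.
// The codes and values are examples only, not dwalker's defaults.
const DO_SKIP = 100                           // hypothetical numeric value
const overrides = {
  openDir: { EACCES: DO_SKIP, EPERM: DO_SKIP, ENOENT: DO_SKIP },
  onEntry: { ENOENT: DO_SKIP }
}

// Returns the numeric override for an error in its context, or undefined.
function findOverride (table, error) {
  const byCode = table[error.context && error.context.locus]
  const action = byCode && byCode[error.code]
  return typeof action === 'number' ? action : undefined
}
```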
shadow : string[] - mask for omitting certain parts of the context parameter before injecting it into an Error instance for logging.
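The effect of such a mask can be sketched as a filtered copy of the context object. attachContext and the sample property names are hypothetical; the point is that masked properties (for example, bulky object references) never reach the logged Error, while the original context stays intact.

```javascript
// Hypothetical sketch: copy a context into an Error, omitting masked keys.
function attachContext (error, context, shadow = []) {
  const snapshot = {}
  for (const key of Object.keys(context)) {
    if (!shadow.includes(key)) snapshot[key] = context[key]
  }
  // Node's util.inspect() prints own properties like this after the stack.
  error.context = snapshot
  return error
}
```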
The protected API is described in a separate document.
The good news is: whatever happens during a walk, the Walker instance won't throw an exception!
If an exception occurs and there is an override defined for it, a new entry will be added to the failures instance property, and the walk will continue.
Without an override defined, however, we'll have an unexpected exception.
In this case, the walk terminates by rejecting its promise with an augmented Error instance, and the example program above would output something like this:
EXCEPTION! TypeError: Cannot read property 'filter' of undefined
    at ProjectWalker.onDir (/Users/me/dev-npm/nsweep/lib/ProjectWalker.js:111:38)
    at async doDir (/Users/me/dev-npm/nsweep/node_modules/dwalker/src/Walker.js:491:15) {
  context: {
    depth: 0,
    dirPath: '/Users/me/dev-npm/nsweep',
    done: undefined,
    locus: 'onDir',
    rootPath: '/Users/me/dev-npm/nsweep',
    override: undefined
  }
}
An error stack combined with a walk context snapshot should be enough to spot the bug.
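The flow described in this section (run a handler, attach the context on failure, consult an override, then either record the failure or reject) can be sketched as follows. guarded() and its parameters are hypothetical; getOverride stands for any function with the contract of Walker#getOverride().

```javascript
// Hypothetical sketch of the exception flow (not dwalker's own code).
// getOverride(error) returns a numeric action code, or anything else.
async function guarded (locus, handler, context, getOverride, failures) {
  try {
    return await handler()
  } catch (error) {
    error.context = Object.assign({}, context, { locus })  // context snapshot
    const override = getOverride(error)
    error.context.override = typeof override === 'number' ? override : undefined
    if (error.context.override === undefined) throw error   // unexpected: reject
    failures.push(error)                                      // expected: record it
    return error.context.override                             // and carry on
  }
}
```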
These helpers are available via the package exports and may be useful when writing handlers.
checkDirEntryType(type : TEntryType) : TEntryType - function
Returns the argument if it is a valid type code; throws an assertion error otherwise.
dirEntryTypeToLabel(type : TEntryType, [inPlural : boolean]) : string - function
Returns a human-readable type name for a valid type; throws an assertion error otherwise.
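A lookup table is enough to sketch both helpers. The names checkType and typeToLabel, the label strings, and the use of RangeError instead of an assertion error are illustrative assumptions; only the 'd', 'f' and 'l' codes appear elsewhere in this document.

```javascript
// Hypothetical type-label table; not dwalker's S_TYPES or its labels.
const TYPE_LABELS = {
  d: ['directory', 'directories'],
  f: ['file', 'files'],
  l: ['symlink', 'symlinks']
}

// Returns the type if it is known; throws otherwise.
function checkType (type) {
  if (!Object.prototype.hasOwnProperty.call(TYPE_LABELS, type)) {
    throw new RangeError(`invalid entry type: ${JSON.stringify(type)}`)
  }
  return type
}

function typeToLabel (type, inPlural = false) {
  return TYPE_LABELS[checkType(type)][inPlural ? 1 : 0]
}
```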
makeDirEntry(name : string, type : TEntryType, [action : number]) : TDirEntry - function
Constructs and returns a new directory entry with action defaulting to DO_NOTHING.
makeDirEntry(nativeEntry : fs.Dirent) : TDirEntry - function
Returns a new directory entry based on a Node.js native one.
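Converting a native fs.Dirent mostly means asking its type predicates in turn. In this sketch, entryFromDirent, the { name, type, action } record shape, the letters other than 'd', 'f' and 'l', and the DO_NOTHING value of 0 are all assumptions; within dwalker, use makeDirEntry() itself.

```javascript
// Hypothetical conversion of fs.Dirent into a plain entry record.
const fs = require('fs')

const DO_NOTHING = 0                          // hypothetical value

function entryFromDirent (dirent, action = DO_NOTHING) {
  let type
  if (dirent.isDirectory()) type = 'd'
  else if (dirent.isFile()) type = 'f'
  else if (dirent.isSymbolicLink()) type = 'l'
  else if (dirent.isBlockDevice()) type = 'b'  // letters from here on are assumed
  else if (dirent.isCharacterDevice()) type = 'c'
  else if (dirent.isFIFO()) type = 'p'
  else if (dirent.isSocket()) type = 's'
  else type = '?'
  return { name: dirent.name, type, action }
}
```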
To use these helpers, load them explicitly first, e.g.:
const symlinksFinal = require('dwalker/symlinksFinal')
pathTranslate(path, [absolute]) : string - function
Translates the path from POSIX to native format and resolves a leading '~' to the user's home directory. If absolute is set, it also makes the path absolute, always ending with a path separator.
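The described behaviour of pathTranslate() can be approximated with os.homedir() and the path module. translatePath below is a hypothetical stand-in, not the package's code; it only expands a bare leading '~' (not '~user').

```javascript
// Hypothetical approximation of pathTranslate() (not the package's code).
const os = require('os')
const path = require('path')

function translatePath (p, absolute = false) {
  let result = p.split('/').join(path.sep)   // POSIX -> native separators
  if (result === '~' || result.startsWith('~' + path.sep)) {
    result = os.homedir() + result.slice(1)  // leading '~' -> home directory
  }
  if (absolute) {
    result = path.resolve(result)            // resolve() drops a trailing separator
    if (!result.endsWith(path.sep)) result += path.sep
  }
  return result
}
```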
relativize(path, [rootPath, [prefix]]) : string - function
Strips the rootPath part (defaulting to homeDir) from the given path, if it is there.
The optional prefix string will be prepended to the resulting relative path.
May help to make some reports easier to read.
relativize.homeDir : string - initialized to the current user's home directory.
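A similar stand-in for relativize(), with homeDir attached as a function property, might look like this. relativizeSketch is hypothetical; the root is compared with a trailing separator appended, so that a root of '/home/me' does not strip part of '/home/meg/...'.

```javascript
// Hypothetical approximation of relativize() (not the package's code).
const os = require('os')
const path = require('path')

function relativizeSketch (p, rootPath = relativizeSketch.homeDir, prefix = '') {
  // Compare with a trailing separator so that sibling names never match.
  const root = rootPath.endsWith(path.sep) ? rootPath : rootPath + path.sep
  return p.startsWith(root) ? prefix + p.slice(root.length) : p
}
relativizeSketch.homeDir = os.homedir()
```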
symlinksFinal(entries, context) : * - async handler
Use it inside the onFinal handler for following symbolic links. Example:
const onFinal = function (entries, context) {
  return this._useSymLinks
    ? symlinksFinal.call(this, entries, context) : Promise.resolve(0)
}
The main goal here was to keep rules simple (atomic), even when describing context-sensitive rules and special exclusions.
Rule definitions are tuples (action-code, {pattern}), with patterns quite similar to bash glob patterns or .gitignore rules. Example:
ruler.add(
  DO_SKIP, '.*', '!/.git/', 'node_modules/', 'test/**/*',
  11, 'package.json', '/.git/', '/LICENSE;f', '*;l')
Here the first rule tells the ruler to ignore the dreaded node_modules directory and any entries starting with '.', except the top-level .git directory. Also, nothing under the test directory, wherever found, will count. The trailing '/' indicates a directory.
The second rule asks for some sort of special care to be taken for all package.json entries regardless of their type, for the top-level .git directory, for the top-level LICENSE file and for all symbolic links. And, yes, .weirdos/package.json will be ignored, because the '.*' pattern skips the .weirdos directory itself.
Without an explicit type, all rules created are either typeless or T_DIR ('d').
An explicit type (like the ';f' and ';l' suffixes above) must match one in the S_TYPES constant.
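The pattern conventions visible in the example ('!' negation, a leading '/' for top-level rules, a trailing '/' for directories, a ';type' suffix and '**' for any depth) can be parsed roughly as below. globToRegExp and parseRule are hypothetical; the generated expressions are equivalent to, but not identical with, those in the Ruler dump (e.g. /^\..*$/ rather than /^\./), and character classes such as [abc] are treated literally.

```javascript
// Hypothetical, simplified rule-pattern parsing (not dwalker's source).

// One glob segment -> anchored RegExp: '*' -> '.*', '?' -> '.',
// everything else (including '[' and ']') is matched literally.
function globToRegExp (segment) {
  const body = segment
    .replace(/[.+^${}()|[\]\\]/g, '\\$&')   // escape regex metacharacters
    .replace(/\*/g, '.*')
    .replace(/\?/g, '.')
  return new RegExp('^' + body + '$')
}

function parseRule (pattern) {
  let p = pattern
  let type = ' '                            // typeless unless stated otherwise
  const negated = p.startsWith('!')         // '!' screens a matching rule
  if (negated) p = p.slice(1)
  const semi = p.lastIndexOf(';')           // ';f', ';l' ... explicit type
  if (semi >= 0) {
    type = p.slice(semi + 1)
    p = p.slice(0, semi)
  }
  const topLevel = p.startsWith('/')        // leading '/': top level only
  if (p.endsWith('/')) {                    // trailing '/': a directory
    type = 'd'
    p = p.slice(0, -1)
  }
  // '**' (any depth) is kept as null, like the null expressions in the dump.
  const segments = p.split('/').filter(Boolean)
    .map((s) => (s === '**' ? null : globToRegExp(s)))
  return { negated, topLevel, type, segments }
}
```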
Behind the scenes, a Ruler instance creates and interprets a rule tree formed as an array of records (type, expression, ancestorIndex, actionCode).
For the above example, the Ruler dump would look like this:
node  type  regex             parent  action
--------------------------------------------------
   0  'd'   null                  -1  DO_NOTHING
   1  ' '   /^\./                  0  DO_SKIP
   2  'd'   /^\.git$/             -1  -DO_SKIP
   3  'd'   /^node_modules$/       0  DO_SKIP
   4  'd'   /^test$/              -1  DO_NOTHING
   5  'd'   null                   4  DO_NOTHING
   6  ' '   /./                    5  DO_SKIP
   7  ' '   /^package\.json$/      0  11
   8  'd'   /^\.git$/             -1  11
   9  'f'   /^LICENSE$/           -1  11
  10  'l'   /./                    0  11
_ancestors: [ [ 0, -1 ] ]
The internal ancestors array contains tuples (actionCode, ruleIndex).
The Ruler#check() method, typically called from Walker#onEntry(), finds all rules matching the given entry (name, type) and fills in the lastMatch array, analogous to the ancestors array. Then it returns the most prominent (the highest) action code value. DO_SKIP and other system action codes prevail over user-defined codes simply because they have higher values.
A negative value screens (masks) the actual one. Do not use negative values in rule definitions; the ruler will do this for you when it encounters a pattern starting with '!'.
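Ignoring the ancestor bookkeeping, the selection step of check() can be sketched like this. resolveAction and the numeric values of DO_NOTHING and DO_SKIP are assumptions; what matters is that system codes outrank user codes such as 11, and that a negative code cancels its positive counterpart. With the example rules, a top-level .git entry matches DO_SKIP (via '.*'), -DO_SKIP (via '!/.git/') and 11 (via '/.git/'), so the result is 11.

```javascript
// Hypothetical sketch of "highest code wins, negative codes screen".
const DO_NOTHING = 0                          // hypothetical values; system codes
const DO_SKIP = 100                           // are just higher than user codes

function resolveAction (matchedCodes) {
  // Each negative code (from a '!' pattern) cancels the same positive code.
  const screened = new Set(matchedCodes.filter((c) => c < 0).map((c) => -c))
  let best = DO_NOTHING
  for (const code of matchedCodes) {
    if (code > best && !screened.has(code)) best = code
  }
  return best
}
```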
Sub-directories opened later will inherit new Ruler instances with ancestors set to the lastMatch contents from the upper level.
So, the actual rule matching is trivial, and the rules can be switched dynamically.
For further details, check the Ruler reference and the special demo app.
- v6.0.0 @20201225
- cleaned code and API (breaking changes) after using dwalker in some actual projects, so the basic use cases are clear now. As the general concepts persist, migration should not be a major headache, and reading the updated core concepts should help.
- v5.2.0 @20201202
- added: Walker#getOverride instance method.
- v5.1.0 @20201121
- removed: hadAction(), hasAction() Ruler instance methods.
- v5.0.0 @20201120
- Walker totally re-designed (a breaking change);
- Ruler#check() refactored (a non-breaking change);
- documentation and examples re-designed.
- v4.0.0 @20200218
- several important fixes;
- Walker throws an error on an illegal action code returned by a handler;
- added: Walker#expectedErrors, removed: Walker#getMaster;
- added: check(), hadAction(), hasAction() to Ruler, removed: match();
- up-to-date documentation;
- v3.1.0 @20200217
- v3.0.0 @20200211
- v2.0.0 @20200126
- v1.0.0 @20200124
- v0.8.3 @20200123: first (remotely) airworthy version.