Skip to content

Commit

Permalink
add option decodeStrings. closes #42. fix documentation for entry.fil…
Browse files Browse the repository at this point in the history
…eComment. closes #47.
  • Loading branch information
thejoshwolfe committed Oct 25, 2016
1 parent 828710b commit cc7455a
Show file tree
Hide file tree
Showing 3 changed files with 222 additions and 134 deletions.
82 changes: 61 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,7 @@ Design principles:
* Never crash (if used properly).
Don't let malformed zip files bring down client applications who are trying to catch errors.
* Catch unsafe file names.
A zip file entry throws an error if its file name starts with `"/"` or `/[A-Za-z]:\//`
or if it contains `".."` path segments or `"\\"` (per the spec).
See `validateFileName()`.

## Usage

Expand Down Expand Up @@ -70,9 +69,9 @@ function defaultCallback(err) {

### open(path, [options], [callback])

Calls `fs.open(path, "r")` and gives the `fd`, `options`, and `callback` to `fromFd()` below.
Calls `fs.open(path, "r")` and reads the `fd` effectively the same as `fromFd()` would.

`options` may be omitted or `null`. The defaults are `{autoClose: true, lazyEntries: false}`.
`options` may be omitted or `null`. The defaults are `{autoClose: true, lazyEntries: false, decodeStrings: true}`.

`autoClose` is effectively equivalent to:

Expand All @@ -90,45 +89,54 @@ See [issue #22](https://github.com/thejoshwolfe/yauzl/issues/22).
If `lazyEntries` is `true`, an `entry` or `end` event will be emitted in response to each call to `readEntry()`.
This allows processing of one entry at a time, and will keep memory usage under control for zip files with many entries.

`decodeStrings` is the default and causes yauzl to decode strings with `CP437` or `UTF-8` as required by the spec.
The exact effects of turning this option off are:

* `zipfile.comment`, `entry.fileName`, and `entry.fileComment` will be `Buffer` objects instead of `String`s.
* Any Info-ZIP Unicode Path Extra Field will be ignored. See `extraFields`.
* Automatic file name validation will not be performed. See `validateFileName()`.

The `callback` is given the arguments `(err, zipfile)`.
An `err` is provided if the End of Central Directory Record cannot be found, or if its metadata appears malformed.
This kind of error usually indicates that this is not a zip file.
Otherwise, `zipfile` is an instance of `ZipFile`.

### fromFd(fd, [options], [callback])

Reads from the fd, which is presumed to be an open .zip file.
Note that random access is required by the zip file specification,
so the fd cannot be an open socket or any other fd that does not support random access.

The `callback` is given the arguments `(err, zipfile)`.
An `err` is provided if the End of Central Directory Record Signature cannot be found in the file,
which indicates that the fd is not a zip file.
`zipfile` is an instance of `ZipFile`.
`options` may be omitted or `null`. The defaults are `{autoClose: false, lazyEntries: false, decodeStrings: true}`.

`options` may be omitted or `null`. The defaults are `{autoClose: false, lazyEntries: false}`.
See `open()` for the meaning of the options.
See `open()` for the meaning of the options and callback.

### fromBuffer(buffer, [options], [callback])

Like `fromFd()`, but reads from a RAM buffer instead of an open file.
`buffer` is a `Buffer`.
`callback` is effectively passed directly to `fromFd()`.

If a `ZipFile` is acquired from this method,
it will never emit the `close` event,
and calling `close()` is not necessary.

`options` may be omitted or `null`. The defaults are `{lazyEntries: false}`.
See `open()` for the meaning of the options.
`options` may be omitted or `null`. The defaults are `{lazyEntries: false, decodeStrings: true}`.

See `open()` for the meaning of the options and callback.
The `autoClose` option is ignored for this method.

### fromRandomAccessReader(reader, totalSize, [options], [callback])

This method of creating a zip file allows clients to implement their own back-end file system.
This method of reading a zip file allows clients to implement their own back-end file system.
For example, a client might translate read calls into network requests.

The `reader` parameter must be of a type that is a subclass of
[RandomAccessReader](#class-randomaccessreader) that implements the required methods.
The `totalSize` is a Number and indicates the total file size of the zip file.

`options` may be omitted or `null`. The defaults are `{autoClose: true, lazyEntries: false}`.
See `open()` for the meaning of the options.
`options` may be omitted or `null`. The defaults are `{autoClose: true, lazyEntries: false, decodeStrings: true}`.

See `open()` for the meaning of the options and callback.

### dosDateTimeToDate(date, time)

Expand All @@ -137,6 +145,20 @@ Each parameter is a `Number` treated as an unsigned 16-bit integer.
Note that this format does not support timezones,
so the returned object will use the local timezone.

### validateFileName(fileName)

Returns `null` or a `String` error message depending on the validity of `fileName`.
If `fileName` starts with `"/"` or `/[A-Za-z]:\//` or if it contains `".."` path segments or `"\\"`,
this function returns an error message appropriate for use like this:

```js
var errorMessage = yauzl.validateFileName(fileName);
if (errorMessage != null) throw new Error(errorMessage);
```

This function is automatically run for each entry, as long as `decodeStrings` is `true`.
See `open()` and `Event: "entry"` for more information.

### Class: ZipFile

The constructor for the class is not part of the public API.
Expand All @@ -147,6 +169,9 @@ Use `open()`, `fromFd()`, `fromBuffer()`, or `fromRandomAccessReader()` instead.
Callback gets `(entry)`, which is an `Entry`.
See `open()` and `readEntry()` for when this event is emitted.

If `decodeStrings` is `true`, entries emitted via this event have already passed file name validation.
See `validateFileName()` and `open()` for more information.

#### Event: "end"

Emitted after the last `entry` event has been emitted.
Expand Down Expand Up @@ -240,6 +265,8 @@ This includes situations where the `autoClose` option effectively calls this fun

`String`. Always decoded with `CP437` per the spec.

If `decodeStrings` is `false` (see `open()`), this field is the undecoded `Buffer` instead of a decoded `String`.

### Class: Entry

Objects of this class represent Central Directory Records.
Expand Down Expand Up @@ -271,8 +298,12 @@ Following the spec, the bytes for the file name are decoded with
Alternatively, this field may be populated from the Info-ZIP Unicode Path Extra Field
(see `extraFields`).

If `fileName` would contain unsafe characters, such as an absolute path or
a relative directory, yauzl emits an error instead of an entry.
This field is automatically validated by `validateFileName()` before yauzl emits an "entry" event.
If this field would contain unsafe characters, yauzl emits an error instead of an entry.

If `decodeStrings` is `false` (see `open()`), this field is the undecoded `Buffer` instead of a decoded `String`.
Therefore, `generalPurposeBitFlag` and any Info-ZIP Unicode Path Extra Field are ignored.
Furthermore, no automatic file name validation is performed for this file name.

#### extraFields

Expand All @@ -288,16 +319,22 @@ to convey `UTF-8` file names.
When the field is identified and verified to be reliable (see the zipfile spec),
the the file name in this field is stored in the `fileName` property,
and the file name in the central directory record for this entry is ignored.
Note that when `decodeStrings` is false, all Info-ZIP Unicode Path Extra Fields are ignored.

None of the other fields are considered significant by this library.
Fields that this library reads are left unalterned in the `extraFields` array.

#### comment
#### fileComment

`String` decoded with the the charset indicated by `generalPurposeBitFlag & 0x800`
as with the `fileName`.
`String` decoded with the charset indicated by `generalPurposeBitFlag & 0x800` as with the `fileName`.
(The Info-ZIP Unicode Path Extra Field has no effect on the charset used for this field.)

If `decodeStrings` is `false` (see `open()`), this field is the undecoded `Buffer` instead of a decoded `String`.

Prior to yauzl version 2.7.0, this field was erroneously documented as `comment` instead of `fileComment`.
For compatibility with any code that uses the field name `comment`,
yauzl creates an alias field named `comment` which is identical to `fileComment`.

#### getLastModDate()

Effectively implemented as:
Expand Down Expand Up @@ -453,6 +490,9 @@ This library makes no attempt to interpret the Language Encoding Flag.

## Change History

* 2.7.0
* Added option `decodeStrings`. [issue #42](https://github.com/thejoshwolfe/yauzl/issues/42)
* Fixed documentation for `entry.fileComment` and added compatibility alias. [issue #47](https://github.com/thejoshwolfe/yauzl/issues/47)
* 2.6.0
* Support Info-ZIP Unicode Path Extra Field, used by WinRAR for Chinese file names. [issue #33](https://github.com/thejoshwolfe/yauzl/issues/33)
* 2.5.0
Expand Down
98 changes: 60 additions & 38 deletions index.js
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ exports.fromFd = fromFd;
exports.fromBuffer = fromBuffer;
exports.fromRandomAccessReader = fromRandomAccessReader;
exports.dosDateTimeToDate = dosDateTimeToDate;
exports.validateFileName = validateFileName;
exports.ZipFile = ZipFile;
exports.Entry = Entry;
exports.RandomAccessReader = RandomAccessReader;
Expand All @@ -25,6 +26,7 @@ function open(path, options, callback) {
if (options == null) options = {};
if (options.autoClose == null) options.autoClose = true;
if (options.lazyEntries == null) options.lazyEntries = false;
if (options.decodeStrings == null) options.decodeStrings = true;
if (callback == null) callback = defaultCallback;
fs.open(path, "r", function(err, fd) {
if (err) return callback(err);
Expand All @@ -43,6 +45,7 @@ function fromFd(fd, options, callback) {
if (options == null) options = {};
if (options.autoClose == null) options.autoClose = false;
if (options.lazyEntries == null) options.lazyEntries = false;
if (options.decodeStrings == null) options.decodeStrings = true;
if (callback == null) callback = defaultCallback;
fs.fstat(fd, function(err, stats) {
if (err) return callback(err);
Expand All @@ -59,6 +62,7 @@ function fromBuffer(buffer, options, callback) {
if (options == null) options = {};
options.autoClose = false;
if (options.lazyEntries == null) options.lazyEntries = false;
if (options.decodeStrings == null) options.decodeStrings = true;
// i got your open file right here.
var reader = fd_slicer.createFromBuffer(buffer);
fromRandomAccessReader(reader, buffer.length, options, callback);
Expand All @@ -72,6 +76,8 @@ function fromRandomAccessReader(reader, totalSize, options, callback) {
if (options == null) options = {};
if (options.autoClose == null) options.autoClose = true;
if (options.lazyEntries == null) options.lazyEntries = false;
if (options.decodeStrings == null) options.decodeStrings = true;
var decodeStrings = !!options.decodeStrings;
if (callback == null) callback = defaultCallback;
if (typeof totalSize !== "number") throw new Error("expected totalSize parameter to be a number");
if (totalSize > Number.MAX_SAFE_INTEGER) {
Expand Down Expand Up @@ -120,10 +126,11 @@ function fromRandomAccessReader(reader, totalSize, options, callback) {
}
// 22 - Comment
// the encoding is always cp437.
var comment = bufferToString(eocdrBuffer, 22, eocdrBuffer.length, false);
var comment = decodeStrings ? decodeBuffer(eocdrBuffer, 22, eocdrBuffer.length, false)
: eocdrBuffer.slice(22);

if (!(entryCount === 0xffff || centralDirectoryOffset === 0xffffffff)) {
return callback(null, new ZipFile(reader, centralDirectoryOffset, totalSize, entryCount, comment, options.autoClose, options.lazyEntries));
return callback(null, new ZipFile(reader, centralDirectoryOffset, totalSize, entryCount, comment, options.autoClose, options.lazyEntries, decodeStrings));
}

// ZIP64 format
Expand Down Expand Up @@ -164,7 +171,7 @@ function fromRandomAccessReader(reader, totalSize, options, callback) {
// 48 - offset of start of central directory with respect to the starting disk number 8 bytes
centralDirectoryOffset = readUInt64LE(zip64EocdrBuffer, 48);
// 56 - zip64 extensible data sector (variable size)
return callback(null, new ZipFile(reader, centralDirectoryOffset, totalSize, entryCount, comment, options.autoClose, options.lazyEntries));
return callback(null, new ZipFile(reader, centralDirectoryOffset, totalSize, entryCount, comment, options.autoClose, options.lazyEntries, decodeStrings));
});
});
return;
Expand All @@ -174,7 +181,7 @@ function fromRandomAccessReader(reader, totalSize, options, callback) {
}

util.inherits(ZipFile, EventEmitter);
function ZipFile(reader, centralDirectoryOffset, fileSize, entryCount, comment, autoClose, lazyEntries) {
function ZipFile(reader, centralDirectoryOffset, fileSize, entryCount, comment, autoClose, lazyEntries, decodeStrings) {
var self = this;
EventEmitter.call(self);
self.reader = reader;
Expand All @@ -193,6 +200,7 @@ function ZipFile(reader, centralDirectoryOffset, fileSize, entryCount, comment,
self.entriesRead = 0;
self.autoClose = !!autoClose;
self.lazyEntries = !!lazyEntries;
self.decodeStrings = !!decodeStrings;
self.isOpen = true;
self.emittedError = false;

Expand Down Expand Up @@ -274,7 +282,8 @@ ZipFile.prototype.readEntry = function() {
if (self.emittedError) return;
// 46 - File name
var isUtf8 = (entry.generalPurposeBitFlag & 0x800) !== 0;
entry.fileName = bufferToString(buffer, 0, entry.fileNameLength, isUtf8);
entry.fileName = self.decodeStrings ? decodeBuffer(buffer, 0, entry.fileNameLength, isUtf8)
: buffer.slice(0, entry.fileNameLength);

// 46+n - Extra field
var fileCommentStart = entry.fileNameLength + entry.extraFieldLength;
Expand All @@ -297,7 +306,10 @@ ZipFile.prototype.readEntry = function() {
}

// 46+n+m - File comment
entry.fileComment = bufferToString(buffer, fileCommentStart, fileCommentStart + entry.fileCommentLength, isUtf8);
entry.fileComment = self.decodeStrings ? decodeBuffer(buffer, fileCommentStart, fileCommentStart + entry.fileCommentLength, isUtf8)
: buffer.slice(fileCommentStart, fileCommentStart + entry.fileCommentLength);
// compatibility hack for https://github.com/thejoshwolfe/yauzl/issues/47
entry.comment = entry.fileComment;

self.readEntryCursor += buffer.length;
self.entriesRead += 1;
Expand Down Expand Up @@ -348,29 +360,31 @@ ZipFile.prototype.readEntry = function() {

// check for Info-ZIP Unicode Path Extra Field (0x7075)
// see https://github.com/thejoshwolfe/yauzl/issues/33
for (var i = 0; i < entry.extraFields.length; i++) {
var extraField = entry.extraFields[i];
if (extraField.id === 0x7075) {
if (extraField.data.length < 6) {
// too short to be meaningful
continue;
}
// Version 1 byte version of this extra field, currently 1
if (extraField.data.readUInt8(0) !== 1) {
// > Changes may not be backward compatible so this extra
// > field should not be used if the version is not recognized.
continue;
}
// NameCRC32 4 bytes File Name Field CRC32 Checksum
var oldNameCrc32 = extraField.data.readUInt32LE(1);
if (crc32.unsigned(buffer.slice(0, entry.fileNameLength)) !== oldNameCrc32) {
// > If the CRC check fails, this UTF-8 Path Extra Field should be
// > ignored and the File Name field in the header should be used instead.
continue;
if (self.decodeStrings) {
for (var i = 0; i < entry.extraFields.length; i++) {
var extraField = entry.extraFields[i];
if (extraField.id === 0x7075) {
if (extraField.data.length < 6) {
// too short to be meaningful
continue;
}
// Version 1 byte version of this extra field, currently 1
if (extraField.data.readUInt8(0) !== 1) {
// > Changes may not be backward compatible so this extra
// > field should not be used if the version is not recognized.
continue;
}
// NameCRC32 4 bytes File Name Field CRC32 Checksum
var oldNameCrc32 = extraField.data.readUInt32LE(1);
if (crc32.unsigned(buffer.slice(0, entry.fileNameLength)) !== oldNameCrc32) {
// > If the CRC check fails, this UTF-8 Path Extra Field should be
// > ignored and the File Name field in the header should be used instead.
continue;
}
// UnicodeName Variable UTF-8 version of the entry File Name
entry.fileName = decodeBuffer(extraField.data, 5, extraField.data.length, true);
break;
}
// UnicodeName Variable UTF-8 version of the entry File Name
entry.fileName = bufferToString(extraField.data, 5, extraField.data.length, true);
break;
}
}

Expand All @@ -382,15 +396,9 @@ ZipFile.prototype.readEntry = function() {
}
}

// validate file name
if (entry.fileName.indexOf("\\") !== -1) {
return emitErrorAndAutoClose(self, new Error("invalid characters in fileName: " + entry.fileName));
}
if (/^[a-zA-Z]:/.test(entry.fileName) || /^\//.test(entry.fileName)) {
return emitErrorAndAutoClose(self, new Error("absolute path: " + entry.fileName));
}
if (entry.fileName.split("/").indexOf("..") !== -1) {
return emitErrorAndAutoClose(self, new Error("invalid relative path: " + entry.fileName));
if (self.decodeStrings) {
var errorMessage = validateFileName(entry.fileName);
if (errorMessage != null) return emitErrorAndAutoClose(self, new Error(errorMessage));
}
self.emit("entry", entry);

Expand Down Expand Up @@ -504,6 +512,20 @@ function dosDateTimeToDate(date, time) {
return new Date(year, month, day, hour, minute, second, millisecond);
}

function validateFileName(fileName) {
if (fileName.indexOf("\\") !== -1) {
return "invalid characters in fileName: " + fileName;
}
if (/^[a-zA-Z]:/.test(fileName) || /^\//.test(fileName)) {
return "absolute path: " + fileName;
}
if (fileName.split("/").indexOf("..") !== -1) {
return "invalid relative path: " + fileName;
}
// all good
return null;
};

function readAndAssertNoEof(reader, buffer, offset, length, position, callback) {
if (length === 0) {
// fs.read will throw an out-of-bounds error if you try to read 0 bytes from a 0 byte file
Expand Down Expand Up @@ -641,7 +663,7 @@ RefUnrefFilter.prototype.unref = function(cb) {
};

var cp437 = '\u0000☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼ !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~⌂ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñѪº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■ ';
function bufferToString(buffer, start, end, isUtf8) {
function decodeBuffer(buffer, start, end, isUtf8) {
if (isUtf8) {
return buffer.toString("utf8", start, end);
} else {
Expand Down
Loading

0 comments on commit cc7455a

Please sign in to comment.