MHTML Parser

An MHTML (.mht) file parser.

Only tested with .mht files produced by Word 2016.

Installation

npm install mhtml-parser --save

Usage

Parse by filename (Asynchronous)

let parser = require('mhtml-parser');
parser.loadFile(__dirname + "/simple/simple.mht", {
    charset: "gbk" 
}, function(err, data) {
    if (err) throw err;
    console.log(data);
});

Parse by filename and read content manually (Only tested on Windows & Linux)

let parser = require('mhtml-parser');
let fs = require("fs");
let fileName = 'image001.jpg';
// Use one path for both parsing and the manual read below.
let filePath = __dirname + "/simple/simple.mht";

parser.loadFile(filePath, {
    charset: "gbk",
    readMode: parser.constants.READ_MODE_POSITION
}, function(err, data) {
    if (err) throw err;
    // Reopen the same file and read only the bytes of the wanted part.
    fs.open(filePath, "r", function(err, fd) {
        if (err) throw err;
        // Buffer.alloc replaces the deprecated `new Buffer(size)`.
        let buffer = Buffer.alloc(data[fileName].bufferLength);
        fs.readSync(fd, buffer, 0, data[fileName].bufferLength, data[fileName].startPosition);
        fs.closeSync(fd);
        console.log(buffer.toString());
    });
});
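
The manual read above can be wrapped in a small reusable function. The following is a minimal sketch using only Node's built-in fs module; readRangeSync is a hypothetical helper name and is not part of mhtml-parser.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of mhtml-parser): synchronously read
// `length` bytes starting at byte offset `start` of the file at `filePath`.
function readRangeSync(filePath, start, length) {
    const fd = fs.openSync(filePath, 'r');
    try {
        const buffer = Buffer.alloc(length);
        const bytesRead = fs.readSync(fd, buffer, 0, length, start);
        // The file may be shorter than expected; return only what was read.
        return buffer.subarray(0, bytesRead);
    } finally {
        // Always release the file descriptor, even if the read throws.
        fs.closeSync(fd);
    }
}
```

With a READ_MODE_POSITION result, it would be called as readRangeSync(filePath, data[fileName].startPosition, data[fileName].bufferLength).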

Parse by string (Synchronous)

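let parser = require('mhtml-parser');
// Assumption: iconv.decode(buffer, encoding) is the iconv-lite API.
let iconv = require('iconv-lite');
let buffer = require("fs").readFileSync(__dirname + "/simple/simple.mht");
let data = parser.parse(iconv.decode(buffer, "gbk"), {});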

In this mode, option.readMode will always be constants.READ_MODE_ALL.

Options

charset: string

Default Value: utf-8

The charset used to convert the file's binary data to a string. Ignored when readMode == READ_MODE_POSITION.

decodeQuotedPrintable: boolean

Default Value: false

Whether to decode quoted-printable data, which looks something like <a href=3D\"http://zsxsoft.com\">. Ignored when readMode == READ_MODE_POSITION.

See here: https://github.com/mathiasbynens/quoted-printable
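
To illustrate what quoted-printable decoding does to sequences such as =3D, here is a minimal hand-written sketch. It is not the linked package and not this parser's own code; decodeQuotedPrintable is a hypothetical helper that assumes the input is 7-bit ASCII text and returns raw bytes, which the caller then decodes with the part's charset.

```javascript
// Hypothetical helper for illustration only; not the library's implementation.
function decodeQuotedPrintable(input) {
    // A "=" at the end of a line is a soft line break: drop it and the newline.
    const text = input.replace(/=\r?\n/g, '');
    const bytes = [];
    for (let i = 0; i < text.length; i++) {
        const hex = text.slice(i + 1, i + 3);
        if (text[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
            // "=XX" encodes a single byte with hexadecimal value XX.
            bytes.push(parseInt(hex, 16));
            i += 2;
        } else {
            // Any other character is passed through as its own byte.
            bytes.push(text.charCodeAt(i) & 0xff);
        }
    }
    return Buffer.from(bytes);
}

console.log(decodeQuotedPrintable('<a href=3D"http://zsxsoft.com">').toString('latin1'));
```

Returning a Buffer rather than a string keeps multi-byte characters intact until the correct charset is applied.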

readMode: string

Default Value: constants.READ_MODE_ALL

Available values:

READ_MODE_ALL - Read the whole file into memory. Each file's content is then available directly as data[fileName].data.
READ_MODE_POSITION - Scan the whole file and record only the start position and length of each file, without reading its content.

Example Result

{
	"Hey.htm": {
		"name": "Hey.htm",
		"location": "file:///C:/B133AD19/Hey.htm",
		"encoding": "quoted-printable",
		"type": "text/html; charset=\"gb2312\"",
		"data": "<html xmlns:v=3D\"urn:schemas-microsoft-com:vml\"\nxmlns:o=3D\"urn:schemas-micr>\n<li.....n<p class=3DMsoNormal><span lang=3DEN-US><o:p>&nbsp;</o:p></span></p>\n\n</div>\n\n</body>\n\n</html>",
		"startPosition": 454,
		"bufferLength": 61846
	},
	"item0001.xml": {
		"name": "item0001.xml",
		"location": "file:///C:/B133AD19/Hey.files/item0001.xml",
		"encoding": "quoted-printable",
		"type": "text/xml",
		"data": "<?xml version=3D\"1.0\" encoding=3D\"UTF-8\" standalone=3D\"no\"?><b:Sources xmln=\ns:b=3D\"http://schemas.openxmlformats.org/officeDocument/2006/bibliography\" =\nxmlns=3D\"http://schemas.openxmlformats.org/officeDocument/2006/bibliography=\n\" SelectedStyle=3D\"\\APASixthEditionOfficeOnline.xsl\" StyleName=3D\"APA\" Vers=\nion=3D\"6\"></b:Sources>",
		"startPosition": 62470,
		"bufferLength": 335
	},
	"image001.jpg": {
		"name": "image001.jpg",
		"location": "file:///C:/B133AD19/Hey.files/image001.jpg",
		"encoding": "base64",
		"type": "image/jpeg",
		"data": "/9j/4AAQSkZJRgABBDAAo...+z6EMn3aKKKoD//Z",
		"startPosition": 68533,
		"bufferLength": 12297
	}
}
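
In the result above, the data field of a base64 entry still holds the encoded text. The sketch below shows one way to turn such an entry into raw bytes with Node's built-in Buffer; entryToBuffer is a hypothetical helper, not part of mhtml-parser, and it assumes entries have the encoding and data fields shown in the example.

```javascript
// Hypothetical helper: convert one entry of the result object into raw bytes.
function entryToBuffer(entry) {
    if (entry.encoding === 'base64') {
        // Decode the base64 text into the original binary content.
        return Buffer.from(entry.data, 'base64');
    }
    // Other encodings are not decoded here; the stored string is
    // returned as its UTF-8 bytes unchanged.
    return Buffer.from(entry.data, 'utf8');
}

// Small self-contained demonstration with a made-up entry.
const sample = { encoding: 'base64', data: 'aGVsbG8=' };
console.log(entryToBuffer(sample).toString('utf8'));
```

The resulting Buffer can be passed to fs.writeFile to save an image such as image001.jpg to disk.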

TODO

More tests

License

The MIT License

PRs are welcome :)

zsxsoft-deprecated/mhtml-parser
