Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JBIG2Decoder implementation #67

Merged
merged 46 commits into from Jul 14, 2019
Merged

JBIG2Decoder implementation #67

merged 46 commits into from Jul 14, 2019

Conversation

kucjac
Copy link
Contributor

@kucjac kucjac commented May 26, 2019

This PR contains JBIG2Decoder implementation. All the package structures are defined in the internal package jbig2.

The package requires the row stride of the image to be taken into account.

Currently the Decode [1.0 0.0] parameter is treated by the JBIG2Encoder - so that it flips the meaning of the bits '0' and '1' b/w.


This change is Reviewable

@gunnsth gunnsth self-requested a review June 14, 2019 07:25
Copy link
Contributor Author

@kucjac kucjac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 64 of 72 files reviewed, 28 unresolved discussions (waiting on @adrg, @gunnsth, and @kucjac)


internal/jbig2/reader/reader.go, line 85 at r7 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Actually the variable declarations with the named return values were fine, just the return expressions should be explicit.
The useful thing about the named return values is that it is more obvious what the meaning of the return values is.

I agree with that, reversed to the named variable declaration.


internal/jbig2/reader/reader.go, line 45 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Having the named parameter is actually fine. In this case it is actually nice as I wouldn't necessery expect a function called Align to return a number of skipped bytes.
Using a named parameter, and returning it explicitly works fine. For example
func (r *Reader) Align() (skipped byte) {
skipped = r.bits
r.bits = 0 // no need to clear cache, will be overwritten on next read
return skipped
}

Reversed the named variable definition, but without naked return.


internal/jbig2/segments/header.go, line 215 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Ammount? Should be amount?

Done, I've also change the name of the variables from amountOf* into numberOf* etc.


internal/jbig2/segments/page-information.go, line 175 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

should be Resolution ?

Done.


internal/jbig2/segments/pattern-dictionary.go, line 94 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

should have a blank newline between functions

Done.


internal/jbig2/segments/pattern-dictionary.go, line 100 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

I expect those numbers are referring to the jbig2 reference? Any particular section or?

the section is defined in the PatternDictionary definition, should I repeat it?


internal/jbig2/segments/symbol-dictionary.go, line 90 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Is this the language used in the reference? "Amount of" rather than "number of " etc? If so, good to stick with the reference

I've changed all the 'amount*' instances into 'number*'. The variable names was similar as in the apache version, but I agree that the 'number' sounds better.


internal/jbig2/segments/symbol-dictionary.go, line 101 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

trace like logging ?

Done.


internal/jbig2/segments/symbol-dictionary.go, line 235 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Perhaps performance test would be better inside tests or benchmarks?

Done.


internal/jbig2/segments/symbol-dictionary.go, line 276 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

typo in "aggergation" - should be "aggregration"

Done.


internal/jbig2/segments/symbol-dictionary.go, line 433 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Can probably put this on a single line (function declaration).

Done.


internal/jbig2/segments/symbol-dictionary.go, line 512 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Don´t need a named return variable (pretty obvious meaning here)

Done.


internal/jbig2/segments/symbol-dictionary.go, line 519 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

return err

Done.


internal/jbig2/segments/symbol-dictionary.go, line 591 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

skip blank newline here

Done.


internal/jbig2/segments/symbol-dictionary.go, line 655 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check

Done.


internal/jbig2/segments/table_segment.go, line 71 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Named return var not necessary here as meaning obvious.

Done.


internal/jbig2/segments/table_segment.go, line 79 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check

Done.


internal/jbig2/segments/table_segment.go, line 87 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check

Done.


internal/jbig2/segments/table_segment.go, line 108 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check

Done.


internal/jbig2/segments/text-region.go, line 412 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

return dT, err

Done.


internal/jbig2/segments/text-region.go, line 418 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

return dT, err

Done.


internal/jbig2/segments/text-region.go, line 435 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check

Done.


internal/jbig2/segments/text-region.go, line 886 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

redundant named parameter

Done.


internal/jbig2/segments/text-region.go, line 887 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

trace log

Done.


internal/jbig2/segments/text-region.go, line 897 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

return err

Done.


internal/jbig2/segments/text-region.go, line 924 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check around

Done.


internal/jbig2/segments/text-region.go, line 1143 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

check blank lines

Done.


internal/jbig2/tests/bench_test.go, line 26 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Are those files accessible somwhere in a github repo?

they're being created after having 'TestDecodeJBIG2Files' done. I've added the 'integration' build tag.


internal/jbig2/tests/document_decode_test.go, line 25 at r8 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

How to integrate these tests into our build? Would be good to get some commands to add to the Jenkinsfile

in order to integrate these builds we need to add the jbig2 test files to the github repository, as well as add the 'integration' build tag i.e.: go test ./... -tags integration. If we would like to include this test also I should also change the Benchmark so that it creates the jbig2 files by itself. I've added the 'integration' because the tests just extracts the images from the files and there are rather few unit tests. It also takes some time to finish as there are multiple images packed in the example pdf files you've sent to me.

Copy link
Contributor

@gunnsth gunnsth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Only missing thing at the moment seems to be to add the tests to the build.

Reviewed 1 of 59 files at r2, 6 of 8 files at r9, 13 of 13 files at r10.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @adrg)

Copy link
Collaborator

@adrg adrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I left a couple of points to consider.

core/encoding.go Outdated

// (PDF32000:2008 Table 39) The array should be of 2 x n size.
// For binary images n stands for 1bit, thus the array should contain 2 numbers.
floatSlice, err := arr.GetAsFloat64Slice()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetAsFloat64Slice converts integer values to floats so both are covered. It looks like the method could be simplified to something like:

func (enc *JBIG2Encoder) setChocolateData(decode PdfObject) {
	arr, ok := decode.(*PdfObjectArray)
	if !ok {
		common.Log.Debug("JBIG2Encoder - Decode is not an array. %T", decode)
		return
	}

	// (PDF32000:2008 Table 39) The array should be of 2 x n size.
	// For binary images n stands for 1bit, thus the array should contain 2 numbers.
	vals, err := arr.GetAsFloat64Slice()
	if err != nil {
		common.Log.Debug("JBIG2Encoder unsupported Decode value. %s", arr.String())
		return
	}
	if len(vals) != 2 {
		return
	}

	valA, valB := int(vals[0]), int(vals[1])
	if valA == 1 && valB == 0 {
		enc.IsChocolateData = true
	} else if valA == 0 && valB == 1 {
		enc.IsChocolateData = false
	} else {
		common.Log.Debug("JBIG2Encoder unsupported DecodeParams->Decode value: %s", arr.String())
	}
}

}

if decodeParams != nil {
if globals := decodeParams.Get("JBIG2Globals"); globals != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the jbig2Globals constant defined at the top of the file be used here instead of hardcoding "JBIG2Globals"? Otherwise, the constant should be removed as it does not seem to be used anywhere.


// GetByte gets and returns the byte at given 'index'.
func (b *Bitmap) GetByte(index int) (byte, error) {
if index > len(b.Data)-1 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also check if index is less than 0.

// 'Chocolate' data is the bit interpretation where the 0'th bit means white and the 1'th bit means black.
// The naming convention based on the: `https://en.wikipedia.org/wiki/Binary_image#Interpretation` page.
func (b *Bitmap) GetChocolateData() []byte {
if !b.isVanilla {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is shorter in this form:

func (b *Bitmap) GetChocolateData() []byte {
	if b.isVanilla {
		b.inverseData()
	}

	return b.Data
}

// GetUnpaddedData gets the data without row stride padding.
func (b *Bitmap) GetUnpaddedData() []byte {
padding := uint(b.Width & 0x07)
useShift := padding != 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The useShift variable is not used in the rest of the function. Might as well inline it in the if statement:

if useShift := padding != 0; !useShift {
    return b.Data
}

}

if !g.IsMMREncoded {
var numberOfGbAt int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could initialize numberOfGbAt with 1 and remove the else branch. Also inside the if statement, the else branch can be eliminated by assigned 4 to numberOfGbAt.

}

func newHalftoneRegion(r *reader.Reader) *HalftoneRegion {
hr := &HalftoneRegion{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the hr variable.


// NewRegionSegment creates new Region segment model.
func NewRegionSegment(r reader.StreamReader) *RegionSegment {
rs := &RegionSegment{r: r}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the rs variable.

}
}

if currentExportFlag == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could assign 0 to currentExportFlag before the if statement and remove the else branch.

return nil
}

func (s *SymbolDictionary) setAtPixels() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a variable here so not to repeat code. Something like:

func (s *SymbolDictionary) setAtPixels() error {
	if !s.isHuffmanEncoded {
		index := 1
		if s.sdTemplate == 0 {
			index = 4
		}

		if err := s.readAtPixels(index); err != nil {
			return err
		}
	}
	return nil
}

Copy link
Contributor Author

@kucjac kucjac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 58 of 72 files reviewed, 32 unresolved discussions (waiting on @adrg and @gunnsth)


core/encoding.go, line 2013 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

GetAsFloat64Slice converts integer values to floats so both are covered. It looks like the method could be simplified to something like:

func (enc *JBIG2Encoder) setChocolateData(decode PdfObject) {
	arr, ok := decode.(*PdfObjectArray)
	if !ok {
		common.Log.Debug("JBIG2Encoder - Decode is not an array. %T", decode)
		return
	}

	// (PDF32000:2008 Table 39) The array should be of 2 x n size.
	// For binary images n stands for 1bit, thus the array should contain 2 numbers.
	vals, err := arr.GetAsFloat64Slice()
	if err != nil {
		common.Log.Debug("JBIG2Encoder unsupported Decode value. %s", arr.String())
		return
	}
	if len(vals) != 2 {
		return
	}

	valA, valB := int(vals[0]), int(vals[1])
	if valA == 1 && valB == 0 {
		enc.IsChocolateData = true
	} else if valA == 0 && valB == 1 {
		enc.IsChocolateData = false
	} else {
		common.Log.Debug("JBIG2Encoder unsupported DecodeParams->Decode value: %s", arr.String())
	}
}

Done.


core/encoding.go, line 2075 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Should the jbig2Globals constant defined at the top of the file be used here instead of hardcoding "JBIG2Globals"? Otherwise, the constant should be removed as it does not seem to be used anywhere.

Done. I initially used it more than once that's why I used it as a const. Currently it is more readable by using it in a string form. I've removed the constant.


internal/jbig2/document.go, line 128 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Please add explicit returns.

This function return no values, and current use of the 'return' statement, is to end the function fast if the condition is not fulfilled.


internal/jbig2/bitmap/bitmap.go, line 73 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Should also check if index is less than 0.

Done.


internal/jbig2/bitmap/bitmap.go, line 88 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

The method is shorter in this form:

func (b *Bitmap) GetChocolateData() []byte {
	if b.isVanilla {
		b.inverseData()
	}

	return b.Data
}

Done.


internal/jbig2/bitmap/bitmap.go, line 114 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

The useShift variable is not used in the rest of the function. Might as well inline it in the if statement:

if useShift := padding != 0; !useShift {
    return b.Data
}

Done.


internal/jbig2/bitmap/bitmap.go, line 121 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Remove empty line.

Done.


internal/jbig2/bitmap/bitmap.go, line 150 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could add continue here in order to eliminate the else branch as the code block is relatively long.

Done. Good Idea, thanks.


internal/jbig2/bitmap/bitmap.go, line 201 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could use continue here as well.

Done.


internal/jbig2/bitmap/bitmap.go, line 224 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Shorter like this:

func (b *Bitmap) GetVanillaData() []byte {
	if !b.isVanilla {
		b.inverseData()
	}

	return b.Data
}

Done.


internal/jbig2/bitmap/bitmap.go, line 260 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could also check for negative indices.

Done.


internal/jbig2/bitmap/bitmap.go, line 292 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could set c as color.White by default and remove the else branch.

c := color.White
if b.GetPixel(x, y) {
    c = color.Black
}

Done.


internal/jbig2/bitmap/combine.go, line 141 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could use continue and eliminate the else branch and reduce the indentation level.

Done.


internal/jbig2/decoder/arithmetic/arithmetic.go, line 327 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Is there any way to avoid goto style code here?

Done.


internal/jbig2/decoder/arithmetic/stats.go, line 23 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Variable not needed. Could just return the DecoderStats instance:

return &DecoderStats{
	index:              index,
	contextSize:        contextSize,
	codingContextTable: make([]byte, contextSize),
	mps:                make([]byte, contextSize),
}

Done.


internal/jbig2/decoder/huffman/node.go, line 139 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Remove empty line for consistency.

Done.


internal/jbig2/decoder/huffman/node.go, line 172 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Remove empty line for consistency.

Done.


internal/jbig2/decoder/mmr/mmr.go, line 65 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

return nil, err is clearer. The point is also covered by the style guide of the repository. Please also address the returns in the rest of the function.

Done. Thanks, missed that function.


internal/jbig2/decoder/mmr/mmr.go, line 154 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could initialize value with 1 and remove the else branch.

Done.


internal/jbig2/decoder/mmr/mmr.go, line 246 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could initialize result with EOL and remove the else branch.

Done.


internal/jbig2/reader/substream.go, line 56 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

No need for the s variable.

Done.


internal/jbig2/segments/generic-refinement-region.go, line 729 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could set w1, w2 and w3 before each if statement and remove the else branches.

Done. Reduced also the higher level else branch by setting all w1,w2,w3 to 0 and doing inverse if check.


internal/jbig2/segments/generic-refinement-region.go, line 802 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Remove empty line for consistency with the code block below.

Done.


internal/jbig2/segments/generic-refinement-region.go, line 833 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Please add explicit returns throughout the method.

Done. Missed that method. Thanks.


internal/jbig2/segments/generic-refinement-region.go, line 910 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could just add the log at the bottom of the method as there are no other exit points. No defer function seems to be needed here.

Done.


internal/jbig2/segments/generic-region.go, line 63 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

The g variable is not needed.

Done.


internal/jbig2/segments/generic-region.go, line 94 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Please add explicit returns for the function.

Done.


internal/jbig2/segments/generic-region.go, line 215 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could initialize numberOfGbAt with 1 and remove the else branch. Also inside the if statement, the else branch can be eliminated by assigned 4 to numberOfGbAt.

Done.


internal/jbig2/segments/halftone-segment.go, line 401 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

No need for the hr variable.

Done.


internal/jbig2/segments/region.go, line 40 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

No need for the rs variable.

Done.


internal/jbig2/segments/symbol-dictionary.go, line 693 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could assign 0 to currentExportFlag before the if statement and remove the else branch.

This variable changes on each loop, then it flips it's value. Thus I can't assign it to 0 before the statement.


internal/jbig2/segments/symbol-dictionary.go, line 1016 at r10 (raw file):

Previously, adrg (Adrian-George Bostan) wrote…

Could use a variable here so not to repeat code. Something like:

func (s *SymbolDictionary) setAtPixels() error {
	if !s.isHuffmanEncoded {
		index := 1
		if s.sdTemplate == 0 {
			index = 4
		}

		if err := s.readAtPixels(index); err != nil {
			return err
		}
	}
	return nil
}

Done. I've also changed the if statement, so that it ends fast if the isHuffmanEncoded is true.

@adrg
Copy link
Collaborator

adrg commented Jul 2, 2019


internal/jbig2/document.go, line 128 at r10 (raw file):

Previously, kucjac (Jacek Kucharczyk) wrote…

This function return no values, and current use of the 'return' statement, is to end the function fast if the condition is not fulfilled.

You are right, sorry. I must have been looking at another method.

Copy link
Collaborator

@adrg adrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the review comments. Looks good.

Reviewable status: 58 of 72 files reviewed, all discussions resolved (waiting on @gunnsth)

Copy link
Collaborator

@adrg adrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 58 of 72 files reviewed, all discussions resolved (waiting on @gunnsth)

Copy link
Contributor

@gunnsth gunnsth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 14 of 14 files at r11.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@gunnsth gunnsth changed the base branch from master to development July 5, 2019 00:18
Copy link
Contributor

@gunnsth gunnsth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r12.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@gunnsth
Copy link
Contributor

gunnsth commented Jul 5, 2019

@kucjac I've integrated this with our build system, such that it runs all the integration tests on build (Jenkinsfile update). Changed the environment variable to UNIDOC_JBIG2_TESTDATA for consistency with other integration tests requiring external test data.
Looking at test output, a lot of the extracted images, but a significant fraction appears to have some row stride issues, i.e. content going across lines, like:
image
(003685_31_0.pdf).

Possibly related:
unidoc/unidoc#448

Looks like will be working nicely once the row stride issues are fixed. Once fixed we should add a hash check to detect if output images are changing. If there is a change, the new output needs to be re-checked and hash updated if OK.

Also changed the base branch to development as all new things going in there prior to release.

Copy link
Contributor

@gunnsth gunnsth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. All the output images looking good now. A few comments.

Reviewed 11 of 11 files at r13.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @kucjac)


internal/jbig2/bitmap/bitmap.go, line 50 at r13 (raw file):

}

// NewWithData creates new bitmap with the provided 'widht', 'height' and the byte slice 'data'.

width


internal/jbig2/bitmap/bitmap.go, line 160 at r13 (raw file):

}

// GetVanillaData gets bitmap data as a byte sice with Vanilla bit intepretation.

sice/slice


internal/jbig2/tests/document_decode_test.go, line 27 at r13 (raw file):

// jbig2Validate is the runtime flag which defines if the golden files
// should be validated.
var jbig2Validate bool

The convention is to validate golden files (hash check) if the optional testdata directory is specified. So could do jbig2Validate = len(os.Getenv(EnvDirectory)) > 0.
Better to keep things simple. For someone working on this, should be easy to modify the code and disable validation as needed.

It would be nice to have a flag to update the golden images as well. That often happens when a minor change is made in the library which leads to a small change in the output that still passes visual validation and manual checks (but fails hash checks). In those cases it makes sense to update the golden files.
What is the process for updating the hashes?


internal/jbig2/tests/document_decode_test.go, line 52 at r13 (raw file):

	require.NoError(t, err)

	if len(filenames) > 0 {

Wouldn't it be cleaner to leave the original testdata directory untouched rather than place the output files there?
Typically we create a temporary directory, os.TempDir and remove it once tests are completed (defer). If need to check the outputs can be handy to skip deleting the temp folder.


internal/jbig2/tests/document_decode_test.go, line 55 at r13 (raw file):

		_, err = os.Stat(filepath.Join(dirName, jbig2DecodedDirectory))
		if err != nil {
			err = os.Mkdir(filepath.Join(dirName, jbig2DecodedDirectory), 0666)

With permission 0666 on the folder get an error, Worked after chmod +x on the folder. Should use 0700 permission.
--- FAIL: TestDecodeJBIG2Files/009907 (0.01s)
require.go:794:
Error Trace: document_decode_test.go:90
Error: Received unexpected error:
open path/to/jbig2-testdata/jbig2_decoded_images/009907.zip: permission denied
Test: TestDecodeJBIG2Files/009907
same when it created the "goldens" subfolder


internal/jbig2/tests/goldens_test.go, line 20 at r13 (raw file):

// Goldens is a model for goldens file json document.
type Goldens map[string]*GoldenObject

GoldenFile might be more descriptive, and also consistent with how this is referred to below.


internal/jbig2/tests/goldens_test.go, line 55 at r13 (raw file):

		if !single.IsValid {
			// if the single raw is not valid then udate it's hash

udate/update it's/its hash
Who determines that it is not valid? Is it for the user to judge?


internal/jbig2/tests/goldens_test.go, line 69 at r13 (raw file):

	// check if the directory exists.
	if _, err := os.Stat(goldenDir); err != nil {
		if err = os.Mkdir(goldenDir, 0666); err != nil {

permission issue when creating file here. Should probably use 0700

Copy link
Contributor Author

@kucjac kucjac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 74 of 77 files reviewed, 8 unresolved discussions (waiting on @gunnsth)


internal/jbig2/bitmap/bitmap.go, line 50 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

width

Done.


internal/jbig2/bitmap/bitmap.go, line 160 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

sice/slice

Done.


internal/jbig2/tests/document_decode_test.go, line 27 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

The convention is to validate golden files (hash check) if the optional testdata directory is specified. So could do jbig2Validate = len(os.Getenv(EnvDirectory)) > 0.
Better to keep things simple. For someone working on this, should be easy to modify the code and disable validation as needed.

It would be nice to have a flag to update the golden images as well. That often happens when a minor change is made in the library which leads to a small change in the output that still passes visual validation and manual checks (but fails hash checks). In those cases it makes sense to update the golden files.
What is the process for updating the hashes?

Done. I've changed the logic to store only the 'hash' for each 'filename' as all decoded. I've also changed the flag to 'jbig2-update-goldens'. If the hash doesn't match with the ones stored in the golden file then then current test would fail.


internal/jbig2/tests/document_decode_test.go, line 52 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

Wouldn't it be cleaner to leave the original testdata directory untouched rather than place the output files there?
Typically we create a temporary directory, os.TempDir and remove it once tests are completed (defer). If need to check the outputs can be handy to skip deleting the temp folder.

Done. Definitely a good idea. I couldn't find better and cleaner solution for that.


internal/jbig2/tests/document_decode_test.go, line 55 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

With permission 0666 on the folder get an error, Worked after chmod +x on the folder. Should use 0700 permission.
--- FAIL: TestDecodeJBIG2Files/009907 (0.01s)
require.go:794:
Error Trace: document_decode_test.go:90
Error: Received unexpected error:
open path/to/jbig2-testdata/jbig2_decoded_images/009907.zip: permission denied
Test: TestDecodeJBIG2Files/009907
same when it created the "goldens" subfolder

Done. Changed the permissions to 0700.


internal/jbig2/tests/goldens_test.go, line 20 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

GoldenFile might be more descriptive, and also consistent with how this is referred to below.

Done.


internal/jbig2/tests/goldens_test.go, line 55 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

udate/update it's/its hash
Who determines that it is not valid? Is it for the user to judge?

Done. I wanted to check manually all the files. Changed to keep only the hash for each key filename


internal/jbig2/tests/goldens_test.go, line 69 at r13 (raw file):

Previously, gunnsth (Gunnsteinn Hall) wrote…

permission issue when creating file here. Should probably use 0700

Done.

Copy link
Contributor

@gunnsth gunnsth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r14.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

gunnsth added a commit to gunnsth/unipdf that referenced this pull request Jul 14, 2019
gunnsth added a commit that referenced this pull request Jul 14, 2019
…irectory (#120)

Needed to be able to run tests in PR #67
@gunnsth gunnsth merged commit e85616c into unidoc:development Jul 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants