[BUG] ExtractText triggers a panic: runtime error: index out of range [0] with length 0 #491

Closed
becoded opened this issue Apr 29, 2022 · 1 comment

Contributor

becoded commented Apr 29, 2022

Description

We are using ExtractText(), and from time to time it panics with an index out of range error.

Stacktrace:

panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 21 [running]:
testing.tRunner.func1.2({0x1009ba340, 0x140001f5d28})
	/opt/homebrew/Cellar/go/1.18.1/libexec/src/testing/testing.go:1389 +0x1c8
testing.tRunner.func1()
	/opt/homebrew/Cellar/go/1.18.1/libexec/src/testing/testing.go:1392 +0x384
panic({0x1009ba340, 0x140001f5d28})
	/opt/homebrew/Cellar/go/1.18.1/libexec/src/runtime/panic.go:838 +0x204
github.com/unidoc/unipdf/v3/internal/textencoding.CMapEncoder.CharcodeToRune(...)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/internal/textencoding/textencoding.go:552
github.com/unidoc/unipdf/v3/extractor.(*textObject).renderText(0x14000ab02c0, {0x14000759328, 0x1, 0x8})
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:762 +0xab0
github.com/unidoc/unipdf/v3/extractor.(*textObject).showTextAdjusted(0x14000ab02c0, 0x1400000fea8)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:132 +0x178
github.com/unidoc/unipdf/v3/extractor.(*Extractor).extractPageText.func1(0x1400034fdd0, {{0x1009f2d78, 0x100f63dc8}, {0x1009f2e80, 0x14000084360}, {0x1009801a0, 0x140006021c8}, {0x10099ad00, 0x140001f5cf8}, {0x3ff0000000000000, ...}}, ...)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:797 +0x2348
github.com/unidoc/unipdf/v3/contentstream.(*ContentStreamProcessor).Process(0x14000765aa0, 0x100f63dc8?)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/contentstream/contentstream.go:314 +0xa94
github.com/unidoc/unipdf/v3/extractor.(*Extractor).extractPageText(0x14000136060, {0x14000644000, 0x9a44e}, 0x14000418060?, {0x3ff0000000000000, 0x0, 0x0, 0x0, 0x3ff0000000000000, 0x0, ...}, ...)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:828 +0x754
github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractPageText(0x14000136060)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:243 +0x74
github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractTextWithStats(0x14000214380?)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:508 +0x20
github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractText(...)
	/Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:526

Currently, the obfuscated code of CMapEncoder.CharcodeToRune looks like this:

func (_agg CMapEncoder) CharcodeToRune(code CharCode) (rune, bool) {
	_egf, _ceg := _agg.charcodeToString(code)
	return ([]rune(_egf))[0], _ceg
}

The error happens because, for some of these files, charcodeToString returns an empty string. []rune("") is an empty slice (length 0), so indexing element 0 panics.
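
The failing expression can be reproduced without unipdf. The sketch below is only an illustration: indexFirstRune is a hypothetical helper that evaluates the same `[]rune(s)[0]` expression and uses recover to report the panic instead of crashing.

```go
package main

import "fmt"

// indexFirstRune is a hypothetical helper that mimics the failing
// expression ([]rune(s))[0]. It recovers from the panic and reports it
// through the named result instead of crashing the program.
func indexFirstRune(s string) (r rune, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return []rune(s)[0], false
}

func main() {
	// Converting an empty string gives a slice with no elements.
	fmt.Println(len([]rune(""))) // prints 0

	// Indexing element 0 of that slice panics with "index out of range".
	_, panicked := indexFirstRune("")
	fmt.Println(panicked) // prints true

	// A non-empty string is fine.
	r, panicked := indexFirstRune("A")
	fmt.Println(string(r), panicked) // prints A false
}
```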

So a potential fix would be:

func (_agg CMapEncoder) CharcodeToRune(code CharCode) (rune, bool) {
	_egf, _ceg := _agg.charcodeToString(code)

	if _egf == "" {
		return MissingCodeRune, false
	}

	return ([]rune(_egf))[0], _ceg
}
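
To check the guard in isolation, here is a minimal, self-contained sketch. Everything except the shape of CharcodeToRune is an assumption: cmapEncoder and its map-backed charcodeToString are hypothetical stand-ins for the real CMapEncoder, and the CharCode type and MissingCodeRune value are declared locally and may differ from the definitions inside unipdf.

```go
package main

import "fmt"

// CharCode and MissingCodeRune are local stand-ins for the identifiers
// used in the issue; the real definitions inside unipdf may differ.
type CharCode uint32

// Assumption: the Unicode replacement character marks a missing code.
const MissingCodeRune = '\ufffd'

// cmapEncoder is a hypothetical stand-in for CMapEncoder, backed by a map.
type cmapEncoder struct {
	table map[CharCode]string
}

// charcodeToString may return an empty string, either because the code is
// unmapped or because the table maps it to "".
func (e cmapEncoder) charcodeToString(code CharCode) (string, bool) {
	s, ok := e.table[code]
	return s, ok
}

// CharcodeToRune applies the proposed guard before indexing the rune slice.
func (e cmapEncoder) CharcodeToRune(code CharCode) (rune, bool) {
	s, ok := e.charcodeToString(code)
	if s == "" {
		return MissingCodeRune, false
	}
	return []rune(s)[0], ok
}

func main() {
	enc := cmapEncoder{table: map[CharCode]string{1: "A", 2: ""}}
	for _, code := range []CharCode{1, 2, 3} {
		r, ok := enc.CharcodeToRune(code)
		fmt.Printf("%d: %U %v\n", code, r, ok)
	}
	// Output:
	// 1: U+0041 true
	// 2: U+FFFD false
	// 3: U+FFFD false
}
```

Code 2 (mapped to an empty string) and code 3 (unmapped) both take the guarded path, so neither reaches the index expression.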

Expected Behavior

No panics when extracting text

Actual Behavior

Triggers a panic: runtime error: index out of range [0] with length 0 in certain cases

Attachments

Unfortunately, I can't share a file for GDPR reasons.

Collaborator

sampila commented Jun 7, 2022

Hi @becoded,

Thank you for reporting this issue and for the potential fix.
We have released a new version, v3.35.0: https://github.com/unidoc/unipdf-src/releases/tag/v3.35.0

sampila closed this as completed Jun 7, 2022