Issue 1: [ 1669644 ] Crash in letter_is_okay() with trigger #3

Closed
jimregan opened this issue Apr 12, 2015 · 5 comments

@jimregan

https://code.google.com/p/tesseract-ocr/issues/detail?id=1

Reported by tmbdev, Mar 7, 2007
Filip Gieszczykiewicz - filipg(sf)

recognizing attached tif with v1.03 crashes as follows:
pppppspppppppppspppppppppsppppppppppppppppppppppppppp
Program received signal SIGSEGV, Segmentation fault.
0x080fb8a8 in letter_is_okay (dawg=0xb7f09008, node=0xbf815a04,
char_index=7, prevchar=0 '\0',
word=0xbf815bff "proto-ft", word_end=0) at dawg.cpp:49
49 if (edge_occupied (dawg, edge)) {
(gdb) bt
#0  0x080fb8a8 in letter_is_okay (dawg=0xb7f09008, node=0xbf815a04, char_index=7, prevchar=0 '\0',
    word=0xbf815bff "proto-ft", word_end=0) at dawg.cpp:49
#1  0x080f3b26 in append_next_choice (dawg=0xb7f09008, node=108107, permuter=5 '\005',
    word=0xbf815bff "proto-ft", choices=0x82e7ad0, char_index=7, this_choice=0x8260df0,
    prevchar=0 '\0', limit=0xbf815c28, rating=0, certainty=-1.15637732, rating_array=0xbf815ab4,
    certainty_array=0xbf815b58, word_ending=0, last_word=0, result=0xbf815a58) at permdawg.cpp:202
#2  0x080f3f03 in dawg_permute (dawg=0xb7f09008, node=108107, permuter=5 '\005', choices=0x82e7ad0,
    char_index=7, limit=0xbf815c28, word=0xbf815bff "proto-ft", rating=0, certainty=0,
    rating_array=0xbf815ab4, certainty_array=0xbf815b58, last_word=0) at permdawg.cpp:273
#3  0x080f40b3 in dawg_permute_and_select (string=0x814f9fc "system words:", dawg=0xb7f09008,
    permuter=5 '\005', character_choices=0x82e7ad0, best_choice=0x8260d40, system_words=1)
    at permdawg.cpp:334
#4  0x080f5640 in permute_words (char_choices=0x82e7ad0, rating_limit=1000) at permute.cpp:1611
#5  0x080f6549 in permute_all (char_choices=0x82e7ad0, rating_limit=1000, raw_choice=0xbf815dc8)
    at permute.cpp:1092
#6  0x080f6952 in permute_characters (char_choices=0x82e7ad0, limit=1000, best_choice=0xbf815dd8,
    raw_choice=0xbf815dc8) at permute.cpp:1146
#7  0x080d1ef6 in chop_word_main (word=0x826f830, fx=1, best_choice=0xbf815dd8,
    raw_choice=0xbf815dc8, tester=0 '\0', trainer=0 '\0') at chopper.cpp:476
#8  0x080cf426 in cc_recog (tessword=0x826f830, best_choice=0xbf815dd8,
    best_raw_choice=0xbf815dc8, tester=0 '\0', trainer=0 '\0') at tface.cpp:247
#9  0x08069a94 in recog_word_recursive (word=0x826e9f0, denorm=0x826be54,
    matcher=0x80684a0 <tess_default_matcher(PBLOB*, PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
    tester=0, trainer=, testing=0 '\0', raw_choice=@0x826be7c,
    blob_choices=0xbf8162b8, outword=@0x826be50) at tfacepp.cpp:191
#10 0x0806a380 in recog_word (word=0x826e9f0, denorm=0x826be54,
    matcher=0x80684a0 <tess_default_matcher(PBLOB*, PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
    tester=0, trainer=0, testing=0 '\0', raw_choice=@0x826be7c, blob_choices=0xbf8162b8,
    outword=@0x826be50) at tfacepp.cpp:90

I don't think it's related to issue 1546972

It depends on the specific image: recreating a new TIF with pbmtext
from the contained text does not crash. Scaling the image by -2.0 or +2.0
does not crash either; only this one does.

Argh, image too big for sf.net - see
http://tesseract-ocr.repairfaq.org/downloads/b37by2.tif

@jimregan jimregan self-assigned this Apr 12, 2015
@jimregan jimregan changed the title [ 1669644 ] Crash in letter_is_okay() with trigger Defect issue Apr 12, 2015
@jimregan jimregan changed the title Defect issue Issue 1: [ 1669644 ] Crash in letter_is_okay() with trigger Apr 12, 2015
@jimregan

2007-03-07T22:23:34.000Z
Mar 7, 2007 Project Member 1 tmbdev
(No comment was entered for this change.)
Owner: ---

@jimregan

2007-03-07T22:24:44.000Z
Mar 7, 2007 Project Member 2 tmbdev
(No comment was entered for this change.)
Owner: theraysmith

@jimregan

2007-03-07T22:47:04.000Z
Mar 7, 2007 Project Member 3 tmbdev
(No comment was entered for this change.)
Summary: [ 1669644 ] Crash in letter_is_okay() with trigger

@jimregan

2008-11-14T03:24:22.000Z
Nov 13, 2008 Project Member 4 theraysmith
I found the cause of this, and it will be fixed in 2.04.
Status: Started

@jimregan

2008-12-24T01:04:46.000Z
Dec 23, 2008 Project Member 5 theraysmith
(No comment was entered for this change.)
Status: Duplicate
Mergedinto: 128

This was referenced Jul 19, 2015
stweil referenced this issue in stweil/tesseract May 20, 2018
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
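
The trace above ends in `__subvsi3`, the libgcc routine that `-ftrapv` uses to trap signed 32-bit subtraction overflow, so `right - left` overflowed `int` for this input. One way to avoid that kind of overflow is to subtract in a wider type and clamp. The following is only a sketch under the assumption that `int` is narrower than 64 bits; `SaturatingDiff` is a hypothetical helper, not a Tesseract function, and it is not claimed to be what the referenced commit does.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not Tesseract code: computes right - left without
// signed overflow by widening to 64 bits and clamping to the int range.
// Assumes int is narrower than 64 bits.
inline int SaturatingDiff(int right, int left) {
  const std::int64_t wide = static_cast<std::int64_t>(right) - left;
  if (wide > std::numeric_limits<int>::max()) {
    return std::numeric_limits<int>::max();
  }
  if (wide < std::numeric_limits<int>::min()) {
    return std::numeric_limits<int>::min();
  }
  return static_cast<int>(wide);
}
```

Whether clamping is the right behaviour for a width computation depends on what coordinates the caller can legitimately see; rejecting the input earlier may be preferable.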
noahmetzger pushed a commit to noahmetzger/tesseract that referenced this issue Jul 31, 2018
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil added a commit that referenced this issue Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*) tesseract/src/classify/intmatcher.cpp:1163:17
	    #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11
	    #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f35fd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f201e in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads, but does not fix the primary
reason: ProtoLengths currently gets values which are larger than the
allowed index.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil referenced this issue in stweil/tesseract Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*, short) tesseract/src/classify/intmatcher.cpp:1121:17
	    #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11
	    #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f16ee in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads in release builds.
Add also assertions for debug builds.

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
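
The pattern this commit message describes (a runtime check that skips the bad read in release builds, plus an assertion that stops debug builds) might look roughly like the sketch below. The names `kTableSize` and `CheckedRead` are hypothetical and only loosely modelled on the `uint8_t [24]` array in the sanitizer report; this is not the code from the commit, and returning 0 for a bad index is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical size, chosen to match the 'uint8_t [24]' in the report.
constexpr std::size_t kTableSize = 24;

// Hypothetical helper, not Tesseract code: returns table[index], or 0 when
// index is out of range. Debug builds stop at the assert; release builds
// (compiled with NDEBUG) fall through to the check and skip the read.
inline std::uint8_t CheckedRead(const std::uint8_t (&table)[kTableSize],
                                std::size_t index) {
  assert(index < kTableSize);
  if (index >= kTableSize) {
    return 0;
  }
  return table[index];
}
```

As the earlier commit message notes, a guard like this only hides the symptom; the value that produces the oversized index still has to be corrected at its source.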
stweil referenced this issue in stweil/tesseract Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1231:62: runtime error: division by zero
	    #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
	    #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
	    #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
	    #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
	    #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
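
The report above does not show the expression at `intmatcher.cpp:1231`, so the following is only a generic sketch of guarding a division against a zero divisor. `GuardedDivide` and its `fallback` parameter are hypothetical; what value is appropriate when the divisor is zero depends on the caller and is not stated in the issue.

```cpp
// Hypothetical helper, not the Tesseract fix: returns fallback when the
// divisor is exactly zero instead of performing the division.
inline float GuardedDivide(float numerator, float denominator, float fallback) {
  if (denominator == 0.0f) {
    return fallback;
  }
  return numerator / denominator;
}
```

For example, `GuardedDivide(rating, total, 1.0f)` would yield `1.0f` whenever `total` is zero, where `rating` and `total` stand in for whatever operands the real code divides.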
stweil added a commit that referenced this issue Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1231:62: runtime error: division by zero
	    #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
	    #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
	    #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
	    #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
	    #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <sw@weilnetz.de>