Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 2: [ 1608107 ] Conditional jump or move depends on uninitialised value(s) #4

Closed
jimregan opened this issue Apr 12, 2015 · 1 comment
Assignees

Comments

@jimregan
Copy link
Contributor

https://code.google.com/p/tesseract-ocr/issues/detail?id=2

2007-03-07T22:24:30.000Z
Reported by tmbdev, Mar 7, 2007
Emmanuel Fleury - efleury(sf)

Hi all,

I ran valgrind on the bokeoa-64bit-branch branch (with a patch submitted by
me) and I found the following problem:

==12801== Conditional jump or move depends on uninitialised value(s)
==12801== at 0x4A4B5E: IntegerMatcher(INT_CLASS_STRUCT_, unsigned long_,
unsigned long_, unsigned short, short, INT_FEATURE_STRUCT_, int, unsigned
char, INT_RESULT_STRUCT_, int) (intmatcher.cpp:1038)
==12801== by 0x49D7F9: AdaptToChar(blobstruct_, LINE_STATS_, unsigned
char, float) (adaptmatch.cpp:1285)
==12801== by 0x49E3DE: AdaptToWord(wordstruct_, textrowstruct_, char
const_, char const_, char const_) (adaptmatch.cpp:725)
==12801== by 0x427AD6: tess_adapter(WERD_, DENORM_, char const_, char
const_, char const_) (tessbox.cpp:349)
==12801== by 0x40DC33: classify_word_pass1(WERD_RES_, ROW_, unsigned
char, CHAR_SAMPLES_LIST_, CHAR_SAMPLE_LIST_) (control.cpp:611)
==12801== by 0x40E06C: recog_all_words(PAGE_RES_, ETEXT_DESC volatile*)
(control.cpp:295)
==12801== by 0x4044D6: recognize_page(STRING&) (tessedit.cpp:159)
==12801== by 0x4034A3: main (tesseractmain.cpp:104)

This bug is also present in 32bits architecture and does not depends on the
architecture.

Comments

Date: 2007-02-01 00:48
Sender: efleury
Logged In: YES
user_id=122014
Originator: YES

Great !

But, did you checkout ??? I cannot get my local CVS archive to get any
update... :-/

Date: 2007-01-31 15:34
Sender: theraysmithProject Admin
Logged In: YES
user_id=1515161
Originator: NO

This is fixed in 1.03. It was causing the adaptive classifier to not get
used enough.

Date: 2006-12-15 13:30
Sender: filipg
Logged In: YES
user_id=37894
Originator: NO

I think this is a FALSE-ALARM from valgrind (another one below):

Breakpoint 1, IntegerMatcher (ClassTemplate=0x925d8d8,
ProtoMask=0x91de6f0,
ConfigMask=0xbf925a78, BlobLength=47, NumFeatures=53, Features=0xbf925270,
min_misses=0,
NormalizationFactor=0 '\0', Result=0xbf926114, Debug=0) at
intmatcher.cpp:1043
1043 if (Features[Feature].CP_misses >= min_misses) {
(gdb) list
1042 for (Feature = 0, used_features = 0; Feature < NumFeatures;
Feature++) {
1043 if (Features[Feature].CP_misses >= min_misses) {
1044 IMUpdateTablesForFeature (ClassTemplate, ProtoMask,
ConfigMask,
1045 Feature, &(Features[Feature]),
1046 FeatureEvidence, SumOfFeatureEvidence,
1047 ProtoEvidence, Debug);
1048 used_features++;
1049 }
(gdb) print min_misses
$5 = 0
(gdb) print Feature
$6 = 0
(gdb) print Features[Feature]
$7 = {X = 97 'a', Y = 30 '\036', Theta = 192 '�', CP_misses = 0
'\0'}

Looks OK to me... The same seems to be true for "Source and destination
overlap
in strcpy". Take a look:

Breakpoint 1, fix_quotes (string=0x86b5b31 ""'License'');",
word=0x87dc530,
blob_choices=0xbfca8f58) at control.cpp:1034
1034 strcpy (ptr + 1, ptr + 2); //shuffle up
(gdb) list 1029
1029 for (ptr = string;
1030 _ptr != '\0'; ptr++, blob_it.forward (), choice_it.forward ())
{
1031 if ((_ptr == ''' || _ptr == '') 1032 && (_(ptr + 1) == '\'' || *(ptr + 1) == '')) {
1033 *ptr = '"'; //turn to double
1034 strcpy (ptr + 1, ptr + 2); //shuffle up
(gdb) print ptr+1
$1 = 0x86b5b32 "'License'');"
(gdb) print ptr+2
$2 = 0x86b5b33 "License'');"

Looks fine to me. Valgrind pointed to above twice as both recognition
passes call
fix_quotes():

==1993== 3 errors in context 1 of 4:
==1993== Source and destination overlap in strcpy(0x469B2FA, 0x469B2FB)
==1993== at 0x4006AAD: strcpy (mc_replace_strmem.c:106)
==1993== by 0x805333E: fix_quotes(char_, WERD_,
BLOB_CHOICE_LIST_CLIST_) (control.cpp:1034)
==1993== by 0x8054DBD: classify_word_pass1(WERD_RES_, ROW_, unsigned
char, CHAR_SAMPLES_LIST_, CHAR_SAMPLE_LIST_) (control.cpp:592)
==1993== by 0x80554C2: recog_all_words(PAGE_RES_, ETEXT_DESC volatile_)
(control.cpp:317)
==1993== by 0x804B9EB: recognize_page(STRING&) (tessedit.cpp:187)
==1993== by 0x804A869: main (tesseractmain.cpp:454)
==1993==
==1993== 6 errors in context 2 of 4:
==1993== Source and destination overlap in strcpy(0x46998D2, 0x46998D3)
==1993== at 0x4006AAD: strcpy (mc_replace_strmem.c:106)
==1993== by 0x805333E: fix_quotes(char_, WERD_,
BLOB_CHOICE_LIST_CLIST_) (control.cpp:1034)
==1993== by 0x8053B4C: match_word_pass2(WERD_RES_, ROW_, float)
(control.cpp:913)

Guess tesseract needs its own valgrind_suppressions.sh...

I've been meaning to play with valgrind for an unrelated reason - looked
into
this report while installing it. Very nice program. Found my rare
corruption
issue in a house app with it and 6 other potential problems!

Check it out if you haven't: http://www.valgrind.org/ (painless Linux
install)

Cheers,
Fil

@jimregan jimregan self-assigned this Apr 12, 2015
@jimregan jimregan changed the title [ 1608107 ] Conditional jump or move depends on uninitialised value(s) Defect issue Apr 12, 2015
@jimregan jimregan changed the title Defect issue Issue 2: [ 1608107 ] Conditional jump or move depends on uninitialised value(s) Apr 13, 2015
@jimregan
Copy link
Contributor Author

2007-05-17T18:14:16.000Z
Project Member 1 theraysmith
This was fixed in 1.03
Status: Fixed

This was referenced Jul 19, 2015
stweil referenced this issue in stweil/tesseract May 20, 2018
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
noahmetzger pushed a commit to noahmetzger/tesseract that referenced this issue Jul 31, 2018
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
tesseract-ocr#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
tesseract-ocr#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
tesseract-ocr#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
tesseract-ocr#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
tesseract-ocr#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
tesseract-ocr#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
tesseract-ocr#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
tesseract-ocr#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
tesseract-ocr#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil added a commit that referenced this issue Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*) tesseract/src/classify/intmatcher.cpp:1163:17
	    #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11
	    #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f35fd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f201e in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads, but does not fix the primary
reason: ProtoLengths currently gets values which are larger than the
allowed index.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil referenced this issue in stweil/tesseract Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*, short) tesseract/src/classify/intmatcher.cpp:1121:17
	    #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11
	    #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f16ee in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads in release builds.
Add also assertions for debug builds.

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil referenced this issue in stweil/tesseract Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1231:62: runtime error: division by zero
	    #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
	    #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
	    #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
	    #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
	    #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil added a commit that referenced this issue Mar 24, 2019
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1231:62: runtime error: division by zero
	    #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
	    #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
	    #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
	    #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
	    #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant