DBFtrim: trailing 0s in type F fields #6

Closed
yakra opened this issue Nov 10, 2017 · 8 comments · Fixed by #22

Comments

@yakra
Owner

yakra commented Nov 10, 2017

If a type F field never uses scientific notation,
i.e., never contains 'e' (or 'E'?),
look into trimming trailing 0s as done for type N fields
(or sometimes even if it does contain 'e'?)
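A minimal sketch of the check described above, assuming each field value is available as a NUL-terminated string. The helper name trailingZeros is hypothetical and not taken from DBFtrim; it only counts how many trailing 0s could go, and reports 0 for anything in scientific notation or without a decimal point.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not DBFtrim's own code): number of trailing '0'
// characters after the decimal point that could be trimmed from one value.
unsigned trailingZeros(const char* fVal) {
    // Scientific notation: leave the value alone.
    if (std::strchr(fVal, 'e') || std::strchr(fVal, 'E')) return 0;
    const char* dot = std::strchr(fVal, '.');
    if (!dot) return 0;                       // no fractional part at all
    const char* end = fVal + std::strlen(fVal);
    unsigned zeros = 0;
    // Walk back from the end, stopping at the decimal point.
    while (end - 1 > dot && end[-1] == '0') { --end; ++zeros; }
    return zeros;
}
```

For a whole field, the usable trim would be the smallest such count over all records.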

@yakra
Owner Author

yakra commented Nov 25, 2017

ROADS_ACF.yTrim4.dbf
unique_id: trim ".000000000000000"; save 16 B
ah_blm: trim "0000000"; save 7 B
ah_elm: trim "0000000"; save 7 B
ah_length: trim "0000000"; save 7 B
ah_seg_num: trim ".000000000000000"; save 16 B
TOTAL: save 53 B / record
x 439026 records = 23,268,378 B

2018-03-05: Some fields are missing. Was this list compiled from a culled file? A more relevant comment will be posted below...

@yakra
Owner Author

yakra commented Mar 4, 2018

7/9 test files are identical to previous DBFtrim version. Exceptions:

txdot-2015-roadways_48113.dbf (Dallas)
Shape_STLe F 12 2 0. <- 0.0000000000
Why is the extraneous decimal point not trimmed?
(It can't be left-justified; 12 chars take up entire field width. DecCount?)
• Using DBFcull, I see the DIFF is limited to this field. Good...
• Per DBFmine, 0.0000000000 is the only value for all records

ROADS_ACF.yTrim.dbf
Binary files testfiles.old/ROADS_ACF.yTrim.dbf and testfiles.new/ROADS_ACF.yTrim.dbf differ
File sizes are identical. No indication in field info display of any extraneous 0s being trimmed. (Why?) In theory then, nothing should be different... Either:
• load a 610.7 MB file into hex editor,
• fix the bug affecting TX first and hope the problem goes away, or
• something else

@yakra
Owner Author

yakra commented Mar 4, 2018

TX

> txdot-2015-roadways_48113.dbf (Dallas)
> Shape_STLe F 12 2 0. <- 0.0000000000
> Why is the extraneous decimal point not trimmed?
> (It can't be left-justified; 12 chars take up entire field width. DecCount?)
> • Using DBFcull, I see the DIFF is limited to this field. Good...
> • Per DBFmine, 0.0000000000 is the only value for all records

Yes, the problem is in DecCount, which is set to 0 in the original file.
Thus tDBF.fArr[fNum].DecCount = DecCount-MinEx0; -> 0-10; wraps around (unsigned 8-bit) & becomes 246.
Thus if (MinEx0 == DecCount) MinEx0++; never happens.
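The wraparound can be reproduced in isolation. This sketch assumes DecCount is stored as an unsigned 8-bit value, which is what the jump to 246 suggests; both function names are hypothetical, and the guard shown is just one possible way to avoid the underflow.

```cpp
// Assumption: DecCount and MinEx0 fit in an unsigned char (0..255).

// Plain subtraction: the int result 0-10 == -10 is converted back to
// unsigned char, i.e. taken modulo 256, giving 246.
unsigned char naiveDecCount(unsigned char DecCount, unsigned char MinEx0) {
    return static_cast<unsigned char>(DecCount - MinEx0);
}

// Guarded version: never subtract more than there is to take away.
unsigned char guardedDecCount(unsigned char DecCount, unsigned char MinEx0) {
    return (MinEx0 >= DecCount) ? 0
                                : static_cast<unsigned char>(DecCount - MinEx0);
}
```

With DecCount == 0 and MinEx0 == 10, the naive version yields 246 and the guarded one yields 0.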

A couple workarounds for cases like this should be pretty easy.

First, look thru other files with type F fields, and see if having a Decimal Count of 0 is a common thing, or unique to the TXDOT files.

FWIW, the Decimal Count in the next field (Shape_Leng) is screwy too: Length of 12, Decimal Count of 11. 12 minus the leading zero & the decimal point itself leaves only 10 decimal places.
This may just be a TXDOT thing...
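The arithmetic behind that observation, as a tiny sketch. It assumes fixed notation with a non-negative value written as "0.ddd", so one character goes to the leading digit and one to the decimal point; the function names are hypothetical.

```cpp
// Assumption: fixed notation, non-negative, one leading digit plus '.'.
constexpr unsigned maxDecimals(unsigned len) {
    return len > 2 ? len - 2 : 0;
}

// True if a header's Decimal Count can physically fit in the field length.
constexpr bool decCountFits(unsigned len, unsigned decCount) {
    return decCount <= maxDecimals(len);
}
```

For Shape_Leng, a Length of 12 allows at most 10 decimal places, so a Decimal Count of 11 does not fit under these assumptions.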

@yakra
Owner Author

yakra commented Mar 4, 2018

> First, look thru other files with type F fields, and see if having a Decimal Count of 0 is a common thing, or unique to the TXDOT files.

~/gis/data/pe/nrn_rrn_pe_12.0_shp_en/NRN_PE_12_0_ROADSEG.dbf
No type F fields.

~/gis/data/md/SHA_Routes/SHA_LINE_ROUTES_MD_2015
SHAPELENGT, DecCount 0x0B / 11, scientific notation, makes sense
SHAPE_Leng, DecCount 0x0B / 11, scientific notation, makes sense
All good!

~/gis/data/me/medotpubrdss/2016-04-08/medotpubrdss
Shape_len, DecCount 0x0B / 11, scientific notation, makes sense
All good!

~/gis/data/me/e911rdss/e911rds
SHAPE_len, DecCount 0x0B / 11, scientific notation, makes sense
All good!

~/gis/data/me/medotpubrdss/2017-08-24/medotpubrds
PRIM_BMP, PRIM_EMP, SEG_LEN__M, Shape_len
All DecCount 0x0B / 11, scientific notation, make sense
All good!

~/gis/data/nh/roads_dot_2016/Roads_DOT
MP_START, MP_END, SECT_LENGT, SHAPE_Leng
All DecCount 0x0B / 11, scientific notation, make sense
All good!

~/gis/data/ma/RoadInv2017/Road_Inventory.yOrig.dbf
No type F fields.

~/gis/data/ar/ROADS_ACF/ROADS_ACF
unique_id, DecCount 0x0F / 15, only 0s to right of decimal. Blank value exists.
src_code, DecCount 0x0F / 15, only 0s to right of decimal. Blank value exists.
nssda_val, DecCount 0x0F / 15, only 0s to right of decimal. Blank value exists.
ah_blm, DecCount 0x0F / 15, as many as 8 non-0 places. Blank value exists. Trimmed DecCount == 3. HOW?
ah_elm, DecCount 0x0F / 15, as many as 8 non-0 places. Blank value exists. Trimmed DecCount == 3. HOW?
ah_length, DecCount 0x0F / 15, as many as 8 non-0 places. Blank value exists. Trimmed DecCount == 3. HOW?
ah_seg_num, DecCount 0x0F / 15, only 0s to right of decimal. Blank value exists.
Shape_STLe, DecCount 0x0F / 15, all sig figs used & no potential for trimming. No blank value.

@yakra
Owner Author

yakra commented Mar 5, 2018

AR

Why are no extraneous zeros trimmed?
How does DecCount get trimmed from 15 to 3?
It's those bloody blank-space values!

  • Record Foo has a value of 0.001000000000000, for example.
    MinEx0 gets set to 12.
    DecCount gets set to 3 (15-12).
  • Records Bar thru Baz have 12 or more extraneous zeros; nothing changes.
  • Record Qux is a blank-space value.
    !strchr(fVal, '.') && !strchr(fVal, 'E') && !strchr(fVal, 'e')
    Thus MinEx0 = 0;
    Thus pad will never again be less than MinEx0.
    Thus DecCount never gets changed again.

Solution:

  • Reset DecCount along with MinEx0 in the else statement.
  • Create an exception to this for blank values [ if (strlen(fVal)) ? ] so I can still trim stuff.
    How would this get handled when saving trimmed records to disk?
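One way the blank-value exception could look, sketched with std::string rather than DBFtrim's own char buffers. The function names are hypothetical, and values are assumed to be right-justified (no trailing spaces). Blank values are simply skipped, so they can no longer reset the running minimum to 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Trailing zeros after the decimal point; 0 if there is no '.' or an exponent.
static unsigned extraneousZeros(const std::string& v) {
    if (v.find_first_of("eE") != std::string::npos) return 0;
    if (v.find('.') == std::string::npos) return 0;
    // The last non-'0' character is at worst the '.' itself.
    std::size_t last = v.find_last_not_of('0');
    return static_cast<unsigned>(v.size() - 1 - last);
}

// Smallest count over all non-blank values in a field.
unsigned minExtraneousZeros(const std::vector<std::string>& values) {
    bool seen = false;
    unsigned minEx0 = 0;
    for (const std::string& raw : values) {
        std::size_t first = raw.find_first_not_of(' ');
        if (first == std::string::npos) continue;   // blank-space value: ignore
        unsigned z = extraneousZeros(raw.substr(first));
        minEx0 = seen ? std::min(minEx0, z) : z;
        seen = true;
    }
    return minEx0;
}
```

A non-blank value with no decimal point still forces the minimum to 0, which matches the intent of the original else branch.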

What happens to a blank-space value?
for (pad = 0; (fVal[pad] <= ' ') && pad < len; pad++);
pad gets set to strlen(fVal)
NewVal = new char[strlen(fVal+pad)+1];
--> NewVal = new char[strlen("\0")+1];
--> NewVal = new char[0+1];
strcpy(NewVal, fVal+pad);
--> strcpy(NewVal, "\0");
fVal = NewVal;
fVal == "\0"
strlen(fVal) == 0
(...Right?)
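A quick way to check that reasoning is a std::string equivalent of the pad loop. This is a sketch, not DBFtrim's code: trimLeft is a hypothetical name, and the bounds test is placed before the character test so nothing is read past the end.

```cpp
#include <cstddef>
#include <string>

// Drop leading characters that are <= ' ' (spaces and control characters).
std::string trimLeft(const std::string& fVal) {
    std::size_t pad = 0;
    while (pad < fVal.size() &&
           static_cast<unsigned char>(fVal[pad]) <= ' ') {
        ++pad;
    }
    return fVal.substr(pad);   // pad == size() yields an empty string
}
```

An all-space value comes back empty, so a length check on the result is enough to detect blanks.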

@yakra
Owner Author

yakra commented Mar 5, 2018

> • Reset DecCount along with MinEx0 in the else statement.
> • Create an exception to this for blank values [ if (strlen(fVal)) ? ] so I can still trim stuff.
>   How would this get handled when saving trimmed records to disk?

ah_length gets trimmed 1 byte too many
10.10300000 datum disappears completely
all other values have 1 digit left of decimal

unsigned int RecNum = ((unsigned int)DBFf.tellg()-dbf.HeaLen)/dbf.RecLen+1;

testfiles.1/ROADS_ACF.ah_length-only.dbf
RecNum = (140551-65)/19+1;
RecNum = 140486/19+1;
RecNum = 7394+1;

testfiles.2/ROADS_ACF.ah_length-only.dbf (same record; working backwards to its file offset)
RecNum = 7394+1; then multiply by RecLen
RecNum = 81334/11+1; then add HeaLen
RecNum = (81399-65)/11+1
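The same conversion in both directions, as a sketch. recNumAt mirrors the RecNum expression above; offsetWhereRecNumIs is a hypothetical inverse, used here to find the matching offset in a file with a different record length.

```cpp
// HeaLen = header length in bytes, RecLen = record length in bytes.
unsigned recNumAt(unsigned long offset, unsigned HeaLen, unsigned RecLen) {
    return static_cast<unsigned>((offset - HeaLen) / RecLen + 1);
}

// Inverse: the file offset at which the expression above yields recNum.
unsigned long offsetWhereRecNumIs(unsigned recNum, unsigned HeaLen,
                                  unsigned RecLen) {
    return static_cast<unsigned long>(recNum - 1) * RecLen + HeaLen;
}
```

With HeaLen 65, offset 140551 at RecLen 19 and offset 81399 at RecLen 11 both map to RecNum 7395.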

Datum there is 0.10300000, i.e. the '1' at the beginning got trimmed...
All data before that only goes to 3 decimal places.

10.103000000000000 stays the same as established from Record 1:
MinEx0 = 12
DecCount = 15-12 = 3
len = strlen(fVal)-MinEx0;
len = 18-12;
len = 6
Later, we reach 0.032541810000000 @ record 9237
MinEx0 = 7
DecCount = 15-7 = 8
if (strlen(fVal) > tDBF.fArr[fNum].len+MinEx0)
if (17 > 6+7)
if (17 > 13) (Yes.)
len = strlen(fVal)-MinEx0;
len = 17-7;
len = 10

Long story short, the solution is to track how many digits are to the left of the decimal.
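A sketch of that idea: track the widest integer part and the most decimals actually needed, independently, and build the field length from both. The struct and function names are hypothetical; values are assumed non-blank, left-trimmed, and in fixed notation (a leading sign would simply count toward the integer part).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct TrimmedWidth {
    unsigned maxIntD;    // most characters left of the decimal point
    unsigned decCount;   // most decimals needed once trailing 0s are gone
    unsigned len;        // resulting field length
};

TrimmedWidth trimmedWidth(const std::vector<std::string>& values) {
    unsigned maxIntD = 0, decCount = 0;
    for (const std::string& v : values) {
        std::size_t dot = v.find('.');
        if (dot == std::string::npos) {          // integer-only value
            maxIntD = std::max(maxIntD, static_cast<unsigned>(v.size()));
            continue;
        }
        std::size_t last = v.find_last_not_of('0');   // at worst the '.'
        maxIntD = std::max(maxIntD, static_cast<unsigned>(dot));
        decCount = std::max(decCount, static_cast<unsigned>(last - dot));
    }
    // The decimal point is only kept if some decimals survive.
    unsigned len = maxIntD + (decCount ? 1 + decCount : 0);
    return {maxIntD, decCount, len};
}
```

For 10.103000000000000 and 0.032541810000000 this gives 2 integer digits, 8 decimals, and a length of 11, one more than the 10 computed in the trace above.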

@yakra
Owner Author

yakra commented Mar 6, 2018

MaxIntD implemented...

TX: v1 & v3 no diff. Good.
PE: v0 & v3 no diff. Good.
MD: v0 & v3 no diff. Good.
ME e911rds: v0 & v3 no diff. Good.
NH: v0 & v3 no diff. Good.
ME DOT16: v0 & v3 no diff. Good.
ME DOT17: v0 & v3 no diff. Good.
MA: v0 & v3 no diff. Good.

And finally, AR...
unique_id: trim ".000000000000000"; save 16 B
src_code: trim ".000000000000000"; save 16 B
nssda_val: trim ".000000000000000"; save 16 B
ah_blm: trim "0000000"; save 7 B
ah_elm: trim "0000000"; save 7 B
ah_length: trim "0000000"; save 7 B
ah_seg_num: trim ".000000000000000"; save 16 B
Shape_STLe cannot be trimmed
TOTAL: save 85 B / record
x 439026 records = 37317210 B
610687600 (v1) - 37317210 = 573370390

And the verdict is... 573370390. YES!
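The expected-size arithmetic can be pinned down at compile time. This is only a restatement of the numbers listed above (four fields saving 16 B, three saving 7 B); the constant and function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t bytesSaved(std::uint64_t perRecord,
                                   std::uint64_t records) {
    return perRecord * records;
}

constexpr std::uint64_t kPerRecord = 4 * 16 + 3 * 7;   // per-record savings
constexpr std::uint64_t kRecords   = 439026;           // records in the AR file
constexpr std::uint64_t kSizeV1    = 610687600;        // v1 file size in bytes

static_assert(kPerRecord == 85, "per-record savings");
static_assert(bytesSaved(kPerRecord, kRecords) == 37317210, "total savings");
static_assert(kSizeV1 - bytesSaved(kPerRecord, kRecords) == 573370390,
              "expected trimmed file size");
```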

@yakra
Owner Author

yakra commented Mar 6, 2018

"The TX exception" to trim extraneous decimal point

if (MinEx0 >= DecCount) tDBF.fArr[fNum].DecCount = 0;

...inter alia.
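What the exception amounts to for a single value, as a sketch. DBFtrim works per field with a shared DecCount, so this per-value helper (trimFixed, a hypothetical name) is only an illustration: when every decimal is a 0, the decimal point goes too.

```cpp
#include <cstddef>
#include <string>

// Trim trailing 0s from a fixed-notation value; drop the '.' as well
// when nothing but 0s followed it. Other values are returned unchanged.
std::string trimFixed(std::string v) {
    if (v.find_first_of("eE") != std::string::npos) return v;  // scientific
    std::size_t dot = v.find('.');
    if (dot == std::string::npos) return v;                    // integer
    std::size_t last = v.find_last_not_of('0');                // at worst '.'
    v.erase(last == dot ? dot : last + 1);
    return v;
}
```

So 0.0000000000 becomes 0 rather than 0. with a dangling decimal point.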

TX: v4 filesize 27610 B greater than v1 thru v3. Good!
All other files identical to v3.
