Made output more compatible with the hOCR spec.
- added the properties "image" and "bbox" to the "ocr_page" element.
- corrected the orientation of the coordinate system (line boxes now come from regions.bbox instead of regions.bboxMath).
jze committed Dec 2, 2016
1 parent 976a3ba commit 060ff21
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions ocropus-hocr
@@ -63,7 +63,9 @@ for arg in args.files:
     base,_ = ocrolib.allsplitext(arg)
     try:
         E("===",arg)
-        P("<div class='ocr_page' title='file %s'>"%arg)
+        image = ocrolib.read_image_binary(arg)
+        height, width = image.shape
+        P("<div class='ocr_page' title='image %s; bbox 0 0 %d %d'>"%(arg,width,height))

         # to proceed, we need a pseg file and a
         # subdirectory containing text lines
@@ -88,7 +90,7 @@ for arg in args.files:
             # and insert paragraph breaks as needed

             id = regions.id(i)
-            y0,x0,y1,x1 = regions.bboxMath(i)
+            y0,x0,y1,x1 = regions.bbox(i)
             if last_coords is not None:
                 lx0,ly0 = last_coords
                 dx,dy = x0-lx0,y1-ly0
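For readers unfamiliar with the two conventions this commit touches, here is a minimal, self-contained sketch of the idea. It is not code from ocropus-hocr: the helper names `math_to_image_bbox`, `page_title`, and `line_title` are hypothetical, and the sketch assumes that the "Math" variant of a box measures rows from the bottom of the page, while hOCR expects `bbox x0 y0 x1 y1` with the origin at the top left. The page title mirrors the unquoted `image %s` format used in the diff.

```python
def math_to_image_bbox(y0, x0, y1, x1, height):
    """Flip a (y0, x0, y1, x1) box from a bottom-left origin to a top-left origin.

    Assumption: in the bottom-left convention y0 is the lower edge and y1 the
    upper edge, so the flipped box starts at height - y1 and ends at height - y0.
    """
    return (height - y1, x0, height - y0, x1)


def page_title(path, width, height):
    """Build an ocr_page title string in the same format as the diff above."""
    return "image %s; bbox 0 0 %d %d" % (path, width, height)


def line_title(y0, x0, y1, x1):
    """Build a bbox property from a top-left-origin (y0, x0, y1, x1) box.

    The input tuple is row-first, while hOCR lists x before y, so the
    values are reordered here.
    """
    return "bbox %d %d %d %d" % (x0, y0, x1, y1)


if __name__ == "__main__":
    # A 200x100 page with one box given in bottom-left-origin coordinates.
    box = math_to_image_bbox(10, 5, 30, 50, height=100)
    print(page_title("page.png", 200, 100))
    print(line_title(*box))
```

The row-first tuple order matches the `y0,x0,y1,x1` unpacking in the diff; only the final formatting step swaps to the x-first order of the hOCR `bbox` property.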
