Unicharset — Incomplete properties #318

Closed
ne0zer0 opened this issue May 8, 2016 · 35 comments

Comments

@ne0zer0

ne0zer0 commented May 8, 2016

Hi,

I get some strange results when I try to train Tesseract.

Some parts are much improved compared to the default eng.tessdata, while other parts are strangely added or modified even though the image quality is very good ("24" becomes "eat"; uppercase letters become lowercase; some words are split in two; etc.).

I think it may be caused by the unicharset.

Indeed, when I try to generate a unicharset file with the following command:
unicharset_extractor eng.palladio-regular.exp8.box
I get an incomplete file. Here is the result:

    115
    NULL 0 NULL 0
    Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ]
    |Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # Broken
    d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # d [64 ]
    i 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # i [69 ]
    f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # f [66 ]
    e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # e [65 ]
    r 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # r [72 ]
    n 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # n [6e ]
    t 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # t [74 ]
    N 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # N [4e ]
    w 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # w [77 ]
    A 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # A [41 ]
    c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # c [63 ]
    l 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # l [6c ]
    s 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # s [73 ]
    p 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # p [70 ]
    a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ]
    g 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # g [67 ]
    2 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 2 [32 ]
    3 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 3 [33 ]
    T 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # T [54 ]
    o 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # o [6f ]
    S 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # S [53 ]
    v 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # v [76 ]
    ~ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ~ [7e ]
    D 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # D [44 ]
    C 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # C [43 ]
    h 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # h [68 ]
    ' 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ' [27 ]
    7 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 7 [37 ]
    « 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # « [ab ]
    : 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # : [3a ]
    # 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # # [23 ]
    1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 1 [31 ]
    Z 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # Z [5a ]
    _ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # _ [5f ]
    M 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # M [4d ]
    u 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # u [75 ]
    m 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # m [6d ]
    P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # P [50 ]
    H 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # H [48 ]
    O 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # O [4f ]
    ( 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ( [28 ]
    ) 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ) [29 ]
    q 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # q [71 ]
    y 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # y [79 ]
    | 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # | [7c ]
    U 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # U [55 ]
    0 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 0 [30 ]
    % 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # % [25 ]
    x 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # x [78 ]
    F 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # F [46 ]
    R 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # R [52 ]
    I 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # I [49 ]
    , 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # , [2c ]
    ! 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ! [21 ]
    E 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # E [45 ]
    b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # b [62 ]
    \ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # \ [5c ]
    8 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 8 [38 ]
    ? 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ? [3f ]
    & 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # & [26 ]
    ; 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ; [3b ]
    B 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # B [42 ]
    k 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # k [6b ]
    - 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # - [2d ]
    > 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # > [3e ]
    L 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # L [4c ]
    . 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # . [2e ]
    — 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # — [2014 ]
    4 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 4 [34 ]
    » 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # » [bb ]
    € 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # € [20ac ]
    W 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # W [57 ]
    J 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # J [4a ]
    é 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # é [e9 ]
    9 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 9 [39 ]
    ® 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # ® [ae ]
    $ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # $ [24 ]
    5 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 5 [35 ]
    } 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # } [7d ]
    [ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # [ [5b ]
    Y 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # Y [59 ]
    § 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # § [a7 ]
    " 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # " [22 ]
    { 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # { [7b ]
    ¢ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # ¢ [a2 ]
    / 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # / [2f ]
    Q 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # Q [51 ]
    6 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 6 [36 ]
    G 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # G [47 ]
    ” 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # ” [201d ]
    ° 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # ° [b0 ]
    K 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # K [4b ]
    ¥ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # ¥ [a5 ]
    V 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # V [56 ]
    © 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # © [a9 ]
    z 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # z [7a ]
    + 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # + [2b ]
    = 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # = [3d ]
    £ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # £ [a3 ]
    < 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # < [3c ]
    ’ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # ’ [2019 ]
    ‘ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # ‘ [2018 ]
    j 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # j [6a ]
    X 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # X [58 ]
    ] 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ] [5d ]
    * 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # * [2a ]
    “ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # “ [201c ]
    @ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # @ [40 ]
    • 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # • [2022 ]
    – 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # – [2013 ]
    … 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # … [2026 ]
    ^ 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # ^ [5e ]

When I try to fix it with set_unicharset_properties:
set_unicharset_properties --F font_properties -U unicharset -O output_unicharset --script_dir=/

I get these warnings:

    Loaded unicharset of size 115 from file unicharset
    Setting unichar properties
    Other case É of é is not in unicharset
    Warning: properties incomplete for index 3 = d
    Warning: properties incomplete for index 4 = i
    Warning: properties incomplete for index 5 = f
    Warning: properties incomplete for index 6 = e
    Warning: properties incomplete for index 7 = r
    Warning: properties incomplete for index 8 = n
    Warning: properties incomplete for index 9 = t
    Warning: properties incomplete for index 10 = N
    Warning: properties incomplete for index 11 = w
    Warning: properties incomplete for index 12 = A
    Warning: properties incomplete for index 13 = c
    Warning: properties incomplete for index 14 = l
    Warning: properties incomplete for index 15 = s
    Warning: properties incomplete for index 16 = p
    Warning: properties incomplete for index 17 = a
    Warning: properties incomplete for index 18 = g
    Warning: properties incomplete for index 19 = 2
    Warning: properties incomplete for index 20 = 3
    Warning: properties incomplete for index 21 = T
    Warning: properties incomplete for index 22 = o
    Warning: properties incomplete for index 23 = S
    Warning: properties incomplete for index 24 = v
    Warning: properties incomplete for index 25 = ~
    Warning: properties incomplete for index 26 = D
    Warning: properties incomplete for index 27 = C
    Warning: properties incomplete for index 28 = h
    Warning: properties incomplete for index 29 = '
    Warning: properties incomplete for index 30 = 7
    Warning: properties incomplete for index 31 = «
    Warning: properties incomplete for index 32 = :
    Warning: properties incomplete for index 33 = #
    Warning: properties incomplete for index 34 = 1
    Warning: properties incomplete for index 35 = Z
    Warning: properties incomplete for index 36 = _
    Warning: properties incomplete for index 37 = M
    Warning: properties incomplete for index 38 = u
    Warning: properties incomplete for index 39 = m
    Warning: properties incomplete for index 40 = P
    Warning: properties incomplete for index 41 = H
    Warning: properties incomplete for index 42 = O
    Warning: properties incomplete for index 43 = (
    Warning: properties incomplete for index 44 = )
    Warning: properties incomplete for index 45 = q
    Warning: properties incomplete for index 46 = y
    Warning: properties incomplete for index 47 = |
    Warning: properties incomplete for index 48 = U
    Warning: properties incomplete for index 49 = 0
    Warning: properties incomplete for index 50 = %
    Warning: properties incomplete for index 51 = x
    Warning: properties incomplete for index 52 = F
    Warning: properties incomplete for index 53 = R
    Warning: properties incomplete for index 54 = I
    Warning: properties incomplete for index 55 = ,
    Warning: properties incomplete for index 56 = !
    Warning: properties incomplete for index 57 = E
    Warning: properties incomplete for index 58 = b
    Warning: properties incomplete for index 59 = \
    Warning: properties incomplete for index 60 = 8
    Warning: properties incomplete for index 61 = ?
    Warning: properties incomplete for index 62 = &
    Warning: properties incomplete for index 63 = ;
    Warning: properties incomplete for index 64 = B
    Warning: properties incomplete for index 65 = k
    Warning: properties incomplete for index 66 = -
    Warning: properties incomplete for index 67 = >
    Warning: properties incomplete for index 68 = L
    Warning: properties incomplete for index 69 = .
    Warning: properties incomplete for index 70 = —
    Warning: properties incomplete for index 71 = 4
    Warning: properties incomplete for index 72 = »
    Warning: properties incomplete for index 73 = €
    Warning: properties incomplete for index 74 = W
    Warning: properties incomplete for index 75 = J
    Warning: properties incomplete for index 76 = é
    Warning: properties incomplete for index 77 = 9
    Warning: properties incomplete for index 78 = ®
    Warning: properties incomplete for index 79 = $
    Warning: properties incomplete for index 80 = 5
    Warning: properties incomplete for index 81 = }
    Warning: properties incomplete for index 82 = [
    Warning: properties incomplete for index 83 = Y
    Warning: properties incomplete for index 84 = §
    Warning: properties incomplete for index 85 = "
    Warning: properties incomplete for index 86 = {
    Warning: properties incomplete for index 87 = ¢
    Warning: properties incomplete for index 88 = /
    Warning: properties incomplete for index 89 = Q
    Warning: properties incomplete for index 90 = 6
    Warning: properties incomplete for index 91 = G
    Warning: properties incomplete for index 92 = ”
    Warning: properties incomplete for index 93 = °
    Warning: properties incomplete for index 94 = K
    Warning: properties incomplete for index 95 = ¥
    Warning: properties incomplete for index 96 = V
    Warning: properties incomplete for index 97 = ©
    Warning: properties incomplete for index 98 = z
    Warning: properties incomplete for index 99 = +
    Warning: properties incomplete for index 100 = =
    Warning: properties incomplete for index 101 = £
    Warning: properties incomplete for index 102 = <
    Warning: properties incomplete for index 103 = ’
    Warning: properties incomplete for index 104 = ‘
    Warning: properties incomplete for index 105 = j
    Warning: properties incomplete for index 106 = X
    Warning: properties incomplete for index 107 = ]
    Warning: properties incomplete for index 108 = *
    Warning: properties incomplete for index 109 = “
    Warning: properties incomplete for index 110 = @
    Warning: properties incomplete for index 111 = •
    Warning: properties incomplete for index 112 = –
    Warning: properties incomplete for index 113 = …
    Warning: properties incomplete for index 114 = ^

And this file, which is still incomplete:

    115
    NULL 0 Common 0
    Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
    |Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
    d 3 0,255,0,255,0,0,0,0,0,0 Latin 26 0 3 d  # d [64 ]a
    i 3 0,255,0,255,0,0,0,0,0,0 Latin 54 0 4 i  # i [69 ]a
    f 3 0,255,0,255,0,0,0,0,0,0 Latin 52 0 5 f  # f [66 ]a
    e 3 0,255,0,255,0,0,0,0,0,0 Latin 57 0 6 e  # e [65 ]a
    r 3 0,255,0,255,0,0,0,0,0,0 Latin 53 0 7 r  # r [72 ]a
    n 3 0,255,0,255,0,0,0,0,0,0 Latin 10 0 8 n  # n [6e ]a
    t 3 0,255,0,255,0,0,0,0,0,0 Latin 21 0 9 t  # t [74 ]a
    N 5 0,255,0,255,0,0,0,0,0,0 Latin 8 0 10 N  # N [4e ]A
    w 3 0,255,0,255,0,0,0,0,0,0 Latin 74 0 11 w # w [77 ]a
    A 5 0,255,0,255,0,0,0,0,0,0 Latin 17 0 12 A # A [41 ]A
    c 3 0,255,0,255,0,0,0,0,0,0 Latin 27 0 13 c # c [63 ]a
    l 3 0,255,0,255,0,0,0,0,0,0 Latin 68 0 14 l # l [6c ]a
    s 3 0,255,0,255,0,0,0,0,0,0 Latin 23 0 15 s # s [73 ]a
    p 3 0,255,0,255,0,0,0,0,0,0 Latin 40 0 16 p # p [70 ]a
    a 3 0,255,0,255,0,0,0,0,0,0 Latin 12 0 17 a # a [61 ]a
    g 3 0,255,0,255,0,0,0,0,0,0 Latin 91 0 18 g # g [67 ]a
    2 8 0,255,0,255,0,0,0,0,0,0 Common 19 2 19 2    # 2 [32 ]0
    3 8 0,255,0,255,0,0,0,0,0,0 Common 20 2 20 3    # 3 [33 ]0
    T 5 0,255,0,255,0,0,0,0,0,0 Latin 9 0 21 T  # T [54 ]A
    o 3 0,255,0,255,0,0,0,0,0,0 Latin 42 0 22 o # o [6f ]a
    S 5 0,255,0,255,0,0,0,0,0,0 Latin 15 0 23 S # S [53 ]A
    v 3 0,255,0,255,0,0,0,0,0,0 Latin 96 0 24 v # v [76 ]a
    ~ 0 0,255,0,255,0,0,0,0,0,0 Common 25 10 25 ~   # ~ [7e ]
    D 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 26 D  # D [44 ]A
    C 5 0,255,0,255,0,0,0,0,0,0 Latin 13 0 27 C # C [43 ]A
    h 3 0,255,0,255,0,0,0,0,0,0 Latin 41 0 28 h # h [68 ]a
    ' 10 0,255,0,255,0,0,0,0,0,0 Common 29 10 29 '  # ' [27 ]p
    7 8 0,255,0,255,0,0,0,0,0,0 Common 30 2 30 7    # 7 [37 ]0
    « 10 0,255,0,255,0,0,0,0,0,0 Common 31 10 72 «    # « [ab ]p
    : 10 0,255,0,255,0,0,0,0,0,0 Common 32 6 32 :   # : [3a ]p
    # 10 0,255,0,255,0,0,0,0,0,0 Common 33 4 33 #   # # [23 ]p
    1 8 0,255,0,255,0,0,0,0,0,0 Common 34 2 34 1    # 1 [31 ]0
    Z 5 0,255,0,255,0,0,0,0,0,0 Latin 98 0 35 Z # Z [5a ]A
    _ 10 0,255,0,255,0,0,0,0,0,0 Common 36 10 36 _  # _ [5f ]p
    M 5 0,255,0,255,0,0,0,0,0,0 Latin 39 0 37 M # M [4d ]A
    u 3 0,255,0,255,0,0,0,0,0,0 Latin 48 0 38 u # u [75 ]a
    m 3 0,255,0,255,0,0,0,0,0,0 Latin 37 0 39 m # m [6d ]a
    P 5 0,255,0,255,0,0,0,0,0,0 Latin 16 0 40 P # P [50 ]A
    H 5 0,255,0,255,0,0,0,0,0,0 Latin 28 0 41 H # H [48 ]A
    O 5 0,255,0,255,0,0,0,0,0,0 Latin 22 0 42 O # O [4f ]A
    ( 10 0,255,0,255,0,0,0,0,0,0 Common 43 10 44 (  # ( [28 ]p
    ) 10 0,255,0,255,0,0,0,0,0,0 Common 44 10 43 )  # ) [29 ]p
    q 3 0,255,0,255,0,0,0,0,0,0 Latin 89 0 45 q # q [71 ]a
    y 3 0,255,0,255,0,0,0,0,0,0 Latin 83 0 46 y # y [79 ]a
    | 0 0,255,0,255,0,0,0,0,0,0 Common 47 10 47 |   # | [7c ]
    U 5 0,255,0,255,0,0,0,0,0,0 Latin 38 0 48 U # U [55 ]A
    0 8 0,255,0,255,0,0,0,0,0,0 Common 49 2 49 0    # 0 [30 ]0
    % 10 0,255,0,255,0,0,0,0,0,0 Common 50 4 50 %   # % [25 ]p
    x 3 0,255,0,255,0,0,0,0,0,0 Latin 106 0 51 x    # x [78 ]a
    F 5 0,255,0,255,0,0,0,0,0,0 Latin 5 0 52 F  # F [46 ]A
    R 5 0,255,0,255,0,0,0,0,0,0 Latin 7 0 53 R  # R [52 ]A
    I 5 0,255,0,255,0,0,0,0,0,0 Latin 4 0 54 I  # I [49 ]A
    , 10 0,255,0,255,0,0,0,0,0,0 Common 55 6 55 ,   # , [2c ]p
    ! 10 0,255,0,255,0,0,0,0,0,0 Common 56 10 56 !  # ! [21 ]p
    E 5 0,255,0,255,0,0,0,0,0,0 Latin 6 0 57 E  # E [45 ]A
    b 3 0,255,0,255,0,0,0,0,0,0 Latin 64 0 58 b # b [62 ]a
    \ 10 0,255,0,255,0,0,0,0,0,0 Common 59 10 59 \  # \ [5c ]p
    8 8 0,255,0,255,0,0,0,0,0,0 Common 60 2 60 8    # 8 [38 ]0
    ? 10 0,255,0,255,0,0,0,0,0,0 Common 61 10 61 ?  # ? [3f ]p
    & 10 0,255,0,255,0,0,0,0,0,0 Common 62 10 62 &  # & [26 ]p
    ; 10 0,255,0,255,0,0,0,0,0,0 Common 63 10 63 ;  # ; [3b ]p
    B 5 0,255,0,255,0,0,0,0,0,0 Latin 58 0 64 B # B [42 ]A
    k 3 0,255,0,255,0,0,0,0,0,0 Latin 94 0 65 k # k [6b ]a
    - 10 0,255,0,255,0,0,0,0,0,0 Common 66 3 66 -   # - [2d ]p
    > 0 0,255,0,255,0,0,0,0,0,0 Common 67 10 102 >  # > [3e ]
    L 5 0,255,0,255,0,0,0,0,0,0 Latin 14 0 68 L # L [4c ]A
    . 10 0,255,0,255,0,0,0,0,0,0 Common 69 6 69 .   # . [2e ]p
    — 10 0,255,0,255,0,0,0,0,0,0 Common 70 10 70 -    # — [2014 ]p
    4 8 0,255,0,255,0,0,0,0,0,0 Common 71 2 71 4    # 4 [34 ]0
    » 10 0,255,0,255,0,0,0,0,0,0 Common 72 10 31 »    # » [bb ]p
    € 0 0,255,0,255,0,0,0,0,0,0 Common 73 4 73 €    # € [20ac ]
    W 5 0,255,0,255,0,0,0,0,0,0 Latin 11 0 74 W # W [57 ]A
    J 5 0,255,0,255,0,0,0,0,0,0 Latin 105 0 75 J    # J [4a ]A
    é 3 0,255,0,255,0,0,0,0,0,0 Latin 76 0 76 é   # é [e9 ]a
    9 8 0,255,0,255,0,0,0,0,0,0 Common 77 2 77 9    # 9 [39 ]0
    ® 0 0,255,0,255,0,0,0,0,0,0 Common 78 10 78 ® # ® [ae ]
    $ 0 0,255,0,255,0,0,0,0,0,0 Common 79 4 79 $    # $ [24 ]
    5 8 0,255,0,255,0,0,0,0,0,0 Common 80 2 80 5    # 5 [35 ]0
    } 10 0,255,0,255,0,0,0,0,0,0 Common 81 10 86 }  # } [7d ]p
    [ 10 0,255,0,255,0,0,0,0,0,0 Common 82 10 107 [ # [ [5b ]p
    Y 5 0,255,0,255,0,0,0,0,0,0 Latin 46 0 83 Y # Y [59 ]A
    § 10 0,255,0,255,0,0,0,0,0,0 Common 84 10 84 §    # § [a7 ]p
    " 10 0,255,0,255,0,0,0,0,0,0 Common 85 10 85 "  # " [22 ]p
    { 10 0,255,0,255,0,0,0,0,0,0 Common 86 10 81 {  # { [7b ]p
    ¢ 0 0,255,0,255,0,0,0,0,0,0 Common 87 4 87 ¢  # ¢ [a2 ]
    / 10 0,255,0,255,0,0,0,0,0,0 Common 88 6 88 /   # / [2f ]p
    Q 5 0,255,0,255,0,0,0,0,0,0 Latin 45 0 89 Q # Q [51 ]A
    6 8 0,255,0,255,0,0,0,0,0,0 Common 90 2 90 6    # 6 [36 ]0
    G 5 0,255,0,255,0,0,0,0,0,0 Latin 18 0 91 G # G [47 ]A
    ” 10 0,255,0,255,0,0,0,0,0,0 Common 92 10 92 "    # ” [201d ]p
    ° 0 0,255,0,255,0,0,0,0,0,0 Common 93 4 93 °  # ° [b0 ]
    K 5 0,255,0,255,0,0,0,0,0,0 Latin 65 0 94 K # K [4b ]A
    ¥ 0 0,255,0,255,0,0,0,0,0,0 Common 95 4 95 ¥  # ¥ [a5 ]
    V 5 0,255,0,255,0,0,0,0,0,0 Latin 24 0 96 V # V [56 ]A
    © 0 0,255,0,255,0,0,0,0,0,0 Common 97 10 97 © # © [a9 ]
    z 3 0,255,0,255,0,0,0,0,0,0 Latin 35 0 98 z # z [7a ]a
    + 0 0,255,0,255,0,0,0,0,0,0 Common 99 3 99 +    # + [2b ]
    = 0 0,255,0,255,0,0,0,0,0,0 Common 100 10 100 = # = [3d ]
    £ 0 0,255,0,255,0,0,0,0,0,0 Common 101 4 101 £    # £ [a3 ]
    < 0 0,255,0,255,0,0,0,0,0,0 Common 102 10 67 <  # < [3c ]
    ’ 10 0,255,0,255,0,0,0,0,0,0 Common 103 10 103 '  # ’ [2019 ]p
    ‘ 10 0,255,0,255,0,0,0,0,0,0 Common 104 10 104 '  # ‘ [2018 ]p
    j 3 0,255,0,255,0,0,0,0,0,0 Latin 75 0 105 j    # j [6a ]a
    X 5 0,255,0,255,0,0,0,0,0,0 Latin 51 0 106 X    # X [58 ]A
    ] 10 0,255,0,255,0,0,0,0,0,0 Common 107 10 82 ] # ] [5d ]p
    * 10 0,255,0,255,0,0,0,0,0,0 Common 108 10 108 *    # * [2a ]p
    “ 10 0,255,0,255,0,0,0,0,0,0 Common 109 10 109 "  # “ [201c ]p
    @ 10 0,255,0,255,0,0,0,0,0,0 Common 110 10 110 @    # @ [40 ]p
    • 10 0,255,0,255,0,0,0,0,0,0 Common 111 10 111 •    # • [2022 ]p
    – 10 0,255,0,255,0,0,0,0,0,0 Common 112 10 112 -  # – [2013 ]p
    … 10 0,255,0,255,0,0,0,0,0,0 Common 113 10 113 ...    # … [2026 ]p
    ^ 0 0,255,0,255,0,0,0,0,0,0 Common 114 10 114 ^ # ^ [5e ]

We can see that some parts are missing.

For comparison, here is the example from the documentation, which shows the "missing" parts:
https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc

    EXAMPLE (v3.02)

    110
    NULL 0 NULL 0
    N 5 59,68,216,255,87,236,0,27,104,227 Latin 11 0 1 N
    Y 5 59,68,216,255,91,205,0,47,91,223 Latin 33 0 2 Y
    1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
    9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9
    a 3 58,65,186,198,85,164,0,26,97,185 Latin 56 0 5 a
    . . .
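To make the comparison between the two listings concrete, here is a rough Python sketch that flags entries whose glyph metrics are still the placeholder `0,255,0,255,0,0,0,0,0,0`. The `Entry` field labels and the helpers `parse_entry` and `has_placeholder_metrics` are hypothetical names, based only on the column order visible in the example above. This is not necessarily the check that set_unicharset_properties performs when it prints "properties incomplete".

```python
from typing import NamedTuple, Optional

# Placeholder metrics seen in every entry of the extracted file above.
PLACEHOLDER_METRICS = "0,255,0,255,0,0,0,0,0,0"


class Entry(NamedTuple):
    char: str
    properties: str  # hex bit mask, kept as text here
    metrics: str     # ten comma-separated numbers
    script: str
    other_case: int
    direction: int
    mirror: int


def parse_entry(line: str) -> Optional[Entry]:
    """Parse the first seven fields of a unicharset entry.

    Returns None for short lines such as the size header or
    'NULL 0 NULL 0'. Anything after the seventh field (the normalized
    form and the trailing '# ...' comment) is ignored.
    """
    tokens = line.split()
    if len(tokens) < 7 or tokens[2].count(",") != 9:
        return None
    char, props, metrics, script, other, direction, mirror = tokens[:7]
    return Entry(char, props, metrics, script,
                 int(other), int(direction), int(mirror))


def has_placeholder_metrics(entry: Entry) -> bool:
    """True when the metrics field is still the untouched placeholder."""
    return entry.metrics == PLACEHOLDER_METRICS
```

Running `parse_entry` over each line of a unicharset file and counting the entries for which `has_placeholder_metrics` is true gives a quick measure of how many glyphs never received real metrics.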
@amitdo
Collaborator

amitdo commented May 9, 2016

script_dir in set_unicharset_properties should point to a directory that contains a *.unicharset file. For English and other Latin based scripts, the file is Latin.unicharset.
You can find the *.unicharset files here: https://github.com/tesseract-ocr/langdata

@ne0zer0
Author

ne0zer0 commented May 9, 2016

> script_dir in set_unicharset_properties should point to a directory that contains a *.unicharset file.

That is the case: the file I get from unicharset_extractor is used as the input file for set_unicharset_properties.

> For English and other Latin based scripts, the file is Latin.unicharset.
> You can find the *.unicharset files here: https://github.com/tesseract-ocr/langdata

I have some questions:

  1. If I follow the documentation, I should use the unicharset file generated by unicharset_extractor, because it is adapted to the chosen font. Is that right?
  2. What happens if I use a latin.unicharset that does not match the xheight of the chosen font?
  3. Why does set_unicharset_properties still complain when I use latin.unicharset? Is this normal? See the output below:

    Loaded unicharset of size 3504 from file latin.unicharset
    Setting unichar properties
    Other case Ȿ of ȿ is not in unicharset
    Other case Ɀ of ɀ is not in unicharset
    Other case Ɐ of ɐ is not in unicharset
    Other case Ɒ of ɒ is not in unicharset
    Other case Ɜ of ɜ is not in unicharset
    Other case Ɡ of ɡ is not in unicharset
    Other case Ɥ of ɥ is not in unicharset
    Other case Ɦ of ɦ is not in unicharset
    Other case Ɬ of ɬ is not in unicharset
    Other case Ɱ of ɱ is not in unicharset
    Other case Ʇ of ʇ is not in unicharset
    Other case Ʞ of ʞ is not in unicharset
    Other case Μ of µ is not in unicharset
    Other case ϳ of Ϳ is not in unicharset
    Mirror ⧵ of ∕ is not in unicharset
    Mirror ⦸ of ⊘ is not in unicharset
    Mirror ⫞ of ⊦ is not in unicharset
    Mirror ⫤ of ⊨ is not in unicharset
    Mirror ⫣ of ⊩ is not in unicharset
    Mirror ⫥ of ⊫ is not in unicharset
    Warning: properties incomplete for index 1073 = ~
    Warning: properties incomplete for index 1081 = ¨
    Warning: properties incomplete for index 1087 = ¯
    Warning: properties incomplete for index 1090 = ²
    Warning: properties incomplete for index 1091 = ³
    Warning: properties incomplete for index 1092 = ´
    Warning: properties incomplete for index 1096 = ¸
    Warning: properties incomplete for index 1097 = ¹
    Warning: properties incomplete for index 1117 = ˆ
    Warning: properties incomplete for index 1118 = ˇ
    Warning: properties incomplete for index 1135 = ˘
    Warning: properties incomplete for index 1136 = ˙
    Warning: properties incomplete for index 1137 = ˚
    Warning: properties incomplete for index 1138 = ˛
    Warning: properties incomplete for index 1139 = ˜
    Warning: properties incomplete for index 1168 = ̀
    Warning: properties incomplete for index 1169 = ́
    Warning: properties incomplete for index 1170 = ̂
    Warning: properties incomplete for index 1171 = ̃
    Warning: properties incomplete for index 1172 = ̄
    Warning: properties incomplete for index 1173 = ̅
    Warning: properties incomplete for index 1174 = ̆
    Warning: properties incomplete for index 1175 = ̇
    Warning: properties incomplete for index 1176 = ̈
    Warning: properties incomplete for index 1177 = ̉
    Warning: properties incomplete for index 1178 = ̊
    Warning: properties incomplete for index 1179 = ̋
    Warning: properties incomplete for index 1180 = ̌
    Warning: properties incomplete for index 1181 = ̍
    Warning: properties incomplete for index 1182 = ̎
    Warning: properties incomplete for index 1183 = ̏
    Warning: properties incomplete for index 1184 = ̐
    Warning: properties incomplete for index 1185 = ̑
    Warning: properties incomplete for index 1186 = ̒
    Warning: properties incomplete for index 1187 = ̓
    Warning: properties incomplete for index 1188 = ̔
    Warning: properties incomplete for index 1189 = ̕
    Warning: properties incomplete for index 1194 = ̚
    Warning: properties incomplete for index 1195 = ̛
    Warning: properties incomplete for index 1201 = ̡
    Warning: properties incomplete for index 1202 = ̢
    Warning: properties incomplete for index 1203 = ̣
    Warning: properties incomplete for index 1204 = ̤
    Warning: properties incomplete for index 1205 = ̥
    Warning: properties incomplete for index 1211 = ̫
    Warning: properties incomplete for index 1212 = ̬
    Warning: properties incomplete for index 1213 = ̭
    Warning: properties incomplete for index 1214 = ̮
    Warning: properties incomplete for index 1216 = ̰
    Warning: properties incomplete for index 1217 = ̱
    Warning: properties incomplete for index 1218 = ̲
    Warning: properties incomplete for index 1219 = ̳
    Warning: properties incomplete for index 1220 = ̴
    Warning: properties incomplete for index 1221 = ̵
    Warning: properties incomplete for index 1222 = ̶
    Warning: properties incomplete for index 1223 = ̷
    Warning: properties incomplete for index 1224 = ̸
    Warning: properties incomplete for index 1228 = ̼
    Warning: properties incomplete for index 1229 = ̽
    Warning: properties incomplete for index 1230 = ̾
    Warning: properties incomplete for index 1231 = ̿
    Warning: properties incomplete for index 1233 = ́
    Warning: properties incomplete for index 1234 = ͂
    Warning: properties incomplete for index 1236 = ̈́
    Warning: properties incomplete for index 1240 = ͋
    Warning: properties incomplete for index 1248 = ͘
    Warning: properties incomplete for index 1252 = ͜
    Warning: properties incomplete for index 1253 = ͝
    Warning: properties incomplete for index 1254 = ͞
    Warning: properties incomplete for index 1255 = ͟
    Warning: properties incomplete for index 1256 = ͠
    Warning: properties incomplete for index 1257 = ͡
    Warning: properties incomplete for index 1258 = ͢
    Warning: properties incomplete for index 1294 = ً
    Warning: properties incomplete for index 1295 = ٌ
    Warning: properties incomplete for index 1296 = ٍ
    Warning: properties incomplete for index 1297 = َ
    Warning: properties incomplete for index 1298 = ُ
    Warning: properties incomplete for index 1299 = ِ
    Warning: properties incomplete for index 1300 = ّ
    Warning: properties incomplete for index 1301 = ْ
    Warning: properties incomplete for index 1302 = ٓ
    Warning: properties incomplete for index 1303 = ٔ
    Warning: properties incomplete for index 1304 = ٕ
    Warning: properties incomplete for index 1315 = ٰ
    Warning: properties incomplete for index 1317 = ॒
    Warning: properties incomplete for index 1386 = ⁄
    Warning: properties incomplete for index 1406 = 
    Warning: properties incomplete for index 1407 = 
    Warning: properties incomplete for index 1408 = 
    Warning: properties incomplete for index 1409 = 
    Warning: properties incomplete for index 1410 = 
    Warning: properties incomplete for index 1411 = 
    Warning: properties incomplete for index 3101 = ゙
    Warning: properties incomplete for index 3102 = ゚
    Warning: properties incomplete for index 3103 = ゛
    Warning: properties incomplete for index 3104 = ゜
    Writing unicharset to file output_unicharset
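A log this long is easier to read as a summary. The following Python sketch counts the three kinds of message that appear above and collects the indices reported as incomplete. The regular expressions are written only from the message shapes visible in this thread, and `summarize_log` is a hypothetical helper name, so other Tesseract versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the message shapes in the log above (an assumption,
# not taken from the Tesseract source).
PATTERNS = {
    "other_case_missing": re.compile(r"^Other case \S+ of \S+ is not in unicharset"),
    "mirror_missing": re.compile(r"^Mirror \S+ of \S+ is not in unicharset"),
    "properties_incomplete": re.compile(
        r"^Warning: properties incomplete for index (\d+) ="),
}


def summarize_log(text):
    """Return (counts per message kind, list of incomplete indices)."""
    counts = Counter()
    incomplete = []
    for raw in text.splitlines():
        line = raw.strip()
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                counts[name] += 1
                if name == "properties_incomplete":
                    incomplete.append(int(match.group(1)))
                break
    return counts, incomplete
```

Comparing the list of incomplete indices between two runs (for example, before and after changing `--script_dir`) shows whether a change actually reduced the number of warnings.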

@ggdhines-zz

To follow up on #316, I added the line

/home/ggdhines/github/tesseract/training/set_unicharset_properties -U unicharset -O new_unicharset --script_dir=/home/ggdhines/github/langdata/Latin.unicharset

then new_unicharset looks like:

1 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 1 # 1 [31 ]0
2 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 2 # 2 [32 ]0
9 8 0,255,0,255,0,0,0,0,0,0 Common 5 2 5 9 # 9 [39 ]0

(Only trying for 3 characters right now). This looks better than before (no null values) but I'm still getting the error:
Bad properties for index 3, char 1: 0,255 0,255 0,0 0,0 0,0
(Repeated for each character.)

@ne0zer0's questions are good ones.

@ggdhines-zz

ggdhines-zz commented May 9, 2016

Also, I just realized that the example unicharset file in the "Compute the Character Set" section of the official documentation:
; 10 Common 46
b 3 Latin 59
W 5 Latin 40
7 8 Common 66
= 0 Common 93

appears to be out of date (I think that's Tesseract version 2)

@amitdo
Collaborator

amitdo commented May 9, 2016

@ggdhines

Try this:

/home/ggdhines/github/tesseract/training/set_unicharset_properties -U unicharset -O new_unicharset --script_dir=/home/ggdhines/github/langdata

Since your unicharset file has glyphs that belong to the Common script, I think you should also put Common.unicharset in the script dir.
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Common.unicharset
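The two points above (pass the directory rather than a file, and provide one file per script in use) can be checked before running the tool. The Python sketch below assumes the files follow a `<Script>.unicharset` naming pattern, which is inferred only from the names Latin.unicharset and Common.unicharset mentioned in this thread. `scripts_used` and `check_script_dir` are hypothetical helpers, and they only work on a unicharset whose script column has already been filled in.

```python
import os


def scripts_used(unicharset_lines):
    """Collect script names (4th field) from full unicharset entries."""
    scripts = set()
    for line in unicharset_lines:
        tokens = line.split()
        # Skip the size header and short lines such as 'NULL 0 NULL 0'.
        if len(tokens) >= 7 and tokens[2].count(",") == 9:
            if tokens[3] != "NULL":
                scripts.add(tokens[3])
    return sorted(scripts)


def check_script_dir(script_dir, scripts):
    """Return a list of readable problems; an empty list means none found."""
    if not os.path.isdir(script_dir):
        return [f"{script_dir} is not a directory "
                "(pass the folder, not a .unicharset file)"]
    problems = []
    for script in scripts:
        # Assumed naming pattern: <Script>.unicharset inside script_dir.
        path = os.path.join(script_dir, script + ".unicharset")
        if not os.path.isfile(path):
            problems.append(f"missing {path}")
    return problems
```

With the three-character unicharset shown earlier, `scripts_used` would return only `Common`, so a script dir holding nothing but Latin.unicharset would be reported as missing Common.unicharset.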

@ne0zer0
Author

ne0zer0 commented May 9, 2016

> appears to be out of date (I think that's Tesseract version 2)

Yes, you should read the unicharset(5) documentation:
https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc

And for set_unicharset_properties, where can I find a file listing font xheights for all the fonts I want (Adobe Jenson too)?

set_unicharset_properties --help
USAGE: set_unicharset_properties
  --debug_level  Level of Trainer debugging  (type:int default:0)
  --load_images  Load images with tr files  (type:int default:0)
  --clusterconfig_min_samples_fraction  Min number of samples per proto as % of total  (type:double default:0.625)
  --clusterconfig_max_illegal  Max percentage of samples in a cluster which have more than 1 feature in that cluster  (type:double default:0.05)
  --clusterconfig_independence  Desired independence between dimensions  (type:double default:1)
  --clusterconfig_confidence  Desired confidence in prototypes created  (type:double default:1e-06)
  --script_dir  Directory name for input script unicharsets/xheights  (type:string default:)
  --configfile  File to load more configs from  (type:string default:)
  --D  Directory to write output files to  (type:string default:)
  --F  File listing font properties  (type:string default:font_properties)
  --X  File listing font xheights  (type:string default:)
  --U  File to load unicharset from  (type:string default:unicharset)
  --O  File to write unicharset to  (type:string default:)
  --T  File to load trainer from  (type:string default:)
  --output_trainer  File to write trainer to  (type:string default:)
  --test_ch  UTF8 test character string  (type:string default:)

Or is there a way to compute these xheights for any font?

Maybe it is a problem with the wctype functions on some systems? As I read in the documentation:

> If your system supports the wctype functions, these values will be set automatically by unicharset_extractor and there is no need to edit the unicharset file. On some very old systems (eg Windows 95), the unicharset file must be edited by hand to add these property description codes.
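For reference, the property codes mentioned in that passage can be decoded mechanically. The sketch below assumes the bit order given in unicharset(5) (isalpha, islower, isupper, isdigit, ispunctuation, from least to most significant bit) and assumes the code is written in hexadecimal, which matches the values `3`, `5`, `8` and `10` in the output earlier in this thread. `decode_properties` is a hypothetical helper name.

```python
# Bit order as described in unicharset(5), least significant bit first
# (stated here as an assumption; check the man page for your version).
FLAGS = ["isalpha", "islower", "isupper", "isdigit", "ispunctuation"]


def decode_properties(code: str):
    """Decode the hexadecimal property mask of a unicharset entry."""
    value = int(code, 16)
    return [name for bit, name in enumerate(FLAGS) if value & (1 << bit)]
```

Under these assumptions `decode_properties("3")` gives `["isalpha", "islower"]` and `decode_properties("10")` gives `["ispunctuation"]`, while the all-zero codes in the first extracted file decode to an empty list, meaning no property was set.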

@amitdo
Collaborator

amitdo commented May 9, 2016

@ne0zer0

unicharset_extractor produces a unicharset file.

You need to pass this file to set_unicharset_properties.

-U unicharset

Download these files:
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Latin.unicharset
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Common.unicharset

Let's say you put these files in a langdata directory located under /home/ne0zer0.

Now, run this:

set_unicharset_properties --F font_properties -U unicharset -O output_unicharset --script_dir=/home/ne0zer0/langdata

@ggdhines-zz

Thanks @amitdo for the help. I'm a little confused, though, as to why we need to use Latin.unicharset and Common.unicharset. Shouldn't we be teaching Tesseract new fonts based on the actual examples (and box files)? Using some preexisting unicharset file makes it seem as if we're not actually training Tesseract on the new font.

@ne0zer0
Author

ne0zer0 commented May 9, 2016

unicharset_extractor produces a unicharset file.

You need to pass this file to set_unicharset_properties.

-U unicharset

It is exactly what I do.

Download these files:
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Latin.unicharset

Already done.

https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Common.unicharset

Done after reading your post addressed to ggdhines

set_unicharset_properties --F font_properties -U unicharset -O output_unicharset --script_dir=/home/ne0zer0/langdata

That is what I did, and what I wrote in the first post (using the current directory):

When I try to fix it with set_unicharset_properties:
set_unicharset_properties --F font_properties -U unicharset -O output_unicharset --script_dir=/

where Common.unicharset is now located.

But if I use Latin.unicharset, what I get is not adapted to my fonts;
xheights vary from font to font:
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Latin.xheights
So how can I be sure that this default value is correct for all my fonts?

As ggdhines says, according to the documentation, we have to compute the actual size of our fonts:

https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc

CAVEATS

Although the unicharset reader maintains the ability to read unicharsets of older formats and will assign default values to missing fields, the accuracy will be degraded.

What you suggest is likely to produce such a degraded result (which is what I seem to be experiencing).

Also from the above link:

Further, most other data files are indexed by the unicharset file, so changing it without re-generating the others is likely to have dire consequences

So, as it is stated that "assign default values to missing fields, the accuracy will be degraded", and "is likely to have dire consequences", your proposal cannot be accepted, because it leads to what I experience when I do as you say: strange results.

Unless I have misunderstood everything, in which case, since I am not the only one, you should review the documentation.

@ne0zer0
Author

ne0zer0 commented May 9, 2016

@amitdo

Here is some output following your recommendation:

tesseract training:

    tesseract -l eng2 eng.palladio-regular.exp9.tif eng.palladio-regular.exp9 box.train.stderr
    Tesseract Open Source OCR Engine v3.04.01 with Leptonica
    Page 1
    APPLY_BOXES:
       Boxes read from boxfile:    1583
       Found 1583 good blobs.
    Generated training data for 403 words
    Page 2
    APPLY_BOXES:
       Boxes read from boxfile:    1590
       Found 1590 good blobs.
    Generated training data for 380 words
    Page 3
    APPLY_BOXES:
       Boxes read from boxfile:    1577
       Found 1577 good blobs.
    Generated training data for 392 words
    Page 4
    APPLY_BOXES:
       Boxes read from boxfile:    1613
       Found 1613 good blobs.
    Generated training data for 363 words
    Page 5
    APPLY_BOXES:
       Boxes read from boxfile:    1435
       Found 1435 good blobs.
    Generated training data for 312 words
    Page 6
    APPLY_BOXES:
       Boxes read from boxfile:    1670
       Found 1670 good blobs.
    Generated training data for 367 words
    Page 7
    APPLY_BOXES:
       Boxes read from boxfile:    1684
       Found 1684 good blobs.
    Generated training data for 374 words
    Page 8
    APPLY_BOXES:
       Boxes read from boxfile:    1673
       Found 1673 good blobs.
    Generated training data for 365 words
    Page 9
    APPLY_BOXES:
       Boxes read from boxfile:    1639
       Found 1639 good blobs.
    Generated training data for 381 words
    Page 10
    APPLY_BOXES:
       Boxes read from boxfile:    1671
       Found 1671 good blobs.
    Generated training data for 369 words
    Page 11
    APPLY_BOXES:
       Boxes read from boxfile:    1723
       Found 1723 good blobs.
    Generated training data for 357 words
    Page 12
    FAIL!
    APPLY_BOXES: boxfile line 1058/1 ((1279,1540),(1321,1612)): FAILURE! Couldn't find a matching blob
    FAIL!
    APPLY_BOXES: boxfile line 1382/1 ((1287,742),(1330,814)): FAILURE! Couldn't find a matching blob
    APPLY_BOXES:
       Boxes read from boxfile:    1631
       Boxes failed resegmentation:       2
       Found 1629 good blobs.
    Generated training data for 303 words
    Page 13
    FAIL!
    APPLY_BOXES: boxfile line 75/1 ((1299,4479),(1342,4550)): FAILURE! Couldn't find a matching blob
    FAIL!
    APPLY_BOXES: boxfile line 321/1 ((1289,3814),(1332,3885)): FAILURE! Couldn't find a matching blob
    FAIL!
    APPLY_BOXES: boxfile line 475/1 ((1284,3415),(1327,3486)): FAILURE! Couldn't find a matching blob
    FAIL!
    APPLY_BOXES: boxfile line 629/1 ((1278,3016),(1321,3087)): FAILURE! Couldn't find a matching blob
    FAIL!
    APPLY_BOXES: boxfile line 937/1 ((1267,2218),(1310,2289)): FAILURE! Couldn't find a matching blob
    APPLY_BOXES:
       Boxes read from boxfile:    1727
       Boxes failed resegmentation:       5
       Found 1722 good blobs.
    Generated training data for 239 words
    Page 14
    APPLY_BOXES:
       Boxes read from boxfile:    1651
       Found 1651 good blobs.
    Generated training data for 350 words
    Page 15
    APPLY_BOXES:
       Boxes read from boxfile:    1619
       Found 1619 good blobs.
    Generated training data for 179 words
    Page 16
    APPLY_BOXES:
       Boxes read from boxfile:    1634
       Found 1634 good blobs.
    Generated training data for 238 words
    Page 17
    APPLY_BOXES:
       Boxes read from boxfile:    1677
       Found 1677 good blobs.
    Generated training data for 386 words
    Page 18
    APPLY_BOXES:
       Boxes read from boxfile:    1643
       Found 1643 good blobs.
    Generated training data for 401 words
    Page 19
    APPLY_BOXES:
       Boxes read from boxfile:    1659
       Found 1659 good blobs.
    Generated training data for 376 words
    Page 20
    APPLY_BOXES:
       Boxes read from boxfile:    1317
       Found 1317 good blobs.
    Generated training data for 304 words

set_unicharset_properties

    set_unicharset_properties --F font_properties --script_dir=/home/ne0zer0/Tesseract/Test2/langdata -U unicharset -O output_unicharset
    Loaded unicharset of size 115 from file unicharset
    Setting unichar properties
    Other case É of é is not in unicharset
    Warning: properties incomplete for index 25 = ~
    Writing unicharset to file output_unicharset

shapeclustering

    shapeclustering -F font_properties -U output_unicharset eng.palladio-regular.exp9.tr
    Reading eng.palladio-regular.exp9.tr ...
    Bad properties for index 25, char ~: 91,229 135,255 73,174 0,41 0,200
    Building master shape table
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
    Distance = 0.000000: Distance = 0.000000: Distance = 0.006410: Distance = 0.007353: Distance = 0.007463: Distance = 0.012427: Distance = 0.012987: Distance = 0.015045: Distance = 0.020725: Distance = 0.020841: Stopped with 10 merged, min dist 0.026087
    Master shape_table:Number of shapes = 102 max unichars = 3 number with multiple unichars = 7

mftraining

    mftraining -F font_properties -U output_unicharset -O eng.unicharset eng.palladio-regular.exp9.tr
    Read shape table shapetable of 102 shapes
    Reading eng.palladio-regular.exp9.tr ...
    Bad properties for index 25, char ~: 91,229 135,255 73,174 0,41 0,200
    Warning: no protos/configs for sh0099 in CreateIntTemplates()
    Warning: no protos/configs for sh0100 in CreateIntTemplates()
    Warning: no protos/configs for sh0101 in CreateIntTemplates()
    Done!

The result is as strange as before, but now I have this warning in mftraining:

Warning: no protos/configs for sh0101 in CreateIntTemplates()

And why these failures?

APPLY_BOXES: boxfile line 1382/1 ((1287,742),(1330,814)): FAILURE! Couldn't find a matching blob

@amitdo
Collaborator

amitdo commented May 9, 2016

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract#shapeclustering-new-in-302

shapeclustering should not be used except for the Indic languages.

@ne0zer0
Author

ne0zer0 commented May 9, 2016

Yes I know,

At first I tried without shapeclustering, but in the end I thought "what if".

Anyway, here is the output:

    mftraining -F font_properties -U output_unicharset -O eng.unicharset eng.palladio-regular.exp9.tr
    Warning: No shape table file present: shapetable
    Reading eng.palladio-regular.exp9.tr ...
    Flat shape table summary: Number of shapes = 112 max unichars = 1 number with multiple unichars = 0
    Warning: no protos/configs for Joined in CreateIntTemplates()
    Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
    Done!

For the same result…

@amitdo
Collaborator

amitdo commented May 9, 2016

FAIL!
APPLY_BOXES: boxfile line 937/1 ((1267,2218),(1310,2289)): FAILURE! Couldn't find a matching blob

And:

Boxes read from boxfile: 1727
Boxes failed resegmentation: 5
Found 1722 good blobs.

As long as you get only a few of these failures, it's probably OK.

Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()

This is also normal.

Overall, the output of the commands looks OK.
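
To put a number on "only a few of these failures", the APPLY_BOXES summaries can be totalled across pages. The following is only a minimal sketch: the function name `box_failure_rate` is made up for illustration, and it assumes the log lines look exactly like the ones pasted above. What fraction counts as acceptable is a judgment call, not something this thread specifies.

```python
import re

def box_failure_rate(log_text):
    """Sum the per-page APPLY_BOXES counters and return (read, failed, fraction)."""
    read = sum(int(n) for n in re.findall(r"Boxes read from boxfile:\s*(\d+)", log_text))
    failed = sum(int(n) for n in re.findall(r"Boxes failed resegmentation:\s*(\d+)", log_text))
    fraction = failed / read if read else 0.0
    return read, failed, fraction

# Two page summaries copied from the training output above (pages 12 and 13).
sample = """\
APPLY_BOXES:
   Boxes read from boxfile:    1631
   Boxes failed resegmentation:       2
   Found 1629 good blobs.
APPLY_BOXES:
   Boxes read from boxfile:    1727
   Boxes failed resegmentation:       5
   Found 1722 good blobs.
"""

read, failed, fraction = box_failure_rate(sample)
print(f"{failed} of {read} boxes failed resegmentation ({fraction:.2%})")
```

Pages without failures print no "Boxes failed resegmentation" line, so they add to the total read but not to the failures.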

@ne0zer0
Author

ne0zer0 commented May 9, 2016

Thank you for your reply,

But what about these questions:

But if I use Latin.unicharset, what I get is not adapted to my fonts;
xheights vary from font to font:
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Latin.xheights
So how can I be sure that this default value is correct for all my fonts?

As ggdhines says, according to the documentation, we have to compute the actual size of our fonts:

https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc

CAVEATS

Although the unicharset reader maintains the ability to read unicharsets of older formats and will assign default values to missing fields, the accuracy will be degraded.

What you suggest is likely to produce such a degraded result (which is what I seem to be experiencing).

Also from the above link:

Further, most other data files are indexed by the unicharset file, so changing it without re-generating the others is likely to have dire consequences

So, as it is stated that "assign default values to missing fields, the accuracy will be degraded", and "is likely to have dire consequences", your proposal cannot be accepted, because it leads to what I experience when I do as you say: strange results.

Unless I have misunderstood everything, in which case, since I am not the only one, you should review the documentation.

@amitdo
Collaborator

amitdo commented May 10, 2016

https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc

CAVEATS

Although the unicharset reader maintains the ability to read unicharsets of older formats and will assign default values to missing fields, the accuracy will be degraded.

ne0zer0:

What you suggest is likely to produce such a degraded result (which is what I seem to be experiencing).

It's the opposite of what you said. You are interpreting the above paragraph wrongly.

I'll give you more answers later.

@amitdo
Collaborator

amitdo commented May 10, 2016

The meaning of this 'CAVEAT':

Starting from Tesseract version 3.02 the unicharset file should look like this:

110
NULL 0 NULL 0
N 5 59,68,216,255,87,236,0,27,104,227 Latin 11 0 1 N
Y 5 59,68,216,255,91,205,0,47,91,223 Latin 33 0 2 Y
1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9
a 3 58,65,186,198,85,164,0,26,97,185 Latin 56 0 5 a
...

If you use the old format:

; 10 Common 46
b 3 Latin 59
W 5 Latin 40
7 8 Common 66
= 0 Common 93

the accuracy will be degraded, because the training tool will assign default (suboptimal) values to missing fields.

@amitdo
Collaborator

amitdo commented May 10, 2016

0,255,0,255,0,0,0,0,0,0 are the default (suboptimal) values for glyph_metrics.
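
A quick way to see which entries still carry those defaults is to scan the third field of each line. This is a minimal sketch, not part of the Tesseract tools: the helper name `entries_with_default_metrics` is hypothetical, and it assumes fields are separated by whitespace, as in the unicharset samples in this thread.

```python
DEFAULT_METRICS = "0,255,0,255,0,0,0,0,0,0"

def entries_with_default_metrics(unicharset_text):
    """Return the characters whose glyph_metrics field still holds the default."""
    flagged = []
    # The first line is the entry count, so it is skipped.
    for line in unicharset_text.splitlines()[1:]:
        fields = line.split()
        # Field order: character, properties, glyph_metrics, script, ...
        if len(fields) >= 3 and fields[2] == DEFAULT_METRICS:
            flagged.append(fields[0])
    return flagged

# Lines taken from the samples in this thread: one untouched, two filled in.
sample = """\
4
NULL 0 NULL 0
d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# d [64 ]
N 5 59,68,216,255,87,236,0,27,104,227 Latin 11 0 1 N
a 3 58,65,186,198,85,164,0,26,97,185 Latin 56 0 5 a
"""

still_default = entries_with_default_metrics(sample)
print(still_default)
```

An old-format line such as `b 3 Latin 59` has the script name in the third field, so it is not flagged by this check; it would need a separate test on the number of fields.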

@ne0zer0
Author

ne0zer0 commented May 10, 2016

CAVEATS
Although the unicharset reader maintains the ability to read unicharsets of older formats and will assign default values to missing fields, the accuracy will be degraded.

What I understood is: an incomplete v3.02-format unicharset file (like the one I get) ends up, in effect, equivalent to the old format (after all, some fields are zeroed). These zeros lead to default values.

As information is missing, because zeros are put in place of the actual values, the accuracy will be degraded.

@ggdhines-zz

@amitdo - why is this necessary at all? Shouldn't Tesseract be learning based on the training examples we provide? Pre-existing data isn't going to be helpful with new fonts.

@ne0zer0
Author

ne0zer0 commented May 10, 2016

@ggdhines 👍

@amitdo
Collaborator

amitdo commented May 10, 2016

@ne0zer0
Again, you are completely wrong.

You should both be patient and wait for my further answers which may clear things up for you.

@ne0zer0
Author

ne0zer0 commented May 10, 2016

Ok, I will wait.

Thanks for your patience.

@amitdo
Collaborator

amitdo commented May 10, 2016

When you run unicharset_extractor you get a unicharset file in which each line has these fields:
character properties glyph_metrics script other_case direction mirror normed_form.

At this stage, all the fields except the character field are set to their default values. We want to set these fields to their correct values!

So we run set_unicharset_properties (we will call it 'the tool'):

set_unicharset_properties -U unicharset -O new_unicharset -X xheights --script_dir=/home/myusername/tesseract-ocr/langdata

The tool will take our unicharset file and add new values in the various fields. We will get a new, fixed file: new_unicharset.

More details:
First, the tool will call a few functions to fill in the correct values for these fields:
properties script other_case direction mirror normed_form.

What is left to fill are the 10 values of the glyph_metrics field.

Tesseract does not provide a training tool that generates the correct glyph_metrics values for the specific fonts being trained.

Instead, there is a pre-made unicharset file for each script (the unicharset files in the langdata repo) which contains "universal" glyph_metrics that have been set from a large number of fonts.

The tool will look for a few files in the directory you told it to search, /home/myusername/tesseract-ocr/langdata in this example.

The tool will scan the lines in the unicharset file, and for each character field in a line it will search the appropriate Scriptname.unicharset. For example, for the character 'C' it will read Latin.unicharset, and for the character '8' it will read Common.unicharset. The tool will search for a line in Scriptname.unicharset that has the same character field as the 'current' line in the unicharset file. When it finds such a match, it will take the glyph_metrics values from that line and implant them as the new glyph_metrics values in the matching line of the unicharset file.
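
The lookup described above can be sketched roughly as follows. This is an illustration of the idea only, not the code of set_unicharset_properties: the function names are hypothetical, and it assumes the script field (fourth column) has already been filled in and that fields are whitespace-separated.

```python
def load_metrics(script_unicharset_text):
    """Map character -> glyph_metrics string from a Scriptname.unicharset."""
    table = {}
    for line in script_unicharset_text.splitlines()[1:]:  # skip the count line
        fields = line.split()
        if len(fields) >= 3 and "," in fields[2]:
            table[fields[0]] = fields[2]
    return table

def copy_metrics(unicharset_text, metrics_by_script):
    """Replace glyph_metrics with the per-script 'universal' values when found."""
    lines = unicharset_text.splitlines()
    out = [lines[0]]
    for line in lines[1:]:
        fields = line.split()
        if len(fields) >= 4 and "," in fields[2]:
            found = metrics_by_script.get(fields[3], {}).get(fields[0])
            if found is not None:
                fields[2] = found
        out.append(" ".join(fields))
    return "\n".join(out)

# Script files reduced to one entry each, using lines quoted earlier in the thread.
latin = "2\nNULL 0 NULL 0\nN 5 59,68,216,255,87,236,0,27,104,227 Latin 11 0 1 N"
common = "2\nNULL 0 NULL 0\n1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1"
tables = {"Latin": load_metrics(latin), "Common": load_metrics(common)}

# A hypothetical unicharset whose metrics are still the defaults.
mine = (
    "3\n"
    "NULL 0 NULL 0\n"
    "N 5 0,255,0,255,0,0,0,0,0,0 Latin 2 0 2 N\n"
    "1 8 0,255,0,255,0,0,0,0,0,0 Common 3 0 3 1"
)
merged = copy_metrics(mine, tables)
print(merged)
```

Characters with no match in the script file keep whatever metrics they already had, which mirrors the "Warning: properties incomplete" case seen earlier for `~`.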

A second file that the tool will search is a Scriptname.xheights file.
Here is a link to the Latin.xheights file.
https://raw.githubusercontent.com/tesseract-ocr/langdata/master/Latin.xheights

According to Ray Smith, the lead developer of Tesseract:

If the font you are using is not listed in this file, it will use the mean of the ones that are. IIRC these numbers are used to set up expectations for inter-character spacing. They are for a fixed, quite large size (32 pt??).

('using' = 'training')
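
The fallback Ray describes (use the listed value if the font is present, otherwise the mean of the listed ones) can be sketched like this. It is a minimal illustration under assumptions: the file is assumed to hold one "fontname xheight" pair per line with an integer x-height, the helper names are hypothetical, and the font names and numbers in the sample are invented, not taken from Latin.xheights.

```python
def load_xheights(text):
    """Parse 'fontname xheight' lines into a dict (assumed file layout)."""
    table = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split on the last run of whitespace
        if len(parts) == 2 and parts[1].isdigit():
            table[parts[0]] = int(parts[1])
    return table

def xheight_for(font, table):
    """Return the listed x-height, or the mean of all listed ones if absent."""
    if not table:
        raise ValueError("no x-heights available")
    if font in table:
        return table[font]
    return sum(table.values()) / len(table)

# Invented entries for illustration only.
sample = "FontA 20\nFontB 24\n"
table = load_xheights(sample)

listed = xheight_for("FontB", table)        # font present in the file
fallback = xheight_for("Palladio", table)   # font absent: mean of 20 and 24
print(listed, fallback)
```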

The tool will process the info in Latin.xheights together with the new values of the glyph_metrics and will output the result to a file, xheights in our example.

@amitdo
Collaborator

amitdo commented May 10, 2016

That's it. I did my best to explain the things you asked about. You can now respond... :)

@ne0zer0
Author

ne0zer0 commented May 11, 2016

Hi,

Thank you for this clarification.

I am somewhat disappointed, and I have more questions than before:

  1. Why use default values when we want to train for specific fonts and could get specific values?
  2. Aren't specific values better than default values? Why not use default values only when we need them, for example when there is no matching font?
  3. Using default values will certainly be less accurate, so we cannot get the best result for specific fonts? Unless we waste a lot of time training Tesseract, for an "uncertain" result?
  4. Why does set_unicharset_properties not compute such values? It would not be too difficult to develop such functionality, and I think it would be better in many ways (results and time).
  5. I cannot understand how these default values can be better. For example, these default values are unlikely to match the glyph metrics of Adobe Jenson Pro and other fonts. So I have to train again and again to get a better result, for lack of specific values, owing to "missing functionality"?
  6. You write:

set_unicharset_properties -U unicharset -O new_unicharset -X xheights --script_dir=/home/myusername/tesseract-ocr/langdata

but the xheights file generated is always blank, and the file Latin.xheights seems to do nothing (I already tried this), and I always get the same output_unicharset file, with or without Latin.xheights (located in langdata folder, or in the current directory).

What can be done with a filled (with default values) xheights file?

Thank you

@amitdo
Collaborator

amitdo commented May 11, 2016

Using default values will certainly be less accurate, so we cannot get the best result for specific fonts? Unless we waste a lot of time training Tesseract, for an "uncertain" result?

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

There are a variety of reasons you might not get good quality output from Tesseract. It's important to note that unless you're using a very unusual font or a new language retraining Tesseract is unlikely to help.

Why does set_unicharset_properties not compute such values? It would not be too difficult to develop such functionality.

If you develop such a tool (or hire someone to do so) we can add a link in the wiki to your site...

My answer for your other questions:
This is the current situation and you should accept it...

Your last question - you probably did something wrong if you get an empty file. I will try to test it later.

Two last notes:
I and the other people who respond here and on the mailing list most of the time are volunteers.
In free (unpaid) open source projects, complaining does not help; a kind request might help, but that is not guaranteed.

@amitdo
Collaborator

amitdo commented May 11, 2016

the file Latin.xheights seems to do nothing... and I always get the same output_unicharset file, with or without Latin.xheights.

The Latin.xheights is not supposed to change anything in the output_unicharset.

@amitdo
Collaborator

amitdo commented May 11, 2016

You seem to think that / is the current directory, but it's not.
/ is your 'root' directory. ./ (or just .) is the current directory.
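
A small pre-flight check can catch this kind of path mix-up before running set_unicharset_properties. The sketch below is not part of Tesseract: `check_script_dir` is a hypothetical helper that resolves the directory to an absolute path and reports which of the expected script files (Latin.unicharset and Common.unicharset, as named in this thread) are missing from it.

```python
import tempfile
from pathlib import Path

def check_script_dir(script_dir, scripts=("Latin", "Common")):
    """Return (absolute_path, missing_files) for a candidate --script_dir."""
    base = Path(script_dir).resolve()  # "." becomes the current directory, "/" stays the root
    missing = [
        f"{name}.unicharset"
        for name in scripts
        if not (base / f"{name}.unicharset").is_file()
    ]
    return base, missing

# Demonstration in a throwaway directory that only contains Latin.unicharset.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Latin.unicharset").write_text("1\nNULL 0 NULL 0\n")
    base, missing = check_script_dir(tmp)
    print(f"script_dir resolves to {base}; missing: {missing}")
```

Calling `check_script_dir("/")` and `check_script_dir(".")` side by side shows at a glance that the two resolve to different directories unless the shell happens to be at the root.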

@amitdo
Collaborator

amitdo commented May 11, 2016

What can be done with a filled xheights file?

https://github.com/tesseract-ocr/tesseract/blob/master/doc/mftraining.1.asc

@ne0zer0
Author

ne0zer0 commented May 11, 2016

If you develop such a tool (or hire someone to do so) we can add a link in the wiki to your site...

  1. Not enough time
  2. Not enough money
  3. The developers of Tesseract will do it faster, cleaner, and cheaper than me.

My answer for your other questions:
This is the current situation and you should accept it...

Indeed. In fact, I expected too much from Tesseract.

Your last question - you probably did something wrong if you get an empty file. I will try to test it later.

I will wait for your test.

Two last notes:
I and the other people who respond here and on the mailing list most of the time are volunteers.
In free (unpaid) open source projects, complaining does not help; a kind request might help, but that is not guaranteed.

Sorry, but I did not want to hurt anybody.

You seem to think that / is the current directory, but it's not.
/ is your 'root' directory. ./ (or just .) is the current directory.

No, it was just a mistake when I wrote it.

What can be done with a filled xheights file?

https://github.com/tesseract-ocr/tesseract/blob/master/doc/mftraining.1.asc

Thanks for the link.

Anyway, thank you for everything.

@ne0zer0
Author

ne0zer0 commented May 12, 2016

I finally resolved my problem with xheights. I had made a spelling mistake :s

Anyway, thanks for everything.

@ne0zer0 ne0zer0 closed this as completed May 12, 2016
@amitdo
Collaborator

amitdo commented May 12, 2016

Nick White @nickjwhite tried to build the tool you want.
See this old thread:
https://groups.google.com/forum/?hl=en#!searchin/tesseract-ocr/pango|sort:date/tesseract-ocr/QH09G5p1jGI/QcHIJvfzFaYJ

@nickjwhite

More recently I made the addmetrics and xheights tools, which are in the tools directory of the git repo https://ancientgreekocr.org/grctraining.git

@amitdo
Collaborator

amitdo commented May 12, 2016

Hi Nick!
Did you test the impact of using the output of these tools compared to the "universal" unicharset?

@amitdo
Collaborator

amitdo commented Jun 5, 2016

Pinging @nickjwhite ...
