Merging with kraken? #125

amitdo · 2016-10-20T16:51:31Z

Is there any plan to merge ocropy with its fork kraken?

Having two "ocropy"s is kinda confusing :)

kba · 2016-10-20T19:31:16Z

This is really up to @tmbdev and @mittagessen to answer from a main developer's perspective, but from my perspective, merging Ocropy and Kraken is less important than interoperability.

I think it's amazing what @mittagessen has created in terms of clean interfaces, documentation, forward compatibility and smart handling of models. I would really love to see Kraken's compact model serialization and repository features integrated into Ocropus.

On the other hand, Ocropus has seen so many iterations and produced so many tools (if you consider the code in OLD, clstm, hocr-tools...) with so many configuration options that I find new features all the time. But it has not been steadily maintained and keeping everything running and fixing bugs is a lot of effort.

So, for me, a more realistic goal would be to document the training mechanisms, model serializations and exchange formats and make them interoperable.

amitdo · 2016-10-20T20:38:55Z

> I think it's amazing what @mittagessen has created in terms of clean interfaces, documentation, forward compatibility and smart handling of models.

Totally agree. 👍 to @mittagessen

One important feature that he added is built-in RTL support (for languages like Arabic and Hebrew).

tmbdev · 2016-10-22T01:56:29Z

I wasn't planning on changing ocropy much at this point. To me, ocropy is a collection of basic reference implementations of the most important steps in OCR.

I hope I'll have more time to work on document analysis, and my plan was to develop trainable versions of nlseg, gpageseg, language modeling, and logical layout analysis using GPUs and newer deep models. These will simply be separate projects and otherwise work like the existing command line versions.

amitdo · 2016-10-22T08:15:30Z

Tom, thanks for giving us ocropy / clstm !

mittagessen · 2016-10-22T11:20:52Z

I was working on creating shims for most ocropus-* tools early during development but quickly came to my senses, and by now quite a bit of code has been removed (1800 vs 5300 SLOC, excluding OLD). Porting everything from ocropy to kraken or separating all the logic in ocropy's command line drivers into a true library will IMO not have any major benefit. So there's agreement from my side to treat ocropy as an essentially fixed collection of reference implementations with sometimes quirky command line interfaces (no offense).

As most of the functionality is still the same and will probably stay that way for a while, adapting patches will be possible with minimal effort. The only exception is page segmentation, which is high on my list of things to get rid of in favor of something better and trainable, although I'd really like to keep everything at least marginally workable on CPUs. Even then, keeping the old implementation around won't cause much harm.

jbaiter · 2016-10-24T13:53:09Z

@tmbdev, @mittagessen Can you recommend some papers on logical layout analysis/page segmentation with trainable models? I'd love to play around with that :-)

amitdo closed this as completed Oct 22, 2016

wrznr mentioned this issue May 2, 2017

Training kraken and RTL support? mittagessen/kraken#36

Closed
