PAN Kreator bot

PAN Kreator bot is an internet crawler that digs through the resources of the PAN Biblioteka Gdańska and posts interesting results on Twitter and Facebook.

The bot uses the OAI-PMH API to connect to the library and run a query. A matching record is downloaded, unzipped and converted from DjVu to JPG. Finally, the image is posted on Twitter.
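To make the query step more concrete, here is a minimal sketch of building an OAI-PMH `ListRecords` request and reading record identifiers from a response. It is not the bot's actual code: the base URL `https://example.org/oai`, the helper names and the sample response are assumptions made up for illustration. Only the `verb`, `metadataPrefix` and `set` parameters and the XML namespace come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by every OAI-PMH 2.0 response.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_identifiers(xml_text):
    """Return the record identifiers found in an OAI-PMH response body."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Made-up response, shaped like a real ListRecords answer.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = build_list_records_url("https://example.org/oai")
identifiers = extract_identifiers(SAMPLE_RESPONSE)
```

Downloading, unzipping and the DjVu to JPG conversion are left out, because the tools the bot uses for those steps are not described here.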

But this is only part of the bot's abilities. This guy uses a machine learning algorithm (a Support Vector Machine) to get an idea of the content of the downloaded book. He's able to tell the difference between text, a blank page and an image (preferably a figure). The bot goes through all the pages of a book and picks only those that are worth posting from his point of view. When the book ends, he chooses the page that seems to contain the highest percentage of images.
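The page selection described above can be sketched roughly as follows. This is an illustration under assumptions, not the bot's real logic: it supposes that each page has been cut into tiles, that the classifier has already labelled every tile as `"text"`, `"blank"` or `"image"`, and that the "percentage of images" is the share of tiles labelled `"image"`. The function names are hypothetical.

```python
def image_fraction(tile_labels):
    """Share of a page's tiles that were labelled as image (0.0 for no tiles)."""
    if not tile_labels:
        return 0.0
    return sum(1 for label in tile_labels if label == "image") / len(tile_labels)


def pick_best_page(pages):
    """Return (page index, score) of the page with the largest image share.

    `pages` is a list of per-page label lists. If no page contains any
    image tile, the index is None and nothing would be posted.
    """
    best_index, best_score = None, 0.0
    for index, labels in enumerate(pages):
        score = image_fraction(labels)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


# Three made-up pages: all text, half images, blank.
book = [
    ["text", "text"],
    ["image", "text", "blank", "image"],
    ["blank"],
]
best = pick_best_page(book)
```

Ties go to the earliest page because the comparison is strict; whether the bot breaks ties this way is not stated.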

How does he know what to look for?

The bot was initially taught by a human to distinguish three categories of pages. We used a set of 368 images with different kinds of content.

For example, this was marked as text (which we don't want to publish on Twitter):

this as a blank page (also not very interesting):

but this as an image, because it contains something different and possibly worth showing:

The effectiveness of the image recognition is quite hard to predict, but that is what makes the results of the bot's work interesting.
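For readers curious about the training step, here is a minimal sketch of fitting a Support Vector Machine on three page categories with scikit-learn. Everything specific in it is an assumption: the bot's real features and library are not documented here, so each page is reduced to two hypothetical numbers (mean brightness and brightness spread), and the nine training rows are invented stand-ins for the 368 hand-labelled images.

```python
from sklearn.svm import SVC

# Hypothetical features per page: [mean brightness, brightness spread].
# These values are made up; they are not taken from the real training set.
TRAINING_FEATURES = [
    [0.97, 0.02], [0.95, 0.03], [0.96, 0.01],  # blank pages: bright and uniform
    [0.85, 0.30], [0.82, 0.33], [0.80, 0.35],  # text pages: bright, high contrast
    [0.50, 0.20], [0.45, 0.25], [0.55, 0.22],  # image pages: darker overall
]
TRAINING_LABELS = ["blank"] * 3 + ["text"] * 3 + ["image"] * 3


def train_classifier():
    """Fit a linear SVM on the hand-labelled examples."""
    classifier = SVC(kernel="linear", C=1000.0)
    classifier.fit(TRAINING_FEATURES, TRAINING_LABELS)
    return classifier


def classify(classifier, features):
    """Predict a category name for each feature row."""
    return [str(label) for label in classifier.predict(features)]


model = train_classifier()
predictions = classify(model, [[0.96, 0.02], [0.83, 0.32], [0.50, 0.22]])
```

A linear kernel is used only to keep the sketch simple; with real page images a different kernel and richer features would likely be needed.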

To check what PAN Kreator has found recently, please visit his Twitter or Facebook page.

Please follow him if you like this!

