Search-based Procedural Content Generation (PCG) using Solr and Project Gutenberg.


One thing that has always fascinated me about software is its ability to do things on a massive scale, things that a single person can't do well, at least in practical terms. Take, for example, Procedural Content Generation (PCG). Today's gaming world has a high demand for more content, especially quality content. Both big and small game studios leverage PCG to generate content for their games. Anything from terrain to music to a full game can be generated with the help of software. The quality of such content cannot necessarily compete with the creativity of a game developer or an artist, yet in some special cases it is "good enough" or "playable". When it comes to quantity, however, no single human can beat software. If you are looking into releasing daily levels for your game, or maybe even levels tailored to each player, and you have millions of players, then you might want to consider exploring PCG.

This sample project demonstrates how we can generate levels for text-based games. Working with text, or rather natural language, is not an easy task. Making a computer understand it and generate a meaningful response is a challenging but rewarding experience. Luckily, there are a lot of ready-to-use frameworks that can be leveraged for this task. This project uses Solr, an open source search framework (a.k.a. mini Google), to index textual information and run search queries against it. The results of such queries can be used to generate playable levels for text-based games. We can refer to this approach as Search-based Procedural Content Generation.

Language games are a very effective way to learn a new language. With search-based PCG we can generate lots of games for any language. We can process thousands of publicly available books and generate tons of playable content in a matter of seconds!

To get a feel for what kind of games can be made with such generators, take a look at the Sporcle literature and language games sections.

In this project, you will find two Solr-based content generators (MissingWords and TopWords). They can be used to convert text to playable levels for the example games described below.


Game 1: Commonest words in "A Connecticut Yankee in King Arthur's Court" by Mark Twain

Content for text-based games like Hitchhiker's Guide to the Galaxy can be easily generated for any book or series of books. The goal of the game: can you name the [x] commonest words in [your favorite book]? To generate content for games like this, all we need to do is index our book and run a Solr query that returns the frequency of each word. For M. Twain's book, such data could look as follows: the=812, and=766, of=439, a=416, to=409, i=337, in=287, that=256, was=241, it=204, and so on.
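The word counting behind this game can be sketched without Solr. The snippet below is a plain Python stand-in, not the project's TopWords generator: the function name `top_words` and the tokenization rule (lowercase runs of letters and apostrophes) are assumptions made for illustration, and a real Solr index would apply its own analyzer, so counts may differ.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumed tokenization: lowercase the text and treat each run of
    # letters or apostrophes as one word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "The cat and the dog and the bird."
print(top_words(sample, 2))  # [('the', 3), ('and', 2)]
```

Feeding the full text of a book to `top_words` would give a list in the same spirit as the frequencies shown above, which can then be turned into the answer key for a level.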

Game 2: Fill in the missing words

All of the sentences below are missing the same word or phrase. Can you figure out what's missing?

  • ... projectile are ______ ______ broken but they are also torn, twisted and shredded, and so quick is the action that ...
  • ... which will stand ______ ______ the shock of being fired from heavy guns at high velocities, but which will ...
  • ... so constituted as to court death, ______ ______ for himself but for others about him, when handling ...
  • ... The cheerfulness was ______ ______ in his disease, but in his temperament ...
  • ... He said that instruction would do, and he was ______ ______ younger and handsomer, but he was fresher from ...
  • ... particularly sane spectacle, that impatience to be off to some place that lay ______ ______ in the distance, but also ...
  • ... burned the fierce white light of the sun, in which ______ ______ the earth seemed to parch and thirst, but the ...

Did you guess it? The answer is: "not only". To generate content for games like this, all we need to do is run a query against the Solr index with the missing phrase as the search criterion. Solr will return highlighted snippets of text where the search phrase is present. After some massaging of the results, we can replace the search phrase with blanks and the level is ready to play.
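The blanking step can be sketched as follows, assuming the snippets have already been retrieved. This is an illustrative Python stand-in rather than the project's MissingWords generator: the function name `missing_word_level` and the plain list of strings used in place of Solr's highlighted results are hypothetical.

```python
import re


def missing_word_level(snippets, phrase, blank="______"):
    """Blank out `phrase` in every snippet that contains it."""
    # Use one blank per word of the phrase, as in the example level above.
    replacement = " ".join(blank for _ in phrase.split())
    # Match the phrase as whole words, ignoring case.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    # Keep only snippets that contain the phrase, with the phrase blanked out.
    return [pattern.sub(replacement, s) for s in snippets if pattern.search(s)]


snippets = [
    "The cheerfulness was not only in his disease, but in his temperament",
    "This sentence does not contain the phrase",
]
for line in missing_word_level(snippets, "not only"):
    print("... " + line + " ...")
```

Snippets that do not contain the phrase are dropped, so the list can come straight from a broader search without pre-filtering.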


Many more types of simple text games can be generated this way. Generating meaningful text that can be used as game content is not easy, yet lots of tools are available to help us do the job. You can extend this project and add other generators. If you do, please share your creations.
