Browsable, search-engine-indexable, referenceable archive. Also delivers content as XML and JSON.


deliver contents in a different way:

  • browsable without JavaScript
  • indexable by search engines
  • referenceable archive, i.e. all pages have a nice URL, and each line/paragraph has an anchor which can be used to share your findings with others
  • delivers content as XML and JSON, which can be used by web services via CORS or directly by mobile apps
  • each downloaded page carries enough context information that you can find the page either here or even on

API on local filesystem

  • all files are encoded in UTF-8
  • start point is: archive/api

Common Context inside all Files

  • path - null means root; all other paths are relative to the root. The path uses only ASCII characters.
  • baseUrl - on the local filesystem this is 'file:'; from an online source it is the respective base URL
  • version - null means no version. The version has no real meaning for the file itself but can be used to construct versioned URLs for new links. This keeps navigation within the version selected by the user without parsing HTTP parameters.
  • titlePath - reversed path names in roman script, can be used for page title.
  • menus - all menus from the root menu down to the one for this directory/document
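
As a rough illustration of how a client might use this common context, here is a minimal Python sketch. Only the field names come from the list above. The example values, the assumption that titlePath is a list of strings, and the choice to put the version into the URL as a path segment (and to omit it when it is null) are all hypothetical; the real files and URL scheme may differ.

```python
import json

# Hypothetical context: field names are from the README, values are invented.
EXAMPLE_CONTEXT = json.loads("""
{
  "path": "some/sub/dir",
  "baseUrl": "file:",
  "version": null,
  "titlePath": ["dir", "sub", "some"],
  "menus": []
}
""")


def page_title(context, separator=" - "):
    """Join the reversed path names into a page title.

    Assumption: titlePath is a list of strings.
    """
    return separator.join(context.get("titlePath") or [])


def versioned_url(context, target_path):
    """Build a link that stays within the version carried by the context.

    Assumption: the version, when present, becomes a path segment between
    the base URL and the target path. A null version adds no segment.
    """
    parts = [context["baseUrl"].rstrip("/")]
    if context.get("version") is not None:
        parts.append(str(context["version"]))
    parts.append(target_path.lstrip("/"))
    return "/".join(parts)
```

Because the version travels inside every context, a link builder like `versioned_url` never has to inspect the URL the page was loaded from.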

Directories and their Files

Each directory has a JSON file with its context, i.e.

The directory files contain only the common context described above.

Document Files

Each document carries the same context as the directories do and adds a few extra fields to it

  • normativeSource - the path on to find the original source for this document
  • source - the XML source of the document. XML is much better suited to marking up textual information than JSON.
  • versions - all possible version variants for the document

The XML source has the actual document embedded under the document tag.
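
To make the relationship between the JSON wrapper and the embedded XML concrete, here is a small Python sketch that pulls the document element out of the source field. The sample record and its inner markup are invented for illustration, and the element name `document` is taken from the sentence above as an assumption about the actual tag name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document file content: only the field names come from the README.
SAMPLE_RECORD = {
    "normativeSource": "hypothetical/path/to/original",
    "source": "<page><document><p>some text</p></document></page>",
    "versions": [],
}


def extract_document(record):
    """Return the <document> element from the record's XML source, or None.

    Assumption: the embedded document lives under an element literally
    named 'document', either as the root or somewhere below it.
    """
    root = ET.fromstring(record["source"])
    if root.tag == "document":
        return root
    return root.find(".//document")
```

A caller would typically load the record with `json.load` first and then pass it to `extract_document`.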

Note on Directory Structure from

The original directory structure from has duplicate names, which is not possible when storing on a filesystem or mapping to a URL. Where there are duplicates, the second one gets the postfix _2_ added to its name, and the third 'duplicate' gets _3_. In some cases this could be resolved by adding another sub-directory. That restructuring may happen sometime in the future.
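
The naming rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated, not the code used to build the archive; the function name is made up, and the sketch does not handle the case where a generated name collides with a name that already exists.

```python
def disambiguate(names):
    """Append _2_, _3_, ... to the second, third, ... occurrence of a name.

    The first occurrence keeps its original name.
    """
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}_{count}_")
    return result
```

For example, `disambiguate(["a", "b", "a", "a"])` yields `["a", "b", "a_2_", "a_3_"]`.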

old stuff might be obsolete

first phase

done and live on

second phase


  • produce a simple XML format from the sources and store the files in this git repository; do the same with the directory listing, as index.xml
  • ensure the round trip -> xml-format -> produces the same files
  • deliver the content as HTML in the same format as the website of the first phase
  • deliver additional formats like the XML or JSON or the original tei-format and a printer-friendly HTML view
  • all stored XML will use UTF-8 encoding so all those wonderful command-line text processing tools under Linux and macOS just work

third phase


  • for the devanagari files are normative. use those files to produce roman script files and offer all files for both roman and devanagari scripts.
  • ensure the round trip: devanagari-tei-format -> roman-xml-format -> devanagari-xml-format -> devanagari-tei-format
  • encode alternative snippets in XML - not sure how yet
  • see if the round trip ending in roman-tei-format matches as well: devanagari-tei-format -> roman-xml-format -> devanagari-xml-format -> roman-tei-format
  • see if the references like [saṃ. ni. 1.35] or [dha. pa. 307 dhammapadepi] etc. can be resolved to proper references at the file level
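
As a possible first step for the reference item above, bracketed references could be located with a regular expression before any attempt to resolve them. The Python sketch below only extracts the pieces; the pattern is a guess based on the two examples given (an abbreviation, a dotted number, and an optional trailing word) and will likely miss other reference shapes. Resolving the abbreviation to a file is not attempted.

```python
import re

# Guessed shape: [<abbreviation> <number(.number)*> <optional trailing text>]
REFERENCE = re.compile(
    r"\[([^\[\]]+?)\s+(\d+(?:\.\d+)*)(?:\s+([^\[\]]+))?\]"
)


def find_references(text):
    """Return a list of (abbreviation, number, trailing) tuples.

    'trailing' is None when nothing follows the number.
    """
    return [
        (m.group(1), m.group(2), m.group(3))
        for m in REFERENCE.finditer(text)
    ]
```

Run on the two examples from the list, this returns `("saṃ. ni.", "1.35", None)` and `("dha. pa.", "307", "dhammapadepi")`.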

running a simple server for testing

cd src/main/webap

this will deliver the archive on http://localhost:8888/
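
The command that actually starts the server is not shown above. One stand-in, assuming Python 3.7 or later is available and that the directory holds the files to be served statically, is the standard library's http.server. The function name `make_server` and its defaults are made up for this sketch; only port 8888 comes from the URL above.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, host="localhost", port=8888):
    """Create (but do not start) a static file server for 'directory'.

    Stand-in for the project's own test server, which is not shown here.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_server(".").serve_forever()` from the directory above would then serve it on port 8888 until interrupted.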
