Browse files

[docs] added blog post about data model to COMMITERS, as suggested by…

… ihrd++
  • Loading branch information...
1 parent ecf8e53 commit 81c91be115d22019d4a48366dd9ebad9ce95260c @masak masak committed Nov 15, 2008
Showing with 72 additions and 0 deletions.
  1. +72 −0 docs/COMMITTERS
@@ -23,6 +23,8 @@ mailing list, or the IRC channel #november-wiki over at
+== Coding standard
There's also an emerging coding standard, starting with the message on
the november-wiki mailing list:
@@ -38,3 +40,73 @@ likely for legal reasons, too.
* People are encouraged to add themselves to the AUTHORS file, even
before committing anything else.
+== Data model
+Also, here's a reproduction of <>,
+with the part about the data model reproduced. This part is expected to be
+up-to-date with respect to the model, as opposed to the blog post which may
+be declared obsolete at some point in the future.
+November, being a wiki engine, stores articles containing (markup) text. It
+also stores earlier revisions of each article, as well as a list of the recent
+changes to the wiki as a whole. On top of that, there's a tagging system with
+which you can associate a 'tag' ('label', 'category', whatever) with a page.
+The directory data/articles contains files describing the revision history of
+the wiki's articles. One would perhaps think that these files could contain the
+(latest) contents of the article instead, and in fact they once did, sometime
+early this summer. But that direct model was changed and replaced by the
+current scheme involving one level of indirection, to make article histories
+and recent changes work.
+A typical file in data/articles looks like this:
+ [
+ '1218057494',
+ '1218057386',
+ '1218056287',
+ '1215026471'
+ ]
+There are two things to note about this file. First, it's legal Perl. Turned
+out that was by far the easiest way to read stuff from files, by parsing them
+with eval. This design decision will likely be replaced some time in the
+future... but currently it works well for us.
+Second, the numbers (which are strings for some reason) are unique IDs --
+actually return values from the &time builtin. They function as pointers to
+files with the same name in data/modifications. The revisions are stored
+latest-first. So, for example, information about the latest modification of the
+article above can be found in the file data/modifications/1218057494. It could
+look something like this:
+ [
+ 'Example_article',
+ 'Look, I\'m the contents of an illustrative example!
+ I even contain newlines, fancy that.',
+ 'carl'
+ ]
+Again, this is an array reference, readily parseable by Perl. It contains three
+elements: the title, the contents and the user who made the change.
+Now you know enough to figure out how the file data/recent-changes is
+structured. That's right, it's a serialized array of unique IDs, much like
+those in the files in data/articles.
+In fact, all files in the data directory are similarly serialized data
+structures. (Except for the files in data/page_tags, for some reason. I should
+find out why.)
+Finally, tags. For every article that has tags, data/page_tags has a file with
+that article's name containing a single line with a comma-separated list of tag
+ automobiles, airplanes, trains, space vessels
+And the data/tags_count contains a simple serialized hash which keeps track of
+the number of occurrences of each tag on the page. As far as I can see, this
+file does not reflect the actual number of tags on the wiki in the repository
+right now — most likely, it's only for testing purposes at this stage.

0 comments on commit 81c91be

Please sign in to comment.