1 parent bd96f24 commit d7a8c0756d4af926fc2a54862e8cf1a232a18b30 @wdenton committed Dec 7, 2011
Showing with 193 additions and 0 deletions.
  1. +2 −0 .gitignore
  2. +191 −0 splurge.markdown
@@ -0,0 +1,2 @@
@@ -0,0 +1,191 @@
+# Goal
+Collect usage data from OCUL members and build a recommendation engine
+that can be integrated into any member's catalogue. Make the anonymized
+data available under an open license so members and others can better
+assess and understand collection usage in Ontario, and make the software
+available under the GNU General Public License so anyone can use it.
+# Background
+This project is based on a British project at JISC called [MOSAIC (Making Our Shared Activity Information Count)]( The documents there include:
+* [MOSAIC Data Collection: A Guide](
+* [MOSAIC Final Report]( (and [Appendices](
+* Also [MOSAIC Demonstration Links](, from a software contest they ran to find new, interesting uses for their data. (The examples here go beyond
+the Recommendation Engine idea, but are worth looking at to see other
+possible future directions.)
+The JISC project grew out of work done by Dave Pattern and others at the
+University of Huddersfield. They made usage data available under an Open
+Data Commons License.
+* [Data](
+* [README](
+* Dave Pattern, Library Systems Manager at Huddersfield, explains things in [Free book usage data from the University of Huddersfield](
+# Data gathering
+## Data levels
+MOSAIC set out three levels of usage data in the [Final Report]( (p. 40):
+> We refer to library circulation (loan & renewal) information as use data. Use
+> data contains one use record per item borrowed. Sets of use records may
+> have different amounts of information in each record, according to the
+> data level that applies to all the records in the set.
+* Level 0: Level 0 use records contain where and when the loan was made and
+the item borrowed. Level 0 use data can be used to indicate popular
+loan items in the participating library.
+* Level 1: Level 1 records are as for level 0, but also with borrower context
+information, indicating borrower type (staff or student), and course and
+progression level (for students). Level 1 use data can be used to
+see, via facets, for a given search, what was borrowed in one or more of:
+a particular institution, a particular course, a particular progression
+level (or by staff), and in a particular academic year.
+* Level 2: Level 2 records are as for level 0, but also with an anonymised
+user ID. Level 2 use data enables recommendations like "borrowers of this
+item also borrowed" and "borrowers of this item previously borrowed / went
+on to borrow".
+This project would collect use data at Level 0.
+## Data extraction
+Scholars Portal will give template XML files, with instructions, to member
+libraries, who will pull the necessary data from their systems. Because
+there are several different ILSes involved, the necessary database or
+report commands will vary, but once done for one ILS they can be shared
+with other users of the same system.
+!!! TODO Expand with actual examples
+## Data formats
+Following the MOSAIC lead (as described in the README from their script
+repository), we will collect an item file and yearly transaction files from
+each member library.
+Item file: items.txt:
+ * item ID
+ * ISBN(s)
+ * title
+ * author(s)
+ * publisher
+ * publication year
+ * persistent URL
+ 123 0415972531 Music & copyright L. Marshall Wiley 2004
+ 234 0415969298 Songwriting tips N. Skilbeck Phaidon 1997
+The item ID is whatever ID you want to use to identify a library book. It
+must match the item ID used in the transaction files.
+The ISBN(s) are one (or more) ISBNs, separated by a | pipe character where
+more than one ISBN is linked to the item (e.g. 0415966744|0415966752).
+The title is the title of the book.
+The author(s) are one (or more) names, separated by a | pipe character
+where more than one name is present (e.g. John Smith|Julie Johnson).
+The publisher and publication year are the name of the publishing company
+and the year of publication.
+The persistent URL is the web address the item can be found at (e.g. on
+your library catalogue).
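+As a rough illustration of the field rules above, here is a minimal Python sketch that reads one line of items.txt. The tab separator, the `Item` class and the `parse_item_line` name are assumptions made for this example; the MOSAIC README remains the authority on the exact layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    item_id: str
    isbns: List[str]
    title: str
    authors: List[str]
    publisher: str
    year: str
    url: Optional[str] = None


def parse_item_line(line: str, sep: str = "\t") -> Item:
    """Parse one items.txt line (the tab separator is an assumption)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    item_id, isbns, title, authors, publisher, year = fields[:6]
    # The persistent URL is treated as optional here because the sample
    # rows above do not show one.
    url = fields[6] if len(fields) > 6 and fields[6] else None
    return Item(
        item_id=item_id,
        isbns=[i for i in isbns.split("|") if i],      # pipe-separated ISBNs
        title=title,
        authors=[a for a in authors.split("|") if a],  # pipe-separated names
        publisher=publisher,
        year=year,
        url=url,
    )
```

+Keeping ISBNs and authors as lists means the pipe-separated form only has to be handled once, at load time.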
+Transaction files: transaction.YYYY.txt
+ * timestamp
+ * item ID
+ * user ID
+ 1222646400 114784 67890
+ 1225756800 103828 67890
+ 1225756800 62580 76543
+The timestamp is in Unix time format (i.e. the number of seconds since 1st
+Jan 1970 UTC). It is used to calculate the day the transaction occurred.
+The user ID is whatever ID you want to use to identify an individual
+library user. It will be converted to an MD5 hash value before the data is
+submitted to MOSAIC. It must match the user ID contained in the user file.
+The item ID is whatever ID you want to use to identify a library book. It
+must match the item ID contained in the item file.
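+The handling of a transaction row described above can be sketched in Python as follows. Whitespace-separated fields, the `Transaction` tuple and the `parse_transaction_line` name are assumptions for illustration only. The MD5 step mirrors the description above and is not a recommendation of any particular hashing scheme.

```python
import hashlib
from datetime import datetime, timezone
from typing import NamedTuple


class Transaction(NamedTuple):
    day: str        # ISO date (UTC) on which the loan occurred
    item_id: str
    user_hash: str  # MD5 hex digest of the local user ID


def parse_transaction_line(line: str) -> Transaction:
    """Parse 'timestamp item_id user_id' (whitespace-separated is assumed)."""
    timestamp, item_id, user_id = line.split()
    # Unix time is seconds since 1970-01-01 UTC; keep only the calendar day.
    day = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).date().isoformat()
    # Replace the local user ID with its MD5 digest before the row leaves the library.
    user_hash = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return Transaction(day, item_id, user_hash)
```

+With the first sample row above, `parse_transaction_line("1222646400 114784 67890").day` is `2008-09-29`.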
+The basic usage data to be gathered is:
+ * Item title
+ * Number of copies
+ * URL of item in catalogue
+ * Loan history, giving number of initial circulations per year over
+the last 10 years (or fewer, if 10 years of data is not available)
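+The loan history in the list above could be derived from the transaction files roughly as follows. This is a sketch only: it assumes every transaction row is an initial circulation (not a renewal), and the `loans_per_year` name is made up for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def loans_per_year(transactions: Iterable[Tuple[int, str]]) -> Dict[str, Counter]:
    """Map each item ID to a Counter of {year: number of loans}.

    Each transaction is a (unix_timestamp, item_id) pair.
    """
    history: Dict[str, Counter] = defaultdict(Counter)
    for timestamp, item_id in transactions:
        year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
        history[item_id][year] += 1
    return history
```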
+The basic also-borrowed data to be gathered for each item (A) is a list of
+other items (B) that shows:
+ * how many times A was borrowed before B
+ * how many times A and B were borrowed together
+ * how many times A was borrowed after B
+ * how many times B was borrowed in total
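+One possible way to compute the four counts in the list above is sketched below in Python. Several choices here are assumptions rather than anything MOSAIC specifies: "together" is taken to mean the same day, only a user's first loan of each item is compared, and the `also_borrowed` name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, Tuple


def also_borrowed(transactions: Iterable[Tuple[int, str, str]]):
    """Return (pairs, total) from (day, item_id, user_id) rows.

    pairs[(A, B)] is a Counter with keys "before", "together" and "after";
    total[B] is the number of loans of B. `day` is an integer day number.
    """
    first_loan: Dict[str, Dict[str, int]] = defaultdict(dict)  # user -> item -> first day
    total: Counter = Counter()
    for day, item, user in transactions:
        total[item] += 1
        previous = first_loan[user].get(item)
        if previous is None or day < previous:
            first_loan[user][item] = day

    pairs: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for items in first_loan.values():
        # Every ordered pair (A, B) of distinct items borrowed by one user.
        for a, b in permutations(items, 2):
            if items[a] < items[b]:
                pairs[(a, b)]["before"] += 1
            elif items[a] == items[b]:
                pairs[(a, b)]["together"] += 1
            else:
                pairs[(a, b)]["after"] += 1
    return pairs, total
```

+A day number can be obtained from a Unix timestamp with `timestamp // 86400`.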
+Scholars Portal will aggregate the data from the different libraries, and
+make the data openly available.
+# Privacy
+No identifying information will be connected to the usage data. It is
+completely anonymous.
+## Data storage
+The data will be stored using the same format as Huddersfield used in
+their data release (see
+ * circulation_data.xml contains aggregate usage information for
+individual titles
+ * suggestion_data.xml contains "people who borrowed X also borrowed Y"
+ * schools.xml is a lookup file listing OCUL members and ID numbers
+ * courses.xml is a lookup file listing course codes and ID numbers
+!!!! TODO Expand
+## Recommendation Engine
+!!! TODO Write up what is known about how this can work, from MOSAIC and
+what Tim Spalding said
+When the Recommendation Engine is given an ISBN or other ID number it will
+suggest a list of related items, using an algorithm based on the
+also-borrowed data and the usage data.
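+The algorithm is still undecided, so the following Python sketch is only a placeholder showing the general shape of such a lookup. It assumes a simple score (co-borrow count divided by the candidate's total loans, so that very popular items do not dominate every list). The `recommend` function and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def recommend(item_id: str,
              co_counts: Dict[Tuple[str, str], int],
              totals: Dict[str, int],
              limit: int = 5) -> List[str]:
    """Rank items B co-borrowed with item_id by co_counts[(A, B)] / totals[B]."""
    scored = []
    for (a, b), n in co_counts.items():
        if a == item_id and totals.get(b, 0) > 0:
            scored.append((n / totals[b], n, b))
    # Highest score first; ties broken by raw count, then by ID for stable output.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [b for _, _, b in scored[:limit]]
```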
+[suggest algorithm? Also use LibraryThing data? We can get it from Tim
+# Implementation as a web service
+The Recommendation Engine will have a web-based API available at Scholars
+Portal. Ideally a library will be able to insert one line of JavaScript
+into its HTML template to make the recommendations appear.
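+To make the idea of a web-based API concrete, here is a minimal Python WSGI sketch of what such a service might look like. Nothing here is a decided design: the `isbn` query parameter, the JSON shape and the `RECOMMENDATIONS` table (filled with the two sample ISBNs from the item file purely as placeholder values) are all assumptions.

```python
import json
from urllib.parse import parse_qs

# Hypothetical lookup table: ISBN -> list of recommended ISBNs (placeholder values).
RECOMMENDATIONS = {"0415972531": ["0415969298"]}


def app(environ, start_response):
    """Minimal WSGI app answering GET /?isbn=... with a JSON document."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    isbn = query.get("isbn", [""])[0]
    payload = {"isbn": isbn, "recommendations": RECOMMENDATIONS.get(isbn, [])}
    body = json.dumps(payload).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

+For local experiments the app could be served with the standard library, e.g. `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. The one-line JavaScript include would then fetch this JSON and render it in the catalogue page.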
