- Current issues that this addresses:
  - Non-standard storage of content and limited interoperability support
  - Untested code in one of the most critical areas of Sakai
- A standard system for storing content that is easy to swap and configure
- Reliable, industry-standard, well-tested code
- Simplification and clarification of the Content Hosting Service
- JCR is an industry standard and Jackrabbit is well tested
- JCR can interoperate and is highly configurable
- Reduction of Sakai-developed code by 10,000+ lines, and an overall reduction in code paths
- Reduction of the Content Hosting Service from 103 methods to around 20 (it becomes a simple abstraction layer and utility class for JCR)
- Predictable performance, with the ability to take advantage of performance improvements made across the industry
Overview of JCR in Sakai
The Java Content Repository API is defined in two Java Community Process (JCP) specifications, JSR-170 and JSR-283. JSR-170 is the original specification; it defines an API for a node/property-based content repository. It is divided into two compliance levels: Level 1 covers reading (including XPath queries) and Level 2 adds writing, while features such as versioning, locking, and SQL querying are optional. JSR-283 supersedes JSR-170 and adds shareable nodes, a lighter-weight simple versioning model alongside full versioning, and other features.
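To make the node/property model concrete, here is a toy sketch in plain Java that runs without jcr-1.0.jar. `ToyNode` and its methods are hypothetical; they only loosely echo `javax.jcr.Node` (`addNode`, `setProperty`, `getNode`, `getProperty`), and the real API returns `Property` objects and throws `PathNotFoundException` where this toy returns null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of the JSR-170 content shape: a tree of named nodes,
// each holding named properties. Illustrative only; this is NOT javax.jcr.
class ToyNode {
    private final String name;
    private final Map<String, ToyNode> children = new LinkedHashMap<>();
    private final Map<String, String> properties = new LinkedHashMap<>();

    ToyNode(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Adds a child node. Real JCR can optionally allow same-name siblings;
    // this toy rejects duplicates to keep path lookup unambiguous.
    ToyNode addNode(String childName) {
        if (children.containsKey(childName)) {
            throw new IllegalArgumentException("Node exists: " + childName);
        }
        ToyNode child = new ToyNode(childName);
        children.put(childName, child);
        return child;
    }

    void setProperty(String key, String value) {
        properties.put(key, value);
    }

    String getProperty(String key) {
        return properties.get(key);
    }

    // Resolves a relative path such as "a/b/c" by walking child nodes;
    // returns null when any segment is missing.
    ToyNode getNode(String relPath) {
        ToyNode current = this;
        for (String segment : relPath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

For example, `root.addNode("sites").addNode("site1")` builds a folder-like path, and `root.getNode("sites/site1")` resolves it again.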
The JSR-170 API, jcr-1.0.jar, is currently being placed in shared/lib in Sakai.
We define an additional API in the Maven module sakai-jcr-api, located in the top-level Sakai Subversion module 'jcr'.
This API allows you to obtain a javax.jcr.Session object, one of the top-level interfaces in the JCR API and the one through which you perform work on the repository. At the moment, obtaining this Session is much like obtaining a DB connection, in that no default Sakai security is performed (although the underlying implementation may choose to enforce some security parameters).
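Because a Session behaves like a DB connection, the usual acquire, work, save, and always-release pattern applies. The sketch below shows that pattern under stated assumptions: `ToySession`, `ToySessionSource`, `SessionWork`, and `InMemorySession` are hypothetical stand-ins (the actual sakai-jcr-api method names are not reproduced here), while `save()`, `logout()`, and `isLive()` mirror methods of the same names on `javax.jcr.Session`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: real code would use a javax.jcr.Session obtained
// through the sakai-jcr-api service; these types exist only for illustration.
interface ToySession {
    void write(String path, String value);
    void save();      // mirrors javax.jcr.Session#save()
    void logout();    // mirrors javax.jcr.Session#logout()
    boolean isLive(); // mirrors javax.jcr.Session#isLive()
}

interface ToySessionSource {
    ToySession acquire();
}

class SessionWork {
    // Acquire, work, save, and always release, as with a JDBC Connection.
    // No Sakai permission check happens here: callers must authorize first.
    static void writeAndRelease(ToySessionSource source, String path, String value) {
        ToySession session = source.acquire();
        try {
            session.write(path, value);
            session.save();
        } finally {
            session.logout();
        }
    }
}

// Trivial in-memory implementation used to exercise the pattern.
class InMemorySession implements ToySession {
    final List<String> saved = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean live = true;

    public void write(String path, String value) {
        if (!live) {
            throw new IllegalStateException("session closed");
        }
        pending.add(path + "=" + value);
    }

    public void save() {
        saved.addAll(pending);
        pending.clear();
    }

    public void logout() {
        live = false;
    }

    public boolean isLive() {
        return live;
    }
}
```

The `finally` block is the important design choice: a session that is never logged out leaks repository resources just as an unclosed connection leaks a pool slot.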
The diagram above outlines the basic idea. A number of JCR API implementations can be installed as Sakai components. Working implementations currently exist for Apache Jackrabbit and Xythos; both are roughly beta quality. Research into an Alfresco implementation is underway.
While either of these implementations can be used, it is unlikely that one can be swapped for the other in an existing production instance, because the implementations have very different storage schemas. However, one goal of the JCR specification is to allow resources to be copied between implementations through the standard API, with some mounting semantics. It is therefore conceivable to run Jackrabbit, Xythos, and some other Implementation XYZ side by side in Sakai and expose them with a degree of interoperability. This is described in Section 3.2 (page 14) of the JSR-170 PDF specification.
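The copy-through-the-standard-API idea can be sketched as a recursive tree copy that touches only interface methods, so neither backend's storage schema matters. `ContentTree`, `FlatMapTree`, and `TreeCopier` are hypothetical and far simpler than JCR; a real copy would walk `javax.jcr.Node` children via `getNodes()` and `getProperties()` and recreate them with `addNode()` and `setProperty()`. The workspace mounting semantics the specification discusses are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common interface standing in for the standard JCR API.
// Each backend may store data however it likes behind it.
interface ContentTree {
    List<String> childNames(String path);
    Map<String, String> properties(String path);
    void createNode(String path, Map<String, String> properties);
}

// One possible backend: a flat map from absolute path to properties.
class FlatMapTree implements ContentTree {
    private final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();

    FlatMapTree() {
        nodes.put("/", new LinkedHashMap<String, String>());
    }

    public List<String> childNames(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> names = new ArrayList<>();
        for (String p : nodes.keySet()) {
            // Direct children only: the remainder after the prefix has no '/'.
            if (p.startsWith(prefix) && p.length() > prefix.length()
                    && p.indexOf('/', prefix.length()) < 0) {
                names.add(p.substring(prefix.length()));
            }
        }
        return names;
    }

    // Assumes the path exists; returns a defensive copy of its properties.
    public Map<String, String> properties(String path) {
        return new LinkedHashMap<String, String>(nodes.get(path));
    }

    public void createNode(String path, Map<String, String> properties) {
        nodes.put(path, new LinkedHashMap<String, String>(properties));
    }
}

class TreeCopier {
    // Recursively copies the subtree rooted at path from src to dst using
    // only interface calls. Assumes the parent of path already exists in dst.
    static void copy(ContentTree src, ContentTree dst, String path) {
        dst.createNode(path, src.properties(path));
        for (String child : src.childNames(path)) {
            String childPath = path.equals("/") ? "/" + child : path + "/" + child;
            copy(src, dst, childPath);
        }
    }
}
```

A second backend with a completely different internal layout (for example nested maps) could be passed as either `src` or `dst` without changing `TreeCopier`.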
Tools and services can then be built on top of the content repository through the JCR API. A JCR implementation of the existing org.sakaiproject.content.api.ContentHostingService contract is currently underway (roughly alpha quality at the moment), and other components can build on top of it as usual. There are plans to eventually supersede the ContentHostingService API with one that relies heavily on the existing JCR APIs. This could be some time away, so the current priority is to fully implement the original API, so that existing tools keep working and the migration and deprecation process is smooth.
Other system components can be built the same way. One area under research and development is storing messages as JCR files and folders. This makes it possible to model MIME messages and other social postings, which map naturally onto a file/folder structure. A good read is David's Model, a set of tips from David Nuescheler (a lead Jackrabbit developer) on structuring content in JCR.
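One way the file/folder modeling of messages could look is sketched below, assuming a hypothetical layout in which each message is a folder whose properties hold the headers and each MIME body part is a child file. The paths, property keys, and class names are illustrative assumptions, not the schema under research, and bodies are kept as strings for brevity where a real repository would store binary data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical layout for a message stored as files and folders:
//   <base>/<messageId>           folder; headers kept as its properties
//   <base>/<messageId>/part-N    one file per MIME body part
// The layout and key names are illustrative, not an agreed Sakai schema.
class MessagePart {
    final String contentType;
    final String body;

    MessagePart(String contentType, String body) {
        this.contentType = contentType;
        this.body = body;
    }
}

class MessageLayout {
    // Returns path -> properties for every node the message would occupy,
    // in creation order (folder first, then parts).
    static Map<String, Map<String, String>> toNodes(
            String base, String messageId, Map<String, String> headers, List<MessagePart> parts) {
        Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        String folder = base + "/" + messageId;
        nodes.put(folder, new LinkedHashMap<String, String>(headers));
        for (int i = 0; i < parts.size(); i++) {
            Map<String, String> file = new LinkedHashMap<>();
            file.put("contentType", parts.get(i).contentType);
            file.put("body", parts.get(i).body);
            nodes.put(folder + "/part-" + (i + 1), file);
        }
        return nodes;
    }
}
```

Keeping headers on the folder means a listing of a mailbox folder can show subjects and senders without opening any part files.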
Content Hosting Stats
These are statistics on Content Hosting access, collected from various schools.
(Feel free to contribute your own numbers.)
Max/Avg Usage: the peak and average hits per second for reads and writes, reported as:
- Max content.read / sec
- Max content.write / sec
- Avg content.read / sec
- Avg content.write / sec
Related activity levels: the ratio of each Content Hosting CRUD operation on an item to reads, which are the most prevalent activity. In other words, the figures show that reads happen 200x more often than updates.
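The figures above could be derived from per-second event counts, for example buckets parsed from access logs. The sketch below is an assumption about method, not how the contributed numbers were collected; `ContentStats` and its methods are hypothetical. It computes the peak and average rates and the read-to-operation ratio used for the related activity levels.

```java
// Sketch of deriving the Content Hosting figures from per-second event
// counts. The input format (one count per one-second bucket) is an assumption.
class ContentStats {
    // Peak hits in any single one-second bucket (0 for an empty window).
    static long maxPerSecond(long[] countsPerSecond) {
        long max = 0;
        for (long c : countsPerSecond) {
            max = Math.max(max, c);
        }
        return max;
    }

    // Mean hits per second over the whole sampling window.
    static double avgPerSecond(long[] countsPerSecond) {
        if (countsPerSecond.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (long c : countsPerSecond) {
            total += c;
        }
        return (double) total / countsPerSecond.length;
    }

    // How many reads occur per operation of another kind; "reads happen
    // 200x more often than updates" corresponds to a result of 200.0.
    static double readsPerOperation(long reads, long otherOps) {
        if (otherOps == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (double) reads / otherOps;
    }
}
```

Seconds with no activity must be present as zero buckets; if they are dropped, the average is overstated.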