Alfresco in Sakai

I (Antranig) will be using this page as a dumping ground for results of my integration experiments with various JCR repositories. An initial bulk of this information is held at Using JSR-170 (Java Content Repository) which will be migrated over here gradually.

Alfresco JCR compatibility for Content Hosting

Held here are the results of my exploration of the potential of Alfresco as an alternative implementation to Apache Jackrabbit for the new ContentHostingSystem implementation in Sakai 2.5, backed by a standard JSR-170 Java Content Repository. There are two main categories of results, i) performance, ii) JAR/environmental/API compatibility.

The code for this investigation is currently packaged as a standalone Maven 2/Eclipse JUnit test case held at https://saffron.caret.cam.ac.uk/svn/projects/amb26/trunk/jcr-tests.

API/Repository compatibility

Alfresco complies with level 2 of the JCR spec. However, its interpretation of the spec differs from Jackrabbit's in a number of substantial ways that would obstruct its use as a drop-in replacement for Jackrabbit in CHS.

Unstructured schema support

Firstly, there is the issue that Alfresco has no support for dynamic or unstructured schemas exposed via JCR (and may indeed have none at all internally; I have not investigated this). The nt:unstructured JCR node type is not implemented in the Alfresco schema. Properties which are not registered on a node type will cause a NullPointerException if they are read or written. Since the CHS implementation is required to support arbitrary properties attached to a resource, this would seem to obstruct creating a CHS implementation in which different backing repositories show the same JCR node structure to clients. However, it may be possible to paper over this at a higher level with some utilities or other API abstractions.

Schema leniency

Alfresco is also less lenient than Jackrabbit on property multiplicity. Accessing a multi-valued property (defined using <multiple>true</multiple>) as a scalar property via the API will cause an exception, and vice versa. Alfresco also insists that the jcr:primaryType property be set on every node, since this property is <mandatory> in its schema. Unfortunately for Jackrabbit, this property is <optional> in its schema, and so causes an exception if set. This difference is "papered over" in the test code with a construction like this:

testNode = root.addNode(testPath, "nt:folder");

// Alfresco needs jcr:primaryType set explicitly; Jackrabbit throws if it is set.
if (repositoryFactory.requiresPrimaryType()) {
  testNode.setProperty("jcr:primaryType", "nt:folder");
}

Versioning

Although Alfresco has built-in version management support, the mix:versionable mixin is not present in its built-in JCR schema. It may be possible to implement this easily, or it may be that Alfresco version management in this release differs too substantially from the JCR semantics.
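The three API differences described above (unstructured schemas, jcr:primaryType handling and versioning) could be collected into a small capability descriptor that client or test code consults, rather than scattering repository-specific checks around. The following is only a sketch: the RepositoryQuirks name and its methods are hypothetical and are not part of the jcr-tests project or of either repository's API. The Alfresco values restate the observations above; the Jackrabbit values assume its standard support for nt:unstructured and mix:versionable.

```java
/** Hypothetical summary of the JCR behaviour differences noted on this page. */
enum RepositoryQuirks {
    // Arguments: requiresPrimaryType, supportsUnstructured, supportsVersionable.
    // These are hand-written observations, not values queried from a live repository.
    JACKRABBIT(false, true, true),
    ALFRESCO(true, false, false);

    private final boolean requiresPrimaryType;
    private final boolean supportsUnstructured;
    private final boolean supportsVersionable;

    RepositoryQuirks(boolean requiresPrimaryType,
                     boolean supportsUnstructured,
                     boolean supportsVersionable) {
        this.requiresPrimaryType = requiresPrimaryType;
        this.supportsUnstructured = supportsUnstructured;
        this.supportsVersionable = supportsVersionable;
    }

    /** True if jcr:primaryType must be set explicitly after addNode(). */
    boolean requiresPrimaryType() {
        return requiresPrimaryType;
    }

    /** True if properties not registered on a node type may be stored. */
    boolean supportsUnstructured() {
        return supportsUnstructured;
    }

    /** True if the mix:versionable mixin is available. */
    boolean supportsVersionable() {
        return supportsVersionable;
    }
}
```

A caller would then write something like RepositoryQuirks.ALFRESCO.requiresPrimaryType() in place of a repository-specific check.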

JAR/Environmental compatibility

Alfresco is delivered "out of the box" in two main configurations: firstly as a complete "ready to run" Tomcat distribution, and secondly as a WAR. There is also an "embedded/SDK" configuration, which was the basis of this experimentation. The complete Tomcat distribution is obviously unsuitable for direct integration with Sakai, although it could be made accessible over some form of RMI/Web Services bridge. The current CHS/JCR implementation expects the backing impl to be delivered as a Sakai component. This impedes the use of the WAR configuration, although it would be possible to construct some form of webapp/component bridge. The problem with this approach is ensuring that the lifecycles match up and that the webapp is started before any use is made of the system.

This exploration intended to establish the obstacles to packaging Alfresco directly as a Sakai component. The "jcr-tests" project has succeeded in establishing a Maven 2 build profile containing the minimal set of JARs necessary to start up a functioning Alfresco repository accessible over JCR. These JARs could in theory be delivered as an alternate implementation of JCR CHS. The obstacles are the following:

Spring/ClassLoader compatibility

(See the discussion of Sakai's Component Manager at Component Manager Upgrade.) Sakai's ClassLoader structure within components is nonstandard. This would prevent most independently Spring-configured applications from starting up correctly, since references to resources within Spring configuration files will be resolved against the wrong ClassLoader. Although Alfresco is bundled with Spring 2.0.2, I have verified that it starts up correctly with Spring 2.0.6, the version currently used within Sakai, so Spring version compatibility specifically is not an issue.
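One common way to work around this kind of problem, assuming the embedded application resolves its resources through the thread context ClassLoader, is to set that loader explicitly around startup and restore it afterwards. The sketch below uses only the JDK; the ContextClassLoaderScope name is hypothetical and nothing here is taken from Sakai's Component Manager.

```java
import java.util.function.Supplier;

/** Hypothetical helper: runs code with a chosen thread context ClassLoader. */
final class ContextClassLoaderScope {
    private ContextClassLoaderScope() {
    }

    /**
     * Installs the given loader as the current thread's context ClassLoader,
     * runs the action, and always restores the previous loader afterwards.
     */
    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so later code is unaffected.
            current.setContextClassLoader(previous);
        }
    }
}
```

The action passed in would be whatever starts the embedded Spring context; the loader would be the one that can actually see the component's configuration files.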

Hibernate compatibility

Alfresco is implemented using Hibernate. Unfortunately Hibernate is present at the shared ClassLoader level within Sakai, and so is visible to code throughout the entire system. The version of Hibernate bundled with Alfresco 2.1.0 is 3.2.1ga, and the current Sakai version is 3.2.5ga. Alfresco will not run correctly with Sakai's version of Hibernate, due to a change in the syntax tree structure for HQL queries. On startup with 3.2.5ga, Alfresco delivers the following exception:

2007-11-09 19:44:22,373 ERROR (SessionFactoryImpl.java:363) - <Error in named query: node.patch.GetNodesWithPersistedSerializableProperties>
org.hibernate.QueryException: illegal attempt to dereference collection [nodeimpl0_.id.properties] with element property reference [serializableValue] [
      select distinct
         node
      from
         org.alfresco.repo.domain.hibernate.NodeImpl as node
      where
         node.properties.serializableValue is not null and
         node.properties.multiValued = false
   ]

The complete exception text is present as an attachment to this page.

This indicates that without specific ClassLoader work, it would be impossible to deliver Alfresco within Sakai. See the Component Manager Upgrade page for discussion of some related work, especially the section relating to Hibernate visibility. The most direct route would seem to be to achieve "Stage 1" of the upgrade plans, restoring the correct use of Context ClassLoaders, and then to provide a custom Hibernate-isolating ClassLoader for shielding the Alfresco module. Unless Hibernate were somehow removed from shared, it would be impossible even to deploy the standard Alfresco WAR within Sakai.
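A Hibernate-isolating ClassLoader of the kind suggested above might look roughly like the following. This is a sketch under stated assumptions, not a tested Sakai integration: the IsolatingClassLoader name, the package-prefix mechanism and the decision never to fall back to the parent for isolated packages are all illustrative. It assumes the Alfresco module's own Hibernate JARs are supplied as the loader's URLs.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical loader that serves classes under selected package prefixes
 * only from its own URLs, hiding any copy visible through the parent.
 */
final class IsolatingClassLoader extends URLClassLoader {
    private final String[] isolatedPrefixes;

    IsolatingClassLoader(URL[] urls, ClassLoader parent, String... isolatedPrefixes) {
        super(urls, parent);
        this.isolatedPrefixes = isolatedPrefixes.clone();
    }

    /** True if the class belongs to a package that must not come from the parent. */
    boolean isIsolated(String className) {
        for (String prefix : isolatedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!isIsolated(name)) {
            // Everything else uses the normal parent-first delegation.
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                // Search this loader's own URLs only; a miss raises
                // ClassNotFoundException rather than leaking the shared copy.
                loaded = findClass(name);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With a prefix of "org.hibernate.", the Alfresco module would see its own Hibernate 3.2.1ga classes while still sharing everything else with the parent. A real implementation would also need to consider resource lookups and any libraries that Hibernate itself depends on.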

Comparative performance

In the limited testing done so far, Alfresco performance seems considerably worse than that of Jackrabbit, in terms of both memory and CPU usage. The testing was somewhat unrealistic in terms of real-life loads, since it consisted of a large number of updates to the repository made within a single Session. However, a significant degradation in JCR query performance was also observed. I have very little understanding of how to configure Alfresco appropriately for good performance, and it is likely that many of the problems in this section could be significantly relieved through better configuration, especially that of the Alfresco cache.

Also, this testing was done on a Windows box with a very crowded disk partition, so the absolute timings are an extremely poor guide to what could be achieved in a production environment. However, the relative timings between Alfresco and Jackrabbit are probably reasonably accurate.

Repository loading

Alfresco warns at startup if it is run in a JVM with less than 512MB of heap. When run in the default configuration (64MB heap), it fails to execute a test which tries to create 1000 nodes, due to heap exhaustion. When given 512MB of heap, it still fails to execute a test creating 10000 nodes.

On a 1000 node test for Jackrabbit:

Root node has 1000 descendents
Saved 10240000 bytes in 168298ms: 59.42 K/s, 5.94 nodes/s

However for Alfresco:

Root node has 665 descendents
Saved 6809600 bytes in 903344ms: 7.36 K/s, 0.74 nodes/s

Many of the nodes "appeared" lost because the test took so long to complete that they had expired from the cache. However, the nodes were correctly saved to the backing storage.
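The rates in these summary lines can be reproduced from the raw byte counts, node counts and elapsed times, taking K as 1024 bytes. The sketch below shows the arithmetic; the Throughput class and its method names are hypothetical and are not taken from the jcr-tests code.

```java
import java.util.Locale;

/** Hypothetical helper reproducing the rate figures in the transcripts. */
final class Throughput {
    private Throughput() {
    }

    /** Kilobytes (1024 bytes) transferred per second. */
    static double kilobytesPerSecond(long bytes, long millis) {
        return (bytes / 1024.0) / (millis / 1000.0);
    }

    /** Nodes processed per second. */
    static double nodesPerSecond(long nodes, long millis) {
        return nodes / (millis / 1000.0);
    }

    /** Formats both rates to two decimal places, as in the test output. */
    static String summary(long bytes, long nodes, long millis) {
        return String.format(Locale.ROOT, "%.2f K/s, %.2f nodes/s",
                kilobytesPerSecond(bytes, millis),
                nodesPerSecond(nodes, millis));
    }
}
```

Applied to the figures above, summary(10240000, 1000, 168298) gives "59.42 K/s, 5.94 nodes/s" for Jackrabbit and summary(6809600, 665, 903344) gives "7.36 K/s, 0.74 nodes/s" for Alfresco, so on this test Jackrabbit was roughly eight times faster. Note that the Alfresco rate is based on the 665 nodes that were visible at the end of the run.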

Querying

JCR XPath queries of increasing length were presented to both Alfresco and Jackrabbit. Alfresco queries were up to two orders of magnitude slower than the same Jackrabbit queries on the same node sets, with Alfresco queries typically taking more than a second to return result sets of a few hundred nodes. Some representative transcripts, after a few runs to let any caching bed down:

Jackrabbit:

Begin run 4
Query /jcr:root/JCRTestPath/element(*, nt:file) concluded: 47ms
Matched 1000 nodes 
Counted 10239000 chars in 6812ms: 1467.85 K/s, 146.80 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0' ] concluded: 16ms
Matched 200 nodes 
Counted 2047800 chars in 828ms: 2415.22 K/s, 241.55 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1' ] concluded: 47ms
Matched 400 nodes 
Counted 4095600 chars in 1875ms: 2133.12 K/s, 213.33 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1'  or @sakaijcr:aclkey = '2' ] concluded: 63ms
Matched 600 nodes 
Counted 6143400 chars in 3079ms: 1948.49 K/s, 194.87 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1'  or @sakaijcr:aclkey = '2'  or @sakaijcr:aclkey = '3' ] concluded: 46ms
Matched 800 nodes 
Counted 8191200 chars in 4203ms: 1903.22 K/s, 190.34 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1'  or @sakaijcr:aclkey = '2'  or @sakaijcr:aclkey = '3'  or @sakaijcr:aclkey = '4' ] concluded: 93ms
Matched 1000 nodes 
Counted 10239000 chars in 6187ms: 1616.13 K/s, 161.63 nodes/s
Logged in as username to a Jackrabbit repository.

Alfresco:

Begin run 4
Query /jcr:root/JCRTestPath/element(*, nt:file) concluded: 1484ms
Matched 1000 nodes 
Counted 10239000 chars in 58078ms: 172.17 K/s, 17.22 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0' ] concluded: 1859ms
Matched 200 nodes 
Counted 2047800 chars in 7359ms: 271.75 K/s, 27.18 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1' ] concluded: 1969ms
Matched 400 nodes 
Counted 4095600 chars in 14406ms: 277.63 K/s, 27.77 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1'  or @sakaijcr:aclkey = '2' ] concluded: 2063ms
Matched 600 nodes 
Counted 6143400 chars in 23344ms: 257.00 K/s, 25.70 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1'  or @sakaijcr:aclkey = '2'  or @sakaijcr:aclkey = '3' ] concluded: 2391ms
Matched 800 nodes 
Counted 8191200 chars in 35078ms: 228.04 K/s, 22.81 nodes/s
Query /jcr:root/JCRTestPath/element(*, nt:file)[@sakaijcr:aclkey = '0'  or @sakaijcr:aclkey = '1'  or @sakaijcr:aclkey = '2'  or @sakaijcr:aclkey = '3'  or @sakaijcr:aclkey = '4' ] concluded: 2563ms
Matched 1000 nodes 
Counted 10239000 chars in 48547ms: 205.97 K/s, 20.60 nodes/s
Logged in as admin to a Alfresco Content Repository (Community Network) repository.
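The queries in these transcripts follow a simple pattern: a base path query to which one more "or" clause on sakaijcr:aclkey is added each time. A minimal sketch of how such a series could be generated is shown below. The AclQueryBuilder name is hypothetical, and the whitespace is tidier than in the transcripts, so the strings are equivalent to but not byte-identical with those above.

```java
/** Hypothetical builder for the series of XPath queries used in the test. */
final class AclQueryBuilder {
    /** Matches every nt:file directly under the test path. */
    static final String BASE = "/jcr:root/JCRTestPath/element(*, nt:file)";

    private AclQueryBuilder() {
    }

    /**
     * Builds a query matching files whose sakaijcr:aclkey is any of
     * '0' .. 'keyCount - 1'. With keyCount of zero or less, no predicate
     * is added and all files match.
     */
    static String build(int keyCount) {
        if (keyCount <= 0) {
            return BASE;
        }
        StringBuilder query = new StringBuilder(BASE).append('[');
        for (int key = 0; key < keyCount; key++) {
            if (key > 0) {
                query.append(" or ");
            }
            query.append("@sakaijcr:aclkey = '").append(key).append('\'');
        }
        return query.append(']').toString();
    }
}
```

In JCR 1.0 a statement built this way would normally be passed to QueryManager.createQuery with the Query.XPATH language constant; that step is omitted here so that the sketch needs nothing beyond the JDK.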
