BigStore
I struggled to find a name for this generic feature, but BigStore appears to be sticking. Apparently The Big Store was a Marx Brothers film from 1941, but that has nothing to do with this, other than the way in which it was developed.
Note: This is also sometimes referred to as hashstore, virtualstore, virtual pathing, hashpath or sharded storage (all of these refer to the idea of using hashes to extend the path in order to reduce write conflicts).
Sharded might be a better term, but those familiar with the technique in databases start asking lots of questions about tablespaces and the like.
BigStore circumvents the limitations of the JCR content system that prevent a node from having millions of children. It does this by providing a mechanism for marking a parent node and then supporting a number of storage strategies under that node, while keeping the URL space clean. The most popular way of performing this translation at the moment is to take a SHA-1 hash of the child name, hex-encode it and split the result into groups of two characters, generating a tree with no more than 256 siblings at each level (two hex characters give 16 × 16 = 256 values).
For example, the URL path http://host/store/oneofmillions is stored as /store/ed/ef/4e/ff/oneofmillions.
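A minimal Java sketch of this translation, assuming the child name is hashed as UTF-8, the digest is written as lowercase hex and a fixed number of levels is used (four, to match the example path). The class and method names are hypothetical; this is not the PathUtils implementation used in Sakai K2.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper illustrating the hashed (BigStore) path layout.
 * Not the Sakai K2 API; the names and the number of levels are assumptions.
 */
final class HashedPathSketch {

  private HashedPathSketch() {
  }

  /** Lowercase hex encoding of the SHA-1 digest of name (always 40 characters). */
  static String sha1Hex(String name) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(name.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-1, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /**
   * Maps parentPath/name onto parentPath/h1/h2/.../name, where each hN is the next
   * two-character group of the hash, so no hash node has more than 256 children.
   * parentPath is assumed not to end with "/".
   */
  static String toHashedPath(String parentPath, String name, int levels) {
    String hex = sha1Hex(name);
    if (levels < 0 || levels * 2 > hex.length()) {
      throw new IllegalArgumentException("levels must be between 0 and 20");
    }
    StringBuilder path = new StringBuilder(parentPath);
    for (int i = 0; i < levels; i++) {
      path.append('/').append(hex, 2 * i, 2 * i + 2);
    }
    return path.append('/').append(name).toString();
  }

  public static void main(String[] args) {
    System.out.println(toHashedPath("/store", "oneofmillions", 4));
  }
}
```

Because the location depends only on the child name, the same name always maps to the same internal path, so resolving a URL needs no lookup table.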
Each level multiplies the number of storage locations by 256 while keeping the number of children per node small, which gives acceptable scalability. In Jackrabbit, child nodes are stored as properties of the parent node, so adding a child involves rewriting the child property on the parent. As the number of children rises, two things happen: the cost of rewriting increases, and contention on that property increases, resulting in non-mergeable concurrent writes as nodes are created. By hashing the path and limiting the number of children to at most 256 per node, the probability of a collision on update is greatly reduced.
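To put numbers on this, assume SHA-1 spreads names roughly uniformly over the hash buckets (a property expected of a cryptographic hash, not a guarantee for a particular data set). With k hash levels and N stored names:

$$
\begin{aligned}
\text{children of any hash node} &\le 16^2 = 256\\
\text{hash buckets at depth } k &= 256^k = 2^{8k}\\
k = 4:\quad 256^4 &= 2^{32} = 4\,294\,967\,296\\
\text{names per bucket} &\approx N / 256^k,\qquad N = 10^6,\ k = 2:\ 10^6 / 65\,536 \approx 15.3
\end{aligned}
$$

So two levels already keep a million names at around 15 per leaf node, and with four levels (as in the example path) almost every name sits alone in its bucket.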
In Sakai K2 we could have taken a simple approach and hard-coded the path translation, but this would have limited flexibility and moved the storage patterns away from content store configuration towards hard-coded paths. A slightly more complex route has been taken. With an extension to the JcrResourceResolver2 class inside the core of Sling, we have been able to change the way in which paths that don't exist are processed. In standard Sling, the oneofmillions path would be resolved as a NonExistingResource, to which no further processing can be bound. In the modified resolver, implementations of the PathResourceTypeProvider API can set the resource type of non-existent paths, and hence bind them to servlets for processing.
At that point the servlet can reprocess the path, resolve the real resource and re-dispatch. Unfortunately, since the original resource resolution happens too early in the lifecycle, we can't reprocess the path during that first resolution, as doing so leads to extensive recursion and very slow performance. Although this sounds simple, it is not easy to get right for all request types and all situations, so there are a number of abstract classes to make life easier.
In the first phase of resolution, a bundle wanting to manage BigStore space can extend the AbstractPathResourceTypeProvider class and specify which resource type to bind to, as in the PersonalResourceTypeProvider:
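The first phase can be pictured with a small model that has no Sling dependencies: existing nodes are a map from path to sling:resourceType, and a missing path is given the bound resource type when its nearest existing ancestor carries that type. The "nearest existing ancestor" rule, the class and the method names are assumptions for illustration; the real AbstractPathResourceTypeProvider may decide differently.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical model (not the Sling or Sakai K2 API) of assigning a resource type
 * to a non-existent path. Existing nodes map path -> sling:resourceType.
 */
final class VirtualResourceTypeSketch {

  private VirtualResourceTypeSketch() {
  }

  /** Parent of an absolute path, or null for the root "/". */
  static String parentOf(String path) {
    if (path.equals("/")) {
      return null;
    }
    int slash = path.lastIndexOf('/');
    return slash <= 0 ? "/" : path.substring(0, slash);
  }

  /**
   * Existing nodes keep their own type. A missing path gets boundType only if its
   * nearest existing ancestor has that type; otherwise it stays unresolved (empty).
   */
  static Optional<String> resolveType(Map<String, String> nodes, String boundType, String path) {
    if (nodes.containsKey(path)) {
      return Optional.of(nodes.get(path));
    }
    for (String p = parentOf(path); p != null; p = parentOf(p)) {
      if (nodes.containsKey(p)) {
        return boundType.equals(nodes.get(p)) ? Optional.of(boundType) : Optional.empty();
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // "demo/root" is a made-up type; "sakai/personalPrivate" is the type the PersonalServlet binds to.
    Map<String, String> nodes = Map.of("/", "demo/root", "/store", "sakai/personalPrivate");
    System.out.println(resolveType(nodes, "sakai/personalPrivate", "/store/oneofmillions"));
  }
}
```

Because existing nodes are returned with their own type before any ancestor check, a full internal path such as /store/ed/ef/4e/ff/oneofmillions is never rebound, which is what keeps the second phase from recursing.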
```java
/*
 * Licensed to the Sakai Foundation (SF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The SF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations under the License.
 */
package org.sakaiproject.kernel.personal.resource;

import static org.sakaiproject.kernel.api.personal.PersonalConstants.USER_PRIVATE_RESOURCE_TYPE;

import org.sakaiproject.kernel.resource.AbstractPathResourceTypeProvider;

/**
 * This class checks resource paths to see if there is a preferred resource type, where the
 * path is not a jcr path.
 *
 * @scr.component immediate="true" label="PersonalResourceTypeProvider"
 *                description="Personal Service path resource type provider"
 * @scr.property name="service.description" value="Handles requests for Personal resources"
 * @scr.property name="service.vendor" value="The Sakai Foundation"
 * @scr.service interface="org.apache.sling.jcr.resource.PathResourceTypeProvider"
 */
public class PersonalResourceTypeProvider extends AbstractPathResourceTypeProvider {

  /**
   * {@inheritDoc}
   *
   * @see org.sakaiproject.kernel.resource.AbstractPathResourceTypeProvider#getResourceType()
   */
  @Override
  protected String getResourceType() {
    // The resource type this provider binds non-existent paths to; the servlet
    // registered for the same type then handles those requests.
    return USER_PRIVATE_RESOURCE_TYPE;
  }
}
```
Once that is done, a servlet needs to be created to handle the path translation. If the intention is to use default Sling processing, there is an AbstractVirtualPathServlet that can be extended, as with the PersonalServlet:
```java
/*
 * Licensed to the Sakai Foundation (SF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The SF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations under the License.
 */
package org.sakaiproject.kernel.personal;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.resource.Resource;
import org.sakaiproject.kernel.resource.AbstractVirtualPathServlet;
import org.sakaiproject.kernel.util.PathUtils;

/**
 * @scr.component metatype="no" immediate="true"
 * @scr.service interface="javax.servlet.Servlet"
 * @scr.property name="sling.servlet.resourceTypes" value="sakai/personalPrivate"
 * @scr.property name="sling.servlet.methods" values.0="GET" values.1="POST"
 *               values.2="PUT" values.3="DELETE"
 */
public class PersonalServlet extends AbstractVirtualPathServlet {

  private static final long serialVersionUID = -2663916166760531044L;

  /**
   * {@inheritDoc}
   *
   * @see org.sakaiproject.kernel.resource.AbstractVirtualPathServlet#getTargetPath(org.apache.sling.api.resource.Resource,
   *      org.apache.sling.api.SlingHttpServletRequest, java.lang.String, java.lang.String)
   */
  @Override
  protected String getTargetPath(Resource baseResource, SlingHttpServletRequest request,
      String realPath, String virtualPath) {
    // Hash on the current user's id so each user's private space lands in its own
    // hashed subtree below realPath.
    String userId = request.getRemoteUser();
    return PathUtils.toInternalHashedPath(realPath, userId, virtualPath);
  }
}
```
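The servlet receives a realPath and a virtualPath. One plausible way to derive them, sketched here as an assumption (the actual AbstractVirtualPathServlet may differ, for example in how it treats selectors and extensions), is to take the path of the existing base resource as realPath and the remainder of the request path below it as virtualPath. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch (not the Sakai K2 API): split a request path into the path of the
 * existing base resource ("realPath") and the virtual remainder below it ("virtualPath").
 * Selectors and extensions are not handled here.
 */
final class VirtualPathSplitSketch {

  private VirtualPathSplitSketch() {
  }

  /** Returns {realPath, virtualPath}; throws if requestPath is not below basePath. */
  static String[] split(String basePath, String requestPath) {
    String prefix = basePath.endsWith("/") ? basePath : basePath + "/";
    if (!requestPath.startsWith(prefix)) {
      throw new IllegalArgumentException(requestPath + " is not below " + basePath);
    }
    return new String[] {basePath, requestPath.substring(prefix.length())};
  }

  public static void main(String[] args) {
    String[] parts = split("/store", "/store/oneofmillions");
    System.out.println("realPath=" + parts[0] + " virtualPath=" + parts[1]);
  }
}
```

With the generic scheme, getTargetPath would then hash the first virtual segment; the PersonalServlet instead passes the remote user id to PathUtils.toInternalHashedPath.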
Once that is done and the bundle is activated, any URL sub-path of a node with the corresponding sling:resourceType will support a virtual path. Because the system operates only on non-existing paths, the full JCR path can still be used to address any node in that space.