Marist launched its first MOOC, "Introduction to Enterprise Computing", in June 2013, and it was open until August 2013 on CLE 220.127.116.11. More than 1100 people registered for the course, around 300 students actually attempted it, and over 160 completed it. We are now working on new MOOCs in the areas of Fashion and Computer Science, which will run during spring 2014.
CLE: For the first MOOC in 2013 we used CLE 18.104.22.168 with the Lesson Builder tool (1.4) added. All course content was built with Lesson Builder, and the other tools we used were Announcements, Tests and Quizzes, Gradebook, and Forums. We did not notice any performance issues with any of the tools we deployed, and only one CLE instance was used for this MOOC. We tested the Roster tool before we went live and decided not to use it, since it takes too long to load every single user in the course.
We upgraded our CLE to 2.9.2 (22.214.171.124 ANI Release) in December 2013, and we have a few MOOCs related to Fashion and Computer Science that will run in 2014.
We used other applications to achieve full MOOC functionality. The image below briefly shows which applications are involved.
- Liferay Portal, which is the main portal for all Marist MOOCs and handles user and course registrations.
- LDAP and CAS, to provide SSO between the main portal, CLE, and any other system that may be introduced as part of a MOOC (Apereo OAE is planned for upcoming MOOCs in spring 2014).
- Mule, as a service bus to provide seamless integration between the different systems involved in the MOOC. Currently we use it as the channel between the main portal and CLE: it creates users in CLE when they register for the MOOC and joins them to a course when they register for that specific course.
Load Test:
We started load testing 2.9 to see how the application performs with a large number of users in a MOOC. We began with the CLE Load Test Framework, which gave us insight into the performance of CLE and the tools we used, but we are mainly interested in the performance of the integration between the multiple systems in our setup, since they are involved in user creation and course registration.
The first test was user creation, which went well. We achieved around 400 user registrations a minute, with Mule as the bottleneck. In our initial Mule flow we created multiple threads for each request coming through Mule, and those threads call the Sakai web services to create users using the addNewUser function in SakaiScript.jws. We did not see any data loss.
The second test was course registration. We ran into a problem with the Sakai web service "addMemberToSiteWithRole", which uses the AuthzGroup to update the membership of the course. There was a huge loss of course registrations, almost 70% in each run. A request to "addMemberToSiteWithRole" performs many updates on different tables, which is a very heavy operation and spins up a lot of sub-threads, although all we really need is the AuthzGroup update that adds the user to the course. The issue was essentially a race condition: when multiple web service requests are made to "addMemberToSiteWithRole" (for example, 10 requests per second), each request spins off a thread internally, and one of its operations updates the AuthzGroup object. That object is cached, so every thread works on the same cached AuthzGroup object. When one thread updates the object, its change is overwritten by another thread working on the same object, which caused the data loss of approximately 70%. We tried different ways to address this situation:
- We tried "addMemberToAuthzGroupWithRole" instead of "addMemberToSiteWithRole". It is sufficient for registering a student in a course and is actually one of the subtasks of "addMemberToSiteWithRole". This reduced the data loss to approximately 30%, since the operation is lightweight by comparison, but because it uses a similar AuthzGroup save operation we still lost some registrations.
- We tried a lot of approaches but found no other option for solving this within Sakai. Since DBAuthGroupService, which is responsible for saving the AuthzGroup object, spins up threads for each save request, it became clear that we could not solve this without changing the Sakai code, so we started to explore Mule instead.
I am attaching a sample JMeter test script that calls "addMemberToSiteWithRole". It spins up 20 threads in 1 second, and only 3 or 4 registrations are actually made successfully. You can run the same test with "addMemberToAuthzGroupWithRole", which does much better but still loses data.
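The lost-update pattern described above can be reproduced in isolation. The following is a minimal Java sketch, not Sakai code: `LostUpdateDemo`, `cachedMembers`, and `registerConcurrently` are hypothetical names, and the cached AuthzGroup is modeled as a plain set of user ids. A barrier forces every thread to read the cached copy before any thread writes, which is the worst case of the race; in a real run the interleaving varies, so the loss is partial rather than total.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

class LostUpdateDemo {

    // Hypothetical stand-in for the cached AuthzGroup membership (assumption, not Sakai's type).
    static final AtomicReference<Set<String>> cachedMembers =
            new AtomicReference<>(new HashSet<>());

    /** Fires `requests` concurrent registrations and returns how many survive. */
    static int registerConcurrently(int requests) {
        cachedMembers.set(new HashSet<>());
        // The barrier makes every thread finish its read before any thread writes back.
        CyclicBarrier allHaveRead = new CyclicBarrier(requests);
        Thread[] workers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            final String user = "user" + i;
            workers[i] = new Thread(() -> {
                try {
                    // Read-modify-write on a private copy of the cached object.
                    Set<String> copy = new HashSet<>(cachedMembers.get());
                    copy.add(user);
                    allHaveRead.await();
                    // Each save replaces the whole object, discarding the other threads' additions.
                    cachedMembers.set(copy);
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cachedMembers.get().size();
    }

    public static void main(String[] args) {
        System.out.println("members saved: " + registerConcurrently(20) + " of 20");
    }
}
```

With the barrier in place, each thread writes back a set containing only its own user, so exactly one registration survives regardless of which thread writes last.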
Mule Enterprise Service Bus made the difference:
Finally, with limited Mule knowledge, we started exploring how to use Mule to pipeline the incoming requests and throttle them, so that only one thread at a time makes requests to Sakai, with a delay between requests if needed.
- We looked into throttling in Mule and found no direct way to do it; the only reference I could find is ActiveMQ-based (http://activemq.apache.org/delay-and-schedule-message-delivery.html). All this does is hold inbound messages for a specified delay before sending them out through the outbound queue. It does not actually solve the problem: a static delay makes all simultaneous requests wait for the same period and then releases them simultaneously, which has the same effect.
- Finally, we used a Java component that calls the Sakai web services and defined it under singleton scope, as described here (http://www.mulesoft.org/documentation/display/current/Object+Scopes). For a moment I thought this small change would resolve the issue, but it did not: all it did was create a single Java object that is shared by the multiple threads generated by multiple inbound requests, which had the same effect. I then realized I had forgotten a very simple and important thing, which is to make the Java object thread safe by adding synchronized to the method that calls the Sakai web services. This finally solved the issue, and there is no data loss.
- The only remaining issue is the delay between when a course registration request is made and when it is successfully recorded in Sakai. In my tests, on average 1.2 requests per second are served for "addMemberToSiteWithRole", whereas with "addMemberToAuthzGroupWithRole" we get 2.5 successful requests per second. So we made the change to use "addMemberToAuthzGroupWithRole" in production.
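The singleton-plus-synchronized approach from the list above can be sketched in plain Java. This is an illustration under assumptions, not our actual Mule component: `SerializedEnrollment`, `Gateway`, `EnrollmentCall`, and `UnsafeRoster` are hypothetical names, the remote Sakai call is replaced by an in-memory backend that is deliberately not thread safe, and the optional `pauseMillis` stands in for the delay between requests mentioned earlier.

```java
import java.util.HashSet;
import java.util.Set;

class SerializedEnrollment {

    /** Hypothetical stand-in for the remote Sakai call; not a real Sakai or Mule API. */
    interface EnrollmentCall {
        void addMember(String userId, String siteId);
    }

    /** One shared instance handles every request, like a singleton-scoped component. */
    static final class Gateway {
        private final EnrollmentCall call;
        private final long pauseMillis;

        Gateway(EnrollmentCall call, long pauseMillis) {
            this.call = call;
            this.pauseMillis = pauseMillis;
        }

        // synchronized: only one thread at a time can be inside this method.
        synchronized void enroll(String userId, String siteId) {
            call.addMember(userId, siteId);
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis); // optional gap before the next request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    /** Deliberately unsafe backend: read, copy, write back, with no locking of its own. */
    static final class UnsafeRoster implements EnrollmentCall {
        private Set<String> members = new HashSet<>();

        public void addMember(String userId, String siteId) {
            Set<String> copy = new HashSet<>(members);
            copy.add(userId);
            Thread.yield(); // widens the window in which unserialized callers would lose updates
            members = copy;
        }

        int size() {
            return members.size();
        }
    }

    /** Sends `requests` concurrent enrollments through one gateway and returns how many were saved. */
    static int enrollConcurrently(int requests) {
        UnsafeRoster roster = new UnsafeRoster();
        Gateway gateway = new Gateway(roster, 0);
        Thread[] workers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            final String user = "user" + i;
            workers[i] = new Thread(() -> gateway.enroll(user, "demo-site"));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return roster.size();
    }

    public static void main(String[] args) {
        System.out.println("members saved: " + enrollConcurrently(20) + " of 20");
    }
}
```

Because every call passes through the same lock, the unsafe read-copy-write in the backend never overlaps, and all 20 enrollments are kept. The cost is the one described above: requests are served strictly one at a time, so throughput is bounded by the latency of a single call.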
Below is the Mule application flow we are currently using.