
Recently, Unicon, Inc. and John Wiley and Sons, Inc. undertook a serious round of performance testing of a revision of the 2.5.x branch of Sakai. We have further rounds of performance testing planned; the purpose of this first round was to obtain baseline performance data before we make changes to the application. Below you will find a synopsis of the testing, along with the complete documents for the test plan and the test results report. If you have comments, questions, suggestions, or criticisms, please share them with us via the production mailing list.

Test Plan

We had two objectives for this first round of testing. First, we wanted to validate that the core Sakai framework and a particular set of tools met minimum performance criteria. Second, we wanted to establish a baseline set of performance data. The particular set of tools included Announcements, Schedule, Resources, Gradebook, Forums, and Site Info. Our average and worst response-time goals were 2.5 seconds and 30 seconds, respectively, and we measured response times up to the last byte transferred back to the client. We also required that neither the web server nor the database server exceed 75% utilization under a 500 concurrent-user load, with an error rate of less than 2% (responses other than HTTP 200). Our environment consisted of an 8-CPU Solaris server for the database, a 4-CPU Linux server for the application, several desktop machines to run Silk Performer, and a dedicated LAN with one switch to keep the network as small a factor in the testing as possible.
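
The pass/fail criteria above can be expressed as a simple check. The following is an illustrative sketch only (the function name and sample numbers are hypothetical, not taken from our test harness):

```python
# Hypothetical check of the stated goals: average response time <= 2.5 s,
# worst case <= 30 s, and error rate (non-HTTP-200 responses) below 2%.

def meets_goals(response_times, status_codes,
                avg_goal=2.5, worst_goal=30.0, max_error_rate=0.02):
    """Return True if a test run satisfies all three criteria."""
    avg = sum(response_times) / len(response_times)
    worst = max(response_times)
    errors = sum(1 for code in status_codes if code != 200)
    error_rate = errors / len(status_codes)
    return avg <= avg_goal and worst <= worst_goal and error_rate < max_error_rate
```

For example, a run with response times of 1.2, 2.0, and 3.1 seconds and all HTTP 200 responses passes, while a run with a 2% error rate fails (the goal is strictly below 2%).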

The test plan document details the click paths our test students and test instructors followed during the performance tests. Here is a quick overview of the click paths.

Student Click Path

  • read announcements
  • view schedule
  • add event to personal site
  • do readings
  • upload resource to personal site
  • preview questions
  • do assessments
  • utilize feedback
  • review results
  • communicate in forums
  • communicate in email
  • join worksite
  • populate student profile
  • search for information
  • change e-mail preference
  • browse help

Instructor Click Path

  • post announcements
  • upload roster
  • upload resources
  • create assignments
  • preview questions
  • review grades
  • export grades
  • import grades
  • add schedule event
  • change role
  • add tools to worksite
  • edit site information
  • search for information

Sakai Distro

The distro we used is based on the 2-5-x "Cafe" distro, which is, in theory, the minimum source code required to successfully build and start up Sakai.

The actual externals defining the source code distribution under test are attached to this page. Implementation notes:

  • We added the course management default implementation in order to access database-persisted AcademicSessions.
  • We added the calendar module to meet Wiley's usage pattern requirements. This introduced a dependency on the assignments module.
  • We added the gradebook module to meet Wiley's usage pattern requirements. This introduced a dependency on the sections module.
  • We added the search module to meet Wiley's usage pattern requirements. This introduced a dependency on the mailarchive module.
  • We added the msgcntr module to meet Wiley's usage pattern requirements. This introduced a dependency on the profile module which introduced a dependency on the privacy module.
  • We switched to the "vanilla" (rather than Cafe) version of the user module to meet Wiley's usage pattern requirements for the preferences tool. This introduced a dependency on the syllabus module.
  • We added a top-level "shared-deployer" module to deploy JSTL for a number of tools. (See SAK-12798).
  • We added the web module to fix the default Home page layout.

Data Management

We modified Alan Berg's provisioning scripts (see SAK-13352) to populate a large amount of data into the database (minor modifications compared to the work he has already put into the scripts). Specifically, we added functions to populate announcements, gradebook assignments, forum topics, and forum posts. We also added the ability to choose a per-site or per-instance distribution of created users, and we added functionality to create/specify a home page for worksites. Finally, we modified both the user/worksite creation and the resource creation to produce very predictable names like worksite1, worksite2, worksite1-student1, worksite1-student2, and resource1.pdf. This made authoring the Silk Performer scripts much easier. Also, the default SOAP timeouts were too short; we had to increase them to 10 hours because of the large quantity of data we were loading. Here are the config files we used for provisioning: vlw.txt, lw.txt, mw.txt, sw.txt. Once we had the data loaded, we used Oracle Data Pump to take a backup, and we would periodically restore the database from that backup once our testing had sufficiently polluted the database.
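
The predictable naming scheme can be sketched as follows. This is a simplified illustration of the pattern, not the actual provisioning scripts (which do their work over SOAP against a running Sakai instance):

```python
# Generate predictable worksite, user, and resource names in the style the
# modified provisioning scripts produced (worksite1, worksite1-student1,
# resource1.pdf, ...), which made the load-test scripts easy to author.

def provision_names(num_sites, students_per_site, resources_per_site):
    """Yield (worksite, student_names, resource_names) for each worksite."""
    for s in range(1, num_sites + 1):
        site = f"worksite{s}"
        students = [f"{site}-student{n}"
                    for n in range(1, students_per_site + 1)]
        resources = [f"resource{n}.pdf"
                     for n in range(1, resources_per_site + 1)]
        yield site, students, resources
```

Because every name is derivable from its indices, a load-test script can compute the login and resource URL for virtual user N without any lookup table.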

Bugs Identified

The bugs that were identified are detailed in the Performance Testing Results, but we would like to gather feedback to determine whether anyone else in the community has encountered these issues and is working on a solution. These issues included:

  1. Forums
    • SAK-13188 Messages & Forums Synoptic Tool / More performance improvements
    • SAK-11463 Slow queries from Forums
  2. Memory Leak upon User Login
  3. CPU Consuming JVM

User Scripts

We used Silk Performer for our testing, and we can make the scripts available upon request.

Test Results

With the exception of one needed index (on the SURROGATEKEY column of the MFR_MESSAGE_T table), we found that the database performed remarkably well, and we had virtually nothing to tune on that end. However, we did find that the connection pool was recycling connections very rapidly, and that caused us some real headaches. The problem proved to be that while we had the minIdle parameter set, we did not have maxActive specified. As soon as Sakai opened a connection, the pool would close it, and we saw a fall-off-the-cliff effect on performance as soon as active connections exceeded the minIdle setting.
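
For reference, the fix was simply to give the pool an explicit ceiling alongside the floor. If your build exposes the DBCP pool parameters in sakai.properties (the `param@javax.sql.BaseDataSource` convention), the settings look something like the fragment below. The values shown are illustrative examples, not our exact test configuration:

```
# Example only -- values are illustrative.
# minIdle without maxActive led to rapid connection recycling for us.
minIdle@javax.sql.BaseDataSource=10
maxIdle@javax.sql.BaseDataSource=100
maxActive@javax.sql.BaseDataSource=100
```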

We encountered performance issues with the forums tool, and we were unable to meet our stated performance goals there (average page-load time of 2.5 seconds). Rather than investigate and attempt to improve forum performance, we opted to delete the forum data and address the issues at a later date.

We discovered a memory-usage pattern associated with login that limits scalability and steady-state operation. Essentially, when a user logs into a particular JVM running Sakai for the first time, many objects are created and, we assume, cached. We haven't identified the exact location of the problem, but we think those cached objects never age out of the cache. As new users log in, the cache grows, and eventually the JVM runs out of memory. We had our JVM set for 1 GB of RAM, and we could achieve approximately 30,000 new logins before the server spent more time garbage collecting than it did processing. Adding memory to the JVM mitigates the issue, so we will probably choose to run larger JVMs and implement operational procedures (periodic server restarts) rather than attempt to modify the caching behavior.
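
The grow-without-bound pattern, and the size-bounded alternative we chose not to pursue, can be illustrated with a small LRU cache. This is a generic sketch of the technique, not Sakai's actual caching code:

```python
from collections import OrderedDict

# A size-bounded LRU cache: once the bound is reached, the least recently
# used entry ages out. Leaving maxsize=None (unbounded) reproduces the
# pattern we observed, where per-login cached objects accumulate until
# the JVM runs out of memory.

class LRUCache:
    def __init__(self, maxsize=None):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)   # refresh recency on overwrite
        self._data[key] = value
        if self.maxsize is not None and len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

    def get(self, key):
        value = self._data[key]
        self._data.move_to_end(key)        # mark as recently used
        return value

    def __len__(self):
        return len(self._data)
```

With a bound, memory use plateaus at `maxsize` entries regardless of how many distinct users log in; without one, every first-time login adds entries that are never reclaimed.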

On several occasions, we experienced stuck threads or threads that would run continuously and consume 100% of a CPU. We found a number of Jira tickets describing similar behavior (SAK-8932, SAK-13386, SAK-11098). These stuck threads skewed the metrics we gathered using sar, but we didn't really notice a significant effect on response times.

If you look at the response times graph in our performance test results, you'll notice that response times remained pretty even as we increased the number of users. In fact, the response times are almost the same with 500 users as with 1 user.



  1. Great post, Alex, and exciting work.

    It looks like you have test scripts in Silk Performer and are willing to make them available to the community. Would it be hard to translate them to an open-source tool?

    1. Hi Michael,

      I thought that such a conversion might be difficult (though not impossible), but I wasn't certain of that. I posed this question to Testware, the folks who wrote the Silk Performer scripts, and this is their response.

      "While SilkPerformer can call C/C++ programs, the script
      itself is written in a proprietary language. Essentially the scripts would
      have to be completely redone from scratch using an open-source tool. There
      are a lot of built-in functions/features within SilkPerformer that are not
      available in any of the open-source tools. Based on our experience with some
      of the open source tools, there will also be some significant challenges in
      creating a script to perform all of the functions that we created with the
      SilkPerformer. For example, parsing the page ids will probably have to be
      coded manually for every page and every user, the cookie handling often has
      to be hand coded, the uploading of documents is often not supported and with
      many of the Sakai pages (the ones with tabs), the browser stores a web page
      context so that when you come back to that page, it remembers which tab you
      were on - not sure how this context would be stored/tracked/created with an
      open-source tool."

      That doesn't help you much, though.  Sorry about that.  


      1. I think most of the talk about "all the functions we have created..." is mostly FUD, but they definitely don't make it easy to export the scripts in any way that is usable in other tools (open source or proprietary). Anyway, if anyone is interested in the scripts, we can certainly share them, but they most likely need to be translated by hand to a new tool like Grinder.

  2. 2G of JVM memory is relatively low for large-scale production use on 64 bit app servers. You may want to look at the JIRA PROD category to see some real-world examples.

    1. The amount of memory was not a limiting factor in these tests. There was a memory leak, and increasing the amount of memory would only have delayed the point at which it was noticed. In the one memory increase we did make, we didn't even see the additional run time we should have gained before the memory leak caused us a problem.

      The runaway threads also have nothing to do with memory. From verbose garbage collection reports, we could see that at the end of some tests there was plenty of memory remaining, and yet the CPU was pegged with threads doing something that could not be determined. Increasing the memory would not have done anything to prevent this problem.