
This page details the requirements for the "Archive/Delete" aspect of what some have called Archive/Delete/Import/Export. While Archive/Delete on the one hand and Import/Export on the other are technically related, their requirements and use cases are fairly distinct. This page covers Archive/Delete only.

Specific, short use cases are documented on a separate page: Archive, Delete and Restore Use Cases

Site Archive/Delete/Restore

Organizations that have run Sakai for years have data piling up in their databases and file systems. They need a way to remove this information from the current system while preserving it for future use. This should be performed on a site-by-site basis, with an option for archiving all sites that meet a certain set of criteria (e.g. all course sites more than 4 years old). A common requirement is that sites should remain available to students throughout their "normal" time at the university, which means something like 5 or 6 years.

There are different meanings to "preserving this information for future use." Two of the variables here are:

  1. Restore to which version? The ability to restore the site to the same version of Sakai from which it was archived, vs. to later versions of Sakai.
  2. The ability to restore "read only" access vs. the ability to restore it in a fully operational mode. The former is useful for record-keeping purposes, the latter for reuse.

Another distinction is "soft" delete vs. "hard" delete. It is desirable to be able to "softly" delete a site (essentially hide it) so that it can't be accessed but can be restored quickly and easily in the "fully operational" sense. A "hard" delete would remove the site data from the database, requiring a restore operation to regain access to the information.
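The difference between the two kinds of delete can be sketched as a small state model. This is only an illustration under stated assumptions: `SiteState`, `SiteLifecycleSketch` and all method names are hypothetical and are not part of any Sakai API, and in-memory maps stand in for the database and the archive store.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lifecycle states for a site; not a Sakai type. */
enum SiteState { ACTIVE, SOFT_DELETED, HARD_DELETED }

/** Illustrative in-memory model of soft vs. hard delete. */
class SiteLifecycleSketch {
    private final Map<String, SiteState> states = new HashMap<>();
    private final Map<String, String> siteData = new HashMap<>(); // stands in for database rows
    private final Map<String, String> archives = new HashMap<>(); // stands in for archive files

    void createSite(String siteId, String data) {
        states.put(siteId, SiteState.ACTIVE);
        siteData.put(siteId, data);
    }

    /** Soft delete: hide the site but leave its data in place. */
    void softDelete(String siteId) {
        requireState(siteId, SiteState.ACTIVE);
        states.put(siteId, SiteState.SOFT_DELETED);
    }

    /** Undo a soft delete: only a flag changes, so this is quick. */
    void undelete(String siteId) {
        requireState(siteId, SiteState.SOFT_DELETED);
        states.put(siteId, SiteState.ACTIVE);
    }

    /** Hard delete: write an archive first, then remove the live data. */
    void hardDelete(String siteId) {
        SiteState current = states.get(siteId);
        if (current == null || current == SiteState.HARD_DELETED) {
            throw new IllegalStateException(siteId + " cannot be hard deleted");
        }
        archives.put(siteId, siteData.remove(siteId));
        states.put(siteId, SiteState.HARD_DELETED);
    }

    /** Restore after a hard delete: the data must be re-imported from the archive. */
    void restoreFromArchive(String siteId) {
        requireState(siteId, SiteState.HARD_DELETED);
        siteData.put(siteId, archives.get(siteId));
        states.put(siteId, SiteState.ACTIVE);
    }

    boolean isAccessible(String siteId) {
        return states.get(siteId) == SiteState.ACTIVE;
    }

    boolean hasLiveData(String siteId) {
        return siteData.containsKey(siteId);
    }

    private void requireState(String siteId, SiteState expected) {
        if (states.get(siteId) != expected) {
            throw new IllegalStateException(siteId + " is not " + expected);
        }
    }
}
```

In this sketch a soft-deleted site keeps its live data and comes back with a single state change, whereas a hard-deleted site has no live data until `restoreFromArchive` is called.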

Although most tools should support this, the most important tools are:

  • Content Hosting (and anything that uses it, especially Resources).
  • Forums
  • Assignments
  • Announcements
  • Gradebook
  • Email archive
  • Chat
  • Tests & Quizzes (a.k.a. Samigo)
  • Test Center (a.k.a. Mneme)
  • Wiki

Handling content that appears in multiple sites is something to watch out for. Portfolios are an obvious example of this.
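One possible way to protect shared content during a delete is to track which sites reference each content item and remove an item only when its last referencing site goes away. The sketch below assumes such a reference index exists; `CrossSiteContentSketch` and its methods are hypothetical names used for illustration, not an existing Sakai service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative reference index: content id -> ids of the sites that use it. */
class CrossSiteContentSketch {
    private final Map<String, Set<String>> referencingSites = new HashMap<>();

    void addReference(String contentId, String siteId) {
        referencingSites.computeIfAbsent(contentId, k -> new HashSet<>()).add(siteId);
    }

    /**
     * Drops all references held by the given site and returns, in sorted order,
     * the content ids that no remaining site references. Only those are removed.
     */
    List<String> deleteSite(String siteId) {
        List<String> orphaned = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it = referencingSites.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            boolean wasReferenced = entry.getValue().remove(siteId);
            if (wasReferenced && entry.getValue().isEmpty()) {
                orphaned.add(entry.getKey());
                it.remove(); // the last reference is gone, so the content can go too
            }
        }
        Collections.sort(orphaned);
        return orphaned;
    }

    boolean isStored(String contentId) {
        return referencingSites.containsKey(contentId);
    }
}
```

With this approach, deleting a site that shares a portfolio item with another site leaves that item in place until the second site is deleted as well.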

Requirements

The following table is intended to be the general set of requirements applying to the overall process. Each tool will have its own specialized requirements.

Priority 1: Must have. Blocker for releasing the feature
Priority 2: Highly desirable. May be worth delaying the release
Priority 3: Nice to have

| Requirement | Description | Priority |
| --- | --- | --- |
| Static archive & view | Archive the site/tool information to a format that can be viewed at a later date. Provide a human-readable view of that information for historical/investigative purposes. | 1 |
| Archive viewer | It should be possible to view an archived Sakai site without using Sakai. | 3 |
| Same version live archive & import | Archive the site/tool data to a format that can be imported by the same version of Sakai and used as if the site had not been deleted. | 2 |
| Import past archive | Take an archive from the previous version of Sakai and import it into the current version. (This is really a potential requirement for the next release.) | 3 (N/A) |
| Hard Delete | Remove as much data as possible from the database and remove accessibility of the site. By default, a hard delete should create an archive. | 1 |
| Soft Delete | Support a "soft delete" that simply makes a site inaccessible to all but defined administrative roles. The site can easily be restored by the person who deleted it (or others with appropriate permissions). | 2 |
| Multiple Site Delete | Soft or hard delete of sites selected based on criteria such as: age of site; amount of time in "soft delete" state (for hard delete only); size of site data (possible to calculate?); number of participants (as a proxy for size); type of site; position in site hierarchy. | 3 |
| Authorization | The permission to archive, delete and restore sites should be configurable. The "hard delete" function is generally going to be performed by a senior administrative resource. At certain organizations, "soft delete" may be available to site owners. | 1 |
| Operation Scope | The process of archiving/deleting/restoring a site should be configurable, allowing a user to specify which data should be operated on. For example, archive everything and delete everything but the announcements. | 3 |
| Archive format | The archive file(s) should be independent of other Sakai sites. The archive should contain the content needed to restore it. | 2 |
| Performance | It should be possible to perform the archive/delete process without system downtime. The amount of computing resources allocated to the process should be configurable by an administrator. | 1 |
| Cross-site content | The delete process should preserve content that appears in multiple sites. | 1 |
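The selection criteria listed under Multiple Site Delete could be expressed as composable predicates over a site summary. The following is a minimal sketch, assuming a hypothetical `SiteSummary` holder and `SiteSelectionSketch` helper; neither exists in Sakai, and real criteria such as site size or hierarchy position would need data that this sketch does not model.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical summary of the site attributes that criteria can test. */
class SiteSummary {
    final String id;
    final String type;
    final LocalDate created;
    final int participants;

    SiteSummary(String id, String type, LocalDate created, int participants) {
        this.id = id;
        this.type = type;
        this.created = created;
        this.participants = participants;
    }
}

/** Illustrative criteria builders for a bulk soft or hard delete. */
class SiteSelectionSketch {
    /** Matches sites created more than the given number of years before today. */
    static Predicate<SiteSummary> olderThanYears(int years, LocalDate today) {
        return site -> site.created.isBefore(today.minusYears(years));
    }

    static Predicate<SiteSummary> ofType(String type) {
        return site -> site.type.equals(type);
    }

    /** Participant count used as a rough proxy for site size. */
    static Predicate<SiteSummary> atLeastParticipants(int minimum) {
        return site -> site.participants >= minimum;
    }

    /** Returns the ids of all sites that satisfy the combined criteria, in input order. */
    static List<String> select(List<SiteSummary> sites, Predicate<SiteSummary> criteria) {
        List<String> ids = new ArrayList<>();
        for (SiteSummary site : sites) {
            if (criteria.test(site)) {
                ids.add(site.id);
            }
        }
        return ids;
    }
}
```

For example, "all course sites more than 4 years old" would be `olderThanYears(4, today).and(ofType("course"))`, and the resulting ids could then be handed to whatever performs the actual delete.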

Work Estimates

Note: There is a general concern that not all data is "site" data when talking about site archive/deletion.

Section Info: stores its data as part of the Site service, so it is covered by the work on that service
Roster - ?

| Requirement | Samigo | Announcements | Assignments | Resources |
| --- | --- | --- | --- | --- |
| Static archive & view | 1 week or more | 2 days | 1 week | 1 week |
| Archive viewer | ? | ? | ? | ? |
| Same version live archive & import | 1 week | 1 week | 1 week | 1 week |
| Import past archive | ? | ? | ? | ? |
| Hard Delete | 1 week | 1 week | 2 weeks | 2 weeks |
| Soft Delete | | 2-4 weeks | 2-4 weeks | 2-4 weeks |
| Multiple Site Delete | ? | ? | ? | ? |
| Authorization | | ? | ? | ? |
| Operation Scope | | ? | ? | ? |
| Archive format | part of static archive & view above | ? | ? | ? |
| Performance | ? | ? | ? | ? |
| Cross-site content | note that Assessment Types and Question Pools in Samigo belong to the person, not the site, so they will not be archived/deleted | 2-4 days | 2-4 days | 2-4 days |

Have not heard from: Chat, Email Archive, Forums, Gradebook, Test Center (Mneme), Wiki

Alternatives

It is possible to conceive of providing much of this capability by simply implementing hard delete and having institutions run a second copy of Sakai that provides access to past sites.


6 Comments

  1. On the technical side, we struggled to come up with a decent API for archiving entity data (since a lot of developers would have to implement this, it is important to get it right). Our requirements: easy to implement, flexible enough to support different concepts like versioning, and able to handle any kind of data (binary, etc.).
    This was the one we came up with for EB, so I would like to see comments on it: https://source.sakaiproject.org/svn/entitybroker/trunk/api/src/java/org/sakaiproject/entitybroker/entityprovider/capabilities/Exportable.java https://source.sakaiproject.org/svn/entitybroker/trunk/api/src/java/org/sakaiproject/entitybroker/entityprovider/capabilities/Importable.java

  2. I have a slightly different comment, which goes back to when we first discovered how Sakai deals with the end of academic year cycle. The approach (assuming I have understood it right) is not to cycle out students and bring in new ones, but to copy resources to a new site and leave the old one there.

    I feel it may be valuable to examine the pros and cons of this practice. Perhaps if we look at things from the point of view that we are 'exporting' the students (and the data they need to have preserved) then refreshing the site and bringing in new students, we may find some better/easier designs.

    A lot depends on the extent to which the students need to be members of a site to do what they need to do after the course has finished. If, for example, content were open, or open to 'alumni', they could continue to access content. If we want a record of the students' contributions in class, for grading challenges, then this content might be transferred to a grading archive organised by student rather than class.

    As I write this, I realise that so much of our current thinking is built around the notion that the way we control access is through site membership. Since I want to open up access more easily, perhaps this should be reviewed as part of long term planning if not short-term.

    1. I would like to underscore John's comment with an example from evaluations:
      We currently associate users with the templates they have authored, and the same is true of the data in the templates. It has nothing to do with a site at all.
      The same could be said of the data in mneme which is associated the same way. I think the concept of associating the data with a user and optionally another thing (like a site or group or whatever) should be core to our thinking moving forward.
      The Exportable interface mentioned in my previous comment was created with this in mind and supports the option of exporting data associated with something (a user, a site, whatever) rather than requiring the data to associate with a site. This would allow, for example, the ability to export data for all entities created by a user or perhaps export a 'portfolio' and get all data associated with it.

  3. OTOH there are also problems with associating data with users (people move on, change their affiliations). Evaluation and assessment templates typically need to be associated with neither a site nor a user, but with a department or organizational grouping. Probably the same applies to certain classes of content (exam paper archives, teaching materials).

    But in delivery, assessment and evaluations are strongly associated with course and therefore site membership.

  4. There's a huge set of possible needs / desires and corresponding implementation problems which could require a lot of time to sort out. Would the following suggestion meet a lot of what people need to have in the very near future?

    • ability to mark a site as "read only". The site and associated contents could then be viewed but not changed.
    • ability to export a copy of a read only site and to import a read only copy of the site into the same Sakai instance.

    The point of marking things read only is to avoid the issues about whether or not changes to the copy would work the same as changes to the original. E.g. could you import an exported site and change someone's grade? Could you add to a discussion?

    This should satisfy the need to delete expensive resources without losing information. That seems to be one of the basic current needs for an archive ability.

  5. A third variable that likely needs to be considered for "preserving this information for future use" is what to do with information for which Sakai sites depend on an external service. Should a snapshot of such data, taken at the time of the archive's creation, be stored with the site? Is that the data on which the site depends when it is restored, or should it try to re-associate with the external source? Or should the archive assume the external data source will be there when it is restored? What should it do if the external data source is no longer available?