Issue 9, 2010-03-22

Challenges in Sustainable Open Source: A Case Study

The Archivists’ Toolkit is a successful open source software package for archivists, originally developed with grant funding. The author, who formerly worked on the project at a participating institution, examines some of the challenges in making an open source project self-sustaining past grant funding. A consulting group hired by the project recommended that — like many successful open source projects — they rely on a collaborative volunteer community of users and developers. However, the project has had limited success fostering such a community. The author offers specific recommendations for the project going forward to gain market share and develop a collaborative user and development community, with more open governance.

by Sibyl Schaefer

Introduction

From July 2007 to September 2009 I served as an Archives Analyst for the Archivists’ Toolkit (AT).[1] The project was co-sponsored by the University of California, San Diego, and New York University, where I was based. The Archivists’ Toolkit has been an extremely successful and groundbreaking open source software program for archives. The program has revolutionized archival description and data management by enabling easy authority control processes, linking accessions with descriptions, and allowing multiple standard data outputs from the same data set. The AT project was funded by the Andrew W. Mellon Foundation but was not self-sustaining at the end of the grant phase (September 2009), six years after the inception of the project. Instead of continuing to fund the product, the Mellon Foundation has suggested it merge with Archon, another open source archival data management program. While this move will unite two user communities in support of one product, it remains unclear whether the product is tenable outside of grant funding. During my experience with the AT project it became evident that: 1) the product needed more users to become sustainable and 2) governance of the project needed to be more open, delegating tasks to users whenever possible in order to minimize overhead costs and essentially becoming a true collaborative and community-based open source venture. The merger of the AT with Archon provides an excellent opportunity for the new product team to increase market share and establish a system to harness user contributions. By doing so, the end product will be in a much stronger and more sustainable position.

Background on the Archivists’ Toolkit

The Archivists’ Toolkit claims to be the “first open source archival data management system”[2] although Archon was released around the same time.[3] Both projects originated out of the need for a tool that would support the management and automation of archival data in a manner reflecting archival practice, and output that data in professional standards, such as Encoded Archival Description (EAD). This need was first articulated in 2002, when the Digital Library Federation and the California Digital Library co-sponsored a couple of meetings to discuss what was then called the Archivists’ Workbench. Two years later, in June 2004, New York University and the University of California, San Diego, working in conjunction with the Five Colleges, were awarded a grant from the Andrew W. Mellon Foundation to develop this tool. The project commenced, and by December 2006 the first version of the software was released under the “Archivists’ Toolkit” moniker. The project team at that point was divided mainly by task: project management was handled at UCSD and development at NYU, while analysts at UCSD, NYU, and the Five Colleges wrote specifications.

The Archivists’ Toolkit was groundbreaking for its successful integration of previously separate data stores. In the pre-AT world, an archival repository may have kept accessions information in an Access database, an Excel spreadsheet, or even a hand-written log. Finding aids, the predominant access tool for archival materials, were kept in paper inventories, Word documents, HTML files, or discrete EAD files (if the archives had been able to adopt the technology). Authority control may have been accomplished through collection-level MARC records uploaded into the ILS of the repository’s parent organization, or may not have existed at all. The AT changed these idiosyncratic practices. Accessions information, descriptive data, donor information, location information, and authority records are all kept within one searchable space, and reports and standardized outputs can be easily generated. The product was similar to an Integrated Library System (ILS) but specifically designed for archival materials. Although ILSs have existed since the rise of computing, the AT was the first software system to tackle the unique needs of archival management and description.

The release of AT version 1.0 prompted a lot of investigatory testing, but few institutions adopted it on a full-scale production basis. This was to be expected: the program still had some kinks, and most repositories remained wary of a tool that was not fully tested. In 2007, AT version 1.1, a much more mature and stable release, was offered. It allowed for more flexibility in importing legacy data: multiple EAD files could be imported at once, an XML import schema was developed for accessions data, and user-defined fields were allowed within the accessions record. This release prompted many institutions that had previously been working on development instances to move fully into production, although full-scale adoption of the program was still low.

After the release of 1.1, the AT project team hired Ithaka, a non-profit consulting group, to develop a sustainable business model. Ithaka scoped out the archival landscape, divided it into market segments, and conducted interviews with AT users within those segments. From early on the outlook was glum. First, the AT needed to increase its user base; the small size of the archival market meant that the AT needed to become a dominant force within it. Second, the AT needed either to find a parent organization willing to incubate the project, sharing resources until it became financially stable, or to develop its own means of generating income. In the latter case, serious budget cuts would need to be made. To make up for these cuts in operating expenses, Ithaka suggested the project do what most open source projects do: rely on its users.

Working towards sustainability: increasing the user base

The AT offered archivists new ways to capture, control, and manipulate archival data, and thus can be considered a discontinuous innovation; users are required to modify their behavior in order to take advantage of its benefits.[4] Discontinuous innovations are in danger of falling into the “chasm,” the space that lies between the Early Adopters and the Early Majority in Geoffrey Moore’s revised Technology Adoption Life Cycle model.[5] Moore warns that Early Adopters are distinctly different from the Early Majority. Unlike Early Adopters, who are eager to implement the latest technology, the Early Majority tend to be pragmatists when making their decisions. They won’t accept a buggy, unpredictable product, and they want the benefits from the product to be demonstrable, especially in increasing productivity. As pragmatists, the Early Majority are also looking for the Whole Product — not just the software, but the support, training, and other services and technologies that surround the program and integrate it fully into their existing framework.[6] To cross the chasm effectively, the AT product needed to start gaining support from the Early Majority, particularly by appealing to their pragmatism.

Taking heed of Ithaka’s advice to focus on increasing the user base, AT senior administrators changed my analyst position into a user services support position. I organized regional user group meetings, held mainly at professional conferences. The meetings generally turned out to be a mix of formal communication from the team and informal communication among users. These meetings were key in informing potential users about the software and its capabilities. The presence of the software at conferences showed we were a viable player and provided a venue for current users to share their successful implementations with potential users.

In addition, the AT team focused on providing the Whole Product. Extensive documentation was developed and maintained. User support, always a strong point for the team, became a priority. We also began working in conjunction with the Society of American Archivists (SAA) to provide training on how to use the program in adherence to national and international standards for archival description. This last move was especially ingenious — working with SAA gave the program an informal endorsement from the nation’s leading professional archival organization.[7] Combining the training with a focus on standards was also key; it sent the message that an additional benefit of adopting the program was making standards compliance easier to implement.

The project was fairly successful in using marketing methods that appealed to the pragmatist nature of the Early Majority, but the team neglected to review aspects of the application that could be modified in order to broaden its appeal. One major oversight was a lack of focus on increasing productivity in every applicable area of the application. Although the AT saves a considerable amount of time by providing automatic EAD markup, the main data entry mechanism in the program is unwieldy, especially when it comes to creating the hierarchy that is inherent in most archival arrangement and description. Archivists accustomed to entering information in Excel or Access are often frustrated by the additional keystrokes, mouse clicks, and navigation required by the AT. Other editing functions typically found in desktop applications are missing in the AT, notably a find and replace feature and an undo mechanism. Because the AT is database-driven, these features are more difficult to implement than in other types of programs, such as word processing software. Yet this difference is not obvious to the typical AT user, who generally doesn’t distinguish between one desktop application and another.[8] Since the Early Majority seeks gains in productivity, the lack of these basic and very noticeable features makes it easy for them to overlook the other time-saving mechanisms provided by the AT.
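
The difficulty is worth spelling out: in a word processor, find and replace edits a single in-memory document, but in a database-backed application the text is scattered across typed fields in many records, so a replace must iterate over every mapped column and row, ideally inside a transaction. The following is a minimal sketch of that idea in Python with SQLite; the table and column names are hypothetical stand-ins, not the AT’s actual (Hibernate-mapped, MySQL/MS SQL/Oracle) schema.

```python
import sqlite3

# Hypothetical schema standing in for descriptive records; the real AT
# spreads data across many more tables and fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE component (id INTEGER PRIMARY KEY, title TEXT, scope_note TEXT)"
)
conn.executemany(
    "INSERT INTO component (title, scope_note) VALUES (?, ?)",
    [("Smith papers", "Letters from J. Smith"),
     ("Smith photographs", "Photos of the Smith family")],
)

def find_and_replace(conn, old, new, columns=("title", "scope_note")):
    """Replace `old` with `new` in every text column of every record.

    Unlike a single-document edit, a database-wide replace must touch
    each mapped field of each row, so it is wrapped in a transaction
    to keep the operation atomic (all rows change, or none do).
    """
    with conn:  # commits on success, rolls back on error
        for col in columns:  # column names are fixed above, not user input
            conn.execute(
                f"UPDATE component SET {col} = replace({col}, ?, ?)",
                (old, new),
            )

find_and_replace(conn, "Smith", "Smith (John)")
rows = conn.execute("SELECT title FROM component ORDER BY id").fetchall()
```

An undo mechanism compounds the problem further: it would require storing before-images of every touched field, which is why such features lagged in the AT.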

The AT also missed another essential part of the Whole Product — EAD search and delivery mechanisms. The inability to serve up EAD from the AT is probably the predominant reason some institutions chose Archon, which does offer immediate web-publishing functionality. The AT does provide basic HTML and PDF transforms, but repositories usually find these outputs insufficient because they don’t allow for complex searching, and because customization requires additional technical skills. Most repositories don’t have access to the technical resources that the AT’s parent universities do; creating a homegrown EAD publishing system is out of their financial and technical reach.[9] The AT did not provide the Whole Product and missed out on a considerable amount of possible market share.[10] I believe the lack of web delivery, as well as the web access Archon offers, figured prominently in the decision to merge the two applications.
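
To illustrate why the built-in transforms fall short: they are one-way conversions that flatten a finding aid into a static page, with no index behind it. The sketch below, using only Python’s standard library over a drastically simplified EAD fragment (real EAD is far richer, and the AT uses XSLT stylesheets rather than this ad hoc approach), produces exactly that kind of output — displayable, but unsearchable beyond the browser’s own page search.

```python
import xml.etree.ElementTree as ET

# A drastically simplified EAD fragment; real finding aids nest <c>
# components many levels deep and carry far more metadata.
ead = """<ead>
  <archdesc>
    <unittitle>Jane Doe Papers</unittitle>
    <c><unittitle>Series I: Correspondence</unittitle></c>
    <c><unittitle>Series II: Photographs</unittitle></c>
  </archdesc>
</ead>"""

def ead_to_html(ead_xml: str) -> str:
    """Flatten a finding aid into a static HTML list.

    This mirrors a one-way transform: fine for display, but the output
    carries no search index, so querying across many finding aids still
    requires a separate delivery system the AT never provided.
    """
    root = ET.fromstring(ead_xml)
    titles = [el.text for el in root.iter("unittitle")]
    items = "\n".join(f"<li>{t}</li>" for t in titles[1:])
    return (
        f"<html><body><h1>{titles[0]}</h1>"
        f"<ul>\n{items}\n</ul></body></html>"
    )

html = ead_to_html(ead)
```

A repository wanting complex search would still need to index these documents in a separate system — precisely the gap that pushed institutions toward Archon.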

Lastly, the AT also missed the opportunity to develop relationships with third-party partners. Such relationships can help a software project deliver aspects of the Whole Product that it can’t deliver itself. They also serve to make the product appear stable and well supported to potential adopters. These qualities are especially important for an open source product looking to cross the chasm, which faces questions not only about its own sustainability but also about the effectiveness of open source software generally. These partnerships do not just magically appear; they need to be actively sought out.[11] The AT, for instance, could have partnered with a company willing to host the AT database backend and/or provide an EAD publishing system. Such partnerships could have provided more options for users who otherwise were not able to adopt the AT.

Reducing overhead: providing a framework for user contributions

In the spring of 2008, Ithaka informed the AT project team that it was highly unlikely the AT software could expect volunteer development contributions. The archival community is small, and the number of archivists with enough programming prowess to contribute to the AT code base is even smaller. Since we couldn’t expect code contributions from our users, the project was always going to have to cover development costs. Ithaka advised the team to find a parent organization that could take the project under its wing. Ideally this would be an organization whose ideology mirrored that of the AT project and who would be willing to incubate it by sharing resources until the AT became financially stable. Should the project not find a suitable parent, Ithaka suggested other business models which could generate income to cover some development costs. They strongly recommended that tasks not associated with actual programming — testing, documentation, specification, etc. — be handled in the traditional open source manner: using volunteer contributions from the user community. Relying on contributions for these tasks would decrease overhead enough to make the project’s sustainability viable.

By the spring of 2009, talks with potential parent organizations had failed to come to fruition. Around that same time a few prominent archives decided they wanted to extend and improve on the AT base by contracting a programmer (a former lead developer on the AT project) to do so. During this time, six months before the end of the grant period and without any enacted sustainability plan, the AT should have done as much as possible to open up governance of the project to users by delegating tasks and responsibilities whenever feasible. Yet this did not happen, even with the promise of outside development contributions. By all appearances, the project had started off with intentions of encouraging a “bazaar” atmosphere, one in which development is marked by contributions and involvement of numerous people outside the original development team. In order to ensure bugs and other problems were detected, the program was beta tested by 20 different institutions prior to the first release. The original lead developer also embraced the mantra “release early, release often.” Despite these initial attempts to open up development of the program, it was largely done in a more closed, “cathedral” fashion.[12] For example, although jar files are made available on the AT website for each release, neither a way to access the latest code nor guidelines on how to become an official “committer” has ever been provided. Testing eventually became a task completed solely by the project team, due to time considerations and the difficulty in setting up appropriate test conditions.[13]

In his work on managing open source software projects, Karl Fogel provides guidelines on how to incorporate and encourage volunteers to contribute back to an OSS project. One way of doing so, he states, is to “treat every user as a potential volunteer.”[14] The AT project team was congenial and helpful to its users, but it did not recruit users to the extent that it could have. Instead of operating on a level on par with the users, the team adopted the tone of an expert. This tone was mirrored by a lack of delegation of tasks to the user community. Part of this had to do with the fact that team members were all in salaried positions in which it was their responsibility to work with the software. When it is your business to be an expert on something, it is hard not to assume that stance. It is tempting to take on tasks yourself rather than teaching volunteers how to do them, and it can be difficult to enforce tight deadlines when managing volunteers who have full-time positions outside of their volunteer contributions. However, as Fogel states, “the goal is to make every user realize that there is no innate difference between herself and the people who work on the project.”[15] This realization, he believes, is what prompts users to take the initiative to do such tasks in the first place; everyone can become an expert if they are willing to put in the time. In the case of the AT project, we were experts on the program, and could be relied on to provide documentation, run beta testing, answer support questions, etc. There was little incentive to contribute when everything seemed already under control.[16] Because we were experts, the team could complete tasks in less time than it would take to train volunteer collaborators.

The first sketches of a framework in which user contributions could be accepted did gradually emerge. The proliferation of new development initiatives forced the AT project to finally start coming to terms with the open source nature of the software. The lead programmer created a plugin framework to allow new functionality to be added without changing the core code. This isolated code reduced testing time and provided a basic means of code contribution without forking the code. In a parallel development, active leaders in the AT user group formed the Archivists’ Toolkit Roundtable, an official SAA interest group, which first met and elected new officials in August of 2009. However, by this time the merger with Archon had been announced, and the role of the roundtable remains undetermined. Likewise, it is uncertain whether the plugins currently in development will be compatible with the merged AT/Archon system.

Conclusion

The Archivists’ Toolkit/Archon merger team has an important task in front of it. Both products were revolutionary in changing the way archivists manage their collections data and greatly eased the production of access instruments. But even though the two products will no longer be competing, the merged product will still need to gain market share to be sustainable. In order to do so, the merger team should:

  • Enable increased productivity by making data entry tasks as easy as possible, providing more visualization for users as they work in the hierarchical description section, including a find and replace function, and adding shortcut key functionality.
  • Prioritize usability from the start of application development.
  • Review plugins that are in development as a means of assessing what features are truly important to users.
  • Provide the Whole Product Package. The addition of web services (publishing and editing) will help, but there are other areas that could use revision, notably the reports.
  • Work with other people and companies to provide Whole Product services the core AT/Archon team can’t provide, such as training, set-up, data migration, plugins, etc.

The merger team should also address sustainability issues early in the project. Delegating tasks to dedicated users who are willing to volunteer their time can help cut costs substantially. In order to create a more open and participatory environment, the merger team should:

  • Provide access to the most updated code.
  • Provide guidelines on how code contributions are handled, and who gets “committer” status.
  • Set up an infrastructure that eases the burden of testing on volunteers.
  • Delegate everything that can possibly be delegated to users outside of the development team. Create experts in documentation, testing, specification, etc., in the user community, rather than on the project team. This structure would ideally be organized before the product goes live, so potential new users would not be deterred by the lack of the Whole Product.
  • Use the Roundtable as a forum for governance, decision making, and task delegation, and keep the number of salaried staff who are not programmers to the absolute minimum.

It’s easy to reflect back and discuss what should have been done, but it’s much more difficult to implement systemic changes in the middle of a project with finite resources and tight deadlines. The end product of the AT/Archon merger will likely confront the same problem the AT did: improbable sustainability without a greater number of users and a coordinated volunteer support system. The new AT/Archon merger team has the unique opportunity to focus on making users part of the infrastructure and fabric of the new program, paving the way for a truly sustainable open source archival data management system.

Notes

[1] During the two years I spent with the project I worked on various aspects of it: namely marketing, user support, documentation, specification, testing, and training. Grant funding for my position expired in July of 2009. NYU generously offered to keep me on for at least one additional year, but several months into that year I left for another position.

[2] Archivists’ Toolkit, “Introduction to the Archivists’ Toolkit,” http://archiviststoolkit.org.

[3] University Library, “Library Archivists Receive Award,” University of Illinois at Urbana-Champaign, http://www.library.illinois.edu/news/archon_award.html.

[4] Geoffrey A. Moore, Crossing the Chasm: Marketing and Selling Disruptive Products to Mainstream Customers (New York: HarperCollins, 2006), 10.

[5] Moore, Crossing the Chasm, 16-20.

[6] Moore, Crossing the Chasm, 112-113.

[7] SAA never officially endorsed the AT. Archon also later began offering training sessions through SAA.

[8] The AT can be run in different configurations — entirely local with the client and database run on the same machine, or with a networked database. The program is written in Java and supports MySQL, MS SQL, or Oracle backends. Hibernate is used as a communication layer between Java and the database.

[9] UCSD contributes to the Online Archives of California (OAC), which serves as an EAD repository for institutions across the state of California. NYU has had its own EAD publishing system in place for several years.

[10] This is based on numbers provided in an Interim Report to the AT project team from Ithaka. Roughly 6,000 archivists work in repositories that lack technical support for all but the most basic needs. In contrast, approximately 850-1200 archival institutions do have the technical resources to create and support their own EAD publishing system.

[11] Moore, Crossing the Chasm, 117.

[12] Eric S. Raymond, “The Cathedral and the Bazaar,” http://catb.org/~esr/writings/homesteading/cathedral-bazaar/

[13] Many archivists who expressed interest in testing were unable to replicate their data and upgrade a separate database to the test version, either because they personally did not have the technological know-how to complete such tasks, or because their technical support staff were not amenable to performing them.

[14] Karl Fogel, Producing Open Source Software: How to Run a Successful Free Software Project (2005/2009), 145. Available on the web at http://producingoss.com/

[15] Fogel, Producing Open Source Software, 149.

[16] Ironically, many of these tasks, especially documentation and support, are appealing to potential users and offer more of the Whole Product. I think the project was right in taking on those tasks but should have incorporated users more in their production.

About the Author

Sibyl Schaefer is the Metadata Librarian for the University of Vermont’s Center for Digital Initiatives, where she provides metadata expertise for digital resources and manages the Center’s digitization and metadata projects. She currently is a member of the Society of American Archivists’ Standards Committee and a Certified Archivist. Sibyl previously served as the User Services Liaison on the Archivists’ Toolkit project, a role that included providing customer support and training on the application, conducting usability testing, developing the user group, and specifying new or improved features.
