Issue 41, 2018-08-09

Are we still working on this? A meta-retrospective of a digital repository migration in the form of a classic Greek Tragedy (in extreme violation of Aristotelian Unity of Time)

In this paper we present a retrospective of a 2.5 year project to migrate a major digital repository system from one open source software platform to another. After more than a decade on DSpace, Oregon State University’s institutional repository was in dire need of a variety of new functionalities. For reasons described in the paper, we deemed it appropriate to migrate our repository to a Samvera platform. The project faced many of the challenges one would expect (slipping deadlines, messy metadata) and many that one might hope never to experience (exceptional amounts of turnover and uncertainty in personnel, software, and community). We talk through our experiences working through the three major phases of this project, using the structure of the Greek Tragedy as a way to reflect (with Stasimon) on these three phases (Episode). We then conclude the paper with the Exodus, wherein we speak at a high level of the lessons learned in the project including Patience, Process, and Perseverance, and why these are key to technical projects broadly. We hope our migration story will be helpful to developers and repository managers as a map of development hurdles and an aspiration of success.

By Steve Van Tuyl, Josh Gum, Margaret Mellinger, Gregorio Luis Ramirez, Brandon Straley, Ryan Wick, Hui Zhang

Prologue

Project overview

ScholarsArchive@OSU is Oregon State University’s Institutional Repository. Started in 2004 as an open repository for faculty articles and other scholarly content, ScholarsArchive@OSU quickly became integrated into multiple workflows for production of scholarly content across the university, including the deposit and distribution of graduate theses and dissertations, undergraduate theses, and historical publications from OSU’s Extension and Experiment Station Communications office. More recently, Oregon State University Libraries and Press (OSULP), the organization managing ScholarsArchive@OSU, has been tasked with implementing the OSU Faculty Senate’s Open Access Policy. OSULP has also started accepting research datasets in ScholarsArchive@OSU in order to help researchers comply with journal and funding agency mandates for data sharing. As of this writing, ScholarsArchive@OSU holds approximately 60,000 scholarly works across a variety of resource types and scholarly domains.

Since its inception, ScholarsArchive@OSU has been built on the DSpace repository platform. DSpace is an open source repository solution that was developed over 15 years ago to meet the needs of institutions seeking long-term preservation and access of digital works. DSpace now boasts thousands of adopters, primarily in academic and government user spaces. In 2015, after over a decade on DSpace, OSULP administration tasked ScholarsArchive@OSU managers with evaluating the repository (at that time on DSpace 3.x), determining whether an upgrade or migration was necessary, and, if so, recommending a target platform to which to upgrade or migrate. ScholarsArchive@OSU managers evaluated the existing DSpace repository, collected requirements for the repository from stakeholder groups, and evaluated major repository platforms for their ability to fulfill our requirements (Van Tuyl et al. 2015). As a result of this evaluation, we determined that migrating our content to a Samvera repository solution was in the best interest of our organization, in part because it was clear that major changes were needed to meet our use cases for the repository and in part because, as an organization heavily invested in Samvera through another project (Oregon Digital), it made sense to unify our resources around Samvera repository solutions.

Why a tragedy?

At its most basic, the Greek Tragedy carries a structure that is familiar to software development teams who have used the Retrospective as a way to reflect on a development effort. In the case of an entire project, though, there are often cycles of work or phases of a project that could be evaluated. These phases are similar, in many ways, to the Episode of the Greek Tragedy – chapters of an overall story. Similarly, these phases of the project each have their own lessons from which we can learn, similar to the Stasimon, or commentary in the Tragedy on the events of each Episode. In essence, the Greek Tragedy offers us a structure to create a retrospective of an entire project, including all the periods of activity in the project, and the Stasimon and Exodus give us opportunity to reflect on lessons learned at the periodic and project-wide time scales.

This tragedy

In this paper, we will not focus overly on the technical details of the project or the software tools created and used. This is really a social story, rather than a technological one, and we feel that too many technical details will get in the way of the social, organizational, and community aspects of our project. Limiting the technical detail will also help readers working in other contexts map our experiences onto their own, something we feel techno-jargon would hinder. We hope our migration story will be helpful to developers and repository managers as a map of development hurdles and an aspiration of success.

Parados

The resourcing (in terms of personnel) varied a great deal across this project. That said, when sharing technical project experiences with others, one of the first questions that arises is “but how many people did you have working on the project?” At peak staffing, we had five software developers, a product manager, a product owner, a metadata librarian, and 2-3 additional staff and librarians contributing to the project. On the other hand, there were times on this project when we were working with 2-3 software developers working part time on the project along with one person acting as project manager and product owner. The challenges of this variability in personnel resourcing will become apparent as we examine the timeline and major events of the project. Beyond personnel, the project had full support from OSULP administration to use time and resources as needed to complete the project. This speaks to a general atmosphere of support for technology projects at our organization and recognition that these projects are part and parcel of our core library services, and deserve this level of support.

In addition to personnel on our project in our organization, it is important to note that personnel from the Samvera Community at large played an important role on this project. As the reader will see, one of the important outcomes of this project is the value that we found, and believe to be universal, in meaningfully engaging with the Open Source Community we are part of. It is challenging, if not impossible, to usefully quantify the return on investment of involvement in a community like Samvera, but the time spent by other community members providing technical support (solicited and unsolicited), conceptual support, and emotional support has provided us with benefits beyond what one could expect from contracts or subscriptions. We will save further discussion of the value of participating in open source communities for another place and another time.

Episode 1 – Optimism

The aforementioned evaluation and reporting on the future of ScholarsArchive@OSU took place from Fall 2014 through Spring 2015, with the report published in February 2015. The publishing and administrative approval of the report, and its recommendations, initiated the planning phase of our project. At this point, the team determined that leveraging an existing repository solution bundle in the Samvera Community, Sufia, was the best starting point for our future system. Sufia is primarily built as a self-deposit style repository and lacked many of the features we require for our institutional repository application, such as deposit management workflows, collections, and robust analytics. It was clear to us that we had work to do to bring Sufia to a place where we would be able to apply it to our institutional repository use case. At that time, the Samvera community was engaged in a number of initiatives that we found compelling for our use cases; primary among these was implementation of the Portland Common Data Model, which showed promise for arbitrarily complex compound objects in our repository. As a first step, we initiated a number of development sprints to stand up a generic Sufia repository as a starting point both for digging into the technology stack for Sufia and for helping the repository management team get a sense of the look and feel of the new repository and how it would generally function.

By Summer 2015, we had a number of development sprints under our belts and had started to make local changes to Sufia to meet our requirements. It was clear, however, that the team still had a great deal to learn about the technology stack, communication within the team, and the processes we needed to have in place to function effectively. The amount of learning we needed to do was not entirely surprising – we had three developers who were relatively new to the Samvera technology stack and a product owner who had never formally played that role before. Making our way up the technological learning curve wasn’t a bad thing. In fact, it is safe to say that diving into the technology was probably the best way for our team to come to a common and in-depth understanding of how the technology worked.

The Product Owner and Project Manager also had a lot of learning to do, both about the technology (at their respective levels of granularity and need) and about communications and expectations for the project. Concepts like sprint planning, standup meetings, retrospectives, and demos, while conceptually easy, were challenging for the Product Owner to implement, given his lack of experience and background in this area. The Product Owner came from a background of project management and implementation, but had never done this kind of work in a software development environment and never as an officially titled product owner or manager. This required the Product Owner to parse the concepts of the development framework the team was using (a loose scrum model), crosswalk them to terms or project components he was familiar with, and then try to determine how to apply lessons learned in previous project environments to these concepts. Doing this while engaging in a development project as a product owner was challenging, and the resulting loss of confidence by the Product Owner likely resulted in a loss of efficiency on the project. We also struggled to understand how best to manage the infrastructure of our project, such as GitHub repositories, project boards, documentation spaces, metadata application profiles, and ancillary data sources required for the project. Suffice it to say, we progressed in fits and starts during all of this learning, though it is important to note that we progressed nonetheless.

Summer and Fall of 2015 were also a time of intense personnel churn in our organization. In late Summer, the head of our technology department left the organization, followed shortly by our lead developer. Through this time, we maintained monthly two-week development sprints on the project, but the rate of progress in these sprints slowed dramatically as we worked through rehiring for these positions and moving forward on development while still on the learning curve for the project. Suffice it to say, the departure of these two individuals created an enormous delay in the overall project. Though support for the project remained in place, this disruption left a lot of questions in our minds about the future of the project. Some conversations among team members at this time hinted at withdrawing from the migration project – we’d had some successes, but the project was new enough that we could have called it off without too much loss. Fortunately, we chose to keep moving forward – our need for a new repository system was too great, the remaining team was motivated to continue, and we had full support of our administration, even in the face of these challenges.

Stasimon 1 – Acquired Knowledge and Lost Naivete

The learning curve

This is something we expect on every new project: a period of exploration and learning in order to understand enough about the technology, data, requirements, and team to allow the project to progress. We must say, though, that the learning curve for our team, on this project, was steep and long – not only were we working with new technology, but the trappings of a repository migration project were largely new to the team. It is challenging to say how much time must be allotted for scaling the learning curve for a project of this sort. In fact, one might even suggest that such an approach is futile, that the learning must happen either independent of the project, or while the project progresses – otherwise the team will not know enough to effectively do their work. In our case, this looked like a lot of time spent, in our initial months of development sprints, exploring code, data, and metadata to understand what we were even looking at and how it all fit together. This is an intangible that can be uncomfortable in the planning stages of a project, due to uncertainties around how much time and resources it will require, but it is a necessary discomfort that allows for growth and for building a baseline of experience and understanding.

Managing personnel churn

This is going to happen eventually. There are better and worse times for it to happen, but regardless, it will be disruptive. The primary mistake we made was allowing ourselves to have single points of failure, with no succession planning for personnel turnover. Succession planning would have been helpful both in the case of management turnover (department head leaving) and in developer turnover (lead developer leaving), though one can imagine that succession planning for all personnel would be useful. The primary issue in management turnover was uncertainty about whether the initial amount of resourcing committed to the project would continue under new leadership. Process could have helped alleviate this problem – having a clear project plan in place that defines the resourcing can help incoming management understand the expectations for ongoing projects. The issues raised with the departure of our lead developer were that he held all (or at least most) of the connection to the community we worked with, and that much of our work was benchmarked against his skill set. After the lead developer’s departure, we made a conscious effort to engage with the Samvera Community across our team and to set production benchmarks based on individual developers’ capacity and/or on what the team as a whole could accomplish. It is critical to understand where the entire team is with respect to ability in order to set realistic milestones and to buffer against losing a team member.

Episode 2 – Clarity

We started Winter and Spring 2016 with an introspective period kicked off by re-evaluating project timelines, expectations, and roles in the face of the issues we’d faced at this point in the project – namely unclear process and significant personnel changes. This re-evaluation gave us an opportunity to assess our previous assumptions about requirements, minimum viable product, and stakeholder engagement in the project. Out of this re-evaluation we realized that we needed more realistic milestones, redefined what we meant by minimum viable product for the repository, and determined that periodic re-evaluations of our milestones and progress should become part of our regular work schedule.

During this time period, we also hired a new lead developer into our group. This hire was indispensable for completing our team, but also resurfaced challenges experienced earlier in the project related to the learning curve for the technology we were using and how we functioned as a team.

At the same time, the Samvera Community was focusing resources around an architectural realignment of some of its repository solution bundles, the result of which would be Hyrax, which targets a wide range of types of digital repositories. As we tracked this community work, we identified three areas where we thought our team could provide assistance. The first was creating a feature that had not yet been spoken for by another institution or individual in the community: implementing nested works. The second was initiating a community effort to build flexible workflows into the system. Lastly, we helped initiate an interest group in the community focused on migrating content from DSpace to Samvera. This interest group provided a number of opportunities for collaboration (or at the very least, idea sharing) and, maybe more importantly, peer-to-peer counseling on issues arising from this type of migration.

Stasimon 2 – Patience and Engagement

Iterative evaluation of process

This period of the project started with a re-evaluation of all of our expectations for the project. Initially, this felt like a step backwards – like we had lost traction and were moving in the wrong direction. In retrospect, though, this was an opportunity for our team to learn about iterative evaluation of project expectations, timelines, and roles.

Architectural churn

The architectural churn we experienced in the Samvera Community during this time period was not unique to our institution – we are but one adopter of the software in question. We believed, though, that our project encountered this churn at just the wrong time – that we were particularly caught up in the difficulty of these changes and uncertainty in the code we were using. One approach to managing this type of churn is to stop and wait until it calms down, evaluate where things stand, and then make a decision about what software (version etc.) to use. Another approach, the one we chose, was to move our development effort onto the master branch of the new system (Hyrax) and “ride the wave” of change and enhancement. This approach requires close attention and an acceptance of change and difficulty, but we found that it paid off. The enhancements being made to the new system, especially as the initial year of work on Hyrax progressed, were enough to justify the risk of riding the wave.

Community engagement

It turns out that this is super valuable. By being closely involved with the Samvera community, not only were we better able to see what directions the community was moving in, but, as a team committing resources to doing work in the community, we were also able to have a hand on the steering wheel. This isn’t to say that we swayed the community to do things that were not in its interest, but that our commitment of resources offered a nucleation site for community development around the features we sought to build. This resourcing and the effort that we, and other institutions, put into the Samvera community resulted in features that met both our local needs and the needs of many other community members.

We’re learning

We’ve learned something since Episode 1: changes will happen, and we need to be flexible but keep moving forward. The introduction of a new department head and lead developer, and the loss of our project manager, were challenging, but we had experienced personnel challenges before during this project (see above), and we knew how to better work with modifications to our team – namely, to expect a learning curve. We also worked hard to accept more frequent reevaluation of project milestones in the face of the skills and personnel at hand, which came in the form of regular, approximately quarterly, evaluations of the status of the project and whether our existing milestones were realistic given how things were going.

Episode 3 – Closure

By Winter 2017, we had determined that it was necessary to re-evaluate and reset the project (again) in the face of the aforementioned architectural changes to the software we intended to use. It was clear to us that we wanted to target our repository migration at the emerging Hyrax 2.0, given the set of features that would be included in that release, compared with the Hyrax 1.0 release. But the release of Hyrax 2.0 was only on the horizon when we made this decision, and we needed to develop a plan for tracking the work happening on the Hyrax master branch in order to be ready to release our repository close on the heels of the 2.0 release. This was tricky in some ways, but the extent to which we were engaging closely with the Samvera Community helped guide us through these challenges – we knew what was happening, when it was expected to happen, and how we might maneuver our work around any obstacles.

Parallel to developing against an impending release of Hyrax, we were developing tooling for migrating our repository contents from DSpace to Hyrax 2.0. This development was closely tied to metadata remediation, controlled vocabulary development, and initial user experience testing and conversations with stakeholders. Ultimately, and somewhat surprisingly, this tooling and metadata work proved to be the primary time sink for the project from this point until project completion.

By this time, we had refined our planning and scheduling processes in a way that worked well for our team. We engaged in re-evaluation of project milestones about every 2.5 months, settled on longer work cycles for the project, and worked to define roles and expectations for all team members. In addition, we set a series of meetings to discuss the major aspects of the project, coupled with brief daily check-in meetings (standups) and longer weekly meetings to discuss ongoing or larger issues. The repository migration project officially finished in January 2018, at which point we switched from migration to maintenance, cleanup, and enhancement.

Stasimon 3 – Parts and Enhancements

This is where we should have been

The hardest part of looking back on this project is seeing how, in the last year of the project, we managed to do most of the heavy lifting, while prior to this year the project felt untethered. As discussed, there were a lot of reasons for the differences in our ability to gain traction on this project in its first years, but it is clear that experience, personnel stability, and code stability played a huge role in our ability to move the project forward in its final year. All we can do, at this point, is learn from our experience, and we have been – over the past six months since our repository migration concluded, our technology department has been working to revise a number of procedures for selecting projects, setting expectations for project structure and process, and defining roles and responsibilities for project participants in order to help future projects gain traction earlier.

This project has multiple tracks

Explicitly defining and splitting the project into multiple tracks (building the repository on Hyrax vs. migrating the content) was critical to breaking this project into manageable (if large) pieces. It also allowed us to better assign resources to these tracks, to define when work needed to happen on these tracks, and to see where the tracks were and were not dependent on one another. This breakdown of the project helped us move past a period of paralysis wherein it felt like we couldn’t make progress because the entire team was trying to solve all of the problems at once.

Risk can be rewarding

Ultimately, our choice to pin our repository work to the impending release of Hyrax 2.0 was a good one. At times, though, it felt risky to place so much control over our project into the hands of the community at large. We feel, though, that our direct and frequent engagement with the Samvera community throughout this phase of our project alleviated much of the mystery around when we would get this new release and what it would look like. We also found that by working so closely with the community during this time period, we knew so much more about Hyrax when we finally launched our repository. We had spent so much time both in the code and working in various iterations of the software as users and repository managers that we had a very good sense for the ups and downs of the final product.

Exodus

In the end, this project was a success, but suffice it to say there were a number of times in the past 2.5 years when our grasp on the work felt tenuous at best. Our prior experiences suggested that this would be the case, to an extent, as it is for all projects – nothing ever goes smoothly or well all the way through. But the issues we encountered in this case were manifold, and in this way it feels like we hit all of the bumps one could. So what did we even learn? We’ve discussed each of the three major phases of the project, and lessons learned along the way. Here we distill those lessons.

So what does it take to migrate a digital repository from one platform to the next?

Patience

Unexpected things will occur during the repository migration – in our case these primarily came in the form of organizational change, personnel change, and disruptions in our repository community. When these changes come, it is necessary to be patient while they happen, and this patience can express itself in a few ways. First, one can stop and wait for whatever disruption is occurring to pass, then pick the project back up. Second, one can keep moving forward on the project, working through, past, and with whatever challenges come with the disruption. The second approach, the approach we took, certainly resulted in slowdowns to our project. At first, these slowdowns were hard because they were pushing against our timelines and delivery dates. Eventually, though, as we gained patience with the onslaught of disruptions, we learned to take the disruptions in stride and use them as opportunities to evaluate project expectations and goals. It is hard to know which of these two approaches results in a longer timeline for any given project, but we do know that choosing the second approach resulted in a long project timeline. We also know, though, that it was impossible to predict the disruptions we faced, and that committing to the project was right for us.

Process

If at all possible, have good processes in place before starting a major project. Of course, this isn’t always possible and in many cases, like ours, one believes the processes in place to run a project are sufficient, or even exemplary. What are the key elements of process that we can look back and point to as critical?

  • Project planning and decomposition – at an abstract level, breaking the project into its component pieces and then, where possible, recomposing these pieces into logical project stages or milestones
  • Iterative project evaluation – periodically evaluating progress on the project, roles and expectations, requirements, and milestones
  • Well defined roles and expectations – who is responsible for the decomposed elements of the project and what does it mean to be responsible for those elements?

The first of these, project planning and decomposition, is an area that we felt we had a pretty good handle on at almost every phase of the project. In some ways we did, but our ability to understand the project in its entirety was limited in the beginning, as it always is. That said, we believe it wasn’t really until fairly late in the project that we took the time to fully break the project down into its elemental parts and discuss at length the requirement each part fulfilled, the approach to resolving each part, and who was responsible for each part. Which brings us to the second element, iterative project evaluation – this allows team members to feed their experiences with implementation of requirements and unexpected disruptions to the project back into the project planning, roles, and responsibilities. In the end, we found that evaluation on a number of time scales was helpful, with weekly and bi-monthly evaluations of progress on meeting requirements, roles, and expectations. Last, defining these roles and expectations clearly among team members helps everyone understand to whom they can turn with questions or for help on any element of the project. Like other elements of process, these roles and expectations can change over time, and should be evaluated frequently to ensure every team member is comfortable and clear on their role and the expectations that come along with it.

Perseverance

We had originally named this lesson “Peril,” and indeed peril lurked around many corners throughout. But what gets one past the peril is perseverance – moving forward on the project regardless of the dark places we can all find ourselves when working hard on something challenging. Part of what allows perseverance to flourish is celebrating when things go well, focusing both on the good and the bad in the project. This is something we could have done more of on our project and this is a lesson we’re still trying to learn on our team. We should celebrate our perseverance as much as, if not more than, we celebrate an incremental win or the completion of the project. Perseverance is what gets us through those increments and through to the end.

Conclusion

We’ve learned a lot of lessons from this project and expect to learn more as we move on to the next projects. But, here, months after the completion of our repository migration, these three Ps – Patience, Process, and Perseverance – have been a focus for our team and our department as a whole. As we move to other projects, some larger even than this one, we are actively working to learn our lessons and to move into the future on surer footing than we were on before we started.

Acknowledgements

We would like to acknowledge the support of our colleagues at Oregon State University Libraries and Press and in the Samvera Community, who provided insight, feedback, and encouragement throughout this project.

Resources

City Dionysia – Explore the Tragic Structure. http://artsedge.kennedy-center.org/interactives/greece/theater/playsTragicStructure.html (accessed 31 May 2018).

Samvera – an open source solution for digital content. http://samvera.org/ (accessed 31 May 2018).

Van Tuyl, S, Zhang, H, Boock, M. 2015. Analysis of challenges and opportunities for migrating ScholarsArchive@OSU to a new technical platform: requirements analysis, environmental scan, and recommended next steps. http://ir.library.oregonstate.edu/concern/technical_reports/k06989027 (accessed 1 May 2018).

About the Authors

Steve Van Tuyl is the Digital Repository Librarian at Oregon State University Libraries and Press. He currently acts as a project manager for the Emerging Technologies and Services Department, as well as product owner for a number of repositories. He also has worked for many years building and supporting Research Data Services programs and as an all-around data flunky.

Josh Gum is an analyst programmer at Oregon State University Libraries and Press. He has worked in technology-related fields for more than 20 years with an emphasis on web technologies.

Gregorio Luis Ramirez is an analyst programmer at Oregon State University Libraries and Press. His work focuses on a broad spectrum of digital projects in the Emerging Technologies and Services Department.

Margaret Mellinger is the Director of Emerging Technologies and Services at Oregon State University Libraries and Press and has worked with open source projects for several years.

Brandon Straley is an analyst programmer at Oregon State University Libraries and Press in the Emerging Technologies and Services department. His work primarily focuses on both emergent and legacy web technologies and agile development practices. He aspires to continue refining his development practices and learning more about how to better work on large, long-term projects.

Ryan Wick is an analyst programmer at Oregon State University Libraries and Press, with Emerging Technologies and Services as well as the Special Collections and Archives Research Center. He works primarily with repositories, metadata and digital collections.

Hui Zhang is a Digital Applications Librarian at Oregon State University Libraries and Press where he focuses on repositories and special projects.


ISSN 1940-5758