Issue 46, 2019-11-05

“With One Heart”: Agile approaches for developing Concordia and crowdsourcing at the Library of Congress

In October 2018, the Library of Congress launched its crowdsourcing program By the People. The program is built on Concordia, a transcription and tagging tool developed to power crowdsourced transcription projects. Concordia is open source software designed and developed iteratively at the Library of Congress using Agile methodology and user-centered design. Applying Agile principles allowed us to create a viable product while simultaneously pushing at the boundaries of capability, capacity, and customer satisfaction. In this article, we share more about the process of designing and developing Concordia, including our goals, constraints, successes, and next steps.

by Meghan Ferriter, Kate Zwaard, Elaine Kamlley, Rosie Storey, Chris Adams, Lauren Algee, Victoria Van Hyning, Jamie Bresner, Abigail Potter, Eileen Jakeway, and David Brunton

Concordia and By the People are enactments of the user-centered focus of the Library of Congress Strategic Plan and Digital Strategy. In “Enriching the User Experience,” the Library of Congress 2019-2023 Strategic Plan articulates four essential goals: expand access, enhance services, optimize resources, and measure results. Released in October 2018, the Library of Congress’ first Digital Strategy complements the 2019-2023 plan by emphasizing approaches to meet those goals digitally. Specifically, the Digital Strategy promises that we will throw open our treasure chest, connect, and invest in our future.

We were fortunate to build Concordia and By the People at a time when the Library of Congress strategic plan and Digital Strategy were being articulated. These two documents signal directions that you can see both in By the People’s community management and in Concordia’s design and development. By providing interactive engagement with the Library’s collections, By the People underscores the objectives of the Digital Strategy and connects with audiences across levels of learning and geographic locations. Concordia’s development is ongoing and sustained in a way that builds a road toward future, as-yet-undefined capabilities. Furthermore, By the People and Concordia expand access to the Library of Congress through enhanced technical services and a robust program of engagement.

The design principles for this project emphasize trust and approachability as a complement to the authoritative role the Library of Congress plays in American memory. Both By the People and Concordia offer opportunities to deeply engage our audiences with our collections and interactive interfaces. They are also aligned with a manageable guiding practice within the Digital Strategy Office and Library of Congress Labs team: “do something, gather evidence, and then refine the approach.”

In the sections below, we briefly describe the By the People program, how we adopted an Agile approach, and the process of developing Concordia. We hope the discussion demonstrates the ways we designed opportunities for public audiences to engage with Library resources.

Becoming By the People

By the People aims to engage and inspire participants by inviting them to transcribe and tag digitized images from the vast collections of the Library of Congress. Our program has a user-centered approach that supports exchange between many communities and the Library of Congress.

The By the People program launched on October 24, 2018, featuring five digital collections already available on loc.gov from the Manuscript Division at the Library of Congress: selections from the papers of Mary Church Terrell, letters written to Abraham Lincoln, Clara Barton’s diaries, Branch Rickey’s baseball scouting reports, and memoirs of disabled Civil War veterans in the William Oland Bourne collection.

The transcriptions created by volunteers improve access to, and enhance the discoverability of, handwritten and typed documents that computers cannot accurately transcribe without human intervention. Transcriptions also increase access through greater compatibility with accessibility technologies, such as the screen readers used by people with low vision, and they aid people who have difficulty reading the source materials. The transcriptions are also the basis for keyword searching and, when served alongside their corresponding digitized images on loc.gov, support pathfinding in the Library of Congress digital collections for lifelong learners, students, teachers, and researchers. Furthermore, we aim to share the data at various levels (page, item, and campaign) and in amenable formats for those seeking to perform large-scale textual analysis and explore Library of Congress collections with other methods.

By the People is deeply informed by the ways the Library of Congress and cultural heritage organizations have invited the public into participatory projects. We have benefited from the great design and experimentation that preceded our entry into this space. By the People is also a program guided by carefully articulated user-centered design principles.

The design principles offer a framework for the path of development and a touchpoint for our decisions. In the design principles, we address privacy, ethics, broad participation, lowering the barrier to entry, and meeting people where they are. We also set expectations for clear communication, the use and availability of data, and continuing to improve through user feedback. In sum, the design principles set a vision for the future of Concordia and a strong start for a program like By the People.

Our design principles can also be viewed as sequential phases crafted to inform one another: engage, understand, connect, and grow. Working with these principles and goals as foundations for By the People, we have been able to better frame the software, collections selection, and ways we strive to meaningfully recognize the efforts of volunteers. We value volunteers as active participants and co-creators who help define new paths to our collections and data to facilitate access.

An Agile Mindset

The user-centered, participatory mission of By the People is reflected in our choice to implement an Agile methodology for the development of Concordia. We needed an approach that would let us map and manage our work as we continued to learn from our audiences and about the scope of the collections to be presented.

We knew from the beginning that Agile methods would be a core requirement for whatever software was to be developed. This approach suited what was to be both new software implemented at the Library of Congress and a new program of engagement. Agile allows for research-based hypothesis testing, releasing early and often, risk management, integrating user feedback, an emphasis on close communication between motivated individuals across roles, and reflection on how to be more effective and adjust accordingly. [1]

The project kicked off with a Technical Lead, a Product Owner-Program Lead, a stakeholder group with six representatives from across the Library of Congress, and a vendor team. Six weeks before its public unveiling and formal launch, our core project team consisted of the Technical Lead and Product Owner-Program Lead, a Product Manager/Community Manager, a Senior IT Specialist, two additional community managers, and a Designer/UX expert working part time (20%). This group participated directly in the Agile process. The stakeholder group remained involved through weekly report-outs and monthly meetings.

At the beginning of each sprint cycle, we started with an overarching idea of what we wanted to achieve and, through the Agile process, added in details and made refinements as we went along. Using this process allowed us to stay focused on the most meaningful aspects of the project and to continue iterating while avoiding unnecessary complexity and feature creep.

We used Agile ceremonies to manage three parallel tracks of software development, collections management, and communication. We joined these efforts together in our daily check-ins. These daily check-ins complemented weekly sprint grooming, as well as sprint closeouts that included demonstration and retrospectives (“retros”) every two weeks.

Even though we knew an Agile approach would guide product development, we also knew it would take more than a good piece of software to make the program of engagement a success. Identifying collection materials to feed into the software, and figuring out how to get transcription data out of it, were projects unto themselves. Similarly, managing outreach and user engagement around the crowdsourcing program was significant enough to require a dedicated effort. To that end, three community managers were hired to support the Concordia development process and, concurrently, to build out what would become By the People’s internal and external outreach, identify launch collections needing transcription, and collaborate on data migration and design.

As we adopted an Agile methodology, we realized that it suited not only the development of Concordia but also the management of By the People’s program of engagement. An Agile approach has also allowed for responsive, flexible community engagement and collection and content selection. As a way of thinking and doing, Agile suited the Concordia project team and proved its value when the team was called on to adapt along the way.

One thing we observed about our product team: in each phase described below, the responsibilities and time commitments of team members shifted to some extent. In each sprint, we refined responsibilities, processes, and levels of involvement in specific tasks based on the outcomes of retros and the context of the project. That context included roadblocks in development, assessing content for launch, and balancing wider organizational priorities. This process of adjustment allowed us to respond to the state of the product and its needs: before launch, as we established the public program, and now as we continue improving and implementing a user-centered product.

Concordia Development Phases

In this section, we describe the three product development phases (Setting the Stage, Into Development and Launch, and Maintaining and Improving), along with the key decisions and the roles and activities of the product team in each.

Setting the stage

As we kicked off planning and development, the team identified several technical considerations and constraints. First, this would be a containerized web application built on Python, Django, PostgreSQL, and related tools. We intentionally selected mature technologies for the stack to avoid the pitfalls of managing rapid change and experimentation in two areas at once; a familiar stack let us move forward with experimental work elsewhere. Additionally, this stack aligns with those that support other services at the Library of Congress. The decision allowed us to build a tool aligned with existing practices and services, and it makes it more likely that existing staff can support Concordia and By the People in the long term and grow them according to user needs and institutional goals. Furthermore, because the tool was developed openly, it may be useful to other organizations using similar stacks.

Second, we knew we wanted to host Concordia and By the People in the cloud, with no on-premises dependency, to benefit from serverless technologies, auto-scaling, and fault recovery without having to implement those ourselves. We also sought to avoid performance limitations or strain on our coverage and our on-premises data center. Using our existing content delivery network or image servers to serve materials on the site in real time wouldn’t be sufficient, since we would need to provide scalable access to many distributed users at the same time.

We piloted a partial solution in a crowdsourcing experiment launched by Library of Congress Labs in 2017. This pilot project, called Beyond Words, allowed us to pinpoint challenges with serving high-resolution images (necessary for volunteers to zoom in on detail as they transcribe and review) via our content delivery network. For Beyond Words, we experimented with requesting images and metadata via the Chronicling America Application Programming Interface (API), converting images in the cloud, and delivering them from Amazon S3 buckets to the crowdsourcing interface. We were able to bring some of these approaches forward into Concordia.

A third technical consideration: we wanted to use the publicly available loc.gov API to retrieve collection metadata and JPEG images and save copies for use in Concordia. This would leverage our existing technology and build on the work already undertaken to present our digital collections through a public interaction layer on https://www.loc.gov.
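
As a minimal sketch of that retrieval step, the Python below builds a JSON request URL for a loc.gov item and pulls JPEG links out of the response. It assumes that appending fo=json to an item URL returns JSON and that image files appear as objects carrying "mimetype" and "url" keys; the function names are hypothetical, and this is not Concordia's importer. Because the exact response layout is an assumption, the parser walks the whole structure rather than hard-coding a path.

```python
"""Sketch of retrieving JPEG links for one loc.gov item.

Assumptions (hypothetical, not Concordia's importer): the item page
returns JSON when fo=json is appended, and image files appear as dicts
with "mimetype" and "url" keys somewhere in the response.
"""
import json
import urllib.request
from typing import Any, Iterator
from urllib.parse import quote, urlencode


def item_json_url(item_id: str) -> str:
    """Build the JSON URL for a loc.gov item page."""
    return f"https://www.loc.gov/item/{quote(item_id)}/?" + urlencode({"fo": "json"})


def fetch_item_json(item_id: str) -> Any:
    """Download and decode the item JSON (performs a network request)."""
    with urllib.request.urlopen(item_json_url(item_id), timeout=30) as response:
        return json.load(response)


def iter_jpeg_urls(node: Any) -> Iterator[str]:
    """Walk nested dicts and lists, yielding URLs of JPEG file entries.

    Walking the whole structure avoids hard-coding the response schema.
    """
    if isinstance(node, dict):
        if node.get("mimetype") == "image/jpeg" and "url" in node:
            yield node["url"]
        for value in node.values():
            yield from iter_jpeg_urls(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_jpeg_urls(value)
```

In use, a caller would pass an item identifier to fetch_item_json and iterate over iter_jpeg_urls on the result, downloading each JPEG to cloud storage; the identifier and storage step are placeholders here.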

Another technical decision was to use OpenSeadragon as the image viewer. Library of Congress staff have contributed to OpenSeadragon, and it is used in Library of Congress public interfaces. With this experience, we are prepared to update the viewer, or adopt a different one, should the need arise.

Finally, we wanted to use the BagIt package format for completed transcription data so that it could be joined with item records. The Library of Congress helped co-author the BagIt specification, which celebrates its 10th anniversary this year, and BagIt workflows are used daily at the Library of Congress to manage many collections. Matching Concordia’s capabilities to existing Library of Congress technical workflows and collections management practices would allow By the People to align with staff efforts. We believed that alignment would, in turn, support lightweight processes to reunite volunteer-created text with the source digital assets for access and preservation.
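
For readers unfamiliar with the format, a minimal BagIt bag is a directory containing a bagit.txt declaration, a data/ payload directory, and a manifest listing a checksum for every payload file. The Python sketch below writes such a bag for transcription text using only the standard library; the payload layout is hypothetical, and this is not the Library's production tooling.

```python
"""Minimal BagIt 1.0 writer (a sketch, not the Library's production tooling)."""
from __future__ import annotations

import hashlib
from pathlib import Path

BAG_DECLARATION = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"


def write_bag(bag_dir: Path, payload: dict[str, str]) -> Path:
    """Write payload files under data/ plus bagit.txt and a SHA-256 manifest.

    payload maps POSIX-style relative paths (hypothetical layout) to UTF-8 text.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, text in sorted(payload.items()):
        content = text.encode("utf-8")
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Manifest line: checksum, whitespace, path relative to the bag root.
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{rel_path}\n")
    # Write tag files as bytes so line endings stay LF on every platform.
    (bag_dir / "bagit.txt").write_bytes(BAG_DECLARATION.encode("utf-8"))
    (bag_dir / "manifest-sha256.txt").write_bytes("".join(manifest_lines).encode("utf-8"))
    return bag_dir
```

A production bag may also include bag-info.txt metadata and tag manifests, and must follow the specification's rules for encoding unusual file names; the bagit-python library published by the Library of Congress handles those details.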

We also needed to blend technical considerations with project design considerations. At the center of managing Concordia was a practical reality: the project had a fixed launch date. As a direct result, we had to be flexible on functionality requirements, including narrowing scope based on setbacks during development.

At the heart of Concordia’s beginnings were four design principles: engage, understand, connect, and grow. These principles provided foundations to pursue the project aims of engaging and inspiring audiences through trust and approachability. Key stakeholders from the Office of the Chief Information Officer, Library Services, and National and International Outreach held a brainstorming session before development kicked off, during which we created 55 user stories. The Product Owner and Technical Lead mapped these user stories to ranked priorities and technical needs, and further categorized them in relation to the design principles.

We then identified an essential subset of the user stories in the highest priority category. We used these user stories to scope the development and focus on building a web application that would allow us to best support volunteers, get the highest-quality transcriptions, and provide the least friction in task workflows. The functionality must-haves we knew we needed from the beginning were:

  • Full-text transcription in a plain text editor (no rich text editing)
  • Allow anonymous transcriptions
  • Ethical design for user registration for contribution tracking
  • Transcription mapped to unique identifiers at the asset level
  • Some kind of tagging at the image level
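
The must-haves above imply a simple data shape in which every transcription points at exactly one page image. The sketch below illustrates that shape with plain Python dataclasses; the class and field names are hypothetical and are not Concordia's Django models. As further assumptions, it keeps every saved version rather than overwriting, and it records anonymous contributions with no user identifier.

```python
"""Hypothetical data model sketch; Concordia's real Django models may differ."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """One page image: the unit that transcriptions and tags attach to."""
    asset_id: str                                # unique identifier for a single image
    item_id: str                                 # the item the page belongs to
    sequence: int                                # page order within the item
    tags: set[str] = field(default_factory=set)  # image-level tags


@dataclass
class Transcription:
    asset_id: str                  # ties the text to exactly one asset
    text: str                      # plain text only, no rich-text markup
    user_id: Optional[str] = None  # None records an anonymous contribution


@dataclass
class AssetStore:
    assets: dict[str, Asset] = field(default_factory=dict)
    history: dict[str, list[Transcription]] = field(default_factory=dict)

    def add(self, asset: Asset) -> None:
        self.assets[asset.asset_id] = asset

    def transcribe(self, transcription: Transcription) -> None:
        """Append a new version; earlier versions are kept (an assumption)."""
        if transcription.asset_id not in self.assets:
            raise KeyError(f"unknown asset: {transcription.asset_id}")
        self.history.setdefault(transcription.asset_id, []).append(transcription)

    def latest_text(self, asset_id: str) -> Optional[str]:
        versions = self.history.get(asset_id)
        return versions[-1].text if versions else None
```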

We further articulated functionality as we better understood user needs through testing:

  • Peer review to produce completed transcriptions and support transparency in the transcription workflow
  • Asset reservation system to prevent volunteers from clobbering each other’s work
  • Structured data model to set context for the materials (grouping items into projects and campaigns)
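
One way to picture the asset reservation system is as a per-asset lease that expires if a volunteer walks away. The sketch below is a hypothetical, in-memory, single-process version; a real multi-server deployment would need a shared store with atomic updates, such as a database row, rather than a Python dict. The user_key parameter stands in for whatever identifies a volunteer (for anonymous volunteers, perhaps a session key) and is an assumption.

```python
"""Hypothetical in-memory asset reservations with expiry (single process)."""
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReservationBook:
    ttl_seconds: float = 60.0                    # how long a reservation lasts
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _holders: dict[str, tuple[str, float]] = field(default_factory=dict, init=False, repr=False)

    def reserve(self, asset_id: str, user_key: str) -> bool:
        """Take or renew a reservation; False if another live holder exists."""
        now = self.clock()
        held = self._holders.get(asset_id)
        if held is not None and held[0] != user_key and held[1] > now:
            return False
        self._holders[asset_id] = (user_key, now + self.ttl_seconds)
        return True

    def release(self, asset_id: str, user_key: str) -> None:
        """Drop a reservation, but only if user_key currently holds it."""
        held = self._holders.get(asset_id)
        if held is not None and held[0] == user_key:
            del self._holders[asset_id]
```

Letting reservations expire, rather than requiring an explicit release, means an abandoned browser tab cannot lock a page indefinitely.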

We decided that these features and considerations were the bare minimum needed to meet the design principles, create an engaging task workflow that would inspire volunteers and encourage high-quality transcriptions, and get a start on collecting volunteer-contributed tag data. For transcription, we pursued a peer review workflow that allowed volunteers to see and edit each other’s work; we believed this would further deepen engagement with collections and shape community activity. Tagging was a feature affected by development setbacks. We decided to take a baselining approach at initial launch: gathering and then evaluating tagging data. Therefore, we skipped implementing any tag functionality other than adding tags to and removing tags from images. It was more important to make sure the transcription experience was as smooth as possible while managing our constraints on time and staffing.
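
The transcription and peer review workflow described above can be sketched as a small state machine. The status names, and the rule that a volunteer cannot accept their own submission, are assumptions for illustration rather than documented Concordia behavior.

```python
"""Hypothetical page status workflow with peer review."""
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    NOT_STARTED = "not_started"
    IN_PROGRESS = "in_progress"
    NEEDS_REVIEW = "needs_review"
    COMPLETED = "completed"


@dataclass
class Page:
    status: Status = Status.NOT_STARTED
    text: str = ""
    submitted_by: Optional[str] = None

    def save(self, text: str) -> None:
        """Any volunteer may edit a page until it is completed."""
        if self.status is Status.COMPLETED:
            raise ValueError("completed pages are locked")
        self.text = text
        self.status = Status.IN_PROGRESS
        self.submitted_by = None  # an edit reopens the page for submission

    def submit(self, user: str) -> None:
        """Send an in-progress page to peer review."""
        if self.status is not Status.IN_PROGRESS:
            raise ValueError("only in-progress pages can be submitted")
        self.status = Status.NEEDS_REVIEW
        self.submitted_by = user

    def review(self, reviewer: str, accept: bool) -> None:
        """Accept (complete) or reject (reopen) a submitted page."""
        if self.status is not Status.NEEDS_REVIEW:
            raise ValueError("page is not awaiting review")
        if reviewer == self.submitted_by:
            raise ValueError("volunteers cannot review their own submission")
        self.status = Status.COMPLETED if accept else Status.IN_PROGRESS
```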


Figure 1. An early wireframe representing features determined for minimum viable product launch.

Into development and launch

A note on the development environment: we use Docker to run the database and message broker services supporting the application. Using Docker allows developers to get up and running quickly without having to install, configure, and run a list of prerequisites. The main Concordia application itself can run in development as a Docker container or using the Django “runserver” command. In production, the application is deployed as a Docker container in Amazon Web Services Elastic Container Service. Developers can run on their workstations the exact same Docker image that runs in the Elastic Container Service in production.

We kicked off development by creating tickets for the user stories we considered the core functionality needed for a minimum viable product. As we worked through those tickets sprint by sprint, we identified discrete tasks whose scope was significant enough to warrant their own tickets, and we estimated the level of effort for each using story points. From there, we broke user stories down into tasks small enough to be handled by a single developer in no more than a few days. This allowed us to see tangible movement on the application and measure progress one sprint at a time.

The team worked intensively up to and through launch in late October 2018. In the sprint following launch, we continued to refine basic functionality and improve the core product. In the month and a half after By the People’s public unveiling, we addressed existing workflows and made significant refinements across 10 releases. Along the way, we balanced the minimum viable product approach across our team and with stakeholders; it was often challenging to keep quality high and focus clear while keeping expectations for the near and medium term in check.


Figure 2. The transcription interface at launch, featuring transcribing in progress.

Additionally, we grappled with the realities of intensive launch and stabilization activity, along with post-launch uncertainty. As our pre-launch roles and responsibilities across the product and program team shifted, and as we took steps toward building on the basic functionality available at launch, we identified ways our process could be improved using Agile methods.

As we moved into active and then intensive development, the benefits of Agile became clear. Throughout this period, it was essential for the technical lead and product owner to work closely with the product manager to remain flexible, deliver and assess the product regularly, and continuously weigh the needs of the project team against the circumstances of the product, present risks, and the wider institutional context of many service units across the Library of Congress.

Maintaining and improving

Since launch, we have consistently improved Concordia through 53 releases. The next functionality we needed to address after launch was the review workflow. We took time to consider the experience of review and the ways a volunteer might connect to that task. We gathered feedback via email, social media, surveys, and our discussion forum, History Hub. We also continued to explore how best to design a flexible experience for volunteers, since their motivations may change with activity and over time. We landed on the idea of tracks: a way to support volunteers in engaging more deeply with one activity while still giving them the choice to perform others, including transcription, review, and tagging. This approach would let volunteers explore By the People and the context of items on https://loc.gov while still ensuring a workflow for high-quality transcription and review. We implemented the functionality to support that experience in early 2019. In recent months, we have focused some of our next steps on prototyping a task-based approach, refining and expanding our onboarding, and making data-driven improvements to the tagging feature. We continue to use Agile principles as we lay out our roadmap and determine priorities for these plans.


Figure 3. After accepting a page during review, volunteers are offered the choice to stay within the review experience or to select a new task.

What Worked for the Team

In this section, we would like to share some of the successful elements of our process in each sprint and across the phases of development. Agile ceremonies were key in providing a framework for the team to reach iterative goals. We paid particular attention to the rhythm of Agile ceremonies, hosting design studios, using our design principles, and balancing priorities through consultation leading into sprint grooming.

During development, we used both two-week and three-week sprints. Our regular Agile ceremonies include sprint planning, sprint grooming, and a sprint closeout consisting of a demo and retro, followed by a release. These ceremonies also provided structure for the product team, which made it possible for us to adapt to the conditions and needs of the product and the program.

Within the sprint planning process and during each sprint, the Project Manager assigns tickets to developers. Once a ticket is assigned, the developer creates a branch, does the work on that branch, opens a pull request, and assigns another developer as the reviewer. The reviewer and the developer may go back and forth a few times before the pull request is approved. After that, the branch is merged into master, deployed to the continuous integration environment, and checked by a non-developer member of the product team. This tester may go back and forth with the developer (and possibly the reviewer) until all outstanding issues have been either resolved or documented as issues in the backlog. Once the tester is satisfied, the issue can be closed and is considered done.
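
The ticket lifecycle above can be modeled as a small set of allowed state transitions, which makes the back-and-forth loops with reviewers and testers explicit. The state names are hypothetical labels for the steps described, not the configuration of any particular issue tracker.

```python
"""Hypothetical model of the ticket lifecycle described above."""
from __future__ import annotations

# Allowed moves; back-edges capture the "back and forth" with reviewers and testers.
TRANSITIONS: dict[str, set[str]] = {
    "assigned": {"in_development"},
    "in_development": {"in_code_review"},            # pull request opened
    "in_code_review": {"in_development", "merged"},  # changes requested, or approved
    "merged": {"in_testing"},                        # merged and deployed to CI
    "in_testing": {"in_development", "done"},        # issues found, or tester satisfied
    "done": set(),
}


def advance(state: str, target: str) -> str:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"cannot move ticket from {state!r} to {target!r}")
    return target


def replay(path: list[str]) -> str:
    """Validate a whole sequence of states and return the final one."""
    state = path[0]
    for target in path[1:]:
        state = advance(state, target)
    return state
```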

We closed out sprints with retrospectives, which gave the team space to make necessary adjustments to the process. We approached retrospectives with an improvement mindset: in each retro we identified what went well, what “was what it was,” and what could be improved. We dot voted to surface shared views in each category and discussed those topics in more detail. From there, the Project Manager defined action items; when the retro identified a process needing adjustment, we discussed those items further with the Technical Lead and Product Owner.

Even beyond the formal Agile ceremonies, this project was a true collaboration across service units at the Library of Congress. We managed different expectations, forms of feedback, and the process of development with input from many stakeholders who were not familiar with Agile and who represented different views and goals. Our Agile ceremonies allowed us to more effectively bring stakeholder input to our decision-making and prioritization.

As we moved from implementation to improving the product, we used two related tactics to help move the product forward: design studios and prototyping passes. Design studios are facilitated workshops that aim to bring many ideas to the table. While our design principles and user stories provide a framework for identifying the goals we want to meet, how we meet those goals can be interpreted in many different ways. Design studios not only provide an opportunity to rapidly produce ideas and have them heard by the product team; they also align the team by matching proposed features to our design principles.

For example, volunteers were experiencing friction in the review process and in finding an item to which they could contribute. The team reviewed the current flow of the site to identify where volunteers were encountering blockers. Then the team split into two groups, each asked to collaboratively define ways to remove those blockers. The groups had paper printouts of the site, markers, sticky notes, and blank paper to build a paper prototype. After individual brainstorming and team discussion, each group presented its idea to the other. It turned out the two groups had not only similar core ideas but also complementary features and user flows in mind. We viewed this alignment as representative of our ongoing process of maintaining a shared vision.

Following that meeting, the Product Owner, Technical Lead, and Project Manager consolidated the ideas and decided to build a code prototype. Over three sprints, the development team intensively built versions of the prototype, which prioritizes selecting assets at the task level. We tested and released this new feature in the spring; together with user testing and a survey, the prototype has allowed the By the People team to present collections to volunteers in new ways. Registered volunteers could try the prototype when logged into By the People.
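
The task-level idea can be sketched as a filter that starts from the activity a volunteer chooses (transcribe or review) and then finds matching pages across campaigns, rather than asking the volunteer to drill down through a specific collection first. The status names, the AssetRow fields, and the rule that volunteers are not offered their own submissions for review are assumptions for illustration.

```python
"""Hypothetical task-first asset selection across campaigns."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

# Which page statuses count as available work for each task (assumed names).
TASK_STATUSES = {
    "transcribe": {"not_started", "in_progress"},
    "review": {"needs_review"},
}


@dataclass(frozen=True)
class AssetRow:
    asset_id: str
    campaign: str
    status: str
    submitted_by: Optional[str] = None


def next_assets(rows: Iterable[AssetRow], task: str, user: str,
                campaign: Optional[str] = None, limit: int = 10) -> list[AssetRow]:
    """Return up to `limit` assets ready for `task`, optionally in one campaign."""
    wanted = TASK_STATUSES[task]
    picked: list[AssetRow] = []
    for row in rows:
        if row.status not in wanted:
            continue
        if campaign is not None and row.campaign != campaign:
            continue
        if task == "review" and row.submitted_by == user:
            continue  # assumed rule: do not offer volunteers their own pages to review
        picked.append(row)
        if len(picked) >= limit:
            break
    return picked
```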


Figure 4. A view of the adjustable Activity prototype interface featuring images waiting to be reviewed.

A remote-friendly environment has made a difference for this team. Working this way is flexible, possible from anywhere, and good for morale, and the Concordia team has incorporated it into how we work. It reduces the stress of commuting, gives staff more time to dive deep into technical problems, and provides a streamlined way to connect with one another. While in-person meeting time is still very important, offering a remote option makes participation more accessible and does not shut anyone out of opportunities. To complement this flexibility, we aim to hold backlog grooming, retrospectives, and any design-related workshops in person. By normalizing video conferencing, online chat platforms, and discussion in GitHub, we make it possible for collaboration to happen in many formats.

We also consistently returned to the design principles to help refine expectations, determine product feature prioritization, thoughtfully respond to functionality (or efficiency or productivity) requests, and track the progress of the product.

Assessing Approaches with Agile

We also wish to share details of the approaches that, once applied, were not quite adequate for this project. An Agile methodology allowed us to identify, isolate, and address what didn’t work and create contextual, specific solutions.

Undertaking deep and robust user research remains a challenge with the limited expertise available; however, in consultation with the Library of Congress Design & Development team, the Concordia product team has identified lightweight research mechanisms. Some of the user research approaches we are exploring include:

  • Refining understanding through user surveys
  • Brief in-person interviews on public engagement days
  • Creating a feature branch capability so that we can invite remote participants to provide feedback on functionality, requirements, and task barriers
  • Prototyping prior to full product releases
  • Deep listening to inform product roadmap and prioritization

We also explored the idea of using the publicly available IIIF Presentation API metadata and extending an IIIF viewer such as Mirador. However, not all of the Library of Congress digital collections on www.loc.gov are available in IIIF yet, and we did not want to become responsible for a fork of a major open-source project. In addition, the IIIF data model is focused on the item level, where one item may have many images representing many pages, while we wanted to break up all the pages in an item so that transcription activity could be isolated at the image level rather than the item level. Although we are very interested in making this tool compatible with other data sources, we determined that this functionality was not a must-have at launch and would be expensive to implement up front. It may, however, be a worthwhile trade-off in the near future, and an Agile approach allows us to explore and adjust to that possibility should it arise.
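
To illustrate the item-versus-page distinction, the sketch below flattens an IIIF Presentation API 2.x manifest, where an item's pages are canvases inside a sequence, into one record per page image. It follows the 2.x key names (sequences, canvases, images, resource, @id); Presentation API 3.0 renames several of these (for example, items and id), so the function would need adjusting for 3.0 manifests. The output record fields are hypothetical.

```python
"""Split a IIIF Presentation API 2.x manifest into page-level records."""
from __future__ import annotations

from typing import Any, Optional


def pages_from_manifest(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    pages: list[dict[str, Any]] = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            images = canvas.get("images", [])
            # In 2.x, each image annotation's "resource" points at the image.
            image_url: Optional[str] = images[0]["resource"]["@id"] if images else None
            pages.append({
                "item_id": manifest.get("@id"),
                "sequence": len(pages) + 1,  # running page number across sequences
                "canvas_id": canvas.get("@id"),
                "label": canvas.get("label"),
                "image_url": image_url,
            })
    return pages
```

Each resulting record could then become its own transcribable asset, which is the page-level granularity the team wanted.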

Piloting Workflows and Approaches at the Library of Congress via Concordia

This crowdsourcing tool and program have cleared paths for new technical approaches and project management workflows. Creating this tool at the Library of Congress allowed the product team to articulate coding standards and continually improve communication. Central to this discussion are (a) the design considerations that align the tool with the technical approaches and requirements of the Library of Congress and (b) the ability of Concordia to scale and adapt to new collections and user needs.

We connected a series of existing workflows to accommodate user-generated content, so that new content flows into existing workflows. For this project, collection materials are copied via our API from their publicly available hosted locations on www.loc.gov (which is on-premises) to cloud-based infrastructure. Concordia makes them available on crowd.loc.gov and collects the transcriptions provided by volunteers. Then, completed transcription data is packaged in BagIt format, in a directory structure that mirrors the source content. From there, we send it through our accessioning workflow, where it is preserved in long-term storage and lands back in www.loc.gov backend storage. Finally, the bag is picked up by the www.loc.gov Extract-Transform-Load (ETL) process, and the newly mapped content is made available on our main website.

Concordia continues to evolve through iterative development. At the time of writing, we have created and evaluated an activity-based prototype. As described above, this approach allows volunteers to take action more quickly, especially those who are motivated by transcription or review, or who want to contribute across collections or to filter, rather than drill down through specific content. In the coming months, we plan to apply a concentrated approach to refining tagging. We expect to begin by evaluating data generated in our tagging pilot phase, then engage stakeholders about their ideal uses of tagging through design studios and prototyping, and interview users about the ways in which they respond to the opportunity to tag.

Conclusion

We believe Concordia advances the three core objectives of the Digital Strategy: to throw open the treasure chest, to connect, and to invest in our future. Concordia was created as a user-centered project developed using an Agile methodology. Approaching Concordia’s development in this way was an opportunity to identify and test key software development and project management practices that might be extended to other work across the Library of Congress. With this approach, we are putting investment in our future into practice. We have taken small steps and a high-touch approach to the product, guided by specific, research-based hypotheses but with leeway to evaluate and adjust as we learned. Unpacking our process for Concordia allows us to consider larger questions about services at the Library of Congress. As we look ahead, stakeholders will be called on to consider how we can build scalable platforms that allow us to work with and use the collections instead of merely viewing them.

As we continually improve Concordia, we are extending opportunities to showcase, and invite deeper involvement with, our digital collections. We also seize opportunities to connect a range of audiences, including lifelong learners, educators, students, and researchers, with By the People. We also want to share transparently with peers in other organizations the thinking and practices we used to create Concordia. Since the public launch of By the People in October 2018, more than 11,000 volunteers have registered and provided 232,000 contributions. Over 30,000 pages have been completely transcribed and reviewed (with 58,000 in progress). We continue to return transcriptions for keyword searching, with 6,700 transcribed pages now searchable on https://www.loc.gov. If you fancy it, try a search for Lincoln AND “live like princes”.


Figure 5. Completed transcriptions presented alongside the image in https://loc.gov and attributed to By the People volunteers.

We want to end this discussion by sharing that Concordia is free to reuse and the development process is documented in our Concordia GitHub repository. We invite you to explore whether it can be of use to you. Concordia improves as people participate, use the tool, and provide feedback across our repositories and forums. We hope that this article shares our process of creating Concordia and demonstrates the ways the tool and an Agile approach may be of use to other organizations.

Endnotes

[1] Agile at 18F. Accessed 20 March 2019. Available at https://agile.18f.gov/agile-is-something-you-are/


ISSN 1940-5758