Evaluating HTJ2K as a Drop-In Replacement for JPEG2000 with IIIF

Glen Robson, Stefano Cossu, Ruven Pillay, Michael D. Smith

JPEG2000 is a widely adopted open standard for images in cultural heritage, both for delivering access and for creating preservation files that are losslessly compressed. Recently, a new extension to JPEG2000 has been developed by the JPEG Committee: “High Throughput JPEG2000,” better known as HTJ2K. HTJ2K promises faster encoding and decoding speeds compared to traditional JPEG2000 Part 1, while requiring little or no changes to existing code and infrastructure. The IIIF community has completed a project to evaluate HTJ2K as a drop-in replacement for encoding JPEG2000 and to validate the expected improvements in speed and efficiency.

The group looked at a number of tools that support HTJ2K, including Kakadu, OpenJPEG, and Grok, and ran encoding tests comparing encoding speeds and the disk space required by the resulting images. The group also set up decoding speed tests comparing HTJ2K with tiled pyramid TIFF and traditional JPEG2000 using one of the major open source IIIF image servers, IIPImage.

We found that HTJ2K is significantly faster than traditional JPEG2000, though the results are more nuanced when compared with TIFF.
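
To give a sense of how such encoding comparisons can be scripted, the sketch below times encoder invocations over a directory of test images. The command templates are placeholders, assuming a Kakadu-style CLI in which the HT block coder is enabled with a Cmodes=HT switch; substitute the switches documented for whichever of the evaluated encoders (Kakadu, OpenJPEG, or Grok) is installed.

    # Minimal benchmarking harness (a sketch; encoder command lines are
    # placeholders -- check your encoder's documentation for its HTJ2K switches).
    import pathlib, statistics, subprocess, time

    ENCODERS = {
        "jp2-part1": ["kdu_compress", "-i", "{src}", "-o", "{dst}.jp2"],
        "htj2k":     ["kdu_compress", "-i", "{src}", "-o", "{dst}.jph", "Cmodes=HT"],
    }

    def bench(template, images, runs=3):
        times = []
        for src in images:
            stem = pathlib.Path(src).stem
            cmd = [arg.format(src=src, dst=stem) for arg in template]
            for _ in range(runs):
                t0 = time.perf_counter()
                subprocess.run(cmd, check=True, capture_output=True)
                times.append(time.perf_counter() - t0)
        return statistics.median(times)

    images = sorted(str(p) for p in pathlib.Path("test-images").glob("*.tif"))
    for name, template in ENCODERS.items():
        print(f"{name}: {bench(template, images):.3f}s median per encode")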

Simplifying ARK ID management for persistent access to digital objects

Kyle Huynh, Natkeeran Ledchumykanthan, Kirsta Stapelfeldt, Irfan Rahman

This article provides a brief overview of the considerations that led the UTSC Library, a mid-sized Canadian library, to select a persistent identifier scheme for its digital collections. ARKs were selected for their early support of digital object management, the low-cost, decentralized capabilities of the ARK system, and the usefulness of ARK URLs during system migration projects. In the absence of a subscription to a centralized resolver service for ARKs, the UTSC Library Digital Scholarship Unit built an open source PHP-based application for minting, binding, managing, and tracking ARK IDs. This article introduces the application’s architecture and affordances, which may be useful to others in the library community with similar use cases, as well as the planned approach to using ARKs in an Islandora 2.x system.
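
To make the minting step concrete, below is a minimal sketch of NOID-style identifier generation with a terminal check character, the scheme most ARK minters follow. The NAAN 99999 and shoulder fk4 are the ARK test values; the template is illustrative, not the UTSC application's actual (PHP) implementation.

    # NOID-style ARK minting, sketched in Python.
    import random

    XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"   # NOID "extended digits" alphabet

    def check_char(identifier: str) -> str:
        # weighted sum of character ordinals; characters outside the
        # alphabet (e.g. "/") contribute zero, per the NOID algorithm
        total = sum(pos * max(XDIGITS.find(c), 0)
                    for pos, c in enumerate(identifier, start=1))
        return XDIGITS[total % len(XDIGITS)]

    def mint(naan="99999", shoulder="fk4", blade_len=6):
        blade = "".join(random.choice(XDIGITS) for _ in range(blade_len))
        base = f"{naan}/{shoulder}{blade}"
        return f"ark:/{base}{check_char(base)}"

    print(mint())   # e.g. ark:/99999/fk4x9c2mh3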

The DSA Toolkit Shines Light Into Dark and Stormy Archives

Shawn M. Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, Michael L. Nelson

Themed web archive collections exist to make sense of archived web pages (mementos), but reviewing the documents in a single collection is an expensive proposition: some collections contain hundreds of thousands of mementos, many collections cover the same topic, and few collections on platforms like Archive-It include standardized metadata. Search engines help find individual documents but do not provide an overall understanding of each collection as a whole. Visitors need to understand what individual collections contain so they can evaluate them and compare them to each other. The Dark and Stormy Archives (DSA) Project applies social media storytelling to a subset of a collection to facilitate collection understanding at a glance. As part of this work, we developed the DSA Toolkit, which helps archivists and visitors leverage this capability. As part of our recent International Internet Preservation Consortium (IIPC) grant, Los Alamos National Laboratory (LANL) and Old Dominion University (ODU) piloted the DSA Toolkit with the National Library of Australia (NLA). Collectively we have made numerous improvements, from better handling of NLA mementos to native Linux installers to more approachable web user interfaces. Our goal is to make the DSA approachable for everyone so that end users and archivists alike can apply social media storytelling to web archives.

Closing the Gap between FAIR Data Repositories and Hierarchical Data Formats

Connor B. Bailey, Fedor F. Balakirev, and Lyudmila L. Balakireva

Many in the scientific community, particularly in publicly funded research, are pushing to adhere to more accessible data standards to maximize the findability, accessibility, interoperability, and reusability (FAIR) of scientific data, especially with the growing prevalence of machine-learning-augmented research. Online FAIR data repositories, such as the Open Science Framework (OSF), help facilitate the adoption of these standards by providing frameworks for storage, access, search, APIs, and other features that create organized hubs of scientific data. However, wider acceptance of such repositories is hindered by their lack of support for hierarchical data formats, such as Technical Data Management Streaming (TDMS) and Hierarchical Data Format 5 (HDF5), that many researchers rely on to organize their datasets. Various tools and strategies are needed to allow hierarchical data formats, FAIR data repositories, and scientific organizations to work together more seamlessly. A pilot project at Los Alamos National Laboratory (LANL) addresses this disconnect by integrating the OSF FAIR data repository with hierarchical data renderers, extending support for additional file types in their framework. The multifaceted interactive renderer displays a tree of metadata alongside a table and plot of the data channels in the file, allowing users to quickly and efficiently load large and complex data files directly in the OSF web app. Users browsing files can intuitively see them in the hierarchical form they or their colleagues structured and immediately grasp their contents. This solution helps bridge the gap between hierarchical data storage techniques and FAIR data repositories, making both more viable options for scientific institutions like LANL that have been put off by the lack of integration between them.
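
The heart of such a renderer is a walk over the file's hierarchy. Below is a minimal sketch using h5py, with a hypothetical experiment.h5 and plain-text output standing in for the interactive tree, table, and plot.

    # Walk an HDF5 file and print its metadata tree and data channels.
    import h5py

    def describe(name, obj):
        indent = "  " * name.count("/")
        label = name.split("/")[-1]
        if isinstance(obj, h5py.Dataset):
            print(f"{indent}{label}  shape={obj.shape} dtype={obj.dtype} "
                  f"attrs={dict(obj.attrs)}")
        else:   # a group
            print(f"{indent}{label}/  attrs={dict(obj.attrs)}")

    with h5py.File("experiment.h5", "r") as f:
        print(f"/  attrs={dict(f.attrs)}")   # root-level metadata
        f.visititems(describe)               # depth-first over groups and datasets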

Assessing High-volume Transfers from Optical Media at NYPL

Michelle Rothrock, Alison Rhonemus, and Nick Krabbenhoeft

NYPL’s workflow for transferring optical media to long-term storage was met with a challenge: an acquired collection containing thousands of recordable CDs and DVDs. Many programs take a disc-by-disc approach to imaging or transferring optical media, but to deal with a collection of this size, NYPL developed a workflow using a Nimbie autoloader and a customized version of KBNL’s open-source IROMLAB software to batch discs for transfer. This workflow prioritized quantity, but, at the outset, it was difficult to tell whether every transfer was as accurate as it could be. We discuss the process of evaluating the success of the mass transfer workflow and the improvements we made to identify and troubleshoot errors that could occur during transfer. We give background on the institution and on other institutions’ approaches to similar projects, followed by an in-depth discussion of the process of gathering and analyzing data, and finish with our takeaways from the project.
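
One simple accuracy check of the kind the article discusses is to image a disc twice (or on two drives) and compare checksums. The sketch below illustrates the idea; the paths are hypothetical, and this is not NYPL's actual QC tooling.

    # Compare checksums of two independent reads of the same disc.
    import hashlib

    def sha256(path, chunk=1024 * 1024):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    if sha256("pass1/disc042.iso") == sha256("pass2/disc042.iso"):
        print("disc042: reads match")
    else:
        print("disc042: MISMATCH -- flag for rereading")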

Considered Content: a Design System for Equity, Accessibility, and Sustainability

Erinn Aspinall, Amy Drayer, Gabe Ormsby, and Jen Neveau

The University of Minnesota Libraries developed and applied a principles-based design system to their Health Sciences Library website. With the design system at its center, the revised site achieves accessible, ethical, inclusive, sustainable, responsible, and universal design. The final site was built on Drupal 8 with clean, semantic, accessibility-focused HTML and highly curated, considered content, meeting and exceeding WCAG 2.1 AA guidance and addressing cognitive and learning considerations through plain language, templated pages for consistent page-level organization, and no hidden content. As a result, the site better supports all users regardless of their abilities, attention level, mental state, reading level, or the reliability of their internet connection, all of which are especially critical now as an elevated number of people experience crises, anxiety, and depression.

Robustifying Links To Combat Reference Rot

Shawn Jones, Martin Klein, and Herbert Van de Sompel

Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, provide an overview of existing techniques and their characteristics to address it, and introduce our Robust Links approach, including its web service and underlying API. Robustifying links offers a proactive, uniform, and machine-actionable way to combat reference rot. In addition, we discuss our reasoning and the measures aimed at keeping Robust Links functional for the long term. To showcase our approach, we have robustified all links in this article.
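
In HTML, a robustified link keeps the original URL in href and adds data-versionurl and data-versiondate attributes pointing at an archived snapshot. A minimal sketch of that decoration step (the snapshot URL is illustrative):

    # Decorate a link with Robust Links data attributes.
    import datetime, html

    def robust_link(original, memento, link_text):
        date = datetime.date.today().isoformat()   # when the link was robustified
        return (f'<a href="{html.escape(original, quote=True)}" '
                f'data-versionurl="{html.escape(memento, quote=True)}" '
                f'data-versiondate="{date}">{html.escape(link_text)}</a>')

    print(robust_link(
        "https://example.org/report",
        "https://web.archive.org/web/20240101000000/https://example.org/report",
        "the report"))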

Editorial: For Pandemic Times Such as This

Peter Murray

A pandemic changes the world and changes libraries.

Scaling IIIF Image Tiling in the Cloud

Yinlin Chen, Soumik Ghosh, Tingting Jiang, James Tuttle

The International Archive of Women in Architecture, established at Virginia Tech in 1985, collects books, biographical information, and published materials from nearly 40 countries, organized into around 450 collections. In order to provide public access to these collections, we built an application using the IIIF APIs to pre-generate image tiles and manifests, which are statically served in the AWS cloud. We established an automatic image processing pipeline using a suite of AWS services to implement microservices in Lambda and Docker. By doing so, we reduced the processing time for terabytes of images from weeks to days.

In this article, we describe our serverless architecture design and implementation, elaborate on the technical solution for integrating multiple AWS services and other techniques into the application, and describe our streamlined and scalable approach to handling extremely large image datasets. Finally, we show the significantly improved performance compared to traditional processing architectures, along with a cost evaluation.
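
As a sketch of the per-image microservice, the handler below pulls a source image from S3 and writes a static IIIF tile pyramid with pyvips/libvips. The bucket names, event shape, and tile size are illustrative, not the production configuration.

    # Lambda-style handler: fetch a source image, emit static IIIF tiles.
    import pathlib, tempfile

    import boto3
    import pyvips

    s3 = boto3.client("s3")

    def upload_dir(folder, bucket, prefix):
        # push every generated tile and info.json back to S3
        for path in pathlib.Path(folder).rglob("*"):
            if path.is_file():
                s3.upload_file(str(path), bucket, f"{prefix}/{path.relative_to(folder)}")

    def handler(event, context):
        bucket, key = event["bucket"], event["key"]
        with tempfile.TemporaryDirectory() as tmp:
            src = pathlib.Path(tmp) / pathlib.Path(key).name
            s3.download_file(bucket, key, str(src))
            image = pyvips.Image.new_from_file(str(src), access="sequential")
            out = pathlib.Path(tmp) / src.stem
            image.dzsave(str(out), layout="iiif", tile_size=512)   # static tile pyramid
            upload_dir(out, "tiles-bucket", src.stem)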

“With One Heart”: Agile approaches for developing Concordia and crowdsourcing at the Library of Congress

Meghan Ferriter, Kate Zwaard, Elaine Kamlley, Rosie Storey, Chris Adams, Lauren Algee, Victoria Van Hyning, Jamie Bresner, Abigail Potter, Eileen Jakeway, and David Brunton

In October 2018, the Library of Congress launched its crowdsourcing program By the People. The program is built on Concordia, a transcription and tagging tool developed to power crowdsourced transcription projects. Concordia is open source software designed and developed iteratively at the Library of Congress using Agile methodology and user-centered design. Applying Agile principles allowed us to create a viable product while simultaneously pushing at the boundaries of capability, capacity, and customer satisfaction. In this article, we share more about the process of designing and developing Concordia, including our goals, constraints, successes, and next steps.

Editorial: Just Enough of a Shared Vision

Peter Murray

What makes a vibrant community? A shared vision! When we live into a shared vision, we can accomplish big goals even when our motivations are not completely aligned.

EnviroPi: Taking a DIY Internet-of-Things approach to an environmental monitoring system

Monica Maceli

Monitoring environmental conditions in cultural heritage organizations is vitally important to ensure effective preservation of collections. Environmental monitoring systems range from stand-alone dataloggers to more complex networked systems and can collect a variety of sensor data, such as temperature, humidity, light, or air quality measures. However, such commercial systems are often costly and limited in customizability and extensibility. This article describes a do-it-yourself network of Bluetooth Low Energy-based wireless sensors, built on the Raspberry Pi single-board computer and a series of microcontroller boards, which seeks to manage earlier-identified trade-offs in cost, required technical skill, and maintainability. It builds on the author’s prior work exploring the construction of a low-cost Raspberry Pi-based datalogger, iterating on reviewers’ and practitioners’ feedback to implement and reflect upon suggested improvements.
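
As a sketch of what the central node's polling loop might look like with the bleak BLE library: the sensor address is a placeholder, and the UUID shown is the Bluetooth SIG standard Temperature characteristic, which a given firmware may or may not expose.

    # Poll a BLE sensor and log readings.
    import asyncio, struct
    from bleak import BleakClient

    SENSOR_ADDR = "AA:BB:CC:DD:EE:FF"                    # placeholder address
    TEMP_CHAR = "00002a6e-0000-1000-8000-00805f9b34fb"   # SIG Temperature characteristic

    async def poll(interval=60):
        async with BleakClient(SENSOR_ADDR) as client:
            while True:
                raw = await client.read_gatt_char(TEMP_CHAR)
                temp_c = struct.unpack("<h", raw)[0] / 100   # sint16, units of 0.01 C
                print(f"temperature: {temp_c:.2f} C")        # or append to the datalog
                await asyncio.sleep(interval)

    asyncio.run(poll())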

OneButton: A Link Resolving Application to Guide Users to Optimal Fulfillment Options

Lauren Magnuson, Karl Stutzman, Roger Peters, Noah Brubaker

Like many consortia, institutional members of the Private Academic Library Network of Indiana (PALNI) provide multiple fulfillment options for obtaining requested items. Users can place on-shelf holds on items, or they can request material that isn’t held by their institution through a group circulation resource sharing network (dubbed PALShare) or through traditional InterLibrary Loan (ILL) (via WorldShare ILL or ILLiad). All of these options can be confusing to users who may not understand the best or fastest way to get access to needed materials. A PHP application, OneButton, was developed to replace multiple fulfillment buttons in institutional discovery interfaces with a single OpenURL link. OneButton looks up holdings and availability at a user’s home institution and across the consortium and routes the user to the optimal fulfillment option. If an item is held by and available at their institution, the user can be shown a stack map to help guide them to the item on the shelf; if an item is held by and available at the consortium, the user is routed to a group circulation request form; otherwise, the user is routed to an ILL request form. All routing and processing are handled by the OneButton application – the user doesn’t need to think about what the ‘best’ fulfillment option is. This article will discuss the experiences of one institution using OneButton in production since fall 2017, the analytics data gathered, and how other institutions can adopt the application (freely available on GitHub: https://github.com/PALNI/onebutton).
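
The routing decision at OneButton's core can be sketched in a few lines; the production application is PHP, and the holdings data and URL templates below are illustrative.

    # OneButton-style fulfillment routing, re-sketched in Python.
    from dataclasses import dataclass

    @dataclass
    class Holding:
        held: bool
        available: bool

    def route(local: Holding, consortial: Holding, item_id: str) -> str:
        if local.held and local.available:
            return f"/stackmap?item={item_id}"           # guide user to the shelf
        if consortial.held and consortial.available:
            return f"/palshare/request?item={item_id}"   # group circulation form
        return f"/ill/request?item={item_id}"            # traditional ILL

    print(route(Holding(False, False), Holding(True, True), "ocn123456789"))
    # -> /palshare/request?item=ocn123456789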

Assessing the Potential Use of High Efficiency Video Coding (HEVC) and High Efficiency Image File Format (HEIF) in Archival Still Images

Michael J. Bennett

Both HEVC (ISO/IEC 23008-2) video compression and the HEIF (ISO/IEC 23008-12) wrapper format are relatively new and evolving standards. Though attention has been given to their recent adoption as a JPEG replacement for more efficient local still image use on consumer electronic devices, the standards are written to encompass far broader potential application. This study examines current HEVC and HEIF tools, and the standards’ possible value in the context of digital still image archiving in cultural heritage repositories.
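
For instance, a mathematically lossless HEVC encoding of a master TIFF can be produced with ffmpeg's libx265; wrapping the result in a HEIF container is a separate step handled by tools such as libheif. A sketch (verify the flags against your build):

    # Lossless HEVC encode of a still image via ffmpeg/libx265.
    import subprocess

    def encode_lossless_hevc(src_tiff, dst):
        subprocess.run([
            "ffmpeg", "-y", "-i", src_tiff,
            "-c:v", "libx265",
            "-x265-params", "lossless=1",   # mathematically lossless mode
            dst,
        ], check=True)

    encode_lossless_hevc("master_scan.tif", "master_scan_hevc.mp4")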

A Practical Starter Guide on Developing Accessible Websites

Cynthia Ng and Michael Schofield

There is growing concern about the accessibility of online content and services provided by libraries and public institutions. While many articles cover legislation, general benefits, and common surface-level opportunities to improve web accessibility (e.g., alt tags), few articles discuss web accessibility in more depth, and when they do, they are typically not specific to library web services. This article is meant to fill that gap by providing practical best practices and code.

Recount: Revisiting the 42nd Canadian Federal Election to Evaluate the Efficacy of Retroactive Tweet Collection

Anthony T. Pinter and Ben Goldman

In this paper, we report the development and testing of a methodology for collecting tweets from periods beyond the Twitter API’s seven-to-nine-day limit. To accomplish this, we used Twitter’s advanced search feature to find tweets older than that limit, and then used JavaScript to automatically scan the resulting webpage for tweet IDs. These IDs were then rehydrated (their tweet metadata retrieved) using twarc. To examine the efficacy of this method for retrospective collection, we revisited the case study of the 42nd Canadian Federal Election. Comparing the two datasets, we found that our methodology does not produce results as robust as real-time streaming, but that it might be useful as a starting point for researchers or collectors. We close by discussing the implications of these findings.
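
Rehydration with twarc follows its documented usage. A sketch, with credentials elided and ids.txt holding one tweet ID per line:

    # Rehydrate tweet IDs into full tweet JSON with twarc.
    from twarc import Twarc

    t = Twarc(consumer_key="...", consumer_secret="...",
              access_token="...", access_token_secret="...")

    with open("ids.txt") as ids:
        for tweet in t.hydrate(ids):
            print(tweet["id_str"], tweet["user"]["screen_name"])

Tweets that have since been deleted or protected simply do not come back from hydration, which is one reason retrospective collection is less robust than real-time streaming.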

Editorial: Reflecting on the success and risks to the Code4Lib Journal

Peter E. Murray

At the Code4Lib 2017 conference, I gave a short lightning talk about the Code4Lib Journal and in the process realized that we will soon be closing out the 10th year of "foster[ing] community and share[ing] information among those interested in the intersection of libraries, technology, and the future." That quote comes from the Code4Lib Journal […]

Developing an online platform for gamified library instruction

Jared Cowing

Gamification is a concept that has been catching fire for a while now in education, particularly in libraries. This article describes a pilot effort to create an online gamified platform for use in the Woodbury University Library’s information literacy course. The objectives of this project were both to increase student engagement and learning, and to serve as an opportunity for myself to further develop my web development skills. The platform was developed using the CodeIgniter web framework and consisted of several homework exercises ranging from a top-down two-dimensional library exploration game to a tutorial on cleaning up machine-generated APA citations. This article details the project’s planning and development process, the gamification concepts that helped guide the conceptualization of each exercise, reflections on the platform’s implementation in four course sections, and aspirations for the future of the project. It is hoped that this article will serve as an example of the opportunities–and challenges–that await both librarians and instructors who wish to add coding to their existing skill set.

OSS4EVA: Using Open-Source Tools to Fulfill Digital Preservation Requirements

Janet Carleton, Heidi Dowding, Marty Gengenbach, Blake Graham, Sam Meister, Jessica Moran, Shira Peltzman, Julie Seifert, and Dorothy Waugh

This paper builds on the findings of a workshop held at the 2015 International Conference on Digital Preservation (iPRES) entitled “Using Open-Source Tools to Fulfill Digital Preservation Requirements” (OSS4PRES hereafter). This day-long workshop brought together participants from across the library and archives community, including practitioners, proprietary vendors, and representatives from open-source projects. The resulting conversations were surprisingly revealing: while OSS’s significance within the preservation landscape was made clear, participants noted that a number of roadblocks discourage or altogether prevent its use in many organizations. Overcoming these challenges will be necessary to further widespread, sustainable OSS adoption within the digital preservation community. This article mines the rich discussions that took place at OSS4PRES to (1) summarize the workshop’s key themes and major points of debate, (2) provide a comprehensive analysis of the opportunities, gaps, and challenges that using OSS entails at a philosophical, institutional, and individual level, and (3) offer a tangible set of recommendations for future work designed to broaden community engagement and enhance the sustainability of open source initiatives, drawing on both participants’ experience and additional research.

An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter

Nick Ruest and Ian Milligan

This article examines the tools, approaches, collaboration, and findings of the Web Archives for Historical Research Group around the capture and analysis of about 4 million tweets during the 2015 Canadian Federal Election. We hope that national libraries and other heritage institutions will find our model useful as they consider how to capture, preserve, and analyze ongoing events using Twitter.

While Twitter is not a representative sample of broader society – Pew Research Center’s study of US users shows that it skews young, college-educated, and affluent (above $50,000 household income) – Twitter still represents an exponential increase in the amount of information generated, retained, and preserved from 'everyday' people. Therefore, when historians study the 2015 federal election, Twitter will be a prime source.

On August 3, 2015, the team initiated both a Search API and Stream API collection with twarc, a tool developed by Ed Summers, using the hashtag #elxn42. The hashtag referred to the election being Canada's 42nd general federal election (hence 'election 42' or elxn42). Data collection ceased on November 5, 2015, the day after Justin Trudeau was sworn in as the 42nd Prime Minister of Canada. We collected for a total of 102 days, 13 hours and 50 minutes.
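
In twarc, the two collection modes look roughly like this (a sketch with credentials elided; in practice the two collectors run as separate long-lived processes):

    # Collect #elxn42 via the Stream API and the Search API with twarc.
    import json
    from twarc import Twarc

    t = Twarc("...", "...", "...", "...")   # four OAuth credentials, elided

    # Stream API: collect tweets live as they are posted
    with open("elxn42-stream.jsonl", "a") as out:
        for tweet in t.filter(track="#elxn42"):
            out.write(json.dumps(tweet) + "\n")

    # Search API: backfill the recent past (roughly the previous 7-9 days)
    with open("elxn42-search.jsonl", "a") as out:
        for tweet in t.search("#elxn42"):
            out.write(json.dumps(tweet) + "\n")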

To analyze the data set, we took advantage of a number of command line tools and utilities available within twarc, twarc-report, and jq. In accordance with the Twitter Developer Agreement & Policy, and after ethical deliberations discussed below, we made the tweet IDs and other derivative data available in a data repository. This allows other people to use and cite our dataset and to enhance their own research projects by drawing on #elxn42 tweets.
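
Sharing proceeds by "dehydrating" the collected JSON down to bare tweet IDs, which Twitter's terms permit redistributing; a sketch of the idea:

    # Reduce collected tweets to one ID per line for redistribution.
    import json

    with open("elxn42-stream.jsonl") as src, open("elxn42-ids.txt", "w") as out:
        for line in src:
            out.write(json.loads(line)["id_str"] + "\n")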

Our analytics included:

  • breaking tweet text down by day to track change over time (sketched in code after this list);
  • client analysis, allowing us to see how the prevalence of mobile devices shaped interactions with the medium;
  • URL analysis, comparing both to Archive-It collections and the Wayback Availability API to add to our understanding of crawl completeness;
  • and image analysis, using an archive of extracted images.
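
A sketch of the first analysis above, re-expressed in Python (we used jq and shell utilities; the counting logic is the same):

    # Count collected tweets per day.
    import json
    from collections import Counter
    from datetime import datetime

    per_day = Counter()
    with open("elxn42-stream.jsonl") as src:
        for line in src:
            created = datetime.strptime(json.loads(line)["created_at"],
                                        "%a %b %d %H:%M:%S %z %Y")
            per_day[created.date()] += 1

    for day in sorted(per_day):
        print(day, per_day[day])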

Our article introduces our collecting work and ethical considerations, presents the analysis we have done, and provides a framework for other collecting institutions to do similar work with off-the-shelf open-source tools. We conclude by considering how Twitter archiving can connect with a broader web archiving strategy.

Open Journal Systems and Dataverse Integration: Helping Journals to Upgrade Data Publication for Reusable Research

Micah Altman, Eleni Castro, Mercè Crosas, Philip Durbin, Alex Garnett, and Jen Whitney

This article describes novel open source tools for open data publication in open access journal workflows. These comprise a plugin for Open Journal Systems that supports a data submission, citation, review, and publication workflow, and an extension to the Dataverse system that provides a standard deposit API. We describe the function and design of these tools, provide examples of their use, and summarize their initial reception. We conclude by discussing future plans and potential impact.
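
Dataverse's deposit API is SWORDv2-based, so a metadata deposit can be sketched as an Atom entry POSTed over HTTP. The host, dataverse alias, token, and metadata below are placeholders; check the endpoint path against the current Dataverse Data Deposit API documentation.

    # SWORDv2 deposit of dataset metadata to a Dataverse collection.
    import requests

    HOST, ALIAS, TOKEN = "demo.dataverse.org", "mydv", "xxxx-xxxx"
    ENTRY = """<?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:dcterms="http://purl.org/dc/terms/">
      <dcterms:title>Replication data for: Example Article</dcterms:title>
      <dcterms:creator>Doe, Jane</dcterms:creator>
      <dcterms:description>Data underlying the article.</dcterms:description>
    </entry>"""

    r = requests.post(
        f"https://{HOST}/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/{ALIAS}",
        data=ENTRY.encode(),
        headers={"Content-Type": "application/atom+xml"},
        auth=(TOKEN, ""),   # API token as username, blank password
    )
    print(r.status_code, r.headers.get("Location"))   # Location: the new dataset's edit URI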

Collecting and Describing University-Generated Patents in an Institutional Repository: A Case Study from Rice University

Linda Spiro and Scott Carlson

Providing an easy method of browsing a university’s patent output can free up valuable research time for faculty, students, and external researchers. This is especially true for Rice University’s Fondren Library, a USPTO-designated Patent and Trademark Resource Center that serves an academic community widely recognized for cutting-edge science and engineering research. In order to make Rice-generated patents easier for the university community to find, a team of technical and public services librarians from Fondren Library devised a method to identify, download, and upload patents to the university’s institutional repository, starting with a backlog of over 300. This article discusses the rationale behind the project, its potential benefits, and the challenges encountered as new Rice-generated patents are added to the repository on a monthly basis.

Connecting Historical and Digital Frontiers: Enhancing Access to the Latah County Oral History Collection Utilizing OHMS (Oral History Metadata Synchronizer) and Isotope

Devin Becker and Erin Passehl-Stoddart

The University of Idaho Library received a donation of oral histories in 1987 that were conducted and collected by a local county historical society in the 1970s. The audio cassettes and transcriptions were digitized in 2013 and 2014, producing one of the largest digital collections of oral histories – over 300 interviews and over 569 hours – in the Pacific Northwest. To provide enhanced access to the collection, the Digital Initiatives Department employed an open-source plug-in called the Oral History Metadata Synchronizer (OHMS) – an XML- and PHP-driven system that was created at the Louie B. Nunn Center for Oral History at the University of Kentucky Libraries – to deliver the audio MP3 files together with their indexes and transcripts. OHMS synchronizes the transcribed text with timestamps in the audio and provides a viewer that connects search results of a transcript to the corresponding moment in the audio file. This article will discuss how we created the infrastructure by importing existing metadata, customized the interface and visual presentation by creating additional levels of access using complex XML files, enhanced descriptions using the Getty Art and Architecture Thesaurus for keywords and subjects, and tagged locations discussed in the interviews that were later connected to Google Maps via latitude and longitude coordinates. We will also discuss the implementation of and philosophy behind our use of the layout library Isotope as the primary point of access to the collection. The Latah County Oral History Collection is one of the first successful digital collections created using the OHMS system outside of the University of Kentucky.
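
The synchronization idea at the core of OHMS, timestamps tying transcript segments to audio offsets, reduces to a small sketch. The XML below is a simplified stand-in, not the actual OHMS schema.

    # Jump from a transcript search hit to its moment in the audio.
    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <interview audio="latah_001.mp3">
      <segment start="0">Well, my family homesteaded near Moscow...</segment>
      <segment start="60">The mill opened the year after that...</segment>
    </interview>""")

    def seek_for(query):
        # return the audio offset (seconds) of the first matching segment
        for seg in doc.iter("segment"):
            if query.lower() in seg.text.lower():
                return int(seg.get("start"))

    print(seek_for("mill"))   # -> 60; the viewer seeks the player here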

Using SemanticScuttle for managing lists of recommended resources on a library website

Tomasz Neugebauer, Pamela Carson, and Stephen Krujelskis

Concordia University Libraries has adopted SemanticScuttle, an open source, locally hosted PHP/MySQL application for social bookmarking, as an alternative to Delicious for managing lists of recommended resources on the library’s website. Two implementations for displaying feed content from SemanticScuttle were developed: (1) using the Google Feed API and (2) using direct SQL access to SemanticScuttle’s database.
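
The direct-SQL implementation can be sketched as a query against SemanticScuttle's bookmark tables, re-expressed here in Python. The table and column names follow the application's sc_ convention but should be verified against the installed schema.

    # Fetch bookmarks carrying a given tag straight from the database.
    import pymysql

    conn = pymysql.connect(host="localhost", user="scuttle",
                           password="...", database="scuttle")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT b.bTitle, b.bAddress
            FROM sc_bookmarks b
            JOIN sc_bookmarks2tags t ON t.bId = b.bId
            WHERE t.tag = %s
            ORDER BY b.bDatetime DESC""", ("recommended-biology",))
        for title, url in cur.fetchall():
            print(title, url)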

Using Zapier with Trello for Electronic Resources Troubleshooting Workflow

Meghan Finch

Troubleshooting access problems is an important part of the electronic resources management workflow. This article discusses an opportunity to streamline and track troubleshooting using two web-based services: Trello and Zapier.

EgoSystem: Where are our Alumni?

James Powell, Harihar Shankar, Marko Rodriguez, Herbert Van de Sompel

Comprehensive social search on the Internet remains an unsolved problem. Social networking sites tend to be isolated from each other, and the information they contain is often not fully searchable outside the confines of the site. EgoSystem, developed at Los Alamos National Laboratory (LANL), explores the problems associated with automated discovery of public online identities for people, and the aggregation of the social, institutional, conceptual, and artifact data connected to these identities. EgoSystem starts with basic demographic information about former employees and uses that information to locate person identities in various popular online systems. Once identified, their respective social networks, institutional affiliations, artifacts, and associated concepts are retrieved and linked into a graph containing other found identities. This graph is stored in a Titan graph database and can be explored using the Gremlin graph query/traversal language and with the EgoSystem Web interface.
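
To give a flavor of Gremlin traversals over such a graph, here is a sketch using the gremlinpython client (Titan itself predates this client, and the vertex label and property names are illustrative):

    # Find co-authors of a person in the identity graph.
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    g = traversal().withRemote(
        DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

    names = (g.V().has("person", "name", "Jane Doe")
              .out("authored").in_("authored")   # artifacts, then their other authors
              .dedup().values("name").toList())
    print(names)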

For Video Streaming/Delivery: Is HTML5 the Real Fix?

Elías Tzoc and John Millard

The general movement towards streaming or playing videos on the web has grown exponentially in the last decade. The combination of new streaming technologies and faster Internet connections continues to provide an enhanced and robust user experience for video content. For many organizations, adding videos to their websites has transitioned from a “cool” feature to a mission-critical service. The benefits of putting videos online include engaging and converting visitors, raising awareness or driving interest, and sharing inspirational stories or unique recent events. Yet alongside the growth in the use of and need for video content on the web, delivering videos online remains a messy activity for developers and web teams. Examples of existing challenges include creating more accessible videos with captions and delivering content (using adaptive streaming) to the diverse range of mobile and tablet devices. In this article, we report on the decision-making and early results in using the Kaltura video platform in two popular library platforms: CONTENTdm and DSpace.

Actions Speak Louder than Words: Analyzing large-scale query logs to improve the research experience

Ted Diamond, Susan Price, Raman Chandrasekar

Analyzing anonymized query and click-through logs leads to a better understanding of user behaviors and intentions, and provides opportunities to create an improved search experience. As a large-scale provider of SaaS services that returns search results against a single unified index, Serials Solutions is uniquely positioned to learn from the dataset of queries issued to its Summon® service by millions of users at hundreds of libraries around the world.

In this paper, we describe the Relevance Metrics Framework that we use to analyze our query logs and provide examples of insights we have gained during development and implementation. We also highlight the ways our analysis is inspiring changes to the Summon® service to improve the academic research experience.
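
As one illustration of the kind of measure such a framework computes, the sketch below calculates mean reciprocal rank from first-click positions; the log format is invented for the example.

    # Mean reciprocal rank over anonymized first-click positions.
    def mean_reciprocal_rank(first_clicks):
        # first_clicks: 1-based result position of the first click per
        # query; None means the user clicked nothing
        return sum(1.0 / pos for pos in first_clicks if pos) / len(first_clicks)

    sessions = [1, 3, None, 2, 1, None, 5]
    print(f"MRR = {mean_reciprocal_rank(sessions):.3f}")   # -> 0.433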

Renewing UPEI’s Institutional Repository: New Features for an Islandora-based Environment

Donald Moses, Kirsta Stapelfeldt

In October of 2012, the University of Prince Edward Island (UPEI) launched an updated version of IslandScholar, UPEI’s institutional repository. The repository, available from http://www.islandscholar.ca, is built on Islandora 6 (http://islandora.ca). It offers a number of new features, including CSL integration for ingest, site display, and export of user-specific bibliographies; MADS-based authority integration for departments and authors (with authorities created automatically using LDAP); batch ingest from RefWorks (crosswalked to MODS for storage in the repository); and embargo and statistics functions. Features from the first version of IslandScholar were also migrated to the new site, including Sherpa/Romeo integration (which provides just-in-time information about open access policies).

Editorial Introduction: It is Volunteers All the Way Down…

Peter Murray

A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast collection of stars called our galaxy. At the end of the lecture, a little […]

ISSN 1940-5758