Issue 51, 2021-06-14

Editorial: Closer to 100 than to 1

Edward M. Corrado

With the publication of Issue 51, the Code4Lib Journal is now closer to Issue 100 than we are to Issue 1. Also, we are developing a name change policy.

Adaptive Digital Library Services: Emergency Access Digitization at the University of Illinois at Urbana-Champaign During the COVID-19 Pandemic

Kyle R. Rimkus, Alex Dolski, Brynlee Emery, Rachael Johns, Patricia Lampron, William Schlaack, Angela Waarala

This paper describes how the University of Illinois at Urbana-Champaign Library provided access to circulating library materials during the 2020 COVID-19 pandemic. Specifically, it details how the library adapted existing staff roles and digital library infrastructure to offer on-demand digitization of and limited online access to library collection items requested by patrons working in a remote teaching and learning environment. The paper also provides an overview of the technology used, details how dedicated staff with strong local control of technology were able to scale up a university-wide solution, reflects on lessons learned, and analyzes nine months of usage data to shed light on library patrons’ changing needs during the pandemic.

Assessing High-volume Transfers from Optical Media at NYPL

Michelle Rothrock, Alison Rhonemus, and Nick Krabbenhoeft

NYPL’s workflow for transferring optical media to long-term storage was met with a challenge: an acquisition of a collection containing thousands of recordable CDs and DVDs. Many programs take a disk-by-disk approach to imaging or transferring optical media, but to deal with a collection of this size, NYPL developed a workflow using a Nimbie AutoLoader and a customized version of KBNL’s open-source IROMLAB software to batch disks for transfer. This workflow prioritized quantity, but, at the outset, it was difficult to tell if every transfer was as accurate as it could be. We discuss the process of evaluating the success of the mass transfer workflow, and the improvements we made to identify and troubleshoot errors that could occur during the transfer. A background of the institution and other institutions’ approaches to similar projects is given, then an in-depth discussion of the process of gathering and analyzing data. We finish with a discussion of our takeaways from the project.

Better Together: Improving the Lives of Metadata Creators with Natural Language Processing

Paul Kelly

DC Public Library has long held digital copies of the full run of local alternative weekly, Washington City Paper, but had no official status as a rights grantor to enable use. That recently changed due to a full agreement being reached with the publisher. One condition of that agreement, however, was that issues become available with usable descriptive metadata and subject access in time to celebrate the upcoming 40th anniversary of the publication, which at that time was in six months.

One of the most time intensive tasks our metadata specialists work on is assigning description to digital objects. This paper details how we applied Python’s Natural Language Toolkit and OpenRefine’s reconciliation functions to the collection’s OCR text to simplify subject selection for staff with no background in programming.

Choose Your Own Educational Resource: Developing an Interactive OER Using the Ink Scripting Language

Stewart Baker

Learning games are games created with the purpose of educating, as well as entertaining, players. This article describes the potential of interactive fiction (IF), a type of text-based game, to serve as learning games. After summarizing the basic concepts of interactive fiction and learning games, the article describes common interactive fiction programming languages and tools, including Ink, a simple markup language that can be used to create choice based text games that play in a web browser. The final section of the article includes code putting the concepts of Ink, interactive fiction, and learning games into action using part of an interactive OER created by the author in December of 2020.

Enhancing Print Journal Analysis for Shared Print Collections

Dana Jemison, Lucy Liu, Anna Striker, Alison Wohlers, Jing Jiang, and Judy Dobry

The Western Regional Storage Trust (WEST), is a distributed shared print journal repository program serving research libraries, college and university libraries, and library consortia in the Western Region of the United States. WEST solicits serial bibliographic records and related holdings biennially, which are evaluated and identified as candidates for shared print archiving using a complex collection analysis process. California Digital Library’s Discovery & Delivery WEST operations team (WEST-Ops) supports the functionality behind this collection analysis process used by WEST program staff (WEST-Staff) and members.

For WEST, proposals for shared print archiving have been historically predicated on what is known as an Ulrich’s journal family, which pulls together related serial titles, for example, succeeding and preceding serial titles, their supplements, and foreign language parallel titles. Ulrich’s, while it has been invaluable, proves problematic in several ways, resulting in the approximate omission of half of the journal titles submitted for collection analysis.

Part of WEST’s effectiveness in archiving hinges upon its ability to analyze local serials data across its membership as holistically as possible. The process that enables this analysis, and subsequent archiving proposals, is dependent on Ulrich’s journal family, for which ISSN has been traditionally used to match and cluster all related titles within a particular family. As such, the process is limited in that many journals have never been assigned ISSNs, especially older publications, or member bibliographic records may lack an ISSN(s), though the ISSN may exist in an OCLC primary record.

Building a mechanism for matching on ISSNs that goes beyond the base set of primary, former, and succeeding titles, expands the number of eligible ISSNs that facilitate Ulrich’s journal family matching. Furthermore, when no matches in Ulrich’s can be made based on ISSN, other types of control numbers within a bibliographic record may be used to match with records that have been previously matched with an Ulrich’s journal family via ISSN, resulting in a significant increase in the number of titles eligible for collection analysis.

This paper will discuss problems in Ulrich’s journal family matching, improved functional methodologies developed to address those problems, and potential strategies to improve in serial title clustering in the future.

How We Built a Spatial Subject Classification Based on Wikidata

Adrian Pohl

From the fall of 2017 to the beginning of 2020 a project had been carried out to upgrade spatial subject indexing in North Rhine-Westphalian Bibliography (NWBib) from uncontrolled strings to controlled values. For this purpose, a spatial classification with around 4,500 entries was created from Wikidata and published as SKOS (Simple Knowledge Organization System) vocabulary. The article gives an overview over the initial problem and outlines the different implementation steps.

Institutional Data Repository Development, a Moving Target

Colleen Fallaw, Genevieve Schmitt, Hoa Luong, Jason Colwell, and Jason Strutz

At the end of 2019, the Research Data Service (RDS) at the University of Illinois at Urbana-Champaign (UIUC) completed its fifth year as a campus-wide service. In order to gauge the effectiveness of the RDS in meeting the needs of Illinois researchers, RDS staff developed a five-year review consisting of a survey and a series of in-depth focus group interviews. As a result, our institutional data repository developed in-house by University Library IT staff, Illinois Data Bank, was recognized as the most useful service offering by our unit. When launched in 2016, storage resources and web servers for Illinois Data Bank and supporting systems were hosted on-premises at UIUC. As anticipated, researchers increasingly need to share large, and complex datasets. In a responsive effort to leverage the potentially more reliable, highly available, cost-effective, and scalable storage accessible to computation resources, we migrated our item bitstreams and web services to the cloud. Our efforts have met with success, but also with painful bumps along the way. This article describes how we supported data curation workflows through transitioning from on-premises to cloud resource hosting. It details our approaches to ingesting, curating, and offering access to dataset files up to 2TB in size–which may be archive type files (e.g., .zip or .tar) containing complex directory structures.

On the Nature of Extreme Close-Range Photogrammetry: Visualization and Measurement of North African Stone Points

Michael J. Bennett

Image acquisition, visualization, and measurement are examined in the context of extreme close-range photogrammetric data analysis. Manual measurements commonly used in traditional stone artifact investigation are used as a starting point to better gauge the usefulness of high-resolution 3D surrogates and the flexible digital tool sets that can work with them. The potential of various visualization techniques are also explored in the context of future teaching, learning, and research in virtual environments.

Optimizing Elasticsearch Search Experience Using a Thesaurus

Emmanuel Di Pretoro, Edwin De Roock, Wim Fremout, Erik Buelinckx, Stephanie Buyle, Véronique Van der Stede

The Belgian Art Links and Tools (BALaT) (http://balat.kikirpa.be/) is the continuously expanding online documentary platform of the Royal Institute for Cultural Heritage (KIK-IRPA), Brussels (Belgium). BALaT contains over 750,000 images of KIK-IRPA’s unique collection of photo negatives on the cultural heritage of Belgium, but also the library catalogue, PDFs of articles from KIK-IRPA’s Bulletin and other publications, an extensive persons and institutions authority list, and several specialized thematic websites, each of those collections being multilingual as Belgium has three official languages. All these are interlinked to give the user easy access to freely available information on the Belgian cultural heritage. During the last years, KIK-IRPA has been working on a detailed and inclusive data management plan. Through this data management plan, a new project HESCIDA (Heritage Science Data Archive) will upgrade BALaT to BALaT+, enabling access to searchable registries of KIK-IRPA datasets and data interoperability. BALaT+ will be a building block of DIGILAB, one of the future pillars of the European Research Infrastructure for Heritage Science (E-RIHS), which will provide online access to scientific data concerning tangible heritage, following the FAIR-principles (Findable-Accessible-Interoperable-Reusable). It will include and enable access to searchable registries of specialized digital resources (datasets, reference collections, thesauri, ontologies, etc.). In the context of this project, Elasticsearch has been chosen as the technology empowering the search component of BALaT+. An essential feature of this search functionality of BALaT+ is the need for linguistic equivalencies, meaning a term query in French should also return the matching results containing the equivalent term in Dutch. Another important feature is to offer a mechanism to broaden the search with elements of more precise terminology: a term like “furniture” could also match records containing chairs, tables, etc. This article will explain how a thesaurus developed in-house at KIK-IRPA was used to obtain these functionalities, from the processing of that thesaurus to the production of the configuration needed by Elasticsearch.

Pythagoras: Discovering and Visualizing Musical Relationships Using Computer Analysis

Brandon Bellanti

This paper presents an introduction to Pythagoras, an in-progress digital humanities project using Python to parse and analyze XML-encoded music scores. The goal of the project is to use recurring patterns of notes to explore existing relationships among musical works and composers.

An intended outcome of this project is to give music performers, scholars, librarians, and anyone else interested in digital humanities new insights into musical relationships as well as new methods of data analysis in the arts.