Issue 60, 2025-04-14

Editorial

Mark Swenson

Welcome to the 60th issue of Code4Lib Journal. We hope that you enjoy the assortment of articles we have assembled for this issue.

Quality Control Automation for Student-Driven Digitization Workflows

Corinne Chatnik and James Gaskell

At Union College Schaffer Library, the digitization lab is staffed mostly by undergraduates who work only a handful of hours a week. While they do a great job, the infrequency of their work hours and their lack of experience result in errors in digitization and metadata. Many of these errors are difficult to catch during quality control checks because they are so minute, such as a miscounted page number here or a transposed character in a filename there. So a Computer Science student and a librarian collaborated to create a quality control automation application for the digitization workflow. The application is written in Python and relies heavily on the Openpyxl library to check the metadata spreadsheet and compare the metadata with the digitized files. This article discusses the purpose and theory behind the quality control application, how hands-on experience with the digitization workflow informs automation, the methodology, and the user interface decisions. The goal is for the application to be usable by other students and staff and to be built into the workflow in the future. The collaboration was an experiential learning opportunity that strengthened the student's ability to apply what they have learned in class to a real-world problem.
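The kind of check described here can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' code: for a dependency-free sketch it reads expected filenames from a CSV export of the metadata spreadsheet (the article's application uses the Openpyxl library to read the spreadsheet directly), and the column name `filename` is an assumption.

```python
import csv
from pathlib import Path

def check_filenames(metadata_csv, scan_dir):
    """Compare filenames listed in a metadata export against files on disk.

    Returns (missing, extra): names in the metadata but not on disk,
    and files on disk that no metadata row accounts for.
    """
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        expected = {row["filename"] for row in csv.DictReader(f)}
    actual = {p.name for p in Path(scan_dir).iterdir() if p.is_file()}
    return sorted(expected - actual), sorted(actual - expected)
```

A transposed character in a filename (say, `page_012.tif` saved as `page_021.tif`) surfaces as one "missing" and one "extra" entry, which is exactly the class of minute error the abstract notes is hard to spot by eye.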

OpenWEMI: A Minimally Constrained Vocabulary for Work, Expression, Manifestation, and Item

Karen Coyle

The Dublin Core Metadata Initiative has published a minimally constrained vocabulary for the concepts of Work, Expression, Manifestation, and Item (WEMI) that can support the use of these concepts in metadata describing any type of created resource. These concepts were originally defined for library catalog metadata and did not anticipate uses outside of that application. Employment of the concepts in non-library applications is evidence that, once freed from the constraints necessitated by the library-specific use, the concepts are useful to a wider variety of metadata users.

Taming the Generative AI Wild West: Integrating Knowledge Graphs in Digital Library Systems

Jennifer D’Souza

Since the 17th century, scientific publishing has been document-centric, leaving knowledge—such as methods and best practices—largely unstructured and not easily machine-interpretable, despite digital availability. Traditional practices reduce content to keyword indexes, masking richer insights. Advances in semantic technologies, like knowledge graphs, can enhance the structure of scientific records, addressing challenges in a research landscape where millions of contributions are published annually, often as pseudo-digitized PDFs. As a case in point, generative AI Large Language Models (LLMs) like OpenAI’s GPT and Meta AI’s Llama exemplify rapid innovation, yet critical information about LLMs remains scattered across articles, blogs, and code repositories. This highlights the need for knowledge-graph-based publishing to make scientific knowledge truly FAIR (Findable, Accessible, Interoperable, Reusable). This article explores semantic publishing workflows, enabling structured descriptions and comparisons of LLMs that support automated research insights—similar to product descriptions on e-commerce platforms. Demonstrated via the Open Research Knowledge Graph (ORKG) platform, a flagship project of the TIB Leibniz Information Centre for Science & Technology and University Library, this approach transforms scientific documentation into machine-actionable knowledge, streamlining how research is accessed, updated, searched, and compared.

Gamifying Information Literacy: Using Unity and GitHub to Collaborate on a Video Game for the Library

Halie Kerns and Leah Fitzgerald

Gamification, as a way to engage students in the library, has been a topic explored by librarians for many years. In this article, two librarians at a small rural academic library describe their year-long collaboration with students from a Game Design Program to create a single-player pixel-art video game designed to teach information literacy skills asynchronously. The project was accomplished using the game engine Unity and utilizing GitHub for project management. Outlined are the project’s inspiration, management, team structure, and outcomes. Not only did the project serve to instruct, but it was also meant to test the campus’ appetite for digital scholarship projects. While the project ended with mixed results, it is presented here as an example of how innovation can grow a campus’ digital presence, even in resistant libraries.

Large Language Models for Machine-Readable Citation Data: Towards an Automated Metadata Curation Pipeline for Scholarly Journals

Aerith Y. Netzer

Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can efficiently convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we demonstrate that these models can be run locally, without cloud compute costs. With these tools, university-owned publishing operations can increase their operating efficiency while human review ensures no loss of quality.
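A pipeline of this shape can be sketched as a conversion prompt sent to a locally hosted model, plus a sanity check before any output enters the metadata record. Everything below is illustrative rather than the article's implementation: the prompt wording and the idea of a cheap pre-review validator are assumptions; the model call itself is deliberately left out.

```python
import re

PROMPT_TEMPLATE = (
    "Convert the following plain-text citation to a single BibTeX entry. "
    "Return only the BibTeX, nothing else.\n\nCitation: {citation}"
)

def build_prompt(citation):
    """Wrap a plain-text citation in a conversion prompt for the model."""
    return PROMPT_TEMPLATE.format(citation=citation)

def looks_like_bibtex(text):
    """Cheap well-formedness check before human review: one entry,
    a known entry type, balanced braces."""
    text = text.strip()
    if not re.match(r"@(article|book|inproceedings|misc)\{[^,]+,", text):
        return False
    return text.count("{") == text.count("}") and text.endswith("}")
```

In practice the prompt would go to a locally served inference endpoint, and every entry that passes the check would still go through the human review the abstract pairs with the models.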

Refactoring Alma: Simplifying Circulation Settings in the Alma Integrated Library System (ILS)

Wilhelmina Randtke

Refactoring is the process of restructuring existing code to make it easier to maintain, without changing the behavior of the software. Georgia Southern University is the product of a 2017 consolidation of two separate universities. Before consolidation, each predecessor university had its own cataloging practices and software settings in the integrated library system (ILS) / library services platform (LSP). While descriptive metadata in the machine-readable cataloging (MARC) standard blended well to support discovery, settings related to circulation were in discord following the merger. Three busy checkout desks each had different localized behaviors, and additional behaviors were requested and built out without central standardization. Non-unified metadata and settings, plus customizations layered on over time for the three desks, had ballooned into circulation settings that were overly baroque: difficult to edit meaningfully when circulation practices changed, and so complex that local standards could not be explained to the employees creating and editing library metadata. The result was frequent frustration with how circulation worked, difficulty knowing what was or was not a software bug, and an inability to fix problems quickly once identified or to make requested changes. During 2024, the Georgia Southern University Libraries (University Libraries) undertook a comprehensive cleanup of Alma settings centered on circulation. This article describes step by step how the University Libraries streamlined and simplified software settings in the Alma ILS to make the software explainable and easier to manage, all without impacting three busy checkout desks during the change process. Through refactoring, the University Libraries achieved more easily maintainable and explainable software settings, with minimal disruption to day-to-day operations along the way.

Distant Listening: Using Python and Apps Script to Text Mine and Tag Oral History Collections

Andrew Weymouth

This article presents a case study for creating subject tags utilizing transcription data across entire oral history collections, adapting Franco Moretti’s distant reading approach to narrative audio material. Designed for oral history project managers, the workflow empowers student workers to generate, modify, and expand subject tags during transcription editing, thereby enhancing the overall accuracy and discoverability of the collection. The paper details the workflow, surveys challenges the process addresses, shares experiences of transcribers, and examines the limitations of data-driven, human-edited tagging.
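The core of such a workflow—scanning transcript text for terms from a tag vocabulary and proposing subject tags that student workers then review and edit—can be sketched simply. This is a generic illustration of the approach, not the article's code; the vocabulary, the whole-word matching rule, and the frequency threshold are all made-up parameters.

```python
import re
from collections import Counter

def propose_tags(transcript, vocabulary, min_count=2):
    """Suggest subject tags for one oral history transcript.

    Counts occurrences of each vocabulary term (case-insensitive;
    single words are matched whole-word, multi-word terms by phrase
    search) and proposes every term appearing at least min_count
    times. Proposals are meant to be reviewed and edited by a human
    transcriber, not applied automatically.
    """
    words = Counter(re.findall(r"[a-z']+", transcript.lower()))
    counts = {}
    for term in vocabulary:
        if " " in term:
            counts[term] = transcript.lower().count(term.lower())
        else:
            counts[term] = words[term.lower()]
    return sorted(t for t, c in counts.items() if c >= min_count)
```

Running this across every transcript in a collection gives a first-pass tag layer that humans refine, mirroring the data-driven, human-edited division of labor the abstract describes.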

Static Web Methodology as a Sustainable Approach to Digital Humanities Projects

Olivia M. Wikle and Evan Peter Williamson

The web platforms adopted for digital humanities (DH) projects come with significant short- and long-term costs—selecting a platform will impact how resources are invested in a project and organization. As DH practitioners, the time (or money paid to contractors) we must invest in managing servers, maintaining platform updates, and learning idiosyncratic administrative systems ultimately limits our ability to create and sustain unique, innovative projects. Reexamining DH platforms through a minimal computing lens has led University of Idaho librarians to pursue new project-development methods that minimize digital infrastructure as a means to maximize investment in people, growing agency, agility, and long-term sustainability in both the organization and digital outputs. The U of I librarians' development approach, centered on static web-based templates, aims to develop transferable technical skills that all digital projects require, while also matching the structure of academic work cycles and fulfilling DH project needs. In particular, a static web approach encourages the creation of preservation-ready project data, enables periods of iterative development, and capitalizes on the low-cost, low-maintenance characteristics of statically generated sites to optimize limited economic resources and personnel time. This short paper introduces static web development methodology (titled “Lib-Static”) as a provocation to rethink DH infrastructure choices, asking how our frameworks can build internal skills, collaboration, and empowerment to generate more sustainable digital projects.
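The minimal-computing pattern described here—driving a site from flat, preservation-ready data files rather than a server and database—can be illustrated in a few lines. This sketch is not the Lib-Static tooling itself (which builds on static site generators); it only shows the shape of the idea, with assumed CSV columns `objectid`, `title`, and `description`: each row of a metadata file becomes a standalone HTML page.

```python
import csv
import html
from pathlib import Path

# Minimal page template; a real template would carry navigation, styling, etc.
PAGE = ("<!DOCTYPE html>\n<html><head><title>{title}</title></head>\n"
        "<body><h1>{title}</h1><p>{description}</p></body></html>\n")

def build_site(metadata_csv, out_dir):
    """Render one static HTML page per metadata record.

    Returns the list of filenames written, one per CSV row.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            page = out / f"{row['objectid']}.html"
            page.write_text(PAGE.format(
                title=html.escape(row["title"]),
                description=html.escape(row["description"])))
            written.append(page.name)
    return written
```

Because the output is plain files, the site can be hosted anywhere static files can live, and the CSV doubles as the preservation-ready project data the paper emphasizes.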

ISSN 1940-5758