Issue 56, 2023-04-21

Editorial: Forget the AI, We Have Live Editors

Sara Amato

Welcoming new editors to the Code4Lib Journal

The Brooklyn Health Map: Reflections on a Health Data Dashboard for Brooklyn, NY

Sheena Philogene

Recent years have put a spotlight on the importance of searchers of all kinds being able to quickly and easily find relevant, timely, and useful health information. This article provides a general overview of the process used when creating the Brooklyn Health Map, an interactive Brooklyn-based health data dashboard that visualizes community health information at the census tract, zip code, and neighborhood levels. Built using HTML, CSS, Bootstrap, and JavaScript, the Brooklyn Health Map presents information in the form of interactive web maps, customizable graphs, and local level data summaries. This article also highlights the tools used to simplify the creation of various dynamic features of the dashboard.

by Sheena Philogene

Building a Large-Scale Digital Library Search Interface Using The Libraries Online Catalog

Jason Griffith and Eric Weig

The Kentucky Digital Newspaper Program (KDNP) was born out of the University of Kentucky Libraries’ (UKL) work in the National Digital Newspaper Program (NDNP) that began in 2005. In early 2021, a team of specialists at UKL from library systems, digital archives, and metadata management was formed to explore a new approach to searching this content by leveraging the power of the library services platform (Alma) and discovery system (Primo VE) licensed from Ex Libris. The result was the creation of a dedicated Primo VE search interface that would include KDNP content as well as all Kentucky newspapers held on microfilm in the UKL system. This article will describe the journey from the question of whether we could harness the power of Alma and Primo VE to display KDNP content, to the methodology used in creating a new dedicated search interface that can be replicated to create custom search interfaces of your own.

Apples to Oranges: Using Python and the pymarc library to match bookstore ISBNs to locally held eBook ISBNs

Mitchell Scott

To alleviate financial burdens faced by students and to provide additional avenues for the benefits shown to be present when no-cost materials are available to students (equity and access and an increase in student success metrics), more and more libraries are leveraging their collections and acquisition processes to provide no-cost eBook alternatives to students. It is common practice now for academic libraries to have a partnership with their campus bookstore and to receive a list of print and eBook materials required for an upcoming semester. Libraries take these lists and use various processes and workflows, some extremely labor intensive and others semi-labor intensive, to identify which of these titles they already own as unlimited access eBooks, and which titles could be purchased as unlimited access eBooks. The most common way to match bookstore titles to already licensed eBooks is by searching the bookstore provided ISBN or title in either the Library Management System (LMS), the Analytics and Reporting layer of the LMS, the Library Discovery Layer, or via another homegrown process. While some searching could potentially be automated, depending on the available functionality of the LMS or the Analytics component of the LMS, the difficulty lies in matching the bookstore ISBN, often the print ISBN, to the library eBook ISBN. This article will discuss the use of Python, the Pymarc library in Python, and Library eBook MARC records to create an automated identification process to accurately match bookstore lists to library eBook holdings.

The viability of using an open source locally hosted AI for creating metadata in digital image collections

Ingrid Reiche

Artificial intelligence (AI) can support metadata creation for images by generating descriptions, titles, and keywords for digital collections in libraries. Many AI options are available, ranging from cloud-based corporate software solutions, including Microsoft Azure Custom Vision and Google Cloud Vision, to open-source locally hosted software packages. This case study examines the feasibility of deploying the open-source, locally hosted AI software, Sheeko, and the accuracy of the descriptions generated for images using two of the pre-trained models. The study aims to ascertain if Sheeko’s AI would be a viable solution for producing metadata in the form of descriptions, or titles for digital collections in Libraries and Cultural Resources at the University of Calgary.

by Ingrid Reiche

To Everything There Is a Session: A Time to Listen, a Time to Read Multi-session CDs

Dianne Dietrich and Alex Nelson

When the cost of CD burners dropped precipitously in the late 1990s, consumers had access to the CD-R, a format with far greater storage capacity than floppy disks. Multiple session standards allowed users the flexibility to add subsequent content to an already-burned CD-R, which made them an attractive option for personal backups. In a digital preservation context, CDs with multiple sessions can pose significant challenges to workflows and can lead to data errantly not being acquired or reviewed if users are using a workflow designed for single-session, single-track CDs. In workflows that include CDs as software installation or transmission media, extra-session behavior can have an impact on software supply chain review.

This article provides an overview of the structure of a multi-session CD and outlines tool behavior of disk images generated from multi-session CDs. To support testing in specific contexts, we provide a guide to creating a multi-session CD that can be used when developing workflows. Finally, we provide techniques for extracting content from physical media as well as existing disk images generated from multi-session CDs.

PREMIS Events Through an Event-sourced Lens

Ross Spencer

The PREMIS metadata standard is widely adopted in the digital preservation community. Repository software often include fully compliant implementations or assert some level of conformance. Within PREMIS we have four semantic units, but “Events”, the topic of this paper, are particularly interesting as they describe “actions performed within or outside the repository that affects its capability to preserve Objects over the long term.” Events can help us to observe interactions with digital objects and understand where and when something may have gone wrong with them. Events in PREMIS, however, are slightly different to events in software development paradigms, specifically event driven software development – though similar, the design of PREMIS event logs does not promote their “being complete” nor their consumption and reuse; and so, learning from logs in event driven software development, may help us to simplify the PREMIS data model; plug identified gaps in implementations; and improve the ability to migrate digital content in future repositories.

Strategies for Digital Library Migration

Justin Littman, Mike Giarlo, Peter Mangiafico, Laura Wrubel, Naomi Dushay, Aaron Collier, Arcadia Falcone

A migration of the datastore and data model for Stanford Digital Repository’s digital object metadata was recently completed. This paper describes the motivations for this work and some of the strategies used to accomplish the migration. Strategies include: adopting a validatable data model, abstracting the datastore behind an API, separating concerns, testing metadata mappings against real digital objects, using reports to understand the data, templating unit tests, performing a rolling migration, and incorporating the migration into ongoing project work. These strategies may be useful to other repository or digital library application migrations.

Utilizing R and Python for Institutional Repository Daily Jobs

Yongli Zhou

In recent years, the programming languages R and Python have become very popular and are being used by many professions. However, they are not just limited to data scientists or programmers; they can also help librarians to perform many tasks more efficiently and possibly achieve goals that were almost impossible before. R and Python are scripting languages, which means they are not very complicated. With minimal programming experience, a librarian can learn how to program in these languages and start to apply them to work. This article provides examples of how to use R and Python to clean up metadata, resize images, and match transcripts with scanned images for the Colorado State University Institutional Repository.

ISSN 1940-5758