Issue 57, 2023-08-29

Editorial: Big code, little code, open code, old code

Péter Király

Paraphrasing the title of Christine L. Borgman’s inaugural lecture in Göttingen some years ago, “Big data, little data, open data,” I could say that the current issue of Code4Lib is about big code, little code, open code, old code. A welcome feature of coding is that effective contributions can be made with different levels and types of background knowledge. This issue shows that even small modifications, or shared knowledge about the command-line usage of a tool, can be very useful to the user community. Let’s see what we have!

Evaluating HTJ2K as a Drop-In Replacement for JPEG2000 with IIIF

Glen Robson, Stefano Cossu, Ruven Pillay, Michael D. Smith

JPEG2000 is a widely adopted open standard for images in cultural heritage, both for delivering access and for creating preservation files that are losslessly compressed. Recently, a new extension to JPEG2000 has been developed by the JPEG Committee: “High Throughput JPEG2000,” better known as HTJ2K. HTJ2K promises faster encoding and decoding speeds compared to traditional JPEG2000 Part 1, while requiring little or no changes to existing code and infrastructure. The IIIF community has completed a project to evaluate HTJ2K as a drop-in replacement for encoding JPEG2000 and to validate the expected improvements regarding speed and efficiency.

The group looked at a number of tools that support HTJ2K, including Kakadu, OpenJPEG, and Grok, and ran tests comparing encoding speeds and the disk space required for the encoded images. The group also set up decoding speed tests comparing HTJ2K with tiled pyramid TIFF and traditional JPEG2000 using one of the major open source IIIF Image servers, IIPImage.

The group found that HTJ2K is significantly faster than traditional JPEG2000, though the results are more nuanced when compared with TIFF.
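To give a flavour of how such an encoding benchmark might be scripted, here is a minimal Python sketch. The encoder invocations are assumptions based on common Kakadu usage (Cmodes=HT is Kakadu’s switch for the HT block coder), not the group’s actual test harness; consult the encoder documentation before reusing these flags.

import subprocess
import time
from pathlib import Path

# Illustrative sketch only: compare wall-clock encoding time and output size
# for traditional JPEG2000 vs. HTJ2K. The exact command-line flags are
# assumptions, not the evaluation project's configuration.
ENCODERS = {
    "jp2-part1": ["kdu_compress", "-i", "input.tif", "-o", "part1.j2c"],
    "htj2k":     ["kdu_compress", "-i", "input.tif", "-o", "ht.j2c", "Cmodes=HT"],
}

for name, cmd in ENCODERS.items():
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    size = Path(cmd[4]).stat().st_size  # output path is the fifth argument
    print(f"{name}: {elapsed:.2f}s, {size / 1024:.0f} KiB")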

Standardization of Journal Title Information from Interlibrary Loan Data: A Customized Python Code Approach

Jennifer Ye Moon-Chung

Interlibrary loan (ILL) data plays a crucial role in making informed journal subscription decisions. However, journal titles and International Standard Serial Numbers (ISSNs) are often entered inaccurately by requestors, and the resulting inconsistent or incomplete data presents challenges when attempting to make use of ILL data. This article introduces a solution that uses customized Python code to standardize journal titles obtained from user-entered data. The solution incorporates a preprocessing workflow that filters out irrelevant information and employs Application Programming Interfaces (APIs) to replace inaccurate titles with precise ones based on retrieved ISSNs, ensuring data accuracy. It then presents the processed data in a dashboard format, highlighting the most requested journals and enabling librarians to interactively explore the data. By adopting this approach, librarians can make well-informed decisions and conduct thorough analysis, resulting in more efficient and effective management of library resources.
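As an illustration of the ISSN-based title replacement step, the following minimal Python sketch queries the public CrossRef API for a canonical journal title; the article’s own code and choice of API may differ.

import requests

def canonical_title(issn: str) -> str | None:
    """Look up the canonical journal title for an ISSN via the CrossRef API.

    Illustrative sketch of the title-standardization step described above,
    not the article's actual implementation.
    """
    resp = requests.get(f"https://api.crossref.org/journals/{issn}", timeout=10)
    if resp.status_code != 200:
        return None  # unknown or malformed ISSN
    return resp.json()["message"].get("title")

# e.g. canonical_title("0028-0836") returns "Nature"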

ChronoNLP: Exploration and Analysis of Chronological Textual Corpora

Erin Wolfe

This article introduces ChronoNLP, a free and open-source web application designed to enable the application of Natural Language Processing (NLP) techniques to textual datasets with a time-based component. This interactive Python platform lets users filter, search, explore, and visualize such data, so that the temporal aspect plays a central role in the analysis. ChronoNLP makes use of several powerful NLP libraries to facilitate various text analysis techniques, including topic modeling, term/TF-IDF frequency evaluation, automated keyword extraction, named entity recognition, and other tasks, through a graphical interface without the need for coding or technical knowledge. By highlighting the temporal aspect of specific types of corpora, ChronoNLP provides access to methods of parsing and visualizing the data in a user-friendly format to help uncover patterns and trends in text-based materials.
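As a taste of the kind of time-aware term analysis ChronoNLP exposes through its interface, here is an illustrative scikit-learn sketch (not ChronoNLP’s own code) that computes the top TF-IDF terms per year of a dated toy corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Illustrative sketch: top TF-IDF terms per year for documents carrying a
# date. The tiny corpus below is invented for demonstration.
docs = pd.DataFrame({
    "date": ["1918-03-04", "1918-10-01", "1919-02-11"],
    "text": ["influenza cases reported...", "quarantine ordered...", "epidemic wanes..."],
})
docs["year"] = pd.to_datetime(docs["date"]).dt.year

# One combined document per year, so term weights can be compared over time.
by_year = docs.groupby("year")["text"].apply(" ".join)
vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(by_year)

terms = vec.get_feature_names_out()
for year, row in zip(by_year.index, tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(year, [t for t, _ in top])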

A Very Small Pond: Discovery Systems That Can Be Used with FOLIO in Academic Libraries

Aaron Neslin, Jaime Taylor

FOLIO, an open source library services platform, does not have a front-end patron interface for searching and using library materials. Any library installing FOLIO will therefore need at least one other piece of software to perform those functions. This article evaluates which systems, in a limited marketplace, are available for academic libraries to use with FOLIO.

Supporting Library Consortia Website Needs: Two Case Studies

Elizabeth Joan Kelly

LOUIS: The Louisiana Library Network provides library technology infrastructure, electronic resources, affordable learning, and digital literacy support for its 47 academic library members. With this support comes a need to develop web solutions for members, a challenging task, as members maintain their own websites on a multitude of platforms and have library faculty and staff with differing needs. This article details two case studies in developing consortia-specific web design projects. The first summarizes the LOUIS Tabbed Search Box Order Form, an opportunity for members to “order” a custom-made search box for the various services LOUIS supports that can then be embedded on their library’s website. The second involves the LOUIS Community Jobs Board, a member-driven job listing tool that exists on the LOUIS site, but that members can publish jobs to using a Google Form. Both the Search Box Order Form and the Jobs Board have resulted in increased engagement with and satisfaction from member libraries. The article includes best practices, practical solutions, and sample code for both projects.

From DSpace to Islandora: Why and How

Vlastimil Krejčíř, Alžbeta Strakošová, and Jan Adler

The article summarizes the experience of switching from DSpace to Islandora. It briefly gives the historical background and the reasons for the switch. It then compares the basic features of the two systems: installation, updates, operations, and customization options. Finally, it concludes with practical lessons learned from the migration and provides examples of digital libraries implemented at Masaryk University.

Creating a Full Multitenant Back End User Experience in Omeka S with the Teams Module

Alexander Dryden and Daniel G. Tracy

When Omeka S appeared as a beta release in 2016, it offered the opportunity for researchers or larger organizations to publish multiple Omeka sites from the same installation. Multisite functionality was and continues to be a major advance for what had become the premier platform for scholarly digital exhibits produced by libraries, museums, researchers, and students. However, while geared to larger institutional contexts, Omeka S poses some user experience challenges on the back end for larger organizations with numerous users creating different sites. These challenges include a “cluttered” effect for many users seeing resources they do not need to access and data integrity challenges due to the possibility of users editing resources that other users need in their current state. The University of Illinois Library, drawing on two local use cases as well as two additional external use cases, developed the Teams module to address these challenges. This article describes the needs leading to the decision to create the module, the project requirement gathering process, and the implementation and ongoing development of Teams. The module and findings are likely to be of interest to other institutions adopting Omeka S but also, more generally, to libraries seeking to contribute successfully to larger open-source initiatives.

The Forgotten Disc: Synthesis and Recommendations for Viable VCD Preservation

Andrew Weaver and Ashley Blewer

As optical media held by cultural heritage institutions have fully transitioned from a digital preservation ‘solution’ to a digital preservation risk, increasing effort has been focused on exploring tools and workflows to migrate the data off these materials before it is permanently lost to physical degradation. One optical format, however, has been broadly ignored by the existing body of work: the humble Video CD.

While never a dominant format in the Anglosphere, the Video CD, or VCD, was widely popular from the 1990s through the 2000s in Asia and other regions. As such, a dedicated exploration of preservation solutions for VCD has utility both as a resource for institutions that collect heavily in Pacific Rim materials and as a means to, in a minor way, aid the ongoing efforts to expand the digital preservation corpus beyond its traditional focus on issues prevalent in North America and Europe.

This paper provides an overview of VCD as a format, summarizes the unique characteristics that impact preservation decisions, and presents the results of a survey of existing tools and methods for migrating VCD contents. It also conveys practical methods for moving VCD material from the original carrier into both digital preservation and access workflows.
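For readers curious about what such a migration can look like in practice, here is a heavily hedged Python sketch: VCDs conventionally store their MPEG-1 program streams as MPEGAV/AVSEQ01.DAT and similar files, which ffmpeg can usually read and remux without re-encoding. This is an illustration under those assumptions, not the paper’s recommended workflow.

import subprocess
from pathlib import Path

# Illustrative sketch: "-c copy" remuxes the MPEG-1 stream without
# re-encoding, preserving the original bitstream for a preservation master;
# a second, lossy pass could then produce an access copy. Whether ffmpeg
# reads a given .DAT directly depends on how the disc was imaged.
def remux_vcd_track(dat_file: Path, out_dir: Path) -> Path:
    out = out_dir / (dat_file.stem + ".mpg")
    subprocess.run(
        ["ffmpeg", "-i", str(dat_file), "-c", "copy", str(out)],
        check=True,
    )
    return out

# e.g. remux_vcd_track(Path("MPEGAV/AVSEQ01.DAT"), Path("preservation"))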

Breathing Life into Archon: A Case Study in Working with an Unsupported System

Krista L. Gray

Archival repositories at the University of Illinois Urbana-Champaign Library have relied on Archon to present archival description and finding aids to researchers worldwide since its launch in 2006. Archon has been officially unsupported software, however, for more than half of this time span. This article discusses strategies and approaches used to enhance and extend Archon’s functionality during this period of little to no support for maintaining the software. Whether enhancing accessibility and visual aesthetics through custom theming, presenting data points in new ways to support additional functions, or modifying the database to support UTF-8 encoding, many enhancements to the user experience proved possible despite the inherent limitations of working with an unsupported system. Working primarily from the skill set of an archivist with programming experience, rather than that of a software developer, the author also discusses some of the strengths emerging from this “on the ground” approach to developing enhancements to an archival access and collection management system.

An introduction to using metrics to assess the health and sustainability of library open source software projects

Jenn Colt

In the LYRASIS 2021 Open Source Software Report: Understanding the Landscape of Open Source Software Support in American Libraries (Rosen & Grogg, 2021), responding libraries indicated that the sustainability of OSS projects is an important concern when making decisions about adoption. However, the report does not discuss methods libraries might use to gather information about sustainability. Metrics defined by the Linux Foundation’s CHAOSS project (https://chaoss.community/) are designed to measure the health and sustainability of open source software (OSS) communities and may be useful for libraries that are making decisions about adopting particular OSS applications. I demonstrate the use of cauldron.io as one method to gather and visualize the data for these metrics, and discuss the benefits and limitations of using them for decision-making.
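To make the idea of health metrics concrete, the following Python sketch computes two simple activity signals in the spirit of CHAOSS metrics, recent commit volume and unique contributor count, from the public GitHub REST API. It is not cauldron.io’s implementation, and the example repository at the end is only a placeholder.

import requests
from datetime import datetime, timedelta, timezone

# Illustrative sketch: count commits and distinct commit authors over a
# recent window as rough proxies for project activity.
def recent_activity(owner: str, repo: str, days: int = 90) -> tuple[int, int]:
    since = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    commits, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/commits",
            params={"since": since, "per_page": 100, "page": page},
            timeout=10,
        )
        resp.raise_for_status()
        batch = resp.json()
        commits.extend(batch)
        if len(batch) < 100:
            break
        page += 1
    authors = {c["commit"]["author"]["name"] for c in commits}
    return len(commits), len(authors)

# e.g. recent_activity("folio-org", "okapi") returns (commit count, unique authors)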

Searching for Meaning Rather Than Keywords and Returning Answers Rather Than Links

Kent Fitch

Large language models (LLMs) have transformed the largest web search engines: for over ten years, public expectations of being able to search on meaning rather than just keywords have become increasingly realised. Expectations are now moving further: from a search query generating a list of “ten blue links” to producing an answer to a question, complete with citations.

This article describes a proof-of-concept that applies the latest search technology to library collections by implementing a semantic search across a collection of 45,000 newspaper articles from the National Library of Australia’s Trove repository, and using OpenAI’s ChatGPT4 API to generate answers to questions on that collection that include source article citations. It also describes some techniques used to scale semantic search to a collection of 220 million articles.
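The retrieve-then-answer pattern the article describes can be sketched in a few lines of Python with OpenAI’s current client; the model names, prompt, and toy corpus below are assumptions for illustration, not the proof-of-concept’s actual configuration.

import numpy as np
from openai import OpenAI

# Illustrative sketch of semantic retrieval followed by cited answer
# generation. The two "articles" are invented stand-ins for a real corpus.
client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

articles = ["HMAS Sydney lost at sea...", "Wheat prices rise in Mildura..."]
doc_vecs = embed(articles)

question = "What happened to HMAS Sydney?"
q_vec = embed([question])[0]
# Cosine similarity between the question and every article.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
best = scores.argsort()[::-1][:1]  # indices of the top-matching article(s)

context = "\n\n".join(f"[{i}] {articles[i]}" for i in best)
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer from the numbered articles only, citing them as [n]."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)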

ISSN 1940-5758