A practical method for searching scholarly papers in the General Index without a high-performance computer

Emily Cukier

The General Index (GI) is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguistics. Although it is positioned as a public utility, the GI has no user interface; one must download, query, and extract results from raw data tables. Not only is computing skill a barrier to use, but the file sizes are too large for most desktop computers to handle. This article will show researchers with moderate skills and resources a practical way to use the GI. It will walk through building a bibliography of articles and visualizing the yearly prevalence of a topic in the General Index, using simple R programming commands and a modestly equipped desktop computer (code is available at https://osf.io/s39n7/). It will also briefly discuss what else can be done (and how) with more powerful computational resources.
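On a modest machine, the usual way to work with General Index tables is to stream them rather than load them whole. The article's own code is in R, at the OSF link above. What follows is a minimal Python sketch of the streaming idea. It assumes a hypothetical tab-separated keyword table with the columns `dkey`, `keyword`, and `year`. The real GI tables use their own column names and may keep document metadata such as publication dates in a separate table, in which case you would need a join on the document key.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def yearly_counts(stream: TextIO, term: str) -> Counter:
    """Count distinct documents per year whose keyword equals `term`.

    Rows are read one at a time, so memory use grows with the number of
    matching documents rather than with the size of the table.
    Assumes hypothetical tab-separated columns: dkey, keyword, year.
    """
    term = term.lower()
    seen: set[str] = set()  # document keys already counted
    counts: Counter = Counter()
    for row in csv.DictReader(stream, delimiter="\t"):
        if row["keyword"].lower() != term or row["dkey"] in seen:
            continue
        seen.add(row["dkey"])
        counts[row["year"]] += 1
    return counts


if __name__ == "__main__":
    # For a compressed export, open it as text instead, e.g.
    # gzip.open("keywords.tsv.gz", "rt", encoding="utf-8", newline="")
    sample = io.StringIO(
        "dkey\tkeyword\tyear\n"
        "d1\tcorpus linguistics\t2019\n"
        "d2\tCorpus Linguistics\t2020\n"
        "d3\ttext mining\t2020\n"
    )
    for year, n in sorted(yearly_counts(sample, "corpus linguistics").items()):
        print(year, n)
```

The loop keeps only the matching document keys in memory. The same code therefore works on a multi-gigabyte file; only the run time grows.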

Creating a Full Multitenant Back End User Experience in Omeka S with the Teams Module

Alexander Dryden and Daniel G. Tracy

When Omeka S appeared as a beta release in 2016, it offered researchers and larger organizations the opportunity to publish multiple Omeka sites from the same installation. Multisite functionality was, and continues to be, a major advance for what had become the premier platform for scholarly digital exhibits produced by libraries, museums, researchers, and students. However, although Omeka S is geared to larger institutional contexts, it poses back-end user experience challenges for larger organizations with numerous users creating different sites. These challenges include a “cluttered” experience, in which many users see resources they do not need to access, and risks to data integrity, since users can edit resources that other users need in their current state. Drawing on two local use cases and two external ones, the University of Illinois Library developed the Teams module to address these challenges. This article describes the needs that led to the decision to create the module, the project requirement gathering process, and the implementation and ongoing development of Teams. The module and findings are likely to interest other institutions adopting Omeka S and, more generally, libraries seeking to contribute successfully to larger open-source initiatives.

Building a Large-Scale Digital Library Search Interface Using the Libraries’ Online Catalog

Jason Griffith and Eric Weig

The Kentucky Digital Newspaper Program (KDNP) was born out of the University of Kentucky Libraries’ (UKL) work in the National Digital Newspaper Program (NDNP) that began in 2005. In early 2021, UKL formed a team of specialists from library systems, digital archives, and metadata management to explore a new approach to searching this content by leveraging the power of the library services platform (Alma) and discovery system (Primo VE) licensed from Ex Libris. The result was a dedicated Primo VE search interface that includes KDNP content as well as all Kentucky newspapers held on microfilm in the UKL system. This article will describe the journey from the question of whether we could harness the power of Alma and Primo VE to display KDNP content to the methodology used to create the new search interface, which can be replicated to build custom search interfaces of your own.

A Fast and Full-Text Search Engine for Educational Lecture Archives

Arun F. Adrakatti and K.R. Mulla

Since the COVID-19 pandemic, e-lecturing and online learning have become more common and convenient in the academic community than offline teaching and classroom learning. Universities and research institutions record the lectures delivered by faculty members and archive the videos internally, most often in private channels on popular video-sharing platforms. Students can access published lecture videos regardless of time and location. However, searching large video repositories is difficult because search is restricted to metadata. We present the design of an open-source application for building an educational lecture archive with fast, full-text search within the video content.
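The abstract does not say how the application indexes video content. This sketch assumes one common approach: time-stamped transcript segments (for example, from subtitles or speech-to-text output) go into an inverted index, so that a query returns both the video and the point where the term is spoken. The `Segment` type and its fields are hypothetical. This is a minimal Python sketch, not the authors' implementation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    video_id: str          # hypothetical identifier of the lecture video
    start_seconds: float   # where this transcript segment begins
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(segments: list[Segment]) -> dict[str, set[int]]:
    """Inverted index: map each token to the positions of segments containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for pos, seg in enumerate(segments):
        for token in tokenize(seg.text):
            index[token].add(pos)
    return index


def search(query: str, segments: list[Segment],
           index: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Return (video_id, start_seconds) for segments containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted((segments[p].video_id, segments[p].start_seconds) for p in hits)
```

Returning the start time as well as the video identifier lets a player jump directly to the matching moment. A metadata-only search cannot offer this.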

Editorial — New name change policy

Ron Peterson

The Code4Lib Journal Editorial Committee is implementing a new name change policy that aims to simplify the process and to ensure timely and comprehensive name changes for anyone who needs to change their name within the Journal.

Introducing SAGE: An Open-Source Solution for Customizable Discovery Across Collections

David B. Lowe, James Creel, Elizabeth German, Douglas Hahn, and Jeremy Huff

Digital libraries at research universities use a wide range of unique tools to share eclectic sets of texts, images, audio, video, and other digital objects. Presenting these assorted local treasures to the world can be a challenge. Text is often siloed with text, images with images, and so on, so each type may have its own user experience in a separate discovery interface. One common tool developed in recent years that could potentially unite them all is the Apache Solr index. Texas A&M University (TAMU) Libraries has harnessed Solr for internal indexing in repositories like DSpace, Fedora, and Avalon. Impressed by frameworks like Blacklight at peer institutions, TAMU Libraries wrote an analogous set of tools in Java. Thus was born SAGE, the Solr AGgregation Engine, with two primary functions: 1) aggregating Solr indices, or “cores,” from various local sources, and 2) presenting a search facility to users in a discovery interface.

Enhancing Print Journal Analysis for Shared Print Collections

Dana Jemison, Lucy Liu, Anna Striker, Alison Wohlers, Jing Jiang, and Judy Dobry

The Western Regional Storage Trust (WEST) is a distributed shared print journal repository program serving research libraries, college and university libraries, and library consortia in the Western Region of the United States. Every two years, WEST solicits serial bibliographic records and related holdings. These are evaluated and identified as candidates for shared print archiving through a complex collection analysis process. California Digital Library’s Discovery & Delivery WEST operations team (WEST-Ops) supports the functionality behind this collection analysis process used by WEST program staff (WEST-Staff) and members.

For WEST, proposals for shared print archiving have historically been predicated on what is known as an Ulrich’s journal family. A journal family pulls together related serial titles, for example, succeeding and preceding titles, their supplements, and foreign-language parallel titles. Although Ulrich’s has been invaluable, it proves problematic in several ways, resulting in the omission of approximately half of the journal titles submitted for collection analysis.

Part of WEST’s effectiveness in archiving hinges on its ability to analyze local serials data across its membership as holistically as possible. The process that enables this analysis, and the subsequent archiving proposals, depends on Ulrich’s journal families, in which ISSNs have traditionally been used to match and cluster all related titles. The process is therefore limited. Many journals, especially older publications, have never been assigned ISSNs, and member bibliographic records may lack ISSNs even when they exist in the OCLC primary record.

Building a mechanism for ISSN matching that goes beyond the base set of primary, former, and succeeding titles expands the number of eligible ISSNs for Ulrich’s journal family matching. Furthermore, when no Ulrich’s match can be made on ISSN, other control numbers in a bibliographic record can be used to match it with records already matched to an Ulrich’s journal family via ISSN. This significantly increases the number of titles eligible for collection analysis.

This paper will discuss problems in Ulrich’s journal family matching, the improved functional methodologies developed to address them, and potential strategies for improving serial title clustering in the future.
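The two-stage matching described above can be sketched in Python under stated assumptions. The `Record` layout, the `issn_to_family` lookup (standing in for Ulrich’s journal family data), and the family IDs are all hypothetical, and the abstract does not describe WEST's actual implementation. Pass 1 tries every ISSN in a record. Pass 2 lets an unmatched record borrow a family from a record matched via ISSN, provided the two share a control number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    record_id: str
    issns: list[str] = field(default_factory=list)            # every ISSN found in the record
    control_numbers: list[str] = field(default_factory=list)  # e.g. OCLC numbers or LCCNs


def normalize_issn(issn: str) -> str:
    """Drop hyphens and spaces and uppercase a trailing X check digit."""
    return issn.replace("-", "").replace(" ", "").upper()


def match_families(records: list[Record],
                   issn_to_family: dict[str, str]) -> dict[str, Optional[str]]:
    """Assign each record a journal family ID, or None when nothing matches."""
    index = {normalize_issn(k): v for k, v in issn_to_family.items()}
    result: dict[str, Optional[str]] = {}
    control_to_family: dict[str, str] = {}

    # Pass 1: match on any ISSN in the record, not only the primary one.
    for rec in records:
        family = next((index[n] for n in map(normalize_issn, rec.issns) if n in index), None)
        result[rec.record_id] = family
        if family is not None:
            # Remember this record's control numbers for the fallback pass.
            for cn in rec.control_numbers:
                control_to_family.setdefault(cn, family)

    # Pass 2: unmatched records borrow a family through a shared control number.
    for rec in records:
        if result[rec.record_id] is None:
            result[rec.record_id] = next(
                (control_to_family[cn] for cn in rec.control_numbers if cn in control_to_family),
                None,
            )
    return result
```

Only records matched via ISSN act as anchors in pass 2, as in the abstract's description, so a fallback match is never chained into a further fallback match.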

Managing an institutional repository workflow with GitLab and a folder-based deposit system

Whitney R. Johnson-Freeman, Mark E. Phillips, and Kristy K. Phillips

Institutional repositories (IRs) exist in a variety of configurations and in various states of development across the country. Each organization with an IR has a workflow that can range from explicitly documented and codified sets of software and human workflows to ad hoc assortments of methods for working with faculty to acquire, process, and load items into a repository. The University of North Texas (UNT) Libraries has managed an IR called UNT Scholarly Works for the past decade but until recently relied on ad hoc workflows. Over the past six months, we have worked to make our processes extensible and flexible while also giving our staff a clear workflow for processing submitted and harvested content. Our approach uses GitLab and its associated tools to track and communicate priorities for a multi-user team processing resources. We paired this Web-based management with a folder-based system that moves deposited resources through the sequence of steps needed to describe, upload, and preserve each resource. This strategy can be used in a number of different applications and can serve as a set of building blocks that can be configured in different ways. This article will discuss which components of GitLab are used together to track faculty deposits as they move through the steps of the workflow. It will also describe the folder-based workflow queue as implemented at UNT and present examples of how we have used it in different situations.
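A folder-based queue can be as simple as one directory per stage. Each deposit lives in its own subdirectory, which is moved to the next stage once the current step is done. The minimal Python sketch below uses hypothetical stage names that mirror the describe, upload, and preserve steps in the abstract. It is not UNT's actual layout or tooling.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stage folders, in processing order.
STAGES = ["01_submitted", "02_describe", "03_upload", "04_preserve", "05_done"]


def init_queue(root: Path) -> None:
    """Create one folder per stage under `root`."""
    for stage in STAGES:
        (root / stage).mkdir(parents=True, exist_ok=True)


def current_stage(root: Path, deposit: str) -> str:
    """Return the stage folder that currently holds `deposit`."""
    for stage in STAGES:
        if (root / stage / deposit).is_dir():
            return stage
    raise FileNotFoundError(f"deposit {deposit!r} is not in any stage folder")


def advance(root: Path, deposit: str) -> str:
    """Move `deposit` into the next stage folder and return that stage's name."""
    stage = current_stage(root, deposit)
    position = STAGES.index(stage)
    if position == len(STAGES) - 1:
        raise ValueError(f"deposit {deposit!r} is already in the final stage")
    target = STAGES[position + 1]
    shutil.move(str(root / stage / deposit), str(root / target / deposit))
    return target


def queue_status(root: Path) -> dict[str, list[str]]:
    """List the deposits currently waiting in each stage."""
    return {s: sorted(p.name for p in (root / s).iterdir() if p.is_dir()) for s in STAGES}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        init_queue(root)
        (root / STAGES[0] / "deposit-001").mkdir()
        advance(root, "deposit-001")
        print(queue_status(root))
```

Because a deposit's location is its status, staff can see the state of the queue in any file browser, and a tracking issue (for example, in GitLab) can simply record which stage folder a deposit is in.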

MatchMarc: A Google Sheets Add-on that uses the WorldCat Search API

Michelle Suranofsky and Lisa McColl

Lehigh University Libraries has developed a new tool for querying WorldCat using the WorldCat Search API. The tool is a Google Sheets add-on and is available now via the Google Sheets Add-ons menu under the name “MatchMarc.” The add-on is easily customizable and requires no coding knowledge. For a given ISBN or LCCN, the tool returns a single “best” OCLC record number and its bibliographic information, and the user sets up and defines what “best” means. Because the input, the criteria, and the results all exist in the Google Sheets environment, efficient workflows can be developed from this flexible starting point. This article will discuss the development of the add-on, how it works, and future plans for development.
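MatchMarc itself runs inside Google Sheets, and the abstract does not spell out how it ranks records. The following Python sketch is an illustration only. It assumes the candidate records for one ISBN or LCCN have already been retrieved and reduced to dictionaries with hypothetical keys (`oclc`, `language`, `subjects`, `holdings`). A user-defined notion of “best” is modeled as a list of required filters plus a ranking key.

```python
from typing import Callable, Optional

Candidate = dict  # hypothetical keys: "oclc", "language", "subjects", "holdings"
Rule = Callable[[Candidate], bool]


def pick_best(candidates: list[Candidate], filters: list[Rule],
              rank_key: Callable[[Candidate], float]) -> Optional[Candidate]:
    """Return the highest-ranked candidate that passes every filter, or None."""
    eligible = [c for c in candidates if all(rule(c) for rule in filters)]
    return max(eligible, key=rank_key) if eligible else None


# Example user-defined criteria (hypothetical).
def is_english(c: Candidate) -> bool:
    return c.get("language") == "eng"


def has_subjects(c: Candidate) -> bool:
    return bool(c.get("subjects"))


def holdings_count(c: Candidate) -> float:
    return c.get("holdings", 0)
```

Keeping the filters separate from the ranking lets a workflow swap in different criteria without touching the selection logic.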

Consortial RightsStatements.org Implementation and Faceted Search for Reuse Rights in Digital Library Materials

Wilhelmina Randtke, Randy Fischer, and Gail Lewis

The Florida Academic Library Services Cooperative (FALSC) offers free digital library hosting to all institutions of Florida public higher education. Twenty-one institutions participate in the Islandora digital library platform hosted through FALSC. Centralized digital library hosting through FALSC, or its predecessor consortium, has been available since 1994. The RightsStatements.org standard, which provides a controlled vocabulary for indicating the copyright status of digital library material, was released in 2016. After the standard was released, participating libraries expressed interest in applying RightsStatements.org values to existing digital content. During Fall 2018 and Spring 2019, FALSC implemented RightsStatements.org values on Islandora sites. This article describes the process undertaken by FALSC, the lessons learned, and recommendations for libraries looking to implement RightsStatements.org values.
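A facet on reuse rights maps the RightsStatements.org URI stored with each object to a readable label and counts objects per label. The Python sketch below uses a subset of the RightsStatements.org vocabulary, whose URIs follow the pattern `http://rightsstatements.org/vocab/<code>/1.0/`. The fallback labels ("Not specified", "Other rights statement") are hypothetical, and the sketch does not show how FALSC indexed these values in Islandora.

```python
from collections import Counter
from typing import Iterable, Optional

VOCAB = "http://rightsstatements.org/vocab/"

# Subset of RightsStatements.org statements: code -> label.
LABELS = {
    "InC": "In Copyright",
    "InC-EDU": "In Copyright - Educational Use Permitted",
    "NoC-US": "No Copyright - United States",
    "NKC": "No Known Copyright",
    "CNE": "Copyright Not Evaluated",
    "UND": "Copyright Undetermined",
}


def statement_label(uri: Optional[str]) -> str:
    """Turn a stored rights URI into a facet label."""
    if not uri:
        return "Not specified"
    uri = uri.replace("https://", "http://", 1)  # tolerate https variants
    if not uri.startswith(VOCAB):
        return "Not specified"
    code = uri[len(VOCAB):].split("/")[0]
    return LABELS.get(code, "Other rights statement")


def rights_facets(uris: Iterable[Optional[str]]) -> Counter:
    """Count objects per rights label, as a search facet would display them."""
    return Counter(statement_label(u) for u in uris)
```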

Making the Move to Open Journal Systems 3: Recommendations for a (mostly) painless upgrade

Mariya Maistrovskaya & Kaitlin Newson

From June 2017 to August 2018, Scholars Portal, a consortial service of the Ontario Council of University Libraries, upgraded 10 multi-journal instances of Open Journal Systems (OJS) to OJS 3, building expertise in the upgrade process along the way. The final and largest instance to be upgraded was that of the University of Toronto Libraries, which hosts over 50 journals. In this article, we will discuss upgrade planning and process, problems encountered along the way, and best practices for supporting journal teams through the upgrade on a multi-journal instance. We will also include checklists and technical troubleshooting tips to help institutions make their upgrades as smooth and worry-free as possible. Finally, we will go over post-upgrade support strategies and next steps for making the most of your transition to OJS 3.

This article will primarily be useful for institutions hosting instances of OJS 2, but those that have already upgraded, or are considering hosting the software, may find the outlined approach to support and testing helpful.

A Systematic Approach to Collecting Student Work

Janina Mueller

Digital technology has profoundly changed design education over the past couple of decades. The digital design process generates design solutions from many different angles and points of view, captured and expressed in many file formats and file types. In this environment of ubiquitous digital files, what are effective ways for a design school to capture a snapshot of the work created within it? How can it build a long-term collection of student files for research and promotion, and to preserve the history of the school?

This paper describes the recent efforts of the Harvard Graduate School of Design to create a scalable, long-term data management solution for digital student work files. The first section describes the context and history of student work at the Harvard Graduate School of Design. The second section focuses on the functionality of the tool we created. Lastly, the paper looks at the library’s current efforts to archive the collected student files for the long term in Harvard’s digital repository.

Editorial: Looking to the Past to Find the Future

Ron Peterson

I reflect on my 10+ year tenure with the Code4Lib Journal, ponder the work of our editors and authors, and come out the other side ready for 10 more years.

Getting More out of MARC with Primo: Strategies for Display, Search and Faceting

Kelley McGrath and Lesley Lowery

Beyond author, title, subject, and notes, the MARC 21 format has many new (or newly revitalized) fields and subfields. These support more structured data and could benefit users if exposed in a discovery interface. In this article, we describe how the Orbis Cascade Alliance has implemented display, search, and faceting for several of these fields and subfields in our Primo discovery interface. We discuss the problems and challenges we encountered, both those specific to Primo and those that would apply to any search interface.

The Tools We Don’t Have: Future and Current Inventory Management in a Room Reservation System

Denis Galvin, Mang Sun, and Hanjun Lee

Fondren Library at Rice University has numerous study rooms that are very popular with students. Study rooms and equipment have future inventory needs that require a visual calendar for reservations. Traditionally, libraries manage reservations through a booking module in an integrated library system (ILS). However, most, if not all, booking modules lack a visual calendar that lets patrons pick a place and time to create a reservation. The IT department at Fondren Library overcame this limitation by modifying the open-source Booked Scheduling software to handle all of the front-end work for the ILS, while still allowing the ILS to manage the use of the rooms.
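A visual reservation calendar rests on one key check: a new booking must not overlap an existing one for the same room. Booked Scheduling performs this check internally. The following minimal Python sketch only illustrates the logic. It treats each reservation as a half-open interval so that back-to-back bookings are allowed. The `Reservation` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reservation:
    room: str
    start: datetime
    end: datetime  # exclusive: a booking ending at 14:00 does not block one starting at 14:00


def overlaps(a: Reservation, b: Reservation) -> bool:
    """True if two reservations for the same room share any time."""
    return a.room == b.room and a.start < b.end and b.start < a.end


def can_book(new: Reservation, existing: list[Reservation]) -> bool:
    """Accept a new reservation only if it is well-formed and conflict-free."""
    if new.end <= new.start:
        return False
    return not any(overlaps(new, r) for r in existing)
```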

Ship It: Logistical tracking of ILL physical loans

Ryan Litsey & Scott Luker

The OBILLSK Shipment Tracking system is the first consolidated and comprehensive shipment information system for interlibrary loan. The system is unique because it not only offers an interface for consolidating the items being shipped out of an ILL office but also provides:
- real-time statistical data on global geographic shipping patterns,
- tracking of packages across all major couriers, and
- customized date-range reporting for ILL shipment activity.

The system takes advantage of several web-based technologies that make it easy to use for students, staff, and library administrators. The web-based software uses a .NET platform and a SQL Server database. Client-side frameworks include Bootstrap and jQuery for responsive design, Shield UI for data visualizations, and jVectorMap for geographic representation of shipments. The system is now available to all libraries. It is in active use at 15 academic libraries nationwide, and over 190,000 items have been scanned since October 2016. It is through the development of innovative technologies that libraries can continue to serve as incubators for practical solutions that help the discipline and practice of librarianship.

Using the ‘rentrez’ R Package to Identify Repository Records for NCBI LinkOut

Yoo Young Lee, Erin D. Foster, David E. Polley, and Jere Odell

In this article, we provide a brief overview of the National Center for Biotechnology Information (NCBI) LinkOut service for institutional repositories. This service allows links from the PubMed database to full-text versions of articles in participating institutional repositories (IRs). We discuss the criteria for participation in NCBI LinkOut for IRs and current methods for participating. We also outline our solution for automating the identification of eligible articles in a repository using R and the ‘rentrez’ package. Using our solution, we quickly processed 4,400 open access items from our repository, identified 557 eligible records, and sent them to the National Library of Medicine (NLM). Direct linking from PubMed resulted in a 17% increase in web traffic.

Leveraging Python to improve ebook metadata selection, ingest, and management

Kelly Thompson and Stacie Traill

Libraries face many challenges in managing descriptive metadata for ebooks, including quality control, completeness of coverage, and ongoing management. The recent emergence of library management systems that automatically provide descriptive metadata for e-resources activated in system knowledge bases means that ebook management models are moving toward both greater efficiency and more complex implementation and maintenance choices. Automated and data-driven processes for ebook management have always been desirable, but in the current environment, they become necessary. In addition to initial selection of a record source, automation can be applied to quality control processes and ongoing maintenance in order to keep manual, eyes-on work to a minimum while providing the best possible discovery and access. In this article, we describe how we are using Python scripts to address these challenges.
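The abstract does not list the authors' specific checks. As one hypothetical example of automated quality control, the Python sketch below flags ebook records that lack assumed required fields. It also flags records whose ISBN-13 has an invalid check digit. For a valid ISBN-13, the weighted sum of all 13 digits, with weights alternating 1 and 3, is divisible by 10. The record layout and field names are assumptions, and ISBN-10s would need a separate check.

```python
import re

# Hypothetical minimum fields an ebook record must have to pass QC.
REQUIRED_FIELDS = ("title", "isbn", "url")


def isbn13_is_valid(isbn: str) -> bool:
    """Validate an ISBN-13 check digit (weights alternate 1, 3)."""
    digits = re.sub(r"[^0-9]", "", isbn)
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def qc_problems(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    isbn = record.get("isbn")
    if isbn and not isbn13_is_valid(isbn):
        problems.append(f"invalid ISBN-13: {isbn}")
    return problems
```

Returning a list of problems, rather than a simple pass/fail, makes it easy to write a report so that manual review is limited to the records that actually need it.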

DuEPublicA: Automated bibliometric reports based on the University Bibliography and external citation data

Eike T. Spielberg

This paper describes a web application that generates bibliometric reports based on the University Bibliography and the Scopus citation database. Our goal is to offer an alternative to the easy-to-prepare automated reports from commercial sources. These often suffer from incomplete coverage of publication types and from difficulty attributing publications to people, institutes, and universities. Using our University Bibliography as the source for selecting relevant publications solves both problems. Because it is a local system, set up and maintained by the library, we can include every publication type we want. Because the University Bibliography is linked to the university’s identity management system, it also makes it easy to select publications for individual people, institutes, and the whole university.

The program is designed as a web application, which collects publications from the University Bibliography, enriches them with citation data from Scopus and performs three kinds of analyses:
1. A general analysis (number and type of publications, publications per year etc.),
2. A citation analysis (average citations per publication, h-index, uncitedness), and
3. An affiliation analysis (home and partner institutions)

We tried to keep the code highly generic, so that other databases (Web of Science, IEEE) or other bibliographies can easily be included. The application is written in Java and XML and uses XSL transformations and LaTeX to generate bibliometric reports as HTML pages and in PDF format. Warnings and alerts are automatically included if the citation analysis covers only a small fraction of the publications from the University Bibliography. In addition, we describe a small tool that helps collect author details for an analysis.
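The citation metrics listed above have standard definitions:
- the average number of citations per publication,
- the h-index: the largest h such that h publications have at least h citations each,
- uncitedness: the share of publications with no citations.

DuEPublicA is written in Java. The Python sketch below only illustrates these calculations, assuming a list of per-publication citation counts has already been retrieved from Scopus.

```python
from dataclasses import dataclass


@dataclass
class CitationStats:
    publications: int
    mean_citations: float
    h_index: int
    uncitedness: float  # fraction of publications with zero citations


def h_index(citations: list[int]) -> int:
    """Largest h such that h publications have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def citation_stats(citations: list[int]) -> CitationStats:
    """Compute the citation-analysis figures for one set of publications."""
    n = len(citations)
    if n == 0:
        return CitationStats(0, 0.0, 0, 0.0)
    return CitationStats(
        publications=n,
        mean_citations=sum(citations) / n,
        h_index=h_index(citations),
        uncitedness=sum(1 for c in citations if c == 0) / n,
    )
```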

Participatory Design Methods for Collaboration and Communication

Tara Wood and Cate Kompare

Website redesigns can be contentious and fraught in any type of organization, and libraries are no exception. Coming to consensus on priorities and design decisions is nearly impossible, as different groups compete to ensure their subject or specialty area is represented. To keep projects on track and on time, libraries may give a few staff members the authority to make all of the decisions, while keeping user research limited to a small number of usability tests. While these tactics are sometimes necessary, at best they can leave many feeling left out of the process, and at worst, can result in major oversights in the final design.

Participatory design methods can bring users and stakeholders into the design process and ultimately lead to a better design and less friction in the project. The authors share their experience and lessons learned using participatory design techniques in a website redesign project at a large, multi-location academic library, and how these techniques facilitated communication, shaped design decisions, and kept a complex, difficult project on track.

Consortial-Based Customizations for New Primo UI

Dan Moore and Nathan Mealey

Users interested in customizing their Primo installation must configure specific settings, files, and code during the View setup process. As a consequence, unique customizations are not easily sharable between institutions. With the release of the new Primo User Interface, Ex Libris has enabled institutions to manage interface customizations via the Package Customization Manager. In summer 2016, an Orbis Cascade Alliance working group investigated the efficacy of the Package Manager as a means of centrally sharing and deploying Orbis Cascade Alliance Primo Toolkit customizations. Customizations were loaded passively into the central package. As a result, each institution could pass custom parameters with local JS to adapt the central customizations to the specific needs of its users. This article will address both the potential and the limitations of the Primo Package Customization Manager. It will also provide best practices for consortia seeking to centrally manage and share Primo enhancements and identify areas of future development for centrally shared customizations.

Editorial Introduction – Summer Reading List

Ron Peterson

New additions for your summer reading list!

Data for Decision Making: Tracking Your Library’s Needs With TrackRef

Michael Carlozzi

Library services must adapt to changing patron needs. These adaptations should be data-driven. This paper reports on the use of TrackRef, an open source and free web program for managing reference statistics.

Shining a Light on Scientific Data: Building a Data Catalog to Foster Data Sharing and Reuse

Ian Lamb and Catherine Larson

The scientific community’s growing eagerness to make research data available to the public gives libraries, with our expertise in metadata and discovery, an interesting new opportunity. This paper details the in-house creation of a “data catalog.” The catalog describes datasets ranging from population-level studies like the US Census to small, specialized datasets created by researchers at our own institution. Based on Symfony2 and Solr, the data catalog provides two interfaces: a powerful search interface that helps researchers locate relevant data, and an administrative interface that lets librarians add, edit, and manage metadata elements at will. This paper will outline the successes, failures, and total redos that culminated in the current manifestation of our data catalog.

Bringing our Internet Archive collection back home: A case study from the University of Mary Washington

Katherine Perdue

The Internet Archive is a great boon to smaller libraries that may not have the resources to host their own digital materials. However, individual items uploaded to the Internet Archive are hard to treat as a collection. Full-text searching can be done only within an individual item, and it can be difficult to direct patrons to local resources. Since 2010, the University of Mary Washington has uploaded over two thousand digitized university publications, including the student newspaper and the yearbook, to the Internet Archive. Taken together, these represent almost 100 years of UMW history. Using Apache Lucy, we built a search interface, Eagle Explorer, that treats our Internet Archive collection as a cohesive whole. Patrons can use Eagle Explorer to search the full text of the collection and to filter by date and publication. This article will describe how we created Eagle Explorer, the challenges we encountered, and its reception from the campus community.

Making User Rights Clear: Adding e-resource License Information in Library Systems

Jenny Jing, Qinqin Lin, Ahmedullah Sharifi and Mark Swartz

Libraries sign a wide variety of licensing agreements that specify terms of both access to and use of a publisher’s electronic collections. Adding easily accessible licensing information to collections helps ensure that library users comply with these agreements. This article will describe the addition of licensing permissions to resource displays using Mondo [1] by Queen’s University and Scholars Portal (a service of the Ontario Council of University Libraries) [2]. We will give a brief introduction to Mondo and explain how we improved it to add license permissions to different library systems: an ILS (Voyager), an OpenURL link resolver (360 Link), and a discovery system (Summon). However, libraries can use Mondo to add license permissions to other library systems that allow user configuration.

Improving Access to Archival Collections with Automated Entity Extraction

Kyle Banerjee and Max Johnson

The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown.

Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges of automatically extracting access points, we discuss using publicly accessible APIs to extract entities (e.g., people, places, concepts) from digital and digitized objects. We describe why Linked Open Data is not well suited to a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as in other library applications.

Code as Code: Speculations on Diversity, Inequity, and Digital Women

Sharon L. Comstock, Jerica Copeny, and Cynthia Landrum

All technologies are social. Taking this socio-technological position becomes less a political stance than a necessity when considering the lived experience of digital inequity, divides, and -isms as they are encountered in everyday library work spheres. Our personal experience as women and women of color in our respective technological and leadership communities provides both foreground and background for exploring the private-public lines that delineate definitions of “diversity,” “inequity,” and digital literacies in library practice. We suggest that unless we probe these definitions at the most personal level of lived experience, we in the LIS and technology professions will remain well-intentioned but ineffective at genuine inclusion.

Parsing and Matching Dates in VIAF

Jenny A. Toves and Thomas B. Hickey

The Virtual International Authority File (VIAF; OCLC Online Computer Library Center 2013; http://viaf.org) is built from dozens of authority files. It contains tens of millions of names in more than 150 million authority and bibliographic records expressed in multiple languages, scripts, and formats. One of the main tasks in VIAF is to bring together personal names that may have various dates associated with them, such as birth, death, or when the person was active. These dates can be quite complicated, with ranges, approximations, BCE dates, different scripts, and even different calendars. Analysis of the nearly 400,000 unique date strings in VIAF led us to a parsing technique that relies on only a few basic patterns. Our goal is to correctly interpret at least 99% of the dates we find in each of VIAF’s authority files and to use the dates to facilitate matches between authority records.

Python source code for the process described here is available at https://github.com/OCLC-Developer-Network/viaf-dates.
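Regular expressions can illustrate how a few basic patterns cover most dates. The Python sketch below handles only some common English-language forms:
- year ranges,
- “b.” and “d.” prefixes,
- “ca.” approximations,
- “fl.” activity dates,
- B.C. years, stored as negative numbers with no adjustment for the absence of a year zero.

These patterns and the `ParsedDate` structure are illustrative assumptions, not those in the viaf-dates repository, which also deals with other scripts and calendars.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedDate:
    kind: str                  # "lifespan" (birth/death) or "active" (fl.)
    start: Optional[int]       # negative numbers are BCE years
    end: Optional[int]
    approximate: bool = False  # True for "ca." / "circa" dates


APPROX = re.compile(r"^(?:ca\.?|circa)\s*", re.IGNORECASE)
BCE = re.compile(r"\s*B\.?\s?C\.?(?:E\.?)?$", re.IGNORECASE)

# A few basic patterns, tried in order; named groups carry the years.
PATTERNS = [
    ("active", re.compile(r"^fl\.\s*(?P<start>\d{1,4})(?:\s*-\s*(?P<end>\d{1,4}))?$", re.IGNORECASE)),
    ("lifespan", re.compile(r"^b\.\s*(?P<start>\d{1,4})$", re.IGNORECASE)),
    ("lifespan", re.compile(r"^d\.\s*(?P<end>\d{1,4})$", re.IGNORECASE)),
    ("lifespan", re.compile(r"^(?P<start>\d{1,4})\s*-\s*(?P<end>\d{1,4})?$")),
]


def parse_date(text: str) -> Optional[ParsedDate]:
    """Parse a simple date string, or return None if no pattern applies."""
    s = text.strip()
    approximate = False
    m = APPROX.match(s)
    if m:
        approximate = True
        s = s[m.end():]
    sign = 1
    m = BCE.search(s)
    if m:
        sign = -1
        s = s[: m.start()]
    for kind, pattern in PATTERNS:
        m = pattern.match(s)
        if not m:
            continue
        groups = m.groupdict()
        start = int(groups["start"]) * sign if groups.get("start") else None
        end = int(groups["end"]) * sign if groups.get("end") else None
        if kind == "active" and end is None:
            end = start  # a single "fl." year marks one point of activity
        return ParsedDate(kind, start, end, approximate)
    return None
```

Strings that match no pattern return None. This makes it easy to count unparsed dates per authority file and to track progress toward a coverage target such as 99%.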

Editorial Introduction: Seeking a Diversity of Voices

Ron Peterson

Making the Journal the best that it can be.

ISSN 1940-5758