Issue 49, 2020-08-10
Editorial: For Pandemic Times Such as This
A pandemic changes the world and changes libraries.
Open Source Tools for Scaling Data Curation at QDR
This paper describes the development of services and tools for scaling data curation at the Qualitative Data Repository (QDR). Through a set of open-source tools, semi-automated workflows, and extensions to the Dataverse platform, our team has built services that allow curators to efficiently and effectively publish collections of qualitatively derived data. The contributions we seek to make in this paper are as follows:
1. We describe ‘human-in-the-loop’ curation and the tools that facilitate this model at QDR;
2. We provide an in-depth discussion of the design and implementation of these tools, including applications specific to the Dataverse software repository, as well as standalone archiving tools written in R; and
3. We highlight the role of providing a service layer for data discovery and accessibility of qualitative data.
Keywords: Data curation; open-source; qualitative data
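QDR's curation tools themselves are not reproduced in the abstract, but the Dataverse platform they extend exposes a documented native API. As a minimal sketch, assuming a hypothetical server URL, API token, and existing draft dataset DOI (none of which come from the paper), a curator could attach a file to a dataset like this:

```python
import json
import requests

# All values below are placeholders, not QDR's actual configuration.
BASE_URL = "https://demo.dataverse.org"             # hypothetical instance
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder token
PERSISTENT_ID = "doi:10.5072/FK2/EXAMPLE"           # placeholder dataset DOI

def add_file(path, description):
    """Attach a file to a draft dataset via the Dataverse native API."""
    url = f"{BASE_URL}/api/datasets/:persistentId/add"
    with open(path, "rb") as handle:
        response = requests.post(
            url,
            params={"persistentId": PERSISTENT_ID},
            headers={"X-Dataverse-key": API_TOKEN},
            files={"file": handle},
            data={"jsonData": json.dumps({"description": description})},
        )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(add_file("interview-01.pdf", "Anonymized interview transcript"))
```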
From Text to Map: Combining Named Entity Recognition and Geographic Information Systems
This tutorial shows readers how to leverage the power of named entity recognition (NER) and geographic information systems (GIS) to extract place names from text, geocode them, and create a public-facing map. This process is highly useful across disciplines. For example, it can be used to generate maps from historical primary sources, works of literature set in the real world, and corpora of academic scholarship. In order to lead the reader through this process, the authors work with a 500-article sample of the COVID-19 Open Research Dataset Challenge (CORD-19) dataset. As of the date of writing, CORD-19 includes 45,000 full-text articles with metadata. Using this sample, the authors demonstrate how to extract locations from the full text with the spaCy library in Python, highlight methods to clean up the extracted data with the Pandas library, and finally teach the reader how to create an interactive map of the places using ArcGIS Online. The processes and code are described in a manner that is reusable for any corpus of text.
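As a small taste of the extraction step, the following sketch pulls place entities from a sentence with spaCy and loads them into a Pandas DataFrame; the sample text and cleanup choices are illustrative, not the authors' CORD-19 pipeline.

```python
import spacy
import pandas as pd

# Requires: pip install spacy pandas
#           python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = ("The first cluster was reported in Wuhan, and later cases "
        "appeared in Milan, Tehran, and New York.")

doc = nlp(text)

# Keep geopolitical entities (GPE) and other locations (LOC).
places = [
    {"place": ent.text, "label": ent.label_}
    for ent in doc.ents
    if ent.label_ in ("GPE", "LOC")
]

df = pd.DataFrame(places)
# Typical cleanup before geocoding: strip whitespace, drop duplicates.
df["place"] = df["place"].str.strip()
df = df.drop_duplicates(subset="place")
print(df)
```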
Using Integrated Library Systems and Open Data to Analyze Library Cardholders
The Harrison Public Library in Westchester County, New York operates two library buildings in Harrison: The Richard E. Halperin Memorial Library Building (the library’s main building, located in downtown Harrison) and a West Harrison branch location. As part of its latest three-year strategic plan, the library sought to use existing resources to improve understanding of its cardholders at both locations.
To do so, we needed to link the circulation data in our integrated library system, Evergreen, to geographic data and demographic data. We decided to build a geodemographic heatmap that incorporated all three aforementioned types of data. Using Evergreen, American Community Survey (ACS) data, and Google Maps, we plotted each cardholder’s residence on a map, added census boundaries (called tracts) and our town’s borders to the map, and produced summary statistics for each tract detailing its demographics and the library card usage of its residents. In this article, we describe how we acquired the necessary data and built the heatmap. We also touch on how we safeguarded the data while building the heatmap, which is an internal tool available only to select authorized staff members. Finally, we discuss what we learned from the heatmap and how libraries can use open data to benefit their communities.
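By way of illustration, a single address can be geocoded with the Google Maps Geocoding API as below; the address, key, and function names are placeholders, and real cardholder data would of course stay behind the access controls described above.

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_GOOGLE_MAPS_API_KEY"  # placeholder

def geocode(address):
    """Return (lat, lng) for a street address, or None if not found."""
    response = requests.get(
        GEOCODE_URL,
        params={"address": address, "key": API_KEY},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Illustrative address only; actual cardholder addresses are
# processed inside the library's secured internal tool.
print(geocode("1 Library Lane, Harrison, NY"))
```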
Update OCLC Holdings Without Paying Additional Fees: A Patchwork Approach
Accurate OCLC holdings are vital for interlibrary loan transactions. However, over time, weeding projects, replacing lost or damaged materials, and human error can leave a library with a catalog that is no longer accurately reflected in OCLC. While OCLC offers reclamation services to bring poorly maintained collections up to date, the associated fee may be cost-prohibitive for libraries with limited budgets. This article will describe the process used at Austin Peay State University to identify, isolate, and update holdings using OCLC Collection Manager queries, MarcEdit, Excel, and Python. Some portions of this process are completed using basic coding; however, troubleshooting techniques will be included for those with limited previous experience.
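The comparison step of such a patchwork process can be sketched in a few lines of Python; the file and column names below are illustrative assumptions, not the university's actual exports.

```python
import csv

def load_ocns(path, column):
    """Read a set of OCLC numbers from one column of a CSV export."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {
            row[column].strip()
            for row in csv.DictReader(handle)
            if row[column].strip()
        }

# Hypothetical exports: one from the local catalog, one from
# OCLC Collection Manager.
local = load_ocns("catalog_export.csv", "oclc_number")
oclc = load_ocns("collection_manager_export.csv", "oclc_number")

# Set in OCLC but no longer in the catalog: candidates for unsetting.
to_unset = sorted(oclc - local)
# In the catalog but not reflected in OCLC: candidates for setting.
to_set = sorted(local - oclc)

print(f"{len(to_unset)} holdings to unset, {len(to_set)} to set")
```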
Data reuse in linked data projects: a comparison of Alma and Share-VDE BIBFRAME networks
This article presents an analysis of the enrichment, transformation, and clustering used by the vendors Casalini Libri/@CULT and Ex Libris in their respective conversions of MARC data to BIBFRAME. The analysis considers the source MARC21 data used by Alma and then the enrichment and transformation of MARC21 data from Share-VDE partner libraries. The clustering of linked data into a BIBFRAME network is a key outcome of data reuse in linked data projects and fundamental to improving the discovery of library collections on the web and within search systems.
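To make the Work/Instance clustering concrete, the toy rdflib sketch below builds a single BIBFRAME Work "hub" linking two Instances; it is a conceptual illustration, not either vendor's conversion pipeline, and the example URIs are invented.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
EX = Namespace("http://example.org/")  # placeholder URIs

g = Graph()
g.bind("bf", BF)

# One Work hub clustering two Instances (e.g., print and ebook).
work = EX["work/moby-dick"]
g.add((work, RDF.type, BF.Work))
g.add((work, RDFS.label, Literal("Moby Dick")))

for slug, label in [("inst/print", "Print edition"),
                    ("inst/ebook", "Ebook edition")]:
    instance = EX[slug]
    g.add((instance, RDF.type, BF.Instance))
    g.add((instance, RDFS.label, Literal(label)))
    g.add((instance, BF.instanceOf, work))

print(g.serialize(format="turtle"))
```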
CollectionBuilder-CONTENTdm: Developing a Static Web ‘Skin’ for CONTENTdm-based Digital Collections
Unsatisfied with customization options for CONTENTdm, librarians at the University of Idaho Library have been using a modern static web approach to create digital exhibit websites that sit in front of the digital repository. This “skin” is designed to provide users with new pathways to discover and explore collection content and context. This article describes the concepts behind the approach and how it has developed into an open source, data-driven tool called CollectionBuilder-CONTENTdm. The authors outline the design decisions and principles guiding the development of CollectionBuilder, and detail how a version is used at the University of Idaho Library to collaboratively build digital collections and digital scholarship projects.
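CollectionBuilder sites are driven by collection metadata in tabular form; the sketch below flattens a hypothetical JSON export of repository items into such a CSV. The field names and file paths are assumptions for illustration, not the project's actual schema.

```python
import csv
import json

# Hypothetical export of digital collection items.
with open("contentdm_export.json", encoding="utf-8") as handle:
    items = json.load(handle)

fields = ["objectid", "title", "date", "format", "subject"]

with open("metadata.csv", "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        # Missing fields are written as empty strings.
        writer.writerow({field: item.get(field, "") for field in fields})

print(f"Wrote {len(items)} rows to metadata.csv")
```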
Automated Collections Workflows in GOBI: Using Python to Scrape for Purchase Options
The NC State University Libraries has developed a tool for querying GOBI, our print and ebook ordering vendor platform, to automate monthly collections reports. These reports detail purchase options for missing or long-overdue items, as well as popular items with multiple holds. GOBI does not offer an API, forcing staff to conduct manual title-by-title searches that previously took up to 15 hours per month. To make this process more efficient, we wrote a Python script that automates title searches and the extraction of key data (price, date of publication, binding type) from GOBI. This tool can gather data for hundreds of titles in half an hour or less, freeing up time for other projects.
This article will describe the process of creating this script, as well as how it finds and selects data in GOBI. It will also discuss how these results are paired with NC State’s holdings data to create reports for collection managers. Lastly, the article will examine obstacles that were experienced in the creation of the tool and offer recommendations for other organizations seeking to automate collections workflows.
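The pairing step described above can be sketched as a simple table join; the file and column names below are invented for illustration and do not come from the NC State script.

```python
import pandas as pd

# Hypothetical inputs: titles flagged as missing or long-overdue,
# and purchase options scraped from GOBI by the search script.
holdings = pd.read_csv("missing_items.csv")  # isbn, title, status
gobi = pd.read_csv("gobi_results.csv")       # isbn, price, pub_date, binding

report = holdings.merge(gobi, on="isbn", how="left")

# Titles with no GOBI match need manual review by collection managers.
unmatched = report[report["price"].isna()]
report.to_csv("collections_report.csv", index=False)
print(f"{len(report)} titles checked; {len(unmatched)} need manual review")
```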
Testing remote access to e-resources with CodeceptJS
At the Badische Landesbibliothek Karlsruhe (BLB) we offer a variety of e-resources with different access requirements. On the one hand, there is free access to open access material, no matter where you are. On the other hand, there are e-resources that can only be accessed on the premises of the BLB. We also offer e-resources that can be accessed from anywhere, but which require a library account for authentication. To test the functionality of these access methods, we have created a project that automatically tests the entire process: searching our catalogue, selecting a hit, logging in to the provider’s site, and checking the results. For this we use the end-to-end testing framework CodeceptJS.
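The BLB tests are written in CodeceptJS; as an analogous sketch in Python (using Selenium rather than the authors' framework), the search-select-login-check flow might look like this. The URL, selectors, credentials, and success check are all placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Placeholder URL and selectors; the real tests target the BLB
# catalogue and the providers' login pages.
CATALOGUE_URL = "https://catalogue.example.org"

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 15)
try:
    # Search the catalogue.
    driver.get(CATALOGUE_URL)
    driver.find_element(By.NAME, "query").send_keys("digital humanities")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

    # Select the first hit and follow the link to the provider.
    wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, ".result a"))).click()
    wait.until(EC.element_to_be_clickable(
        (By.LINK_TEXT, "Full text"))).click()

    # Authenticate with a test library account (placeholder credentials).
    driver.find_element(By.ID, "username").send_keys("testuser")
    driver.find_element(By.ID, "password").send_keys("secret")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

    # The test passes if the provider page shows the licensed content.
    assert "Access granted" in driver.page_source  # placeholder check
finally:
    driver.quit()
```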