Issue 20, 2013-04-17

Editorial Introduction: It is Volunteers All the Way Down…

Peter Murray

Workflow Tools for Digital Curation

Andrew James Weidner and Daniel Gelaw Alemneh

Maintaining usable and sustainable digital collections requires a complex set of actions that address the many challenges at various stages of the digital object lifecycle. Digital curation activities enhance access and retrieval, maintain quality, add value, and facilitate use and re-use over time. Digital resource lifecycle management is becoming an increasingly important topic as digital curators actively explore software tools that perform metadata curation and file management tasks. Accordingly, the University of North Texas (UNT) Libraries develop tools and workflows that streamline production and quality assurance activities. This article demonstrates two open source software tools, AutoHotkey and Selenium IDE, which the UNT Digital Libraries Division has adopted for use during the pre-ingest and post-ingest stages of the digital resource lifecycle.

Augmenting the Cataloger’s Bag of Tricks: Using MarcEdit, Python, and PyMARC for Batch-Processing MARC Records Generated From the Archivists’ Toolkit

Heidi Frank

Catalogers have traditionally created and edited MARC records on a one-by-one basis. Recently, it has become more common for catalogers to delve into scripting and programming tools in order to automate the processing of large numbers of records at once. This article provides a case study showing how MARCXML archival records generated by the Archivists’ Toolkit (AT) can be modified in batches using the MarcEdit software and Python scripting with the PyMARC module. It analyzes selected problems with the MARC records exported by the AT and shows how MarcEdit and Python are used to resolve them so that the records are formatted correctly for loading into the library’s local catalog. Similar methods could be used by catalogers dealing with any large set of MARC data, such as ebook records from vendors.
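To give a sense of what batch editing MARCXML looks like in practice, here is a minimal sketch that applies one cleanup to every record in a collection. It is not the article's code: it uses only Python's standard library rather than PyMARC, and the specific fix (collapsing stray whitespace in 245 $a), the function name, and the sample records are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the MARCXML ("MARC21 slim") schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}


def normalize_titles(xml_text):
    """Collapse runs of whitespace in every 245 $a of a MARCXML collection.

    Hypothetical cleanup for illustration; returns the edited XML and
    the number of subfields that were changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for record in root.iter("{%s}record" % MARC_NS):
        path = "marc:datafield[@tag='245']/marc:subfield[@code='a']"
        for subfield in record.findall(path, NS):
            cleaned = " ".join((subfield.text or "").split())
            if cleaned != subfield.text:
                subfield.text = cleaned
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Made-up sample data: the first record needs the fix, the second does not.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Papers  of   an example family</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Clean title</subfield>
    </datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    edited_xml, count = normalize_titles(SAMPLE)
    print(count, "subfield(s) changed")
```

The same loop-over-records pattern applies to any per-record rule; only the path expression and the edit inside the inner loop would change.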

Keeping up with Ebooks: Automated Normalization and Access Checking with Normac

Kathryn Lybarger

Cataloging ebooks is difficult to do well, as they are often purchased in large collections, sometimes with only low-quality cataloging copy available. MARC records may be provided upfront in a large batch, or trickle in one at a time as they become available. Records may contain links that point nowhere, to the wrong book, or to an offer to sell you the book you already own. Loading records sight unseen may introduce inconsistency or overlay good print records with poor electronic ones, making the catalog much more difficult to search.

This article describes in more detail the major challenges in ebook cataloging, record normalization and access checking, and introduces Normac: an open source web-based tool for processing MARC records.

Developing a Digital Video Library with the YouTube Data API

Jason Clark

MSU Library has created a digital video library that uses the YouTube API to power its local library channel. It is a complete search and browse application with item-level views, microdata, a caching and optimization routine, and a file backup routine. The article discusses applying the YouTube API as a database application layer: workflow efficiencies, metadata procedures, and local backup and optimization procedures. Code samples in PHP, .htaccess examples, and shell commands used in developing the application and routines are explained at length. Finally, a complete prototype application will be released on GitHub so that other libraries can get started using the lessons learned. A live version of the application is here: http://www.lib.montana.edu/channel/. The real benefit of this method is the low overhead for smaller shops and the ability to scale production and distribution of digital video.

Better Search Through Query Expansion Using Controlled Vocabularies and Apache Solr

Scott Williams

This article describes how the University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum) modified its Solr-based discovery interface to improve recall and enable end users to benefit from the museum's in-house controlled vocabularies. These modifications automatically expand the query generated by any search term that matches the controlled vocabulary to include all related alternate and narrower terms. For example, if a user enters "Ohio", that search will retrieve the record for an arrowhead found in Cincinnati (a narrower term of Ohio) even if that record does not include the term "Ohio".
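The expansion step can be illustrated with a small sketch. This is not the Penn Museum's implementation: the vocabulary structure, the sample entries, and the function names are hypothetical, and following narrower terms transitively is an assumption. The output is a plain OR query of quoted phrases, which is valid syntax for Solr's standard query parser.

```python
from collections import deque

# Hypothetical controlled vocabulary keyed by lowercased preferred term.
VOCABULARY = {
    "ohio": {"alternates": ["Buckeye State"], "narrower": ["Cincinnati"]},
    "cincinnati": {"alternates": [], "narrower": []},
}


def expand_term(term, vocabulary):
    """Return the term plus its alternate and narrower terms.

    Narrower terms are followed transitively (an assumption); terms
    that are not in the vocabulary are returned unchanged.
    """
    expanded = []
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        key = current.lower()
        if key in seen:
            continue
        seen.add(key)
        expanded.append(current)
        entry = vocabulary.get(key)
        if entry:
            queue.extend(entry["alternates"])
            queue.extend(entry["narrower"])
    return expanded


def build_query(term, vocabulary):
    """Join the expanded terms into an OR query of quoted phrases."""
    phrases = ['"%s"' % t.replace('"', '\\"') for t in expand_term(term, vocabulary)]
    return " OR ".join(phrases)


if __name__ == "__main__":
    print(build_query("Ohio", VOCABULARY))
    print(build_query("arrowhead", VOCABULARY))
```

With this sample vocabulary, a search for Ohio also matches records that mention only Cincinnati, while a term outside the vocabulary passes through as a single quoted phrase.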

Breaking Up With CONTENTdm: Why and How One Institution Took the Leap to Open Source

Heather Gilbert and Tyler Mobley

In 2011, the College of Charleston found itself at a digital asset management crossroads. The Lowcountry Digital Library (LCDL), a multi-institution cooperative founded less than three years prior, was rapidly approaching its CONTENTdm license limit of 50,000 items. Understaffed and without a programmer, the College assessed its options and ultimately began construction of a Fedora Commons repository with a Blacklight discovery layer, an installation of Rutgers’ OpenWMS for Fedora ingestion, and a Drupal front end as a replacement for its existing digital library. The system has been built and over 20,000 items have been migrated. The project was a success, but a lot of hard lessons were learned.

Arduino-enabled Patron Interaction Counting

Tim Ribaric and Jonathan Younker

Using the Arduino development board (http://arduino.cc) has become a very popular way to create hardware prototypes that bridge the divide between the physical world and the Internet. This article outlines how to use an Arduino, some off-the-shelf electronic parts, the Processing programming language, and Google Documents to create a push-button reference desk transaction tally device.

The design: the device is plugged into a computer at the reference desk, and staff members push the appropriate button when a reference transaction occurs; the action is instantly tallied in a Google Document. Having a physical device on the desktop increases the chances that information is collected properly, since it is constantly visible and easily accessible, whereas software running on the PC requires staff members to click through a series of options. The data can be tabulated in Google Documents or any other service that processes form-based HTML data.

This article covers all of the major components of creating the project:
- Constructing the Arduino circuit and programming it
- Creating the Google Docs form
- Creating the Processing program that will listen for information from the Arduino and send it to the Google Docs form
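The last component, turning a button press received from the Arduino into a form submission, can be sketched as follows. The article uses the Processing language; this Python version only illustrates the idea. The form URL, the field name, and the button-to-label mapping are all hypothetical placeholders, and the sketch builds the request without reading a serial port or sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

# All three values below are hypothetical placeholders, not the
# article's actual form address, field name, or button layout.
FORM_URL = "https://example.org/form-endpoint"
FIELD_NAME = "transaction_type"
BUTTONS = {"1": "Directional", "2": "Reference", "3": "Technical"}


def build_tally_request(serial_line):
    """Turn one line read from the serial port into a form POST request.

    Returns None when the line does not correspond to a known button,
    so stray serial noise is ignored rather than tallied.
    """
    code = serial_line.strip()
    label = BUTTONS.get(code)
    if label is None:
        return None
    body = urlencode({FIELD_NAME: label}).encode("ascii")
    return Request(FORM_URL, data=body, method="POST")


if __name__ == "__main__":
    request = build_tally_request("2\r\n")
    print(request.get_method(), request.full_url, request.data)
```

In a running tally program, each request would be passed to `urllib.request.urlopen` after a line arrives from the serial port; keeping the request construction in its own function makes the mapping easy to check without the hardware attached.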