Issue 26, 2014-10-21

Editorial Introduction: On Being on The Code4Lib Journal Editorial Committee

Kelley McGrath

Behind the scenes of the The Code4Lib Journal…

Archiving the Web: A Case Study from the University of Victoria

Corey Davis

The University of Victoria Libraries started archiving websites in 2013, and it quickly became apparent that many scholarly websites being produced by faculty, especially in the digital humanities, were going to prove very challenging to effectively capture and play back. This article will provide an overview of web archiving and explore the considerable legal and technical challenges of implementing a web archiving initiative at a research library, using the University of Victoria’s implementation of Archive-it, a web archiving service from the Internet Archive, as a case study, with a special focus on capturing complex, interactive websites that scholars are creating to disseminate their research in new ways.

Technical Challenges in Developing Software to Collect Twitter Data

Daniel Chudnov, Daniel Kerchner, Ankushi Sharma and Laura Wrubel

Over the past two years, George Washington University Libraries developed Social Feed Manager (SFM), a Python and Django-based application for collecting social media data from Twitter. Expanding the project from a research prototype to a more widely useful application has presented a number of technical challenges, including changes in the Twitter API, supervision of simultaneous streaming processes, management, storage, and organization of collected data, meeting researcher needs for groups or sets of data, and improving documentation to facilitate other institutions’ installation and use of SFM. This article will describe how the Social Feed Manager project addressed these issues, use of supervisord to manage processes, and other technical decisions made in the course of this project through late summer 2014. This article is targeted towards librarians and archivists who are interested in building collections around web archives and social media data, and have a particular interest in the technical work involved in applying software to the problem of building a sustainable collection management program around these sources.

Exposing Library Services with AngularJS

Jakob Voß and Moritz Horn

This article provides an introduction to the JavaScript framework AngularJS and specific AngularJS modules for accessing library services. It shows how information such as search suggestions, additional links, and availability can be embedded in any website. The ease of reuse may encourage more libraries to expose their services via standard APIs to allow usage in different contexts.

Hacking Summon 2.0 The Elegant Way

Annette Bailey and Godmar Back

Libraries have long been adding content and customizations to vendor-provided web-based search interfaces, including discovery systems such as ProQuest’s Summon(™). Unlike solutions based on using an API, these approaches augment the vendor-designed user interface using library-provided JavaScript code. Recently, vendors have been implementing such user interfaces using client-centric model-view-controller (MVC) frameworks such as AngularJS, which are characterized by the use of modern software engineering techniques such as domain-specific markup, data binding, encapsulation, and dependency injection.

Consequently, traditional approaches such as reverse-engineering the document model (DOM) have become more difficult or even impossible to use because the DOM is highly dynamic, the templates used are difficult to discern, the vendor-provided JavaScript code is both encapsulated and partially obfuscated, and the data binding mechanisms impose a strict separation of model and view that discourages direct DOM manipulation. In fact, practitioners have started to complain that AngularJS-based websites such as Summon 2.0 are very difficult to enhance with custom content in a robust and efficient manner.

In this article, we show how to reverse-engineer the AngularJS-based Summon 2.0 interface to discover the modules, directives, controllers, and services it uses, and we explain how we can use AngularJS’s built-in mechanisms to create new directives and controllers that integrate with and augment the vendor-provided ones to add desired customization and interactions.

We have implemented several features that demonstrate our approach, such as a click-recording script, COinS and facet customization, and the integration of eBook public notes. Our explanation and code should be of direct use for adoption or as examples for other Summon 2.0 customers, but they may also be useful to anyone faced with the need to add enhancements to other vendor-controlled MVC-based sites.

Parsing and Matching Dates in VIAF

Jenny A. Toves and Thomas B. Hickey

The Virtual International Authority File (OCLC Online Computer Library Center 2013) http://viaf.org is built from dozens of authority files with tens of millions of names in more than 150 million authority and bibliographic records expressed in multiple languages, scripts and formats. One of the main tasks in VIAF is to bring together personal names which may have various dates associated with them, such as birth, death or when they were active. These dates can be quite complicated with ranges, approximations, BCE dates, different scripts, and even different calendars. Analysis of the nearly 400,000 unique date strings in VIAF led us to a parsing technique that relies on only a few basic patterns for them. Our goal is to correctly interpret at least 99% of all the dates we find in each of VIAF’s authority files and to use the dates to facilitate matches between authority records.

Python source code for the process described here is available at https://github.com/OCLC-Developer-Network/viaf-dates.

Mdmap: A Tool for Metadata Collection and Matching

Rico Simke

This paper describes a front-end for the semi-automatic collection, matching, and generation of bibliographic metadata obtained from different sources for use within a digitization architecture. The Library of a Billion Words project is building an infrastructure for digitizing text that requires high-quality bibliographic metadata, but currently only sparse metadata from digitized editions is available. The project’s approach is to collect metadata for each digitized item from as many sources as possible. An expert user can then use an intuitive front-end tool to choose matching metadata. The collected metadata are centrally displayed in an interactive grid view. The user can choose which metadata they want to assign to a certain edition, and export these data as MARCXML. This paper presents a new approach to bibliographic work and metadata correction. We try to achieve a high quality of the metadata by generating a large amount of metadata to choose from, as well as by giving librarians an intuitive tool to manage their data.

Using Zapier with Trello for Electronic Resources Troubleshooting Workflow

Meghan Finch

Troubleshooting access problems is an important part of the electronic resources management workflow. This article discusses an opportunity to streamline and track troubleshooting using two web-based services: Trello and Zapier.

Developing Applications in the Era of Cloud-based SaaS Library Systems

Josh Weisman

As the move to cloud-based SaaS library systems accelerates, we must consider what it means to develop applications when the core of the system isn’t under the library’s control. The entire application lifecycle is changing, from development to testing to production. Developing applications for cloud solutions raises new concerns, such as security, multi-tenancy, latency, and analytics. In this article, we review the landscape and suggest a view of how to be successful for the benefit of library staff and end-users in this new reality. We discuss what kinds of APIs and protocols vendors should be supporting, and suggest how best to take advantage of the innovations being introduced.