Issue 26, 2014-10-21
Behind the scenes of the The Code4Lib Journal…
The University of Victoria Libraries started archiving websites in 2013, and it quickly became apparent that many scholarly websites being produced by faculty, especially in the digital humanities, were going to prove very challenging to effectively capture and play back. This article will provide an overview of web archiving and explore the considerable legal and technical challenges of implementing a web archiving initiative at a research library, using the University of Victoria’s implementation of Archive-it, a web archiving service from the Internet Archive, as a case study, with a special focus on capturing complex, interactive websites that scholars are creating to disseminate their research in new ways.
Over the past two years, George Washington University Libraries developed Social Feed Manager (SFM), a Python and Django-based application for collecting social media data from Twitter. Expanding the project from a research prototype to a more widely useful application has presented a number of technical challenges, including changes in the Twitter API, supervision of simultaneous streaming processes, management, storage, and organization of collected data, meeting researcher needs for groups or sets of data, and improving documentation to facilitate other institutions’ installation and use of SFM. This article will describe how the Social Feed Manager project addressed these issues, use of supervisord to manage processes, and other technical decisions made in the course of this project through late summer 2014. This article is targeted towards librarians and archivists who are interested in building collections around web archives and social media data, and have a particular interest in the technical work involved in applying software to the problem of building a sustainable collection management program around these sources.
In this article, we show how to reverse-engineer the AngularJS-based Summon 2.0 interface to discover the modules, directives, controllers, and services it uses, and we explain how we can use AngularJS’s built-in mechanisms to create new directives and controllers that integrate with and augment the vendor-provided ones to add desired customization and interactions.
We have implemented several features that demonstrate our approach, such as a click-recording script, COinS and facet customization, and the integration of eBook public notes. Our explanation and code should be of direct use for adoption or as examples for other Summon 2.0 customers, but they may also be useful to anyone faced with the need to add enhancements to other vendor-controlled MVC-based sites.
The Virtual International Authority File (OCLC Online Computer Library Center 2013) http://viaf.org is built from dozens of authority files with tens of millions of names in more than 150 million authority and bibliographic records expressed in multiple languages, scripts and formats. One of the main tasks in VIAF is to bring together personal names which may have various dates associated with them, such as birth, death or when they were active. These dates can be quite complicated with ranges, approximations, BCE dates, different scripts, and even different calendars. Analysis of the nearly 400,000 unique date strings in VIAF led us to a parsing technique that relies on only a few basic patterns for them. Our goal is to correctly interpret at least 99% of all the dates we find in each of VIAF’s authority files and to use the dates to facilitate matches between authority records.
Python source code for the process described here is available at https://github.com/OCLC-Developer-Network/viaf-dates.
This paper describes a front-end for the semi-automatic collection, matching, and generation of bibliographic metadata obtained from different sources for use within a digitization architecture. The Library of a Billion Words project is building an infrastructure for digitizing text that requires high-quality bibliographic metadata, but currently only sparse metadata from digitized editions is available. The project’s approach is to collect metadata for each digitized item from as many sources as possible. An expert user can then use an intuitive front-end tool to choose matching metadata. The collected metadata are centrally displayed in an interactive grid view. The user can choose which metadata they want to assign to a certain edition, and export these data as MARCXML. This paper presents a new approach to bibliographic work and metadata correction. We try to achieve a high quality of the metadata by generating a large amount of metadata to choose from, as well as by giving librarians an intuitive tool to manage their data.
Troubleshooting access problems is an important part of the electronic resources management workflow. This article discusses an opportunity to streamline and track troubleshooting using two web-based services: Trello and Zapier.
As the move to cloud-based SaaS library systems accelerates, we must consider what it means to develop applications when the core of the system isn’t under the library’s control. The entire application lifecycle is changing, from development to testing to production. Developing applications for cloud solutions raises new concerns, such as security, multi-tenancy, latency, and analytics. In this article, we review the landscape and suggest a view of how to be successful for the benefit of library staff and end-users in this new reality. We discuss what kinds of APIs and protocols vendors should be supporting, and suggest how best to take advantage of the innovations being introduced.