Issue 22, 2013-10-14

Editorial Introduction: Join Us at the Table

Sara Amato

Introducing Issue 22

VIAFbot and the Integration of Library Data on Wikipedia

Maximilian Klein and Alex Kyrios

This article presents a case study of a project, led by Wikipedians in Residence at OCLC and the British Library, to integrate authority data from the Virtual International Authority File (VIAF) with biographical Wikipedia articles. This linking of data represents an opportunity for libraries to present their traditionally siloed data, such as catalog and authority records, on more openly accessible web platforms. The project successfully added authority data to hundreds of thousands of articles on the English Wikipedia and is poised to do the same on the hundreds of Wikipedias in other languages. Furthermore, the advent of Wikidata has created opportunities for further analysis and comparison of data from libraries and Wikipedia alike. This project, for example, has already led to insights into gender imbalance both on Wikipedia and in library authority work. We explore the possibility of similar efforts to link other library data, such as classification schemes, to Wikipedia.

From Finding Aids to Wiki Pages: Remixing Archival Metadata with RAMP

Timothy A. Thompson, James Little, David González, Andrew Darby, and Matt Carruthers

The Remixing Archival Metadata Project (RAMP) is a lightweight web-based editing tool that is intended to let users do two things: (1) generate enhanced authority records for creators of archival collections and (2) publish the content of those records as Wikipedia pages. The RAMP editor can extract biographical and historical data from EAD finding aids to create new authority records for persons, corporate bodies, and families associated with archival and special collections (using the EAC-CPF format). It can then let users enhance those records with additional data from sources like VIAF and WorldCat Identities. Finally, it can transform those records into wiki markup so that users can edit them directly, merge them with any existing Wikipedia pages, and publish them to Wikipedia through its API.

Thresholds for Discovery: EAD Tag Analysis in ArchiveGrid, and Implications for Discovery Systems

M. Bron, M. Proffitt, and B. Washburn

The ArchiveGrid discovery system is made up in part of an aggregation of EAD (Encoded Archival Description) encoded finding aids from hundreds of contributing institutions. In creating the ArchiveGrid discovery interface, the OCLC Research project team has long wrestled with what we can reasonably do with the large (120,000+) corpus of EAD documents. This paper presents an analysis of the EAD documents (the largest analysis of EAD documents to date). The analysis is paired with an evaluation of how well the documents support various aspects of online discovery. The paper also establishes a framework for thresholds of completeness and consistency to evaluate the results. We find that, while the EAD standard and encoding practices have not offered support for all aspects of online discovery, especially in a large and heterogeneous aggregation of EAD documents, current trends suggest that the evolution of the EAD standard and the shift from retrospective conversion to new shared tools for improved encoding hold real promise for the future.

Fedora Commons With Apache Hadoop: A Research Study

Mohamed Mohideen Abdul Rasheed

The Digital Collections digital repository at the University of Maryland Libraries is growing and in need of a new backend storage system to replace the current filesystem storage. Though not a traditional storage management system, we chose to evaluate Apache Hadoop because of its large and growing community and software ecosystem. Additionally, Hadoop’s capabilities for distributed computation could prove useful in providing new kinds of digital object services and maintenance for ever-increasing amounts of data. We tested storage of Fedora Commons data in the Hadoop Distributed File System (HDFS) using an early development version of the Akubra-HDFS interface created by Frank Asseg. This article examines the findings of our research study, which evaluated Fedora-Hadoop integration in the areas of performance, ease of access, security, disaster recovery, and costs.

Harnessing Apache Mahout to Link Content

LIM Chee Kiam and Balakumar CHINNASAMY

The National Library Board of Singapore has successfully used Apache Mahout to link content in several collections, such as its Infopedia collection of articles (http://infopedia.nl.sg). This article introduces Apache Mahout (http://mahout.apache.org) and focuses on its ability to link content through text analytic techniques. The article runs through the what, the why, and the how. If you have a large collection of content that needs to be linked, Apache Mahout may just be the answer.

For Video Streaming/Delivery: Is HTML5 the Real Fix?

Elías Tzoc and John Millard

The general movement towards streaming or playing videos on the web has grown exponentially in the last decade. The combination of new streaming technologies and faster Internet connections continues to provide an enhanced and robust user experience for video content. For many organizations, adding videos to their websites has transitioned from a “cool” feature to a mission-critical service. Benefits of putting videos online include engaging and converting visitors, raising awareness or driving interest, and sharing inspirational stories or unique recent events. Yet alongside the growth in the use of and need for video content on the web, delivering videos online remains a messy activity for developers and web teams. Existing challenges include creating more accessible videos with captions and delivering content (using adaptive streaming) to the diverse range of mobile and tablet devices. In this article, we report on the decision-making and early results in using the Kaltura video platform in two popular library platforms: CONTENTdm and DSpace.

ISSN 1940-5758