Issue 30, 2015-10-15
Data capture and use is not new to libraries. We know data isn’t everything, but it is ubiquitous in our work, enabling myriads of new ideas and projects. Articles in this issue reflect the expansion of data creation, capture, use, and analysis in library systems and services.
WorldCat records are clustered into works, and within works, into content and manifestation clusters. A recent project revisited the clustering of collected works that had been previously sidelined because of the challenges posed by their complexity. Attention was given to both the identification of collected works and to the determination of the component works within them. By extensively analysing cast-list information, performance notes, contents notes, titles, uniform titles and added entries, the contents of collected works could be identified and differentiated so that correct clustering was achieved. Further work is envisaged in the form of refining the tests and weights and also in the creation and use of name/title authority records and other knowledge cards in clustering. There is a requirement to link collected works with their component works for use in search and retrieval.
Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI RDF Extension) and Catmandu.
The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution’s metadata migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to a RDF model. Integration of these tools with other systems and projects, the use of domain specific transformation languages, and the expansion of vocabulary reconciliation services are mentioned.
The use of research impact metrics and analytics has become an integral component to many aspects of institutional assessment. Many platforms currently exist to provide such analytics, both proprietary and open source; however, the functionality of these systems may not always overlap to serve uniquely specific needs. In this paper, I describe a novel web-based platform, named Manifold, that I built to serve custom research impact assessment needs in the University of Minnesota Medical School. Built on a standard LAMP architecture, Manifold automatically pulls publication data for faculty from Scopus through APIs, calculates impact metrics through automated analytics, and dynamically generates report-like profiles that visualize those metrics. Work on this project has resulted in many lessons learned about challenges to sustainability and scalability in developing a system of such magnitude.
Open Journal Systems and Dataverse Integration– Helping Journals to Upgrade Data Publication for Reusable Research
This article describes the novel open source tools for open data publication in open access journal workflows. This comprises a plugin for Open Journal Systems that supports a data submission, citation, review, and publication workflow; and an extension to the Dataverse system that provides a standard deposit API. We describe the function and design of these tools, provide examples of their use, and summarize their initial reception. We conclude by discussing future plans and potential impact.
Collecting and Describing University-Generated Patents in an Institutional Repository: A Case Study from Rice University
Providing an easy method of browsing a university’s patent output can free up valuable research time for faculty, students, and external researchers. This is especially true for Rice University’s Fondren Library, a USPTO-designated Patent and Trademark Resource Center that serves an academic community widely recognized for cutting edge science and engineering research. In order to make Rice-generated patents easier to find in the university’s community, a team of technical and public services librarians from Fondren Library devised a method to identify, download, and upload patents to the university’s institutional repository, starting with a backlog of over 300. This article discusses the rationale behind the project, its potential benefits, and challenges as new Rice-generated patents are added to the repository on a monthly basis.
Innovative Interface’s Sierra(™) Integrated Library System (ILS) brings with it a Database Navigator Application (SierraDNA) – in layman’s terms SierraDNA gives Sierra sites read access to their ILS database. Unlike the closed use cases produced by vendor supplied APIs, which restrict Libraries to limited development opportunities, SierraDNA enables sites to publish their own APIs and scripts based upon custom SQL code to meet their own needs and those of their users and processes.
In this article we give examples showing how SierraDNA can be utilized to improve Library services. We highlight three example use cases which have benefited our users, enhanced online security and improved our back office processes.
In the first use case we employ user access data from our electronic resources proxy server (WAM) to detect hacked user accounts. Three scripts are used in conjunction to flag user accounts which are being hijacked to systematically steal content from our electronic resource provider’s websites. In the second we utilize the reading histories of our users to augment our search experience with an Amazon style “People who borrowed this book also borrowed…these books” feature. Two scripts are used together to determine which other items were borrowed by borrowers of the item currently of interest. And lastly, we use item holds data to improve our acquisitions workflow through an automated demand based ordering process.
Our explanation and SQL code should be of direct use for adoption or as examples for other Sierra customers willing to exploit their ILS data in similar ways, but the principles may also be useful to non-Sierra sites that also wish to enhancement security, user services or improve back office processes.
This article starts by examining why a Chrome Extension was desired and how we saw it making the workflow for requesting new items both easier and more accurate. We then go on to outline how we constructed our extension, looking at the folder structure, third party scripts and services that combine to make this achievable. Finally, the article looks at how the extension is regulated and plans for future development.
Long-term access to digitized audio may be heavily dependent on the quality of technical metadata captured during digitization. The AES57-2011 standard offers a standardized method of documenting fairly comprehensive technical information, but its complexity may be confusing. In an effort to lower the barrier to use, we have developed software that generates valid AES57 files for digitized audio, using output from FITS (File Information Tool Set) and a few fields of information from a tab-delimited spreadsheet. This article will describe the logic used, the fields required, the basic process, applications, and options for further development.
With funding from an Institute of Museum and Library Services (IMLS) Sparks! Ignition Grant, researchers from the University of Illinois Library designed and tested a mobile recommender app with augmented reality features. By embedding open source optical character recognition software into a “Topic Space” module, the augmented reality app can recognize call numbers on a book in the library and suggest relevant items that are not shelved nearby. Topic Space can also show users items that are normally shelved in the starting location but that are currently checked out. Using formative UX methods, grant staff shaped app interface and functionality through early user testing. This paper reports results of UX testing; a redesigned mobile interface, and provides considerations on the future development of personalized recommendation functionality.
The SELIDA framework is an integration layer of standardized services that takes an Internet-of-Things approach for item traceability in the library setting. The aim of the framework is to provide tracing of RFID tagged physical items among or within various libraries. Using SELIDA we are able to integrate typical library services—such as checking in or out items at different libraries with different Integrated Library Systems—without requiring substantial changes, code-wise, in their structural parts. To do so, we employ the Object Naming Service mechanism that allows us to retrieve and process information from the Electronic Product Code of an item and its associated services through the use of distributed mapping servers. We present two use case scenarios involving the Koha open source ILS and we briefly discuss the potential of this framework in supporting bibliographic Linked Data.