Issue 32, 2016-04-25
by Meghan Finch

Two issues ago, coordinating editor Carol Bean identified a focus on data, both in our profession and in the Issue 30 articles, and recognized that for information professionals our work goes beyond the data itself to the conventions and standards necessary for working with it. I’d like to offer a similar sentiment […]
An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter
This article examines the tools, approaches, collaboration, and findings of the Web Archives for Historical Research Group around the capture and analysis of about 4 million tweets during the 2015 Canadian Federal Election. We hope that national libraries and other heritage institutions will find our model useful as they consider how to capture, preserve, and analyze ongoing events using Twitter.
While Twitter is not a representative sample of broader society – Pew Research Center's study of US users shows that it skews young, college-educated, and affluent (above $50,000 household income) – Twitter still represents a dramatic increase in the amount of information generated, retained, and preserved from 'everyday' people. When historians study the 2015 federal election, Twitter will therefore be a prime source.
On August 3, 2015, the team initiated both a Search API and Stream API collection with twarc, a tool developed by Ed Summers, using the hashtag #elxn42. The hashtag referred to the election being Canada's 42nd general federal election (hence 'election 42' or elxn42). Data collection ceased on November 5, 2015, the day after Justin Trudeau was sworn in as the 42nd Prime Minister of Canada. We collected for a total of 102 days, 13 hours and 50 minutes.
To analyze the data set, we took advantage of a number of command line utilities available within twarc, twarc-report, and jq. In accordance with the Twitter Developer Agreement & Policy, and after the ethical deliberations discussed below, we made the tweet IDs and other derivative data available in a data repository. This allows others to use and cite our dataset, and to enhance their own research projects by drawing on the #elxn42 tweets.
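Sharing tweet IDs rather than full tweets (often called "dehydration") is how the Developer Agreement is typically satisfied. The team used twarc's own utilities for this; as a minimal sketch of the idea, with hypothetical sample data, extraction from line-delimited tweet JSON might look like:

```python
import json

def dehydrate(jsonl_lines):
    """Extract tweet IDs from a stream of tweet JSON lines,
    producing the shareable 'dehydrated' form of a dataset."""
    ids = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        ids.append(tweet["id_str"])
    return ids

# Hypothetical two-tweet sample; real collections hold millions of lines.
sample = [
    '{"id_str": "123", "text": "Heading to the polls #elxn42"}',
    '{"id_str": "456", "text": "Watching the debate #elxn42"}',
]
print(dehydrate(sample))  # → ['123', '456']
```

A researcher who downloads the ID list can then "rehydrate" it back into full tweets through the Twitter API, subject to deletions in the interim.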
Our analytics included:
- breaking tweet text down by day to track change over time;
- client analysis, showing how the prevalence of mobile devices shaped interactions with the medium;
- URL analysis, comparing both to Archive-It collections and the Wayback Availability API to add to our understanding of crawl completeness;
- and image analysis, using an archive of extracted images.
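The first of these analyses, tracking tweet volume by day, reduces to bucketing tweets on Twitter's `created_at` timestamp. This is not the team's actual pipeline (they used jq and shell utilities), but a minimal Python sketch with hypothetical sample data conveys the shape of it:

```python
from collections import Counter
from datetime import datetime

def tweets_per_day(tweets):
    """Count tweets per calendar day, parsing Twitter's
    created_at format, e.g. 'Mon Oct 19 20:00:00 +0000 2015'."""
    counts = Counter()
    for tweet in tweets:
        dt = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
        counts[dt.date().isoformat()] += 1
    return dict(counts)

# Hypothetical sample spanning election day.
sample = [
    {"created_at": "Mon Oct 19 01:15:00 +0000 2015"},
    {"created_at": "Mon Oct 19 23:59:59 +0000 2015"},
    {"created_at": "Tue Oct 20 08:00:00 +0000 2015"},
]
print(tweets_per_day(sample))
# → {'2015-10-19': 2, '2015-10-20': 1}
```

Plotted over the 102-day collection, such counts reveal spikes around debates and election day itself.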
Our article describes our collecting work, our ethical considerations, and the analysis we have done, and provides a framework for other collecting institutions to do similar work with off-the-shelf open-source tools. We conclude by reflecting on how to connect Twitter archiving with a broader web archiving strategy.
Emulated access to complex media has long been discussed, but there are very few instances in which complex, interactive, born-digital emulations are available to researchers. The New York Public Library has made 1980s–90s-era video games from 5.25″ floppy disks in the Timothy Leary Papers accessible via a DOSBox emulator. These games appear in various stages of development and display the work of at least four of Leary’s collaborators; 56 disk images from the Leary Papers are currently emulated in the reading room. New York University has made late-1990s to mid-2000s-era Photoshop files from the Jeremy Blake Papers, which include over 300 pieces of media, accessible to researchers. Cornell University Library was awarded an NEH grant to analyze approximately 100 born-digital artworks created for CD-ROM from the Rose Goldsen Archive of New Media Art, in order to develop preservation workflows, access strategies, and metadata frameworks. Rhizome has undertaken a number of emulation projects as a major part of its preservation strategy for born-digital artworks; in cooperation with the University of Freiburg in Germany, it recently restored several digital artworks for public access using a cloud-based emulation framework (bwFLA) designed to facilitate reenactments of software at scale, for internal use or public access. This paper guides readers through implementing emulation, and each institution weighs in on the oddities and idiosyncrasies it encountered throughout the process, from accession to access.
Our application team was struggling. We had good people and the desire to create good software, but the library as an organization did not yet have experience with software development processes. Work halted. Team members felt unfulfilled. The once moderately competent developer felt frustrated, ashamed, helpless, and incompetent. Then, miraculously, a director with experience in software project management and an experienced and talented systems administrator were hired and began to work with the team. People in the group developed a sense of teamwork that they had not experienced in their entire time at the library. Now we are happy, excited, and energetic. We hope that you will appreciate our “feel-good” testimony of how excellent people and appropriate processes transformed an unhealthy work environment into a fit and happy team.
The scientific community’s growing eagerness to make research data available to the public provides libraries — with our expertise in metadata and discovery — an interesting new opportunity. This paper details the in-house creation of a “data catalog” which describes datasets ranging from population-level studies like the US Census to small, specialized datasets created by researchers at our own institution. Based on Symfony2 and Solr, the data catalog provides a powerful search interface to help researchers locate the data that can help them, and an administrative interface so librarians can add, edit, and manage metadata elements at will. This paper will outline the successes, failures, and total redos that culminated in the current manifestation of our data catalog.
We describe the design, development, and deployment of a library tour application utilizing Bluetooth Low Energy devices known as iBeacons. The tour application serves as library orientation for incoming students: students visit stations in the library carrying mobile devices running a special tour app, and when the app detects a nearby beacon, it automatically plays a video describing the current location. After the tour, students are assessed against the defined learning objectives.
Special attention is given to the issues encountered during development, deployment, content creation, and testing of an application that depends on functioning hardware, and to the necessity of appointing a project manager to limit scope, define priorities, and create an actionable plan for the experiment.
This article is based on an independent cyber security risk management audit of a public library system that the authors completed in early 2015, which in turn built on a 2014 research paper by the same group at Clark University. We stress that cyber security efforts must include raising public awareness of cyber security issues and resources, and libraries are indeed the perfect place to disseminate this knowledge. Librarians are also in a unique position as gatekeepers of the information services provided to the public, and should conduct internal audits to ensure that content partners and IT vendors take cyber security as seriously as the library and its staff do.
One way to do this is through periodic reviews of existing vendor relationships. To this end, the authors created a simple grading rubric you can adopt or modify to help take this first step towards securing your library data. It is intended to be used by both technical and non-technical staff as a simple measurement of what vendor agreements currently exist and how they rank, while at the same time providing a roadmap for which security features or policy statements the library can or should require moving forward.
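The article defines its own rubric criteria; purely as a hypothetical illustration of how such a weighted vendor-agreement rubric could be tallied (every criterion name and weight below is invented, not drawn from the article), a sketch might look like:

```python
# Hypothetical criteria and weights, for illustration only; the article's
# actual rubric defines its own categories and scoring.
CRITERIA = {
    "encrypts_data_in_transit": 3,
    "has_breach_notification_policy": 2,
    "limits_patron_data_retention": 2,
    "publishes_privacy_policy": 1,
}

def grade_vendor(answers):
    """Score a vendor agreement as the fraction of weighted
    criteria it satisfies, from 0.0 (none) to 1.0 (all)."""
    earned = sum(w for c, w in CRITERIA.items() if answers.get(c))
    return earned / sum(CRITERIA.values())

# Hypothetical vendor meeting two of the four criteria.
example = {
    "encrypts_data_in_transit": True,
    "publishes_privacy_policy": True,
}
print(grade_vendor(example))  # 4 of 8 points → 0.5
```

Weighting lets high-impact items (like encryption in transit) dominate the grade, while the normalized score makes vendors comparable across review cycles.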
This article describes the use of discovery system search logs as a vehicle for encouraging constructive conversations across departments in an academic library. The project focused on bringing together systems and teaching librarians to evaluate the results of anonymized patron searches in order to improve communication across departments, as well as to identify opportunities for improvement to the discovery system itself.