Extending and Adapting Metadata Audit Tools for Mountain West Digital Library Members
As a DPLA regional service hub, Mountain West Digital Library harvests metadata from 16 member repositories representing over 70 partners throughout the Western US and hosts over 950,000 records in its portal. The collections harvested range in size from a handful of records to many thousands, presenting both quality control and efficiency issues. To assist members in auditing records for metadata required by the MWDL Metadata Application Profile before harvesting, MWDL hosts a metadata auditing tool adapted from North Carolina Digital Heritage Center’s original DPLA OAI Aggregation Tools project, available on GitHub. The tool uses XSL tests of the OAI-PMH stream from a repository to check conformance of incoming data with the MWDL Metadata Application Profile. Use of the tool enables student workers and non-professionals to perform large-scale metadata auditing even if they have no prior knowledge of application profiles or metadata auditing workflows.
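The production tool performs these checks with XSL against the OAI-PMH stream; purely as an illustration of the same conformance-checking idea, the hedged Python sketch below harvests a ListRecords response and flags records missing required Dublin Core fields. The endpoint URL, set name, and required-field list are placeholders, not MWDL's actual application profile.

```python
# Minimal sketch of an OAI-PMH conformance check (the production tool uses XSL).
# The endpoint, set name, and required-field list are illustrative assumptions,
# not the actual MWDL Metadata Application Profile.
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"
REQUIRED_FIELDS = ["title", "date", "rights", "identifier"]  # hypothetical subset

def audit(endpoint, set_spec):
    """Yield (record identifier, missing required fields) for one OAI-PMH set."""
    url = f"{endpoint}?verb=ListRecords&metadataPrefix=oai_dc&set={set_spec}"
    tree = ET.parse(urllib.request.urlopen(url))
    for record in tree.iter(f"{OAI_NS}record"):
        header = record.find(f"{OAI_NS}header/{OAI_NS}identifier")
        present = {el.tag.split("}")[1] for el in record.iter()
                   if el.tag.startswith(DC_NS)}
        missing = [f for f in REQUIRED_FIELDS if f not in present]
        if missing:
            yield header.text, missing

for rec_id, missing in audit("https://example.org/oai", "demo_set"):
    print(rec_id, "missing:", ", ".join(missing))
```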
In the spring of 2018, we further adapted and extended this tool to audit collections coming from a new member, Oregon Digital. The OAI-PMH provision from Oregon Digital's Samvera repository is configured differently from that of the CONTENTdm repositories used by existing MWDL members, requiring adaptation of the tool. We also extended the tool by adding the Dublin Core Facet Viewer, which makes it possible to view and analyze, by frequency, the values used in both required and recommended fields.
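As a rough illustration of the Facet Viewer's core idea, the following hedged Python sketch tallies the values of one Dublin Core field by frequency across an OAI-PMH set; the endpoint, set name, and field name are placeholders.

```python
# Sketch of the facet-viewer idea: tally the values of one Dublin Core field
# by frequency. The endpoint, set name, and field are illustrative assumptions.
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def facet(endpoint, set_spec, field="subject"):
    url = f"{endpoint}?verb=ListRecords&metadataPrefix=oai_dc&set={set_spec}"
    tree = ET.parse(urllib.request.urlopen(url))
    counts = Counter(el.text.strip() for el in tree.iter(f"{DC_NS}{field}")
                     if el.text)
    return counts.most_common()

for value, n in facet("https://example.org/oai", "demo_set"):
    print(f"{n:>5}  {value}")
```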
Use of this tool enhances metadata completeness, correctness, and consistency. This article discusses the technical challenges of the project, offers code samples, and suggests ideas for further updates.
Editorial: Beyond Posters: On Hospitality in Libtech
In this editorial, I will be using the word hospitality to mean the intentional welcome of others into a space which one currently occupies, possibly as a member of a dominant group. I do not wish to encourage the idea that one should cultivate or maintain a role of benevolent host in a way that forces others to remain forever guest or outsider, although there will always be newcomers. Hospitality may be a first step to ceding one’s position as host in a space. It may be expanding that space to become a place with many potential hosts, each respected for their varied contributions and skillsets. It may also be supporting those in a different space or a different role, such as those who use the technologies we build and support (both colleagues and patrons), and respecting them in that space.
Wikidata: a platform for your library’s linked open data
Seized with the desire to improve the visibility of Canadian music in the world, a ragtag band of librarians led by Stacy Allison-Cassin set out to host Wikipedia edit-a-thons in the style of Art+Feminism, but with a focus on addressing Canadian music instead. Along the way, they recognized that Wikidata offered a low-barrier, high-result method of making that data not only visible but reusable as linked open data, and consequently incorporated Wikidata into their edit-a-thons. This is their story.
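To hint at what that reusability looks like in practice, here is a hedged Python sketch that queries the public Wikidata SPARQL endpoint for Canadian musicians. The class and property choices are our assumptions about how such items could be modelled, not details taken from the project.

```python
# Hedged sketch: fetch Canadian musicians from the public Wikidata SPARQL
# endpoint. The property/class choices (P106 occupation, Q639669 musician,
# P27 Canada) model one plausible query; they are assumptions, not details
# drawn from the edit-a-thon project itself.
import json
import urllib.parse
import urllib.request

QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q639669 ;   # occupation: musician
          wdt:P27 wd:Q16 .        # country of citizenship: Canada
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
    {"query": QUERY, "format": "json"})
req = urllib.request.Request(url, headers={"User-Agent": "lod-demo/0.1"})
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)
for row in data["results"]["bindings"]:
    print(row["personLabel"]["value"], row["person"]["value"])
```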
Countering Stryker’s Punch: Algorithmically Filling the Black Hole
Two current digital image editing programs are examined in the context of filling in missing visual image data from hole-punched United States Farm Security Administration (FSA) negatives. Specifically, Photoshop’s Content-Aware Fill feature and GIMP’s Resynthesizer plugin are evaluated and contrasted against comparable images. A possible automated workflow geared towards large-scale editing of similarly hole-punched negatives is also explored. Finally, potential future research based upon this study’s results is proposed in the context of leveraging previously enhanced, image-level metadata.
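The abstract's workflow runs through Photoshop or GIMP; as a hedged stand-in that shows how hole-filling can be batched, the sketch below uses OpenCV's inpainting, a different algorithm from Content-Aware Fill or Resynthesizer. The file layout and threshold value are assumptions.

```python
# Hedged stand-in for an automated hole-filling workflow. OpenCV inpainting
# is a different algorithm from Content-Aware Fill or Resynthesizer, but it
# shows how such edits can be batched. File paths are placeholders.
import glob
import os
import cv2

os.makedirs("filled", exist_ok=True)
for path in glob.glob("negatives/*.tif"):
    img = cv2.imread(path)
    # Treat near-black punch regions as the hole mask; the threshold of 10
    # is an assumption that would need tuning per scan.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY_INV)
    filled = cv2.inpaint(img, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
    cv2.imwrite(os.path.join("filled", os.path.basename(path)), filled)
```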
Recommendations for the application of Schema.org to aggregated Cultural Heritage metadata to increase relevance and visibility to search engines: the case of Europeana
Europeana provides access to more than 54 million cultural heritage objects through its portal, Europeana Collections. It is crucial for Europeana to be recognized by search engines as a trusted, authoritative repository of cultural heritage objects: even though the portal is the main point of access, most Europeana users arrive at it via search engines.
Europeana Collections is fuelled by metadata describing cultural objects, represented in the Europeana Data Model (EDM). This paper presents the research behind, and consequent recommendations for, publishing Europeana metadata using the Schema.org vocabulary and best practices. Schema.org metadata embedded in HTML can be consumed by search engines to power rich services (such as the Google Knowledge Graph). Schema.org is an open and widely adopted initiative (used by over 12 million domains) backed by Google, Bing, Yahoo!, and Yandex for sharing metadata across the web. It underpins the emergence of new web techniques, such as so-called Semantic SEO.
Our research addressed the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized.
The practical objective of our work is to produce a Schema.org representation of Europeana resources described in EDM that is as rich as possible and tailored to Europeana’s realities and user needs, as well as to search engines and their users.
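As a hedged illustration of the general embedding technique, not the paper's actual EDM-to-Schema.org mapping, the following Python sketch renders one object description as JSON-LD ready to embed in an item page.

```python
# Hedged illustration of Schema.org embedding as JSON-LD. The mapping below
# is a generic CreativeWork example, not the EDM-to-Schema.org mapping the
# paper recommends.
import json

record = {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    "name": "Example painting",
    "creator": {"@type": "Person", "name": "Example Artist"},
    "dateCreated": "1650",
    "isPartOf": {"@type": "Collection", "name": "Example collection"},
}

# Embed in the item page so search engine crawlers can consume it.
html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(html_snippet)
```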
Editorial: Introspection as Activism, or, Getting Our Houses in Order
Those of us in libraries like to trace our history to Alexandria or to the French governmental system of record-keeping, but the construction of the modern GLAM world is far more recent, almost as new as coding. It has evolved almost as rapidly. And its future is on us, whether we choose to passively accept a status quo others build or to act and grow and develop ourselves and our workplaces.
Supporting Oral Histories in Islandora
Since 2014, the University of Toronto Scarborough Library’s Digital Scholarship Unit (DSU) has been working on an Islandora-based solution for creating and stewarding oral histories (the Oral Histories solution pack). Although regular updates on this work have been presented at Open Repositories conferences, this is the first article to describe the goals and features of the codebase and its development roadmap. An Islandora-based approach is well suited to the challenges of oral history, an interdisciplinary methodology whose complex notions of authorship and audience bring a corresponding complexity of use cases and root oral history projects in the ever-emergent technical and preservation challenges associated with multimedia and born-digital assets. By leveraging Islandora, those embarking on oral history projects benefit from existing community-supported code. By writing and maintaining the Oral Histories solution pack, the library seeks to build common ground for those supporting oral history projects and to encourage a sustainable solution and feature set.
Digital Archaeology and/or Forensics: Working with Floppy Disks from the 1980s
While software originating from the domain of digital forensics has demonstrated utility for data recovery from contemporary storage media, it is not as effective for working with floppy disks from the 1980s. This paper details alternative strategies for recovering data from floppy disks employing software originating from the software preservation and retrocomputing communities. Imaging hardware, storage formats, and processing workflows are also discussed.
How to Party Like it’s 1999: Emulation for Everyone
Emulated access of complex media has long been discussed, but there are very few instances in which complex, interactive, born-digital emulations are available to researchers. New York Public Library has made 1980s-90s era video games from 5.25″ floppy disks in the Timothy Leary Papers accessible via a DOSBox emulator. These games appear in various stages of development and display the work of at least four of Leary’s collaborators on the games. Fifty-six disk images from the Leary Papers are currently emulated in the reading room. New York University has made late 1990s to mid-2000s era Photoshop files from the Jeremy Blake Papers accessible to researchers. The Blake Papers include over 300 pieces of media. Cornell University Library was awarded a grant from the NEH to analyze approximately 100 born-digital artworks created for CD-ROM from the Rose Goldsen Archive of New Media Art to develop preservation workflows, access strategies, and metadata frameworks. Rhizome has undertaken a number of emulation projects as a major part of its preservation strategy for born-digital artworks. In cooperation with the University of Freiburg in Germany, Rhizome recently restored several digital artworks for public access using a cloud-based emulation framework. This framework (bwFLA) has been designed to facilitate the reenactment of software on a large scale, for internal use or public access. This paper will guide readers through how to implement emulation. Each of the institutions weighs in on oddities and idiosyncrasies encountered throughout the process, from accession to access.
Peripleo: a Tool for Exploring Heterogeneous Data through the Dimensions of Space and Time
This article introduces Peripleo, a prototype spatiotemporal search and visualization tool. Peripleo enables users to explore the geographic, temporal and thematic composition of distributed digital collections in their entirety, and then to progressively filter and drill down to explore individual records. We provide an overview of Peripleo’s features, and present the underlying technical architecture. Furthermore, we discuss how datasets that differ vastly in terms of size, content type and theme can be made uniformly accessible through a set of lightweight metadata conventions we term “connectivity through common references”. Our current demo installation links approximately half a million records from 25 datasets. These datasets originate from a spectrum of sources, ranging from the small personal photo collection with 35 records, to the large institutional database with 134,000 objects. The product of research in the Andrew W. Mellon-funded Pelagios 3 project, Peripleo is Open Source software.
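The "connectivity through common references" convention can be sketched briefly: records from otherwise unrelated datasets connect when they cite the same gazetteer URI. The hedged Python sketch below, with sample records invented for illustration, groups records by shared place references.

```python
# Hedged sketch of "connectivity through common references": records from
# otherwise unrelated datasets are linked when they cite the same gazetteer
# URI. The sample records and identifiers are invented for illustration.
from collections import defaultdict

records = [
    {"id": "photos:35", "places": ["https://pleiades.stoa.org/places/423025"]},
    {"id": "museum:101", "places": ["https://pleiades.stoa.org/places/423025"]},
    {"id": "coins:77", "places": ["https://pleiades.stoa.org/places/579885"]},
]

by_place = defaultdict(list)
for rec in records:
    for uri in rec["places"]:
        by_place[uri].append(rec["id"])

# Any place cited by two or more records connects those records.
for uri, ids in by_place.items():
    if len(ids) > 1:
        print(uri, "connects", ids)
```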
Data Munging Tools in Preparation for RDF: Catmandu and LODRefine
Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI RDF Extension) and Catmandu.
The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution’s metadata migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and the expansion of vocabulary reconciliation services are mentioned.
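Both tools ultimately map flat field/value metadata into RDF triples. As a hedged sketch of that step, using the rdflib library rather than Catmandu's Fix language or LODRefine's interface, the following maps a small invented record to Dublin Core terms.

```python
# Hedged sketch of the transformation step using rdflib, not Catmandu's Fix
# language or LODRefine's GUI; the input record and URI base are invented
# for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

record = {"id": "item/1", "title": "Example title", "creator": "Example Author"}

graph = Graph()
subject = URIRef("https://example.org/" + record["id"])
graph.add((subject, DCTERMS.title, Literal(record["title"])))
graph.add((subject, DCTERMS.creator, Literal(record["creator"])))

# Serialize to Turtle for inspection or loading into a triple store.
print(graph.serialize(format="turtle"))
```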