Issue 55, 2023-1-20

Revamping Metadata Maker for ‘Linked Data Editor’: Thinking Out Loud

With the development of linked data technologies and the launch of the Bibliographic Framework Initiative (BIBFRAME), the library community has conducted several experiments to design and build linked data editors. While efforts have been made to create original linked data ‘records’ from scratch, less attention has been given to copy cataloging workflows in a linked data environment. Developed and released as an open-source application in 2015, Metadata Maker is a cataloging tool that allows users to create bibliographic metadata without previous knowledge of cataloging. With new linked data sources added, including auto suggestion of Virtual International Authority File (VIAF) personal names and Library of Congress Subject Heading (LCSH) recommendations based on the users’ text input, Metadata Maker might have the potential to be adopted by paraprofessional catalogers in practice. This article introduces those new features, shares the user testing results, and discusses possible future steps.

by Greta Heng, Myung-Ja Han

Introduction

Libraries have been using MAchine Readable Cataloging (MARC) as a tool to create bibliographic and authority data since the 1960s. While MARC brought libraries a new way to organize information in the past, the evolving information landscape asks for libraries to explore other means of information organization that can connect library collections with resources on the Web.

As a successor to MARC, the Bibliographic Framework (BIBFRAME) initiative was launched by the Library of Congress (LC) in 2012.[1] It is expressed in the Resource Description Framework (RDF, a data model for structured data)[2] and based on three categories of abstraction (work, instance, item). As the library’s new entity relation data model, BIBFRAME is grounded in linked data techniques, which allow metadata creators to build relationships with web resources by facilitating shared structured data and Uniform Resource Identifiers (URIs).

Many national and research libraries have been exploring the possibility of converting the MARC format metadata to BIBFRAME and, even further, creating metadata as linked data using a linked data/BIBFRAME editor. Libraries such as the Swedish National Library,[3] the French National Library,[4] the German National Library (DNB),[5] and the Library of Congress[6] have been involved in the MARC to linked data conversion, linked data based new discovery services, and linked data editor experiments.

In addition, some external linked data management platforms are gaining popularity among GLAM (galleries, libraries, archives, and museums) institutions. Wikidata,[7] an open, collaborative, and multilingual global linked data repository, is being used by libraries as an alternate source of name and subject data for bibliographic description. However, since Wikidata is designed to represent all domains of knowledge and not specific for library use, concerns about its capacity and suitability for describing library resources were raised by the Wikidata community.[8]

While there has been much discussion on and development of tools for creating full original linked data, less attention has been given to copy cataloging workflows (creating new short records by deriving from other records or creating minimum records) in linked data environments.

Developed and released as an open-source application in 2015, Metadata Maker[9] is a metadata creation tool that can be used by anyone regardless of their cataloging experience and knowledge, allowing them to create a minimum level catalog record. Metadata Maker has been updated in several areas since then, including supporting different formats of resources (currently in ten modules) and BIBFRAME output service for monographs.[10] As more and more cataloging and metadata creation work relies on paraprofessional catalogers or non-catalogers[11] with language or subject expertise, the authors tried to revamp Metadata Maker with linked data authority services to test whether this tool and the updated functions can facilitate the minimum record creation in a linked data cataloging environment. This paper shares the revamping process and issues found in linked data sources and their service, and discusses the user testing results of Metadata Maker and a BIBFRAME editor.

Changing Landscape

The development of linked data technologies brought about a systematic change in libraries’ cataloging production practice. As Van der Werf said, “libraries used to be knowledge organizations and library professionals were trained in bibliographic description and authority control. Now, authorities are called entities and the new description logic is about creating a ‘Knowledge Graph of Entities.’”[12] It is noticeable that the focus of metadata creation has gradually shifted from the curation of text strings to the management of entities (works, persons, corporate bodies, places, events, etc.), i.e., linking resources using URIs and managing URIs instead of name strings.[13] This revolution has triggered a discussion on linked data cataloging models, standards, and tools in the library.

Changing library cataloging production practice

Libraries have carried out several initiatives to re-design cataloging workflows and devise the transition plan from traditional cataloging to linked data cataloging, for example, the development of MARC to BIBFRAME conversion tools and BIBFRAME editors. Notably, the Linked Data for Libraries (LD4L)[14] community made a series of significant efforts on linked data cataloging from 2014 to 2022, including Linked Data for Libraries Labs (LD4L Labs),[15] Linked Data for Production (LD4P),[16] Linked Data for Production: Pathway to Implementation (LD4P2),[17] and Linked Data for Production: Closing the Loop (LD4P3).[18]

Despite those new linked data cataloging tools, catalogers still need to be versed in linked data related knowledge and exercise new skills, such as RDF, SPARQL, the BIBFRAME ontology, and more, to create library data as linked data. In addition, as linked data implementations in libraries are still under development, it is hard to keep up to date with the most current linked data applications, e.g., BIBFRAME editors, and it is challenging to identify the skills that catalogers need to develop. As a result, catalogers may feel overwhelmed by the new linked data technology, and administrators are experiencing challenges in designing and providing training for the ever-growing skill set and emerging linked data tools for catalogers.[19]

The shifting roles of librarians and staff in technical services are an additional challenge in linked data training and planning. Libraries used to depend on professional cataloging librarians to do original cataloging. Copy cataloging was usually performed by paraprofessional catalogers. However, this is no longer true. With shrinking budgets, organizational restructuring, and changes in cataloging software and workflows, more paraprofessional staff are responsible for both original and copy cataloging tasks (El-Sherbini & Klim, 1997; Zhu, 2012).[20] As Van der Werf articulated, the number of professional librarians is decreasing while paraprofessional staff are increasing in cataloging departments.[21] In fact, not only are professional librarians decreasing, but the whole cataloging team is also shrinking. While there are several options that can ease the shortage of manpower, such as outsourcing to vendors, cooperative cataloging programs, and more productive cataloging workflows, libraries still lack staff with expertise to catalog special collections and/or foreign language materials. The need for foreign language and special collection cataloging will not go away in a linked data environment as libraries keep purchasing resources from foreign countries and work with perpetual backlogs.

BIBFRAME Editors and Copy Cataloging

Currently, there are three BIBFRAME editors that are widely known and used: LC’s BIBFRAME Editor,[22] Marva,[23] and LD4P’s Sinopia.[24] All three editors seem to target experienced catalogers as their user group, not paraprofessional catalogers or non-catalogers. For one, they use the Resource Description and Access (RDA) terms[25] as field names and BIBFRAME’s three categories of abstraction (work, instance, item) as record/data types. Those cataloging terms, though commonly used by professional catalogers, may result in a learning curve for paraprofessional catalogers. For example, “parallel title” is not a common phrase and the differences between work and instance are not self-explanatory for many. For another, some abbreviations that appear in the user interface as controlled vocabularies, including Getty_AAT, LCGFT, and GAC, are not familiar to paraprofessional catalogers. Using the editors and adding appropriate values to those data fields requires, at the very least, training on RDA, the BIBFRAME ontology, authority, and the editor itself.

Another challenge is a lack of clear definition as to what makes full level and brief BIBFRAME data. The core BIBFRAME data fields are still under discussion by the Program for Cooperative Cataloging (PCC) BIBFRAME Interoperability Group (BIG).[26] As there are no clear guidelines, some BIBFRAME editors mark required fields while some do not. For catalogers or users of BIBFRAME editors, it seems that one needs to fill out all fields to create full-level BIBFRAME data and provide values for those required fields, if applied, to generate brief BIBFRAME data. As there is no quick way of filling out the minimum data fields to produce brief BIBFRAME data, the cataloging workflow used in the current BIBFRAME editors might not meet libraries’ needs for cataloging large volumes of perpetual backlogs with a shrinking cataloging team.

Lorimer (2022) stated that the notion of copy cataloging has broadened and expanded in a linked data environment,[27] which emphasizes reusing metadata rather than creating completely new metadata from scratch. Some BIBFRAME editors like Sinopia indeed allow catalogers to search, load, and copy or clone existing BIBFRAME data to revise and reuse those descriptions by sharing URIs. This workflow would help reduce duplicate work-level bibliographic records and increase cataloging efficiency. Yet, considering the reality and looking into the future, libraries facing a shortage of professional catalogers and language/subject experts will have to resort to non-catalogers and paraprofessional catalogers with limited linked data and cataloging knowledge to create records in BIBFRAME editors. Should users adapt to the BIBFRAME editors, or should the editors be designed to be friendlier to their users? This dilemma raises a question: Is it possible to build a linked data editor without cataloging jargon in the application interface?

Given the above-mentioned issues, this project is an attempt to build a straightforward linked data editor for copy cataloging that does not use RDA terms and can be used by non-catalogers. Libraries may benefit from adopting Metadata Maker as it does not require new hiring or training for catalogers and allows non-catalogers with the needed language/subject knowledge to create minimum level cataloging records. The authors also conducted a small-scale survey to learn catalogers’ opinions about Metadata Maker and a linked data editor.

Revamping Metadata Maker

Metadata Maker enables any user to create catalog records that are “good enough” (provide sufficient information to identify a bibliographic item and generate a basic bibliographic description)[28] in various formats, including MARC, regardless of one’s knowledge of or experience with cataloging standards, integrated library systems, or OCLC. It now has ten different modules or templates (datasets[29], monographs[30], monographs (LD)[31], ebooks[32], government documents[33], maps[34], microfilms[35], scores[36], serials[37], theses and dissertations[38]). Users can select a module based on the resource type, fill out basic information about the resource, and choose the download format, including MARC binary, MARCXML, Metadata Object Description Schema (MODS), HTML, and BIBFRAME.[39]

For this phase, two new linked data features, Virtual International Authority File (VIAF) personal name suggestions and Library of Congress Subject Heading (LCSH) suggestions, were added to the Monographs (LD) module in Metadata Maker. The new functions support search and auto completion of personal names in VIAF, and LCSH (keywords) generation based on the user-provided text. URIs of the controlled terms are added to the output metadata.


Figure 1. Metadata Maker Interface Screenshot.

Linked Data Input

VIAF name search

The VIAF personal name autocomplete dropdown list in Fig. 2 uses the VIAF Auto Suggest API[40] to retrieve the personal name’s label, VIAF URI, and Library of Congress Name Authority File (LCNAF) URI. When a name is selected, the links to both URIs, if they are available in VIAF, are presented in Metadata Maker. Users have the option to verify the name entity’s information on either the VIAF or LCNAF page if so desired. The application then retrieves the values of the 100 field subfields a to d from LCNAF whenever they are available. If no LCNAF URI is provided in VIAF, the preferred label from DNB[41] is used as the alternative if it can be found in VIAF. The LCNAF and VIAF URIs are added to subfields 0 and 1 respectively in the MARC and MARCXML 100 field or 700 field based on their role. For other supported output formats, the URIs and the label/preferred name are also inserted into the appropriate elements. If there is no satisfactory result in the autocomplete dropdown list, Metadata Maker also allows users to manually input the name strings. The code is available online.[42]


Figure 2. VIAF Auto Suggest Dropdown List.

// Using VIAF Auto Suggest API Fetch Personal Names
(function($) {
  $.widget("oclc.viafauto", $.ui.autocomplete, {
   options: {
 select: function(event, ui) {
	alert("Selected!"); return this._super(event, ui); },
	source: function(request, response) {
    	var term = $.trim(request.term);
    	var url  = "https://viaf.org/viaf/AutoSuggest?query=" + encodeURIComponent(term);
    	var me = this;
    	$.ajax({
        	url: url,
        	dataType: "jsonp",
        	success: function(data) {
            	if (data.result) {
                	response( $.map( data.result, function(item) {
                    	if (item.nametype == "personal"){
                        	var retLbl = item.term + " [" + item.nametype + "]";
                        	var uri = "http://viaf.org/viaf/" + item.viafid;
                        	if (item.lc){
                            	return {
                                	label: retLbl,
                                	value: item.term,
                                	id: item.viafid,
                                	viafuri: uri,
                                	lcuri: "http://id.loc.gov/authorities/names/" + item.lc,
                                	nametype: item.nametype
                            	}
                        	}else{
                            	return {
                                	label: retLbl,
                               	 value: item.term,
                                	id: item.viafid,
                                	viafuri: uri,
                                	lcuri: "noLC",
                                	nametype: item.nametype
                   	         }
                        	}
                    	}
                	}));
            	} else {
                	me._trigger('nomatch', null, {term: term});
            	}
        	},
    	});   
	}
	},     
	_create: function() {
    	return this._super();
	},
	_setOption: function( key, value ) {
    	this._super( key, value );
	},
	_setOptions: function( options ) {
    	this._super( options );
	}
  });
})(jQuery);
// Get Information for User Selected Name in the author Input Field
$(function() {
  	$(".author").viafauto({
        	select: function(event, ui) {
              	// ui.item carries the label, VIAF URI, and LCNAF URI of the chosen name
              	var item = ui.item;
        	}
  	});
});
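
The mapping from a selected suggestion to a MARCXML name field, as described above, might look roughly like the following. This is only a sketch under stated assumptions, not the Metadata Maker implementation: `buildNameField`, `escapeXml`, and the `isMainEntry` flag are hypothetical names, the first indicator value "1" is assumed, and the whole heading is placed in subfield a for brevity rather than being split across subfields a to d.

```javascript
// Escape the five XML special characters so labels and URIs are safe in MARCXML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a MARCXML 100 or 700 field from a selected suggestion.
// `item` mirrors the object returned by the autocomplete source above;
// `isMainEntry` is a hypothetical flag standing in for the role-based choice.
function buildNameField(item, isMainEntry) {
  var tag = isMainEntry ? "100" : "700";
  var subfields = ['<subfield code="a">' + escapeXml(item.value) + "</subfield>"];
  // "noLC" is the placeholder the widget uses when VIAF lists no LCNAF identifier.
  if (item.lcuri && item.lcuri !== "noLC") {
    subfields.push('<subfield code="0">' + escapeXml(item.lcuri) + "</subfield>");
  }
  if (item.viafuri) {
    subfields.push('<subfield code="1">' + escapeXml(item.viafuri) + "</subfield>");
  }
  // ind1="1" (surname entry) is an assumption made for this sketch.
  return '<datafield tag="' + tag + '" ind1="1" ind2=" ">' +
    subfields.join("") + "</datafield>";
}

var example = buildNameField({
  value: "Shakespeare, William, 1564-1616",
  viafuri: "http://viaf.org/viaf/96994048",
  lcuri: "http://id.loc.gov/authorities/names/n78095332"
}, true);
console.log(example);
```

Omitting subfield 0 when the widget reports "noLC" keeps a placeholder string from leaking into the record as if it were a URI.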

LCSH and FAST Suggest

The second function that was added to Metadata Maker is the LCSH suggestion using the Annif API.[43] Annif (http://annif.org/) is a subject suggestion tool for documents, originally developed by the National Library of Finland.[44] According to its webpage, Annif can be trained through natural language processing and machine learning algorithms to support any kind of subject headings. To make Annif support LCSH, the LD4P group used Annif’s built-in algorithms and a training corpus from the IvyPlus Platform for Open Data (POD)[45] and Share-VDE (Virtual Discovery Environment)[46] to train Annif (Hahn, 2022;[47] Khan, 2020[48]).[49] Upon request, the Annif LCSH API returns a list of suggested LCSH labels, URIs, and predicted scores. The list is sorted by the predicted score from high to low: the higher the score, the more relevant the subject heading is.

// Annif LCSH API Response
[
	{
    	"label": "Clothing and dress--China--History",
        "notation": null,
    	"score": 0.06058865785598755,
    	"uri": "http://id.loc.gov/authorities/subjects/sh2003012066"
	},
	{
    	"label": "Costume--China",
        "notation": null,
    	"score": 0.014286939986050129,
    	"uri": "http://id.loc.gov/authorities/subjects/sh85033251"
	},
	{
    	"label": "Costume--China--History",
        "notation": null,
    	"score": 0.014127381145954132,
    	"uri": "http://id.loc.gov/authorities/subjects/sh85033252"
	},
	{
    	"label": "Clothing and dress--History",
        "notation": null,
    	"score": 0.011828765273094177,
    	"uri": "http://id.loc.gov/authorities/subjects/sh2003012061"
	},
	{
    	"label": "Clothing and dress--Social aspects",
        "notation": null,
    	"score": 0.008354970254004002,
    	"uri": "http://id.loc.gov/authorities/subjects/sh85027167"
	},
	{
    	"label": "Fashion--History",
        "notation": null,
    	"score": 0.008040583692491055,
    	"uri": "http://id.loc.gov/authorities/subjects/sh2008103592"
	},
	{
  	  "label": "Fashion--History--20th century",
        "notation": null,
    	"score": 0.007795797660946846,
    	"uri": "http://id.loc.gov/authorities/subjects/sh2008103594"
	},
	{
    	"label": "Chinese poetry--Translations into English",
        "notation": null,
    	"score": 0.007471516728401184,
    	"uri": "http://id.loc.gov/authorities/subjects/sh2008100615"
	},
	{
    	"label": "Medicine, Chinese",
        "notation": null,
    	"score": 0.0065437802113592625,
    	"uri": "http://id.loc.gov/authorities/subjects/sh85083125"
	},
	{
    	"label": "Clothing and dress in literature",
        "notation": null,
    	"score": 0.005863940808922052,
    	"uri": "http://id.loc.gov/authorities/subjects/sh85033275"
	}
]
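
Because each suggestion carries a score, a client could trim the list before showing it to the user. The sketch below is one way to do that and is not part of Metadata Maker or the Annif API: `selectSuggestions`, `limit`, and `minScore` are hypothetical names, and the threshold value is arbitrary. The sample objects are copied from the response above.

```javascript
// Keep at most `limit` suggestions whose score is at least `minScore`,
// ordered from the highest score to the lowest.
// Both parameters are hypothetical tuning knobs, not Annif options.
function selectSuggestions(results, limit, minScore) {
  return results
    .filter(function (r) {
      return typeof r.score === "number" && r.score >= minScore;
    })
    .sort(function (a, b) { return b.score - a.score; })
    .slice(0, limit)
    .map(function (r) {
      return { label: r.label, uri: r.uri, score: r.score };
    });
}

var sample = [
  {
    label: "Clothing and dress--China--History",
    score: 0.06058865785598755,
    uri: "http://id.loc.gov/authorities/subjects/sh2003012066"
  },
  {
    label: "Costume--China",
    score: 0.014286939986050129,
    uri: "http://id.loc.gov/authorities/subjects/sh85033251"
  },
  {
    label: "Medicine, Chinese",
    score: 0.0065437802113592625,
    uri: "http://id.loc.gov/authorities/subjects/sh85083125"
  }
];

// With a threshold of 0.01 the two costume-related headings are kept
// and "Medicine, Chinese" is dropped.
console.log(selectSuggestions(sample, 5, 0.01).map(function (r) { return r.label; }));
```

Sorting inside the function means the result does not depend on the order in which the service returns its suggestions.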

Using the Annif LCSH API, Metadata Maker can recommend ten LCSH terms given a book summary in any Romance language. Users can select zero to ten LCSH terms by checking the provided checkboxes. It is also possible to re-run the Suggest function by updating the summary in the input box and clicking the Suggest button. Users who are not satisfied with the recommended keywords or are uncomfortable using LCSH can still use an autocomplete Faceted Application of Subject Terminology (FAST) heading search box to add keywords.


Figure 3. Keyword (Summary Suggest and Keyword Search Box) Screenshot.

// When a user clicks the #LCSHSuggest button, the text in the #summary box is
// sent to the Annif API and the suggested LCSH terms are listed as checkboxes
// in the #LCSHresponse div container
$(function() {
    document.getElementById('LCSHSuggest').onclick = function() {
        var container = document.getElementById("LCSHresponse");
        container.innerHTML = "";
        var summary = document.getElementById('summary').value.trim();
        if (summary !== "") {
            // URL-encode the summary so "&", "=", and "+" do not corrupt the form body
            var requests = "text=" + encodeURIComponent(summary);
            var url = "http://annif.info/v1/projects/upenn-omikuji-bonsai-en-gen/suggest";
            var xhr = new XMLHttpRequest();
            // Asynchronous request, so the page stays responsive while waiting
            xhr.open("POST", url, true);
            xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
            xhr.setRequestHeader("Accept", "application/json");
            xhr.onreadystatechange = function () {
                if (xhr.readyState === 4 && xhr.status === 200) {
                    var jsonResponse = JSON.parse(xhr.responseText);
                    var results = jsonResponse["results"] || [];
                    for (var i = 0; i < results.length; i++) {
                        var lcshLabel = results[i]["label"];
                        var lcshUri = results[i]["uri"];
                        // Build the checkbox with DOM methods so labels are not parsed as HTML
                        var checkbox = document.createElement("input");
                        checkbox.type = "checkbox";
                        checkbox.name = "lcsh";
                        checkbox.className = "lcshcheckbox";
                        checkbox.setAttribute("uri", lcshUri);
                        checkbox.value = lcshLabel;
                        container.appendChild(checkbox);
                        container.appendChild(document.createTextNode(lcshLabel));
                        container.appendChild(document.createElement("br"));
                    }
                }
            };
            xhr.send(requests);
        }
    };
});

BIBFRAME Output

With recent updates, the BIBFRAME output data now includes URIs of personal names, LCSH, and FAST Headings in the Monographs (LD) module.

Below is an example of a <bf:contribution>. The LCNAF URI of “Shakespeare” is added to the Agent node. Both VIAF and LCNAF URIs of “Shakespeare” are added as the value of identifiers.

<!-- example of a <bf:contribution> -->
<bf:contribution>
	<bf:Contribution>
    	<bf:role>
        	<bf:Role rdf:about="http://id.loc.gov/vocabulary/relators/aut"/>
    	</bf:role>
    	<bf:agent>
        	<bf:Agent rdf:about="http://id.loc.gov/authorities/names/n78095332">
            	<rdf:type rdf:resource="http://id.loc.gov/ontologies/bibframe/Person"/>
                <rdfs:label>Shakespeare, William, 1564-1616</rdfs:label>
                <bf:identifiedBy>
                        <bf:Identifier>
                                <rdf:value rdf:resource="http://viaf.org/viaf/96994048"/>
                        </bf:Identifier>
                </bf:identifiedBy>
                <bf:identifiedBy>
                        <bf:Identifier>
                                <rdf:value rdf:resource="http://id.loc.gov/authorities/names/n78095332"/>
                        </bf:Identifier>
                </bf:identifiedBy>
        	</bf:Agent>
    	</bf:agent>
	</bf:Contribution>
</bf:contribution>

The second is an example of a <bf:subject>. The FAST heading URI is added to the Topic node, and the vocabulary is identified in the Source node. These are represented in BIBFRAME metadata as below.

<!-- example of a <bf:subject> -->
<bf:subject>
	<bf:Topic rdf:about="http://id.worldcat.org/fast/1921567">
    	<rdfs:label>Comedy plays [Form/Genre]</rdfs:label>
    	<rdf:type rdf:resource="http://www.loc.gov/mads/rdf/v1#Topic"/>
    	<bf:source>
        	<bf:Source rdf:about="http://id.loc.gov/vocabulary/identifiers/fast"/>
    	</bf:source>
	</bf:Topic>
</bf:subject>
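
A subject node like the one above could be generated from any selected heading that has a label and a URI. The following is a minimal sketch, assuming a heading object of the shape `{label, uri}`; `buildSubject` and `escapeXml` are hypothetical helper names and this is not the code Metadata Maker uses. The FAST values in the usage example are taken from the listing above.

```javascript
// Escape the five XML special characters before inserting text into RDF/XML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a <bf:subject> element for one selected heading.
// `sourceUri` identifies the vocabulary the heading comes from.
function buildSubject(heading, sourceUri) {
  return [
    "<bf:subject>",
    '  <bf:Topic rdf:about="' + escapeXml(heading.uri) + '">',
    "    <rdfs:label>" + escapeXml(heading.label) + "</rdfs:label>",
    '    <rdf:type rdf:resource="http://www.loc.gov/mads/rdf/v1#Topic"/>',
    "    <bf:source>",
    '      <bf:Source rdf:about="' + escapeXml(sourceUri) + '"/>',
    "    </bf:source>",
    "  </bf:Topic>",
    "</bf:subject>"
  ].join("\n");
}

var subjectXml = buildSubject(
  { label: "Comedy plays [Form/Genre]", uri: "http://id.worldcat.org/fast/1921567" },
  "http://id.loc.gov/vocabulary/identifiers/fast"
);
console.log(subjectXml);
```

Escaping matters here because subject labels can legitimately contain characters such as "&" that would otherwise make the RDF/XML ill-formed.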

Some Considerations

While developing new features for Metadata Maker, the authors found some issues with the APIs and linked data sources.

Encoding

VIAF provides a single name authority file that combines name authority files from more than 40 organizations,[50] making it convenient for libraries to take advantage of linked data and obtain information about name entities from one source. Yet, the aggregation process might cause some encoding issues in VIAF records. For example, when one searches for “Greta Reyghere,”[51] the name includes empty boxes in the dropdown list returned by the API (see Fig. 4). The same issue also appeared in the VIAF JSON record: the source of the name with empty boxes was DNB according to the VIAF JSON record (see Fig. 5).[52] However, the DNB record did not have anything anomalous.[53] It seems that the empty boxes in the name label only exist in the VIAF record; the aggregation settings in VIAF might be the cause.


Figure 4. Empty Boxes in VIAF.


Figure 5. Empty Boxes in VIAF JSON.

Name Entities Search Scope

When describing resources in BIBFRAME editors, cataloging experts tend to use name authority files like LCNAF. However, non-catalogers or paraprofessional catalogers may not be aware of those sources and are more likely to rely on the linked data editor itself. BIBFRAME editors should therefore accommodate the different name entity search behaviors of experienced and nascent catalogers. Specifically, there are two expectations for the name entity search function in linked data editors: (1) no constraints on the order of a name; and (2) support for variant name searching. As many non-professional catalogers may not have received identity management (authority) training, it is not intuitive for them to search names following the MARC 100 field format: “last name, first name.” It is also important to connect BIBFRAME editors to various linked data sources for name entities on the Web and to collect name variants from as many sources as possible.

To meet the two expectations, Metadata Maker adopts the VIAF Auto Suggest API for personal name searching. The VIAF Auto Suggest API supports both preferred name and variant name searching without any name format or name order constraints. This flexibility allows non-catalogers to find the desired personal name in different ways.

One BIBFRAME editor that was tested for this project supports only preferred name label search. A Korean author, Han, Shin-Kap,[54] has two name variants in his authority record: “한신갑” and “Shin-Kap Han.” The BIBFRAME editor only returned a result when the term “Han, Shin-Kap” was searched, as it matched the 100 field of the existing LCNAF record. The two variant names did not return any results as the selected editor does not support variant name search. Such failed searches may drive non-catalogers to create duplicate name entity records or use strings instead of URIs to represent the person.


Figure 6. Search Han, Shin-Kap in a Linked Data Editor.


Figure 7. Search 한신갑 in a Linked Data Editor.


Figure 8. Search Shin-Kap Han in a Linked Data Editor.
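
The two expectations, order-independent matching and variant name matching, can be illustrated with a small comparison routine. This is only an illustration of the idea using exact token matching; it does not describe how the VIAF Auto Suggest API or any BIBFRAME editor works internally. `nameKey`, `matchesAuthority`, and the `{preferred, variants}` object shape are hypothetical.

```javascript
// Reduce a name to a sorted list of lowercase tokens so that word order,
// commas, and hyphens do not affect the comparison.
function nameKey(name) {
  return name
    .normalize("NFC")
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or digit
    .filter(Boolean)
    .sort()
    .join(" ");
}

// Return true when the query matches the preferred label or any variant.
function matchesAuthority(query, authority) {
  var key = nameKey(query);
  if (key === "") {
    return false;
  }
  var labels = [authority.preferred].concat(authority.variants);
  return labels.some(function (label) {
    return nameKey(label) === key;
  });
}

// Hypothetical in-memory stand-in for one authority record,
// using the preferred and variant names mentioned above.
var han = {
  preferred: "Han, Shin-Kap",
  variants: ["한신갑", "Shin-Kap Han"]
};

["Han, Shin-Kap", "Shin-Kap Han", "한신갑"].forEach(function (query) {
  console.log(query + " -> " + matchesAuthority(query, han));
});
```

A production search service would also need partial matching and ranking; the point here is only that all three forms of the name resolve to the same entity once order and variants are taken into account.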

Quality of Authority Data

VIAF authority data provided via JSON-linked data (JSON-LD) format does not always have detailed and granular information. VIAF Authority Cluster endpoint allows catalogers to retrieve authority data in various formats.[55] The name-related elements in the JSON-LD representation of VIAF authority records include family name, given name, alternative name, and name (full name). More complicated names may contain title, numeration, and other information about the entity. Take “John Paul II, Pope” as an example.[56] “Pope” is the title of “John Paul II.” “John Paul” is the papal name and “II” is the numeration. However, in his VIAF JSON-LD record (see below), “John Paul II” is treated as the family name and “Pope” is treated as the given name, which is not correct. While this would not be a problem when using data models that do not require name parts information like BIBFRAME, it could be a problem for schemas that have fields or attributes specifically designated for name part, e.g., first name and last name.

// JSON-LD description of John Paul II, Pope in VIAF
{
...
 "familyName" : [ "Janis", "John Paul II", "Juan Pablo II.", "Jawién", "Joannes Paulus II.", "Ioannis Pauli II", "Yūhạnnā Būlus at-Tanī", "Ioann Pavel II", "Wojytla", "Jean Paul II.", "ויטילה", "Vajtyla", "Wojtila", "II", "Jean Paul II", "Jean-Paul II.", "Voitilah", "Ján Pavol II.", "János Pál II.", "Ivan Pavao II.", "Yuhạnnā-Būlus at-Tanī", "Jawieň", "Ṿoiṭilah", "Juan Pablo II", "Vojtyla", "Ivan Pavlo II.", "Ян Павел II", "Johannes Paulus II.", "Giovanni Paolo‏ II", "Voityla", "Jasien", "Jasień", "Yoḥanan Paʾulus ha-sheni", "Voitila", "Xoán Paulo II", "Ṿoiṭilah", "Jasień", "ואיטילה", "Gruda", "Giovanni Paolos II.", "Wojtyla", "Johano Paŭlo la Dua", "Войтыла", "Jawień", "Wojtiła", "Johannes Paul II", "Paulus", "Yuḥannā-Būlus at-Tānī", "Johannes Paul II.", "John Paul II.", "Wojtyła", "보이티야", "アンジェイ", "Jan Paweł II", "Jean-Paul II", "Yuhạnnā-Būlus at-Tanī", "보이티와", "Janez", "Jan Paweł II.", "Jawien", "Jan Paweł", "Jawień", "Yūḥannā Būlus at-Tānī", "Giovanni Paolo II", "Janez Pavel II.", "Ioannis Paulus II.", "Yūḥannā Būlus at-Tānī", "Vojtila", "Iohannes Paulus PP. II", "Yūhạnnā Būlus at-Tanī", "Jan Pavel Druhý", "Ioannes Paulus II.", "Jānis Pāvils II.", "Yuḥannā-Būlus at-Tānī", "Joannes Paulus II" ],
	"gender" : "http://www.wikidata.org/entity/Q6581097",
	"givenName" : [ "Karal'", "Pape", "Papież", "Karol Józef", "al-Bābā", "Stanislaw Andrzej", "Carlo", "Karols", "Karol'", "Stanisław A.", "Ḳarol", "Pope", "Папа Рымскі", "Кароль", "קארול", "‏ papa", "Karol Joźef", "Johannes", "Andrzej", "Pāvests", "papież", "Ḳarol", "Carol", "ヤヴィエニ", "Karol J.", "카롤", "Piotr", "saint", "Lolek", "Stanisław", "K.", "Stanisław Andrzej", "papa", "Heiliger", "santo", "Karolis", "Karol Jozef", "Pavils", "Pápa", "Papa", "카롤 유제프", "Karol Józef", "Karolʹ", "Papst, Heiliger", "Papst", "al-Bābā", "II", "Karol", "Karel", "Pavel", "pape", "John Paul", "Paus", "קרול" ],
...
}
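
One defensive option for consumers of such records is to avoid the name-part arrays altogether and rely on the full name values. The sketch below assumes the full name is exposed under a `name` key and alternative names under an `alternateName` key, each holding either a string or an array; those key names and shapes are assumptions for illustration, as are the helper names `asArray` and `pickDisplayNames`. The sample record is a trimmed, hypothetical stand-in, not actual VIAF output.

```javascript
// Normalize a JSON-LD value that may be absent, a single value, or an array.
function asArray(value) {
  if (value === undefined || value === null) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}

// Prefer the full "name" values and fall back to "alternateName".
// The "familyName" and "givenName" arrays are deliberately never combined,
// because titles and numeration can be misassigned to them.
function pickDisplayNames(record) {
  var names = asArray(record.name);
  return names.length > 0 ? names : asArray(record.alternateName);
}

// Hypothetical, trimmed record for illustration only.
var trimmedRecord = {
  name: "John Paul II, Pope",
  familyName: ["John Paul II"],
  givenName: ["Pope"]
};

console.log(pickDisplayNames(trimmedRecord));
```

This does not repair the underlying data; it only keeps a schema with separate first-name and last-name fields from being populated with values such as "Pope" as a given name.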

Testing

After adding the VIAF API to Metadata Maker, the authors conducted a very small-scale, informal usability test at the University of Illinois with eleven participants: five paraprofessional catalogers who create original cataloging records as part of their responsibilities; two hourly catalogers who had language and subject knowledge but no cataloging knowledge; two graduate assistants; and two cataloging and metadata librarians. They were asked to create a record for a monograph in Sinopia and Metadata Maker and share their thoughts on two things: ease of use and the knowledge/skills required to use each tool. The survey also had a section where testers could add their thoughts.[57]

Ease of use

For the first question, testers could choose one answer from the following options:

  • Extremely hard
  • Hard, but can follow through it
  • Easy
  • Very easy


Figure 9. Survey Result: Ease of Use.

Eight participants said that Metadata Maker is easy to use (five chose “Very easy” and three chose “Easy”) while ten people said that Sinopia is hard to use (five chose “Extremely hard” and another five chose “Hard, but can follow through it”).

The survey reveals that the majority of participants prefer the simple interface of Metadata Maker to the relatively complex and verbose interface of Sinopia. One person rated Metadata Maker “Extremely hard” and two chose “Hard, but can follow through it”. Those who answered that Metadata Maker is hard to use are paraprofessional catalogers who create original records in OCLC. During the follow-up interview, they said that they do not like the simple interface of Metadata Maker or the notion of creating short/minimum records. They want BIBFRAME editors to be similar to OCLC Connexion, the tool that they are familiar with and that allows them to create full-level cataloging records. An undergraduate student with language skills answered that Sinopia is easy to use. The student added that while there is a lot to learn and it takes time, they can work through Sinopia by reading the information provided for each element.

While Sinopia allows users to view the output data in JSON-LD, Turtle, N-triples, RDF table, and interface view formats, three participants commented that it is hard to check the outcome of their work in Sinopia. It might be because those participants have not learned RDF data models and linked data serialization formats. Metadata Maker, however, allows records to be downloaded and viewed locally. Those participants also added that it would be helpful to know the dataflow once the record is created in both editors.

Knowledge and Skills Required to Use the Editors

The second multiple-choice question asked participants what kind of knowledge and skills they thought were needed for the two BIBFRAME editors, such as Functional Requirements for Bibliographic Records (FRBR),[58] RDA, BIBFRAME, LCSH and other controlled vocabularies, name authority, linked data, and MARC. However, the authors quickly realized that the jargon and acronyms in this question caused misunderstandings for many participants as they did not know some or all of the options, especially the two non-catalogers who do not have cataloging knowledge/education. Those staff members who routinely create original records were also not familiar with FRBR, BIBFRAME, and linked data. As a result, the answers to this question vary widely, as shown below:

Table 1. Answers from 11 Participants: Knowledge and Skills Required to Use the Editors.
Sinopia | Metadata Maker
Unsure | None
BIBFRAME, MARC, LCSH and other controlled vocabularies, name authority, linked data | MARC, None
RDA, BIBFRAME, FRBR, LCSH and other controlled vocabularies, linked data, Need an extreme understanding of FRBR terms and RDA standards just to read/understand the interface | I feel like you don’t actually need to know anything about cataloging standards to use this interface
MARC, LCSH and other controlled vocabularies, name authority, linked data | MARC
MARC, LCSH and other controlled vocabularies, name authority, linked data | None
BIBFRAME | None
RDA, BIBFRAME, FRBR, linked data, I did not use it enough to know all that one needs to know, but this is meant for experienced (and very technically savvy) catalogers | None, If applicable, an non-English language.
MARC | LCSH and other controlled vocabularies
I do not know? RDA | Basic book information
Everything | Basic book information
None | None

However, one thing is clear: while many participants said that some knowledge is necessary in order to use the BIBFRAME editor, the majority said that no knowledge is needed to use Metadata Maker.

Discussion and Next Steps

The process of revamping Metadata Maker with linked data sources and BIBFRAME output showed that it is possible to build a linked data editor, free of cataloging terminology, that anyone can use. The intuitive design, self-explanatory wording, and one-page web form lower the learning barriers of BIBFRAME cataloging and allow non-professional catalogers and language/subject experts to get involved in linked data metadata creation. As Metadata Maker is designed for generating “good enough” records, it can also serve as a quick BIBFRAME generation tool for paraprofessional catalogers. However, the authors heard concerns from catalogers about using this tool in practice, such as an oversimplified interface and unclear dataflow. The authors were perplexed by the varying degrees of acceptance of Metadata Maker among survey participants. Paraprofessional catalogers are inclined toward Connexion-like editors that offer the option to describe resources in detail, whereas novice catalogers might be more comfortable using linked data editors that do not require such prerequisite knowledge. Developers of linked data editors will need to balance those two needs.

While the library community has made significant progress in developing and experimenting with linked data and BIBFRAME production, several issues still require further thought and collaborative work.

First, a clear dataflow needs to be established. As of now, BIBFRAME linked data created in the current BIBFRAME editors are not automatically ingested into any integrated library system.[59] This was brought up by several staff members who tested Sinopia. In addition, most vendors do not support BIBFRAME import as of this writing. The authors acknowledge that such a dataflow may require a new integrated library system that can work with metadata in different formats and with a different ontology.

Second, libraries may have a completely different data sharing method in the linked data environment compared with the current centralized shared database.[60] If that is the case, what would a data sharing model be like? If it is still possible to have a centralized linked data database, then who is going to manage it, and how is it going to be managed?

Third, a discussion of work distribution between human catalogers and machines needs to start. As machines can do MARC to BIBFRAME conversion and authority reconciliation work rather effectively, libraries might want to think about what machines can do and what cataloging and metadata professionals should do. If there are tasks that machines can do better, it would be better to leave those to the machines and to identify what cataloging and metadata professionals should focus on in linked data creation and workflows.

Fourth, according to Fortier, Pretty, and Scott (2022),[61] the understanding and knowledge of BIBFRAME among Canadian libraries is still low after about a decade of ongoing discussion and development efforts. While it is important to understand the underlying structure of BIBFRAME and linked data, it would be worthwhile to think about how much training is adequate for cataloging professionals and how much integration of RDA terms into the BIBFRAME editors is necessary for the transition to linked data creation. Or, maybe what libraries really need is a linked data editor rather than a BIBFRAME editor. If we have trouble understanding BIBFRAME and RDA among ourselves, it will be much more difficult for users on the Web to understand what kind of data we are sharing.
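The authority reconciliation mentioned in the third point above can be pictured with a small sketch. It is only an illustration under stated assumptions: the lookup table is a local stand-in built from the two VIAF records cited in notes [51] and [54], the function names are hypothetical, and the order-insensitive matching rule is a simplification that does not represent how VIAF or Metadata Maker matches names.

```python
import unicodedata

# Local stand-in for an authority source (an assumption of this sketch).
# Labels and URIs are those of the VIAF records cited in notes [51] and [54];
# a real workflow would query an authority service instead.
AUTHORITY = {
    "De Reyghère, Greta": "http://viaf.org/viaf/69118441",
    "Han, Shin-Kap": "http://viaf.org/viaf/198153409742041581752",
}


def match_key(name):
    """Build a comparison key: no diacritics, case, punctuation, or word order."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    spaced = "".join(ch if ch.isalnum() else " " for ch in no_marks.casefold())
    return tuple(sorted(spaced.split()))


# Index the authority labels once by their comparison key.
INDEX = {match_key(label): uri for label, uri in AUTHORITY.items()}


def reconcile(name):
    """Return the URI whose label matches the typed name, or None."""
    return INDEX.get(match_key(name))


if __name__ == "__main__":
    for typed in ["greta de reyghere", "HAN, Shin Kap", "Unknown, Person"]:
        print(typed, "->", reconcile(typed))
```

Ignoring word order lets a name typed in direct order match an inverted heading, at the cost of possible false matches between different people with the same name parts. Cases with no match or an ambiguous match are the kind of work that would still fall to cataloging and metadata professionals.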

About the Authors

Greta Heng (ORCID: 0000-0002-3606-6357) is Cataloging and Metadata Strategies Librarian at San Diego State University.

Myung-Ja (MJ) K. Han (ORCID: 0000-0001-5891-6466) is a Professor and Metadata Librarian at the University of Illinois at Urbana-Champaign.

Bibliography

[1] Library of Congress. Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.
[2] World Wide Web Consortium (W3C). RDF. https://www.w3.org/RDF/.
[3] Wennerlund, B., & Berggren, A. (2019). Leaving Comfort Behind: a National Union Catalogue Transition to Linked Data. Paper presented at: IFLA WLIC 2019 – Athens, Greece – Libraries: dialogue for change in Session S15 – Big Data. In: Data intelligence in libraries: the actual and artificial perspectives, 22-23 August 2019, Frankfurt, Germany.
[4] French National Library. Semantic Web and Data Model. https://data.bnf.fr/en/semanticweb.
[5] German National Library. Linked Data Service. https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/LDS/lds_node.html.
[6] Library of Congress. Marva Editor. https://bibframe.org/marva/editor/.
[7] Wikidata. Wikidata Main Page. https://www.wikidata.org/wiki/Wikidata:Main_Page.
[8] Godby, J., Smith-Yoshimura, K., Washburn, B., Davis, K., Detling, K., Eslao, C., Folsom, S., Li, X., McGee, M., Miller, K., Moody, H., Thomas, C., & Tomren, H. (2019). Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage (pp.70). OCLC Research. https://doi.org/10.25333/faq3-ax08.
[9] Han, M. K., Ream-Sotomayor, N. E., Lampron, P., & Kudeki, D. (2016). Making Metadata Maker: A web application for metadata production. Library Resources & Technical Services, 60(2), 89–98. The source code is available on GitHub: https://github.com/dkudeki/metadata-maker. Metadata Maker is still in the exploratory phase and currently only supports linked data cataloging for monographs.
[10] Michael, B., & Han, M. J. K. (2019). Assessing BIBFRAME 2.0: Exploratory implementation in metadata maker. Proceedings of the International Conference on Dublin Core and Metadata Applications, 26-31.
[11] Non-catalogers refer to people who do cataloging work but do not have adequate cataloging experience or may not need it as they do not pursue a career in cataloging.
[12] Van der Werf, T. (2021, March 4). Next Generation Metadata… it’s getting real! Hanging Together, OCLC Research Blog. https://hangingtogether.org/next-generation-metadata-it-is-getting-real/.
[13] Dalgord, C. Shared Entity Management Infrastructure Project update. OCLC. https://www.loc.gov/bibframe/news/source/bibframe-from-home-oclc-update.pptx.
[14] Linked Data for Libraries. https://wiki.lyrasis.org/pages/viewpage.action?pageId=41354028.
[15] Linked Data for Libraries Labs. https://wiki.lyrasis.org/pages/viewpage.action?pageId=77447730.
[16] Linked Data for Production. https://wiki.lyrasis.org/pages/viewpage.action?pageId=74515029.
[17] Linked Data for Production: Pathway to Implementation. https://wiki.lyrasis.org/display/LD4P2.
[18] Linked Data for Production: Closing the Loop. https://wiki.lyrasis.org/display/LD4P3.
[19] Lnenicka, M., Kopackova, H., Machova, R., & Komarkova, J. (2020). Big and open linked data analytics: a study on changing roles and skills in the higher educational process. International Journal of Educational Technology in Higher Education, 17(1), 1-30.
[20] El-Sherbini, M. & Klim, G. (1997). Changes in technical services and their effect on the role of catalogers and staff education: An overview. Cataloging & Classification Quarterly, 24(1-2), 23-33; Zhu, L. (2012). The role of paraprofessionals in technical services in academic libraries. Library Resources & Technical Services, 56(3), 127-154.
[21] Van der Werf, Next Generation Metadata… it’s getting real!
[22] Library of Congress. BIBFRAME Editor. https://bibframe.org/bfe/index.html.
[23] Library of Congress. MARVA. https://bibframe.org/marva/editor/.
[24] Linked Data for Production: Pathway to Implementation. Sinopia. https://sinopia.io/.
[25] RDA Toolkit: https://www.rdatoolkit.org/.
[26] BIBFRAME Interoperability Group. (2022, April 15). Terms of Reference. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/BIG/BIG-TOR.pdf.
[27] Lorimer, N. (2022, March 8). Re-use or Copy? Redefining Copy Cataloging in a Linked Data Environment. ALA Copy Cataloging IG, online. https://docs.google.com/presentation/d/1UKxcDjEA-CwMXnfiXFiBdPbCVN_JxvmymOOJgZIbI9o/edit?usp=sharing.
[28] Library of Congress. Appendix C – Minimal Level Record Examples. https://www.loc.gov/marc/bibliographic/bdapndxc.html.
[29] http://quest.library.illinois.edu/marcmaker/dataset/.
[30] http://quest.library.illinois.edu/marcmaker/.
[31] Aka, monograph (linked data), http://quest.library.illinois.edu/marcmaker/monoviaf/.
[32] http://quest.library.illinois.edu/marcmaker/ebooks/.
[33] http://quest.library.illinois.edu/marcmaker/govdocs/.
[34] http://quest.library.illinois.edu/marcmaker/maps/.
[35] http://quest.library.illinois.edu/marcmaker/microfilms/.
[36] http://quest.library.illinois.edu/marcmaker/scores/.
[37] http://quest.library.illinois.edu/marcmaker/serials/.
[38] http://quest.library.illinois.edu/marcmaker/theses/.
[39] BIBFRAME is only added to two monograph modules for now.
[40] OCLC Developer Network. Authority Cluster Resource. https://www.oclc.org/developer/api/oclc-apis/viaf/authority-cluster.en.html.
[41] DNB was selected as an alternative name label source because (1) it provides a linked data service; and (2) it is the national library of a non-English-speaking country, which may complement LCNAF.
[42] https://github.com/dkudeki/metadata-maker/blob/monoviaf/LCSH/lcshsearch.js.
[43] Suominen, O., Inkinen, J., Virolainen, T., Fürneisen, M., Kinoshita, B. P., Veldhoen, S., Sjöberg, M., Zumstein, P., Neatherway, R., & Lehtinen, M. (2022). Annif (Version 0.60.0-dev) [Computer software]. https://doi.org/10.5281/zenodo.2578948; https://api.annif.org/v1/ui/.
[44] Annif Github Repository. https://github.com/NatLibFi/Annif.
[45] IvyPlus Platform for Open Data. https://pod.stanford.edu/.
[46] Share-VDE (Virtual Discovery Environment). https://www.svde.org/.
[47] Hahn, J. (2022, June 20). Cataloger acceptance and use of semiautomated subject recommendations for web scale linked data systems. 87th IFLA World Library and Information Congress (WLIC) / 2022 in Dublin, Ireland. https://repository.ifla.org/handle/123456789/1955.
[48] Khan, H. (2020, March 10). Annif Use and Explanation. Linked Data for Production: Pathway to Implementation. https://wiki.lyrasis.org/display/LD4P2/Annif+Use+and+Explanation.
[49] When http://lcsh.annif.info/ was accessed in October 2022, the Annif LCSH API project had updated its vocabulary sources: “Ivyplus-tfidf” had been replaced by “penn-fasttext-en” (penn (LCSH English) conference papers and proceedings), “upenn-omikuji-bonsai-en-gen” (upenn (LCSH English) all genres), and “upenn-omikuji-bonsai-spa-gen” (upenn (LCSH Spanish) all genres).
[50] Virtual International Authority File. https://www.oclc.org/en/viaf.html.
[51] VIAF authority record for De Reyghère, Greta. Retrieved on September 11, 2022, from http://viaf.org/viaf/69118441.
[52] VIAF authority record in JSON for De Reyghère, Greta. Retrieved on September 11, 2022, from https://viaf.org/viaf/69118441/viaf.json.
[53] DNB authority record for De Reyghère, Greta. Retrieved on September 11, 2022, from https://hub.culturegraph.org/entityfacts/134496175, and https://d-nb.info/gnd/134496175.
[54] VIAF authority record for Han, Shin-Kap. Retrieved on September 11, 2022, from http://viaf.org/viaf/198153409742041581752.
[55] OCLC Authority Cluster Resource. https://www.oclc.org/developer/api/oclc-apis/viaf/authority-cluster.en.html. Retrieved on October 5, 2022.
[56] VIAF authority record in JSON-LD for John Paul II, Pope. Retrieved on September 20, 2022, from https://viaf.org/viaf/35605/viaf.jsonld.
[57] We chose Sinopia over other BIBFRAME editors because it is created for the community and has PCC templates that have been tested out by many catalogers. We also understand that the purpose of the BIBFRAME editor and Metadata Maker are different.
[58] The International Federation of Library Associations and Institutions. Functional Requirements for Bibliographic Records (FRBR). https://www.loc.gov/marc/bibliographic/bdapndxc.html.
[59] There are some unofficial statements that Folio and Ex Libris have been working on BIBFRAME data import. But as of October 6, 2022, there has not been a BIBFRAME data import function released by them.
[60] Library of Congress. BIBFRAME and the PCC. https://www.loc.gov/aba/pcc/bibframe/bibframe-and-pcc.html.
[61] Fortier, A., Pretty, H., & Scott, D. (2022). Assessing the Readiness for and Knowledge of BIBFRAME in Canadian Libraries. Cataloging & Classification Quarterly. https://doi.org/10.1080/01639374.2022.2119456.

ISSN 1940-5758