Analysis of 2018 International Linked Data Survey for Implementers

Karen Smith-Yoshimura

Issue 42, 2018-11-08


OCLC Research conducted an International Linked Data Survey for Implementers in 2014 and 2015. Curious about what might have changed since the last survey, and eager to learn about new projects or services that format metadata as linked data or make subsequent uses of it, OCLC Research repeated the survey between 17 April and 25 May 2018.

A total of 143 institutions in 23 countries responded to one or more of the surveys. This analysis covers the 104 linked data projects or services described by the 81 institutions that responded to the 2018 survey: those that publish linked data, consume linked data, or both. This article provides an overview of the linked data projects or services institutions have implemented or are implementing; what data they publish and consume; the reasons given for implementing linked data and the barriers encountered; and some advice given by respondents to those considering implementing a linked data project or service. Differences from previous survey responses are noted, but as the majority of linked data projects and services described are either not yet in production or were implemented within the last two years, these differences may reflect new trends rather than changes in existing implementations.


Introduction

The impetus for an “International Linked Data Survey for Implementers” was a set of discussions with OCLC Research Library Partnership (RLP)[1] metadata managers who were aware of some linked data projects or services but felt there must be more “out there” that they should know about. The survey instrument was designed in consultation with OCLC colleagues and a few RLP institutions and beta tested by several linked data implementers. We conducted the initial survey in July – August 2014, distributing the link to the survey on multiple listservs and on Twitter. The survey targeted those who had already implemented a linked data project or service, or were in the process of doing so. Questions were asked both about publishing linked data and consuming linked data. The results were published in a series of posts on the OCLC Research blog, HangingTogether.org.[2]

While the initial survey results received a generally appreciative response from readers, some noted and regretted the absence of several prominent linked data efforts in Europe. To address these gaps, we repeated the survey in June-July 2015. The results were published in the July/August 2016 issue of D-Lib Magazine.[3]

Curious about what might have changed since the last survey, and eager to learn about new projects or services that format metadata as linked data or make subsequent uses of it, OCLC Research repeated the survey between 17 April and 25 May 2018. More institutions responded than in the previous survey (81 institutions, compared to 71 institutions in 2015) but described slightly fewer linked data projects or services (104, compared to 112 in 2015).

Spreadsheets[4] containing all responses to the 2014, 2015, and 2018 surveys include links to the linked data projects or services in production.

Overview

A total of 143 institutions in 23 countries responded to one or more of the surveys (see the list appended at the end of this article). Of the 81 institutions responding to the 2018 survey, 46 (57%) had responded to one or both of the previous surveys. Even those who had responded to a previous survey did not always describe the same linked data projects or services. Of the 104 linked data projects or services described, only 42 had been described previously. Even when the same project or service was described, the respondent sometimes differed from the one who responded previously. Some respondents did not answer every question, so the totals for each question may vary.

Half of the respondents to the 2018 survey that have implemented or are implementing a linked data project or service reported that they plan to implement another linked data project within the next two years, down from the 60% who reported such plans in 2015. Two of the 2015 respondents who reported such plans did go on to implement another project, as their 2018 responses show.

Institutions in the United States accounted for 42% of the 2018 respondents (34 institutions), followed by Spain (12), the United Kingdom (8), and the Netherlands (4). We received three responses each from Canada, Germany, and Norway, two from Italy, and one each from Australia, Austria, China, the Czech Republic, Finland, France, Hungary, Japan, Luxembourg, Portugal, South Africa, and Switzerland.

We categorized the responding institutions by type. Research libraries represented 28% of the 2018 respondents (23), followed by 13 national libraries (16%), 11 research institutions (14%), eight library networks and eight government institutions (10% each), six service providers (7%), five public libraries (6%), four museums (5%), and two “others” (a concert venue and a publisher). This breakdown generally follows the types of institutions that responded to the previous surveys, with one exception: for the first time, we received responses from service providers, which provide linked data services for their customers. Figure 1 compares the number of responding institutions to the three surveys by type of institution.

Figure 1: Types of institutions responding to the surveys conducted in 2014, 2015, and 2018.

Of the 104 linked data projects or services described at various levels of detail by the 2018 survey respondents, three-fourths are in production, of which 23% (18) have been in production for more than two years but less than four, and 40% (31) have been in production for four or more years. Three of them are not yet accessible, and three are “private,” for that institution’s use only.

Most of the linked data projects or services are done entirely in-house (61 or 59% of the implementations), with 23 (22%) part of a multi-institutional implementation, and 20 (19%) provided by an external vendor or service provider. But even those who responded that the work to deliver linked data functionality was all done in-house reported collaboration with external groups or organizations. Only 25 (24%) of the implementations involved “only my institution.” In order of frequency named, the external collaborators were:

other libraries or archives
other universities or research institutions
external consultants or developers
a systems vendor
part of an international collaboration and other consortium members
part of a national collaboration
a corporation or company
part of a discipline-specific collaboration
a foundation
a scholarly society
part of a state- or province-wide initiative.

The rankings resemble those in the 2015 survey, with slightly higher rankings for a systems vendor, international collaboration, other consortium members, discipline-specific collaboration, and foundation.

Staffing: Most of the institutions that have implemented or are implementing linked data projects/services have added linked data to the responsibilities of current staff (86); only 15 have not. The most significant change from the 2015 survey is that the number of those who have staff dedicated to linked data increased by 50%. In Table 1, the first column shows the number of responses and the second column the percentage of all those who responded to the question.

Table 1: Comparison of staffing in 2018 and 2015 surveys.

Staffing	2018 (no.)	2018 (%)	2015 (no.)	2015 (%)
Added to the responsibilities of current staff.	86	85%	98	92%
We have staff dedicated to linked data project(s).	30	30%	20	19%
Adding/have added new staff with linked data expertise.	10	10%	4	4%
Adding/have added temporary staff with linked data expertise.	13	13%	13	12%
Hiring/have hired external consultants with linked data expertise.	12	12%	17	16%
Respondents to question	101		107

Funding: Twenty-two of the linked data projects or services received grant funding; 73 (70%) are funded by the library/archive and/or the parent institution. Six linked data projects received funding support from partner institutions, five were privately funded, and one received corporate funding. Eight have never applied for a grant, but plan to.

Success assessment: More respondents reported that their linked data project or service was successful or “mostly” successful in 2018 than in 2015: 58 compared to 46. Fewer didn’t know yet as their projects were still at an early stage (pre-implementation or early implementation): 31 in 2018 compared to 52 in 2015. Comments from respondents whose linked data project or service has been in production for at least four years reveal the following indicators of success.

Usage: Most respondents noted substantial increases in usage over the years, and more contributors. One noted the high search ranking for rare content. Others noted the increase in requests for digital services support discovered through their linked data offerings. Conference presentations and articles by others referring to their own work were cited as evidence of success. Expanded exposure on the Internet is “dragging people into library services.”
Data re-use: Several noted that other applications making use of their linked data implementation is a metric of success. Another metric is the number of bulk downloads.
Interoperability: Several noted their linked data service provides access to their other resources. One noted the value of aggregating data from resources around the world to their users.
User satisfaction: Linked data offers users a richer experience that is much more contextualized and inter-related. One pointed to better support of multilingualism by fetching multilingual labels from linked data vocabularies. One noted that their “happy users” are “probably unaware that the service is based on linked data.”
Influence: The success of a project gains attention and illustrates what’s possible. Several noted that their services are well-known in the community and developers in the cultural sector are increasingly aware of their value. Their data models have influenced other initiatives and moved the discussion on linked data in the library community.
Professional development: Even absent metrics demonstrating linked data’s value to others, linked data projects still provide professional development for staff. “We learned a great deal that we can build on.”

The reasons given by implementers for assessing their project or service as partially or mostly successful centered around:

the need to upgrade and expand the service but lacking funding for it
lack of tools to measure satisfaction of the users with the linked data offering
difficulty in assessing success of data dissemination and in determining how the data is re-used
impression that interest isn’t high within the library community
impact seems marginal
did not result in similar projects, and the project did not grow in size and scale

In both the 2018 and 2015 surveys, most projects/services both consume and publish linked data. Even fewer of the described projects only publish linked data in 2018 than in 2015 (5, compared to 10).

Table 2: Survey responses on how linked data is used.

How linked data is used	2018 survey	2015 survey
Consume linked data	34	38
Publish linked data	5	10
Both consume & publish	65	64

What and Why Linked Data Is Published

Given the relatively large representation of libraries among respondents, it is no surprise that descriptive metadata and bibliographic data are the most common types of data published (51 and 47 responses respectively), with authority data a close third (45). Other types of data published as linked data include: data about people (33), ontologies/vocabularies (33), digital collections (27), geographic data (23), datasets (19), data about museum objects (12), organizational data (12), encoded archival descriptions (3), and statistical data (3). Written-in responses included holdings and availability information, data about performance work, data about monuments and archeology, provenance, and time periods.

In the three years since the 2015 survey, the number of linked data datasets with over 1 billion triples increased from three to 11, and three of them exceed 5 billion triples: Biblioteca de Galicia’s digital library (6.3 billion triples), Europeana (a little over 5 billion), and OCLC’s WorldCat linked data (over 10 billion). But most linked data datasets are small. Of the 63 responses reporting their datasets’ size, 33 were under 10 million triples, nine were between 10 and 100 million triples, and 10 were between 100 million and 1 billion triples.

Comparing the 2018 and 2015 survey results, the key motivations to publish linked data appear unchanged. Because some 2015 survey respondents had noted the need to publish linked data in order to consume it as an “other” motivation, we offered it as an option in the 2018 survey; it became the fourth most commonly cited reason.

Table 3: Chief motivations for publishing linked data.

Chief Motivations for Publishing Linked Data (n=92 in both surveys)	2018	2015
Expose data to a larger audience on the web.	74%	73%
Demonstrate what could be done with datasets as linked data.	65%	64%
Heard about linked data and wanted to try it out by exposing some local data as linked data.	45%	47%
Needed to publish linked data in order to consume it.	25%	n/a
Explore whether publishing data as linked data will improve Search Engine Optimization (SEO) for local resources.	24%	30%
Administration requested that we expose our data as linked data.	11%	5%

The British Library noted that its linked data implementations were part of the UK Government Public Sector Initiative. Other reasons written in included:

experiment with linked data outside the library catalog
increase interoperability
required for a grant-funded project
to link together information across different institutions
to provide an ontology extension to BIBFRAME[5]
develop supporting tools

More than half of the responses either did not know the average number of requests the linked data project or service received daily over the previous six months, did not keep or have access to usage statistics, or had no usage yet (61 of 103 responses). The eight most heavily used linked data datasets as measured by the average number of requests a day, with over 100,000 requests per day, are:

American Numismatic Society’s nomisma, a thesaurus of numismatic concepts. In the 2015 survey, it had reported daily usage as between 10,000 and 50,000 requests a day, so its usage has more than doubled over the last three years.
Bibliothèque nationale de France’s data.bnf.fr, providing access to the BnF’s collections and providing a hub among different resources. In the 2015 survey, it also had reported daily usage as between 10,000 and 50,000 requests a day.
Europeana, which aggregates metadata for digital objects from museums, archives, and audiovisual archives across Europe. It had reported the same daily usage in 2015.
Library of Congress’ Linked Data Service with over 50 vocabularies. Although usage fluctuates, it receives 500,000 to a million requests a day.
National Diet Library’s NDL Search, providing access to bibliographic data from Japanese libraries, archives, museums and academic research institutions. It had reported the same daily usage in 2015.
North Rhine-Westphalian Library Service Center’s Linked Open Data service, providing access to bibliographic resources, libraries and related organizations, and authority data. It had reported the same daily usage in 2015.
OCLC’s Virtual International Authority File (VIAF), an aggregation of over 40 authority files from different countries and regions. It had reported the same daily usage in 2015.
OCLC’s WorldCat Linked Data, a catalog of over 400 million bibliographic records made experimentally available in linked data form. It had reported the same daily usage in 2015.

Another three linked data datasets receive between 50,000 and 100,000 requests a day:

British Library’s British National Bibliography. In the 2015 survey, it had reported daily usage as between 10,000 and 50,000 requests a day, so its usage has doubled over the last three years.
National Library of Finland’s Finnish Thesaurus and Ontology Service. (Although this service was launched in 2014, it did not respond to the 2015 survey.)
OCLC’s FAST (Faceted Application of Subject Terminology), a faceted subject heading schema derived from the Library of Congress Subject Headings. In the 2015 survey, it had reported daily usage as between 10,000 and 50,000 requests a day, so its usage has doubled over the last three years.

Linked data datasets use a variety of RDF vocabularies and ontologies, and most use multiple ones. The percentage of those using Simple Knowledge Organization System (SKOS) decreased from 60% of respondents in 2015 to 44% in 2018, which was mirrored by an increase in those using Schema.org (30% in 2015 vs. 46% in 2018). Similar decreases showed up in those using Dublin Core Metadata Element Set, DCMI Metadata Terms, and Friend of a Friend (foaf), as well as a smaller decrease in those using RDF Schema. BIBFRAME vocabulary usage increased, from 15% in 2015 to 27% in 2018.

The table below shows the top eight in 2018 (used by at least 20% of respondents) compared to the 2015 responses. Ninety-five responded to the question in 2018 vs. 99 in 2015. The first column shows the number of responses and the second column the percentage of all those who responded to the question.

Table 4: Top 8 RDF vocabularies/ontologies used in 2018 compared to 2015.

RDF Vocabularies/Ontologies Used	2018 (no.)	2018 (%)	2015 (no.)	2015 (%)
Schema.org	44	46%	30	30%
SKOS	42	44%	59	60%
Dublin Core Terms	39	41%	51	52%
FOAF	36	38%	55	56%
DCMI Metadata Terms	35	37%	49	49%
RDF Schema	35	37%	45	45%
BIBFRAME	26	27%	15	15%
Local vocabulary	21	22%	19	19%

Other RDF vocabularies and ontologies named by the 2018 survey respondents in order of the frequency cited are:

Resource Description and Access (rda)
Europeana Data Model vocabulary (edm)
The Bibliographic Ontology (bibo)
CIDOC Conceptual Reference Model (crm)
Expression of Core FRBR Concepts in RDF (frbr)
Metadata Authority Description Schema (mads)
OWL 2 Web Ontology Language (owl 2)
ISBD elements (isbd)
WGS84 Geo Positioning (geo)
The Event Ontology (event)
Music Ontology (mo)
The OAI ORE terms vocabulary (ore)
International Standard Name Identifier (isni)
Metadata Object Description Schema (mods)
VIVO Core Ontology (vivo)
DPLA Metadata Application Profile (MAP)
Biographical Ontology (bio)
FRBR-aligned Bibliographic Ontology (fabio)
Organization Ontology (org)
Data Catalog Vocabulary (dcat)
EAC-CPF Descriptions Ontology for Linked Archival Data (eac-cpf)
rdag2
British Library Terms RDF schema (blt)
Library extension of schema.org [aka Purl.org/library] (lib)
Archival collections ontology (arch)
Nomisma Ontology

Thirty-seven respondents cited RDF vocabularies/ontologies not listed above. In alphabetical order, they are: Activity Streams; Description of a Project (DOAP); Exif Data Description Vocabulary; Funding, Research Administration and Projects Ontology (FRAPO); GND Ontology; International Image Interoperability Framework (IIIF); Muninn Project ontologies; National Diet Library Dublin Core Metadata Description (DC-NDL); Ontology for Media Resources; Product Ontology; SKOS eXtension for Labels (SKOS-XL); Upper Mapping and Binding Exchange Layer (UMBEL); VRA Core; and Web Annotation Data Model.

Licenses[6]: 32 projects/services do not announce any explicit license; 19 apply CC0 1.0 Universal, the most common license used by the 2018 survey respondents. Other licenses that respondents use, in order of frequency, are:

Open Data Commons Attribution (ODC-BY)
Open Data Commons Public Domain Dedication and License (PDDL)
Open Data Commons Open Database License (ODbL)
Creative Commons Attribution-NonCommercial-NoDerivatives (BY-NC-ND)

Other licenses, not listed above, cited by 29 respondents:

Creative Commons Attribution-NonCommercial-ShareAlike (BY-NC-SA)
Creative Commons Public Domain Mark 1.0
Creative Commons Attribution 3.0 United States (CC-BY 3.0 US)
Creative Commons Attribution 4.0 International (CC-BY 4.0)
French Government’s Open Licence (similar to ODC-BY)
Open Government Licence [UK]

Accessibility: Of the 70 projects or services that publish linked data, 19 do not currently make their data accessible outside their institution. Those that do offer multiple access methods. Web pages are the most common, followed by file dumps, content negotiation, SPARQL endpoints, SPARQL editors, embedded markup, and applications. The most common serialization of linked data is RDF/XML. Other, less commonly used serializations, in order of frequency cited, are Turtle, JSON-LD, N-Triples, RDF/JSON, RDFa, N3 (Notation3), and N-Quads.
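The serializations listed above differ only in syntax; they encode the same triples. As a minimal sketch, assuming a few hypothetical triples about one book (the example.org URIs are placeholders; the RDF and schema.org property IRIs are real), the Python below writes the same data as N-Triples and as expanded JSON-LD, and picks one through a naive reading of the HTTP Accept header, roughly how content negotiation lets a single URI serve several formats. It is an illustration, not any respondent's implementation, and it ignores q-values and wildcards, which a real server should honor.

```python
import json

# Property IRIs below are real (RDF and schema.org); the example.org
# subject and object URIs are hypothetical placeholders.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
BOOK = "http://example.org/book/1"

# Each triple is (subject IRI, predicate IRI, (object kind, object value)).
TRIPLES = [
    (BOOK, RDF_TYPE, ("iri", SCHEMA + "Book")),
    (BOOK, SCHEMA + "name", ("literal", "Linked Data Basics")),
    (BOOK, SCHEMA + "author", ("iri", "http://example.org/person/42")),
]


def escape_literal(text):
    """Escape characters that N-Triples string literals may not contain raw."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """One triple per line, each terminated by ' .'."""
    lines = []
    for s, p, (kind, value) in triples:
        obj = f"<{value}>" if kind == "iri" else f'"{escape_literal(value)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


def to_expanded_jsonld(triples):
    """Group triples by subject into JSON-LD expanded form (full IRIs, no @context)."""
    nodes = {}
    for s, p, (kind, value) in triples:
        node = nodes.setdefault(s, {"@id": s})
        if p == RDF_TYPE and kind == "iri":
            node.setdefault("@type", []).append(value)
        else:
            obj = {"@id": value} if kind == "iri" else {"@value": value}
            node.setdefault(p, []).append(obj)
    return json.dumps(list(nodes.values()), indent=2)


SERIALIZERS = {
    "application/n-triples": to_ntriples,
    "application/ld+json": to_expanded_jsonld,
}


def negotiate(accept_header, triples):
    """Naive content negotiation: the first supported media type listed wins.

    Ignores q-values and wildcards; falls back to N-Triples.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](triples)
    return "application/n-triples", to_ntriples(triples)


media_type, body = negotiate("text/html, application/ld+json;q=0.9", TRIPLES)
print(media_type)
print(body)
```

Turtle and RDF/XML would normally be produced by an RDF library such as RDFLib rather than by hand-written code; the point here is only that every format carries the same three statements.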

Technologies: The technologies used by respondents to publish linked data are diverse, and most used multiple technologies. Table 5 lists the technologies used in order of frequency.

Table 5: Technologies used for publishing linked data.

No. of Projects Used	Technology (order of frequency)
More than 20	SPARQL, Java
10 – 20	Python, XSLT, RDF Store, Solr, Jena Applications, Virtuoso Universal Server (provides SPARQL endpoint)
2 – 9	Google Refine, Apache Fuseki, Blazegraph, GraphDB (formerly OWLIM by Ontotext Software), DIGIBIB for Libraries, 4store, Fedora Commons, Map/Reduce, Metafacture, Django, Elasticsearch, AllegroGraph, Drupal 7, OpenRDF, Pubby
1	4store Seme4 platform, Amazon Neptune, Apache Spark, ARC2 on PHP, Arches, bib-lod-ui (Web app for publishing bibliographic Linked Open Data), bib-rdf-pipeline (converting MARC into RDF), Blacklight, Catmandu, ClioPatria, CubicWeb, D3 libraries, FAST converter, FreeMarker templates, Government Site Builder [Germany], HBase/Hadoop, JAX-RS, MARC Report and MARC Global (from The MARC of Quality), MongoDB, Mapping Memory Manager (3M), MarkLogic Semantics, Orbeon XForms, Permanent Identifiers for the Web, RDFLib for Python, ResearchSpace, Ruby on Rails, Skosmos, Skosify, EasyRdf library for PHP, SPARQL result visualizer, Squebi SPARQL editor, Stardog

Barriers: The rankings of barriers or challenges in publishing linked data are mostly the same in both the 2018 and 2015 surveys. The top barrier in both surveys was the steep learning curve for staff. So many wrote in “lack of resources” as a response to the 2015 survey that it was added as another choice in the 2018 survey, becoming the fourth-most cited barrier, tied with little documentation.

Table 6: Barriers encountered in publishing linked data.

Barriers/Challenges	2018 (no.)	2018 (%)	2015 (no.)	2015 (%)
Steep learning curve for staff	41	51%	40	51%
Inconsistency in legacy data	38	48%	33	42%
Selecting appropriate ontologies to represent our data	26	33%	31	39%
Lack of resources	23	29%	n/a	n/a
Little documentation or advice on how to build the systems	23	29%	21	27%
Establishing the links	22	28%	27	34%
Lack of tools	18	23%	15	19%
Immature software	17	21%	11	14%
Ascertaining who owns the data	4	5%	10	13%
Other	19	24%	21	27%
Respondents to question	80		79

Several respondents also noted as additional barriers the complexity of data transformations, lack of best practices, lack of tool integration, scaling triplestores, security and privacy issues, data sets too large to publish as a whole (and difficult for others to consume), and insufficient institutional support.

What and Why Linked Data Is Consumed

In the 2018 survey, a total of 69 projects described consumed linked data (compared to 68 projects in 2015). Table 7 shows the top 10 linked data sources consumed by the 2018 survey respondents (each used by more than 12 projects or services) compared to their usage in the 2015 survey. Under each year, the first column is the number of projects reporting that they consumed the resource, and the second column is that number as a percentage of all projects described that year that consumed linked data. Asterisks denote resources from institutions that responded to the 2018 survey. Changes of more than 10 percentage points are marked with a ^.

The biggest change is the surge in usage of Wikidata, which ranks #5 among all linked data sources consumed by the 2018 survey respondents (compared to #15 in the 2015 survey), now tying WorldCat.org in usage. ISNI also rose into the “Top 10,” while “Resources we convert to linked data ourselves” fell from fifth place in 2015 to tenth in 2018.

Table 7: Top 10 linked data sources consumed.

Top 10 Linked Data Sources Consumed	2018 (no.)	2018 (%)	2015 (no.)	2015 (%)
id.loc.gov*	39	57%	35	51%
VIAF (Virtual International Authority File)*	36	51%	41	60%
DBpedia	30	43%	36	53%
GeoNames	29	42%	35	51%
Wikidata	28	41%^	6	9%
WorldCat.org*	28	41%^	15	22%
Getty Vocabularies	23	33%	16	24%
FAST (Faceted Application of Subject Terminology)*	17	25%	15	22%
ISNI (International Standard Name Identifier)	17	25%^	8	12%
Resources we convert to linked data ourselves	13	19%	17	25%
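The ^ markers in Table 7 can be reproduced from the counts alone. A short sketch, assuming that a "percentage difference greater than 10%" means an absolute change of more than 10 percentage points, and that the denominators are the 69 (2018) and 68 (2015) projects that consumed linked data:

```python
# Counts from Table 7 as (2018, 2015). Denominators are the projects that
# consumed linked data in each survey year.
TOTAL_2018, TOTAL_2015 = 69, 68
COUNTS = {
    "id.loc.gov": (39, 35),
    "VIAF": (36, 41),
    "DBpedia": (30, 36),
    "GeoNames": (29, 35),
    "Wikidata": (28, 6),
    "WorldCat.org": (28, 15),
    "Getty Vocabularies": (23, 16),
    "FAST": (17, 15),
    "ISNI": (17, 8),
    "Resources we convert to linked data ourselves": (13, 17),
}


def percentage_point_changes(counts, total_new, total_old):
    """Share of projects in the new year minus share in the old year, in points."""
    return {
        source: 100 * new / total_new - 100 * old / total_old
        for source, (new, old) in counts.items()
    }


def flagged(counts, total_new, total_old, threshold=10.0):
    """Sources whose share moved by more than `threshold` points (assumed reading of ^)."""
    changes = percentage_point_changes(counts, total_new, total_old)
    return sorted(source for source, delta in changes.items() if abs(delta) > threshold)


print(flagged(COUNTS, TOTAL_2018, TOTAL_2015))
# ['ISNI', 'Wikidata', 'WorldCat.org']
```

Under this reading, Getty Vocabularies (a rise of about 9.8 points) falls just short of the threshold, which matches its unmarked row.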

These could be considered successful publishers of linked data by the degree to which others consume the data provided. Although library-related linked data projects have tended to consume linked data from sources in the library domain, the emergence of Wikidata as a top resource, along with DBpedia and GeoNames, indicates more experimentation with expanding implementations’ scope to non-library sources. Just under half of the linked data projects and services that now consume Wikidata as a linked data source (13) have been in production for over four years.

Other linked data sources consumed by at least four projects or services, in order of frequency cited: Europeana*, Deutsche Nationalbibliothek Linked Data Services*, Lexvo, WorldCat.org Works, data.bnf.fr*, ORCID (Open Researcher and Contributor ID), DPLA* (Digital Public Library of America), and Hispana.

The primary reasons for institutions to consume linked data had the same top rankings in both the 2018 and 2015 surveys, as shown in Table 8. Decreases of 10 percentage points or more appear in aspirations to improve SEO for local resources (from 28% in 2015 to 10% in 2018), to achieve more effective internal metadata management (from 47% to 30%), and to provide greater accuracy and scope in local search results (from 40% to 28%).

Table 8: Chief motivations for consuming linked data.

Chief Motivations for Consuming Linked Data	2018	2015
Provide local users with a richer experience.	78%	75%
Enhance local data by consuming linked data from other sources.	71%	74%
Heard about linked data and wanted to try it out by using linked data resources.	33%	25%
More effective internal metadata management.	30%	47%
Experiment with combining different types of data into a single triple store.	29%	25%
Greater accuracy and scope in local search results.	28%	40%
Explore whether consuming linked data from external sources will improve Search Engine Optimization (SEO) for local resources.	10%	28%
Respondents to question	69	68

Barriers: The top barrier in both the 2018 and 2015 surveys was matching, disambiguating, and aligning source data and linked data sources. The biggest difference was an uptick in the number of responses pointing to unstable endpoints and service reliability, as shown in Table 9.

Table 9: Barriers encountered in consuming linked data.

Barriers/Challenges	2018 (no.)	2018 (%)	2015 (no.)	2015 (%)
Matching, disambiguating and aligning source data and the linked data resources	28	48%	23	39%
What is published to the Internet as linked data is not always reusable or lacks URIs	18	31%	16	27%
Size of RDF dumps	16	28%	12	20%
Unstable endpoints	16	28%	10	17%
Service reliability	15	26%	9	15%
Mapping of vocabulary	15	26%	17	29%
Understanding how the data is structured before using it.	14	24%	12	20%
Lack of needed off-the-shelf tools	14	24%	10	17%
Datasets not being updated	13	22%	14	24%
Lack of authority control	11	19%	15	25%
Volatility of data formats of dumps	10	17%	11	19%
Disambiguation of terms across different languages is difficult	8	14%	6	10%
It’s difficult to get other institutions to do their own harmonization between objects and concepts.	7	12%	9	15%
Other	13	22%	15	25%
Respondents to question	58		59

Other barriers and challenges written in included: ill-formed RDF serializations; automation processes to link the data are undeveloped; triplestores are not ideal for interactive workflows; complexity of current modeling; and the amount of data cleanup that is required.
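The top barrier to consuming linked data, matching and disambiguating local data against external sources, is easy to underestimate. The Python sketch below is a deliberately naive illustration, not any respondent's workflow: it assumes an in-memory authority file with hypothetical example.org URIs, normalizes headings (case, diacritics, punctuation), and sorts each local heading into matched, ambiguous, or unmatched. Even this toy case shows why string matching alone is not enough: two distinct entities can share one name.

```python
import unicodedata


def normalize(label):
    """Fold case, strip diacritics, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_marks.lower())
    return " ".join(cleaned.split())


def reconcile(local_labels, authority):
    """Match local headings to authority URIs by normalized label.

    `authority` maps URI -> preferred label. Returns label -> (status, value),
    where status is 'matched', 'ambiguous', or 'unmatched'.
    """
    index = {}
    for uri, label in authority.items():
        index.setdefault(normalize(label), []).append(uri)
    results = {}
    for label in local_labels:
        candidates = index.get(normalize(label), [])
        if len(candidates) == 1:
            results[label] = ("matched", candidates[0])
        elif candidates:
            # Same normalized name, different entities: needs contextual disambiguation.
            results[label] = ("ambiguous", sorted(candidates))
        else:
            results[label] = ("unmatched", None)
    return results


# Hypothetical authority file and local headings, for illustration only.
AUTHORITY = {
    "http://example.org/auth/1": "Dvořák, Antonín",
    "http://example.org/auth/2": "Smith, John",
    "http://example.org/auth/3": "Smith, John",
}
LOCAL = ["Dvorak, Antonin", "Smith, John", "Doe, Jane"]

for heading, outcome in reconcile(LOCAL, AUTHORITY).items():
    print(heading, "->", outcome)
```

Production reconciliation services typically return ranked candidates and draw on additional context (dates, roles, related works) to disambiguate; that scoring is beyond this sketch.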

Advice

Asked what they would do differently if they started their linked data project or service again, respondents wrote about integrating linked data into existing services, remediating legacy metadata, and adjusting expectations, such as:

Linked data is now more mature; we would therefore have a wider frame of reference. We would seek to develop a system more integrated with our core services.
Embed it better in our existing infrastructure. Take a more holistic perspective and try to incorporate the project more into existing procedures.
Do more data enhancement at time of conversion.
The clean-up of data sets would have benefitted from wider organizational support.
The check and correction of the catalog’s data need more staff dedicated in it.
The group of data on which the project would be carried out would be more consistent and we would try to give more visibility to the results.
Maybe we would try other reconciliation services, i.e., beyond OpenRefine.
I would set realistic expectations and make sure that people didn’t only focus on Google search results, but also understand how the data is used directly.
We are still figuring out the role of bibliographic linked data within our organization and its various processes. However, creating this service has been a major milestone and helped refocus the discussion by making it plain and clear what the possibilities (and limitations) are.

Respondents offered advice for others considering a linked data project. Some recurring themes included:

Take advantage of the many more exemplars of good practice and information related to linked data; read as widely as possible, including W3C recommendations, and consult with community experts.
Focus on what you want to achieve, develop internal and external use cases, match the use cases and user needs with what the data can provide, and be prepared to work with multiple data models. Define the scope and requirements before starting any work.
Develop a long-term plan for integration support and ongoing maintenance.
Consider joining up with others rather than doing everything yourself.
Integrate work on linked data projects into your daily workflows.
Never underestimate the amount of data cleanup that will be required. Think about acceptable levels of quality before starting.
Analyze your legacy data to determine what should be converted to linked data.
Avoid minting your own identifiers for everything, but instead use those already created by authoritative linked data sources whenever possible.
Use existing ontologies whenever possible rather than creating your own. Create your own only to fill gaps.
Select the most productive linked data sources by both the content and number of links to other linked data sources.
Make sure that linked data publishers are committed to ensuring their data is persistent.
Plan on how you will measure impact as early as you can; line up some early users who would benefit from the work. Expect that you will start to see benefits only after you have reached scale.
Be prepared to iterate based on feedback from your users.
Communicate, communicate, communicate! Publicize your results and how your target audiences will benefit.

Conclusion

The responses to the 2018 survey may be considered another partial snapshot of a linked data environment that is still evolving. This view is limited by which institutions responded to the survey and by who responded for them: even responses about linked data implementations described in earlier surveys may differ because of different individual perspectives. Responses from libraries (individual libraries, research libraries, national libraries, library networks) still predominate.

Even with an overlapping but different respondent pool, responses to the 2015 and 2018 surveys are similar in many respects. The chief motivations for publishing and consuming linked data, and the primary barriers encountered, are the same.

The growth of activity and usage among the more mature linked data services is encouraging. The emergence of service providers may lead to fewer individual institutions launching their own linked data projects. Among the 2018 survey respondents, 37% relied at least partially on a system vendor, corporation, or external consultants or developers to implement their linked data project or service. Several institutions were also clients of providers who responded to the survey. Commercial providers did not see themselves reflected in some of the survey questions, which are oriented toward cultural heritage institutions, especially libraries.

Forty-two percent of the “new” responses to the survey (those not described in previous surveys) came from outside the library domain. Besides the new category of service providers noted earlier, we see more linked data initiatives from research institutions and cultural heritage organizations, such as Australian National University, Carnegie Hall, Historic Environment Scotland, and the Rijksmuseum. Alas, Springer Nature remains the only publisher to respond to these surveys. Still, the wide range of ontologies and technologies reported suggests a growing diversity of linked data implementations.

A few noticeable differences from the 2015 survey responses:

Even fewer projects or services that only publish linked data, with a mirrored increase of those that both publish and consume linked data.
More staffing dedicated to linked data.
An increase in publishing linked data in schema.org and BIBFRAME, mirrored by a decrease in SKOS.
The rise of Wikidata as a linked data source. Noted the National Library of Finland, “Wikidata is becoming more and more significant for cultural heritage institutions, including our library.”

Because most of the linked data projects and services described are either not yet in production or were implemented within the last two years, these differences may reflect new trends rather than changes in existing implementations. Most linked data projects and services remain experimental or educational in nature. As the Oslo Public Library observes, “As far as I can see, Oslo public library is still the first and only library with its production catalogue and original cataloguing workflows done directly with linked data.”

Most implementers encourage more linked data initiatives. One advocate wrote in:

“This is the future of data for libraries and the longer we wait the further behind we’re going to fall.”

Appendix: Institutions Responding to International Linked Data Survey for Implementers

Table 10: Institutions responding to international linked data survey for implementers.

Responding Institution	Country	2018 Survey	2015 Survey	2014 Survey
Agence bibliographique de l’enseignement supérieur (ABES)	France		X
Agencia Española de Cooperación Internacional para el Desarrollo (AECID)	Spain	X	X
American Antiquarian Society	USA			X
American Numismatic Society	USA	X	X	X
Anythink Libraries	USA	X	X
Arapahoe Library District	USA		X
Archaeology Data Service (UK)	UK	X		X
Australian National University	Australia	X
Australian War Memorial	Australia			X*
Bavarian State Library	Germany	X
Biblioteca de Castilla y León	Spain	X*
Biblioteca de Galicia	Spain	X
Biblioteca della Camera dei deputati (Italy)	Italy		X	X
Biblioteca. Real Academia Nacional de Medicina	Spain		X
Biblioteca Valenciana Nicolau Primitiu	Spain	X	X
Biblioteca Virtual de Derecho Aragonés	Spain	X	X
Bibliothèque nationale de France	France	X	X
Bibliothèque nationale de Luxembourg	Luxembourg	X
BIBSYS NTNU (Norwegian University of Science and Technology)	Norway	X	X	X
Big Data Institute	Canada		X
British Library	UK	X	X	X
British Museum	UK	X*	X	X
Britain Memorial Library	UK			X*
Campus Condorcet	France			X*
Carleton College	USA		X	X
Carleton University	Canada	X*
Carnegie Hall	USA	X
Casalini Libri (SHARE-VDE group)	Italy	X
Charles University in Prague	Czech Republic	X		X
Chemical Heritage Foundation	USA		X
Colorado College	USA	X	X	X
Colorado State University	USA		X	X
Columbia University	USA		X	X*
Consejería de Educación, Cultura y Deportes Gobierno de Castilla-La Mancha, España	Spain		X
Consorci de Serveis Universitaris de Catalunya	Spain	X*	X
Coordinamento delle Biblioteche Speciali e Specialistiche di Torino (CoBIS)	Italy	X
Cornell University	USA	X	X	X
Credo Reference	USA	X*
Cultural Heritage Agency of The Netherlands	The Netherlands	X
Dartmouth College	USA		X
Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences	The Netherlands			X
Defense Language Institute Foreign Language Center (DLIFLC) Aiso Library	USA			X*
Digital Public Library of America	USA	X	X	X
Diputación de Málaga. Cultura y Deportes. Biblioteca Cánovas del Castillo (Biblioteca Virtual de la Provincia de Málaga)	Spain	X	X
Drexel University Libraries	USA		X
East China Normal University Library	China	X
Europeana Foundation	The Netherlands	X	X	X
Evansville Vanderburgh Public Library	USA		X
Frankfurt am Main University Library	Germany			X*
Free University of Amsterdam	The Netherlands			X*
Fundación Ignacio Larramendi (Spain)	Spain	X	X	X
George Mason University, Roy Rosenzweig Center for History and New Media	USA	X*
George Washington University	USA	X
German National Library (Deutsche Nationalbibliothek)	Germany	X	X
Goldsmiths’ College	UK			X
Haute école de gestion de Genève (SwissBib)	Switzerland		X
Historic Environment Scotland	UK	X
International Institute of Social History	The Netherlands	X*
J. Paul Getty Trust (Getty Research Institute)	USA	X	X
Johns Hopkins University	USA			X*
Koninklijke Bibliotheek	The Netherlands		X
Korea National Library of Medicine	South Korea			X*
Laurentian University	Canada		X
Library Link Network	USA	X
Library of Congress	USA	X	X	X
Lund University	Sweden		X*
Memorial University of Newfoundland	Canada	X
Ministry of Defense (Spain) (Ministerio de Defensa)	Spain	X	X
Minnesota Historical Society	USA		X	X
Missoula Public Library	USA	X		X
Mt. Lebanon Public Library	USA			X*
National Diet Library	Japan	X	X
National Library Board (NLB) of Singapore	Singapore			X
National Library of Finland	Finland	X
National Library of Malaysia	Malaysia		X
National Library of Medicine	USA	X	X	X
National Library of Portugal	Portugal	X	X
National Library of Scotland	UK	X
National Library of Spain (Biblioteca Nacional de España)	Spain	X	X
National Library of Sweden	Sweden		X
National Library of Wales	UK	X*	X	X*
National Széchényi Library	Hungary	X	X
New York Public Library	USA		X
New York University	USA	X	X
North Carolina State University Libraries	USA	X*	X	X
North Rhine-Westphalian Library Service Center (HBZ)	Germany	X	X
NTNU (Norwegian University of Science and Technology) University Library	Norway		X	X
OCLC	USA	X	X	X
Ohio State University	USA		X*
Oregon State University	USA			X*
Oslo Public Library	Norway	X	X	X
Pratt Institute	USA	X	X
Prueba	Spain	X
Public Record Office, Victoria	Australia			X
Queen’s University Library	Australia			X
RERO – Library Network of Western Switzerland	Switzerland	X*	X
Research Libraries UK	UK			X
Rhodes University	South Africa	X*
Rijksmuseum Amsterdam	The Netherlands	X
Royal Commission on the Ancient and Historical Monuments of Scotland	UK			X*
Seme4	UK	X
Singapore Integrated Library Automation Services (SILAS)	Singapore			X*
Smithsonian	USA	X	X	X
Spanish Office of Library Cooperation	Spain	X*
Springer	USA	X	X	X
Stanford University	USA	X	X	X
Stichting Bibliotheek.nl	The Netherlands			X
Swiss National Library	Switzerland			X*
The European Library	The Netherlands		X	X
Thematix	USA	X*
Tresoar (Leeuwarden – The Netherlands)	The Netherlands			X
Università degli Studi Roma TRE	Italy		X
University College Dublin	Ireland			X
University College London (UCL)	UK			X
University of Alberta Libraries	Canada	X	X	X
University of Applied Sciences St. Poelten	Austria	X*	X
University of Arkansas	USA			X*
University of Bergen Library	Norway	X*	X*	X
University of British Columbia	Canada			X
University of California-Irvine	USA		X	X
University of California-Los Angeles	USA	X
University of Chicago	USA	X
University of Colorado Boulder	USA	X*
University of Florida	USA		X*
University of Illinois at Urbana-Champaign	USA	X		X
University of Limerick	Ireland			X*
University of Liverpool	UK		X
University of Maryland	USA	X*
University of Nevada, Las Vegas	USA	X	X
University of North Texas	USA			X
University of Oklahoma Libraries	USA	X
University of Oxford	UK	X		X
University of Pennsylvania Libraries	USA	X	X	X
University of South Florida, St. Petersburg	USA	X
University of Tennessee, Knoxville	USA		X
University of Texas at Austin	USA		X	X
University of Washington	USA	X*
University of Wisconsin – Madison	USA	X
Villanova University	USA		X
Wellcome Library	UK		X
Western Michigan University	USA		X	X
Woods Hole Oceanic Institute (MBLWHOI)	USA	X*
Yale Center for British Art	USA			X
Zeitschriftendatenbank	Germany		X*

*Institutions reporting linked data projects but described none

References

[1] The OCLC Research Library Partnership allies like-minded research libraries in twelve countries, providing a venue to undertake cooperative actions that benefit scholars and researchers everywhere. See: https://www.oclc.org/research/partnership.html

[2] The results of the 2014 survey were posted on HangingTogether.org between 28 August 2014 and 8 September 2014:

Linked Data Survey results 1—Who’s doing it

Linked Data Survey results 2—Examples in production

Linked Data Survey results 3—Why and what institutions are consuming

Linked Data Survey results 4—Why and what institutions are publishing

Linked Data Survey results 5—Technical details

Linked Data Survey results 6—Advice from the implementers

[3] Smith-Yoshimura, Karen. 2016. “Analysis of International Linked Data Survey for Implementers.” D-Lib Magazine, 22(7/8), 141–167. http://doi.org/10.1045/july2016-smith-yoshimura.

[4] Spreadsheets with the complete responses to the 2014, 2015 and 2018 International Linked Data Survey for Implementers (without the contact information which we promised we’d keep confidential) are publicly available at: http://www.oclc.org/content/dam/research/activities/linkeddata/oclc-research-linked-data-implementers-survey-2014.xlsx.

[5] BIBFRAME is the Bibliographic Framework Initiative launched by the Library of Congress to provide a foundation for the future of bibliographic description in the broader networked world. For details, see: https://www.loc.gov/bibframe/

[6] The W3C provides an overview of licensing for linked open data at https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataLicensing. For Open Data Commons (ODC) licenses, see https://opendatacommons.org/licenses/; for Creative Commons (CC) licenses, see https://creativecommons.org/licenses/.

About the Author

Karen Smith-Yoshimura is a Senior Program Officer working with research institutions affiliated with the trans-national OCLC Research Library Partnership. She focuses on issues related to metadata needed to describe and provide access to the multilingual resources managed by libraries, archives, museums, and other cultural heritage organizations.
