Issue 42, 2018-11-08

Analysis of 2018 International Linked Data Survey for Implementers

OCLC Research conducted an International Linked Data Survey for Implementers in 2014 and 2015. Curious about what might have changed since the last survey, and eager to learn about new projects or services that format metadata as linked data or make subsequent uses of it, OCLC Research repeated the survey between 17 April and 25 May 2018.

A total of 143 institutions in 23 countries responded to one or more of the surveys. This analysis covers the 104 linked data projects or services described by the 81 institutions that responded to the 2018 survey (those that publish linked data, consume linked data, or both). This article provides an overview of the linked data projects or services institutions have implemented or are implementing; what data they publish and consume; the reasons given for implementing linked data and the barriers encountered; and some advice given by respondents to those considering implementing a linked data project or service. Differences with previous survey responses are noted, but as the majority of linked data projects and services described are either not yet in production or implemented within the last two years, these differences may reflect new trends rather than changes in implementations.

By Karen Smith-Yoshimura

Introduction

The impetus for an “International Linked Data Survey for Implementers” was a set of discussions with OCLC Research Library Partnership (RLP)[1] metadata managers who were aware of some linked data projects or services but felt there must be more “out there” that they should know about. The survey instrument was designed in consultation with OCLC colleagues and a few RLP institutions and beta tested by several linked data implementers. We conducted the initial survey in July – August 2014, distributing the link to the survey on multiple listservs and on Twitter. The survey targeted those who had already implemented a linked data project or service, or were in the process of doing so. Questions were asked both about publishing linked data and consuming linked data. The results were published in a series of posts on the OCLC Research blog, HangingTogether.org.[2]

While the initial survey results received a generally appreciative response from readers, some noted and regretted the absence of several prominent linked data efforts in Europe. To address these gaps, we repeated the survey in June-July 2015. The results were published in the July/August 2016 issue of D-Lib Magazine.[3]

Curious about what might have changed since the last survey, and eager to learn about new projects or services that format metadata as linked data or make subsequent uses of it, OCLC Research repeated the survey between 17 April and 25 May 2018. More institutions responded than in the previous survey (81 institutions, compared to 71 institutions in 2015) but described slightly fewer linked data projects or services (104, compared to 112 in 2015).

Spreadsheets[4] containing all responses to the 2014, 2015, and 2018 surveys include links to the linked data projects or services in production.

Overview

A total of 143 institutions in 23 countries responded to one or more of the surveys (see the list appended at the end of this article). Of the 81 institutions responding to the 2018 survey, 46 (57%) had responded to one or both of the previous surveys. Even those who had responded to a previous survey did not always describe the same linked data projects or services. Of the 104 linked data projects or services described, only 42 had been described previously. Even when the same project or service was described, the respondent sometimes differed from the one who responded previously. Some respondents did not answer every question, so the totals for each question may vary.

Half of the respondents to the 2018 survey that have implemented or are implementing a linked data project or service reported that they plan to implement another linked data project within the next two years, a decrease from the 60% who reported such plans in 2015. Two of the 2015 respondents who had reported such plans had indeed implemented another project by the time they repeated the survey in 2018.

Institutions in the United States accounted for 42% of the respondents to the 2018 survey (34 institutions), followed by Spain (12), the United Kingdom (8), and The Netherlands (4). We received three responses each from Canada, Germany, and Norway, two responses from Italy, and one response each from Australia, Austria, China, the Czech Republic, Finland, France, Hungary, Japan, Luxembourg, Portugal, South Africa, and Switzerland.

We categorized the responding institutions by type. Research libraries represented 28% of the 2018 respondents (23), followed by 13 national libraries (16%), 11 research institutions (14%), eight library networks and eight government institutions (10% each), six service providers (7%), five public libraries (6%), four museums (5%), and two “others” (a concert venue and a publisher). This breakdown generally follows the types of institutions that responded to the previous survey, with one exception: for the first time, we received responses from service providers, which provide linked data services for their customers. Figure 1 compares the number of responding institutions to the three surveys by type of institution.

Figure 1: Types of institutions responding to the surveys conducted in 2014, 2015, and 2018.

Of the 104 linked data projects or services described at various levels of detail by the 2018 survey respondents, three-fourths are in production, of which 23% (18) have been in production for more than two years but less than four, and 40% (31) have been in production for four or more years. Three of them are not yet accessible, and three are “private,” for that institution’s use only.

Most of the linked data projects or services are done entirely in-house (61 or 59% of the implementations), with 23 (22%) part of a multi-institutional implementation, and 20 (19%) provided by an external vendor or service provider. But even those who responded that the work to deliver linked data functionality was all done in-house reported collaboration with external groups or organizations. Only 25 (24%) of the implementations involved “only my institution.” In order of frequency named, the external collaborators were:

  • other libraries or archives
  • other universities or research institutions
  • external consultants or developers
  • a systems vendor
  • part of an international collaboration and other consortium members
  • part of a national collaboration
  • a corporation or company
  • part of a discipline-specific collaboration
  • a foundation
  • a scholarly society
  • part of a state- or province-wide initiative.

The rankings resemble those in the 2015 survey, with slightly higher rankings for a systems vendor, international collaboration, other consortium members, discipline-specific collaboration, and foundation.

Staffing: Most of the institutions that have implemented or are implementing linked data projects/services have added linked data to the responsibilities of current staff (86); only 15 have not. The most significant change from the 2015 survey is that the number of those who have staff dedicated to linked data increased by 50%. In Table 1, the first column shows the number of responses and the second column the percentage of all those who responded to the question.

Table 1: Comparison of staffing in 2018 and 2015 surveys.

| Staffing | 2018 (no.) | 2018 (%) | 2015 (no.) | 2015 (%) |
|---|---|---|---|---|
| Added to the responsibilities of current staff. | 86 | 85% | 98 | 92% |
| We have staff dedicated to linked data project(s). | 30 | 30% | 20 | 19% |
| Adding/have added new staff with linked data expertise. | 10 | 10% | 4 | 4% |
| Adding/have added temporary staff with linked data expertise. | 13 | 13% | 13 | 12% |
| Hiring/have hired external consultants with linked data expertise. | 12 | 12% | 17 | 16% |
| Respondents to question | 101 | | 107 | |
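The staffing figures lend themselves to a short worked calculation. The sketch below is only an illustration (the helper names `share` and `relative_change` are ours, not part of the survey): it derives the Table 1 percentages from the counts and shows that the "increased by 50%" statement refers to the count of responses (20 to 30), whereas the share of respondents rose from 19% to 30%.

```python
# Counts are taken from Table 1; the helper names are illustrative only.
def share(count, respondents):
    """Percentage of respondents to the question, rounded to a whole number."""
    return round(100 * count / respondents)


def relative_change(new, old):
    """Relative change from old to new, expressed as a percentage."""
    return round(100 * (new - old) / old)


RESPONDENTS = {2018: 101, 2015: 107}
DEDICATED_STAFF = {2018: 30, 2015: 20}

for year in (2018, 2015):
    pct = share(DEDICATED_STAFF[year], RESPONDENTS[year])
    print(f"{year}: {DEDICATED_STAFF[year]} responses = {pct}% of respondents")

change = relative_change(DEDICATED_STAFF[2018], DEDICATED_STAFF[2015])
print(f"Change in number of responses, 2015 to 2018: {change}%")
```

Because the number of respondents to the question differs between the two surveys (101 and 107), comparing shares and comparing raw counts can give different impressions of the same change.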

Funding: Twenty-two of the linked data projects or services received grant funding; 73 (70%) are funded by the library/archive and/or the parent institution. Six linked data projects received funding support from partner institutions, five were privately funded, and one received corporate funding. Eight have never applied for a grant, but plan to.

Success assessment: More respondents reported that their linked data project or service was successful or “mostly” successful in 2018 than in 2015: 58 compared to 46. Fewer didn’t know yet as their projects were still at an early stage (pre-implementation or early implementation): 31 in 2018 compared to 52 in 2015. Comments from respondents whose linked data project or service has been in production for at least four years reveal the following indicators of success.

  • Usage: Most respondents noted substantial increases in usage over the years, and more contributors. One noted the high search ranking for rare content. Others noted the increase in requests for digital services support discovered through their linked data offerings. Conference presentations and articles by others referring to their own work were cited as evidence of success. Expanded exposure on the Internet is “dragging people into library services.”
  • Data re-use: Several noted that other applications making use of their linked data implementation is a metric of success. Another metric is the number of bulk downloads.
  • Interoperability: Several noted their linked data service provides access to their other resources. One noted the value of aggregating data from resources around the world to their users.
  • User satisfaction: Linked data offers users a richer experience that is much more contextualized and inter-related. One pointed to better support of multilingualism by fetching multilingual labels from linked data vocabularies. One noted that their “happy users” are “probably unaware that the service is based on linked data.”
  • Influence: The success of a project gains attention and illustrates what’s possible. Several noted that their services are well-known in the community and developers in the cultural sector are increasingly aware of their value. Their data models have influenced other initiatives and moved the discussion on linked data in the library community.
  • Professional development: Even absent metrics demonstrating linked data’s value to others, linked data projects still provide professional development for staff. “We learned a great deal that we can build on.”

The reasons given by implementers for assessing their project or service as partially or mostly successful centered around:

  • the need to upgrade and expand the service but lacking funding for it
  • lack of tools to measure satisfaction of the users with the linked data offering
  • difficulty in assessing success of data dissemination and in determining how the data is re-used
  • impression that interest isn’t high within the library community
  • impact seems marginal
  • did not result in similar projects, and the project did not grow in size and scale

In both the 2018 and 2015 surveys, most projects/services both consume and publish linked data. Even fewer of the described projects only publish linked data in the 2018 survey.

Table 2: Survey responses on how linked data is used.

| How linked data is used | 2018 survey | 2015 survey |
|---|---|---|
| Consume linked data | 34 | 38 |
| Publish linked data | 5 | 10 |
| Both consume & publish | 65 | 64 |

What and Why Linked Data Is Published

Given the relatively large representation of libraries among respondents, it is no surprise that descriptive metadata and bibliographic data are the most common types of data published (51 and 47 responses respectively), with authority data a close third (45). Other types of data reported as published as linked data: data about people (33), ontologies/vocabularies (33), digital collections (27), geographic data (23), datasets (19), data about museum objects (12), organizational data (12), encoded archival descriptions (3), and statistical data (3). Other responses included holdings and availability information, data about performance work, data about monuments and archeology, provenance, and time periods.

In the three years since the 2015 survey, the number of linked data datasets with over 1 billion triples increased from three to 11, and three of them are over 5 billion triples: Biblioteca de Galicia’s digital library (6.3 billion triples), Europeana (a little over 5 billion), and OCLC’s WorldCat linked data (over 10 billion triples). But most of the linked data datasets are small. Of the 63 responses reporting their datasets’ size, 33 were less than 10 million triples, nine were between 10 and 100 million triples, and 10 were between 100 million and 1 billion triples.

Comparing the 2018 and 2015 survey results, the key motivations to publish linked data appear unchanged. Because some 2015 survey respondents had noted the need to publish linked data in order to consume it as an “other” motivation, we offered it as an option in the 2018 survey; it became the fourth most commonly cited reason.

Table 3: Chief motivations for publishing linked data.

| Chief Motivations for Publishing Linked Data (n=92 in both surveys) | 2018 | 2015 |
|---|---|---|
| Expose data to a larger audience on the web. | 74% | 73% |
| Demonstrate what could be done with datasets as linked data. | 65% | 64% |
| Heard about linked data and wanted to try it out by exposing some local data as linked data. | 45% | 47% |
| Needed to publish linked data in order to consume it. | 25% | |
| Explore whether publishing data as linked data will improve Search Engine Optimization (SEO) for local resources. | 24% | 30% |
| Administration requested that we expose our data as linked data. | 11% | 5% |

The British Library noted that its linked data implementations were part of the UK Government Public Sector Initiative. Other reasons written in included:

  • experiment with linked data outside the library catalog
  • increase interoperability
  • required for a grant-funded project
  • to link together information across different institutions
  • to provide an ontology extension to BIBFRAME[5]
  • develop supporting tools

More than half of the responses either did not know the average number of requests the linked data project or service received daily over the previous six months, did not keep or have access to usage statistics, or had no usage yet (61 of 103 responses). The eight most heavily used linked data datasets as measured by the average number of requests a day, with over 100,000 requests per day, are:

  • American Numismatic Society’s nomisma, a thesaurus of numismatic concepts. In the 2015 survey, it had reported daily usage as between 10,000 and 50,000 requests a day, so its usage has more than doubled over the last three years.
  • Bibliothèque nationale de France’s data.bnf.fr, providing access to the BnF’s collections and providing a hub among different resources. In the 2015 survey, it also had reported daily usage as between 10,000 and 50,000 requests a day.
  • Europeana, which aggregates metadata for digital objects from museums, archives, and audiovisual archives across Europe. It had reported the same daily usage in 2015.
  • Library of Congress’ Linked Data Service with over 50 vocabularies. Although usage fluctuates, it receives 500,000 to a million requests a day.
  • National Diet Library’s NDL Search, providing access to bibliographic data from Japanese libraries, archives, museums and academic research institutions. It had reported the same daily usage in 2015.
  • North Rhine-Westphalian Library Service Center’s Linked Open Data service, providing access to bibliographic resources, libraries and related organizations, and authority data. It had reported the same daily usage in 2015.
  • OCLC’s Virtual International Authority File (VIAF), an aggregation of over 40 authority files from different countries and regions. It had reported the same daily usage in 2015.
  • OCLC’s WorldCat Linked Data, a catalog of over 400 million bibliographic records made experimentally available in linked data form. It had reported the same daily usage in 2015.

Another three linked data datasets receive between 50,000 and 100,000 requests a day:

  • British Library’s British National Bibliography. In the 2015 survey, it had reported daily usage as between 10,000 and 50,000 requests a day, so its usage has doubled over the last three years.
  • National Library of Finland’s Finnish Thesaurus and Ontology Service. (Although this service was launched in 2014, it did not respond to the 2015 survey.)
  • OCLC’s FAST (Faceted Application of Subject Terminology), a faceted subject heading schema derived from Library of Congress’ subject headings. In the 2015 survey, it had reported daily usage as between 10,000 and 50,000 requests a day, so its usage has doubled over the last three years.

Linked data datasets use a variety of RDF vocabularies and ontologies, and most use multiple ones. The percentage of those using Simple Knowledge Organization System (SKOS) decreased from 60% of respondents in 2015 to 44% in 2018, which was mirrored by an increase in those using Schema.org (30% in 2015 vs. 46% in 2018). Similar decreases showed up in those using Dublin Core Metadata Element Set, DCMI Metadata Terms, and Friend of a Friend (foaf), as well as a smaller decrease in those using RDF Schema. BIBFRAME vocabulary usage increased, from 15% in 2015 to 27% in 2018.

The table below shows the top eight in 2018 (used by at least 20% of respondents) compared to the 2015 responses. Ninety-five responded to the question in 2018 vs. 99 in 2015. The first column shows the number of responses and the second column the percentage of all those who responded to the question.

Table 4: Top 8 RDF vocabularies/ontologies used in 2018 compared to 2015.

| RDF Vocabularies/Ontologies Used | 2018 (no.) | 2018 (%) | 2015 (no.) | 2015 (%) |
|---|---|---|---|---|
| Schema.org | 44 | 46% | 30 | 30% |
| SKOS | 42 | 44% | 59 | 60% |
| Dublin Core Terms | 39 | 41% | 51 | 52% |
| FOAF | 36 | 38% | 55 | 56% |
| DCMI Metadata Terms | 35 | 37% | 49 | 49% |
| RDF Schema | 35 | 37% | 45 | 45% |
| BIBFRAME | 26 | 27% | 15 | 15% |
| Local vocabulary | 21 | 22% | 19 | 19% |

Other RDF vocabularies and ontologies named by the 2018 survey respondents in order of the frequency cited are:

  • Resource Description and Access (rda)
  • Europeana Data Model vocabulary (edm)
  • The Bibliographic Ontology (bibo)
  • CIDOC Conceptual Reference Model (crm)
  • Expression of Core FRBR Concepts in RDF (frbr)
  • Metadata Authority Description Schema (mads)
  • OWL 2 Web Ontology Language (owl 2)
  • ISBD elements (isbd)
  • WGS84 Geo Positioning (geo)
  • The Event Ontology (event)
  • Music Ontology (mo)
  • The OAI ORE terms vocabulary (ore)
  • International Standard Name Identifier (isni)
  • Metadata Object Description Schema (mods)
  • VIVO Core Ontology (vivo)
  • DPLA Metadata Application Profile (MAP)
  • Biographical Ontology (bio)
  • FRBR-aligned Bibliographic Ontology (fabio)
  • Organization Ontology (org)
  • Data Catalog Vocabulary (dcat)
  • EAC-CPF Descriptions Ontology for Linked Archival Data (eac-cpf)
  • RDA Group 2 Elements (rdag2)
  • British Library Terms RDF schema (blt)
  • Library extension of schema.org [aka Purl.org/library] (lib)
  • Archival collections ontology (arch)
  • Nomisma Ontology

Thirty-seven respondents cited RDF vocabularies/ontologies not listed above. In alphabetical order they are: Activity Streams, Description of a Project (DOAP), Exif Data Description Vocabulary, Funding, Research Administration and Projects Ontology (FRAPO), GND Ontology, International Image Interoperability Framework (IIIF), Muninn Project ontologies, National Diet Library Dublin Core Metadata Description (DC-NDL), Ontology for Media Resources, Product Ontology, SKOS eXtension for Labels (SKOS-XL), Upper Mapping and Binding Exchange Layer (UMBEL), VRA Core, and Web Annotation Data Model.

Licenses[6]: 32 projects/services do not announce any explicit license; 19 apply CC0 1.0 Universal, the most common license used by the 2018 survey respondents. Other licenses that respondents use, in order of frequency, are:

  • Open Data Commons Attribution (ODC-BY)
  • Public Domain Dedication and License (PDDL)
  • Open Data Commons Open Database License (ODC-ODbL)
  • Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)

Other licenses cited by 29 respondents not listed above:

Accessibility: Of the 70 projects or services that publish linked data, 19 do not currently make their data accessible outside their institution. Those that do offer multiple access formats. Web pages are the most common, followed by file dumps, content negotiation, SPARQL endpoint, SPARQL editor, embedded markup, and applications. The most common serialization of linked data used is RDF/XML. Other serializations that are less often used by order of frequency cited: Turtle, JSON-LD, N-Triples, RDF/JSON, RDFa, N3 RDF triplets, and N-Quads.
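To illustrate how content negotiation relates to the serializations listed above, here is a minimal sketch using only the Python standard library. It builds, but does not send, an HTTP request whose Accept header asks for one serialization. The resource URI is hypothetical, and whether a given publisher honours a particular media type is an assumption that has to be checked against that publisher's documentation.

```python
from urllib.request import Request

# Registered media types for some of the serializations named above.
MEDIA_TYPES = {
    "RDF/XML": "application/rdf+xml",
    "Turtle": "text/turtle",
    "JSON-LD": "application/ld+json",
    "N-Triples": "application/n-triples",
    "N-Quads": "application/n-quads",
}


def build_request(uri, serialization):
    """Build (but do not send) a GET request asking for one serialization."""
    return Request(uri, headers={"Accept": MEDIA_TYPES[serialization]})


# Hypothetical resource URI, for illustration only.
req = build_request("https://example.org/resource/123", "Turtle")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual retrieval; a service that supports content negotiation would then be expected to return the same resource description in the requested format.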

Technologies: The technologies used by respondents to publish linked data are diverse, and most used multiple technologies. Table 5 lists the technologies used in order of frequency.

Table 5: Technologies used for publishing linked data.

| No. of projects | Technology (order of frequency) |
|---|---|
| More than 20 | SPARQL, Java |
| 10 – 20 | Python, XSLT, RDF Store, Solr, Jena Applications, Virtuoso Universal Server (provides SPARQL endpoint) |
| 2 – 9 | Google Refine, Apache Fuseki, Blazegraph, GraphDB (formerly OWLIM by Ontotext Software), DIGIBIB for Libraries, 4store, Fedora Commons, Map/Reduce, Metafacture, Django, Elasticsearch, AllegroGraph, Drupal7, OpenRDF, Pubby |
| 1 | 4store Seme4 platform, Amazon Neptune, Apache Spark, ARC2 on PHP, Arches, bib-lod-ui (Web app for publishing bibliographic Linked Open Data), bib-rdf-pipeline (converting MARC into RDF), Blacklight, Catmandu, Cliopatria, Cubicweb, D3 libraries, FAST converter, FreeMarker templates, Government Site Builder [Germany], HBase/Hadoop, JAX-RS, MARC Report and MARC Global (from The MARC of Quality), MongoDB, Mapping Memory Mapper (3M), MarkLogic Semantics, Orbeon XForms, Permanent Identifiers for the Web, RDFLib for Python, ResearchSpace, Ruby on Rails, Skosmos Skosify EasyRdf library for PHP, SPARQL result visualizer, Squebi SPARQL editor, Stardog |

Barriers: The rankings of barriers or challenges in publishing linked data are mostly the same in both the 2018 and 2015 surveys. The top barrier in both surveys was the steep learning curve for staff. So many wrote in “lack of resources” as a response to the 2015 survey that it was added as another choice in the 2018 survey, becoming the fourth-most cited barrier, tied with little documentation.

Table 6: Barriers encountered in publishing linked data.

| Barriers/Challenges | 2018 (no.) | 2018 (%) | 2015 (no.) | 2015 (%) |
|---|---|---|---|---|
| Steep learning curve for staff | 41 | 51% | 40 | 51% |
| Inconsistency in legacy data | 38 | 48% | 33 | 42% |
| Selecting appropriate ontologies to represent our data | 26 | 33% | 31 | 39% |
| Lack of resources | 23 | 29% | | |
| Little documentation or advice on how to build the systems | 23 | 29% | 21 | 27% |
| Establishing the links | 22 | 28% | 27 | 34% |
| Lack of tools | 18 | 23% | 15 | 19% |
| Immature software | 17 | 21% | 11 | 14% |
| Ascertaining who owns the data | 4 | 5% | 10 | 13% |
| Other | 19 | 24% | 21 | 27% |
| Respondents to question | 80 | | 79 | |

Several respondents also noted as additional barriers the complexity of data transformations, lack of best practices, lack of tool integration, scaling triplestores, security and privacy issues, data sets too large to publish as a whole (and difficult for others to consume), and insufficient institutional support.

What and Why Linked Data Is Consumed

In the 2018 survey, a total of 69 of the projects described consumed linked data (compared to 68 projects in 2015). Table 7 shows the top 10 linked data sources consumed by the 2018 survey respondents (each used by more than 12 projects or services) compared to their usage in the 2015 survey. The first columns under 2018 and 2015 give the number of projects reporting that they consumed the resource, and the second columns give that number as a percentage of all projects described in that year that consumed linked data. Asterisks denote resources from institutions that responded to the 2018 survey. Percentage differences greater than 10% are denoted with a ^.

The biggest change is the surge in usage of Wikidata, which ranks #5 among all the linked data sources consumed by the 2018 survey respondents (compared to #15 in the 2015 survey) and now ties WorldCat.org in usage. ISNI also rose into the “Top 10,” while “Resources we convert to linked data ourselves” fell from #5 in 2015 to #10 in 2018.

Table 7: Top 10 linked data sources consumed.

| Top 10 Linked Data Sources Consumed | 2018 (no.) | 2018 (%) | 2015 (no.) | 2015 (%) |
|---|---|---|---|---|
| id.loc.gov* | 39 | 57% | 35 | 51% |
| VIAF (Virtual International Authority File)* | 36 | 51% | 41 | 60% |
| DBpedia | 30 | 43% | 36 | 53% |
| GeoNames | 29 | 42% | 35 | 51% |
| Wikidata | 28 | 41%^ | 6 | 9% |
| WorldCat.org* | 28 | 41%^ | 15 | 22% |
| Getty Vocabularies | 23 | 33% | 16 | 24% |
| FAST (Faceted Application of Subject Terminology)* | 17 | 25% | 15 | 22% |
| ISNI (International Standard Name Identifier) | 17 | 25%^ | 8 | 12% |
| Resources we convert to linked data ourselves | 13 | 19% | 17 | 25% |

These could be considered successful publishers of linked data by the degree to which others consume the data provided. Although library-related linked data projects have tended to consume linked data from sources in the library domain, the emergence of Wikidata as a top resource, along with DBpedia and GeoNames, indicates more experimentation with expanding implementations’ scope to non-library sources. Just under half of the linked data projects and services that now consume Wikidata as a linked data source (13) have been in production for over four years.
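The ^ markers in Table 7 can be reproduced from the published percentages. The sketch below assumes that "percentage differences greater than 10%" means an absolute difference of more than 10 percentage points between the 2018 and 2015 figures; that reading is ours, and the function name `flagged` is illustrative.

```python
# (2018 %, 2015 %) as published in Table 7.
TABLE_7 = {
    "id.loc.gov": (57, 51),
    "VIAF": (51, 60),
    "DBpedia": (43, 53),
    "GeoNames": (42, 51),
    "Wikidata": (41, 9),
    "WorldCat.org": (41, 22),
    "Getty Vocabularies": (33, 24),
    "FAST": (25, 22),
    "ISNI": (25, 12),
    "Resources we convert to linked data ourselves": (19, 25),
}


def flagged(table, threshold=10):
    """Sources whose share moved by more than `threshold` percentage points."""
    return [
        name
        for name, (pct_2018, pct_2015) in table.items()
        if abs(pct_2018 - pct_2015) > threshold
    ]


for name in flagged(TABLE_7):
    pct_2018, pct_2015 = TABLE_7[name]
    print(f"{name}: {pct_2015}% in 2015, {pct_2018}% in 2018")
```

Under this reading, DBpedia's drop from 53% to 43% is exactly 10 points and so is not flagged, which is consistent with the table.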

Other linked data sources consumed by at least four projects or services, in order of frequency cited: Europeana*, Deutsche Nationalbibliothek Linked Data Services*, Lexvo, WorldCat.org Works, data.bnf.fr*, ORCID (Open Researcher and Contributor ID), DPLA* (Digital Public Library of America), and Hispana.

The primary reasons for institutions to consume linked data had the same top rankings in both the 2018 and 2015 surveys, as shown in Table 8. Decreases of 10 percentage points or more appear in aspirations to improve SEO for local resources (from 28% in 2015 to 10% in 2018), to achieve more effective internal metadata management (from 47% in 2015 to 30% in 2018), and to provide greater accuracy and scope in local search results (from 40% in 2015 to 28% in 2018).

Table 8: Chief motivations for consuming linked data.

| Chief Motivations for Consuming Linked Data | 2018 | 2015 |
|---|---|---|
| Provide local users with a richer experience. | 78% | 75% |
| Enhance local data by consuming linked data from other sources. | 71% | 74% |
| Heard about linked data and wanted to try it out by using linked data resources. | 33% | 25% |
| More effective internal metadata management. | 30% | 47% |
| Experiment with combining different types of data into a single triple store. | 29% | 25% |
| Greater accuracy and scope in local search results. | 28% | 40% |
| Explore whether consuming linked data from external sources will improve Search Engine Optimization (SEO) for local resources. | 10% | 28% |
| Respondents to question | 69 | 68 |

Barriers: The top barrier in both the 2018 and 2015 surveys was matching, disambiguating, and aligning source data and linked data sources. The biggest difference was an uptick in the number of responses pointing to unstable endpoints and service reliability, as shown in Table 9.

Table 9: Barriers encountered in consuming linked data.

| Barriers/Challenges | 2018 (no.) | 2018 (%) | 2015 (no.) | 2015 (%) |
|---|---|---|---|---|
| Matching, disambiguating and aligning source data and the linked data resources | 28 | 48% | 23 | 39% |
| What is published to the Internet as linked data is not always reusable or lacks URIs | 18 | 31% | 16 | 27% |
| Size of RDF dumps | 16 | 28% | 12 | 20% |
| Unstable endpoints | 16 | 28% | 10 | 17% |
| Service reliability | 15 | 26% | 9 | 15% |
| Mapping of vocabulary | 15 | 26% | 17 | 29% |
| Understanding how the data is structured before using it. | 14 | 24% | 12 | 20% |
| Lack of needed off-the-shelf tools | 14 | 24% | 10 | 17% |
| Datasets not being updated | 13 | 22% | 14 | 24% |
| Lack of authority control | 11 | 19% | 15 | 25% |
| Volatility of data formats of dumps | 10 | 17% | 11 | 19% |
| Disambiguation of terms across different languages is difficult | 8 | 14% | 6 | 10% |
| It’s difficult to get other institutions to do their own harmonization between objects and concepts. | 7 | 12% | 9 | 15% |
| Other | 13 | 22% | 15 | 25% |
| Respondents to question | 58 | | 59 | |

Other barriers and challenges written in included: ill-formed RDF serializations; automation processes to link the data are undeveloped; triplestores are not ideal for interactive workflows; complexity of current modeling; and the amount of data cleanup that is required.

Advice

Asked what they would do differently if they started their linked data project or service again, respondents wrote about integrating linked data into existing services, remediating legacy metadata, and adjusting expectations, such as:

  • Linked data is now more mature; we would therefore have a wider frame of reference. We would seek to develop a system more integrated with our core services.
  • Embed it better in our existing infrastructure. Take a more holistic perspective and try to incorporate the project more into existing procedures.
  • Do more data enhancement at time of conversion.
  • The clean-up of data sets would have benefitted from wider organizational support.
  • The check and correction of the catalog’s data need more staff dedicated in it.
  • The group of data on which the project would be carried out would be more consistent and we would try to give more visibility to the results.
  • Maybe we would try other reconciliation services, i.e., beyond OpenRefine.
  • I would set realistic expectations and make sure that people didn’t only focus on Google search results, but also understand how the data is used directly.
  • We are still figuring out the role of bibliographic linked data within our organization and its various processes. However, creating this service has been a major milestone and helped refocus the discussion by making it plain and clear what the possibilities (and limitations) are.

Respondents offered advice for others considering a linked data project. Some recurring themes included:

  • Take advantage of the many more exemplars of good practice and information related to linked data; read as widely as possible, including W3C recommendations, and consult with community experts.
  • Focus on what you want to achieve, develop internal and external use cases, match the use cases and user needs with what the data can provide, and be prepared to work with multiple data models. Define the scope and requirements before starting any work.
  • Develop a long-term plan for integration support and ongoing maintenance.
  • Consider joining up with others rather than doing everything yourself.
  • Integrate work on linked data projects into your daily workflows.
  • Never underestimate the amount of data cleanup that will be required. Think about acceptable levels of quality before starting.
  • Analyze your legacy data to determine what should be converted to linked data.
  • Avoid minting your own identifiers for everything, but instead use those already created by authoritative linked data sources whenever possible.
  • Use existing ontologies whenever possible rather than creating your own. Create your own only to fill gaps.
  • Select the most productive linked data sources by both the content and number of links to other linked data sources.
  • Make sure that linked data publishers are committed to ensuring their data is persistent.
  • Plan on how you will measure impact as early as you can; line up some early users who would benefit from the work. Expect that you will start to see benefits only after you have reached scale.
  • Be prepared to iterate based on feedback from your users.
  • Communicate, communicate, communicate! Publicize your results and how your target audiences will benefit.
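
Two of the recommendations above, reusing identifiers from authoritative sources and relying on existing ontologies, can be illustrated with a minimal Python sketch. It is not drawn from any respondent's implementation. The lookup table, the `authority.example.org` and `library.example.org` URIs, and the function names are all hypothetical placeholders; only the schema.org terms (`Book`, `Person`, `name`, `author`) are real vocabulary.

```python
import json

# Hypothetical stand-in for a reconciliation step against an authoritative
# linked data source. The URI is a placeholder, not a real entity.
AUTHORITY_URIS = {
    "Example Author": "https://authority.example.org/entity/123",
}

# Hypothetical local namespace, used only when no authority URI is known.
LOCAL_BASE = "https://library.example.org/id/"


def identifier_for(name, minted):
    """Prefer an existing authority URI; mint a local one only to fill a gap."""
    uri = AUTHORITY_URIS.get(name)
    if uri is not None:
        return uri
    local = LOCAL_BASE + "-".join(name.lower().split())
    minted.append(local)  # keep track of what had to be minted locally
    return local


def describe_book(title, author_name, minted):
    """Describe a book as JSON-LD with schema.org terms, not a custom ontology."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {
            "@type": "Person",
            "@id": identifier_for(author_name, minted),
            "name": author_name,
        },
    }


if __name__ == "__main__":
    minted = []
    record = describe_book("An Example Title", "Example Author", minted)
    print(json.dumps(record, indent=2))
    print("Locally minted identifiers:", minted)
```

Tracking the locally minted identifiers separately makes it easier to see how much of the data still lacks a link to an external source, which is one possible input to the data cleanup and quality decisions the respondents mention.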

Conclusion

The responses to the 2018 survey may be considered another partial snapshot of a linked data environment that is still evolving. This view is limited by which institutions responded to the survey and by who responded on their behalf; even responses about linked data implementations described in earlier surveys may differ because of different individual perspectives. Responses from libraries (individual libraries, research libraries, national libraries, library networks) still predominate.

Even with an overlapping but different respondent pool, responses to both the 2015 and 2018 surveys are similar in many respects. The chief motivations for publishing and consuming linked data, and the primary barriers encountered, are the same.

Growth of activity and usage among the more mature linked data services is encouraging. The emergence of service providers may lead to fewer individual institutions launching their own linked data projects. Among the 2018 survey respondents, 37% relied at least partially on a system vendor, corporation, or external consultants or developers to implement their linked data project or service, and several institutions were clients of providers who also responded to the survey. Commercial providers did not see themselves in some of the survey questions, which are oriented toward cultural heritage institutions, and especially libraries.

Forty-two percent of the “new” responses to the survey (not described in previous surveys) were outside the library domain. Besides the new category of service providers noted earlier, we see more linked data initiatives from research institutions and cultural heritage organizations, such as Australian National University, Carnegie Hall, Historic Environment Scotland, and the Rijksmuseum. Alas, Springer Nature remains the only publisher to respond to these surveys. But the growing diversity of linked data implementations reflected in the survey responses is suggested by the wide range of ontologies and technologies used.

A few noticeable differences from the 2015 survey responses:

  • Even fewer projects or services that only publish linked data, with a mirrored increase of those that both publish and consume linked data.
  • More staffing dedicated to linked data.
  • An increase in publishing linked data in schema.org and BIBFRAME, mirrored by a decrease in SKOS.
  • The rise of Wikidata as a linked data source. Noted the National Library of Finland, “Wikidata is becoming more and more significant for cultural heritage institutions, including our library.”

As the majority of linked data projects and services described are either not yet in production or implemented within the last two years, these differences may reflect new trends rather than changes in implementations. Most linked data projects and services remain experimental or educational in nature. Observes the Oslo Public Library, “As far as I can see, Oslo public library is still the first and only library with its production catalogue and original cataloguing workflows done directly with linked data.”

Most implementers encourage more linked data initiatives. One advocate wrote in:

“This is the future of data for libraries and the longer we wait the further behind we’re going to fall.”

Appendix: Institutions Responding to International Linked Data Survey for Implementers

Table 10: Institutions responding to international linked data survey for implementers.

Responding Institution

Country

2018 Survey

2015 Survey

2014 Survey

Agence bibliographique de l’enseignement supérieur (ABES)

France

X

Agencia Española de Cooperación Internacional para el Desarrollo (AECID)

Spain

X

X

American Antiquarian Society

USA

X

American Numismatic Society

USA

X

X

X

Anythink Libraries

USA

X

X

Arapahoe Library District

USA

X

Archaeology Data Service (UK)

UK

X

X

Australian National University

Australia

X

Australian War Memorial

Australia

X*

Bavarian State Library

Germany

X

Biblioteca de Castilla y León

Spain

X*

Biblioteca de Galicia

Spain

X

Biblioteca della Camera dei deputati (Italy)

Italy

X

X

Biblioteca. Real Academia Nacional de Medicina

Spain

X

Biblioteca Valenciana Nicolau Primitiu

Spain

X

X

Biblioteca Virtual de Derecho Aragonés

Spain

X

X

Bibliothèque nationale de France

France

X

X

Bibliothèque nationale de Luxembourg

Luxembourg

X

BIBSYS NTNU (Norwegian University of Science and Technology)

Norway

X

X

X

Big Data Institute

Canada

X

British Library

UK

X

X

X

British Museum

UK

X*

X

X

Britain Memorial Library

UK

X*

Campus Condorcet

France

X*

Carleton College

USA

X

X

Carleton University

Canada

X*

Carnegie Hall

USA

X

Casalini Libri (SHARE-VDE group)

Italy

X

Charles University in Prague

Czech Republic

X

X

Chemical Heritage Foundation

USA

X

Colorado College

USA

X

X

X

Colorado State University

USA

X

X

Columbia University

USA

X

X*

Consejería de Educación, Cultura y Deportes Gobierno de Castilla-La Mancha, España

Spain

X

Consorci de Serveis Universitaris de Catalunya

Spain

X*

X

Coordinamento delle Biblioteche Speciali e Specialistiche di Torino (CoBIS)

Italy

X

Cornell University

USA

X

X

X

Credo Reference

USA

X*

Cultural Heritage Agency of The Netherlands

The Netherlands

X

Dartmouth College

USA

X

Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences

The Netherlands

X

Defense Language Institute Foreign Language Center (DLIFLC) Aiso Library

USA

X*

Digital Public Library of America

USA

X

X

X

Diputación de Málaga. Cultura y Deportes. Biblioteca Cánovas del Castillo (Biblioteca Virtual de la Provincia de Málaga)

Spain

X

X

Drexel University Libraries

USA

X

East China Normal University Library

China

X

Europeana Foundation

The Netherlands

X

X

X

Evansville Vanderburgh Public Library

USA

X

Frankfurt am Main University Library

Germany

X*

Free University of Amsterdam

The Netherlands

X*

Fundación Ignacio Larramendi (Spain)

Spain

X

X

X

George Mason University, Roy Rosenzweig Center for History and New Media

USA

X*

George Washington University

USA

X

German National Library (Deutsche Nationalbibliothek)

Germany

X

X

Goldsmiths’ College

UK

X

Haute école de gestion de Genève (SwissBib)

Switzerland

X

Historic Environment Scotland

UK

X

International Institute of Social History

The Netherlands

X*

J. Paul Getty Trust (Getty Research Institute)

USA

X

X

Johns Hopkins University

USA

X*

Koninklijke Bibliotheek

The Netherlands

X

Korea National Library of Medicine

South Korea

X*

Laurentian University

Canada

X

Library Link Network

USA

X

Library of Congress

USA

X

X

X

Lund University

Sweden

X*

Memorial University of Newfoundland

Canada

X

Ministry of Defense (Spain) (Ministerio de Defensa)

Spain

X

X

Minnesota Historical Society

USA

X

X

Missoula Public Library

USA

X

X

Mt. Lebanon Public Library

USA

X*

National Diet Library

Japan

X

X

National Library Board (NLB) of Singapore

Singapore

X

National Library of Finland

Finland

X

National Library of Malaysia

Malaysia

X

National Library of Medicine

USA

X

X

X

National Library of Portugal

Portugal

X

X

National Library of Scotland

UK

X

National Library of Spain (Biblioteca Nacional de España)

Spain

X

X

National Library of Sweden

Sweden

X

National Library of Wales

UK

X*

X

X*

National Széchényi Library

Hungary

X

X

New York Public Library

USA

X

New York University

USA

X

X

North Carolina State University Libraries

USA

X*

X

X

North Rhine-Westphalian Library Service Center (HBZ)

Germany

X

X

NTNU (Norwegian University of Science and Technology) University Library

Norway

X

X

OCLC

USA

X

X

X

Ohio State University

USA

X*

Oregon State University

USA

X*

Oslo Public Library

Norway

X

X

X

Pratt Institute

USA

X

X

Prueba

Spain

X

Public Record Office, Victoria

Australia

X

Queen’s University Library

Canada

X

RERO – Library Network of Western Switzerland

Switzerland

X*

X

Research Libraries UK

UK

X

Rhodes University

South Africa

X*

Rijksmuseum Amsterdam

The Netherlands

X

Royal Commission on the Ancient and Historical Monuments of Scotland

UK

X*

Seme4

UK

X

Singapore Integrated Library Automation Services (SILAS)

Singapore

X*

Smithsonian

USA

X

X

X

Spanish Office of Library Cooperation

Spain

X*

Springer

USA

X

X

X

Stanford University

USA

X

X

X

Stichting Bibliotheek.nl

The Netherlands

X

Swiss National Library

Switzerland

X*

The European Library

The Netherlands

X

X

Thematix

USA

X*

Tresoar (Leeuwarden – The Netherlands)

The Netherlands

X

Università degli Studi Roma TRE

Italy

X

University College Dublin

Ireland

X

University College London (UCL)

UK

X

University of Alberta Libraries

Canada

X

X

X

University of Applied Sciences St. Poelten

Austria

X*

X

University of Arkansas

USA

X*

University of Bergen Library

Norway

X*

X*

X

University of British Columbia

Canada

X

University of California-Irvine

USA

X

X

University of California-Los Angeles

USA

X

University of Chicago

USA

X

University of Colorado Boulder

USA

X*

University of Florida

USA

X*

University of Illinois at Urbana-Champaign

USA

X

X

University of Limerick

Ireland

X*

University of Liverpool

UK

X

University of Maryland

USA

X*

University of Nevada, Las Vegas

USA

X

X

University of North Texas

USA

X

University of Oklahoma Libraries

USA

X

University of Oxford

UK

X

X

University of Pennsylvania Libraries

USA

X

X

X

University of South Florida, St. Petersburg

USA

X

University of Tennessee, Knoxville

USA

X

University of Texas at Austin

USA

X

X

University of Washington

USA

X*

University of Wisconsin – Madison

USA

X

Villanova University

USA

X

Wellcome Library

UK

X

Western Michigan University

USA

X

X

Woods Hole Oceanic Institute (MBLWHOI)

USA

X*

Yale Center for British Art

USA

X

Zeitschriftendatenbank

Germany

X*

*Institutions that reported linked data projects but described none

References


[1] The OCLC Research Library Partnership allies like-minded research libraries in twelve countries, providing a venue to undertake cooperative actions that benefit scholars and researchers everywhere. See: https://www.oclc.org/research/partnership.html

[2] The results of the 2014 survey were posted on HangingTogether.org between 28 August 2014 and 8 September 2014:

Linked Data Survey results 1—Who’s doing it

Linked Data Survey results 2—Examples in production

Linked Data Survey results 3—Why and what institutions are consuming

Linked Data Survey results 4—Why and what institutions are publishing

Linked Data Survey results 5—Technical details

Linked Data Survey results 6—Advice from the implementers

[3] Smith-Yoshimura, Karen. 2016. “Analysis of International Linked Data Survey for Implementers.” D-Lib Magazine, 22(7/8), 141–167. http://doi.org/10.1045/july2016-smith-yoshimura.

[4] Spreadsheets with the complete responses to the 2014, 2015 and 2018 International Linked Data Survey for Implementers (without the contact information which we promised we’d keep confidential) are publicly available at: http://www.oclc.org/content/dam/research/activities/linkeddata/oclc-research-linked-data-implementers-survey-2014.xlsx.

[5] BIBFRAME is the Bibliographic Framework Initiative launched by the Library of Congress to provide a foundation for the future of bibliographic description in the broader networked world. For details, see: https://www.loc.gov/bibframe/

[6] The W3C provides an overview of licensing for linked open data at https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataLicensing. For Open Data Commons (ODC) licenses see https://opendatacommons.org/licenses/; for Creative Commons (CC) licenses see https://creativecommons.org/licenses/.

About the Author

Karen Smith-Yoshimura is a Senior Program Officer working with research institutions affiliated with the trans-national OCLC Research Library Partnership. She focuses on issues related to metadata needed to describe and provide access to the multilingual resources managed by libraries, archives, museums, and other cultural heritage organizations.

ISSN 1940-5758