Linked data

Wikidata in the Linked Open Data Cloud. Databases are shown as circles (Wikidata is marked 'WD'), with grey lines linking databases in the network whose data is aligned. Generated from https://lod-cloud.net/datasets.
DBpedia as the most interlinked LOD dataset and crystallization point of the Linked Open Data Cloud since 2008. Image from 2021, generated from https://lod-cloud.net.

In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database. [1]

Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project. [2]

Linked data may also be open data, in which case it is usually described as Linked Open Data. [3]

Principles

In his 2006 "Linked Data" note, Tim Berners-Lee outlined four principles of linked data, paraphrased along the following lines: [2]

  1. Uniform Resource Identifiers (URIs) should be used to name and identify individual things.
  2. HTTP URIs should be used so that these things can be looked up ("dereferenced") and interpreted.
  3. Useful information about what a name identifies should be provided through open standards such as RDF, SPARQL, etc.
  4. When publishing data on the Web, other things should be referred to using their HTTP URI-based names.
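
The four principles can be illustrated with a minimal sketch. It assumes a hypothetical in-memory stand-in for the Web (`FAKE_WEB`) instead of real HTTP requests, and the `example.org` and `example.com` URIs are invented for illustration; only the FOAF property URIs refer to a real vocabulary. Things are named with HTTP URIs, looking up a URI returns triples about it, and those triples point to other things by their URIs, so a client can follow the links.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical stand-in for the Web: each HTTP URI maps to the triples
# a server might return when that URI is dereferenced.
FAKE_WEB: Dict[str, List[Triple]] = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice", FOAF + "name", "Alice"),
        ("http://example.org/people/alice", FOAF + "knows",
         "http://example.com/staff/bob"),
    ],
    "http://example.com/staff/bob": [
        ("http://example.com/staff/bob", FOAF + "name", "Bob"),
    ],
}


def dereference(uri: str) -> List[Triple]:
    """Look up a URI and return the triples published about it (principles 2 and 3)."""
    return FAKE_WEB.get(uri, [])


def crawl(start: str) -> Set[str]:
    """Follow object URIs breadth-first and return every URI visited (principle 4)."""
    seen: Set[str] = set()
    queue = [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for _subject, _predicate, obj in dereference(uri):
            # Only objects that are themselves HTTP URIs can be followed;
            # plain literals such as "Alice" are skipped.
            if obj.startswith(("http://", "https://")):
                queue.append(obj)
    return seen


if __name__ == "__main__":
    for visited in sorted(crawl("http://example.org/people/alice")):
        print(visited)
```

Starting from the URI for Alice, the crawl also reaches the URI for Bob, even though the two are published under different hosts.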

Tim Berners-Lee later restated these principles at a 2009 TED conference, again paraphrased along the following lines: [4]

  1. All conceptual things should have a name starting with HTTP.
  2. Looking up an HTTP name should return useful data about the thing in question in a standard format.
  3. Anything else that the same thing is related to through its data should also be given a name beginning with HTTP.
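
The second point, looking up an HTTP name and getting data back in a standard format, is commonly realised with HTTP content negotiation. The sketch below only constructs the request and does not send it; the helper name `build_lookup_request` and the `example.org` URI are hypothetical, and `text/turtle` is chosen here as one possible RDF media type.

```python
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request asking for RDF about `uri`."""
    if not uri.startswith(("http://", "https://")):
        # A name that does not start with HTTP cannot be looked up this way.
        raise ValueError(f"not an HTTP URI: {uri!r}")
    # The Accept header tells the server which representation the client prefers.
    return Request(uri, headers={"Accept": media_type})


if __name__ == "__main__":
    req = build_lookup_request("http://example.org/people/alice")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual lookup; whether a given server answers with RDF depends on that server.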

Components

Thus, we can identify the following components as essential to a global Linked Data system as envisioned, and to any actual Linked Data subset within it:

Linked open data

Linked open data are linked data that are open data. [5] [6] [7] Tim Berners-Lee gives the clearest definition of linked open data as differentiated from linked data.

Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free.

Tim Berners-Lee, Linked Data [2] [8]

Large linked open data sets include DBpedia, Wikibase, Wikidata and Open ICEcat.

5-star linked open data

Deployment scheme for Linked Open Data

In 2010, Tim Berners-Lee suggested a 5-star scheme for grading the quality of open data on the web, for which the highest ranking is Linked Open Data: [10]
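
As a rough sketch of how such a grading could be computed, the following assumes the commonly cited reading of the scheme, in which each star builds on the previous one: (1) available on the Web under an open licence, (2) available as structured data, (3) in a non-proprietary format, (4) using URIs to denote things, and (5) linked to other data. The class and field names are hypothetical and are not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical description of how a dataset is published."""
    open_licence_on_web: bool      # star 1
    structured: bool               # star 2
    non_proprietary_format: bool   # star 3
    uses_uris: bool                # star 4
    links_to_other_data: bool      # star 5


def star_rating(pub: Publication) -> int:
    """Count stars cumulatively: a star only counts if all lower stars are met."""
    criteria = [
        pub.open_licence_on_web,
        pub.structured,
        pub.non_proprietary_format,
        pub.uses_uris,
        pub.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    open_csv = Publication(True, True, True, False, False)
    print(star_rating(open_csv))
```

Under these assumptions, an openly licensed CSV file rates three stars, and only data that also uses URIs and links to other datasets reaches five.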

History

The term "linked open data" has been in use since at least February 2007, when the "Linking Open Data" mailing list [11] was created. [12] The mailing list was initially hosted by the SIMILE project [13] at the Massachusetts Institute of Technology.

Linking Open Data community project

Diagram showing which Linking Open Data datasets were connected as of August 2014. It was produced by the Linked Open Data Cloud project, which was started in 2007. Some sets may include copyrighted data which is freely available.
The same diagram for February 2017, showing the growth in just two and a half years.

The goal of the W3C Semantic Web Education and Outreach group's Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links. [15] [16] By September 2011 this had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. A detailed statistical breakdown was published in 2014. [17]
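
The distinction between RDF triples and RDF links can be made concrete with a small sketch. It assumes, purely for illustration, that a dataset is identified by the host name of its URIs and that an RDF link is a triple whose subject and object belong to different datasets; the `a.example` and `b.example` URIs are invented.

```python
from typing import Iterable, Tuple
from urllib.parse import urlparse

Triple = Tuple[str, str, str]  # (subject, predicate, object)

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def dataset_of(uri: str) -> str:
    """Assumption: the host name of a URI identifies the dataset it belongs to."""
    return urlparse(uri).netloc


def count_rdf_links(triples: Iterable[Triple]) -> int:
    """Count triples that connect a subject to a URI in a different dataset."""
    links = 0
    for subject, _predicate, obj in triples:
        is_uri = obj.startswith(("http://", "https://"))
        if is_uri and dataset_of(subject) != dataset_of(obj):
            links += 1
    return links


TRIPLES = [
    # Cross-dataset link: counts as an RDF link.
    ("http://a.example/city/berlin", OWL_SAME_AS, "http://b.example/place/berlin"),
    # Link inside one dataset: an ordinary triple, not an interlink.
    ("http://a.example/city/berlin", "http://a.example/ontology/country",
     "http://a.example/country/germany"),
    # Literal value: not a link at all.
    ("http://b.example/place/berlin", RDFS_LABEL, "Berlin"),
]

if __name__ == "__main__":
    print(len(TRIPLES), "triples,", count_rdf_links(TRIPLES), "RDF link(s)")
```

Real surveys of the LOD cloud use more careful definitions of dataset boundaries, so this host-based rule should be read as a simplification.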

European Union projects

A number of European Union projects involve linked data. These include the Linked Open Data Around the Clock (LATC) project, [18] the AKN4EU project for machine-readable legislative data, [19] the PlanetData project, [20] the DaPaaS (Data-and-Platform-as-a-Service) project, [21] and the Linked Open Data 2 (LOD2) project. [22] [23] [24] Data linking is one of the main goals of the EU Open Data Portal, which makes available thousands of datasets for anyone to reuse and link.

Ontologies

Ontologies are formal descriptions of data structures. Some of the better known ontologies are:

Datasets

Dataset instance and class relationships

Clickable diagrams that show the individual datasets and their relationships within the DBpedia-spawned LOD cloud (as shown in the figures) are available. [29] [30]

See also

References

  1. "Linked Data as JSON". Linked Data as JSON. Retrieved 2020-12-04.
  2. Tim Berners-Lee (2006-07-27). "Linked Data". Design Issues. W3C. Retrieved 2010-12-18.
  3. "What are Linked Data and Linked Open Data?". Ontotext. Retrieved 2019-05-08.
  4. "Tim Berners-Lee on the next Web". Archived from the original on 2011-04-10. Retrieved 2009-03-15.
  5. "Frequently Asked Questions (FAQs) - Linked Data - Connect Distributed Data across the Web". Archived from the original on 2015-11-18. Retrieved 2014-12-29.
  6. "COAR » 7 things you should know about…Linked Data". Archived from the original on 2015-11-18. Retrieved 2015-12-29.
  7. "Linked Data Basics for Techies". Archived from the original on 2021-05-05. Retrieved 2015-12-29.
  8. "5 Star Open Data".
  9. "5-star Open Data". 5stardata.info. Retrieved 2021-03-07.
  10. "What is 5 Star Linked Data? | Webize Everything Community Group". www.w3.org. Retrieved 2021-03-07.
  11. "public-lod@w3.org Mail Archives".
  12. "SweoIG/TaskForces/CommunityProjects/LinkingOpenData/NewsArchive".
  13. "SIMILE Project - Mailing Lists".
  14. Linking open data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  15. "SweoIG/TaskForces/CommunityProjects/LinkingOpenData - W3C Wiki". esw.w3.org. Retrieved 22 March 2018.
  16. Fensel, Dieter; Facca, Federico Michele; Simperl, Elena; Ioan, Toma (2011). Semantic Web Services. Springer. p. 99. ISBN 978-3642191923.
  17. Max. "State of the LOD Cloud". linkeddatacatalog.dws.informatik.uni-mannheim.de. Retrieved 22 March 2018.
  18. "Linked open data around the clock (LATC)". latc-project.eu. Archived from the original on 19 September 2018. Retrieved 22 March 2018.
  19. Flatt, Amelie; Langner, Arne; Leps, Olof (2022), "Model-Driven Development of AKN Application Profiles: Background and Requirements", Model-Driven Development of Akoma Ntoso Application Profiles, Cham: Springer International Publishing, pp. 5–12, doi:10.1007/978-3-031-14132-4_2, ISBN 978-3-031-14131-7, retrieved 2023-01-07
  20. "Welcome to PlanetData! - PlanetData". planet-data.eu. Archived from the original on 21 April 2021. Retrieved 22 March 2018.
  21. "DaPaaS". project.dapaas.eu. Archived from the original on 18 December 2020. Retrieved 22 March 2018.
  22. Linking Open Data 2 (LOD2)
  23. "CORDIS FP7 ICT Projects – LOD2". European Commission. 2010-04-20.
  24. "LOD2 Project Fact Sheet – Project Summary" (PDF). 2010-09-01. Archived from the original (PDF) on 2011-07-20. Retrieved 2010-12-18.
  25. "GRID Statistics". grid.ac/stats. Retrieved 2018-10-26.
  26. "GRID Policies". grid.ac. Retrieved 2018-10-26.
  27. "KnowWhereGraph". knowwheregraph.org. Retrieved 2022-05-16.
  28. Krzysztof Janowicz; Pascal Hitzler; Wenwen Li; Dean Rehberger; Mark Schildhauer; Rui Zhu; Cogan Shimizu; Colby K. Fisher; Ling Cai; Gengchen Mai; Joseph Zalewski; Lu Zhou; Shirly Stephen; Seila Gonzalez Estrecha; Bryce D. Mecum; Anna Lopez-Carr; Andrew Schroeder; Dave Smith; Dawn J. Wright; Sizhe Wang; Yuanyuan Tian; Zilong Liu; Meilin Shi; Anthony D'Onofrio; Zhining G; Kitty Currier (2022). "Know, Know Where, Knowwheregraph: A Densely Connected, Cross-Domain Knowledge Graph and Geo-Enrichment Service Stack for Applications in Environmental Intelligence". AI Magazine. 43 (1): 30–39. doi:10.1609/aimag.v43i1.19120. hdl:1983/be176aba-9dec-456c-9615-01a0e8556b7b.
  29. "Instance relationships amongst datasets". fu-berlin.de. Archived from the original on 2012-10-17. Retrieved 22 March 2018.
  30. "Class relationships amongst datasets". Archived from the original on 28 August 2011. Retrieved 22 March 2018.

Further reading