ShEx

Last updated
ShEx - Shape Expressions
ShexLogo.png
Paradigm Data Validation
Designed by Eric Prud'hommeaux, Iovka Boneva, Jose Emilio Labra Gayo, Gregg Kellogg, Shape Expressions W3C Community Group
Stable release
2.1 / November 21, 2018;4 years ago (2018-11-21)
Scope Semantic Web
Implementation language JavaScript, Scala
Filename extensions shex, sx
Website www.w3.org/community/shex/
Major implementations
shex.js, [1] Shaclex [1]
Influenced by
Turtle, SPARQL, RelaxNG
Influenced
SHACL

Shape Expressions (ShEx) [2] is a data modelling language for validating and describing a Resource Description Framework (RDF).

Contents

It was proposed at the 2012 RDF Validation Workshop [3] as a high-level, concise language for RDF validation.

The shapes can be defined in a human-friendly compact syntax called ShExC or using any RDF serialization formats like JSON-LD or Turtle.

ShEx expressions can be used both to describe RDF and to automatically check the conformance of RDF data. The syntax of ShEx is similar to Turtle and SPARQL while the semantics is inspired by regular expression languages like RelaxNG.

Example

PREFIX:<http://example.org/>PREFIXschema:<http://schema.org/>PREFIXxsd:<http://www.w3.org/2001/XMLSchema#>:Person{schema:namexsd:string;schema:knows@:Person*;}

The previous example declares that nodes conforming to shape Person must have one property schema:name with a string value and zero or more properties schema:knows whose values must conform with shape Person.

Implementations

ProjectProgramming languageVersionLatest releaseCompatible ShEx versionFeatures
value checkingcardinalitymanifest shapemapimportsexternal shapesannotationssemantic actions
ShEx.ex Elixirv0.1.42020-10-13????nononono
Ruby ShEx Ruby0.7.12022-01-292.0??????yes
shexjava Javanonenone2.0???????
PyShEx Pythonv0.8.12022-04-142.0yesyesnono???
entityshape Python0.0.22023-06-24?yesyesnonononono
shaclex Scala0.1.702020-11-02????????
shex.js JavaScriptv1.0.0-alpha.262023-04-25????????

Online playgrounds and demos

Related Research Articles

A document type definition (DTD) is a specification file that contains set of markup declarations that define a document type for an SGML-family markup language. The DTD specification file can be used to validate documents.

<span class="mw-page-title-main">Semantic Web</span> Extension of the Web to facilitate data exchange

The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of syntax notations and data serialization formats, with Turtle currently being the most widely used notation.

XSD, a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document, to assure it adheres to the description of the element it is placed in.

In computing, RELAX NG is a schema language for XML—a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document but RELAX NG also offers a popular compact, non-XML syntax. Compared to other XML schema languages RELAX NG is considered relatively simple.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

RDF Schema (Resource Description Framework Schema, variously abbreviated as RDFS, RDF(S), RDF-S, or RDF/S) is a set of classes with certain properties using the RDF extensible knowledge representation data model, providing basic elements for the description of ontologies. It uses various forms of RDF vocabularies, intended to structure RDF resources. RDF and RDFS can be saved in a triplestore, then one can extract some knowledge from them using a query language, like SPARQL.

SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March, 2013.

RDFa or Resource Description Framework in Attributes is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. The Resource Description Framework (RDF) data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents. It also enables the extraction of RDF model triples by compliant user agents.

<span class="mw-page-title-main">RDFLib</span> Python library to serialize, parse and process RDF data

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information. This library contains parsers/serializers for almost all of the known RDF serializations, such as RDF/XML, Turtle, N-Triples, & JSON-LD, many of which are now supported in their updated form. The library also contains both in-memory and persistent Graph back-ends for storing RDF information and numerous convenience functions for declaring graph namespaces, lodging SPARQL queries and so on. It is in continuous development with the most recent stable release, rdflib 6.1.1 having been released on 20 December 2021. It was originally created by Daniel Krech with the first release in November, 2002.

In computing, Terse RDF Triple Language (Turtle) is a syntax and file format for expressing data in the Resource Description Framework (RDF) data model. Turtle syntax is similar to that of SPARQL, an RDF query language. It is a common data format for storing RDF data, along with N-Triples, JSON-LD and RDF/XML.

An RDF query language is a computer language, specifically a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

N-Triples is a format for storing and transmitting data. It is a line-based, plain text serialisation format for RDF graphs, and a subset of the Turtle format. N-Triples should not be confused with Notation3 which is a superset of Turtle. N-Triples was primarily developed by Dave Beckett at the University of Bristol and Art Barstow at the World Wide Web Consortium (W3C).

The Publishing Requirements for Industry Standard Metadata (PRISM) for the Internet, computing, and computer science, is a specification that defines a set of XML metadata vocabularies for syndicating, aggregating, post-processing and multi-purposing content.

TriG is a serialization format for RDF graphs. It is a plain text format for serializing named graphs and RDF Datasets which offers a compact and readable alternative to the XML-based TriX syntax.

XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. XHTML+RDFa is one of the techniques used to develop Semantic Web content by embedding rich semantic markup. Version 1.1 of the language is a superset of XHTML 1.1, integrating the attributes according to RDFa Core 1.1. In other words, it is an RDFa support through XHTML Modularization.

Shapes Constraint Language (SHACL) is a World Wide Web Consortium (W3C) standard language for describing Resource Description Framework (RDF) graphs. SHACL has been designed to enhance the semantic and technical interoperability layers of ontologies expressed as RDF graphs.

The PROV standard defines a data model, serializations, and definitions to support the interchange of provenance information on the Web. Here provenance includes all "information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness".

SciGraph is a search engine tool developed by Springer Nature. The technology, which is considered a Linked Open Data (LOD) platform, collects information that covers the research landscape, which includes research projects, publications, conferences, funding agencies, and others. Key features of the platform include the detailed semantic description of the relationship of information and the visualization of the scholarly domain.

References

  1. 1 2 Labra Gayo, Jose Emilio; Prud'hommeaux, Eric; Boneva, Iovka; Kontokostas, Dimitris (2018). Validating RDF Data. Morgan & Claypool. p. 328. ISBN   9781681731650.
  2. "Shape Expressions Language 2.0".
  3. "RDF Validation Workshop: Practical Assurances for Quality RDF Data".

Further reading

Specification
Other

See also