Web resource

Last updated April 07, 2024

A web resource is any identifiable resource (digital, physical, or abstract) present on or connected to the World Wide Web.^[1]^[2]^[3] Resources are identified using Uniform Resource Identifiers (URIs).^[1]^[4] In the Semantic Web, web resources and their semantic properties are described using the Resource Description Framework (RDF).^[5]

The concept of a web resource has evolved during the Web's history, from the early notion of static addressable documents or files, to a more generic and abstract definition, now encompassing every "thing" or entity that can be identified, named, addressed or handled, in any way whatsoever, in the web at large, or in any networked information system. The declarative aspects of a resource (identification and naming) and its functional aspects (addressing and technical handling) weren't clearly distinct in the early specifications of the web, and the very definition of the concept has been the subject of long and still open debate involving difficult, and often arcane, technical, social, linguistic and philosophical issues.

From documents and files to web resources

In the early specifications of the web (1990–1994), the term resource is barely used at all. The web is designed as a network of more or less static addressable objects, basically files and documents, linked using Uniform Resource Locators (URLs). A web resource is implicitly defined as something which can be identified. The identification serves two distinct purposes: naming and addressing; the latter only depends on a protocol. It is notable that RFC 1630 does not attempt to define at all the notion of resource; actually it barely uses the term besides its occurrence in Uniform Resource Identifier (URI), Uniform Resource Locator (URL), and Uniform Resource Name (URN), and still speaks about "Objects of the Network".

RFC 1738 (December 1994) further specifies URLs, the term "Universal" being changed to "Uniform". The document is making a more systematic use of resource to refer to objects which are "available", or "can be located and accessed" through the internet. There again, the term resource itself is not explicitly defined.

From web resources to abstract resources

The first explicit definition of resource is found in RFC 2396, in August 1998:

A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process.

Although examples in this document were still limited to physical entities, the definition opened the door to more abstract resources. Providing a concept is given an identity, and this identity is expressed by a well-formed URI (Uniform Resource Identifier, a superset of URLs), then a concept can be a resource as well.

In January 2005, RFC 3986 makes this extension of the definition completely explicit: '…abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).'

Resources in RDF and the Semantic Web

First released in 1999, RDF was first intended to describe resources, in other words to declare metadata of resources in a standard way. A RDF description of a resource is a set of triples (subject, predicate, object), where subject represents the resource to be described, predicate a type of property relevant to this resource, and object can be data or another resource. The predicate itself is considered as a resource and identified by a URI. Hence, properties like "title", "author" are represented in RDF as resources, which can be used, in a recursive way, as the subject of other triples. Building on this recursive principle, RDF vocabularies, such as RDF Schema (RDFS), Web Ontology Language (OWL), and Simple Knowledge Organization System will pile up definitions of abstract resources such as classes, properties, concepts, all identified by URIs.

RDF also specifies the definition of anonymous resources or blank nodes, which are not absolutely identified by URIs.

Using HTTP URIs to identify abstract resources

URLs, particularly HTTP URIs, are frequently used to identify abstract resources, such as classes, properties or other kind of concepts. Examples can be found in RDFS or OWL ontologies. Since such URIs are associated with the HTTP protocol, the question arose of which kind of representation, if any, should one get for such resources through this protocol, typically using a web browser, and if the syntax of the URI itself could help to differentiate "abstract" resources from "information" resources. The URI specifications such as RFC 3986 left to the protocol specification the task of defining actions performed on the resources and they do not provide any answer to this question. It had been suggested that an HTTP URI identifying a resource in the original sense, such as a file, document, or any kind of so-called information resource, should be "slash" URIs — in other words, should not contain a fragment identifier, whereas a URI used to identify a concept or abstract resource should be a "hash" URI using a fragment identifier.

For example: http://www.example.org/catalogue/widgets.html would both identify and locate a web page (maybe providing some human-readable description of the widgets sold by Silly Widgets, Inc.) whereas http://www.example.org/ontology#Widget would identify the abstract concept or class "Widget" in this company ontology, and would not necessarily retrieve any physical resource through HTTP protocol. But it has been answered that such a distinction is impossible to enforce in practice, and famous standard vocabularies provide counter-examples widely used. For example, the Dublin Core concepts such as "title", "publisher", "creator" are identified by "slash" URIs like http://purl.org/dc/elements/1.1/title.

The general question of which kind of resources HTTP URI should or should not identify has been formerly known in W3C as the httpRange-14 issue, following its name on the list defined by the (TAG). The TAG delivered in 2005 a final answer to this issue, making the distinction between an "information resource" and a "non-information" resource dependent on the type of answer given by the server to a "GET" request:

2xx Success indicates resource is an information resource.
303 See Other indicates the resource could be informational or abstract; the redirection target could tell you.
4xx Client Error provides no information at all.

This allows vocabularies (like Dublin Core, FOAF, and Wordnet) to continue to use slash instead of hash for pragmatic reasons. While this compromise seems to have met a consensus in the Semantic Web community, some of its prominent members such as Pat Hayes have expressed concerns both on its technical feasibility and conceptual foundation. According to Patrick Hayes' viewpoint, the very distinction between "information resource" and "other resource" is impossible to find and should better not be specified at all, and ambiguity of the referent resource is inherent to URIs like to any naming mechanism.

Resource ownership, intellectual property and trust

In RDF, "anybody can declare anything about anything". Resources are defined by formal descriptions which anyone can publish, copy, modify and publish over the web. If the content of a web resource in the classical sense (a web page or on-line file) is clearly owned by its publisher, who can claim intellectual property on it, an abstract resource can be defined by an accumulation of RDF descriptions, not necessarily controlled by a unique publisher, and not necessarily consistent with each other. It's an open issue to know if a resource should have an authoritative definition with clear and trustable ownership, and in this case, how to make this description technically distinct from other descriptions. A parallel issue is how intellectual property may apply to such descriptions.

Related Research Articles

The Dublin Core, also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen main metadata items for describing digital or physical resources. It was the first metadata standard for describing web content. The Dublin Core Metadata Initiative (DCMI) is responsible for formulating the Dublin Core; DCMI is a project of the Association for Information Science and Technology (ASIS&T), a non-profit organization.

<span class="mw-page-title-main">Semantic Web</span> Extension of the Web to facilitate data exchange

The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

A Uniform Resource Identifier (URI) is a unique sequence of characters that identifies an abstract or physical resource, such as resources on a webpage, mail address, phone number, books, real-world objects such as people and places, concepts. URIs are used to identify anything described using the Resource Description Framework (RDF), for example, concepts that are part of an ontology defined using the Web Ontology Language (OWL), and people who are described using the Friend of a Friend vocabulary would each have an individual URI.

The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of syntax notations and data serialization formats, with Turtle currently being the most widely used notation.

A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme. URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable. URNs cannot be used to directly locate an item and need not be resolvable, as they are simply templates that another parser may use to find an item.

REST is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of a distributed, Internet-scale hypermedia system, such as the Web, should behave. The REST architectural style emphasises uniform interfaces, independent deployment of components, the scalability of interactions between them, and creating a layered architecture to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

The Internationalized Resource Identifier (IRI) is an internet protocol standard which builds on the Uniform Resource Identifier (URI) protocol by greatly expanding the set of permitted characters. It was defined by the Internet Engineering Task Force (IETF) in 2005 in RFC 3987. While URIs are limited to a subset of the US-ASCII character set, IRIs may additionally contain most characters from the Universal Character Set, including Chinese, Japanese, Korean, and Cyrillic characters.

A persistent uniform resource locator (PURL) is a uniform resource locator (URL) that is used to redirect to the location of the requested web resource. PURLs redirect HTTP clients using HTTP status codes.

In computer hypertext, a URI fragment is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.

URL encoding, officially known as percent-encoding, is a method to encode arbitrary data in a uniform resource identifier (URI) using only the US-ASCII characters legal within a URI. Although it is known as URL encoding, it is also used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). Consequently, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.

<span class="mw-page-title-main">URI normalization</span> Process by which URIs are standardized

URI normalization is the process by which URIs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URI into a normalized URI so it is possible to determine if two syntactically different URIs may be equivalent.

Gellish is an ontology language for data storage and communication, designed and developed by Andries van Renssen since mid-1990s. It started out as an engineering modeling language but evolved into a universal and extendable conceptual data modeling language with general applications. Because it includes domain-specific terminology and definitions, it is also a semantic data modelling language and the Gellish modeling methodology is a member of the family of semantic modeling methodologies.

In RDF, a blank node is a node in an RDF graph representing a resource for which a URI or literal is not given. The resource represented by a blank node is also called an anonymous resource. According to the RDF standard a blank node can only be used as subject or object of an RDF triple.

The FAO geopolitical ontology is an ontology developed by the Food and Agriculture Organization of the United Nations (FAO) to describe, manage and exchange data related to geopolitical entities such as countries, territories, regions and other similar areas.

In computer science, information science and systems engineering, ontology engineering is a field which studies the methods and methodologies for building ontologies, which encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities of a given domain of interest. In a broader sense, this field also includes a knowledge construction of the domain using formal ontology representations such as OWL/RDF. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering. Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.

The Handle System is the Corporation for National Research Initiatives's proprietary registry assigning persistent identifiers, or handles, to information resources, and for resolving "those handles into the information necessary to locate, access, and otherwise make use of the resources".

The HTTP Location header field is returned in responses from an HTTP server under two circumstances:

To ask a web browser to load a different web page. In this circumstance, the Location header should be sent with an HTTP status code of 3xx. It is passed as part of the response by a web server when the requested URI has:
To provide information about the location of a newly created resource. In this circumstance, the Location header should be sent with an HTTP status code of 201 or 202.

geo URI scheme System of geographic location identifiers

The geo URI scheme is a Uniform Resource Identifier (URI) scheme defined by the Internet Engineering Task Force's RFC 5870 as:

a Uniform Resource Identifier (URI) for geographic locations using the 'geo' scheme name. A 'geo' URI identifies a physical location in a two- or three-dimensional coordinate reference system in a compact, simple, human-readable, and protocol-independent way.

A uniform resource locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (HTTP/HTTPS) but are also used for file transfer (FTP), email (mailto), database access (JDBC), and many other applications.

A well-known URI is a Uniform Resource Identifier for URL path prefixes that start with /.well-known/. They are implemented in webservers so that requests to the servers for well-known services or information are available at URLs consistent well-known locations across servers.

References

Citations

1 2 RFC 3986 Uniform Resource Identifier (URI): Generic Syntax
↑ Roy T. Fielding's Dissertation
↑ What do HTTP URIs Identify?, by Tim Berners-Lee
↑ RFC 1738 Uniform Resource Locators (URL)
↑ RDF Current Status

Sources

Web Characterization Terminology & Definitions Sheet, editors: Brian Lavoie and Henrik Frystyk Nielsen, May 1999.
A Short History of "Resource" in web architecture., by Tim Berners-Lee
Presentations at IRW 2006 conference, Web resources
- A Pragmatic Theory of Reference for the Web, by Dan Connolly.
- In Defense of Ambiguity, by Patrick Hayes.
Towards an OWL ontology for identity on the web, by Valentina Presutti and Aldo Gangemi, SWAP2006 conference.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[rfc3986-1] 1 2 RFC 3986 Uniform Resource Identifier (URI): Generic Syntax

[fielding_dissertation-2] Roy T. Fielding's Dissertation

[uri_identify-3] What do HTTP URIs Identify?, by Tim Berners-Lee

[rfc1738-4] RFC 1738 Uniform Resource Locators (URL)

[rdf_index-5] RDF Current Status

[1]

[2]

[3]

[4]

[5]