Freebase (database)

Freebase
Type of site: Online database
Available in: English
Owner: Metaweb Technologies (Google)
URL: www.freebase.com [dead link]
Commercial: No
Registration: Optional
Launched: 3 March 2007
Current status: Offline (since 2 May 2016), succeeded by Wikidata [1] [2]
Content license: Creative Commons Attribution License

Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions. [3] [2] Freebase aimed to create a global resource that allowed people (and machines) to access common information more effectively. It was developed by the American software company Metaweb and run publicly beginning in March 2007. Metaweb was acquired by Google in a private sale announced on 16 July 2010. [4] Google's Knowledge Graph is powered in part by Freebase. [5]

During its existence, Freebase data was available for commercial and non-commercial use under a Creative Commons Attribution License, and an open API, an RDF endpoint, and a database dump were provided for programmers.

On 16 December 2014, Google announced that it would shut down Freebase over the succeeding six months and help with the move of the data from Freebase to Wikidata. [1]

On 16 December 2015, Google officially announced the Knowledge Graph API, which was meant to be a replacement for the Freebase API. Freebase.com was officially shut down on 2 May 2016. [6] [2]

Both Graphd and MQL, the graph database and JSON-based query language developed by Metaweb for Freebase, were open-sourced by Google under the Apache 2.0 license and are available on GitHub. Graphd was open-sourced on September 8, 2018. [7] MQL was open-sourced on August 4, 2020. [8]

Overview

On 3 March 2007, Metaweb announced Freebase, describing it as "an open shared database of the world's knowledge" and "a massive, collaboratively edited database of cross-linked data". Often understood as a Wikipedia-turned-database or as an entity-relationship model, Freebase provided an interface that allowed non-programmers to fill in structured data, or metadata, of general information and to categorize or connect data items in meaningful, semantic ways.

At the launch, Tim O'Reilly wrote that "Freebase is the bridge between the bottom up vision of Web 2.0 collective intelligence and the more structured world of the semantic web". [9]

Freebase contained data harvested from sources such as Wikipedia, NNDB, Fashion Model Directory and MusicBrainz, as well as data contributed by its users. The structured data was licensed under the Creative Commons Attribution License, [9] and a JSON-based HTTP API was provided so that programmers could develop applications on any platform that used the Freebase data. The source code for the Metaweb application itself was proprietary.

Freebase ran on a database infrastructure created in-house by Metaweb that used a graph model: instead of using tables and keys to define data structures, Freebase defined its data structure as a set of nodes and a set of links that established relationships between the nodes. Because its data structure was non-hierarchical, Freebase could model much more complex relationships between individual elements than a conventional database [citation needed], and was open for users to enter new objects and relationships into the underlying graph. Queries to the database were made in Metaweb Query Language (MQL) and served by a triplestore called graphd. [10]
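The node-and-link structure described above can be illustrated with a minimal in-memory sketch. This is not graphd or MQL; the class names (Link, Graph), the method names, and the node identifiers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """One directed, labeled relationship between two nodes (hypothetical model)."""
    source: str  # id of the node the link starts from
    label: str   # name of the relationship
    target: str  # id of the node the link points to


@dataclass
class Graph:
    """A set of nodes plus a list of links; no tables or keys are involved."""
    nodes: set = field(default_factory=set)
    links: list = field(default_factory=list)

    def add_link(self, source, label, target):
        # New objects and relationships can be added at any time
        # without altering a fixed table layout.
        self.nodes.update((source, target))
        self.links.append(Link(source, label, target))

    def targets(self, source, label):
        # Follow every link with the given label that starts at `source`.
        return [l.target for l in self.links
                if l.source == source and l.label == label]


g = Graph()
g.add_link("arnold_schwarzenegger", "profession", "actor")
g.add_link("arnold_schwarzenegger", "profession", "politician")
g.add_link("arnold_schwarzenegger", "office_held", "governor_of_california")

professions = g.targets("arnold_schwarzenegger", "profession")
```

In this sketch a node may carry any number of links with the same label, which is one way a graph can express relationships that would need extra join tables in a relational layout.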

Development

Danny Hillis first described his idea for creating a knowledge web he called Aristotle in a paper in 2000, [11] but he said he did not try to build the system until he had recruited technical experts. Veda Hlubinka-Cook, an expert in parallel computing, [3] became Metaweb's Executive Vice President for Product. Kurt Bollacker brought deep expertise in distributed systems, database design, and information retrieval to his role as Chief Scientist at Metaweb. John Giannandrea, formerly Chief Technologist at Tellme Networks and Chief Technologist of the Web browser group at Netscape/AOL, was Chief Technology Officer. [3]

Originally accessible by invitation only, Freebase opened full anonymous read access to the public in its alpha stage of development and later required registration only for data contributions.

On 29 October 2008, at the International Semantic Web Conference, Freebase released its RDF service for generating RDF representations of Freebase topics, allowing Freebase to be used as linked data. [12]

Organization and policy

Freebase's subjects were called "topics", and the data stored about them depended on their "type", that is, on how they were classified. For example, an entry for Arnold Schwarzenegger, the former governor of California, would be entered as a topic that would include a variety of types describing him as an actor, bodybuilder, and politician. [13] As of January 2014, Freebase had approximately 44 million topics and 2.4 billion facts. [14]

Freebase's types were themselves user-editable. [9] Each type had a number of defined predicates, called "properties".
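As a rough sketch of how a topic, its types, and their properties relate, the following assumes a tiny hand-written schema. The type names follow the Schwarzenegger example; the property names and the function allowed_properties are hypothetical and do not reproduce Freebase's actual schema or API.

```python
# Hypothetical schema: each type defines its own set of properties.
TYPES = {
    "actor": {"films"},
    "bodybuilder": {"competitions"},
    "politician": {"offices_held"},
}


def allowed_properties(topic_types):
    """Return every property a topic may carry, given the types assigned to it."""
    props = set()
    for type_name in topic_types:
        props |= TYPES[type_name]
    return props


# A single topic can be assigned several types at once.
topic = {
    "name": "Arnold Schwarzenegger",
    "types": ["actor", "bodybuilder", "politician"],
}

props = allowed_properties(topic["types"])
```

Because the data stored about a topic depends on its types, adding a type to a topic in this sketch widens the set of properties that can be recorded for it.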

[U]nlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions. [9]

However, Freebase differed from the wiki model in many ways. User-created types were not adopted in the "public commons" until promoted by a Metaweb employee. Also, users could not modify each other's types. The reason Freebase could not open up permissions of schemas is that external applications relied on them; thus, changing a type's schema – for instance by deleting a property or changing a simple property – might have broken queries for API users and even within Freebase itself, for example in saved views.
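The risk of open schema editing can be sketched as a simple check of a saved query against a type's properties. The schema, the property names, and the helper missing_properties are assumptions made for illustration and are not part of Freebase.

```python
def missing_properties(query_props, schema_props):
    """List the properties a query asks for that the schema no longer defines."""
    return sorted(set(query_props) - set(schema_props))


# Hypothetical type schema and a saved query that depends on it.
schema = {"name", "release_date", "artist"}
saved_query = ["name", "artist"]

# The query is valid against the original schema.
before = missing_properties(saved_query, schema)

# Another user deletes a property from the type...
schema_after = schema - {"artist"}

# ...and the same saved query now refers to something that no longer exists.
after = missing_properties(saved_query, schema_after)
```

Here the first check returns an empty list and the second reports the deleted property, which is the kind of breakage that restricting schema edits was meant to avoid.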

Discontinuation

On 16 December 2014, the Freebase team officially announced [1] that the website and the API would be shut down by 30 June 2015. On 16 December 2015, Google announced that it would discontinue the Freebase API and widget three months after a replacement for the Suggest widget was launched in early 2016.

References

  1. "Freebase". Google Plus. 16 December 2014. Archived from the original on 20 March 2019.
  2. Pellissier Tanon, Thomas; Vrandečić, Denny; Schaffert, Sebastian; Steiner, Thomas; Pintscher, Lydia (2016). From Freebase to Wikidata: The Great Migration. WWW '16: Proceedings of the 25th International Conference on World Wide Web. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee. pp. 1419–1428. doi:10.1145/2872427.2874809. ISBN 978-1-4503-4143-1.
  3. Markoff, John (2007-03-09). "Start-up Aims for Database to Automate Web Searching". The New York Times. Retrieved 2007-03-09.
  4. Menzel, Jack (July 16, 2010). "Deeper Understanding with Metaweb". Google Official Blog. Retrieved September 6, 2014.
  5. Singhal, Amit (May 16, 2012). "Introducing the Knowledge Graph: Things, Not Strings". Google Official Blog. Retrieved September 6, 2014.
  6. "So long and thanks for all the data!". 2 May 2016. Retrieved 5 May 2016.
  7. "graphd project on github.com". GitHub. 1 October 2019. Retrieved 1 October 2019.
  8. "pymql project on github.com". GitHub. 15 September 2020. Retrieved 15 September 2020.
  9. O'Reilly, Tim (March 8, 2007). "Freebase Will Prove Addictive". O'Reilly Radar. O'Reilly Media. Archived from the original on October 14, 2008. Retrieved September 6, 2014.
  10. Meyer, Scott (April 8, 2008). "A Brief Tour of Graphd". blog.freebase.com. Archived from the original on May 30, 2012. Retrieved September 6, 2014.
  11. Hillis, W. Daniel (2000). "'Aristotle' (the Knowledge Web)". Archived from the original on January 17, 2013. Retrieved January 20, 2013.
  12. Taylor, Jamie (October 30, 2008). "Introducing the Freebase RDF service". Archived from the original on May 16, 2012. Retrieved September 6, 2014.
  13. "Arnold Schwarzenegger". Freebase. Archived from the original on 2012-07-03. Retrieved 2014-02-14.
  14. "Explore Freebase Data". www.freebase.com. Archived from the original on 2010-06-14. Retrieved 2013-02-14.