Markov logic network

Last updated

A Markov logic network (MLN) is a probabilistic logic which applies the ideas of a Markov network to first-order logic, defining probability distributions on possible worlds on any given domain .

Contents

History

In 2002, Ben Taskar, Pieter Abbeel and Daphne Koller introduced relational Markov networks as templates to specify Markov networks abstractly and without reference to a specific domain. [1] [2] Work on Markov logic networks began in 2003 by Pedro Domingos and Matt Richardson. [3] [4] As of 2023, Markov logic networks are still among the most popular formalisms for statistical relational learning. [5]

Syntax

A Markov logic network consists of a collection of formulas from first-order logic, to each of which is assigned a real number, the weight. The underlying idea is that an interpretation is more likely if it satisfies formulas with positive weights and less likely if it satisfies formulas with negative weights. [6]

For instance, the following Markov logic network codifies how smokers are more likely to be friends with other smokers, and how stress encourages smoking: [7]

Semantics

Markov network induced by the theory from Syntax to a two-element domain. Grounded Markov logic network.jpg
Markov network induced by the theory from Syntax to a two-element domain.

Together with a given domain, a Markov logic network defines a probability distribution on the set of all interpretations of its predicates on the given domain. The underlying idea is that an interpretation is more likely if it satisfies formulas with positive weights and less likely if it satisfies formulas with negative weights.

For any -ary predicate symbol that occurs in the Markov logic network and every - tuple of domain elements, is a grounding of . An interpretation is given by allocating a Boolean truth value (true or false) to each grounding of an element. A true grounding of a formula in an interpretation with free variables is a variable assignment of that makes true in that interpretation.

Then the probability of any given interpretation is directly proportional to , where is the weight of the -th sentence of the Markov logic network and is the number of its true groundings. [1] [4]

This can also be seen as inducing a Markov network whose nodes are the groundings of the predicates occurring in the Markov logic network. The feature functions of this network are the groundings of the sentences occurring in the Markov logic network, with value if the grounding is true and 1 otherwise (where again is the weight of the formula). [1]

Inference

The probability distributions induced by Markov logic networks can be queried for the probability of a particular event, given by an atomic formula (marginal inference), possibly conditioned by another atomic formula. [6]

Marginal inference can be performed using standard Markov network inference techniques over the minimal subset of the relevant Markov network required for answering the query. Exact inference is known to be #P-complete in the size of the domain. [6]

In practice, the exact probability is often approximated. [8] Techniques for approximate inference include Gibbs sampling, belief propagation, or approximation via pseudolikelihood.

The class of Markov logic networks which use only two variables in any formula allows for polynomial time exact inference by reduction to weighted model counting. [9] [6]

See also

Resources

  1. 1 2 3 Cozman, Fabio Gagliardi (2020), "Languages for Probabilistic Modeling Over Structured and Relational Domains", A Guided Tour of Artificial Intelligence Research, Cham: Springer International Publishing, pp. 247–283, doi:10.1007/978-3-030-06167-8_9, ISBN   978-3-030-06166-1
  2. Taskar, Ben; Abbeel, Pieter; Koller, Daphne (2002-08-01). "Discriminative probabilistic models for relational data". Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence. UAI'02. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.: 485–492. ISBN   978-1-55860-897-9.
  3. Domingos, Pedro (2015). The Master Algorithm: How machine learning is reshaping how we live. p. 246-7.
  4. 1 2 Richardson, Matthew; Domingos, Pedro (2006). "Markov Logic Networks" (PDF). Machine Learning. 62 (1–2): 107–136. doi: 10.1007/s10994-006-5833-1 .
  5. Augustine, Eriq (2023). Building Practical Statistical Relational Learning Systems (Thesis). UC Santa Cruz.
  6. 1 2 3 4 Sun, Zhengya; Zhao, Yangyang; Wei, Zhuoyu; Zhang, Wensheng; Wang, Jue (2017). "Scalable learning and inference in Markov logic networks". International Journal of Approximate Reasoning. 82: 39–55. doi:10.1016/j.ijar.2016.12.003. ISSN   0888-613X.
  7. Marra, Giuseppe; Dumančić, Sebastijan; Manhaeve, Robin; De Raedt, Luc (2024-03-01). "From statistical relational to neurosymbolic artificial intelligence: A survey". Artificial Intelligence. 328: 104062. arXiv: 2108.11451 . doi:10.1016/j.artint.2023.104062. ISSN   0004-3702. Creative Commons by small.svg  This article incorporates text by Giuseppe Marra, Sebastijan Dumančić, Robin Manhaeve and Luc De Raedt available under the CC BY 4.0 license.
  8. Venugopal, Deepak (2017). "Advances in Inference Methods for Markov Logic Networks" (PDF). IEEE Intelligent Informatics Bulletin. 18 (2): 13–19.
  9. Kuzelka, Ondrej (2021-03-29). "Weighted First-Order Model Counting in the Two-Variable Fragment With Counting Quantifiers". Journal of Artificial Intelligence Research. 70: 1281–1307. arXiv: 2007.05619 . doi:10.1613/jair.1.12320. ISSN   1076-9757.

Related Research Articles

In artificial intelligence, with implications for cognitive science, the frame problem describes an issue with using first-order logic to express facts about a robot in the world. Representing the state of a robot with traditional first-order logic requires the use of many axioms that simply imply that things in the environment do not change arbitrarily. For example, Hayes describes a "block world" with rules about stacking blocks together. In a first-order logic system, additional axioms are required to make inferences about the environment. The frame problem is the problem of finding adequate collections of axioms for a viable description of a robot environment.

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. The term "inductive" here refers to philosophical rather than mathematical induction. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

<span class="mw-page-title-main">Dynamic Bayesian network</span> Probabilistic graphical model

A dynamic Bayesian network (DBN) is a Bayesian network (BN) which relates variables to each other over adjacent time steps.

<span class="mw-page-title-main">Markov random field</span> Set of random variables

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

Probabilistic logic involves the use of probability and logic to deal with uncertain situations. Probabilistic logic extends traditional logic truth tables with probabilistic expressions. A difficulty of probabilistic logics is their tendency to multiply the computational complexities of their probabilistic and logical components. Other difficulties include the possibility of counter-intuitive results, such as in case of belief fusion in Dempster–Shafer theory. Source trust and epistemic uncertainty about the probabilities they provide, such as defined in subjective logic, are additional elements to consider. The need to deal with a broad variety of contexts and issues has led to many different proposals.

Progol is an implementation of inductive logic programming that combines inverse entailment with general-to-specific search through a refinement graph.

Subjective logic is a type of probabilistic logic that explicitly takes epistemic uncertainty and source trust into account. In general, subjective logic is suitable for modeling and analysing situations involving uncertainty and relatively unreliable sources. For example, it can be used for modeling and analysing trust networks and Bayesian networks.

Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both uncertainty and complex, relational structure. Typically, the knowledge representation formalisms developed in SRL use first-order logic to describe relational properties of a domain in a general manner and draw upon probabilistic graphical models to model the uncertainty; some also build upon the methods of inductive logic programming. Significant contributions to the field have been made since the late 1990s.

A probabilistic logic network (PLN) is a conceptual, mathematical and computational approach to uncertain inference; inspired by logic programming, but using probabilities in place of crisp (true/false) truth values, and fractional uncertainty in place of crisp known/unknown values. In order to carry out effective reasoning in real-world circumstances, artificial intelligence software must robustly handle uncertainty. However, previous approaches to uncertain inference do not have the breadth of scope required to provide an integrated treatment of the disparate forms of cognitively critical uncertainty as they manifest themselves within the various forms of pragmatic inference. Going beyond prior probabilistic approaches to uncertain inference, PLN is able to encompass within uncertain logic such ideas as induction, abduction, analogy, fuzziness and speculation, and reasoning about time and causality.

Structured prediction or structured (output) learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than scalar discrete or real values.

<span class="mw-page-title-main">Probabilistic soft logic</span>

Probabilistic Soft Logic (PSL) is a statistical relational learning (SRL) framework for modeling probabilistic and relational domains. It is applicable to a variety of machine learning problems, such as collective classification, entity resolution, link prediction, and ontology alignment. PSL combines two tools: first-order logic, with its ability to succinctly represent complex phenomena, and probabilistic graphical models, which capture the uncertainty and incompleteness inherent in real-world knowledge. More specifically, PSL uses "soft" logic as its logical component and Markov random fields as its statistical model. PSL provides sophisticated inference techniques for finding the most likely answer (i.e. the maximum a posteriori (MAP) state). The "softening" of the logical formulas makes inference a polynomial time operation rather than an NP-hard operation.

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents.

In network theory, link prediction is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a social network, predicting co-authorship links in a citation network, and predicting interactions between genes and proteins in a biological network. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time , the goal is to predict the links at time . Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security related applications.

In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

ProbLog is a probabilistic logic programming language that extends Prolog with probabilities. It minimally extends Prolog by adding the notion of a probabilistic fact, which combines the idea of logical atoms and random variables. Similarly to Prolog, ProbLog can query an atom. While Prolog returns the truth value of the queried atom, ProbLog returns the probability of it being true.

Probabilistic logic programming is a programming paradigm that combines logic programming with probabilities.