Graph (abstract data type)

Last updated
A directed graph with three vertices (blue circles) and three edges (black arrows). Directed.svg
A directed graph with three vertices (blue circles) and three edges (black arrows).

In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics.

Contents

A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges (also called links or lines), and for a directed graph are also known as edges but also sometimes arrows or arcs. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references.

A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).

Operations

UML class diagram of a Graph (abstract data type) UML graph ADT.svg
UML class diagram of a Graph (abstract data type)

The basic operations provided by a graph data structure G usually include: [1]

Structures that associate values to the edges usually also provide: [1]

Common data structures for graph representation

Adjacency list [2]
Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices.
Adjacency matrix [3]
A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices.
Incidence matrix [4]
A two-dimensional matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate the incidence relation between the vertex at a row and edge at a column.

The following table gives the time complexity cost of performing various operations on graphs, for each of these representations, with |V| the number of vertices and |E| the number of edges.[ citation needed ] In the matrix representations, the entries encode the cost of following an edge. The cost of edges that are not present are assumed to be ∞.

Adjacency listAdjacency matrixIncidence matrix
Store graph
Add vertex
Add edge
Remove vertex
Remove edge
Are vertices x and y adjacent (assuming that their storage positions are known)?
RemarksSlow to remove vertices and edges, because it needs to find all vertices or edgesSlow to add or remove vertices, because matrix must be resized/copiedSlow to add or remove vertices and edges, because matrix must be resized/copied

Adjacency lists are generally preferred for the representation of sparse graphs, while an adjacency matrix is preferred if the graph is dense; that is, the number of edges is close to the number of vertices squared, , or if one must be able to quickly look up if there is an edge connecting two vertices. [5] [6]

More efficient representation of adjacency sets

The time complexity of operations in the adjacency list representation can be improved by storing the sets of adjacent vertices in more efficient data structures, such as hash tables or balanced binary search trees (the latter representation requires that vertices are identified by elements of a linearly ordered set, such as integers or character strings). A representation of adjacent vertices via hash tables leads to an amortized average time complexity of to test adjacency of two given vertices and to remove an edge and an amortized average time complexity [7] of to remove a given vertex x of degree . The time complexity of the other operations and the asymptotic space requirement do not change.

Parallel representations

The parallelization of graph problems faces significant challenges: Data-driven computations, unstructured problems, poor locality and high data access to computation ratio. [8] [9] The graph representation used for parallel architectures plays a significant role in facing those challenges. Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its scalability. In the following, shared and distributed memory architectures are considered.

Shared memory

In the case of a shared memory model, the graph representations used for parallel processing are the same as in the sequential case, [10] since parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory.

Distributed memory

In the distributed memory model, the usual approach is to partition the vertex set of the graph into sets . Here, is the amount of available processing elements (PE). The vertex set partitions are then distributed to the PEs with matching index, additionally to the corresponding edges. Every PE has its own subgraph representation, where edges with an endpoint in another partition require special attention. For standard communication interfaces like MPI, the ID of the PE owning the other endpoint has to be identifiable. During computation in a distributed graph algorithms, passing information along these edges implies communication. [10]

Partitioning the graph needs to be done carefully - there is a trade-off between low communication and even size partitioning [11] But partitioning a graph is a NP-hard problem, so it is not feasible to calculate them. Instead, the following heuristics are used.

1D partitioning: Every processor gets vertices and the corresponding outgoing edges. This can be understood as a row-wise or column-wise decomposition of the adjacency matrix. For algorithms operating on this representation, this requires an All-to-All communication step as well as message buffer sizes, as each PE potentially has outgoing edges to every other PE. [12]

2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle , where and are the amount of processing elements in each row and column, respectively. Then each processor gets a submatrix of the adjacency matrix of dimension . This can be visualized as a checkerboard pattern in a matrix. [12] Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the amount of communication partners for each PE to out of possible ones.

Compressed representations

Graphs with trillions of edges occur in machine learning, social network analysis, and other areas. Compressed graph representations have been developed to reduce I/O and memory requirements. General techniques such as Huffman coding are applicable, but the adjacency list or adjacency matrix can be processed in specific ways to increase efficiency. [13]

Graph traversal

Breadth-first search (BFS) and depth-first search (DFS) are two closely-related approaches that are used for exploring all of the nodes in a given connected component. Both start with an arbitrary node, the "root". [14]

See also

Related Research Articles

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

<span class="mw-page-title-main">Kruskal's algorithm</span> Minimum spanning forest algorithm that greedily adds edges

Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a greedy algorithm that in each step adds to the forest the lowest-weight edge that will not form a cycle. The key steps of the algorithm are sorting and the use of a disjoint-set data structure to detect cycles. Its running time is dominated by the time to sort all of the graph edges by their weight.

<span class="mw-page-title-main">Prim's algorithm</span> Method for finding minimum spanning trees

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

<span class="mw-page-title-main">Breadth-first search</span> Algorithm to search the nodes of a graph

Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.

<span class="mw-page-title-main">Depth-first search</span> Search algorithm

Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node and explores as far as possible along each branch before backtracking. Extra memory, usually a stack, is needed to keep track of the nodes discovered so far along a specified branch which helps in backtracking of the graph.

<span class="mw-page-title-main">Directed acyclic graph</span> Directed graph with no directed cycles

In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges, with each edge directed from one vertex to another, such that following those directions will never form a closed loop. A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions. DAGs have numerous scientific and computational applications, ranging from biology to information science to computation (scheduling).

<span class="mw-page-title-main">Hypergraph</span> Generalization of graph theory

In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.

In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.

<span class="mw-page-title-main">Graph isomorphism</span> Bijection between the vertex set of two graphs

In graph theory, an isomorphism of graphsG and H is a bijection between the vertex sets of G and H

<span class="mw-page-title-main">Graph (discrete mathematics)</span> Vertices connected in pairs by edges

In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects are represented by abstractions called vertices and each of the related pairs of vertices is called an edge. Typically, a graph is depicted in diagrammatic form as a set of dots or circles for the vertices, joined by lines or curves for the edges.

<span class="mw-page-title-main">Adjacency list</span> Data structure representing a graph

In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This is one of several commonly used representations of graphs for use in computer programs.

<span class="mw-page-title-main">Spanning tree</span> Tree which includes all vertices of a graph

In the mathematical field of graph theory, a spanning treeT of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G. In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree. If all of the edges of G are also edges of a spanning tree T of G, then G is a tree and is identical to T.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

In the mathematical field of graph theory, the Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph. Named after Pierre-Simon Laplace, the graph Laplacian matrix can be viewed as a matrix form of the negative discrete Laplace operator on a graph approximating the negative continuous Laplacian obtained by the finite difference method.

<span class="mw-page-title-main">Centrality</span> Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex can reach a vertex if there exists a sequence of adjacent vertices which starts with and ends with .

<span class="mw-page-title-main">SPQR tree</span> Representation of a graphs triconnected components

In graph theory, a branch of mathematics, the triconnected components of a biconnected graph are a system of smaller graphs that describe all of the 2-vertex cuts in the graph. An SPQR tree is a tree data structure used in computer science, and more specifically graph algorithms, to represent the triconnected components of a graph. The SPQR tree of a graph may be constructed in linear time and has several applications in dynamic graph algorithms and graph drawing.

<span class="mw-page-title-main">Directed graph</span> Graph with oriented edges

In mathematics, and more specifically in graph theory, a directed graph is a graph that is made up of a set of vertices connected by directed edges, often called arcs.

In graph theory, the modular decomposition is a decomposition of a graph into subsets of vertices called modules. A module is a generalization of a connected component of a graph. Unlike connected components, however, one module can be a proper subset of another. Modules therefore lead to a recursive (hierarchical) decomposition of the graph, instead of just a partition.

The breadth-first-search algorithm is a way to explore the vertices of a graph layer by layer. It is a basic algorithm in graph theory which can be used as a part of other graph algorithms. For instance, BFS is used by Dinic's algorithm to find maximum flow in a graph. Moreover, BFS is also one of the kernel algorithms in Graph500 benchmark, which is a benchmark for data-intensive supercomputing problems. This article discusses the possibility of speeding up BFS through the use of parallel computing.

References

  1. 1 2 See, e.g. Goodrich & Tamassia (2015), Section 13.1.2: Operations on graphs, p. 360. For a more detailed set of operations, see Mehlhorn, K.; Näher, S. (1999). "Chapter 6: Graphs and their data structures". LEDA: A platform for combinatorial and geometric computing (PDF). Cambridge University Press. pp. 240–282.
  2. Cormen et al. (2001), pp. 528–529; Goodrich & Tamassia (2015), pp. 361-362.
  3. Cormen et al. (2001), pp. 529–530; Goodrich & Tamassia (2015), p. 363.
  4. Cormen et al. (2001), Exercise 22.1-7, p. 531.
  5. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Section 22.1: Representations of graphs". Introduction to Algorithms (Second ed.). MIT Press and McGraw-Hill. pp. 527–531. ISBN   0-262-03293-7.
  6. Goodrich, Michael T.; Tamassia, Roberto (2015). "Section 13.1: Graph terminology and representations". Algorithm Design and Applications. Wiley. pp. 355–364. ISBN   978-1-118-33591-8.
  7. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). Massachusetts Institute of Technology. pp. 253–280. ISBN   978-0-262-03384-8.
  8. Bader, David; Meyerhenke, Henning; Sanders, Peter; Wagner, Dorothea (January 2013). Graph Partitioning and Graph Clustering. Contemporary Mathematics. Vol. 588. American Mathematical Society. doi:10.1090/conm/588/11709. ISBN   978-0-8218-9038-7.
  9. Lumsdaine, Andrew; Gregor, Douglas; Hendrickson, Bruce; Berry, Jonathan (March 2007). "Challenges in Parallel Graph Processing". Parallel Processing Letters. 17 (1): 5–20. doi:10.1142/s0129626407002843. ISSN   0129-6264.
  10. 1 2 Sanders, Peter; Mehlhorn, Kurt; Dietzfelbinger, Martin; Dementiev, Roman (2019). Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox. Springer International Publishing. ISBN   978-3-030-25208-3.
  11. "Parallel Processing of Graphs" (PDF).
  12. 1 2 Buluç, A.; Madduri, Kamesh (2011). "Applications". Parallel breadth-first search on distributed memory systems. 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. CiteSeerX   10.1.1.767.5248 . doi:10.1145/2063384.2063471. ISBN   978-1-4503-0771-0. S2CID   6540738.
  13. Besta, Maciej; Hoefler, Torsten (27 April 2019). "Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations". arXiv: 1806.01799 [cs.DS].
  14. Purti (July–September 2018). "Graph Traversals and its Applications" (PDF). International Journal of Research and Analytical Reviews. 5 (3): 2.