HPX

Developer(s) The STEllAR Group
LSU Center for Computation and Technology
Initial release 2008
Stable release 1.10.0 / May 29, 2024
Repository github.com/STEllAR-GROUP/hpx
Written in C++
Operating system Microsoft Windows
Linux
Mac OS X
Type Partitioned global address space
Parallel programming
Runtime System
License Boost Software License [1]
Website hpx.stellar-group.org

HPX, short for High Performance ParalleX, is a runtime system for high-performance computing. It is currently under active development by the STE||AR group [2] at Louisiana State University. Focused on scientific computing, it provides an alternative execution model to conventional approaches such as MPI. HPX aims to overcome the challenges MPI faces on increasingly large supercomputers by using asynchronous communication between nodes and lightweight control objects instead of global barriers, allowing application developers to exploit fine-grained parallelism. [3] [4] [5]
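
In current releases this execution model is exposed to application code through futures and asynchronous tasks. The following minimal sketch is not taken from HPX documentation; it assumes a recent HPX release, where hpx::async and hpx::future are part of the public API, though header layout has varied between versions. It spawns recursive work as lightweight tasks:

    // Minimal sketch of HPX's future-based style; assumes a recent HPX release.
    #include <hpx/hpx_main.hpp>   // wraps main() so it runs inside the HPX runtime
    #include <hpx/future.hpp>
    #include <iostream>

    int fib(int n)
    {
        if (n < 2) return n;
        // Spawn one recursive call as a lightweight HPX task; the runtime
        // schedules it on user-level threads rather than OS threads.
        hpx::future<int> lhs = hpx::async(fib, n - 1);
        int rhs = fib(n - 2);
        return lhs.get() + rhs;   // get() suspends this task instead of blocking a core
    }

    int main()
    {
        std::cout << fib(20) << '\n';   // prints 6765
        return 0;
    }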

HPX is developed in idiomatic C++ and released as open source under the Boost Software License, which allows usage in commercial applications.

Applications

Though designed as a general-purpose environment for high-performance computing, HPX has primarily been used in scientific computing applications, such as astrophysics simulations of N-body systems, [6] neutron star evolution [7] and stellar mergers (through the Octo-Tiger project), [8] [9] [10] as well as in the LibGeoDecomp library for geometric decomposition codes, [11] [12] [13] a task-based implementation of peridynamics [14] and the Phylanx distributed array processing toolkit. [15] [16] [17]

Related Research Articles

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
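
As a toy illustration of task parallelism in standard C++ (an invented example, not tied to any particular runtime), a large problem is divided into two smaller ones that are solved at the same time:

    // Split a large sum into halves that run concurrently.
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main()
    {
        std::vector<long> data(1'000'000, 1);
        auto mid = data.begin() + data.size() / 2;

        // The lower half runs on another thread while this thread does the upper half.
        auto lower = std::async(std::launch::async,
            [&] { return std::accumulate(data.begin(), mid, 0L); });
        long upper = std::accumulate(mid, data.end(), 0L);

        std::cout << lower.get() + upper << '\n';   // prints 1000000
        return 0;
    }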

Neuromorphic computing is an approach to computing that is inspired by the structure and function of the human brain. A neuromorphic computer/chip is any device that uses physical artificial neurons to do computations. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software systems that implement models of neural systems. The implementation of neuromorphic computing on the hardware level can be realized by oxide-based memristors, spintronic memories, threshold switches, and transistors, among others. Training software-based neuromorphic systems of spiking neural networks can be achieved using error backpropagation, e.g., using Python-based frameworks such as snnTorch, or using canonical learning rules from the biological learning literature, e.g., using BindsNET.

General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.

The Ravenscar profile is a subset of the Ada tasking features designed for safety-critical hard real-time computing. It was defined by a separate technical report in Ada 95; it is now part of the Ada 2012 Standard. It has been named after the English village of Ravenscar, the location of the 8th International Real-Time Ada Workshop.

In computing, a parallel programming model is an abstraction of parallel computer architecture with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality (how well a range of different problems can be expressed for a variety of different architectures) and its performance (how efficiently the compiled programs can execute). The implementation of a parallel programming model can take the form of a library invoked from a programming language or of an extension to an existing language.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
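
The sketch below is a hand-rolled C++ timing probe of the kind a profiler automates (a hypothetical illustration, not any particular tool's API): it records call counts and cumulative wall-clock time per function:

    // Scope-based probe: each instance times one call and updates shared statistics.
    #include <chrono>
    #include <iostream>
    #include <map>
    #include <string>

    struct Probe {
        inline static std::map<std::string, std::pair<long, double>> stats;
        std::string name;
        std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        explicit Probe(std::string n) : name(std::move(n)) {}
        ~Probe() {
            double ms = std::chrono::duration<double, std::milli>(
                std::chrono::steady_clock::now() - start).count();
            auto& [calls, total] = stats[name];
            ++calls;
            total += ms;
        }
    };

    void work() {
        Probe p("work");                       // one sample per call
        volatile double x = 0;
        for (int i = 0; i < 1'000'000; ++i) x = x + i;
    }

    int main() {
        for (int i = 0; i < 5; ++i) work();
        for (auto& [name, s] : Probe::stats)
            std::cout << name << ": " << s.first << " calls, "
                      << s.second << " ms total\n";
    }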

Software visualization or software visualisation refers to the visualization of information of and related to software systems—either the architecture of their source code or metrics of their runtime behavior—and their development process by means of static, interactive or animated 2-D or 3-D visual representations of their structure, execution, behavior, and evolution.

Charm++ is a parallel object-oriented programming paradigm based on C++ and developed in the Parallel Programming Laboratory at the University of Illinois at Urbana–Champaign. Charm++ is designed with the goal of enhancing programmer productivity by providing a high-level abstraction of a parallel program while at the same time delivering good performance on a wide variety of underlying hardware platforms. Programs written in Charm++ are decomposed into a number of cooperating message-driven objects called chares. When a programmer invokes a method on an object, the Charm++ runtime system sends a message to the invoked object, which may reside on the local processor or on a remote processor in a parallel computation. This message triggers the execution of code within the chare to handle the message asynchronously.
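
The sketch below is not real Charm++ code (which requires a .ci interface file and the charmc toolchain); it is a plain C++ analogy of the core idea that invoking a method on a chare enqueues a message which the runtime later delivers asynchronously:

    // Message-driven object in the style of a chare: calls become queued messages.
    #include <functional>
    #include <iostream>
    #include <queue>

    class Chare {
        std::queue<std::function<void()>> mailbox;   // pending messages
    public:
        // "Invoking a method" only enqueues a message; nothing runs yet.
        void send(std::function<void()> msg) { mailbox.push(std::move(msg)); }
        // A real runtime's scheduler would drain mailboxes across processors.
        void deliver_all() {
            while (!mailbox.empty()) { mailbox.front()(); mailbox.pop(); }
        }
    };

    int main() {
        Chare c;
        c.send([] { std::cout << "handled asynchronously\n"; });
        c.send([] { std::cout << "messages run when scheduled, not when sent\n"; });
        c.deliver_all();   // stands in for the runtime's message loop
    }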

Microsoft Robotics Developer Studio is a discontinued Windows-based environment for robot control and simulation that was aimed at academic, hobbyist, and commercial developers and handled a wide variety of robot hardware. It required Windows 7 or later.

Compute Unified Device Architecture (CUDA) is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism and GPU device-specific operations (such as moving data between the CPU and the GPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels. In addition to drivers and runtime kernels, the CUDA platform includes compilers, libraries and developer tools to help programmers accelerate their applications.
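
As a minimal illustration of that pattern (an invented example; error handling omitted, compiled with nvcc), the following CUDA C++ sketch moves data to the GPU, launches a kernel across many threads, and copies the result back:

    // Scale a vector on the GPU: one thread per element.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float* v, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n) v[i] *= factor;
    }

    int main()
    {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        float* dev;
        cudaMalloc(&dev, n * sizeof(float));                            // device buffer
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);                  // kernel launch
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);

        std::printf("%f\n", host[0]);   // prints 2.000000
        return 0;
    }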

Özalp Babaoğlu is a Turkish computer scientist. He was professor of computer science at the University of Bologna, Italy until 2022. He received a Ph.D. in 1981 from the University of California at Berkeley. He is the recipient of the 1982 Sakrison Memorial Award, the 1989 UNIX International Recognition Award and the 1993 USENIX Association Lifetime Achievement Award for his contributions to the UNIX system community and to open industry standards. Before moving to Bologna in 1988, Babaoğlu was an associate professor in the Department of Computer Science at Cornell University. He has participated in several European research projects in distributed computing and complex systems. Babaoğlu is an ACM Fellow and has served as a resident fellow of the Institute of Advanced Studies at the University of Bologna and on the editorial boards for ACM Transactions on Computer Systems, ACM Transactions on Autonomous and Adaptive Systems and Springer-Verlag Distributed Computing.

In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others. Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013). Two common examples of graph partitioning are minimum cut and maximum cut problems.
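
As a small worked example in C++ (the graph and the partition are invented for illustration), the edges whose endpoints fall in different groups form the cut, which a good partition keeps small:

    // Count the edges that cross between the two groups of a two-way partition.
    #include <iostream>
    #include <utility>
    #include <vector>

    int main()
    {
        // Nodes 0..5 and the edges of the original graph.
        std::vector<std::pair<int, int>> edges =
            {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};

        std::vector<int> part = {0, 0, 0, 1, 1, 1};   // group of each node

        int cut = 0;
        for (auto [u, v] : edges)
            if (part[u] != part[v]) ++cut;            // edge crosses the partition

        std::cout << "cut edges: " << cut << '\n';    // prints 1 (only edge {2,3})
        return 0;
    }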

Reverse computation is a software application of the concept of reversible computing.

MilkyWay@home is a volunteer computing project in the astrophysics category, running on the Berkeley Open Infrastructure for Network Computing (BOINC) platform. Using spare computing power from over 38,000 computers run by over 27,000 active volunteers as of November 2011, the MilkyWay@home project aims to generate accurate three-dimensional dynamic models of stellar streams in the immediate vicinity of the Milky Way. With SETI@home and Einstein@home, it is the third computing project of this type that has the investigation of phenomena in interstellar space as its primary purpose. Its secondary objective is to develop and optimize algorithms for volunteer computing.

An object code optimizer, sometimes also known as a post pass optimizer or, for small sections of code, peephole optimizer, forms part of a software compiler. It takes the output from the source language compile step - the object code or binary file - and tries to replace identifiable sections of the code with replacement code that is more algorithmically efficient.
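
As a toy sketch of the idea (the instruction set is invented, not any real compiler's intermediate representation), a peephole pass scans the instruction stream and replaces a recognizable pattern with a cheaper equivalent:

    // Strength reduction as a peephole rewrite: multiply-by-two becomes a shift.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Insn { std::string op; int operand; };

    void peephole(std::vector<Insn>& code)
    {
        for (auto& insn : code)
            if (insn.op == "MUL" && insn.operand == 2)
                insn = {"SHL", 1};   // x * 2  ==>  x << 1
    }

    int main()
    {
        std::vector<Insn> code = {{"LOAD", 0}, {"MUL", 2}, {"STORE", 0}};
        peephole(code);
        for (const auto& i : code)
            std::cout << i.op << ' ' << i.operand << '\n';   // LOAD 0, SHL 1, STORE 0
    }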

MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision based on multiresolution analysis and separated representations.

SpiNNaker is a massively parallel, manycore supercomputer architecture designed by the Advanced Processor Technologies Research Group (APT) at the Department of Computer Science, University of Manchester. It is composed of 57,600 processing nodes, each with 18 ARM9 processors and 128 MB of mobile DDR SDRAM, totalling 1,036,800 cores and over 7 TB of RAM. The computing platform is based on spiking neural networks, useful in simulating the human brain.

CloudSim is a framework for modeling and simulation of cloud computing infrastructures and services. Originally built primarily at the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia, CloudSim has become one of the most popular open-source cloud simulators in research and academia. CloudSim is completely written in Java. The latest version of CloudSim is CloudSim v6.0.0-beta on GitHub.

An event camera, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor that responds to local changes in brightness. Event cameras do not capture images using a shutter as conventional (frame) cameras do. Instead, each pixel inside an event camera operates independently and asynchronously, reporting changes in brightness as they occur, and staying silent otherwise.

Manish Parashar is a Presidential Professor in the School of Computing, Director of the Scientific Computing and Imaging (SCI) Institute and Chair in Computational Science and Engineering at the University of Utah. He also currently serves as Office Director in the US National Science Foundation’s Office of Advanced Cyberinfrastructure. Parashar is the editor-in-chief of IEEE Transactions on Parallel and Distributed Systems, and Founding Chair of the IEEE Technical Community on High Performance Computing. He is an AAAS Fellow, ACM Fellow, and IEEE Fellow.

References

  1. "License", Boost Software License – Version 1.0, boost.org, retrieved 2012-07-30
  2. "About the STE||AR Group" . Retrieved 17 April 2019.
  3. Kaiser, Hartmut; Brodowicz, Maciek; Sterling, Thomas (2009). "ParalleX an Advanced Parallel Execution Model for Scaling-Impaired Applications". 2009 International Conference on Parallel Processing Workshops. pp. 394–401. doi:10.1109/icppw.2009.14. ISBN   978-1-4244-4923-1. S2CID   898158.
  4. Wagle, Bibek; Kellar, Samuel; Serio, Adrian; Kaiser, Hartmut (2018). "Methodology for Adaptive Active Message Coalescing in Task Based Runtime Systems". 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). pp. 1133–1140. doi:10.1109/IPDPSW.2018.00173. ISBN   978-1-5386-5555-9. S2CID   51921994.
  5. 1 2 Wagle, Bibek; Monil, Mohammad Alaul Haque; Huck, Kevin; Malony, Allen D.; Serio, Adrian; Kaiser, Hartmut (2019). "Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems". Proceedings of the 48th International Conference on Parallel Processing. pp. 1–10. doi:10.1145/3337821.3337915. ISBN   9781450362955. S2CID   198963569.
  6. C. Dekate, M. Anderson, M. Brodowicz, H. Kaiser, B. Adelstein-Lelbach and T. Sterling (2012). "Improving the Scalability of Parallel N-body Applications with an Event-driven Constraint-based Execution Model". International Journal of High Performance Computing Applications. 26 (3): 319–332. arXiv: 1109.5190 . doi:10.1177/1094342012440585. S2CID   9556798.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  7. M. Anderson, T. Sterling, H. Kaiser and D. Neilsen (2011). "Neutron Star Evolutions using Tabulated Equations of State with a New Execution Model" (PDF). American Physical Society April 2012 Meeting.{{cite web}}: CS1 maint: multiple names: authors list (link)
  8. D. Pfander, G. Daiß, D. Marcello, H. Kaiser, D. Pflüger, David (2018). "Accelerating Octo-Tiger: Stellar Mergers on Intel Knights Landing with HPX". DHPCC++ Conference 2018 Hosted by IWOCL. doi:10.1145/3204919.3204938. S2CID   21126354.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  9. Marcello, Dominic; Daiß, Gregor; Parsa Amini; Kaiser, Hartmut; Diehl, Patrick; Wash, Bryce Adelstein Lelbach Aka; Heller, Thomas; Shibersag; Huck, Kevin; Biddiscombe, John; Schäfer, Andreas (2019-04-17), STEllAR-GROUP/octotiger Repository on GitHub, The STE||AR Group, doi:10.5281/zenodo.5093174 , retrieved 2019-04-17
  10. Heller, Thomas; Lelbach, Bryce Adelstein; Huck, Kevin A; Biddiscombe, John; Grubel, Patricia; Koniges, Alice E; Kretz, Matthias; Marcello, Dominic; Pfander, David (2019-02-14). "Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two stars". The International Journal of High Performance Computing Applications. 33 (4): 699–715. doi: 10.1177/1094342018819744 . ISSN   1094-3420. OSTI   1524389.
  11. "LibGeoDecomp – Petascale Computer Simulations". www.libgeodecomp.org. Retrieved 2019-04-17.
  12. A library for C++/Fortran computer simulations (e.g. stencil codes, mesh-free, unstructured grids, n-body & particle methods). Scales from smartphones to petascale supercomputers (e.g. Titan, T.., The STE||AR Group, 2019-04-06, retrieved 2019-04-17
  13. A. Schäfer, D. Fey (2008). "LibGeoDecomp: A Grid-Enabled Library for Geometric Decomposition Codes". Recent Advances in Parallel Virtual Machine and Message Passing Interface. Lecture Notes in Computer Science. Vol. 5205. pp. 285–294. doi:10.1007/978-3-540-87475-1_39. ISBN   978-3-540-87474-4.
  14. Diehl, Patrick; Jha, Prashant K.; Kaiser, Hartmut; Lipton, Robert; Levesque, Martin (2020). "An asynchronous and task-based implementation of peridynamics utilizing HPX—the C++ standard library for parallelism and concurrency". SN Applied Sciences. 2 (12). arXiv: 1806.06917 . doi: 10.1007/s42452-020-03784-x . S2CID   227240479.
  15. "Phylanx – A Distributed Array Toolkit" . Retrieved 2019-04-17.
  16. An Asynchronous Distributed C++ Array Processing Toolkit: STEllAR-GROUP/phylanx, The STE||AR Group, 2019-04-16, retrieved 2019-04-17
  17. Tohid, R.; Wagle, Bibek; Shirzad, Shahrzad; Diehl, Patrick; Serio, Adrian; Kheirkhahan, Alireza; Amini, Parsa; Williams, Katy; Isaacs, Kate; Huck, Kevin; Brandt, Steven; Kaiser, Hartmut (2018). "Asynchronous Execution of Python Code on Task-Based Runtime Systems". 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). pp. 37–45. arXiv: 1810.07591 . doi:10.1109/ESPM2.2018.00009. ISBN   978-1-72810-178-1. S2CID   52988499.