Nexus (data format)

NeXus is a data format for experimental science that is commonly used in the neutron, X-ray, and muon scientific communities. It is being developed as an international standard by scientists and programmers representing major scientific facilities in Europe, Asia, Australia, and North America in order to facilitate greater cooperation in the analysis and visualization of scientific data. Technically, NeXus is a data model rather than a file format, because it describes how data should be organised and structured within a file and has little to say about how that data is encoded for storage. A NeXus file can be mapped into many different container formats, though the preferred and best supported backend is HDF5. [1] XML is used mainly for demonstration purposes.

Early history and motivation

By the early 1990s, several groups of scientists in the fields of neutron and X-ray science were frustrated that each of the instruments they worked with had a locally defined format for recording experimental data. With so many formats, much of the scientists' time was wasted writing import readers for processing and analysis programs. As is common, the information recorded from each instrument in a data file evolves over time, with compromises driven by new features and limitations of the changing hardware. Many of these formats lacked the generality to accommodate the new data to be stored, so yet another format was devised. In such environments, the documentation of each generation of data format is often lacking.

Three parallel developments led to the creation of NeXus:

These scientists proposed methods to store data using a self-describing, extensible format that was already in broad use in other scientific disciplines. Their proposals formed the basis for the current design of the NeXus standard, which was developed across a series of workshops organized by Ray Osborn (ANL) and attended by representatives of a range of neutron and X-ray facilities. The NeXus API was released in late 1997. [2]

Main features

NeXus is primarily concerned with how data is organised within a file. To achieve this, NeXus provides: [3]

The NeXus format is composed of "Base Class" objects that represent various types of hardware and other convenient groupings of information, such as a geometry or the state of a beam at a given position. These Base Classes provide a dictionary of terms that may be used to describe specific properties in an instance of that class. They clarify what each term means and which specific name should be used for something that might otherwise have a number of equivalent names. The way in which Base Classes are combined is then given by an Application Definition, which describes the hierarchical structure, the minimum set of required information, and the optional additions for a type of experiment. While the documented NeXus philosophy guides Application Definitions towards a shared structure, an Application Definition is free to diverge from the others in order to suit the needs of the community it intends to serve.
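The relationship between Base Classes and an Application Definition can be illustrated with a small sketch. It is not the NeXus validation tooling: the `Group` type, the `missing_fields` helper and the `TOY_DEFINITION` table are hypothetical names invented for this example, and the required fields listed are illustrative assumptions rather than the contents of any real Application Definition. Only the class names `NXentry`, `NXsample` and `NXdetector` are taken from NeXus itself.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A node in a NeXus-style hierarchy: a named group tagged with a base class."""
    name: str
    nx_class: str                                   # e.g. "NXentry", "NXsample"
    fields: dict = field(default_factory=dict)      # field name -> value
    children: list = field(default_factory=list)    # nested Group objects


# Hypothetical application definition: for each base class, the field names
# that a conforming file must provide. Real definitions are far richer.
TOY_DEFINITION = {
    "NXentry": {"title", "start_time"},
    "NXsample": {"name"},
    "NXdetector": {"data"},
}


def missing_fields(group, definition, path=""):
    """Return the paths of all required fields absent from the hierarchy."""
    here = f"{path}/{group.name}"
    required = definition.get(group.nx_class, set())
    problems = [f"{here}/{name}" for name in sorted(required - set(group.fields))]
    for child in group.children:
        problems.extend(missing_fields(child, definition, here))
    return problems


entry = Group("entry", "NXentry", {"title": "demo scan"}, [
    Group("sample", "NXsample", {"name": "silicon"}),
    Group("detector", "NXdetector"),
])

print(missing_fields(entry, TOY_DEFINITION))
# prints ['/entry/start_time', '/entry/detector/data']
```

The base class supplies the vocabulary (which names are meaningful inside a group of that class), while the definition decides which of those names are mandatory for a given kind of experiment.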

NeXus strongly encourages data files to contain a default dataset that can be easily plotted, along with a full description of the experiment in meaningful physical terms, so that no extra knowledge is required to interpret the file contents.
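How a reader might locate such a default dataset can be sketched as follows. The nested dictionaries stand in for a real container such as HDF5, and the `find_default_plot` helper and `toy_file` layout are hypothetical. The attribute names `default`, `signal` and `axes` follow the convention described in the NeXus documentation as understood here, which this article does not itself spell out, so treat them as an assumption.

```python
def find_default_plot(root):
    """Follow `default` attributes down to a data group, then return the
    path and values of the field named by that group's `signal` attribute."""
    node, path = root, ""
    while "default" in node.get("attrs", {}):
        name = node["attrs"]["default"]
        node = node["groups"][name]
        path = f"{path}/{name}"
    signal = node["attrs"]["signal"]
    return f"{path}/{signal}", node["fields"][signal]


# Toy in-memory stand-in for a file: groups carry attributes, fields hold data.
toy_file = {
    "attrs": {"default": "entry"},
    "groups": {
        "entry": {
            "attrs": {"NX_class": "NXentry", "default": "data"},
            "groups": {
                "data": {
                    "attrs": {
                        "NX_class": "NXdata",
                        "signal": "counts",
                        "axes": "two_theta",
                    },
                    "fields": {
                        "counts": [12, 40, 95, 38, 10],
                        "two_theta": [10.0, 10.5, 11.0, 11.5, 12.0],
                    },
                }
            },
        }
    },
}

path, values = find_default_plot(toy_file)
print(path)    # prints /entry/data/counts
print(values)  # prints [12, 40, 95, 38, 10]
```

Because the file itself names what to plot, a generic viewer can show something sensible without instrument-specific knowledge.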

Community

The NeXus community typically interacts through the NeXus mailing lists, monthly teleconferences and an annual meeting.

Governance

The development of NeXus is overseen by the NeXus International Advisory Committee (NIAC). [4] The NIAC seeks a balanced representation of the international community. Most major neutron, X-ray and muon facilities have appointed delegates, and other facilities and interested parties are invited to join. The NIAC reviews any proposed amendments to the NeXus base classes and application definitions, and holds votes to ratify changes. Full meetings of the NIAC are held every two years, usually in conjunction with the NOBUGS conferences.

References

  1. Wuttke, J.; Wintersberger, E.; Watts, B.; Suzuki, J.; Richter, T.; Peterson, P. F.; Osborn, R.; Männicke, D.; Jemian, P. R. (2015-02-01). "The NeXus data format". Journal of Applied Crystallography. 48 (1): 301–305. doi:10.1107/S1600576714027575. ISSN 1600-5767. PMC 4453170. PMID 26089752.
  2. "Motivations for the NeXus standard in the Scientific Community".
  3. "NeXus Introduction". NeXus Documentation. 12 August 2019.
  4. "NIAC". NeXus International Advisory Committee. 12 August 2019.