COM Structured Storage

Last updated

COM Structured Storage (variously also known as COM structured storage or OLE structured storage) is a technology developed by Microsoft as part of its Windows operating system for storing hierarchical data within a single file. Strictly speaking, the term structured storage refers to a set of COM interfaces that a conforming implementation must provide, and not to a specific implementation, nor to a specific file format (in fact, a structured storage implementation need not store its data in a file at all). In addition to providing a hierarchical structure for data, structured storage may also provide a limited form of transactional support for data access. Microsoft provides an implementation that supports transactions, as well as one that does not (called simple-mode storage, the latter implementation is limited in other ways as well, although it performs better).

Contents

Structured storage is widely used in Microsoft Office applications, although newer releases (starting with Office 2007) use the XML-based Office Open XML by default. It is also an important part of both COM and the related Object Linking and Embedding (OLE) technologies. Other notable applications of structured storage include SQL Server, the Windows shell, and many third-party CAD programs.

Motivation

Structured storage addresses some inherent difficulties of storing multiple data objects within a single file. One difficulty arises when an object persisted in the file changes in size due to an update. If the application that is reading/writing the file expects the objects in the file to remain in a certain order, everything following that object's representation in the file may need to be shifted backward to make room if the object grows, or forward to fill in the space left over if the object shrinks. If the file is large, this could result in a costly operation. Of course, there are many possible solutions to this difficulty, but often the application programmer does not want to deal with low level details such as binary file formats.

Structured storage provides an abstraction known as a stream, represented by the interface IStream. A stream is conceptually very similar to a file, and the IStream interface provides methods for reading and writing similar to file input/output. A stream could reside in memory, within a file, within another stream, etc., depending on the implementation. Another important abstraction is that of a storage, represented by the interface IStorage. A storage is conceptually very similar to a directory on a file system. Storages can contain streams, as well as other storages.

If an application wishes to persist several data objects to a file, one way to do so would be to open an IStorage that represents the contents of that file and save each of the objects within a single IStream. One way to accomplish the latter is through the standard COM interface IPersistStream. OLE depends heavily on this model to embed objects within documents.

Format

Microsoft's implementation uses a file format known as compound files, and all of the widely deployed structured storage implementations read and write this format. Compound files use a FAT-like structure to represent storages and streams. Chunks of the file, known as sectors (these may or may not correspond to sectors of the underlying file system), are allocated as needed to add new streams and to increase the size of existing streams. If streams are deleted or shrink, leaving unallocated sectors, those sectors can be reused for new streams.

The following applications use the OLE Structured Storage (Compound Document Format)

Native Structured Storage

During the beta testing phase of Windows 2000, it included a feature titled Native Structured Storage (NSS) for storage of Structured Storage documents (like the binary Microsoft Office formats and the thumbs.db file Windows Explorer uses to cache thumbnails) with each Stream that makes up a document stored in a separate NTFS data stream. It included utilities that automatically split up the streams in a regular Structured Storage document into NTFS data streams and vice versa. However, the feature was withdrawn after Beta 3 due to incompatibilities with other OS components, and any NSS files automatically converted to the single data stream format. [1]

Implementations

Related Research Articles

In computing, serialization or serialisation is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

VBScript is an Active Scripting language developed by Microsoft that is modeled on Visual Basic. It allows Microsoft Windows system administrators to generate powerful tools for managing computers with error handling, subroutines, and other advanced programming constructs. It can give the user complete control over many aspects of their computing environment.

NTFS is a proprietary journaling file system developed by Microsoft. Starting with Windows NT 3.1, it is the default file system of the Windows NT family.

Object Linking & Embedding (OLE) is a proprietary technology developed by Microsoft that allows embedding and linking to documents and other objects. For developers, it brought OLE Control Extension (OCX), a way to develop and use custom user interface elements. On a technical level, an OLE object is any object that implements the IOleObject interface, possibly along with a wide range of other interfaces, depending on the object's needs.

The resource fork is a fork or section of a file on Apple's classic Mac OS operating system, which was also carried over to the modern macOS for compatibility, used to store structured data along with the unstructured data stored within the data fork.

An object-oriented operating system is an operating system that is designed, structured, and operated using object-oriented programming principles.

WinFS Data storage and management system OS in Windows

WinFS was the code name for a canceled data storage and management system project based on relational databases, developed by Microsoft and first demonstrated in 2003 as an advanced storage subsystem for the Microsoft Windows operating system, designed for persistence and management of structured, semi-structured and unstructured data.

Hierarchical Data Format Set of file formats

Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data. Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.

File system Format or program for storing files and directories

In computing, file system or filesystem is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins. By separating the data into pieces and giving each piece a name, the data is easily isolated and identified. Taking its name from the way paper-based data management system is named, each group of data is called a "file." The structure and logic rules used to manage the groups of data and their names is called a "file system."

In a computer file system, a fork is a set of data associated with a file-system object. File systems without forks only allow a single set of data for the contents, while file systems with forks allow multiple such contents. Every non-empty file must have at least one fork, often of default type, and depending on the file system, a file may have one or more other associated forks, which in turn may contain primary data integral to the file, or just metadata.

OLE DB, an API designed by Microsoft, allows accessing data from a variety of sources in a uniform manner. The API provides a set of interfaces implemented using the Component Object Model (COM); it is otherwise unrelated to OLE. Microsoft originally intended OLE DB as a higher-level replacement for, and successor to, ODBC, extending its feature set to support a wider variety of non-relational databases, such as object databases and spreadsheets that do not necessarily implement.

Microsoft Data Access Components framework

Microsoft Data Access Components is a framework of interrelated Microsoft technologies that allows programmers a uniform and comprehensive way of developing applications that can access almost any data store. Its components include: ActiveX Data Objects (ADO), OLE DB, and Open Database Connectivity (ODBC). There have been several deprecated components as well, such as the Microsoft Jet Database Engine, MSDASQL, and Remote Data Services (RDS). Some components have also become obsolete, such as the former Data Access Objects API and Remote Data Objects.

An NTFS reparse point is a type of NTFS file system object. It is available with the NTFS v3.0 found in Windows 2000 or later versions. Reparse points provide a way to extend the NTFS filesystem. A reparse point contains a reparse tag and data that are interpreted by a filesystem filter identified by the tag. Microsoft includes several default tags including NTFS symbolic links, directory junction points, volume mount points and Unix domain sockets. Also, reparse points are used as placeholders for files moved by Windows 2000's Remote Storage Hierarchical Storage System. They also can act as hard links, but aren't limited to point to files on the same volume: they can point to directories on any local volume. The feature is inherited to ReFS.

Compound File Binary Format (CFBF), also called Compound File, Compound Document format, or Composite Document File V2 (CDF), is a compound document file format for storing numerous files and streams within a single file on a disk. CFBF is developed by Microsoft and is an implementation of Microsoft COM Structured Storage.

Windows Vista has many significant new features compared with previous Microsoft Windows versions, covering most aspects of the operating system.

Component Object Model (COM) is a binary-interface standard for software components introduced by Microsoft in 1993. It is used to enable inter-process communication object creation in a large range of programming languages. COM is the basis for several other Microsoft technologies and frameworks, including OLE, OLE Automation, Browser Helper Object, ActiveX, COM+, DCOM, the Windows shell, DirectX, UMDF and Windows Runtime. The essence of COM is a language-neutral way of implementing objects that can be used in environments different from the one in which they were created, even across machine boundaries. For well-authored components, COM allows reuse of objects with no knowledge of their internal implementation, as it forces component implementers to provide well-defined interfaces that are separated from the implementation. The different allocation semantics of languages are accommodated by making objects responsible for their own creation and destruction through reference-counting. Type conversion casting between different interfaces of an object is achieved through the QueryInterface method. The preferred method of "inheritance" within COM is the creation of sub-objects to which method "calls" are delegated.

Effi is C++ application development framework.

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

References

  1. "What is Native Structured Storage?". Archived from the original on 2007-09-27. Retrieved 2007-12-03.