News Industry Text Format


News Industry Text Format (NITF) is an XML specification designed to standardize the content and structure of individual text news articles.


Usage

The NITF specification defines a standard way to mark up an article's content and structure, as well as a wide variety of metadata that different organizations may choose to use.
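The following minimal sketch in Python shows how this split between metadata and article content looks in practice. The document is a hypothetical, heavily trimmed NITF-style article: the ID, date, and text are invented. The element names follow common NITF usage (head and docdata for metadata, body.head and body.content for the article itself), but they should be checked against the NITF DTD or schema before being relied on. The summarize function is illustrative only and is not part of any NITF tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal NITF-style article. Element names follow common NITF
# usage (head/docdata for metadata, body.head/body.content for the story);
# all values are invented for illustration.
SAMPLE_NITF = """<nitf>
  <head>
    <title>City council approves budget</title>
    <docdata>
      <doc-id id-string="example-0001"/>
      <date.issue norm="20240115T120000Z"/>
    </docdata>
  </head>
  <body>
    <body.head>
      <hedline><hl1>City council approves budget</hl1></hedline>
      <byline>By A. Reporter</byline>
    </body.head>
    <body.content>
      <p>The council voted on Monday.</p>
      <p>The budget takes effect next month.</p>
    </body.content>
  </body>
</nitf>"""


def summarize(nitf_xml: str) -> dict:
    """Pull a few metadata fields and the paragraph text out of an NITF string."""
    root = ET.fromstring(nitf_xml)
    # Metadata lives under <head>, mostly as attributes on <docdata> children.
    doc_id = root.find("head/docdata/doc-id")
    issued = root.find("head/docdata/date.issue")
    return {
        "id": doc_id.get("id-string") if doc_id is not None else None,
        "issued": issued.get("norm") if issued is not None else None,
        # Article structure lives under <body>.
        "headline": root.findtext("body/body.head/hedline/hl1"),
        "byline": root.findtext("body/body.head/byline"),
        "paragraphs": [p.text for p in root.findall("body/body.content/p")],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_NITF))
```

A receiving system could map a dictionary like this onto its own content database, which is the kind of interchange the shared markup is meant to make predictable.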

Multimedia can also be associated with an article, although NITF does not provide for laying out multimedia within the article text. [1] Because NITF files are XML, they can be parsed with standard XML tools and transformed into other formats with XSLT.
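The sketch below, in Python, illustrates both points: an associated image is carried as data rather than positioned in the text, and the article is converted into another format (here, a fragment of HTML). It is a plain-Python stand-in for what an XSLT stylesheet would typically do, not an official NITF transform. The media, media-reference, and media-caption elements follow NITF's media markup as commonly used; the headline, file name, and caption are invented, and nitf_to_html is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical NITF fragment: one paragraph plus an associated photo.
# The file name and caption are invented for illustration.
ARTICLE = """<nitf>
  <body>
    <body.head>
      <hedline><hl1>Storm hits coast</hl1></hedline>
    </body.head>
    <body.content>
      <p>Heavy rain fell overnight.</p>
      <media media-type="image">
        <media-reference source="storm.jpg" mime-type="image/jpeg"/>
        <media-caption>Flooded street at dawn.</media-caption>
      </media>
    </body.content>
  </body>
</nitf>"""


def nitf_to_html(nitf_xml: str) -> str:
    """Render the headline and paragraphs as HTML, then list associated media.

    A plain-Python stand-in for an XSLT transform; not an official mapping.
    """
    root = ET.fromstring(nitf_xml)
    parts = []
    headline = root.findtext("body/body.head/hedline/hl1")
    if headline:
        parts.append(f"<h1>{escape(headline)}</h1>")
    for p in root.iterfind("body/body.content/p"):
        parts.append(f"<p>{escape(p.text or '')}</p>")
    # Media is only associated with the article, not laid out within it,
    # so it is appended after the text instead of being positioned.
    for media in root.iterfind(".//media"):
        ref = media.find("media-reference")
        caption = media.findtext("media-caption", default="")
        if ref is not None:
            src = escape(ref.get("source", ""), quote=True)
            alt = escape(caption, quote=True)
            parts.append(
                f'<figure><img src="{src}" alt="{alt}">'
                f"<figcaption>{escape(caption)}</figcaption></figure>"
            )
    return "\n".join(parts)


if __name__ == "__main__":
    print(nitf_to_html(ARTICLE))
```

In a production pipeline this job would more often be done by an XSLT processor applying a stylesheet to the NITF file; Python is used here only to keep the example self-contained and runnable with the standard library.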

The format is widely used across the news industry. Newspapers such as The New York Times, news agencies such as the Associated Press and Agence France-Presse, and archival services such as LexisNexis use NITF both for transmitting news between organizations and for internal transmission and storage. [2]

NITF complements NewsML-G2, an IPTC XML format for bundling and transmitting news. In addition to DTDs, NITF provides XML Schema (XSD) files for validating NITF documents.

History

NITF was developed jointly by the International Press Telecommunications Council (IPTC) and the Newspaper Association of America, the leading standards organizations of the global and US news industries, respectively. It began as an SGML specification before being recast in XML. [3]


References

  1. "Interactive NITF Documentation". NITF.
  2. "Who's Using NITF". IPTC.
  3. Dumbill, Edd (2000-07-17). "XML in news syndication". XML.com.