Content-based image retrieval

[Figure: General scheme of content-based image retrieval]

Content-based image retrieval, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases (see this survey [1] for a scientific overview of the CBIR field). Content-based image retrieval is opposed to traditional concept-based approaches (see Concept-based image indexing).

"Content-based" means that the search analyzes the contents of the image rather than the metadata such as keywords, tags, or descriptions associated with the image. The term "content" in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. CBIR is desirable because searches that rely purely on metadata are dependent on annotation quality and completeness.

Comparison with metadata searching

An image meta search requires humans to have manually annotated images by entering keywords or metadata into a large database, which can be time-consuming and may not capture the keywords desired to describe the image. The evaluation of the effectiveness of keyword image search is subjective and has not been well defined; CBIR systems face similar challenges in defining success. [2] Keywords also limit the scope of queries to a set of predetermined criteria and, once set up, are less reliable than using the content itself. [3]

History

The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese Electrotechnical Laboratory engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present. [2] [4] Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision. [1]

QBIC - Query By Image Content

The earliest commercial CBIR system was developed by IBM and was called QBIC (Query By Image Content). [5] [6] Recent network- and graph-based approaches have presented a simple and attractive alternative to existing methods. [7]

While the storing of multiple images as part of a single entity preceded the term BLOB (Binary Large OBject), [8] the ability to fully search by content, rather than by description, had to await IBM's QBIC. [3]

VisualRank

VisualRank is a system for finding and ranking images by analysing and comparing their content, rather than searching image names, Web links or other text. Google scientists made VisualRank public in a paper describing the application of PageRank to Google image search, presented at the International World Wide Web Conference in Beijing in 2008. [9]

Technical progress

The interest in CBIR has grown because of the limitations inherent in metadata-based systems, as well as the large range of possible uses for efficient image retrieval. Textual information about images can be easily searched using existing technology, but this requires humans to manually describe each image in the database. This can be impractical for very large databases or for images that are generated automatically, e.g. those from surveillance cameras. It is also possible to miss images that use different synonyms in their descriptions. Systems based on categorizing images in semantic classes like "cat" as a subclass of "animal" can avoid the miscategorization problem, but will require more effort by a user to find images that might be "cats", but are only classified as an "animal". Many standards have been developed to categorize images, but all still face scaling and miscategorization issues. [2]

Initial CBIR systems were developed to search databases based on image color, texture, and shape properties. After these systems were developed, the need for user-friendly interfaces became apparent. Therefore, efforts in the CBIR field started to include human-centered design that tried to meet the needs of the user performing the search. This typically means inclusion of: query methods that may allow descriptive semantics, queries that may involve user feedback, systems that may include machine learning, and systems that may understand user satisfaction levels. [1]

Techniques

Many CBIR systems have been developed, but as of 2006, the problem of retrieving images on the basis of their pixel content remained largely unsolved. [1]

Different query techniques and implementations of CBIR make use of different types of user queries.

Query By Example

QBE (Query By Example) is a query technique [10] that involves providing the CBIR system with an example image that it will then base its search upon. The underlying search algorithms may vary depending on the application, but result images should all share common elements with the provided example. [11]

Example images may be supplied by the user, chosen at random from the collection, or drawn as a rough sketch of the desired image (for example, with blobs of color or general shapes).

This query technique removes the difficulties that can arise when trying to describe images with words.

Semantic retrieval

Semantic retrieval starts with a user making a request like "find pictures of Abraham Lincoln". This type of open-ended task is very difficult for computers to perform, since Lincoln may not always be facing the camera or in the same pose. Many CBIR systems therefore make use of lower-level features like texture, color, and shape. These features are either used in combination with interfaces that allow easier input of the criteria or with databases that have already been trained to match features (such as faces, fingerprints, or shape matching). However, in general, image retrieval requires human feedback in order to identify higher-level concepts. [6]

Relevance feedback (human interaction)

Matching the available CBIR search techniques to the wide range of potential users and their intents can be difficult; an important part of making CBIR successful is the ability to understand user intent. [12] CBIR systems can make use of relevance feedback, where the user progressively refines the search results by marking images in the results as "relevant", "not relevant", or "neutral" to the search query, then repeating the search with the new information. Examples of this type of interface have been developed. [13]
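The relevance-feedback loop described above can be sketched with a Rocchio-style update, a technique borrowed from text retrieval; the function name and weights here are illustrative assumptions, not a specific system's API:

```python
# Rocchio-style relevance feedback on image feature vectors (illustrative
# sketch). The query vector is nudged toward the centroid of images the
# user marked "relevant" and away from those marked "not relevant".

def refine_query(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query feature vector.

    query        -- list of floats (current query features)
    relevant     -- list of feature vectors marked relevant
    non_relevant -- list of feature vectors marked not relevant
    """
    dim = len(query)

    def centroid(vectors):
        # Mean vector of a (possibly empty) set of feedback vectors.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    return [alpha * query[i] + beta * pos[i] - gamma * neg[i] for i in range(dim)]
```

The refined vector is then used to repeat the search, so each feedback round moves the query through feature space toward what the user considers relevant.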

Iterative/machine learning

Machine learning and application of iterative techniques are becoming more common in CBIR. [14]

Other query methods

Other query methods include browsing for example images, navigating customized/hierarchical categories, querying by image region (rather than the entire image), querying by multiple example images, querying by visual sketch, querying by direct specification of image features, and multimodal queries (e.g. combining touch, voice, etc.). [15]

Content comparison using image distance measures

The most common method for comparing two images in content-based image retrieval (typically an example image and an image from the database) is an image distance measure. An image distance measure compares the similarity of two images in various dimensions such as color, texture, and shape. A distance of 0 signifies an exact match with the query, with respect to the dimensions that were considered, while values greater than 0 indicate progressively lower degrees of similarity. Search results can then be sorted by their distance to the queried image. [11] Many measures of image distance (similarity models) have been developed. [16]
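As a minimal sketch, assuming each image has already been reduced to a numeric feature vector, a Euclidean distance and the ranking step might look like this (function names are hypothetical):

```python
import math

# Rank database images by Euclidean distance between feature vectors
# (the vectors can encode any per-image features: color, texture, shape).
# A distance of 0.0 means an exact match in the chosen feature space.

def image_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(query_features, database):
    """database: dict mapping image id -> feature vector.
    Returns (image_id, distance) pairs, closest first."""
    scored = [(img_id, image_distance(query_features, features))
              for img_id, features in database.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

Any other distance (histogram intersection, earth mover's distance, learned metrics) can be dropped in for `image_distance` without changing the ranking step.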

Color

Computing distance measures based on color similarity is achieved by computing a color histogram for each image that identifies the proportion of pixels within an image holding specific values. [2] Examining images based on the colors they contain is one of the most widely used techniques because it can be completed without regard to image size or orientation. [6] However, research has also attempted to segment color proportion by region and by spatial relationship among several color regions. [15]
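A coarse color histogram of the kind described might be computed as follows; the bin count and flat layout are illustrative choices, not a standard:

```python
# Coarse RGB color histogram: each channel is quantized into `bins`
# levels, and the histogram records the *proportion* of pixels in each
# (r, g, b) bin -- so the result is independent of image size or
# orientation. `pixels` must be non-empty.

def color_histogram(pixels, bins=4):
    """pixels: iterable of (r, g, b) tuples with 0-255 channel values.
    Returns a flat list of length bins**3 that sums to 1.0."""
    hist = [0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index 0..bins-1.
        ri = min(r * bins // 256, bins - 1)
        gi = min(g * bins // 256, bins - 1)
        bi = min(b * bins // 256, bins - 1)
        hist[(ri * bins + gi) * bins + bi] += 1
        count += 1
    return [h / count for h in hist]
```

Two such histograms can then be compared with any histogram distance, e.g. histogram intersection or an L2 distance over the bins.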

Texture

Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels, which are then placed into a number of sets depending on how many textures are detected in the image. These sets not only define the texture but also where in the image the texture is located. [11]

Texture is a difficult concept to represent. The identification of specific textures in an image is achieved primarily by modeling texture as a two-dimensional gray-level variation. The relative brightness of pairs of pixels is computed so that the degree of contrast, regularity, coarseness, and directionality may be estimated. [6] [17] The problem lies in identifying patterns of co-pixel variation and associating them with particular classes of textures, such as silky or rough.
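A toy version of such a pairwise-brightness measure, assuming a grayscale image given as a 2D list, can estimate contrast from horizontally adjacent pixels; this is a much-simplified relative of the co-occurrence-based features of Tamura et al., not a standard implementation:

```python
# Estimate texture contrast from gray-level differences of horizontally
# adjacent pixel pairs. Higher values indicate higher-contrast texture;
# a uniform region scores 0.0.

def texture_contrast(image):
    """image: 2D list of gray levels (rows of equal length).
    Returns the mean squared difference of horizontal neighbours."""
    total, pairs = 0, 0
    for row in image:
        for left, right in zip(row, row[1:]):
            total += (left - right) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0
```

Real systems compute several such statistics (contrast, coarseness, directionality) over multiple offsets and directions and concatenate them into the texture part of the feature vector.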

Other methods of classifying textures include the use of co-occurrence matrices, Laws texture energy measures, and wavelet transforms.

Shape

Shape does not refer to the shape of an image but to the shape of a particular region that is being sought out. Shapes are often determined by first applying segmentation or edge detection to an image. Other methods use shape filters to identify given shapes of an image. [18] Shape descriptors may also need to be invariant to translation, rotation, and scale. [6]
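One way to obtain a translation- and scale-invariant description of a segmented region is via normalized central moments, the building blocks of Hu's rotation-invariant moments; this sketch assumes the region is given as a list of pixel coordinates of a binary mask:

```python
# Normalized central moment eta_pq of a binary region. Central moments
# are computed about the region centroid (translation invariance) and
# normalized by the region area (scale invariance).

def normalized_central_moment(points, p, q):
    """points: list of (x, y) pixel coordinates of the region.
    Returns eta_pq = mu_pq / mu_00 ** ((p + q) / 2 + 1)."""
    n = len(points)  # mu_00: area of a binary region is its pixel count
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)
    return mu / n ** ((p + q) / 2 + 1)
```

Shifting every point of the region leaves each eta_pq unchanged, which is exactly the invariance property a shape descriptor needs before rotation invariance is addressed.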

Some common shape descriptors include Fourier descriptors and moment invariants. [6]

Vulnerabilities, attacks and defenses

Like other computer vision tasks such as recognition and detection, recent neural-network-based retrieval algorithms are susceptible to adversarial attacks, both as candidate attacks and as query attacks. [19] It has been shown that the retrieved ranking can be dramatically altered with only small perturbations imperceptible to human beings. In addition, model-agnostic transferable adversarial examples are also possible, which enables black-box adversarial attacks on deep ranking systems without requiring access to their underlying implementations. [19] [20]

Conversely, the resistance to such attacks can be improved via adversarial defenses such as the Madry defense. [21]

Image retrieval evaluation

Measures of image retrieval effectiveness can be defined in terms of precision and recall, though other methods are also being considered. [22]
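For a single query, precision and recall can be computed directly from the set of retrieved images and the set of images known to be relevant:

```python
# Precision: fraction of retrieved images that are relevant.
# Recall:    fraction of relevant images that were retrieved.

def precision_recall(retrieved, relevant):
    """retrieved, relevant: sets of image ids. Returns (precision, recall)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Benchmark evaluations typically average these over many queries, or summarize the precision-recall trade-off with measures such as mean average precision.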

Image retrieval using several techniques simultaneously

An image can be retrieved in a CBIR system by adopting several techniques simultaneously, such as integrating pixel cluster indexing, histogram intersection, and discrete wavelet transform methods. [23]
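Of the techniques just named, histogram intersection (due to Swain and Ballard) is the simplest to sketch; for normalized histograms it yields 1.0 for a perfect match and 0.0 for images with disjoint color content:

```python
# Histogram intersection: sum of the bin-wise minima of two histograms.
# With histograms normalized to sum to 1.0, the result is a similarity
# score in [0.0, 1.0] (1.0 = identical distributions).

def histogram_intersection(h1, h2):
    """h1, h2: equal-length normalized histograms (lists of floats)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

In a combined system, scores like this one are typically fused with scores from other techniques (e.g. wavelet-based features) into a single ranking.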

Applications

Potential uses for CBIR include architectural and engineering design, art collections, crime prevention, intellectual property, medical diagnosis, and military applications. [2]

Commercial systems that have been developed include IBM's QBIC and Virage's VIR Image Engine. [2]

Experimental systems include MIT's Photobook and Columbia University's VisualSEEk. [2]

See also

Related Research Articles

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.

Metasearch engine

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

Keyword spotting is a problem that was historically first defined in the context of speech processing. In speech processing, keyword spotting deals with the identification of keywords in utterances.

Automatic image annotation

Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

Tag cloud

A tag cloud is a visual representation of text data which is often used to depict keyword metadata on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.

A video search engine is a web-based search engine which crawls the web for video content. Some video search engines parse externally hosted content while others allow content to be uploaded and hosted on their own servers. Some engines also allow users to search by video format type and by length of the clip. The video search results are usually accompanied by a thumbnail view of the video.

Search Engine Results Pages (SERP) are the pages displayed by search engines in response to a query by a user. The main component of the SERP is the listing of results that are returned by the search engine in response to a keyword query.

Image organizer

An image organizer or image management application is application software for organising digital images. It is a kind of desktop organizer software application.

Image meta search is a type of search engine specialised in finding pictures, images, animations, etc. Like text search, image search is an information retrieval system designed to help find information on the Internet; it allows the user to look for images using keywords or search phrases and to receive a set of thumbnail images, sorted by relevancy.

In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, produced by algorithms or applications. They describe elementary characteristics such as shape, color, texture, or motion, among others.

An audio search engine is a web-based search engine which crawls the web for audio content. The information can consist of web pages, images, audio files, or another type of document. Various techniques exist for research on these engines.

Artificial imagination is a narrow subcomponent of artificial general intelligence which generates, simulates, and facilitates real or possible fiction models to create predictions, inventions, or conscious experiences.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

XML retrieval, or XML information retrieval, is the content-based retrieval of documents structured with XML. As such it is used for computing relevance of XML documents.

Reverse image search

Reverse image search is a content-based image retrieval (CBIR) query technique that involves providing the CBIR system with a sample image that it will then base its search upon; in terms of information retrieval, the sample image is what formulates the search query. In particular, reverse image search is characterized by a lack of search terms. This effectively removes the need for a user to guess at keywords or terms that may or may not return a correct result. Reverse image search also allows users to discover content that is related to a specific sample image or the popularity of an image, and to discover manipulated versions and derivative works.

Learning to rank

Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

Visual words

Visual words, as used in image retrieval systems, refer to small parts of an image that carry some kind of information related to the features of the image or to changes occurring in its pixels, such as filtering or low-level feature descriptors.

A 3D Content Retrieval system is a computer system for browsing, searching and retrieving three dimensional digital contents from a large database of digital images. The most original way of doing 3D content retrieval uses methods to add description text to 3D content files such as the content file name, link text, and the web page title so that related 3D content can be found through text retrieval. Because of the inefficiency of manually annotating 3D files, researchers have investigated ways to automate the annotation process and provide a unified standard to create text descriptions for 3D contents. Moreover, the increase in 3D content has demanded and inspired more advanced ways to retrieve 3D information. Thus, shape matching methods for 3D content retrieval have become popular. Shape matching retrieval is based on techniques that compare and contrast similarities between 3D models.

References

  1. Lew, Michael S.; et al. (2006). "Content-based Multimedia Information Retrieval: State of the Art and Challenges". ACM Transactions on Multimedia Computing, Communications, and Applications: 1–19. Archived 2007-09-28 at the Wayback Machine.
  2. Eakins, John; Graham, Margaret. "Content-based Image Retrieval". University of Northumbria at Newcastle. Archived from the original on 2012-02-05. Retrieved 2014-03-10.
  3. Anderson, Julie (April 29, 1996). "Search Images / Object Design Inc". Information Week. p. 69. Reprinted in Silicon Investor's Stock Discussion Forums (Aug. 6, 1996).
  4. Kato, Toshikazu (April 1992). Jamberdino, Albert A.; Niblack, Carlton W. (eds.). "Database architecture for content-based image retrieval". Image Storage and Retrieval Systems. International Society for Optics and Photonics. 1662: 112–123. Bibcode:1992SPIE.1662..112K. doi:10.1117/12.58497. S2CID   14342247.
  5. Flickner, M.; Sawhney, H.; Niblack, W.; Ashley, J.; Huang, Qian; Dom, B.; Gorkani, M.; Hafner, J.; Lee, D.; Petkovic, D.; Steele, D.; Yanker, P. (1995). "Query by image and video content: the QBIC system". Computer. 28 (9): 23–32. doi:10.1109/2.410146.
  6. Rui, Yong; Huang, Thomas S.; Chang, Shih-Fu (1999). "Image Retrieval: Current Techniques, Promising Directions, and Open Issues". Journal of Visual Communication and Image Representation. 10: 39–62. CiteSeerX 10.1.1.32.7819. doi:10.1006/jvci.1999.0413. S2CID 2910032.
  7. Banerjee, S. J.; et al. (2015). "Using complex networks towards information retrieval and diagnostics in multidimensional imaging". Scientific Reports. 5: 17271. arXiv: 1506.02602 . Bibcode:2015NatSR...517271B. doi:10.1038/srep17271. PMC   4667282 . PMID   26626047.
  8. "The true story of BLOBs". Archived from the original on 2011-07-23.
  9. Jing, Yushi; Baluja, S. (2008). "VisualRank: Applying PageRank to Large-Scale Image Search". IEEE Transactions on Pattern Analysis and Machine Intelligence. 30 (11): 1877–1890. CiteSeerX 10.1.1.309.741. doi:10.1109/TPAMI.2008.121. ISSN 0162-8828. PMID 18787237. S2CID 10545157.
  10. "Query-by-Example". IBM Knowledge Center.
  11. Shapiro, Linda; Stockman, George (2001). Computer Vision. Upper Saddle River, NJ: Prentice Hall. ISBN 978-0-13-030796-5.
  12. Datta, Ritendra; Dhiraj Joshi; Jia Li; James Z. Wang (2008). "Image Retrieval: Ideas, Influences, and Trends of the New Age". ACM Computing Surveys. 40 (2): 1–60. doi:10.1145/1348246.1348248. S2CID   7060187.
  13. Bird, C.L.; Elliott, P.J.; Griffiths, E. (1996). "User interfaces for content-based image retrieval". IEE Colloquium on Intelligent Image Databases. IET. doi:10.1049/ic:19960746.
  14. Cardoso, Douglas; et al. "Iterative Technique for Content-Based Image Retrieval using Multiple SVM Ensembles" (PDF). Federal University of Parana(Brazil). Retrieved 2014-03-11.
  15. Mayron, Liam M. "Image Retrieval Using Visual Attention" (PDF). Mayron.net. Retrieved 2012-10-18.
  16. Eidenberger, Horst (2011). "Fundamental Media Understanding", atpress. ISBN   978-3-8423-7917-6.
  17. Tamura, Hideyuki; Mori, Shunji; Yamawaki, Takashi (1978). "Textural Features Corresponding to Visual Perception". IEEE Transactions on Systems, Man, and Cybernetics. 8 (6): 460, 473. doi:10.1109/tsmc.1978.4309999. S2CID   32197839.
  18. Tushabe, F.; M.H.F. Wilkinson (2008). "Content-Based Image Retrieval Using Combined 2D Attribute Pattern Spectra". Advances in Multilingual and Multimodal Information Retrieval (PDF). Lecture Notes in Computer Science. Vol. 5152. pp. 554–561. doi:10.1007/978-3-540-85760-0_69. ISBN   978-3-540-85759-4. S2CID   18566543.
  19. Zhou, Mo; Niu, Zhenxing; Wang, Le; Zhang, Qilin; Hua, Gang (2020). "Adversarial Ranking Attack and Defense". arXiv:2002.11293v2 [cs.CV].
  20. Li, Jie; Ji, Rongrong; Liu, Hong; Hong, Xiaopeng; Gao, Yue; Tian, Qi (2019). "Universal Perturbation Attack Against Image Retrieval". pp. 4899–4908. arXiv: 1812.00552 [cs.CV].
  21. Madry, Aleksander; Makelov, Aleksandar; Schmidt, Ludwig; Tsipras, Dimitris; Vladu, Adrian (2017-06-19). "Towards Deep Learning Models Resistant to Adversarial Attacks". arXiv: 1706.06083v4 [stat.ML].
  22. Deselaers, Thomas; Keysers, Daniel; Ney, Hermann (2007). "Features for Image Retrieval: An Experimental Comparison" (PDF). RWTH Aachen University. Retrieved 11 March 2014.
  23. Bhattacharjee, Pijush Kanti (2010). "Integrating Pixel Cluster Indexing, Histogram Intersection and Discrete Wavelet Transform Methods for Color Images Content Based Image Retrieval System" (PDF). International Journal of Computer and Electrical Engineering. 2 (2): 345–352.
  24. Wang, James Ze; Jia Li; Gio Wiederhold; Oscar Firschein (1998). "System for Screening Objectionable Images". Computer Communications. 21 (15): 1355–1360. CiteSeerX   10.1.1.78.7689 . doi:10.1016/s0140-3664(98)00203-5.
