
Multimedia information retrieval (MMIR or MIR) is a research discipline of computer science that aims to extract semantic information from multimedia data sources.[1][failed verification] Data sources include directly perceivable media such as audio, image and video; indirectly perceivable sources such as text, semantic descriptions[2] and biosignals; and non-perceivable sources such as bioinformation and stock prices. The methodology of MMIR can be organized into three groups:

  1. Methods for the summarization of media content (feature extraction). The result of feature extraction is a description.
  2. Methods for the filtering of media descriptions (for example, elimination of redundancy).
  3. Methods for the categorization of media descriptions into classes.
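The first group can be illustrated with a minimal sketch. It assumes a grayscale image stored as rows of 8-bit intensity values and uses a normalized intensity histogram as the feature. This is one common summarization, chosen here for brevity, and `intensity_histogram` is a hypothetical name. An image of any size is reduced to a description of fixed length:

```python
from typing import List


def intensity_histogram(image: List[List[int]], bins: int = 8) -> List[float]:
    """Summarize a grayscale image (values 0-255) as a normalized histogram.

    The result always has exactly `bins` entries, regardless of image size,
    which is what makes it usable as a compact description.
    """
    if bins < 1:
        raise ValueError("bins must be positive")
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError(f"pixel value out of range: {value}")
            # Map the intensity range 0..255 onto bin indices 0..bins-1.
            counts[value * bins // 256] += 1
            total += 1
    if total == 0:
        raise ValueError("empty image")
    # Normalize so that images of different sizes yield comparable descriptions.
    return [c / total for c in counts]


if __name__ == "__main__":
    image = [[0, 10, 200], [255, 128, 64]]
    print(intensity_histogram(image, bins=4))
```

Because the description length depends only on `bins`, descriptions of images with different resolutions can be compared or merged directly.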

Feature extraction methods

Feature extraction is motivated by the sheer size of multimedia objects as well as their redundancy and, possibly, noisiness.[1]: 2 [failed verification] Generally, two possible goals can be achieved by feature extraction:

Merging and filtering methods

Multimedia information retrieval implies that multiple channels are employed for understanding media content.[5] Each of these channels is described by media-specific feature transformations. The resulting descriptions have to be merged into a single description per media object. Merging can be performed by simple concatenation if the descriptions are of fixed size. Variable-sized descriptions, which occur frequently in motion description, have to be normalized to a fixed length first.
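Merging can be sketched as follows. The source does not specify a normalization method. This sketch assumes that each channel yields a list of numbers and uses linear-interpolation resampling as one simple way to bring variable-sized descriptions to a fixed length. `resample` and `merge_descriptions` are hypothetical names:

```python
from typing import List, Sequence


def resample(values: Sequence[float], length: int) -> List[float]:
    """Normalize a variable-length description to `length` values (linear interpolation)."""
    if length < 2:
        raise ValueError("length must be at least 2")
    if not values:
        raise ValueError("cannot resample an empty description")
    n = len(values)
    result: List[float] = []
    for i in range(length):
        # Position of output sample i on the input index axis [0, n - 1].
        pos = i * (n - 1) / (length - 1)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        result.append(values[left] * (1 - frac) + values[right] * frac)
    return result


def merge_descriptions(fixed: List[Sequence[float]],
                       variable: List[Sequence[float]],
                       length: int) -> List[float]:
    """Concatenate fixed-size descriptions with resampled variable-size ones."""
    merged: List[float] = []
    for desc in fixed:
        merged.extend(float(v) for v in desc)
    for desc in variable:
        merged.extend(resample(desc, length))
    return merged


if __name__ == "__main__":
    color = [0.5, 0.25, 0.25]           # fixed-size, e.g. a histogram
    motion = [0.0, 1.0, 2.0, 3.0, 4.0]  # variable-size, e.g. per-frame motion
    print(merge_descriptions([color], [motion], length=3))
    # → [0.5, 0.25, 0.25, 0.0, 2.0, 4.0]
```

Concatenation only makes sense once every part has a fixed length, so that descriptions of different media objects line up component by component.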

Frequently used methods for description filtering include factor analysis (e.g. by PCA), singular value decomposition (e.g. as latent semantic indexing in text retrieval) and the extraction and testing of statistical moments. Advanced concepts such as the Kalman filter are used for merging descriptions.
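Of the filtering methods above, moment extraction is the easiest to show without a linear-algebra library. The following minimal sketch assumes that a description is a sequence of numbers and that the moments of interest are the mean, variance, skewness and excess kurtosis. `moment_summary` is a hypothetical helper:

```python
import math
from typing import Dict, Sequence


def moment_summary(values: Sequence[float]) -> Dict[str, float]:
    """Reduce a description of any length to its first four moments."""
    n = len(values)
    if n == 0:
        raise ValueError("empty description")
    mean = sum(values) / n
    # Population (biased) central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Constant input: skewness and kurtosis are undefined; report 0.
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / std ** 3,
        # Excess kurtosis: 0 for a normal distribution.
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }
```

For example, `moment_summary([1, 2, 3, 4, 5])` gives a mean of 3.0, a variance of 2.0 and a skewness of 0.0. Because any sequence is reduced to four numbers, this also serves as a fixed-length normalization before merging.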

Categorization methods

Generally, all forms of machine learning can be employed to categorize multimedia descriptions,[1]: 125 [failed verification] though some methods are used more frequently in one area than in another. For example, hidden Markov models are the state of the art in speech recognition, while dynamic time warping, a semantically related method, is the state of the art in gene sequence alignment. The list of applicable classifiers includes the following:

The best classifier for a given problem (a test set with descriptions and class labels, the so-called ground truth) can be selected automatically, for example with the Weka data mining software.
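The idea of automatic selection can be sketched without Weka. The following plain-Python sketch does not reproduce Weka's procedure. It assumes that descriptions are numeric sequences and compares two hypothetical candidates: a 1-nearest-neighbour classifier with Euclidean distance, and one with the dynamic time warping distance mentioned above. Whichever candidate scores the higher accuracy against the ground-truth labels is kept. All function names are illustrative:

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Description = Sequence[float]
Labeled = Tuple[Description, str]  # (description, ground-truth label)
Distance = Callable[[Description, Description], float]


def euclidean(a: Description, b: Description) -> float:
    """Euclidean distance; only defined for equal-length descriptions."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length descriptions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Description, b: Description) -> float:
    """Dynamic time warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest predecessor: step in a, step in b, or both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbour(distance: Distance, train: List[Labeled]) -> Callable[[Description], str]:
    """1-nearest-neighbour classifier over `train` using `distance`."""
    def classify(x: Description) -> str:
        return min(train, key=lambda item: distance(x, item[0]))[1]
    return classify


def accuracy(classify: Callable[[Description], str], test: List[Labeled]) -> float:
    """Fraction of test descriptions whose predicted label matches the ground truth."""
    return sum(classify(x) == label for x, label in test) / len(test)


def select_best(candidates: Dict[str, Distance], train: List[Labeled],
                test: List[Labeled]) -> Tuple[str, float]:
    """Score every candidate on the labelled test set and return the best one."""
    scores = {name: accuracy(nearest_neighbour(dist, train), test)
              for name, dist in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy ground truth: one peak vs. two peaks, shifted in time in the test set.
    train = [([0, 5, 0, 0, 0, 0, 0, 0], "one"), ([0, 5, 0, 0, 5, 0, 0, 0], "two")]
    test = [([0, 0, 0, 0, 0, 0, 5, 0], "one"), ([0, 0, 0, 5, 0, 0, 5, 0], "two")]
    print(select_best({"euclidean": euclidean, "dtw": dtw}, train, test))
    # → ('dtw', 1.0)
```

On this toy data the time shift misleads the Euclidean classifier, while dynamic time warping aligns the peaks. Selecting a classifier and reporting its accuracy on the same labelled set overstates that accuracy. In practice, a separate validation split or cross-validation would be used.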

Open problems

The quality of MMIR systems[6] depends heavily on the quality of the training data. Discriminative descriptions can be extracted from media sources in various forms, and machine learning provides categorization methods for all types of data. However, a classifier can only be as good as the training data it is given, and providing class labels for large databases requires considerable effort. The future success of MMIR will therefore depend on the provision of such data.[7] The annual TRECVID competition is currently one of the most relevant sources of high-quality ground truth.

Related areas

MMIR provides an overview of methods employed in the areas of information retrieval.[8][9] Methods from one area are adapted and applied to other types of media. Multimedia content is merged before classification is performed. MMIR methods are, therefore, usually reused from other areas such as:

The International Journal of Multimedia Information Retrieval[10] documents the development of MMIR as a research discipline that is independent of these areas. See also the Handbook of Multimedia Information Retrieval[11] for a complete overview of this research discipline.

References

  1. H. Eidenberger. Fundamental Media Understanding. atpress, 2011, p. 1.
  2. L. F. Sikos. "RDF-powered semantic video annotation tools with concept mapping to Linked Data for next-generation video indexing: a comprehensive review". Multimedia Tools and Applications, 76 (12): 14437–14460, 2016. doi:10.1007/s11042-016-3705-7. S2CID 254832794.
  3. A. Del Bimbo. Visual Information Retrieval. Morgan Kaufmann, 1999.
  4. H. G. Kim, N. Moreau, T. Sikora. MPEG-7 Audio and Beyond. Wiley, 2005.
  5. M. S. Lew (Ed.). Principles of Visual Information Retrieval. Springer, 2001.
  6. J. C. Nordbotten. "Multimedia Information Retrieval Systems". Retrieved 14 October 2011.
  7. H. Eidenberger. Frontiers of Media Understanding. atpress, 2012.
  8. H. Eidenberger. Professional Media Understanding. atpress, 2012.
  9. R. Raieli. "Introducing Multimedia Information Retrieval to libraries". JLIS.it, 7 (3): 9–42, 2016. doi:10.4403/jlis.it-11530. S2CID 56652314.
  10. "International Journal of Multimedia Information Retrieval". Springer, 2011. Retrieved 21 October 2011.
  11. H. Eidenberger. Handbook of Multimedia Information Retrieval. atpress, 2012.