Multi-document summarization is an automatic procedure for extracting information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with the information contained in a large cluster of documents. In this way, multi-document summarization systems complement news aggregators, taking the next step in coping with information overload.

Key benefits and difficulties

Multi-document summarization creates information reports that are both concise and comprehensive. With different opinions brought together and outlined, every topic is described from multiple perspectives within a single document. While the goal of a brief summary is to simplify information search and save time by pointing to the most relevant source documents, a comprehensive multi-document summary should in theory itself contain the required information, limiting the need to access the original files to cases where refinement is required. In practice, it is hard to summarize multiple documents with conflicting views and biases. In fact, it is almost impossible to achieve a clear extractive summary of documents with conflicting views. Abstractive summarization is the preferred approach in this case.

Automatic summaries present information extracted from multiple sources algorithmically, without any editorial touch or subjective human intervention, which is intended to keep them free of editorial bias. The difficulties remain, however, for automatic extractive summaries of documents with conflicting views.

Technological challenges

The multi-document summarization task is more complex than summarizing a single document, even a long one. The difficulty arises from thematic diversity within a large set of documents. A good summarization technology aims to combine the main themes with completeness, readability, and concision. The Document Understanding Conferences,[1] conducted annually by NIST, have developed sophisticated criteria for evaluating techniques that take on the multi-document summarization challenge.
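
To illustrate the tension between covering the main themes and staying concise, the following is a minimal sketch of a greedy extractive approach. It is not the method of any system or evaluation mentioned in this article. The function names, the stopword list, the document-frequency scoring and the overlap threshold are all assumptions chosen for illustration: sentences are ranked by how widely their words are shared across the documents, and a candidate is skipped when it overlaps too much with a sentence already selected.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "on", "to", "and",
                       "is", "was", "by", "for", "at", "it"})


def tokenize(text):
    """Return the set of lowercase content words in a piece of text."""
    return {w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(documents, max_sentences=2, max_overlap=0.5):
    """Greedy extractive summary over several documents (hypothetical sketch)."""
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(tokenize(doc))

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        # Average document frequency favours themes shared across sources.
        return sum(doc_freq[w] for w in words) / len(words)

    sentences = [s for doc in documents for s in split_sentences(doc)]
    ranked = sorted(sentences, key=score, reverse=True)

    summary = []
    for candidate in ranked:
        words = tokenize(candidate)
        if not words:
            continue
        # Skip sentences that largely repeat what is already selected.
        if any(jaccard(words, tokenize(chosen)) > max_overlap
               for chosen in summary):
            continue
        summary.append(candidate)
        if len(summary) == max_sentences:
            break
    return summary


documents = [
    "The storm hit the coast on Monday. Thousands lost power.",
    "The storm hit the coast on Monday. Officials opened shelters.",
    "Officials opened shelters after the storm. Schools stayed closed.",
]
print(summarize(documents))
```

In this sketch the sentence repeated verbatim in two sources is selected only once, which is the simplest form of the redundancy handling that distinguishes the multi-document task from single-document summarization. A sketch like this addresses concision and coverage only; readability and the handling of conflicting views are outside its scope.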

An ideal multi-document summarization system not only shortens the source texts, but also presents information organized around the key aspects so as to represent diverse views. When successful, the result is an overview of a given topic. Such text compilations should also meet the basic requirements for an overview text compiled by a human. The multi-document summary quality criteria are as follows:

The latter point deserves an additional note. Care is taken to ensure that the automatic overview shows:

Real-life systems

Multi-document summarization technology is now coming of age, a view supported by a choice of advanced web-based systems that are currently available.

As auto-generated multi-document summaries increasingly resemble overviews written by humans, their use of extracted text snippets may one day raise copyright issues in relation to the concept of fair use.

References

  1. "Document Understanding Conferences". Nlpir.nist.gov. 2014-09-09. Retrieved 2016-01-10.
  2. "Generate Research Report". Ultimate Research Assistant. Retrieved 2016-01-10.
  3. "iResearch Reporter service". Iresearch-reporter.com. Archived from the original on 2013-06-09. Retrieved 2016-01-10.
  4. [1] Archived April 16, 2013, at the Wayback Machine.
  5. [2] Archived April 11, 2011, at the Wayback Machine.
  6. "News Feed Researcher | General Stuff". Newsfeedresearcher.com. Retrieved 2016-01-10.
  7. [3] Archived September 19, 2009, at the Wayback Machine.
  8. [4] Archived May 29, 2013, at the Wayback Machine.