This is the talk page for discussing improvements to the Data article. This is not a forum for general discussion of the article's subject.
This level-5 vital article is rated C-class on Wikipedia's content assessment scale.
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
This article deals with WikiProject Computing and there is an article for Data (Computing). Why do we have that inconsistency?
Data is defined as: "is a set of values of qualitative or quantitative variables". Up to here it is OK, but constrained to the computing world. Should we constrain the definition to computing only? (That does seem a good decision, though.)
"restated, pieces of data are individual pieces of information". The definition of Data includes, naively, the term "pieces of data".
Indeed, I just came here to post a similar comment. So I agree that the definition is naive and circular. martyn.jones@cambriano.es
And finally it infers: "pieces of data are pieces of information". Data = information. Am I the only one reading it that way?
Information is defined as: "Information (shortened as info or info.) is that which informs," Information is that which informs...: a tautological expression.
"i.e. an answer to a question, as well as that from which knowledge and data can be derived (as data represents values attributed to parameters,".
Note also that Information is "that from which knowledge and data can be derived". Circular Definition toward Data.
"and knowledge signifies understanding of real things or abstract concepts" We ALL know that data is not information, and that information is not knowledge, but including that as the keystone of the definition?? I simply cannot agree. It is like saying: "Data is a piece of information, and no, beware that it is not information." and "Information is that which informs, and no, beware that it is not knowledge"
Please guys, does anybody support or agree with this?? — Preceding unsigned comment added by Hypfco (talk • contribs) 20:19, 31 May 2015 (UTC)
As above author(s) noted, the current definition given for data: (a) says that data is information; and (b) implies that the word is used in computing only (particularly in the 2nd paragraph).
Please consider the following:
I propose the following definition:
Rwilkin (talk) 02:24, 12 August 2015 (UTC) Rwilkin (talk) 03:36, 12 August 2015 (UTC) Rwilkin (talk) 05:48, 12 August 2015 (UTC)
It looks like there are some disagreements on the usage of data as a mass noun. It is excellent that we have many citations on its usage, but instead of stating statistics from the sources, we are having minor edit wars on words such as "many"/"most", "often"/"usually", etc. Perhaps we should instead cite the statistics from the source and leave speculation to the reader. Gsonnenf (talk) 11:36, 2 April 2009 (UTC)
I'm going to try clearing out the weasel words from the Usage in English section and see what is allowed to stick. Citation needed tag added because it's so specific, not because it's disputed. Acronymsical (talk) 16:59, 28 February 2011 (UTC)
It is difficult for me to understand "A datum is a statement accepted at face value".
What do you think about a definition and explanation like this:
Data is ~evidence (or some other term) at the input of an information system. Data is the subject of data processing by an information system. Data may or may not contain useful information.
I think it is good when a definition uses other Wikipedia terms, not just plain English. Kenny sh 08:30, 10 May 2004 (UTC)
A separate page for datum is needed. In geology/cartography/geography and surveying a datum is a reference surface. For instance, sea-level is often used as a datum below which depths (or above which heights) are measured.
Hello COMPATT, to address your comments about the distinction between data and information -- I agree that programs are a form of data, but I think it's important to keep in mind that the word "data" has a history of usage that goes back much farther than computer science. The distinction between data and information, which is made in the article, is that information is derived from an interpretation of data. Some data don't have any obvious interpretation, and so we might noodle over ancient inscriptions for a long time, but some other data have such an immediate interpretation, especially in a given cultural context, that the interpretation is held to be the same as the data -- for example if I look at a photograph, I might immediately see "a dog" instead of "a pattern of silver particles which suggests a dog". I think the interpretation aspect, and its dependence on context, might be emphasized in the article. Well, I've rambled on long enough! Have a great day, Wile E. Heresiarch 14:33, 18 Mar 2004 (UTC)
Hello, as a comment on the edit that I just made. I put a new, short intro paragraph at the beginning, to hopefully get straight to the point. (The article was noodling around in etymology a little too much before getting to the punch line. Hopefully that's corrected now.) As the term "data" is rather general, I've attempted to give a general definition, and then immediately describe one of the most-used types of data (measurements & observations). I'm hoping that there is a right level of generality now. Happy editing, Wile E. Heresiarch 15:44, 19 Mar 2004 (UTC)
There's another meaning of the singular datum. In the US Navy, the term is applied to the last known position of a submarine whose precise location is no longer known. I don't think I ever heard it used in the plural; there just aren't that many submarines and there's a great deal of seawater under which to spread them. Dick Kimball (talk) 18:20, 2 April 2008 (UTC)
Hi,
- I inserted most general and shortest functional definition of data (see function definition)
- about
Referring to the sentence "this is all the data from the experiment", the assertion that "this usage is inconsistent with the rules of Latin grammar and traditional English" seems odd. If the word data is being treated as a mass noun, then surely the sentence is consistent with "traditional English".
Freddygetty (talk) 09:48, 19 March 2011 (UTC)
I changed it. In my opinion: - too much information noise (uncertainty of the author (?)) in this paragraph.
- As it is, the phone number is not actionable - you know it is a phone number, but it is of no use. This information becomes knowledge when you can act on this information, either to solve a problem (for example, to call Helen, whose phone number it is), or to gain insight into an issue (e.g. by noting that other phone numbers have the same exchange). People or computers can find patterns in and between data to perceive relationships between information, creating or enhancing knowledge. Since knowledge is prerequisite to wisdom, we always want more data and information. But, as modern societies verge on information overload, we especially need better ways to find patterns.
This is not about data; it is an unnecessary digression, so I removed it.
See also: http://en.wikipedia.org/wiki/Talk:Knowledge about DIKW.
I cannot find (on the Web) any articles that confirm the suggested interpretation of the DIKW model.
--Adam M. Gadomski 18:01, 4 November 2005 (UTC)
Simon, your reply is a meta-response. Is it a style of "Space-invaders"? You copy original research without proper references - is that correct???
You (and only you) inserted DIKW in Wikipedia in a few articles.
Why do you do it?
- I see that your self-promotion on the Web is perfect, my congratulations, but I would like to see your sc.publications too - maybe this information could clear my doubts why "you are linking extensively" to and "update" this subject.
--Adam M. Gadomski 16:41, 24 November 2005 (UTC)
Information#Information is not data does not seem available anymore. --Inkiwna 15:42, 2 March 2007 (UTC)
The first line of this article needs to change. Data WAS the plural of datum, but no one uses it this way. In fact, in surveying, datum and data are two completely different words. A datum is a coordinate system for locating a point on the earth, while surveyors use data to mean what everyone else does. The plural of survey datum is datums, since data has a completely different meaning.
English does not follow the rules of a dead language that it happened to borrow a word from. See the back-formation article for numerous examples. You'll note that no one ever complains that "asset" is incorrect usage.
The statement "but these are English sentences, so Latin grammar rules do not apply" seems to be an unencyclopaedic opinion tagged on to an otherwise neutral sentence stating the status of the word as plural in Latin. The rules applied in English sentences are clearly rules of English grammar, not Latin, but English happens to have the same rule as Latin in this instance, i.e., that a plural noun requires a verb in the plural. The debate is not whether Latin rules should apply to English, but whether the word data is plural or singular in English, based on etymology and usage. I propose to delete the clause "but these are English sentences..." if there is no further discussion. GKantaris (talk) 15:45, 2 January 2008 (UTC) - OK, as there is no discussion, I've deleted the clause. GKantaris (talk) 16:21, 14 January 2008 (UTC)
The problem is not one of right vs wrong but of precision. In general English usage, 'data' is used interchangeably with 'information' so it feels more natural to use it as a mass noun. For more technical use, 'data' must be pluralised to distinguish it from 'datum' and 'information'. For example, 15 (a datum) is part of 15-08-65 (data), which is my birthday (information).
Many words, such as 'average', 'intelligent' or 'fruit' have precise technical meanings that differ from the way they are used in everyday speech and there is nothing wrong with this. —Preceding unsigned comment added by 194.150.177.249 (talk) 14:33, 24 November 2008 (UTC)
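The datum / data / information example in the comment above can be sketched in code. This is only an illustration, assuming the string is in day-month-year order and that the two-digit year belongs to the 1900s; the function name `interpret_as_birthday` and the `century` parameter are made up for this sketch and do not come from the article.

```python
from datetime import date


def interpret_as_birthday(raw: str, century: int = 1900) -> date:
    """Turn raw data such as "15-08-65" into information (a calendar date).

    Assumptions (not stated in the discussion): fields are day-month-year,
    and the two-digit year is offset by `century`.
    """
    # Each field on its own is a datum; together they are the data.
    day, month, year = (int(part) for part in raw.split("-"))
    # The interpretation (field order, century) is the context that
    # makes the data informative.
    return date(century + year, month, day)


birthday = interpret_as_birthday("15-08-65")
print(birthday.isoformat())
```

The same three numbers read under a different convention (say month-day-year) would either fail or yield a different date, which is the point being made: the interpretation is not contained in the data themselves.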
Data originated as the plural of Latin datum, "something given," and many maintain that it must still be treated as a plural form. The New York Times, for example, adheres to the traditional rule in this headline: "Data Are Elusive on the Homeless." But while data comes from a Latin plural form, the practice of treating data as plural in English often does not correspond to its meaning, given an understanding of what counts as data in modern research. We know, for example, what "data on the homeless" would consist of — surveys, case histories, statistical analyses, and so forth — but it would be a vain exercise to try to sort all of these out into sets of individual facts, each of them a "datum" on the homeless. (Does a case history count as a single datum, or as a collection of them? Is a correlation between rates of homelessness and unemployment itself a datum, or is it an abstraction over a number of data?) Since scientists and researchers think of data as a singular mass entity like information, it is entirely natural that they should have come to talk about it as such and that others should defer to their practice. Sixty percent of the Usage Panel accepts the use of data with a singular verb and pronoun in the sentence Once the data is in, we can begin to analyze it. A still larger number, 77 percent, accepts the sentence We have very little data on the efficacy of such programs, where the singularity of data is implicit in the use of the quantifier very little (contrast the oddness of We have very little facts on the efficacy of such programs).
To summarize, data has never been a plural of a count noun in English. It is used in two constructions — plural, with plural apparatus, and singular, as a mass noun, with singular apparatus. Both constructions are fully standard at any level of formality. The plural construction is more common.
Pronounced "Day-Ta" (US) and "Dar-Tar" (AU & UK*)
Living in the UK, I've only ever heard it pronounced as the former, "Day-ta"; only from Americans have I heard the latter, "Dar-Tar".
I've lived in many states in the US, from the west coast to the east coast to the midwest. I've never heard anyone say dar-tar. I've heard day-ta and daa-ta (like Dagwood). Never dar-tar. Entbark 03:48, 23 July 2007 (UTC)
Someone changed the page to say data is not a synonym for information. They should look it up in the dictionary: http://www.dict.org/bin/Dict?Form=Dict1&Query=data&Strategy=*&Database=* Daniel.Cardenas 15:34, 25 April 2007 (UTC)
That is classroom material applicable to computer science people and the like, but not 100% applicable to the rest of the world. Thanks for the link. Daniel.Cardenas 15:04, 27 April 2007 (UTC)
This statement, 'The word data is the plural of Latin datum, neuter past participle of dare, "to give", hence "something given",' is a little confusing. If datum and data are both nouns, they cannot also be past participles since participles are verb forms. That statement makes it sound like the noun datum is a participle of dare. Nouns cannot be participles. The same word can be used as both a noun and a verb (e.g., "I scream" and "I heard a scream"), but a noun is NOT a participle EVER.
Oh, and I found where that phrase was taken from: http://www.johntcullen.com/sharpwriter/content/data_is.htm. Hardly a trustworthy source. He doesn't list any references, much less know the difference between a verb and noun.
Entbark 19:49, 12 July 2007 (UTC)
So, if no one is opposed to me changing it, I will modify the etymology section in a few days. Entbark 03:53, 23 July 2007 (UTC)
I prefer "these data" because it makes everyone pause, and reflect on how wrong their notions of grammar are. —Preceding unsigned comment added by 71.193.226.225 (talk) 07:58, 2 April 2008 (UTC)
Of course, 'data' is the plural of 'datum', just as 'bacteria' is the plural of 'bacterium', 'media' of 'medium','phenomena' of 'phenomenon', 'criteria' of 'criterion', etc. etc. After all, English has a huge legacy from Latin (and Greek) to cherish, which shouldn't be chucked out for the sake of dumbing down. The term 'mass noun' is a licence to forced collectivization. The mere fact that 'data' (and 'media') are treated often as singular today is a sign of the degenerative grammatical dementia rampant in these supposedly advanced modern times. --Artefactme (talk) 10:16, 22 April 2015 (UTC)
Grammatical rules dictate that a mass or uncountable noun, when appended to a determiner, must choose a determiner of the same type.
So if data is treated as a mass noun, one would ask " How much data was collected?" On the other hand if data is treated strictly as a countable, one would ask "How many data were collected?"
Does anyone else find this awkward? —Preceding unsigned comment added by 63.201.67.93 (talk) 06:11, 16 July 2008 (UTC)
PAISA PAISA PAISA —Preceding unsigned comment added by 220.226.199.10 (talk) 09:53, 19 July 2008 (UTC)
The current page says:
Data is the lowest level of abstraction, information is the next level, and finally, knowledge is the highest level among all three.[citation needed] For example, the height of Mt. Everest is generally considered as "data", a book on Mt. Everest geological characteristics may be considered as "information", and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge".
for the needed citation I propose
Most frequently the data - information - knowledge - [wisdom] hierarchy is attributed to Ackoff
Ackoff, Russell L (1989). “From Data to Wisdom” Journal of Applied Systems Analysis, v. 16 pp. 3-9
but it has been presented by earlier authors:
Kochen, Manfred (1974) Principles of Information Retrieval John Wiley & Sons Inc. (Ch 3)
I think that a careful reading of Kochen or Ackoff would lead one to argue that knowledge resides within the human mind (as soon as it is written down it becomes information) Thus I would change the example by deleting
" and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge"."
and substituting
"and the practical understanding of an experienced climber of the best way to reach Mt. Everest's peak may be considered as "knowledge".
CarlD (talk) 13:47, 14 August 2008 (UTC)CarlD
I think this statement is incorrect. I have always understood data to be raw and unprocessed, whereas information is derived from the data. Yet this statement seems to be saying that information is the raw form and data is the processed form. I have checked on Google and it seems that this article is the only place which refers to data as the processed form; see
http://www.diffen.com/difference/Data_vs_Information
or
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/foils/introToCP1500.html
or
http://www.cs.siena.edu/~ebreimer/courses/csis-114-s08/lectures/Data%20vs.%20Information%20(4).ppt
To see what I mean: in each case they describe data as raw facts without reference, while information is facts with reference, which can be used. Harvyk (talk) 04:23, 2 October 2008 (UTC)
I agree, as this is the academic interpretation I have always heard. I was surprised to find it reversed in the article. —Preceding unsigned comment added by 68.46.139.114 (talk) 18:37, 9 October 2008 (UTC)
It is plainly evident that the question of whether "data" is a plural or is a mass noun is relevant only to the argument itself. This discussion exists to perpetuate the sense of correctness felt by the arguers on either side, nothing more. Struhs (talk) 18:56, 29 September 2009 (UTC)
Though there exists some sense of correctness felt by arguers, as in all debates, etymology and word usage IS the domain of encyclopedic knowledge. The debate is important to linguists, authors of style guides, and academic sources. Many etymologists may even find the evolution of the word striking and exciting. 71.222.241.78 (talk) —Preceding undated comment added 07:44, 9 July 2010 (UTC).
I know that the whole data is/data are debate is a hot button on this page, but Wiki policy does require one to at least be consistent. Since the article opens with the assertion that data is the plural of datum, I believe we should use it that way consistently in this article, except of course where we are presenting examples of its usage as a singular noun. So I went through the few places where its use was inconsistent and, um, regularized it. Dave (djkernen)|Talk to me|Please help! 20:28, 6 December 2011 (UTC)
data is a type of chart that you do in data for example projects — Preceding unsigned comment added by 98.77.248.196 (talk) 21:28, 28 August 2012 (UTC)
The article says: 'Some major newspapers such as The New York Times use it either in the singular or plural. In the New York Times the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day.' However the author of this sentence is parsing the second example incorrectly. The verb 'is' refers to 'the first year,' definitely singular, and not to the word 'data.'
Data (computing), in an operational definition, states: "Data are the quantities, characters, or symbols on which operations are performed by a computer...." Characters/symbols include what is commonly referred to as texts. Examples of texts where operations are performed by a computer include: computer programs (say written in COBOL), word processing, and a Google search of millions of web pages. As I understand the theoretical definition of data given here, texts in general are neither qualitative nor quantitative, thus texts in general are not data (some texts, "male/female" for example, may be qualitative data).
If texts in general are not data, then is the following sentence correct? "Data and texts are the quantities, characters, or symbols on which operations are performed by a computer...." Rather than simply adding "and texts", is there a better correction? It would be awkward to have an article titled Data (computing) that in its first sentence expands the article beyond just data.
It seems unfortunate to have excluded texts (and language!) from being data. If that was not intended then possibly changes here .... 50.136.247.190 (talk) 08:35, 14 July 2013 (UTC)
I am a newbie, please advise if there is a better way to go about what I am trying to do. Which is to suggest that there are fundamental problems with the Data entry with the hope that it may be improved.
(1) There is an entry for "Data" and an entry for "Data (computing)", but the talk page for "Data" refers to "WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology". This sounds more like a talk page for "Data (computing)" than for "Data". I would have thought that the entry for "Data" would encompass contexts other than computing. Examples follow.
(2) Consider for example data in the physical sciences, the life sciences, the social sciences, in statistics and in "official" statistics.
(2a) Data in the physical sciences tends to consist of measurements, e.g., of the positions of stars that were the motivation for Gauss' development of the normal distribution. Note Wikipedia entry Accuracy and precision.
(2b) Data in life and social sciences often consists of counts, e.g., numbers of persons in a population or numbers of births during a time period. Note Wikipedia entry Population biology.
(2c) Official statistics refers to data produced by governments, from which various kinds of statistics are derived, e.g., economic statistics, demographic and social statistics, and environmental statistics. The United Nations Statistics Division website http://unstats.un.org contains extensive information on official statistics.
(2d) Data in Statistics incorporates all these contexts. Note the definition of a statistic as "a function of the data" ([[Estimator]]).
(3) As the word is used in all these contexts, "Data" does indeed refer to "a collection of organised information". This shows that usage in these areas is not consistent with the usage given by http://www.diffen.com/difference/Data_vs_Information. Either one accepts that the same word is used with different meanings in different contexts, or one makes a choice for one meaning or the other. I suggest that in this instance extensive and at least roughly consistent usage of the word in Life sciences, Social science, Official statistics and Statistics ought to override the usage proposed in http://www.diffen.com/difference/Data_vs_Information.
(4) As the word is used in these contexts, the characterization can be sharpened beyond "organized information", which is so broad as to encompass nearly anything. "Data" is used more specifically to refer to systematic information about entities in some well-defined aggregate. "Systematic" signifies that the same information is provided for every entity in the aggregate, undefined values (age at first marriage for never married persons) and missing values excepted. "Well-defined" signifies conditions that define membership in the aggregate ("Emperor penguins in Antarctic on midnight 31 December 2013/1 January 2014"). Data in this sense is more specific than Information.
(5) From this perspective, at least, the sentence with which the Data entry begins, "Data are values of qualitative or quantitative variables that belong to a set" is deeply confused. Data provides values of variables, and the variables it provides values for constitute a set, but there is no reference to the set of entities the variables refer to.
(6) Data in this sense may or may not be "raw". The "raw" data captured from Population census (this redirects to Census, which is far more general) census questionnaires is processed by "editing" to produce "clean" data. The processes are described in detail in the United Nations Principles and Recommendations for Population and Housing Censuses and Handbook on Population and Housing Census Editing.
(7) Data in this sense is information, but of a very specific kind. Information is far more general than data in this sense.
(8) "Data" probably encompasses too much to manage with a single meaning for all contexts. The challenge is to identify a manageable number of meanings and characterize them well. The characterization sketched above may not be able to accommodate literary texts regarded as data, for example, and this may be a well established and defensible usage. It is probably necessary to say a good deal more about Data structure, though not only in the context of computing. The content of the [[Data]] and [[Data (computing)]] entries does not to me justify the distinction.
(9) This discussion is pertinent to improving the Data quality assessment entry, currently in a primitive state. Considering data quality assessment issues might be useful for clarifying what "data" is.
Gfeeney (talk) 05:17, 10 December 2013 (UTC)
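Point (4) above can be made concrete with a small sketch. It assumes a toy representation that is not part of the comment: each entity is a dictionary, `None` stands in for an undefined or missing value, and the names `is_systematic`, `persons` and `variables` are purely illustrative.

```python
from typing import Any, Mapping, Sequence


def is_systematic(records: Sequence[Mapping[str, Any]],
                  variables: Sequence[str]) -> bool:
    """Return True if every record supplies exactly the same variables.

    A value of None is allowed: it marks an undefined or missing value,
    which point (4) explicitly excepts.
    """
    expected = set(variables)
    return all(set(record) == expected for record in records)


# Hypothetical aggregate: two persons described by the same three variables.
variables = ["id", "age", "age_at_first_marriage"]
persons = [
    {"id": 1, "age": 34, "age_at_first_marriage": 27},
    {"id": 2, "age": 41, "age_at_first_marriage": None},  # never married
]

print(is_systematic(persons, variables))
```

A record that silently drops a variable fails the check, whereas a record that carries the variable with an explicit undefined value passes. Membership in the aggregate (the "well-defined" part) is a separate condition and is not checked here.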
The article claims that the word data is now considered a mass noun. However, this is only true in some contexts (see http://grammarist.com/usage/data/ ). Additionally, data cannot truly be considered a mass noun the way words such as audience are. Someone might easily say "The whole audience burst out laughing" and no one would think twice, but if we said "The whole data is stored on this flashdrive" most people would object and say "No, all the data is stored there." 190.81.202.250 (talk) 18:03, 23 February 2015 (UTC)
I just want to say that this Venn diagram is beautiful. Red Slash 17:23, 2 November 2016 (UTC)
It seems to me that the opening statement is not as clear as it could be. I almost think the more technical paragraphs below the opening statement might be better as an opening. I didn't want to just change it because there is good stuff there and I didn't want to overwrite anyone's good work. Alex Jackl (talk) 16:30, 20 September 2017 (UTC)
This topic has been broached again and again so I am hesitant to just make changes to the opening paragraphs of the article without a discussion here. Although "surprising" data may be more informative, that is hardly the key distinction between data and information. As has been said many times here on this talk page, with references, information is data with context. This is referenced in the "Meaning" section of the article. I suggest taking the overly academic second sentence of the opening paragraph out and replacing it with one for a more general audience. If I get no objection here I will do that in a week or two. Alex Jackl (talk) 17:02, 5 December 2018 (UTC)
I think on 1:29, 13 July 2022, Pooryorick~enwiki made an edit (not Discospinster at 17:22, 17 August 2022, as I previously thought) that replaced the very first sentence:
Data are individual facts, statistics, or items of information, used for evidence-based logical reasoning.
with something that eventually became
In a conceptual model, data is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. A datum is an indivdual value in a set of data.
I think this was a big mistake. The sense of "data are individual facts", previously the very first words of the article, is completely lost and not recovered elsewhere in the article. Moreover the replacement refers to what is quite clearly a different concept, namely raw information in the sense of information theory, or perhaps "data structures" in the sense of computer science. I strongly support reverting this. Jimmymath (talk) 17:55, 18 August 2022 (UTC) edited Jimmymath (talk) 18:43, 26 August 2022 (UTC)
Moreover, maybe I misunderstand things, but it seems that a few editors are making several edits at random places at high frequency without any sort of consultation on the talk page first. An example is this first paragraph over the last month or two. Can this be limited? Jimmymath (talk) 18:43, 26 August 2022 (UTC)
That part about data, information, knowledge, and wisdom looks like it *could* have come straight from my Wikia. AWESOME! Do give credit where it is due though, please. TheLastWordSword (talk) 11:25, 4 March 2019 (UTC)
In reading the article, I found it a bit jarring when I hit the sentence, "For example, the height of Mount Everest is generally considered data." That sentence is problematic whether we're considering data to be a mass noun or as the plural of datum.
If the word data is being used as the plural of datum, the sentence should read, "For example, the height of Mount Everest is generally considered a datum," or perhaps "For example, the height of Mount Everest is a datum." On the other hand, if the word data is being used as a mass noun, it should probably be, "For example, the height of Mount Everest is generally considered to be a piece of data," given that discrete units/portions/items of things that make up mass nouns are typically (almost always? always?) distinguished from the mass noun - 'a grain of sand,' 'a piece of glass,' 'two gallons of water,' 'an iota of courage,' etc.
I'm going to change it to one or the other ('a piece of data' or 'a datum') but I want to see if there's any kind of consensus on the issue. If you have an opinion, please share it. I'm leaning toward 'a piece of data' given that one can reasonably argue that 'a piece of data' can reasonably be considered to be a singular form of the plural data, whereas 'a datum' seems much less likely to appease those who think that data is (and always must be) a mass noun. So if I don't get any comments by the time I get back to it, 'a piece of data' is what I'm going to go with. CruiserBob (talk) 17:24, 11 August 2020 (UTC)
I removed the reference for collecting through observation because, although the OECD defines it as such, many data are generated through inference or deduction and therefore are not directly generated from observation. It was not "incorrect" just potentially misleading and too narrow a definition for the general page for "Data" in Wikipedia. Alex Jackl (talk) 17:06, 7 October 2021 (UTC)
What type of information was drawn from the data? 2409:4063:4D87:747E:409C:334C:B82E:C1ED (talk) 11:26, 7 July 2022 (UTC)
Data is now treated as a singular mass noun, even in books, slightly more than as a plural: ngrams Red Slash 15:38, 14 July 2022 (UTC)