This level-5 vital article is rated B-class on Wikipedia's content assessment scale. It is of interest to several WikiProjects.
I think it would be helpful to be clear about what range of values the kurtosis statistic can take. I can infer that there is a lower bound of -2, when the article discusses the binomial distribution being an extreme case; this took a fair bit of reading. There is nothing about an upper bound; presumably one exists, or else you would end up with an improper distribution? — Preceding unsigned comment added by 194.176.105.139 (talk) 09:48, 4 July 2012 (UTC)
The author defines kurtosis in terms of Gamma, but fails to define Gamma? Is it the Gamma Distribution? Then why doesn't it have 2 parameters? Is it the Factorial? Then why does it have a non-integer parameter? Is it acceptable to use ambiguous functions in a definition without disambiguating them?
Hsfrey (talk) 00:46, 17 March 2012 (UTC)
Hi statistics Wikipedia folks. On this page the kurtosis definition has a "-3" in it (because the normal has a kurtosis of 3, so this definition "normalises" things, so to speak). Subtracting this 3 is actually a convention; maybe this should be mentioned.
A more important point is that every single page on distributions I've encountered here does NOT include the -3 in the kurtosis formula given on the right (correct me if I'm wrong? I didn't recalculate them all manually :)). So while this is only a matter of convention, we should at least get Wikipedia consistent with its own definition conventions. The easiest way seems to be adapting the definition on this page.
Regards
woutersmet
The reason for this (I think!) is that people who have contributed to this page are from an econometrics background where it's common to assume a conditional normal distribution. Hence the -3. —Preceding unsigned comment added by 62.30.156.106 (talk) 21:45, 14 March 2008 (UTC)
If this is the "fourth standardized moment", what are the other 3 and what is a standardized moment anyway? do we need an article on it? -- Tarquin 10:39 Feb 6, 2003 (UTC)
Kurtosis is a measure of the peakedness ... so what does that mean? If I have a positive kurtosis, is my distribution pointy? Is it flat? -- JohnFouhy, 01:53, 11 Nov 2004
It has been pointed out that kurtosis is not synonymous with shape or peakedness, even for symmetric unimodal distributions; please see:
1) Kaplansky, I., "A common error concerning kurtosis", Journal of the American Statistical Association, 1945
2) Balanda, K. P. and MacGillivray, H. L., "Kurtosis: a critical review", The American Statistician, 1988 —Preceding unsigned comment added by Studentt2046 (talk • contribs) 16:27, 10 March 2009 (UTC)
Just backing up that we should not describe Kurtosis as "peakedness", see "Kurtosis as Peakedness, 1905–2014. R.I.P." in The American Statistician 11 Aug 2014 — Preceding unsigned comment added by 130.225.116.170 (talk) 09:33, 2 October 2014 (UTC)
Unfortunately, I couldn't find a citeable source for this, but I suspect that the use of the term "peakedness" is in part driven by surface metrology; if a surface has sharp peaks, its height distribution will have a pronounced tail and therefore high kurtosis. I have seen "kurtosis = peakedness" quite often in books on surface metrology. I will try to find more on that. 2A02:8071:28A:7700:4125:57A9:C1EB:6C80 (talk) 23:36, 7 March 2021 (UTC)
(Note: In whatever field of study, metrology, imaging, finance, or whatever, the beta(.5,1) distribution provides a canonical example to demonstrate that a sharp peak does not imply higher kurtosis.) — Preceding unsigned comment added by BigBendRegion (talk • contribs) 16:28, 5 June 2021 (UTC)
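For readers who want to verify the beta(.5,1) example above: a Beta(a,1) distribution has raw moments E[X^k] = a/(a+k), so its Pearson kurtosis mu4/mu2^2 can be computed exactly. A minimal Python sketch using exact rational arithmetic (the function name is mine, for illustration only):

```python
from fractions import Fraction

def beta_a1_kurtosis(a):
    """Pearson kurtosis mu4 / mu2^2 of Beta(a, 1), whose raw
    moments are E[X^k] = a / (a + k)."""
    m = [Fraction(a) / (Fraction(a) + k) for k in range(1, 5)]  # E[X] .. E[X^4]
    mu = m[0]
    var = m[1] - mu ** 2
    mu4 = m[3] - 4 * mu * m[2] + 6 * mu ** 2 * m[1] - 3 * mu ** 4
    return mu4 / var ** 2

k = beta_a1_kurtosis(Fraction(1, 2))
print(k, float(k))  # 15/7, about 2.14: below 3 despite the infinite peak at 0
```

So beta(.5,1) has kurtosis 15/7 (excess -6/7), confirming that a sharp peak does not imply high kurtosis.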
Peakedness, as I understand it, is not intended to imply anything about the shape of the _distribution_, which the cited statistical reference above suggests. Such notions are just plain wrong, but that is not how it is supposed to be interpreted. The term implies that peaks in e.g. a time signal will contribute to a higher kurtosis value (because of the fourth moment, and _yes_, outliers such as peaks in a time signal _do_ contribute to higher kurtosis values). That is, if you want to compare signals (data series) with respect to how many peaks they contain, kurtosis may be used as a measure. Kurtosis has for instance been used as a complement to equivalent SPL (i.e. the average RMS) to catch sound signals with a lot of transients/peaks in cases where they don't contribute much to the SPL level. An example of such "correct" usage of peakedness/kurtosis can e.g. be found here: https://www.bksv.com/media/doc/bo0510.pdf. Therefore, it is misleading to state "scaled version of the fourth moment of the distribution. This number is related to the tails of the distribution, not its peak;[2] hence, the sometimes-seen characterization of kurtosis as "peakedness" is incorrect." (quotation from this Wiki article). This is ONLY a proper conclusion if the peakedness is supposed to refer to the shape of the _distribution_, which, as far as I know, is _not_ the intended meaning. Instead of saying it is "incorrect", the text should just explain that peakedness does not mean that higher kurtosis corresponds to more or higher peaks in the statistical distribution, but rather that higher kurtosis suggests more/higher peaks in an associated data _series_ (e.g. a time series). — Preceding unsigned comment added by 79.136.121.89 (talk) 08:19, 20 September 2023 (UTC)
I believe the equation for the sample kurtosis is incorrect (n should be in denominator, not numerator). I fixed it. Neema Sep 7, 2005
The statement, "This is because the kurtosis as we have defined it is the ratio of the fourth cumulant and the square of the second cumulant of the probability distribution," does not explain (to me, at least) why it is obvious that subtracting three gives the pretty sample mean result. Isn't it just a result of cranking through the algebra, and if so, should we include this explanation? More concretely, the kurtosis is a ratio of central moments, not cumulants. I don't want to change one false explanation that I don't understand to another, though. Gray 01:30, 15 January 2006 (UTC)
It says: "Distributions with zero kurtosis are called mesokurtic. The most prominent example of a mesokurtic distribution is the normal distribution family, regardless of the values of its parameters." Yet here: http://en.wikipedia.org/wiki/Normal_distribution, we can see that kurtosis = 3; it's skewness that is 0 for the normal. Agree? Disagree?
I have just added a discussion to the skewness page. Similar comments apply here. Unbiasedness of the given kurtosis estimator requires independence of the observations and does not therefore apply to a finite population.
The independent observations version is biased, but the bias is small. This is because, although we can make the numerator and denominator unbiased separately, the ratio will still be biased. Removing this bias can be done only for specific populations. The best we can do is either:
1 use an unbiased estimate for the fourth moment about the mean,
2 use an unbiased estimate of the fourth cumulant,
in the numerator; and either:
3 use an unbiased estimate for the variance,
4 use an unbiased estimate for the square of the variance,
in the denominator.
According to the article, the given formula is 2 and 3 but I have not checked this. User:Terry Moore 11 Jun 2005
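For concreteness, the plain moment-ratio estimator and the commonly used adjusted estimator discussed above can be written out; a Python sketch (function names are mine), using the standard adjustment G2 = ((n-1)/((n-2)(n-3))) * ((n+1)*g2 + 6):

```python
def g2(x):
    """Plain sample excess kurtosis: m4/m2^2 - 3 with n-denominator moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3

def G2(x):
    """Adjusted sample excess kurtosis (unbiased under normality)."""
    n = len(x)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2(x) + 6)

sample = [0.0, 1.0, 2.0, 3.0]
print(g2(sample), G2(sample))  # -1.36 and -1.2 for this toy sample
```

Neither version is exactly unbiased for a general population, since the estimator is a ratio, as noted above.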
I mean, what is the etymology of the term? -FZ 19:48, 22 Jun 2005 (UTC)
I've heard of "excess kurtosis," but not vice-versa. Is "kurtosis excess" a common term? Gray 01:12, 15 January 2006 (UTC)
A picture would be nice... (one is needed for skewness as well). I'd whip one up, but final projects have me beat right now. 24.7.106.155 08:27, 19 April 2006 (UTC)
The picture is taken from my doctoral thesis ("A. Meskauskas. Gravitropic reaction: the role of calcium and phytochrome", defended in 1997, Vilnius State University, Lithuania). I added this note to the picture description in Commons. The picture represents my own experimental data, but the dissertation should be considered a published work. The real experimental data cannot be "adjusted" in any "preferred" way, and in reality likely no scientist will ever observe an "absolutely clean" effect that only changes kurtosis and nothing else. Audriusa (talk) 14:41, 19 December 2008 (UTC)
Is the range (-2, +infinity) correct? Why not (-3, +infinity)?
I corrected the French article, which gave (0, +infinity) for the kurtosis (so (-3, +infinity) for the excess kurtosis). The correct range for the kurtosis is [1, +infinity), and for the excess kurtosis [-2, +infinity).
Very simple demonstration:
We have Var(X^2) = E[X^4] - (E[X^2])^2,
or E[X^4] - (E[X^2])^2 >= 0,
so E[X^4] >= (E[X^2])^2.
With Z = (X - mu)/sigma, we have E[Z^2] = 1, so kurtosis = E[Z^4] >= (E[Z^2])^2 = 1, hence excess kurtosis >= -2.
This demonstration can also be realized with Jensen's inequality (added 10/16/09).
Jensen's inequality: for a convex function f, f(E[Y]) <= E[f(Y)].
We have, with f(y) = y^2 and Y = Z^2: (E[Z^2])^2 <= E[Z^4],
so kurtosis >= 1 and excess kurtosis >= -2.
Thierry —Preceding unsigned comment added by 132.169.19.128 (talk) 08:22, 4 June 2009 (UTC)
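The bound can also be checked numerically: with population-style sample moments, mean(z^4) >= (mean(z^2))^2 = 1 holds for any data set, by the same Cauchy-Schwarz/Jensen argument. A quick Python sketch (the function name is mine):

```python
import random

def kurtosis(x):
    """Pearson kurtosis m4 / m2^2 using n-denominator sample moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2

random.seed(0)
samples = [[random.gauss(0, 1) for _ in range(50)] for _ in range(1000)]
smallest = min(kurtosis(s) for s in samples)
print(smallest)  # never drops below 1, matching kurtosis >= 1 (excess >= -2)
```

The lower bound 1 is attained by a symmetric two-point distribution, e.g. the data [-1, 1, -1, 1] give kurtosis exactly 1.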
Is the given formula for the sample kurtosis really right? Isn't it supposed to have the -3 in the denominator? --60.12.8.166
In the discussion of the "D" formula, the summation seems to be over i terms, whereas the post lists: "xi - the value of the x'th measurement" I think this should read: "xi - the value of the i'th measurement of x" (or something close) --Twopoint718 19:25, 13 May 2007 (UTC)
In terms of shape, a leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values)
Is that right? How can a function have both a greater probability near the mean and a greater probability at the tails? Ditto for platykurtic distributions--DocGov 21:49, 18 November 2006 (UTC)
Looking at the Pearson Distribution page - isn't the example a Pearson V, not Pearson VII as stated in the title? And, if not, where is more info on Type VII - the Pearson Wikipedia page only goes up to V. 128.152.20.33 19:34, 7 December 2006 (UTC)
(Herbmuell's comment "one is not possible without the other" is false. Take the beta(.5,1) distribution and reflect it around the origin. The new distribution is (1) symmetric and (2) infinitely peaked, but (3) it is light-tailed (kurtosis < 3). For another example, take the U(-1,1) distribution and mix it with the Cauchy, with mixing probabilities .99999 and .00001. The resulting distribution is (1) symmetric and (2) appears perfectly flat over 99.999% of the observable data, but (3) has fat tails (infinite kurtosis).) BigBendRegion (talk) 15:42, 20 June 2022 (UTC)
"A distribution whose kurtosis is deemed unaccepatably large or small is said to be kurtoxic. Similarly, if the degree of skew is too great or little, it is said to be skewicked" – two words that had no hits in Google. I think someone was kidding us. DFH 20:33, 9 February 2007 (UTC)
I think the definitions of lepto-/platy- kurtic in the article are confusing: the prefixes are reversed. I'm not confident enough in statistics to change this. Could someone who understands the subject check that this is the correct usage?
A distribution with positive kurtosis is called leptokurtic, or leptokurtotic. In terms of shape, a leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values). Examples of leptokurtic distributions include the Laplace distribution and the logistic distribution.
A distribution with negative kurtosis is called platykurtic, or platykurtotic. In terms of shape, a platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "heavy tails" (that is, a higher probability than a normally distributed variable of extreme values).
leptokurtic: –adjective Statistics. 1. (of a frequency distribution) being more concentrated about the mean than the corresponding normal distribution. 2. (of a frequency distribution curve) having a high, narrow concentration about the mode. [Origin: 1900–05; lepto- + irreg. transliteration of Gk kyrt(ós) swelling + -ic]
lepto- a combining form meaning "thin," "fine," "slight"
platykurtic: 1. (of a frequency distribution) less concentrated about the mean than the corresponding normal distribution. 2. (of a frequency distribution curve) having a wide, rather flat distribution about the mode. [Origin: 1900–05; platy- + kurt- (irreg. < Gk kyrtós bulging, swelling) + -ic]
platy- a combining form meaning "flat," "broad".
--Blick 19:43, 21 February 2007 (UTC)
Well, I think it does agree with outside sources, at least The American Statistician. Maybe, to make it less confusing, it's helpful to talk about length (which is what you're talking about) and thinness. Here's a quote (Kevin P. Balanda and H. L. MacGillivray, The American Statistician, Vol. 42, No. 2 (May, 1988), pp. 111-119), who write: "Dyson (1943) gave two amusing mnemonics attributed to Student for these names: platykurtic curves, like platypuses, are squat with short tails, whereas leptokurtic curves are high with long tails, like kangaroos, noted for 'lepping'. The terms supposedly refer to the general shape of a distribution, with platykurtic distributions being flat-topped compared with the normal, leptokurtic distributions being more sharply peaked than the normal, and mesokurtic distributions having shape comparable to that of the normal." So, yes, leptokurtic distributions have long and thin tails; platykurtic distributions have short heavy tails. —Preceding unsigned comment added by 128.206.28.43 (talk) 15:56, 11 November 2008 (UTC)
Not really. Moments are more sensitive to the tails, because of the way powers work. The squares of 1, 2, 3, etc. are 1, 4, 9, etc., which are successively spaced farther apart. The effect is greater for 4th powers. So, although the names platykurtic and leptokurtic are inspired by the appearance of the centre of the density function, the tails are more important. Also, it is the behaviour of the tails that determines how robust statistical methods will be, and the kurtosis is one diagnostic for that. 203.97.74.238 00:46, 1 September 2007 (UTC)Terry Moore
I don't have the time to write about that, but I think the article should mention L-kurtosis, too. --Gaborgulya (talk) 01:13, 22 January 2008 (UTC)
To find out if it's mesokurtic, platykurtic or leptokurtic, why compare it to 3? —Preceding unsigned comment added by Reesete (talk • contribs) 10:18, 5 March 2008 (UTC)
The expected kurtosis for a sample of IID standard normal data is 3 (see the wiki article on the normal distribution for more). We tend to refer to excess kurtosis as the sample kurtosis of a series minus 3 for that reason. —Preceding unsigned comment added by 62.30.156.106 (talk) 21:42, 14 March 2008 (UTC)
Perhaps the article should include more explicit notes on bias. In particular, I'm wondering why the formula is using biased estimates of sample moments about the mean; perhaps someone more knowledgeable than I might explain why this is the preferred formula? —Preceding unsigned comment added by 140.247.11.37 (talk) 14:30, 25 June 2008 (UTC)
The way the "modern" definition is phrased in the article makes it look like [the formula] could be what they're referring to as excess kurtosis.
However, I get the impression that "excess kurtosis" is actually the "minus 3" term. Is this correct? kostmo (talk) 05:57, 25 September 2008 (UTC)
I found this:
k is best interpreted as a measure of dispersion of the values of Z^2 around their expected value of 1, where as usual Z = (X-mu)/sigma
It has been written by Dick Darlington in an old mail thread. It does not account for the -3 used in Wikipedia's article, but it is clear and could be added to the initial definition. --Pot (talk) 10:40, 19 February 2009 (UTC)
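Darlington's interpretation is easy to verify: for standardized Z, E[Z^2] = 1, so Var(Z^2) = E[Z^4] - 1, i.e. kurtosis = Var(Z^2) + 1 (and excess kurtosis = Var(Z^2) - 2). With population-style moments the identity holds exactly for any data; a quick Python check (helper names are mine):

```python
import random

def pop_mean(x):
    return sum(x) / len(x)

def pop_var(x):
    """Population variance (n in the denominator)."""
    m = pop_mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

random.seed(1)
data = [random.gauss(0, 2) for _ in range(500)]

m, sd = pop_mean(data), pop_var(data) ** 0.5
z = [(v - m) / sd for v in data]

kurt = pop_mean([v ** 4 for v in z])           # E[Z^4], the kurtosis
dispersion = pop_var([v ** 2 for v in z]) + 1  # Var(Z^2) + 1

print(kurt, dispersion)  # agree up to rounding error
```

This is exactly the "-3" question: since Var(Z^2) >= 0, kurtosis >= 1, and subtracting 3 just recentres the measure at the normal distribution's value.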
Okay, I barely understand the statistical part of the article, so why do you have to use an example that involves something only botanists and biologists can understand? I understand that encyclopaedias are supposed to be erudite, but not pedantic. They shouldn't make you have to keep clicking on newer and newer subjects that you have to read up on just so you can understand the one you originally started with. An example in an encyclopaedia is supposed to be simple and straightforward, something the uninitiated layman can understand, not something having to do with red lights and gravitropic coleoptiles. It's the people's encyclopaedia, you don't have to dumb it down to make it more accessible. My point is, get a better visual example for what kurtosis is. —Preceding unsigned comment added by 70.73.34.109 (talk) 10:26, 30 April 2009 (UTC)
I agree - two clear examples, of very high and very low kurtosis, would make this article much clearer, and much easier to understand at a glance. Use a couple of every-day activities to prove the point. 165.193.168.6 (talk) 12:27, 13 August 2013 (UTC)
I'm no statistician, but the description of leptokurtosis currently says it has a more acute peak and fatter tails, whereas platykurtosis has a flatter peak and thinner tails. A quick mental diagram demonstrates to me that this is impossible, and the author(s) must have confused the thickness of the tails for the two cases. A leptokurtic curve must have thinner tails and a platykurtic curve must have fatter tails. Unless anyone objects, I'll correct this in a moment. —Preceding unsigned comment added by 194.153.106.254 (talk) 10:33, 23 July 2009 (UTC)
On the Latin wiki page for Distributio normalis you find a recent (2003) scientific paper which rearranges the fourth moment differently to define a number, called in English arch (and in Latin fornix), which ranges from 0 to infinity (and is 1 for the normal distribution), instead of the rather strange [-2, infinity). by Alexor65 — Preceding unsigned comment added by Alexor65 (talk • contribs) 20:24, 29 March 2011 (UTC)
The link "Celebrating 100 years of Kurtosis" does not work because the file has changed address; it is now at faculty.etsu.edu/seier/doc/Kurtosis100years.doc ----Alexor65 —Preceding unsigned comment added by 151.76.68.54 (talk) 21:02, 2 April 2011 (UTC)
The first sentence states "In probability theory and statistics, kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable, although some sources are insistent that heavy tails, and not peakedness, is what is really being measured by kurtosis.[1]"
The reference given says, "The heaviness of the tails of a distribution affects the behavior of many statistics. Hence it is useful to have a measure of tail heaviness. One such measure is kurtosis...Statistical literature sometimes reports that kurtosis measures the peakedness of a density. However, heavy tails have much more influence on kurtosis than does the shape of the distribution near the mean (Kaplansky 1945; Ali 1974; Johnson, et al. 1980)."
The reference seems to directly contradict the first sentence. — Preceding unsigned comment added by 140.226.46.75 (talk) 21:26, 7 October 2011 (UTC)
When I first read through the introduction, I did not understand what it was saying. However, I read through it again, and, perhaps because I picked up on some word I missed the first time around, the meaning of the introduction became completely clear. As this seems to suggest that understanding the introduction hinges on (or at least is heavily dependent on) noticing and understanding a very small portion of it, I would suggest that a small, one-sentence "introduction introduction" be added above (or included in) the current introduction, such that it would quickly convey to readers a general "complexity level" at which the article deals with its subject. To clarify, in this context I am using the phrase "complexity level" to refer to a measure of a work's position on a "sliding scale" of sorts that measures the amount that a work is affected by the general tendency of larger words to become more critical to comprehension as the complexity of a work's subject (among other factors) increases. For instance, a college-level thermodynamics textbook is unlikely to spend the same amount of time leading up to a definition of thermal conduction and insulation that an elementary-level science textbook would. As such, a prior understanding of thermal conduction and insulation becomes more necessary to understand the rest of the book in the college textbook than in the elementary school textbook.
Alternatively, the "complexity level" could be reduced, for instance by using more familiar terms than "peakedness", which, while it helps the reader associate the concept with common phrases such as "highly peaked", could perhaps be moved lower in the introduction (or even put into the article itself) and replaced by another term, such as "sharpness", and by reducing the repetition of clauses, such as removing the redundant "just as for skewness" in the second sentence.
Aero-Plex (talk) 17:24, 10 November 2011 (UTC)
(Split due to needing to reset the router)
EDIT:
Unfortunately, I do not have the necessary time to read and edit the article now, and probably won't for some time, so I cannot edit the article for now. However, from my brief skim through, I did notice that 1. there is a noticeable amount of repetition of terminology, which could be improved, 2. no compact, direct description of a graph with high/low kurtosis is made in the text, and I could only find one by looking in the image descriptions, which may not be noticed by some (I suggest that a sentence along the lines of "High kurtosis causes narrow curves, while low kurtosis causes wide graphs." be added somewhere in the article where it would be noticed), and 3. the "coin toss" example could be better elaborated on, as it seems like it could be very helpful, especially for people who are only coming to this page for a quick summary.
Aero-Plex (talk) 17:41, 10 November 2011 (UTC)
I just conducted a simulation study which seems to confirm that "The usual estimator of the population kurtosis" is in fact an estimator for excess kurtosis, which seems to make sense given the last part of the formula, -3*X. — Preceding unsigned comment added by 83.89.29.84 (talk) 23:28, 24 January 2012 (UTC) ALSO: it claims that the estimator is used in Excel; however, the Excel formula seems to use the standard deviation rather than the variance: http://office.microsoft.com/en-us/excel-help/kurt-HP005209150.aspx BUT the wiki article claims it must be the unbiased standard deviation estimator, which I don't believe exists. — Preceding unsigned comment added by 83.89.29.84 (talk) 00:11, 25 January 2012 (UTC)
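On the Excel question: the formula on the linked Microsoft page uses the (n-1)-denominator sample standard deviation s, which is perfectly well defined (s is the square root of the unbiased variance estimator; s itself is biased for sigma). Written out, that formula is algebraically the same as the usual adjusted excess-kurtosis estimator; a Python check on a toy sample (function names are mine):

```python
import math

def excel_kurt(x):
    """Excess kurtosis as in Excel's documented KURT formula,
    using the (n-1)-denominator sample standard deviation s."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
    term = sum(((v - xbar) / s) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * term
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def adjusted_g2(x):
    """The same adjusted estimator written via the moment ratio."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m4 = sum((v - xbar) ** 4 for v in x) / n
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

sample = [0.0, 1.0, 2.0, 3.0]
print(excel_kurt(sample), adjusted_g2(sample))  # both -1.2
```

So the "standard deviation vs. variance" difference is cosmetic: the two forms give identical numbers.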
As has been pointed out above, kurtosis is NOT an accurate measure of peakedness. This should be obvious from looking at a graph of Student's t-distribution with degrees of freedom above 4 and trying to see anything approaching sharp peakedness as the d.o.f. drops down to 4 and the kurtosis shoots up to infinity. Similarly, look at the gamma distribution graph and try to notice any correlation at all between the sharpness or softness of the peak when k > 1 and the smallness of k (higher kurtosis). The point is that kurtosis measures ONLY heaviness of the tails, and contrary to the former text, there's no difference in this respect between Pearson's kurtosis and excess kurtosis. (Nor can there be, since the two are identical save for being shifted by 3.) In fact, it should be obvious that heavy tails and sharp peaks CANNOT in general be correlated -- i.e. one could radically change the shape of the peak in the middle of the graph by a strictly local rearrangement of the nearby area while leaving the tails entirely untouched. It's rather sad that a basic article like this had such basic errors for such a long time, but at least they are fixed now. Benwing (talk) 07:10, 23 March 2012 (UTC)
I found the following article very helpful in explaining the common misconceptions about kurtosis, specifically related to its use as a measure of "peakedness" and "tail weight": http://www.columbia.edu/~ld208/psymeth97.pdf Basically, it explains that kurtosis is a movement of mass not explained by the variance. Thus, when we see heavier tails, data points are spread further out, which should lead to an increase in the variance - BUT if there is also an increase in the number of data points near the mean, this leads to a decrease in the variance; kurtosis captures the change in the shape of a distribution when these two effects exactly offset each other.
I also believe this article makes an important point about the distributions crossing twice, which is helpful to dispel misconceptions about kurtosis.
Section Kurtosis#Applications is tagged with {{Expand section}}; here are some suggestions: Special:WhatLinksHere/Kurtosis. Fgnievinski (talk) 04:34, 19 January 2013 (UTC)
Here are some sources from IDRE/UCLA for the various definitions of kurtosis, including citations and which versions SAS, SPSS and STATA use [4], [5]. Regards, Anameofmyveryown (talk) 18:53, 11 March 2013 (UTC)
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Hi. Ref. 4 (Pearson, Biometrika 1929), about the lower bound of the kurtosis, is erroneous, including the doi (anybody can check), and the page is edit-protected. Even if the ref to Pearson can be corrected, the previous reference (a paper of 2013), which was reverted on 5 February 2014, remains eligible because (a) it applies to a wider class of distributions than the finite discrete ones (because the 2013 proof uses math expectations), and (b) it is derived from a more general inequality for d-variate distributions, established in 2004 (ref cited in the 2013 paper). I can send the pdfs of the 2004 and of the 2013 paper to the administrator (please just tell me how to do that) and to interested people. At least please correct the erroneous reference to Pearson, if it is relevant. If that is not possible, please undo the change of 5 February 2014 and replace ref. 4 by the previous ref. 4, which is: [2] Thank you. Michel.
About the actual ref 4: Pearson, K. (1929). "Editorial note to 'Inequalities for moments of frequency functions and for various statistical constants'". Biometrika 21 (1–4): 370–375. doi:10.1093/biomet/21.1-4.361 (1) The toc of Biometrika 1929, 21(1-4) is at: http://biomet.oxfordjournals.org/content/21/1-4.toc I failed to find this paper of Pearson in this toc, and I failed to find it with ZMATH. The doi redirects to the paper of Joanes and Gill, "The Statistician" 1998, vol 47, part 1, pp. 183-189. Indeed it deals with skewness and kurtosis, but it does not cite Pearson and it does not give a general proof of the inequality valid for any random variable distribution. In any case there is a disagreement between the doi and the ref to Pearson. (2) The 2013 paper is publicly available at http://petitjeanmichel.free.fr/itoweb.petitjean.skewness.html (see ref. 2: "download pdf paper"): see the result at the top of p. 3 and eq. 6. The proof of the more general inequality for random vectors is in my paper: "From Shape Similarity to Shape Complementarity: Toward a Docking Theory." J. Math. Chem. 2004, 35[3], 147-158 (DOI 10.1023/B:JOMC.0000033252.59423.6b); see eq. A10 in the appendix. I cannot put it on the web due to the copyright. Only one assumption: the moments of order 4 must exist (so, it is not restricted to samples). I do not claim to have discovered the sharp lower bound of the kurtosis, even in its more general form, and I do not care if my 2013 paper is not cited. However I was the first to mention the inequality on the Wikipedia page, and at first glance my own proof seems to be original. I just say that the reader should be directed to a proof valid in all cases, e.g. via a valid source. If the ref works only for samples, the text should be updated accordingly.
To conclude, I give you the hint for the full proof for random variables (for vectors, see the 2004 paper), available to anybody aware of math expectations: X1 and X2 are random variables; translate X2, calculate the translation minimizing the variance of the squared difference of the random variables, and look at the expression of the minimized variance: it should be a non-negative quantity, hence the desired inequality. Mailto: petitjean.chiral@gmail.com (preferred) or michel.petitjean@univ-paris-diderot.fr 81.194.29.18 (talk) 13:55, 10 December 2014 (UTC)
— Preceding unsigned comment added by 81.194.29.18 (talk) 18:54, 8 December 2014 (UTC)
References
I pasted the wrong doi for ref. 4 (the mouse caught the doi of the line above). In fact the Editorial is appended to the paper of Shohat. I cancel my request. Please accept my apologies for the inconvenience caused. Thanks for your patience. 81.194.29.18 (talk) 14:33, 10 December 2014 (UTC)
It seems that people keep wanting to insert something about "peakedness" into the interpretation of kurtosis.
What follows is a clear explanation of why “peakedness” is simply wrong as a descriptor of kurtosis.
Suppose someone tells you that they have calculated negative excess kurtosis either from data or from a probability distribution function (pdf). According to the “peakedness” dogma (started unfortunately by Pearson in 1905, and carried forward by R.A. Fisher through the 14th edition of his classic text, Statistical Methods for Research Workers), you are supposed to conclude that the distribution is “flat-topped” when graphed. But this is obviously false in general. For one example, the beta distribution beta(.5,1) has an infinite peak and has negative excess kurtosis. For another example, the 0.5*N(0, 1) + 0.5*N(4,1) mixture distribution is bimodal (wavy); not flat at all, and also has negative excess kurtosis. These are just two examples out of an infinite number of other non-flat-topped distributions having negative excess kurtosis.
Yes, the continuous uniform distribution U(0,1) is flat-topped and has negative excess kurtosis. But obviously, a single example does not prove the general case. If that were so, we could say, based on the beta(.5,1) distribution, that negative excess kurtosis implies that the pdf is "infinitely pointy." We could also say, based on the 0.5*N(0, 1) + 0.5*N(4,1) distribution, that negative excess kurtosis implies that the pdf is "wavy." It’s like saying, “well, I know all bears are mammals, so it must be the case that all mammals are bears.”
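For the bimodal example, the numbers can be computed in closed form: each component of 0.5*N(0,1) + 0.5*N(4,1) sits at distance d = 2 from the overall mean, giving variance d^2 + 1 = 5 and fourth central moment d^4 + 6d^2 + 3 = 43, so the kurtosis is 43/25 = 1.72 (excess -1.28). A small Python sketch of that calculation (the function name is mine):

```python
def symmetric_normal_mixture_kurtosis(d, sigma=1.0):
    """Pearson kurtosis of the 0.5*N(-d, sigma^2) + 0.5*N(+d, sigma^2)
    mixture (the 0.5*N(0,1) + 0.5*N(4,1) case after a shift, with d = 2).
    For one component at offset d: E[(d+e)^2] = d^2 + s^2 and
    E[(d+e)^4] = d^4 + 6*d^2*s^2 + 3*s^4, with e ~ N(0, s^2)."""
    s2 = sigma ** 2
    var = d ** 2 + s2
    mu4 = d ** 4 + 6 * d ** 2 * s2 + 3 * s2 ** 2
    return mu4 / var ** 2

k = symmetric_normal_mixture_kurtosis(2.0)  # the 0.5*N(0,1)+0.5*N(4,1) case
print(k, k - 3)  # 1.72 and -1.28: bimodal (wavy), yet negative excess kurtosis
```

With d = 0 the mixture collapses to a single normal and the function returns 3, the mesokurtic reference value.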
Now suppose someone tells you that they have calculated positive excess kurtosis from either data or a pdf. According to the “peakedness” dogma (again, started by Pearson in 1905), you are supposed to conclude that the distribution is “peaked” or “pointy” when graphed. But this is also obviously false in general. For example, take a U(0,1) distribution and mix it with a N(0,1000000) distribution, with .00001 mixing probability on the normal. The resulting distribution, when graphed, appears perfectly flat at its peak, but has very high kurtosis.
You can play the same game with any distribution other than U(0,1). If you take a distribution with any shape peak whatsoever, then mix it with a much wider distribution like N(0,1000000), with small mixing probability, you will get a pdf with the same shape of peak (flat, bimodal, trimodal, sinusoidal, whatever) as the original, but with high kurtosis.
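The mixing game can also be made exact. For the (1-p)*U(0,1) + p*N(0, s^2) mixture, the raw moments are just weighted sums of the component moments (uniform: 1/2, 1/3, 1/4, 1/5; centered normal: 0, s^2, 0, 3s^4), so the kurtosis has a closed form; a Python sketch (the function name is mine):

```python
def contaminated_uniform_kurtosis(p, s):
    """Pearson kurtosis of (1-p)*U(0,1) + p*N(0, s^2), computed from
    the exact raw moments of the two components."""
    u = [0.5, 1 / 3, 0.25, 0.2]         # E[X^k] for U(0,1), k = 1..4
    g = [0.0, s ** 2, 0.0, 3 * s ** 4]  # E[X^k] for N(0, s^2)
    m = [(1 - p) * a + p * b for a, b in zip(u, g)]
    mu = m[0]
    var = m[1] - mu ** 2
    mu4 = m[3] - 4 * mu * m[2] + 6 * mu ** 2 * m[1] - 3 * mu ** 4
    return mu4 / var ** 2

print(contaminated_uniform_kurtosis(0.0, 1000))   # pure U(0,1): 1.8
print(contaminated_uniform_kurtosis(1e-5, 1000))  # tiny contamination: enormous
```

A mixing weight of 0.00001 on the wide normal leaves the visible flat-topped shape essentially unchanged but drives the kurtosis into the hundreds of thousands.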
And yes, the Laplace distribution has positive excess kurtosis and is pointy. But you can have any shape of the peak whatsoever and have positive excess kurtosis. So the bear/mammal analogy applies again.
One thing that can be said about cases where the data exhibit high kurtosis is that when you draw the histogram, the peak will occupy a narrow vertical strip of the graph. The reason this happens is that there will be a very small proportion of outliers (call them “rare extreme observations” if you do not like the term “outliers”) that occupy most of the horizontal scale, leading to an appearance of the histogram that some have characterized as “peaked” or “concentrated toward the mean.”
But the outliers do not determine the shape of the peak. When you zoom in on the bulk of the data, which is, after all, what is most commonly observed, you can have any shape whatsoever – pointy, inverted U, flat, sinusoidal, bimodal, trimodal, etc.
So, given that someone tells you that there is high kurtosis, all you can legitimately infer, in the absence of any other information, is that there are rare, extreme data points (or potentially observable data points). Other than the rare, extreme data points, you have no idea whatsoever as to what is the shape of the peak without actually drawing the histogram (or pdf), and zooming in on the location of the majority of the (potential) data points.
And given that someone tells you that there is negative excess kurtosis, all you can legitimately infer, in the absence of any other information, is that the outlier characteristic of the data (or pdf) is less extreme than that of a normal distribution. But you will have no idea whatsoever as to what is the shape of the peak, without actually drawing the histogram (or pdf).
The logic for why the kurtosis statistic measures outliers (rare, extreme observations in the case of data; potential rare, extreme observations in the case of a pdf) rather than the peak is actually quite simple. Kurtosis is the average (or expected value, in the case of a pdf) of the Z-scores, each taken to the 4th power. Where there are (potential) outliers, there will be some extremely large Z⁴ values, giving a high kurtosis. If there are fewer outliers than, say, a normal pdf predicts, then the most extreme Z⁴ values will not be particularly large, giving a smaller kurtosis.
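This "average of Z⁴" definition is easy to see numerically. A quick sketch (variable names are mine): compute the kurtosis of normal data directly as the mean of the fourth powers of the Z-scores, then append a single outlier and watch that one point dominate the average.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)

z = (x - x.mean()) / x.std()   # population (biased) standardization
kurt = np.mean(z**4)           # the kurtosis statistic itself; ~3 for normal data

# Append one extreme observation: its Z-score^4 swamps everything else.
y = np.append(x, 1000.0)
zy = (y - y.mean()) / y.std()
kurt_out = np.mean(zy**4)      # explodes, driven entirely by the single outlier
print(kurt, kurt_out)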
What of the peak? Well, near the peak, the Z-scores are small, so their fourth powers are extremely small and contribute very little to the overall average (which, again, is the kurtosis). That is why kurtosis tells you virtually nothing about the shape of the peak. I give mathematical bounds on the contribution of the data near the peak to the kurtosis measure in the following article:
Westfall, P. H. (2014), “Kurtosis as Peakedness, 1905–2014. R.I.P.”, The American Statistician, 68, 191–195.
I hope this helps.
Peter Westfall
P.S. The height of the peak is also unrelated to kurtosis; see Kaplansky, I. (1945), “A Common Error Concerning Kurtosis,” Journal of the American Statistical Association, 40, 259. But the “height” misinterpretation also seems to persist.
P.P.S. Some believe that the "peakedness" and "flatness" interpretation holds in the special case of symmetric unimodal distributions, because in the esteemed journal Psychological Methods, DeCarlo (1997) states at the beginning of the abstract of his paper "On the Meaning and Use of Kurtosis":
“For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness.”
But this statement is also easily shown to be false as regards "peakedness" and "flatness":
Take a U(-1,1) mixed with a N(0,1000000), with mixing p=.0001 on the normal. The distribution is symmetric and unimodal, has extremely high kurtosis, but appears flat at its peak when graphed. The high kurtosis here, as with all distributions, is explained by potential outliers, not by the peak.
Now mix a beta(.5,1) with a -beta(.5,1), with equal probabilities. The distribution is symmetric and unimodal, has negative excess kurtosis, but has an infinite peak. The negative excess kurtosis here, as with all distributions, is explained by a paucity of potential outliers, not by the peak. (The maximum possible absolute Z-score is around 2.24 for this distribution.)
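The moments of this beta mixture can be worked out exactly. A sketch with exact rational arithmetic (my own check, not from the post): for B ~ beta(1/2, 1), the density is 0.5·x^(−1/2) on (0,1), so E[B^k] = 1/(2k+1); the symmetric mixture X = ±B has mean 0, so its central moments equal these raw moments.

```python
from fractions import Fraction

EB2 = Fraction(1, 5)   # E[B^2] = 1/(2*2+1) for B ~ beta(1/2, 1)
EB4 = Fraction(1, 9)   # E[B^4] = 1/(2*4+1)

# X = ±B with equal probability: mean 0, so central = raw moments.
var = EB2
kurtosis = EB4 / var**2      # = 25/9
excess = kurtosis - 3        # = -2/9, negative despite the infinite spike at 0

# |X| <= 1, so the largest possible |Z| is 1/sqrt(var) ~ 2.24.
max_abs_z = 1 / float(var) ** 0.5
print(float(excess), max_abs_z)
```

The excess kurtosis is exactly −2/9, and the maximum |Z| of 1/√0.2 ≈ 2.236 matches the "around 2.24" figure quoted above.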
— Preceding unsigned comment added by 129.118.195.172 (talk) 16:07, 31 August 2017 (UTC)
If kurtosis is not a measure of the "peakedness", are the terms "leptokurtic" and "platykurtic" still meaningful? Don't they just mean "more peaked" and "less peaked"? Or do they need to be either abandoned or re-defined? --Roland (talk) 21:41, 5 November 2018 (UTC)
When comparing 2 datasets, we often compute the distribution of the differences, or errors, between the 2 datasets. When excess kurtosis was considered a measure of "peakedness", a larger positive excess kurtosis would imply better agreement. Now it has been realized that excess kurtosis is actually a measure of "tailedness". What, then, should we wish for when we want 2 datasets to agree? A kurtosis as small as possible? --Roland (talk) 00:59, 27 November 2018 (UTC)
The section's reference, http://www.cs.albany.edu/~lsw/homepage/PUBLICATIONS_files/ICCP.pdf, is dead.
It may also be mentioned that is used by SPSS to calculate the kurtosis. — Preceding unsigned comment added by Ad van der Ven (talk • contribs) 08:54, 25 April 2019 (UTC)