The case against deprecation

Background information

Multiple-byte units

Decimal
Value      Metric
1000       kB   kilobyte
1000^2     MB   megabyte
1000^3     GB   gigabyte
1000^4     TB   terabyte
1000^5     PB   petabyte
1000^6     EB   exabyte
1000^7     ZB   zettabyte
1000^8     YB   yottabyte
1000^9     RB   ronnabyte
1000^10    QB   quettabyte

Binary
Value      IEC              Memory
1024       KiB  kibibyte    KB  kilobyte
1024^2     MiB  mebibyte    MB  megabyte
1024^3     GiB  gibibyte    GB  gigabyte
1024^4     TiB  tebibyte    TB  terabyte
1024^5     PiB  pebibyte
1024^6     EiB  exbibyte
1024^7     ZiB  zebibyte
1024^8     YiB  yobibyte
1024^9
1024^10

Orders of magnitude of data
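
The size of the ambiguity is not constant: the binary value of each prefix exceeds the decimal value by a margin that grows with the exponent. A minimal sketch of that arithmetic follows; the names PREFIXES and relative_gap are illustrative and do not come from any standard.

```python
# Prefix letters that appear in both the decimal and the binary table.
PREFIXES = ["k", "M", "G", "T", "P", "E", "Z", "Y"]


def relative_gap(n: int) -> float:
    """Fraction by which 1024**n exceeds 1000**n."""
    return 1024**n / 1000**n - 1


# Print how far apart the two readings of each prefix are.
for n, prefix in enumerate(PREFIXES, start=1):
    print(f"{prefix}: 1024^{n} exceeds 1000^{n} by {relative_gap(n):.1%}")
```

The gap is about 2.4% for kilo and about 7.4% for giga, so the cost of leaving a unit ambiguous rises as storage sizes grow.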

Why Wikipedia should not deprecate the use of IEC prefixes

  1. IEC prefixes are unambiguous, succinct, simple to use and simple to understand.
  2. The use of IEC prefixes is endorsed by national and international standards bodies.
  3. The use of one symbol (e.g., GB) to mean two different things in the same article creates confusion and ambiguity. Even so, many Wikipedia articles use kilobyte, megabyte and/or gigabyte in this way. In this situation, the IEC prefixes provide an ideal disambiguation tool because they are unambiguous and succinct.
  4. Deprecation (of IEC prefixes) increases the difficulty threshold for disambiguation, reducing the rate at which articles can be disambiguated by expert editors.
  5. In turn, this reduces the number of articles that less expert editors can then improve further with footnotes, etc. (assuming that there is consensus to do so).
  6. Deprecation is interpreted by some editors as a justification for changing unambiguous units into ambiguous ones.
  7. Removing IEC prefixes from articles, even when disambiguated with footnotes, destroys a part of the information that was there before, because it requires an expert to work out which footnote corresponds to which use in the article.
  8. In the long term, the use of IEC prefixes would avoid the need to use the same symbol (e.g., MB) with two different meanings. This may sound like a pipe dream, but it could be implemented as a user preference, so that readers could choose between familiar (ambiguous) units and unfamiliar (unambiguous) ones.
  9. The main argument for not using IEC prefixes is the unfamiliarity of, for example, the mebibyte (MiB) compared with the megabyte (MB). The unfamiliarity is not disputed, but it is not relevant to disambiguation: because disambiguation is rare, all disambiguation methods are unfamiliar.
  10. Alternative disambiguation methods are either cumbersome (i.e., exact numbers of bytes), difficult and time-consuming to implement in a manner that is clear to the reader (i.e., footnotes),[4] or unlikely to be understood (i.e., exponentiation).
In conclusion, disambiguation is not easy, so it would be unwise to discard the simplest disambiguation tool at our disposal just because it is unfamiliar to some readers. The best disambiguation method has yet to be established, so it is premature to deprecate this one.
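
The alternatives compared in point 10 can be put side by side for a single quantity. The following is a minimal sketch, not a proposal for any particular template; the function names and the choice of 512 × 1024^2 bytes as the sample value are assumptions made for illustration.

```python
IEC_SYMBOLS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def as_exact_bytes(n: int) -> str:
    """Disambiguate by spelling out the exact number of bytes."""
    return f"{n:,} B"


def as_exponent(n: int) -> str:
    """Disambiguate by factoring out powers of 1024."""
    power = 0
    while n and n % 1024 == 0:
        n //= 1024
        power += 1
    return f"{n} × 1024^{power} B" if power else f"{n} B"


def as_iec(n: int) -> str:
    """Disambiguate with the largest IEC unit that keeps the value >= 1."""
    value = float(n)
    power = 0
    while value >= 1024 and power < len(IEC_SYMBOLS) - 1:
        value /= 1024
        power += 1
    return f"{value:g} {IEC_SYMBOLS[power]}"


size = 512 * 1024**2  # hypothetical memory size used as the sample value
print(as_exact_bytes(size))  # 536,870,912 B
print(as_exponent(size))     # 512 × 1024^2 B
print(as_iec(size))          # 512 MiB
```

All three outputs are unambiguous, but only the last is as short as the ambiguous "512 MB" it would replace.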

See also

Footnotes

  1. ^ MB even has a third meaning, equal to 1000 KiB or 1,024,000 B
  2. ^ Snow Leopard changes how file and drive sizes are calculated
  3. ^ According to the LBA Count for IDE Hard Disk Drives Standard from the website of the International Disk Drive Equipment and Materials Association (IDEMA), there are 1,000,194,048 bytes (1,953,504 logical blocks × 512 bytes/logical block) per nominal gigabyte of hard drive storage.
  4. ^ This problem is illustrated by Address space layout randomization, which includes the confusing disambiguation footnote "Transistorized memory, such as RAM and cache sizes (other than solid state disk devices such as USB drives, CompactFlash cards, and so on) as well as CD-based storage size are specified using binary meanings for K (1024^1), M (1024^2), G (1024^3), ..."
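
The figures quoted in footnotes 1 and 3 can be checked with a few lines of arithmetic. This is only a verification sketch; the variable names are illustrative.

```python
KIB = 1024  # bytes in one kibibyte

# Footnote 1: the "third" megabyte, 1000 KiB.
mixed_megabyte = 1000 * KIB
print(f"{mixed_megabyte:,} B")  # 1,024,000 B

# Footnote 3: logical blocks per nominal gigabyte times bytes per block.
nominal_gigabyte = 1_953_504 * 512
print(f"{nominal_gigabyte:,} B")  # 1,000,194,048 B
```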