Hexadecimal floating point (now called HFP by IBM) is a format for encoding floating-point numbers, first introduced on the IBM System/360 computers and supported on subsequent machines based on that architecture,^{[1]}^{[2]}^{[3]} as well as machines which were intended to be application-compatible with System/360.^{[4]}^{[5]}
In comparison to IEEE 754 floating point, the HFP format has a longer significand and a shorter exponent. All HFP formats have 7 bits of exponent with a bias of 64. The normalized range of representable numbers is from 16^{−65} to 16^{63} (approx. 5.39761 × 10^{−79} to 7.237005 × 10^{75}).
The value of a number is given by the formula: (−1)^{sign} × 0.significand × 16^{exponent−64}.
A single-precision HFP number (called "short" by IBM) is stored in a 32-bit word:
1    7          24           (width in bits)
S    Exp        Fraction
31   30 ... 24  23 ... 0     (bit index)*
* IBM documentation numbers the bits from left to right, so that the most significant bit is designated as bit number 0. 
In this format the initial bit is not suppressed (unlike IEEE 754, there is no implicit leading bit), and the radix (hexadecimal) point is placed to the left of the significand (called the fraction in IBM documentation and in the figures).
Since the base is 16, the exponent in this form is about twice as large as the equivalent in IEEE 754; to have a similar exponent range in binary, 9 exponent bits would be required.
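The layout and formula above can be turned into a short decoder sketch (Python; the function name and the unsigned-integer input are assumptions for illustration, not IBM-provided code):

```python
def hfp_short_decode(word):
    """Decode a 32-bit IBM HFP "short" value given as an unsigned integer (sketch)."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 24) & 0x7F                  # 7-bit exponent, biased by 64
    fraction = (word & 0xFFFFFF) / (1 << 24)        # radix point is left of the 24-bit fraction
    return sign * fraction * 16.0 ** (exponent - 64)
```

Note that the result is exact in a Python float only when it fits in binary64; values near the HFP extremes would need wider arithmetic.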
Consider encoding the value −118.625 as an HFP single-precision floating-point value.
The value is negative, so the sign bit is 1.
The value 118.625_{10} in binary is 1110110.101_{2}. This value is normalized by moving the radix point left four bits (one hexadecimal digit) at a time until the integer part is zero, yielding 0.01110110101_{2}. The remaining rightmost digits are padded with zeros, yielding a 24-bit fraction of .0111 0110 1010 0000 0000 0000_{2}.
Normalization moved the radix point two hexadecimal digits to the left, yielding a multiplier of 16^{+2} and an exponent of +2. A bias of +64 is added to the exponent, yielding +66, which is 100 0010_{2}.
Combining the sign, exponent plus bias, and normalized fraction produces this encoding:
S  Exp  Fraction  
1  100 0010  0111 0110 1010 0000 0000 0000 
In other words, the number represented is −0.76A000_{16} × 16^{66 − 64} = −0.4633789… × 16^{+2} = −118.625
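The steps above can be reproduced in code; the following is a minimal sketch (the function name is an assumption, and rounding, overflow, and underflow handling are deliberately simplified):

```python
def hfp_short_encode(value):
    """Encode a Python float as IBM HFP "short" bits (sketch: truncates, no range checks)."""
    if value == 0.0:
        return 0                         # normalized zero is all bits zero
    sign = 0
    if value < 0.0:
        sign, value = 1, -value
    exponent = 0
    while value >= 1.0:                  # shift right one hex digit at a time
        value /= 16.0
        exponent += 1
    while value < 1.0 / 16.0:            # shift left until the leading hex digit is nonzero
        value *= 16.0
        exponent -= 1
    fraction = int(value * (1 << 24))    # 24-bit fraction, truncated
    return (sign << 31) | ((exponent + 64) << 24) | fraction
```

Encoding −118.625 with this sketch yields 0xC276A000, matching the fields derived above.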
The largest representable number:
S  Exp       Fraction
0  111 1111  1111 1111 1111 1111 1111 1111
The number represented is +0.FFFFFF_{16} × 16^{127 − 64} = (1 − 16^{−6}) × 16^{63} ≈ +7.2370051 × 10^{75}
The smallest positive normalized number:
S  Exp       Fraction
0  000 0000  0001 0000 0000 0000 0000 0000
The number represented is +0.1_{16} × 16^{0 − 64} = 16^{−1} × 16^{−64} ≈ +5.397605 × 10^{−79}.
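These two extremes can be checked directly with ordinary Python floats (a quick sketch; both values happen to fit within binary64 range):

```python
# Largest normalized HFP "short" value: fraction 0.FFFFFF (hex), exponent 63
largest = (1 - 16.0 ** -6) * 16.0 ** 63       # about 7.2370051e75

# Smallest positive normalized value: fraction 0.1 (hex), exponent -64
smallest = 16.0 ** -1 * 16.0 ** -64           # about 5.397605e-79
```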
Zero:
S  Exp       Fraction
0  000 0000  0000 0000 0000 0000 0000 0000
Zero (0.0) is represented in normalized form as all zero bits, which is arithmetically the value +0.0_{16} × 16^{0 − 64} = +0 × 16^{−64} ≈ +0.000000 × 10^{−79} = 0. Given a fraction of all-bits zero, any combination of positive or negative sign bit and a nonzero biased exponent will yield a value arithmetically equal to zero. However, the normalized form generated for zero by CPU hardware is all-bits zero. This is true for all three floating-point precision formats. Addition or subtraction involving a zero with a nonzero exponent can lose precision in the result.
Since the base is 16, there can be up to three leading zero bits in the binary significand. That means that when a number is expressed in binary, there can be as few as 21 bits of precision. Because of this "wobbling precision" effect, some calculations can be very inaccurate, which has drawn considerable criticism.^{[6]}
A good example of the inaccuracy is the representation of the decimal value 0.1. It has no exact binary or hexadecimal representation. In hexadecimal format, it is represented as 0.19999999..._{16}, or 0.0001 1001 1001 1001 1001 1001 1001..._{2}, that is:
S  Exp  Fraction  
0  100 0000  0001 1001 1001 1001 1001 1010 
This fraction has only 21 significant bits, whereas the binary (IEEE 754) version has 24 bits of precision.
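The 21-bit figure can be verified with a short sketch (Python; it assumes round-to-nearest when forming the 24-bit fraction, matching the table above):

```python
# 0.1 with exponent 16**0: the 24-bit fraction holds 0.19999A (hex) after rounding
fraction = round(0.1 * (1 << 24))          # 0x19999A
leading_zero_bits = 24 - fraction.bit_length()
# bit_length() is 21, so three of the 24 fraction bits are leading zeros
```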
Six hexadecimal digits of precision is roughly equivalent to six decimal digits (i.e. (6 − 1) log_{10}(16) ≈ 6.02). A conversion of single precision hexadecimal float to decimal string would require at least 9 significant digits (i.e. 6 log_{10}(16) + 1 ≈ 8.22) in order to convert back to the same hexadecimal float value.
The double-precision HFP format (called "long" by IBM) is the same as the "short" format except that the fraction field is wider and the number is stored in a doubleword (8 bytes):
1    7          56           (width in bits)
S    Exp        Fraction
63   62 ... 56  55 ... 0     (bit index)*
* IBM documentation numbers the bits from left to right, so that the most significant bit is designated as bit number 0. 
The exponent for this format covers only about a quarter of the range of the corresponding IEEE binary format.
Fourteen hexadecimal digits of precision is roughly equivalent to 17 decimal digits. A conversion of a double-precision hexadecimal float to a decimal string requires at least 18 significant digits in order to convert back to the same hexadecimal float value.
Called extended-precision by IBM, a quadruple-precision HFP format was added to the System/370 series and was available on some S/360 models (S/360-85, -195, and others by special request or simulated by OS software). The extended-precision fraction field is wider, and the extended-precision number is stored as two doublewords (16 bytes):
High-order part:
1     7            56                               (width in bits)
S     Exp          Fraction (high-order 14 digits)
127   126 ... 120  119 ... 64                       (bit index)*

Low-order part:
8           56                                      (width in bits)
Unused      Fraction (low-order 14 digits)
63 ... 56   55 ... 0                                (bit index)*
* IBM documentation numbers the bits from left to right, so that the most significant bit is designated as bit number 0. 
Twenty-eight hexadecimal digits of precision is roughly equivalent to 32 decimal digits. A conversion of extended-precision HFP to a decimal string requires at least 35 significant digits in order to convert back to the same HFP value. The stored exponent in the low-order part is 14 less than that in the high-order part, unless this would be less than zero.
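A sketch of decoding the two-doubleword layout follows (Python; it uses exact rational arithmetic, since 28 hex digits exceed what a binary64 can hold, and the function name and the choice to ignore the low-order part's sign/exponent byte on input are assumptions):

```python
from fractions import Fraction

def hfp_extended_decode(high, low):
    """Decode IBM HFP extended precision from two 64-bit halves (sketch)."""
    sign = -1 if (high >> 63) & 1 else 1
    exponent = (high >> 56) & 0x7F           # biased by 64, taken from the high-order part
    frac_hi = high & ((1 << 56) - 1)         # high-order 14 hex digits
    frac_lo = low & ((1 << 56) - 1)          # low-order 14 hex digits
    fraction = Fraction(frac_hi * 16**14 + frac_lo, 16**28)
    return sign * fraction * Fraction(16) ** (exponent - 64)
```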
Available arithmetic operations are add and subtract, both normalized and unnormalized, and compare. Pre-normalization is done based on the exponent difference. Multiply and divide pre-normalize unnormalized values and truncate the result after one guard digit. There is a halve operation to simplify dividing by two. Starting in ESA/390, there is a square root operation. All operations have one hexadecimal guard digit to avoid precision loss. Most arithmetic operations truncate, like simple pocket calculators. Therefore, 1 − 16^{−8} = 1; in this case, the result is rounded away from zero.^{[7]}
Starting with the S/390 G5 in 1998,^{[8]} IBM mainframes have also included IEEE binary floating-point units which conform to the IEEE 754 Standard for Floating-Point Arithmetic. IEEE decimal floating-point was added to the IBM System z9 GA2^{[9]} in 2007 using millicode,^{[10]} and in 2008 to the IBM System z10 in hardware.^{[11]}
Modern IBM mainframes support three floating-point radices, with 3 hexadecimal (HFP) formats, 3 binary (BFP) formats, and 3 decimal (DFP) formats. There are two floating-point units per core: one supporting HFP and BFP, and one supporting DFP. There is one register file, the FPRs, which holds all 3 formats. Starting with the z13 in 2015, processors include a vector facility with 32 vector registers, each 128 bits wide; a vector register can contain two 64-bit or four 32-bit floating-point numbers.^{[12]} The traditional 16 floating-point registers are overlaid on the new vector registers, so some data can be manipulated with traditional floating-point instructions or with the newer vector instructions.
As IBM is the only remaining provider of hardware using the HFP format, and as the only IBM machines that support it are their mainframes, few file formats require it. One exception is the SAS 5 Transport file format, which the FDA requires; in that format, "All floating-point numbers in the file are stored using the IBM mainframe representation. [...] Most platforms use the IEEE representation for floating-point numbers. [...] To assist you in reading and/or writing transport files, we are providing routines to convert from IEEE representation (either big endian or little endian) to transport representation and back again."^{[13]} Code for IBM's format is also available under the LGPLv2.1.^{[15]}
The article "Architecture of the IBM System/360" explains the choice as being because "the frequency of preshift, overflow, and precision-loss postshift on floating-point addition are substantially reduced by this choice."^{[16]} This allowed higher performance for the large System/360 models and reduced cost for the small ones. The authors were aware of the potential for precision loss, but assumed that it would not be significant for 64-bit floating-point variables. Unfortunately, the designers seem not to have been aware of Benford's law, which implies that a large proportion of numbers will suffer reduced precision.
The book "Computer Architecture", by two of the System/360 architects, quotes Sweeney's study of 1958–65, which showed that using a base greater than 2 greatly reduced the number of shifts required for alignment and normalization, and in particular the number of different shift amounts needed. They used a larger base to make the implementations run faster, and the choice of base 16 was natural given 8-bit bytes. The intention was that 32-bit floats would only be used for calculations that would not propagate rounding errors, and that 64-bit double precision would be used for all scientific and engineering calculations. The initial implementation of double precision lacked a guard digit to allow proper rounding, but this was changed soon after the first customer deliveries.^{[17]}