Fixed-point arithmetic
In computing, fixed-point is a method of representing fractional numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents. More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g., a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding floating-point representation.
In the fixed-point representation, the fraction is often expressed in the same number base as the integer part, but using negative powers of the base b. The most common variants are decimal and binary. The latter is commonly known also as binary scaling. Thus, if n fraction digits are stored, the value will always be an integer multiple of b⁻ⁿ. Fixed-point representation can also be used to omit the low-order digits of integer values, for instance, when representing large dollar values as multiples of $1000.
When decimal fixed-point numbers are displayed for human reading, the fraction digits are usually separated from those of the integer part by a radix character. Internally, however, there is no separation, and the distinction between the two groups of digits is defined only by the programs that handle such numbers.
Fixed-point representation was the norm in mechanical calculators. Since most modern processors have a fast floating-point unit, fixed-point representations in processor-based implementations are now used only in special situations, such as in low-cost embedded microprocessors and microcontrollers; in applications that demand high speed or low power consumption or small chip area, like image, video, and digital signal processing; or when their use is more natural for the problem. Examples of the latter are accounting of dollar amounts, when fractions of cents must be rounded to whole cents in strictly prescribed ways; and the evaluation of functions by table lookup, or any application where rational numbers need to be represented without rounding errors. Fixed-point representation is still the norm for field-programmable gate array implementations, as floating-point support in an FPGA requires significantly more resources than fixed-point support.
Representation
A fixed-point representation of a fractional number is essentially an integer that is to be implicitly multiplied by a fixed scaling factor. For example, the value 1.23 can be stored in a variable as the integer value 123 with an implicit scaling factor of 1/100. This representation allows standard integer arithmetic logic units to perform rational number calculations.

Negative values are usually represented in binary fixed-point format as a signed integer in two's complement representation with an implicit scaling factor as above. The sign of the value will always be indicated by the first stored bit, even if the number of fraction bits is greater than or equal to the total number of bits. For example, the 8-bit signed binary integer (11110101)₂ = −11, taken with −3, +5, and +12 implied fraction bits, would represent the values −11/2⁻³ = −88, −11/2⁵ = −0.34375, and −11/2¹² ≈ −0.00268555, respectively.
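A minimal C sketch of this reinterpretation, assuming a hypothetical helper fixed_to_double that applies the implied scaling (the function name is invented for the example, not a standard API):

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Hypothetical helper: interpret a stored 8-bit two's complement
   integer as a fixed-point value with f implied fraction bits
   (f may be negative, zero, or larger than the word size). */
static double fixed_to_double(int8_t stored, int f)
{
    return ldexp((double)stored, -f);  /* value = stored * 2^(-f) */
}

int main(void)
{
    int8_t stored = -11;  /* two's complement bit pattern 11110101 */
    printf("%g\n", fixed_to_double(stored, -3)); /* -88           */
    printf("%g\n", fixed_to_double(stored, 5));  /* -0.34375      */
    printf("%g\n", fixed_to_double(stored, 12)); /* -0.00268555   */
    return 0;
}
```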
Alternatively, negative values can be represented by an integer in the sign-magnitude format, in which case the sign is never included in the number of implied fraction bits. This variant is more commonly used in decimal fixed-point arithmetic. Thus the signed 5-digit decimal integer (−00025)₁₀, taken with −3, +5, and +12 implied decimal fraction digits, would represent the values −25/10⁻³ = −25000, −25/10⁵ = −0.00025, and −25/10¹² = −0.000000000025, respectively.
A program will usually assume that all fixed-point values that will be stored into a given variable, or will be produced by a given instruction, will have the same scaling factor. This parameter can usually be chosen by the programmer depending on the precision needed and range of values to be stored.
The scaling factor of a variable or formula may not appear explicitly in the program. Good programming practice then requires that it be provided in the documentation, at least as a comment in the source code.
Choice of scaling factors
For greater efficiency, scaling factors are often chosen to be powers of the base b used to represent the integers internally. However, often the best scaling factor is dictated by the application. Thus, one often uses scaling factors that are powers of 10, for human convenience, even when the integers are represented internally in binary. Decimal scaling factors also mesh well with the metric system, since the choice of the fixed-point scaling factor is often equivalent to the choice of a unit of measure.

However, other scaling factors may be used occasionally, e.g., a fractional amount of hours may be represented as an integer number of seconds; that is, as a fixed-point number with a scale factor of 1/3600.
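For illustration, a minimal C sketch storing a fractional number of hours as an integer count of seconds, i.e. with a scaling factor of 1/3600 (the variable names are invented for the example):

```c
#include <stdio.h>

int main(void)
{
    /* 1.5 hours stored as a fixed-point value with scale 1/3600:
       the stored integer is a count of seconds. */
    long meeting = 5400;
    printf("%g hours\n", meeting / 3600.0);  /* prints 1.5 hours */

    /* Addition works directly on the scaled integers. */
    long total = meeting + 1800;             /* add half an hour */
    printf("%g hours\n", total / 3600.0);    /* prints 2 hours   */
    return 0;
}
```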
Even with the most careful rounding, fixed-point values represented with a scaling factor S may have an error of up to ±0.5 in the stored integer, that is, ±0.5 S in the value. Therefore, smaller scaling factors generally produce more accurate results.
On the other hand, a smaller scaling factor means a smaller range of the values that can be stored in a given program variable. The maximum fixed-point value that can be stored into a variable is the largest integer value that can be stored into it, multiplied by the scaling factor, and similarly for the minimum value. For example, the table below gives the implied scaling factor S, the minimum and maximum representable values Vmin and Vmax, and the accuracy δ = S/2 of values that could be represented in 16-bit signed binary fixed-point format, depending on the number f of implied fraction bits; a short sketch after the table recomputes these columns.
| f | S | δ | Vmin | Vmax |
|---|---|---|---|---|
| −3 | 1/2⁻³ = 8 | 4 | −262144 | +262136 |
| 0 | 1/2⁰ = 1 | 0.5 | −32768 | +32767 |
| 5 | 1/2⁵ = 1/32 | < 0.016 | −1024.00000 | +1023.96875 |
| 14 | 1/2¹⁴ = 1/16384 | < 0.00004 | −2.00000 | +1.99994 |
| 15 | 1/2¹⁵ = 1/32768 | < 0.00002 | −1.00000 | +0.99997 |
| 16 | 1/2¹⁶ = 1/65536 | < 0.000008 | −0.50000 | +0.49998 |
| 20 | 1/2²⁰ = 1/1048576 | < 0.0000005 | −0.03125 | +0.03125 |
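The columns follow directly from the definitions; a minimal C sketch that recomputes them, assuming the stored 16-bit integer ranges over [−32768, 32767]:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    int fs[] = { -3, 0, 5, 14, 15, 16, 20 };
    for (size_t i = 0; i < sizeof fs / sizeof fs[0]; i++) {
        int f = fs[i];
        double S = pow(2.0, -f);  /* scaling factor 1/2^f */
        printf("f=%3d  S=%-12g delta=%-12g Vmin=%-12g Vmax=%g\n",
               f, S, S / 2, -32768 * S, 32767 * S);
    }
    return 0;
}
```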
Fixed-point formats with scaling factors of the form 2ⁿ − 1 have been said to be appropriate for image processing and other digital signal processing tasks. They are supposed to provide more consistent conversions between fixed- and floating-point values than the usual 2ⁿ scaling. The Julia programming language implements both versions.
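A minimal C sketch of such a format for n = 8, i.e. a scaling factor of 1/255 as might be used for normalized image samples (the helper names are invented here), shows one consequence: both endpoints 0.0 and 1.0 are exactly representable:

```c
#include <stdio.h>
#include <stdint.h>

/* With scaling factor 1/(2^8 - 1) = 1/255, the byte 255 represents
   exactly 1.0, which the usual 1/2^8 scaling cannot do. */
static double u8_to_unit(uint8_t v) { return v / 255.0; }
static uint8_t unit_to_u8(double x) { return (uint8_t)(x * 255.0 + 0.5); }

int main(void)
{
    printf("%f\n", u8_to_unit(255));            /* exactly 1.000000  */
    printf("%u\n", unit_to_u8(u8_to_unit(77))); /* round-trips to 77 */
    return 0;
}
```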
Exact values
Any binary fraction a/2ᵐ, such as 1/16 or 17/32, can be exactly represented in fixed point with a power-of-two scaling factor 1/2ⁿ for any n ≥ m. However, most decimal fractions like 0.1 or 0.123 are infinite repeating fractions in base 2 and hence cannot be represented that way.

Similarly, any decimal fraction a/10ᵐ, such as 1/100 or 37/1000, can be exactly represented in fixed point with a power-of-ten scaling factor 1/10ⁿ for any n ≥ m. This decimal format can also represent any binary fraction a/2ᵐ, such as 1/8 or 17/32.
More generally, a rational number a/b, with a and b relatively prime and b positive, can be exactly represented in binary fixed point only if b is a power of 2, and in decimal fixed point only if b has no prime factors other than 2 and/or 5.
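This condition can be checked mechanically; a minimal C sketch, assuming a power-of-two scaling factor of 1/2¹⁶ (the function name is invented here):

```c
#include <stdio.h>

/* A fraction a/b (in lowest terms) equals some k / 2^16 exactly
   iff a * 2^16 is divisible by b, i.e. iff b is a power of 2
   no larger than 2^16. */
static int exact_in_q16(long a, long b)
{
    return (a * (1L << 16)) % b == 0;
}

int main(void)
{
    printf("1/16: %s\n", exact_in_q16(1, 16) ? "exact" : "rounded");
    printf("1/10: %s\n", exact_in_q16(1, 10) ? "exact" : "rounded");
    return 0;
}
```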
Comparison with floating-point
Fixed-point computations can be faster and/or use less hardware than floating-point ones. If the range of the values to be represented is known in advance and is sufficiently limited, fixed point can make better use of the available bits. For example, if 32 bits are available to represent a number between 0 and 1, a fixed-point representation can have error less than 1.2 × 10⁻¹⁰, whereas the standard floating-point representation may have error up to 596 × 10⁻¹⁰, because 9 of the bits are wasted with the sign and exponent of the dynamic scaling factor. Specifically, comparing 32-bit fixed-point to floating-point audio, a recording requiring less than 40 dB of headroom has a higher signal-to-noise ratio using 32-bit fixed point.

Programs using fixed-point computations are usually more portable than those using floating-point since they do not depend on the availability of an FPU. This advantage was particularly strong before the IEEE Floating Point Standard was widely adopted, when floating-point computations with the same data would yield different results depending on the manufacturer, and often on the computer model.
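Those two bounds can be reproduced under plausible assumptions: a fraction-only unsigned 32-bit format (scaling factor 1/2³², worst-case rounding error S/2 = 2⁻³³) versus IEEE 754 single precision, whose worst-case rounding error near 1.0 is half an ulp, 2⁻²⁴:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* 32-bit unsigned fixed point spanning [0,1): max error S/2 = 2^-33. */
    printf("fixed: %.3g\n", ldexp(1.0, -33)); /* ~1.16e-10 */

    /* IEEE 754 single precision near 1.0: half an ulp, 2^-24. */
    printf("float: %.3g\n", ldexp(1.0, -24)); /* ~5.96e-8  */
    return 0;
}
```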
Many embedded processors lack an FPU, because integer arithmetic units require substantially fewer logic gates and consume much smaller chip area than an FPU; and software emulation of floating-point on low-speed devices would be too slow for most applications. CPU chips for the earlier personal computers and game consoles, like the Intel 386 and 486SX, also lacked an FPU.
The absolute resolution of any fixed-point format is constant over the whole range, namely the scaling factor S. In contrast, the relative resolution of a floating-point format is approximately constant over its whole range, varying within a factor of the base b; whereas their absolute resolution varies by many orders of magnitude, like the values themselves.
In many cases, the rounding and truncation errors of fixed-point computations are easier to analyze than those of the equivalent floating-point computations. Applying linearization techniques to truncation, such as dithering and/or noise shaping, is more straightforward within fixed-point arithmetic. On the other hand, the use of fixed point requires greater care by the programmer. Avoiding overflow requires much tighter estimates for the ranges of variables and all intermediate values in the computation, and often also extra code to adjust their scaling factors. Fixed-point programming normally requires the use of integer types of different widths. Fixed-point applications can make use of block floating point, a fixed-point environment in which each array of fixed-point data is scaled by a common exponent stored in a single word.
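As an example of that extra care and of the wider intermediate types involved, a minimal C sketch of multiplication in the common Q15 convention (int16_t with 15 fraction bits; the function name is invented here):

```c
#include <stdint.h>
#include <stdio.h>

/* The product of two Q15 values has 30 fraction bits, so it is formed
   in a 32-bit intermediate and then rescaled back to Q15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t wide = (int32_t)a * b;  /* exact product in Q30    */
    wide += 1 << 14;                /* round to nearest        */
    return (int16_t)(wide >> 15);   /* drop 15 fraction bits   */
}

int main(void)
{
    int16_t half = 1 << 14;          /* 0.5 in Q15             */
    int16_t q = q15_mul(half, half); /* 0.5 * 0.5              */
    printf("%f\n", q / 32768.0);     /* prints 0.250000        */
    return 0;
}
```

Note that q15_mul(-32768, -32768) would overflow the Q15 range, an instance of the range hazards mentioned above.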