CORDIC

CORDIC, short for coordinate rotation digital computer, is a simple and efficient algorithm to calculate trigonometric functions, hyperbolic functions, square roots, multiplications, divisions, exponentials, and logarithms with arbitrary base, typically converging with one digit per iteration. CORDIC is therefore an example of a digit-by-digit algorithm. The original system is sometimes referred to as Volder's algorithm.
CORDIC and closely related methods known as pseudo-multiplication and pseudo-division or factor combining are commonly used when no hardware multiplier is available, as the only operations they require are addition, subtraction, bit shifts and lookup tables. As such, they all belong to the class of shift-and-add algorithms. In computer science, CORDIC is often used to implement floating-point arithmetic when the target platform lacks a hardware multiplier for cost or space reasons. This was the case for most early microcomputers based on processors like the MOS 6502 and Zilog Z80.
Over the years, a number of variations on the concept emerged, including Circular CORDIC, Linear CORDIC, Hyperbolic CORDIC, and Generalized Hyperbolic CORDIC (GH CORDIC).

History

Similar mathematical techniques were published by Henry Briggs as early as 1624 and Robert Flower in 1771, but CORDIC is better optimized for low-complexity finite-state CPUs.
CORDIC was conceived in 1956 by Jack E. Volder at the aeroelectronics department of Convair out of necessity to replace the analog resolver in the B-58 bomber's navigation computer with a more accurate and faster real-time digital solution. Therefore, CORDIC is sometimes referred to as a digital resolver.
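In his research Volder was inspired by a formula in the 1946 edition of the CRC Handbook of Chemistry and Physics:
$$K_n R \sin(\theta \pm \varphi) = R \sin(\theta) \pm 2^{-n} R \cos(\theta),$$
$$K_n R \cos(\theta \pm \varphi) = R \cos(\theta) \mp 2^{-n} R \sin(\theta),$$
where $\varphi$ is such that $\tan(\varphi) = 2^{-n}$, and $K_n := \sqrt{1 + 2^{-2n}}$.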
His research led to an internal technical report proposing the CORDIC algorithm to solve sine and cosine functions and a prototypical computer implementing it. The report also discussed the possibility of computing hyperbolic coordinate rotation, logarithms and exponential functions with modified CORDIC algorithms. Utilizing CORDIC for multiplication and division was also conceived at this time. Based on the CORDIC principle, Dan H. Daggett, a colleague of Volder at Convair, developed conversion algorithms between binary and binary-coded decimal.
In 1958, Convair finally started to build a demonstration system named CORDIC I to solve radar fix-taking problems. It was completed in 1960 without Volder, who had already left the company. More universal CORDIC II models A and B were built and tested by Daggett and Harry Schuss in 1962.
Volder's CORDIC algorithm was first described in public in 1959, which caused it to be incorporated into navigation computers by companies including Martin-Orlando, Computer Control, Litton, Kearfott, Lear-Siegler, Sperry, Raytheon, and Collins Radio.
Volder teamed up with Malcolm McMillan to build Athena, a fixed-point desktop calculator utilizing his binary CORDIC algorithm. The design was introduced to Hewlett-Packard in June 1965, but not accepted. Still, McMillan introduced David S. Cochran to Volder's algorithm and when Cochran later met Volder he referred him to a similar approach John E. Meggitt had proposed as pseudo-multiplication and pseudo-division in 1961. Meggitt's method also suggested the use of base 10 rather than base 2, as used by Volder's CORDIC so far. These efforts led to the ROMable logic implementation of a decimal CORDIC prototype machine inside of Hewlett-Packard in 1966, built by and conceptually derived from Thomas E. Osborne's prototypical Green Machine, a four-function, floating-point desktop calculator he had completed in DTL logic in December 1964. This project resulted in the public demonstration of Hewlett-Packard's first desktop calculator with scientific functions, the HP 9100A in March 1968, with series production starting later that year.
When Wang Laboratories found that the HP 9100A used an approach similar to the factor combining method in their earlier LOCI-1 and LOCI-2 Logarithmic Computing Instrument desktop calculators, they unsuccessfully accused Hewlett-Packard of infringement of one of An Wang's patents in 1968.
John Stephen Walther at Hewlett-Packard generalized the algorithm into the Unified CORDIC algorithm in 1971, allowing it to calculate hyperbolic functions, natural exponentials, natural logarithms, multiplications, divisions, and square roots. The CORDIC subroutines for trigonometric and hyperbolic functions could share most of their code. This development resulted in the first scientific handheld calculator, the HP-35 in 1972. Based on hyperbolic CORDIC, Yuanyong Luo et al. further proposed a Generalized Hyperbolic CORDIC to directly compute logarithms and exponentials with an arbitrary fixed base in 2019. Theoretically, Hyperbolic CORDIC is a special case of GH CORDIC.
Originally, CORDIC was implemented only in the binary numeral system. Although Meggitt had suggested the decimal system for his pseudo-multiplication approach, decimal CORDIC remained mostly unheard of for several more years, so that Hermann Schmid and Anthony Bogacki still presented it as a novelty as late as 1973. Only later was it found that Hewlett-Packard had already implemented it in 1966.
Decimal CORDIC became widely used in pocket calculators, most of which operate in binary-coded decimal rather than binary. This change in the input and output format did not alter CORDIC's core calculation algorithms. CORDIC is particularly well-suited for handheld calculators, in which low cost – and thus low chip gate count – is much more important than speed.
CORDIC has been implemented in the ARM-based STM32G4, Intel 8087, 80287, 80387 up to the 80486 coprocessor series as well as in the Motorola 68881 and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts of the FPU sub-system.

Applications

CORDIC uses simple shift-add operations for several computing tasks such as the calculation of trigonometric, hyperbolic and logarithmic functions, real and complex multiplications, division, square-root calculation, solution of linear systems, eigenvalue estimation, singular value decomposition, QR factorization and many others. As a consequence, CORDIC has been used for applications in diverse areas such as signal and image processing, communication systems, robotics and 3D graphics apart from general scientific and technical computation.

Hardware

The algorithm was used in the navigational system of the Apollo program's Lunar Roving Vehicle to compute bearing and range, or distance from the Lunar Module. CORDIC was used to implement the Intel 8087 math coprocessor in 1980, avoiding the need to implement hardware multiplication.
CORDIC is generally faster than other approaches when a hardware multiplier is not available, or when the number of gates required to implement the functions it supports should be minimized.
In fact, CORDIC is a standard drop-in IP in FPGA development applications such as Vivado for Xilinx, while a power series implementation is not. The reason is the specificity of such an IP: CORDIC can compute many different functions, while a hardware multiplier configured to execute a power series implementation can only compute the function it was designed for.
On the other hand, when a hardware multiplier is available, table-lookup methods and power series are generally faster than CORDIC. In recent years, the CORDIC algorithm has been used extensively for various biomedical applications, especially in FPGA implementations.
The STM32G4, STM32U5 and STM32H5 series and certain STM32H7 series of MCUs implement a CORDIC module to accelerate computations in various mixed-signal applications such as graphics for human-machine interfaces and field-oriented control of motors. While not as fast as a power series approximation, CORDIC is faster than interpolating table-based implementations such as the ones provided by the ARM CMSIS and C standard libraries, though the results may be slightly less accurate, as the CORDIC modules provided only achieve 20 bits of precision in the result. Most of the performance difference compared to the ARM implementation is due to the overhead of the interpolation algorithm, which achieves full floating-point precision and can likely achieve relative error to that precision. Another benefit is that the CORDIC module is a coprocessor and can run in parallel with other CPU tasks.
The issue with using Taylor series is that while they do provide small absolute error, they do not exhibit well-behaved relative error. Other means of polynomial approximation, such as minimax optimization, may be used to control both kinds of error.

Software

Many older systems with integer-only CPUs have implemented CORDIC to varying extents as part of their IEEE floating-point libraries. As most modern general-purpose CPUs have floating-point registers with common operations such as add, subtract, multiply, divide, sine, cosine, square root, log₁₀, and natural log, the need to implement CORDIC on them in software is nearly non-existent. Only microcontrollers or special safety- and time-constrained software applications would need to consider using CORDIC.

Modes of operation

Rotation mode

CORDIC can be used to calculate a number of different functions. This explanation shows how to use CORDIC in rotation mode to calculate the sine and cosine of an angle, assuming that the desired angle is given in radians and represented in a fixed-point format. To determine the sine or cosine of an angle $\beta$, the y or x coordinate of the point on the unit circle corresponding to the desired angle must be found. Using CORDIC, one would start with the vector $v_0$:
$$v_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
Figure: An illustration of the CORDIC algorithm in progress.
In the first iteration, this vector is rotated 45° counterclockwise to get the vector $v_1$. Successive iterations rotate the vector in one or the other direction by size-decreasing steps, until the desired angle has been achieved. Each step angle is $\gamma_i = \arctan(2^{-i})$ for $i = 0, 1, 2, \dots$.
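More formally, every iteration calculates a rotation, which is performed by multiplying the vector $v_i$ with the rotation matrix $R_i$:
$$v_{i+1} = R_i v_i.$$
The rotation matrix is given by
$$R_i = \begin{bmatrix} \cos(\gamma_i) & -\sin(\gamma_i) \\ \sin(\gamma_i) & \cos(\gamma_i) \end{bmatrix}.$$
Using the trigonometric identity:
$$\tan(\gamma_i) \equiv \frac{\sin(\gamma_i)}{\cos(\gamma_i)},$$
the cosine factor can be taken out to give:
$$R_i = \cos(\gamma_i) \begin{bmatrix} 1 & -\tan(\gamma_i) \\ \tan(\gamma_i) & 1 \end{bmatrix}.$$
The expression for the rotated vector $v_{i+1} = R_i v_i$ then becomes:
$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \cos(\gamma_i) \begin{bmatrix} 1 & -\tan(\gamma_i) \\ \tan(\gamma_i) & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix},$$
where $x_i$ and $y_i$ are the components of $v_i$. Setting the angle $\gamma_i$ for each iteration such that $\tan(\gamma_i) = \pm 2^{-i}$ (that is, a step of magnitude $\arctan(2^{-i})$ in either direction) still yields a series that converges to every possible output value. The multiplication with the tangent can therefore be replaced by a division by a power of two, which is efficiently done in digital computer hardware using a bit shift. The expression then becomes:
$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \cos(\arctan(2^{-i})) \begin{bmatrix} 1 & -\sigma_i 2^{-i} \\ \sigma_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix},$$
in which $\sigma_i$ determines the direction of the rotation. If the rotation angle $\gamma_i$ is to be positive, $\sigma_i$ is +1, otherwise it is −1.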
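The shift-and-add update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Q16 fixed-point format (`FRAC_BITS`), the helper name `micro_rotate`, and the use of Python integers are all choices made here, not part of the original description. The cosine factor is deliberately left out, so each step stretches the vector slightly while rotating it by $\pm\arctan(2^{-i})$.

```python
import math

FRAC_BITS = 16            # assumed Q16 fixed-point format (illustrative choice)
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def micro_rotate(x: int, y: int, i: int, sigma: int) -> tuple[int, int]:
    """One CORDIC step without the cos(gamma_i) factor.

    sigma is +1 (counterclockwise) or -1 (clockwise). In hardware it only
    selects between an adder and a subtractor; no multiplier is involved.
    """
    return x - sigma * (y >> i), y + sigma * (x >> i)


# Step 0: rotate the unit vector (1, 0) counterclockwise by arctan(2**0) = 45 degrees.
x1, y1 = micro_rotate(ONE, 0, 0, +1)
angle1 = math.atan2(y1, x1)
length1 = math.hypot(x1 / ONE, y1 / ONE)   # stretched by sqrt(1 + 2**0)

# Step 1: rotate back clockwise by arctan(2**-1).
x2, y2 = micro_rotate(x1, y1, 1, -1)
angle2 = math.atan2(y2, x2)

print(math.degrees(angle1), length1)
print(math.degrees(angle2))
```

After the first step the vector points at 45° but has length $\sqrt{2}$ rather than 1, which is why a separate correction for the accumulated cosine factors is needed.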
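The following trigonometric identity can be used to replace the cosine:
$$\cos(\gamma_i) \equiv \frac{1}{\sqrt{1 + \tan^2(\gamma_i)}},$$
giving this multiplier for each iteration:
$$K_i = \cos(\arctan(2^{-i})) = \frac{1}{\sqrt{1 + 2^{-2i}}}.$$
The $K_i$ factors can then be taken out of the iterative process and applied all at once afterwards with a scaling factor $K(n)$:
$$K(n) = \prod_{i=0}^{n-1} K_i = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + 2^{-2i}}},$$
which is calculated in advance and stored in a table or as a single constant, if the number of iterations is fixed. This correction could also be made in advance, by scaling $v_0$ and hence saving a multiplication. Additionally, it can be noted that
$$K = \lim_{n \to \infty} K(n) \approx 0.6072529350088812561694$$
to allow further reduction of the algorithm's complexity. Some applications may avoid correcting for $K$ altogether, resulting in a processing gain $A$:
$$A = \frac{1}{K} = \lim_{n \to \infty} \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \approx 1.64676025812107.$$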
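As a quick numerical check of the scaling factor, the product of the per-iteration multipliers $1/\sqrt{1 + 2^{-2i}}$ can be evaluated directly. The sketch below uses double-precision floats for convenience; the function name `cordic_scale_factor` is a hypothetical helper introduced only for this illustration.

```python
import math


def cordic_scale_factor(n: int) -> float:
    """Product of 1 / sqrt(1 + 2**(-2*i)) for i = 0 .. n-1."""
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


# The product settles quickly as the number of iterations grows.
for n in (1, 2, 4, 8, 16, 32):
    print(n, cordic_scale_factor(n))

K = cordic_scale_factor(40)   # close to the limiting constant
A = 1.0 / K                   # processing gain when no correction is applied
print(K, A)
```

Because the factors approach 1 so rapidly, a single stored constant is usually enough once the iteration count is fixed.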
After a sufficient number of iterations, the vector's angle will be close to the wanted angle $\beta$. For most ordinary purposes, 40 iterations ($n = 40$) are sufficient to obtain the correct result to the 10th decimal place.
The only task left is to determine whether the rotation should be clockwise or counterclockwise at each iteration (choosing the value of $\sigma_i$). This is done by keeping track of how much the angle was rotated at each iteration and subtracting that from the wanted angle. To get closer to the wanted angle $\beta$, if the remaining angle $\beta_i$ is positive, the rotation is counterclockwise ($\sigma_i = +1$); otherwise the rotation is clockwise ($\sigma_i = -1$):
$$\beta_0 = \beta,$$
$$\beta_{i+1} = \beta_i - \sigma_i \arctan(2^{-i}).$$
The values of $\arctan(2^{-i})$ must also be precomputed and stored. For small angles they can be approximated with $\arctan(2^{-i}) \approx 2^{-i}$ to reduce the table size.
As can be seen in the illustration above, the sine of the angle is the y coordinate of the final vector while the x coordinate is the cosine value.
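Putting the pieces together, rotation mode can be sketched as follows. This is a minimal illustration, not a reference implementation: the 30-bit fixed-point format, the 30-iteration count, the restriction of the input to $[-\pi/2, \pi/2]$, and all names (`ATAN_TABLE`, `K_FIXED`, `cordic_sin_cos`) are assumptions made for this example. The starting vector is pre-scaled by the correction factor so that no multiplication is needed at the end, and the loop body uses only shifts, additions, subtractions and a table lookup.

```python
import math

FRAC_BITS = 30                # assumed fixed-point format: 30 fractional bits
ONE = 1 << FRAC_BITS
N_ITER = 30                   # assumed iteration count (about one per fractional bit)

# Precomputed table of arctan(2**-i), stored in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Scaling factor for N_ITER iterations, in fixed point, used as the starting x.
K_FIXED = round(
    ONE * math.prod(1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))
)


def cordic_sin_cos(beta: float) -> tuple[float, float]:
    """Return (sin(beta), cos(beta)) for beta in [-pi/2, pi/2] radians."""
    if not -math.pi / 2 <= beta <= math.pi / 2:
        raise ValueError("beta must lie in [-pi/2, pi/2]")
    x, y = K_FIXED, 0         # start vector (1, 0), pre-scaled by the correction factor
    z = round(beta * ONE)     # remaining angle still to be rotated
    for i in range(N_ITER):
        if z >= 0:            # remaining angle positive: rotate counterclockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                 # remaining angle negative: rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE   # y holds the sine, x the cosine


s, c = cordic_sin_cos(math.pi / 6)
print(s, c)                   # compare with math.sin and math.cos
```

Angles outside the accepted range would first have to be folded into it using the usual symmetries of sine and cosine, which this sketch leaves out.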