Magic number (programming)
In computer programming, a magic number is a numeric literal in source code that has a special, particular meaning that is less than clear to the reader. More broadly in computing, not only in programming, the term is used for a number that identifies a particular concept but whose meaning is unclear without additional knowledge. For example, some file formats are identified by a magic number, also called a file signature, embedded in the file. A number that is almost uniquely associated with a particular concept, such as a universally unique identifier, might also be classified as a magic number.
Numeric literal
A magic number or magic constant is a numeric literal in source code that has a special meaning that is less than clear in context. This is considered an anti-pattern and breaks one of the oldest rules of programming, dating back to the COBOL, FORTRAN and PL/1 manuals of the 1960s. For example, in the following code that computes a price after tax, 1.05 is a magic number, since the value encodes the sales tax rate, 5%, in a way that is less than obvious:
price_after_tax = 1.05 * price
The use of magic numbers in code obscures the developers' intent in choosing that number, increases opportunities for subtle errors, and makes it more difficult for the program to be adapted and extended in the future. As an example, it is difficult to tell whether every digit in 3.14159265358979323846 is correctly typed, or whether this constant for pi can be truncated to 3.14159 without the reduced precision affecting the functionality of the program. Replacing all significant magic numbers with named constants makes programs easier to read, understand and maintain. The example above can be improved by adding a descriptively named variable:
TAX = 0.05
price_after_tax = (1.0 + TAX) * price
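A minimal runnable sketch of the same idea in Python. The names SALES_TAX_RATE and price_after_tax are illustrative choices, not taken from any particular codebase, and the typing.Final annotation only tells static type checkers that the name should not be rebound; Python itself does not enforce it.

```python
from typing import Final

# Hypothetical named constant: a 5% sales tax rate.
SALES_TAX_RATE: Final = 0.05


def price_after_tax(price: float) -> float:
    """Return the price with sales tax added."""
    return (1.0 + SALES_TAX_RATE) * price


print(price_after_tax(100.0))
```

If the tax rate changes, only the single SALES_TAX_RATE line needs to be edited.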
A good name can result in code that is more easily understood by a maintainer who is not the original author, and even by the original author after a period of time. An example of an uninformatively named constant is int SIXTEEN = 16, while int NUMBER_OF_BITS = 16 might be more useful. Non-numeric data can have the same magical properties, and therefore the same issues, as magic numbers. Thus, declaring const string testUserName = "John" and using testUserName might be better than using the literal "John" directly.
Example
For example, if it is required to randomly shuffle the values in an array representing a standard pack of playing cards, this pseudocode does the job using the Fisher–Yates shuffle algorithm:
for i from 1 to 52
    j := i + randomInt(53 - i) - 1
    a.swapEntries(i, j)
where a is an array object, the function randomInt(x) chooses a random integer between 1 and x, inclusive, and swapEntries(i, j) swaps the ith and jth entries in the array. In the preceding example, 52 and 53 are magic numbers, and their relationship to each other is not made clear. It is considered better programming style to write the following:
int deckSize := 52
for i from 1 to deckSize
    j := i + randomInt(deckSize + 1 - i) - 1
    a.swapEntries(i, j)
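The named-constant version of the shuffle can be sketched in Python as follows. This assumes 0-based list indexing, so the random index is drawn from i through DECK_SIZE - 1 rather than from the 1-based range used in the pseudocode, and it uses the standard library's random.randint, which includes both endpoints. The names DECK_SIZE and shuffle_deck are illustrative.

```python
import random

DECK_SIZE = 52  # number of cards in a standard deck


def shuffle_deck(a: list) -> None:
    """Shuffle a standard deck in place with the Fisher-Yates algorithm."""
    for i in range(DECK_SIZE):
        # Pick j uniformly from the not-yet-shuffled positions i..DECK_SIZE-1.
        j = random.randint(i, DECK_SIZE - 1)
        a[i], a[j] = a[j], a[i]


deck = list(range(DECK_SIZE))
shuffle_deck(deck)
print(deck[:5])
```

Because each position is swapped with a uniformly chosen position at or after it, every ordering of the deck is equally likely, provided the random source is uniform.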
This is preferable for several reasons:
- Better readability. A programmer reading the first example might wonder, "What does the number 52 mean here? Why 52?" The programmer might infer the meaning after reading the code carefully, but it is not obvious. Magic numbers become particularly confusing when the same number is used for different purposes in one section of code.
- Easier to maintain. It is easier to alter the value of the number, as it is not duplicated. Changing the value of a magic number is error-prone, because the same value is often used several times in different places within a program. Also, when two semantically distinct variables or numbers have the same value, they may accidentally both be edited together. To modify the first example to shuffle a Tarot deck, which has 78 cards, a programmer might naively replace every instance of 52 in the program with 78. This would cause two problems. First, it would miss the value 53 on the second line of the example, which would cause the algorithm to fail in a subtle way. Second, it would likely replace the characters "52" everywhere, regardless of whether they refer to the deck size or to something else entirely, such as the number of weeks in a Gregorian calendar year, or, more insidiously, are part of a number like "1523", all of which would introduce bugs. By contrast, changing the value of the deckSize variable in the second example would be a simple, one-line change.
- Encourages documentation. The single place where the named variable is declared makes a good place to document what the value means and why it has the value it does. Having the same value in many places either leads to duplicate comments or leaves no single place where it is both natural for the author to explain the value and likely that the reader will look for an explanation.
- Coalesces information. The declarations of "magic number" variables can be placed together, usually at the top of a function or file, facilitating their review and change.
- Detects typos. Using a variable takes advantage of a compiler's checking. Accidentally typing "62" instead of "52" would go undetected, whereas typing "dekSize" instead of "deckSize" would cause the compiler to report that dekSize is undeclared.
- Reduces typing. If an IDE supports code completion, it will fill in most of the variable's name from the first few letters.
- Facilitates parameterization. For example, to generalize the above example into a procedure that shuffles a deck of any number of cards, it would be sufficient to turn deckSize into a parameter of that procedure, whereas the first example would require several changes:
function shuffle(int deckSize)
    for i from 1 to deckSize
        j := i + randomInt(deckSize + 1 - i) - 1
        a.swapEntries(i, j)
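Taking the parameterization one step further, a Python sketch can read the deck size from the list itself, so the same routine shuffles a 52-card deck or a 78-card Tarot deck without any edits. Accepting an optional random.Random instance is an illustrative design choice that makes results reproducible in tests; the function and constant names are hypothetical.

```python
import random
from collections.abc import MutableSequence
from typing import Optional

STANDARD_DECK_SIZE = 52
TAROT_DECK_SIZE = 78


def shuffle(a: MutableSequence, rng: Optional[random.Random] = None) -> None:
    """Fisher-Yates shuffle of `a` in place; the deck size is simply len(a)."""
    if rng is None:
        rng = random.Random()
    deck_size = len(a)
    for i in range(deck_size):
        # Choose j uniformly from positions i..deck_size-1.
        j = rng.randint(i, deck_size - 1)
        a[i], a[j] = a[j], a[i]


standard = list(range(STANDARD_DECK_SIZE))
tarot = list(range(TAROT_DECK_SIZE))
shuffle(standard, random.Random(1))  # seeded for reproducibility
shuffle(tarot, random.Random(1))
```

Deriving the size from the data removes the constant from the shuffling logic altogether; the named sizes remain only where decks are built.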
Disadvantages are:
- Breaks locality. When the named constant is not defined near its use, it hurts the locality, and thus comprehensibility, of the code. Putting the 52 in a possibly distant place means that, to understand the workings of the "for" loop completely, one must track down the definition and verify that it is the expected number. This is easy to avoid when the constant is only used in one portion of the code. When the named constant is used in disparate portions, on the other hand, the remote location is a clue to the reader that the same value appears in other places in the code, which may also be worth looking into.
- Causes verbosity. The declaration of the constant adds a line. When the constant's name is longer than the value's, particularly if several such constants appear in one line, it may make it necessary to split one logical statement of the code across several lines. An increase in verbosity may be justified when there is some likelihood of confusion about the constant, or when there is a likelihood the constant may need to be changed, such as reuse of a shuffling routine for other card games. It may equally be justified as an increase in expressiveness.
- Performance considerations. It may be slower to evaluate the expression deckSize + 1 at run time than the value 53. However, when deckSize is a constant, most modern compilers use techniques such as constant folding and loop optimization to resolve the addition during compilation, so there is usually no speed penalty, or only a negligible one, compared with using magic numbers. In any case, the cost of debugging and the time spent trying to understand unexplained code must be weighed against this tiny calculation cost.
Accepted use
In some contexts, the use of unnamed numeric constants is generally accepted. Such acceptance is subjective and often depends on individual coding habits, but common examples include:
- Use of 0 and 1 as initial or incremental values in a for loop, such as for (int i = 0; i < max; i += 1)
- Use of 2 to check whether a number is even or odd, as in isEven = (x % 2 == 0), where % is the modulo operator
- Use of simple literals, e.g., in expressions such as circumference = 2 * Math.PI * radius, or for calculating the discriminant of a quadratic equation as d = b^2 - 4*a*c
- Use of powers of 10 to convert metric values or to calculate percentage and per mille values
- Exponents in expressions such as (f(x) ** 2 + f(y) ** 2) ** 0.5 for √(f(x)² + f(y)²)
- The literals 1 and 0 are sometimes used to represent the Boolean values true and false. Arguably, assigning these values to names such as TRUE and FALSE might be better.
- In C and C++, 0 is often used to represent the null pointer, even though the C standard library defines the macro NULL and modern C++ provides the keyword nullptr.
Format indicator
Origin
Format indicators were first used in early Version 7 Unix source code. Unix was ported to one of the first DEC PDP-11/20s, which did not have memory protection. So early versions of Unix used the relocatable memory reference model. Pre-Sixth Edition Unix versions read an executable file into memory and jumped to the first low memory address of the program, relative address zero. With the development of paged versions of Unix, a header was created to describe the executable image components. Also, a branch instruction was inserted as the first word of the header to skip the header and start the program. In this way a program could be run in the older relocatable memory reference mode or in paged mode. As more executable formats were developed, new constants were added by incrementing the branch offset.
In the Sixth Edition source code of the Unix program loader, the exec function read the executable image from the file system. It began by reading the first 8 bytes of the file, the part of the header containing the sizes of the program and initialized data areas. The first 16-bit word of the header was compared to two constants to determine whether the executable image contained relocatable memory references, was the newly implemented paged read-only executable image, or was the separated instruction and data paged image. The source made no mention of the dual role of the header constant, but the high-order byte of the constant was, in fact, the operation code for the PDP-11 branch instruction. Adding seven words to the program counter showed that, if this constant were executed, it would branch over the executable image's eight-word (16-byte) header and start the program.
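The branch arithmetic can be checked with a short sketch. It assumes the well-known a.out header constant 0407 (octal) and standard PDP-11 BR semantics: a high byte of 001 selects BR, the low byte is a signed offset counted in 16-bit words, and the offset is added to a program counter that already points past the two-byte instruction. The function and constant names are hypothetical.

```python
AOUT_MAGIC = 0o407  # a.out header constant, 0407 octal

BR_HIGH_BYTE = 0x01  # 0o000400 >> 8: high byte of the PDP-11 BR instruction


def branch_target(word: int, address: int = 0) -> int:
    """Return the byte address that a PDP-11 BR at `address` jumps to."""
    if (word >> 8) & 0xFF != BR_HIGH_BYTE:
        raise ValueError(f"{word:#o} is not a BR instruction")
    offset = word & 0xFF
    if offset >= 0x80:  # sign-extend the 8-bit offset
        offset -= 0x100
    pc = address + 2  # PC has already advanced past the instruction word
    return pc + 2 * offset  # offset is in words; addresses are in bytes


print(branch_target(AOUT_MAGIC))  # 16
```

Executed from address zero, the constant therefore jumps to byte 16, the first byte after an eight-word header.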
Since the Sixth and Seventh Editions of Unix employed paging code, the dual role of the header constant was hidden. That is, the exec service read the executable file header data into a kernel space buffer, but read the executable image into user space, thereby not using the constant's branching feature. Magic number creation was implemented in the Unix linker and loader and magic number branching was probably still used in the suite of stand-alone diagnostic programs that came with the Sixth and Seventh Editions. Thus, the header constant did provide an illusion and met the criteria for magic.
In Version Seven Unix, the header constant was not tested directly, but assigned to a variable labeled ux_mag and subsequently referred to as the magic number. Probably because of its uniqueness, the term magic number came to mean executable format type, then expanded to mean file system type, and expanded again to mean any type of file.
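In the broader, file-format sense, a program can identify a file by comparing its first bytes with known signatures. A minimal Python sketch, assuming only the three signatures listed here (ELF executables, PNG images and PDF documents); real tools such as the Unix file command consult a much larger database. The names SIGNATURES, identify and identify_file are illustrative.

```python
# Leading bytes ("magic numbers") of a few well-known file formats.
SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
}


def identify(header: bytes) -> str:
    """Return a format name if `header` starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first bytes of the file at `path` and identify its format."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```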