Source lines of code
Source lines of code (SLOC), also known as lines of code (LOC), is a software metric used to measure the size of a computer program by counting the number of lines in the text of the program's source code. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or maintainability once the software is produced.
Measurement methods
Many useful comparisons involve only the order of magnitude of lines of code in a project. Using lines of code to compare a 10,000-line project with a 100,000-line project is far more useful than comparing a 20,000-line project with a 21,000-line project. While it is debatable exactly how to measure lines of code, discrepancies of an order of magnitude can be clear indicators of software complexity or man-hours.
There are two major types of SLOC measures: physical SLOC and logical SLOC. Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code excluding comment lines.
Logical SLOC attempts to measure the number of executable "statements", but the specific definitions are tied to specific computer languages. It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are more sensitive to logically irrelevant formatting and style conventions than logical SLOC. In addition, SLOC measures are often stated without giving their definition, and logical SLOC can differ significantly from physical SLOC.
Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:
    for (i = 0; i < 100; i++) printf("hello"); /* How many lines of code is this? */
In this example we have:
- 1 physical line of code,
- 2 logical lines of code (the for statement and the printf statement),
- 1 comment line.
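Depending on the programmer and coding standards, the same code could be written on many separate lines:
    /* Now how many lines of code is this? */
    for (i = 0; i < 100; i++)
    {
        printf("hello");
    }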
In this example we have:
- 4 physical lines of code: is placing braces work to be estimated?
- 2 logical lines of code: what about all the work writing non-statement lines?
- 1 comment line: tools must account for all code and comments regardless of comment placement.
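The difference between the two counts can be sketched in a few lines of Python. This is a minimal illustration, not a standard counting tool: the loop bounds and string in the two C snippets are assumed for illustration, the function names are hypothetical, and the logical count uses a deliberately naive rule (control keywords plus semicolons outside parentheses) that real counters refine considerably.

```python
import re

# Patterns for C string literals, character literals, and comments.
STRING = r'"(?:\\.|[^"\\\n])*"'
CHAR = r"'(?:\\.|[^'\\\n])*'"
COMMENT = r"/\*.*?\*/|//[^\n]*"
TOKEN = re.compile(f"{STRING}|{CHAR}|{COMMENT}", re.DOTALL)


def strip_comments(source):
    """Blank out comments but keep their newlines, leaving literals intact."""
    def replace(match):
        text = match.group(0)
        if text.startswith("/"):  # a comment, not a literal
            return " " + "\n" * text.count("\n")
        return text
    return TOKEN.sub(replace, source)


def physical_sloc(source):
    """Count lines that still contain code once comments are removed."""
    return sum(1 for line in strip_comments(source).splitlines() if line.strip())


def comment_lines(source):
    """Count lines that contain at least part of a comment."""
    lines = set()
    for match in TOKEN.finditer(source):
        text = match.group(0)
        if text.startswith("/"):
            first = source.count("\n", 0, match.start())
            lines.update(range(first, first + text.count("\n") + 1))
    return len(lines)


def logical_sloc(source):
    """Naive estimate: control keywords plus semicolons outside parentheses."""
    code = re.sub(f"{STRING}|{CHAR}", '""', strip_comments(source))
    count = len(re.findall(r"\b(?:for|while|if|switch)\b", code))
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            count += 1
    return count


ONE_LINE = 'for (i = 0; i < 100; i++) printf("hello"); /* How many lines of code is this? */'
MULTI_LINE = """/* Now how many lines of code is this? */
for (i = 0; i < 100; i++)
{
    printf("hello");
}
"""

for name, src in (("one line", ONE_LINE), ("multi-line", MULTI_LINE)):
    print(name, physical_sloc(src), logical_sloc(src), comment_lines(src))
# one line 1 2 1
# multi-line 4 2 1
```

Under these assumptions the physical count changes from 1 to 4 with formatting alone, while the logical count stays at 2 for both layouts.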
Origins
At the time when SLOC was introduced as a metric, the most commonly used languages, such as FORTRAN and assembly language, were line-oriented languages. These languages were developed at the time when punched cards were the main form of data entry for programming. One punched card usually represented one line of code. It was one discrete object that was easily counted. It was the visible output of the programmer, so it made sense to managers to count lines of code as a measure of a programmer's productivity, even referring to them as "card images". Today, the most commonly used computer languages allow a lot more leeway for formatting. Text lines are no longer limited to 80 or 96 columns, and one line of text no longer necessarily corresponds to one line of code.
Usage of SLOC measures
SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC; that is, programs with larger SLOC values take more time to develop. Thus, SLOC can be effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to develop the same functionality with far less code, so one program with fewer SLOC may exhibit more functionality than another similar program. Counting SLOC as a productivity measure has its caveats, since a developer can write only a few lines and yet be far more productive in terms of functionality than a developer who ends up creating more lines. Good developers may merge multiple code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Furthermore, inexperienced developers often resort to code duplication, which is highly discouraged because it is more bug-prone and costly to maintain, but which results in higher SLOC.
SLOC counting exhibits further accuracy issues when comparing programs written in different languages, unless adjustment factors are applied to normalize languages. Various computer languages balance brevity and clarity in different ways; as an extreme example, most assembly languages would require hundreds of lines of code to perform the same task as a few characters in APL. The following example shows a comparison of a "hello world" program written in BASIC, C, and COBOL.
BASIC (lines of code: 1):
    PRINT "hello, world"
C (lines of code: 4):
    #include <stdio.h>
    int main() {
        printf("hello, world\n");
    }
COBOL (lines of code: 6):
    identification division.
    program-id. hello.
    procedure division.
    display "hello, world"
    goback.
    end program hello.
Another increasingly common problem in comparing SLOC metrics is the difference between auto-generated and hand-written code. Modern software tools often have the capability to auto-generate enormous amounts of code with a few clicks of a mouse. For instance, graphical user interface builders automatically generate all the source code for graphical control elements when an icon is simply dragged onto a workspace. The work involved in creating this code cannot reasonably be compared to the work necessary to write a device driver, for instance. By the same token, a hand-coded custom GUI class could easily be more demanding than a simple device driver; hence the shortcoming of this metric.
There are several cost, schedule, and effort estimation models which use SLOC as an input parameter, including the widely used Constructive Cost Model (COCOMO) series of models by Barry Boehm et al., PRICE Systems True S and Galorath's SEER-SEM. While these models have shown good predictive power, they are only as good as the estimates fed to them. Many have advocated the use of function points instead of SLOC as a measure of functionality, but since function points are highly correlated with SLOC this is not a universally held view.
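As a rough illustration of how such a model consumes SLOC, the sketch below implements the Basic form of the Constructive Cost Model with the coefficients Boehm published in 1981. It is a simplified sketch only: the later and more detailed models in the series add cost drivers and calibration that are omitted here, and the function and variable names are hypothetical.

```python
# Basic COCOMO coefficients (Boehm, 1981), per project mode:
#   effort   = a * KSLOC ** b      (person-months)
#   schedule = c * effort ** d     (calendar months)
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(sloc, mode="organic"):
    """Return (effort in person-months, schedule in months) for a SLOC estimate."""
    a, b, c, d = COEFFICIENTS[mode]
    effort = a * (sloc / 1000) ** b
    schedule = c * effort ** d
    return effort, schedule


# The output is only as good as the size estimate: compare a nominal
# 50,000-line estimate with the same project sized 20% too low or too high.
for sloc in (40_000, 50_000, 60_000):
    effort, schedule = basic_cocomo(sloc)
    print(f"{sloc:>6} SLOC: {effort:7.1f} person-months, {schedule:5.1f} months")
```

Because the exponent b is greater than 1, an error in the SLOC estimate produces a slightly larger relative error in the predicted effort.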
Example
According to Vincent Maraia, the SLOC values for various operating systems in Microsoft's Windows NT product line are as follows:
| Year | Operating system | SLOC (million) |
| 1993 | Windows NT 3.1 | 4–5 |
| 1994 | Windows NT 3.5 | 7–8 |
| 1996 | Windows NT 4.0 | 11–12 |
| 2000 | Windows 2000 | more than 29 |
| 2001 | Windows XP | 45 |
| 2003 | Windows Server 2003 | 50 |
David A. Wheeler studied the Red Hat distribution of the Linux operating system, and reported that Red Hat Linux version 7.1 contained over 30 million physical SLOC. He also extrapolated that, had it been developed by conventional proprietary means, it would have required about 8,000 person-years of development effort and would have cost over $1 billion.
A similar study was later made of Debian GNU/Linux version 2.2; this operating system was originally released in August 2000. This study found that Debian GNU/Linux 2.2 included over 55 million SLOC, and if developed in a conventional proprietary way would have required 14,005 person-years and cost US$1.9 billion to develop. Later runs of the tools used report that the following release of Debian had 104 million SLOC, and that the newest release at the time was going to include over 213 million SLOC.
| Year | Operating system | SLOC (million) |
| 2000 | Debian 2.2 | 55–59 |
| 2002 | Debian 3.0 | 104 |
| 2005 | Debian 3.1 | 215 |
| 2007 | Debian 4.0 | 283 |
| 2009 | Debian 5.0 | 324 |
| 2012 | Debian 7.0 | 419 |
| 2009 | OpenSolaris | 9.7 |
| | FreeBSD | 8.8 |
| 2005 | Mac OS X 10.4 | 86 |
| 1991 | Linux kernel 0.01 | 0.010239 |
| 2001 | Linux kernel 2.4.2 | 2.4 |
| 2003 | Linux kernel 2.6.0 | 5.2 |
| 2009 | Linux kernel 2.6.29 | 11.0 |
| 2009 | Linux kernel 2.6.32 | 12.6 |
| 2010 | Linux kernel 2.6.35 | 13.5 |
| 2012 | Linux kernel 3.6 | 15.9 |
| 2015-06-30 | Linux kernel pre-4.2 | 20.2 |
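The earlier point that SLOC comparisons are most meaningful at the order-of-magnitude level can be made concrete with a small sketch. The helper name below is hypothetical, and the sample figures are copied from the tables in this section (in millions of SLOC, using the lower bound where a range is given).

```python
import math


def magnitude_gap(sloc_a, sloc_b):
    """Return the size difference in orders of magnitude (base-10 log of the ratio)."""
    return abs(math.log10(sloc_a / sloc_b))


# A 10,000-line project against a 100,000-line one: a full order of magnitude.
print(round(magnitude_gap(10_000, 100_000), 2))  # 1.0
# A 20,000-line project against a 21,000-line one: effectively the same size.
print(round(magnitude_gap(20_000, 21_000), 2))   # 0.02

# Figures in millions of SLOC, taken from the tables in this section.
SLOC_MILLIONS = {
    "Windows NT 3.1": 4.0,  # lower bound of the 4-5 range
    "Windows XP": 45.0,
    "Linux kernel 2.4.2": 2.4,
    "Debian 7.0": 419.0,
}

baseline = SLOC_MILLIONS["Linux kernel 2.4.2"]
for name, size in SLOC_MILLIONS.items():
    print(f"{name}: {magnitude_gap(size, baseline):.2f} orders of magnitude from the baseline")
```

A gap near zero says little about relative effort, whereas a gap of one or more is the kind of discrepancy the metric can reasonably support.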
Utility
Advantages
- Scope for automation of counting: since a line of code is a physical entity, manual counting effort can be easily eliminated by automating the counting process. Small utilities may be developed for counting the LOC in a program. However, a logical code counting utility developed for a specific language cannot be used for other languages due to the syntactical and structural differences among languages. Physical LOC counters, however, have been produced which count dozens of languages.
- An intuitive metric: a line of code serves as an intuitive metric for measuring the size of software because it can be seen, and its effect can be visualized. Function points are said to be more of an objective metric which cannot be imagined as a physical entity; it exists only in the logical space. In this way, LOC comes in handy for expressing the size of software among programmers with low levels of experience.
- Ubiquitous measure: LOC measures have been around since the earliest days of software. As such, it is arguable that more LOC data is available than any other size measure.
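The first advantage, ease of automation, is why physical counters can cover many languages with little per-language knowledge. The sketch below is one hypothetical way to structure such a utility: the extension table and function names are assumptions made for illustration, only line-comment prefixes are recognized, and block comments are ignored, which a production counter would need to handle.

```python
from pathlib import Path

# Hypothetical table mapping file extensions to their line-comment prefix.
LINE_COMMENT = {
    ".py": "#",
    ".sh": "#",
    ".c": "//",
    ".h": "//",
    ".java": "//",
    ".sql": "--",
    ".f90": "!",
}


def count_physical(text, prefix):
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(prefix):
            count += 1
    return count


def count_tree(root):
    """Return physical LOC per known extension for all files under root."""
    totals = {}
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        prefix = LINE_COMMENT.get(suffix)
        if prefix is None or not path.is_file():
            continue  # unknown language or not a regular file
        text = path.read_text(encoding="utf-8", errors="replace")
        totals[suffix] = totals.get(suffix, 0) + count_physical(text, prefix)
    return totals
```

Supporting another language in this design only means adding a row to the table, whereas a logical counter would need a parser or statement grammar for each language.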