Icon (programming language)


Icon is a very high-level programming language based on the concept of "goal-directed execution" in which an expression in code returns "success" along with a result, or a "failure", indicating that there is no valid result. The success and failure of a given expression is used to direct further processing, whereas conventional languages would typically use Boolean logic written by the programmer to achieve the same ends. Because the logic for basic control structures is often implicit in Icon, common tasks can be completed with less explicit code.
Icon was designed by Ralph Griswold after leaving Bell Labs where he was a major contributor to the SNOBOL language. SNOBOL was a string-processing language with what would be considered dated syntax by the standards of the early 1970s. After moving to the University of Arizona, he further developed the underlying SNOBOL concepts in SL5, but considered the result to be a failure. This led to the significantly updated Icon, which blends the short but conceptually dense code of SNOBOL-like languages with the more familiar syntax of ALGOL-inspired languages like C or Pascal.
Like the languages that inspired it, the primary area of use of Icon is managing strings and textual patterns. String operations often fail, for instance, finding "the" in "world". In most languages, this requires testing and branching to avoid using a non-valid result. In Icon most of these sorts of tests are simply unneeded, reducing the amount of code that must be written. Complex pattern handling can be done in a few lines of terse code, similar to more dedicated languages like Perl but retaining a more function-oriented syntax familiar to users of other ALGOL-like languages.
Icon is not object-oriented, but an object-oriented extension named Idol was developed in 1996 which eventually became Unicon. It also inspired other languages, with its simple generators being especially influential; Icon's generators were a major inspiration for the Python language.

History

SNOBOL

The original SNOBOL effort, retroactively known as SNOBOL1, launched in the fall of 1962 at the Bell Labs Programming Research Studies Department. The effort was a reaction to the frustrations of attempting to use the SCL language for polynomial formula manipulation, symbolic integration and studying Markov chains. SCL, written by the department head Chester Lee, was both slow and had a low-level syntax that resulted in volumes of code for even simple projects. After briefly considering the COMIT language, Ivan Polonsky, Ralph Griswold and David Farber, all members of the six-person department, decided to write their own language to solve these problems.
The first versions were running on the IBM 7090 in early 1963, and by the summer the language had been built out and was being used across Bell. This led almost immediately to SNOBOL2, which added a number of built-in functions, and the ability to link to external assembly language code. It was released in April 1964 and mostly used within Bell, but also saw some use at Project MAC. The introduction of system functions served mostly to indicate the need for user-defined functions, which was the major feature of SNOBOL3, released in July 1964.
SNOBOL3's introduction corresponded with major changes within the Bell Labs computing department, including the addition of the new GE 645 mainframe which would require a rewrite of SNOBOL. Instead, the team suggested writing a new version that would run on a virtual machine, named SIL for SNOBOL Intermediate Language, allowing it to be easily ported to any sufficiently powerful platform. This proposal was accepted as SNOBOL4 in September 1965. By August 1966, plans for a significantly improved version of the language had emerged. Further work on the language continued throughout the rest of the 1960s, notably adding the associative array type, which they referred to as a table, in a later version.

SL5 leads to Icon

Griswold left Bell Labs to become a professor at the University of Arizona in August 1971. He introduced SNOBOL4 as a research tool at that time. He received grants from the National Science Foundation to continue supporting and evolving SNOBOL.
As a language originally developed in the early 1960s, SNOBOL's syntax bears the marks of other early programming languages like FORTRAN and COBOL. In particular, the language is column-dependent, as many of these languages were entered on punch cards where column layout is natural. Additionally, control structures were almost entirely based on branching around code rather than the use of blocks, which were becoming a must-have feature after the introduction of ALGOL 60. By the time he moved to Arizona, the syntax of SNOBOL4 was hopelessly outdated.
Griswold began the effort of implementing SNOBOL's underlying success/failure concept with traditional flow control structures like if/then. This became SL5, short for "SNOBOL Language 5", but the result was unsatisfying. In 1977, he returned to the language to consider a new version. He replaced the very powerful function system introduced in SL5 with a simpler concept of suspend/resume and developed a new concept for the natural successor to SNOBOL4 with the following principles:
  • SNOBOL4's philosophic and semantic basis
  • SL5 syntactic basis
  • SL5 features, excluding the generalized procedure mechanism
The new language was initially known as SNOBOL5, but as it was significantly different from SNOBOL in all but the underlying concept, a new name was ultimately desired. The name "s" was considered as a sort of homage to "C", but this was ultimately abandoned due to the problems with typesetting documents using that name. A series of new names were proposed and abandoned: Irving, bard, and "TL" for "The Language". It was at this time that Xerox PARC began publishing about their work on graphical user interfaces and the term "icon" began to enter the computer lexicon. The decision was made to change the name initially to "icon" before finally choosing "Icon".

Language

Basic syntax

The Icon language is derived from the ALGOL class of structured programming languages, and thus has syntax similar to C or Pascal. Icon is most similar to Pascal, using := syntax for assignments, the procedure keyword and similar syntax. On the other hand, Icon uses C-style braces for structuring execution groups, and programs start by running a procedure called main.
In many ways Icon also shares features with most scripting languages: variables do not have to be declared, types are cast automatically, and numbers can be converted to strings and back automatically. Another feature common to many scripting languages, but not all, is the lack of a line-ending character; in Icon, lines that do not end with a semicolon get ended by an implied semicolon if it makes sense.
Procedures are the basic building blocks of Icon programs. Although they use Pascal naming, they work more like C functions and can return values; there is no function keyword in Icon.

procedure doSomething(aString)
   write(aString)
end
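
The points above about procedures, implied semicolons and automatic conversion can be illustrated with a minimal sketch. The procedure name double and the variable total are hypothetical names chosen for illustration; only write and the operators are part of Icon itself.

```icon
# Illustrative sketch; "double" and "total" are assumed names.
procedure double(n)
   return 2 * n                  # procedures can return values, as C functions do
end

procedure main()
   local total                   # optional; variables need not be declared
   total := double("21")         # the string "21" is converted to the integer 21
   write("total is " || total)   # the integer is converted back to a string
end
```

No line in this sketch ends with a semicolon; each line break is treated as an implied one. Run under a standard Icon implementation, the program should write "total is 42".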

Goal-directed execution

One of the key concepts in SNOBOL was that its functions returned the "success" or "failure" as primitives of the language rather than using magic numbers or other techniques.
For example, a function that returns the position of a substring within another string is a common routine found in most language runtime systems. In JavaScript, finding the position of the word "World" within the string "Hello, World!" would be accomplished with position = "Hello, World!".indexOf("World"), which would return 7 in the variable position. If one instead searches for a term that does not appear in the string, the code will "fail". In JavaScript, as in most languages, this is indicated by returning a magic number, in this case -1.
In SNOBOL a failure of this sort returns a special value, fail. SNOBOL's syntax operates directly on the success or failure of the operation, jumping to labelled sections of the code without having to write a separate test. For instance, the following code prints "Hello, world!" five times:

* SNOBOL program to print Hello World
      I = 1
LOOP  OUTPUT = "Hello, world!"
      I = I + 1
      LE(I, 5) : S(LOOP)
END

To perform the loop, the less-than-or-equal operator, LE, is called on the index variable I, and if it succeeds, meaning I is less than or equal to 5, it branches to the named label LOOP and continues.
Icon retained the concept of flow control based on success or failure but developed the language further. One change was the replacement of the labelled GOTO-like branching with block-oriented structures in keeping with the structured programming style that was sweeping the computer industry in the late 1960s. The second was to allow "failure" to be passed along the call chain so that entire blocks will succeed or fail as a whole. This is a key concept of the Icon language. Whereas in traditional languages one would have to include code to test the success or failure based on Boolean logic and then branch based on the outcome, such tests and branches are inherent to Icon code and do not have to be explicitly written.
For instance, consider this bit of code written in the Java programming language. It calls the function read() to read a character from a file, assigns the result to the variable a, and then writes the value of a to another file. The result is to copy one file to another. read will eventually run out of characters to read from the file, potentially on its very first call, which would leave a in an undetermined state and potentially cause write to throw a null pointer exception. To avoid this, read returns the special value EOF (end-of-file) in this situation, which requires an explicit test to avoid writing it:

while ((a = read()) != EOF) write(a);

In contrast, in Icon the read() function returns a line of text or &fail. &fail is not simply an analog of EOF, as it is explicitly understood by the language to mean "stop processing" or "do the fail case" depending on the context. The equivalent code in Icon is:

while a := read() do write(a)

This means, "as long as read does not fail, call write, otherwise stop". There is no need to specify a test against the magic number as in the Java example; the test is implicit, and the resulting code is simplified. Because success and failure are passed up through the call chain, one can embed function calls within others and they stop when the nested function call fails. For instance, the code above can be reduced to:

while write(read())

In this version, if the read() call fails, the write() call fails, and the while stops. Icon's branching and looping constructs are all based on the success or failure of the code inside them, not on an arbitrary Boolean test provided by the programmer. if performs the then block if its "test" returns a value, and performs the else block or moves to the next line if it returns &fail. Likewise, while continues calling its block until it receives a fail. Icon refers to this concept as goal-directed execution.
It is important to contrast the concept of success and failure with the concept of an exception; exceptions are unusual situations, not expected outcomes. Fails in Icon are expected outcomes; reaching the end of a file is an expected situation and not an exception. Icon does not have exception handling in the traditional sense, although fail is often used in exception-like situations. For instance, if the file being read does not exist, read fails without a special situation being indicated. In traditional languages, these "other conditions" have no natural way of being indicated; additional magic numbers may be used, but more typically exception handling is used to "throw" a value. For instance, to handle a missing file in the Java code, one might see:
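
How success and failure select a branch can be shown with a minimal sketch built on Icon's built-in find function. The variable names s and i and the sample strings are assumptions made for illustration.

```icon
# Illustrative sketch; s, i and the sample strings are assumed.
procedure main()
   local s, i
   s := "Hello, World!"

   # find() succeeds and produces a position, so the then branch runs.
   if i := find("World", s) then
      write("found at position ", i)
   else
      write("not found")

   # find() fails, so the assignment never happens and the else branch runs.
   if i := find("Goodbye", s) then
      write("found at position ", i)
   else
      write("not found")
end
```

Run under a standard Icon implementation, the first test should report position 8, because Icon counts string positions from 1 where JavaScript counts from 0, and the second should report "not found". No comparison against a magic number appears anywhere in the program.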

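try { while ((a = read()) != EOF) write(a); }
catch (Exception e) { /* something else went wrong; use this catch to exit the loop */ }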

This case needs two comparisons: one for EOF and another for all other errors. Since Java does not allow exceptions to be compared as logic elements, as under Icon, the lengthy try/catch syntax must be used instead. Try blocks also impose a performance penalty even if no exception is thrown, a distributed cost that Icon normally avoids.
Icon uses this same goal-directed mechanism to perform traditional Boolean tests, although with subtle differences. A simple comparison like if a < b does not mean "if the conditional expression evaluates to or returns a true value", as it would under most languages; instead, it means something more like "if the conditional expression succeeds and does not fail". In this case, the < operator succeeds if the comparison is true. The if calls its then clause if the expression succeeds, and either the else or the next line if it fails. The result is similar to the traditional if/then seen in other languages: the if performs the then clause if a is less than b. The subtlety is that the same comparison expression can be placed anywhere, for instance:

write(a < b)

Another difference is that the < operator returns its second argument if it succeeds, which in this example will result in the value of b being written if it is larger than a; otherwise nothing is written. As this is not a test per se, but an operator that returns a value, comparisons can be strung together, allowing things like if a < b < c, a common type of comparison that in most languages must be written as a conjunction of two inequalities like if (a < b) && (b < c).
A key aspect of goal-directed execution is that the program may have to rewind to an earlier state if a procedure fails, a task known as backtracking. For instance, consider code that sets a variable to a starting location and then performs operations that may change the value. This is common in string scanning operations, which advance a cursor through the string as they scan. If the procedure fails, it is important that any subsequent reads of that variable return the original state, not the state as it was being internally manipulated. For this task, Icon has the reversible assignment operator, <-, and the reversible exchange, <->. For instance, consider some code that is attempting to find a pattern string within a larger string:
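
The behaviour of comparison operators as value-producing expressions can be sketched as follows. The variable names and values are assumptions chosen for illustration.

```icon
# Illustrative sketch; a, b, c and their values are assumed.
procedure main()
   local a, b, c
   a := 1; b := 5; c := 10

   write(a < b)        # the comparison succeeds and produces b, so 5 is written
   write(b < a)        # the comparison fails, so write is never called

   if a < b < c then   # (a < b) produces b, which is then compared with c
      write("b lies between a and c")
end
```

Under these assumptions the program should write 5, followed by "b lies between a and c"; the failed comparison on the second line produces no output at all.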
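
A minimal sketch of such code might look like the following, where i, j, pattern and inString are illustrative names and the sample strings are assumptions chosen so that the search fails:

```icon
# Illustrative sketch; all names and sample strings are assumed.
procedure main()
   local i, j, pattern, inString
   pattern := "the"
   inString := "world"
   i := 1

   {
      (i := 10) &                             # set the starting location
      (j := (i < find(pattern, inString)))    # fails if pattern is absent
   }

   write(i)   # the block failed, yet i is still 10
end
```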


This code begins by moving i to 10, the starting location for the search. However, if the find fails, the block will fail as a whole, which results in the value of i being left at 10 as an undesirable side effect. Replacing i := 10 with i <- 10 indicates that i should be reset to its previous value if the block fails. This provides an analog of atomicity in the execution.
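
The effect of the reversible assignment can be sketched in a small program. The variable name i and the sample strings are illustrative assumptions.

```icon
# Illustrative sketch; i and the sample strings are assumed.
procedure main()
   local i
   i := 1

   # find() fails because "the" does not occur in "world". Icon then
   # backtracks into the reversible assignment, which restores i.
   if (i <- 10) & find("the", "world") then
      write("found")

   write(i)   # i is back to 1; with := it would have remained 10
end
```

Under these assumptions the program should write 1, showing that the failed search left no trace in i.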