Alphabetical order


Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in a specific ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is the generalization of the alphabetical order to other data types, such as sequences of numbers or other ordered mathematical objects.
When applied to strings or sequences that may contain digits, numbers or more elaborate types of elements, in addition to alphabetical characters, the alphabetical order is generally called a lexicographical order.
To determine which of two strings of characters comes first when arranging in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet comes before the other string. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the shorter string is deemed to come first in alphabetical order.
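The comparison rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a full collation routine: the function name `comes_before` is hypothetical, and the sketch assumes both strings contain only lower-case letters a to z, so that comparing characters by code point matches their order in the alphabet.

```python
def comes_before(a: str, b: str) -> bool:
    """Return True if string a sorts strictly before string b.

    Assumes lower-case a-z only, so character comparison by code point
    agrees with alphabet order (an assumption of this sketch).
    """
    # Compare letters position by position until a difference is found.
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            return ch_a < ch_b
    # All shared positions match: the shorter string comes first.
    return len(a) < len(b)
```

For instance, `comes_before("as", "aster")` is true because the first string runs out of letters first, while `comes_before("at", "astronomy")` is false because the strings already differ at the second letter.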
Capital or upper case letters are generally considered to be identical to their corresponding lower case letters for the purposes of alphabetical ordering, although conventions may be adopted to handle situations where two strings differ only in capitalization. Various conventions also exist for the handling of strings containing spaces, modified letters, such as those with diacritics, and non-letter characters such as marks of punctuation.
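One way to treat capitals as equivalent while still fixing an order for strings that differ only in capitalization is a two-part sort key. The sketch below is one possible convention, not a standard: the name `fold_case_key` is hypothetical, and the choice to place the capitalized form first is an assumption of this example.

```python
def fold_case_key(s: str):
    # Primary key: the case-folded text, so "Apple" and "apple" compare equal.
    # Secondary key: the original string, used only to break ties. With
    # code-point comparison this puts the upper-case form first, which is an
    # assumed tie-break convention rather than one the text prescribes.
    return (s.casefold(), s)


words = ["banana", "Apple", "apple", "Banana"]
ordered = sorted(words, key=fold_case_key)
# ordered == ["Apple", "apple", "Banana", "banana"]
```

Without the key, Python's default string comparison would place every capitalized word ahead of every lower-case one.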
The result of placing a set of words or strings in alphabetical order is that all of the strings beginning with the same letter are grouped together; within that grouping all words beginning with the same two-letter sequence are grouped together; and so on. The system thus tends to maximize the number of common initial letters between adjacent words.

History

The order of the letters of the alphabet is attested from the 14th century BCE in the town of Ugarit on Syria's northern coast. Tablets found there bear over one thousand cuneiform signs, but these signs are not Babylonian and there are only thirty distinct characters. About twelve of the tablets have the signs set out in alphabetic order. Two orders are found: one is nearly identical to the order used for Hebrew, Greek and Latin, and the other is very similar to that used for Geʽez.
It is not known how many letters the Proto-Sinaitic alphabet had nor what their alphabetic order was. Among its descendants, the Ugaritic alphabet had 27 consonants, the South Arabian alphabets had 29, and the Phoenician alphabet 22. These scripts were arranged in two orders, an ABGDE order in Phoenician and an HLĦMQ order in the south; Ugaritic preserved both orders. Both sequences proved remarkably stable among the descendants of these scripts.
As applied to words, alphabetical order was first used in the 1st millennium BCE by Northwest Semitic scribes using the abjad system. However, a range of other methods of classifying and ordering material, including geographical, chronological, hierarchical and by category, were preferred over alphabetical order for centuries.
Parts of the Bible are dated to the 7th–6th centuries BCE. In the Book of Jeremiah, the prophet utilizes the Atbash substitution cipher, based on alphabetical order. Similarly, biblical authors used acrostics based on the Hebrew alphabet.
The first effective use of alphabetical order as a cataloging device among scholars may have been in ancient Alexandria, in the Great Library of Alexandria, which was founded around 300 BCE. The poet and scholar Callimachus, who worked there, is thought to have created the world's first library catalog, known as the Pinakes, with scrolls shelved in alphabetical order of the first letter of authors' names.
In the 1st century BCE, the Roman writer Varro compiled alphabetic lists of authors and titles. In the 2nd century CE, Sextus Pompeius Festus wrote an encyclopedic epitome of the works of Verrius Flaccus, De verborum significatu, with entries in alphabetic order. In the 3rd century CE, Harpocration wrote a Homeric lexicon alphabetized by all letters.
The 10th century saw major alphabetical lexicons of Greek, Arabic, and Biblical Hebrew. Alphabetical order as an aid to consultation flourished in 11th-century Italy, which contributed works on Latin and Talmudic Aramaic.
In the second half of the 12th century, Christian preachers adopted alphabetical tools to analyse biblical vocabulary. This led to the compilation of alphabetical concordances of the Bible by the Dominican friars in Paris in the 13th century, under Hugh of Saint Cher. Older reference works such as St. Jerome's Interpretations of Hebrew Names were alphabetized for ease of consultation.
The use of alphabetical order was initially resisted by scholars, who expected their students to master their area of study according to its own rational structures; its success was driven by such tools as Robert Kilwardby's index to the works of St. Augustine, which helped readers access the full original text instead of depending on the compilations of excerpts which had become prominent in 12th-century scholasticism. The adoption of alphabetical order was part of the transition from the primacy of memory to that of written works. The idea of ordering information by the order of the alphabet also met resistance from the compilers of encyclopaedias in the 12th and 13th centuries, who were all devout churchmen. They preferred to organise their material theologically, in the order of God's creation, starting with Deus.
In 1604 Robert Cawdrey had to explain in Table Alphabeticall, the first monolingual English dictionary, "Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end". Although as late as 1803 Samuel Taylor Coleridge condemned encyclopedias with "an arrangement determined by the accident of initial letters", many lists are today based on this principle.

Ordering in the Latin script

Basic order and examples

The standard order of the modern ISO basic Latin alphabet is:
  • A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z
An example of straightforward alphabetical ordering follows:
  • As; Aster; Astrolabe; Astronomy; Astrophysics; At; Ataman; Attack; Baa
Another example:
  • Barnacle; Be; Been; Benefit; Bent
Both lists above are ordered alphabetically. In the first list, As comes before Aster because they begin with the same two letters and As has no more letters after that whereas Aster does. The next three words come after Aster because their fourth letter is r, which comes after e (the fourth letter of Aster) in the alphabet. Those three words are themselves ordered by their sixth letters (l, n and p respectively). Then comes At, which differs from the preceding words in the second letter. Ataman comes after At for the same reason that Aster came after As. Attack follows Ataman based on comparison of their third letters, and Baa comes after all of the others because it has a different first letter.
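The two example lists can also be checked mechanically. The following Python sketch is only an illustration: the helper name `is_alphabetical` is hypothetical, and it assumes that case-folding followed by Python's built-in string comparison is an adequate stand-in for alphabetical order, which holds for plain unaccented Latin letters such as these.

```python
first = ["As", "Aster", "Astrolabe", "Astronomy", "Astrophysics",
         "At", "Ataman", "Attack", "Baa"]
second = ["Barnacle", "Be", "Been", "Benefit", "Bent"]


def is_alphabetical(words):
    # Ignore capitalization, then check that the list is already sorted.
    keys = [w.casefold() for w in words]
    return keys == sorted(keys)
```

Both `is_alphabetical(first)` and `is_alphabetical(second)` evaluate to true, while a reordered list such as `["At", "As"]` does not pass the check.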

Treatment of multiword strings

When some of the strings being ordered consist of more than one word, i.e., they contain spaces or other separators such as hyphens, then two basic approaches may be taken. In the first approach, all strings are ordered initially according to their first word, as in the sequence:
  • Oak; Oak Hill; Oak Ridge; Oakley Park; Oakley River
where all strings beginning with the separate word Oak precede all those beginning with Oakley, because Oak precedes Oakley in alphabetical order.
In the second approach, strings are alphabetized as if they had no spaces or hyphens, giving the sequence:
  • Oak; Oak Hill; Oakley Park; Oakley River; Oak Ridge
where Oak Ridge now comes after the Oakley strings, as it would if it were written "Oakridge".
The second approach is the one usually taken in dictionaries, and it is thus often called dictionary order by publishers. The first approach has often been used in book indexes, although each publisher traditionally set its own standards for which approach to use therein; there was no ISO standard for book indexes before 1975.
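The two approaches to multiword strings can be expressed as two different sort keys. The Python sketch below is a simplified illustration: the function names `word_by_word_key` and `letter_by_letter_key` are hypothetical, only spaces and hyphens are treated as separators, and capitalization is ignored.

```python
import re

SEPARATORS = re.compile(r"[ -]+")  # spaces and hyphens only (assumption)


def word_by_word_key(s: str):
    # First approach: compare whole words in turn, so the separate word
    # "Oak" sorts ahead of anything starting with "Oakley".
    return [w.casefold() for w in SEPARATORS.split(s) if w]


def letter_by_letter_key(s: str):
    # Second approach: drop the separators and compare as one long word.
    return SEPARATORS.sub("", s).casefold()


names = ["Oakley River", "Oak Ridge", "Oak", "Oakley Park", "Oak Hill"]
by_word = sorted(names, key=word_by_word_key)
by_letter = sorted(names, key=letter_by_letter_key)
# by_word   == ["Oak", "Oak Hill", "Oak Ridge", "Oakley Park", "Oakley River"]
# by_letter == ["Oak", "Oak Hill", "Oakley Park", "Oakley River", "Oak Ridge"]
```

The only difference between the two results is the position of Oak Ridge, which matches the two sequences given above.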

Special cases

Modified letters

In French, modified letters are treated the same as the base letter for alphabetical ordering purposes. For example, rôle comes between rock and rose, as if it were written role. However, languages that use such letters systematically generally have their own ordering rules. See below.
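Treating a modified letter like its base letter can be approximated by decomposing each character and discarding the combining marks before comparison. The Python sketch below uses the standard `unicodedata` module; the function name `base_letter_key` is hypothetical, and the approach is a simplification that ignores language-specific rules and does not replace locale-aware collation.

```python
import unicodedata


def base_letter_key(s: str) -> str:
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["rose", "r\u00f4le", "rock"]  # "r\u00f4le" is rôle
ordered = sorted(words, key=base_letter_key)
# ordered == ["rock", "rôle", "rose"]
```

A plain code-point sort would instead put rôle after rose, because ô has a higher code point than every unaccented Latin letter.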

Ordering by surname

In most cultures where family names are written after given names, lists of names are nonetheless usually sorted by family name first. In this case, names need to be reordered to be sorted correctly. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way. Capturing this rule in a computer collation algorithm is complex, and simple attempts will fail. For example, unless the algorithm has at its disposal an extensive list of family names, there is no way to decide if "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".
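The difficulty can be seen in a deliberately naive Python sketch. Everything here is an assumption made for illustration: the names `PARTICLES`, `split_name` and `surname_key` are hypothetical, the particle list is tiny and incomplete, and the heuristic simply picks one of the readings mentioned above without any way of knowing whether it is the intended one.

```python
# Hypothetical, far-from-complete list of surname particles (assumption).
PARTICLES = {"van", "der", "de", "von"}


def split_name(full_name: str):
    """Guess (family name, given names) from a 'given family' string."""
    parts = full_name.split()
    i = len(parts) - 1  # start by assuming the last word is the surname
    # Extend the surname leftwards over any known particles.
    while i > 0 and parts[i - 1].lower() in PARTICLES:
        i -= 1
    return " ".join(parts[i:]), " ".join(parts[:i])


def surname_key(full_name: str):
    family, given = split_name(full_name)
    return (family.casefold(), given.casefold())


people = ["Brian O'Leary", "Gillian Lucille van der Waal", "Juan Hernandes"]
ordered = sorted(people, key=surname_key)
# ordered == ["Juan Hernandes", "Brian O'Leary", "Gillian Lucille van der Waal"]
```

With this particle list the sketch reads the third name as "van der Waal, Gillian Lucille"; with an empty list it would read it as "Waal, Gillian Lucille van der". Neither choice can be justified from the string alone, which is the point made above.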
Ordering by surname is frequently encountered in academic contexts. Within a single multi-author paper, ordering the authors alphabetically by surname, rather than by other methods such as reverse seniority or subjective degree of contribution to the paper, is seen as a way of "acknowledg[ing] similar contributions" or "avoid[ing] disharmony in collaborating groups". The practice in certain fields of ordering citations in bibliographies by the surnames of their authors has been found to create bias in favour of authors with surnames which appear earlier in the alphabet, while this effect does not appear in fields in which bibliographies are ordered chronologically.