Letter case

Letter case is the distinction between the letters that are in larger uppercase or capitals and smaller lowercase in the written representation of certain languages. The writing systems that distinguish between the upper- and lowercase have two parallel sets of letters: each in the majuscule set has a counterpart in the minuscule set. Some counterpart letters have the same shape, and differ only in size, but for others the shapes are different. The two case variants are alternative representations of the same letter: they have the same name and pronunciation and are typically treated identically when sorting in alphabetical order.
Letter case is generally applied in a mixed-case fashion, with both upper and lowercase letters appearing in a given piece of text for legibility. The choice of case is often dictated by the grammar of a language or by the conventions of a particular discipline. In orthography, the uppercase is reserved for special purposes, such as the first letter of a sentence or of a proper noun, which makes lowercase more common in regular text.
In some contexts, it is conventional to use one case only. For example, engineering design drawings are typically labelled entirely in uppercase letters, which are easier to distinguish individually than the lowercase when space restrictions require very small lettering. In mathematics, on the other hand, uppercase and lowercase letters generally denote different mathematical objects, which may be related when the two cases of the same letter are used; for example, a lowercase x may denote an element of a set denoted by an uppercase X.

Terminology

The terms upper case and lower case may be written as two consecutive words, connected with a hyphen, or as a single word. These terms originated from the common layouts of the shallow drawers called type cases used to hold the movable type for letterpress printing. Traditionally, the capital letters were stored in a separate shallow tray or "case" that was located above the case that held the small letters.

Majuscule

Majuscule, for palaeographers, is technically any script whose letters have no or very few ascenders and descenders. Consequently, all of the letters of a majuscule script are of roughly the same height and it is written within the space of just two parallel lines. This is contrasted with the four lines required by a minuscule script, on which see below. While majuscule scripts were originally used to write entire texts, their letters eventually came to be used primarily with the modern function of uppercase letters in European writing, so the term may sometimes be used as a synonym of 'uppercase letter' or 'capital' in contemporary contexts.

Minuscule

Minuscule refers to lower-case letters. In palaeography, the term refers to a script which includes letters of different heights, with ascenders and descenders, so it needs to be written within a space of four parallel lines. Minuscule script letters eventually came to be used with the function of lowercase letters in European writing.

Typographical considerations

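The glyphs of lowercase letters can resemble smaller forms of the uppercase glyphs restricted to the baseband or can look hardly related. Here is a comparison of the upper and lower case variants of each letter included in the English alphabet:
Upper case: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Lower case: a b c d e f g h i j k l m n o p q r s t u v w x y z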
Typographically, the basic difference between the majuscules and minuscules is not that the majuscules are big and minuscules small, but that the majuscules generally are of uniform height.
There is more variation in the height of the minuscules, as some of them have parts higher or lower than the typical size. Normally, b, d, f, h, k, l, t are the letters with ascenders, and g, j, p, q, y are the ones with descenders. In addition, with old-style numerals still used by some traditional or classical fonts, 6 and 8 make up the ascender set, and 3, 4, 5, 7, and 9 the descender set.

Bicameral script

A minority of writing systems use two separate cases. Such writing systems are called bicameral scripts. These scripts include the Latin, Cyrillic, Greek, Coptic, Armenian, Glagolitic, Adlam, Warang Citi, Old Hungarian, Garay, Zaghawa, Osage, Vithkuqi, and Deseret scripts. Languages written in these scripts use letter cases as an aid to clarity. The Georgian alphabet has several variants, and there were attempts to use them as different cases, but the modern written Georgian language does not distinguish case.
All other writing systems make no distinction between majuscules and minuscules, a system called unicameral script or unicase. This includes most syllabic and other non-alphabetic scripts.
In scripts with a case distinction, lowercase is generally used for the majority of text; capitals are used for capitalisation and emphasis when boldface is not available. Acronyms are often written in all-caps, depending on various factors.

Capitalisation

Capitalisation is the writing of a word with its first letter in uppercase and the remaining letters in lowercase. Capitalisation rules vary by language and are often quite complex, but in most modern languages that have capitalisation, the first word of every sentence is capitalised, as are all proper nouns.
Capitalisation in English, in terms of the general orthographic rules independent of context, is universally standardised for formal writing. Capital letters are used as the first letter of a sentence, a proper noun, or a proper adjective. The names of the days of the week and the names of the months are also capitalised, as are the first-person pronoun "I" and the vocative particle "O". There are a few pairs of words of different meanings whose only difference is capitalisation of the first letter. Honorifics and personal titles showing rank or prestige are capitalised when used together with the name of the person or as a direct address, but normally not when used alone and in a more general sense. It can also be seen as customary to capitalise any word, in some contexts even a pronoun, referring to a deity.
Other words normally start with a lower-case letter. There are, however, situations where further capitalisation may be used to give added emphasis, for example in headings and publication titles. In some traditional forms of poetry, capitalisation has conventionally been used as a marker to indicate the beginning of a line of verse independent of any grammatical feature. In political writing, parody and satire, the unexpected emphasis afforded by otherwise ill-advised capitalisation is often used to great stylistic effect, such as in the case of George Orwell's Big Brother.
Other languages vary in their use of capitals. For example, in German all nouns are capitalised, while in Romance and most other European languages the names of the days of the week, the names of the months, and adjectives of nationality, religion, and so on normally begin with a lower-case letter. On the other hand, in some languages it is customary to capitalise formal polite pronouns, for example De, Dem (Danish), Sie, Ihnen (German), and Vd or Ud (abbreviations of usted in Spanish).
Informal communication, such as texting, instant messaging or a handwritten sticky note, may not bother to follow the conventions concerning capitalisation, but that is because its users usually do not expect it to be formal.

Exceptional letters and digraphs

The German letter "ß" formerly existed only in lower case. The orthographical capitalisation does not concern "ß", which generally does not occur at the beginning of a word, and in the all-caps style it has traditionally been replaced by the digraph "SS". Since June 2017, however, capital ẞ is accepted as an alternative in the all-caps style.
The Greek upper-case letter "Σ" has two different lower-case forms: "ς" in word-final position and "σ" elsewhere. In a similar manner, the Latin upper-case letter "S" used to have two different lower-case forms: "s" in word-final position and "ſ" elsewhere. The latter form, called the long s, fell out of general use before the middle of the 19th century, except in countries that continued to use blackletter typefaces such as Fraktur. When blackletter type fell out of general use in the mid-20th century, even those countries dropped the long s.
The treatment of the Greek iota subscript with upper-case letters is complicated.
Unlike most languages that use the Latin script and pair the dotless upper-case "I" with the dotted lower-case "i", Turkish, Tatar, Crimean Tatar, and Azeri as written in Azerbaijan have both a dotted and a dotless I, each in both upper and lower case. The two pairs represent distinct phonemes.
In some languages, specific digraphs may be regarded as single letters, and in Dutch, the digraph "IJ/ij" is even capitalised with both components written in uppercase. In other languages, such as Welsh and Hungarian, various digraphs are regarded as single letters for collation purposes, but the second component of the digraph will still be written in lower case even if the first component is capitalised. Similarly, in South Slavic languages whose orthography is coordinated between the Cyrillic and Latin scripts, the Latin digraphs "ǈ/ǉ", "ǋ/ǌ" and "ǅ/ǆ" are each regarded as a single letter, but only in all-caps style should both components be in upper case. Unicode designates a single character for each case variant of the three digraphs.
Some English surnames such as fforbes are traditionally spelt with a digraph instead of a capital letter.
In the Hawaiian orthography, the okina is a phonemic symbol that visually resembles a left single quotation mark. Representing the glottal stop, the okina can be characterised as either a letter or a diacritic. As a unicase letter, the okina is unaffected by capitalisation; it is the following letter that is capitalised instead.
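Several of the exceptions described in this section can be observed in the default case mappings of a Unicode-aware language. The following is a minimal sketch using Python's built-in string methods; it only illustrates how one implementation behaves, and the variable names are chosen here for illustration. Python's methods are locale-independent, so the Turkish dotted and dotless I are not handled in the way Turkish orthography expects.

```python
# Illustrative sketch: behaviour of Python's built-in, locale-independent
# Unicode case mappings for some of the exceptional letters discussed above.

# German sharp s: the traditional all-caps replacement is the digraph "SS",
# while the capital form (U+1E9E) maps back to lower-case sharp s.
sharp_s_upper = "straße".upper()
capital_sharp_s_lower = "\u1e9e".lower()

# Greek sigma: the word-final form is chosen at the end of a word.
final_sigma = "ΟΔΟΣ".lower()      # last letter becomes U+03C2 (final sigma)
lone_sigma = "Σ".lower()          # an isolated sigma becomes U+03C3

# The Latin digraph "dž" has three single-character case variants:
# lower case U+01C6, title case U+01C5 and upper case U+01C4.
dz_small = "\u01c6"
dz_upper = dz_small.upper()
dz_title = dz_small.title()

# Turkish: the default mapping pairs "I" with dotted "i"; producing the
# dotless "ı" requires language-specific tailoring that str.lower() lacks.
default_lower_i = "I".lower()

print(sharp_s_upper, capital_sharp_s_lower, final_sigma, dz_upper, dz_title)
```

Software that must follow Turkish or Azeri conventions therefore generally needs a locale-aware library rather than these default mappings.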

Related features

Similar orthographic and graphostylistic conventions are used for emphasis or following language-specific or other rules, including:

Font effects such as italic type or oblique type, boldface, and choice of serif vs. sans-serif.
In mathematical notation lower-case and upper-case letters have generally different meanings, and other meanings can be implied by the use of other typefaces, such as boldface, fraktur, script typeface, and blackboard bold.
Some letters of the Arabic and Hebrew alphabets and some jamo of the Korean hangul have different forms depending on placement within a word, but these rules are strict and the different forms cannot be used for emphasis.
* In the Arabic and Arabic-based alphabets, letters in a word are connected, except for several that cannot connect to the following letter. Letters may have distinct forms depending on whether they are initial, medial, final, or isolated.
* In the Hebrew alphabet, five letters have a distinct form that is used when they are word-final.
In Georgian, some authors use isolated letters from the ancient Asomtavruli alphabet within a text otherwise written in the modern Mkhedruli in a fashion that is reminiscent of the usage of upper-case letters in the Latin, Greek, and Cyrillic alphabets.
In the Japanese writing system, an author has the option of switching between kanji, hiragana, katakana, and rōmaji. In particular, every hiragana character has an equivalent katakana character, and vice versa. Romanised Japanese sometimes uses lowercase letters to represent words that would be written in hiragana, and uppercase letters to represent words that would be written in katakana. Some kana characters are written in smaller type when they modify or combine with the preceding sign or the following sign.

Stylistic or specialised usage

In English, a variety of case styles are used in various circumstances:
Sentence case
Title case
Start case
All caps
Small caps
All lowercase
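Some of these styles can be produced mechanically from plain text. The sketch below is a simplified illustration, and its function names are hypothetical: it covers sentence case, start case, all caps and all lowercase. Title case is left out because it depends on a style guide's list of minor words, and small caps is left out because it is a typographic feature rather than a change of characters. The sentence-case helper naively lower-cases everything after the first letter, so it would wrongly lower-case proper nouns.

```python
def sentence_case(text: str) -> str:
    """Upper-case the first letter and lower-case the rest (naive)."""
    return text[:1].upper() + text[1:].lower()


def start_case(text: str) -> str:
    """Capitalise the first letter of every whitespace-separated word."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split())


sample = "the QUICK brown fox"
styles = {
    "Sentence case": sentence_case(sample),
    "Start case": start_case(sample),
    "All caps": sample.upper(),
    "All lowercase": sample.lower(),
}

for name, rendered in styles.items():
    print(f"{name}: {rendered}")
```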

Headings and publication titles

In English-language publications, various conventions are used for the capitalisation of words in publication titles and headlines, including chapter and section headings. The rules differ substantially between individual house styles.
The convention followed by many British publishers and many U.S. newspapers is sentence-style capitalisation in headlines, i.e. capitalisation follows the same rules that apply for sentences. This convention is usually called sentence case. It may also be applied to publication titles, especially in bibliographic references and library catalogues. An example of a global publisher whose English-language house style prescribes sentence-case titles and headings is the International Organization for Standardization.
For publication titles it is, however, a common typographic practice among both British and U.S. publishers to capitalise significant words. This family of typographic conventions is usually called title case. For example, R. M. Ritter's Oxford Manual of Style suggests capitalising "the first word and all nouns, pronouns, adjectives, verbs and adverbs, but generally not articles, conjunctions and short prepositions". This is an old form of emphasis, similar to the more modern practice of using a larger or boldface font for titles. The rules which prescribe which words to capitalise are not based on any grammatically inherent correct–incorrect distinction and are not universally standardised; they differ between style guides, although most style guides tend to follow a few strong conventions, as follows:

Most styles capitalise all words except for short closed-class words, such as articles, conjunctions, and short prepositions; but the first word and last word are also capitalised, regardless of their part of speech. Many styles capitalise longer prepositions such as "between" and "throughout", but not shorter ones such as "for" and "with". Typically, a preposition is considered short if it has up to three or four letters.
A few styles capitalise all words in title case, which has the advantage of being easy to implement and hard to get "wrong". Because of this rule's simplicity, software case-folding routines can handle 95% or more of the editing, especially if they are programmed for desired exceptions.
As for whether hyphenated words are capitalised not only at the beginning but also after the hyphen, there is no universal standard; variation occurs in the wild and among house styles. Traditional copyediting makes a distinction between temporary compounds, in which every part of the hyphenated word is capitalised, and permanent compounds, which are terms that, although compound and hyphenated, are so well established that dictionaries enter them as headwords.
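The most common of the conventions above (capitalise everything except short closed-class words, but always capitalise the first and last word) can be sketched in a few lines. This is an illustration under stated assumptions, not any style guide's actual rule set: the `MINOR_WORDS` list and the `title_case` name are invented here, and hyphenated compounds are treated as single words.

```python
# Hypothetical list of short closed-class words; real style guides differ.
MINOR_WORDS = frozenset({
    "a", "an", "the",                      # articles
    "and", "but", "or", "nor",             # conjunctions
    "at", "by", "for", "in", "of", "on", "to", "with",  # short prepositions
})


def title_case(text: str, minor_words=MINOR_WORDS) -> str:
    """Capitalise all words except minor ones; always capitalise first and last."""
    words = text.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        lower = word.lower()
        if 0 < i < last and lower in minor_words:
            result.append(lower)            # interior minor word stays lower case
        else:
            result.append(lower[:1].upper() + lower[1:])
    return " ".join(result)


print(title_case("the lord of the rings"))
print(title_case("what is it for"))
```

Passing an empty set as `minor_words` gives the simpler "capitalise all words" style that a few guides prefer.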

Title case is widely used in many English-language publications, especially in the United States. However, its conventions are sometimes not followed strictly, especially in informal writing.
In creative typography, such as music record covers and other artistic material, all styles are commonly encountered, including all-lowercase letters and special case styles, such as studly caps. For example, in the wordmarks of video games it is not uncommon to use stylised upper-case letters at the beginning and end of a title, with the intermediate letters in small caps or lower case.

Multi-word proper nouns

Single-word proper nouns are capitalised in formal written English, unless the name is intentionally stylised to break this rule.
Multi-word proper nouns include names of organisations, publications, and people. Often the rules for "title case" are applied to these names, so that non-initial articles, conjunctions, and short prepositions are lowercase, and all other words are uppercase. For example, the short preposition "of" and the article "the" are lowercase in "Steering Committee of the Finance Department". Usually only capitalised words are used to form an acronym variant of the name, though there is some variation in this.
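The usual way of deriving an acronym from such a name, taking only the capitalised words, can be sketched as follows. The `acronym` function is a hypothetical helper written for this illustration, and it ignores the variation in practice noted above.

```python
def acronym(name: str) -> str:
    """Join the initial letters of the capitalised words in a name."""
    return "".join(word[0] for word in name.split() if word[0].isupper())


print(acronym("Steering Committee of the Finance Department"))
```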
With personal names, this practice can vary, but is not limited to English names. Examples include the English names Tamar of Georgia and Catherine the Great, "van" and "der" in Dutch names, "von" and "zu" in German, "de", "los", and "y" in Spanish names, "de" or "d'" in French names, and "ibn" in Arabic names.
Some surname prefixes also affect the capitalisation of the following internal letter or word, for example "Mac" in Celtic names and "Al" in Arabic names.

Unit symbols and prefixes in the metric system

In the International System of Units, a letter usually has different meanings in upper and lower case when used as a unit symbol. Generally, unit symbols are written in lower case, but if the name of the unit is derived from a proper noun, the first letter of the symbol is capitalised. Nevertheless, the name of the unit, if spelled out, is always considered a common noun and written accordingly in lower case. For example:

1 s (one second), when used for the base unit of time.
1 S (one siemens), when used for the unit of electric conductance and admittance.
1 Sv (one sievert), used for the unit of ionising radiation dose.

For the purpose of clarity, the symbol for litre can optionally be written in upper case even though the name is not derived from a proper noun. For example, "one litre" may be written as:

1 l, the original form, for typefaces in which "digit one", "lower-case ell", and "upper-case i" look different.
1 L, an alternative form, for typefaces in which these characters are difficult to distinguish, or the typeface the reader will be using is unknown. A "script l" in various typefaces has traditionally been used in some countries to prevent confusion; however, the separate Unicode character which represents this, U+2113 ℓ SCRIPT SMALL L, is deprecated by the SI. Another solution sometimes seen in Web typography is to use a serif font for "lower-case ell" in otherwise sans-serif material.

The letter case of a prefix symbol is determined independently of the unit symbol to which it is attached. Lower case is used for all submultiple prefix symbols and the small multiple prefix symbols up to "k", whereas upper case is used for larger multipliers:

1 mW, milliwatt, a small measure of power.
1 MW, megawatt, a large measure of power.
1 mS, millisiemens, a small measure of electric conductance.
1 MS, megasiemens, a large measure of electric conductance.
1 mm, millimetre, a small measure of length.
1 Mm, megametre, a large measure of length.
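Because both the prefix and the unit symbol are case-sensitive, software that reads SI symbols cannot normalise case before looking them up. The sketch below illustrates this with a deliberately tiny table containing only the prefixes and units used in the examples above; `PREFIXES`, `UNITS` and `parse_symbol` are names invented for this illustration.

```python
# Minimal, deliberately incomplete tables: keys are case-sensitive.
PREFIXES = {"m": 1e-3, "k": 1e3, "M": 1e6}
UNITS = {"s": "second", "S": "siemens", "W": "watt", "m": "metre"}


def parse_symbol(symbol: str) -> tuple[float, str]:
    """Return (multiplier, unit name) for a symbol such as 'mW' or 'MS'."""
    if symbol in UNITS:                      # bare unit, e.g. "s" or "S"
        return 1.0, UNITS[symbol]
    prefix, unit = symbol[:1], symbol[1:]
    if prefix in PREFIXES and unit in UNITS:
        return PREFIXES[prefix], UNITS[unit]
    raise ValueError(f"unknown symbol: {symbol!r}")


for sym in ("mW", "MW", "mS", "MS", "mm", "Mm"):
    print(sym, parse_symbol(sym))
```

Folding the input to a single case would make "mW" and "MW", or "s" and "S", indistinguishable, which is why the lookup uses the symbol exactly as written.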

Use within programming languages

Some case styles are not used in standard English, but are common in computer programming, product branding, or other specialised fields.
The usage derives from how programming languages are parsed. They generally separate their syntactic tokens by simple whitespace, including space characters, tabs, and newlines. When tokens such as function and variable names start to multiply in complex software development, and there is still a need to keep the source code human-readable, naming conventions make this possible. For example, a function dealing with matrix multiplication might formally be called:

SGEMM(*), with the asterisk standing in for an equally inscrutable list of 13 parameters,
MultiplyMatrixByMatrix(Matrix x, Matrix y), in some hypothetical higher level manifestly typed language, broadly following the syntax of C++ or Java,
multiply-matrix-by-matrix(x, y) in something derived from LISP, or perhaps
(multiply (x y)) in the CLOS, or some newer derivative language supporting type inference and multiple dispatch.

In each case, the capitalisation or lack thereof supports a different function. In the first, FORTRAN compatibility requires case-insensitive naming and short function names. The second supports easily discernible function and argument names and types, within the context of an imperative, strongly typed language. The third supports the macro facilities of LISP, and its tendency to view programs and data minimalistically, and as interchangeable. The fourth idiom needs much less syntactic sugar overall, because much of the semantics is implied; but its brevity, which removes the need for capitalisation or multipart words altogether, might also make the code too abstract and overloaded for the common programmer to understand.
Understandably then, such coding conventions are highly subjective, and can lead to rather opinionated debate, such as in the case of editor wars, or those about indent style. Capitalisation is no exception.

Camel case

Spaces and punctuation are removed and the first letter of each word is capitalised. If this includes the first letter of the first word, the case is sometimes called upper camel case, Pascal case (in reference to the Pascal programming language), or bumpy case.
When the first letter of the first word is lowercase, the case is usually known as lower camel case or dromedary case. This format has become popular in the branding of information technology products and services, with an initial "i" meaning "Internet" or "intelligent", as in iPod, or an initial "e" meaning "electronic", as in email or e-commerce.
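As an illustration, the two variants can be generated from an ordinary phrase as follows. This is a minimal sketch that assumes ASCII letters and digits only; the function names are hypothetical, and each word is fully lower-cased after its first letter, so an acronym such as "HTTP" would become "Http".

```python
import re


def split_words(phrase: str) -> list[str]:
    """Split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", phrase)


def upper_camel_case(phrase: str) -> str:
    """Remove spaces and punctuation; capitalise every word (Pascal case)."""
    return "".join(word.capitalize() for word in split_words(phrase))


def lower_camel_case(phrase: str) -> str:
    """As upper camel case, but with a lower-case first letter."""
    pascal = upper_camel_case(phrase)
    return pascal[:1].lower() + pascal[1:]


print(upper_camel_case("the quick brown fox"))
print(lower_camel_case("the quick brown fox"))
```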

Snake case

Punctuation is removed and spaces are replaced by single underscores. Normally the letters share the same case, but the case can be mixed, as in OCaml variant constructors. The style may also be called pothole case, especially in Python programming, in which this convention is often used for naming variables. Illustratively, it may be rendered snake_case, pothole_case, etc. When all-upper-case, it may be referred to as screaming snake case or hazard case.
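A short sketch of the conversion, again assuming ASCII input and using hypothetical function names, is given here. The second helper shows the reverse direction that is often needed in practice, turning a camel-case identifier into snake case by inserting an underscore before each interior capital.

```python
import re


def to_snake_case(phrase: str, screaming: bool = False) -> str:
    """Drop punctuation, join words with underscores, and unify the case."""
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    joined = "_".join(words)
    return joined.upper() if screaming else joined.lower()


def camel_to_snake(identifier: str) -> str:
    """Insert '_' between a lower-case letter or digit and a capital letter."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", identifier).lower()


print(to_snake_case("The quick brown fox"))
print(to_snake_case("hazard case", screaming=True))
print(camel_to_snake("theQuickBrownFox"))
```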

Kebab case

Similar to snake case, above, except hyphens rather than underscores are used to replace spaces. It is also known as spinal case, param case, Lisp case in reference to the Lisp programming language, or dash case. If every word is capitalised, the style is known as train case.
In CSS, all property names and most keyword values are primarily formatted in kebab case.
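Kebab case and train case differ from snake case only in the separator and, for train case, in the capitalisation of each word. The following sketch uses hypothetical helper names and assumes ASCII input.

```python
import re


def to_kebab_case(phrase: str) -> str:
    """Lower-case every word and join the words with hyphens."""
    return "-".join(w.lower() for w in re.findall(r"[A-Za-z0-9]+", phrase))


def to_train_case(phrase: str) -> str:
    """Capitalise every word and join the words with hyphens."""
    return "-".join(w.capitalize() for w in re.findall(r"[A-Za-z0-9]+", phrase))


print(to_kebab_case("Background Color"))   # same shape as the CSS property name
print(to_train_case("kebab case"))
```

In languages where the hyphen is the subtraction operator, a name in this style cannot normally be used as an identifier, which is one reason the style appears mostly in Lisp dialects, CSS, URLs and command-line options.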

Middot case

Similar to kebab case, above, except it uses interpuncts rather than hyphens to replace spaces. Its use is possible in many programming languages supporting Unicode identifiers, as unlike the hyphen it generally does not conflict with a reserved use for denoting an operator, although exceptions such as Julia exist. Its absence from most standard keyboard layouts certainly contributes to its infrequent use, though some modern input tools allow one to reach it rather easily.

Alternating caps

Alternating caps are an arbitrary mixing of the cases with no semantic or syntactic significance to the use of the capitals. Sometimes only vowels are upper case, at other times upper and lower case are alternated, but often it is simply random. One such usage is for mockery. For example, it is sometimes used to mock the violation of standard English case conventions by marketers in the naming of computer software packages, even when there is no technical requirement to do so – e.g., Sun Microsystems' naming of a windowing system NeWS.

Case folding and case conversion

In the character sets developed for computing, each upper- and lower-case letter is encoded as a separate character. In order to enable case folding and case conversion, the software needs to link together the two characters representing the case variants of a letter.
Case-insensitive operations can be said to fold case, from the idea of folding the character code table so that upper- and lower-case letters coincide. The conversion of letter case in a string is common practice in computer applications, for instance to make case-insensitive comparisons. Many high-level programming languages provide simple methods for case conversion, at least for the ASCII character set.
Whether or not the case variants are treated as equivalent to each other varies depending on the computer system and context. For example, user passwords are generally case sensitive in order to allow more diversity and make them more difficult to break. In contrast, case is often ignored in keyword searches in order to ignore insignificant variations in keyword capitalisation both in queries and queried material.
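The following sketch illustrates both ideas: the "folding" of the ASCII table, where each letter differs from its other-case counterpart by a single bit, and a case-insensitive comparison. It uses Python's `str.casefold()`, which is intended for caseless matching; the helper name `same_ignoring_case` is an assumption of this example.

```python
# In ASCII the two cases of a letter differ only in bit 0x20, so "folding"
# the code table amounts to ignoring (or flipping) that bit.
ascii_case_bit = ord("a") ^ ord("A")
flipped = chr(ord("q") ^ ascii_case_bit)


def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()


# Simple lower-casing is not always enough for caseless matching:
# lower() keeps the sharp s, while casefold() expands it to "ss".
lower_match = "Straße".lower() == "STRASSE".lower()
fold_match = same_ignoring_case("Straße", "STRASSE")

print(hex(ascii_case_bit), flipped, lower_match, fold_match)
```

A password check would compare the strings unchanged, whereas a keyword search would typically fold both the query and the indexed text before comparing.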

Unicode case folding and script identification

Unicode defines case folding through the three case-mapping properties of each character: upper case, lower case, and title case. These properties relate all characters in scripts with differing cases to the other case variants of the character.
As briefly discussed in Unicode Technical Note #26, "In terms of implementation issues, any attempt at a unification of Latin, Greek, and Cyrillic would wreak havoc [and] make casing operations an unholy mess, in effect making all casing operations context sensitive". In other words, while the shapes of letters like A, B, E, H, K, M, O, P, T, X, Y and so on are shared between the Latin, Greek, and Cyrillic alphabets, it would still be problematic for a multilingual character set or a font to provide only a single code point for, say, uppercase letter B, as this would make it quite difficult for a word processor to change that single uppercase letter to one of the three different choices for the lower-case letter, the Latin b, Greek β or Cyrillic в. Therefore, the corresponding Latin, Greek and Cyrillic upper-case letters are also encoded as separate characters, despite their appearance being identical. Without letter case, a "unified European alphabet", such as ABБCГDΔΕЄЗFΦGHIИJ...Z, with an appropriate subset for each language, is feasible; but considering letter case, it becomes very clear that these alphabets are rather distinct sets of symbols.
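The point about the look-alike capital B can be checked directly. The sketch below uses Python's standard `unicodedata` module to show that the three visually similar capitals are distinct code points, each with its own lower-case mapping; the variable names are chosen for this illustration.

```python
import unicodedata

# Three capitals that look alike in most fonts but are encoded separately.
look_alikes = ["\u0042", "\u0392", "\u0412"]   # Latin B, Greek Beta, Cyrillic Ve

names = [unicodedata.name(ch) for ch in look_alikes]
lowered = [ch.lower() for ch in look_alikes]

for ch, name, low in zip(look_alikes, names, lowered):
    print(f"U+{ord(ch):04X} {name} -> {low}")
```

Because the script is carried by the code point itself, lower-casing needs no knowledge of the surrounding language to choose between b, β and в.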

Methods in word processing

Most modern word processors provide automated case conversion with a simple click or keystroke. For example, in Microsoft Office Word, there is a dialog box for toggling the selected text through UPPERCASE, then lowercase, then Title Case. The keystroke Shift+F3 does the same.

Methods in programming

In some forms of BASIC there are two methods for case conversion:

UpperA$ = UCASE$("a")
LowerA$ = LCASE$("A")

C and C++, as well as any C-like language that conforms to its standard library, provide these functions in the file ctype.h:

char upperA = toupper('a');
char lowerA = tolower('A');

Case conversion is different with different character sets. In ASCII or EBCDIC, case can be converted in the following way, in C:

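/* Shift a letter by the fixed distance between the two cases; leave other characters unchanged. */
int toupper(int c) { return islower(c) ? c - 'a' + 'A' : c; }
int tolower(int c) { return isupper(c) ? c - 'A' + 'a' : c; }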

This only works because the letters of upper and lower cases are spaced out equally. In ASCII they are consecutive, whereas with EBCDIC they are not; nonetheless the upper-case letters are arranged in the same pattern and with the same gaps as are the lower-case letters, so the technique still works.
Some computer programming languages offer facilities for converting text to a form in which all words are capitalised. Visual Basic calls this "proper case"; Python calls it "title case". This differs from usual title casing conventions, such as the English convention in which minor words are not capitalised.
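The difference can be seen with Python's built-in `str.title()` and the standard library's `string.capwords()`. Both capitalise every word, including minor ones, and `str.title()` additionally treats any non-letter, such as an apostrophe, as a word boundary. The sample strings are arbitrary.

```python
import string

heading = "the lord of the rings"
contraction = "they're here"

# Every word is capitalised, unlike the usual English title-case convention,
# which would leave "of" and "the" in lower case.
every_word = heading.title()

# str.title() restarts capitalisation after the apostrophe;
# string.capwords() splits on whitespace only.
title_result = contraction.title()
capwords_result = string.capwords(contraction)

print(every_word)
print(title_result)
print(capwords_result)
```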

History

Originally alphabets were written entirely in majuscule letters, spaced between well-defined upper and lower bounds. When written quickly with a pen, these tended to turn into rounder and much simpler forms. It is from these that the first minuscule hands developed.
In Latin, papyri from Herculaneum dating before 79 CE have been found that have been written in old Roman cursive, where the early forms of minuscule letters "d", "h" and "r", for example, can already be recognised. According to papyrologist Knut Kleve, "The theory, then, that the lower-case letters have been developed from the fifth century uncials and the ninth century Carolingian minuscules seems to be wrong." Both majuscule and minuscule letters existed, but the difference between the two variants was initially stylistic rather than orthographic and the writing system was still basically unicameral: a given handwritten document could use either one style or the other but these were not mixed. The cursive minuscule in turn formed the foundations for the Carolingian minuscule script, developed by Alcuin for use in the court of Charlemagne, which quickly spread across Europe. The advantage of the minuscule over majuscule was improved, faster readability.
The timeline of writing in Western Europe can be divided into four eras:

Greek majuscule in contrast to the Greek uncial script and the later Greek minuscule
Roman majuscule in contrast to the Roman uncial, Roman half uncial, and minuscule
Carolingian majuscule in contrast to the Carolingian minuscule
Gothic majuscule, in contrast to the early Gothic, Gothic, and late Gothic minuscules.

Traditionally, certain letters were rendered differently according to a set of rules. Those letters that began sentences or nouns were made larger and often written in a distinct script. There was no fixed capitalisation system until the early 18th century. The English language eventually dropped the rule for nouns, while the German language keeps it.
Similar developments have taken place in other alphabets. The lower-case script for the Greek alphabet has its origins in the 7th century and acquired its quadrilinear form in the 8th century. Over time, uncial letter forms were increasingly mixed into the script. The earliest dated Greek lower-case text is the Uspenski Gospels in the year 835. The modern practice of capitalising the first letter of every sentence seems to be imported.

Type cases

The individual type blocks used in hand typesetting are stored in shallow wooden or metal drawers known as "type cases". Each is subdivided into a number of compartments for the storage of different individual letters.
The Oxford Universal Dictionary on Historical Advanced Proportional Principles indicates that case in this sense was first used in English in 1588. Originally one large case was used for each typeface, then "divided cases", pairs of cases for majuscules and minuscules, were introduced in the region of today's Belgium by 1563, England by 1588, and France before 1723.
The terms upper and lower case originate from this division. By convention, when the two cases were taken out of the storage rack and placed on a rack on the compositor's desk, the case containing the capitals and small capitals stood at a steeper angle at the back of the desk, with the case for the small letters, punctuation, and spaces being more easily reached at a shallower angle below it to the front of the desk, hence upper and lower case.
Though pairs of cases were used in English-speaking countries and many European countries in the seventeenth century, in Germany and Scandinavia the single case continued in use.
Various patterns of cases are available, often with the compartments for lower-case letters varying in size according to the frequency of use of letters, so that the commonest letters are grouped together in larger boxes at the centre of the case. The compositor takes the letter blocks from the compartments and places them in a composing stick, working from left to right and placing the letters upside down with the nick to the top, then sets the assembled type in a galley.