Polish orthography
Polish orthography is the system of writing the Polish language. The language is written using the Polish alphabet, which derives from the Latin alphabet, but includes some additional letters with diacritics. The orthography is mostly phonetic, or rather phonemic—the written letters correspond in a consistent manner to the sounds, or rather the phonemes, of spoken Polish. For detailed information about the system of phonemes, see Polish phonology.
Polish alphabet
The diacritics used in the Polish alphabet are the kreska in the letters ć, ń, ó, ś, ź; the kropka in the letter ż; the stroke in the letter ł; and the ogonek in the letters ą, ę. There are 32 letters in the Polish alphabet: 9 vowels and 23 consonants (26 consonants if q, v, and x are counted). The letters q, v, and x are used in some foreign words and commercial names. In loanwords q and v are often replaced by kw and w, respectively, and x by ks or gz.
When giving the spelling of words, certain letters may be said in more emphatic ways to distinguish them from other identically pronounced characters. For example, H may be referred to as samo h to distinguish it from CH. The letter Ż may be called żet to distinguish it from RZ. The letter U may be called u otwarte or u zwykłe, to distinguish it from Ó, which is sometimes called ó zamknięte, ó kreskowane or ó z kreską, alternatively o kreskowane or o z kreską. The letter ó is a relic from hundreds of years ago when there was a length distinction in Polish similar to that in Czech, with á and é also being common at the time. Subsequently, the length distinction disappeared and á and é were abolished, but ó came to be pronounced the same as u.
Note that Polish letters with diacritics are treated as fully independent letters in alphabetical ordering. For example, być comes after bycie. The diacritic letters also have their own sections in dictionaries. However, there are no regular words that begin with ą or ń.
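The ordering rule above can be illustrated with a small sort key. This is a minimal sketch, not a full collation algorithm: the names `POLISH_ORDER` and `polish_sort_key` are made up for illustration, and placing q, v, and x at their usual Latin positions is an assumption made here so that loanwords still sort sensibly.

```python
# Polish alphabet order, with q, v, x slotted in at their Latin positions
# (an assumption of this sketch; they are not among the 32 Polish letters).
POLISH_ORDER = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {letter: index for index, letter in enumerate(POLISH_ORDER)}


def polish_sort_key(word: str) -> list:
    """Map each character to its rank; unknown characters sort after all letters."""
    return [RANK.get(ch, len(POLISH_ORDER) + ord(ch)) for ch in word.lower()]


words = ["zebra", "łza", "być", "bycie", "lato"]
# Letters with diacritics sort as separate letters right after their base letter.
print(sorted(words, key=polish_sort_key))
# Plain code-point order would instead put "łza" after "zebra".
print(sorted(words))
```

Production software would normally rely on a locale-aware collation library rather than a hand-written table like this one.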
Digraphs
Polish additionally uses the digraphs ch, cz, dz, dź, dż, rz, and sz. Combinations of certain consonants with the letter i before a vowel can be considered digraphs: ci as a positional variant of ć, si as a positional variant of ś, zi as a positional variant of ź, and ni as a positional variant of ń; and there is also one trigraph dzi as a positional variant of dź. These are not given any special treatment in alphabetical ordering. For example, ch is treated simply as c followed by h, and not as a single letter as in Czech or Slovak.
Spelling rules
Graphemes and values
| Grapheme | Usual value | Voiced or devoiced |
| b | /b/ | [p] if devoiced |
| c | /t͡s/ | [d͡z] if voiced |
| ć | /t͡ɕ/ | [d͡ʑ] if voiced |
| cz | /t͡ʂ/ | [d͡ʐ] if voiced |
| d | /d/ | [t] if devoiced |
| dz | /d͡z/ | [t͡s] if devoiced |
| dź | /d͡ʑ/ | [t͡ɕ] if devoiced |
| dż | /d͡ʐ/ | [t͡ʂ] if devoiced |
| f | /f/ | [v] if voiced |
| g | /ɡ/ | [k] if devoiced |
| h | /x/ | [ɣ] if voiced |
| ch | /x/ | [ɣ] if voiced |
| j | /j/ | |
| k | /k/ | [ɡ] if voiced |
| l | /l/ | |
| ł | /w/ | |
| m | /m/ | |
| n | /n/ | |
| ń | /ɲ/ | |
| p | /p/ | [b] if voiced |
| r | /r/ | |
| s | /s/ | [z] if voiced |
| ś | /ɕ/ | [ʑ] if voiced |
| sz | /ʂ/ | [ʐ] if voiced |
| t | /t/ | [d] if voiced |
| w | /v/ | [f] if devoiced |
| z | /z/ | [s] if devoiced |
| ź | /ʑ/ | [ɕ] if devoiced |
| ż | /ʐ/ | [ʂ] if devoiced |
| rz | /ʐ/ | [ʂ] if devoiced |
See Palatal and palatalized consonants below for rules regarding spelling of alveolo-palatal consonants.
H may be glottal in a small number of dialects.
Rarely, rz is not a digraph and represents two separate sounds:
- in various forms of the verb zamarzać – "to freeze"
- in various forms of the verb mierzić – "to disgust"
- in the place name Murzasichle
- in borrowings, for example erzac, Tarzan
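As an illustration of how the digraphs interact with plain letters, the following sketch splits a word into graphemes by greedily matching the seven digraphs ch, cz, dz, dź, dż, rz, and sz. The function name `split_graphemes` is hypothetical, and the positional variants ci, si, zi, ni, and dzi are deliberately left out to keep the example short.

```python
# The seven digraphs; none is a prefix of another, so list order does not matter.
DIGRAPHS = ("ch", "cz", "dz", "dź", "dż", "rz", "sz")


def split_graphemes(word: str) -> list:
    """Split a word into single letters and digraphs, matching digraphs greedily."""
    graphemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2].lower()
        if pair in DIGRAPHS:
            graphemes.append(word[i:i + 2])
            i += 2
        else:
            graphemes.append(word[i])
            i += 1
    return graphemes


print(split_graphemes("chrząszcz"))
print(split_graphemes("dżdżownica"))
```

A purely mechanical splitter like this cannot recognize the rare words in which r and z are separate sounds: it would still report rz as one grapheme in zamarzać or Tarzan, so such words would need an explicit exception list.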
Voicing and devoicing
Palatal and palatalized consonants
The spelling rule for the alveolo-palatal sounds /ɕ/, /ʑ/, /t͡ɕ/, /d͡ʑ/ and /ɲ/ is as follows: before the vowel i the plain letters s, z, c, dz, n are used; before other vowels the combinations si, zi, ci, dzi, ni are used; when not followed by a vowel the diacritic forms ś, ź, ć, dź, ń are used. For example, the s in siwy, the si in siarka and the ś in święty all represent the sound /ɕ/.
| Sound | Word-finally or before a consonant | Before a vowel other than i | Before i |
| /t͡ɕ/ | ć | ci | c |
| /d͡ʑ/ | dź | dzi | dz |
| /ɕ/ | ś | si | s |
| /ʑ/ | ź | zi | z |
| /ɲ/ | ń | ni | n |
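The three-way choice in the table can be written as a small function. This is a sketch only: `spell_soft` and `SOFT_FORMS` are illustrative names, each sound is labelled by its diacritic letter for convenience, and the foreign-word cases discussed in this section are not handled.

```python
VOWELS = set("aąeęioóuy")

# sound label -> (word-final / before a consonant, before a vowel other than i, before i)
SOFT_FORMS = {
    "ć": ("ć", "ci", "c"),
    "dź": ("dź", "dzi", "dz"),
    "ś": ("ś", "si", "s"),
    "ź": ("ź", "zi", "z"),
    "ń": ("ń", "ni", "n"),
}


def spell_soft(sound: str, rest: str) -> str:
    """Spell an alveolo-palatal sound according to what follows it in the word."""
    diacritic, before_vowel, before_i = SOFT_FORMS[sound]
    next_letter = rest[:1].lower()
    if next_letter == "i":
        return before_i + rest
    if next_letter in VOWELS:
        return before_vowel + rest
    return diacritic + rest


print(spell_soft("ś", "iwy"))    # siwy
print(spell_soft("ś", "arka"))   # siarka
print(spell_soft("ś", "więty"))  # święty
```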
Special attention should be paid to n before i plus a vowel. In words of foreign origin the i causes the palatalization of the preceding consonant n to /ɲ/, and it is pronounced as /j/. This situation occurs when the corresponding genitive form ends in -nii, pronounced as /ɲji/, not with -ni, pronounced as /ɲi/. For examples, see the table in the next section.
According to one system, similar principles apply to the palatalized consonants /kʲ/, /ɡʲ/ and /xʲ/, except that these can only occur before vowels. The spellings are thus k, g, (c)h before /i/, and ki, gi, (c)hi otherwise. For example, the k in kim and the ki in kiedy both represent /kʲ/. In the system without the palatalized velars, they are analyzed as /k/, /ɡ/ and /x/ before /i/ and /kj/, /ɡj/ and /xj/ before other vowels.
Other issues with i and j
Except in the cases mentioned in the previous paragraph, the letter i if followed by another vowel in the same word usually represents /j/, but it also has the palatalizing effect on the previous consonant. For example, pies is pronounced [pʲjɛs]. Some words with n before i plus a vowel also follow this pattern.
In fact i is the usual spelling of /j/ between a preceding consonant and a following vowel. The letter j normally appears in this position only after c, s and z, if the palatalization effect described above has to be avoided. The letter j after consonants is also used in concatenation of two words if the second word in the pair starts with j, e.g. wjazd "entrance" originates from w + jazd. The pronunciation of the sequence wja is the same as the pronunciation of wia.
The ending -ii, which appears in the inflected forms of some nouns of foreign origin that have -ia in the nominative case, is pronounced as /ji/, with the palatalization of the preceding consonant. For example, dalii, Bułgarii, chemii, religii, amfibii. The common pronunciation is /i/. This is why children commonly misspell these forms by writing -i, as in armi or Dani instead of armii or Danii, or hypercorrectly write ziemii instead of ziemi.
In some rare cases, however, when the consonant is preceded by another consonant, -ii may be pronounced as /i/, but the preceding consonant is still palatalized, for example, Anglii is pronounced [aŋɡlʲi].
A special situation applies to n: it has the full palatalization to /ɲ/ before -ii, which is pronounced as /ji/; such a situation occurs only when the corresponding nominative form in -nia is pronounced as /ɲja/, not as /ɲa/.
For example:
| Case | Word | Pronunciation | Meaning | Word | Pronunciation | Meaning |
| Nominative | dania | /daɲa/ | dishes | Dania | /daɲja/ | Denmark |
| Genitive | dań | /daɲ/ | of dishes | Danii | /daɲji/ | of Denmark |
| Nominative | Mania | /maɲa/ | Mary | mania | /maɲja/ | mania |
| Genitive | Mani | /maɲi/ | of Mary | manii | /maɲji/ | of mania |
The ending -ji is always pronounced as /ji/. It appears only after c, s and z. Pronunciation of it as a simple /i/ is considered a pronunciation error. For example, presji is /prɛsji/; poezji is /pɔɛzji/; racji is /rat͡sji/.
Nasal vowels
The letters ą and ę, when followed by plosives and affricates, represent an oral vowel followed by a nasal consonant, rather than a nasal vowel. For example, ą in dąb is pronounced /ɔm/, and ę in tęcza is pronounced /ɛn/. When followed by l or ł, and in the case of ę, also at the end of words by most speakers, these letters are pronounced as just /ɔ/ or /ɛ/.
Homophonic spellings
Apart from the cases in the sections above, there are three sounds in Polish that can be spelt in two different ways, depending on the word. Those result from historical sound changes. The correct spelling can often be deduced from the spelling of other morphological forms of the word or cognates in Polish or in other Slavic languages.
- /x/ can be spelt either h or ch.
  - h only occurs in loanwords; however, many of them have been nativized and are not perceived as loanwords. h is used:
    - when cognate words have the letter g, ż or z, e.g.:
      - wahadło – waga
      - druh – drużyna
      - błahy – błazen
    - when the same letter is used in the language from which the word was borrowed, e.g. the Latinized Greek prefixes hekto-, hetero-, homo-, hipo-, hiper-, hydro-, also honor, historia, herbata, etc.
  - ch is used:
    - in all native words, e.g. chyba, chrust, chrapać, chować, chcieć
    - when the same digraph is used in the language from which the word was borrowed, e.g. chór, echo, charakter, chronologia.
- /u/ can be spelt u or ó; the spelling ó indicates that the sound developed from the historical long /oː/.
  - u is used:
    - usually at the beginning of a word
    - always at the end of a word
    - in the endings -uch, -ucha, -uchna, -uchny, -uga, -ula, -ulec, -ulek, -uleńka, -ulka, -ulo, -un, -unek, -uni, -unia, -unio, -ur, -us, -usi, -usieńki, -usia, -uszek, -uszka, -uszko, -uś, -utki
  - ó is used:
    - when cognate words or other morphological forms have the letter o, e or a, e.g.:
      - mróz – mrozu
      - wiózł – wieźć
      - skrócić – skracać
    - in the endings -ów, -ówka, -ówna
- /ʐ/ can be spelt either ż or rz; the spelling rz indicates that the sound developed from /r̝/.
  - ż is used:
    - when cognate words or other morphological forms have the letter/digraph g, dz, h, z, ź, s, e.g.:
      - może – mogę
      - mosiężny – mosiądz
      - drużyna – druh
      - każe – kazać
      - wożę – woźnica
      - bliżej – blisko
    - in the particle że, e.g. skądże, tenże, także
    - after l, ł, r, e.g.:
      - lżej
      - łże
      - rżysko
    - in loanwords, especially from French, e.g.:
      - rewanż
      - żakiet
      - garaż
    - when cognates in other Slavic languages contain the sound /ʐ/ or /ʒ/, e.g. żuraw – Russian журавль
  - rz is used:
    - when cognate words or other morphological forms have the letter r, e.g. morze – morski, karze – kara
    - usually after p, b, t, d, k, g, ch, j, w, e.g.:
      - przygoda
      - brzeg
      - trzy
      - drzewo
      - krzywy
      - grzywa
      - chrzest
      - ujrzeć
      - wrzeć
    - when cognates in other Slavic languages contain the sound /r/ or /rʲ/, e.g. rzeka – Russian река
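The positional part of the ż/rz rule can be expressed as a rough heuristic. The sketch below is an assumption-laden illustration, not a spelling checker: `guess_zh_spelling` is a hypothetical helper that looks only at the letters immediately preceding the sound, and it returns `None` whenever the choice depends on cognates or word origin, which it cannot know.

```python
from typing import Optional

# Letters after which the sound is "usually" spelt rz, and those after which it is spelt ż.
RZ_AFTER = ("ch", "p", "b", "t", "d", "k", "g", "j", "w")
Z_DOT_AFTER = ("l", "ł", "r")


def guess_zh_spelling(preceding: str) -> Optional[str]:
    """Guess ż or rz from the letters before the sound; None if position alone is not decisive."""
    before = preceding.lower()
    if before.endswith(Z_DOT_AFTER):
        return "ż"
    if before.endswith(RZ_AFTER):
        return "rz"
    return None


print(guess_zh_spelling("p"))    # rz, as in przygoda
print(guess_zh_spelling("l"))    # ż, as in lżej
print(guess_zh_spelling("mo"))   # None: może and morze need their cognates (mogę, morski)
```

Because the rule is stated with "usually", any real application would need a word list for exceptions and for the cognate-based cases.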
Other points
There are certain clusters where a written consonant would not normally be pronounced. For example, the ł in the words mógł and jabłko is omitted in ordinary speech.
Capitalization
Names are generally capitalized in Polish as in English. Polish does not capitalize the months and days of the week, nor adjectives and other forms derived from proper nouns. Titles such as pan, pani, lekarz, etc. and their abbreviations are not capitalized, except in written polite address. Second-person pronouns are traditionally capitalized in formal writing; so may be other words used to refer to someone directly in a formal setting, like Czytelnik. Third-person pronouns are capitalized to show reverence, most often in a sacred context.
Names of people and things
The following are capitalized:
- Given names and surnames
- Nicknames and pseudonyms
- Names given to animals and plants
- Names of gods and other mythological beings
- Names of fictional characters
- Personifications of concepts
- Names of religious and secular holidays
- Brand names
- Company names
- Names of institutions, organizations, departments, and governments
- Names of prizes, honours, orders, and other awards
The following are not capitalized:
- Common names of beings or things
- Names of days of the week, months, and seasons
- Items whose names are derived from a brand name
- Items whose names are derived from a proper noun
- Adjectives derived from proper nouns
- Names of rituals, customs, parties, and dances
- Names of historic events
- Names of time periods and eras
Geographic and astronomical terms
- Names of planets, moons, stars, constellations, and other celestial bodies
- Names of continents, oceans, seas, deserts, mountains, islands, etc.
- Names of countries, regions, towns, villages, kingdoms, etc.
- Names of compass points when they are an essential part of the place's name, e.g. Morze Północne
- Wschód and Zachód when referring to the cultural East and West respectively.
Punctuation
Abbreviations are followed by a period when they end with a letter other than the one which ends the full word. For example, dr has no period when it stands for doktor, but takes one when it stands for an inflected form such as doktora, and prof. has a period because it comes from profesor.
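The period rule for abbreviations amounts to comparing final letters. A minimal sketch, with the hypothetical helper name `abbreviate` and the three cases taken from the paragraph above:

```python
def abbreviate(full_form: str, abbreviation: str) -> str:
    """Add a period unless the abbreviation ends with the last letter of the full form."""
    if abbreviation[-1].lower() == full_form[-1].lower():
        return abbreviation
    return abbreviation + "."


print(abbreviate("doktor", "dr"))      # dr
print(abbreviate("doktora", "dr"))     # dr.
print(abbreviate("profesor", "prof"))  # prof.
```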
Apostrophes are used to mark the elision of the final sound of foreign words not pronounced before Polish inflectional endings, as in Harry'ego. However, they are often erroneously used to separate a loanword stem from any inflectional ending, for example, *John'a, which should be Johna.
Quotation marks are used in different ways: either „ordinary Polish quotes” or «French quotes» for first level, and ‚single Polish quotes’ or «French quotes» for second level, which gives three styles of nested quotes:
- „Quote ‚inside’ quote”
- „Quote «inside» quote”
- «Quote ‚inside’ quote»
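The three nesting styles listed above can be captured in a small lookup table. The style names used as dictionary keys (`polish`, `polish-french`, `french-polish`) are invented for this sketch and are not standard terms.

```python
# style name -> ((first-level open, close), (second-level open, close))
QUOTE_STYLES = {
    "polish": (("„", "”"), ("‚", "’")),
    "polish-french": (("„", "”"), ("«", "»")),
    "french-polish": (("«", "»"), ("‚", "’")),
}


def quote(text: str, style: str = "polish", level: int = 1) -> str:
    """Wrap text in the quotation marks of the given style and nesting level (1 or 2)."""
    opening, closing = QUOTE_STYLES[style][level - 1]
    return f"{opening}{text}{closing}"


for style in QUOTE_STYLES:
    inner = quote("inside", style, level=2)
    print(quote(f"Quote {inner} quote", style, level=1))
```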
History
Poles adopted the Latin alphabet in the 12th century. However, that alphabet was ill-equipped to represent certain Polish sounds, such as the palatal consonants and nasal vowels. Consequently, Polish spelling in the Middle Ages was highly inconsistent, as different writers used different systems to represent these sounds. For example, in early documents the letter c could signify the sounds now written c, cz, k, while the letter z was used for the sounds now written z, ż, ś, ź. Writers soon began to experiment with digraphs, new letters, and eventually diacritics.
The Polish alphabet was one of two major forms of Latin-based orthography developed for Slavic languages, the other being Czech orthography, characterized by carons, as in the letter č. The other major Slavic languages which are now written in Latin-based alphabets use systems similar to the Czech. Sorbian spelling is also closer to Czech, though it does include more Polish elements than the aforementioned languages. Polish-based orthographies are used for Kashubian and usually Silesian, both spoken in Poland.
The letter ƶ is a historical allograph for ż.
Computer encoding
There are several different systems for encoding the Polish alphabet for computers. All letters of the Polish alphabet are included in Unicode, and thus Unicode-based encodings such as UTF-8 and UTF-16 can be used. The Polish alphabet is completely included in the Basic Multilingual Plane of Unicode. ISO 8859-2, ISO 8859-13, ISO 8859-16 and Windows-1250 are popular 8-bit encodings that support the Polish alphabet.
The Polish letters which are not present in the English alphabet use the following HTML character entities and Unicode codepoints:
| Upper case | Ą | Ć | Ę | Ł | Ń | Ó | Ś | Ź | Ż | Ƶ |
| HTML entity | &#260; &Aogon; | &#262; &Cacute; | &#280; &Eogon; | &#321; &Lstrok; | &#323; &Nacute; | &#211; &Oacute; | &#346; &Sacute; | &#377; &Zacute; | &#379; &Zdot; | &#437; |
| Unicode | U+0104 | U+0106 | U+0118 | U+0141 | U+0143 | U+00D3 | U+015A | U+0179 | U+017B | U+01B5 |
| Result | Ą | Ć | Ę | Ł | Ń | Ó | Ś | Ź | Ż | Ƶ |
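The codepoints and numeric entities in this table can be derived directly from the characters. The sketch below uses Python's standard `html` module; the helper name `describe` is an illustration only.

```python
import html


def describe(letter: str) -> tuple:
    """Return the Unicode codepoint label and the numeric HTML entity for one character."""
    codepoint = ord(letter)
    return f"U+{codepoint:04X}", f"&#{codepoint};"


for letter in "ĄĆĘŁŃÓŚŹŻ":
    label, entity = describe(letter)
    # html.unescape turns the entity back into the character, closing the loop.
    print(letter, label, entity, html.unescape(entity))

# Named entities from the HTML5 entity list decode the same way.
print(html.unescape("&Aogon; &Lstrok; &Zdot;"))
```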
For other encodings, see the following table. Numbers in the table are hexadecimal.
| Character Set | Ą | Ć | Ę | Ł | Ń | Ó | Ś | Ź | Ż | ą | ć | ę | ł | ń | ó | ś | ź | ż |
| ISO 8859-2 | A1 | C6 | CA | A3 | D1 | D3 | A6 | AC | AF | B1 | E6 | EA | B3 | F1 | F3 | B6 | BC | BF |
| Windows-1250 | A5 | C6 | CA | A3 | D1 | D3 | 8C | 8F | AF | B9 | E6 | EA | B3 | F1 | F3 | 9C | 9F | BF |
| IBM 852 | A4 | 8F | A8 | 9D | E3 | E0 | 97 | 8D | BD | A5 | 86 | A9 | 88 | E4 | A2 | 98 | AB | BE |
| Mazovia | 8F | 95 | 90 | 9C | A5 | A3 | 98 | A0 | A1 | 86 | 8D | 91 | 92 | A4 | A2 | 9E | A6 | A7 |
| Mac | 84 | 8C | A2 | FC | C1 | EE | E5 | 8F | FB | 88 | 8D | AB | B8 | C4 | 97 | E6 | 90 | FD |
| ISO 8859-13 and Windows-1257 | C0 | C3 | C6 | D9 | D1 | D3 | DA | CA | DD | E0 | E3 | E6 | F9 | F1 | F3 | FA | EA | FD |
| ISO 8859-16 | A1 | C5 | DD | A3 | D1 | D3 | D7 | AC | AF | A2 | E5 | FD | B3 | F1 | F3 | F7 | AE | BF |
| PN-T-42109-02:1984 "ZU0" | — | — | — | 5C | — | — | — | — | — | 60 | 7E | 40 | 7C | 5D | 7B | 5E | 5B | 7D |
| PN-T-42109-03:1986 "ZU2" | 3B | 24 | 23 | 5C | 3C | 27 | 3E | 2A | 26 | 60 | 7E | 40 | 7C | 5D | 7B | 5E | 5B | 7D |
| PN-I-10050:2002 | 5A | 43 08 27 | 5C | 5D | 4E 08 27 | 4F 08 27 | 53 08 27 | 5A 08 27 | 5E | 7B | 63 08 27 | 7C | 7D | 6E 08 27 | 6F 08 27 | 73 08 27 | 7A 08 27 | 7E |
| IBM 775 | B5 | 80 | B7 | AD | E0 | E3 | 97 | 8D | A3 | D0 | 87 | D3 | 88 | E7 | A2 | 98 | A5 | A4 |
| CSK | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 88 | 87 | A0 | A1 | A2 | A3 | A4 | A5 | A6 | A8 | A7 |
| Cyfromat | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 88 | 87 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 98 | 97 |
| DHN | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 88 | 87 | 89 | 8A | 8B | 8C | 8D | 8E | 8F | 91 | 90 |
| IINTE-ISIS | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 |
| IEA-Swierk | 8F | 80 | 90 | 9C | A5 | 99 | EB | 9D | 92 | A0 | 9B | 82 | 9F | A4 | A2 | 87 | A8 | 91 |
| Logic | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 8A | 8B | 8C | 8D | 8E | 8F | 90 | 91 |
| Microvex | 8F | 80 | 90 | 9C | A5 | 93 | 98 | 9D | 92 | A0 | 9B | 82 | 9F | A4 | A2 | 87 | A8 | 91 |
| Ventura | 97 | 99 | A5 | A6 | 92 | 8F | 8E | 90 | 80 | 96 | 94 | A4 | A7 | 91 | A2 | 84 | 82 | 87 |
| ELWRO-Junior | C1 | C3 | C5 | CC | CE | CF | D3 | DA | D9 | E1 | E3 | E5 | EC | EE | EF | F3 | FA | F9 |
| AmigaPL | C2 | CA | CB | CE | CF | D3 | D4 | DA | DB | E2 | EA | EB | EE | EF | F3 | F4 | FA | FB |
| TeXPL | 81 | 82 | 86 | 8A | 8B | D3 | 91 | 99 | 9B | A1 | A2 | A6 | AA | AB | F3 | B1 | B9 | BB |
| Atari Club | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 |
| CorelDraw! | C5 | F2 | C9 | A3 | D1 | D3 | FF | E1 | ED | E5 | EC | E6 | C6 | F1 | F3 | A5 | AA | BA |
| ATM | C4 | C7 | CB | D0 | D1 | D3 | D6 | DA | DC | E4 | E7 | EB | F0 | F1 | F3 | F6 | FA | FC |
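Several rows of this table can be cross-checked against the codecs bundled with Python. This is a sketch under the assumption that the Python codec names below correspond to the named character sets (`iso8859_2`, `cp1250`, `cp852`, `mac_latin2`, `iso8859_13`, `iso8859_16`, `cp775`); the historical Polish encodings such as Mazovia have no standard-library codec and are not covered.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

# Python codec name -> label used in the table (the mapping is an assumption of this sketch)
CODECS = {
    "iso8859_2": "ISO 8859-2",
    "cp1250": "Windows-1250",
    "cp852": "IBM 852",
    "mac_latin2": "Mac",
    "iso8859_13": "ISO 8859-13",
    "iso8859_16": "ISO 8859-16",
    "cp775": "IBM 775",
}


def byte_for(letter: str, codec: str) -> str:
    """Return the single byte for a letter in an 8-bit codec, as two uppercase hex digits."""
    return letter.encode(codec).hex().upper()


for codec, label in CODECS.items():
    row = " | ".join(byte_for(letter, codec) for letter in POLISH_LETTERS)
    print(f"| {label} | {row} |")
```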
A common test sentence containing all the Polish diacritic letters is the nonsensical "Zażółć gęślą jaźń".
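Whether a test sentence really covers every diacritic letter is easy to verify. A minimal sketch, with the hypothetical helper `covers_all_diacritics`; the text is normalized to NFC first so that letters typed as a base letter plus a combining mark are still recognized.

```python
import unicodedata

DIACRITIC_LETTERS = set("ąćęłńóśźż")


def covers_all_diacritics(text: str) -> bool:
    """True if the text contains each of the nine Polish letters with diacritics."""
    normalized = unicodedata.normalize("NFC", text).lower()
    return DIACRITIC_LETTERS <= set(normalized)


print(covers_all_diacritics("Zażółć gęślą jaźń"))
print(covers_all_diacritics("Zażółć gęślą"))
```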