Subtitles


Subtitles are texts representing the contents of the audio in a film, television show, opera, or other audiovisual media. Subtitles might provide a transcription or translation of spoken dialogue. Although naming conventions can vary, captions are subtitles that include written descriptions of other elements of the audio, like music or sound effects. Captions are thus especially helpful to deaf or hard-of-hearing people. Subtitles may also add information that is not present in the audio. Localized subtitles provide cultural context to viewers. For example, a subtitle could be used to explain to an audience unfamiliar with sake that it is a type of Japanese wine. Lastly, subtitles are sometimes used for humor, as in Annie Hall, where subtitles show the characters' inner thoughts, which contradict what they are saying in the audio.
Creating, delivering, and displaying subtitles is a complicated, multi-step endeavor. First, the text of the subtitles needs to be written. When there is plenty of time to prepare, this process can be done by hand. However, for media produced in real time, like live television, it may be done by stenographers or using automated speech recognition. Subtitles written by fans, rather than more official sources, are referred to as fansubs. Regardless of who does the writing, the subtitles must include information on when each line of text should be displayed.
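To make the timing requirement concrete, the widely used SubRip (.srt) format pairs each block of text with start and end timestamps. The following Python sketch, using an invented sample file, shows how such timing information can be read; it is an illustration, not a reference implementation of any particular tool:

```python
import re

# A minimal SubRip (.srt) file: each cue has an index, a
# "start --> end" timestamp line, then one or more lines of text.
SRT_SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,500 --> 00:00:07,250
This line appears later.
"""

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def parse_timestamp(ts: str) -> float:
    """Convert an SRT timestamp like '00:00:01,000' to seconds."""
    h, m, s, ms = map(int, TIME_RE.match(ts).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(text: str):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        start, end = (parse_timestamp(t.strip()) for t in lines[1].split("-->"))
        cues.append((start, end, "\n".join(lines[2:])))
    return cues

for start, end, line in parse_srt(SRT_SAMPLE):
    print(f"{start:.3f}s - {end:.3f}s: {line}")
```

A player uses these start and end times to decide when each line appears and disappears on screen.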
Second, subtitles need to be distributed to the audience. Open subtitles are added directly to recorded video frames and thus cannot be removed once added. On the other hand, closed subtitles are stored separately, allowing subtitles in different languages to be used without changing the video itself. In either case, a wide variety of technical approaches and formats are used to encode the subtitles.
Third, subtitles need to be displayed to the audience. Open subtitles are always shown whenever the video is played because they are part of it. However, displaying closed subtitles is optional since they are overlaid onto the video by whatever is playing it. For example, media player software might be used to combine closed subtitles with the video itself. In some theaters or venues, a dedicated screen or screens are used to display subtitles. If that dedicated screen is above rather than below the main display area, the subtitles are called surtitles.

Methods

Sometimes, mainly at film festivals, subtitles may be shown on a separate display below the screen, thus saving the filmmaker from creating a subtitled copy for just one showing.

Creation, delivery, and display of subtitles

Professional subtitlers usually work with specialized computer software and hardware where the video is stored digitally, making each frame instantly accessible. Besides creating the subtitles, the subtitler usually tells the computer software the timing and duration of each subtitle. These markers are usually based on timecode if it is a work for electronic media or on film length. For cinema exhibition, this task is undertaken by a specialist or team of specialists.
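As a rough illustration of timecode-based markers, a non-drop-frame SMPTE timecode of the form HH:MM:SS:FF can be converted to an absolute frame number once the frame rate is known. The frame rate and timecodes below are assumed for illustration:

```python
def timecode_to_frame(tc: str, fps: int = 25) -> int:
    """Convert a non-drop-frame SMPTE timecode 'HH:MM:SS:FF'
    to an absolute frame number at the given frame rate."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return (h * 3600 + m * 60 + s) * fps + f

# Cue a subtitle in and out, measured in frames at 25 fps (PAL).
in_frame = timecode_to_frame("00:01:30:12")
out_frame = timecode_to_frame("00:01:33:00")
print(in_frame, out_frame)
```

Drop-frame NTSC timecode is more involved than this; the sketch deliberately ignores it.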
The finished subtitle file is used to add the subtitles to the picture, either:
  • directly into the picture;
  • embedded in the vertical interval and later superimposed on the picture by the end user with the help of an external decoder or a decoder built into the TV;
  • or converted to TIFF or BMP graphics that are later superimposed on the picture by the end user's equipment.
Subtitles can also be created by individuals using freely available subtitle-creation software such as Subtitle Workshop, MovieCaptioner, and Subtitle Composer, and then hardcoded onto a video file with programs such as VirtualDub in combination with VSFilter.
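As a command-line alternative to the GUI workflow just described, a tool such as ffmpeg can burn ("hardcode") subtitles into the video frames with its subtitles filter. The sketch below only builds the command; the file names are placeholders, and actually running it assumes ffmpeg is installed:

```python
import subprocess

def hardcode_command(video_in: str, srt_file: str, video_out: str) -> list:
    """Build an ffmpeg command that burns subtitles into the video frames."""
    return ["ffmpeg", "-i", video_in, "-vf", f"subtitles={srt_file}", video_out]

cmd = hardcode_command("movie.mp4", "movie.srt", "movie_subtitled.mp4")
print(" ".join(cmd))
# To actually run it (requires ffmpeg on the system):
# subprocess.run(cmd, check=True)
```

Because the text is rendered into the frames themselves, the result is an open-subtitle copy that cannot be switched off.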
Some programs and online software allow automatic captions, constructed mainly by way of speech-to-text technology. For example, on YouTube, automatic captions are available in a variety of languages.
Automatic captions are generally less accurate than human-typed captions, as they regularly fail to distinguish between homophones (similar-sounding words such as "to", "two", and "too"). This can be particularly disruptive to the ready understanding of educational material, such as lecture recordings, which often include uncommon vocabulary and proper nouns. The problem can be compounded if audio quality is poor, if the speaker is indistinct, or if multiple speakers overlap. Disability-rights groups have emphasised the need for automatic captions to be human-reviewed prior to publication, particularly in cases where students' grades may be adversely affected by inadequate captioning.
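The accuracy gap described above is commonly quantified as word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the reference length. A minimal sketch, using an invented homophone-heavy example:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

reference  = "we walked to the two towers too"
hypothesis = "we walked to the too towers to"  # homophones substituted
print(word_error_rate(reference, hypothesis))
```

Here two of the seven reference words are wrong, so the WER is 2/7, even though a listener would hear no difference at all.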

Same-language captions

Same-language captions, i.e., those which do not provide a translation, are primarily intended as an aid for people who are deaf or hard-of-hearing.

Closed captions (CC)

Closed captioning is the American term for closed subtitles specifically intended for people who are deaf or hard-of-hearing. These are a transcription rather than a translation, and usually also contain lyrics and descriptions of important non-dialogue audio, such as sound effects. From the expression "closed captions", the word "caption" has in recent years come to mean a subtitle intended for the deaf or hard-of-hearing, be it "open" or "closed". In British English, "subtitles" usually refers to subtitles for the deaf or hard-of-hearing; however, the term "SDH" is sometimes used when there is a need to make a distinction between the two.

Real time

Programs such as news bulletins, current-affairs programs, sports, some talk shows, and political and special events utilize real-time or online captioning. Live captioning is increasingly common, especially in the United Kingdom and the United States, as a result of regulations stipulating that virtually all TV must eventually be accessible to people who are deaf or hard-of-hearing. In practice, however, these "real-time" subtitles typically lag the audio by several seconds due to the inherent delay in transcribing, encoding, and transmitting the subtitles. Real-time subtitles are also prone to typographic errors or mishearing of the spoken words, with no time available to correct them before transmission.
Pre-prepared
Some programs may be prepared in their entirety several hours before broadcast, but with insufficient time to prepare a timecoded caption file for automatic play-out. Pre-prepared captions look similar to offline captions, although the accuracy of cueing may be compromised slightly as the captions are not locked to program timecode.
Newsroom captioning involves the automatic transfer of text from the newsroom computer system to a device which outputs it as captions. It does work, but its suitability as an exclusive system would only apply to programs which had been scripted in their entirety on the newsroom computer system, such as short interstitial updates.
In the United States and Canada, some broadcasters have used it exclusively and simply left uncaptioned sections of the bulletin for which a script was unavailable. Newsroom captioning limits captions to pre-scripted materials and, therefore, does not cover all of the news, weather, and sports segments of a typical local news broadcast, which are typically not pre-scripted. This includes last-second breaking news, changes to the scripts, ad-lib conversations of the broadcasters, and emergency or other live remote broadcasts by reporters in the field. By failing to cover items such as these, newsroom-style captioning typically results in coverage of less than 30% of a local news broadcast.
Live
Communication access real-time translation (CART) stenographers, who use a computer with either stenotype or Velotype keyboards to transcribe stenographic input for presentation as captions within two or three seconds of the corresponding audio, must caption anything which is purely live and unscripted; however, more recent developments include operators using speech recognition software and re-voicing the dialogue. Speech recognition technology has advanced so quickly in the United States that about half of all live captioning was done through speech recognition as of 2005. Real-time captions look different from offline captions, as they are presented as a continuous flow of text as people speak.
Stenography is a system of rendering words phonetically, and English, with its multitude of homophones, is particularly unsuited to easy transcriptions. Stenographers working in courts and inquiries usually have 24 hours in which to deliver their transcripts. Consequently, they may enter the same phonetic stenographic codes for a variety of homophones, and fix up the spelling later. Real-time stenographers must deliver their transcriptions accurately and immediately. They must therefore develop techniques for keying homophones differently, and be unswayed by the pressures of delivering accurate product on immediate demand.
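The homophone-keying technique can be pictured as a dictionary mapping distinct stenographic chords to words that sound alike. The chord strings below are hypothetical, included purely for illustration:

```python
# Hypothetical excerpt of a real-time steno dictionary: each homophone
# gets its own chord, so no spelling fix-up is needed after the fact.
steno_dict = {
    "TO": "to",
    "TWO": "two",
    "TAO": "too",
}

def translate(chords):
    """Map a sequence of chords to words; unknown chords become '[?]'."""
    return " ".join(steno_dict.get(chord, "[?]") for chord in chords)

print(translate(["TWO", "TO", "TAO"]))  # two to too
```

Because the spelling decision is made at the keyboard, the output needs no later correction pass, which is exactly what real-time delivery demands.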
Submissions to recent captioning-related inquiries have revealed concerns from broadcasters about captioning sports. In the absence of sports captioning, the Australian Caption Centre submitted to the National Working Party on Captioning (NWPC), in November 1998, three examples of sports captioning, each performed on tennis, rugby league, and swimming programs:
  • Heavily reduced: Captioners ignore commentary and provide only scores and essential information such as "try" or "out".
  • Significantly reduced: Captioners use QWERTY input to type summary captions yielding the essence of what the commentators are saying, delayed due to the limitations of QWERTY input.
  • Comprehensive realtime: Captioners use stenography to caption the commentary in its entirety.
The NWPC concluded that the standard they accept is the comprehensive real-time method, which gives them access to the commentary in its entirety. Also, not all sports are live. Many events are pre-recorded hours before they are broadcast, allowing a captioner to caption them using offline methods.