Three-two pull down

Three-two pull down is a term used in filmmaking and television production for the post-production process of transferring film to video.
It converts 24 frames per second into 29.97 frames per second: roughly every four film frames become five video frames, combined with a slight slowdown in speed. Film runs at a standard rate of 24 frames per second, whereas NTSC video has a frame rate of 29.97 frames per second. Every interlaced video frame consists of two fields. In three-two pull down, the telecine repeats one field of every second film frame, so that film frame occupies three video fields instead of two; the untrained eye cannot see the addition of this extra field. In the figure, the film frames A–D are the true or original images, since each was photographed as a complete frame. The A, B, and D frames on the right in the NTSC footage are original frames. The third and fourth frames have been created by combining fields from different film frames.

Video

2:3

In the United States and other countries where television uses the 59.94 Hz vertical scanning frequency, video is broadcast at 29.97 frame/s. For the film's motion to be accurately rendered on the video signal, a telecine must use a technique called the 2:3 pull down to convert from 24 to 29.97 frame/s.
The term "pulldown" comes from the mechanical process of "pulling" the film downward within the transport mechanism to advance it from one frame to the next at a repetitive rate. The conversion from 24 to 29.97 frame/s is accomplished in two steps.
The first step is to slow the film motion down by a factor of 1000/1001 (about 0.1%), from 24 to 23.976 frame/s. This difference in speed is imperceptible to the viewer. For a two-hour film, play time is extended by 7.2 seconds.
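The arithmetic of this first step can be checked with a short Python sketch. It assumes the slowdown factor is exactly 1000/1001, the ratio that the rounded figure 23.976 stands for; the function names are illustrative and not part of any telecine tool.

```python
from fractions import Fraction

# Assumed exact slowdown factor: film speed is multiplied by 1000/1001 (~0.1% slower).
SLOWDOWN = Fraction(1000, 1001)

def video_film_rate(film_fps: int = 24) -> Fraction:
    """Film rate after the first pulldown step (24 -> ~23.976 frame/s)."""
    return film_fps * SLOWDOWN

def runtime_extension(seconds: float) -> float:
    """Extra play time, in seconds, when the film runs at 1000/1001 of its speed."""
    # Each frame now lasts 1001/1000 as long, so the run time grows by 1/1000.
    return float(seconds * (1 / SLOWDOWN - 1))

print(round(float(video_film_rate()), 3))   # 23.976
print(runtime_extension(2 * 60 * 60))       # 7.2 (two-hour film)
```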
The second step is distributing cinema frames into video fields. At 23.976 frame/s, there are four frames of film for every five frames of 29.97 Hz video:

$$\frac{23.976}{29.97} = \frac{4}{5} = 0.8$$
These four frames need to be "stretched" into five frames by exploiting the interlaced nature of video. Since an interlaced video frame is made up of two fields, each carrying half of the scan lines, the four film frames must be spread across ten fields.
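A quick Python check of these ratios shows why the cadence must average 2.5 fields per film frame. It assumes the exact rates are 24000/1001 and 30000/1001 frame/s, which is consistent with the 1000/1001 slowdown; the constant names are illustrative.

```python
from fractions import Fraction

FILM_FPS = Fraction(24000, 1001)    # ~23.976 frame/s after the slowdown (assumed exact)
VIDEO_FPS = Fraction(30000, 1001)   # ~29.97 frame/s NTSC video (assumed exact)
FIELDS_PER_VIDEO_FRAME = 2          # interlaced: two fields per frame

# Ratio of film frames to video frames: exactly 4 to 5.
ratio = FILM_FPS / VIDEO_FPS
print(ratio)  # 4/5

# Five video frames hold ten fields, so four film frames must fill ten fields,
# i.e. 2.5 fields per film frame on average (alternating 2 and 3).
fields_per_cycle = 5 * FIELDS_PER_VIDEO_FRAME
print(Fraction(fields_per_cycle, 4))  # 5/2

# Sanity check: 23.976 frame/s x 2.5 fields per frame = 59.94 fields/s.
print(round(float(FILM_FPS * Fraction(5, 2)), 2))  # 59.94
```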
The term "2:3" comes from the pattern for producing fields in the new video frames. The pattern of 2-3 is an abbreviation of the actual pattern of 2-3-2-3, which indicates that the first film frame is used in 2 fields, the second film frame is used in 3 fields, the third film frame is used in 2 fields, and the fourth film frame is used in 3 fields, producing a total of 10 fields, or 5 video frames. If the four film frames are called A, B, C and D, the five video frames produced are A1-A2, B1-B2, B1-C2, C1-D2 and D1-D2. That is, frame A is used 2 times, frame B 3 times, frame C 2 times, and frame D 3 times. The 2-3-2-3 cycle repeats itself completely after four film frames have been exposed.
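The 2-3-2-3 cadence described above can be reproduced with a small Python sketch. It assumes, as the A1/A2 labels imply, that field parity simply alternates 1, 2, 1, 2 along the field sequence; `pulldown_fields` and `to_video_frames` are hypothetical helper names, not a telecine API.

```python
from typing import List, Sequence, Tuple

Field = Tuple[str, int]  # (film frame label, field parity 1 or 2)

def pulldown_fields(frames: Sequence[str], pattern: Sequence[int]) -> List[Field]:
    """Repeat each film frame for the number of fields given by `pattern`.

    Field parity alternates 1, 2, 1, 2, ... across the whole output,
    as it does in an interlaced signal.
    """
    fields: List[Field] = []
    for frame, count in zip(frames, pattern):
        for _ in range(count):
            parity = 1 if len(fields) % 2 == 0 else 2
            fields.append((frame, parity))
    return fields

def to_video_frames(fields: List[Field]) -> List[str]:
    """Pair consecutive fields into video frames, labelled like 'B1-C2'."""
    return [
        f"{a}{pa}-{b}{pb}"
        for (a, pa), (b, pb) in zip(fields[0::2], fields[1::2])
    ]

fields = pulldown_fields("ABCD", [2, 3, 2, 3])
print(to_video_frames(fields))
# ['A1-A2', 'B1-B2', 'B1-C2', 'C1-D2', 'D1-D2']
```

Ten fields go in and five video frames come out; frames 3 and 4 are the ones that mix two film frames.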

3:2

The alternative pattern of "3:2" is an abbreviation of the actual pattern of 3-2-3-2, which indicates that the first film frame is used in 3 fields, the second film frame is used in 2 fields, the third film frame is used in 3 fields, and the fourth film frame is used in 2 fields, producing a total of 10 fields, or 5 video frames.
If the four film frames are called A, B, C and D, the five video frames produced are A1-A2, A1-B2, B1-C2, C1-C2 and D1-D2. That is, frame A is used 3 times, frame B 2 times, frame C 3 times, and frame D 2 times. The 3-2-3-2 cycle repeats itself completely after four film frames have been exposed.
In practice, there is no difference between the 2-3 and 3-2 patterns. A cycle that starts with film frame B yields a 3:2 pattern: B1-B2-B1-C2-C1-D2-D1-D2-A1-A2. In fact, the "3-2" notation is misleading because, according to SMPTE standards, the first frame of every four-frame film sequence is scanned twice, not three times.
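The equivalence of the two patterns can be demonstrated with a self-contained Python sketch. `cadence_fields` and `field_counts` are hypothetical helpers, and field parity is again assumed to alternate 1, 2, 1, 2 along the field sequence. The sketch builds the 3-2-3-2 cadence directly, then takes the 2-3-2-3 field stream and starts it two fields later, at film frame B.

```python
from collections import Counter
from typing import List, Tuple

def cadence_fields(frames: str, pattern: List[int]) -> List[Tuple[str, int]]:
    """Expand film frames into fields; parity alternates 1, 2, 1, 2, ..."""
    labels = [f for f, n in zip(frames, pattern) for _ in range(n)]
    return [(f, 1 if i % 2 == 0 else 2) for i, f in enumerate(labels)]

def field_counts(fields: List[Tuple[str, int]]) -> List[int]:
    """Number of fields per film frame, in order of first appearance."""
    counts = Counter(frame for frame, _ in fields)
    order = dict.fromkeys(frame for frame, _ in fields)  # keeps insertion order
    return [counts[f] for f in order]

# Direct 3-2-3-2 cadence, paired into video frames.
three_two = cadence_fields("ABCD", [3, 2, 3, 2])
pairs = [f"{a}{pa}-{b}{pb}" for (a, pa), (b, pb) in zip(three_two[0::2], three_two[1::2])]
print(pairs)  # ['A1-A2', 'A1-B2', 'B1-C2', 'C1-C2', 'D1-D2']

# The 2-3-2-3 field stream entered at film frame B (two fields in).
two_three = cadence_fields("ABCD", [2, 3, 2, 3])
rotated = two_three[2:] + two_three[:2]
print([f"{f}{p}" for f, p in rotated])
# ['B1', 'B2', 'B1', 'C2', 'C1', 'D2', 'D1', 'D2', 'A1', 'A2']
print(field_counts(rotated))  # [3, 2, 3, 2]
```

Shifting by an even number of fields keeps the parity alternation intact, which is why the rotated stream is still a valid interlaced sequence.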

Modern alternatives

The above method is a "classic" 2:3, which was used before frame buffers allowed for holding more than one frame. It has the disadvantage of creating two dirty frames (video frames whose two fields come from different film frames) in every five video frames.
The preferred method for doing a 2:3 is a 3-3-2-2 pattern that creates only one dirty frame in every five. It produces A1-A2, A1-B2, B1-B2, C1-C2 and D1-D2, where only the second frame is dirty.
While this method has slightly more judder, it allows for easier upconversion and better overall compression when encoding. Note that interlaced displays such as CRTs show only individual fields, never whole frames, so no dirty frames are visible on them. Dirty frames may appear in other methods of displaying interlaced video.
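The dirty-frame counts of the classic and 3-3-2-2 cadences can be compared with a small Python sketch. It treats a video frame as dirty when its two fields come from different film frames, assumes single-letter frame labels and alternating field parity, and uses hypothetical helper names.

```python
from typing import List, Tuple

def video_frames(frames: str, pattern: List[int]) -> List[Tuple[str, str]]:
    """Expand film frames into fields, then pair the fields into video frames.

    Returns one (first field, second field) pair per video frame, each field
    labelled with its film frame and parity, e.g. ('A1', 'B2').
    """
    labels = [f for f, n in zip(frames, pattern) for _ in range(n)]
    fields = [f"{f}{1 if i % 2 == 0 else 2}" for i, f in enumerate(labels)]
    return list(zip(fields[0::2], fields[1::2]))

def dirty_positions(pairs: List[Tuple[str, str]]) -> List[int]:
    """1-based positions of video frames whose fields come from different film frames."""
    # Compares the film-frame letter (first character) of each field label.
    return [i for i, (a, b) in enumerate(pairs, start=1) if a[0] != b[0]]

for name, pattern in [("classic 2-3-2-3", [2, 3, 2, 3]), ("3-3-2-2", [3, 3, 2, 2])]:
    pairs = video_frames("ABCD", pattern)
    print(name, ["-".join(p) for p in pairs], "dirty:", dirty_positions(pairs))
# classic 2-3-2-3 ['A1-A2', 'B1-B2', 'B1-C2', 'C1-D2', 'D1-D2'] dirty: [3, 4]
# 3-3-2-2 ['A1-A2', 'A1-B2', 'B1-B2', 'C1-C2', 'D1-D2'] dirty: [2]
```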

Audio

The rate of NTSC video is 29.97 frames per second, or one-thousandth slower than 30 frame/s, because the NTSC color encoding process required the line rate to be tied to the 3.579545 MHz color "burst" frequency (the subcarrier is exactly 455/2 times the line rate). This gives a line rate of approximately 15734.27 Hz rather than the AC "line locked" line rate of 15750 Hz. This was done to maintain compatibility with black-and-white televisions.
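The chain from color subcarrier to line rate to frame rate can be worked through exactly with Python fractions. The sketch assumes the subcarrier is exactly 315/88 MHz (about 3.579545 MHz) and that a frame has 525 lines; with those values the line rate comes out at roughly 15734.27 Hz and the frame rate at exactly 30000/1001 (about 29.97) frame/s.

```python
from fractions import Fraction

# Assumed standard values: NTSC color subcarrier = 315/88 MHz (~3.579545 MHz),
# 525 lines per frame, monochrome line rate 15750 Hz (525 lines x 30 frame/s).
F_SC = Fraction(315_000_000, 88)   # Hz
LINES_PER_FRAME = 525
MONO_LINE_RATE = 15_750            # Hz

# The subcarrier is 455/2 times the line rate, so the line rate is 2/455 of it.
line_rate = F_SC * Fraction(2, 455)
frame_rate = line_rate / LINES_PER_FRAME

print(round(float(line_rate), 4))  # 15734.2657
print(frame_rate)                  # 30000/1001
print(line_rate / MONO_LINE_RATE)  # 1000/1001
```

The last line shows that the color line rate is exactly 1000/1001 of the monochrome one, which is where the "one-thousandth slower" figure comes from.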
Because of this 0.1% speed difference, when converting film to video or vice versa, the sound drifts out of sync with the picture by 3.6 seconds per hour unless corrected. To correct this error, the audio can be either pulled up or pulled down. A pull up speeds the sound up by 0.1% and is used when transferring video to film. A pull down slows the sound down by 0.1% and is necessary when transferring film to video.
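The size of the drift and the effect of a pull up or pull down on an audio sample rate can be sketched in Python. The exact factor is assumed to be 1000/1001 (the "0.1%" is a rounding of it), and the 48 kHz sample rate is only an illustrative input; the function names are hypothetical.

```python
from fractions import Fraction

PULL = Fraction(1000, 1001)  # NTSC speed ratio, ~0.1% (assumed exact)

def drift_seconds(duration_s: float) -> float:
    """Sync error after `duration_s` seconds if the audio is not pulled."""
    # The picture runs 1001/1000 as long, so the error grows by 1/1000 per second.
    return float(duration_s * (1 / PULL - 1))

def pulled_down(rate_hz: float) -> float:
    """Effective playback rate after a pull down (film -> video)."""
    return float(rate_hz * PULL)

def pulled_up(rate_hz: float) -> float:
    """Effective playback rate after a pull up (video -> film)."""
    return float(rate_hz / PULL)

print(drift_seconds(3600))            # 3.6 (seconds per hour)
print(round(pulled_down(48_000), 2))  # 47952.05
print(round(pulled_up(48_000), 2))    # 48048.0
```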