Disk formatting
Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process, which performs basic medium preparation, is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting", most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels, and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities distinguish between a quick format, which does not erase all existing data, and a long format, which does.
As a general rule, formatting a disk by default leaves most if not all existing data on the disk medium, and some or most of it might be recoverable with privileged or special tools. Special tools can remove user data by a single overwrite of all files and free space.
History
A block, a contiguous number of bytes, is the minimum unit of storage that is read from and written to a disk by a disk driver. The earliest disk drives had fixed block sizes, but starting with the 1301, IBM marketed subsystems that featured variable block sizes: a particular track could have blocks of different sizes. The disk subsystems and other direct access storage devices on the IBM System/360 expanded this concept in the form of Count Key Data and later Extended Count Key Data; however, variable block sizes in HDDs fell out of use in the 1990s. One of the last HDDs to support variable block size was the IBM 3390 Model 9, announced in May 1993.
Modern hard disk drives, such as Serial Attached SCSI and Serial ATA drives, appear at their interfaces as a contiguous set of fixed-size blocks. For many years these blocks were 512 bytes long, but beginning in 2009 and accelerating through 2011, all major hard disk drive manufacturers began releasing hard disk drive platforms using the Advanced Format of 4096-byte logical blocks.
Floppy disks generally used only fixed block sizes, but these sizes were a function of the host's OS and its interaction with its controller, so a particular type of media could have different block sizes depending upon the host OS and controller.
Optical discs generally only use fixed block sizes.
Disk formatting process
Formatting a disk for use by an operating system and its applications typically involves three different processes.
- Low-level formatting marks the surfaces of the disks with markers indicating the start of a recording block and other information like block CRC to be used later, in normal operations, by the disk controller to read or write data. This is intended to be the permanent foundation of the disk, and is often completed at the factory.
- Partitioning divides a disk into one or more regions, writing data structures to the disk to indicate the beginning and end of the regions. This level of formatting often includes checking for defective tracks or defective sectors.
- High-level formatting creates the file system format within a disk partition or a logical volume. This formatting includes the data structures used by the OS to identify the contents of the logical drive or partition. This may occur during operating system installation, or when adding a new disk. Disk and distributed file systems may specify an optional boot block, and/or various volume and directory information for the operating system.
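As a rough illustration of the data structures that the partitioning step writes, the following Python sketch builds a single sector holding a partition table and reads the region boundaries back. It assumes the classic MBR layout (four 16-byte entries starting at offset 446, followed by a 0x55 0xAA signature), which the text above does not name; other schemes such as GPT are laid out differently. The function names, the partition type 0x83 and the starting sector 2048 are illustrative choices only.

```python
import struct

TABLE_OFFSET = 446            # assumed MBR layout: table starts at byte 446
ENTRY_FORMAT = "<B3sB3sII"    # status, CHS start, type, CHS end, LBA start, sector count
CHS_UNUSED = b"\xfe\xff\xff"  # placeholder often seen when only the LBA fields are meaningful


def make_entry(bootable, part_type, start_lba, num_sectors):
    """Pack one 16-byte partition entry (hypothetical helper)."""
    status = 0x80 if bootable else 0x00
    return struct.pack(ENTRY_FORMAT, status, CHS_UNUSED, part_type,
                       CHS_UNUSED, start_lba, num_sectors)


def make_mbr_sector(entries):
    """Build a 512-byte sector: empty boot area, up to four entries, signature."""
    if len(entries) > 4:
        raise ValueError("an MBR table holds at most four entries")
    table = b"".join(entries).ljust(64, b"\x00")
    return bytes(TABLE_OFFSET) + table + b"\x55\xaa"


def read_regions(sector):
    """Return (first_sector, last_sector) for every non-empty entry."""
    regions = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + 16 * i:TABLE_OFFSET + 16 * (i + 1)]
        _status, _, part_type, _, start, count = struct.unpack(ENTRY_FORMAT, raw)
        if part_type != 0:  # type 0 marks an unused slot
            regions.append((start, start + count - 1))
    return regions


sector = make_mbr_sector([make_entry(True, 0x83, 2048, 204800)])
print(len(sector), read_regions(sector))
```

The sketch only shows how the beginning and end of a region can be recorded; it does not write anything to a real device.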
Low-level formatting of floppy disks
For a standard 1.44 MB floppy disk, low-level formatting normally writes 18 sectors of 512 bytes to each of 160 tracks of the floppy disk, providing 1,474,560 bytes of storage on the disk.
Physical sectors are actually larger than 512 bytes, as in addition to the 512 byte data field they include a sector identifier field, CRC bytes and gaps between the fields. These additional bytes are not normally included in the quoted figure for overall storage capacity of the disk.
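The quoted capacity follows directly from the geometry given above. A minimal Python check, counting only the 512-byte data fields and none of the per-sector overhead, might look like this (the constant and function names are illustrative):

```python
# Geometry of a standard 1.44 MB floppy as stated in the text.
TRACKS = 160
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 512  # data field only; identifier, CRC and gap bytes are excluded


def formatted_capacity(tracks, sectors_per_track, bytes_per_sector):
    """Usable data bytes on a disk with the given low-level format."""
    return tracks * sectors_per_track * bytes_per_sector


capacity = formatted_capacity(TRACKS, SECTORS_PER_TRACK, BYTES_PER_SECTOR)
print(capacity)          # 1474560
print(capacity // 1024)  # 1440
```

The second figure shows where the "1.44 MB" label comes from: 1440 units of 1024 bytes.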
Different low-level formats can be used on the same media; for example, large records can be used to cut down on inter-record gap size.
Several freeware, shareware and free software programs offered considerably more control over formatting, allowing high-density 3.5" disks to be formatted with a capacity of up to 2 MB.
Techniques used include:
- head/track sector skew,
- interleaving sectors,
- increasing the number of sectors per track, and
- increasing the number of tracks.
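To make the first two techniques in the list concrete, here is a small Python sketch of sector interleaving and track skew. The function names and the example parameters (9 sectors per track, a 2:1 interleave, a skew of 2) are hypothetical; actual formatters chose such values to suit a particular drive and controller.

```python
def interleave_layout(sectors_per_track, interleave):
    """Return logical sector numbers in physical order around one track.

    With interleave 1 the sectors are consecutive; with interleave 2 every
    other physical slot is skipped between consecutive logical sectors.
    """
    if interleave < 1:
        raise ValueError("interleave must be at least 1")
    layout = [None] * sectors_per_track
    pos = 0
    for logical in range(1, sectors_per_track + 1):
        # If the target slot is already taken, move on to the next free one.
        while layout[pos] is not None:
            pos = (pos + 1) % sectors_per_track
        layout[pos] = logical
        pos = (pos + interleave) % sectors_per_track
    return layout


def apply_skew(layout, skew, track):
    """Rotate a track's layout by skew * track slots, so the first logical
    sector of each successive track starts later in the rotation."""
    n = len(layout)
    shift = (skew * track) % n
    return [layout[(i - shift) % n] for i in range(n)]


base = interleave_layout(9, 2)
print(base)                    # [1, 6, 2, 7, 3, 8, 4, 9, 5]
print(apply_skew(base, 2, 1))  # [9, 5, 1, 6, 2, 7, 3, 8, 4]
```

Interleaving gives a slow controller time to process one sector before the next logical sector arrives under the head, while skew allows for the time taken to switch heads or step to the next track.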
After establishing the structure of tracks, a formatter also needs to fill the entire floppy and look for bad sectors. Traditionally, the physical sectors were initialized with a fill value of 0xF6 as per the INT 1Eh's Disk Parameter Table during format on IBM compatible machines. This value is also used on the Atari Portfolio. CP/M 8-inch floppies typically came pre-formatted with a value of 0xE5, and by way of Digital Research this value was also used on Atari ST and some Amstrad formatted floppies. Amstrad otherwise used 0xF4 as a fill value.
Low-level formatting (LLF) of hard disks
Hard disk drives prior to the 1990s typically had a separate disk controller that defined how data was encoded on the media. With the media, the drive and/or the controller possibly procured from separate vendors, users were often able to perform low-level formatting. Separate procurement also had the potential of incompatibility between the separate components such that the subsystem would not reliably store data.
User-instigated low-level formatting of hard disk drives was common for minicomputer and personal computer systems until the 1990s. IBM and other mainframe system vendors typically supplied their hard disk drives with a low-level format. Typically this involved subdividing each track on the disk into one or more blocks which would contain the user data and associated control information. Different computers used different block sizes and IBM notably used variable block sizes but the popularity of the IBM PC caused the industry to adopt a standard of 512 user data bytes per block by the middle 1980s.
Depending upon the system, low-level formatting was generally done by an operating system utility. IBM compatible PCs used a routine in the BIOS, hidden at different addresses in different BIOSes; the MS-DOS debug program was used to transfer control to it.
Transition away from LLF
Starting in the late 1980s, driven by the volume of IBM compatible PCs, HDDs became routinely available pre-formatted with a compatible low-level format. At the same time, the industry moved from historical bit serial interfaces to modern bit serial interfaces and word serial interfaces wherein the low-level format was performed at the factory. Accordingly, it is not possible for an end user to low-level format a modern hard disk drive.
Modern disks: reinitialization
Modern hard drives can no longer perform post-production LLF; that is, they cannot re-establish the basic layout of "tracks" and "blocks" on the recording surface. Reinitialization refers to processes that return a disk to a factory-like configuration: no data, no partitioning, all blocks available to use.
Command-set support
SCSI provides a format command. This command performs the needed certification step to weed out bad sectors and has the ability to change sector size. The command-line sg_format program may be used to issue the command. A variety of sector sizes may be chosen, although not all are available on all devices: 512, 520, 524, 528, 4096, 4112, 4160, and 4224-byte sectors. Although the SCSI command provides many options, even resizing, it does not touch the track layer where low-level formatting happens.
ATA does not expose low-level format functionality, but it allows the sector size to be changed. Although a sector-size change may scramble data, it is not a safe way of erasing data, nor is any certification done. ATA offers a separate command for erasure.
NVMe drives have a standard method of formatting, available through, for example, a Linux command-line program. Sector size change and secure erase options are available. Note that NVMe drives are generally solid-state, so the notion of a "track" layer does not apply to them.
Seagate Technology drives offer a TTL serial debugging console. Among other things, the console can format the "system" and "user" partitions while performing defect checks and modify track parameters.
Disk-filling
When the hard drive's built-in reinitialization function is unavailable due to driver or system limitations, it is possible to fill the entire disk instead. On older hard drives without bad sector management, a program will also need to check for any damaged sectors and try to spare them out. On newer drives with defect management, reallocated sectors may be left unerased, whereas the built-in re-initialization function will erase them.
In modern times, it is most common to fill hard drives with a value of 0x00. One popular method for performing this zero-fill operation on a hard disk is by writing zero-value bytes to the drive using the Unix dd utility with the /dev/zero stream as the input file and the drive itself as the output file. This command may take many hours to complete, and will erase all files and file systems.
A value of 0xFF is used on flash disks to reduce wear. The latter value is typically also the default value used on ROM disks. Some advanced tools allow configuring the fill value.
Zero-filling a drive is not a secure method of preparing a drive for use with an encrypted filesystem. Doing so voids the plausible deniability of the process, as the encrypted areas will stand out among zero blocks. The correct technique is to zero-fill inside a temporary encrypted layer then discard the key and layer setup.
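The fill operation described in this section can be sketched in Python as a chunked write with a configurable fill value, followed by a read-back pass that reports sectors not holding the expected pattern. This is only an illustration under stated assumptions: it runs against a scratch image file in a temporary directory rather than a real device, the function names and the 512-byte sector size are hypothetical choices, and it makes no attempt to spare out bad sectors.

```python
import os
import tempfile


def fill_image(path, total_bytes, fill_value=0x00, chunk_size=1024 * 1024):
    """Write total_bytes of fill_value to path in chunks; return bytes written."""
    chunk = bytes([fill_value]) * chunk_size
    written = 0
    with open(path, "wb") as out:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            out.write(chunk[:n])
            written += n
        out.flush()
        os.fsync(out.fileno())  # push the data out of the OS cache
    return written


def find_mismatched_sectors(path, fill_value=0x00, sector_size=512):
    """Read the image back and list indexes of sectors that differ from the fill."""
    expected = bytes([fill_value]) * sector_size
    bad = []
    with open(path, "rb") as src:
        index = 0
        while True:
            sector = src.read(sector_size)
            if not sector:
                break
            if sector != expected[:len(sector)]:
                bad.append(index)
            index += 1
    return bad


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "scratch.img")
    written = fill_image(image, 8 * 512, fill_value=0xFF, chunk_size=1024)
    mismatches = find_mismatched_sectors(image, fill_value=0xFF)
    print(written, mismatches)  # 4096 []
```

Writing in large chunks rather than byte by byte keeps the number of system calls low, which matters when the target is an entire drive and the operation already takes hours.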