Defragmentation
In the maintenance of file systems, defragmentation is a process that reduces the degree of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contiguous regions. It also attempts to create larger regions of free space using compaction to impede the return of fragmentation.
Defragmentation is advantageous and relevant to file systems on electromechanical disk drives. The movement of the hard drive's read/write heads over different areas of the disk when accessing fragmented files is slower, compared to accessing the entire contents of a non-fragmented file sequentially without moving the read/write heads to seek other fragments.
Causes of fragmentation
Fragmentation occurs when the file system cannot or will not allocate enough contiguous space to store a complete file as a unit, but instead puts parts of it in gaps between existing files. Files that are often appended to as well as the frequent adding and deleting of files, larger files and greater numbers of files contribute to fragmentation and consequent performance loss. Defragmentation attempts to alleviate these problems.Example
An otherwise blank disk has five files, A through E, each using 10 blocks of space. On a blank disk, all of these files would be allocated one after the other. If file B were to be deleted, there would be two options: mark the space for file B as empty to be used again later, or move all the files after B so that the empty space is at the end. Since moving the files could be time-consuming if there were many files which needed to be moved, usually the empty space is simply left there, marked in a table as available for new files. When a new file, F, is allocated requiring 6 blocks of space, it could be placed into the first 6 blocks of the space that formerly held file B, and the 4 blocks following it will remain available. If another new file, G, is added and needs only 4 blocks, it could then occupy the space after F and before C Each file section becomes stored in separate packets across the drive, by deleting this, blank spaces are left creating a fragmentated disk.However, if file F then needs to be expanded, there are three options, since the space immediately following it is no longer available:
- Move the file F to where it can be created as one contiguous file of the new, larger size. This would not be possible if the file is larger than the largest contiguous space available. The file could also be so large that the operation would take an undesirably long period of time.
- Move all the files after F until one opens enough space to make it contiguous again. This presents the same problem as in the previous example: if there are a small number of files or not much data to move, it isn't a big problem, but if there are thousands or even tens of thousands of files, there isn't enough time to move all those files.
- Add a new block somewhere else, and indicate that F has a second extent. Repeat this hundreds of times and the filesystem will have a number of small free segments scattered in many places, and some files will have multiple extents. When a file has many extents like this, access time for that file may become excessively long because of all the random seeking the disk will have to do when reading it.
Mitigation
Defragmentation is the operation of moving file extents so they eventually merge, preferably into one. Doing so usually requires at least two copy operations: one to move the blocks into some free scratch space on the disk so more movement can happen, and another to finally move the blocks into their intended place. In such a paradigm, no data is ever removed from the disk, so that the operation can be safely stopped even in the event of a power loss. The article picture depicts an example.To defragment a disk, defragmentation software can only move files around within the free space available. This is an intensive operation and cannot be performed on a filesystem with little or no free space. During defragmentation, system performance will be degraded, and it is best to leave the computer alone during the process so that the defragmenter does not get confused by unexpected changes to the filesystem. Depending on the algorithm used it may or may not be advantageous to perform multiple passes. The reorganization involved in defragmentation does not change logical location of the files.
Besides defragmenting program files, the defragmenting tool can also reduce the time it takes to load programs and open files. For example, the Windows 9x defragmenter included the Intel Application Launch Accelerator which optimized programs on the disk by placing the defragmented program files and their dependencies next to each other, in the order in which the program loads them, to load these programs faster. In Windows, a good defragmenter will read the Prefetch files to identify as many of these file groups as possible and place the files within them in access sequence.
At the beginning of the hard drive, the outer tracks have a higher data transfer rate than the inner tracks. Placing frequently accessed files onto the outer tracks increases performance. Third party defragmenters, such as MyDefrag, will move frequently accessed files onto the outer tracks and defragment these files.
Improvements in modern hard drives such as RAM cache, faster platter rotation speed, command queuing, and greater data density reduce the negative impact of fragmentation on system performance to some degree, though increases in commonly used data quantities offset those benefits. However, modern systems profit enormously from the huge disk capacities currently available, since partially filled disks fragment much less than full disks, and on a high-capacity HDD, the same partition occupies a smaller range of cylinders, resulting in faster seeks. However, the average access time can never be lower than a half rotation of the platters, and platter rotation is the speed characteristic of HDDs which has experienced the slowest growth over the decades, so minimizing the number of seeks remains beneficial in most storage-heavy applications. Defragmentation is just that: ensuring that there is at most one seek per file, counting only the seeks to non-adjacent tracks.
Partitioning
A common strategy to optimize defragmentation and to reduce the impact of fragmentation is to partition the hard disk in a way that separates partitions of the file system that experience many more reads than writes from the more volatile zones where files are created and deleted frequently. The directories that contain the users' profiles are modified constantly. If files from user profiles are held on a dedicated partition, the defragmenter runs better since it does not need to deal with all the static files from other directories. For partitions with relatively little write activity, defragmentation time greatly improves after the first defragmentation, since the defragmenter will need to defragment only a small number of new files in the future.Offline defragmentation
The presence of immovable system files, especially a swap file, can impede defragmentation. These files can be safely moved when the operating system is not in use. For example, ntfsresize moves these files to resize an NTFS partition. The tool PageDefrag could defragment Windows system files such as the swap file and the files that store the Windows registry by running at boot time before the GUI is loaded. Since Windows Vista, the feature is not fully supported and has not been updated.In NTFS, as files are added to the disk, the Master File Table must grow to store the information for the new files. Every time the MFT cannot be extended due to some file being in the way, the MFT will gain a fragment. In early versions of Windows, it could not be safely defragmented while the partition was mounted, and so Microsoft wrote a hardblock in the defragmenting API. However, since Windows XP, an increasing number of defragmenters are now able to defragment the MFT, because the Windows defragmentation API has been improved and now supports that move operation. Even with the improvements, the first four clusters of the MFT remain unmovable by the Windows defragmentation API, resulting in the fact that some defragmenters will store the MFT in two fragments: The first four clusters wherever they were placed when the disk was formatted, and then the rest of the MFT at the beginning of the disk.