Clustered file system
A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.
Shared-disk file system
A shared-disk file system uses a storage area network (SAN) to allow multiple computers to gain direct disk access at the block level. Access control and translation from the file-level operations that applications use to the block-level operations used by the SAN must take place on the client node. The shared-disk file system, the most common type of clustered file system, adds mechanisms for concurrency control that provide a consistent and serializable view of the file system, avoiding corruption and unintended data loss even when multiple clients try to access the same files at the same time. Shared-disk file systems commonly employ some sort of fencing mechanism to prevent data corruption in case of node failures, because an unfenced device can cause data corruption if it loses communication with its sister nodes and tries to access the same information other nodes are accessing.
The underlying storage area network may use any of a number of block-level protocols, including SCSI, iSCSI, HyperSCSI, ATA over Ethernet, Fibre Channel, network block device, and InfiniBand.
There are different architectural approaches to a shared-disk file system. Some distribute file information across all the servers in a cluster.
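The concurrency control described above can be illustrated in miniature with POSIX advisory locking. This is only a local analogy, not the protocol any particular shared-disk file system uses: the path and the counter file are hypothetical, and real cluster lock managers coordinate across nodes rather than within one kernel.

```python
import fcntl
import os

# Hypothetical mount point; any local path behaves the same for this sketch.
SHARED_PATH = "/mnt/shared/counter.txt"

def increment_counter(path):
    """Serialize a read-modify-write so concurrent callers never clobber
    each other, mirroring the 'consistent and serializable view' a
    shared-disk file system must provide."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until an exclusive lock is held
        data = os.read(fd, 64)
        value = int(data) if data else 0
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(value + 1).encode())
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)   # releasing the lock plays the role
        os.close(fd)                     # fencing plays on a failed node
```

Without the exclusive lock, two processes could both read the same value and one update would be lost; fencing addresses the analogous hazard when a whole node, rather than a process, misbehaves.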
Examples
- Blue Whale Clustered file system
- Silicon Graphics clustered file system
- Veritas Cluster File System
- Microsoft Cluster Shared Volumes
- DataPlow Nasan File System
- IBM General Parallel File System
- Oracle Cluster File System
- OpenVMS Files-11 File System
- PolyServe storage solutions
- Quantum StorNext File System (formerly ADIC, formerly CentraVision File System)
- Red Hat Global File System
- Sun QFS
- TerraScale Technologies TerraFS
- Veritas CFS
- Versity VSM, ScoutFS
- VMware VMFS
- WekaFS
- Apple Xsan
- DragonFly BSD HAMMER2
Distributed file systems
The difference between a distributed file system and a distributed data store is that a distributed file system allows files to be accessed using the same interfaces and semantics as local files (for example, mounting and unmounting, listing directories, reading and writing at byte boundaries, and the system's native permission model). Distributed data stores, by contrast, require using a different API or library and have different semantics.
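The "same interfaces and semantics" point can be made concrete: ordinary byte-oriented file calls work unchanged whether the path resides on a local disk or on a mounted distributed file system. This is a generic sketch; the mount-point names in the comment are illustrative, not tied to any specific system.

```python
import os

def tail_bytes(path, n):
    # Standard byte-level I/O: seek to the end, back up n bytes, and read.
    # The identical calls apply to a path under, say, /home on a local disk
    # or a hypothetical /mnt/cephfs mount of a distributed file system.
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - n))
        return f.read()

# Directory listing is equally location-transparent:
# os.listdir("/mnt/cephfs/logs") uses the same API as os.listdir("/var/log").
```

A distributed data store, by contrast, would require its own client library and calls for the equivalent operations instead of reusing the operating system's file API.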
Design goals
Distributed file systems may aim for "transparency" in a number of aspects. That is, they aim to be "invisible" to client programs, which "see" a system similar to a local file system. Behind the scenes, the distributed file system handles locating files, transporting data, and potentially providing the other features listed below.
- Access transparency: clients are unaware that files are distributed and can access them in the same way as local files are accessed.
- Location transparency: a consistent namespace exists encompassing local as well as remote files. The name of a file does not give its location.
- Concurrency transparency: all clients have the same view of the state of the file system. This means that if one process is modifying a file, any other processes on the same system or remote systems that are accessing the files will see the modifications in a coherent manner.
- Failure transparency: the client and client programs should operate correctly after a server failure.
- Heterogeneity: file service should be provided across different hardware and operating system platforms.
- Scalability: the file system should work well in small environments and also scale gracefully to bigger ones.
- Replication transparency: Clients should not have to be aware of the file replication performed across multiple servers to support scalability.
- Migration transparency: files should be able to move between different servers without the client's knowledge.
History
In 1986, IBM announced client and server support for Distributed Data Management Architecture (DDM) for the System/36, System/38, and IBM mainframe computers running CICS. This was followed by support for the IBM Personal Computer, the AS/400, IBM mainframe computers under the MVS and VSE operating systems, and FlexOS. DDM also became the foundation for Distributed Relational Database Architecture, also known as DRDA.
Many network protocols, some peer-to-peer, are used by open-source distributed file systems for the cloud and by closed-source clustered file systems, e.g.: 9P, AFS, Coda, CIFS/SMB, DCE/DFS, WekaFS, Lustre, PanFS, Google File System, Mnet, and Chord.
Examples
- Alluxio
- BeeGFS
- CephFS
- Windows Distributed File System
- Infinit
- GfarmFS
- GlusterFS
- GFS
- GPFS
- HDFS
- IPFS
- iRODS
- LizardFS
- Lustre
- MapR FS
- MooseFS
- ObjectiveFS
- OneFS
- OrangeFS, formerly Parallel Virtual File System
- PanFS
- Parallel Virtual File System
- RozoFS
- SMB/CIFS
- Torus
- VaultFS
- WekaFS
- XtreemFS
Network-attached storage