Storage virtualization
In computer science, storage virtualization is "the process of presenting a logical view of the physical storage resources to" a host computer system, "treating all storage media in the enterprise as a single pool of storage."
A "storage system" is also known as a storage array, disk array, or filer. Storage systems typically use special hardware and software along with disk drives in order to provide very fast and reliable storage for computing and data processing. Storage systems are complex, and may be thought of as a special purpose computer designed to provide storage capacity along with advanced data protection features. Disk drives are only one element within a storage system, along with hardware and special purpose embedded software within the system.
Storage systems can provide either block accessed storage, or file accessed storage. Block access is typically delivered over Fibre Channel, iSCSI, SAS, FICON or other protocols. File access is often provided using NFS or SMB protocols.
Within the context of a storage system, there are two primary types of virtualization that can occur:
- Block virtualization used in this context refers to the abstraction of logical storage from physical storage so that it may be accessed without regard to physical storage or heterogeneous structure. This separation allows the administrators of the storage system greater flexibility in how they manage storage for end users.
- File virtualization addresses the NAS challenges by eliminating the dependencies between the data accessed at the file level and the location where the files are physically stored. This provides opportunities to optimize storage use and server consolidation and to perform non-disruptive file migrations.
Block virtualization
Address space remapping
Virtualization of storage helps achieve location independence by abstracting the physical location of the data. The virtualization system presents to the user a logical space for data storage and handles the process of mapping it to the actual physical location.It is possible to have multiple layers of virtualization or mapping. It is then possible that the output of one layer of virtualization can then be used as the input for a higher layer of virtualization. Virtualization maps space between back-end resources, to front-end resources. In this instance, "back-end" refers to a logical unit number that is not presented to a computer, or host system for direct use. A "front-end" LUN or volume is presented to a host or computer system for use.
The actual form of the mapping will depend on the chosen implementation. Some implementations may limit the granularity of the mapping which may limit the capabilities of the device. Typical granularities range from a single physical disk down to some small subset of the physical disk.
In a block-based storage environment, a single block of information is addressed using a LUN identifier and an offset within that LUN known as a logical block addressing.
Metadata
The virtualization software or device is responsible for maintaining a consistent view of all the mapping information for the virtualized storage. This mapping information is often called metadata and is stored as a mapping table.The address space may be limited by the capacity needed to maintain the mapping table. The level of granularity, and the total addressable space both directly impact the size of the meta-data, and hence the mapping table. For this reason, it is common to have trade-offs, between the amount of addressable capacity and the granularity or access granularity.
One common method to address these limits is to use multiple levels of virtualization. In several storage systems deployed today, it is common to utilize three layers of virtualization.
Some implementations do not use a mapping table, and instead calculate locations using an algorithm. These implementations utilize dynamic methods to calculate the location on access, rather than storing the information in a mapping table.
I/O redirection
The virtualization software or device uses the metadata to re-direct I/O requests. It will receive an incoming I/O request containing information about the location of the data in terms of the logical disk and translates this into a new I/O request to the physical disk location.For example, the virtualization device may :
- Receive a read request for vdisk LUN ID=1, LBA=32
- Perform a meta-data look up for LUN ID=1, LBA=32, and finds this maps to physical LUN ID=7, LBA0
- Sends a read request to physical LUN ID=7, LBA0
- Receives the data back from the physical LUN
- Sends the data back to the originator as if it had come from vdisk LUN ID=1, LBA32
Capabilities
Replication
Data replication techniques are not limited to virtualization appliances and as such are not described here in detail. However most implementations will provide some or all of these replication services.When storage is virtualized, replication services must be implemented above the software or device that is performing the virtualization. This is true because it is only above the virtualization layer that a true and consistent image of the logical disk can be copied. This limits the services that some implementations can implement or makes them seriously difficult to implement. If the virtualization is implemented in the network or higher, this renders any replication services provided by the underlying storage controllers useless.
- Remote data replication for disaster recovery
- * Synchronous Mirroring where I/O completion is only returned when the remote site acknowledges the completion. Applicable for shorter distances
- * Asynchronous Mirroring where I/O completion is returned before the remote site has acknowledged the completion. Applicable for much greater distances
- Point-In-Time Snapshots to copy or clone data for diverse uses
- * When combined with thin provisioning, enables space-efficient snapshots
Pooling
Disk management
The software or device providing storage virtualization becomes a common disk manager in the virtualized environment. Logical disks are created by the virtualization software or device and are mapped to the required host or server, thus providing a common place or way for managing all volumes in the environment.Enhanced features are easy to provide in this environment:
- Thin Provisioning to maximize storage utilization
- * This is relatively easy to implement as physical storage is only allocated in the mapping table when it is used.
- Disk expansion and shrinking
- * More physical storage can be allocated by adding to the mapping table
- * Similarly disks can be reduced in size by removing some physical storage from the mapping
Benefits
Non-disruptive data migration
One of the major benefits of abstracting the host or server from the actual storage is the ability to migrate data while maintaining concurrent I/O access.The host only knows about the logical disk and so any changes to the meta-data mapping is transparent to the host. This means the actual data can be moved or replicated to another physical location without affecting the operation of any client. When the data has been copied or moved, the meta-data can simply be updated to point to the new location, therefore freeing up the physical storage at the old location.
The process of moving the physical location is known as data migration. Most implementations allow for this to be done in a non-disruptive manner, that is concurrently while the host continues to perform I/O to the logical disk.
The mapping granularity dictates how quickly the meta-data can be updated, how much extra capacity is required during the migration, and how quickly the previous location is marked as free. The smaller the granularity the faster the update, less space required and quicker the old storage can be freed up.
There are many day to day tasks a storage administrator has to perform that can be simply and concurrently performed using data migration techniques.
- Moving data off an over-utilized storage device.
- Moving data onto a faster storage device as needs require
- Implementing an Information Lifecycle Management policy
- Migrating data off older storage devices
Improved utilization
When all available storage capacity is pooled, system administrators no longer have to search for disks that have free space to allocate to a particular host or server. A new logical disk can be simply allocated from the available pool, or an existing disk can be expanded.
Pooling also means that all the available storage capacity can potentially be used. In a traditional environment, an entire disk would be mapped to a host. This may be larger than is required, thus wasting space. In a virtual environment, the logical disk is assigned the capacity required by the using host.
Storage can be assigned where it is needed at that point in time, reducing the need to guess how much a given host will need in the future. Using Thin Provisioning, the administrator can create a very large thin provisioned logical disk, thus the using system thinks it has a very large disk from day one.